Sample records for jade featuring selections

  1. Accessing files in an Internet: The Jade file system

    NASA Technical Reports Server (NTRS)

    Peterson, Larry L.; Rao, Herman C.

    1991-01-01

    Jade is a new distribution file system that provides a uniform way to name and access files in an internet environment. It makes two important contributions. First, Jade is a logical system that integrates a heterogeneous collection of existing file systems, where heterogeneous means that the underlying file systems support different file access protocols. Jade is designed under the restriction that the underlying file system may not be modified. Second, rather than providing a global name space, Jade permits each user to define a private name space. These private name spaces support two novel features: they allow multiple file systems to be mounted under one directory, and they allow one logical name space to mount other logical name spaces. A prototype of the Jade File System was implemented on Sun Workstations running Unix. It consists of interfaces to the Unix file system, the Sun Network File System, the Andrew File System, and FTP. This paper motivates Jade's design, highlights several aspects of its implementation, and illustrates applications that can take advantage of its features.

  2. Accessing files in an internet - The Jade file system

    NASA Technical Reports Server (NTRS)

    Rao, Herman C.; Peterson, Larry L.

    1993-01-01

    Jade is a new distribution file system that provides a uniform way to name and access files in an internet environment. It makes two important contributions. First, Jade is a logical system that integrates a heterogeneous collection of existing file systems, where heterogeneous means that the underlying file systems support different file access protocols. Jade is designed under the restriction that the underlying file system may not be modified. Second, rather than providing a global name space, Jade permits each user to define a private name space. These private name spaces support two novel features: they allow multiple file systems to be mounted under one directory, and they allow one logical name space to mount other logical name spaces. A prototype of the Jade File System was implemented on Sun Workstations running Unix. It consists of interfaces to the Unix file system, the Sun Network File System, the Andrew File System, and FTP. This paper motivates Jade's design, highlights several aspects of its implementation, and illustrates applications that can take advantage of its features.

  3. The Jade File System. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Rao, Herman Chung-Hwa

    1991-01-01

    File systems have long been the most important and most widely used form of shared permanent storage. File systems in traditional time-sharing systems, such as Unix, support a coherent sharing model for multiple users. Distributed file systems implement this sharing model in local area networks. However, most distributed file systems fail to scale from local area networks to an internet. Four characteristics of scalability were recognized: size, wide area, autonomy, and heterogeneity. Owing to size and wide area, techniques such as broadcasting, central control, and central resources, which are widely adopted by local area network file systems, are not adequate for an internet file system. An internet file system must also support the notion of autonomy because an internet is made up by a collection of independent organizations. Finally, heterogeneity is the nature of an internet file system, not only because of its size, but also because of the autonomy of the organizations in an internet. The Jade File System, which provides a uniform way to name and access files in the internet environment, is presented. Jade is a logical system that integrates a heterogeneous collection of existing file systems, where heterogeneous means that the underlying file systems support different file access protocols. Because of autonomy, Jade is designed under the restriction that the underlying file systems may not be modified. In order to avoid the complexity of maintaining an internet-wide, global name space, Jade permits each user to define a private name space. In Jade's design, we pay careful attention to avoiding unnecessary network messages between clients and file servers in order to achieve acceptable performance. Jade's name space supports two novel features: (1) it allows multiple file systems to be mounted under one direction; and (2) it permits one logical name space to mount other logical name spaces. A prototype of Jade was implemented to examine and validate its

  4. [Study on the vibrational spectra and XRD characters of Huanglong jade from Longling County, Yunnan Province].

    PubMed

    Pei, Jing-cheng; Fan, Lu-wei; Xie, Hao

    2014-12-01

    Based on the conventional test methods, the infrared absorption spectrum, Raman spectrum and X-ray diffraction (XRD) were employed to study the characters of the vibration spectrum and mineral composition of Huanglong jade. The testing results show that Huanglong jade shows typical vibrational spectrum characteristics of quartziferous jade. The main infrared absorption bands at 1162, 1076, 800, 779, 691, 530 and 466 cm(-1) were induced by the asymmetric stretching vibration, symmetrical stretching vibration and bending vibration of Si-O-Si separately. Especially the absorption band near 800 cm(-1) is split, which indicates that Huanglong jade has good crystallinity. In Raman spectrum, the main strong vibration bands at 463 and 355 cm(-1) were attributed to bending vibration of Si-O-Si. XRD test confirmed that Quartz is main mineral composition of Huanglong jade and there is a small amount of hematite in red color samples which induced the red color of Huanglong jade. This is the first report on the infrared, Raman and XRD spectra feature of Huanglong jade. It will provide a scientific basis for the identification, naming and other research for huanglong jade.

  5. Mineral identification of black-jade gemstone from Aceh Indonesia

    NASA Astrophysics Data System (ADS)

    Ismail; Nizar, Akmal; Mursal

    2018-04-01

    One of the gemstones in Aceh Indonesia is called black-jade where the name of black-jade is a local name. Unfortunately, detail information about this gemstone is still limited. No one knows whether this gemstone can be categorized as jade or not until this study is presented. We have utilized X-Ray Fluorescent (XRF) to study the black-jade gemstone from Aceh Tengah (Takengon) and Nagan Raya regions in Indonesia. Our results show that the black-jade gemstone from Aceh Tengah contains 39.6% of SiO2, 35% of Fe2O3, 17% of MgO, 3% of CaO, and 2% of NiO. While, the black-jade gemstone from Nagan Raya contains a little bit less SiO2 but more Fe2O3 than that of black-jade from Aceh Tengah: 38.4% of SiO2, 39% of Fe2O3, 17% of MgO, 0.5% of CaO, and 2.6% of NiO. By comparing the results to the available mineral data (jadeite, nephrite-actinolite, nephrite-tremolite, serpentine-clinochrysotile, serpentine-antigorite, and vesuvianite), we found that oxide compounds contained in the black-jade gemstone from Aceh are found in the serpentine-antigorite, except H2O. The total difference between the oxide compositions in black-jade and serpentine-antigorite is 43% with its average difference of 11%. This means that the oxide composition in black-jade is almost the same as in the serpentine-antigorite. Accordingly, the black-jade gemstone from Aceh Indonesia is a type of serpentine-antigorite-jade.

  6. The Jovian Auroral Distributions Experiment (JADE) on the Juno Mission to Jupiter

    NASA Astrophysics Data System (ADS)

    McComas, D. J.; Alexander, N.; Allegrini, F.; Bagenal, F.; Beebe, C.; Clark, G.; Crary, F.; Desai, M. I.; De Los Santos, A.; Demkee, D.; Dickinson, J.; Everett, D.; Finley, T.; Gribanova, A.; Hill, R.; Johnson, J.; Kofoed, C.; Loeffler, C.; Louarn, P.; Maple, M.; Mills, W.; Pollock, C.; Reno, M.; Rodriguez, B.; Rouzaud, J.; Santos-Costa, D.; Valek, P.; Weidner, S.; Wilson, P.; Wilson, R. J.; White, D.

    2017-11-01

    The Jovian Auroral Distributions Experiment (JADE) on Juno provides the critical in situ measurements of electrons and ions needed to understand the plasma energy particles and processes that fill the Jovian magnetosphere and ultimately produce its strong aurora. JADE is an instrument suite that includes three essentially identical electron sensors (JADE-Es), a single ion sensor (JADE-I), and a highly capable Electronics Box (EBox) that resides in the Juno Radiation Vault and provides all necessary control, low and high voltages, and computing support for the four sensors. The three JADE-Es are arrayed 120∘ apart around the Juno spacecraft to measure complete electron distributions from ˜0.1 to 100 keV and provide detailed electron pitch-angle distributions at a 1 s cadence, independent of spacecraft spin phase. JADE-I measures ions from ˜5 eV to ˜50 keV over an instantaneous field of view of 270∘×90∘ in 4 s and makes observations over all directions in space each 30 s rotation of the Juno spacecraft. JADE-I also provides ion composition measurements from 1 to 50 amu with m/Δ m˜2.5, which is sufficient to separate the heavy and light ions, as well as O+ vs S+, in the Jovian magnetosphere. All four sensors were extensively tested and calibrated in specialized facilities, ensuring excellent on-orbit observations at Jupiter. This paper documents the JADE design, construction, calibration, and planned science operations, data processing, and data products. Finally, the Appendix describes the Southwest Research Institute [SwRI] electron calibration facility, which was developed and used for all JADE-E calibrations. Collectively, JADE provides remarkably broad and detailed measurements of the Jovian auroral region and magnetospheric plasmas, which will surely revolutionize our understanding of these important and complex regions.

  7. Electron Pitch Angle Distributions Along Field Lines Connected to the Auroral Region from 25 to 1.2 RJ Measured by the Jovian Auroral Distributions Experiment-Electrons (JADE-E) on Juno

    NASA Astrophysics Data System (ADS)

    Allegrini, F.; Bagenal, F.; Bolton, S. J.; Bonfond, B.; Chae, K.; Clark, G. B.; Connerney, J. E. P.; Ebert, R. W.; Gladstone, R.; Hue, V.; Hospodarsky, G. B.; Kim, T. K. H.; Kurth, W. S.; Levin, S.; Louarn, P.; Mauk, B.; McComas, D. J.; Pollock, C. J.; Ranquist, D. A.; Reno, M. L.; Saur, J.; Szalay, J.; Thomsen, M. F.; Valek, P. W.; Wilson, R. J.

    2017-12-01

    The Jovian Auroral Distributions Experiment (JADE) on Juno provides critical in situ measurements of electrons and ions needed to understand the plasma distributions and processes that fill the Jovian magnetosphere and ultimately produce Jupiter's bright and dynamic aurora. JADE is an instrument suite that includes two essentially identical electron sensors (JADE-Es) and a single ion sensor (JADE-I). JADE-E measures electron energy distributions from 0.1 to 100 keV and provides detailed electron pitch angle distributions (PAD) at 7.5° resolution. Juno's trajectories in the northern hemisphere have allowed JADE to sample electron energy and pitch angle distributions on field lines connected to the auroral regions from as close as 1.2 RJ all the way to distances greater than 25 RJ. Here, we report on the evolution of these distributions. Specifically, the PADs change from mostly uniform at distances greater than 20 RJ, to butterfly from 18 to 12 RJ, to field aligned or pancake, depending on the energy, closer to Jupiter. Below 1.5 RJ, electron beams and loss cones are observed.

  8. Cold blobs of protons in Jupiter's outer magnetosphere as observed by Juno's JADE

    NASA Astrophysics Data System (ADS)

    Wilson, R. J.; Bagenal, F.; Valek, P. W.; Allegrini, F.; Angold, N. G.; Chae, K.; Ebert, R. W.; Kim, T. K. H.; Loeffler, C.; Louarn, P.; McComas, D. J.; Pollock, C. J.; Ranquist, D. A.; Reno, C.; Szalay, J. R.; Thomsen, M. F.; Weidner, S.; Bolton, S. J.; Levin, S.

    2017-12-01

    Juno's 53-day polar orbits cut through the equatorial plane when inbound to perijove. The JADE instrument has been observing thermal ions (0.01-50 keV/q) and electrons (0.1-100 keV/q) in these regions since Orbit 05. Even at distances greater than 70 RJ, magnetodisk crossings are clear with high count rates measured before returning to rarified plasma conditions outside the disk. However JADE's detectors observes regions of slightly greater ion counts that last for about an hour. The ion counts are too low to analyze at the typical 30s or 60s low rate instrument cadence, but by summing to 10-minute resolution the features become analyzable. We find these regions are populated with protons with higher density than those typically observed outside the magnetodisk, and that they are colder than the ambient plasma. Reanalysis of Voyager data (DOI: 10.1002/2017JA024053) also showed cold dense blobs of plasma in the inner to middle magnetosphere, however these were of heavier ion species, short lived (several minutes) and within 40 RJ of Jupiter. This presentation will investigate the JADE identified cold blobs observed to date and compare with those observed with Voyager.

  9. [Identification of B jade by Raman spectroscopy].

    PubMed

    Zu, En-dong; Chen, Da-peng; Zhang, Peng-xiang

    2003-02-01

    Raman spectroscopy has been found to be a useful tool for identification of bleached and polymer-impregnated jadeites (so-called B jade). The major advantage of this system over classical methods of gem testing is the non-destructive identification of inclusions in gemstones and the determination of organic fracture filling in jade. Fissures in jadeites have been filled with oils and various resins to enhance their clarity, such as paraffin wax, paraffin oil, AB glue and epoxy resins. They show different peaks depending on their chemical composition. The characteristic spectrum ranges from 1,200-1,700 cm-1 to 2,800-3,100 cm-1. The spectra of resins show that they all have four strongest peaks related with phenyl: two C-C stretching modes at 1,116 and 1,609 cm-1, respectively, one C-H stretching mode at 3,069 cm-1, and a in-plane C-H bending mode at 1,189 cm-1. In addition, other two -CH2, -CH3 stretching modes at 2,906 and 2,869 cm-1, respectively, are very similar to paraffin. Therefore, the peaks at 1,116, 1,609, 1,189 and 3,069 cm-1 are important in distinguishing resin from paraffin, and we can identify B jade depending on them.

  10. The first year of observations of Jupiter's magnetosphere from Juno's Jovian Auroral Distributions Experiment (JADE)

    NASA Astrophysics Data System (ADS)

    Valek, P. W.; Allegrini, F.; Angold, N. G.; Bagenal, F.; Bolton, S. J.; Chae, K.; Connerney, J. E. P.; Ebert, R. W.; Gladstone, R.; Kim, T. K. H.; Kurth, W. S.; Levin, S.; Louarn, P.; Loeffler, C. E.; Mauk, B.; McComas, D. J.; Pollock, C. J.; Reno, M. L.; Szalay, J. R.; Thomsen, M. F.; Weidner, S.; Wilson, R. J.

    2017-12-01

    Juno observations of the Jovian plasma environment are made by the Jovian Auroral Distributions Experiment (JADE) which consists of two nearly identical electron sensors - JADE-E - and an ion sensor - JADE-I. JADE-E measures the electron distribution in the range of 100 eV to 100 keV and uses electrostatic deflection to measure the full pitch angle distribution. JADE-I measures the composition separated energy per charge in the range of 10 eV / q to 46 keV / q. The large orbit - apojove 110 Rj, perijove 1.05 Rj - allows JADE to periodically cross through the magnetopause into the magnetosheath, transverse the outer, middle, and inner magnetosphere, and measures the plasma population down to the ionosphere. We present here in situ plasma observations of the Jovian magnetosphere and topside ionosphere made by the JADE instrument during the first year in orbit. Dawn-side crossings of the plasmapause have shown a general dearth of heavy ions except during some intervals at lower magnetic latitudes. Plasma disk crossings in the middle and inner magnetosphere show a mixture of heavy and light ions. During perijove crossings at high latitudes when Juno was connected to the Io torus, JADE-I observed heavy ions with energies consistent with a corotating pickup population. In the auroral regions the core of the electron energy distribution is generally from about 100 eV when on field lines that are connected to the inner plasmasheet, several keVs when connected to the outer plasmasheet, and tens of keVs when Juno is over the polar regions. JADE has observed upward electron beams and upward loss cones, both in the north and south auroral regions, and downward electron beams in the south. Some of the beams are of short duration ( 1 s) implying that the magnetosphere has a very fine spatial and/or temporal structure within the auroral regions. Joint observations with the Waves instrument have demonstrated that the observed loss cone distributions provide sufficient growth rates

  11. Online feature selection with streaming features.

    PubMed

    Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan

    2013-05-01

    We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In the paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.

  12. 76 FR 45646 - Culturally Significant Objects Imported for Exhibition Determinations: “5,000 Years of Chinese...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-07-29

    ... DEPARTMENT OF STATE [Public Notice: 7540] Culturally Significant Objects Imported for Exhibition Determinations: ``5,000 Years of Chinese Jade Featuring Selections From the National Museum of Taiwan and the... ``5,000 Years of Chinese Jade Featuring Selections from the National Museum of Taiwan and the Arthur M...

  13. From design to implementation - The Joint Asia Diabetes Evaluation (JADE) program: A descriptive report of an electronic web-based diabetes management program

    PubMed Central

    2010-01-01

    Background The Joint Asia Diabetes Evaluation (JADE) Program is a web-based program incorporating a comprehensive risk engine, care protocols, and clinical decision support to improve ambulatory diabetes care. Methods The JADE Program uses information technology to facilitate healthcare professionals to create a diabetes registry and to deliver an evidence-based care and education protocol tailored to patients' risk profiles. With written informed consent from participating patients and care providers, all data are anonymized and stored in a databank to establish an Asian Diabetes Database for research and publication purpose. Results The JADE electronic portal (e-portal: http://www.jade-adf.org) is implemented as a Java application using the Apache web server, the mySQL database and the Cocoon framework. The JADE e-portal comprises a risk engine which predicts 5-year probability of major clinical events based on parameters collected during an annual comprehensive assessment. Based on this risk stratification, the JADE e-portal recommends a care protocol tailored to these risk levels with decision support triggered by various risk factors. Apart from establishing a registry for quality assurance and data tracking, the JADE e-portal also displays trends of risk factor control at each visit to promote doctor-patient dialogues and to empower both parties to make informed decisions. Conclusions The JADE Program is a prototype using information technology to facilitate implementation of a comprehensive care model, as recommended by the International Diabetes Federation. It also enables health care teams to record, manage, track and analyze the clinical course and outcomes of people with diabetes. PMID:20465815

  14. From design to implementation--the Joint Asia Diabetes Evaluation (JADE) program: a descriptive report of an electronic web-based diabetes management program.

    PubMed

    Ko, Gary T; So, Wing-Yee; Tong, Peter C; Le Coguiec, Francois; Kerr, Debborah; Lyubomirsky, Greg; Tamesis, Beaver; Wolthers, Troels; Nan, Jennifer; Chan, Juliana

    2010-05-13

    The Joint Asia Diabetes Evaluation (JADE) Program is a web-based program incorporating a comprehensive risk engine, care protocols, and clinical decision support to improve ambulatory diabetes care. The JADE Program uses information technology to facilitate healthcare professionals to create a diabetes registry and to deliver an evidence-based care and education protocol tailored to patients' risk profiles. With written informed consent from participating patients and care providers, all data are anonymized and stored in a databank to establish an Asian Diabetes Database for research and publication purpose. The JADE electronic portal (e-portal: http://www.jade-adf.org) is implemented as a Java application using the Apache web server, the mySQL database and the Cocoon framework. The JADE e-portal comprises a risk engine which predicts 5-year probability of major clinical events based on parameters collected during an annual comprehensive assessment. Based on this risk stratification, the JADE e-portal recommends a care protocol tailored to these risk levels with decision support triggered by various risk factors. Apart from establishing a registry for quality assurance and data tracking, the JADE e-portal also displays trends of risk factor control at each visit to promote doctor-patient dialogues and to empower both parties to make informed decisions. The JADE Program is a prototype using information technology to facilitate implementation of a comprehensive care model, as recommended by the International Diabetes Federation. It also enables health care teams to record, manage, track and analyze the clinical course and outcomes of people with diabetes.

  15. 76 FR 52378 - Culturally Significant Objects Imported for Exhibition

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-08-22

    ... Determinations: ``5,000 Years of Chinese Jade Featuring Selections From the National Museum of History, Taiwan..., which is ``5,000 Years of Chinese Jade Featuring Selections from the National Museum of History, Taiwan... objects is at the San Antonio Museum of Art, San Antonio, TX, from on or about October 1, 2011, until on...

  16. Latitudinal distribution of the Jovian plasma sheet ions observed by Juno JADE-I

    NASA Astrophysics Data System (ADS)

    Kim, T. K. H.; Valek, P. W.; McComas, D. J.; Allegrini, F.; Bagenal, F.; Bolton, S. J.; Connerney, J. E. P.; Ebert, R. W.; Levin, S.; Louarn, P.; Pollock, C. J.; Ranquist, D. A.; Szalay, J.; Thomsen, M. F.; Wilson, R. J.

    2017-12-01

    The Jovian plasma sheet is a region where the centrifugal force dominates the heavy ion plasma. Properties of the plasma sheet ions near the equatorial plane have been studied with in-situ measurements from the Pioneer, Voyager, and Galileo spacecraft. However, the ion properties for the off-equator regions are not well known due to the limited measurements. Juno is the first polar orbiting spacecraft that can investigate the high latitude region of the Jovian magnetosphere. With Juno's unique trajectory, we will investigate the latitudinal distribution of the Jovian plasma sheet ions using measurements from the Jovian Auroral Distributions Experiment Ion sensor (JADE-I). JADE-I measures an ion's energy-per-charge (E/Q) from 0.01 keV/q to 46.2 keV/q with an electrostatic analyzer (ESA) and a mass-per-charge (M/Q) up to 64 amu/q with a carbon-foil-based time-of-flight (TOF) mass spectrometer. We have shown that the ambiguity between and (both have M/Q of 16) can be resolved in JADE-I using a semi-empirical simulation tool based on carbon foil effects (i.e., charge state modification, angular scattering, and energy loss) from incident ions passing through the TOF mass spectrometer. Based on the simulation results, we have developed an Ion Composition Analysis Tool (ICAT) that determines ion composition at each energy step of JADE-I (total of 64 steps). The velocity distribution for each ion species can be obtained from the ion composition as a function of each energy step. Since there is an ambipolar electric field due to mobile electrons and equatorially confined heavy ions, we expect to see acceleration along the field line. This study will show the species separated velocity distribution at various latitudes to investigate how the plasma sheet ions evolve along the field line.

  17. jade: An End-To-End Data Transfer and Catalog Tool

    NASA Astrophysics Data System (ADS)

    Meade, P.

    2017-10-01

    The IceCube Neutrino Observatory is a cubic kilometer neutrino telescope located at the Geographic South Pole. IceCube collects 1 TB of data every day. An online filtering farm processes this data in real time and selects 10% to be sent via satellite to the main data center at the University of Wisconsin-Madison. IceCube has two year-round on-site operators. New operators are hired every year, due to the hard conditions of wintering at the South Pole. These operators are tasked with the daily operations of running a complex detector in serious isolation conditions. One of the systems they operate is the data archiving and transfer system. Due to these challenging operational conditions, the data archive and transfer system must above all be simple and robust. It must also share the limited resource of satellite bandwidth, and collect and preserve useful metadata. The original data archive and transfer software for IceCube was written in 2005. After running in production for several years, the decision was taken to fully rewrite it, in order to address a number of structural drawbacks. The new data archive and transfer software (JADE2) has been in production for several months providing improved performance and resiliency. One of the main goals for JADE2 is to provide a unified system that handles the IceCube data end-to-end: from collection at the South Pole, all the way to long-term archive and preservation in dedicated repositories at the North. In this contribution, we describe our experiences and lessons learned from developing and operating the data archive and transfer software for a particle physics experiment in extreme operational conditions like IceCube.

  18. Characterization of Mesoamerican jade

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bishop, R.L.; Sayre, E.V.; van Zelst, L.

    1983-11-23

    Jadeite occurring in the Motague River Valley of Guatemala has been characterized by neutron activation analysis and forms two district, phase-related groups. Comparison of the compositional profiles of Mayan jadeite artifacts reveals many specimens having profiles matching those of the Montagua source. Of particular interest are the large number of jadeite artifacts which show internal similarity yet have compositional patterns which are significantly different from the Montagua samples and Montagua-related artifacts. A few of the analyzed Costa Rican artifacts show patterns similar to those of the Motagua yet the vast majority fall within one of the two Costa Rican compositionalmore » groups. When considering the non-Motagua related Mayan artifacts, the analytical approach appears to be sufficiently sensitive so as to distinguish differences between the Chrome-green and Chichen-green material. Even two Honduran site specific groups of albite - cultural jade - form distinct groups.« less

  19. Fluoro-Jade and TUNEL staining as useful tools to identify ischemic brain damage following moderate extradural compression of sensorimotor cortex.

    PubMed

    Kundrotiene, Jurgita; Wägner, Anna; Liljequist, Sture

    2004-01-01

    Cerebral ischemia was produced by moderate compression for 30 min of a specific brain area in the sensorimotor cortex of Sprague-Dawley rats. On day 1, that is 24 h after the transient sensorimotor compression, ischemia-exposed animals displayed a marked focal neurological deficit documented as impaired beam walking performance. This functional disturbance was mainly due to contralateral fore- and hind-limb paresis. As assessed by daily beam walking tests it was shown that there was a spontaneous recovery of motor functions over a period of five to seven days after the ischemic event. Using histopathological analysis (Nissl staining) we have previously reported that the present experimental paradigm does not produce pannecrosis (tissue cavitation) despite the highly reproducible focal neurological deficit. We now show how staining with fluorescent markers for neuronal death, that is Fluoro-Jade and TUNEL, respectively, identifies regional patterns of selective neuronal death. These observations add further support to the working hypothesis that the brain damage caused by cortical compression-induced ischemia consists of scattered, degenerating neurons in specific brain regions. Postsurgical administration of the AMPA receptor specific antagonist, LY326325 (30 mg/kg; i.p., 70 min after compression), not only improved beam walking performance on day 1 to 3, respectively but also significantly reduced the number of Fluoro-Jade stained neurons on day 5. These results suggest that enhanced AMPA/glutamate receptor activity is at least partially responsible for the ischemia-produced brain damage detected by the fluorescent marker Fluoro-Jade.

  20. An On-the-Fly Surface-Hopping Program JADE for Nonadiabatic Molecular Dynamics of Polyatomic Systems: Implementation and Applications.

    PubMed

    Du, Likai; Lan, Zhenggang

    2015-04-14

    Nonadiabatic dynamics simulations have rapidly become an indispensable tool for understanding ultrafast photochemical processes in complex systems. Here, we present our recently developed on-the-fly nonadiabatic dynamics package, JADE, which allows researchers to perform nonadiabatic excited-state dynamics simulations of polyatomic systems at an all-atomic level. The nonadiabatic dynamics is based on Tully's surface-hopping approach. Currently, several electronic structure methods (CIS, TDHF, TDDFT(RPA/TDA), and ADC(2)) are supported, especially TDDFT, aiming at performing nonadiabatic dynamics on medium- to large-sized molecules. The JADE package has been interfaced with several quantum chemistry codes, including Turbomole, Gaussian, and Gamess (US). To consider environmental effects, the Langevin dynamics was introduced as an easy-to-use scheme into the standard surface-hopping dynamics. The JADE package is mainly written in Fortran for greater numerical performance and Python for flexible interface construction, with the intent of providing open-source, easy-to-use, well-modularized, and intuitive software in the field of simulations of photochemical and photophysical processes. To illustrate the possible applications of the JADE package, we present a few applications of excited-state dynamics for various polyatomic systems, such as the methaniminium cation, fullerene (C20), p-dimethylaminobenzonitrile (DMABN) and its primary amino derivative aminobenzonitrile (ABN), and 10-hydroxybenzo[h]quinoline (10-HBQ).

  1. Attentional Selection of Feature Conjunctions Is Accomplished by Parallel and Independent Selection of Single Features.

    PubMed

    Andersen, Søren K; Müller, Matthias M; Hillyard, Steven A

    2015-07-08

    Experiments that study feature-based attention have often examined situations in which selection is based on a single feature (e.g., the color red). However, in more complex situations relevant stimuli may not be set apart from other stimuli by a single defining property but by a specific combination of features. Here, we examined sustained attentional selection of stimuli defined by conjunctions of color and orientation. Human observers attended to one out of four concurrently presented superimposed fields of randomly moving horizontal or vertical bars of red or blue color to detect brief intervals of coherent motion. Selective stimulus processing in early visual cortex was assessed by recordings of steady-state visual evoked potentials (SSVEPs) elicited by each of the flickering fields of stimuli. We directly contrasted attentional selection of single features and feature conjunctions and found that SSVEP amplitudes on conditions in which selection was based on a single feature only (color or orientation) exactly predicted the magnitude of attentional enhancement of SSVEPs when attending to a conjunction of both features. Furthermore, enhanced SSVEP amplitudes elicited by attended stimuli were accompanied by equivalent reductions of SSVEP amplitudes elicited by unattended stimuli in all cases. We conclude that attentional selection of a feature-conjunction stimulus is accomplished by the parallel and independent facilitation of its constituent feature dimensions in early visual cortex. The ability to perceive the world is limited by the brain's processing capacity. Attention affords adaptive behavior by selectively prioritizing processing of relevant stimuli based on their features (location, color, orientation, etc.). We found that attentional mechanisms for selection of different features belonging to the same object operate independently and in parallel: concurrent attentional selection of two stimulus features is simply the sum of attending to each of those

  2. Quantum-enhanced feature selection with forward selection and backward elimination

    NASA Astrophysics Data System (ADS)

    He, Zhimin; Li, Lvzhou; Huang, Zhiming; Situ, Haozhen

    2018-07-01

    Feature selection is a well-known preprocessing technique in machine learning, which can remove irrelevant features to improve the generalization capability of a classifier and reduce training and inference time. However, feature selection is time-consuming, particularly for the applications those have thousands of features, such as image retrieval, text mining and microarray data analysis. It is crucial to accelerate the feature selection process. We propose a quantum version of wrapper-based feature selection, which converts a classical feature selection to its quantum counterpart. It is valuable for machine learning on quantum computer. In this paper, we focus on two popular kinds of feature selection methods, i.e., wrapper-based forward selection and backward elimination. The proposed feature selection algorithm can quadratically accelerate the classical one.

  3. Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention.

    PubMed

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei

    2016-01-13

    An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features.

  4. Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention

    PubMed Central

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei

    2016-01-01

    An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features. PMID:26759193

  5. [Appropriate dust control measures for jade carving operations].

    PubMed

    Liu, Jiang; Wang, Qiushui; Liu, Guangquan

    2002-12-01

    To provide the appropriate dust control measures for jade carving operations. Dust concentrations in the workplace were measured according to GB/T 5748-85. Ventilation system of dust control were measured according to GB/T 16157-1996. Dust particle size distributions for different sources and particle size fraction collecting efficiencies of the dust collectors were measured with WY-1 in-stack 7 stage cascade impactors. On the basis of adopting wet process in the carving operations, local exhaust ventilation system for dust control was installed, which included: the special designed slot exhaust hoods with hood face velocity of 2.5 m/s and exhaust volume of 600 m3/h. The pipe sizes were determined according to the air volume passing through the pipe and the reasonable air velocities. Impinging scrubber or bag filter dust collector were selected to treat the dust laden air from the local exhaust ventilation system, which gave a total collecting efficiency of 97% for impinging scrubber and 98% for bag filter; The type of fan and its size were selected according to the total air volume of the ventilation system and maximum total pressure needed for the longest pipe line plus the pressure drop of the dust collector. Practical application showed that, after installation and use of the appropriate dust control measures, the dust concentrations in the workplaces could meet or nearly meet the national hygienic standard and the dust laden air at the local exhaust ventilation system could meet the national emission standard.

  6. Extraction and Production of Omega-3 from UniMAP Puyu (Jade Perch) and Mackarel

    NASA Astrophysics Data System (ADS)

    Nur Izzati, I.; Zainab, H.; Nornadhiratulhusna, M.; Chee Hann, Y.; Khairunissa Syairah, A. S.; Amira Farzana, S.

    2018-03-01

    Extraction techniques to extract fish oil from various types of fish are numerous but not widely accepted because of the use of chemicals that may be harmful to health. In this study, fish oil is extracted using a technique of Microwave-Assisted Extraction, which uses only water. The optimum conditions required for the production of fish oil for extraction is carried out by examining three parameters such as microwave power (300-700W), extraction time (10-30 min) and amount of water used (70-190ml). Optimum conditions were determined after using design of experiments (DOE). The optimum condition obtained was 300 W for microwave power, 10 minutes extraction time and 190 milliliter amounts of water used. Fourier transform infrared spectroscopy (FTIR) was used to analyze the functional groups of fish oil. Two types of fish such as Jade Perch or UniMAP Puyu and Indian Mackerel were used. A standard omega-3 oil sample (Blackmores) purchased from pharmacy was also determined to confirm the presence of omega-3 oil in these fishes. Similar compounds were present in Jade Perch and Indian Mackerel as compared to the standard. Therefore, there were presence of omega-3 fish oil in the two types of fish. From this study, omega-3 in UniMAP Puyu fish was higher compared to Indian Mackerel fish. However, based on the FTIR analysis, besides the presence of omega-3, the two fishes also contain other functional groups such as alkanes, alkenes, aldehyde, ketones and many others. The yield of fish oil for the Jade Perch was low compared to Indian Mackerel which was 9% while Indian Mackerel was 10 %.

  7. Feature engineering for drug name recognition in biomedical texts: feature conjunction and feature selection.

    PubMed

    Liu, Shengyu; Tang, Buzhou; Chen, Qingcai; Wang, Xiaolong; Fan, Xiaoming

    2015-01-01

    Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary feature. Features used in current machine learning-based methods are usually singleton features which may be due to explosive features and a large number of noisy features when singleton features are combined into conjunction features. However, singleton features that can only capture one linguistic characteristic of a word are not sufficient to describe the information for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have never been reported. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, Chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.

  8. Fetal ECG extraction using independent component analysis by Jade approach

    NASA Astrophysics Data System (ADS)

    Giraldo-Guzmán, Jader; Contreras-Ortiz, Sonia H.; Lasprilla, Gloria Isabel Bautista; Kotas, Marian

    2017-11-01

    Fetal ECG monitoring is a useful method to assess the fetus health and detect abnormal conditions. In this paper we propose an approach to extract fetal ECG from abdomen and chest signals using independent component analysis based on the joint approximate diagonalization of eigenmatrices approach. The JADE approach avoids redundancy, what reduces matrix dimension and computational costs. Signals were filtered with a high pass filter to eliminate low frequency noise. Several levels of decomposition were tested until the fetal ECG was recognized in one of the separated sources output. The proposed method shows fast and good performance.

  9. Local Feature Selection for Data Classification.

    PubMed

    Armanfard, Narges; Reilly, James P; Komeili, Majid

    2016-06-01

    Typical feature selection methods choose an optimal global feature subset that is applied over all regions of the sample space. In contrast, in this paper we propose a novel localized feature selection (LFS) approach whereby each region of the sample space is associated with its own distinct optimized feature set, which may vary both in membership and size across the sample space. This allows the feature set to optimally adapt to local variations in the sample space. An associated method for measuring the similarities of a query datum to each of the respective classes is also proposed. The proposed method makes no assumptions about the underlying structure of the samples; hence the method is insensitive to the distribution of the data over the sample space. The method is efficiently formulated as a linear programming optimization problem. Furthermore, we demonstrate the method is robust against the over-fitting problem. Experimental results on eleven synthetic and real-world data sets demonstrate the viability of the formulation and the effectiveness of the proposed algorithm. In addition we show several examples where localized feature selection produces better results than a global feature selection method.

  10. EEG feature selection method based on decision tree.

    PubMed

    Duan, Lijuan; Ge, Hui; Ma, Wei; Miao, Jun

    2015-01-01

    This paper aims to solve automated feature selection problem in brain computer interface (BCI). In order to automate feature selection process, we proposed a novel EEG feature selection method based on decision tree (DT). During the electroencephalogram (EEG) signal processing, a feature extraction method based on principle component analysis (PCA) was used, and the selection process based on decision tree was performed by searching the feature space and automatically selecting optimal features. Considering that EEG signals are a series of non-linear signals, a generalized linear classifier named support vector machine (SVM) was chosen. In order to test the validity of the proposed method, we applied the EEG feature selection method based on decision tree to BCI Competition II datasets Ia, and the experiment showed encouraging results.

  11. System Complexity Reduction via Feature Selection

    ERIC Educational Resources Information Center

    Deng, Houtao

    2011-01-01

    This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree…

  12. Feature Selection for Chemical Sensor Arrays Using Mutual Information

    PubMed Central

    Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.

    2014-01-01

    We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058

  13. A multicentre demonstration project to evaluate the effectiveness and acceptability of the web-based Joint Asia Diabetes Evaluation (JADE) programme with or without nurse support in Chinese patients with Type 2 diabetes.

    PubMed

    Tutino, G E; Yang, W Y; Li, X; Li, W H; Zhang, Y Y; Guo, X H; Luk, A O; Yeung, R O P; Yin, J M; Ozaki, R; So, W Y; Ma, R C W; Ji, L N; Kong, A P S; Weng, J P; Ko, G T C; Jia, W P; Chan, J C N

    2017-03-01

    To test the hypothesis that delivery of integrated care augmented by a web-based disease management programme and nurse coordinator would improve treatment target attainment and health-related behaviour. The web-based Joint Asia Diabetes Evaluation (JADE) and Diabetes Monitoring Database (DIAMOND) portals contain identical built-in protocols to integrate structured assessment, risk stratification, personalized reporting and decision support. The JADE portal contains an additional module to facilitate structured follow-up visits. Between January 2009 and September 2010, 3586 Chinese patients with Type 2 diabetes from six sites in China were randomized to DIAMOND (n = 1728) or JADE, plus nurse-coordinated follow-up visits (n = 1858) with comprehensive assessments at baseline and 12 months. The primary outcome was proportion of patients achieving ≥ 2 treatment targets (HbA 1c < 53 mmol/mol (7%), blood pressure < 130/80 mmHg and LDL cholesterol < 2.6 mmol/l). Of 3586 participants enrolled (mean age 57 years, 54% men, median disease duration 5 years), 2559 returned for repeat assessment after a median (interquartile range) follow-up of 12.5 (4.6) months. The proportion of participants attaining ≥ 2 treatment targets increased in both groups (JADE 40.6 to 50.0%; DIAMOND 38.2 to 50.8%) and there were similar absolute reductions in HbA 1c [DIAMOND -8 mmol/mol vs JADE -7 mmol/mol (-0.69 vs -0.62%)] and LDL cholesterol (DIAMOND -0.32 mmol/l vs JADE -0.28 mmol/l), with no between-group difference. The JADE group was more likely to self-monitor blood glucose (50.5 vs 44.2%; P = 0.005) and had fewer defaulters (25.6 vs 32.0%; P < 0.001). Integrated care augmented by information technology improved cardiometabolic control, with additional nurse contacts reducing the default rate and enhancing self-care. (Clinical trials registry no.: NCT01274364). © 2016 The Authors. Diabetic Medicine published by John Wiley & Sons Ltd on behalf of Diabetes UK.

  14. Collective feature selection to identify crucial epistatic variants.

    PubMed

    Verma, Shefali S; Lucas, Anastasia; Zhang, Xinyuan; Veturi, Yogasudha; Dudek, Scott; Li, Binglan; Li, Ruowang; Urbanowicz, Ryan; Moore, Jason H; Kim, Dokyoon; Ritchie, Marylyn D

    2018-01-01

    Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex disease/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, thus leading to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features. Thus, it is very important to perform variable selection before searching for epistasis. Many methods have been evaluated and proposed to perform feature selection, but no single method works best in all scenarios. We demonstrate this by conducting two separate simulation analyses to evaluate the proposed collective feature selection approach. Through our simulation study we propose a collective feature selection approach to select features that are in the "union" of the best performing methods. We explored various parametric, non-parametric, and data mining approaches to perform feature selection. We choose our top performing methods to select the union of the resulting variables based on a user-defined percentage of variants selected from each method to take to downstream analysis. Our simulation analysis shows that non-parametric data mining approaches, such as MDR, may work best under one simulation criteria for the high effect size (penetrance) datasets, while non-parametric methods designed for feature selection, such as Ranger and Gradient boosting, work best under other simulation criteria. Thus, using a collective approach proves to be more beneficial for selecting variables with epistatic effects also in low effect size datasets and different genetic architectures. Following this, we applied our proposed collective feature selection approach to select the top 1% of variables to identify potential interacting variables associated with Body Mass Index (BMI) in ~ 44,000 samples obtained from Geisinger

  15. Integrated feature extraction and selection for neuroimage classification

    NASA Astrophysics Data System (ADS)

    Fan, Yong; Shen, Dinggang

    2009-02-01

    Feature extraction and selection are of great importance in neuroimage classification for identifying informative features and reducing feature dimensionality, which are generally implemented as two separate steps. This paper presents an integrated feature extraction and selection algorithm with two iterative steps: constrained subspace learning based feature extraction and support vector machine (SVM) based feature selection. The subspace learning based feature extraction focuses on the brain regions with higher possibility of being affected by the disease under study, while the possibility of brain regions being affected by disease is estimated by the SVM based feature selection, in conjunction with SVM classification. This algorithm can not only take into account the inter-correlation among different brain regions, but also overcome the limitation of traditional subspace learning based feature extraction methods. To achieve robust performance and optimal selection of parameters involved in feature extraction, selection, and classification, a bootstrapping strategy is used to generate multiple versions of training and testing sets for parameter optimization, according to the classification performance measured by the area under the ROC (receiver operating characteristic) curve. The integrated feature extraction and selection method is applied to a structural MR image based Alzheimer's disease (AD) study with 98 non-demented and 100 demented subjects. Cross-validation results indicate that the proposed algorithm can improve performance of the traditional subspace learning based classification.

  16. Economic indicators selection for crime rates forecasting using cooperative feature selection

    NASA Astrophysics Data System (ADS)

    Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Salleh Sallehuddin, Roselina

    2013-04-01

    Features selection in multivariate forecasting model is very important to ensure that the model is accurate. The purpose of this study is to apply the Cooperative Feature Selection method for features selection. The features are economic indicators that will be used in crime rate forecasting model. The Cooperative Feature Selection combines grey relational analysis and artificial neural network to establish a cooperative model that can rank and select the significant economic indicators. Grey relational analysis is used to select the best data series to represent each economic indicator and is also used to rank the economic indicators according to its importance to the crime rate. After that, the artificial neural network is used to select the significant economic indicators for forecasting the crime rates. In this study, we used economic indicators of unemployment rate, consumer price index, gross domestic product and consumer sentiment index, as well as data rates of property crime and violent crime for the United States. Levenberg-Marquardt neural network is used in this study. From our experiments, we found that consumer price index is an important economic indicator that has a significant influence on the violent crime rate. While for property crime rate, the gross domestic product, unemployment rate and consumer price index are the influential economic indicators. The Cooperative Feature Selection is also found to produce smaller errors as compared to Multiple Linear Regression in forecasting property and violent crime rates.

  17. Feature selection in feature network models: finding predictive subsets of features with the Positive Lasso.

    PubMed

    Frank, Laurence E; Heiser, Willem J

    2008-05-01

    A set of features is the basis for the network representation of proximity data achieved by feature network models (FNMs). Features are binary variables that characterize the objects in an experiment, with some measure of proximity as response variable. Sometimes features are provided by theory and play an important role in the construction of the experimental conditions. In some research settings, the features are not known a priori. This paper shows how to generate features in this situation and how to select an adequate subset of features that takes into account a good compromise between model fit and model complexity, using a new version of least angle regression that restricts coefficients to be non-negative, called the Positive Lasso. It will be shown that features can be generated efficiently with Gray codes that are naturally linked to the FNMs. The model selection strategy makes use of the fact that FNM can be considered as univariate multiple regression model. A simulation study shows that the proposed strategy leads to satisfactory results if the number of objects is less than or equal to 22. If the number of objects is larger than 22, the number of features selected by our method exceeds the true number of features in some conditions.

  18. Fizzy: feature subset selection for metagenomics.

    PubMed

    Ditzler, Gregory; Morrison, J Calvin; Lan, Yemin; Rosen, Gail L

    2015-11-04

    Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- & β-diversity. Feature subset selection--a sub-field of machine learning--can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high-level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. We have developed a new Python command line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tools capabilities on publicly available datasets. We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.

  19. Archaeological jade mystery solved using a 119-year-old rock collection specimen

    NASA Astrophysics Data System (ADS)

    Harlow, G. E.; Davies, H. L.; Summerhayes, G. R.; Matisoo-Smith, E.

    2012-12-01

    In a recent publication (Harlow et al. 2012), a ~3200-year old small stone artefact from an archaeological excavation on Emirau Island, Bismarck Archipelago, Papua New Guinea was described and determined to be a piece of jadeite jade (jadeitite). True jadeitite from any part of New Guinea was not previously known, either in an archaeological or geological context, so this object was of considerable interest with respect to its geological source and what that would mean about trade between this source and Emirau Island. Fortuitously, the artefact, presumably a wood-carving gouge, is very unusual with respect to both pyroxene composition and minor mineral constituents. Pyroxene compositions lie essentially along the jadeite-aegirine join: Jd94Ae6 to Jd63Ae36, and without any coexisting omphacite. This contrasts with Jd-Di or Jd-Aug compositional trends commonly observed in jadeitites worldwide. Paragonite and albite occur in veins and cavities with minor titanite, epidote-allanite, and zircon, an assemblage seen in a few jadeitites. Surprisingly, some titanite contains up to 6 wt% Nb2O5 with only trace Ta and a single grain of a Y-Nb phase (interpreted as fergusonite) is present; these are unique for jadeitite. In a historical tribute to C.E.A. Wichmann, a German geologist who taught at Utrecht University, the Netherlands, a previously unpublished description of chlormelanite from the Torare River in extreme northeast Papua, Indonesia was given. The bulk composition essentially matches the pyroxene composition of the jade, so this sample was hypothesized as coming from the source. We were able to arrange a loan from the petrology collection at Utrecht University of the specimen acquired by Wichmann in 1893. In addition we borrowed stone axes from the Natural History Museum - Naturalis in Leiden obtained from natives near what is now Jayapura in eastern-most Papua. Petrography and microprobe analysis of sections of these samples clearly show that (1) Wichmann's 1893

  20. Feature Grouping and Selection Over an Undirected Graph.

    PubMed

    Yang, Sen; Yuan, Lei; Lai, Ying-Cheng; Shen, Xiaotong; Wonka, Peter; Ye, Jieping

    2012-01-01

    High-dimensional regression/classification continues to be an important and challenging problem, especially when features are highly correlated. Feature selection, combined with additional structure information on the features has been considered to be promising in promoting regression/classification performance. Graph-guided fused lasso (GFlasso) has recently been proposed to facilitate feature selection and graph structure exploitation, when features exhibit certain graph structures. However, the formulation in GFlasso relies on pairwise sample correlations to perform feature grouping, which could introduce additional estimation bias. In this paper, we propose three new feature grouping and selection methods to resolve this issue. The first method employs a convex function to penalize the pairwise l ∞ norm of connected regression/classification coefficients, achieving simultaneous feature grouping and selection. The second method improves the first one by utilizing a non-convex function to reduce the estimation bias. The third one is the extension of the second method using a truncated l 1 regularization to further reduce the estimation bias. The proposed methods combine feature grouping and feature selection to enhance estimation accuracy. We employ the alternating direction method of multipliers (ADMM) and difference of convex functions (DC) programming to solve the proposed formulations. Our experimental results on synthetic data and two real datasets demonstrate the effectiveness of the proposed methods.

  1. Classification Influence of Features on Given Emotions and Its Application in Feature Selection

    NASA Astrophysics Data System (ADS)

    Xing, Yin; Chen, Chuang; Liu, Li-Long

    2018-04-01

    In order to solve the problem that there is a large amount of redundant data in high-dimensional speech emotion features, we analyze deeply the extracted speech emotion features and select better features. Firstly, a given emotion is classified by each feature. Secondly, the recognition rate is ranked in descending order. Then, the optimal threshold of features is determined by rate criterion. Finally, the better features are obtained. When applied in Berlin and Chinese emotional data set, the experimental results show that the feature selection method outperforms the other traditional methods.

  2. Fizzy. Feature subset selection for metagenomics

    DOE PAGES

    Ditzler, Gregory; Morrison, J. Calvin; Lan, Yemin; ...

    2015-11-04

    Background: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α– & β–diversity. Feature subset selection – a sub-field of machine learning – can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high-level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate betweenmore » age groups in the human gut microbiome. Results: We have developed a new Python command line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tools capabilities on publicly available datasets. Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.« less

  3. Rough sets and Laplacian score based cost-sensitive feature selection.

    PubMed

    Yu, Shenglong; Zhao, Hong

    2018-01-01

    Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of "good" features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms.

  4. Ordinal feature selection for iris and palmprint recognition.

    PubMed

    Sun, Zhenan; Wang, Libin; Tan, Tieniu

    2014-09-01

    Ordinal measures have been demonstrated as an effective feature representation model for iris and palmprint recognition. However, ordinal measures are a general concept of image analysis and numerous variants with different parameter settings, such as location, scale, orientation, and so on, can be derived to construct a huge feature space. This paper proposes a novel optimization formulation for ordinal feature selection with successful applications to both iris and palmprint recognition. The objective function of the proposed feature selection method has two parts, i.e., misclassification error of intra and interclass matching samples and weighted sparsity of ordinal feature descriptors. Therefore, the feature selection aims to achieve an accurate and sparse representation of ordinal measures. And, the optimization subjects to a number of linear inequality constraints, which require that all intra and interclass matching pairs are well separated with a large margin. Ordinal feature selection is formulated as a linear programming (LP) problem so that a solution can be efficiently obtained even on a large-scale feature pool and training database. Extensive experimental results demonstrate that the proposed LP formulation is advantageous over existing feature selection methods, such as mRMR, ReliefF, Boosting, and Lasso for biometric recognition, reporting state-of-the-art accuracy on CASIA and PolyU databases.

  5. Rough sets and Laplacian score based cost-sensitive feature selection

    PubMed Central

    Yu, Shenglong

    2018-01-01

    Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of “good” features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms. PMID:29912884

  6. Natural image statistics and low-complexity feature selection.

    PubMed

    Vasconcelos, Manuela; Vasconcelos, Nuno

    2009-02-01

    Low-complexity feature selection is analyzed in the context of visual recognition. It is hypothesized that high-order dependences of bandpass features contain little information for discrimination of natural images. This hypothesis is characterized formally by the introduction of the concepts of conjunctive interference and decomposability order of a feature set. Necessary and sufficient conditions for the feasibility of low-complexity feature selection are then derived in terms of these concepts. It is shown that the intrinsic complexity of feature selection is determined by the decomposability order of the feature set and not its dimension. Feature selection algorithms are then derived for all levels of complexity and are shown to be approximated by existing information-theoretic methods, which they consistently outperform. The new algorithms are also used to objectively test the hypothesis of low decomposability order through comparison of classification performance. It is shown that, for image classification, the gain of modeling feature dependencies has strongly diminishing returns: best results are obtained under the assumption of decomposability order 1. This suggests a generic law for bandpass features extracted from natural images: that the effect, on the dependence of any two features, of observing any other feature is constant across image classes.

  7. Robust Feature Selection Technique using Rank Aggregation.

    PubMed

    Sarkar, Chandrima; Cooley, Sarah; Srivastava, Jaideep

    2014-01-01

    Although feature selection is a well-developed research area, there is an ongoing need to develop methods to make classifiers more efficient. One important challenge is the lack of a universal feature selection technique which produces similar outcomes with all types of classifiers. This is because all feature selection techniques have individual statistical biases while classifiers exploit different statistical properties of data for evaluation. In numerous situations this can put researchers into dilemma as to which feature selection method and a classifiers to choose from a vast range of choices. In this paper, we propose a technique that aggregates the consensus properties of various feature selection methods to develop a more optimal solution. The ensemble nature of our technique makes it more robust across various classifiers. In other words, it is stable towards achieving similar and ideally higher classification accuracy across a wide variety of classifiers. We quantify this concept of robustness with a measure known as the Robustness Index (RI). We perform an extensive empirical evaluation of our technique on eight data sets with different dimensions including Arrythmia, Lung Cancer, Madelon, mfeat-fourier, internet-ads, Leukemia-3c and Embryonal Tumor and a real world data set namely Acute Myeloid Leukemia (AML). We demonstrate not only that our algorithm is more robust, but also that compared to other techniques our algorithm improves the classification accuracy by approximately 3-4% (in data set with less than 500 features) and by more than 5% (in data set with more than 500 features), across a wide range of classifiers.

  8. Higher criticism thresholding: Optimal feature selection when useful features are rare and weak.

    PubMed

    Donoho, David; Jin, Jiashun

    2008-09-30

    In important application fields today-genomics and proteomics are examples-selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For i = 1, 2, ..., p, let pi(i) denote the two-sided P-value associated with the ith feature Z-score and pi((i)) denote the ith order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective (i/p - pi((i)))/sqrt{i/p(1-i/p)}. We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT.

  9. Higher criticism thresholding: Optimal feature selection when useful features are rare and weak

    PubMed Central

    Donoho, David; Jin, Jiashun

    2008-01-01

    In important application fields today—genomics and proteomics are examples—selecting a small subset of useful features is crucial for success of Linear Classification Analysis. We study feature selection by thresholding of feature Z-scores and introduce a principle of threshold selection, based on the notion of higher criticism (HC). For i = 1, 2, …, p, let πi denote the two-sided P-value associated with the ith feature Z-score and π(i) denote the ith order statistic of the collection of P-values. The HC threshold is the absolute Z-score corresponding to the P-value maximizing the HC objective (i/p − π(i))/i/p(1−i/p). We consider a rare/weak (RW) feature model, where the fraction of useful features is small and the useful features are each too weak to be of much use on their own. HC thresholding (HCT) has interesting behavior in this setting, with an intimate link between maximizing the HC objective and minimizing the error rate of the designed classifier, and very different behavior from popular threshold selection procedures such as false discovery rate thresholding (FDRT). In the most challenging RW settings, HCT uses an unconventionally low threshold; this keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance. Replacing cross-validated threshold selection in the popular Shrunken Centroid classifier with the computationally less expensive and simpler HCT reduces the variance of the selected threshold and the error rate of the constructed classifier. Results on standard real datasets and in asymptotic theory confirm the advantages of HCT. PMID:18815365

  10. Feature Selection for Ridge Regression with Provable Guarantees.

    PubMed

    Paul, Saurabh; Drineas, Petros

    2016-04-01

    We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets; a subset of TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.

  11. Feature selection for the classification of traced neurons.

    PubMed

    López-Cabrera, José D; Lorenzo-Ginori, Juan V

    2018-06-01

    The great availability of computational tools to calculate the properties of traced neurons leads to the existence of many descriptors which allow the automated classification of neurons from these reconstructions. This situation determines the necessity to eliminate irrelevant features as well as making a selection of the most appropriate among them, in order to improve the quality of the classification obtained. The dataset used contains a total of 318 traced neurons, classified by human experts in 192 GABAergic interneurons and 126 pyramidal cells. The features were extracted by means of the L-measure software, which is one of the most used computational tools in neuroinformatics to quantify traced neurons. We review some current feature selection techniques as filter, wrapper, embedded and ensemble methods. The stability of the feature selection methods was measured. For the ensemble methods, several aggregation methods based on different metrics were applied to combine the subsets obtained during the feature selection process. The subsets obtained applying feature selection methods were evaluated using supervised classifiers, among which Random Forest, C4.5, SVM, Naïve Bayes, Knn, Decision Table and the Logistic classifier were used as classification algorithms. Feature selection methods of types filter, embedded, wrappers and ensembles were compared and the subsets returned were tested in classification tasks for different classification algorithms. L-measure features EucDistanceSD, PathDistanceSD, Branch_pathlengthAve, Branch_pathlengthSD and EucDistanceAve were present in more than 60% of the selected subsets which provides evidence about their importance in the classification of this neurons. Copyright © 2018 Elsevier B.V. All rights reserved.

  12. Making death 'good': instructional tales for dying in newspaper accounts of Jade Goody's death.

    PubMed

    Frith, Hannah; Raisborough, Jayne; Klein, Orly

    2013-03-01

    Facilitating a 'good' death is a central goal for hospices and palliative care organisations. The key features of such a death include an acceptance of death, an open awareness of and communication about death, the settling of practical and interpersonal business, the reduction of suffering and pain, and the enhancement of autonomy, choice and control. Yet deaths are inherently neither good nor bad; they require cultural labour to be 'made over' as good. Drawing on media accounts of the controversial death of UK reality television star Jade Goody, and building on existing analyses of her death, we examine how cultural discourses actively work to construct deaths as good or bad and to position the dying and those witnessing their death as morally accountable. By constructing Goody as bravely breaking social taboos by openly acknowledging death, by contextualising her dying as occurring at the end of a life well lived and by emphasising biographical continuity and agency, newspaper accounts serve to position themselves as educative rather than exploitative, and readers as information-seekers rather than ghoulishly voyeuristic. We argue that popular culture offers moral instruction in dying well which resonates with the messages from palliative care. © 2012 The Authors. Sociology of Health & Illness © 2012 Foundation for the Sociology of Health & Illness/Blackwell Publishing Ltd.

  13. Joint Feature Selection and Classification for Multilabel Learning.

    PubMed

    Huang, Jun; Li, Guorong; Huang, Qingming; Wu, Xindong

    2018-03-01

    Multilabel learning deals with examples having multiple class labels simultaneously. It has been applied to a variety of applications, such as text categorization and image annotation. A large number of algorithms have been proposed for multilabel learning, most of which concentrate on multilabel classification problems and only a few of them are feature selection algorithms. Current multilabel classification models are mainly built on a single data representation composed of all the features which are shared by all the class labels. Since each class label might be decided by some specific features of its own, and the problems of classification and feature selection are often addressed independently, in this paper, we propose a novel method which can perform joint feature selection and classification for multilabel learning, named JFSC. Different from many existing methods, JFSC learns both shared features and label-specific features by considering pairwise label correlations, and builds the multilabel classifier on the learned low-dimensional data representations simultaneously. A comparative study with state-of-the-art approaches manifests a competitive performance of our proposed method both in classification and feature selection for multilabel learning.

  14. Adversarial Feature Selection Against Evasion Attacks.

    PubMed

    Zhang, Fei; Chan, Patrick P K; Biggio, Battista; Yeung, Daniel S; Roli, Fabio

    2016-03-01

    Pattern recognition and machine learning techniques have been increasingly adopted in adversarial settings such as spam, intrusion, and malware detection, although their security against well-crafted attacks that aim to evade detection by manipulating data at test time has not yet been thoroughly assessed. While previous work has been mainly focused on devising adversary-aware classification algorithms to counter evasion attempts, only few authors have considered the impact of using reduced feature sets on classifier security against the same attacks. An interesting, preliminary result is that classifier security to evasion may be even worsened by the application of feature selection. In this paper, we provide a more detailed investigation of this aspect, shedding some light on the security properties of feature selection against evasion attacks. Inspired by previous work on adversary-aware classifiers, we propose a novel adversary-aware feature selection model that can improve classifier security against evasion attacks, by incorporating specific assumptions on the adversary's data manipulation strategy. We focus on an efficient, wrapper-based implementation of our approach, and experimentally validate its soundness on different application examples, including spam and malware detection.

  15. RESIDENTIAL RADON RESISTANT CONSTRUCTION FEATURE SELECTION SYSTEM

    EPA Science Inventory

    The report describes a proposed residential radon resistant construction feature selection system. The features consist of engineered barriers to reduce radon entry and accumulation indoors. The proposed Florida standards require radon resistant features in proportion to regional...

  16. FSMRank: feature selection algorithm for learning to rank.

    PubMed

    Lai, Han-Jiang; Pan, Yan; Tang, Yong; Yu, Rong

    2013-06-01

    In recent years, there has been growing interest in learning to rank. The introduction of feature selection into different learning problems has been proven effective. These facts motivate us to investigate the problem of feature selection for learning to rank. We propose a joint convex optimization formulation which minimizes ranking errors while simultaneously conducting feature selection. This optimization formulation provides a flexible framework in which we can easily incorporate various importance measures and similarity measures of the features. To solve this optimization problem, we use the Nesterov's approach to derive an accelerated gradient algorithm with a fast convergence rate O(1/T(2)). We further develop a generalization bound for the proposed optimization problem using the Rademacher complexities. Extensive experimental evaluations are conducted on the public LETOR benchmark datasets. The results demonstrate that the proposed method shows: 1) significant ranking performance gain compared to several feature selection baselines for ranking, and 2) very competitive performance compared to several state-of-the-art learning-to-rank algorithms.

  17. Aging, selective attention, and feature integration.

    PubMed

    Plude, D J; Doussard-Roosevelt, J A

    1989-03-01

    This study used feature-integration theory as a means of determining the point in processing at which selective attention deficits originate. The theory posits an initial stage of processing in which features are registered in parallel and then a serial process in which features are conjoined to form complex stimuli. Performance of young and older adults on feature versus conjunction search is compared. Analyses of reaction times and error rates suggest that elderly adults in addition to young adults, can capitalize on the early parallel processing stage of visual information processing, and that age decrements in visual search arise as a result of the later, serial stage of processing. Analyses of a third, unconfounded, conjunction search condition reveal qualitatively similar modes of conjunction search in young and older adults. The contribution of age-related data limitations is found to be secondary to the contribution of age decrements in selective attention.

  18. Hybrid feature selection for supporting lightweight intrusion detection systems

    NASA Astrophysics Data System (ADS)

    Song, Jianglong; Zhao, Wentao; Liu, Qiang; Wang, Xin

    2017-08-01

    Redundant and irrelevant features not only cause high resource consumption but also degrade the performance of Intrusion Detection Systems (IDS), especially when coping with big data. These features slow down the process of training and testing in network traffic classification. Therefore, a hybrid feature selection approach in combination with wrapper and filter selection is designed in this paper to build a lightweight intrusion detection system. Two main phases are involved in this method. The first phase conducts a preliminary search for an optimal subset of features, in which the chi-square feature selection is utilized. The selected set of features from the previous phase is further refined in the second phase in a wrapper manner, in which the Random Forest(RF) is used to guide the selection process and retain an optimized set of features. After that, we build an RF-based detection model and make a fair comparison with other approaches. The experimental results on NSL-KDD datasets show that our approach results are in higher detection accuracy as well as faster training and testing processes.

  19. Feature-selective attention in healthy old age: a selective decline in selective attention?

    PubMed

    Quigley, Cliodhna; Müller, Matthias M

    2014-02-12

    Deficient selection against irrelevant information has been proposed to underlie age-related cognitive decline. We recently reported evidence for maintained early sensory selection when older and younger adults used spatial selective attention to perform a challenging task. Here we explored age-related differences when spatial selection is not possible and feature-selective attention must be deployed. We additionally compared the integrity of feedforward processing by exploiting the well established phenomenon of suppression of visual cortical responses attributable to interstimulus competition. Electroencephalogram was measured while older and younger human adults responded to brief occurrences of coherent motion in an attended stimulus composed of randomly moving, orientation-defined, flickering bars. Attention was directed to horizontal or vertical bars by a pretrial cue, after which two orthogonally oriented, overlapping stimuli or a single stimulus were presented. Horizontal and vertical bars flickered at different frequencies and thereby elicited separable steady-state visual-evoked potentials, which were used to examine the effect of feature-based selection and the competitive influence of a second stimulus on ongoing visual processing. Age differences were found in feature-selective attentional modulation of visual responses: older adults did not show consistent modulation of magnitude or phase. In contrast, the suppressive effect of a second stimulus was robust and comparable in magnitude across age groups, suggesting that bottom-up processing of the current stimuli is essentially unchanged in healthy old age. Thus, it seems that visual processing per se is unchanged, but top-down attentional control is compromised in older adults when space cannot be used to guide selection.

  20. Effective traffic features selection algorithm for cyber-attacks samples

    NASA Astrophysics Data System (ADS)

    Li, Yihong; Liu, Fangzheng; Du, Zhenyu

    2018-05-01

    By studying the defense scheme of Network attacks, this paper propose an effective traffic features selection algorithm based on k-means++ clustering to deal with the problem of high dimensionality of traffic features which extracted from cyber-attacks samples. Firstly, this algorithm divide the original feature set into attack traffic feature set and background traffic feature set by the clustering. Then, we calculates the variation of clustering performance after removing a certain feature. Finally, evaluating the degree of distinctiveness of the feature vector according to the result. Among them, the effective feature vector is whose degree of distinctiveness exceeds the set threshold. The purpose of this paper is to select out the effective features from the extracted original feature set. In this way, it can reduce the dimensionality of the features so as to reduce the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and it has some advantages over other selection algorithms.

  1. Selective processing of multiple features in the human brain: effects of feature type and salience.

    PubMed

    McGinnis, E Menton; Keil, Andreas

    2011-02-09

    Identifying targets in a stream of items at a given constant spatial location relies on selection of aspects such as color, shape, or texture. Such attended (target) features of a stimulus elicit a negative-going event-related brain potential (ERP), termed Selection Negativity (SN), which has been used as an index of selective feature processing. In two experiments, participants viewed a series of Gabor patches in which targets were defined as a specific combination of color, orientation, and shape. Distracters were composed of different combinations of color, orientation, and shape of the target stimulus. This design allows comparisons of items with and without specific target features. Consistent with previous ERP research, SN deflections extended between 160-300 ms. Data from the subsequent P3 component (300-450 ms post-stimulus) were also examined, and were regarded as an index of target processing. In Experiment A, predominant effects of target color on SN and P3 amplitudes were found, along with smaller ERP differences in response to variations of orientation and shape. Manipulating color to be less salient while enhancing the saliency of the orientation of the Gabor patch (Experiment B) led to delayed color selection and enhanced orientation selection. Topographical analyses suggested that the location of SN on the scalp reliably varies with the nature of the to-be-attended feature. No interference of non-target features on the SN was observed. These results suggest that target feature selection operates by means of electrocortical facilitation of feature-specific sensory processes, and that selective electrocortical facilitation is more effective when stimulus saliency is heightened.

  2. Selective attention to temporal features on nested time scales.

    PubMed

    Henry, Molly J; Herrmann, Björn; Obleser, Jonas

    2015-02-01

    Meaningful auditory stimuli such as speech and music often vary simultaneously along multiple time scales. Thus, listeners must selectively attend to, and selectively ignore, separate but intertwined temporal features. The current study aimed to identify and characterize the neural network specifically involved in this feature-selective attention to time. We used a novel paradigm where listeners judged either the duration or modulation rate of auditory stimuli, and in which the stimulation, working memory demands, response requirements, and task difficulty were held constant. A first analysis identified all brain regions where individual brain activation patterns were correlated with individual behavioral performance patterns, which thus supported temporal judgments generically. A second analysis then isolated those brain regions that specifically regulated selective attention to temporal features: Neural responses in a bilateral fronto-parietal network including insular cortex and basal ganglia decreased with degree of change of the attended temporal feature. Critically, response patterns in these regions were inverted when the task required selectively ignoring this feature. The results demonstrate how the neural analysis of complex acoustic stimuli with multiple temporal features depends on a fronto-parietal network that simultaneously regulates the selective gain for attended and ignored temporal features. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. Hypergraph Based Feature Selection Technique for Medical Diagnosis.

    PubMed

    Somu, Nivethitha; Raman, M R Gauthama; Kirthivasan, Kannan; Sriram, V S Shankar

    2016-11-01

    The impact of internet and information systems across various domains have resulted in substantial generation of multidimensional datasets. The use of data mining and knowledge discovery techniques to extract the original information contained in the multidimensional datasets play a significant role in the exploitation of complete benefit provided by them. The presence of large number of features in the high dimensional datasets incurs high computational cost in terms of computing power and time. Hence, feature selection technique has been commonly used to build robust machine learning models to select a subset of relevant features which projects the maximal information content of the original dataset. In this paper, a novel Rough Set based K - Helly feature selection technique (RSKHT) which hybridize Rough Set Theory (RST) and K - Helly property of hypergraph representation had been designed to identify the optimal feature subset or reduct for medical diagnostic applications. Experiments carried out using the medical datasets from the UCI repository proves the dominance of the RSKHT over other feature selection techniques with respect to the reduct size, classification accuracy and time complexity. The performance of the RSKHT had been validated using WEKA tool, which shows that RSKHT had been computationally attractive and flexible over massive datasets.

  4. Features selection and classification to estimate elbow movements

    NASA Astrophysics Data System (ADS)

    Rubiano, A.; Ramírez, J. L.; El Korso, M. N.; Jouandeau, N.; Gallimard, L.; Polit, O.

    2015-11-01

    In this paper, we propose a novel method to estimate the elbow motion, through the features extracted from electromyography (EMG) signals. The features values are normalized and then compared to identify potential relationships between the EMG signal and the kinematic information as angle and angular velocity. We propose and implement a method to select the best set of features, maximizing the distance between the features that correspond to flexion and extension movements. Finally, we test the selected features as inputs to a non-linear support vector machine in the presence of non-idealistic conditions, obtaining an accuracy of 99.79% in the motion estimation results.

  5. A Features Selection for Crops Classification

    NASA Astrophysics Data System (ADS)

    Liu, Yifan; Shao, Luyi; Yin, Qiang; Hong, Wen

    2016-08-01

    The components of the polarimetric target decomposition reflect the differences of target since they linked with the scattering properties of the target and can be imported into SVM as the classification features. The result of decomposition usually concentrate on part of the components. Selecting a combination of components can reduce the features that importing into the SVM. The features reduction can lead to less calculation and targeted classification of one target when we classify a multi-class area. In this research, we import different combinations of features into the SVM and find a better combination for classification with a data of AGRISAR.

  6. Feature Selection Methods for Zero-Shot Learning of Neural Activity

    PubMed Central

    Caceres, Carlos A.; Roos, Matthew J.; Rupp, Kyle M.; Milsap, Griffin; Crone, Nathan E.; Wolmetz, Michael E.; Ratto, Christopher R.

    2017-01-01

    Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception; A novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy. PMID:28690513

  7. Feature Selection Methods for Zero-Shot Learning of Neural Activity.

    PubMed

    Caceres, Carlos A; Roos, Matthew J; Rupp, Kyle M; Milsap, Griffin; Crone, Nathan E; Wolmetz, Michael E; Ratto, Christopher R

    2017-01-01

    Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception; A novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy.

  8. NetProt: Complex-based Feature Selection.

    PubMed

    Goh, Wilson Wen Bin; Wong, Limsoon

    2017-08-04

    Protein complex-based feature selection (PCBFS) provides unparalleled reproducibility with high phenotypic relevance on proteomics data. Currently, there are five PCBFS paradigms, but not all representative methods have been implemented or made readily available. To allow general users to take advantage of these methods, we developed the R-package NetProt, which provides implementations of representative feature-selection methods. NetProt also provides methods for generating simulated differential data and generating pseudocomplexes for complex-based performance benchmarking. The NetProt open source R package is available for download from https://github.com/gohwils/NetProt/releases/ , and online documentation is available at http://rpubs.com/gohwils/204259 .

  9. Multi-task feature selection in microarray data by binary integer programming.

    PubMed

    Lan, Liang; Vucetic, Slobodan

    2013-12-20

    A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm to select the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve the computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was extended to solve multi-task microarray classification problems. We compared the single-task version of the proposed feature selection algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray datasets. The multi-task feature selection algorithm resulted in significantly higher accuracy than when using the single-task feature selection methods.

  10. Random forest feature selection approach for image segmentation

    NASA Astrophysics Data System (ADS)

    Lefkovits, László; Lefkovits, Szidónia; Emerich, Simina; Vaida, Mircea Florin

    2017-03-01

    In the field of image segmentation, discriminative models have shown promising performance. Generally, every such model begins with the extraction of numerous features from annotated images. Most authors create their discriminative model by using many features without using any selection criteria. A more reliable model can be built by using a framework that selects the important variables, from the point of view of the classification, and eliminates the unimportant once. In this article we present a framework for feature selection and data dimensionality reduction. The methodology is built around the random forest (RF) algorithm and its variable importance evaluation. In order to deal with datasets so large as to be practically unmanageable, we propose an algorithm based on RF that reduces the dimension of the database by eliminating irrelevant features. Furthermore, this framework is applied to optimize our discriminative model for brain tumor segmentation.

  11. Ancient jades map 3,000 years of prehistoric exchange in Southeast Asia

    PubMed Central

    Hung, Hsiao-Chun; Iizuka, Yoshiyuki; Bellwood, Peter; Nguyen, Kim Dung; Bellina, Bérénice; Silapanth, Praon; Dizon, Eusebio; Santiago, Rey; Datan, Ipoi; Manton, Jonathan H.

    2007-01-01

    We have used electron probe microanalysis to examine Southeast Asian nephrite (jade) artifacts, many archeologically excavated, dating from 3000 B.C. through the first millennium A.D. The research has revealed the existence of one of the most extensive sea-based trade networks of a single geological material in the prehistoric world. Green nephrite from a source in eastern Taiwan was used to make two very specific forms of ear pendant that were distributed, between 500 B.C. and 500 A.D., through the Philippines, East Malaysia, southern Vietnam, and peninsular Thailand, forming a 3,000-km-diameter halo around the southern and eastern coastlines of the South China Sea. Other Taiwan nephrite artifacts, especially beads and bracelets, were distributed earlier during Neolithic times throughout Taiwan and from Taiwan into the Philippines. PMID:18048347

  12. Ancient jades map 3,000 years of prehistoric exchange in Southeast Asia.

    PubMed

    Hung, Hsiao-Chun; Iizuka, Yoshiyuki; Bellwood, Peter; Nguyen, Kim Dung; Bellina, Bérénice; Silapanth, Praon; Dizon, Eusebio; Santiago, Rey; Datan, Ipoi; Manton, Jonathan H

    2007-12-11

    We have used electron probe microanalysis to examine Southeast Asian nephrite (jade) artifacts, many archeologically excavated, dating from 3000 B.C. through the first millennium A.D. The research has revealed the existence of one of the most extensive sea-based trade networks of a single geological material in the prehistoric world. Green nephrite from a source in eastern Taiwan was used to make two very specific forms of ear pendant that were distributed, between 500 B.C. and 500 A.D., through the Philippines, East Malaysia, southern Vietnam, and peninsular Thailand, forming a 3,000-km-diameter halo around the southern and eastern coastlines of the South China Sea. Other Taiwan nephrite artifacts, especially beads and bracelets, were distributed earlier during Neolithic times throughout Taiwan and from Taiwan into the Philippines.

  13. Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.

    PubMed

    Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor

    2010-08-01

    Biomarker discovery is a typical application from functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution for the feature selection problem. However, swarm intelligence settings for feature selection fail to select small features subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm in 11 microarray datasets for brain, leukemia, lung, prostate, and others. We show that the proposed swarm intelligence algorithm successfully increase the classification accuracy and decrease the number of selected features compared to other swarm intelligence methods. Copyright © 2010 Elsevier Ltd. All rights reserved.

  14. Feature Selection and Effective Classifiers.

    ERIC Educational Resources Information Center

    Deogun, Jitender S.; Choubey, Suresh K.; Raghavan, Vijay V.; Sever, Hayri

    1998-01-01

    Develops and analyzes four algorithms for feature selection in the context of rough set methodology. Experimental results confirm the expected relationship between the time complexity of these algorithms and the classification accuracy of the resulting upper classifiers. When compared, results of upper classifiers perform better than lower…

  15. Unbiased feature selection in learning random forests for high-dimensional data.

    PubMed

    Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi

    2015-01-01

    Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This makes RFs have poor accuracy when working with high-dimensional data. Besides that, RFs have bias in the feature selection process where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.

  16. Oculomotor selection underlies feature retention in visual working memory.

    PubMed

    Hanning, Nina M; Jonikaitis, Donatas; Deubel, Heiner; Szinte, Martin

    2016-02-01

    Oculomotor selection, spatial task relevance, and visual working memory (WM) are described as three processes highly intertwined and sustained by similar cortical structures. However, because task-relevant locations always constitute potential saccade targets, no study so far has been able to distinguish between oculomotor selection and spatial task relevance. We designed an experiment that allowed us to dissociate in humans the contribution of task relevance, oculomotor selection, and oculomotor execution to the retention of feature representations in WM. We report that task relevance and oculomotor selection lead to dissociable effects on feature WM maintenance. In a first task, in which an object's location was encoded as a saccade target, its feature representations were successfully maintained in WM, whereas they declined at nonsaccade target locations. Likewise, we observed a similar WM benefit at the target of saccades that were prepared but never executed. In a second task, when an object's location was marked as task relevant but constituted a nonsaccade target (a location to avoid), feature representations maintained at that location did not benefit. Combined, our results demonstrate that oculomotor selection is consistently associated with WM, whereas task relevance is not. This provides evidence for an overlapping circuitry serving saccade target selection and feature-based WM that can be dissociated from processes encoding task-relevant locations. Copyright © 2016 the American Physiological Society.

  17. Feature and Region Selection for Visual Learning.

    PubMed

    Zhao, Ji; Wang, Liantao; Cabral, Ricardo; De la Torre, Fernando

    2016-03-01

    Visual learning problems, such as object classification and action recognition, are typically approached using extensions of the popular bag-of-words (BoWs) model. Despite its great success, it is unclear what visual features the BoW model is learning. Which regions in the image or video are used to discriminate among classes? Which are the most discriminative visual words? Answering these questions is fundamental for understanding existing BoW models and inspiring better models for visual recognition. To answer these questions, this paper presents a method for feature selection and region selection in the visual BoW model. This allows for an intermediate visualization of the features and regions that are important for visual learning. The main idea is to assign latent weights to the features or regions, and jointly optimize these latent variables with the parameters of a classifier (e.g., support vector machine). There are four main benefits of our approach: 1) our approach accommodates non-linear additive kernels, such as the popular χ(2) and intersection kernel; 2) our approach is able to handle both regions in images and spatio-temporal regions in videos in a unified way; 3) the feature selection problem is convex, and both problems can be solved using a scalable reduced gradient method; and 4) we point out strong connections with multiple kernel learning and multiple instance learning approaches. Experimental results in the PASCAL VOC 2007, MSR Action Dataset II and YouTube illustrate the benefits of our approach.

  18. Hadoop neural network for parallel and distributed feature selection.

    PubMed

    Hodge, Victoria J; O'Keefe, Simon; Austin, Jim

    2016-06-01

    In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five features selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  19. Informative Feature Selection for Object Recognition via Sparse PCA

    DTIC Science & Technology

    2011-04-07

    constraint on images collected from low-power camera net- works instead of high-end photography is that establishing wide-baseline feature correspondence of...variable selection tool for selecting informative features in the object images captured from low-resolution cam- era sensor networks. Firstly, we...More examples can be found in Figure 4 later. 3. Identifying Informative Features Classical PCA is a well established tool for the analysis of high

  20. Multiclass feature selection for improved pediatric brain tumor segmentation

    NASA Astrophysics Data System (ADS)

    Ahmed, Shaheen; Iftekharuddin, Khan M.

    2012-03-01

    In our previous work, we showed that fractal-based texture features are effective in detection, segmentation and classification of posterior-fossa (PF) pediatric brain tumor in multimodality MRI. We exploited an information theoretic approach such as Kullback-Leibler Divergence (KLD) for feature selection and ranking different texture features. We further incorporated the feature selection technique with segmentation method such as Expectation Maximization (EM) for segmentation of tumor T and non tumor (NT) tissues. In this work, we extend the two class KLD technique to multiclass for effectively selecting the best features for brain tumor (T), cyst (C) and non tumor (NT). We further obtain segmentation robustness for each tissue types by computing Bay's posterior probabilities and corresponding number of pixels for each tissue segments in MRI patient images. We evaluate improved tumor segmentation robustness using different similarity metric for 5 patients in T1, T2 and FLAIR modalities.

  1. Select Features in "Finale 2011" for Music Educators

    ERIC Educational Resources Information Center

    Thompson, Douglas Earl

    2011-01-01

    A feature-laden software program such as "Finale" is an overwhelming tool to master--if one hopes to master many features in a short amount of time. Believing that working with a fewer number of features can be a helpful approach, this article looks at a select number of features in "Finale 2011" of obvious use to music educators. These features…

  2. Improving Naive Bayes with Online Feature Selection for Quick Adaptation to Evolving Feature Usefulness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pon, R K; Cardenas, A F; Buttler, D J

    The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features can not be determined a priori and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. The initial set of document is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifiermore » with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.« less

  3. [Electroencephalogram Feature Selection Based on Correlation Coefficient Analysis].

    PubMed

    Zhou, Jinzhi; Tang, Xiaofang

    2015-08-01

    In order to improve the accuracy of classification with small amount of motor imagery training data on the development of brain-computer interface (BCD systems, we proposed an analyzing method to automatically select the characteristic parameters based on correlation coefficient analysis. Throughout the five sample data of dataset IV a from 2005 BCI Competition, we utilized short-time Fourier transform (STFT) and correlation coefficient calculation to reduce the number of primitive electroencephalogram dimension, then introduced feature extraction based on common spatial pattern (CSP) and classified by linear discriminant analysis (LDA). Simulation results showed that the average rate of classification accuracy could be improved by using correlation coefficient feature selection method than those without using this algorithm. Comparing with support vector machine (SVM) optimization features algorithm, the correlation coefficient analysis can lead better selection parameters to improve the accuracy of classification.

  4. Tidal cycles of total particulate mercury in the Jade Bay, lower Saxonian Wadden Sea, southern North Sea.

    PubMed

    Jin, Huafang; Liebezeit, Gerd

    2013-01-01

    In this study, we evaluate the nature of the relationship between particulate matter and total mercury concentrations. For this purpose, we estimate both of the two values in water column over 12-h tidal cycles of the Jade Bay, southern North Sea. Total particulate mercury in 250 mL water samples was determined by oxygen combustion-gold amalgamation. Mercury contents varied from 63 to 259 ng/g suspended particulate matter (SPM) or 3.5-52.8 ng/L in surface waters. Total particulate mercury content (THg(p)) was positively correlated with (SPM), indicating that mercury in tidal waters is mostly associated with (SPM), and that tidal variations of total particulate mercury are mainly due to changes in (SPM) content throughout the tidal cycle. Maximum values for THg(p) were observed during mid-flood and mid-ebb, while the lowest values were determined at low tide and high tide. These data suggest that there are no mercury point sources in the Jade Bay. Moreover, the THg(p) content at low tide and high tide were significantly lower than the values recorded in the bottom sediment of the sampling site (>200 ng/g DW), while THg(p) content during the mid-flood and mid-ebb were comparable to the THg content in the surface bottom sediments. Therefore, changes in THg(p) content in the water column due to tidal forcing may have resulted from re-suspension of underlying surface sediments with relatively high mercury content.

  5. Embedded Incremental Feature Selection for Reinforcement Learning

    DTIC Science & Technology

    2012-05-01

    Prior to this work, feature selection for reinforce- ment learning has focused on linear value function ap- proximation ( Kolter and Ng, 2009; Parr et al...InProceed- ings of the the 23rd International Conference on Ma- chine Learning, pages 449–456. Kolter , J. Z. and Ng, A. Y. (2009). Regularization and feature

  6. Combining Feature Selection and Integration—A Neural Model for MT Motion Selectivity

    PubMed Central

    Beck, Cornelia; Neumann, Heiko

    2011-01-01

    Background The computation of pattern motion in visual area MT based on motion input from area V1 has been investigated in many experiments and models attempting to replicate the main mechanisms. Two different core conceptual approaches were developed to explain the findings. In integrationist models the key mechanism to achieve pattern selectivity is the nonlinear integration of V1 motion activity. In contrast, selectionist models focus on the motion computation at positions with 2D features. Methodology/Principal Findings Recent experiments revealed that neither of the two concepts alone is sufficient to explain all experimental data and that most of the existing models cannot account for the complex behaviour found. MT pattern selectivity changes over time for stimuli like type II plaids from vector average to the direction computed with an intersection of constraint rule or by feature tracking. Also, the spatial arrangement of the stimulus within the receptive field of a MT cell plays a crucial role. We propose a recurrent neural model showing how feature integration and selection can be combined into one common architecture to explain these findings. The key features of the model are the computation of 1D and 2D motion in model area V1 subpopulations that are integrated in model MT cells using feedforward and feedback processing. Our results are also in line with findings concerning the solution of the aperture problem. Conclusions/Significance We propose a new neural model for MT pattern computation and motion disambiguation that is based on a combination of feature selection and integration. The model can explain a range of recent neurophysiological findings including temporally dynamic behaviour. PMID:21814543

  7. Altitudinal patterns of plant diversity on the Jade Dragon Snow Mountain, southwestern China.

    PubMed

    Xu, Xiang; Zhang, Huayong; Tian, Wang; Zeng, Xiaoqiang; Huang, Hai

    2016-01-01

    Understanding altitudinal patterns of biological diversity and their underlying mechanisms is critically important for biodiversity conservation in mountainous regions. The contribution of area to plant diversity patterns is widely acknowledged and may mask the effects of other determinant factors. In this context, it is important to examine altitudinal patterns of corrected taxon richness by eliminating the area effect. Here we adopt two methods to correct observed taxon richness: a power-law relationship between richness and area, hereafter "method 1"; and richness counted in equal-area altitudinal bands, hereafter "method 2". We compare these two methods on the Jade Dragon Snow Mountain, which is the nearest large-scale altitudinal gradient to the Equator in the Northern Hemisphere. We find that seed plant species richness, genus richness, family richness, and species richness of trees, shrubs, herbs and Groups I-III (species with elevational range size <150, between 150 and 500, and >500 m, respectively) display distinct hump-shaped patterns along the equal-elevation altitudinal gradient. The corrected taxon richness based on method 2 (TRcor2) also shows hump-shaped patterns for all plant groups, while the one based on method 1 (TRcor1) does not. As for the abiotic factors influencing the patterns, mean annual temperature, mean annual precipitation, and mid-domain effect explain a larger part of the variation in TRcor2 than in TRcor1. In conclusion, for biodiversity patterns on the Jade Dragon Snow Mountain, method 2 preserves the significant influences of abiotic factors to the greatest degree while eliminating the area effect. Our results thus reveal that although the classical method 1 has earned more attention and approval in previous research, method 2 can perform better under certain circumstances. We not only confirm the essential contribution of method 1 in community ecology, but also highlight the significant role of method 2 in eliminating the area

  8. Effective Feature Selection for Classification of Promoter Sequences.

    PubMed

    K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish

    2016-01-01

    Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.

  9. AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity.

    PubMed

    Sun, Lei; Wang, Jun; Wei, Jinmao

    2017-03-14

    The Receiver Operator Characteristic (ROC) curve is well-known in evaluating classification performance in biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and find out disease-related genes (features). The existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find real target feature subset due to their lack of effective means to reduce the redundancy between features, which is essential in machine learning. In this paper, we propose to assess feature complementarity by a trick of measuring the distances between the misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach on the basis of the ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with highest complementarities. The experimental results on a broad range of microarray data sets validate that the classifiers built on the feature subset selected by our approach can get the minimal balanced error rate with a small amount of significant features. Compared with other ROC-based feature selection approaches, our new approach can select fewer features and effectively improve the classification performance.

  10. Feature Selection for Classification of Polar Regions Using a Fuzzy Expert System

    NASA Technical Reports Server (NTRS)

    Penaloza, Mauel A.; Welch, Ronald M.

    1996-01-01

    Labeling, feature selection, and the choice of classifier are critical elements for classification of scenes and for image understanding. This study examines several methods for feature selection in polar regions, including the list, of a fuzzy logic-based expert system for further refinement of a set of selected features. Six Advanced Very High Resolution Radiometer (AVHRR) Local Area Coverage (LAC) arctic scenes are classified into nine classes: water, snow / ice, ice cloud, land, thin stratus, stratus over water, cumulus over water, textured snow over water, and snow-covered mountains. Sixty-seven spectral and textural features are computed and analyzed by the feature selection algorithms. The divergence, histogram analysis, and discriminant analysis approaches are intercompared for their effectiveness in feature selection. The fuzzy expert system method is used not only to determine the effectiveness of each approach in classifying polar scenes, but also to further reduce the features into a more optimal set. For each selection method,features are ranked from best to worst, and the best half of the features are selected. Then, rules using these selected features are defined. The results of running the fuzzy expert system with these rules show that the divergence method produces the best set features, not only does it produce the highest classification accuracy, but also it has the lowest computation requirements. A reduction of the set of features produced by the divergence method using the fuzzy expert system results in an overall classification accuracy of over 95 %. However, this increase of accuracy has a high computation cost.

  11. Feature extraction and selection strategies for automated target recognition

    NASA Astrophysics Data System (ADS)

    Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin

    2010-04-01

    Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPLs Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory regionof- interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concerns transforming potential target data into more useful forms as well as selecting important subsets of that data which may aide in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine(SVM) and a neural net (NN) classifier.

  12. Feature selection using probabilistic prediction of support vector regression.

    PubMed

    Yang, Jian-Bo; Ong, Chong-Jin

    2011-06-01

    This paper presents a new wrapper-based feature selection method for support vector regression (SVR) using its probabilistic predictions. The method computes the importance of a feature by aggregating the difference, over the feature space, of the conditional density functions of the SVR prediction with and without the feature. As the exact computation of this importance measure is expensive, two approximations are proposed. The effectiveness of the measure using these approximations, in comparison to several other existing feature selection methods for SVR, is evaluated on both artificial and real-world problems. The result of the experiments show that the proposed method generally performs better than, or at least as well as, the existing methods, with notable advantage when the dataset is sparse.

  13. Feature Selection Using Information Gain for Improved Structural-Based Alert Correlation

    PubMed Central

    Siraj, Maheyzah Md; Zainal, Anazida; Elshoush, Huwaida Tagelsir; Elhaj, Fatin

    2016-01-01

    Grouping and clustering alerts for intrusion detection based on the similarity of features is referred to as structurally base alert correlation and can discover a list of attack steps. Previous researchers selected different features and data sources manually based on their knowledge and experience, which lead to the less accurate identification of attack steps and inconsistent performance of clustering accuracy. Furthermore, the existing alert correlation systems deal with a huge amount of data that contains null values, incomplete information, and irrelevant features causing the analysis of the alerts to be tedious, time-consuming and error-prone. Therefore, this paper focuses on selecting accurate and significant features of alerts that are appropriate to represent the attack steps, thus, enhancing the structural-based alert correlation model. A two-tier feature selection method is proposed to obtain the significant features. The first tier aims at ranking the subset of features based on high information gain entropy in decreasing order. The‏ second tier extends additional features with a better discriminative ability than the initially ranked features. Performance analysis results show the significance of the selected features in terms of the clustering accuracy using 2000 DARPA intrusion detection scenario-specific dataset. PMID:27893821

  14. [Feature extraction for breast cancer data based on geometric algebra theory and feature selection using differential evolution].

    PubMed

    Li, Jing; Hong, Wenxue

    2014-12-01

    The feature extraction and feature selection are the important issues in pattern recognition. Based on the geometric algebra representation of vector, a new feature extraction method using blade coefficient of geometric algebra was proposed in this study. At the same time, an improved differential evolution (DE) feature selection method was proposed to solve the elevated high dimension issue. The simple linear discriminant analysis was used as the classifier. The result of the 10-fold cross-validation (10 CV) classification of public breast cancer biomedical dataset was more than 96% and proved superior to that of the original features and traditional feature extraction method.

  15. Tumor recognition in wireless capsule endoscopy images using textural features and SVM-based feature selection.

    PubMed

    Li, Baopu; Meng, Max Q-H

    2012-05-01

    Tumor in digestive tract is a common disease and wireless capsule endoscopy (WCE) is a relatively new technology to examine diseases for digestive tract especially for small intestine. This paper addresses the problem of automatic recognition of tumor for WCE images. Candidate color texture feature that integrates uniform local binary pattern and wavelet is proposed to characterize WCE images. The proposed features are invariant to illumination change and describe multiresolution characteristics of WCE images. Two feature selection approaches based on support vector machine, sequential forward floating selection and recursive feature elimination, are further employed to refine the proposed features for improving the detection accuracy. Extensive experiments validate that the proposed computer-aided diagnosis system achieves a promising tumor recognition accuracy of 92.4% in WCE images on our collected data.

  16. McTwo: a two-step feature selection algorithm based on maximal information coefficient.

    PubMed

    Ge, Ruiquan; Zhou, Manli; Luo, Youxi; Meng, Qinghan; Mai, Guoqin; Ma, Dongli; Wang, Guoqing; Zhou, Fengfeng

    2016-03-23

    High-throughput bio-OMIC technologies are producing high-dimension data from bio-samples at an ever increasing rate, whereas the training sample number in a traditional experiment remains small due to various difficulties. This "large p, small n" paradigm in the area of biomedical "big data" may be at least partly solved by feature selection algorithms, which select only features significantly associated with phenotypes. Feature selection is an NP-hard problem. Due to the exponentially increased time requirement for finding the globally optimal solution, all the existing feature selection algorithms employ heuristic rules to find locally optimal solutions, and their solutions achieve different performances on different datasets. This work describes a feature selection algorithm based on a recently published correlation measurement, Maximal Information Coefficient (MIC). The proposed algorithm, McTwo, aims to select features associated with phenotypes, independently of each other, and achieving high classification performance of the nearest neighbor algorithm. Based on the comparative study of 17 datasets, McTwo performs about as well as or better than existing algorithms, with significantly reduced numbers of selected features. The features selected by McTwo also appear to have particular biomedical relevance to the phenotypes from the literature. McTwo selects a feature subset with very good classification performance, as well as a small feature number. So McTwo may represent a complementary feature selection algorithm for the high-dimensional biomedical datasets.

  17. An improved wrapper-based feature selection method for machinery fault diagnosis

    PubMed Central

    2017-01-01

    A major issue of machinery fault diagnosis using vibration signals is that it is over-reliant on personnel knowledge and experience in interpreting the signal. Thus, machine learning has been adapted for machinery fault diagnosis. The quantity and quality of the input features, however, influence the fault classification performance. Feature selection plays a vital role in selecting the most representative feature subset for the machine learning algorithm. In contrast, the trade-off relationship between capability when selecting the best feature subset and computational effort is inevitable in the wrapper-based feature selection (WFS) method. This paper proposes an improved WFS technique before integration with a support vector machine (SVM) model classifier as a complete fault diagnosis system for a rolling element bearing case study. The bearing vibration dataset made available by the Case Western Reserve University Bearing Data Centre was executed using the proposed WFS and its performance has been analysed and discussed. The results reveal that the proposed WFS secures the best feature subset with a lower computational effort by eliminating the redundancy of re-evaluation. The proposed WFS has therefore been found to be capable and efficient to carry out feature selection tasks. PMID:29261689

  18. Feature Extraction and Selection Strategies for Automated Target Recognition

    NASA Technical Reports Server (NTRS)

    Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin

    2010-01-01

    Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPLs Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory region of-interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concerns transforming potential target data into more useful forms as well as selecting important subsets of that data which may aide in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine(SVM) and a neural net (NN) classifier.

  19. Adaptive feature selection using v-shaped binary particle swarm optimization.

    PubMed

    Teng, Xuyang; Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers.

  20. Adaptive feature selection using v-shaped binary particle swarm optimization

    PubMed Central

    Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers. PMID:28358850

  1. Sparse Zero-Sum Games as Stable Functional Feature Selection

    PubMed Central

    Sokolovska, Nataliya; Teytaud, Olivier; Rizkalla, Salwa; Clément, Karine; Zucker, Jean-Daniel

    2015-01-01

    In large-scale systems biology applications, features are structured in hidden functional categories whose predictive power is identical. Feature selection, therefore, can lead not only to a problem with a reduced dimensionality, but also reveal some knowledge on functional classes of variables. In this contribution, we propose a framework based on a sparse zero-sum game which performs a stable functional feature selection. In particular, the approach is based on feature subsets ranking by a thresholding stochastic bandit. We provide a theoretical analysis of the introduced algorithm. We illustrate by experiments on both synthetic and real complex data that the proposed method is competitive from the predictive and stability viewpoints. PMID:26325268

  2. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang Xiaojia; Mao Qirong; Zhan Yongzhao

    There are many emotion features. If all these features are employed to recognize emotions, redundant features may be existed. Furthermore, recognition result is unsatisfying and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on contribution analysis algorithm of NN is presented. The emotion features are selected by using contribution analysis algorithm of NN from the 95 extracted features. Cluster analysis is applied to analyze the effectiveness for the features selected, and the time of feature extraction is evaluated. Finally, 24 emotion features selected are used to recognize six speech emotions.more » The experiments show that this method can improve the recognition rate and the time of feature extraction.« less

  3. Asymmetric bagging and feature selection for activities prediction of drug molecules.

    PubMed

    Li, Guo-Zheng; Meng, Hao-Hua; Lu, Wen-Cong; Yang, Jack Y; Yang, Mary Qu

    2008-05-28

    Activities of drug molecules can be predicted by QSAR (quantitative structure activity relationship) models, which overcomes the disadvantages of high cost and long cycle by employing the traditional experimental method. With the fact that the number of drug molecules with positive activity is rather fewer than that of negatives, it is important to predict molecular activities considering such an unbalanced situation. Here, asymmetric bagging and feature selection are introduced into the problem and asymmetric bagging of support vector machines (asBagging) is proposed on predicting drug activities to treat the unbalanced problem. At the same time, the features extracted from the structures of drug molecules affect prediction accuracy of QSAR models. Therefore, a novel algorithm named PRIFEAB is proposed, which applies an embedded feature selection method to remove redundant and irrelevant features for asBagging. Numerical experimental results on a data set of molecular activities show that asBagging improve the AUC and sensitivity values of molecular activities and PRIFEAB with feature selection further helps to improve the prediction ability. Asymmetric bagging can help to improve prediction accuracy of activities of drug molecules, which can be furthermore improved by performing feature selection to select relevant features from the drug molecules data sets.

  4. Effect of feature-selective attention on neuronal responses in macaque area MT

    PubMed Central

    Chen, X.; Hoffmann, K.-P.; Albright, T. D.

    2012-01-01

    Attention influences visual processing in striate and extrastriate cortex, which has been extensively studied for spatial-, object-, and feature-based attention. Most studies exploring neural signatures of feature-based attention have trained animals to attend to an object identified by a certain feature and ignore objects/displays identified by a different feature. Little is known about the effects of feature-selective attention, where subjects attend to one stimulus feature domain (e.g., color) of an object while features from different domains (e.g., direction of motion) of the same object are ignored. To study this type of feature-selective attention in area MT in the middle temporal sulcus, we trained macaque monkeys to either attend to and report the direction of motion of a moving sine wave grating (a feature for which MT neurons display strong selectivity) or attend to and report its color (a feature for which MT neurons have very limited selectivity). We hypothesized that neurons would upregulate their firing rate during attend-direction conditions compared with attend-color conditions. We found that feature-selective attention significantly affected 22% of MT neurons. Contrary to our hypothesis, these neurons did not necessarily increase firing rate when animals attended to direction of motion but fell into one of two classes. In one class, attention to color increased the gain of stimulus-induced responses compared with attend-direction conditions. The other class displayed the opposite effects. Feature-selective activity modulations occurred earlier in neurons modulated by attention to color compared with neurons modulated by attention to motion direction. Thus feature-selective attention influences neuronal processing in macaque area MT but often exhibited a mismatch between the preferred stimulus dimension (direction of motion) and the preferred attention dimension (attention to color). PMID:22170961

  5. Effect of feature-selective attention on neuronal responses in macaque area MT.

    PubMed

    Chen, X; Hoffmann, K-P; Albright, T D; Thiele, A

    2012-03-01

    Attention influences visual processing in striate and extrastriate cortex, which has been extensively studied for spatial-, object-, and feature-based attention. Most studies exploring neural signatures of feature-based attention have trained animals to attend to an object identified by a certain feature and ignore objects/displays identified by a different feature. Little is known about the effects of feature-selective attention, where subjects attend to one stimulus feature domain (e.g., color) of an object while features from different domains (e.g., direction of motion) of the same object are ignored. To study this type of feature-selective attention in area MT in the middle temporal sulcus, we trained macaque monkeys to either attend to and report the direction of motion of a moving sine wave grating (a feature for which MT neurons display strong selectivity) or attend to and report its color (a feature for which MT neurons have very limited selectivity). We hypothesized that neurons would upregulate their firing rate during attend-direction conditions compared with attend-color conditions. We found that feature-selective attention significantly affected 22% of MT neurons. Contrary to our hypothesis, these neurons did not necessarily increase firing rate when animals attended to direction of motion but fell into one of two classes. In one class, attention to color increased the gain of stimulus-induced responses compared with attend-direction conditions. The other class displayed the opposite effects. Feature-selective activity modulations occurred earlier in neurons modulated by attention to color compared with neurons modulated by attention to motion direction. Thus feature-selective attention influences neuronal processing in macaque area MT but often exhibited a mismatch between the preferred stimulus dimension (direction of motion) and the preferred attention dimension (attention to color).

  6. EFS: an ensemble feature selection tool implemented as R-package and web-application.

    PubMed

    Neumann, Ursula; Genze, Nikita; Heider, Dominik

    2017-01-01

    Feature selection methods aim at identifying a subset of features that improve the prediction performance of subsequent classification models and thereby also simplify their interpretability. Preceding studies demonstrated that single feature selection methods can have specific biases, whereas an ensemble feature selection has the advantage to alleviate and compensate for these biases. The software EFS (Ensemble Feature Selection) makes use of multiple feature selection methods and combines their normalized outputs to a quantitative ensemble importance. Currently, eight different feature selection methods have been integrated in EFS, which can be used separately or combined in an ensemble. EFS identifies relevant features while compensating specific biases of single methods due to an ensemble approach. Thereby, EFS can improve the prediction accuracy and interpretability in subsequent binary classification models. EFS can be downloaded as an R-package from CRAN or used via a web application at http://EFS.heiderlab.de.

  7. Feature selection methods for big data bioinformatics: A survey from the search perspective.

    PubMed

    Wang, Lipo; Wang, Yaoli; Chang, Qing

    2016-12-01

    This paper surveys main principles of feature selection and their recent applications in big data bioinformatics. Instead of the commonly used categorization into filter, wrapper, and embedded approaches to feature selection, we formulate feature selection as a combinatorial optimization or search problem and categorize feature selection methods into exhaustive search, heuristic search, and hybrid methods, where heuristic search methods may further be categorized into those with or without data-distilled feature ranking measures. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Feature-Selective Attention Adaptively Shifts Noise Correlations in Primary Auditory Cortex.

    PubMed

    Downer, Joshua D; Rapone, Brittany; Verhein, Jessica; O'Connor, Kevin N; Sutter, Mitchell L

    2017-05-24

    Sensory environments often contain an overwhelming amount of information, with both relevant and irrelevant information competing for neural resources. Feature attention mediates this competition by selecting the sensory features needed to form a coherent percept. How attention affects the activity of populations of neurons to support this process is poorly understood because population coding is typically studied through simulations in which one sensory feature is encoded without competition. Therefore, to study the effects of feature attention on population-based neural coding, investigations must be extended to include stimuli with both relevant and irrelevant features. We measured noise correlations ( r noise ) within small neural populations in primary auditory cortex while rhesus macaques performed a novel feature-selective attention task. We found that the effect of feature-selective attention on r noise depended not only on the population tuning to the attended feature, but also on the tuning to the distractor feature. To attempt to explain how these observed effects might support enhanced perceptual performance, we propose an extension of a simple and influential model in which shifts in r noise can simultaneously enhance the representation of the attended feature while suppressing the distractor. These findings present a novel mechanism by which attention modulates neural populations to support sensory processing in cluttered environments. SIGNIFICANCE STATEMENT Although feature-selective attention constitutes one of the building blocks of listening in natural environments, its neural bases remain obscure. To address this, we developed a novel auditory feature-selective attention task and measured noise correlations ( r noise ) in rhesus macaque A1 during task performance. Unlike previous studies showing that the effect of attention on r noise depends on population tuning to the attended feature, we show that the effect of attention depends on the tuning

  9. Feature-Selective Attention Adaptively Shifts Noise Correlations in Primary Auditory Cortex

    PubMed Central

    2017-01-01

    Sensory environments often contain an overwhelming amount of information, with both relevant and irrelevant information competing for neural resources. Feature attention mediates this competition by selecting the sensory features needed to form a coherent percept. How attention affects the activity of populations of neurons to support this process is poorly understood because population coding is typically studied through simulations in which one sensory feature is encoded without competition. Therefore, to study the effects of feature attention on population-based neural coding, investigations must be extended to include stimuli with both relevant and irrelevant features. We measured noise correlations (rnoise) within small neural populations in primary auditory cortex while rhesus macaques performed a novel feature-selective attention task. We found that the effect of feature-selective attention on rnoise depended not only on the population tuning to the attended feature, but also on the tuning to the distractor feature. To attempt to explain how these observed effects might support enhanced perceptual performance, we propose an extension of a simple and influential model in which shifts in rnoise can simultaneously enhance the representation of the attended feature while suppressing the distractor. These findings present a novel mechanism by which attention modulates neural populations to support sensory processing in cluttered environments. SIGNIFICANCE STATEMENT Although feature-selective attention constitutes one of the building blocks of listening in natural environments, its neural bases remain obscure. To address this, we developed a novel auditory feature-selective attention task and measured noise correlations (rnoise) in rhesus macaque A1 during task performance. Unlike previous studies showing that the effect of attention on rnoise depends on population tuning to the attended feature, we show that the effect of attention depends on the tuning to the

  10. Select Features in "Sibelius 6" for Music Educators

    ERIC Educational Resources Information Center

    Thompson, Douglas Earl

    2012-01-01

    Select features of interest to music educators are spotlighted in "Sibelius 6" (version 6.2.0 build 88) and presented under three headings: (a) Worksheets, (2) Plug-Ins, and (3) Potpourri. Guidance is given for accessing these features, and the commentary suggests their application in the music education classroom. (Contains 3 figures.)

  11. Variable selection in near-infrared spectroscopy: benchmarking of feature selection methods on biodiesel data.

    PubMed

    Balabin, Roman M; Smirnov, Sergey V

    2011-04-29

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm(-1)) of a sample is typically measured by modern instruments at a few hundred of wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic

  12. Max-AUC Feature Selection in Computer-Aided Detection of Polyps in CT Colonography

    PubMed Central

    Xu, Jian-Wu; Suzuki, Kenji

    2014-01-01

    We propose a feature selection method based on a sequential forward floating selection (SFFS) procedure to improve the performance of a classifier in computerized detection of polyps in CT colonography (CTC). The feature selection method is coupled with a nonlinear support vector machine (SVM) classifier. Unlike the conventional linear method based on Wilks' lambda, the proposed method selected the most relevant features that would maximize the area under the receiver operating characteristic curve (AUC), which directly maximizes classification performance, evaluated based on AUC value, in the computer-aided detection (CADe) scheme. We presented two variants of the proposed method with different stopping criteria used in the SFFS procedure. The first variant searched all feature combinations allowed in the SFFS procedure and selected the subsets that maximize the AUC values. The second variant performed a statistical test at each step during the SFFS procedure, and it was terminated if the increase in the AUC value was not statistically significant. The advantage of the second variant is its lower computational cost. To test the performance of the proposed method, we compared it against the popular stepwise feature selection method based on Wilks' lambda for a colonic-polyp database (25 polyps and 2624 nonpolyps). We extracted 75 morphologic, gray-level-based, and texture features from the segmented lesion candidate regions. The two variants of the proposed feature selection method chose 29 and 7 features, respectively. Two SVM classifiers trained with these selected features yielded a 96% by-polyp sensitivity at false-positive (FP) rates of 4.1 and 6.5 per patient, respectively. Experiments showed a significant improvement in the performance of the classifier with the proposed feature selection method over that with the popular stepwise feature selection based on Wilks' lambda that yielded 18.0 FPs per patient at the same sensitivity level. PMID:24608058

  13. Max-AUC feature selection in computer-aided detection of polyps in CT colonography.

    PubMed

    Xu, Jian-Wu; Suzuki, Kenji

    2014-03-01

    We propose a feature selection method based on a sequential forward floating selection (SFFS) procedure to improve the performance of a classifier in computerized detection of polyps in CT colonography (CTC). The feature selection method is coupled with a nonlinear support vector machine (SVM) classifier. Unlike the conventional linear method based on Wilks' lambda, the proposed method selected the most relevant features that would maximize the area under the receiver operating characteristic curve (AUC), which directly maximizes classification performance, evaluated based on AUC value, in the computer-aided detection (CADe) scheme. We presented two variants of the proposed method with different stopping criteria used in the SFFS procedure. The first variant searched all feature combinations allowed in the SFFS procedure and selected the subsets that maximize the AUC values. The second variant performed a statistical test at each step during the SFFS procedure, and it was terminated if the increase in the AUC value was not statistically significant. The advantage of the second variant is its lower computational cost. To test the performance of the proposed method, we compared it against the popular stepwise feature selection method based on Wilks' lambda for a colonic-polyp database (25 polyps and 2624 nonpolyps). We extracted 75 morphologic, gray-level-based, and texture features from the segmented lesion candidate regions. The two variants of the proposed feature selection method chose 29 and 7 features, respectively. Two SVM classifiers trained with these selected features yielded a 96% by-polyp sensitivity at false-positive (FP) rates of 4.1 and 6.5 per patient, respectively. Experiments showed a significant improvement in the performance of the classifier with the proposed feature selection method over that with the popular stepwise feature selection based on Wilks' lambda that yielded 18.0 FPs per patient at the same sensitivity level.

  14. Discriminative least squares regression for multiclass classification and feature selection.

    PubMed

    Xiang, Shiming; Nie, Feiping; Meng, Gaofeng; Pan, Chunhong; Zhang, Changshui

    2012-11-01

    This paper presents a framework of discriminative least squares regression (LSR) for multiclass classification and feature selection. The core idea is to enlarge the distance between different classes under the conceptual framework of LSR. First, a technique called ε-dragging is introduced to force the regression targets of different classes moving along opposite directions such that the distances between classes can be enlarged. Then, the ε-draggings are integrated into the LSR model for multiclass classification. Our learning framework, referred to as discriminative LSR, has a compact model form, where there is no need to train two-class machines that are independent of each other. With its compact form, this model can be naturally extended for feature selection. This goal is achieved in terms of L2,1 norm of matrix, generating a sparse learning model for feature selection. The model for multiclass classification and its extension for feature selection are finally solved elegantly and efficiently. Experimental evaluation over a range of benchmark datasets indicates the validity of our method.

  15. Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection.

    PubMed

    Chen, Yifei; Sun, Yuxing; Han, Bing-Qing

    2015-01-01

    Protein interaction article classification is a text classification task in the biological domain to determine which articles describe protein-protein interactions. Since the feature space in text classification is high-dimensional, feature selection is widely used for reducing the dimensionality of features to speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measure of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. Hence, first we design a similarity measure between the context information to take word cooccurrences and phrase chunks around the features into account. Then we introduce the similarity of context information to the importance measure of the features to substitute the document and term frequency. Hence we propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can select distinctive features effectively for protein interaction article classification.

  16. Feature Selection in Classification of Eye Movements Using Electrooculography for Activity Recognition

    PubMed Central

    Mala, S.; Latha, K.

    2014-01-01

    Activity recognition is needed in different requisition, for example, reconnaissance system, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. In selecting subset of features, an efficient evolutionary algorithm Differential Evolution (DE), a very efficient optimizer, is used for finding informative features from eye movements using electrooculography (EOG). Many researchers use EOG signals in human-computer interactions with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness based features, minimum redundancy maximum relevance features, and Differential Evolution based features. This work concentrates more on the feature selection algorithm based on DE in order to improve the classification for faultless activity recognition. PMID:25574185

  17. Feature selection in classification of eye movements using electrooculography for activity recognition.

    PubMed

    Mala, S; Latha, K

    2014-01-01

    Activity recognition is needed in different requisition, for example, reconnaissance system, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. In selecting subset of features, an efficient evolutionary algorithm Differential Evolution (DE), a very efficient optimizer, is used for finding informative features from eye movements using electrooculography (EOG). Many researchers use EOG signals in human-computer interactions with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness based features, minimum redundancy maximum relevance features, and Differential Evolution based features. This work concentrates more on the feature selection algorithm based on DE in order to improve the classification for faultless activity recognition.

  18. Automatic parameter selection for feature-based multi-sensor image registration

    NASA Astrophysics Data System (ADS)

    DelMarco, Stephen; Tom, Victor; Webb, Helen; Chao, Alan

    2006-05-01

    Accurate image registration is critical for applications such as precision targeting, geo-location, change-detection, surveillance, and remote sensing. However, the increasing volume of image data is exceeding the current capacity of human analysts to perform manual registration. This image data glut necessitates the development of automated approaches to image registration, including algorithm parameter value selection. Proper parameter value selection is crucial to the success of registration techniques. The appropriate algorithm parameters can be highly scene and sensor dependent. Therefore, robust algorithm parameter value selection approaches are a critical component of an end-to-end image registration algorithm. In previous work, we developed a general framework for multisensor image registration which includes feature-based registration approaches. In this work we examine the problem of automated parameter selection. We apply the automated parameter selection approach of Yitzhaky and Peli to select parameters for feature-based registration of multisensor image data. The approach consists of generating multiple feature-detected images by sweeping over parameter combinations and using these images to generate estimated ground truth. The feature-detected images are compared to the estimated ground truth images to generate ROC points associated with each parameter combination. We develop a strategy for selecting the optimal parameter set by choosing the parameter combination corresponding to the optimal ROC point. We present numerical results showing the effectiveness of the approach using registration of collected SAR data to reference EO data.

  19. Feature selection for elderly faller classification based on wearable sensors.

    PubMed

    Howcroft, Jennifer; Kofman, Jonathan; Lemaire, Edward D

    2017-05-30

    Wearable sensors can be used to derive numerous gait pattern features for elderly fall risk and faller classification; however, an appropriate feature set is required to avoid high computational costs and the inclusion of irrelevant features. The objectives of this study were to identify and evaluate smaller feature sets for faller classification from large feature sets derived from wearable accelerometer and pressure-sensing insole gait data. A convenience sample of 100 older adults (75.5 ± 6.7 years; 76 non-fallers, 24 fallers based on 6 month retrospective fall occurrence) walked 7.62 m while wearing pressure-sensing insoles and tri-axial accelerometers at the head, pelvis, left and right shanks. Feature selection was performed using correlation-based feature selection (CFS), fast correlation based filter (FCBF), and Relief-F algorithms. Faller classification was performed using multi-layer perceptron neural network, naïve Bayesian, and support vector machine classifiers, with 75:25 single stratified holdout and repeated random sampling. The best performing model was a support vector machine with 78% accuracy, 26% sensitivity, 95% specificity, 0.36 F1 score, and 0.31 MCC and one posterior pelvis accelerometer input feature (left acceleration standard deviation). The second best model achieved better sensitivity (44%) and used a support vector machine with 74% accuracy, 83% specificity, 0.44 F1 score, and 0.29 MCC. This model had ten input features: maximum, mean and standard deviation posterior acceleration; maximum, mean and standard deviation anterior acceleration; mean superior acceleration; and three impulse features. The best multi-sensor model sensitivity (56%) was achieved using posterior pelvis and both shank accelerometers and a naïve Bayesian classifier. The best single-sensor model sensitivity (41%) was achieved using the posterior pelvis accelerometer and a naïve Bayesian classifier. Feature selection provided models with smaller feature

  20. Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection

    PubMed Central

    Moomen, Abdelniser; Ali, Abdulbaset; Ramahi, Omar M.

    2016-01-01

    Nondestructive Testing (NDT) assessment of materials’ health condition is useful for classifying healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that can work on metallics and dielectrics such as microwave testing. Reducing complexity and expenses associated with current diagnostic practices of microwave NDT of structural health requires an effective and intelligent approach based on feature selection and classification techniques of machine learning. Current microwave NDT methods in general based on measuring variation in the S-matrix over the entire operating frequency ranges of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on the reflection or/and transmission coefficient measurements as a function of the sweeping frequencies of the operating band. The aim of this work is reducing sweeping frequencies using machine learning feature selection techniques. By treating sweeping frequencies as features, the number of top important features can be identified, then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. Among the investigated feature selection techniques are information gain, gain ratio, relief, chi-squared. The effectiveness of the selected features were validated through performance evaluations of various classification models; namely, Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing feature selection algorithms. PMID:27104533

  1. Vessel Classification in Cosmo-Skymed SAR Data Using Hierarchical Feature Selection

    NASA Astrophysics Data System (ADS)

    Makedonas, A.; Theoharatos, C.; Tsagaris, V.; Anastasopoulos, V.; Costicoglou, S.

    2015-04-01

    SAR based ship detection and classification are important elements of maritime monitoring applications. Recently, high-resolution SAR data have opened new possibilities to researchers for achieving improved classification results. In this work, a hierarchical vessel classification procedure is presented based on a robust feature extraction and selection scheme that utilizes scale, shape and texture features in a hierarchical way. Initially, different types of feature extraction algorithms are implemented in order to form the utilized feature pool, able to represent the structure, material, orientation and other vessel type characteristics. A two-stage hierarchical feature selection algorithm is utilized next in order to be able to discriminate effectively civilian vessels into three distinct types, in COSMO-SkyMed SAR images: cargos, small ships and tankers. In our analysis, scale and shape features are utilized in order to discriminate smaller types of vessels present in the available SAR data, or shape specific vessels. Then, the most informative texture and intensity features are incorporated in order to be able to better distinguish the civilian types with high accuracy. A feature selection procedure that utilizes heuristic measures based on features' statistical characteristics, followed by an exhaustive research with feature sets formed by the most qualified features is carried out, in order to discriminate the most appropriate combination of features for the final classification. In our analysis, five COSMO-SkyMed SAR data with 2.2m x 2.2m resolution were used to analyse the detailed characteristics of these types of ships. A total of 111 ships with available AIS data were used in the classification process. The experimental results show that this method has good performance in ship classification, with an overall accuracy reaching 83%. Further investigation of additional features and proper feature selection is currently in progress.

  2. Fuzzy feature selection based on interval type-2 fuzzy sets

    NASA Astrophysics Data System (ADS)

    Cherif, Sahar; Baklouti, Nesrine; Alimi, Adel; Snasel, Vaclav

    2017-03-01

    When dealing with real world data; noise, complexity, dimensionality, uncertainty and irrelevance can lead to low performance and insignificant judgment. Fuzzy logic is a powerful tool for controlling conflicting attributes which can have similar effects and close meanings. In this paper, an interval type-2 fuzzy feature selection is presented as a new approach for removing irrelevant features and reducing complexity. We demonstrate how can Feature Selection be joined with Interval Type-2 Fuzzy Logic for keeping significant features and hence reducing time complexity. The proposed method is compared with some other approaches. The results show that the number of attributes is proportionally small.

  3. The effect of feature selection methods on computer-aided detection of masses in mammograms

    NASA Astrophysics Data System (ADS)

    Hupse, Rianne; Karssemeijer, Nico

    2010-05-01

    In computer-aided diagnosis (CAD) research, feature selection methods are often used to improve generalization performance of classifiers and shorten computation times. In an application that detects malignant masses in mammograms, we investigated the effect of using a selection criterion that is similar to the final performance measure we are optimizing, namely the mean sensitivity of the system in a predefined range of the free-response receiver operating characteristics (FROC). To obtain the generalization performance of the selected feature subsets, a cross validation procedure was performed on a dataset containing 351 abnormal and 7879 normal regions, each region providing a set of 71 mass features. The same number of noise features, not containing any information, were added to investigate the ability of the feature selection algorithms to distinguish between useful and non-useful features. It was found that significantly higher performances were obtained using feature sets selected by the general test statistic Wilks' lambda than using feature sets selected by the more specific FROC measure. Feature selection leads to better performance when compared to a system in which all features were used.

  4. An Ensemble Method with Integration of Feature Selection and Classifier Selection to Detect the Landslides

    NASA Astrophysics Data System (ADS)

    Zhongqin, G.; Chen, Y.

    2017-12-01

    Abstract Quickly identify the spatial distribution of landslides automatically is essential for the prevention, mitigation and assessment of the landslide hazard. It's still a challenging job owing to the complicated characteristics and vague boundary of the landslide areas on the image. The high resolution remote sensing image has multi-scales, complex spatial distribution and abundant features, the object-oriented image classification methods can make full use of the above information and thus effectively detect the landslides after the hazard happened. In this research we present a new semi-supervised workflow, taking advantages of recent object-oriented image analysis and machine learning algorithms to quick locate the different origins of landslides of some areas on the southwest part of China. Besides a sequence of image segmentation, feature selection, object classification and error test, this workflow ensemble the feature selection and classifier selection. The feature this study utilized were normalized difference vegetation index (NDVI) change, textural feature derived from the gray level co-occurrence matrices (GLCM), spectral feature and etc. The improvement of this study shows this algorithm significantly removes some redundant feature and the classifiers get fully used. All these improvements lead to a higher accuracy on the determination of the shape of landslides on the high resolution remote sensing image, in particular the flexibility aimed at different kinds of landslides.

  5. An Ant Colony Optimization Based Feature Selection for Web Page Classification

    PubMed Central

    2014-01-01

    The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods. PMID:25136678

  6. An ant colony optimization based feature selection for web page classification.

    PubMed

    Saraç, Esra; Özel, Selma Ayşe

    2014-01-01

    The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.

  7. Hierarchical feature selection for erythema severity estimation

    NASA Astrophysics Data System (ADS)

    Wang, Li; Shi, Chenbo; Shu, Chang

    2014-10-01

    At present PASI system of scoring is used for evaluating erythema severity, which can help doctors to diagnose psoriasis [1-3]. The system relies on the subjective judge of doctors, where the accuracy and stability cannot be guaranteed [4]. This paper proposes a stable and precise algorithm for erythema severity estimation. Our contributions are twofold. On one hand, in order to extract the multi-scale redness of erythema, we design the hierarchical feature. Different from traditional methods, we not only utilize the color statistical features, but also divide the detect window into small window and extract hierarchical features. Further, a feature re-ranking step is introduced, which can guarantee that extracted features are irrelevant to each other. On the other hand, an adaptive boosting classifier is applied for further feature selection. During the step of training, the classifier will seek out the most valuable feature for evaluating erythema severity, due to its strong learning ability. Experimental results demonstrate the high precision and robustness of our algorithm. The accuracy is 80.1% on the dataset which comprise 116 patients' images with various kinds of erythema. Now our system has been applied for erythema medical efficacy evaluation in Union Hosp, China.

  8. Mutual information-based feature selection for radiomics

    NASA Astrophysics Data System (ADS)

    Oubel, Estanislao; Beaumont, Hubert; Iannessi, Antoine

    2016-03-01

    Background The extraction and analysis of image features (radiomics) is a promising field in the precision medicine era, with applications to prognosis, prediction, and response to treatment quantification. In this work, we present a mutual information - based method for quantifying reproducibility of features, a necessary step for qualification before their inclusion in big data systems. Materials and Methods Ten patients with Non-Small Cell Lung Cancer (NSCLC) lesions were followed over time (7 time points in average) with Computed Tomography (CT). Five observers segmented lesions by using a semi-automatic method and 27 features describing shape and intensity distribution were extracted. Inter-observer reproducibility was assessed by computing the multi-information (MI) of feature changes over time, and the variability of global extrema. Results The highest MI values were obtained for volume-based features (VBF). The lesion mass (M), surface to volume ratio (SVR) and volume (V) presented statistically significant higher values of MI than the rest of features. Within the same VBF group, SVR showed also the lowest variability of extrema. The correlation coefficient (CC) of feature values was unable to make a difference between features. Conclusions MI allowed to discriminate three features (M, SVR, and V) from the rest in a statistically significant manner. This result is consistent with the order obtained when sorting features by increasing values of extrema variability. MI is a promising alternative for selecting features to be considered as surrogate biomarkers in a precision medicine context.

  9. A feature selection approach towards progressive vector transmission over the Internet

    NASA Astrophysics Data System (ADS)

    Miao, Ru; Song, Jia; Feng, Min

    2017-09-01

    WebGIS has been applied for visualizing and sharing geospatial information popularly over the Internet. In order to improve the efficiency of the client applications, the web-based progressive vector transmission approach is proposed. Important features should be selected and transferred firstly, and the methods for measuring the importance of features should be further considered in the progressive transmission. However, studies on progressive transmission for large-volume vector data have mostly focused on map generalization in the field of cartography, but rarely discussed on the selection of geographic features quantitatively. This paper applies information theory for measuring the feature importance of vector maps. A measurement model for the amount of information of vector features is defined based upon the amount of information for dealing with feature selection issues. The measurement model involves geometry factor, spatial distribution factor and thematic attribute factor. Moreover, a real-time transport protocol (RTP)-based progressive transmission method is then presented to improve the transmission of vector data. To clearly demonstrate the essential methodology and key techniques, a prototype for web-based progressive vector transmission is presented, and an experiment of progressive selection and transmission for vector features is conducted. The experimental results indicate that our approach clearly improves the performance and end-user experience of delivering and manipulating large vector data over the Internet.

  10. Feature-based and spatial attentional selection in visual working memory.

    PubMed

    Heuer, Anna; Schubö, Anna

    2016-05-01

    The contents of visual working memory (VWM) can be modulated by spatial cues presented during the maintenance interval ("retrocues"). Here, we examined whether attentional selection of representations in VWM can also be based on features. In addition, we investigated whether the mechanisms of feature-based and spatial attention in VWM differ with respect to parallel access to noncontiguous locations. In two experiments, we tested the efficacy of valid retrocues relying on different kinds of information. Specifically, participants were presented with a typical spatial retrocue pointing to two locations, a symbolic spatial retrocue (numbers mapping onto two locations), and two feature-based retrocues: a color retrocue (a blob of the same color as two of the items) and a shape retrocue (an outline of the shape of two of the items). The two cued items were presented at either contiguous or noncontiguous locations. Overall retrocueing benefits, as compared to a neutral condition, were observed for all retrocue types. Whereas feature-based retrocues yielded benefits for cued items presented at both contiguous and noncontiguous locations, spatial retrocues were only effective when the cued items had been presented at contiguous locations. These findings demonstrate that attentional selection and updating in VWM can operate on different kinds of information, allowing for a flexible and efficient use of this limited system. The observation that the representations of items presented at noncontiguous locations could only be reliably selected with feature-based retrocues suggests that feature-based and spatial attentional selection in VWM rely on different mechanisms, as has been shown for attentional orienting in the external world.

  11. Differential diagnosis of CT focal liver lesions using texture features, feature selection and ensemble driven classifiers.

    PubMed

    Mougiakakou, Stavroula G; Valavanis, Ioannis K; Nikita, Alexandra; Nikita, Konstantina S

    2007-09-01

    The aim of the present study is to define an optimally performing computer-aided diagnosis (CAD) architecture for the classification of liver tissue from non-enhanced computed tomography (CT) images into normal liver (C1), hepatic cyst (C2), hemangioma (C3), and hepatocellular carcinoma (C4). To this end, various CAD architectures, based on texture features and ensembles of classifiers (ECs), are comparatively assessed. Number of regions of interests (ROIs) corresponding to C1-C4 have been defined by experienced radiologists in non-enhanced liver CT images. For each ROI, five distinct sets of texture features were extracted using first order statistics, spatial gray level dependence matrix, gray level difference method, Laws' texture energy measures, and fractal dimension measurements. Two different ECs were constructed and compared. The first one consists of five multilayer perceptron neural networks (NNs), each using as input one of the computed texture feature sets or its reduced version after genetic algorithm-based feature selection. The second EC comprised five different primary classifiers, namely one multilayer perceptron NN, one probabilistic NN, and three k-nearest neighbor classifiers, each fed with the combination of the five texture feature sets or their reduced versions. The final decision of each EC was extracted by using appropriate voting schemes, while bootstrap re-sampling was utilized in order to estimate the generalization ability of the CAD architectures based on the available relatively small-sized data set. The best mean classification accuracy (84.96%) is achieved by the second EC using a fused feature set, and the weighted voting scheme. The fused feature set was obtained after appropriate feature selection applied to specific subsets of the original feature set. The comparative assessment of the various CAD architectures shows that combining three types of classifiers with a voting scheme, fed with identical feature sets obtained after

  12. Efficient feature subset selection with probabilistic distance criteria. [pattern recognition

    NASA Technical Reports Server (NTRS)

    Chittineni, C. B.

    1979-01-01

    Recursive expressions are derived for efficiently computing the commonly used probabilistic distance measures as a change in the criteria both when a feature is added to and when a feature is deleted from the current feature subset. A combinatorial algorithm for generating all possible r feature combinations from a given set of s features in (s/r) steps with a change of a single feature at each step is presented. These expressions can also be used for both forward and backward sequential feature selection.

  13. Relevance popularity: A term event model based feature selection scheme for text classification.

    PubMed

    Feng, Guozhong; An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao

    2017-01-01

    Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. In traditional feature selection methods such as information gain and chi-square, the number of documents that contain a particular term (i.e. the document frequency) is often used. However, the frequency of a given term appearing in each document has not been fully investigated, even though it is a promising feature to produce accurate classifications. In this paper, we propose a new feature selection scheme based on a term event Multinomial naive Bayes probabilistic model. According to the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measurement for each term after replacing inner parameters by their estimators. On a benchmark English text datasets (20 Newsgroups) and a Chinese text dataset (MPH-20), our numerical experiment results obtained from using two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperformed the representative feature selection methods.

  14. EEG-based mild depressive detection using feature selection methods and classifiers.

    PubMed

    Li, Xiaowei; Hu, Bin; Sun, Shuting; Cai, Hanshu

    2016-11-01

    Depression has become a major health burden worldwide, and effectively detection of such disorder is a great challenge which requires latest technological tool, such as Electroencephalography (EEG). This EEG-based research seeks to find prominent frequency band and brain regions that are most related to mild depression, as well as an optimal combination of classification algorithms and feature selection methods which can be used in future mild depression detection. An experiment based on facial expression viewing task (Emo_block and Neu_block) was conducted, and EEG data of 37 university students were collected using a 128 channel HydroCel Geodesic Sensor Net (HCGSN). For discriminating mild depressive patients and normal controls, BayesNet (BN), Support Vector Machine (SVM), Logistic Regression (LR), k-nearest neighbor (KNN) and RandomForest (RF) classifiers were used. And BestFirst (BF), GreedyStepwise (GSW), GeneticSearch (GS), LinearForwordSelection (LFS) and RankSearch (RS) based on Correlation Features Selection (CFS) were applied for linear and non-linear EEG features selection. Independent Samples T-test with Bonferroni correction was used to find the significantly discriminant electrodes and features. Data mining results indicate that optimal performance is achieved using a combination of feature selection method GSW based on CFS and classifier KNN for beta frequency band. Accuracies achieved 92.00% and 98.00%, and AUC achieved 0.957 and 0.997, for Emo_block and Neu_block beta band data respectively. T-test results validate the effectiveness of selected features by search method GSW. Simplified EEG system with only FP1, FP2, F3, O2, T3 electrodes was also explored with linear features, which yielded accuracies of 91.70% and 96.00%, AUC of 0.952 and 0.972, for Emo_block and Neu_block respectively. Classification results obtained by GSW + KNN are encouraging and better than previously published results. In the spatial distribution of features, we find

  15. A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.

    PubMed

    Ni, Qianwu; Chen, Lei

    2017-01-01

    Correct prediction of protein structural class is beneficial to investigation on protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, based on various features, it is still a great challenge to select proper classification algorithm and extract essential features to participate in classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physiochemical features were adopted to represent features and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features in the list one by one. For each feature set, thirtyeight algorithms were executed on a dataset, in which proteins were represented by features in the set. The predicted classes yielded by these algorithms and true class of each protein were collected to construct a dataset, which were analyzed by mRMR method, yielding an algorithm list. From the algorithm list, the algorithm was taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using single algorithm and other models that only adopt feature selection procedure or algorithm selection procedure. The feature selection procedure or algorithm selection procedure are really helpful for building an ensemble prediction model that can yield a better performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  16. Mutual information criterion for feature selection with application to classification of breast microcalcifications

    NASA Astrophysics Data System (ADS)

    Diamant, Idit; Shalhon, Moran; Goldberger, Jacob; Greenspan, Hayit

    2016-03-01

    Classification of clustered breast microcalcifications into benign and malignant categories is an extremely challenging task for computerized algorithms and expert radiologists alike. In this paper we present a novel method for feature selection based on mutual information (MI) criterion for automatic classification of microcalcifications. We explored the MI based feature selection for various texture features. The proposed method was evaluated on a standardized digital database for screening mammography (DDSM). Experimental results demonstrate the effectiveness and the advantage of using the MI-based feature selection to obtain the most relevant features for the task and thus to provide for improved performance as compared to using all features.

  17. Simultaneous grouping pursuit and feature selection over an undirected graph*

    PubMed Central

    Zhu, Yunzhang; Shen, Xiaotong; Pan, Wei

    2013-01-01

    Summary In high-dimensional regression, grouping pursuit and feature selection have their own merits while complementing each other in battling the curse of dimensionality. To seek a parsimonious model, we perform simultaneous grouping pursuit and feature selection over an arbitrary undirected graph with each node corresponding to one predictor. When the corresponding nodes are reachable from each other over the graph, regression coefficients can be grouped, whose absolute values are the same or close. This is motivated from gene network analysis, where genes tend to work in groups according to their biological functionalities. Through a nonconvex penalty, we develop a computational strategy and analyze the proposed method. Theoretical analysis indicates that the proposed method reconstructs the oracle estimator, that is, the unbiased least squares estimator given the true grouping, leading to consistent reconstruction of grouping structures and informative features, as well as to optimal parameter estimation. Simulation studies suggest that the method combines the benefit of grouping pursuit with that of feature selection, and compares favorably against its competitors in selection accuracy and predictive performance. An application to eQTL data is used to illustrate the methodology, where a network is incorporated into analysis through an undirected graph. PMID:24098061

  18. Investigating a memory-based account of negative priming: support for selection-feature mismatch.

    PubMed

    MacDonald, P A; Joordens, S

    2000-08-01

    Using typical and modified negative priming tasks, the selection-feature mismatch account of negative priming was tested. In the modified task, participants performed selections on the basis of a semantic feature (e.g., referent size). This procedure has been shown to enhance negative priming (P. A. MacDonald, S. Joordens, & K. N. Seergobin, 1999). Across 3 experiments, negative priming occurred only when the repeated item mismatched in terms of the feature used as the basis for selections. When the repeated item was congruent on the selection feature across the prime and probe displays, positive priming arose. This pattern of results appeared in both the ignored- and the attended-repetition conditions. Negative priming does not result from previously ignoring an item. These findings strongly support the selection-feature mismatch account of negative priming and refute both the distractor inhibition and the episodic-retrieval explanations.

  19. Delivery of integrated diabetes care using logistics and information technology--the Joint Asia Diabetes Evaluation (JADE) program.

    PubMed

    Chan, Juliana C N; Ozaki, Risa; Luk, Andrea; Kong, Alice P S; Ma, Ronald C W; Chow, Francis C C; Wong, Patrick; Wong, Rebecca; Chung, Harriet; Chiu, Cherry; Wolthers, Troels; Tong, Peter C Y; Ko, Gary T C; So, Wing-Yee; Lyubomirsky, Greg

    2014-12-01

    Diabetes is a global epidemic, and many affected individuals are undiagnosed, untreated, or uncontrolled. The silent and multi-system nature of diabetes and its complications, with complex care protocols, are often associated with omission of periodic assessments, clinical inertia, poor treatment compliance, and care fragmentation. These barriers at the system, patient, and care-provider levels have resulted in poor control of risk factors and under-usage of potentially life-saving medications such as statins and renin-angiotensin system inhibitors. However, in the clinical trial setting, use of nurses and protocol with frequent contact and regular monitoring have resulted in marked differences in event rates compared to epidemiological data collected in the real-world setting. The phenotypic heterogeneity and cognitive-psychological-behavioral needs of people with diabetes call for regular risk stratification to personalize care. Quality improvement initiatives targeted at patient education, task delegation, case management, and self-care promotion had the largest effect size in improving cardio-metabolic risk factors. The Joint Asia Diabetes Evaluation (JADE) program is an innovative care prototype that advocates a change in clinic setting and workflow, coordinated by a doctor-nurse team and augmented by a web-based portal, which incorporates care protocols and a validated risk engine to provide decision support and regular feedback. By using logistics and information technology, supported by a network of health-care professionals to provide integrated, holistic, and evidence-based care, the JADE Program aims to establish a high-quality regional diabetes database to reflect the status of diabetes care in real-world practice, confirm efficacy data, and identify unmet needs. Through collaborative efforts, we shall evaluate the feasibility, acceptability, and cost-effectiveness of this "high tech, soft touch" model to make diabetes and chronic disease care more

  20. Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection

    NASA Astrophysics Data System (ADS)

    Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu

    Medical Data mining is the search for relationships and patterns within the medical datasets that could provide useful knowledge for effective clinical decisions. The inclusion of irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research work in data mining has gone into improving the predictive accuracy of the classifiers by applying the techniques of feature selection. Feature selection in medical data mining is appreciable as the diagnosis of the disease could be done in this patient-care activity with minimum number of significant features. The objective of this work is to show that selecting the more significant features would improve the performance of the classifier. We empirically evaluate the classification effectiveness of LibSVM classifier on the reduced feature subset of diabetes dataset. The evaluations suggest that the feature subset selected improves the predictive accuracy of the classifier and reduce false negatives and false positives.

  1. Modified Bat Algorithm for Feature Selection with the Wisconsin Diagnosis Breast Cancer (WDBC) Dataset

    PubMed

    Jeyasingh, Suganthi; Veluchamy, Malathi

    2017-05-01

    Early diagnosis of breast cancer is essential to save lives of patients. Usually, medical datasets include a large variety of data that can lead to confusion during diagnosis. The Knowledge Discovery on Database (KDD) process helps to improve efficiency. It requires elimination of inappropriate and repeated data from the dataset before final diagnosis. This can be done using any of the feature selection algorithms available in data mining. Feature selection is considered as a vital step to increase the classification accuracy. This paper proposes a Modified Bat Algorithm (MBA) for feature selection to eliminate irrelevant features from an original dataset. The Bat algorithm was modified using simple random sampling to select the random instances from the dataset. Ranking was with the global best features to recognize the predominant features available in the dataset. The selected features are used to train a Random Forest (RF) classification algorithm. The MBA feature selection algorithm enhanced the classification accuracy of RF in identifying the occurrence of breast cancer. The Wisconsin Diagnosis Breast Cancer Dataset (WDBC) was used for estimating the performance analysis of the proposed MBA feature selection algorithm. The proposed algorithm achieved better performance in terms of Kappa statistic, Mathew’s Correlation Coefficient, Precision, F-measure, Recall, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE). Creative Commons Attribution License

  2. Object-based selection from spatially-invariant representations: evidence from a feature-report task.

    PubMed

    Matsukura, Michi; Vecera, Shaun P

    2011-02-01

    Attention selects objects as well as locations. When attention selects an object's features, observers identify two features from a single object more accurately than two features from two different objects (object-based effect of attention; e.g., Duncan, Journal of Experimental Psychology: General, 113, 501-517, 1984). Several studies have demonstrated that object-based attention can operate at a late visual processing stage that is independent of objects' spatial information (Awh, Dhaliwal, Christensen, & Matsukura, Psychological Science, 12, 329-334, 2001; Matsukura & Vecera, Psychonomic Bulletin & Review, 16, 529-536, 2009; Vecera, Journal of Experimental Psychology: General, 126, 14-18, 1997; Vecera & Farah, Journal of Experimental Psychology: General, 123, 146-160, 1994). In the present study, we asked two questions regarding this late object-based selection mechanism. In Part I, we investigated how observers' foreknowledge of to-be-reported features allows attention to select objects, as opposed to individual features. Using a feature-report task, a significant object-based effect was observed when to-be-reported features were known in advance but not when this advance knowledge was absent. In Part II, we examined what drives attention to select objects rather than individual features in the absence of observers' foreknowledge of to-be-reported features. Results suggested that, when there was no opportunity for observers to direct their attention to objects that possess to-be-reported features at the time of stimulus presentation, these stimuli must retain strong perceptual cues to establish themselves as separate objects.

  3. Feature selection method based on multi-fractal dimension and harmony search algorithm and its application

    NASA Astrophysics Data System (ADS)

    Zhang, Chen; Ni, Zhiwei; Ni, Liping; Tang, Na

    2016-10-01

    Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and harmony search algorithm is proposed. Multi-fractal dimension is adopted as the evaluation criterion of feature subset, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI data-sets. Besides, the proposed method is also used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method can obtain competitive results in terms of both prediction accuracy and the number of selected features.

  4. Prediction of lysine ubiquitylation with ensemble classifier and feature selection.

    PubMed

    Zhao, Xiaowei; Li, Xiangtao; Ma, Zhiqiang; Yin, Minghao

    2011-01-01

    Ubiquitylation is an important process of post-translational modification. Correct identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method to effectively identify the lysine ubiquitylation sites based on the ensemble approach. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors by using four kinds of protein sequences information. An effective feature selection method was then applied to extract informative feature subsets. After different feature subsets were obtained by setting different starting points in the search procedure, they were used to train multiple random forests classifiers and then aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests respectively, the accuracy of the proposed predictor reached 76.82% for the training dataset and 79.16% for the test dataset, indicating that this predictor is a useful tool to predict lysine ubiquitylation sites. Furthermore, site-specific feature analysis was performed and it was shown that ubiquitylation is intimately correlated with the features of its surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.

  5. Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.

    PubMed

    Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki

    2016-07-01

    We present a comparative split-half resampling analysis of various data driven feature selection and classification methods for the whole brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training sample classification accuracy and the set of selected features due to independent training and test sets have not been previously addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not vary between different methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones with the difference in the test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification that suggests the utility of the embedded feature selection for this problem when linked with the good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.

  6. Toward optimal feature and time segment selection by divergence method for EEG signals classification.

    PubMed

    Wang, Jie; Feng, Zuren; Lu, Na; Luo, Jing

    2018-06-01

    Feature selection plays an important role in the field of EEG signals based motor imagery pattern classification. It is a process that aims to select an optimal feature subset from the original set. Two significant advantages involved are: lowering the computational burden so as to speed up the learning procedure and removing redundant and irrelevant features so as to improve the classification performance. Therefore, feature selection is widely employed in the classification of EEG signals in practical brain-computer interface systems. In this paper, we present a novel statistical model to select the optimal feature subset based on the Kullback-Leibler divergence measure, and automatically select the optimal subject-specific time segment. The proposed method comprises four successive stages: a broad frequency band filtering and common spatial pattern enhancement as preprocessing, features extraction by autoregressive model and log-variance, the Kullback-Leibler divergence based optimal feature and time segment selection and linear discriminate analysis classification. More importantly, this paper provides a potential framework for combining other feature extraction models and classification algorithms with the proposed method for EEG signals classification. Experiments on single-trial EEG signals from two public competition datasets not only demonstrate that the proposed method is effective in selecting discriminative features and time segment, but also show that the proposed method yields relatively better classification results in comparison with other competitive methods. Copyright © 2018 Elsevier Ltd. All rights reserved.

  7. Studies of the 4-JET Rate and of Moments of Event Shape Observables Using Jade Data

    NASA Astrophysics Data System (ADS)

    Kluth, S.

    2005-04-01

    Data from e+e- annihilation into hadrons collected by the JADE experiment at centre-of-mass energies between 14 and 44 GeV were used to study the 4-jet rate using the Durham algorithm as well as the first five moments of event shape observables. The data were compared with NLO QCD predictions, augmented by resummed NLLA calculations for the 4-jet rate, in order to extract values of the strong coupling constant αS. The preliminary results are αS(MZ0) = 0.1169 ± 0.0026 (4-jet rate) and αS(MZ0) = 0.1286 ± 0.0072 (moments) consistent with the world average value. For some of the higher moments systematic deficiencies of the QCD predictions are observed.

  8. Magnetic field feature extraction and selection for indoor location estimation.

    PubMed

    Galván-Tejada, Carlos E; García-Vázquez, Juan Pablo; Brena, Ramon F

    2014-06-20

    User indoor positioning has been under constant improvement especially with the availability of new sensors integrated into the modern mobile devices, which allows us to exploit not only infrastructures made for everyday use, such as WiFi, but also natural infrastructure, as is the case of natural magnetic field. In this paper we present an extension and improvement of our current indoor localization model based on the feature extraction of 46 magnetic field signal features. The extension adds a feature selection phase to our methodology, which is performed through Genetic Algorithm (GA) with the aim of optimizing the fitness of our current model. In addition, we present an evaluation of the final model in two different scenarios: home and office building. The results indicate that performing a feature selection process allows us to reduce the number of signal features of the model from 46 to 5 regardless the scenario and room location distribution. Further, we verified that reducing the number of features increases the probability of our estimator correctly detecting the user's location (sensitivity) and its capacity to detect false positives (specificity) in both scenarios.

  9. Assessment of metal ion concentration in water with structured feature selection.

    PubMed

    Naula, Pekka; Airola, Antti; Pihlasalo, Sari; Montoya Perez, Ileana; Salakoski, Tapio; Pahikkala, Tapio

    2017-10-01

    We propose a cost-effective system for the determination of metal ion concentration in water, addressing a central issue in water resources management. The system combines novel luminometric label array technology with a machine learning algorithm that selects a minimal number of array reagents (modulators) and liquid sample dilutions, such that enable accurate quantification. The algorithm is able to identify the optimal modulators and sample dilutions leading to cost reductions since less manual labour and resources are needed. Inferring the ion detector involves a unique type of a structured feature selection problem, which we formalize in this paper. We propose a novel Cartesian greedy forward feature selection algorithm for solving the problem. The novel algorithm was evaluated in the concentration assessment of five metal ions and the performance was compared to two known feature selection approaches. The results demonstrate that the proposed system can assist in lowering the costs with minimal loss in accuracy. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Feature weight estimation for gene selection: a local hyperlinear learning approach

    PubMed Central

    2014-01-01

    Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments, nevertheless,it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes, buried in high-dimensional irrelevant noises. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust in terms of degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability using to various classification algorithms. PMID:24625071

  11. Classification of epileptic EEG signals based on simple random sampling and sequential feature selection.

    PubMed

    Ghayab, Hadi Ratham Al; Li, Yan; Abdulla, Shahab; Diykh, Mohammed; Wan, Xiangkui

    2016-06-01

    Electroencephalogram (EEG) signals are used broadly in the medical fields. The main applications of EEG signals are the diagnosis and treatment of diseases such as epilepsy, Alzheimer, sleep problems and so on. This paper presents a new method which extracts and selects features from multi-channel EEG signals. This research focuses on three main points. Firstly, simple random sampling (SRS) technique is used to extract features from the time domain of EEG signals. Secondly, the sequential feature selection (SFS) algorithm is applied to select the key features and to reduce the dimensionality of the data. Finally, the selected features are forwarded to a least square support vector machine (LS_SVM) classifier to classify the EEG signals. The LS_SVM classifier classified the features which are extracted and selected from the SRS and the SFS. The experimental results show that the method achieves 99.90, 99.80 and 100 % for classification accuracy, sensitivity and specificity, respectively.

  12. Feature selection from a facial image for distinction of sasang constitution.

    PubMed

    Koo, Imhoi; Kim, Jong Yeol; Kim, Myoung Geun; Kim, Keun Ho

    2009-09-01

    Recently, oriental medicine has received attention for providing personalized medicine through consideration of the unique nature and constitution of individual patients. With the eventual goal of globalization, the current trend in oriental medicine research is the standardization by adopting western scientific methods, which could represent a scientific revolution. The purpose of this study is to establish methods for finding statistically significant features in a facial image with respect to distinguishing constitution and to show the meaning of those features. From facial photo images, facial elements are analyzed in terms of the distance, angle and the distance ratios, for which there are 1225, 61 250 and 749 700 features, respectively. Due to the very large number of facial features, it is quite difficult to determine truly meaningful features. We suggest a process for the efficient analysis of facial features including the removal of outliers, control for missing data to guarantee data confidence and calculation of statistical significance by applying ANOVA. We show the statistical properties of selected features according to different constitutions using the nine distances, 10 angles and 10 rates of distance features that are finally established. Additionally, the Sasang constitutional meaning of the selected features is shown here.

  13. Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution

    PubMed Central

    2013-01-01

    Background Glycoproteins are involved in a diverse range of biochemical and biological processes. Changes in protein glycosylation are believed to occur in many diseases, particularly during cancer initiation and progression. The identification of biomarkers for human disease states is becoming increasingly important, as early detection is key to improving survival and recovery rates. To this end, the serum glycome has been proposed as a potential source of biomarkers for different types of cancers. High-throughput hydrophilic interaction liquid chromatography (HILIC) technology for glycan analysis allows for the detailed quantification of the glycan content in human serum. However, the experimental data from this analysis is compositional by nature. Compositional data are subject to a constant-sum constraint, which restricts the sample space to a simplex. Statistical analysis of glycan chromatography datasets should account for their unusual mathematical properties. As the volume of glycan HILIC data being produced increases, there is a considerable need for a framework to support appropriate statistical analysis. Proposed here is a methodology for feature selection in compositional data. The principal objective is to provide a template for the analysis of glycan chromatography data that may be used to identify potential glycan biomarkers. Results A greedy search algorithm, based on the generalized Dirichlet distribution, is carried out over the feature space to search for the set of “grouping variables” that best discriminate between known group structures in the data, modelling the compositional variables using beta distributions. The algorithm is applied to two glycan chromatography datasets. Statistical classification methods are used to test the ability of the selected features to differentiate between known groups in the data. Two well-known methods are used for comparison: correlation-based feature selection (CFS) and recursive partitioning (rpart). CFS

  14. Train axle bearing fault detection using a feature selection scheme based multi-scale morphological filter

    NASA Astrophysics Data System (ADS)

    Li, Yifan; Liang, Xihui; Lin, Jianhui; Chen, Yuejian; Liu, Jianxin

    2018-02-01

    This paper presents a novel signal processing scheme, feature selection based multi-scale morphological filter (MMF), for train axle bearing fault detection. In this scheme, more than 30 feature indicators of vibration signals are calculated for axle bearings with different conditions and the features which can reflect fault characteristics more effectively and representatively are selected using the max-relevance and min-redundancy principle. Then, a filtering scale selection approach for MMF based on feature selection and grey relational analysis is proposed. The feature selection based MMF method is tested on diagnosis of artificially created damages of rolling bearings of railway trains. Experimental results show that the proposed method has a superior performance in extracting fault features of defective train axle bearings. In addition, comparisons are performed with the kurtosis criterion based MMF and the spectral kurtosis criterion based MMF. The proposed feature selection based MMF method outperforms these two methods in detection of train axle bearing faults.

  15. MRM-Lasso: A Sparse Multiview Feature Selection Method via Low-Rank Analysis.

    PubMed

    Yang, Wanqi; Gao, Yang; Shi, Yinghuan; Cao, Longbing

    2015-11-01

    Learning about multiview data involves many applications, such as video understanding, image classification, and social media. However, when the data dimension increases dramatically, it is important but very challenging to remove redundant features in multiview feature selection. In this paper, we propose a novel feature selection algorithm, multiview rank minimization-based Lasso (MRM-Lasso), which jointly utilizes Lasso for sparse feature selection and rank minimization for learning relevant patterns across views. Instead of simply integrating multiple Lasso from view level, we focus on the performance of sample-level (sample significance) and introduce pattern-specific weights into MRM-Lasso. The weights are utilized to measure the contribution of each sample to the labels in the current view. In addition, the latent correlation across different views is successfully captured by learning a low-rank matrix consisting of pattern-specific weights. The alternating direction method of multipliers is applied to optimize the proposed MRM-Lasso. Experiments on four real-life data sets show that features selected by MRM-Lasso have better multiview classification performance than the baselines. Moreover, pattern-specific weights are demonstrated to be significant for learning about multiview data, compared with view-specific weights.

  16. SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.

    PubMed

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru

    2014-01-01

    Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases.

  17. SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier

    PubMed Central

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru

    2014-01-01

    Recently, support vector machine (SVM) has excellent performance on classification and prediction and is widely used on disease diagnosis or medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for Dermatology and Zoo databases. Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; and the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, Taguchi method was jointly combined with SVM classifier in order to optimize parameters C and γ to increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can be more than 95% after SVM-RFE feature selection and Taguchi parameter optimization for Dermatology and Zoo databases. PMID:25295306

  18. Image search engine with selective filtering and feature-element-based classification

    NASA Astrophysics Data System (ADS)

    Li, Qing; Zhang, Yujin; Dai, Shengyang

    2001-12-01

    With the growth of Internet and storage capability in recent years, image has become a widespread information format in World Wide Web. However, it has become increasingly harder to search for images of interest, and effective image search engine for the WWW needs to be developed. We propose in this paper a selective filtering process and a novel approach for image classification based on feature element in the image search engine we developed for the WWW. First a selective filtering process is embedded in a general web crawler to filter out the meaningless images with GIF format. Two parameters that can be obtained easily are used in the filtering process. Our classification approach first extract feature elements from images instead of feature vectors. Compared with feature vectors, feature elements can better capture visual meanings of the image according to subjective perception of human beings. Different from traditional image classification method, our classification approach based on feature element doesn't calculate the distance between two vectors in the feature space, while trying to find associations between feature element and class attribute of the image. Experiments are presented to show the efficiency of the proposed approach.

  19. A comparative analysis of swarm intelligence techniques for feature selection in cancer classification.

    PubMed

    Gunavathi, Chellamuthu; Premalatha, Kandasamy

    2014-01-01

    Feature selection in cancer classification is a central area of research in the field of bioinformatics and used to select the informative genes from thousands of genes of the microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of shuffled frog leaping (SFL) algorithm. The SI techniques such as particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied on 10 different benchmark datasets and examined with SI techniques. The experimental results show that the results obtained from k-NN classifier through SFLLF feature selection method outperform PSO, CS, and SFL.

  20. A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease

    NASA Astrophysics Data System (ADS)

    Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas

    2017-08-01

    The diagnosis of erythemato-squamous disease is a complex problem and difficult to detect in dermatology. Besides that, it is a major cause of skin cancer. Data mining implementation in the medical field helps expert to diagnose precisely, accurately, and inexpensively. In this research, we use data mining technique to developed a diagnosis model based on multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), uses the advantages from filter and wrapper methods to select the optimal feature subset from original feature. Chi square used as filter method to remove redundant features and GA as wrapper method to select the ideal feature subset with SVM used as classifier. Experiment performed with 10 fold cross validation on erythemato-squamous diseases dataset taken from University of California Irvine (UCI) machine learning database. The experimental result shows that the proposed model based multiclass SVM with Chi Square and GA can give an optimum feature subset. There are 18 optimum features with 99.18% accuracy.

  1. Classification of motor imagery tasks for BCI with multiresolution analysis and multiobjective feature selection.

    PubMed

    Ortega, Julio; Asensio-Cubero, Javier; Gan, John Q; Ortiz, Andrés

    2016-07-15

    Brain-computer interfacing (BCI) applications based on the classification of electroencephalographic (EEG) signals require solving high-dimensional pattern classification problems with such a relatively small number of training patterns that curse of dimensionality problems usually arise. Multiresolution analysis (MRA) has useful properties for signal analysis in both temporal and spectral analysis, and has been broadly used in the BCI field. However, MRA usually increases the dimensionality of the input data. Therefore, some approaches to feature selection or feature dimensionality reduction should be considered for improving the performance of the MRA based BCI. This paper investigates feature selection in the MRA-based frameworks for BCI. Several wrapper approaches to evolutionary multiobjective feature selection are proposed with different structures of classifiers. They are evaluated by comparing with baseline methods using sparse representation of features or without feature selection. The statistical analysis, by applying the Kolmogorov-Smirnoff and Kruskal-Wallis tests to the means of the Kappa values evaluated by using the test patterns in each approach, has demonstrated some advantages of the proposed approaches. In comparison with the baseline MRA approach used in previous studies, the proposed evolutionary multiobjective feature selection approaches provide similar or even better classification performances, with significant reduction in the number of features that need to be computed.

  2. A study of metaheuristic algorithms for high dimensional feature selection on microarray data

    NASA Astrophysics Data System (ADS)

    Dankolo, Muhammad Nasiru; Radzi, Nor Haizan Mohamed; Sallehuddin, Roselina; Mustaffa, Noorfa Haszlinna

    2017-11-01

    Microarray systems enable experts to examine gene profile at molecular level using machine learning algorithms. It increases the potentials of classification and diagnosis of many diseases at gene expression level. Though, numerous difficulties may affect the efficiency of machine learning algorithms which includes vast number of genes features comprised in the original data. Many of these features may be unrelated to the intended analysis. Therefore, feature selection is necessary to be performed in the data pre-processing. Many feature selection algorithms are developed and applied on microarray which including the metaheuristic optimization algorithms. This paper discusses the application of the metaheuristics algorithms for feature selection in microarray dataset. This study reveals that, the algorithms have yield an interesting result with limited resources thereby saving computational expenses of machine learning algorithms.

  3. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.

    PubMed

    Yu, Sheng; Liao, Katherine P; Shaw, Stanley Y; Gainer, Vivian S; Churchill, Susanne E; Szolovits, Peter; Murphy, Shawn N; Kohane, Isaac S; Cai, Tianxi

    2015-09-01

    Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy. Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. When combined with additional codified features, a penalized logistic regression model was trained to classify the target phenotype. The authors applied our method to develop algorithms to identify patients with rheumatoid arthritis and coronary artery disease cases among those with rheumatoid arthritis from a large multi-institutional EHR. The area under the receiver operating characteristic curves (AUC) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared to the AUCs of 0.938 and 0.929 by models trained with expert-curated features. Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features. The majority of the selected model features were interpretable. The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All

  4. Kernel-based Joint Feature Selection and Max-Margin Classification for Early Diagnosis of Parkinson’s Disease

    NASA Astrophysics Data System (ADS)

    Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang

    2017-01-01

    Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for the tasks, in which the performance is heavily dependent on the feature selection techniques, like the diagnosis of neurodegenerative diseases. Parkinson’s disease (PD) is one of the most common neurodegenerative disorders, which progresses slowly while affects the quality of life dramatically. In this paper, we use the data acquired from multi-modal neuroimaging data to diagnose PD by investigating the brain regions, known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques that select features based on their performance in the original input feature space, we select features that best benefit the classification scheme in the kernel space. We further propose kernel functions, specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database, and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods.

  5. Kernel-based Joint Feature Selection and Max-Margin Classification for Early Diagnosis of Parkinson’s Disease

    PubMed Central

    Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang

    2017-01-01

    Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for the tasks, in which the performance is heavily dependent on the feature selection techniques, like the diagnosis of neurodegenerative diseases. Parkinson’s disease (PD) is one of the most common neurodegenerative disorders, which progresses slowly while affects the quality of life dramatically. In this paper, we use the data acquired from multi-modal neuroimaging data to diagnose PD by investigating the brain regions, known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques that select features based on their performance in the original input feature space, we select features that best benefit the classification scheme in the kernel space. We further propose kernel functions, specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database, and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods. PMID:28120883

  6. Finger vein recognition with personalized feature selection.

    PubMed

    Xi, Xiaoming; Yang, Gongping; Yin, Yilong; Meng, Xianjing

    2013-08-22

    Finger veins are a promising biometric pattern for personalized identification in terms of their advantages over existing biometrics. Based on the spatial pyramid representation and the combination of more effective information such as gray, texture and shape, this paper proposes a simple but powerful feature, called Pyramid Histograms of Gray, Texture and Orientation Gradients (PHGTOG). For a finger vein image, PHGTOG can reflect the global spatial layout and local details of gray, texture and shape. To further improve the recognition performance and reduce the computational complexity, we select a personalized subset of features from PHGTOG for each subject by using the sparse weight vector, which is trained by using LASSO and called PFS-PHGTOG. We conduct extensive experiments to demonstrate the promise of the PHGTOG and PFS-PHGTOG, experimental results on our databases show that PHGTOG outperforms the other existing features. Moreover, PFS-PHGTOG can further boost the performance in comparison with PHGTOG.

  7. Finger Vein Recognition with Personalized Feature Selection

    PubMed Central

    Xi, Xiaoming; Yang, Gongping; Yin, Yilong; Meng, Xianjing

    2013-01-01

    Finger veins are a promising biometric pattern for personalized identification in terms of their advantages over existing biometrics. Based on the spatial pyramid representation and the combination of more effective information such as gray, texture and shape, this paper proposes a simple but powerful feature, called Pyramid Histograms of Gray, Texture and Orientation Gradients (PHGTOG). For a finger vein image, PHGTOG can reflect the global spatial layout and local details of gray, texture and shape. To further improve the recognition performance and reduce the computational complexity, we select a personalized subset of features from PHGTOG for each subject by using the sparse weight vector, which is trained by using LASSO and called PFS-PHGTOG. We conduct extensive experiments to demonstrate the promise of the PHGTOG and PFS-PHGTOG, experimental results on our databases show that PHGTOG outperforms the other existing features. Moreover, PFS-PHGTOG can further boost the performance in comparison with PHGTOG. PMID:23974154

  8. Feature Selection Method Based on Neighborhood Relationships: Applications in EEG Signal Identification and Chinese Character Recognition

    PubMed Central

    Zhao, Yu-Xiang; Chou, Chien-Hsing

    2016-01-01

    In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors. Therefore, the proposed NRFS algorithm was designed for solving this problem. By applying the NRFS algorithm, unselected feature vectors have a high priority of being added into the feature subset if the neighboring feature vectors have been selected. In addition, selected feature vectors have a high priority of being eliminated if the neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two feature algorithms. The experimental results indicated that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and identifying crucial character regions for recognizing Chinese characters. PMID:27314346

  9. Feature Selection from a Facial Image for Distinction of Sasang Constitution

    PubMed Central

    Koo, Imhoi; Kim, Jong Yeol; Kim, Myoung Geun

    2009-01-01

    Recently, oriental medicine has received attention for providing personalized medicine through consideration of the unique nature and constitution of individual patients. With the eventual goal of globalization, the current trend in oriental medicine research is the standardization by adopting western scientific methods, which could represent a scientific revolution. The purpose of this study is to establish methods for finding statistically significant features in a facial image with respect to distinguishing constitution and to show the meaning of those features. From facial photo images, facial elements are analyzed in terms of the distance, angle and the distance ratios, for which there are 1225, 61 250 and 749 700 features, respectively. Due to the very large number of facial features, it is quite difficult to determine truly meaningful features. We suggest a process for the efficient analysis of facial features including the removal of outliers, control for missing data to guarantee data confidence and calculation of statistical significance by applying ANOVA. We show the statistical properties of selected features according to different constitutions using the nine distances, 10 angles and 10 rates of distance features that are finally established. Additionally, the Sasang constitutional meaning of the selected features is shown here. PMID:19745013

  10. Feature diagnosticity and task context shape activity in human scene-selective cortex.

    PubMed

    Lowe, Matthew X; Gallivan, Jason P; Ferber, Susanne; Cant, Jonathan S

    2016-01-15

    Scenes are constructed from multiple visual features, yet previous research investigating scene processing has often focused on the contributions of single features in isolation. In the real world, features rarely exist independently of one another and likely converge to inform scene identity in unique ways. Here, we utilize fMRI and pattern classification techniques to examine the interactions between task context (i.e., attend to diagnostic global scene features; texture or layout) and high-level scene attributes (content and spatial boundary) to test the novel hypothesis that scene-selective cortex represents multiple visual features, the importance of which varies according to their diagnostic relevance across scene categories and task demands. Our results show for the first time that scene representations are driven by interactions between multiple visual features and high-level scene attributes. Specifically, univariate analysis of scene-selective cortex revealed that task context and feature diagnosticity shape activity differentially across scene categories. Examination using multivariate decoding methods revealed results consistent with univariate findings, but also evidence for an interaction between high-level scene attributes and diagnostic visual features within scene categories. Critically, these findings suggest visual feature representations are not distributed uniformly across scene categories but are shaped by task context and feature diagnosticity. Thus, we propose that scene-selective cortex constructs a flexible representation of the environment by integrating multiple diagnostically relevant visual features, the nature of which varies according to the particular scene being perceived and the goals of the observer. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Infrared face recognition based on LBP histogram and KW feature selection

    NASA Astrophysics Data System (ADS)

    Xie, Zhihua

    2014-07-01

    The conventional LBP-based feature as represented by the local binary pattern (LBP) histogram still has room for performance improvements. This paper focuses on the dimension reduction of LBP micro-patterns and proposes an improved infrared face recognition method based on LBP histogram representation. To extract the local robust features in infrared face images, LBP is chosen to get the composition of micro-patterns of sub-blocks. Based on statistical test theory, Kruskal-Wallis (KW) feature selection method is proposed to get the LBP patterns which are suitable for infrared face recognition. The experimental results show combination of LBP and KW features selection improves the performance of infrared face recognition, the proposed method outperforms the traditional methods based on LBP histogram, discrete cosine transform(DCT) or principal component analysis(PCA).

  12. Receptive fields selection for binary feature description.

    PubMed

    Fan, Bin; Kong, Qingqun; Trzcinski, Tomasz; Wang, Zhiheng; Pan, Chunhong; Fua, Pascal

    2014-06-01

    Feature description for local image patch is widely used in computer vision. While the conventional way to design local descriptor is based on expert experience and knowledge, learning-based methods for designing local descriptor become more and more popular because of their good performance and data-driven property. This paper proposes a novel data-driven method for designing binary feature descriptor, which we call receptive fields descriptor (RFD). Technically, RFD is constructed by thresholding responses of a set of receptive fields, which are selected from a large number of candidates according to their distinctiveness and correlations in a greedy way. Using two different kinds of receptive fields (namely rectangular pooling area and Gaussian pooling area) for selection, we obtain two binary descriptors RFDR and RFDG .accordingly. Image matching experiments on the well-known patch data set and Oxford data set demonstrate that RFD significantly outperforms the state-of-the-art binary descriptors, and is comparable with the best float-valued descriptors at a fraction of processing time. Finally, experiments on object recognition tasks confirm that both RFDR and RFDG successfully bridge the performance gap between binary descriptors and their floating-point competitors.

  13. A ROC-based feature selection method for computer-aided detection and diagnosis

    NASA Astrophysics Data System (ADS)

    Wang, Songyuan; Zhang, Guopeng; Liao, Qimei; Zhang, Junying; Jiao, Chun; Lu, Hongbing

    2014-03-01

    Image-based computer-aided detection and diagnosis (CAD) has been a very active research topic aiming to assist physicians to detect lesions and distinguish them from benign to malignant. However, the datasets fed into a classifier usually suffer from small number of samples, as well as significantly less samples available in one class (have a disease) than the other, resulting in the classifier's suboptimal performance. How to identifying the most characterizing features of the observed data for lesion detection is critical to improve the sensitivity and minimize false positives of a CAD system. In this study, we propose a novel feature selection method mR-FAST that combines the minimal-redundancymaximal relevance (mRMR) framework with a selection metric FAST (feature assessment by sliding thresholds) based on the area under a ROC curve (AUC) generated on optimal simple linear discriminants. With three feature datasets extracted from CAD systems for colon polyps and bladder cancer, we show that the space of candidate features selected by mR-FAST is more characterizing for lesion detection with higher AUC, enabling to find a compact subset of superior features at low cost.

  14. Raman spectral feature selection using ant colony optimization for breast cancer diagnosis.

    PubMed

    Fallahzadeh, Omid; Dehghani-Bidgoli, Zohreh; Assarian, Mohammad

    2018-06-04

    Pathology as a common diagnostic test of cancer is an invasive, time-consuming, and partially subjective method. Therefore, optical techniques, especially Raman spectroscopy, have attracted the attention of cancer diagnosis researchers. However, as Raman spectra contain numerous peaks involved in molecular bounds of the sample, finding the best features related to cancerous changes can improve the accuracy of diagnosis in this method. The present research attempted to improve the power of Raman-based cancer diagnosis by finding the best Raman features using the ACO algorithm. In the present research, 49 spectra were measured from normal, benign, and cancerous breast tissue samples using a 785-nm micro-Raman system. After preprocessing for removal of noise and background fluorescence, the intensity of 12 important Raman bands of the biological samples was extracted as features of each spectrum. Then, the ACO algorithm was applied to find the optimum features for diagnosis. As the results demonstrated, by selecting five features, the classification accuracy of the normal, benign, and cancerous groups increased by 14% and reached 87.7%. ACO feature selection can improve the diagnostic accuracy of Raman-based diagnostic models. In the present study, features corresponding to ν(C-C) αhelix proline, valine (910-940), νs(C-C) skeletal lipids (1110-1130), and δ(CH2)/δ(CH3) proteins (1445-1460) were selected as the best features in cancer diagnosis.

  15. Compensatory selection for roads over natural linear features by wolves in northern Ontario: Implications for caribou conservation

    PubMed Central

    Patterson, Brent R.; Anderson, Morgan L.; Rodgers, Arthur R.; Vander Vennen, Lucas M.; Fryxell, John M.

    2017-01-01

    Woodland caribou (Rangifer tarandus caribou) in Ontario are a threatened species that have experienced a substantial retraction of their historic range. Part of their decline has been attributed to increasing densities of anthropogenic linear features such as trails, roads, railways, and hydro lines. These features have been shown to increase the search efficiency and kill rate of wolves. However, it is unclear whether selection for anthropogenic linear features is additive or compensatory to selection for natural (water) linear features which may also be used for travel. We studied the selection of water and anthropogenic linear features by 52 resident wolves (Canis lupus x lycaon) over four years across three study areas in northern Ontario that varied in degrees of forestry activity and human disturbance. We used Euclidean distance-based resource selection functions (mixed-effects logistic regression) at the seasonal range scale with random coefficients for distance to water linear features, primary/secondary roads/railways, and hydro lines, and tertiary roads to estimate the strength of selection for each linear feature and for several habitat types, while accounting for availability of each feature. Next, we investigated the trade-off between selection for anthropogenic and water linear features. Wolves selected both anthropogenic and water linear features; selection for anthropogenic features was stronger than for water during the rendezvous season. Selection for anthropogenic linear features increased with increasing density of these features on the landscape, while selection for natural linear features declined, indicating compensatory selection of anthropogenic linear features. These results have implications for woodland caribou conservation. Prey encounter rates between wolves and caribou seem to be strongly influenced by increasing linear feature densities. This behavioral mechanism–a compensatory functional response to anthropogenic linear feature

  16. Compensatory selection for roads over natural linear features by wolves in northern Ontario: Implications for caribou conservation.

    PubMed

    Newton, Erica J; Patterson, Brent R; Anderson, Morgan L; Rodgers, Arthur R; Vander Vennen, Lucas M; Fryxell, John M

    2017-01-01

    Woodland caribou (Rangifer tarandus caribou) in Ontario are a threatened species that have experienced a substantial retraction of their historic range. Part of their decline has been attributed to increasing densities of anthropogenic linear features such as trails, roads, railways, and hydro lines. These features have been shown to increase the search efficiency and kill rate of wolves. However, it is unclear whether selection for anthropogenic linear features is additive or compensatory to selection for natural (water) linear features which may also be used for travel. We studied the selection of water and anthropogenic linear features by 52 resident wolves (Canis lupus x lycaon) over four years across three study areas in northern Ontario that varied in degrees of forestry activity and human disturbance. We used Euclidean distance-based resource selection functions (mixed-effects logistic regression) at the seasonal range scale with random coefficients for distance to water linear features, primary/secondary roads/railways, and hydro lines, and tertiary roads to estimate the strength of selection for each linear feature and for several habitat types, while accounting for availability of each feature. Next, we investigated the trade-off between selection for anthropogenic and water linear features. Wolves selected both anthropogenic and water linear features; selection for anthropogenic features was stronger than for water during the rendezvous season. Selection for anthropogenic linear features increased with increasing density of these features on the landscape, while selection for natural linear features declined, indicating compensatory selection of anthropogenic linear features. These results have implications for woodland caribou conservation. Prey encounter rates between wolves and caribou seem to be strongly influenced by increasing linear feature densities. This behavioral mechanism-a compensatory functional response to anthropogenic linear feature

  17. Multi-level gene/MiRNA feature selection using deep belief nets and active learning.

    PubMed

    Ibrahim, Rania; Yousri, Noha A; Ismail, Mohamed A; El-Makky, Nagwa M

    2014-01-01

    Selecting the most discriminative genes/miRNAs has been raised as an important task in bioinformatics to enhance disease classifiers and to mitigate the dimensionality curse problem. Original feature selection methods choose genes/miRNAs based on their individual features regardless of how they perform together. Considering group features instead of individual ones provides a better view for selecting the most informative genes/miRNAs. Recently, deep learning has proven its ability in representing the data in multiple levels of abstraction, allowing for better discrimination between different classes. However, the idea of using deep learning for feature selection is not widely used in the bioinformatics field yet. In this paper, a novel multi-level feature selection approach named MLFS is proposed for selecting genes/miRNAs based on expression profiles. The approach is based on both deep and active learning. Moreover, an extension to use the technique for miRNAs is presented by considering the biological relation between miRNAs and genes. Experimental results show that the approach was able to outperform classical feature selection methods in hepatocellular carcinoma (HCC) by 9%, lung cancer by 6% and breast cancer by around 10% in F1-measure. Results also show the enhancement in F1-measure of our approach over recently related work in [1] and [2].

  18. Influence of time and length size feature selections for human activity sequences recognition.

    PubMed

    Fang, Hongqing; Chen, Long; Srinivasan, Raghavendiran

    2014-01-01

    In this paper, Viterbi algorithm based on a hidden Markov model is applied to recognize activity sequences from observed sensors events. Alternative features selections of time feature values of sensors events and activity length size feature values are tested, respectively, and then the results of activity sequences recognition performances of Viterbi algorithm are evaluated. The results show that the selection of larger time feature values of sensor events and/or smaller activity length size feature values will generate relatively better results on the activity sequences recognition performances. © 2013 ISA Published by ISA All rights reserved.

  19. Comparison of Different EHG Feature Selection Methods for the Detection of Preterm Labor

    PubMed Central

    Alamedine, D.; Khalil, M.; Marque, C.

    2013-01-01

    Numerous types of linear and nonlinear features have been extracted from the electrohysterogram (EHG) in order to classify labor and pregnancy contractions. As a result, the number of available features is now very large. The goal of this study is to reduce the number of features by selecting only the relevant ones which are useful for solving the classification problem. This paper presents three methods for feature subset selection that can be applied to choose the best subsets for classifying labor and pregnancy contractions: an algorithm using the Jeffrey divergence (JD) distance, a sequential forward selection (SFS) algorithm, and a binary particle swarm optimization (BPSO) algorithm. The two last methods are based on a classifier and were tested with three types of classifiers. These methods have allowed us to identify common features which are relevant for contraction classification. PMID:24454536

  20. Discriminative and informative features for biomolecular text mining with ensemble feature selection.

    PubMed

    Van Landeghem, Sofie; Abeel, Thomas; Saeys, Yvan; Van de Peer, Yves

    2010-09-15

    In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results. We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools. The FS algorithms and classifiers are available in Java-ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP'09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/).

  1. Efficient feature selection using a hybrid algorithm for the task of epileptic seizure detection

    NASA Astrophysics Data System (ADS)

    Lai, Kee Huong; Zainuddin, Zarita; Ong, Pauline

    2014-07-01

    Feature selection is a very important aspect in the field of machine learning. It entails the search of an optimal subset from a very large data set with high dimensional feature space. Apart from eliminating redundant features and reducing computational cost, a good selection of feature also leads to higher prediction and classification accuracy. In this paper, an efficient feature selection technique is introduced in the task of epileptic seizure detection. The raw data are electroencephalography (EEG) signals. Using discrete wavelet transform, the biomedical signals were decomposed into several sets of wavelet coefficients. To reduce the dimension of these wavelet coefficients, a feature selection method that combines the strength of both filter and wrapper methods is proposed. Principal component analysis (PCA) is used as part of the filter method. As for wrapper method, the evolutionary harmony search (HS) algorithm is employed. This metaheuristic method aims at finding the best discriminating set of features from the original data. The obtained features were then used as input for an automated classifier, namely wavelet neural networks (WNNs). The WNNs model was trained to perform a binary classification task, that is, to determine whether a given EEG signal was normal or epileptic. For comparison purposes, different sets of features were also used as input. Simulation results showed that the WNNs that used the features chosen by the hybrid algorithm achieved the highest overall classification accuracy.

  2. HIV-1 protease cleavage site prediction based on two-stage feature selection method.

    PubMed

    Niu, Bing; Yuan, Xiao-Cheng; Roeper, Preston; Su, Qiang; Peng, Chun-Rong; Yin, Jing-Yuan; Ding, Juan; Li, HaiPeng; Lu, Wen-Cong

    2013-03-01

    Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. Searching for an accurate, robust, and rapid method to correctly predict the cleavage sites in proteins is crucial when searching for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with Genetic Algorithms method. Thirty important biochemical features were found based on a jackknife test from the original data set containing 4,248 features. By using the AdaBoost method with the thirty selected features the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, with increased accuracy over the original dataset by 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.

  3. Ant-cuckoo colony optimization for feature selection in digital mammogram.

    PubMed

    Jona, J B; Nagaveni, N

    2014-01-15

    Digital mammogram is the only effective screening method to detect the breast cancer. Gray Level Co-occurrence Matrix (GLCM) textural features are extracted from the mammogram. All the features are not essential to detect the mammogram. Therefore identifying the relevant feature is the aim of this work. Feature selection improves the classification rate and accuracy of any classifier. In this study, a new hybrid metaheuristic named Ant-Cuckoo Colony Optimization a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search (CS) is proposed for feature selection in Digital Mammogram. ACO is a good metaheuristic optimization technique but the drawback of this algorithm is that the ant will walk through the path where the pheromone density is high which makes the whole process slow hence CS is employed to carry out the local search of ACO. Support Vector Machine (SVM) classifier with Radial Basis Kernal Function (RBF) is done along with the ACO to classify the normal mammogram from the abnormal mammogram. Experiments are conducted in miniMIAS database. The performance of the new hybrid algorithm is compared with the ACO and PSO algorithm. The results show that the hybrid Ant-Cuckoo Colony Optimization algorithm is more accurate than the other techniques.

  4. Emotional textile image classification based on cross-domain convolutional sparse autoencoders with feature selection

    NASA Astrophysics Data System (ADS)

    Li, Zuhe; Fan, Yangyu; Liu, Weihua; Yu, Zeqi; Wang, Fengqin

    2017-01-01

    We aim to apply sparse autoencoder-based unsupervised feature learning to emotional semantic analysis for textile images. To tackle the problem of limited training data, we present a cross-domain feature learning scheme for emotional textile image classification using convolutional autoencoders. We further propose a correlation-analysis-based feature selection method for the weights learned by sparse autoencoders to reduce the number of features extracted from large size images. First, we randomly collect image patches on an unlabeled image dataset in the source domain and learn local features with a sparse autoencoder. We then conduct feature selection according to the correlation between different weight vectors corresponding to the autoencoder's hidden units. We finally adopt a convolutional neural network including a pooling layer to obtain global feature activations of textile images in the target domain and send these global feature vectors into logistic regression models for emotional image classification. The cross-domain unsupervised feature learning method achieves 65% to 78% average accuracy in the cross-validation experiments corresponding to eight emotional categories and performs better than conventional methods. Feature selection can reduce the computational cost of global feature extraction by about 50% while improving classification performance.

  5. Jade: using on-demand cloud analysis to give scientists back their flow

    NASA Astrophysics Data System (ADS)

    Robinson, N.; Tomlinson, J.; Hilson, A. J.; Arribas, A.; Powell, T.

    2017-12-01

    The UK's Met Office generates 400 TB weather and climate data every day by running physical models on its Top 20 supercomputer. As data volumes explode, there is a danger that analysis workflows become dominated by watching progress bars, and not thinking about science. We have been researching how we can use distributed computing to allow analysts to process these large volumes of high velocity data in a way that's easy, effective and cheap.Our prototype analysis stack, Jade, tries to encapsulate this. Functionality includes: An under-the-hood Dask engine which parallelises and distributes computations, without the need to retrain analysts Hybrid compute clusters (AWS, Alibaba, and local compute) comprising many thousands of cores Clusters which autoscale up/down in response to calculation load using Kubernetes, and balances the cluster across providers based on the current price of compute Lazy data access from cloud storage via containerised OpenDAP This technology stack allows us to perform calculations many orders of magnitude faster than is possible on local workstations. It is also possible to outperform dedicated local compute clusters, as cloud compute can, in principle, scale to much larger scales. The use of ephemeral compute resources also makes this implementation cost efficient.

  6. Computational Prediction of Protein Epsilon Lysine Acetylation Sites Based on a Feature Selection Method.

    PubMed

    Gao, JianZhao; Tao, Xue-Wen; Zhao, Jia; Feng, Yuan-Ming; Cai, Yu-Dong; Zhang, Ning

    2017-01-01

    Lysine acetylation, as one type of post-translational modifications (PTM), plays key roles in cellular regulations and can be involved in a variety of human diseases. However, it is often high-cost and time-consuming to use traditional experimental approaches to identify the lysine acetylation sites. Therefore, effective computational methods should be developed to predict the acetylation sites. In this study, we developed a position-specific method for epsilon lysine acetylation site prediction. Sequences of acetylated proteins were retrieved from the UniProt database. Various kinds of features such as position specific scoring matrix (PSSM), amino acid factors (AAF), and disorders were incorporated. A feature selection method based on mRMR (Maximum Relevance Minimum Redundancy) and IFS (Incremental Feature Selection) was employed. Finally, 319 optimal features were selected from total 541 features. Using the 319 optimal features to encode peptides, a predictor was constructed based on dagging. As a result, an accuracy of 69.56% with MCC of 0.2792 was achieved. We analyzed the optimal features, which suggested some important factors determining the lysine acetylation sites. We developed a position-specific method for epsilon lysine acetylation site prediction. A set of optimal features was selected. Analysis of the optimal features provided insights into the mechanism of lysine acetylation sites, providing guidance of experimental validation. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  7. Comparative study of feature selection with ensemble learning using SOM variants

    NASA Astrophysics Data System (ADS)

    Filali, Ameni; Jlassi, Chiraz; Arous, Najet

    2017-03-01

    Ensemble learning has succeeded in the growth of stability and clustering accuracy, but their runtime prohibits them from scaling up to real-world applications. This study deals the problem of selecting a subset of the most pertinent features for every cluster from a dataset. The proposed method is another extension of the Random Forests approach using self-organizing maps (SOM) variants to unlabeled data that estimates the out-of-bag feature importance from a set of partitions. Every partition is created using a various bootstrap sample and a random subset of the features. Then, we show that the process internal estimates are used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. This approach aims to the dimensionality reduction, visualization and cluster characterization at the same time. Hence, we provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvement in terms of clustering accuracy, over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach proves promise to treat with very broad domains.

  8. Effect of finite sample size on feature selection and classification: a simulation study.

    PubMed

    Way, Ted W; Sahiner, Berkman; Hadjiiski, Lubomir M; Chan, Heang-Ping

    2010-02-01

    The small number of samples available for training and testing is often the limiting factor in finding the most effective features and designing an optimal computer-aided diagnosis (CAD) system. Training on a limited set of samples introduces bias and variance in the performance of a CAD system relative to that trained with an infinite sample size. In this work, the authors conducted a simulation study to evaluate the performances of various combinations of classifiers and feature selection techniques and their dependence on the class distribution, dimensionality, and the training sample size. The understanding of these relationships will facilitate development of effective CAD systems under the constraint of limited available samples. Three feature selection techniques, the stepwise feature selection (SFS), sequential floating forward search (SFFS), and principal component analysis (PCA), and two commonly used classifiers, Fisher's linear discriminant analysis (LDA) and support vector machine (SVM), were investigated. Samples were drawn from multidimensional feature spaces of multivariate Gaussian distributions with equal or unequal covariance matrices and unequal means, and with equal covariance matrices and unequal means estimated from a clinical data set. Classifier performance was quantified by the area under the receiver operating characteristic curve Az. The mean Az values obtained by resubstitution and hold-out methods were evaluated for training sample sizes ranging from 15 to 100 per class. The number of simulated features available for selection was chosen to be 50, 100, and 200. It was found that the relative performance of the different combinations of classifier and feature selection method depends on the feature space distributions, the dimensionality, and the available training sample sizes. The LDA and SVM with radial kernel performed similarly for most of the conditions evaluated in this study, although the SVM classifier showed a slightly higher

  9. [Combining speech sample and feature bilateral selection algorithm for classification of Parkinson's disease].

    PubMed

    Zhang, Xiaoheng; Wang, Lirui; Cao, Yao; Wang, Pin; Zhang, Cheng; Yang, Liuyang; Li, Yongming; Zhang, Yanling; Cheng, Oumei

    2018-02-01

    Diagnosis of Parkinson's disease (PD) based on speech data has been proved to be an effective way in recent years. However, current researches just care about the feature extraction and classifier design, and do not consider the instance selection. Former research by authors showed that the instance selection can lead to improvement on classification accuracy. However, no attention is paid on the relationship between speech sample and feature until now. Therefore, a new diagnosis algorithm of PD is proposed in this paper by simultaneously selecting speech sample and feature based on relevant feature weighting algorithm and multiple kernel method, so as to find their synergy effects, thereby improving classification accuracy. Experimental results showed that this proposed algorithm obtained apparent improvement on classification accuracy. It can obtain mean classification accuracy of 82.5%, which was 30.5% higher than the relevant algorithm. Besides, the proposed algorithm detected the synergy effects of speech sample and feature, which is valuable for speech marker extraction.

  10. Spectrally queued feature selection for robotic visual odometery

    NASA Astrophysics Data System (ADS)

    Pirozzo, David M.; Frederick, Philip A.; Hunt, Shawn; Theisen, Bernard; Del Rose, Mike

    2011-01-01

    Over the last two decades, research in Unmanned Vehicles (UV) has rapidly progressed and become more influenced by the field of biological sciences. Researchers have been investigating mechanical aspects of varying species to improve UV air and ground intrinsic mobility, they have been exploring the computational aspects of the brain for the development of pattern recognition and decision algorithms and they have been exploring perception capabilities of numerous animals and insects. This paper describes a 3 month exploratory applied research effort performed at the US ARMY Research, Development and Engineering Command's (RDECOM) Tank Automotive Research, Development and Engineering Center (TARDEC) in the area of biologically inspired spectrally augmented feature selection for robotic visual odometry. The motivation for this applied research was to develop a feasibility analysis on multi-spectrally queued feature selection, with improved temporal stability, for the purposes of visual odometry. The intended application is future semi-autonomous Unmanned Ground Vehicle (UGV) control as the richness of data sets required to enable human like behavior in these systems has yet to be defined.

  11. Kruskal-Wallis-based computationally efficient feature selection for face recognition.

    PubMed

    Ali Khan, Sajid; Hussain, Ayyaz; Basit, Abdul; Akram, Sheeraz

    2014-01-01

    Face recognition in today's technological world, and face recognition applications attain much more importance. Most of the existing work used frontal face images to classify face image. However these techniques fail when applied on real world face images. The proposed technique effectively extracts the prominent facial features. Most of the features are redundant and do not contribute to representing face. In order to eliminate those redundant features, computationally efficient algorithm is used to select the more discriminative face features. Extracted features are then passed to classification step. In the classification step, different classifiers are ensemble to enhance the recognition accuracy rate as single classifier is unable to achieve the high accuracy. Experiments are performed on standard face database images and results are compared with existing techniques.

  12. Unsupervised Feature Selection Based on the Morisita Index for Hyperspectral Images

    NASA Astrophysics Data System (ADS)

    Golay, Jean; Kanevski, Mikhail

    2017-04-01

    Hyperspectral sensors are capable of acquiring images with hundreds of narrow and contiguous spectral bands. Compared with traditional multispectral imagery, the use of hyperspectral images allows better performance in discriminating between land-cover classes, but it also results in large redundancy and high computational data processing. To alleviate such issues, unsupervised feature selection techniques for redundancy minimization can be implemented. Their goal is to select the smallest subset of features (or bands) in such a way that all the information content of a data set is preserved as much as possible. The present research deals with the application to hyperspectral images of a recently introduced technique of unsupervised feature selection: the Morisita-Based filter for Redundancy Minimization (MBRM). MBRM is based on the (multipoint) Morisita index of clustering and on the Morisita estimator of Intrinsic Dimension (ID). The fundamental idea of the technique is to retain only the bands which contribute to increasing the ID of an image. In this way, redundant bands are disregarded, since they have no impact on the ID. Besides, MBRM has several advantages over benchmark techniques: in addition to its ability to deal with large data sets, it can capture highly-nonlinear dependences and its implementation is straightforward in any programming environment. Experimental results on freely available hyperspectral images show the good effectiveness of MBRM in remote sensing data processing. Comparisons with benchmark techniques are carried out and random forests are used to assess the performance of MBRM in reducing the data dimensionality without loss of relevant information. References [1] C. Traina Jr., A.J.M. Traina, L. Wu, C. Faloutsos, Fast feature selection using fractal dimension, in: Proceedings of the XV Brazilian Symposium on Databases, SBBD, pp. 158-171, 2000. [2] J. Golay, M. Kanevski, A new estimator of intrinsic dimension based on the multipoint

  13. Weighted feature selection criteria for visual servoing of a telerobot

    NASA Technical Reports Server (NTRS)

    Feddema, John T.; Lee, C. S. G.; Mitchell, O. R.

    1989-01-01

    Because of the continually changing environment of a space station, visual feedback is a vital element of a telerobotic system. A real time visual servoing system would allow a telerobot to track and manipulate randomly moving objects. Methodologies for the automatic selection of image features to be used to visually control the relative position between an eye-in-hand telerobot and a known object are devised. A weighted criteria function with both image recognition and control components is used to select the combination of image features which provides the best control. Simulation and experimental results of a PUMA robot arm visually tracking a randomly moving carburetor gasket with a visual update time of 70 milliseconds are discussed.

  14. A Hierarchical Feature and Sample Selection Framework and Its Application for Alzheimer’s Disease Diagnosis

    NASA Astrophysics Data System (ADS)

    An, Le; Adeli, Ehsan; Liu, Mingxia; Zhang, Jun; Lee, Seong-Whan; Shen, Dinggang

    2017-03-01

    Classification is one of the most important tasks in machine learning. Due to feature redundancy or outliers in samples, using all available data for training a classifier may be suboptimal. For example, the Alzheimer’s disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), and identification of relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNPs and then use those features to build the classifier. However, with the presence of many redundant features, the most discriminative features are difficult to be identified in a single step. Thus, we formulate a hierarchical feature and sample selection framework to gradually select informative features and discard ambiguous samples in multiple steps for improved classifier learning. To positively guide the data manifold preservation process, we utilize both labeled and unlabeled data during training, making our method semi-supervised. For validation, we conduct experiments on AD diagnosis by selecting mutually informative features from both MRI and SNP, and using the most discriminative samples for training. The superior classification results demonstrate the effectiveness of our approach, as compared with the rivals.

  15. High-order graph matching based feature selection for Alzheimer's disease identification.

    PubMed

    Liu, Feng; Suk, Heung-Il; Wee, Chong-Yaw; Chen, Huafu; Shen, Dinggang

    2013-01-01

    One of the main limitations of l1-norm feature selection is that it focuses on estimating the target vector for each sample individually without considering relations with other samples. However, it's believed that the geometrical relation among target vectors in the training set may provide useful information, and it would be natural to expect that the predicted vectors have similar geometric relations as the target vectors. To overcome these limitations, we formulate this as a graph-matching feature selection problem between a predicted graph and a target graph. In the predicted graph a node is represented by predicted vector that may describe regional gray matter volume or cortical thickness features, and in the target graph a node is represented by target vector that include class label and clinical scores. In particular, we devise new regularization terms in sparse representation to impose high-order graph matching between the target vectors and the predicted ones. Finally, the selected regional gray matter volume and cortical thickness features are fused in kernel space for classification. Using the ADNI dataset, we evaluate the effectiveness of the proposed method and obtain the accuracies of 92.17% and 81.57% in AD and MCI classification, respectively.

  16. Self-adaptive MOEA feature selection for classification of bankruptcy prediction data.

    PubMed

    Gaspar-Cunha, A; Recio, G; Costa, L; Estébanez, C

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting whose importance lies in the relevance for creditors and investors in evaluating the likelihood of getting into bankrupt. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, making an estimation of the credit risks associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have shown to be an excellent tool to deal with complex problems in finances and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimise the number of features and maximise the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of using the self-adaptation of the classifier.

  17. Enhancing the Discrimination Ability of a Gas Sensor Array Based on a Novel Feature Selection and Fusion Framework.

    PubMed

    Deng, Changjian; Lv, Kun; Shi, Debo; Yang, Bo; Yu, Song; He, Zhiyi; Yan, Jia

    2018-06-12

    In this paper, a novel feature selection and fusion framework is proposed to enhance the discrimination ability of gas sensor arrays for odor identification. Firstly, we put forward an efficient feature selection method based on the separability and the dissimilarity to determine the feature selection order for each type of feature when increasing the dimension of selected feature subsets. Secondly, the K-nearest neighbor (KNN) classifier is applied to determine the dimensions of the optimal feature subsets for different types of features. Finally, in the process of establishing features fusion, we come up with a classification dominance feature fusion strategy which conducts an effective basic feature. Experimental results on two datasets show that the recognition rates of Database I and Database II achieve 97.5% and 80.11%, respectively, when k = 1 for KNN classifier and the distance metric is correlation distance (COR), which demonstrates the superiority of the proposed feature selection and fusion framework in representing signal features. The novel feature selection method proposed in this paper can effectively select feature subsets that are conducive to the classification, while the feature fusion framework can fuse various features which describe the different characteristics of sensor signals, for enhancing the discrimination ability of gas sensors and, to a certain extent, suppressing drift effect.

  18. Feature-Selective Attentional Modulations in Human Frontoparietal Cortex.

    PubMed

    Ester, Edward F; Sutterer, David W; Serences, John T; Awh, Edward

    2016-08-03

    Control over visual selection has long been framed in terms of a dichotomy between "source" and "site," where top-down feedback signals originating in frontoparietal cortical areas modulate or bias sensory processing in posterior visual areas. This distinction is motivated in part by observations that frontoparietal cortical areas encode task-level variables (e.g., what stimulus is currently relevant or what motor outputs are appropriate), while posterior sensory areas encode continuous or analog feature representations. Here, we present evidence that challenges this distinction. We used fMRI, a roving searchlight analysis, and an inverted encoding model to examine representations of an elementary feature property (orientation) across the entire human cortical sheet while participants attended either the orientation or luminance of a peripheral grating. Orientation-selective representations were present in a multitude of visual, parietal, and prefrontal cortical areas, including portions of the medial occipital cortex, the lateral parietal cortex, and the superior precentral sulcus (thought to contain the human homolog of the macaque frontal eye fields). Additionally, representations in many-but not all-of these regions were stronger when participants were instructed to attend orientation relative to luminance. Collectively, these findings challenge models that posit a strict segregation between sources and sites of attentional control on the basis of representational properties by demonstrating that simple feature values are encoded by cortical regions throughout the visual processing hierarchy, and that representations in many of these areas are modulated by attention. Influential models of visual attention posit a distinction between top-down control and bottom-up sensory processing networks. These models are motivated in part by demonstrations showing that frontoparietal cortical areas associated with top-down control represent abstract or categorical stimulus

  19. Selecting Power-Efficient Signal Features for a Low-Power Fall Detector.

    PubMed

    Wang, Changhong; Redmond, Stephen J; Lu, Wei; Stevens, Michael C; Lord, Stephen R; Lovell, Nigel H

    2017-11-01

    Falls are a serious threat to the health of older people. A wearable fall detector can automatically detect the occurrence of a fall and alert a caregiver or an emergency response service so they may deliver immediate assistance, improving the chances of recovering from fall-related injuries. One constraint of such a wearable technology is its limited battery life. Thus, minimization of power consumption is an important design concern, all the while maintaining satisfactory accuracy of the fall detection algorithms implemented on the wearable device. This paper proposes an approach for selecting power-efficient signal features such that the minimum desirable fall detection accuracy is assured. Using data collected in simulated falls, simulated activities of daily living, and real free-living trials, all using young volunteers, the proposed approach selects four features from a set of ten commonly used features, providing a power saving of 75.3%, while limiting the error rate of a binary classification decision tree fall detection algorithm to 7.1%.Falls are a serious threat to the health of older people. A wearable fall detector can automatically detect the occurrence of a fall and alert a caregiver or an emergency response service so they may deliver immediate assistance, improving the chances of recovering from fall-related injuries. One constraint of such a wearable technology is its limited battery life. Thus, minimization of power consumption is an important design concern, all the while maintaining satisfactory accuracy of the fall detection algorithms implemented on the wearable device. This paper proposes an approach for selecting power-efficient signal features such that the minimum desirable fall detection accuracy is assured. Using data collected in simulated falls, simulated activities of daily living, and real free-living trials, all using young volunteers, the proposed approach selects four features from a set of ten commonly used features, providing

  20. A combinatorial feature selection approach to describe the QSAR of dual site inhibitors of acetylcholinesterase.

    PubMed

    Asadabadi, Ebrahim Barzegari; Abdolmaleki, Parviz; Barkooie, Seyyed Mohsen Hosseini; Jahandideh, Samad; Rezaei, Mohammad Ali

    2009-12-01

    Regarding the great potential of dual binding site inhibitors of acetylcholinesterase as the future potent drugs of Alzheimer's disease, this study was devoted to extraction of the most effective structural features of these inhibitors from among a large number of quantitative descriptors. To do this, we adopted a unique approach in quantitative structure-activity relationships. An efficient feature selection method was emphasized in such an approach, using the confirmative results of different routine and novel feature selection methods. The proposed methods generated quite consistent results ensuring the effectiveness of the selected structural features.

  1. Feature selection gait-based gender classification under different circumstances

    NASA Astrophysics Data System (ADS)

    Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah

    2014-05-01

    This paper proposes a gender classification based on human gait features and investigates the problem of two variations: clothing (wearing coats) and carrying bag condition as addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying wavelet transform. Three different sets of feature are proposed in this method. First, Spatio-temporal distance that is dealing with the distance of different parts of the human body (like feet, knees, hand, Human Height and shoulder) during one gait cycle. The second and third feature sets are constructed from approximation and non-approximation coefficient of human body respectively. To extract these two sets of feature we divided the human body into two parts, upper and lower body part, based on the golden ratio proportion. In this paper, we have adopted a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize their discriminating significance. Finally k-Nearest Neighbor is applied as a classification method. Experimental results demonstrate that our approach is providing more realistic scenario and relatively better performance compared with the existing approaches.

  2. Comparisons and Selections of Features and Classifiers for Short Text Classification

    NASA Astrophysics Data System (ADS)

    Wang, Ye; Zhou, Zhi; Jin, Shan; Liu, Debin; Lu, Mi

    2017-10-01

    Short text is considerably different from traditional long text documents due to its shortness and conciseness, which somehow hinders the applications of conventional machine learning and data mining algorithms in short text classification. According to traditional artificial intelligence methods, we divide short text classification into three steps, namely preprocessing, feature selection and classifier comparison. In this paper, we have illustrated step-by-step how we approach our goals. Specifically, in feature selection, we compared the performance and robustness of the four methods of one-hot encoding, tf-idf weighting, word2vec and paragraph2vec, and in the classification part, we deliberately chose and compared Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor and Decision Tree as our classifiers. Then, we compared and analysed the classifiers horizontally with each other and vertically with feature selections. Regarding the datasets, we crawled more than 400,000 short text files from Shanghai and Shenzhen Stock Exchanges and manually labeled them into two classes, the big and the small. There are eight labels in the big class, and 59 labels in the small class.

  3. Channel and feature selection in multifunction myoelectric control.

    PubMed

    Khushaba, Rami N; Al-Jumaily, Adel

    2007-01-01

    Real time controlling devices based on myoelectric singles (MES) is one of the challenging research problems. This paper presents a new approach to reduce the computational cost of real time systems driven by Myoelectric signals (MES) (a.k.a Electromyography--EMG). The new approach evaluates the significance of feature/channel selection on MES pattern recognition. Particle Swarm Optimization (PSO), an evolutionary computational technique, is employed to search the feature/channel space for important subsets. These important subsets will be evaluated using a multilayer perceptron trained with back propagation neural network (BPNN). Practical results acquired from tests done on six subjects' datasets of MES signals measured in a noninvasive manner using surface electrodes are presented. It is proved that minimum error rates can be achieved by considering the correct combination of features/channels, thus providing a feasible system for practical implementation purpose for rehabilitation of patients.

  4. A Novel Feature Selection Technique for Text Classification Using Naïve Bayes.

    PubMed

    Dey Sarkar, Subhajit; Goswami, Saptarsi; Agarwal, Aman; Aktar, Javed

    2014-01-01

    With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. There are many classification algorithms available. Naïve Bayes remains one of the oldest and most popular classifiers. On one hand, implementation of naïve Bayes is simple and, on the other hand, this also requires fewer amounts of training data. From the literature review, it is found that naïve Bayes performs poorly compared to other classifiers in text classification. As a result, this makes the naïve Bayes classifier unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method based on firstly a univariate feature selection and then feature clustering, where we use the univariate feature selection method to reduce the search space and then apply clustering to select relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform other traditional methods like greedy search based wrapper or CFS.

  5. Feature Selection Methods for Robust Decoding of Finger Movements in a Non-human Primate

    PubMed Central

    Padmanaban, Subash; Baker, Justin; Greger, Bradley

    2018-01-01

    Objective: The performance of machine learning algorithms used for neural decoding of dexterous tasks may be impeded due to problems arising when dealing with high-dimensional data. The objective of feature selection algorithms is to choose a near-optimal subset of features from the original feature space to improve the performance of the decoding algorithm. The aim of our study was to compare the effects of four feature selection techniques, Wilcoxon signed-rank test, Relative Importance, Principal Component Analysis (PCA), and Mutual Information Maximization on SVM classification performance for a dexterous decoding task. Approach: A nonhuman primate (NHP) was trained to perform small coordinated movements—similar to typing. An array of microelectrodes was implanted in the hand area of the motor cortex of the NHP and used to record action potentials (AP) during finger movements. A Support Vector Machine (SVM) was used to classify which finger movement the NHP was making based upon AP firing rates. We used the SVM classification to examine the functional parameters of (i) robustness to simulated failure and (ii) longevity of classification. We also compared the effect of using isolated-neuron and multi-unit firing rates as the feature vector supplied to the SVM. Main results: The average decoding accuracy for multi-unit features and single-unit features using Mutual Information Maximization (MIM) across 47 sessions was 96.74 ± 3.5% and 97.65 ± 3.36% respectively. The reduction in decoding accuracy between using 100% of the features and 10% of features based on MIM was 45.56% (from 93.7 to 51.09%) and 4.75% (from 95.32 to 90.79%) for multi-unit and single-unit features respectively. MIM had best performance compared to other feature selection methods. Significance: These results suggest improved decoding performance can be achieved by using optimally selected features. The results based on clinically relevant performance metrics also suggest that the decoding

  6. Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness.

    PubMed

    Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo

    2016-02-01

    Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.

  7. Automatic migraine classification via feature selection committee and machine learning techniques over imaging and questionnaire data.

    PubMed

    Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya; Gomez-Beldarrain, Marian; Fernandez-Ruanova, Begonya; Garcia-Monco, Juan Carlos

    2017-04-13

    Feature selection methods are commonly used to identify subsets of relevant features to facilitate the construction of models for classification, yet little is known about how feature selection methods perform in diffusion tensor images (DTIs). In this study, feature selection and machine learning classification methods were tested for the purpose of automating diagnosis of migraines using both DTIs and questionnaire answers related to emotion and cognition - factors that influence of pain perceptions. We select 52 adult subjects for the study divided into three groups: control group (15), subjects with sporadic migraine (19) and subjects with chronic migraine and medication overuse (18). These subjects underwent magnetic resonance with diffusion tensor to see white matter pathway integrity of the regions of interest involved in pain and emotion. The tests also gather data about pathology. The DTI images and test results were then introduced into feature selection algorithms (Gradient Tree Boosting, L1-based, Random Forest and Univariate) to reduce features of the first dataset and classification algorithms (SVM (Support Vector Machine), Boosting (Adaboost) and Naive Bayes) to perform a classification of migraine group. Moreover we implement a committee method to improve the classification accuracy based on feature selection algorithms. When classifying the migraine group, the greatest improvements in accuracy were made using the proposed committee-based feature selection method. Using this approach, the accuracy of classification into three types improved from 67 to 93% when using the Naive Bayes classifier, from 90 to 95% with the support vector machine classifier, 93 to 94% in boosting. The features that were determined to be most useful for classification included are related with the pain, analgesics and left uncinate brain (connected with the pain and emotions). The proposed feature selection committee method improved the performance of migraine diagnosis

  8. Feature Selection based on Machine Learning in MRIs for Hippocampal Segmentation

    NASA Astrophysics Data System (ADS)

    Tangaro, Sabina; Amoroso, Nicola; Brescia, Massimo; Cavuoti, Stefano; Chincarini, Andrea; Errico, Rosangela; Paolo, Inglese; Longo, Giuseppe; Maglietta, Rosalia; Tateo, Andrea; Riccio, Giuseppe; Bellotti, Roberto

    2015-01-01

    Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic resonance imaging (MRI) scans can show these variations and therefore can be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus has been known to be a biomarker for Alzheimer disease and other neurological and psychiatric diseases. However, it requires accurate, robust, and reproducible delineation of hippocampal structures. Fully automatic methods are usually the voxel based approach; for each voxel a number of local features were calculated. In this paper, we compared four different techniques for feature selection from a set of 315 features extracted for each voxel: (i) filter method based on the Kolmogorov-Smirnov test; two wrapper methods, respectively, (ii) sequential forward selection and (iii) sequential backward elimination; and (iv) embedded method based on the Random Forest Classifier on a set of 10 T1-weighted brain MRIs and tested on an independent set of 25 subjects. The resulting segmentations were compared with manual reference labelling. By using only 23 feature for each voxel (sequential backward elimination) we obtained comparable state-of-the-art performances with respect to the standard tool FreeSurfer.

  9. EEG-based recognition of video-induced emotions: selecting subject-independent feature set.

    PubMed

    Kortelainen, Jukka; Seppänen, Tapio

    2013-01-01

    Emotions are fundamental for everyday life affecting our communication, learning, perception, and decision making. Including emotions into the human-computer interaction (HCI) could be seen as a significant step forward offering a great potential for developing advanced future technologies. While the electrical activity of the brain is affected by emotions, offers electroencephalogram (EEG) an interesting channel to improve the HCI. In this paper, the selection of subject-independent feature set for EEG-based emotion recognition is studied. We investigate the effect of different feature sets in classifying person's arousal and valence while watching videos with emotional content. The classification performance is optimized by applying a sequential forward floating search algorithm for feature selection. The best classification rate (65.1% for arousal and 63.0% for valence) is obtained with a feature set containing power spectral features from the frequency band of 1-32 Hz. The proposed approach substantially improves the classification rate reported in the literature. In future, further analysis of the video-induced EEG changes including the topographical differences in the spectral features is needed.

  10. Genetic Particle Swarm Optimization–Based Feature Selection for Very-High-Resolution Remotely Sensed Imagery Object Change Detection

    PubMed Central

    Chen, Qiang; Chen, Yunhao; Jiang, Weiguo

    2016-01-01

    In the field of multiple features Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple features OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO, and apply the proposed algorithm to the object-based hybrid multivariate alternative detection model. Two experiment cases on Worldview-2/3 images confirm that GPSO can significantly improve the speed of convergence, and effectively avoid the problem of premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO is superior at overall accuracy (84.17% and 83.59%) and Kappa coefficient (0.6771 and 0.6314) than other algorithms. Moreover, the sensitivity analysis results show that the proposed algorithm is not easily influenced by the initial parameters, but the number of features to be selected and the size of the particle swarm would affect the algorithm. The comparison experiment results reveal that RMV is more suitable than other functions as the fitness function of GPSO-based feature selection algorithm. PMID:27483285

  11. Genetic Particle Swarm Optimization-Based Feature Selection for Very-High-Resolution Remotely Sensed Imagery Object Change Detection.

    PubMed

    Chen, Qiang; Chen, Yunhao; Jiang, Weiguo

    2016-07-30

    In the field of multiple features Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple features OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO, and apply the proposed algorithm to the object-based hybrid multivariate alternative detection model. Two experiment cases on Worldview-2/3 images confirm that GPSO can significantly improve the speed of convergence, and effectively avoid the problem of premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO is superior at overall accuracy (84.17% and 83.59%) and Kappa coefficient (0.6771 and 0.6314) than other algorithms. Moreover, the sensitivity analysis results show that the proposed algorithm is not easily influenced by the initial parameters, but the number of features to be selected and the size of the particle swarm would affect the algorithm. The comparison experiment results reveal that RMV is more suitable than other functions as the fitness function of GPSO-based feature selection algorithm.

  12. Supervised Extraction of Diagnosis Codes from EMRs: Role of Feature Selection, Data Selection, and Probabilistic Thresholding.

    PubMed

    Rios, Anthony; Kavuluru, Ramakanth

    2013-09-01

    Extracting diagnosis codes from medical records is a complex task carried out by trained coders by reading all the documents associated with a patient's visit. With the popularity of electronic medical records (EMRs), computational approaches to code extraction have been proposed in the recent years. Machine learning approaches to multi-label text classification provide an important methodology in this task given each EMR can be associated with multiple codes. In this paper, we study the the role of feature selection, training data selection, and probabilistic threshold optimization in improving different multi-label classification approaches. We conduct experiments based on two different datasets: a recent gold standard dataset used for this task and a second larger and more complex EMR dataset we curated from the University of Kentucky Medical Center. While conventional approaches achieve results comparable to the state-of-the-art on the gold standard dataset, on our complex in-house dataset, we show that feature selection, training data selection, and probabilistic thresholding provide significant gains in performance.

  13. Use of genetic algorithm for the selection of EEG features

    NASA Astrophysics Data System (ADS)

    Asvestas, P.; Korda, A.; Kostopoulos, S.; Karanasiou, I.; Ouzounoglou, A.; Sidiropoulos, K.; Ventouras, E.; Matsopoulos, G.

    2015-09-01

    Genetic Algorithm (GA) is a popular optimization technique that can detect the global optimum of a multivariable function containing several local optima. GA has been widely used in the field of biomedical informatics, especially in the context of designing decision support systems that classify biomedical signals or images into classes of interest. The aim of this paper is to present a methodology, based on GA, for the selection of the optimal subset of features that can be used for the efficient classification of Event Related Potentials (ERPs), which are recorded during the observation of correct or incorrect actions. In our experiment, ERP recordings were acquired from sixteen (16) healthy volunteers who observed correct or incorrect actions of other subjects. The brain electrical activity was recorded at 47 locations on the scalp. The GA was formulated as a combinatorial optimizer for the selection of the combination of electrodes that maximizes the performance of the Fuzzy C Means (FCM) classification algorithm. In particular, during the evolution of the GA, for each candidate combination of electrodes, the well-known (Σ, Φ, Ω) features were calculated and were evaluated by means of the FCM method. The proposed methodology provided a combination of 8 electrodes, with classification accuracy 93.8%. Thus, GA can be the basis for the selection of features that discriminate ERP recordings of observations of correct or incorrect actions.

  14. Self-Adaptive MOEA Feature Selection for Classification of Bankruptcy Prediction Data

    PubMed Central

    Gaspar-Cunha, A.; Recio, G.; Costa, L.; Estébanez, C.

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting whose importance lies in the relevance for creditors and investors in evaluating the likelihood of getting into bankrupt. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, making an estimation of the credit risks associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have shown to be an excellent tool to deal with complex problems in finances and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimise the number of features and maximise the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of using the self-adaptation of the classifier. PMID:24707201

  15. Temporal Correlation Mechanisms and Their Role in Feature Selection: A Single-Unit Study in Primate Somatosensory Cortex

    PubMed Central

    Gomez-Ramirez, Manuel; Trzcinski, Natalie K.; Mihalas, Stefan; Niebur, Ernst

    2014-01-01

    Studies in vision show that attention enhances the firing rates of cells when it is directed towards their preferred stimulus feature. However, it is unknown whether other sensory systems employ this mechanism to mediate feature selection within their modalities. Moreover, whether feature-based attention modulates the correlated activity of a population is unclear. Indeed, temporal correlation codes such as spike-synchrony and spike-count correlations (rsc) are believed to play a role in stimulus selection by increasing the signal and reducing the noise in a population, respectively. Here, we investigate (1) whether feature-based attention biases the correlated activity between neurons when attention is directed towards their common preferred feature, (2) the interplay between spike-synchrony and rsc during feature selection, and (3) whether feature attention effects are common across the visual and tactile systems. Single-unit recordings were made in secondary somatosensory cortex of three non-human primates while animals engaged in tactile feature (orientation and frequency) and visual discrimination tasks. We found that both firing rate and spike-synchrony between neurons with similar feature selectivity were enhanced when attention was directed towards their preferred feature. However, attention effects on spike-synchrony were twice as large as those on firing rate, and had a tighter relationship with behavioral performance. Further, we observed increased rsc when attention was directed towards the visual modality (i.e., away from touch). These data suggest that similar feature selection mechanisms are employed in vision and touch, and that temporal correlation codes such as spike-synchrony play a role in mediating feature selection. We posit that feature-based selection operates by implementing multiple mechanisms that reduce the overall noise levels in the neural population and synchronize activity across subpopulations that encode the relevant features of

  16. The impact of feature selection on one and two-class classification performance for plant microRNAs.

    PubMed

    Khalifa, Waleed; Yousef, Malik; Saçar Demirci, Müşerref Duygu; Allmer, Jens

    2016-01-01

    MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure which is recognized by a complex enzyme machinery. It ultimately leads to the incorporation of 18-24 nt long mature miRNAs into RISC where they act as recognition keys to aid in regulation of target mRNAs. It is involved to determine miRNAs experimentally and, therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Although, in general, two-class classification (TCC) is used in the field; because negative examples are hard to come by, one-class classification (OCC) has been tried for pre-miRNA detection. Since both positive and negative examples are currently somewhat limited, feature selection can prove to be vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC where the best feature selection method achieved an average accuracy of 95.6%, thereby being ∼29% better than the worst method which achieved 66.9% accuracy. While the performance is comparable to TCC, which performs up to 3% better than OCC, TCC is much less affected by feature selection and its largest performance gap is ∼13% which only occurs for two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that it can perform on par with TCC given the proper set of features.

  17. Evaluation of entropy and JM-distance criterions as features selection methods using spectral and spatial features derived from LANDSAT images

    NASA Technical Reports Server (NTRS)

    Parada, N. D. J. (Principal Investigator); Dutra, L. V.; Mascarenhas, N. D. A.; Mitsuo, Fernando Augusta, II

    1984-01-01

    A study area near Ribeirao Preto in Sao Paulo state was selected, with predominance in sugar cane. Eight features were extracted from the 4 original bands of LANDSAT image, using low-pass and high-pass filtering to obtain spatial features. There were 5 training sites in order to acquire the necessary parameters. Two groups of four channels were selected from 12 channels using JM-distance and entropy criterions. The number of selected channels was defined by physical restrictions of the image analyzer and computacional costs. The evaluation was performed by extracting the confusion matrix for training and tests areas, with a maximum likelihood classifier, and by defining performance indexes based on those matrixes for each group of channels. Results show that in spatial features and supervised classification, the entropy criterion is better in the sense that allows a more accurate and generalized definition of class signature. On the other hand, JM-distance criterion strongly reduces the misclassification within training areas.

  18. Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection.

    PubMed

    Liu, Liang; Cai, Yudong; Lu, Wencong; Feng, Kaiyan; Peng, Chunrong; Niu, Bing

    2009-03-06

    Based on pseudo amino acid (PseAA) composition and a novel hybrid feature selection frame, this paper presents a computational system to predict the PPIs (protein-protein interactions) using 8796 protein pairs. These pairs are coded by PseAA composition, resulting in 114 features. A hybrid feature selection system, mRMR-KNNs-wrapper, is applied to obtain an optimized feature set by excluding poor-performed and/or redundant features, resulting in 103 remaining features. Using the optimized 103-feature subset, a prediction model is trained and tested in the k-nearest neighbors (KNNs) learning system. This prediction model achieves an overall accurate prediction rate of 76.18%, evaluated by 10-fold cross-validation test, which is 1.46% higher than using the initial 114 features and is 6.51% higher than the 20 features, coded by amino acid compositions. The PPIs predictor, developed for this research, is available for public use at http://chemdata.shu.edu.cn/ppi.

  19. Diagnosis of Chronic Kidney Disease Based on Support Vector Machine by Feature Selection Methods.

    PubMed

    Polat, Huseyin; Danaei Mehr, Homay; Cetin, Aydin

    2017-04-01

    As Chronic Kidney Disease progresses slowly, early detection and effective treatment are the only cure to reduce the mortality rate. Machine learning techniques are gaining significance in medical diagnosis because of their classification ability with high accuracy rates. The accuracy of classification algorithms depend on the use of correct feature selection algorithms to reduce the dimension of datasets. In this study, Support Vector Machine classification algorithm was used to diagnose Chronic Kidney Disease. To diagnose the Chronic Kidney Disease, two essential types of feature selection methods namely, wrapper and filter approaches were chosen to reduce the dimension of Chronic Kidney Disease dataset. In wrapper approach, classifier subset evaluator with greedy stepwise search engine and wrapper subset evaluator with the Best First search engine were used. In filter approach, correlation feature selection subset evaluator with greedy stepwise search engine and filtered subset evaluator with the Best First search engine were used. The results showed that the Support Vector Machine classifier by using filtered subset evaluator with the Best First search engine feature selection method has higher accuracy rate (98.5%) in the diagnosis of Chronic Kidney Disease compared to other selected methods.

  20. Improving mass candidate detection in mammograms via feature maxima propagation and local feature selection.

    PubMed

    Melendez, Jaime; Sánchez, Clara I; van Ginneken, Bram; Karssemeijer, Nico

    2014-08-01

    Mass candidate detection is a crucial component of multistep computer-aided detection (CAD) systems. It is usually performed by combining several local features by means of a classifier. When these features are processed on a per-image-location basis (e.g., for each pixel), mismatching problems may arise while constructing feature vectors for classification, which is especially true when the behavior expected from the evaluated features is a peaked response due to the presence of a mass. In this study, two of these problems, consisting of maxima misalignment and differences of maxima spread, are identified and two solutions are proposed. The first proposed method, feature maxima propagation, reproduces feature maxima through their neighboring locations. The second method, local feature selection, combines different subsets of features for different feature vectors associated with image locations. Both methods are applied independently and together. The proposed methods are included in a mammogram-based CAD system intended for mass detection in screening. Experiments are carried out with a database of 382 digital cases. Sensitivity is assessed at two sets of operating points. The first one is the interval of 3.5-15 false positives per image (FPs/image), which is typical for mass candidate detection. The second one is 1 FP/image, which allows to estimate the quality of the mass candidate detector's output for use in subsequent steps of the CAD system. The best results are obtained when the proposed methods are applied together. In that case, the mean sensitivity in the interval of 3.5-15 FPs/image significantly increases from 0.926 to 0.958 (p < 0.0002). At the lower rate of 1 FP/image, the mean sensitivity improves from 0.628 to 0.734 (p < 0.0002). Given the improved detection performance, the authors believe that the strategies proposed in this paper can render mass candidate detection approaches based on image location classification more robust to feature

  1. Improving permafrost distribution modelling using feature selection algorithms

    NASA Astrophysics Data System (ADS)

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2016-04-01

    The availability of an increasing number of spatial data on the occurrence of mountain permafrost allows the employment of machine learning (ML) classification algorithms for modelling the distribution of the phenomenon. One of the major problems when dealing with high-dimensional dataset is the number of input features (variables) involved. Application of ML classification algorithms to this large number of variables leads to the risk of overfitting, with the consequence of a poor generalization/prediction. For this reason, applying feature selection (FS) techniques helps simplifying the amount of factors required and improves the knowledge on adopted features and their relation with the studied phenomenon. Moreover, taking away irrelevant or redundant variables from the dataset effectively improves the quality of the ML prediction. This research deals with a comparative analysis of permafrost distribution models supported by FS variable importance assessment. The input dataset (dimension = 20-25, 10 m spatial resolution) was constructed using landcover maps, climate data and DEM derived variables (altitude, aspect, slope, terrain curvature, solar radiation, etc.). It was completed with permafrost evidences (geophysical and thermal data and rock glacier inventories) that serve as training permafrost data. Used FS algorithms informed about variables that appeared less statistically important for permafrost presence/absence. Three different algorithms were compared: Information Gain (IG), Correlation-based Feature Selection (CFS) and Random Forest (RF). IG is a filter technique that evaluates the worth of a predictor by measuring the information gain with respect to the permafrost presence/absence. Conversely, CFS is a wrapper technique that evaluates the worth of a subset of predictors by considering the individual predictive ability of each variable along with the degree of redundancy between them. Finally, RF is a ML algorithm that performs FS as part of its

  2. Feature-selective attention: evidence for a decline in old age.

    PubMed

    Quigley, Cliodhna; Andersen, Søren K; Schulze, Lars; Grunwald, Martin; Müller, Matthias M

    2010-04-19

    Although attention in older adults is an active research area, feature-selective aspects have not yet been explicitly studied. Here we report the results of an exploratory study involving directed changes in feature-selective attention. The stimuli used were two random dot kinematograms (RDKs) of different colours, superimposed and centrally presented. A colour cue with random onset after the beginning of each trial instructed young and older subjects to attend to one of the RDKs and detect short intervals of coherent motion while ignoring analogous motion events in the non-cued RDK. Behavioural data show that older adults could detect motion, but discriminated target from distracter motion less reliably than young adults. The method of frequency tagging allowed us to separate the EEG responses to the attended and ignored stimuli and directly compare steady-state visual evoked potential (SSVEP) amplitudes elicited by each stimulus before and after cue onset. We found that younger adults show a clear attentional enhancement of SSVEP amplitude in the post-cue interval, while older adults' SSVEP responses to attended and ignored stimuli do not differ. Thus, in situations where attentional selection cannot be spatially resolved, older adults show a deficit in selection that is not shared by young adults. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.

  3. A new approach to modeling the influence of image features on fixation selection in scenes

    PubMed Central

    Nuthmann, Antje; Einhäuser, Wolfgang

    2015-01-01

    Which image characteristics predict where people fixate when memorizing natural images? To answer this question, we introduce a new analysis approach that combines a novel scene-patch analysis with generalized linear mixed models (GLMMs). Our method allows for (1) directly describing the relationship between continuous feature value and fixation probability, and (2) assessing each feature's unique contribution to fixation selection. To demonstrate this method, we estimated the relative contribution of various image features to fixation selection: luminance and luminance contrast (low-level features); edge density (a mid-level feature); visual clutter and image segmentation to approximate local object density in the scene (higher-level features). An additional predictor captured the central bias of fixation. The GLMM results revealed that edge density, clutter, and the number of homogenous segments in a patch can independently predict whether image patches are fixated or not. Importantly, neither luminance nor contrast had an independent effect above and beyond what could be accounted for by the other predictors. Since the parcellation of the scene and the selection of features can be tailored to the specific research question, our approach allows for assessing the interplay of various factors relevant for fixation selection in scenes in a powerful and flexible manner. PMID:25752239

  4. A Generalization Strategy for Discrete Area Feature by Using Stroke Grouping and Polarization Transportation Selection

    NASA Astrophysics Data System (ADS)

    Wang, Xiao; Burghardt, Dirk

    2018-05-01

    This paper presents a new strategy for the generalization of discrete area features by using stroke grouping method and polarization transportation selection. The mentioned stroke is constructed on derive of the refined proximity graph of area features, and the refinement is under the control of four constraints to meet different grouping requirements. The area features which belong to the same stroke are detected into the same group. The stroke-based strategy decomposes the generalization process into two sub-processes by judging whether the area features related to strokes or not. For the area features which belong to the same one stroke, they normally present a linear like pat-tern, and in order to preserve this kind of pattern, typification is chosen as the operator to implement the generalization work. For the remaining area features which are not related by strokes, they are still distributed randomly and discretely, and the selection is chosen to conduct the generalization operation. For the purpose of retaining their original distribution characteristic, a Polarization Transportation (PT) method is introduced to implement the selection operation. Buildings and lakes are selected as the representatives of artificial area feature and natural area feature respectively to take the experiments. The generalized results indicate that by adopting this proposed strategy, the original distribution characteristics of building and lake data can be preserved, and the visual perception is pre-served as before.

  5. Selecting multiple features delays perception, but only when targets are horizontally arranged.

    PubMed

    Lo, Shih-Yu

    2017-01-01

    Based on the finding that perception is lagged by attention split on multiple features (Lo et al., 2012), this study investigated how the feature-based lag effect interacts with the target spatial arrangement. Participants were presented with gratings the spatial frequencies of which constantly changed. The task was to monitor two gratings of the same or different colors and report their spatial frequencies right before the stimulus offset. The results showed a perceptual lag wherein the reported value was closer to the physical value some time prior to the stimulus offset. This lag effect was larger when the two gratings were of different colors than when they were the same color. Furthermore, the feature-based lag effect was statistically significant when the two gratings were horizontally arranged but not when they were vertically or diagonally arranged. A model is proposed to explain the effect of target arrangement: When targets are horizontally arranged, selecting an additional feature delays perception. When targets are vertically or diagonally arranged, target selection for the lower field is prioritized. This prioritization on the lower target might prompt observers to only select the lower target and ignore the upper one, and this causes more perceptual errors without delaying perception. © 2017 Elsevier B.V. All rights reserved.

  6. iFeature: a python package and web server for features extraction and selection from protein and peptide sequences.

    PubMed

    Chen, Zhen; Zhao, Pei; Li, Fuyi; Leier, André; Marquez-Lago, Tatiana T; Wang, Yanan; Webb, Geoffrey I; Smith, A Ian; Daly, Roger J; Chou, Kuo-Chen; Song, Jiangning

    2018-03-08

    Structural and physiochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating training, analysis, and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. jiangning.song@monash.edu; kcchou@gordonlifescience.org; roger.daly@monash.edu. Supplementary data are available at Bioinformatics online.

  7. Multiband tangent space mapping and feature selection for classification of EEG during motor imagery.

    PubMed

    Islam, Md Rabiul; Tanaka, Toshihisa; Molla, Md Khademul Islam

    2018-05-08

    When designing multiclass motor imagery-based brain-computer interface (MI-BCI), a so-called tangent space mapping (TSM) method utilizing the geometric structure of covariance matrices is an effective technique. This paper aims to introduce a method using TSM for finding accurate operational frequency bands related brain activities associated with MI tasks. A multichannel electroencephalogram (EEG) signal is decomposed into multiple subbands, and tangent features are then estimated on each subband. A mutual information analysis-based effective algorithm is implemented to select subbands containing features capable of improving motor imagery classification accuracy. Thus obtained features of selected subbands are combined to get feature space. A principal component analysis-based approach is employed to reduce the features dimension and then the classification is accomplished by a support vector machine (SVM). Offline analysis demonstrates the proposed multiband tangent space mapping with subband selection (MTSMS) approach outperforms state-of-the-art methods. It acheives the highest average classification accuracy for all datasets (BCI competition dataset 2a, IIIa, IIIb, and dataset JK-HH1). The increased classification accuracy of MI tasks with the proposed MTSMS approach can yield effective implementation of BCI. The mutual information-based subband selection method is implemented to tune operation frequency bands to represent actual motor imagery tasks.

  8. Simultaneous Spectral-Spatial Feature Selection and Extraction for Hyperspectral Images.

    PubMed

    Zhang, Lefei; Zhang, Qian; Du, Bo; Huang, Xin; Tang, Yuan Yan; Tao, Dacheng

    2018-01-01

    In hyperspectral remote sensing data mining, it is important to take into account of both spectral and spatial information, such as the spectral signature, texture feature, and morphological property, to improve the performances, e.g., the image classification accuracy. In a feature representation point of view, a nature approach to handle this situation is to concatenate the spectral and spatial features into a single but high dimensional vector and then apply a certain dimension reduction technique directly on that concatenated vector before feed it into the subsequent classifier. However, multiple features from various domains definitely have different physical meanings and statistical properties, and thus such concatenation has not efficiently explore the complementary properties among different features, which should benefit for boost the feature discriminability. Furthermore, it is also difficult to interpret the transformed results of the concatenated vector. Consequently, finding a physically meaningful consensus low dimensional feature representation of original multiple features is still a challenging task. In order to address these issues, we propose a novel feature learning framework, i.e., the simultaneous spectral-spatial feature selection and extraction algorithm, for hyperspectral images spectral-spatial feature representation and classification. Specifically, the proposed method learns a latent low dimensional subspace by projecting the spectral-spatial feature into a common feature space, where the complementary information has been effectively exploited, and simultaneously, only the most significant original features have been transformed. Encouraging experimental results on three public available hyperspectral remote sensing datasets confirm that our proposed method is effective and efficient.

  9. Feature Selection for Nonstationary Data: Application to Human Recognition Using Medical Biometrics.

    PubMed

    Komeili, Majid; Louis, Wael; Armanfard, Narges; Hatzinakos, Dimitrios

    2018-05-01

    Electrocardiogram (ECG) and transient evoked otoacoustic emission (TEOAE) are among the physiological signals that have attracted significant interest in biometric community due to their inherent robustness to replay and falsification attacks. However, they are time-dependent signals and this makes them hard to deal with in across-session human recognition scenario where only one session is available for enrollment. This paper presents a novel feature selection method to address this issue. It is based on an auxiliary dataset with multiple sessions where it selects a subset of features that are more persistent across different sessions. It uses local information in terms of sample margins while enforcing an across-session measure. This makes it a perfect fit for aforementioned biometric recognition problem. Comprehensive experiments on ECG and TEOAE variability due to time lapse and body posture are done. Performance of the proposed method is compared against seven state-of-the-art feature selection algorithms as well as another six approaches in the area of ECG and TEOAE biometric recognition. Experimental results demonstrate that the proposed method performs noticeably better than other algorithms.

  10. Joint L2,1 Norm and Fisher Discrimination Constrained Feature Selection for Rational Synthesis of Microporous Aluminophosphates.

    PubMed

    Qi, Miao; Wang, Ting; Yi, Yugen; Gao, Na; Kong, Jun; Wang, Jianzhong

    2017-04-01

    Feature selection has been regarded as an effective tool to help researchers understand the generating process of data. For mining the synthesis mechanism of microporous AlPOs, this paper proposes a novel feature selection method by joint l 2,1 norm and Fisher discrimination constraints (JNFDC). In order to obtain more effective feature subset, the proposed method can be achieved in two steps. The first step is to rank the features according to sparse and discriminative constraints. The second step is to establish predictive model with the ranked features, and select the most significant features in the light of the contribution of improving the predictive accuracy. To the best of our knowledge, JNFDC is the first work which employs the sparse representation theory to explore the synthesis mechanism of six kinds of pore rings. Numerical simulations demonstrate that our proposed method can select significant features affecting the specified structural property and improve the predictive accuracy. Moreover, comparison results show that JNFDC can obtain better predictive performances than some other state-of-the-art feature selection methods. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. A stereo remote sensing feature selection method based on artificial bee colony algorithm

    NASA Astrophysics Data System (ADS)

    Yan, Yiming; Liu, Pigang; Zhang, Ye; Su, Nan; Tian, Shu; Gao, Fengjiao; Shen, Yi

    2014-05-01

    To improve the efficiency of stereo information for remote sensing classification, a stereo remote sensing feature selection method is proposed in this paper presents, which is based on artificial bee colony algorithm. Remote sensing stereo information could be described by digital surface model (DSM) and optical image, which contain information of the three-dimensional structure and optical characteristics, respectively. Firstly, three-dimensional structure characteristic could be analyzed by 3D-Zernike descriptors (3DZD). However, different parameters of 3DZD could descript different complexity of three-dimensional structure, and it needs to be better optimized selected for various objects on the ground. Secondly, features for representing optical characteristic also need to be optimized. If not properly handled, when a stereo feature vector composed of 3DZD and image features, that would be a lot of redundant information, and the redundant information may not improve the classification accuracy, even cause adverse effects. To reduce information redundancy while maintaining or improving the classification accuracy, an optimized frame for this stereo feature selection problem is created, and artificial bee colony algorithm is introduced for solving this optimization problem. Experimental results show that the proposed method can effectively improve the computational efficiency, improve the classification accuracy.

  12. Prediction of lysine glutarylation sites by maximum relevance minimum redundancy feature selection.

    PubMed

    Ju, Zhe; He, Jian-Jun

    2018-06-01

    Lysine glutarylation is new type of protein acylation modification in both prokaryotes and eukaryotes. To better understand the molecular mechanism of glutarylation, it is important to identify glutarylated substrates and their corresponding glutarylation sites accurately. In this study, a novel bioinformatics tool named GlutPred is developed to predict glutarylation sites by using multiple feature extraction and maximum relevance minimum redundancy feature selection. On the one hand, amino acid factors, binary encoding, and the composition of k-spaced amino acid pairs features are incorporated to encode glutarylation sites. And the maximum relevance minimum redundancy method and the incremental feature selection algorithm are adopted to remove the redundant features. On the other hand, a biased support vector machine algorithm is used to handle the imbalanced problem in glutarylation sites training dataset. As illustrated by 10-fold cross-validation, the performance of GlutPred achieves a satisfactory performance with a Sensitivity of 64.80%, a Specificity of 76.60%, an Accuracy of 74.90% and a Matthew's correlation coefficient of 0.3194. Feature analysis shows that some k-spaced amino acid pair features play the most important roles in the prediction of glutarylation sites. The conclusions derived from this study might provide some clues for understanding the molecular mechanisms of glutarylation. Copyright © 2018 Elsevier Inc. All rights reserved.

  13. Information Theory for Gabor Feature Selection for Face Recognition

    NASA Astrophysics Data System (ADS)

    Shen, Linlin; Bai, Li

    2006-12-01

    A discriminative and robust feature—kernel enhanced informative Gabor feature—is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundreds of features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unified different Gabor filter definitions and proposed a training sample generation algorithm to reduce the effects caused by unbalanced number of samples available in different classes.

  14. Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics.

    PubMed

    Lin, Xiaohui; Li, Chao; Zhang, Yanhui; Su, Benzhe; Fan, Meng; Wei, Hai

    2017-12-26

    Feature selection is an important topic in bioinformatics. Defining informative features from complex high dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature rank of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporally screens out the samples lying in a heavy overlapping area in each iteration. The experiments on the eight public biological datasets show that the discriminative ability of the feature subset could be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples compared with using the classification accuracy rate alone, and shielding the samples in the overlapping area made the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.

  15. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    PubMed Central

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In the recent years, many research works have been published using speech related features for speech emotion recognition, however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstralcoefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and its glottal waveforms(GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature. PMID:25799141

  16. Tehran Air Pollutants Prediction Based on Random Forest Feature Selection Method

    NASA Astrophysics Data System (ADS)

    Shamsoddini, A.; Aboodi, M. R.; Karami, J.

    2017-09-01

    Air pollution as one of the most serious forms of environmental pollutions poses huge threat to human life. Air pollution leads to environmental instability, and has harmful and undesirable effects on the environment. Modern prediction methods of the pollutant concentration are able to improve decision making and provide appropriate solutions. This study examines the performance of the Random Forest feature selection in combination with multiple-linear regression and Multilayer Perceptron Artificial Neural Networks methods, in order to achieve an efficient model to estimate carbon monoxide and nitrogen dioxide, sulfur dioxide and PM2.5 contents in the air. The results indicated that Artificial Neural Networks fed by the attributes selected by Random Forest feature selection method performed more accurate than other models for the modeling of all pollutants. The estimation accuracy of sulfur dioxide emissions was lower than the other air contaminants whereas the nitrogen dioxide was predicted more accurate than the other pollutants.

  17. A proposed framework on hybrid feature selection techniques for handling high dimensional educational data

    NASA Astrophysics Data System (ADS)

    Shahiri, Amirah Mohamed; Husain, Wahidah; Rashid, Nur'Aini Abd

    2017-10-01

    Huge amounts of data in educational datasets may cause the problem in producing quality data. Recently, data mining approach are increasingly used by educational data mining researchers for analyzing the data patterns. However, many research studies have concentrated on selecting suitable learning algorithms instead of performing feature selection process. As a result, these data has problem with computational complexity and spend longer computational time for classification. The main objective of this research is to provide an overview of feature selection techniques that have been used to analyze the most significant features. Then, this research will propose a framework to improve the quality of students' dataset. The proposed framework uses filter and wrapper based technique to support prediction process in future study.

  18. An opinion formation based binary optimization approach for feature selection

    NASA Astrophysics Data System (ADS)

    Hamedmoghadam, Homayoun; Jalili, Mahdi; Yu, Xinghuo

    2018-02-01

    This paper proposed a novel optimization method based on opinion formation in complex network systems. The proposed optimization technique mimics human-human interaction mechanism based on a mathematical model derived from social sciences. Our method encodes a subset of selected features to the opinion of an artificial agent and simulates the opinion formation process among a population of agents to solve the feature selection problem. The agents interact using an underlying interaction network structure and get into consensus in their opinions, while finding better solutions to the problem. A number of mechanisms are employed to avoid getting trapped in local minima. We compare the performance of the proposed method with a number of classical population-based optimization methods and a state-of-the-art opinion formation based method. Our experiments on a number of high dimensional datasets reveal outperformance of the proposed algorithm over others.

  19. Effects of feature-selective and spatial attention at different stages of visual processing.

    PubMed

    Andersen, Søren K; Fuchs, Sandra; Müller, Matthias M

    2011-01-01

    We investigated mechanisms of concurrent attentional selection of location and color using electrophysiological measures in human subjects. Two completely overlapping random dot kinematograms (RDKs) of two different colors were presented on either side of a central fixation cross. On each trial, participants attended one of these four RDKs, defined by its specific combination of color and location, in order to detect coherent motion targets. Sustained attentional selection while monitoring for targets was measured by means of steady-state visual evoked potentials (SSVEPs) elicited by the frequency-tagged RDKs. Attentional selection of transient targets and distractors was assessed by behavioral responses and by recording event-related potentials to these stimuli. Spatial attention and attention to color had independent and largely additive effects on the amplitudes of SSVEPs elicited in early visual areas. In contrast, behavioral false alarms and feature-selective modulation of P3 amplitudes to targets and distractors were limited to the attended location. These results suggest that feature-selective attention produces an early, global facilitation of stimuli having the attended feature throughout the visual field, whereas the discrimination of target events takes place at a later stage of processing that is only applied to stimuli at the attended position.

  20. Tabu search and binary particle swarm optimization for feature selection using microarray data.

    PubMed

    Chuang, Li-Yeh; Yang, Cheng-Huei; Yang, Cheng-Hong

    2009-12-01

    Gene expression profiles have great potential as a medical diagnosis tool because they represent the state of a cell at the molecular level. In the classification of cancer type research, available training datasets generally have a fairly small sample size compared to the number of genes involved. This fact poses an unprecedented challenge to some classification methodologies due to training data limitations. Therefore, a good selection method for genes relevant for sample classification is needed to improve the predictive accuracy, and to avoid incomprehensibility due to the large number of genes investigated. In this article, we propose to combine tabu search (TS) and binary particle swarm optimization (BPSO) for feature selection. BPSO acts as a local optimizer each time the TS has been run for a single generation. The K-nearest neighbor method with leave-one-out cross-validation and support vector machine with one-versus-rest serve as evaluators of the TS and BPSO. The proposed method is applied and compared to the 11 classification problems taken from the literature. Experimental results show that our method simplifies features effectively and either obtains higher classification accuracy or uses fewer features compared to other feature selection methods.

  1. Statistical analysis for validating ACO-KNN algorithm as feature selection in sentiment analysis

    NASA Astrophysics Data System (ADS)

    Ahmad, Siti Rohaidah; Yusop, Nurhafizah Moziyana Mohd; Bakar, Azuraliza Abu; Yaakub, Mohd Ridzwan

    2017-10-01

    This research paper aims to propose a hybrid of ant colony optimization (ACO) and k-nearest neighbor (KNN) algorithms as feature selections for selecting and choosing relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper will also discuss the significance test, which was used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. This study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, which were validated using the parametric statistical significance tests. The evaluation process has statistically proven that this ACO-KNN algorithm has been significantly improved compared to the baseline algorithms. The evaluation process has statistically proven that this ACO-KNN algorithm has been significantly improved compared to the baseline algorithms. In addition, the experimental results have proven that the ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain quality, optimal feature subset that can represent the actual data in customer review data.

  2. Feature selection methods for object-based classification of sub-decimeter resolution digital aerial imagery

    USDA-ARS?s Scientific Manuscript database

    Due to the availability of numerous spectral, spatial, and contextual features, the determination of optimal features and class separabilities can be a time consuming process in object-based image analysis (OBIA). While several feature selection methods have been developed to assist OBIA, a robust c...

  3. Comparison of Genetic Algorithm, Particle Swarm Optimization and Biogeography-based Optimization for Feature Selection to Classify Clusters of Microcalcifications

    NASA Astrophysics Data System (ADS)

    Khehra, Baljit Singh; Pharwaha, Amar Partap Singh

    2017-04-01

    Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selection of robust features vector is the process of selecting an optimal subset of features from a large number of available features in a given problem domain after the feature extraction and before any classification scheme. Feature selection reduces the feature space that improves the performance of classifier and decreases the computational burden imposed by using many features on classifier. Selection of an optimal subset of features from a large number of available features in a given problem domain is a difficult search problem. For n features, the total numbers of possible subsets of features are 2n. Thus, selection of an optimal subset of features problem belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCCs features from all possible subsets of features using genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCCs samples have been selected from mammogram images of DDSM database. A total of 50 features extracted from benign and malignant MCCs samples are used in this study. In these algorithms, fitness function is correct classification rate of classifier. Support vector machine is used as a classifier. From experimental results, it is also observed that the performance of PSO-based and BBO-based algorithms to select an optimal subset of features for classifying MCCs as benign or malignant is better as compared to GA-based algorithm.

  4. Computer Presentational Features for Young Children's Preferential Selection and Recall of Information.

    ERIC Educational Resources Information Center

    Calvert, Sandra L.; And Others

    The purpose of this study was to examine the impact of visual and auditory presentational features on young children's selection and memory for verbally presented content. Assessed as a function of action and sound were preschool children's preferential selection and recall of words presented in a computer microworld. A computer microworld…

  5. featsel: A framework for benchmarking of feature selection algorithms and cost functions

    NASA Astrophysics Data System (ADS)

    Reis, Marcelo S.; Estrela, Gustavo; Ferreira, Carlos Eduardo; Barrera, Junior

    In this paper, we introduce featsel, a framework for benchmarking of feature selection algorithms and cost functions. This framework allows the user to deal with the search space as a Boolean lattice and has its core coded in C++ for computational efficiency purposes. Moreover, featsel includes Perl scripts to add new algorithms and/or cost functions, generate random instances, plot graphs and organize results into tables. Besides, this framework already comes with dozens of algorithms and cost functions for benchmarking experiments. We also provide illustrative examples, in which featsel outperforms the popular Weka workbench in feature selection procedures on data sets from the UCI Machine Learning Repository.

  6. Genetic Programming and Frequent Itemset Mining to Identify Feature Selection Patterns of iEEG and fMRI Epilepsy Data

    PubMed Central

    Smart, Otis; Burrell, Lauren

    2014-01-01

    Pattern classification for intracranial electroencephalogram (iEEG) and functional magnetic resonance imaging (fMRI) signals has furthered epilepsy research toward understanding the origin of epileptic seizures and localizing dysfunctional brain tissue for treatment. Prior research has demonstrated that implicitly selecting features with a genetic programming (GP) algorithm more effectively determined the proper features to discern biomarker and non-biomarker interictal iEEG and fMRI activity than conventional feature selection approaches. However for each the iEEG and fMRI modalities, it is still uncertain whether the stochastic properties of indirect feature selection with a GP yield (a) consistent results within a patient data set and (b) features that are specific or universal across multiple patient data sets. We examined the reproducibility of implicitly selecting features to classify interictal activity using a GP algorithm by performing several selection trials and subsequent frequent itemset mining (FIM) for separate iEEG and fMRI epilepsy patient data. We observed within-subject consistency and across-subject variability with some small similarity for selected features, indicating a clear need for patient-specific features and possible need for patient-specific feature selection or/and classification. For the fMRI, using nearest-neighbor classification and 30 GP generations, we obtained over 60% median sensitivity and over 60% median selectivity. For the iEEG, using nearest-neighbor classification and 30 GP generations, we obtained over 65% median sensitivity and over 65% median selectivity except one patient. PMID:25580059

  7. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    PubMed

    Zawbaa, Hossam M; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven.

  8. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres—Focus on Feature Selection

    PubMed Central

    Zawbaa, Hossam M.; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven. PMID:27315205

  9. Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.

    PubMed

    Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao

    2015-04-01

    Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnoses and classification. However, these data are usually redundant and noisy, and only a subset of them present distinct profiles for different classes of samples. Thus, selecting high discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select the good subset of informative gene relevant to the classification. In the proposed algorithm, firstly, the fisher-markov selector is used to choose fixed number of gene data. Secondly, to make biogeography based optimization suitable for the feature selection problem; discrete migration model and discrete mutation model are proposed to balance the exploration and exploitation ability. Then, discrete biogeography based optimization, as we called DBBO, is proposed by integrating discrete migration model and discrete mutation model. Finally, the DBBO method is used for feature selection, and three classifiers are used as the classifier with the 10 fold cross-validation method. In order to show the effective and efficiency of the algorithm, the proposed algorithm is tested on four breast cancer dataset benchmarks. Comparison with genetic algorithm, particle swarm optimization, differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better or at least comparable with previous method from literature when considering the quality of the solutions obtained. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification.

    PubMed

    Huang, Chuen-Der; Lin, Chin-Teng; Pal, Nikhil Ranjan

    2003-12-01

    The structure classification of proteins plays a very important role in bioinformatics, since the relationships and characteristics among those known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. For the bioinformatics applications, the role of appropriate features has not been paid adequate importance. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use the gating neural network, where each input node is associated with a gate. This network can select important features in an online manner when the learning goes on. At the beginning of the training, all gates are almost closed, i.e., no feature is allowed to enter the network. Through the training, gates corresponding to good features are completely opened while gates corresponding to bad features are closed more tightly, and some gates may be partially open. The second novel idea is to use a hierarchical learning architecture (HLA). The classifier in the first level of HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. And in the next level we have another set of classifiers, which further classifies the protein features into 27 folds. The third novel idea is to induce the indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides us with more representative and discriminative new local features of protein sequences for multiclass protein fold classification. The proposed HLA with new indirect coding features increases the protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically. Using only half of the original features selected by the gating neural network can reach comparable test accuracy as that using all the

  11. The optional selection of micro-motion feature based on Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Li, Bo; Ren, Hongmei; Xiao, Zhi-he; Sheng, Jing

    2017-11-01

    Micro-motion form of target is multiple, different micro-motion forms are apt to be modulated, which makes it difficult for feature extraction and recognition. Aiming at feature extraction of cone-shaped objects with different micro-motion forms, this paper proposes the best selection method of micro-motion feature based on support vector machine. After the time-frequency distribution of radar echoes, comparing the time-frequency spectrum of objects with different micro-motion forms, features are extracted based on the differences between the instantaneous frequency variations of different micro-motions. According to the methods based on SVM (Support Vector Machine) features are extracted, then the best features are acquired. Finally, the result shows the method proposed in this paper is feasible under the test condition of certain signal-to-noise ratio(SNR).

  12. Feature selection with harmony search.

    PubMed

    Diao, Ren; Shen, Qiang

    2012-12-01

    Many search strategies have been exploited for the task of feature selection (FS), in an effort to identify more compact and better quality subsets. Such work typically involves the use of greedy hill climbing (HC), or nature-inspired heuristics, in order to discover the optimal solution without going through exhaustive search. In this paper, a novel FS approach based on harmony search (HS) is presented. It is a general approach that can be used in conjunction with many subset evaluation techniques. The simplicity of HS is exploited to reduce the overall complexity of the search process. The proposed approach is able to escape from local solutions and identify multiple solutions owing to the stochastic nature of HS. Additional parameter control schemes are introduced to reduce the effort and impact of parameter configuration. These can be further combined with the iterative refinement strategy, tailored to enforce the discovery of quality subsets. The resulting approach is compared with those that rely on HC, genetic algorithms, and particle swarm optimization, accompanied by in-depth studies of the suggested improvements.

  13. Less is more: Avoiding the LIBS dimensionality curse through judicious feature selection for explosive detection.

    PubMed

    Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R; Barman, Ishan; Kumar Gundawar, Manoj

    2015-08-19

    Despite its intrinsic advantages, translation of laser induced breakdown spectroscopy for material identification has been often impeded by the lack of robustness of developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts in establishing the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the 'curse of dimensionality' have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometrics classifiers -based on feature selection through prerequisite knowledge of the sample composition and genetic algorithm, respectively. While the full spectral input results in classification rate of ca.92%, selection of only carbon to hydrogen spectral window results in near identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter based detector thereby enabling a cheaper and compact system more amenable to field operations.

  14. Less is more: Avoiding the LIBS dimensionality curse through judicious feature selection for explosive detection

    PubMed Central

    Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R.; Barman, Ishan; Kumar Gundawar, Manoj

    2015-01-01

    Despite its intrinsic advantages, translation of laser induced breakdown spectroscopy for material identification has been often impeded by the lack of robustness of developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts in establishing the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the ‘curse of dimensionality’ have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometrics classifiers –based on feature selection through prerequisite knowledge of the sample composition and genetic algorithm, respectively. While the full spectral input results in classification rate of ca.92%, selection of only carbon to hydrogen spectral window results in near identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter based detector thereby enabling a cheaper and compact system more amenable to field operations. PMID:26286630

  15. Featural and temporal attention selectively enhance task-appropriate representations in human V1

    PubMed Central

    Warren, Scott; Yacoub, Essa; Ghose, Geoffrey

    2015-01-01

    Our perceptions are often shaped by focusing our attention toward specific features or periods of time irrespective of location. We explore the physiological bases of these non-spatial forms of attention by imaging brain activity while subjects perform a challenging change detection task. The task employs a continuously varying visual stimulus that, for any moment in time, selectively activates functionally distinct subpopulations of primary visual cortex (V1) neurons. When subjects are cued to the timing and nature of the change, the mapping of orientation preference across V1 was systematically shifts toward the cued stimulus just prior to its appearance. A simple linear model can explain this shift: attentional changes are selectively targeted toward neural subpopulations representing the attended feature at the times the feature was anticipated. Our results suggest that featural attention is mediated by a linear change in the responses of task-appropriate neurons across cortex during appropriate periods of time. PMID:25501983

  16. Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients.

    PubMed

    Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie

    2015-01-01

    Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.

  17. Feature Selection for Wearable Smartphone-Based Human Activity Recognition with Able bodied, Elderly, and Stroke Patients

    PubMed Central

    2015-01-01

    Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations. PMID:25885272

  18. Feature selection for neural network based defect classification of ceramic components using high frequency ultrasound.

    PubMed

    Kesharaju, Manasa; Nagarajah, Romesh

    2015-09-01

    The motivation for this research stems from a need for providing a non-destructive testing method capable of detecting and locating any defects and microstructural variations within armour ceramic components before issuing them to the soldiers who rely on them for their survival. The development of an automated ultrasonic inspection based classification system would make possible the checking of each ceramic component and immediately alert the operator about the presence of defects. Generally, in many classification problems a choice of features or dimensionality reduction is significant and simultaneously very difficult, as a substantial computational effort is required to evaluate possible feature subsets. In this research, a combination of artificial neural networks and genetic algorithms are used to optimize the feature subset used in classification of various defects in reaction-sintered silicon carbide ceramic components. Initially wavelet based feature extraction is implemented from the region of interest. An Artificial Neural Network classifier is employed to evaluate the performance of these features. Genetic Algorithm based feature selection is performed. Principal Component Analysis is a popular technique used for feature selection and is compared with the genetic algorithm based technique in terms of classification accuracy and selection of optimal number of features. The experimental results confirm that features identified by Principal Component Analysis lead to improved performance in terms of classification percentage with 96% than Genetic algorithm with 94%. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. [A brief textual research on circulated versions of Dan tai yu an (Jade Case Records of Red Stage].

    PubMed

    Xu, Gao; Zhu, Jianping

    2014-03-01

    Dan tai yu an (Jade Case Records of Red Stage) was compiled by a doctor of the Ming Dynasty Sun Wenyin, including 6 volumes. This book involves Chinese internal medicine, paediatrics, gynaecology, external medicine, and Department of the sense organs (ENT) classified into 73 categories, each of which contains 80 kinds of disease. The total number of disease was 157. Each kind of disease is discussed under the order of etiology, syndrome, pulse condition and treatment. The range of traditional Chinese prescriptions in this book is rather extensive with its indications, administrations and modification of main prescriptions given concretely. Both internal and external treatment are included, and the individual drug and proved recipe are practical and effective, which is a significant reference to clinical practice. There are many versions of this book extant. According to our investigation and research, we replenished some information to the"General Catalogue of TCM Ancient Books", and at the same time, correct some mistakes, providing the basis for further collation and publishing.

  20. Recursive feature selection with significant variables of support vectors.

    PubMed

    Tsai, Chen-An; Huang, Chien-Hsun; Chang, Ching-Wei; Chen, Chun-Houh

    2012-01-01

    The development of DNA microarray makes researchers screen thousands of genes simultaneously and it also helps determine high- and low-expression level genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most of the gene selection methods use univariate ranking criteria and arbitrarily choose a threshold to choose genes. However, the parameter setting may not be compatible to the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in support vector machine. We compared the performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and capable to attain good classification performance when the variations of informative and noninformative genes are different. In the analysis of two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.

  1. Early Visual Cortex Dynamics during Top-Down Modulated Shifts of Feature-Selective Attention.

    PubMed

    Müller, Matthias M; Trautmann, Mireille; Keitel, Christian

    2016-04-01

    Shifting attention from one color to another color or from color to another feature dimension such as shape or orientation is imperative when searching for a certain object in a cluttered scene. Most attention models that emphasize feature-based selection implicitly assume that all shifts in feature-selective attention underlie identical temporal dynamics. Here, we recorded time courses of behavioral data and steady-state visual evoked potentials (SSVEPs), an objective electrophysiological measure of neural dynamics in early visual cortex to investigate temporal dynamics when participants shifted attention from color or orientation toward color or orientation, respectively. SSVEPs were elicited by four random dot kinematograms that flickered at different frequencies. Each random dot kinematogram was composed of dashes that uniquely combined two features from the dimensions color (red or blue) and orientation (slash or backslash). Participants were cued to attend to one feature (such as color or orientation) and respond to coherent motion targets of the to-be-attended feature. We found that shifts toward color occurred earlier after the shifting cue compared with shifts toward orientation, regardless of the original feature (i.e., color or orientation). This was paralleled in SSVEP amplitude modulations as well as in the time course of behavioral data. Overall, our results suggest different neural dynamics during shifts of attention from color and orientation and the respective shifting destinations, namely, either toward color or toward orientation.

  2. Comparison of Naive Bayes and Decision Tree on Feature Selection Using Genetic Algorithm for Classification Problem

    NASA Astrophysics Data System (ADS)

    Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias

    2018-03-01

    This paper discusses the problem of feature selection using genetic algorithms on a dataset for classification problems. The classification model used is the decicion tree (DT), and Naive Bayes. In this paper we will discuss how the Naive Bayes and Decision Tree models to overcome the classification problem in the dataset, where the dataset feature is selectively selected using GA. Then both models compared their performance, whether there is an increase in accuracy or not. From the results obtained shows an increase in accuracy if the feature selection using GA. The proposed model is referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The data sets tested in this paper are taken from the UCI Machine Learning repository.

  3. Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading.

    PubMed

    Sahran, Shahnorbanun; Albashish, Dheeb; Abdullah, Azizi; Shukor, Nordashima Abd; Hayati Md Pauzi, Suria

    2018-04-18

    Feature selection (FS) methods are widely used in grading and diagnosing prostate histopathological images. In this context, FS is based on the texture features obtained from the lumen, nuclei, cytoplasm and stroma, all of which are important tissue components. However, it is difficult to represent the high-dimensional textures of these tissue components. To solve this problem, we propose a new FS method that enables the selection of features with minimal redundancy in the tissue components. We categorise tissue images based on the texture of individual tissue components via the construction of a single classifier and also construct an ensemble learning model by merging the values obtained by each classifier. Another issue that arises is overfitting due to the high-dimensional texture of individual tissue components. We propose a new FS method, SVM-RFE(AC), that integrates a Support Vector Machine-Recursive Feature Elimination (SVM-RFE) embedded procedure with an absolute cosine (AC) filter method to prevent redundancy in the selected features of the SV-RFE and an unoptimised classifier in the AC. We conducted experiments on H&E histopathological prostate and colon cancer images with respect to three prostate classifications, namely benign vs. grade 3, benign vs. grade 4 and grade 3 vs. grade 4. The colon benchmark dataset requires a distinction between grades 1 and 2, which are the most difficult cases to distinguish in the colon domain. The results obtained by both the single and ensemble classification models (which uses the product rule as its merging method) confirm that the proposed SVM-RFE(AC) is superior to the other SVM and SVM-RFE-based methods. We developed an FS method based on SVM-RFE and AC and successfully showed that its use enabled the identification of the most crucial texture feature of each tissue component. Thus, it makes possible the distinction between multiple Gleason grades (e.g. grade 3 vs. grade 4) and its performance is far superior to

  4. On the use of feature selection to improve the detection of sea oil spills in SAR images

    NASA Astrophysics Data System (ADS)

    Mera, David; Bolon-Canedo, Veronica; Cotos, J. M.; Alonso-Betanzos, Amparo

    2017-03-01

    Fast and effective oil spill detection systems are crucial to ensure a proper response to environmental emergencies caused by hydrocarbon pollution on the ocean's surface. Typically, these systems uncover not only oil spills, but also a high number of look-alikes. The feature extraction is a critical and computationally intensive phase where each detected dark spot is independently examined. Traditionally, detection systems use an arbitrary set of features to discriminate between oil spills and look-alikes phenomena. However, Feature Selection (FS) methods based on Machine Learning (ML) have proved to be very useful in real domains for enhancing the generalization capabilities of the classifiers, while discarding the existing irrelevant features. In this work, we present a generic and systematic approach, based on FS methods, for choosing a concise and relevant set of features to improve the oil spill detection systems. We have compared five FS methods: Correlation-based feature selection (CFS), Consistency-based filter, Information Gain, ReliefF and Recursive Feature Elimination for Support Vector Machine (SVM-RFE). They were applied on a 141-input vector composed of features from a collection of outstanding studies. Selected features were validated via a Support Vector Machine (SVM) classifier and the results were compared with previous works. Test experiments revealed that the classifier trained with the 6-input feature vector proposed by SVM-RFE achieved the best accuracy and Cohen's kappa coefficient (87.1% and 74.06% respectively). This is a smaller feature combination with similar or even better classification accuracy than previous works. The presented finding allows to speed up the feature extraction phase without reducing the classifier accuracy. Experiments also confirmed the significance of the geometrical features since 75.0% of the different features selected by the applied FS methods as well as 66.67% of the proposed 6-input feature vector belong to

  5. Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests.

    PubMed

    Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A

    2017-09-15

    Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations p≫n , these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. Code

  6. Elitist Binary Wolf Search Algorithm for Heuristic Feature Selection in High-Dimensional Bioinformatics Datasets.

    PubMed

    Li, Jinyan; Fong, Simon; Wong, Raymond K; Millham, Richard; Wong, Kelvin K L

    2017-06-28

    Due to the high-dimensional characteristics of dataset, we propose a new method based on the Wolf Search Algorithm (WSA) for optimising the feature selection problem. The proposed approach uses the natural strategy established by Charles Darwin; that is, 'It is not the strongest of the species that survives, but the most adaptable'. This means that in the evolution of a swarm, the elitists are motivated to quickly obtain more and better resources. The memory function helps the proposed method to avoid repeat searches for the worst position in order to enhance the effectiveness of the search, while the binary strategy simplifies the feature selection problem into a similar problem of function optimisation. Furthermore, the wrapper strategy gathers these strengthened wolves with the classifier of extreme learning machine to find a sub-dataset with a reasonable number of features that offers the maximum correctness of global classification models. The experimental results from the six public high-dimensional bioinformatics datasets tested demonstrate that the proposed method can best some of the conventional feature selection methods up to 29% in classification accuracy, and outperform previous WSAs by up to 99.81% in computational time.

  7. Bayesian feature selection for high-dimensional linear regression via the Ising approximation with applications to genomics.

    PubMed

    Fisher, Charles K; Mehta, Pankaj

    2015-06-01

    Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach--the Bayesian Ising Approximation (BIA)-to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model with weak couplings. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30 000 features. These results also highlight the impact of correlations between features on Bayesian feature selection. An implementation of the BIA in C++, along with data for reproducing our gene expression analyses, are freely available at http://physics.bu.edu/∼pankajm/BIACode. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  8. Feature selection from hyperspectral imaging for guava fruit defects detection

    NASA Astrophysics Data System (ADS)

    Mat Jafri, Mohd. Zubir; Tan, Sou Ching

    2017-06-01

    Development of technology makes hyperspectral imaging commonly used for defect detection. In this research, a hyperspectral imaging system was setup in lab to target for guava fruits defect detection. Guava fruit was selected as the object as to our knowledge, there is fewer attempts were made for guava defect detection based on hyperspectral imaging. The common fluorescent light source was used to represent the uncontrolled lighting condition in lab and analysis was carried out in a specific wavelength range due to inefficiency of this particular light source. Based on the data, the reflectance intensity of this specific setup could be categorized in two groups. Sequential feature selection with linear discriminant (LD) and quadratic discriminant (QD) function were used to select features that could potentially be used in defects detection. Besides the ordinary training method, training dataset in discriminant was separated in two to cater for the uncontrolled lighting condition. These two parts were separated based on the brighter and dimmer area. Four evaluation matrixes were evaluated which are LD with common training method, QD with common training method, LD with two part training method and QD with two part training method. These evaluation matrixes were evaluated using F1-score with total 48 defected areas. Experiment shown that F1-score of linear discriminant with the compensated method hitting 0.8 score, which is the highest score among all.

  9. A Feature Selection Method Based on Fisher's Discriminant Ratio for Text Sentiment Classification

    NASA Astrophysics Data System (ADS)

    Wang, Suge; Li, Deyu; Wei, Yingjie; Li, Hongxia

    With the rapid growth of e-commerce, product reviews on the Web have become an important information source for customers' decision making when they intend to buy some product. As the reviews are often too many for customers to go through, how to automatically classify them into different sentiment orientation categories (i.e. positive/negative) has become a research problem. In this paper, based on Fisher's discriminant ratio, an effective feature selection method is proposed for product review text sentiment classification. In order to validate the validity of the proposed method, we compared it with other methods respectively based on information gain and mutual information while support vector machine is adopted as the classifier. In this paper, 6 subexperiments are conducted by combining different feature selection methods with 2 kinds of candidate feature sets. Under 1006 review documents of cars, the experimental results indicate that the Fisher's discriminant ratio based on word frequency estimation has the best performance with F value 83.3% while the candidate features are the words which appear in both positive and negative texts.

  10. Feature selection and classification of multiparametric medical images using bagging and SVM

    NASA Astrophysics Data System (ADS)

    Fan, Yong; Resnick, Susan M.; Davatzikos, Christos

    2008-03-01

    This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.

  11. Feature-selective Attention in Frontoparietal Cortex: Multivoxel Codes Adjust to Prioritize Task-relevant Information.

    PubMed

    Jackson, Jade; Rich, Anina N; Williams, Mark A; Woolgar, Alexandra

    2017-02-01

    Human cognition is characterized by astounding flexibility, enabling us to select appropriate information according to the objectives of our current task. A circuit of frontal and parietal brain regions, often referred to as the frontoparietal attention network or multiple-demand (MD) regions, are believed to play a fundamental role in this flexibility. There is evidence that these regions dynamically adjust their responses to selectively process information that is currently relevant for behavior, as proposed by the "adaptive coding hypothesis" [Duncan, J. An adaptive coding model of neural function in prefrontal cortex. Nature Reviews Neuroscience, 2, 820-829, 2001]. Could this provide a neural mechanism for feature-selective attention, the process by which we preferentially process one feature of a stimulus over another? We used multivariate pattern analysis of fMRI data during a perceptually challenging categorization task to investigate whether the representation of visual object features in the MD regions flexibly adjusts according to task relevance. Participants were trained to categorize visually similar novel objects along two orthogonal stimulus dimensions (length/orientation) and performed short alternating blocks in which only one of these dimensions was relevant. We found that multivoxel patterns of activation in the MD regions encoded the task-relevant distinctions more strongly than the task-irrelevant distinctions: The MD regions discriminated between stimuli of different lengths when length was relevant and between the same objects according to orientation when orientation was relevant. The data suggest a flexible neural system that adjusts its representation of visual objects to preferentially encode stimulus features that are currently relevant for behavior, providing a neural mechanism for feature-selective attention.

  12. Selecting relevant 3D image features of margin sharpness and texture for lung nodule retrieval.

    PubMed

    Ferreira, José Raniery; de Azevedo-Marques, Paulo Mazzoncini; Oliveira, Marcelo Costa

    2017-03-01

    Lung cancer is the leading cause of cancer-related deaths in the world. Its diagnosis is a challenge task to specialists due to several aspects on the classification of lung nodules. Therefore, it is important to integrate content-based image retrieval methods on the lung nodule classification process, since they are capable of retrieving similar cases from databases that were previously diagnosed. However, this mechanism depends on extracting relevant image features in order to obtain high efficiency. The goal of this paper is to perform the selection of 3D image features of margin sharpness and texture that can be relevant on the retrieval of similar cancerous and benign lung nodules. A total of 48 3D image attributes were extracted from the nodule volume. Border sharpness features were extracted from perpendicular lines drawn over the lesion boundary. Second-order texture features were extracted from a cooccurrence matrix. Relevant features were selected by a correlation-based method and a statistical significance analysis. Retrieval performance was assessed according to the nodule's potential malignancy on the 10 most similar cases and by the parameters of precision and recall. Statistical significant features reduced retrieval performance. Correlation-based method selected 2 margin sharpness attributes and 6 texture attributes and obtained higher precision compared to all 48 extracted features on similar nodule retrieval. Feature space dimensionality reduction of 83 % obtained higher retrieval performance and presented to be a computationaly low cost method of retrieving similar nodules for the diagnosis of lung cancer.

  13. A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning

    PubMed Central

    Balcarras, Matthew; Womelsdorf, Thilo

    2016-01-01

    Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behavior of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context

  14. YamiPred: A Novel Evolutionary Method for Predicting Pre-miRNAs and Selecting Relevant Features.

    PubMed

    Kleftogiannis, Dimitrios; Theofilatos, Konstantinos; Likothanassis, Spiros; Mavroudi, Seferina

    2015-01-01

    MicroRNAs (miRNAs) are small non-coding RNAs, which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of support vector machines (SVM) with genetic algorithms (GA) for feature selection and parameters optimization. YamiPred was tested in a new and realistic human dataset and was compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset has achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data has successfully predicted pre-miRNAs to other organisms including the category of viruses.

  15. Schizophrenia with prominent catatonic features: A selective review.

    PubMed

    Ungvari, Gabor S; Gerevich, Jozsef; Takács, Rozália; Gazdag, Gábor

    2017-08-14

    A widely accepted consensus holds that a variety of motor symptoms subsumed under the term 'catatonia' have been an integral part of the symptomatology of schizophrenia since 1896, when Kraepelin proposed the concept of dementia praecox (schizophrenia). Until recently, psychiatric classifications included catatonic schizophrenia mainly through tradition, without compelling evidence of its validity as a schizophrenia subtype. This selective review briefly summarizes the history, psychopathology, demographic and epidemiological data, and treatment options for schizophrenia with prominent catatonic features. Although most catatonic signs and symptoms are easy to observe and measure, the lack of conceptual clarity of catatonia and consensus about the threshold and criteria for its diagnosis have hampered our understanding of how catatonia contributes to the pathophysiology of schizophrenic psychoses. Diverse study samples and methodologies have further hindered research on schizophrenia with prominent catatonic features. A focus on the motor aspects of broadly defined schizophrenia using modern methods of detecting and quantifying catatonic signs and symptoms coupled with sophisticated neuroimaging techniques offers a new approach to research in this long-overlooked field. Copyright © 2017. Published by Elsevier B.V.

  16. Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis.

    PubMed

    Al-Rajab, Murad; Lu, Joan; Xu, Qiang

    2017-07-01

    This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed other classifications, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Feature selection and classification model construction on type 2 diabetic patients' data.

    PubMed

    Huang, Yue; McCullagh, Paul; Black, Norman; Harper, Roy

    2007-11-01

    Diabetes affects between 2% and 4% of the global population (up to 10% in the over 65 age group), and its avoidance and effective treatment are undoubtedly crucial public health and health economics issues in the 21st century. The aim of this research was to identify significant factors influencing diabetes control, by applying feature selection to a working patient management system to assist with ranking, classification and knowledge discovery. The classification models can be used to determine individuals in the population with poor diabetes control status based on physiological and examination factors. The diabetic patients' information was collected by Ulster Community and Hospitals Trust (UCHT) from year 2000 to 2004 as part of clinical management. In order to discover key predictors and latent knowledge, data mining techniques were applied. To improve computational efficiency, a feature selection technique, feature selection via supervised model construction (FSSMC), an optimisation of ReliefF, was used to rank the important attributes affecting diabetic control. After selecting suitable features, three complementary classification techniques (Naïve Bayes, IB1 and C4.5) were applied to the data to predict how well the patients' condition was controlled. FSSMC identified patients' 'age', 'diagnosis duration', the need for 'insulin treatment', 'random blood glucose' measurement and 'diet treatment' as the most important factors influencing blood glucose control. Using the reduced features, a best predictive accuracy of 95% and sensitivity of 98% was achieved. The influence of factors, such as 'type of care' delivered, the use of 'home monitoring', and the importance of 'smoking' on outcome can contribute to domain knowledge in diabetes control. In the care of patients with diabetes, the more important factors identified: patients' 'age', 'diagnosis duration' and 'family history', are beyond the control of physicians. Treatment methods such as 'insulin', 'diet

  18. A hybrid feature selection approach for the early diagnosis of Alzheimer’s disease

    NASA Astrophysics Data System (ADS)

    Gallego-Jutglà, Esteve; Solé-Casals, Jordi; Vialatte, François-Benoît; Elgendi, Mohamed; Cichocki, Andrzej; Dauwels, Justin

    2015-02-01

    Objective. Recently, significant advances have been made in the early diagnosis of Alzheimer’s disease (AD) from electroencephalography (EEG). However, choosing suitable measures is a challenging task. Among other measures, frequency relative power (RP) and loss of complexity have been used with promising results. In the present study we investigate the early diagnosis of AD using synchrony measures and frequency RP on EEG signals, examining the changes found in different frequency ranges. Approach. We first explore the use of a single feature for computing the classification rate (CR), looking for the best frequency range. Then, we present a multiple feature classification system that outperforms all previous results using a feature selection strategy. These two approaches are tested in two different databases, one containing mild cognitive impairment (MCI) and healthy subjects (patients age: 71.9 ± 10.2, healthy subjects age: 71.7 ± 8.3), and the other containing Mild AD and healthy subjects (patients age: 77.6 ± 10.0 healthy subjects age: 69.4 ± 11.5). Main results. Using a single feature to compute CRs we achieve a performance of 78.33% for the MCI data set and of 97.56% for Mild AD. Results are clearly improved using the multiple feature classification, where a CR of 95% is found for the MCI data set using 11 features, and 100% for the Mild AD data set using four features. Significance. The new features selection method described in this work may be a reliable tool that could help to design a realistic system that does not require prior knowledge of a patient's status. With that aim, we explore the standardization of features for MCI and Mild AD data sets with promising results.

  19. A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data

    PubMed Central

    Yang, Runtao; Zhang, Chengjin; Gao, Rui; Zhang, Lina

    2016-01-01

    The Golgi Apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes. The dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein subGolgi localizations may assist in drug development and understanding the mechanisms of the GA involved in various cellular processes. In this paper, a new computational method is proposed for identifying cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search the optimal features from the CSP based features and g-gap dipeptide composition. Based on the optimal features, a Random Forest (RF) module is used to distinguish cis-Golgi proteins from trans-Golgi proteins. Through the jackknife cross-validation, the proposed method achieves a promising performance with a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthew’s Correlation Coefficient (MCC) of 0.765, which remarkably outperforms previous methods. Moreover, when tested on a common independent dataset, our method also achieves a significantly improved performance. These results highlight the promising performance of the proposed method to identify Golgi-resident protein types. Furthermore, the CSP based feature extraction method may provide guidelines for protein function predictions. PMID:26861308

  20. Optimal feature selection using a modified differential evolution algorithm and its effectiveness for prediction of heart disease.

    PubMed

    Vivekanandan, T; Sriman Narayana Iyengar, N Ch

    2017-11-01

    Enormous data growth in multiple domains has posed a great challenge for data processing and analysis techniques. In particular, the traditional record maintenance strategy has been replaced in the healthcare system. It is vital to develop a model that is able to handle the huge amount of e-healthcare data efficiently. In this paper, the challenging tasks of selecting critical features from the enormous set of available features and diagnosing heart disease are carried out. Feature selection is one of the most widely used pre-processing steps in classification problems. A modified differential evolution (DE) algorithm is used to perform feature selection for cardiovascular disease and optimization of selected features. Of the 10 available strategies for the traditional DE algorithm, the seventh strategy, which is represented by DE/rand/2/exp, is considered for comparative study. The performance analysis of the developed modified DE strategy is given in this paper. With the selected critical features, prediction of heart disease is carried out using fuzzy AHP and a feed-forward neural network. Various performance measures of integrating the modified differential evolution algorithm with fuzzy AHP and a feed-forward neural network in the prediction of heart disease are evaluated in this paper. The accuracy of the proposed hybrid model is 83%, which is higher than that of some other existing models. In addition, the prediction time of the proposed hybrid model is also evaluated and has shown promising results. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Human activity recognition based on feature selection in smart home using back-propagation algorithm.

    PubMed

    Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei

    2014-09-01

    In this paper, Back-propagation(BP) algorithm has been used to train the feed forward neural network for human activity recognition in smart home environments, and inter-class distance method for feature selection of observed motion sensor events is discussed and tested. And then, the human activity recognition performances of neural network using BP algorithm have been evaluated and compared with other probabilistic algorithms: Naïve Bayes(NB) classifier and Hidden Markov Model(HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, neural network using BP algorithm has relatively better human activity recognition performances than NB classifier and HMM. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  2. Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations

    PubMed Central

    2012-01-01

    Background Through the wealth of information contained within them, genome-wide association studies (GWAS) have the potential to provide researchers with a systematic means of associating genetic variants with a wide variety of disease phenotypes. Due to the limitations of approaches that have analyzed single variants one at a time, it has been proposed that the genetic basis of these disorders could be determined through detailed analysis of the genetic variants themselves and in conjunction with one another. The construction of models that account for these subsets of variants requires methodologies that generate predictions based on the total risk of a particular group of polymorphisms. However, due to the excessive number of variants, constructing these types of models has so far been computationally infeasible. Results We have implemented an algorithm, known as greedy RLS, that we use to perform the first known wrapper-based feature selection on the genome-wide level. The running time of greedy RLS grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. This speed is achieved through computational short-cuts based on matrix calculus. Since the memory consumption in present-day computers can form an even tighter bottleneck than running time, we also developed a space efficient variation of greedy RLS which trades running time for memory. These approaches are then compared to traditional wrapper-based feature selection implementations based on support vector machines (SVM) to reveal the relative speed-up and to assess the feasibility of the new algorithm. As a proof of concept, we apply greedy RLS to the Hypertension – UK National Blood Service WTCCC dataset and select the most predictive variants using 3-fold external cross-validation in less than 26 minutes on a high-end desktop. On this dataset, we also show that greedy RLS has a better classification performance on independent

  3. Identity Recognition Algorithm Using Improved Gabor Feature Selection of Gait Energy Image

    NASA Astrophysics Data System (ADS)

    Chao, LIANG; Ling-yao, JIA; Dong-cheng, SHI

    2017-01-01

    This paper describes an effective gait recognition approach based on Gabor features of gait energy image. In this paper, the kernel Fisher analysis combined with kernel matrix is proposed to select dominant features. The nearest neighbor classifier based on whitened cosine distance is used to discriminate different gait patterns. The approach proposed is tested on the CASIA and USF gait databases. The results show that our approach outperforms other state of gait recognition approaches in terms of recognition accuracy and robustness.

  4. Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in Plants.

    PubMed

    Yousef, Malik; Saçar Demirci, Müşerref Duygu; Khalifa, Waleed; Allmer, Jens

    2016-01-01

    MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently computational miRNA detection is mainly performed using machine learning and in particular two-class classification. For machine learning, the miRNAs need to be parametrized and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data is hard to come by. Therefore, it seems prerogative to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of ~95.6% thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.

  5. Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.

    PubMed

    Attallah, Omneya; Karthikesalingam, Alan; Holt, Peter Je; Thompson, Matthew M; Sayers, Rob; Bown, Matthew J; Choke, Eddie C; Ma, Xianghong

    2017-11-01

    Feature selection is essential in medical area; however, its process becomes complicated with the presence of censoring which is the unique character of survival analysis. Most survival feature selection methods are based on Cox's proportional hazard model, though machine learning classifiers are preferred. They are less employed in survival analysis due to censoring which prevents them from directly being used to survival data. Among the few work that employed machine learning classifiers, partial logistic artificial neural network with auto-relevance determination is a well-known method that deals with censoring and perform feature selection for survival data. However, it depends on data replication to handle censoring which leads to unbalanced and biased prediction results especially in highly censored data. Other methods cannot deal with high censoring. Therefore, in this article, a new hybrid feature selection method is proposed which presents a solution to high level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on survival metric to construct a multiple classifier system. The new hybrid feature selection process uses multiple classifier system as a wrapper method and merges it with iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients collected from two centers were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model such as Akaike and Bayesian information criterions and least absolute shrinkage and selector operator in p values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of

  6. Regression-Based Approach For Feature Selection In Classification Issues. Application To Breast Cancer Detection And Recurrence

    NASA Astrophysics Data System (ADS)

    Belciug, Smaranda; Serbanescu, Mircea-Sebastian

    2015-09-01

    Feature selection is considered a key factor in classifications/decision problems. It is currently used in designing intelligent decision systems to choose the best features which allow the best performance. This paper proposes a regression-based approach to select the most important predictors to significantly increase the classification performance. Application to breast cancer detection and recurrence using publically available datasets proved the efficiency of this technique.

  7. A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.

    PubMed

    Baur, Brittany; Bozdag, Serdar

    2016-01-01

    DNA methylation is an important epigenetic event that effects gene expression during development and various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.

  8. Choice: 36 band feature selection software with applications to multispectral pattern recognition

    NASA Technical Reports Server (NTRS)

    Jones, W. C.

    1973-01-01

    Feature selection software was developed at the Earth Resources Laboratory that is capable of inputting up to 36 channels and selecting channel subsets according to several criteria based on divergence. One of the criterion used is compatible with the table look-up classifier requirements. The software indicates which channel subset best separates (based on average divergence) each class from all other classes. The software employs an exhaustive search technique, and computer time is not prohibitive. A typical task to select the best 4 of 22 channels for 12 classes takes 9 minutes on a Univac 1108 computer.

  9. Wavelet Packet Feature Assessment for High-Density Myoelectric Pattern Recognition and Channel Selection toward Stroke Rehabilitation.

    PubMed

    Wang, Dongqing; Zhang, Xu; Gao, Xiaoping; Chen, Xiang; Zhou, Ping

    2016-01-01

    This study presents wavelet packet feature assessment of neural control information in paretic upper limb muscles of stroke survivors for myoelectric pattern recognition, taking advantage of high-resolution time-frequency representations of surface electromyogram (EMG) signals. On this basis, a novel channel selection method was developed by combining the Fisher's class separability index and the sequential feedforward selection analyses, in order to determine a small number of appropriate EMG channels from original high-density EMG electrode array. The advantages of the wavelet packet features and the channel selection analyses were further illustrated by comparing with previous conventional approaches, in terms of classification performance when identifying 20 functional arm/hand movements implemented by 12 stroke survivors. This study offers a practical approach including paretic EMG feature extraction and channel selection that enables active myoelectric control of multiple degrees of freedom with paretic muscles. All these efforts will facilitate upper limb dexterity restoration and improved stroke rehabilitation.

  10. Sequential feature selection for detecting buried objects using forward looking ground penetrating radar

    NASA Astrophysics Data System (ADS)

    Shaw, Darren; Stone, Kevin; Ho, K. C.; Keller, James M.; Luke, Robert H.; Burns, Brian P.

    2016-05-01

    Forward looking ground penetrating radar (FLGPR) has the benefit of detecting objects at a significant standoff distance. The FLGPR signal is radiated over a large surface area and the radar signal return is often weak. Improving detection, especially for buried in road targets, while maintaining an acceptable false alarm rate remains to be a challenging task. Various kinds of features have been developed over the years to increase the FLGPR detection performance. This paper focuses on investigating the use of as many features as possible for detecting buried targets and uses the sequential feature selection technique to automatically choose the features that contribute most for improving performance. Experimental results using data collected at a government test site are presented.

  11. Advances in feature selection methods for hyperspectral image processing in food industry applications: a review.

    PubMed

    Dai, Qiong; Cheng, Jun-Hu; Sun, Da-Wen; Zeng, Xin-An

    2015-01-01

    There is an increased interest in the applications of hyperspectral imaging (HSI) for assessing food quality, safety, and authenticity. HSI provides abundance of spatial and spectral information from foods by combining both spectroscopy and imaging, resulting in hundreds of contiguous wavebands for each spatial position of food samples, also known as the curse of dimensionality. It is desirable to employ feature selection algorithms for decreasing computation burden and increasing predicting accuracy, which are especially relevant in the development of online applications. Recently, a variety of feature selection algorithms have been proposed that can be categorized into three groups based on the searching strategy namely complete search, heuristic search and random search. This review mainly introduced the fundamental of each algorithm, illustrated its applications in hyperspectral data analysis in the food field, and discussed the advantages and disadvantages of these algorithms. It is hoped that this review should provide a guideline for feature selections and data processing in the future development of hyperspectral imaging technique in foods.

  12. Automatic Target Recognition: Statistical Feature Selection of Non-Gaussian Distributed Target Classes

    DTIC Science & Technology

    2011-06-01

    implementing, and evaluating many feature selection algorithms. Mucciardi and Gose compared seven different techniques for choosing subsets of pattern...122 THIS PAGE INTENTIONALLY LEFT BLANK 123 LIST OF REFERENCES [1] A. Mucciardi and E. Gose , “A comparison of seven techniques for

  13. Prediction of Protein Modification Sites of Pyrrolidone Carboxylic Acid Using mRMR Feature Selection and Analysis

    PubMed Central

    Zheng, Lu-Lu; Niu, Shen; Hao, Pei; Feng, KaiYan; Cai, Yu-Dong; Li, Yixue

    2011-01-01

    Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor to predict the modification sites of PCA based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features that belonged to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 features were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum of MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from PCA's surrounding sites contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of the PCA formation and guide relevant experimental validations. PMID:22174779

  14. Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.

    PubMed

    Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.

  15. Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification

    PubMed Central

    Feng, Yang; Jiang, Jiancheng; Tong, Xin

    2015-01-01

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. PMID:27185970

  16. Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

    PubMed Central

    Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas

    2014-01-01

    Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced protein and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of large amount of newly discovered proteins. The existing classification results are unsatisfactory due to a huge size of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727

  17. Feature Selection for Wheat Yield Prediction

    NASA Astrophysics Data System (ADS)

    Ruß, Georg; Kruse, Rudolf

    Carrying out effective and sustainable agriculture has become an important issue in recent years. Agricultural production has to keep up with an everincreasing population by taking advantage of a field’s heterogeneity. Nowadays, modern technology such as the global positioning system (GPS) and a multitude of developed sensors enable farmers to better measure their fields’ heterogeneities. For this small-scale, precise treatment the term precision agriculture has been coined. However, the large amounts of data that are (literally) harvested during the growing season have to be analysed. In particular, the farmer is interested in knowing whether a newly developed heterogeneity sensor is potentially advantageous or not. Since the sensor data are readily available, this issue should be seen from an artificial intelligence perspective. There it can be treated as a feature selection problem. The additional task of yield prediction can be treated as a multi-dimensional regression problem. This article aims to present an approach towards solving these two practically important problems using artificial intelligence and data mining ideas and methodologies.

  18. A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.

    PubMed

    Bommert, Andrea; Rahnenführer, Jörg; Lang, Michel

    2017-01-01

    Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is not only important to find a model with high predictive accuracy, but it is also important that this model uses only few features and that the selection of these features is stable. This is because, in bioinformatics, the models are used not only for prediction but also for drawing biological conclusions which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating the stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. Also, we find that for the stability assessment behaviour it is most important that a measure contains a correction for chance or large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.

  19. Feature-selective attention enhances color signals in early visual areas of the human brain.

    PubMed

    Müller, M M; Andersen, S; Trujillo, N J; Valdés-Sosa, P; Malinowski, P; Hillyard, S A

    2006-09-19

    We used an electrophysiological measure of selective stimulus processing (the steady-state visual evoked potential, SSVEP) to investigate feature-specific attention to color cues. Subjects viewed a display consisting of spatially intermingled red and blue dots that continually shifted their positions at random. The red and blue dots flickered at different frequencies and thereby elicited distinguishable SSVEP signals in the visual cortex. Paying attention selectively to either the red or blue dot population produced an enhanced amplitude of its frequency-tagged SSVEP, which was localized by source modeling to early levels of the visual cortex. A control experiment showed that this selection was based on color rather than flicker frequency cues. This signal amplification of attended color items provides an empirical basis for the rapid identification of feature conjunctions during visual search, as proposed by "guided search" models.

  20. Ischemia episode detection in ECG using kernel density estimation, support vector machine and feature selection

    PubMed Central

    2012-01-01

    Background Myocardial ischemia can be developed into more serious diseases. Early Detection of the ischemic syndrome in electrocardiogram (ECG) more accurately and automatically can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. Methods For training and testing, the European ST-T database is used, which is comprised of 367 ischemic ST episodes in 90 records. We first remove baseline wandering, and detect time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features which can be used for differentiating ST episodes from normal: 1) the area between QRS offset and T-peak points, 2) the normalized and signed sum from QRS offset to effective zero voltage point, and 3) the slope from QRS onset to offset point. We average the feature values for successive five beats to reduce effects of outliers. Finally we apply classifiers to those features. Results We evaluated the algorithm by kernel density estimation (KDE) and support vector machine (SVM) methods. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detects 349 ischemic ST episodes out of total 367 ST episodes. Sensitivity and specificity of SVM were 0.941 and 0.923, respectively. The SVM classifier detects 355 ischemic ST episodes. Conclusions We proposed a new method for detecting ischemia in ECG. It contains signal processing techniques of removing baseline wandering and detecting time positions of QRS complexes by discrete wavelet transform, and feature extraction from morphology of ECG waveforms explicitly. It was shown that the number of selected features were sufficient to discriminate ischemic ST episodes from the normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical values of the parameters

  1. Classification of Alzheimer's disease patients with hippocampal shape wrapper-based feature selection and support vector machine

    NASA Astrophysics Data System (ADS)

    Young, Jonathan; Ridgway, Gerard; Leung, Kelvin; Ourselin, Sebastien

    2012-02-01

    It is well known that hippocampal atrophy is a marker of the onset of Alzheimer's disease (AD) and as a result hippocampal volumetry has been used in a number of studies to provide early diagnosis of AD and predict conversion of mild cognitive impairment patients to AD. However, rates of atrophy are not uniform across the hippocampus making shape analysis a potentially more accurate biomarker. This study studies the hippocampi from 226 healthy controls, 148 AD patients and 330 MCI patients obtained from T1 weighted structural MRI images from the ADNI database. The hippocampi are anatomically segmented using the MAPS multi-atlas segmentation method, and the resulting binary images are then processed with SPHARM software to decompose their shapes as a weighted sum of spherical harmonic basis functions. The resulting parameterizations are then used as feature vectors in Support Vector Machine (SVM) classification. A wrapper based feature selection method was used as this considers the utility of features in discriminating classes in combination, fully exploiting the multivariate nature of the data and optimizing the selected set of features for the type of classifier that is used. The leave-one-out cross validated accuracy obtained on training data is 88.6% for classifying AD vs controls and 74% for classifying MCI-converters vs MCI-stable with very compact feature sets, showing that this is a highly promising method. There is currently a considerable fall in accuracy on unseen data indicating that the feature selection is sensitive to the data used, however feature ensemble methods may overcome this.

  2. Cluster analysis based on dimensional information with applications to feature selection and classification

    NASA Technical Reports Server (NTRS)

    Eigen, D. J.; Fromm, F. R.; Northouse, R. A.

    1974-01-01

    A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.

  3. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata

    PubMed Central

    Liu, Aiming; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi

    2017-01-01

    Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain–computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain–computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain–computer interface systems. PMID:29117100

  4. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata.

    PubMed

    Liu, Aiming; Chen, Kun; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi

    2017-11-08

    Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain-computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain-computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain-computer interface systems.

  5. Feature and Score Fusion Based Multiple Classifier Selection for Iris Recognition

    PubMed Central

    Islam, Md. Rabiul

    2014-01-01

    The aim of this work is to propose a new feature and score fusion based iris recognition approach where voting method on Multiple Classifier Selection technique has been applied. Four Discrete Hidden Markov Model classifiers output, that is, left iris based unimodal system, right iris based unimodal system, left-right iris feature fusion based multimodal system, and left-right iris likelihood ratio score fusion based multimodal system, is combined using voting method to achieve the final recognition result. CASIA-IrisV4 database has been used to measure the performance of the proposed system with various dimensions. Experimental results show the versatility of the proposed system of four different classifiers with various dimensions. Finally, recognition accuracy of the proposed system has been compared with existing N hamming distance score fusion approach proposed by Ma et al., log-likelihood ratio score fusion approach proposed by Schmid et al., and single level feature fusion approach proposed by Hollingsworth et al. PMID:25114676

  6. Feature and score fusion based multiple classifier selection for iris recognition.

    PubMed

    Islam, Md Rabiul

    2014-01-01

    The aim of this work is to propose a new feature and score fusion based iris recognition approach where voting method on Multiple Classifier Selection technique has been applied. Four Discrete Hidden Markov Model classifiers output, that is, left iris based unimodal system, right iris based unimodal system, left-right iris feature fusion based multimodal system, and left-right iris likelihood ratio score fusion based multimodal system, is combined using voting method to achieve the final recognition result. CASIA-IrisV4 database has been used to measure the performance of the proposed system with various dimensions. Experimental results show the versatility of the proposed system of four different classifiers with various dimensions. Finally, recognition accuracy of the proposed system has been compared with existing N hamming distance score fusion approach proposed by Ma et al., log-likelihood ratio score fusion approach proposed by Schmid et al., and single level feature fusion approach proposed by Hollingsworth et al.

  7. Feature Genes Selection Using Supervised Locally Linear Embedding and Correlation Coefficient for Microarray Classification

    PubMed Central

    Wang, Yun; Huang, Fangzhou

    2018-01-01

    The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible. PMID:29666661

  8. Feature Genes Selection Using Supervised Locally Linear Embedding and Correlation Coefficient for Microarray Classification.

    PubMed

    Xu, Jiucheng; Mu, Huiyu; Wang, Yun; Huang, Fangzhou

    2018-01-01

    The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC 2 ), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible.

  9. Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR

    PubMed Central

    MotieGhader, Habib; Gharaghani, Sajjad; Masoudi-Sobhanzadeh, Yosef; Masoudi-Nejad, Ali

    2017-01-01

    Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using some meta-heuristic algorithms such as GA, PSO, ACO and so on. In this work two novel hybrid meta-heuristic algorithms i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on Genetic algorithm and learning automata for QSAR feature selection are proposed. SGALA algorithm uses advantages of Genetic algorithm and Learning Automata sequentially and the MGALA algorithm uses advantages of Genetic Algorithm and Learning Automata simultaneously. We applied our proposed algorithms to select the minimum possible number of features from three different datasets and also we observed that the MGALA and SGALA algorithms had the best outcome independently and in average compared to other feature selection algorithms. Through comparison of our proposed algorithms, we deduced that the rate of convergence to optimal result in MGALA and SGALA algorithms were better than the rate of GA, ACO, PSO and LA algorithms. In the end, the results of GA, ACO, PSO, LA, SGALA, and MGALA algorithms were applied as the input of LS-SVR model and the results from LS-SVR models showed that the LS-SVR model had more predictive ability with the input from SGALA and MGALA algorithms than the input from all other mentioned algorithms. Therefore, the results have corroborated that not only is the predictive efficiency of proposed algorithms better, but their rate of convergence is also superior to the all other mentioned algorithms. PMID:28979308

  10. Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR.

    PubMed

    MotieGhader, Habib; Gharaghani, Sajjad; Masoudi-Sobhanzadeh, Yosef; Masoudi-Nejad, Ali

    2017-01-01

    Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using some meta-heuristic algorithms such as GA, PSO, ACO and so on. In this work two novel hybrid meta-heuristic algorithms i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on Genetic algorithm and learning automata for QSAR feature selection are proposed. SGALA algorithm uses advantages of Genetic algorithm and Learning Automata sequentially and the MGALA algorithm uses advantages of Genetic Algorithm and Learning Automata simultaneously. We applied our proposed algorithms to select the minimum possible number of features from three different datasets and also we observed that the MGALA and SGALA algorithms had the best outcome independently and in average compared to other feature selection algorithms. Through comparison of our proposed algorithms, we deduced that the rate of convergence to optimal result in MGALA and SGALA algorithms were better than the rate of GA, ACO, PSO and LA algorithms. In the end, the results of GA, ACO, PSO, LA, SGALA, and MGALA algorithms were applied as the input of LS-SVR model and the results from LS-SVR models showed that the LS-SVR model had more predictive ability with the input from SGALA and MGALA algorithms than the input from all other mentioned algorithms. Therefore, the results have corroborated that not only is the predictive efficiency of proposed algorithms better, but their rate of convergence is also superior to the all other mentioned algorithms.

  11. A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities.

    PubMed

    Valizade Hasanloei, Mohammad Amin; Sheikhpour, Razieh; Sarram, Mehdi Agha; Sheikhpour, Elnaz; Sharifi, Hamdollah

    2018-02-01

    Quantitative structure-activity relationship (QSAR) is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR based drug design to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is Fisher score which aim is to minimize the within-class distance and maximize the between-class distance. In this study, the properties of Fisher criterion were extended for QSAR models to define the new distance metrics based on the continuous activity values of compounds with known activities. Then, a semi-supervised feature selection method was proposed based on the combination of Fisher and Laplacian criteria which exploits both compounds with known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed semi-supervised feature selection method in selecting the relevant descriptors, we applied the method and other feature selection methods on three QSAR data sets such as serine/threonine-protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. The results demonstrated that the QSAR models built on the selected descriptors by the proposed semi-supervised method have better performance than other models. This indicates the efficiency of the proposed method in selecting the relevant descriptors using the compounds with known and unknown activities. The results of this study showed that the compounds with known and unknown activities can be helpful to improve the performance of the combined Fisher and Laplacian based feature selection methods.

  12. A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities

    NASA Astrophysics Data System (ADS)

    Valizade Hasanloei, Mohammad Amin; Sheikhpour, Razieh; Sarram, Mehdi Agha; Sheikhpour, Elnaz; Sharifi, Hamdollah

    2018-02-01

    Quantitative structure-activity relationship (QSAR) is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR based drug design to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is Fisher score which aim is to minimize the within-class distance and maximize the between-class distance. In this study, the properties of Fisher criterion were extended for QSAR models to define the new distance metrics based on the continuous activity values of compounds with known activities. Then, a semi-supervised feature selection method was proposed based on the combination of Fisher and Laplacian criteria which exploits both compounds with known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed semi-supervised feature selection method in selecting the relevant descriptors, we applied the method and other feature selection methods on three QSAR data sets such as serine/threonine-protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. The results demonstrated that the QSAR models built on the selected descriptors by the proposed semi-supervised method have better performance than other models. This indicates the efficiency of the proposed method in selecting the relevant descriptors using the compounds with known and unknown activities. The results of this study showed that the compounds with known and unknown activities can be helpful to improve the performance of the combined Fisher and Laplacian based feature selection methods.

  13. Revealing metabolite biomarkers for acupuncture treatment by linear programming based feature selection.

    PubMed

    Wang, Yong; Wu, Qiao-Feng; Chen, Chen; Wu, Ling-Yun; Yan, Xian-Zhong; Yu, Shu-Guang; Zhang, Xiang-Sun; Liang, Fan-Rong

    2012-01-01

    Acupuncture has been practiced in China for thousands of years as part of the Traditional Chinese Medicine (TCM) and has gradually accepted in western countries as an alternative or complementary treatment. However, the underlying mechanism of acupuncture, especially whether there exists any difference between varies acupoints, remains largely unknown, which hinders its widespread use. In this study, we develop a novel Linear Programming based Feature Selection method (LPFS) to understand the mechanism of acupuncture effect, at molecular level, by revealing the metabolite biomarkers for acupuncture treatment. Specifically, we generate and investigate the high-throughput metabolic profiles of acupuncture treatment at several acupoints in human. To select the subsets of metabolites that best characterize the acupuncture effect for each meridian point, an optimization model is proposed to identify biomarkers from high-dimensional metabolic data from case and control samples. Importantly, we use nearest centroid as the prototype to simultaneously minimize the number of selected features and the leave-one-out cross validation error of classifier. We compared the performance of LPFS to several state-of-the-art methods, such as SVM recursive feature elimination (SVM-RFE) and sparse multinomial logistic regression approach (SMLR). We find that our LPFS method tends to reveal a small set of metabolites with small standard deviation and large shifts, which exactly serves our requirement for good biomarker. Biologically, several metabolite biomarkers for acupuncture treatment are revealed and serve as the candidates for further mechanism investigation. Also biomakers derived from five meridian points, Zusanli (ST36), Liangmen (ST21), Juliao (ST3), Yanglingquan (GB34), and Weizhong (BL40), are compared for their similarity and difference, which provide evidence for the specificity of acupoints. Our result demonstrates that metabolic profiling might be a promising method to

  14. Revealing metabolite biomarkers for acupuncture treatment by linear programming based feature selection

    PubMed Central

    2012-01-01

    Background Acupuncture has been practiced in China for thousands of years as part of the Traditional Chinese Medicine (TCM) and has gradually accepted in western countries as an alternative or complementary treatment. However, the underlying mechanism of acupuncture, especially whether there exists any difference between varies acupoints, remains largely unknown, which hinders its widespread use. Results In this study, we develop a novel Linear Programming based Feature Selection method (LPFS) to understand the mechanism of acupuncture effect, at molecular level, by revealing the metabolite biomarkers for acupuncture treatment. Specifically, we generate and investigate the high-throughput metabolic profiles of acupuncture treatment at several acupoints in human. To select the subsets of metabolites that best characterize the acupuncture effect for each meridian point, an optimization model is proposed to identify biomarkers from high-dimensional metabolic data from case and control samples. Importantly, we use nearest centroid as the prototype to simultaneously minimize the number of selected features and the leave-one-out cross validation error of classifier. We compared the performance of LPFS to several state-of-the-art methods, such as SVM recursive feature elimination (SVM-RFE) and sparse multinomial logistic regression approach (SMLR). We find that our LPFS method tends to reveal a small set of metabolites with small standard deviation and large shifts, which exactly serves our requirement for good biomarker. Biologically, several metabolite biomarkers for acupuncture treatment are revealed and serve as the candidates for further mechanism investigation. Also biomakers derived from five meridian points, Zusanli (ST36), Liangmen (ST21), Juliao (ST3), Yanglingquan (GB34), and Weizhong (BL40), are compared for their similarity and difference, which provide evidence for the specificity of acupoints. Conclusions Our result demonstrates that metabolic profiling

  15. Automatic Feature Selection and Improved Classification in SICADA Counterfeit Electronics Detection

    DTIC Science & Technology

    2017-03-20

    The SICADA methodology was developed to detect such counterfeit microelectronics by collecting power side channel data and applying machine learning...to identify counterfeits. This methodology has been extended to include a two-step automated feature selection process and now uses a one-class SVM...classifier. We describe this methodology and show results for empirical data collected from several types of Microchip dsPIC33F microcontrollers

  16. A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods.

    PubMed

    Torija, Antonio J; Ruiz, Diego P

    2015-02-01

    The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.

  17. The Emotion Recognition System Based on Autoregressive Model and Sequential Forward Feature Selection of Electroencephalogram Signals

    PubMed Central

    Hatamikia, Sepideh; Maghooli, Keivan; Nasrabadi, Ali Motie

    2014-01-01

    Electroencephalogram (EEG) is one of the useful biological signals to distinguish different brain diseases and mental states. In recent years, detecting different emotional states from biological signals has been merged more attention by researchers and several feature extraction methods and classifiers are suggested to recognize emotions from EEG signals. In this research, we introduce an emotion recognition system using autoregressive (AR) model, sequential forward feature selection (SFS) and K-nearest neighbor (KNN) classifier using EEG signals during emotional audio-visual inductions. The main purpose of this paper is to investigate the performance of AR features in the classification of emotional states. To achieve this goal, a distinguished AR method (Burg's method) based on Levinson-Durbin's recursive algorithm is used and AR coefficients are extracted as feature vectors. In the next step, two different feature selection methods based on SFS algorithm and Davies–Bouldin index are used in order to decrease the complexity of computing and redundancy of features; then, three different classifiers include KNN, quadratic discriminant analysis and linear discriminant analysis are used to discriminate two and three different classes of valence and arousal levels. The proposed method is evaluated with EEG signals of available database for emotion analysis using physiological signals, which are recorded from 32 participants during 40 1 min audio visual inductions. According to the results, AR features are efficient to recognize emotional states from EEG signals, and KNN performs better than two other classifiers in discriminating of both two and three valence/arousal classes. The results also show that SFS method improves accuracies by almost 10-15% as compared to Davies–Bouldin based feature selection. The best accuracies are %72.33 and %74.20 for two classes of valence and arousal and %61.10 and %65.16 for three classes, respectively. PMID:25298928

  18. Intelligent feature selection techniques for pattern classification of Lamb wave signals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hinders, Mark K.; Miller, Corey A.

    2014-02-18

    Lamb wave interaction with flaws is a complex, three-dimensional phenomenon, which often frustrates signal interpretation schemes based on mode arrival time shifts predicted by dispersion curves. As the flaw severity increases, scattering and mode conversion effects will often dominate the time-domain signals, obscuring available information about flaws because multiple modes may arrive on top of each other. Even for idealized flaw geometries the scattering and mode conversion behavior of Lamb waves is very complex. Here, multi-mode Lamb waves in a metal plate are propagated across a rectangular flat-bottom hole in a sequence of pitch-catch measurements corresponding to the double crossholemore » tomography geometry. The flaw is sequentially deepened, with the Lamb wave measurements repeated at each flaw depth. Lamb wave tomography reconstructions are used to identify which waveforms have interacted with the flaw and thereby carry information about its depth. Multiple features are extracted from each of the Lamb wave signals using wavelets, which are then fed to statistical pattern classification algorithms that identify flaw severity. In order to achieve the highest classification accuracy, an optimal feature space is required but it’s never known a priori which features are going to be best. For structural health monitoring we make use of the fact that physical flaws, such as corrosion, will only increase over time. This allows us to identify feature vectors which are topologically well-behaved by requiring that sequential classes “line up” in feature vector space. An intelligent feature selection routine is illustrated that identifies favorable class distributions in multi-dimensional feature spaces using computational homology theory. Betti numbers and formal classification accuracies are calculated for each feature space subset to establish a correlation between the topology of the class distribution and the corresponding classification accuracy.« less

  19. Feature Selection for Speech Emotion Recognition in Spanish and Basque: On the Use of Machine Learning to Improve Human-Computer Interaction

    PubMed Central

    Arruti, Andoni; Cearreta, Idoia; Álvarez, Aitor; Lazkano, Elena; Sierra, Basilio

    2014-01-01

    Study of emotions in human–computer interaction is a growing research area. This paper shows an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish Languages using different methods for feature selection. RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied at each phase in order to seek for the most relevant feature subset. The three phases approach was selected to check the validity of the proposed approach. Achieved results show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm in automatic emotion recognition, with all different feature sets, obtaining a mean of 80,05% emotion recognition rate in Basque and a 74,82% in Spanish. In order to check the goodness of the proposed process, a greedy searching approach (FSS-Forward) has been applied and a comparison between them is provided. Based on achieved results, a set of most relevant non-speaker dependent features is proposed for both languages and new perspectives are suggested. PMID:25279686

  20. Gender, diabetes education, and psychosocial factors are associated with persistent poor glycemic control in patients with type 2 diabetes in the Joint Asia Diabetes Evaluation (JADE) program.

    PubMed

    Yin, Junmei; Yeung, Roseanne; Luk, Andrea; Tutino, Greg; Zhang, Yuying; Kong, Alice; Chung, Harriet; Wong, Rebecca; Ozaki, Risa; Ma, Ronald; Tsang, Chiu-Chi; Tong, Peter; So, Wingyee; Chan, Juliana

    2016-01-01

    Factors associated with persistent poor glycemic control were explored in patients with type 2 diabetes under the Joint Asia Diabetes Evaluation (JADE) program. Chinese adults enrolled in JADE with HbA1c ≥8% at initial comprehensive assessment (CA1) and repeat assessment were analyzed. The improved group was defined as those with a ≥1% absolute reduction in HbA1c, and the unimproved group was those with <1% reduction at the repeat CA (CA2). Of 4458 enrolled patients with HbA1c ≥8% at baseline, 1450 underwent repeat CA. After a median interval of 1.7 years (interquartile range[IQR] 1.1-2.2) between CA1 and CA2, the unimproved group (n = 677) had a mean 0.4% (95% confidence interval [CI] 0.3%, 0.5%) increase in HbA1c compared with a mean 2.8% reduction (95% CI -2.9, -2.6%) in the improved group (n = 773). The unimproved group had a female preponderance with lower education level, and was more likely to be insulin treated. Patients in the improved group received more diabetes education between CAs with improved self-care behaviors, whereas the unimproved group had worsening of health-related quality of life at CA2. Apart from female gender, long disease duration, low educational level, obesity, retinopathy, history of hypoglycemia, and insulin use, lack of education from diabetes nurses between CAs had the strongest association for persistent poor glycemic control. These results highlight the multidimensional nature of glycemic control, and the importance of diabetes education and optimizing diabetes care by considering psychosocial factors. © 2015 Ruijin Hospital, Shanghai Jiaotong University School of Medicine and Wiley Publishing Asia Pty Ltd.

  1. Optimum location of external markers using feature selection algorithms for real-time tumor tracking in external-beam radiotherapy: a virtual phantom study.

    PubMed

    Nankali, Saber; Torshabi, Ahmad Esmaili; Miandoab, Payam Samadi; Baghizadeh, Amin

    2016-01-08

    In external-beam radiotherapy, using external markers is one of the most reliable tools to predict tumor position, in clinical applications. The main challenge in this approach is tumor motion tracking with highest accuracy that depends heavily on external markers location, and this issue is the objective of this study. Four commercially available feature selection algorithms entitled 1) Correlation-based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief were proposed to find optimum location of external markers in combination with two "Genetic" and "Ranker" searching procedures. The performance of these algorithms has been evaluated using four-dimensional extended cardiac-torso anthropomorphic phantom. Six tumors in lung, three tumors in liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro-fuzzy inference system (ANFIS) as prediction model was considered as metric for quantitatively evaluating the performance of proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments and predefined tumors motion was predicted by ANFIS using external motion data of given markers at each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of proposed feature selection algorithms was compared, separately. For this, each tumor motion was predicted using motion data of those external markers selected by each feature selection algorithm. Duncan statistical test, followed by F-test, on final results reflected that all proposed feature selection algorithms have the same performance accuracy for lung tumors. But for liver tumors, a correlation-based feature selection algorithm, in

  2. Enhancement web proxy cache performance using Wrapper Feature Selection methods with NB and J48

    NASA Astrophysics Data System (ADS)

    Mahmoud Al-Qudah, Dua'a.; Funke Olanrewaju, Rashidah; Wong Azman, Amelia

    2017-11-01

    Web proxy cache technique reduces response time by storing a copy of pages between client and server sides. If requested pages are cached in the proxy, there is no need to access the server. Due to the limited size and excessive cost of cache compared to the other storages, cache replacement algorithm is used to determine evict page when the cache is full. On the other hand, the conventional algorithms for replacement such as Least Recently Use (LRU), First in First Out (FIFO), Least Frequently Use (LFU), Randomized Policy etc. may discard important pages just before use. Furthermore, using conventional algorithm cannot be well optimized since it requires some decision to intelligently evict a page before replacement. Hence, most researchers propose an integration among intelligent classifiers and replacement algorithm to improves replacement algorithms performance. This research proposes using automated wrapper feature selection methods to choose the best subset of features that are relevant and influence classifiers prediction accuracy. The result present that using wrapper feature selection methods namely: Best First (BFS), Incremental Wrapper subset selection(IWSS)embedded NB and particle swarm optimization(PSO)reduce number of features and have a good impact on reducing computation time. Using PSO enhance NB classifier accuracy by 1.1%, 0.43% and 0.22% over using NB with all features, using BFS and using IWSS embedded NB respectively. PSO rises J48 accuracy by 0.03%, 1.91 and 0.04% over using J48 classifier with all features, using IWSS-embedded NB and using BFS respectively. While using IWSS embedded NB fastest NB and J48 classifiers much more than BFS and PSO. However, it reduces computation time of NB by 0.1383 and reduce computation time of J48 by 2.998.

  3. Feature selection and definition for contours classification of thermograms in breast cancer detection

    NASA Astrophysics Data System (ADS)

    Jagodziński, Dariusz; Matysiewicz, Mateusz; Neumann, Łukasz; Nowak, Robert M.; Okuniewski, Rafał; Oleszkiewicz, Witold; Cichosz, Paweł

    2016-09-01

    This contribution introduces the method of cancer pathologies detection on breast skin temperature distribution images. The use of thermosensitive foils applied to the breasts skin allows to create thermograms, which displays the amount of infrared energy emitted by all breast cells. The significant foci of hyperthermia or inflammation are typical for cancer cells. That foci can be recognized on thermograms as a contours, which are the areas of higher temperature. Every contour can be converted to a feature set that describe it, using the raw, central, Hu, outline, Fourier and colour moments of image pixels processing. This paper defines also the new way of describing a set of contours through theirs neighbourhood relations. Contribution introduces moreover the way of ranking and selecting most relevant features. Authors used Neural Network with Gevrey`s concept and recursive feature elimination, to estimate feature importance.

  4. The Fisher-Markov selector: fast selecting maximally separable feature subset for multiclass classification with applications to high-dimensional data.

    PubMed

    Cheng, Qiang; Zhou, Hongbo; Cheng, Jie

    2011-06-01

    Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can only obtain local optimum instead of global optimum. Toward the selection of the globally optimal subset of features efficiently, we introduce a new selector--which we call the Fisher-Markov selector--to identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with the sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including mid--dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. In pattern recognition and from a model selection viewpoint, our procedure says that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function which in fact can be

  5. Multisensor-based real-time quality monitoring by means of feature extraction, selection and modeling for Al alloy in arc welding

    NASA Astrophysics Data System (ADS)

    Zhang, Zhifen; Chen, Huabin; Xu, Yanling; Zhong, Jiyong; Lv, Na; Chen, Shanben

    2015-08-01

    Multisensory data fusion-based online welding quality monitoring has gained increasing attention in intelligent welding process. This paper mainly focuses on the automatic detection of typical welding defect for Al alloy in gas tungsten arc welding (GTAW) by means of analzing arc spectrum, sound and voltage signal. Based on the developed algorithms in time and frequency domain, 41 feature parameters were successively extracted from these signals to characterize the welding process and seam quality. Then, the proposed feature selection approach, i.e., hybrid fisher-based filter and wrapper was successfully utilized to evaluate the sensitivity of each feature and reduce the feature dimensions. Finally, the optimal feature subset with 19 features was selected to obtain the highest accuracy, i.e., 94.72% using established classification model. This study provides a guideline for feature extraction, selection and dynamic modeling based on heterogeneous multisensory data to achieve a reliable online defect detection system in arc welding.

  6. Sparse feature selection for classification and prediction of metastasis in endometrial cancer.

    PubMed

    Ahsen, Mehmet Eren; Boren, Todd P; Singh, Nitin K; Misganaw, Burook; Mutch, David G; Moore, Kathleen N; Backes, Floor J; McCourt, Carolyn K; Lea, Jayanthi S; Miller, David S; White, Michael A; Vidyasagar, Mathukumalli

    2017-03-27

    Metastasis via pelvic and/or para-aortic lymph nodes is a major risk factor for endometrial cancer. Lymph-node resection ameliorates risk but is associated with significant co-morbidities. Incidence in patients with stage I disease is 4-22% but no mechanism exists to accurately predict it. Therefore, national guidelines for primary staging surgery include pelvic and para-aortic lymph node dissection for all patients whose tumor exceeds 2cm in diameter. We sought to identify a robust molecular signature that can accurately classify risk of lymph node metastasis in endometrial cancer patients. 86 tumors matched for age and race, and evenly distributed between lymph node-positive and lymph node-negative cases, were selected as a training cohort. Genomic micro-RNA expression was profiled for each sample to serve as the predictive feature matrix. An independent set of 28 tumor samples was collected and similarly characterized to serve as a test cohort. A feature selection algorithm was designed for applications where the number of samples is far smaller than the number of measured features per sample. A predictive miRNA expression signature was developed using this algorithm, which was then used to predict the metastatic status of the independent test cohort. A weighted classifier, using 18 micro-RNAs, achieved 100% accuracy on the training cohort. When applied to the testing cohort, the classifier correctly predicted 90% of node-positive cases, and 80% of node-negative cases (FDR = 6.25%). Results indicate that the evaluation of the quantitative sparse-feature classifier proposed here in clinical trials may lead to significant improvement in the prediction of lymphatic metastases in endometrial cancer patients.

  7. A Quantum Hybrid PSO Combined with Fuzzy k-NN Approach to Feature Selection and Cell Classification in Cervical Cancer Detection.

    PubMed

    Iliyasu, Abdullah M; Fatichah, Chastine

    2017-12-19

    A quantum hybrid (QH) intelligent approach that blends the adaptive search capability of the quantum-behaved particle swarm optimisation (QPSO) method with the intuitionistic rationality of traditional fuzzy k -nearest neighbours (Fuzzy k -NN) algorithm (known simply as the Q-Fuzzy approach) is proposed for efficient feature selection and classification of cells in cervical smeared (CS) images. From an initial multitude of 17 features describing the geometry, colour, and texture of the CS images, the QPSO stage of our proposed technique is used to select the best subset features (i.e., global best particles) that represent a pruned down collection of seven features. Using a dataset of almost 1000 images, performance evaluation of our proposed Q-Fuzzy approach assesses the impact of our feature selection on classification accuracy by way of three experimental scenarios that are compared alongside two other approaches: the All-features (i.e., classification without prior feature selection) and another hybrid technique combining the standard PSO algorithm with the Fuzzy k -NN technique (P-Fuzzy approach). In the first and second scenarios, we further divided the assessment criteria in terms of classification accuracy based on the choice of best features and those in terms of the different categories of the cervical cells. In the third scenario, we introduced new QH hybrid techniques, i.e., QPSO combined with other supervised learning methods, and compared the classification accuracy alongside our proposed Q-Fuzzy approach. Furthermore, we employed statistical approaches to establish qualitative agreement with regards to the feature selection in the experimental scenarios 1 and 3. The synergy between the QPSO and Fuzzy k -NN in the proposed Q-Fuzzy approach improves classification accuracy as manifest in the reduction in number cell features, which is crucial for effective cervical cancer detection and diagnosis.

  8. Feature selection and classifier parameters estimation for EEG signals peak detection using particle swarm optimization.

    PubMed

    Adam, Asrul; Shapiai, Mohd Ibrahim; Tumari, Mohd Zaidi Mohd; Mohamad, Mohd Saberi; Mubin, Marizan

    2014-01-01

    Electroencephalogram (EEG) signal peak detection is widely used in clinical applications. The peak point can be detected using several approaches, including time, frequency, time-frequency, and nonlinear domains depending on various peak features from several models. However, there is no study that provides the importance of every peak feature in contributing to a good and generalized model. In this study, feature selection and classifier parameters estimation based on particle swarm optimization (PSO) are proposed as a framework for peak detection on EEG signals in time domain analysis. Two versions of PSO are used in the study: (1) standard PSO and (2) random asynchronous particle swarm optimization (RA-PSO). The proposed framework tries to find the best combination of all the available features that offers good peak detection and a high classification rate from the results in the conducted experiments. The evaluation results indicate that the accuracy of the peak detection can be improved up to 99.90% and 98.59% for training and testing, respectively, as compared to the framework without feature selection adaptation. Additionally, the proposed framework based on RA-PSO offers a better and reliable classification rate as compared to standard PSO as it produces low variance model.

  9. Feature Selection and Classifier Parameters Estimation for EEG Signals Peak Detection Using Particle Swarm Optimization

    PubMed Central

    Adam, Asrul; Mohd Tumari, Mohd Zaidi; Mohamad, Mohd Saberi

    2014-01-01

    Electroencephalogram (EEG) signal peak detection is widely used in clinical applications. The peak point can be detected using several approaches, including time, frequency, time-frequency, and nonlinear domains depending on various peak features from several models. However, there is no study that provides the importance of every peak feature in contributing to a good and generalized model. In this study, feature selection and classifier parameters estimation based on particle swarm optimization (PSO) are proposed as a framework for peak detection on EEG signals in time domain analysis. Two versions of PSO are used in the study: (1) standard PSO and (2) random asynchronous particle swarm optimization (RA-PSO). The proposed framework tries to find the best combination of all the available features that offers good peak detection and a high classification rate from the results in the conducted experiments. The evaluation results indicate that the accuracy of the peak detection can be improved up to 99.90% and 98.59% for training and testing, respectively, as compared to the framework without feature selection adaptation. Additionally, the proposed framework based on RA-PSO offers a better and reliable classification rate as compared to standard PSO as it produces low variance model. PMID:25243236

  10. Improved sparse decomposition based on a smoothed L0 norm using a Laplacian kernel to select features from fMRI data.

    PubMed

    Zhang, Chuncheng; Song, Sutao; Wen, Xiaotong; Yao, Li; Long, Zhiying

    2015-04-30

    Feature selection plays an important role in improving the classification accuracy of multivariate classification techniques in the context of fMRI-based decoding due to the "few samples and large features" nature of functional magnetic resonance imaging (fMRI) data. Recently, several sparse representation methods have been applied to the voxel selection of fMRI data. Despite the low computational efficiency of the sparse representation methods, they still displayed promise for applications that select features from fMRI data. In this study, we proposed the Laplacian smoothed L0 norm (LSL0) approach for feature selection of fMRI data. Based on the fast sparse decomposition using smoothed L0 norm (SL0) (Mohimani, 2007), the LSL0 method used the Laplacian function to approximate the L0 norm of sources. Results of the simulated and real fMRI data demonstrated the feasibility and robustness of LSL0 for the sparse source estimation and feature selection. Simulated results indicated that LSL0 produced more accurate source estimation than SL0 at high noise levels. The classification accuracy using voxels that were selected by LSL0 was higher than that by SL0 in both simulated and real fMRI experiment. Moreover, both LSL0 and SL0 showed higher classification accuracy and required less time than ICA and t-test for the fMRI decoding. LSL0 outperformed SL0 in sparse source estimation at high noise level and in feature selection. Moreover, LSL0 and SL0 showed better performance than ICA and t-test for feature selection. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Time course of spatial and feature selective attention for partly-occluded objects.

    PubMed

    Kasai, Tetsuko; Takeya, Ryuji

    2012-07-01

    Attention selects objects/groups as the most fundamental units, and this may be achieved by an attention-spreading mechanism. Previous event-related potential (ERP) studies have found that attention-spreading is reflected by a decrease in the N1 spatial attention effect. The present study tested whether the electrophysiological attention effect is associated with the perception of object unity or amodal completion through the use of partly-occluded objects. ERPs were recorded in 14 participants who were required to pay attention to their left or right visual field and to press a button for a target shape in the attended field. Bilateral stimuli were presented rapidly, and were separated, connected, or connected behind an occluder. Behavioral performance in the connected and occluded conditions was worse than that in the separated condition, indicating that attention spread over perceptual object representations after amodal completion. Consistently, the late N1 spatial attention effect (180-220 ms post-stimulus) and the early phase (230-280 ms) of feature selection effects (target N2) at contralateral sites decreased, equally for the occluded and connected conditions, while the attention effect in the early N1 latency (140-180 ms) shifted most positively for the occluded condition. These results suggest that perceptual organization processes for object recognition transiently modulate spatial and feature selection processes in the visual cortex. Copyright © 2012 Elsevier Ltd. All rights reserved.

  12. Audio-visual synchrony and feature-selective attention co-amplify early visual processing.

    PubMed

    Keitel, Christian; Müller, Matthias M

    2016-05-01

    Our brain relies on neural mechanisms of selective attention and converging sensory processing to efficiently cope with rich and unceasing multisensory inputs. One prominent assumption holds that audio-visual synchrony can act as a strong attractor for spatial attention. Here, we tested for a similar effect of audio-visual synchrony on feature-selective attention. We presented two superimposed Gabor patches that differed in colour and orientation. On each trial, participants were cued to selectively attend to one of the two patches. Over time, spatial frequencies of both patches varied sinusoidally at distinct rates (3.14 and 3.63 Hz), giving rise to pulse-like percepts. A simultaneously presented pure tone carried a frequency modulation at the pulse rate of one of the two visual stimuli to introduce audio-visual synchrony. Pulsed stimulation elicited distinct time-locked oscillatory electrophysiological brain responses. These steady-state responses were quantified in the spectral domain to examine individual stimulus processing under conditions of synchronous versus asynchronous tone presentation and when respective stimuli were attended versus unattended. We found that both, attending to the colour of a stimulus and its synchrony with the tone, enhanced its processing. Moreover, both gain effects combined linearly for attended in-sync stimuli. Our results suggest that audio-visual synchrony can attract attention to specific stimulus features when stimuli overlap in space.

  13. Feature selection and back-projection algorithms for nonline-of-sight laser-gated viewing

    NASA Astrophysics Data System (ADS)

    Laurenzis, Martin; Velten, Andreas

    2014-11-01

    We discuss new approaches to analyze laser-gated viewing data for nonline-of-sight vision with a frame-to-frame back-projection as well as feature selection algorithms. Although first back-projection approaches use time transients for each pixel, our method has the ability to calculate the projection of imaging data on the voxel space for each frame. Further, different data analysis algorithms and their sequential application were studied with the aim of identifying and selecting signals from different target positions. A slight modification of commonly used filters leads to a powerful selection of local maximum values. It is demonstrated that the choice of the filter has an impact on the selectivity i.e., multiple target detection as well as on the localization precision.

  14. Characterization of computer network events through simultaneous feature selection and clustering of intrusion alerts

    NASA Astrophysics Data System (ADS)

    Chen, Siyue; Leung, Henry; Dondo, Maxwell

    2014-05-01

    As computer network security threats increase, many organizations implement multiple Network Intrusion Detection Systems (NIDS) to maximize the likelihood of intrusion detection and provide a comprehensive understanding of intrusion activities. However, NIDS trigger a massive number of alerts on a daily basis. This can be overwhelming for computer network security analysts since it is a slow and tedious process to manually analyse each alert produced. Thus, automated and intelligent clustering of alerts is important to reveal the structural correlation of events by grouping alerts with common features. As the nature of computer network attacks, and therefore alerts, is not known in advance, unsupervised alert clustering is a promising approach to achieve this goal. We propose a joint optimization technique for feature selection and clustering to aggregate similar alerts and to reduce the number of alerts that analysts have to handle individually. More precisely, each identified feature is assigned a binary value, which reflects the feature's saliency. This value is treated as a hidden variable and incorporated into a likelihood function for clustering. Since computing the optimal solution of the likelihood function directly is analytically intractable, we use the Expectation-Maximisation (EM) algorithm to iteratively update the hidden variable and use it to maximize the expected likelihood. Our empirical results, using a labelled Defense Advanced Research Projects Agency (DARPA) 2000 reference dataset, show that the proposed method gives better results than the EM clustering without feature selection in terms of the clustering accuracy.

  15. Resolving Transition Metal Chemical Space: Feature Selection for Machine Learning and Structure-Property Relationships.

    PubMed

    Janet, Jon Paul; Kulik, Heather J

    2017-11-22

    Machine learning (ML) of quantum mechanical properties shows promise for accelerating chemical discovery. For transition metal chemistry where accurate calculations are computationally costly and available training data sets are small, the molecular representation becomes a critical ingredient in ML model predictive accuracy. We introduce a series of revised autocorrelation functions (RACs) that encode relationships of the heuristic atomic properties (e.g., size, connectivity, and electronegativity) on a molecular graph. We alter the starting point, scope, and nature of the quantities evaluated in standard ACs to make these RACs amenable to inorganic chemistry. On an organic molecule set, we first demonstrate superior standard AC performance to other presently available topological descriptors for ML model training, with mean unsigned errors (MUEs) for atomization energies on set-aside test molecules as low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on set-aside test molecules in spin-state splitting in comparison to 15-20× higher errors for feature sets that encode whole-molecule structural information. Systematic feature selection methods including univariate filtering, recursive feature elimination, and direct optimization (e.g., random forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5× smaller than the full RAC set produce sub- to 1 kcal/mol spin-splitting MUEs, with good transferability to metal-ligand bond length prediction (0.004-5 Å MUE) and redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature selection results across property sets reveals the relative importance of local, electronic descriptors (e.g., electronegativity, atomic number) in spin-splitting and distal, steric effects in redox potential and bond lengths.

  16. LFSPMC: Linear feature selection program using the probability of misclassification

    NASA Technical Reports Server (NTRS)

    Guseman, L. F., Jr.; Marion, B. P.

    1975-01-01

    The computational procedure and associated computer program for a linear feature selection technique are presented. The technique assumes that: a finite number, m, of classes exists; each class is described by an n-dimensional multivariate normal density function of its measurement vectors; the mean vector and covariance matrix for each density function are known (or can be estimated); and the a priori probability for each class is known. The technique produces a single linear combination of the original measurements which minimizes the one-dimensional probability of misclassification defined by the transformed densities.

  17. Rough-Fuzzy Clustering and Unsupervised Feature Selection for Wavelet Based MR Image Segmentation

    PubMed Central

    Maji, Pradipta; Roy, Shaswati

    2015-01-01

    Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of brain magnetic resonance (MR) images. For many human experts, manual segmentation is a difficult and time consuming task, which makes an automated brain MR image segmentation method desirable. In this regard, this paper presents a new segmentation method for brain MR images, integrating judiciously the merits of rough-fuzzy computing and multiresolution image analysis technique. The proposed method assumes that the major brain tissues, namely, gray matter, white matter, and cerebrospinal fluid from the MR images are considered to have different textural properties. The dyadic wavelet analysis is used to extract the scale-space feature vector for each pixel, while the rough-fuzzy clustering is used to address the uncertainty problem of brain MR image segmentation. An unsupervised feature selection method is introduced, based on maximum relevance-maximum significance criterion, to select relevant and significant textural features for segmentation problem, while the mathematical morphology based skull stripping preprocessing step is proposed to remove the non-cerebral tissues like skull. The performance of the proposed method, along with a comparison with related approaches, is demonstrated on a set of synthetic and real brain MR images using standard validity indices. PMID:25848961

  18. Optimum location of external markers using feature selection algorithms for real‐time tumor tracking in external‐beam radiotherapy: a virtual phantom study

    PubMed Central

    Nankali, Saber; Miandoab, Payam Samadi; Baghizadeh, Amin

    2016-01-01

    In external‐beam radiotherapy, using external markers is one of the most reliable tools to predict tumor position, in clinical applications. The main challenge in this approach is tumor motion tracking with highest accuracy that depends heavily on external markers location, and this issue is the objective of this study. Four commercially available feature selection algorithms entitled 1) Correlation‐based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief were proposed to find optimum location of external markers in combination with two “Genetic” and “Ranker” searching procedures. The performance of these algorithms has been evaluated using four‐dimensional extended cardiac‐torso anthropomorphic phantom. Six tumors in lung, three tumors in liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro‐fuzzy inference system (ANFIS) as prediction model was considered as metric for quantitatively evaluating the performance of proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments and predefined tumors motion was predicted by ANFIS using external motion data of given markers at each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of proposed feature selection algorithms was compared, separately. For this, each tumor motion was predicted using motion data of those external markers selected by each feature selection algorithm. Duncan statistical test, followed by F‐test, on final results reflected that all proposed feature selection algorithms have the same performance accuracy for lung tumors. But for liver tumors, a correlation‐based feature

  19. A hybrid approach to select features and classify diseases based on medical data

    NASA Astrophysics Data System (ADS)

    AbdelLatif, Hisham; Luo, Jiawei

    2018-03-01

    Feature selection is popular problem in the classification of diseases in clinical medicine. Here, we developing a hybrid methodology to classify diseases, based on three medical datasets, Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology called k-means ANOVA Support Vector Machine (K-ANOVA-SVM) uses K-means cluster with ANOVA statistical to preprocessing data and selection the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we choice three classification algorithms, decision tree Naïve Bayes, Support Vector Machines and applied the medical datasets direct to these algorithms. Our methodology was a much better classification accuracy is given of 98% in Arrhythmia datasets, 92% in Breast cancer datasets and 88% in Hepatitis datasets, Compare to use the medical data directly with decision tree Naïve Bayes, and Support Vector Machines. Also, the ROC curve and precision with (K-ANOVA-SVM) Achieved best results than other algorithms

  20. Fatigue level estimation of monetary bills based on frequency band acoustic signals with feature selection by supervised SOM

    NASA Astrophysics Data System (ADS)

    Teranishi, Masaru; Omatu, Sigeru; Kosaka, Toshihisa

    Fatigued monetary bills adversely affect the daily operation of automated teller machines (ATMs). In order to make the classification of fatigued bills more efficient, the development of an automatic fatigued monetary bill classification method is desirable. We propose a new method by which to estimate the fatigue level of monetary bills from the feature-selected frequency band acoustic energy pattern of banking machines. By using a supervised self-organizing map (SOM), we effectively estimate the fatigue level using only the feature-selected frequency band acoustic energy pattern. Furthermore, the feature-selected frequency band acoustic energy pattern improves the estimation accuracy of the fatigue level of monetary bills by adding frequency domain information to the acoustic energy pattern. The experimental results with real monetary bill samples reveal the effectiveness of the proposed method.

  1. Pattern classification using an olfactory model with PCA feature selection in electronic noses: study and application.

    PubMed

    Fu, Jun; Huang, Canqin; Xing, Jianguo; Zheng, Junbao

    2012-01-01

    Biologically-inspired models and algorithms are considered as promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model with the increase of the dimensions of input feature vector (outer factor) as well as its parallel channels (inner factor). The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets of three classes of wine derived from different cultivars and five classes of green tea derived from five different provinces of China were used for experiments. In the former case the results showed that the average correct classification rate increased as more principal components were put in to feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We concluded that 6~8 channels of the model with principal component feature vector values of at least 90% cumulative variance is adequate for a classification task of 3~5 pattern classes considering the trade-off between time consumption and classification rate.

  2. Effects of changing canopy directional reflectance on feature selection

    NASA Technical Reports Server (NTRS)

    Smith, J. A.; Oliver, R. E.; Kilpela, O. E.

    1973-01-01

    The use of a Monte Carlo model for generating sample directional reflectance data for two simplified target canopies at two different solar positions is reported. Successive iterations through the model permit the calculation of a mean vector and covariance matrix for canopy reflectance for varied sensor view angles. These data may then be used to calculate the divergence between the target distributions for various wavelength combinations and for these view angles. Results of a feature selection analysis indicate that different sets of wavelengths are optimum for target discrimination depending on sensor view angle and that the targets may be more easily discriminated for some scan angles than others. The time-varying behavior of these results is also pointed out.

  3. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection.

    PubMed

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Accident-related autopsy reports were obtained from one of the largest hospital in Kuala Lumpur. These reports belong to nine different accident-related causes of death. Master feature vector was prepared by extracting features from the collected autopsy reports by using unigram with lexical categorization. This master feature vector was used to detect cause of death [according to internal classification of disease version 10 (ICD-10) classification system] through five automated feature selection schemes, proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under ROC curve. Four baselines were used to compare the results with the proposed system. Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measure approaching (85% to 90%) for most metrics by using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in the overall accuracy compared with the existing techniques and four baselines. The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying cause of death based on autopsy findings

  4. A Filter Feature Selection Method Based on MFA Score and Redundancy Excluding and It's Application to Tumor Gene Expression Data Analysis.

    PubMed

    Li, Jiangeng; Su, Lei; Pang, Zenan

    2015-12-01

    Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. A filter feature selection method named marginal Fisher analysis score (MFA score) which is based on graph embedding has been proposed, and it has been widely used mainly because it is superior to Fisher score. Considering the heavy redundancy in gene expression data, we proposed a new filter feature selection technique in this paper. It is named MFA score+ and is based on MFA score and redundancy excluding. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features and then used support vector machine as the classifier to classify the samples. Compared with MFA score, t test and Fisher score, it achieved higher classification accuracy.

  5. Distributed Spacing Stochastic Feature Selection and its Application to Textile Classification

    DTIC Science & Technology

    2011-09-01

    Spandex, (b) 65% Polyester / 35% Cot- ton vs 94% Polyester / 6% Spandex, (c) 65% Polyester / 35% Cotton vs 100% Cotton , and (d) 65% Polyester / 35% Cotton ...3-29 3.10. This is an example of the final feature selection process for 100% Cotton Woven, with acceptable distributed spacing set to a 35...3-40 4.1. Representative samples from the 12 class textile data set: 65% Polyester 35% Cotton Woven (red), 80% Nylon 20% Spandex Knit (green), 97

  6. Identification of landscape features influencing gene flow: How useful are habitat selection models?

    USGS Publications Warehouse

    Roffler, Gretchen H.; Schwartz, Michael K.; Pilgrim, Kristy L.; Talbot, Sandra L.; Sage, Kevin; Adams, Layne G.; Luikart, Gordon

    2016-01-01

    Understanding how dispersal patterns are influenced by landscape heterogeneity is critical for modeling species connectivity. Resource selection function (RSF) models are increasingly used in landscape genetics approaches. However, because the ecological factors that drive habitat selection may be different from those influencing dispersal and gene flow, it is important to consider explicit assumptions and spatial scales of measurement. We calculated pairwise genetic distance among 301 Dall's sheep (Ovis dalli dalli) in southcentral Alaska using an intensive noninvasive sampling effort and 15 microsatellite loci. We used multiple regression of distance matrices to assess the correlation of pairwise genetic distance and landscape resistance derived from an RSF, and combinations of landscape features hypothesized to influence dispersal. Dall's sheep gene flow was positively correlated with steep slopes, moderate peak normalized difference vegetation indices (NDVI), and open land cover. Whereas RSF covariates were significant in predicting genetic distance, the RSF model itself was not significantly correlated with Dall's sheep gene flow, suggesting that certain habitat features important during summer (rugged terrain, mid-range elevation) were not influential to effective dispersal. This work underscores that consideration of both habitat selection and landscape genetics models may be useful in developing management strategies to both meet the immediate survival of a species and allow for long-term genetic connectivity.

  7. Oral cancer prognosis based on clinicopathologic and genomic markers using a hybrid of feature selection and machine learning methods

    PubMed Central

    2013-01-01

    Background Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for the most skilful clinician to come out with an accurate prognosis by using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers. Results In the first stage of this research, five feature selection methods have been proposed and experimented on the oral cancer prognosis dataset. In the second stage, the model with the features selected from each feature selection methods are tested on the proposed classifiers. Four types of classifiers are chosen; these are namely, ANFIS, artificial neural network, support vector machine and logistic regression. A k-fold cross-validation is implemented on all types of classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for the oral cancer prognosis. Conclusions The results revealed that the prognosis is superior with the presence of both clinicopathologic and genomic markers. The selected features can be investigated further to validate the potential of becoming as significant prognostic signature in the oral cancer studies. PMID:23725313

  8. Computer aided diagnosis system for Alzheimer disease using brain diffusion tensor imaging features selected by Pearson's correlation.

    PubMed

    Graña, M; Termenon, M; Savio, A; Gonzalez-Pinto, A; Echeveste, J; Pérez, J M; Besga, A

    2011-09-20

    The aim of this paper is to obtain discriminant features from two scalar measures of Diffusion Tensor Imaging (DTI) data, Fractional Anisotropy (FA) and Mean Diffusivity (MD), and to train and test classifiers able to discriminate Alzheimer's Disease (AD) patients from controls on the basis of features extracted from the FA or MD volumes. In this study, support vector machine (SVM) classifier was trained and tested on FA and MD data. Feature selection is done computing the Pearson's correlation between FA or MD values at voxel site across subjects and the indicative variable specifying the subject class. Voxel sites with high absolute correlation are selected for feature extraction. Results are obtained over an on-going study in Hospital de Santiago Apostol collecting anatomical T1-weighted MRI volumes and DTI data from healthy control subjects and AD patients. FA features and a linear SVM classifier achieve perfect accuracy, sensitivity and specificity in several cross-validation studies, supporting the usefulness of DTI-derived features as an image-marker for AD and to the feasibility of building Computer Aided Diagnosis systems for AD based on them. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  9. Feature Transformation Detection Method with Best Spectral Band Selection Process for Hyper-spectral Imaging

    NASA Astrophysics Data System (ADS)

    Chen, Hai-Wen; McGurr, Mike; Brickhouse, Mark

    2015-11-01

    We present a newly developed feature transformation (FT) detection method for hyper-spectral imagery (HSI) sensors. In essence, the FT method, by transforming the original features (spectral bands) to a different feature domain, may considerably increase the statistical separation between the target and background probability density functions, and thus may significantly improve the target detection and identification performance, as evidenced by the test results in this paper. We show that by differentiating the original spectral, one can completely separate targets from the background using a single spectral band, leading to perfect detection results. In addition, we have proposed an automated best spectral band selection process with a double-threshold scheme that can rank the available spectral bands from the best to the worst for target detection. Finally, we have also proposed an automated cross-spectrum fusion process to further improve the detection performance in lower spectral range (<1000 nm) by selecting the best spectral band pair with multivariate analysis. Promising detection performance has been achieved using a small background material signature library for concept-proving, and has then been further evaluated and verified using a real background HSI scene collected by a HYDICE sensor.

  10. Is the emotion recognition deficit associated with frontotemporal dementia caused by selective inattention to diagnostic facial features?

    PubMed

    Oliver, Lindsay D; Virani, Karim; Finger, Elizabeth C; Mitchell, Derek G V

    2014-07-01

    Frontotemporal dementia (FTD) is a debilitating neurodegenerative disorder characterized by severely impaired social and emotional behaviour, including emotion recognition deficits. Though fear recognition impairments seen in particular neurological and developmental disorders can be ameliorated by reallocating attention to critical facial features, the possibility that similar benefits can be conferred to patients with FTD has yet to be explored. In the current study, we examined the impact of presenting distinct regions of the face (whole face, eyes-only, and eyes-removed) on the ability to recognize expressions of anger, fear, disgust, and happiness in 24 patients with FTD and 24 healthy controls. A recognition deficit was demonstrated across emotions by patients with FTD relative to controls. Crucially, removal of diagnostic facial features resulted in an appropriate decline in performance for both groups; furthermore, patients with FTD demonstrated a lack of disproportionate improvement in emotion recognition accuracy as a result of isolating critical facial features relative to controls. Thus, unlike some neurological and developmental disorders featuring amygdala dysfunction, the emotion recognition deficit observed in FTD is not likely driven by selective inattention to critical facial features. Patients with FTD also mislabelled negative facial expressions as happy more often than controls, providing further evidence for abnormalities in the representation of positive affect in FTD. This work suggests that the emotional expression recognition deficit associated with FTD is unlikely to be rectified by adjusting selective attention to diagnostic features, as has proven useful in other select disorders. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Coverage of Jade Goody's cervical cancer in UK newspapers: a missed opportunity for health promotion?

    PubMed Central

    2010-01-01

    Background It has been claimed that publicity surrounding popular celebrity Jade Goody's experience of cervical cancer will raise awareness about the disease. This study examines the content of newspaper articles covering her illness to consider whether 'mobilising information' which could encourage women to adopt risk-reducing and health promoting behaviours has been included. Methods Content analysis of 15 national newspapers published between August 2008 and April 2009 Findings In the extensive coverage of Goody's illness (527 articles in the 7 months of study) few newspaper articles included information that might make women more aware of the signs and symptoms or risk factors for the disease, or discussed the role of the human papilloma virus (HPV) and the recently introduced HPV vaccination programme to reduce the future incidence of cervical cancer. For example, less than 5% of articles mentioned well-known risk-factors for cervical cancer and less than 8% gave any information about HPV. The 'human interest' aspects of Goody's illness (her treatment, the spread of her disease in later months, her wedding, and her preparations for her children's future) were more extensively covered. Conclusions Newspaper coverage of Goody's illness has tended not to include factual or educational information that could mobilise or inform women, or help them to recognise early symptoms. However, the focus on personal tragedy may encourage women to be receptive to HPV vaccination or screening if her story acts as a reminder that cervical cancer can be a devastating and fatal disease in the longer term. PMID:20576115

  12. Fluoro jade-C staining in the assessment of brain injury after deep hypothermia circulatory arrest.

    PubMed

    Wang, Ren; Ma, Wei-Guo; Gao, Guo-Dong; Mao, Qun-Xia; Zheng, Jun; Sun, Li-Zhong; Liu, Ying-Long

    2011-02-04

    To evaluate the efficacy of Fluoro Jade-C staining (FJC) in the assessment of brain injury after deep hypothermia circulatory arrest (DHCA). Six healthy adult miniature male pigs underwent DHCA, the rectal temperature was down to 18°C, circulation was stopped , circulatory arrest was maintained for 60 minutes. On postoperative day 1, perfusion-fixation was performed on brain tissue. Cerebral cortex, hippocampus, cerebellum were taken for sampling. FJC, hematoxylin-eosin staining (HE), nissl staining (NISSL), terminal deoxynucleotidyl transferase biotin-dUTP nick end labeling (TUNEL) were performed to detect the histological and pathological changes. Histological scores of all slices were ranked. Comparison between the FJC and other techniques was done by analysis of variance (ANOVA) according to histological scores. All animals survived the operation. On the cerebral cortex, in comparison of FJC between HE, NISSL and TUNEL, the p value was 0.90, 0.40, 0.16 respectively (p>0.05). On the hippocampus, the comparison of FJC with HE, NISSL and TUNEL had a p value of 0.12, 0.23, 0.62 respectively (p>0.05). On the cerebellum, in comparing FJC with HE, NISSL and TUNEL, the p value was 0.96, 0.77, 0.96 respectively (p>0.05). On representative regions, the results of FJC were in accordance with that of TUNEL, NISSL and HE. Furthermore, ascertainment of brain injury is easier with FJC. FJC is a reliable and convenient method to assess brain injury after DHCA. Copyright © 2010 Elsevier B.V. All rights reserved.

  13. Harmony Search as a Powerful Tool for Feature Selection in QSPR Study of the Drugs Lipophilicity.

    PubMed

    Bahadori, Behnoosh; Atabati, Morteza

    2017-01-01

    Aims & Scope: Lipophilicity represents one of the most studied and most frequently used fundamental physicochemical properties. In the present work, harmony search (HS) algorithm is suggested to feature selection in quantitative structure-property relationship (QSPR) modeling to predict lipophilicity of neutral, acidic, basic and amphotheric drugs that were determined by UHPLC. Harmony search is a music-based metaheuristic optimization algorithm. It was affected by the observation that the aim of music is to search for a perfect state of harmony. Semi-empirical quantum-chemical calculations at AM1 level were used to find the optimum 3D geometry of the studied molecules and variant descriptors (1497 descriptors) were calculated by the Dragon software. The selected descriptors by harmony search algorithm (9 descriptors) were applied for model development using multiple linear regression (MLR). In comparison with other feature selection methods such as genetic algorithm and simulated annealing, harmony search algorithm has better results. The root mean square error (RMSE) with and without leave-one out cross validation (LOOCV) were obtained 0.417 and 0.302, respectively. The results were compared with those obtained from the genetic algorithm and simulated annealing methods and it showed that the HS is a helpful tool for feature selection with fine performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  14. Teaching the Sociology of Popular Music with the Help of Feature Films: A Selected and Annotated Videography.

    ERIC Educational Resources Information Center

    Groce, Stephen B.

    1992-01-01

    Discusses the use of feature films for courses on popular culture and the sociology of popular music. Suggests that films can illustrate topics such as culture, social groups, deviant behavior, racism, and sexism. Lists a selection of Hollywood feature films with accompanying readings and students' evaluations. (DK)

  15. Sparse generalized linear model with L0 approximation for feature selection and prediction with big omics data.

    PubMed

    Liu, Zhenqiu; Sun, Fengzhu; McGovern, Dermot P

    2017-01-01

    Feature selection and prediction are the most important tasks for big data mining. The common strategies for feature selection in big data mining are L 1 , SCAD and MC+. However, none of the existing algorithms optimizes L 0 , which penalizes the number of nonzero features directly. In this paper, we develop a novel sparse generalized linear model (GLM) with L 0 approximation for feature selection and prediction with big omics data. The proposed approach approximate the L 0 optimization directly. Even though the original L 0 problem is non-convex, the problem is approximated by sequential convex optimizations with the proposed algorithm. The proposed method is easy to implement with only several lines of code. Novel adaptive ridge algorithms ( L 0 ADRIDGE) for L 0 penalized GLM with ultra high dimensional big data are developed. The proposed approach outperforms the other cutting edge regularization methods including SCAD and MC+ in simulations. When it is applied to integrated analysis of mRNA, microRNA, and methylation data from TCGA ovarian cancer, multilevel gene signatures associated with suboptimal debulking are identified simultaneously. The biological significance and potential clinical importance of those genes are further explored. The developed Software L 0 ADRIDGE in MATLAB is available at https://github.com/liuzqx/L0adridge.

  16. Pattern Classification Using an Olfactory Model with PCA Feature Selection in Electronic Noses: Study and Application

    PubMed Central

    Fu, Jun; Huang, Canqin; Xing, Jianguo; Zheng, Junbao

    2012-01-01

    Biologically-inspired models and algorithms are considered as promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model with the increase of the dimensions of input feature vector (outer factor) as well as its parallel channels (inner factor). The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets of three classes of wine derived from different cultivars and five classes of green tea derived from five different provinces of China were used for experiments. In the former case the results showed that the average correct classification rate increased as more principal components were put in to feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We concluded that 6∼8 channels of the model with principal component feature vector values of at least 90% cumulative variance is adequate for a classification task of 3∼5 pattern classes considering the trade-off between time consumption and classification rate. PMID:22736979

  17. Recurrence predictive models for patients with hepatocellular carcinoma after radiofrequency ablation using support vector machines with feature selection methods.

    PubMed

    Liang, Ja-Der; Ping, Xiao-Ou; Tseng, Yi-Ju; Huang, Guan-Tarn; Lai, Feipei; Yang, Pei-Ming

    2014-12-01

    Recurrence of hepatocellular carcinoma (HCC) is an important issue despite effective treatments with tumor eradication. Identification of patients who are at high risk for recurrence may provide more efficacious screening and detection of tumor recurrence. The aim of this study was to develop recurrence predictive models for HCC patients who received radiofrequency ablation (RFA) treatment. From January 2007 to December 2009, 83 newly diagnosed HCC patients receiving RFA as their first treatment were enrolled. Five feature selection methods including genetic algorithm (GA), simulated annealing (SA) algorithm, random forests (RF) and hybrid methods (GA+RF and SA+RF) were utilized for selecting an important subset of features from a total of 16 clinical features. These feature selection methods were combined with support vector machine (SVM) for developing predictive models with better performance. Five-fold cross-validation was used to train and test SVM models. The developed SVM-based predictive models with hybrid feature selection methods and 5-fold cross-validation had averages of the sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and area under the ROC curve as 67%, 86%, 82%, 69%, 90%, and 0.69, respectively. The SVM derived predictive model can provide suggestive high-risk recurrent patients, who should be closely followed up after complete RFA treatment. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  18. Gene features selection for three-class disease classification via multiple orthogonal partial least square discriminant analysis and S-plot using microarray data.

    PubMed

    Yang, Mingxing; Li, Xiumin; Li, Zhibin; Ou, Zhimin; Liu, Ming; Liu, Suhuan; Li, Xuejun; Yang, Shuyu

    2013-01-01

    DNA microarray analysis is characterized by obtaining a large number of gene variables from a small number of observations. Cluster analysis is widely used to analyze DNA microarray data to make classification and diagnosis of disease. Because there are so many irrelevant and insignificant genes in a dataset, a feature selection approach must be employed in data analysis. The performance of cluster analysis of this high-throughput data depends on whether the feature selection approach chooses the most relevant genes associated with disease classes. Here we proposed a new method using multiple Orthogonal Partial Least Squares-Discriminant Analysis (mOPLS-DA) models and S-plots to select the most relevant genes to conduct three-class disease classification and prediction. We tested our method using Golub's leukemia microarray data. For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for two main classes and their subtypes. For three classes in parallel, we employed three OPLS-DA models and S-plots to choose marker genes for each class. The power of feature selection to classify and predict three-class disease was evaluated using cluster analysis. Further, the general performance of our method was tested using four public datasets and compared with those of four other feature selection methods. The results revealed that our method effectively selected the most relevant features for disease classification and prediction, and its performance was better than that of the other methods.

  19. A comparison of three feature selection methods for object-based classification of sub-decimeter resolution UltraCam-L imagery

    USDA-ARS?s Scientific Manuscript database

    The availability of numerous spectral, spatial, and contextual features with object-based image analysis (OBIA) renders the selection of optimal features a time consuming and subjective process. While several feature election methods have been used in conjunction with OBIA, a robust comparison of th...

  20. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data.

    PubMed

    Wang, Shuaiqun; Aorigele; Kong, Wei; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features over a large number of gene expression data. Lately, many researchers devote themselves to feature selection using diverse computational intelligence methods. However, in the progress of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm HICATS incorporating imperialist competition algorithm (ICA) which performs global search and tabu search (TS) that conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works including the conventional version of binary optimization algorithm in terms of classification accuracy and the number of selected genes.

  1. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data

    PubMed Central

    Aorigele; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features over a large number of gene expression data. Lately, many researchers devote themselves to feature selection using diverse computational intelligence methods. However, in the progress of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm HICATS incorporating imperialist competition algorithm (ICA) which performs global search and tabu search (TS) that conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works including the conventional version of binary optimization algorithm in terms of classification accuracy and the number of selected genes. PMID:27579323

  2. The application of feature selection to the development of Gaussian process models for percutaneous absorption.

    PubMed

    Lam, Lun Tak; Sun, Yi; Davey, Neil; Adams, Rod; Prapopoulou, Maria; Brown, Marc B; Moss, Gary P

    2010-06-01

    The aim was to employ Gaussian processes to assess mathematically the nature of a skin permeability dataset and to employ these methods, particularly feature selection, to determine the key physicochemical descriptors which exert the most significant influence on percutaneous absorption, and to compare such models with established existing models. Gaussian processes, including automatic relevance detection (GPRARD) methods, were employed to develop models of percutaneous absorption that identified key physicochemical descriptors of percutaneous absorption. Using MatLab software, the statistical performance of these models was compared with single linear networks (SLN) and quantitative structure-permeability relationships (QSPRs). Feature selection methods were used to examine in more detail the physicochemical parameters used in this study. A range of statistical measures to determine model quality were used. The inherently nonlinear nature of the skin data set was confirmed. The Gaussian process regression (GPR) methods yielded predictive models that offered statistically significant improvements over SLN and QSPR models with regard to predictivity (where the rank order was: GPR > SLN > QSPR). Feature selection analysis determined that the best GPR models were those that contained log P, melting point and the number of hydrogen bond donor groups as significant descriptors. Further statistical analysis also found that great synergy existed between certain parameters. It suggested that a number of the descriptors employed were effectively interchangeable, thus questioning the use of models where discrete variables are output, usually in the form of an equation. The use of a nonlinear GPR method produced models with significantly improved predictivity, compared with SLN or QSPR models. Feature selection methods were able to provide important mechanistic information. However, it was also shown that significant synergy existed between certain parameters, and as such it

  3. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection

    PubMed Central

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Objectives Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Methods Accident-related autopsy reports were obtained from one of the largest hospital in Kuala Lumpur. These reports belong to nine different accident-related causes of death. Master feature vector was prepared by extracting features from the collected autopsy reports by using unigram with lexical categorization. This master feature vector was used to detect cause of death [according to internal classification of disease version 10 (ICD-10) classification system] through five automated feature selection schemes, proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under ROC curve. Four baselines were used to compare the results with the proposed system. Results Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measure approaching (85% to 90%) for most metrics by using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in the overall accuracy compared with the existing techniques and four baselines. Conclusion The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying

  4. A method for fast selecting feature wavelengths from the spectral information of crop nitrogen

    USDA-ARS?s Scientific Manuscript database

    Research on a method for fast selecting feature wavelengths from the nitrogen spectral information is necessary, which can determine the nitrogen content of crops. Based on the uniformity of uniform design, this paper proposed an improved particle swarm optimization (PSO) method. The method can ch...

  5. Stimulus features coded by single neurons of a macaque body category selective patch.

    PubMed

    Popivanov, Ivo D; Schyns, Philippe G; Vogels, Rufin

    2016-04-26

    Body category-selective regions of the primate temporal cortex respond to images of bodies, but it is unclear which fragments of such images drive single neurons' responses in these regions. Here we applied the Bubbles technique to the responses of single macaque middle superior temporal sulcus (midSTS) body patch neurons to reveal the image fragments the neurons respond to. We found that local image fragments such as extremities (limbs), curved boundaries, and parts of the torso drove the large majority of neurons. Bubbles revealed the whole body in only a few neurons. Neurons coded the features in a manner that was tolerant to translation and scale changes. Most image fragments were excitatory but for a few neurons both inhibitory and excitatory fragments (opponent coding) were present in the same image. The fragments we reveal here in the body patch with Bubbles differ from those suggested in previous studies of face-selective neurons in face patches. Together, our data indicate that the majority of body patch neurons respond to local image fragments that occur frequently, but not exclusively, in bodies, with a coding that is tolerant to translation and scale. Overall, the data suggest that the body category selectivity of the midSTS body patch depends more on the feature statistics of bodies (e.g., extensions occur more frequently in bodies) than on semantics (bodies as an abstract category).

  6. Stimulus features coded by single neurons of a macaque body category selective patch

    PubMed Central

    Popivanov, Ivo D.; Schyns, Philippe G.; Vogels, Rufin

    2016-01-01

    Body category-selective regions of the primate temporal cortex respond to images of bodies, but it is unclear which fragments of such images drive single neurons’ responses in these regions. Here we applied the Bubbles technique to the responses of single macaque middle superior temporal sulcus (midSTS) body patch neurons to reveal the image fragments the neurons respond to. We found that local image fragments such as extremities (limbs), curved boundaries, and parts of the torso drove the large majority of neurons. Bubbles revealed the whole body in only a few neurons. Neurons coded the features in a manner that was tolerant to translation and scale changes. Most image fragments were excitatory but for a few neurons both inhibitory and excitatory fragments (opponent coding) were present in the same image. The fragments we reveal here in the body patch with Bubbles differ from those suggested in previous studies of face-selective neurons in face patches. Together, our data indicate that the majority of body patch neurons respond to local image fragments that occur frequently, but not exclusively, in bodies, with a coding that is tolerant to translation and scale. Overall, the data suggest that the body category selectivity of the midSTS body patch depends more on the feature statistics of bodies (e.g., extensions occur more frequently in bodies) than on semantics (bodies as an abstract category). PMID:27071095

  7. Identifying Patients with Atrioventricular Septal Defect in Down Syndrome Populations by Using Self-Normalizing Neural Networks and Feature Selection.

    PubMed

    Pan, Xiaoyong; Hu, Xiaohua; Zhang, Yu Hang; Feng, Kaiyan; Wang, Shao Peng; Chen, Lei; Huang, Tao; Cai, Yu Dong

    2018-04-12

    Atrioventricular septal defect (AVSD) is a clinically significant subtype of congenital heart disease (CHD) that severely influences the health of babies during birth and is associated with Down syndrome (DS). Thus, exploring the differences in functional genes in DS samples with and without AVSD is a critical way to investigate the complex association between AVSD and DS. In this study, we present a computational method to distinguish DS patients with AVSD from those without AVSD using the newly proposed self-normalizing neural network (SNN). First, each patient was encoded by using the copy number of probes on chromosome 21. The encoded features were ranked by the reliable Monte Carlo feature selection (MCFS) method to obtain a ranked feature list. Based on this feature list, we used a two-stage incremental feature selection to construct two series of feature subsets and applied SNNs to build classifiers to identify optimal features. Results show that 2737 optimal features were obtained, and the corresponding optimal SNN classifier constructed on optimal features yielded a Matthew's correlation coefficient (MCC) value of 0.748. For comparison, random forest was also used to build classifiers and uncover optimal features. This method received an optimal MCC value of 0.582 when top 132 features were utilized. Finally, we analyzed some key features derived from the optimal features in SNNs found in literature support to further reveal their essential roles.

  8. The Influence of Selected Personality and Workplace Features on Burnout among Nurse Academics

    ERIC Educational Resources Information Center

    Kizilci, Sevgi; Erdogan, Vesile; Sozen, Emine

    2012-01-01

    This study aimed to determine the influence of selected individual and situational features on burnout among nurse academics. The Maslach Burnout Inventory was used to assess the burnout levels of academics. The sample population comprised 94 female participant. The emotion exhaustion (EE) score of the nurse academics was 16.43[plus or minus]5.97,…

  9. QSRR modeling for diverse drugs using different feature selection methods coupled with linear and nonlinear regressions.

    PubMed

    Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan

    2012-12-01

    A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere poly butadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental, i.e. extrapolated to a mobile phase consisting of pure water, and predicted logarithms of the retention factors of the drugs (logk(w)). The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. Prediction of active sites of enzymes by maximum relevance minimum redundancy (mRMR) feature selection.

    PubMed

    Gao, Yu-Fei; Li, Bi-Qing; Cai, Yu-Dong; Feng, Kai-Yan; Li, Zhan-Dong; Jiang, Yang

    2013-01-27

    Identification of catalytic residues plays a key role in understanding how enzymes work. Although numerous computational methods have been developed to predict catalytic residues and active sites, the prediction accuracy remains relatively low with high false positives. In this work, we developed a novel predictor based on the Random Forest algorithm (RF) aided by the maximum relevance minimum redundancy (mRMR) method and incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility to predict active sites of enzymes and achieved an overall accuracy of 0.885687 and MCC of 0.689226 on an independent test dataset. Feature analysis showed that every category of the features except disorder contributed to the identification of active sites. It was also shown via the site-specific feature analysis that the features derived from the active site itself contributed most to the active site determination. Our prediction method may become a useful tool for identifying the active sites and the key features identified by the paper may provide valuable insights into the mechanism of catalysis.

  11. Functional connectivity supporting the selective maintenance of feature-location binding in visual working memory

    PubMed Central

    Takahama, Sachiko; Saiki, Jun

    2014-01-01

    Information on an object's features bound to its location is very important for maintaining object representations in visual working memory. Interactions with dynamic multi-dimensional objects in an external environment require complex cognitive control, including the selective maintenance of feature-location binding. Here, we used event-related functional magnetic resonance imaging to investigate brain activity and functional connectivity related to the maintenance of complex feature-location binding. Participants were required to detect task-relevant changes in feature-location binding between objects defined by color, orientation, and location. We compared a complex binding task requiring complex feature-location binding (color-orientation-location) with a simple binding task in which simple feature-location binding, such as color-location, was task-relevant and the other feature was task-irrelevant. Univariate analyses showed that the dorsolateral prefrontal cortex (DLPFC), hippocampus, and frontoparietal network were activated during the maintenance of complex feature-location binding. Functional connectivity analyses indicated cooperation between the inferior precentral sulcus (infPreCS), DLPFC, and hippocampus during the maintenance of complex feature-location binding. In contrast, the connectivity for the spatial updating of simple feature-location binding determined by reanalyzing the data from Takahama et al. (2010) demonstrated that the superior parietal lobule (SPL) cooperated with the DLPFC and hippocampus. These results suggest that the connectivity for complex feature-location binding does not simply reflect general memory load and that the DLPFC and hippocampus flexibly modulate the dorsal frontoparietal network, depending on the task requirements, with the infPreCS involved in the maintenance of complex feature-location binding and the SPL involved in the spatial updating of simple feature-location binding. PMID:24917833

  12. Functional connectivity supporting the selective maintenance of feature-location binding in visual working memory.

    PubMed

    Takahama, Sachiko; Saiki, Jun

    2014-01-01

    Information on an object's features bound to its location is very important for maintaining object representations in visual working memory. Interactions with dynamic multi-dimensional objects in an external environment require complex cognitive control, including the selective maintenance of feature-location binding. Here, we used event-related functional magnetic resonance imaging to investigate brain activity and functional connectivity related to the maintenance of complex feature-location binding. Participants were required to detect task-relevant changes in feature-location binding between objects defined by color, orientation, and location. We compared a complex binding task requiring complex feature-location binding (color-orientation-location) with a simple binding task in which simple feature-location binding, such as color-location, was task-relevant and the other feature was task-irrelevant. Univariate analyses showed that the dorsolateral prefrontal cortex (DLPFC), hippocampus, and frontoparietal network were activated during the maintenance of complex feature-location binding. Functional connectivity analyses indicated cooperation between the inferior precentral sulcus (infPreCS), DLPFC, and hippocampus during the maintenance of complex feature-location binding. In contrast, the connectivity for the spatial updating of simple feature-location binding determined by reanalyzing the data from Takahama et al. (2010) demonstrated that the superior parietal lobule (SPL) cooperated with the DLPFC and hippocampus. These results suggest that the connectivity for complex feature-location binding does not simply reflect general memory load and that the DLPFC and hippocampus flexibly modulate the dorsal frontoparietal network, depending on the task requirements, with the infPreCS involved in the maintenance of complex feature-location binding and the SPL involved in the spatial updating of simple feature-location binding.

  13. Feature selection using angle modulated simulated Kalman filter for peak classification of EEG signals.

    PubMed

    Adam, Asrul; Ibrahim, Zuwairie; Mokhtar, Norrima; Shapiai, Mohd Ibrahim; Mubin, Marizan; Saad, Ismail

    2016-01-01

    In the existing electroencephalogram (EEG) signals peak classification research, the existing models, such as Dumpala, Acir, Liu, and Dingle peak models, employ different set of features. However, all these models may not be able to offer good performance for various applications and it is found to be problem dependent. Therefore, the objective of this study is to combine all the associated features from the existing models before selecting the best combination of features. A new optimization algorithm, namely as angle modulated simulated Kalman filter (AMSKF) will be employed as feature selector. Also, the neural network random weight method is utilized in the proposed AMSKF technique as a classifier. In the conducted experiment, 11,781 samples of peak candidate are employed in this study for the validation purpose. The samples are collected from three different peak event-related EEG signals of 30 healthy subjects; (1) single eye blink, (2) double eye blink, and (3) eye movement signals. The experimental results have shown that the proposed AMSKF feature selector is able to find the best combination of features and performs at par with the existing related studies of epileptic EEG events classification.

  14. Prediction of human disease-associated phosphorylation sites with combined feature selection approach and support vector machine.

    PubMed

    Xu, Xiaoyi; Li, Ao; Wang, Minghui

    2015-08-01

    Phosphorylation is a crucial post-translational modification, which regulates almost all cellular processes in life. It has long been recognised that protein phosphorylation has close relationship with diseases, and therefore many researches are undertaken to predict phosphorylation sites for disease treatment and drug design. However, despite the success achieved by these approaches, no method focuses on disease-associated phosphorylation sites prediction. Herein, for the first time the authors propose a novel approach that is specially designed to identify associations between phosphorylation sites and human diseases. To take full advantage of local sequence information, a combined feature selection method-based support vector machine (CFS-SVM) that incorporates minimum-redundancy-maximum-relevance filtering process and forward feature selection process is developed. Performance evaluation shows that CFS-SVM is significantly better than the widely used classifiers including Bayesian decision theory, k nearest neighbour and random forest. With the extremely high specificity of 99%, CFS-SVM can still achieve a high sensitivity. Besides, tests on extra data confirm the effectiveness and general applicability of CFS-SVM approach on a variety of diseases. Finally, the analysis of selected features and corresponding kinases also help the understanding of the potential mechanism of disease-phosphorylation relationships and guide further experimental validations.

  15. Low-Complexity Discriminative Feature Selection From EEG Before and After Short-Term Memory Task.

    PubMed

    Behzadfar, Neda; Firoozabadi, S Mohammad P; Badie, Kambiz

    2016-10-01

    A reliable and unobtrusive quantification of changes in cortical activity during short-term memory task can be used to evaluate the efficacy of interfaces and to provide real-time user-state information. In this article, we investigate changes in electroencephalogram signals in short-term memory with respect to the baseline activity. The electroencephalogram signals have been analyzed using 9 linear and nonlinear/dynamic measures. We applied statistical Wilcoxon examination and Davis-Bouldian criterion to select optimal discriminative features. The results show that among the features, the permutation entropy significantly increased in frontal lobe and the occipital second lower alpha band activity decreased during memory task. These 2 features reflect the same mental task; however, their correlation with memory task varies in different intervals. In conclusion, it is suggested that the combination of the 2 features would improve the performance of memory based neurofeedback systems. © EEG and Clinical Neuroscience Society (ECNS) 2016.

  16. Multiple-output support vector machine regression with feature selection for arousal/valence space emotion assessment.

    PubMed

    Torres-Valencia, Cristian A; Álvarez, Mauricio A; Orozco-Gutiérrez, Alvaro A

    2014-01-01

    Human emotion recognition (HER) allows the assessment of an affective state of a subject. Until recently, such emotional states were described in terms of discrete emotions, like happiness or contempt. In order to cover a high range of emotions, researchers in the field have introduced different dimensional spaces for emotion description that allow the characterization of affective states in terms of several variables or dimensions that measure distinct aspects of the emotion. One of the most common of such dimensional spaces is the bidimensional Arousal/Valence space. To the best of our knowledge, all HER systems so far have modelled independently, the dimensions in these dimensional spaces. In this paper, we study the effect of modelling the output dimensions simultaneously and show experimentally the advantages in modeling them in this way. We consider a multimodal approach by including features from the Electroencephalogram and a few physiological signals. For modelling the multiple outputs, we employ a multiple output regressor based on support vector machines. We also include an stage of feature selection that is developed within an embedded approach known as Recursive Feature Elimination (RFE), proposed initially for SVM. The results show that several features can be eliminated using the multiple output support vector regressor with RFE without affecting the performance of the regressor. From the analysis of the features selected in smaller subsets via RFE, it can be observed that the signals that are more informative into the arousal and valence space discrimination are the EEG, Electrooculogram/Electromiogram (EOG/EMG) and the Galvanic Skin Response (GSR).

  17. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    PubMed Central

    2010-01-01

    Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http

  18. Vibration and acoustic frequency spectra for industrial process modeling using selective fusion multi-condition samples and multi-source features

    NASA Astrophysics Data System (ADS)

    Tang, Jian; Qiao, Junfei; Wu, ZhiWei; Chai, Tianyou; Zhang, Jian; Yu, Wen

    2018-01-01

    Frequency spectral data of mechanical vibration and acoustic signals relate to difficult-to-measure production quality and quantity parameters of complex industrial processes. A selective ensemble (SEN) algorithm can be used to build a soft sensor model of these process parameters by fusing valued information selectively from different perspectives. However, a combination of several optimized ensemble sub-models with SEN cannot guarantee the best prediction model. In this study, we use several techniques to construct mechanical vibration and acoustic frequency spectra of a data-driven industrial process parameter model based on selective fusion multi-condition samples and multi-source features. Multi-layer SEN (MLSEN) strategy is used to simulate the domain expert cognitive process. Genetic algorithm and kernel partial least squares are used to construct the inside-layer SEN sub-model based on each mechanical vibration and acoustic frequency spectral feature subset. Branch-and-bound and adaptive weighted fusion algorithms are integrated to select and combine outputs of the inside-layer SEN sub-models. Then, the outside-layer SEN is constructed. Thus, "sub-sampling training examples"-based and "manipulating input features"-based ensemble construction methods are integrated, thereby realizing the selective information fusion process based on multi-condition history samples and multi-source input features. This novel approach is applied to a laboratory-scale ball mill grinding process. A comparison with other methods indicates that the proposed MLSEN approach effectively models mechanical vibration and acoustic signals.

  19. Selective attention to a facial feature with and without facial context: an ERP-study.

    PubMed

    Wijers, A A; Van Besouw, N J P; Mulder, G

    2002-04-01

    The present experiment addressed the question whether selectively attending to a facial feature (mouth shape) would benefit from the presence of a correct facial context. Subjects attended selectively to one of two possible mouth shapes belonging to photographs of a face with a happy or sad expression, respectively. These mouths were presented randomly either in isolation, embedded in the original photos, or in an exchanged facial context. The ERP effect of attending mouth shape was a lateral posterior negativity, anterior positivity with an onset latency of 160-200 ms; this effect was completely unaffected by the type of facial context. When the mouth shape and the facial context conflicted, this resulted in a medial parieto-occipital positivity with an onset latency of 180 ms, independent of the relevance of the mouth shape. Finally, there was a late (onset at approx. 400 ms) expression (happy vs. sad) effect, which was strongly lateralized to the right posterior hemisphere and was most prominent for attended stimuli in the correct facial context. For the isolated mouth stimuli, a similarly distributed expression effect was observed at an earlier latency range (180-240 ms). These data suggest the existence of separate, independent and neuroanatomically segregated processors engaged in the selective processing of facial features and the detection of contextual congruence and emotional expression of face stimuli. The data do not support that early selective attention processes benefit from top-down constraints provided by the correct facial context.

  20. A comprehensive analysis of earthquake damage patterns using high dimensional model representation feature selection

    NASA Astrophysics Data System (ADS)

    Taşkin Kaya, Gülşen

    2013-10-01

    Recently, earthquake damage assessment using satellite images has been a very popular ongoing research direction. Especially with the availability of very high resolution (VHR) satellite images, a quite detailed damage map based on building scale has been produced, and various studies have also been conducted in the literature. As the spatial resolution of satellite images increases, distinguishability of damage patterns becomes more cruel especially in case of using only the spectral information during classification. In order to overcome this difficulty, textural information needs to be involved to the classification to improve the visual quality and reliability of damage map. There are many kinds of textural information which can be derived from VHR satellite images depending on the algorithm used. However, extraction of textural information and evaluation of them have been generally a time consuming process especially for the large areas affected from the earthquake due to the size of VHR image. Therefore, in order to provide a quick damage map, the most useful features describing damage patterns needs to be known in advance as well as the redundant features. In this study, a very high resolution satellite image after Iran, Bam earthquake was used to identify the earthquake damage. Not only the spectral information, textural information was also used during the classification. For textural information, second order Haralick features were extracted from the panchromatic image for the area of interest using gray level co-occurrence matrix with different size of windows and directions. In addition to using spatial features in classification, the most useful features representing the damage characteristic were selected with a novel feature selection method based on high dimensional model representation (HDMR) giving sensitivity of each feature during classification. The method called HDMR was recently proposed as an efficient tool to capture the input

  1. Novel Mahalanobis-based feature selection improves one-class classification of early hepatocellular carcinoma.

    PubMed

    Thomaz, Ricardo de Lima; Carneiro, Pedro Cunha; Bonin, João Eliton; Macedo, Túlio Augusto Alves; Patrocinio, Ana Claudia; Soares, Alcimar Barbosa

    2018-05-01

    Detection of early hepatocellular carcinoma (HCC) is responsible for increasing survival rates in up to 40%. One-class classifiers can be used for modeling early HCC in multidetector computed tomography (MDCT), but demand the specific knowledge pertaining to the set of features that best describes the target class. Although the literature outlines several features for characterizing liver lesions, it is unclear which is most relevant for describing early HCC. In this paper, we introduce an unconstrained GA feature selection algorithm based on a multi-objective Mahalanobis fitness function to improve the classification performance for early HCC. We compared our approach to a constrained Mahalanobis function and two other unconstrained functions using Welch's t-test and Gaussian Data Descriptors. The performance of each fitness function was evaluated by cross-validating a one-class SVM. The results show that the proposed multi-objective Mahalanobis fitness function is capable of significantly reducing data dimensionality (96.4%) and improving one-class classification of early HCC (0.84 AUC). Furthermore, the results provide strong evidence that intensity features extracted at the arterial to portal and arterial to equilibrium phases are important for classifying early HCC.

  2. iFER: facial expression recognition using automatically selected geometric eye and eyebrow features

    NASA Astrophysics Data System (ADS)

    Oztel, Ismail; Yolcu, Gozde; Oz, Cemil; Kazan, Serap; Bunyak, Filiz

    2018-03-01

    Facial expressions have an important role in interpersonal communications and estimation of emotional states or intentions. Automatic recognition of facial expressions has led to many practical applications and became one of the important topics in computer vision. We present a facial expression recognition system that relies on geometry-based features extracted from eye and eyebrow regions of the face. The proposed system detects keypoints on frontal face images and forms a feature set using geometric relationships among groups of detected keypoints. Obtained feature set is refined and reduced using the sequential forward selection (SFS) algorithm and fed to a support vector machine classifier to recognize five facial expression classes. The proposed system, iFER (eye-eyebrow only facial expression recognition), is robust to lower face occlusions that may be caused by beards, mustaches, scarves, etc. and lower face motion during speech production. Preliminary experiments on benchmark datasets produced promising results outperforming previous facial expression recognition studies using partial face features, and comparable results to studies using whole face information, only slightly lower by ˜ 2.5 % compared to the best whole face facial recognition system while using only ˜ 1 / 3 of the facial region.

  3. Support Vector Feature Selection for Early Detection of Anastomosis Leakage From Bag-of-Words in Electronic Health Records.

    PubMed

    Soguero-Ruiz, Cristina; Hindberg, Kristian; Rojo-Alvarez, Jose Luis; Skrovseth, Stein Olav; Godtliebsen, Fred; Mortensen, Kim; Revhaug, Arthur; Lindsetmo, Rolv-Ole; Augestad, Knut Magne; Jenssen, Robert

    2016-09-01

    The free text in electronic health records (EHRs) conveys a huge amount of clinical information about health state and patient history. Despite a rapidly growing literature on the use of machine learning techniques for extracting this information, little effort has been invested toward feature selection and the features' corresponding medical interpretation. In this study, we focus on the task of early detection of anastomosis leakage (AL), a severe complication after elective surgery for colorectal cancer (CRC) surgery, using free text extracted from EHRs. We use a bag-of-words model to investigate the potential for feature selection strategies. The purpose is earlier detection of AL and prediction of AL with data generated in the EHR before the actual complication occur. Due to the high dimensionality of the data, we derive feature selection strategies using the robust support vector machine linear maximum margin classifier, by investigating: 1) a simple statistical criterion (leave-one-out-based test); 2) an intensive-computation statistical criterion (Bootstrap resampling); and 3) an advanced statistical criterion (kernel entropy). Results reveal a discriminatory power for early detection of complications after CRC (sensitivity 100%; specificity 72%). These results can be used to develop prediction models, based on EHR data, that can support surgeons and patients in the preoperative decision making phase.

  4. A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets.

    PubMed

    Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco

    2017-09-01

    Imbalanced classification is related to those problems that have an uneven distribution among classes. In addition to the former, when instances are located into the overlapped areas, the correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this research, we overcome these problems by carrying out a combination between feature and instance selections. Feature selection will allow simplifying the overlapping areas easing the generation of rules to distinguish among the classes. Selection of instances from all classes will address the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embedded the searching for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as baseline classifier in this wrapper approach. The multi-objective scheme allows taking a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted versus several state-of-the-art solutions on imbalanced classification showing excellent results in both binary and multi-class problems.

  5. DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues.

    PubMed

    Ma, Xin; Guo, Jing; Sun, Xiao

    2016-01-01

    DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.

  6. Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction

    PubMed Central

    O'Boyle, Noel M; Palmer, David S; Nigsch, Florian; Mitchell, John BO

    2008-01-01

    Background We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC), that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR) models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024–1029). We test the ability of the algorithm to develop a predictive partial least squares model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581–590) of melting point values. We also test its ability to perform feature selection on a support vector machine model for the same dataset. Results Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors which has an RMSE on an external test set of 46.6°C and R2 of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, ε of 0.21) and an RMSE of 45.1°C and R2 of 0.54. This model outperforms a kNN model (RMSE of 48.3°C, R2 of 0.47) for the same data and has similar performance to a Random Forest model (RMSE of 44.5°C, R2 of 0.55). However it is much less prone to bias at the extremes of the range of melting points as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest. Conclusion With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously. The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors. PMID:18959785

  7. Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction.

    PubMed

    O'Boyle, Noel M; Palmer, David S; Nigsch, Florian; Mitchell, John Bo

    2008-10-29

    We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC), that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR) models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024-1029). We test the ability of the algorithm to develop a predictive partial least squares model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581-590) of melting point values. We also test its ability to perform feature selection on a support vector machine model for the same dataset. Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors which has an RMSE on an external test set of 46.6 degrees C and R2 of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, epsilon of 0.21) and an RMSE of 45.1 degrees C and R2 of 0.54. This model outperforms a kNN model (RMSE of 48.3 degrees C, R2 of 0.47) for the same data and has similar performance to a Random Forest model (RMSE of 44.5 degrees C, R2 of 0.55). However it is much less prone to bias at the extremes of the range of melting points as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest. With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously. The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors.

  8. Feature Selection and Pedestrian Detection Based on Sparse Representation.

    PubMed

    Yao, Shihong; Wang, Tao; Shen, Weiming; Pan, Shaoming; Chong, Yanwen; Ding, Fei

    2015-01-01

    Pedestrian detection have been currently devoted to the extraction of effective pedestrian features, which has become one of the obstacles in pedestrian detection application according to the variety of pedestrian features and their large dimension. Based on the theoretical analysis of six frequently-used features, SIFT, SURF, Haar, HOG, LBP and LSS, and their comparison with experimental results, this paper screens out the sparse feature subsets via sparse representation to investigate whether the sparse subsets have the same description abilities and the most stable features. When any two of the six features are fused, the fusion feature is sparsely represented to obtain its important components. Sparse subsets of the fusion features can be rapidly generated by avoiding calculation of the corresponding index of dimension numbers of these feature descriptors; thus, the calculation speed of the feature dimension reduction is improved and the pedestrian detection time is reduced. Experimental results show that sparse feature subsets are capable of keeping the important components of these six feature descriptors. The sparse features of HOG and LSS possess the same description ability and consume less time compared with their full features. The ratios of the sparse feature subsets of HOG and LSS to their full sets are the highest among the six, and thus these two features can be used to best describe the characteristics of the pedestrian and the sparse feature subsets of the combination of HOG-LSS show better distinguishing ability and parsimony.

  9. Gene Expression Network Reconstruction by Convex Feature Selection when Incorporating Genetic Perturbations

    PubMed Central

    Logsdon, Benjamin A.; Mezey, Jason

    2010-01-01

    Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any algorithms designed for network discovery that make use of directed probabilistic graphs require perturbations, produced by either experiments or naturally occurring genetic variation, to successfully infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression Quantitative Trait Loci (cis-eQTL), which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks with thirty genes and thirty cis-eQTL, with rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae when analyzing genome-wide gene expression data for an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1), and genes involved in endocytosis (RCY1), the spindle checkpoint (BUB2), sulfonate catabolism (JLP1), and cell-cell communication (PRM7). Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population level genetic and gene expression data. PMID:21152011

  10. Analysis of the GRNs Inference by Using Tsallis Entropy and a Feature Selection Approach

    NASA Astrophysics Data System (ADS)

    Lopes, Fabrício M.; de Oliveira, Evaldo A.; Cesar, Roberto M.

    An important problem in the bioinformatics field is to understand how genes are regulated and interact through gene networks. This knowledge can be helpful for many applications, such as disease treatment design and drugs creation purposes. For this reason, it is very important to uncover the functional relationship among genes and then to construct the gene regulatory network (GRN) from temporal expression data. However, this task usually involves data with a large number of variables and small number of observations. In this way, there is a strong motivation to use pattern recognition and dimensionality reduction approaches. In particular, feature selection is specially important in order to select the most important predictor genes that can explain some phenomena associated with the target genes. This work presents a first study about the sensibility of entropy methods regarding the entropy functional form, applied to the problem of topology recovery of GRNs. The generalized entropy proposed by Tsallis is used to study this sensibility. The inference process is based on a feature selection approach, which is applied to simulated temporal expression data generated by an artificial gene network (AGN) model. The inferred GRNs are validated in terms of global network measures. Some interesting conclusions can be drawn from the experimental results, as reported for the first time in the present paper.

  11. PREAL: prediction of allergenic protein by maximum Relevance Minimum Redundancy (mRMR) feature selection

    PubMed Central

    2013-01-01

    Background Assessment of potential allergenicity of protein is necessary whenever transgenic proteins are introduced into the food chain. Bioinformatics approaches in allergen prediction have evolved appreciably in recent years to increase sophistication and performance. However, what are the critical features for protein's allergenicity have been not fully investigated yet. Results We presented a more comprehensive model in 128 features space for allergenic proteins prediction by integrating various properties of proteins, such as biochemical and physicochemical properties, sequential features and subcellular locations. The overall accuracy in the cross-validation reached 93.42% to 100% with our new method. Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) procedure were applied to obtain which features are essential for allergenicity. Results of the performance comparisons showed the superior of our method to the existing methods used widely. More importantly, it was observed that the features of subcellular locations and amino acid composition played major roles in determining the allergenicity of proteins, particularly extracellular/cell surface and vacuole of the subcellular locations for wheat and soybean. To facilitate the allergen prediction, we implemented our computational method in a web application, which can be available at http://gmobl.sjtu.edu.cn/PREAL/index.php. Conclusions Our new approach could improve the accuracy of allergen prediction. And the findings may provide novel insights for the mechanism of allergies. PMID:24565053

  12. Feature selection for examining behavior by pathology laboratories.

    PubMed

    Hawkins, S; Williams, G; Baxter, R

    2001-08-01

    Australia has a universal health insurance scheme called Medicare, which is managed by Australia's Health Insurance Commission. Medicare payments for pathology services generate voluminous transaction data on patients, doctors and pathology laboratories. The Health Insurance Commission (HIC) currently uses predictive models to monitor compliance with regulatory requirements. The HIC commissioned a project to investigate the generation of new features from the data. Feature generation has not appeared as an important step in the knowledge discovery in databases (KDD) literature. New interesting features for use in predictive modeling are generated. These features were summarized, visualized and used as inputs for clustering and outlier detection methods. Data organization and data transformation methods are described for the efficient access and manipulation of these new features.

  13. 2-DE combined with two-layer feature selection accurately establishes the origin of oolong tea.

    PubMed

    Chien, Han-Ju; Chu, Yen-Wei; Chen, Chi-Wei; Juang, Yu-Min; Chien, Min-Wei; Liu, Chih-Wei; Wu, Chia-Chang; Tzen, Jason T C; Lai, Chien-Chen

    2016-11-15

    Taiwan is known for its high quality oolong tea. Because of high consumer demand, some tea manufactures mix lower quality leaves with genuine Taiwan oolong tea in order to increase profits. Robust scientific methods are, therefore, needed to verify the origin and quality of tea leaves. In this study, we investigated whether two-dimensional gel electrophoresis (2-DE) and nanoscale liquid chromatography/tandem mass spectroscopy (nano-LC/MS/MS) coupled with a two-layer feature selection mechanism comprising information gain attribute evaluation (IGAE) and support vector machine feature selection (SVM-FS) are useful in identifying characteristic proteins that can be used as markers of the original source of oolong tea. Samples in this study included oolong tea leaves from 23 different sources. We found that our method had an accuracy of 95.5% in correctly identifying the origin of the leaves. Overall, our method is a novel approach for determining the origin of oolong tea leaves. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection

    PubMed Central

    Wong, Raymond

    2013-01-01

    Voice biometrics is one kind of physiological characteristics whose voice is different for each individual person. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotion states, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features in training a classification model, based on piecewise transformation treating an audio waveform as a time-series. Using SFX we can faithfully remodel statistical characteristics of the time-series; together with spectral analysis, a substantial amount of features are extracted in combination. An ensemble is utilized in selecting only the influential features to be used in classification model induction. We focus on the comparison of effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely, Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC. PMID:24288684

  15. An improved strategy for skin lesion detection and classification using uniform segmentation and feature selection based approach.

    PubMed

    Nasir, Muhammad; Attique Khan, Muhammad; Sharif, Muhammad; Lali, Ikram Ullah; Saba, Tanzila; Iqbal, Tassawar

    2018-02-21

    Melanoma is the deadliest type of skin cancer with highest mortality rate. However, the annihilation in early stage implies a high survival rate therefore, it demands early diagnosis. The accustomed diagnosis methods are costly and cumbersome due to the involvement of experienced experts as well as the requirements for highly equipped environment. The recent advancements in computerized solutions for these diagnoses are highly promising with improved accuracy and efficiency. In this article, we proposed a method for the classification of melanoma and benign skin lesions. Our approach integrates preprocessing, lesion segmentation, features extraction, features selection, and classification. Preprocessing is executed in the context of hair removal by DullRazor, whereas lesion texture and color information are utilized to enhance the lesion contrast. In lesion segmentation, a hybrid technique has been implemented and results are fused using additive law of probability. Serial based method is applied subsequently that extracts and fuses the traits such as color, texture, and HOG (shape). The fused features are selected afterwards by implementing a novel Boltzman Entropy method. Finally, the selected features are classified by Support Vector Machine. The proposed method is evaluated on publically available data set PH2. Our approach has provided promising results of sensitivity 97.7%, specificity 96.7%, accuracy 97.5%, and F-score 97.5%, which are significantly better than the results of existing methods available on the same data set. The proposed method detects and classifies melanoma significantly good as compared to existing methods. © 2018 Wiley Periodicals, Inc.

  16. A data-driven multi-model methodology with deep feature selection for short-term wind forecasting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feng, Cong; Cui, Mingjian; Hodge, Bri-Mathias

    With the growing wind penetration into the power system worldwide, improving wind power forecasting accuracy is becoming increasingly important to ensure continued economic and reliable power system operations. In this paper, a data-driven multi-model wind forecasting methodology is developed with a two-layer ensemble machine learning technique. The first layer is composed of multiple machine learning models that generate individual forecasts. A deep feature selection framework is developed to determine the most suitable inputs to the first layer machine learning models. Then, a blending algorithm is applied in the second layer to create an ensemble of the forecasts produced by firstmore » layer models and generate both deterministic and probabilistic forecasts. This two-layer model seeks to utilize the statistically different characteristics of each machine learning algorithm. A number of machine learning algorithms are selected and compared in both layers. This developed multi-model wind forecasting methodology is compared to several benchmarks. The effectiveness of the proposed methodology is evaluated to provide 1-hour-ahead wind speed forecasting at seven locations of the Surface Radiation network. Numerical results show that comparing to the single-algorithm models, the developed multi-model framework with deep feature selection procedure has improved the forecasting accuracy by up to 30%.« less

  17. Production of a national 1:1,000,000-scale hydrography dataset for the United States: feature selection, simplification, and refinement

    USGS Publications Warehouse

    Gary, Robin H.; Wilson, Zachary D.; Archuleta, Christy-Ann M.; Thompson, Florence E.; Vrabel, Joseph

    2009-01-01

    During 2006-09, the U.S. Geological Survey, in cooperation with the National Atlas of the United States, produced a 1:1,000,000-scale (1:1M) hydrography dataset comprising streams and waterbodies for the entire United States, including Puerto Rico and the U.S. Virgin Islands, for inclusion in the recompiled National Atlas. This report documents the methods used to select, simplify, and refine features in the 1:100,000-scale (1:100K) (1:63,360-scale in Alaska) National Hydrography Dataset to create the national 1:1M hydrography dataset. Custom tools and semi-automated processes were created to facilitate generalization of the 1:100K National Hydrography Dataset (1:63,360-scale in Alaska) to 1:1M on the basis of existing small-scale hydrography datasets. The first step in creating the new 1:1M dataset was to address feature selection and optimal data density in the streams network. Several existing methods were evaluated. The production method that was established for selecting features for inclusion in the 1:1M dataset uses a combination of the existing attributes and network in the National Hydrography Dataset and several of the concepts from the methods evaluated. The process for creating the 1:1M waterbodies dataset required a similar approach to that used for the streams dataset. Geometric simplification of features was the next step. Stream reaches and waterbodies indicated in the feature selection process were exported as new feature classes and then simplified using a geographic information system tool. The final step was refinement of the 1:1M streams and waterbodies. Refinement was done through the use of additional geographic information system tools.

  18. The Role of Feature Selection and Statistical Weighting in ...

    EPA Pesticide Factsheets

    Our study assesses the value of both in vitro assay and quantitative structure activity relationship (QSAR) data in predicting in vivo toxicity using numerous statistical models and approaches to process the data. Our models are built on datasets of (i) 586 chemicals for which both in vitro and in vivo data are currently available in EPA’s Toxcast and ToxRefDB databases, respectively, and (ii) 769 chemicals for which both QSAR data and in vivo data exist. Similar to a previous study (based on just 309 chemicals, Thomas et al. 2012), after converting the continuous values from each dataset to binary values, the majority of more than 1,000 in vivo endpoints are poorly predicted. Even for the endpoints that are well predicted (about 40 with an F1 score of >0.75), imbalances in in vivo endpoint data or cytotoxicity across in vitro assays may be skewing results. In order to better account for these types of considerations, we examine best practices in data preprocessing and model fitting in real-world contexts where data are rife with imperfections. We discuss options for dealing with missing data, including omitting observations, aggregating variables, and imputing values. We also examine the impacts of feature selection (from both a statistical and biological perspective) on performance and efficiency, and we weight outcome data to reduce endpoint imbalances to account for potential chemical selection bias and assess revised performance. For example, initial weig

  19. SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease.

    PubMed

    Ozcift, Akin

    2012-08-01

    Parkinson disease (PD) is an age-related deterioration of certain nerve systems, which affects movement, balance, and muscle control of clients. PD is one of the common diseases which affect 1% of people older than 60 years. A new classification scheme based on support vector machine (SVM) selected features to train rotation forest (RF) ensemble classifiers is presented for improving diagnosis of PD. The dataset contains records of voice measurements from 31 people, 23 with PD and each record in the dataset is defined with 22 features. The diagnosis model first makes use of a linear SVM to select ten most relevant features from 22. As a second step of the classification model, six different classifiers are trained with the subset of features. Subsequently, at the third step, the accuracies of classifiers are improved by the utilization of RF ensemble classification strategy. The results of the experiments are evaluated using three metrics; classification accuracy (ACC), Kappa Error (KE) and Area under the Receiver Operating Characteristic (ROC) Curve (AUC). Performance measures of two base classifiers, i.e. KStar and IBk, demonstrated an apparent increase in PD diagnosis accuracy compared to similar studies in literature. After all, application of RF ensemble classification scheme improved PD diagnosis in 5 of 6 classifiers significantly. We, numerically, obtained about 97% accuracy in RF ensemble of IBk (a K-Nearest Neighbor variant) algorithm, which is a quite high performance for Parkinson disease diagnosis.

  20. Study of a Vocal Feature Selection Method and Vocal Properties for Discriminating Four Constitution Types

    PubMed Central

    Kim, Keun Ho; Ku, Boncho; Kang, Namsik; Kim, Young-Su; Jang, Jun-Su; Kim, Jong Yeol

    2012-01-01

    The voice has been used to classify the four constitution types, and to recognize a subject's health condition by extracting meaningful physical quantities, in traditional Korean medicine. In this paper, we propose a method of selecting the reliable variables from various voice features, such as frequency derivative features, frequency band ratios, and intensity, from vowels and a sentence. Further, we suggest a process to extract independent variables by eliminating explanatory variables and reducing their correlation and remove outlying data to enable reliable discriminant analysis. Moreover, the suitable division of data for analysis, according to the gender and age of subjects, is discussed. Finally, the vocal features are applied to a discriminant analysis to classify each constitution type. This method of voice classification can be widely used in the u-Healthcare system of personalized medicine and for improving diagnostic accuracy. PMID:22529874

  1. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests.

    PubMed

    Ma, Li; Fan, Suohai

    2017-03-14

    The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness for avoiding overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that the combination of Clustering Using Representatives (CURE) enhances the original synthetic minority oversampling technique (SMOTE) algorithms effectively compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, the hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out of bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, hybrid genetic-random forests algorithm, hybrid particle swarm-random forests algorithm and hybrid fish swarm-random forests algorithm can achieve the minimum OOB error and show the best generalization ability. The training set produced from the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced from this feasible and effective algorithm. Moreover, the hybrid algorithm's F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.

  2. Robust Ground Target Detection by SAR and IR Sensor Fusion Using Adaboost-Based Feature Selection

    PubMed Central

    Kim, Sungho; Song, Woo-Jin; Kim, So-Hyun

    2016-01-01

    Long-range ground targets are difficult to detect in a noisy cluttered environment using either synthetic aperture radar (SAR) images or infrared (IR) images. SAR-based detectors can provide a high detection rate with a high false alarm rate to background scatter noise. IR-based approaches can detect hot targets but are affected strongly by the weather conditions. This paper proposes a novel target detection method by decision-level SAR and IR fusion using an Adaboost-based machine learning scheme to achieve a high detection rate and low false alarm rate. The proposed method consists of individual detection, registration, and fusion architecture. This paper presents a single framework of a SAR and IR target detection method using modified Boolean map visual theory (modBMVT) and feature-selection based fusion. Previous methods applied different algorithms to detect SAR and IR targets because of the different physical image characteristics. One method that is optimized for IR target detection produces unsuccessful results in SAR target detection. This study examined the image characteristics and proposed a unified SAR and IR target detection method by inserting a median local average filter (MLAF, pre-filter) and an asymmetric morphological closing filter (AMCF, post-filter) into the BMVT. The original BMVT was optimized to detect small infrared targets. The proposed modBMVT can remove the thermal and scatter noise by the MLAF and detect extended targets by attaching the AMCF after the BMVT. Heterogeneous SAR and IR images were registered automatically using the proposed RANdom SAmple Region Consensus (RANSARC)-based homography optimization after a brute-force correspondence search using the detected target centers and regions. The final targets were detected by feature-selection based sensor fusion using Adaboost. The proposed method showed good SAR and IR target detection performance through feature selection-based decision fusion on a synthetic database generated

  3. Robust Ground Target Detection by SAR and IR Sensor Fusion Using Adaboost-Based Feature Selection.

    PubMed

    Kim, Sungho; Song, Woo-Jin; Kim, So-Hyun

    2016-07-19

    Long-range ground targets are difficult to detect in a noisy cluttered environment using either synthetic aperture radar (SAR) images or infrared (IR) images. SAR-based detectors can provide a high detection rate with a high false alarm rate to background scatter noise. IR-based approaches can detect hot targets but are affected strongly by the weather conditions. This paper proposes a novel target detection method by decision-level SAR and IR fusion using an Adaboost-based machine learning scheme to achieve a high detection rate and low false alarm rate. The proposed method consists of individual detection, registration, and fusion architecture. This paper presents a single framework of a SAR and IR target detection method using modified Boolean map visual theory (modBMVT) and feature-selection based fusion. Previous methods applied different algorithms to detect SAR and IR targets because of the different physical image characteristics. One method that is optimized for IR target detection produces unsuccessful results in SAR target detection. This study examined the image characteristics and proposed a unified SAR and IR target detection method by inserting a median local average filter (MLAF, pre-filter) and an asymmetric morphological closing filter (AMCF, post-filter) into the BMVT. The original BMVT was optimized to detect small infrared targets. The proposed modBMVT can remove the thermal and scatter noise by the MLAF and detect extended targets by attaching the AMCF after the BMVT. Heterogeneous SAR and IR images were registered automatically using the proposed RANdom SAmple Region Consensus (RANSARC)-based homography optimization after a brute-force correspondence search using the detected target centers and regions. The final targets were detected by feature-selection based sensor fusion using Adaboost. The proposed method showed good SAR and IR target detection performance through feature selection-based decision fusion on a synthetic database generated

  4. Coding of visual object features and feature conjunctions in the human brain.

    PubMed

    Martinovic, Jasna; Gruber, Thomas; Müller, Matthias M

    2008-01-01

    Object recognition is achieved through neural mechanisms reliant on the activity of distributed coordinated neural assemblies. In the initial steps of this process, an object's features are thought to be coded very rapidly in distinct neural assemblies. These features play different functional roles in the recognition process--while colour facilitates recognition, additional contours and edges delay it. Here, we selectively varied the amount and role of object features in an entry-level categorization paradigm and related them to the electrical activity of the human brain. We found that early synchronizations (approx. 100 ms) increased quantitatively when more image features had to be coded, without reflecting their qualitative contribution to the recognition process. Later activity (approx. 200-400 ms) was modulated by the representational role of object features. These findings demonstrate that although early synchronizations may be sufficient for relatively crude discrimination of objects in visual scenes, they cannot support entry-level categorization. This was subserved by later processes of object model selection, which utilized the representational value of object features such as colour or edges to select the appropriate model and achieve identification.

  5. Anticancer drug sensitivity prediction in cell lines from baseline gene expression through recursive feature selection.

    PubMed

    Dong, Zuoli; Zhang, Naiqian; Li, Chun; Wang, Haiyun; Fang, Yun; Wang, Jun; Zheng, Xiaoqi

    2015-06-30

    An enduring challenge in personalized medicine is to select right drug for individual patients. Testing drugs on patients in large clinical trials is one way to assess their efficacy and toxicity, but it is impractical to test hundreds of drugs currently under development. Therefore the preclinical prediction model is highly expected as it enables prediction of drug response to hundreds of cell lines in parallel. Recently, two large-scale pharmacogenomic studies screened multiple anticancer drugs on over 1000 cell lines in an effort to elucidate the response mechanism of anticancer drugs. To this aim, we here used gene expression features and drug sensitivity data in Cancer Cell Line Encyclopedia (CCLE) to build a predictor based on Support Vector Machine (SVM) and a recursive feature selection tool. Robustness of our model was validated by cross-validation and an independent dataset, the Cancer Genome Project (CGP). Our model achieved good cross validation performance for most drugs in the Cancer Cell Line Encyclopedia (≥80% accuracy for 10 drugs, ≥75% accuracy for 19 drugs). Independent tests on eleven common drugs between CCLE and CGP achieved satisfactory performance for three of them, i.e., AZD6244, Erlotinib and PD-0325901, using expression levels of only twelve, six and seven genes, respectively. These results suggest that drug response could be effectively predicted from genomic features. Our model could be applied to predict drug response for some certain drugs and potentially play a complementary role in personalized medicine.

  6. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection.

    PubMed

    Ma, Xin; Guo, Jing; Sun, Xiao

    2015-01-01

    The prediction of RNA-binding proteins is one of the most challenging problems in computation biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated features of conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information.

  7. Predicting bacteriophage proteins located in host cell with feature selection technique.

    PubMed

    Ding, Hui; Liang, Zhi-Yong; Guo, Feng-Biao; Huang, Jian; Chen, Wei; Lin, Hao

    2016-04-01

    A bacteriophage is a virus that can infect a bacterium. The fate of an infected bacterium is determined by the bacteriophage proteins located in the host cell. Thus, reliably identifying bacteriophage proteins located in the host cell is extremely important to understand their functions and discover potential anti-bacterial drugs. Thus, in this paper, a computational method was developed to recognize bacteriophage proteins located in host cells based only on their amino acid sequences. The analysis of variance (ANOVA) combined with incremental feature selection (IFS) was proposed to optimize the feature set. Using a jackknife cross-validation, our method can discriminate between bacteriophage proteins located in a host cell and the bacteriophage proteins not located in a host cell with a maximum overall accuracy of 84.2%, and can further classify bacteriophage proteins located in host cell cytoplasm and in host cell membranes with a maximum overall accuracy of 92.4%. To enhance the value of the practical applications of the method, we built a web server called PHPred (〈http://lin.uestc.edu.cn/server/PHPred〉). We believe that the PHPred will become a powerful tool to study bacteriophage proteins located in host cells and to guide related drug discovery. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. Genetic algorithm based feature selection combined with dual classification for the automated detection of proliferative diabetic retinopathy.

    PubMed

    Welikala, R A; Fraz, M M; Dehmeshki, J; Hoppe, A; Tah, V; Mann, S; Williamson, T H; Barman, S A

    2015-07-01

    Proliferative diabetic retinopathy (PDR) is a condition that carries a high risk of severe visual impairment. The hallmark of PDR is the growth of abnormal new vessels. In this paper, an automated method for the detection of new vessels from retinal images is presented. This method is based on a dual classification approach. Two vessel segmentation approaches are applied to create two separate binary vessel map which each hold vital information. Local morphology features are measured from each binary vessel map to produce two separate 4-D feature vectors. Independent classification is performed for each feature vector using a support vector machine (SVM) classifier. The system then combines these individual outcomes to produce a final decision. This is followed by the creation of additional features to generate 21-D feature vectors, which feed into a genetic algorithm based feature selection approach with the objective of finding feature subsets that improve the performance of the classification. Sensitivity and specificity results using a dataset of 60 images are 0.9138 and 0.9600, respectively, on a per patch basis and 1.000 and 0.975, respectively, on a per image basis. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Spectral characteristics and feature selection of satellite remote sensing data for climate and anthropogenic changes assessment in Bucharest area

    NASA Astrophysics Data System (ADS)

    Zoran, Maria; Savastru, Roxana; Savastru, Dan; Tautan, Marina; Miclos, Sorin; Cristescu, Luminita; Carstea, Elfrida; Baschir, Laurentiu

    2010-05-01

    Urban systems play a vital role in social and economic development in all countries. Their environmental changes can be investigated on different spatial and temporal scales. Urban and peri-urban environment dynamics is of great interest for future planning and decision making as well as in frame of local and regional changes. Changes in urban land cover include changes in biotic diversity, actual and potential primary productivity, soil quality, runoff, and sedimentation rates, and cannot be well understood without the knowledge of land use change that drives them. The study focuses on the assessment of environmental features changes for Bucharest metropolitan area, Romania by satellite remote sensing and in-situ monitoring data. Rational feature selection from the varieties of spectral channels in the optical wavelengths of electromagnetic spectrum (VIS and NIR) is very important for effective analysis and information extraction of remote sensing data. Based on comprehensively analyses of the spectral characteristics of remote sensing data is possibly to derive environmental changes in urban areas. The information quantity contained in a band is an important parameter in evaluating the band. The deviation and entropy are often used to show information amount. Feature selection is one of the most important steps in recognition and classification of remote sensing images. Therefore, it is necessary to select features before classification. The optimal features are those that can be used to distinguish objects easily and correctly. Three factors—the information quantity of bands, the correlation between bands and the spectral characteristic (e.g. absorption specialty) of classified objects in test area Bucharest have been considered in our study. As, the spectral characteristic of an object is influenced by many factors, being difficult to define optimal feature parameters to distinguish all the objects in a whole area, a method of multi-level feature selection

  10. Selected PET radiomic features remain the same.

    PubMed

    Tsujikawa, Tetsuya; Tsuyoshi, Hideaki; Kanno, Masafumi; Yamada, Shizuka; Kobayashi, Masato; Narita, Norihiko; Kimura, Hirohiko; Fujieda, Shigeharu; Yoshida, Yoshio; Okazawa, Hidehiko

    2018-04-17

    We investigated whether PET radiomic features are affected by differences in the scanner, scan protocol, and lesion location using 18 F-FDG PET/CT and PET/MR scans. SUV, TMR, skewness, kurtosis, entropy, and homogeneity strongly correlated between PET/CT and PET/MR images. SUVs were significantly higher on PET/MR 0-2 min and PET/MR 0-10 min than on PET/CT in gynecological cancer ( p = 0.008 and 0.008, respectively), whereas no significant difference was observed between PET/CT, PET/MR 0-2 min , and PET/MR 0-10 min images in oral cavity/oropharyngeal cancer. TMRs on PET/CT, PET/MR 0-2 min , and PET/MR 0-10 min increased in this order in gynecological cancer and oral cavity/oropharyngeal cancer. In contrast to conventional and histogram indices, 4 textural features (entropy, homogeneity, SRE, and LRE) were not significantly different between PET/CT, PET/MR 0-2 min , and PET/MR 0-10 min images. 18 F-FDG PET radiomic features strongly correlated between PET/CT and PET/MR images. Dixon-based attenuation correction on PET/MR images underestimated tumor tracer uptake more significantly in oral cavity/oropharyngeal cancer than in gynecological cancer. 18 F-FDG PET textural features were affected less by differences in the scanner and scan protocol than conventional and histogram features, possibly due to the resampling process using a medium bin width. Eight patients with gynecological cancer and 7 with oral cavity/oropharyngeal cancer underwent a whole-body 18 F-FDG PET/CT scan and regional PET/MR scan in one day. PET/MR scans were performed for 10 minutes in the list mode, and PET/CT and 0-2 min and 0-10 min PET/MR images were reconstructed. The standardized uptake value (SUV), tumor-to-muscle SUV ratio (TMR), skewness, kurtosis, entropy, homogeneity, short-run emphasis (SRE), and long-run emphasis (LRE) were compared between PET/CT, PET/MR 0-2 min , and PET/MR 0-10 min images.

  11. Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data.

    PubMed

    Ooi, Chia Huey; Chetty, Madhu; Teng, Shyh Wei

    2006-06-23

    Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy. We present two realistically evaluated correlation-based feature selection techniques which incorporate, in addition to the two existing criteria involved in forming a predictor set (relevance and redundancy), a third criterion called the degree of differential prioritization (DDP). DDP functions as a parameter to strike the balance between relevance and redundancy, providing our techniques with the novel ability to differentially prioritize the optimization of relevance against redundancy (and vice versa). This ability proves useful in producing optimal classification accuracy while using reasonably small predictor set sizes for nine well-known multiclass microarray datasets. For multiclass microarray datasets, especially the GCM and NCI60 datasets, DDP enables our filter-based techniques to produce accuracies better than those reported in previous studies which employed similarly realistic evaluation procedures.

  12. Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems.

    PubMed

    Lê Cao, Kim-Anh; Boitard, Simon; Besse, Philippe

    2011-06-22

    Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.

  13. Discriminative region extraction and feature selection based on the combination of SURF and saliency

    NASA Astrophysics Data System (ADS)

    Deng, Li; Wang, Chunhong; Rao, Changhui

    2011-08-01

    The objective of this paper is to provide a possible optimization on salient region algorithm, which is extensively used in recognizing and learning object categories. Salient region algorithm owns the superiority of intra-class tolerance, global score of features and automatically prominent scale selection under certain range. However, the major limitation behaves on performance, and that is what we attempt to improve. By reducing the number of pixels involved in saliency calculation, it can be accelerated. We use interest points detected by fast-Hessian, the detector of SURF, as the candidate feature for saliency operation, rather than the whole set in image. This implementation is thereby called Saliency based Optimization over SURF (SOSU for short). Experiment shows that bringing in of such a fast detector significantly speeds up the algorithm. Meanwhile, Robustness of intra-class diversity ensures object recognition accuracy.

  14. A Study for the Feature Selection to Identify GIEMSA-Stained Human Chromosomes Based on Artificial Neural Network

    DTIC Science & Technology

    2001-10-25

    neural network (ANN) has been adopted for the human chromosome classification. It is important to select optimum features for training neural network...Many studies for computer-based chromosome analysis have shown that it is possible to classify chromosomes into 24 subgroups. In addition, artificial

  15. Reconstruction and feature selection for desorption electrospray ionization mass spectroscopy imagery

    NASA Astrophysics Data System (ADS)

    Gao, Yi; Zhu, Liangjia; Norton, Isaiah; Agar, Nathalie Y. R.; Tannenbaum, Allen

    2014-03-01

    Desorption electrospray ionization mass spectrometry (DESI-MS) provides a highly sensitive imaging technique for differentiating normal and cancerous tissue at the molecular level. This can be very useful, especially under intra-operative conditions where the surgeon has to make crucial decision about the tumor boundary. In such situations, the time it takes for imaging and data analysis becomes a critical factor. Therefore, in this work we utilize compressive sensing to perform the sparse sampling of the tissue, which halves the scanning time. Furthermore, sparse feature selection is performed, which not only reduces the dimension of data from about 104 to less than 50, and thus significantly shortens the analysis time. This procedure also identifies biochemically important molecules for further pathological analysis. The methods are validated on brain and breast tumor data sets.

  16. Just-in-time adaptive disturbance estimation for run-to-run control of photolithography overlay

    NASA Astrophysics Data System (ADS)

    Firth, Stacy K.; Campbell, W. J.; Edgar, Thomas F.

    2002-07-01

    One of the main challenges to implementations of traditional run-to-run control in the semiconductor industry is a high mix of products in a single factory. To address this challenge, Just-in-time Adaptive Disturbance Estimation (JADE) has been developed. JADE uses a recursive weighted least-squares parameters estimation technique to identify the contributions to variation that are dependent on product, as well as the tools on which the lot was processed. As applied to photolithography overlay, JADE assigns these sources of variation to contributions from the context items: tool, product, reference tool, and reference reticle. Simulations demonstrate that JADE effectively identifies disturbances in contributing context items when the variations are known to be additive. The superior performance of JADE over traditional EWMA is also shown in these simulations. The results of application of JADE to data from a high mix production facility show that JADE still performs better than EWMA, even with the challenges of a real manufacturing environment.

  17. Changes in selected features of a male face and assessment of their influence on facial recognition.

    PubMed

    Lewandowski, Zdzisław

    2011-01-01

    The project aimed at finding the answers to the following two research questions: --To what extent does a change in size, height or width of the selected face feature influence the assessment of likeness between an original composite portrait and a modified one? --How does the sex of a person who judges the images have an impact on the perception of likeness of the face features? The results indicate that there are significant differences in the assessment of likeness of the portraits with some features modified to the original ones. The images with changes in size and height of the nose received the lowest scores on the likeness scale, which indicates that these changes were perceived by the subjects as the most important. The photos with changes in height and width of the lips, and height and width of the eye slit, in turn, received high scores of likeness, in spite of big changes. This signifies that these modifications were perceived to be of the least importance (compared to the other features investigated).

  18. Different underlying mechanisms for face emotion and gender processing during feature-selective attention: Evidence from event-related potential studies.

    PubMed

    Wang, Hailing; Ip, Chengteng; Fu, Shimin; Sun, Pei

    2017-05-01

    Face recognition theories suggest that our brains process invariant (e.g., gender) and changeable (e.g., emotion) facial dimensions separately. To investigate whether these two dimensions are processed in different time courses, we analyzed the selection negativity (SN, an event-related potential component reflecting attentional modulation) elicited by face gender and emotion during a feature selective attention task. Participants were instructed to attend to a combination of face emotion and gender attributes in Experiment 1 (bi-dimensional task) and to either face emotion or gender in Experiment 2 (uni-dimensional task). The results revealed that face emotion did not elicit a substantial SN, whereas face gender consistently generated a substantial SN in both experiments. These results suggest that face gender is more sensitive to feature-selective attention and that face emotion is encoded relatively automatically on SN, implying the existence of different underlying processing mechanisms for invariant and changeable facial dimensions. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Evaluation of feature selection algorithms for classification in temporal lobe epilepsy based on MR images

    NASA Astrophysics Data System (ADS)

    Lai, Chunren; Guo, Shengwen; Cheng, Lina; Wang, Wensheng; Wu, Kai

    2017-02-01

    It's very important to differentiate the temporal lobe epilepsy (TLE) patients from healthy people and localize the abnormal brain regions of the TLE patients. The cortical features and changes can reveal the unique anatomical patterns of brain regions from the structural MR images. In this study, structural MR images from 28 normal controls (NC), 18 left TLE (LTLE), and 21 right TLE (RTLE) were acquired, and four types of cortical feature, namely cortical thickness (CTh), cortical surface area (CSA), gray matter volume (GMV), and mean curvature (MCu), were explored for discriminative analysis. Three feature selection methods, the independent sample t-test filtering, the sparse-constrained dimensionality reduction model (SCDRM), and the support vector machine-recursive feature elimination (SVM-RFE), were investigated to extract dominant regions with significant differences among the compared groups for classification using the SVM classifier. The results showed that the SVM-REF achieved the highest performance (most classifications with more than 92% accuracy), followed by the SCDRM, and the t-test. Especially, the surface area and gray volume matter exhibited prominent discriminative ability, and the performance of the SVM was improved significantly when the four cortical features were combined. Additionally, the dominant regions with higher classification weights were mainly located in temporal and frontal lobe, including the inferior temporal, entorhinal cortex, fusiform, parahippocampal cortex, middle frontal and frontal pole. It was demonstrated that the cortical features provided effective information to determine the abnormal anatomical pattern and the proposed method has the potential to improve the clinical diagnosis of the TLE.

  20. Efficient and sparse feature selection for biomedical text classification via the elastic net: Application to ICU risk stratification from nursing notes.

    PubMed

    Marafino, Ben J; Boscardin, W John; Dudley, R Adams

    2015-04-01

    Sparsity is often a desirable property of statistical models, and various feature selection methods exist so as to yield sparser and interpretable models. However, their application to biomedical text classification, particularly to mortality risk stratification among intensive care unit (ICU) patients, has not been thoroughly studied. To develop and characterize sparse classifiers based on the free text of nursing notes in order to predict ICU mortality risk and to discover text features most strongly associated with mortality. We selected nursing notes from the first 24h of ICU admission for 25,826 adult ICU patients from the MIMIC-II database. We then developed a pair of stochastic gradient descent-based classifiers with elastic-net regularization. We also studied the performance-sparsity tradeoffs of both classifiers as their regularization parameters were varied. The best-performing classifier achieved a 10-fold cross-validated AUC of 0.897 under the log loss function and full L2 regularization, while full L1 regularization used just 0.00025% of candidate input features and resulted in an AUC of 0.889. Using the log loss (range of AUCs 0.889-0.897) yielded better performance compared to the hinge loss (0.850-0.876), but the latter yielded even sparser models. Most features selected by both classifiers appear clinically relevant and correspond to predictors already present in existing ICU mortality models. The sparser classifiers were also able to discover a number of informative - albeit nonclinical - features. The elastic-net-regularized classifiers perform reasonably well and are capable of reducing the number of features required by over a thousandfold, with only a modest impact on performance. Copyright © 2015 Elsevier Inc. All rights reserved.

  1. A research of selected textural features for detection of asbestos-cement roofing sheets using orthoimages

    NASA Astrophysics Data System (ADS)

    Książek, Judyta

    2015-10-01

    At present, there has been a great interest in the development of texture based image classification methods in many different areas. This study presents the results of research carried out to assess the usefulness of selected textural features for detection of asbestos-cement roofs in orthophotomap classification. Two different orthophotomaps of southern Poland (with ground resolution: 5 cm and 25 cm) were used. On both orthoimages representative samples for two classes: asbestos-cement roofing sheets and other roofing materials were selected. Estimation of texture analysis usefulness was conducted using machine learning methods based on decision trees (C5.0 algorithm). For this purpose, various sets of texture parameters were calculated in MaZda software. During the calculation of decision trees different numbers of texture parameters groups were considered. In order to obtain the best settings for decision trees models cross-validation was performed. Decision trees models with the lowest mean classification error were selected. The accuracy of the classification was held based on validation data sets, which were not used for the classification learning. For 5 cm ground resolution samples, the lowest mean classification error was 15.6%. The lowest mean classification error in the case of 25 cm ground resolution was 20.0%. The obtained results confirm potential usefulness of the texture parameter image processing for detection of asbestos-cement roofing sheets. In order to improve the accuracy another extended study should be considered in which additional textural features as well as spectral characteristics should be analyzed.

  2. Interactive prostate segmentation using atlas-guided semi-supervised learning and adaptive feature selection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Park, Sang Hyun; Gao, Yaozong, E-mail: yzgao@cs.unc.edu; Shi, Yinghuan, E-mail: syh@nju.edu.cn

    Purpose: Accurate prostate segmentation is necessary for maximizing the effectiveness of radiation therapy of prostate cancer. However, manual segmentation from 3D CT images is very time-consuming and often causes large intra- and interobserver variations across clinicians. Many segmentation methods have been proposed to automate this labor-intensive process, but tedious manual editing is still required due to the limited performance. In this paper, the authors propose a new interactive segmentation method that can (1) flexibly generate the editing result with a few scribbles or dots provided by a clinician, (2) fast deliver intermediate results to the clinician, and (3) sequentially correctmore » the segmentations from any type of automatic or interactive segmentation methods. Methods: The authors formulate the editing problem as a semisupervised learning problem which can utilize a priori knowledge of training data and also the valuable information from user interactions. Specifically, from a region of interest near the given user interactions, the appropriate training labels, which are well matched with the user interactions, can be locally searched from a training set. With voting from the selected training labels, both confident prostate and background voxels, as well as unconfident voxels can be estimated. To reflect informative relationship between voxels, location-adaptive features are selected from the confident voxels by using regression forest and Fisher separation criterion. Then, the manifold configuration computed in the derived feature space is enforced into the semisupervised learning algorithm. The labels of unconfident voxels are then predicted by regularizing semisupervised learning algorithm. Results: The proposed interactive segmentation method was applied to correct automatic segmentation results of 30 challenging CT images. The correction was conducted three times with different user interactions performed at different time periods, in

  3. Extremely Selective Attention: Eye-Tracking Studies of the Dynamic Allocation of Attention to Stimulus Features in Categorization

    ERIC Educational Resources Information Center

    Blair, Mark R.; Watson, Marcus R.; Walshe, R. Calen; Maj, Fillip

    2009-01-01

    Humans have an extremely flexible ability to categorize regularities in their environment, in part because of attentional systems that allow them to focus on important perceptual information. In formal theories of categorization, attention is typically modeled with weights that selectively bias the processing of stimulus features. These theories…

  4. Feature Selection for Evolutionary Commercial-off-the-Shelf Software: Studies Focusing on Time-to-Market, Innovation and Hedonic-Utilitarian Trade-Offs

    ERIC Educational Resources Information Center

    Kakar, Adarsh Kumar

    2013-01-01

    Feature selection is one of the most important decisions made by product managers. This three article study investigates the concepts, tools and techniques for making trade-off decisions of introducing new features in evolving Commercial-Off-The-Shelf (COTS) software products. The first article investigates the efficacy of various feature…

  5. Comprehensive evaluation of untargeted metabolomics data processing software in feature detection, quantification and discriminating marker selection.

    PubMed

    Li, Zhucui; Lu, Yan; Guo, Yufeng; Cao, Haijie; Wang, Qinhong; Shui, Wenqing

    2018-10-31

    Data analysis represents a key challenge for untargeted metabolomics studies and it commonly requires extensive processing of more than thousands of metabolite peaks included in raw high-resolution MS data. Although a number of software packages have been developed to facilitate untargeted data processing, they have not been comprehensively scrutinized in the capability of feature detection, quantification and marker selection using a well-defined benchmark sample set. In this study, we acquired a benchmark dataset from standard mixtures consisting of 1100 compounds with specified concentration ratios including 130 compounds with significant variation of concentrations. Five software evaluated here (MS-Dial, MZmine 2, XCMS, MarkerView, and Compound Discoverer) showed similar performance in detection of true features derived from compounds in the mixtures. However, significant differences between untargeted metabolomics software were observed in relative quantification of true features in the benchmark dataset. MZmine 2 outperformed the other software in terms of quantification accuracy and it reported the most true discriminating markers together with the fewest false markers. Furthermore, we assessed selection of discriminating markers by different software using both the benchmark dataset and a real-case metabolomics dataset to propose combined usage of two software for increasing confidence of biomarker identification. Our findings from comprehensive evaluation of untargeted metabolomics software would help guide future improvements of these widely used bioinformatics tools and enable users to properly interpret their metabolomics results. Copyright © 2018 Elsevier B.V. All rights reserved.

  6. Efficient brain lesion segmentation using multi-modality tissue-based feature selection and support vector machines.

    PubMed

    Fiot, Jean-Baptiste; Cohen, Laurent D; Raniga, Parnesh; Fripp, Jurgen

    2013-09-01

    Support vector machines (SVM) are machine learning techniques that have been used for segmentation and classification of medical images, including segmentation of white matter hyper-intensities (WMH). Current approaches using SVM for WMH segmentation extract features from the brain and classify these followed by complex post-processing steps to remove false positives. The method presented in this paper combines advanced pre-processing, tissue-based feature selection and SVM classification to obtain efficient and accurate WMH segmentation. Features from 125 patients, generated from up to four MR modalities [T1-w, T2-w, proton-density and fluid attenuated inversion recovery(FLAIR)], differing neighbourhood sizes and the use of multi-scale features were compared. We found that although using all four modalities gave the best overall classification (average Dice scores of 0.54  ±  0.12, 0.72  ±  0.06 and 0.82  ±  0.06 respectively for small, moderate and severe lesion loads); this was not significantly different (p = 0.50) from using just T1-w and FLAIR sequences (Dice scores of 0.52  ±  0.13, 0.71  ±  0.08 and 0.81  ±  0.07). Furthermore, there was a negligible difference between using 5 × 5 × 5 and 3 × 3 × 3 features (p = 0.93). Finally, we show that careful consideration of features and pre-processing techniques not only saves storage space and computation time but also leads to more efficient classification, which outperforms the one based on all features with post-processing. Copyright © 2013 John Wiley & Sons, Ltd.

  7. A Selective Overview of Variable Selection in High Dimensional Feature Space

    PubMed Central

    Fan, Jianqing

    2010-01-01

    High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976

  8. Multiclass Classification of Cardiac Arrhythmia Using Improved Feature Selection and SVM Invariants.

    PubMed

    Mustaqeem, Anam; Anwar, Syed Muhammad; Majid, Muahammad

    2018-01-01

    Arrhythmia is considered a life-threatening disease causing serious health issues in patients, when left untreated. An early diagnosis of arrhythmias would be helpful in saving lives. This study is conducted to classify patients into one of the sixteen subclasses, among which one class represents absence of disease and the other fifteen classes represent electrocardiogram records of various subtypes of arrhythmias. The research is carried out on the dataset taken from the University of California at Irvine Machine Learning Data Repository. The dataset contains a large volume of feature dimensions which are reduced using wrapper based feature selection technique. For multiclass classification, support vector machine (SVM) based approaches including one-against-one (OAO), one-against-all (OAA), and error-correction code (ECC) are employed to detect the presence and absence of arrhythmias. The SVM method results are compared with other standard machine learning classifiers using varying parameters and the performance of the classifiers is evaluated using accuracy, kappa statistics, and root mean square error. The results show that OAO method of SVM outperforms all other classifiers by achieving an accuracy rate of 81.11% when used with 80/20 data split and 92.07% using 90/10 data split option.

  9. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Feature Selection Methods

    PubMed Central

    Wang, Ping; Hu, Lele; Liu, Guiyou; Jiang, Nan; Chen, Xiaoyun; Xu, Jianyong; Zheng, Wen; Li, Li; Tan, Ming; Chen, Zugen; Song, Hui; Cai, Yu-Dong; Chou, Kuo-Chen

    2011-01-01

    Antimicrobial peptides (AMPs) represent a class of natural peptides that form a part of the innate immune system, and this kind of ‘nature's antibiotics’ is quite promising for solving the problem of increasing antibiotic resistance. In view of this, it is highly desired to develop an effective computational method for accurately predicting novel AMPs because it can provide us with more candidates and useful insights for drug design. In this study, a new method for predicting AMPs was implemented by integrating the sequence alignment method and the feature selection method. It was observed that, the overall jackknife success rate by the new predictor on a newly constructed benchmark dataset was over 80.23%, and the Mathews correlation coefficient is 0.73, indicating a good prediction. Moreover, it is indicated by an in-depth feature analysis that the results are quite consistent with the previously known knowledge that some amino acids are preferential in AMPs and that these amino acids do play an important role for the antimicrobial activity. For the convenience of most experimental scientists who want to use the prediction method without the interest to follow the mathematical details, a user-friendly web-server is provided at http://amp.biosino.org/. PMID:21533231

  10. Designing using manufacturing features

    NASA Astrophysics Data System (ADS)

    Szecsi, T.; Hoque, A. S. M.

    2012-04-01

    This paper presents a design system that enables the composition of a part using manufacturing features. Features are selected from feature libraries. Upon insertion, the system ensures that the feature does not contradict the design-for-manufacture rules. This helps eliminating costly manufacturing problems. The system is developed as an extension to a commercial CAD/CAM system Pro/Engineer.

  11. A Genetic-Based Feature Selection Approach in the Identification of Left/Right Hand Motor Imagery for a Brain-Computer Interface

    PubMed Central

    Yaacoub, Charles; Mhanna, Georges; Rihana, Sandy

    2017-01-01

    Electroencephalography is a non-invasive measure of the brain electrical activity generated by millions of neurons. Feature extraction in electroencephalography analysis is a core issue that may lead to accurate brain mental state classification. This paper presents a new feature selection method that improves left/right hand movement identification of a motor imagery brain-computer interface, based on genetic algorithms and artificial neural networks used as classifiers. Raw electroencephalography signals are first preprocessed using appropriate filtering. Feature extraction is carried out afterwards, based on spectral and temporal signal components, and thus a feature vector is constructed. As various features might be inaccurate and mislead the classifier, thus degrading the overall system performance, the proposed approach identifies a subset of features from a large feature space, such that the classifier error rate is reduced. Experimental results show that the proposed method is able to reduce the number of features to as low as 0.5% (i.e., the number of ignored features can reach 99.5%) while improving the accuracy, sensitivity, specificity, and precision of the classifier. PMID:28124985

  12. A Genetic-Based Feature Selection Approach in the Identification of Left/Right Hand Motor Imagery for a Brain-Computer Interface.

    PubMed

    Yaacoub, Charles; Mhanna, Georges; Rihana, Sandy

    2017-01-23

    Electroencephalography is a non-invasive measure of the brain electrical activity generated by millions of neurons. Feature extraction in electroencephalography analysis is a core issue that may lead to accurate brain mental state classification. This paper presents a new feature selection method that improves left/right hand movement identification of a motor imagery brain-computer interface, based on genetic algorithms and artificial neural networks used as classifiers. Raw electroencephalography signals are first preprocessed using appropriate filtering. Feature extraction is carried out afterwards, based on spectral and temporal signal components, and thus a feature vector is constructed. As various features might be inaccurate and mislead the classifier, thus degrading the overall system performance, the proposed approach identifies a subset of features from a large feature space, such that the classifier error rate is reduced. Experimental results show that the proposed method is able to reduce the number of features to as low as 0.5% (i.e., the number of ignored features can reach 99.5%) while improving the accuracy, sensitivity, specificity, and precision of the classifier.

  13. Robust Joint Graph Sparse Coding for Unsupervised Spectral Feature Selection.

    PubMed

    Zhu, Xiaofeng; Li, Xuelong; Zhang, Shichao; Ju, Chunhua; Wu, Xindong

    2017-06-01

    In this paper, we propose a new unsupervised spectral feature selection model by embedding a graph regularizer into the framework of joint sparse regression for preserving the local structures of data. To do this, we first extract the bases of training data by previous dictionary learning methods and, then, map original data into the basis space to generate their new representations, by proposing a novel joint graph sparse coding (JGSC) model. In JGSC, we first formulate its objective function by simultaneously taking subspace learning and joint sparse regression into account, then, design a new optimization solution to solve the resulting objective function, and further prove the convergence of the proposed solution. Furthermore, we extend JGSC to a robust JGSC (RJGSC) via replacing the least square loss function with a robust loss function, for achieving the same goals and also avoiding the impact of outliers. Finally, experimental results on real data sets showed that both JGSC and RJGSC outperformed the state-of-the-art algorithms in terms of k -nearest neighbor classification performance.

  14. A soft computing based approach using modified selection strategy for feature reduction of medical systems.

    PubMed

    Zuhtuogullari, Kursat; Allahverdi, Novruz; Arikan, Nihat

    2013-01-01

    The systems consisting high input spaces require high processing times and memory usage. Most of the attribute selection algorithms have the problems of input dimensions limits and information storage problems. These problems are eliminated by means of developed feature reduction software using new modified selection mechanism with middle region solution candidates adding. The hybrid system software is constructed for reducing the input attributes of the systems with large number of input variables. The designed software also supports the roulette wheel selection mechanism. Linear order crossover is used as the recombination operator. In the genetic algorithm based soft computing methods, locking to the local solutions is also a problem which is eliminated by using developed software. Faster and effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to the reducts (reduced input attributes) with seven, six, and five elements. It can be seen from the obtained results that the developed software with modified selection has the advantages in the fields of memory allocation, execution time, classification accuracy, sensitivity, and specificity values when compared with the other reduction algorithms by using the urological test data.

  15. A Soft Computing Based Approach Using Modified Selection Strategy for Feature Reduction of Medical Systems

    PubMed Central

    Zuhtuogullari, Kursat; Allahverdi, Novruz; Arikan, Nihat

    2013-01-01

    The systems consisting high input spaces require high processing times and memory usage. Most of the attribute selection algorithms have the problems of input dimensions limits and information storage problems. These problems are eliminated by means of developed feature reduction software using new modified selection mechanism with middle region solution candidates adding. The hybrid system software is constructed for reducing the input attributes of the systems with large number of input variables. The designed software also supports the roulette wheel selection mechanism. Linear order crossover is used as the recombination operator. In the genetic algorithm based soft computing methods, locking to the local solutions is also a problem which is eliminated by using developed software. Faster and effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to the reducts (reduced input attributes) with seven, six, and five elements. It can be seen from the obtained results that the developed software with modified selection has the advantages in the fields of memory allocation, execution time, classification accuracy, sensitivity, and specificity values when compared with the other reduction algorithms by using the urological test data. PMID:23573172

  16. Optimal subset selection of primary sequence features using the genetic algorithm for thermophilic proteins identification.

    PubMed

    Wang, LiQiang; Li, CuiFeng

    2014-10-01

    A genetic algorithm (GA) coupled with multiple linear regression (MLR) was used to extract useful features from amino acids and g-gap dipeptides for distinguishing between thermophilic and non-thermophilic proteins. The method was trained by a benchmark dataset of 915 thermophilic and 793 non-thermophilic proteins. The method reached an overall accuracy of 95.4 % in a Jackknife test using nine amino acids, 38 0-gap dipeptides and 29 1-gap dipeptides. The accuracy as a function of protein size ranged between 85.8 and 96.9 %. The overall accuracies of three independent tests were 93, 93.4 and 91.8 %. The observed results of detecting thermophilic proteins suggest that the GA-MLR approach described herein should be a powerful method for selecting features that describe thermostabile machines and be an aid in the design of more stable proteins.

  17. Joint approximate diagonalization of eigenmatrices as a high-throughput approach for analysis of hyphenated and comprehensive two-dimensional gas chromatographic data.

    PubMed

    Zarghani, Maryam; Parastar, Hadi

    2017-11-17

    The objective of the present work is development of joint approximate diagonalization of eigenmatrices (JADE) as a member of independent component analysis (ICA) family, for the analysis of gas chromatography-mass spectrometry (GC-MS) and comprehensive two-dimensional gas chromatography-mass spectrometry (GC×GC-MS) data to address incomplete separation problem occurred during the analysis of complex sample matrices. In this regard, simulated GC-MS and GC×GC-MS data sets with different number of components, different degree of overlap and noise were evaluated. In the case of simultaneous analysis of multiple samples, column-wise augmentation for GC-MS and column-wise super-augmentation for GC×GC-MS was used before JADE analysis. The performance of JADE was evaluated in terms of statistical parameters of lack of fit (LOF), mutual information (MI) and Amari index as well as analytical figures of merit (AFOMs) obtained from calibration curves. In addition, the area of feasible solutions (AFSs) was calculated by two different approaches of MCR-BANDs and polygon inflation algorithm (FACPACK). Furthermore, JADE performance was compared with multivariate curve resolution-alternating least squares (MCR-ALS) and other ICA algorithms of mean-field ICA (MFICA) and mutual information least dependent component analysis (MILCA). In all cases, JADE could successfully resolve the elution and spectral profiles in GC-MS and GC×GC-MS data with acceptable statistical and calibration parameters and their solutions were in AFSs. To check the applicability of JADE in real cases, JADE was used for resolution and quantification of phenanthrene and anthracene in aromatic fraction of heavy fuel oil (HFO) analyzed by GC×GC-MS. Surprisingly, pure elution and spectral profiles of target compounds were properly resolved in the presence of baseline and interferences using JADE. Once more, the performance of JADE was compared with MCR-ALS in real case. On this matter, the mutual information

  18. Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification

    PubMed Central

    Wen, Tingxi; Zhang, Zhongnan

    2017-01-01

    Abstract In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy. PMID:28489789

  19. Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification.

    PubMed

    Wen, Tingxi; Zhang, Zhongnan

    2017-05-01

    In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy.

  20. Sentiment analysis of feature ranking methods for classification accuracy

    NASA Astrophysics Data System (ADS)

    Joseph, Shashank; Mugauri, Calvin; Sumathy, S.

    2017-11-01

    Text pre-processing and feature selection are important and critical steps in text mining. Text pre-processing of large volumes of datasets is a difficult task as unstructured raw data is converted into structured format. Traditional methods of processing and weighing took much time and were less accurate. To overcome this challenge, feature ranking techniques have been devised. A feature set from text preprocessing is fed as input for feature selection. Feature selection helps improve text classification accuracy. Of the three feature selection categories available, the filter category will be the focus. Five feature ranking methods namely: document frequency, standard deviation information gain, CHI-SQUARE, and weighted-log likelihood -ratio is analyzed.

  1. An application of locally linear model tree algorithm with combination of feature selection in credit scoring

    NASA Astrophysics Data System (ADS)

    Siami, Mohammad; Gholamian, Mohammad Reza; Basiri, Javad

    2014-10-01

    Nowadays, credit scoring is one of the most important topics in the banking sector. Credit scoring models have been widely used to facilitate the process of credit assessing. In this paper, an application of the locally linear model tree algorithm (LOLIMOT) was experimented to evaluate the superiority of its performance to predict the customer's credit status. The algorithm is improved with an aim of adjustment by credit scoring domain by means of data fusion and feature selection techniques. Two real world credit data sets - Australian and German - from UCI machine learning database were selected to demonstrate the performance of our new classifier. The analytical results indicate that the improved LOLIMOT significantly increase the prediction accuracy.

  2. Flight State Identification of a Self-Sensing Wing via an Improved Feature Selection Method and Machine Learning Approaches.

    PubMed

    Chen, Xi; Kopsaftopoulos, Fotis; Wu, Qi; Ren, He; Chang, Fu-Kuo

    2018-04-29

    In this work, a data-driven approach for identifying the flight state of a self-sensing wing structure with an embedded multi-functional sensing network is proposed. The flight state is characterized by the structural vibration signals recorded from a series of wind tunnel experiments under varying angles of attack and airspeeds. A large feature pool is created by extracting potential features from the signals covering the time domain, the frequency domain as well as the information domain. Special emphasis is given to feature selection in which a novel filter method is developed based on the combination of a modified distance evaluation algorithm and a variance inflation factor. Machine learning algorithms are then employed to establish the mapping relationship from the feature space to the practical state space. Results from two case studies demonstrate the high identification accuracy and the effectiveness of the model complexity reduction via the proposed method, thus providing new perspectives of self-awareness towards the next generation of intelligent air vehicles.

  3. Flight State Identification of a Self-Sensing Wing via an Improved Feature Selection Method and Machine Learning Approaches

    PubMed Central

    Chen, Xi; Wu, Qi; Ren, He; Chang, Fu-Kuo

    2018-01-01

    In this work, a data-driven approach for identifying the flight state of a self-sensing wing structure with an embedded multi-functional sensing network is proposed. The flight state is characterized by the structural vibration signals recorded from a series of wind tunnel experiments under varying angles of attack and airspeeds. A large feature pool is created by extracting potential features from the signals covering the time domain, the frequency domain as well as the information domain. Special emphasis is given to feature selection in which a novel filter method is developed based on the combination of a modified distance evaluation algorithm and a variance inflation factor. Machine learning algorithms are then employed to establish the mapping relationship from the feature space to the practical state space. Results from two case studies demonstrate the high identification accuracy and the effectiveness of the model complexity reduction via the proposed method, thus providing new perspectives of self-awareness towards the next generation of intelligent air vehicles. PMID:29710832

  4. Natural scene logo recognition by joint boosting feature selection in salient regions

    NASA Astrophysics Data System (ADS)

    Fan, Wei; Sun, Jun; Naoi, Satoshi; Minagawa, Akihiro; Hotta, Yoshinobu

    2011-01-01

    Logos are considered valuable intellectual properties and a key component of the goodwill of a business. In this paper, we propose a natural scene logo recognition method which is segmentation-free and capable of processing images extremely rapidly and achieving high recognition rates. The classifiers for each logo are trained jointly, rather than independently. In this way, common features can be shared across multiple classes for better generalization. To deal with large range of aspect ratio of different logos, a set of salient regions of interest (ROI) are extracted to describe each class. We ensure the selected ROIs to be both individually informative and two-by-two weakly dependant by a Class Conditional Entropy Maximization criteria. Experimental results on a large logo database demonstrate the effectiveness and efficiency of our proposed method.

  5. Selective binding of choline by a phosphate-coordination-based triple helicate featuring an aromatic box.

    PubMed

    Jia, Chuandong; Zuo, Wei; Yang, Dong; Chen, Yanming; Cao, Liping; Custelcean, Radu; Hostaš, Jiří; Hobza, Pavel; Glaser, Robert; Wang, Yao-Yu; Yang, Xiao-Juan; Wu, Biao

    2017-10-16

    In nature, proteins have evolved sophisticated cavities tailored for capturing target guests selectively among competitors of similar size, shape, and charge. The fundamental principles guiding the molecular recognition, such as self-assembly and complementarity, have inspired the development of biomimetic receptors. In the current work, we report a self-assembled triple anion helicate (host 2) featuring a cavity resembling that of the choline-binding protein ChoX, as revealed by crystal and density functional theory (DFT)-optimized structures, which binds choline in a unique dual-site-binding mode. This similarity in structure leads to a similarly high selectivity of host 2 for choline over its derivatives, as demonstrated by the NMR and fluorescence competition experiments. Furthermore, host 2 is able to act as a fluorescence displacement sensor for discriminating choline, acetylcholine, L-carnitine, and glycine betaine effectively.The choline-binding protein ChoX exhibits a synergistic dual-site binding mode that allows it to discriminate choline over structural analogues. Here, the authors design a biomimetic triple anion helicate receptor whose selectivity for choline arises from a similar binding mechanism.

  6. A method for feature selection of APT samples based on entropy

    NASA Astrophysics Data System (ADS)

    Du, Zhenyu; Li, Yihong; Hu, Jinsong

    2018-05-01

    By studying the known APT attack events deeply, this paper propose a feature selection method of APT sample and a logic expression generation algorithm IOCG (Indicator of Compromise Generate). The algorithm can automatically generate machine readable IOCs (Indicator of Compromise), to solve the existing IOCs logical relationship is fixed, the number of logical items unchanged, large scale and cannot generate a sample of the limitations of the expression. At the same time, it can reduce the redundancy and useless APT sample processing time consumption, and improve the sharing rate of information analysis, and actively respond to complex and volatile APT attack situation. The samples were divided into experimental set and training set, and then the algorithm was used to generate the logical expression of the training set with the IOC_ Aware plug-in. The contrast expression itself was different from the detection result. The experimental results show that the algorithm is effective and can improve the detection effect.

  7. A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.

    PubMed

    Chen, Zhenyu; Li, Jianping; Wei, Liwei

    2007-10-01

    Recently, gene expression profiling using microarray techniques has been shown as a promising tool to improve the diagnosis and treatment of cancer. Gene expression data contain high level of noise and the overwhelming number of genes relative to the number of available samples. It brings out a great challenge for machine learning and statistic techniques. Support vector machine (SVM) has been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to deliver the user a transparent decision process. How to explain the computed solutions and present the extracted knowledge becomes a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction and prediction modeling is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple parameters learning problem. And a shrinkage approach: 1-norm based linear programming is proposed to obtain the sparse parameters and the corresponding selected features. We propose a novel rule extraction approach using the information provided by the separating hyperplane and support vectors to improve the generalization capacity and comprehensibility of rules and reduce the computational complexity. Two public gene expression datasets: leukemia dataset and colon tumor dataset are used to demonstrate the performance of this approach. Using the small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both two datasets. Moreover, very simple rules with linguist labels are extracted. The rule sets have high diagnostic power because of their good classification performance.

  8. Investigation of frame-to-frame back projection and feature selection algorithms for non-line-of-sight laser gated viewing

    NASA Astrophysics Data System (ADS)

    Laurenzis, Martin; Velten, Andreas

    2014-10-01

    In the present paper, we discuss new approaches to analyze laser gated viewing data for non-line-of-sight vision with a novel frame-to-frame back projection as well as feature selection algorithms. While first back projection approaches use time transients for each pixel, our new method has the ability to calculate the projection of imaging data on the obscured voxel space for each frame. Further, four different data analysis algorithms were studied with the aim to identify and select signals from different target positions. A slight modification of commonly used filters leads to powerful selection of local maximum values. It is demonstrated that the choice of the filter has impact on the selectivity i.e. multiple target detection as well as on the localization precision.

  9. Method of generating features optimal to a dataset and classifier

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bruillard, Paul J.; Gosink, Luke J.; Jarman, Kenneth D.

    A method of generating features optimal to a particular dataset and classifier is disclosed. A dataset of messages is inputted and a classifier is selected. An algebra of features is encoded. Computable features that are capable of describing the dataset from the algebra of features are selected. Irredundant features that are optimal for the classifier and the dataset are selected.

  10. Hybridization between multi-objective genetic algorithm and support vector machine for feature selection in walker-assisted gait.

    PubMed

    Martins, Maria; Costa, Lino; Frizera, Anselmo; Ceres, Ramón; Santos, Cristina

    2014-03-01

    Walker devices are often prescribed incorrectly to patients, leading to the increase of dissatisfaction and occurrence of several problems, such as, discomfort and pain. Thus, it is necessary to objectively evaluate the effects that assisted gait can have on the gait patterns of walker users, comparatively to a non-assisted gait. A gait analysis, focusing on spatiotemporal and kinematics parameters, will be issued for this purpose. However, gait analysis yields redundant information that often is difficult to interpret. This study addresses the problem of selecting the most relevant gait features required to differentiate between assisted and non-assisted gait. For that purpose, it is presented an efficient approach that combines evolutionary techniques, based on genetic algorithms, and support vector machine algorithms, to discriminate differences between assisted and non-assisted gait with a walker with forearm supports. For comparison purposes, other classification algorithms are verified. Results with healthy subjects show that the main differences are characterized by balance and joints excursion in the sagittal plane. These results, confirmed by clinical evidence, allow concluding that this technique is an efficient feature selection approach. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  11. An unsupervised technique for optimal feature selection in attribute profiles for spectral-spatial classification of hyperspectral images

    NASA Astrophysics Data System (ADS)

    Bhardwaj, Kaushal; Patra, Swarnajyoti

    2018-04-01

    Inclusion of spatial information along with spectral features play a significant role in classification of remote sensing images. Attribute profiles have already proved their ability to represent spatial information. In order to incorporate proper spatial information, multiple attributes are required and for each attribute large profiles need to be constructed by varying the filter parameter values within a wide range. Thus, the constructed profiles that represent spectral-spatial information of an hyperspectral image have huge dimension which leads to Hughes phenomenon and increases computational burden. To mitigate these problems, this work presents an unsupervised feature selection technique that selects a subset of filtered image from the constructed high dimensional multi-attribute profile which are sufficiently informative to discriminate well among classes. In this regard the proposed technique exploits genetic algorithms (GAs). The fitness function of GAs are defined in an unsupervised way with the help of mutual information. The effectiveness of the proposed technique is assessed using one-against-all support vector machine classifier. The experiments conducted on three hyperspectral data sets show the robustness of the proposed method in terms of computation time and classification accuracy.

  12. Systematic feature selection improves accuracy of methylation-based forensic age estimation in Han Chinese males.

    PubMed

    Feng, Lei; Peng, Fuduan; Li, Shanfei; Jiang, Li; Sun, Hui; Ji, Anquan; Zeng, Changqing; Li, Caixia; Liu, Fan

    2018-03-23

    Estimating individual age from biomarkers may provide key information facilitating forensic investigations. Recent progress has shown DNA methylation at age-associated CpG sites as the most informative biomarkers for estimating the individual age of an unknown donor. Optimal feature selection plays a critical role in determining the performance of the final prediction model. In this study we investigate methylation levels at 153 age-associated CpG sites from 21 previously reported genomic regions using the EpiTYPER system for their predictive power on individual age in 390 Han Chinese males ranging from 15 to 75 years of age. We conducted a systematic feature selection using a stepwise backward multiple linear regression analysis as well as an exhaustive searching algorithm. Both approaches identified the same subset of 9 CpG sites, which in linear combination provided the optimal model fitting with mean absolute deviation (MAD) of 2.89 years of age and explainable variance (R 2 ) of 0.92. The final model was validated in two independent Han Chinese male samples (validation set 1, N = 65, MAD = 2.49, R 2  = 0.95, and validation set 2, N = 62, MAD = 3.36, R 2  = 0.89). Other competing models such as support vector machine and artificial neural network did not outperform the linear model to any noticeable degree. The validation set 1 was additionally analyzed using Pyrosequencing technology for cross-platform validation and was termed as validation set 3. Directly applying our model, in which the methylation levels were detected by the EpiTYPER system, to the data from pyrosequencing technology showed, however, less accurate results in terms of MAD (validation set 3, N = 65 Han Chinese males, MAD = 4.20, R 2  = 0.93), suggesting the presence of a batch effect between different data generation platforms. This batch effect could be partially overcome by a z-score transformation (MAD = 2.76, R 2  = 0.93). Overall, our

  13. 3 CFR - Limited Waiver of Certain Sanctions Imposed by, and Delegation of Certain Authorities Pursuant to...

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... Delegation of Certain Authorities Pursuant to, the Tom Lantos Block Burmese JADE (Junta's Anti-Democratic... Authorities Pursuant to, the Tom Lantos Block Burmese JADE (Junta's Anti-Democratic Efforts) Act of 2008... (Junta's Anti-Democratic Efforts) Act of 2008 (Public Law 110-286) (JADE Act) and section 301 of title 3...

  14. Defect Detection in Arc-Welding Processes by Means of the Line-to-Continuum Method and Feature Selection.

    PubMed

    Garcia-Allende, P Beatriz; Mirapeix, Jesus; Conde, Olga M; Cobo, Adolfo; Lopez-Higuera, Jose M

    2009-01-01

    Plasma optical spectroscopy is widely employed in on-line welding diagnostics. The determination of the plasma electron temperature, which is typically selected as the output monitoring parameter, implies the identification of the atomic emission lines. As a consequence, additional processing stages are required with a direct impact on the real time performance of the technique. The line-to-continuum method is a feasible alternative spectroscopic approach and it is particularly interesting in terms of its computational efficiency. However, the monitoring signal highly depends on the chosen emission line. In this paper, a feature selection methodology is proposed to solve the uncertainty regarding the selection of the optimum spectral band, which allows the employment of the line-to-continuum method for on-line welding diagnostics. Field test results have been conducted to demonstrate the feasibility of the solution.

  15. Selective Exposure to Health Information: The Role of Headline Features in the Choice of Health Newsletter Articles.

    PubMed

    Kim, Hyun Suk; Forquer, Heather; Rusko, Joseph; Hornik, Robert C; Cappella, Joseph N

    2016-01-01

    This study investigated how content and context features of headlines drive selective exposure when choosing between headlines of a monthly e-mail health newsletter in a naturalistic setting over a period of nine months. Study participants received a monthly e-mail newsletter and could freely open it and click any headline to read the accompanying article. In each e-mail newsletter, nine headlines competed with each other for selection. Textual and visual information of the headlines was content-analyzed, and clickstream data on the headlines were collected automatically. The results showed that headlines invited more frequent audience selections when they provided efficacy-signaling information in an imperative voice, when they used a moderate number of negative emotion words, when they presented negative thumbnail images while mentioning cancer or other diseases, and when they were placed higher in position.

  16. A swarm-trained k-nearest prototypes adaptive classifier with automatic feature selection for interval data.

    PubMed

    Silva Filho, Telmo M; Souza, Renata M C R; Prudêncio, Ricardo B C

    2016-08-01

    Some complex data types are capable of modeling data variability and imprecision. These data types are studied in the symbolic data analysis field. One such data type is interval data, which represents ranges of values and is more versatile than classic point data for many domains. This paper proposes a new prototype-based classifier for interval data, trained by a swarm optimization method. Our work has two main contributions: a swarm method which is capable of performing both automatic selection of features and pruning of unused prototypes and a generalized weighted squared Euclidean distance for interval data. By discarding unnecessary features and prototypes, the proposed algorithm deals with typical limitations of prototype-based methods, such as the problem of prototype initialization. The proposed distance is useful for learning classes in interval datasets with different shapes, sizes and structures. When compared to other prototype-based methods, the proposed method achieves lower error rates in both synthetic and real interval datasets. Copyright © 2016 Elsevier Ltd. All rights reserved.

  17. Sensor feature fusion for detecting buried objects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clark, G.A.; Sengupta, S.K.; Sherwood, R.J.

    1993-04-01

    Given multiple registered images of the earth`s surface from dual-band sensors, our system fuses information from the sensors to reduce the effects of clutter and improve the ability to detect buried or surface target sites. The sensor suite currently includes two sensors (5 micron and 10 micron wavelengths) and one ground penetrating radar (GPR) of the wide-band pulsed synthetic aperture type. We use a supervised teaming pattern recognition approach to detect metal and plastic land mines buried in soil. The overall process consists of four main parts: Preprocessing, feature extraction, feature selection, and classification. These parts are used in amore » two step process to classify a subimage. Thee first step, referred to as feature selection, determines the features of sub-images which result in the greatest separability among the classes. The second step, image labeling, uses the selected features and the decisions from a pattern classifier to label the regions in the image which are likely to correspond to buried mines. We extract features from the images, and use feature selection algorithms to select only the most important features according to their contribution to correct detections. This allows us to save computational complexity and determine which of the sensors add value to the detection system. The most important features from the various sensors are fused using supervised teaming pattern classifiers (including neural networks). We present results of experiments to detect buried land mines from real data, and evaluate the usefulness of fusing feature information from multiple sensor types, including dual-band infrared and ground penetrating radar. The novelty of the work lies mostly in the combination of the algorithms and their application to the very important and currently unsolved operational problem of detecting buried land mines from an airborne standoff platform.« less

  18. Selective binding of choline by a phosphate-coordination-based triple helicate featuring an aromatic box

    DOE PAGES

    Jia, Chuandong; Zuo, Wei; Yang, Dong; ...

    2017-10-16

    In nature, proteins have evolved sophisticated cavities tailored for capturing target guests selectively among competitors of similar size, shape, and charge. The fundamental principles guiding the molecular recognition, such as self-assembly and complementarity, have inspired the development of biomimetic receptors. In the current work, we report a self-assembled triple anion helicate (host 2) featuring a cavity resembling that of the choline-binding protein ChoX, as revealed by crystal and density functional theory (DFT)-optimized structures, which binds choline in a unique dual-site-binding mode. Here, this similarity in structure leads to a similarly high selectivity of host 2 for choline over its derivatives,more » as demonstrated by the NMR and fluorescence competition experiments. Furthermore, host 2 is able to act as a fluorescence displacement sensor for discriminating choline, acetylcholine, l-carnitine, and glycine betaine effectively.« less

  19. Relevant, irredundant feature selection and noisy example elimination.

    PubMed

    Lashkia, George V; Anthony, Laurence

    2004-04-01

    In many real-world situations, the method for computing the desired output from a set of inputs is unknown. One strategy for solving these types of problems is to learn the input-output functionality from examples in a training set. However, in many situations it is difficult to know what information is relevant to the task at hand. Subsequently, researchers have investigated ways to deal with the so-called problem of consistency of attributes, i.e., attributes that can distinguish examples from different classes. In this paper, we first prove that the notion of relevance of attributes is directly related to the consistency of attributes, and show how relevant, irredundant attributes can be selected. We then compare different relevant attribute selection algorithms, and show the superiority of algorithms that select irredundant attributes over those that select relevant attributes. We also show that searching for an "optimal" subset of attributes, which is considered to be the main purpose of attribute selection, is not the best way to improve the accuracy of classifiers. Employing sets of relevant, irredundant attributes improves classification accuracy in many more cases. Finally, we propose a new method for selecting relevant examples, which is based on filtering the so-called pattern frequency domain. By identifying examples that are nontypical in the determination of relevant, irredundant attributes, irrelevant examples can be eliminated prior to the learning process. Empirical results using artificial and real databases show the effectiveness of the proposed method in selecting relevant examples leading to improved performance even on greatly reduced training sets.

  20. Some Questions about Feature Re-Assembly

    ERIC Educational Resources Information Center

    White, Lydia

    2009-01-01

    In this commentary, differences between feature re-assembly and feature selection are discussed. Lardiere's proposals are compared to existing approaches to grammatical features in second language (L2) acquisition. Questions are raised about the predictive power of the feature re-assembly approach. (Contains 1 footnote.)

  1. Celebrities and screening: a measurable impact on high-grade cervical neoplasia diagnosis from the ‘Jade Goody effect' in the UK

    PubMed Central

    Casey, G M; Morris, B; Burnell, M; Parberry, A; Singh, N; Rosenthal, A N

    2013-01-01

    Background: The celebrity Jade Goody's cervical cancer diagnosis was associated with increased UK cervical screening attendance. We wanted to establish if there was an increase in high-grade (HG) cervical neoplasia diagnoses, and if so, what the characteristics of the women with HG disease were. Methods: We analysed prospective data on 3233 consecutive colposcopy referrals in North East London, UK, from 01 April 2005 to 30 June 2010. Characteristics and outcomes of pre- and post-Goody cohorts were compared. Results: Goody's diagnosis was associated with an increased incidence of colposcopy referrals in all subsequent annual quarters (incidence rate ratio (IRR) 1.3–1.9, P<0.002–P<0.0005) and increased HG disease diagnoses in the fourth quarter 2008/2009 (IRR 1.3, P=0.05) and first quarter 2009/2010 (IRR 1.3, P=0.07). We observed 1.90-fold (CI: 1.06–3.39), 2.06 (CI: 1.13–3.76) and 2.13-fold (CI: 1.07–4.25) respective increases in the odds of HG disease women being screening-naive in the first and second quarter 2009/2010, and the first quarter 2010/2011 (P<0.04, P<0.02 and P<0.04, respectively). There was a 2.23-fold increase in the odds of screening-naive HG disease women being symptomatic post-Goody's diagnosis (P=0.023). The age distributions of the pre- and post-Goody cohorts did not differ in any study group. Conclusion: Continued publicity about celebrities' diagnoses might encourage screening in at-risk populations. PMID:23963142

  2. Feature Selection for Object-Based Classification of High-Resolution Remote Sensing Images Based on the Combination of a Genetic Algorithm and Tabu Search

    PubMed Central

    Shi, Lei; Wan, Youchuan; Gao, Xianjun

    2018-01-01

    In object-based image analysis of high-resolution images, the number of features can reach hundreds, so it is necessary to perform feature reduction prior to classification. In this paper, a feature selection method based on the combination of a genetic algorithm (GA) and tabu search (TS) is presented. The proposed GATS method aims to reduce the premature convergence of the GA by the use of TS. A prematurity index is first defined to judge the convergence situation during the search. When premature convergence does take place, an improved mutation operator is executed, in which TS is performed on individuals with higher fitness values. As for the other individuals with lower fitness values, mutation with a higher probability is carried out. Experiments using the proposed GATS feature selection method and three other methods, a standard GA, the multistart TS method, and ReliefF, were conducted on WorldView-2 and QuickBird images. The experimental results showed that the proposed method outperforms the other methods in terms of the final classification accuracy. PMID:29581721

  3. Habitat selection in a rocky landscape: experimentally decoupling the influence of retreat site attributes from that of landscape features.

    PubMed

    Croak, Benjamin M; Pike, David A; Webb, Jonathan K; Shine, Richard

    2012-01-01

    Organisms selecting retreat sites may evaluate not only the quality of the specific shelter, but also the proximity of that site to resources in the surrounding area. Distinguishing between habitat selection at these two spatial scales is complicated by co-variation among microhabitat factors (i.e., the attributes of individual retreat sites often correlate with their proximity to landscape features). Disentangling this co-variation may facilitate the restoration or conservation of threatened systems. To experimentally examine the role of landscape attributes in determining retreat-site quality for saxicolous ectotherms, we deployed 198 identical artificial rocks in open (sun-exposed) sites on sandstone outcrops in southeastern Australia, and recorded faunal usage of those retreat sites over the next 29 months. Several landscape-scale attributes were associated with occupancy of experimental rocks, but different features were important for different species. For example, endangered broad-headed snakes (Hoplocephalus bungaroides) preferred retreat sites close to cliff edges, flat rock spiders (Hemicloea major) preferred small outcrops, and velvet geckos (Oedura lesueurii) preferred rocks close to the cliff edge with higher-than-average sun exposure. Standardized retreat sites can provide robust experimental data on the effects of landscape-scale attributes on retreat site selection, revealing interspecific divergences among sympatric taxa that use similar habitats.

  4. Training Humans to Categorize Monkey Calls: Auditory Feature- and Category-Selective Neural Tuning Changes.

    PubMed

    Jiang, Xiong; Chevillet, Mark A; Rauschecker, Josef P; Riesenhuber, Maximilian

    2018-04-18

    Grouping auditory stimuli into common categories is essential for a variety of auditory tasks, including speech recognition. We trained human participants to categorize auditory stimuli from a large novel set of morphed monkey vocalizations. Using fMRI-rapid adaptation (fMRI-RA) and multi-voxel pattern analysis (MVPA) techniques, we gained evidence that categorization training results in two distinct sets of changes: sharpened tuning to monkey call features (without explicit category representation) in left auditory cortex and category selectivity for different types of calls in lateral prefrontal cortex. In addition, the sharpness of neural selectivity in left auditory cortex, as estimated with both fMRI-RA and MVPA, predicted the steepness of the categorical boundary, whereas categorical judgment correlated with release from adaptation in the left inferior frontal gyrus. These results support the theory that auditory category learning follows a two-stage model analogous to the visual domain, suggesting general principles of perceptual category learning in the human brain. Copyright © 2018 Elsevier Inc. All rights reserved.

  5. A Circular Polarizer with Beamforming Feature Based on Frequency Selective Surfaces

    NASA Astrophysics Data System (ADS)

    Yin, Jia Yuan; Wan, Xiang; Ren, Jian; Cui, Tie Jun

    2017-01-01

    We propose a circular polarizer with beamforming features based on frequency selective surface (FSS), in which a modified anchor-shaped unit cell is used to reach the circular polarizer function. The beamforming characteristic is realized by a particular design of the unit-phase distribution, which is obtained by varying the scale of the unit cell. Instead of using plane waves, a horn antenna is designed to feed the phase-variant FSS. The proposed two-layer FSS is fabricated and measured to verify the design. The measured results show that the proposed structure can convert the linearly polarized waves to circularly polarized waves. Compared with the feeding horn antenna, the transmitted beam of the FSS-added horn is 14.43° broader in one direction, while 3.77° narrower in the orthogonal direction. To our best knowledge, this is the first time to realize circular polarizer with beamforming as the extra function based on FSS, which is promising in satellite and communication systems for potential applications due to its simple design and good performance.

  6. Proteomics Versus Clinical Data and Stochastic Local Search Based Feature Selection for Acute Myeloid Leukemia Patients' Classification.

    PubMed

    Chebouba, Lokmane; Boughaci, Dalila; Guziolowski, Carito

    2018-06-04

    The use of data issued from high throughput technologies in drug target problems is widely widespread during the last decades. This study proposes a meta-heuristic framework using stochastic local search (SLS) combined with random forest (RF) where the aim is to specify the most important genes and proteins leading to the best classification of Acute Myeloid Leukemia (AML) patients. First we use a stochastic local search meta-heuristic as a feature selection technique to select the most significant proteins to be used in the classification task step. Then we apply RF to classify new patients into their corresponding classes. The evaluation technique is to run the RF classifier on the training data to get a model. Then, we apply this model on the test data to find the appropriate class. We use as metrics the balanced accuracy (BAC) and the area under the receiver operating characteristic curve (AUROC) to measure the performance of our model. The proposed method is evaluated on the dataset issued from DREAM 9 challenge. The comparison is done with a pure random forest (without feature selection), and with the two best ranked results of the DREAM 9 challenge. We used three types of data: only clinical data, only proteomics data, and finally clinical and proteomics data combined. The numerical results show that the highest scores are obtained when using clinical data alone, and the lowest is obtained when using proteomics data alone. Further, our method succeeds in finding promising results compared to the methods presented in the DREAM challenge.

  7. Diagnosing Autism Spectrum Disorder from Brain Resting-State Functional Connectivity Patterns Using a Deep Neural Network with a Novel Feature Selection Method

    PubMed Central

    Guo, Xinyu; Dominick, Kelli C.; Minai, Ali A.; Li, Hailong; Erickson, Craig A.; Lu, Long J.

    2017-01-01

    The whole-brain functional connectivity (FC) pattern obtained from resting-state functional magnetic resonance imaging data are commonly applied to study neuropsychiatric conditions such as autism spectrum disorder (ASD) by using different machine learning models. Recent studies indicate that both hyper- and hypo- aberrant ASD-associated FCs were widely distributed throughout the entire brain rather than only in some specific brain regions. Deep neural networks (DNN) with multiple hidden layers have shown the ability to systematically extract lower-to-higher level information from high dimensional data across a series of neural hidden layers, significantly improving classification accuracy for such data. In this study, a DNN with a novel feature selection method (DNN-FS) is developed for the high dimensional whole-brain resting-state FC pattern classification of ASD patients vs. typical development (TD) controls. The feature selection method is able to help the DNN generate low dimensional high-quality representations of the whole-brain FC patterns by selecting features with high discriminating power from multiple trained sparse auto-encoders. For the comparison, a DNN without the feature selection method (DNN-woFS) is developed, and both of them are tested with different architectures (i.e., with different numbers of hidden layers/nodes). Results show that the best classification accuracy of 86.36% is generated by the DNN-FS approach with 3 hidden layers and 150 hidden nodes (3/150). Remarkably, DNN-FS outperforms DNN-woFS for all architectures studied. The most significant accuracy improvement was 9.09% with the 3/150 architecture. The method also outperforms other feature selection methods, e.g., two sample t-test and elastic net. In addition to improving the classification accuracy, a Fisher's score-based biomarker identification method based on the DNN is also developed, and used to identify 32 FCs related to ASD. These FCs come from or cross different pre

  8. Diagnosing Autism Spectrum Disorder from Brain Resting-State Functional Connectivity Patterns Using a Deep Neural Network with a Novel Feature Selection Method.

    PubMed

    Guo, Xinyu; Dominick, Kelli C; Minai, Ali A; Li, Hailong; Erickson, Craig A; Lu, Long J

    2017-01-01

    The whole-brain functional connectivity (FC) pattern obtained from resting-state functional magnetic resonance imaging data are commonly applied to study neuropsychiatric conditions such as autism spectrum disorder (ASD) by using different machine learning models. Recent studies indicate that both hyper- and hypo- aberrant ASD-associated FCs were widely distributed throughout the entire brain rather than only in some specific brain regions. Deep neural networks (DNN) with multiple hidden layers have shown the ability to systematically extract lower-to-higher level information from high dimensional data across a series of neural hidden layers, significantly improving classification accuracy for such data. In this study, a DNN with a novel feature selection method (DNN-FS) is developed for the high dimensional whole-brain resting-state FC pattern classification of ASD patients vs. typical development (TD) controls. The feature selection method is able to help the DNN generate low dimensional high-quality representations of the whole-brain FC patterns by selecting features with high discriminating power from multiple trained sparse auto-encoders. For the comparison, a DNN without the feature selection method (DNN-woFS) is developed, and both of them are tested with different architectures (i.e., with different numbers of hidden layers/nodes). Results show that the best classification accuracy of 86.36% is generated by the DNN-FS approach with 3 hidden layers and 150 hidden nodes (3/150). Remarkably, DNN-FS outperforms DNN-woFS for all architectures studied. The most significant accuracy improvement was 9.09% with the 3/150 architecture. The method also outperforms other feature selection methods, e.g., two sample t -test and elastic net. In addition to improving the classification accuracy, a Fisher's score-based biomarker identification method based on the DNN is also developed, and used to identify 32 FCs related to ASD. These FCs come from or cross different pre

  9. Feature selection and classification of protein-protein complexes based on their binding affinities using machine learning approaches.

    PubMed

    Yugandhar, K; Gromiha, M Michael

    2014-09-01

    Protein-protein interactions are intrinsic to virtually every cellular process. Predicting the binding affinity of protein-protein complexes is one of the challenging problems in computational and molecular biology. In this work, we related sequence features of protein-protein complexes with their binding affinities using machine learning approaches. We set up a database of 185 protein-protein complexes for which the interacting pairs are heterodimers and their experimental binding affinities are available. On the other hand, we have developed a set of 610 features from the sequences of protein complexes and utilized Ranker search method, which is the combination of Attribute evaluator and Ranker method for selecting specific features. We have analyzed several machine learning algorithms to discriminate protein-protein complexes into high and low affinity groups based on their Kd values. Our results showed a 10-fold cross-validation accuracy of 76.1% with the combination of nine features using support vector machines. Further, we observed accuracy of 83.3% on an independent test set of 30 complexes. We suggest that our method would serve as an effective tool for identifying the interacting partners in protein-protein interaction networks and human-pathogen interactions based on the strength of interactions. © 2014 Wiley Periodicals, Inc.

  10. Multi-scale textural feature extraction and particle swarm optimization based model selection for false positive reduction in mammography.

    PubMed

    Zyout, Imad; Czajkowska, Joanna; Grzegorzek, Marcin

    2015-12-01

    The high number of false positives and the resulting number of avoidable breast biopsies are the major problems faced by current mammography Computer Aided Detection (CAD) systems. False positive reduction is not only a requirement for mass but also for calcification CAD systems which are currently deployed for clinical use. This paper tackles two problems related to reducing the number of false positives in the detection of all lesions and masses, respectively. Firstly, textural patterns of breast tissue have been analyzed using several multi-scale textural descriptors based on wavelet and gray level co-occurrence matrix. The second problem addressed in this paper is the parameter selection and performance optimization. For this, we adopt a model selection procedure based on Particle Swarm Optimization (PSO) for selecting the most discriminative textural features and for strengthening the generalization capacity of the supervised learning stage based on a Support Vector Machine (SVM) classifier. For evaluating the proposed methods, two sets of suspicious mammogram regions have been used. The first one, obtained from Digital Database for Screening Mammography (DDSM), contains 1494 regions (1000 normal and 494 abnormal samples). The second set of suspicious regions was obtained from database of Mammographic Image Analysis Society (mini-MIAS) and contains 315 (207 normal and 108 abnormal) samples. Results from both datasets demonstrate the efficiency of using PSO based model selection for optimizing both classifier hyper-parameters and parameters, respectively. Furthermore, the obtained results indicate the promising performance of the proposed textural features and more specifically, those based on co-occurrence matrix of wavelet image representation technique. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection.

    PubMed

    Cheng, Tiejun; Li, Qingliang; Wang, Yanli; Bryant, Stephen H

    2011-02-28

    Aqueous solubility is recognized as a critical parameter in both the early- and late-stage drug discovery. Therefore, in silico modeling of solubility has attracted extensive interests in recent years. Most previous studies have been limited in using relatively small data sets with limited diversity, which in turn limits the predictability of derived models. In this work, we present a support vector machines model for the binary classification of solubility by taking advantage of the largest known public data set that contains over 46 000 compounds with experimental solubility. Our model was optimized in combination with a reduction and recombination feature selection strategy. The best model demonstrated robust performance in both cross-validation and prediction of two independent test sets, indicating it could be a practical tool to select soluble compounds for screening, purchasing, and synthesizing. Moreover, our work may be used for comparative evaluation of solubility classification studies ascribe to the use of completely public resources.

  12. Electrical Identification and Selective Microstimulation of Neuronal Compartments Based on Features of Extracellular Action Potentials

    NASA Astrophysics Data System (ADS)

    Radivojevic, Milos; Jäckel, David; Altermatt, Michael; Müller, Jan; Viswam, Vijay; Hierlemann, Andreas; Bakkum, Douglas J.

    2016-08-01

    A detailed, high-spatiotemporal-resolution characterization of neuronal responses to local electrical fields and the capability of precise extracellular microstimulation of selected neurons are pivotal for studying and manipulating neuronal activity and circuits in networks and for developing neural prosthetics. Here, we studied cultured neocortical neurons by using high-density microelectrode arrays and optical imaging, complemented by the patch-clamp technique, and with the aim to correlate morphological and electrical features of neuronal compartments with their responsiveness to extracellular stimulation. We developed strategies to electrically identify any neuron in the network, while subcellular spatial resolution recording of extracellular action potential (AP) traces enabled their assignment to the axon initial segment (AIS), axonal arbor and proximal somatodendritic compartments. Stimulation at the AIS required low voltages and provided immediate, selective and reliable neuronal activation, whereas stimulation at the soma required high voltages and produced delayed and unreliable responses. Subthreshold stimulation at the soma depolarized the somatic membrane potential without eliciting APs.

  13. Machine Learning Feature Selection for Tuning Memory Page Swapping

    DTIC Science & Technology

    2013-09-01

    environments we set up. 13 Figure 4.1 Updated Feature Vector List. Features we added to the kernel are anno - tated with “(MLVM...Feb. 1966. [2] P. J . Denning, “The working set model for program behavior,” Communications of the ACM, vol. 11, no. 5, pp. 323–333, May 1968. [3] L. A...8] R. W. Cart and J . L. Hennessy, “WSClock — A simple and effective algorithm for virtual memory management,” M.S. thesis, Dept. Computer Science

  14. Feature selection through validation and un-censoring of endovascular repair survival data for predicting the risk of re-intervention.

    PubMed

    Attallah, Omneya; Karthikesalingam, Alan; Holt, Peter J E; Thompson, Matthew M; Sayers, Rob; Bown, Matthew J; Choke, Eddie C; Ma, Xianghong

    2017-08-03

    Feature selection (FS) process is essential in the medical area as it reduces the effort and time needed for physicians to measure unnecessary features. Choosing useful variables is a difficult task with the presence of censoring which is the unique characteristic in survival analysis. Most survival FS methods depend on Cox's proportional hazard model; however, machine learning techniques (MLT) are preferred but not commonly used due to censoring. Techniques that have been proposed to adopt MLT to perform FS with survival data cannot be used with the high level of censoring. The researcher's previous publications proposed a technique to deal with the high level of censoring. It also used existing FS techniques to reduce dataset dimension. However, in this paper a new FS technique was proposed and combined with feature transformation and the proposed uncensoring approaches to select a reduced set of features and produce a stable predictive model. In this paper, a FS technique based on artificial neural network (ANN) MLT is proposed to deal with highly censored Endovascular Aortic Repair (EVAR). Survival data EVAR datasets were collected during 2004 to 2010 from two vascular centers in order to produce a final stable model. They contain almost 91% of censored patients. The proposed approach used a wrapper FS method with ANN to select a reduced subset of features that predict the risk of EVAR re-intervention after 5 years to patients from two different centers located in the United Kingdom, to allow it to be potentially applied to cross-centers predictions. The proposed model is compared with the two popular FS techniques; Akaike and Bayesian information criteria (AIC, BIC) that are used with Cox's model. The final model outperforms other methods in distinguishing the high and low risk groups; as they both have concordance index and estimated AUC better than the Cox's model based on AIC, BIC, Lasso, and SCAD approaches. These models have p-values lower than 0

  15. Irrelevant reward and selection histories have different influences on task-relevant attentional selection.

    PubMed

    MacLean, Mary H; Giesbrecht, Barry

    2015-07-01

    Task-relevant and physically salient features influence visual selective attention. In the present study, we investigated the influence of task-irrelevant and physically nonsalient reward-associated features on visual selective attention. Two hypotheses were tested: One predicts that the effects of target-defining task-relevant and task-irrelevant features interact to modulate visual selection; the other predicts that visual selection is determined by the independent combination of relevant and irrelevant feature effects. These alternatives were tested using a visual search task that contained multiple targets, placing a high demand on the need for selectivity, and that was data-limited and required unspeeded responses, emphasizing early perceptual selection processes. One week prior to the visual search task, participants completed a training task in which they learned to associate particular colors with a specific reward value. In the search task, the reward-associated colors were presented surrounding targets and distractors, but were neither physically salient nor task-relevant. In two experiments, the irrelevant reward-associated features influenced performance, but only when they were presented in a task-relevant location. The costs induced by the irrelevant reward-associated features were greater when they oriented attention to a target than to a distractor. In a third experiment, we examined the effects of selection history in the absence of reward history and found that the interaction between task relevance and selection history differed, relative to when the features had previously been associated with reward. The results indicate that under conditions that demand highly efficient perceptual selection, physically nonsalient task-irrelevant and task-relevant factors interact to influence visual selective attention.

  16. A short feature vector for image matching: The Log-Polar Magnitude feature descriptor

    PubMed Central

    Hast, Anders; Wählby, Carolina; Sintorn, Ida-Maria

    2017-01-01

    The choice of an optimal feature detector-descriptor combination for image matching often depends on the application and the image type. In this paper, we propose the Log-Polar Magnitude feature descriptor—a rotation, scale, and illumination invariant descriptor that achieves comparable performance to SIFT on a large variety of image registration problems but with much shorter feature vectors. The descriptor is based on the Log-Polar Transform followed by a Fourier Transform and selection of the magnitude spectrum components. Selecting different frequency components allows optimizing for image patterns specific for a particular application. In addition, by relying only on coordinates of the found features and (optionally) feature sizes our descriptor is completely detector independent. We propose 48- or 56-long feature vectors that potentially can be shortened even further depending on the application. Shorter feature vectors result in better memory usage and faster matching. This combined with the fact that the descriptor does not require a time-consuming feature orientation estimation (the rotation invariance is achieved solely by using the magnitude spectrum of the Log-Polar Transform) makes it particularly attractive to applications with limited hardware capacity. Evaluation is performed on the standard Oxford dataset and two different microscopy datasets; one with fluorescence and one with transmission electron microscopy images. Our method performs better than SURF and comparable to SIFT on the Oxford dataset, and better than SIFT on both microscopy datasets indicating that it is particularly useful in applications with microscopy images. PMID:29190737

  17. Selective Attention to Semantic and Syntactic Features Modulates Sentence Processing Networks in Anterior Temporal Cortex

    PubMed Central

    Rogalsky, Corianne

    2009-01-01

    Numerous studies have identified an anterior temporal lobe (ATL) region that responds preferentially to sentence-level stimuli. It is unclear, however, whether this activity reflects a response to syntactic computations or some form of semantic integration. This distinction is difficult to investigate with the stimulus manipulations and anomaly detection paradigms traditionally implemented. The present functional magnetic resonance imaging study addresses this question via a selective attention paradigm. Subjects monitored for occasional semantic anomalies or occasional syntactic errors, thus directing their attention to semantic integration, or syntactic properties of the sentences. The hemodynamic response in the sentence-selective ATL region (defined with a localizer scan) was examined during anomaly/error-free sentences only, to avoid confounds due to error detection. The majority of the sentence-specific region of interest was equally modulated by attention to syntactic or compositional semantic features, whereas a smaller subregion was only modulated by the semantic task. We suggest that the sentence-specific ATL region is sensitive to both syntactic and integrative semantic functions during sentence processing, with a smaller portion of this area preferentially involved in the later. This study also suggests that selective attention paradigms may be effective tools to investigate the functional diversity of networks involved in sentence processing. PMID:18669589

  18. A geographic comparison of selected large-scale planetary surface features

    NASA Technical Reports Server (NTRS)

    Meszaros, S. P.

    1984-01-01

    Photographic and cartographic comparisons of geographic features on Mercury, the Moon, Earth, Mars, Ganymede, Callisto, Mimas, and Tethys are presented. Planetary structures caused by impacts, volcanism, tectonics, and other natural forces are included. Each feature is discussed individually and then those of similar origin are compared at the same scale.

  19. An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data.

    PubMed

    Wang, Kung-Jeng; Makond, Bunjira; Wang, Kung-Min

    2013-11-09

    Breast cancer is one of the most critical cancers and is a major cause of cancer death among women. It is essential to know the survivability of the patients in order to ease the decision making process regarding medical treatment and financial preparation. Recently, the breast cancer data sets have been imbalanced (i.e., the number of survival patients outnumbers the number of non-survival patients) whereas the standard classifiers are not applicable for the imbalanced data sets. The methods to improve survivability prognosis of breast cancer need for study. Two well-known five-year prognosis models/classifiers [i.e., logistic regression (LR) and decision tree (DT)] are constructed by combining synthetic minority over-sampling technique (SMOTE), cost-sensitive classifier technique (CSC), under-sampling, bagging, and boosting. The feature selection method is used to select relevant variables, while the pruning technique is applied to obtain low information-burden models. These methods are applied on data obtained from the Surveillance, Epidemiology, and End Results database. The improvements of survivability prognosis of breast cancer are investigated based on the experimental results. Experimental results confirm that the DT and LR models combined with SMOTE, CSC, and under-sampling generate higher predictive performance consecutively than the original ones. Most of the time, DT and LR models combined with SMOTE and CSC use less informative burden/features when a feature selection method and a pruning technique are applied. LR is found to have better statistical power than DT in predicting five-year survivability. CSC is superior to SMOTE, under-sampling, bagging, and boosting to improve the prognostic performance of DT and LR.

  20. An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data

    PubMed Central

    2013-01-01

    Background Breast cancer is one of the most critical cancers and is a major cause of cancer death among women. It is essential to know the survivability of the patients in order to ease the decision making process regarding medical treatment and financial preparation. Recently, the breast cancer data sets have been imbalanced (i.e., the number of survival patients outnumbers the number of non-survival patients) whereas the standard classifiers are not applicable for the imbalanced data sets. The methods to improve survivability prognosis of breast cancer need for study. Methods Two well-known five-year prognosis models/classifiers [i.e., logistic regression (LR) and decision tree (DT)] are constructed by combining synthetic minority over-sampling technique (SMOTE) ,cost-sensitive classifier technique (CSC), under-sampling, bagging, and boosting. The feature selection method is used to select relevant variables, while the pruning technique is applied to obtain low information-burden models. These methods are applied on data obtained from the Surveillance, Epidemiology, and End Results database. The improvements of survivability prognosis of breast cancer are investigated based on the experimental results. Results Experimental results confirm that the DT and LR models combined with SMOTE, CSC, and under-sampling generate higher predictive performance consecutively than the original ones. Most of the time, DT and LR models combined with SMOTE and CSC use less informative burden/features when a feature selection method and a pruning technique are applied. Conclusions LR is found to have better statistical power than DT in predicting five-year survivability. CSC is superior to SMOTE, under-sampling, bagging, and boosting to improve the prognostic performance of DT and LR. PMID:24207108

  1. Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection

    PubMed Central

    Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad

    2014-01-01

    Feature rankings are often used for supervised dimension reduction especially when discriminating power of each feature is of interest, dimensionality of dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such feature ranking method. We study whether the classifiers influence the SVC rankings or the discriminative power of features themselves has a dominant impact on the final rankings. We show the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study if heterogeneous classifiers ensemble approaches provide more unbiased rankings and if they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification from the optimal choices. PMID:25177107

  2. Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection.

    PubMed

    Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad

    2014-11-01

    Feature rankings are often used for supervised dimension reduction especially when discriminating power of each feature is of interest, dimensionality of dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such feature ranking method. We study whether the classifiers influence the SVC rankings or the discriminative power of features themselves has a dominant impact on the final rankings. We show the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study if heterogeneous classifiers ensemble approaches provide more unbiased rankings and if they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification from the optimal choices.

  3. Comparison of two feature selection methods for the separability analysis of intertidal sediments with spectrometric datasets in the German Wadden Sea

    NASA Astrophysics Data System (ADS)

    Jung, Richard; Ehlers, Manfred

    2016-10-01

    The spectral features of intertidal sediments are all influenced by the same biophysical properties, such as water, salinity, grain size or vegetation and therefore they are hard to separate by using only multispectral sensors. This could be shown by a previous study of Jung et al. (2015). A more detailed analysis of their characteristic spectral feature has to be carried out to understand the differences and similarities. Spectrometry data (i.e., hyperspectral sensors), for instance, have the opportunity to measure the reflection of the landscape as a continuous spectral pattern for each pixel of an image built from dozen to hundreds of narrow spectral bands. This reveals a high potential to measure unique spectral responses of different ecological conditions (Hennig et al., 2007). In this context, this study uses spectrometric datasets to distinguish between 14 different sediment classes obtained from a study area in the German Wadden Sea. A new feature selection method is proposed (Jeffries-Matusita distance bases feature selection; JMDFS), which uses the Euclidean distance to eliminate the wavelengths with the most similar reflectance values in an iterative process. Subsequent to each iteration, the separation capability is estimated by the Jeffries-Matusita distance (JMD). Two classes can be separated if the JMD is greater than 1.9 and if less than four wavelengths remain, no separation can be assumed. The results of the JMDFS are compared with a state-of-the-art feature selection method called ReliefF. Both methods showed the ability to improve the separation by achieving overall accuracies greater than 82%. The accuracies are 4%-13% better than the results with all wavelengths applied. The number of remaining wavelengths is very diverse and ranges from 14 to 213 of 703. The advantage of JMDFS compared with ReliefF is clearly the processing time. ReliefF needs 30 min for one temporary result. It is necessary to repeat the process several times and to average

  4. Extremely selective attention: eye-tracking studies of the dynamic allocation of attention to stimulus features in categorization.

    PubMed

    Blair, Mark R; Watson, Marcus R; Walshe, R Calen; Maj, Fillip

    2009-09-01

    Humans have an extremely flexible ability to categorize regularities in their environment, in part because of attentional systems that allow them to focus on important perceptual information. In formal theories of categorization, attention is typically modeled with weights that selectively bias the processing of stimulus features. These theories make differing predictions about the degree of flexibility with which attention can be deployed in response to stimulus properties. Results from 2 eye-tracking studies show that humans can rapidly learn to differently allocate attention to members of different categories. These results provide the first unequivocal demonstration of stimulus-responsive attention in a categorization task. Furthermore, the authors found clear temporal patterns in the shifting of attention within trials that follow from the informativeness of particular stimulus features. These data provide new insights into the attention processes involved in categorization. (c) 2009 APA, all rights reserved.

  5. How do you select the right security features for your company's products

    NASA Astrophysics Data System (ADS)

    Pickett, Gordon E.

    1998-04-01

    If your company manufacturers, supplies, or distributes products of almost any type, style, shape, or for any usage, they may become the objective of fraudulent activities from one or more sources. Therefore, someone at your company should be concerned about how these activities may affect the company's future. This paper/presentation will provide information about where these 'threats' may come from, what products have been compromised in the past, and what steps might be taken to deter these threats. During product security conferences, conversations, and other sources of information, you'll hear about many different types of security features that can be incorporated into monetary and identification documents, packaging, labeling, and other products/systems to help protect against counterfeiting, unauthorized tampering, or to identify 'genuine' products. Many of these features have been around for some time (which means that they may have lost at least some of their effectiveness) while others, or improved versions of some of the more mature features, have been or are being developed. This area is a 'moving target' and re-examination of the threats and counterthreats needs to be an ongoing activity. The 'value' and the capabilities of these features can sometimes be overstated, i.e. that a feature/system can solve all of the security-related problems that you may (or may not) have with your products. A couple of things to always keep in mind is that no feature(s) is universally effective and none of the features, or even combinations of features, is totally 'tamperproof' or counterfeitproof, irrespective of what may be said or claimed. So how do you go about determining if you have a product security problem and what, if any, security features might be used to reduce the threat(s) to your products? This paper will attempt to provide information to help you separate the 'wheat from the chaff' in these considerations. Specifically, information to be discussed in

  6. Temporal resolution for the perception of features and conjunctions.

    PubMed

    Bodelón, Clara; Fallah, Mazyar; Reynolds, John H

    2007-01-24

    The visual system decomposes stimuli into their constituent features, represented by neurons with different feature selectivities. How the signals carried by these feature-selective neurons are integrated into coherent object representations is unknown. To constrain the set of possible integrative mechanisms, we quantified the temporal resolution of perception for color, orientation, and conjunctions of these two features. We find that temporal resolution is measurably higher for each feature than for their conjunction, indicating that time is required to integrate features into a perceptual whole. This finding places temporal limits on the mechanisms that could mediate this form of perceptual integration.

  7. Network Receptive Field Modeling Reveals Extensive Integration and Multi-feature Selectivity in Auditory Cortical Neurons.

    PubMed

    Harper, Nicol S; Schoppe, Oliver; Willmore, Ben D B; Cui, Zhanfeng; Schnupp, Jan W H; King, Andrew J

    2016-11-01

    Cortical sensory neurons are commonly characterized using the receptive field, the linear dependence of their response on the stimulus. In primary auditory cortex neurons can be characterized by their spectrotemporal receptive fields, the spectral and temporal features of a sound that linearly drive a neuron. However, receptive fields do not capture the fact that the response of a cortical neuron results from the complex nonlinear network in which it is embedded. By fitting a nonlinear feedforward network model (a network receptive field) to cortical responses to natural sounds, we reveal that primary auditory cortical neurons are sensitive over a substantially larger spectrotemporal domain than is seen in their standard spectrotemporal receptive fields. Furthermore, the network receptive field, a parsimonious network consisting of 1-7 sub-receptive fields that interact nonlinearly, consistently better predicts neural responses to auditory stimuli than the standard receptive fields. The network receptive field reveals separate excitatory and inhibitory sub-fields with different nonlinear properties, and interaction of the sub-fields gives rise to important operations such as gain control and conjunctive feature detection. The conjunctive effects, where neurons respond only if several specific features are present together, enable increased selectivity for particular complex spectrotemporal structures, and may constitute an important stage in sound recognition. In conclusion, we demonstrate that fitting auditory cortical neural responses with feedforward network models expands on simple linear receptive field models in a manner that yields substantially improved predictive power and reveals key nonlinear aspects of cortical processing, while remaining easy to interpret in a physiological context.

  8. Network Receptive Field Modeling Reveals Extensive Integration and Multi-feature Selectivity in Auditory Cortical Neurons

    PubMed Central

    Willmore, Ben D. B.; Cui, Zhanfeng; Schnupp, Jan W. H.; King, Andrew J.

    2016-01-01

    Cortical sensory neurons are commonly characterized using the receptive field, the linear dependence of their response on the stimulus. In primary auditory cortex neurons can be characterized by their spectrotemporal receptive fields, the spectral and temporal features of a sound that linearly drive a neuron. However, receptive fields do not capture the fact that the response of a cortical neuron results from the complex nonlinear network in which it is embedded. By fitting a nonlinear feedforward network model (a network receptive field) to cortical responses to natural sounds, we reveal that primary auditory cortical neurons are sensitive over a substantially larger spectrotemporal domain than is seen in their standard spectrotemporal receptive fields. Furthermore, the network receptive field, a parsimonious network consisting of 1–7 sub-receptive fields that interact nonlinearly, consistently better predicts neural responses to auditory stimuli than the standard receptive fields. The network receptive field reveals separate excitatory and inhibitory sub-fields with different nonlinear properties, and interaction of the sub-fields gives rise to important operations such as gain control and conjunctive feature detection. The conjunctive effects, where neurons respond only if several specific features are present together, enable increased selectivity for particular complex spectrotemporal structures, and may constitute an important stage in sound recognition. In conclusion, we demonstrate that fitting auditory cortical neural responses with feedforward network models expands on simple linear receptive field models in a manner that yields substantially improved predictive power and reveals key nonlinear aspects of cortical processing, while remaining easy to interpret in a physiological context. PMID:27835647

  9. Quantitative EEG features selection in the classification of attention and response control in the children and adolescents with attention deficit hyperactivity disorder.

    PubMed

    Bashiri, Azadeh; Shahmoradi, Leila; Beigy, Hamid; Savareh, Behrouz A; Nosratabadi, Masood; N Kalhori, Sharareh R; Ghazisaeedi, Marjan

    2018-06-01

    Quantitative EEG gives valuable information in the clinical evaluation of psychological disorders. The purpose of the present study is to identify the most prominent features of quantitative electroencephalography (QEEG) that affect attention and response control parameters in children with attention deficit hyperactivity disorder. The QEEG features and the Integrated Visual and Auditory-Continuous Performance Test ( IVA-CPT) of 95 attention deficit hyperactivity disorder subjects were preprocessed by Independent Evaluation Criterion for Binary Classification. Then, the importance of selected features in the classification of desired outputs was evaluated using the artificial neural network. Findings uncovered the highest rank of QEEG features in each IVA-CPT parameters related to attention and response control. Using the designed model could help therapists to determine the existence or absence of defects in attention and response control relying on QEEG.

  10. Text-Based Conferencing: Features vs. Functionality

    ERIC Educational Resources Information Center

    Anderson, Lynn; McCarthy, Cathy

    2005-01-01

    This report examines three text-based conferencing products: "WowBB", "Invision Power Board", and "vBulletin". Their selection was prompted by a feature-by-feature comparison of the same products on the "WowBB" website. The comparison chart painted a misleading impression of "WowBB's" features in relation to the other two products; so the…

  11. Classification and Feature Selection Algorithms for Modeling Ice Storm Climatology

    NASA Astrophysics Data System (ADS)

    Swaminathan, R.; Sridharan, M.; Hayhoe, K.; Dobbie, G.

    2015-12-01

    Ice storms account for billions of dollars of winter storm loss across the continental US and Canada. In the future, increasing concentration of human populations in areas vulnerable to ice storms such as the northeastern US will only exacerbate the impacts of these extreme events on infrastructure and society. Quantifying the potential impacts of global climate change on ice storm prevalence and frequency is challenging, as ice storm climatology is driven by complex and incompletely defined atmospheric processes, processes that are in turn influenced by a changing climate. This makes the underlying atmospheric and computational modeling of ice storm climatology a formidable task. We propose a novel computational framework that uses sophisticated stochastic classification and feature selection algorithms to model ice storm climatology and quantify storm occurrences from both reanalysis and global climate model outputs. The framework is based on an objective identification of ice storm events by key variables derived from vertical profiles of temperature, humidity and geopotential height. Historical ice storm records are used to identify days with synoptic-scale upper air and surface conditions associated with ice storms. Evaluation using NARR reanalysis and historical ice storm records corresponding to the northeastern US demonstrates that an objective computational model with standard performance measures, with a relatively high degree of accuracy, identify ice storm events based on upper-air circulation patterns and provide insights into the relationships between key climate variables associated with ice storms.

  12. Normed kernel function-based fuzzy possibilistic C-means (NKFPCM) algorithm for high-dimensional breast cancer database classification with feature selection is based on Laplacian Score

    NASA Astrophysics Data System (ADS)

    Lestari, A. W.; Rustam, Z.

    2017-07-01

    In the last decade, breast cancer has become the focus of world attention as this disease is one of the primary leading cause of death for women. Therefore, it is necessary to have the correct precautions and treatment. In previous studies, Fuzzy Kennel K-Medoid algorithm has been used for multi-class data. This paper proposes an algorithm to classify the high dimensional data of breast cancer using Fuzzy Possibilistic C-means (FPCM) and a new method based on clustering analysis using Normed Kernel Function-Based Fuzzy Possibilistic C-Means (NKFPCM). The objective of this paper is to obtain the best accuracy in classification of breast cancer data. In order to improve the accuracy of the two methods, the features candidates are evaluated using feature selection, where Laplacian Score is used. The results show the comparison accuracy and running time of FPCM and NKFPCM with and without feature selection.

  13. Feature Selection and Parameters Optimization of SVM Using Particle Swarm Optimization for Fault Classification in Power Distribution Systems.

    PubMed

    Cho, Ming-Yuan; Hoang, Thi Thom

    2017-01-01

    Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.

  14. Stabilizing l1-norm prediction models by supervised feature grouping.

    PubMed

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2016-02-01

    Emerging Electronic Medical Records (EMRs) have reformed the modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since a lot of information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. But, in presence of correlated features, these methods select features that change considerably with small changes in data. This prevents clinicians to obtain a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection, however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Feature-based attention elicits surround suppression in feature space.

    PubMed

    Störmer, Viola S; Alvarez, George A

    2014-09-08

    It is known that focusing attention on a particular feature (e.g., the color red) facilitates the processing of all objects in the visual field containing that feature [1-7]. Here, we show that such feature-based attention not only facilitates processing but also actively inhibits processing of similar, but not identical, features globally across the visual field. We combined behavior and electrophysiological recordings of frequency-tagged potentials in human observers to measure this inhibitory surround in feature space. We found that sensory signals of an attended color (e.g., red) were enhanced, whereas sensory signals of colors similar to the target color (e.g., orange) were suppressed relative to colors more distinct from the target color (e.g., yellow). Importantly, this inhibitory effect spreads globally across the visual field, thus operating independently of location. These findings suggest that feature-based attention comprises an excitatory peak surrounded by a narrow inhibitory zone in color space to attenuate the most distracting and potentially confusable stimuli during visual perception. This selection profile is akin to what has been reported for location-based attention [8-10] and thus suggests that such center-surround mechanisms are an overarching principle of attention across different domains in the human brain. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. Expression of Immunoglobulin Receptors with Distinctive Features Indicating Antigen Selection by Marginal Zone B Cells from Human Spleen

    PubMed Central

    Colombo, Monica; Cutrona, Giovanna; Reverberi, Daniele; Bruno, Silvia; Ghiotto, Fabio; Tenca, Claudya; Stamatopoulos, Kostas; Hadzidimitriou, Anastasia; Ceccarelli, Jenny; Salvi, Sandra; Boccardo, Simona; Calevo, Maria Grazia; De Santanna, Amleto; Truini, Mauro; Fais, Franco; Ferrarini, Manlio

    2013-01-01

    Marginal zone (MZ) B cells, identified as surface (s)IgMhighsIgDlowCD23low/−CD21+CD38− B cells, were purified from human spleens, and the features of their V(D)J gene rearrangements were investigated and compared with those of germinal center (GC), follicular mantle (FM) and switched memory (SM) B cells. Most MZ B cells were CD27+ and exhibited somatic hypermutations (SHM), although to a lower extent than SM B cells. Moreover, among MZ B-cell rearrangements, recurrent sequences were observed, some of which displayed intraclonal diversification. The same diversifying sequences were detected in very low numbers in GC and FM B cells and only when a highly sensitive, gene-specific polymerase chain reaction was used. This result indicates that MZ B cells could expand and diversify in situ and also suggested the presence of a number of activation-induced cytidine deaminase (AID)-expressing B cells in the MZ. The notion of antigen-driven expansion/selection in situ is further supported by the VH CDR3 features of MZ B cells with highly conserved amino acids at specific positions and by the finding of shared (“stereotyped”) sequences in two different spleens. Collectively, the data are consistent with the notion that MZ B cells are a special subset selected by in situ antigenic stimuli. PMID:23877718

  17. DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest.

    PubMed

    Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang

    2018-01-05

    DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html.

  18. DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest

    PubMed Central

    Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang

    2018-01-01

    DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html PMID:29416743

  19. Non-negative matrix factorization in texture feature for classification of dementia with MRI data

    NASA Astrophysics Data System (ADS)

    Sarwinda, D.; Bustamam, A.; Ardaneswari, G.

    2017-07-01

    This paper investigates applications of non-negative matrix factorization as feature selection method to select the features from gray level co-occurrence matrix. The proposed approach is used to classify dementia using MRI data. In this study, texture analysis using gray level co-occurrence matrix is done to feature extraction. In the feature extraction process of MRI data, we found seven features from gray level co-occurrence matrix. Non-negative matrix factorization selected three features that influence of all features produced by feature extractions. A Naïve Bayes classifier is adapted to classify dementia, i.e. Alzheimer's disease, Mild Cognitive Impairment (MCI) and normal control. The experimental results show that non-negative factorization as feature selection method able to achieve an accuracy of 96.4% for classification of Alzheimer's and normal control. The proposed method also compared with other features selection methods i.e. Principal Component Analysis (PCA).

  20. Voltammetric electronic tongue and support vector machines for identification of selected features in Mexican coffee.

    PubMed

    Domínguez, Rocio Berenice; Moreno-Barón, Laura; Muñoz, Roberto; Gutiérrez, Juan Manuel

    2014-09-24

    This paper describes a new method based on a voltammetric electronic tongue (ET) for the recognition of distinctive features in coffee samples. An ET was directly applied to different samples from the main Mexican coffee regions without any pretreatment before the analysis. The resulting electrochemical information was modeled with two different mathematical tools, namely Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM). Growing conditions (i.e., organic or non-organic practices and altitude of crops) were considered for a first classification. LDA results showed an average discrimination rate of 88% ± 6.53% while SVM successfully accomplished an overall accuracy of 96.4% ± 3.50% for the same task. A second classification based on geographical origin of samples was carried out. Results showed an overall accuracy of 87.5% ± 7.79% for LDA and a superior performance of 97.5% ± 3.22% for SVM. Given the complexity of coffee samples, the high accuracy percentages achieved by ET coupled with SVM in both classification problems suggested a potential applicability of ET in the assessment of selected coffee features with a simpler and faster methodology along with a null sample pretreatment. In addition, the proposed method can be applied to authentication assessment while improving cost, time and accuracy of the general procedure.