Rule-based fault diagnosis of hall sensors and fault-tolerant control of PMSM
NASA Astrophysics Data System (ADS)
Song, Ziyou; Li, Jianqiu; Ouyang, Minggao; Gu, Jing; Feng, Xuning; Lu, Dongbin
2013-07-01
Hall sensor is widely used for estimating rotor phase of permanent magnet synchronous motor(PMSM). And rotor position is an essential parameter of PMSM control algorithm, hence it is very dangerous if Hall senor faults occur. But there is scarcely any research focusing on fault diagnosis and fault-tolerant control of Hall sensor used in PMSM. From this standpoint, the Hall sensor faults which may occur during the PMSM operating are theoretically analyzed. According to the analysis results, the fault diagnosis algorithm of Hall sensor, which is based on three rules, is proposed to classify the fault phenomena accurately. The rotor phase estimation algorithms, based on one or two Hall sensor(s), are initialized to engender the fault-tolerant control algorithm. The fault diagnosis algorithm can detect 60 Hall fault phenomena in total as well as all detections can be fulfilled in 1/138 rotor rotation period. The fault-tolerant control algorithm can achieve a smooth torque production which means the same control effect as normal control mode (with three Hall sensors). Finally, the PMSM bench test verifies the accuracy and rapidity of fault diagnosis and fault-tolerant control strategies. The fault diagnosis algorithm can detect all Hall sensor faults promptly and fault-tolerant control algorithm allows the PMSM to face failure conditions of one or two Hall sensor(s). In addition, the transitions between health-control and fault-tolerant control conditions are smooth without any additional noise and harshness. Proposed algorithms can deal with the Hall sensor faults of PMSM in real applications, and can be provided to realize the fault diagnosis and fault-tolerant control of PMSM.
Modeling the Fault Tolerant Capability of a Flight Control System: An Exercise in SCR Specification
NASA Technical Reports Server (NTRS)
Alexander, Chris; Cortellessa, Vittorio; DelGobbo, Diego; Mili, Ali; Napolitano, Marcello
2000-01-01
In life-critical and mission-critical applications, it is important to make provisions for a wide range of contingencies, by providing means for fault tolerance. In this paper, we discuss the specification of a flight control system that is fault tolerant with respect to sensor faults. Redundancy is provided by analytical relations that hold between sensor readings; depending on the conditions, this redundancy can be used to detect, identify and accommodate sensor faults.
Tien, Nguyen Xuan; Kim, Semog; Rhee, Jong Myung; Park, Sang Yoon
2017-07-25
Fault tolerance has long been a major concern for sensor communications in fault-tolerant cyber physical systems (CPSs). Network failure problems often occur in wireless sensor networks (WSNs) due to various factors such as the insufficient power of sensor nodes, the dislocation of sensor nodes, the unstable state of wireless links, and unpredictable environmental interference. Fault tolerance is thus one of the key requirements for data communications in WSN applications. This paper proposes a novel path redundancy-based algorithm, called dual separate paths (DSP), that provides fault-tolerant communication with the improvement of the network traffic performance for WSN applications, such as fault-tolerant CPSs. The proposed DSP algorithm establishes two separate paths between a source and a destination in a network based on the network topology information. These paths are node-disjoint paths and have optimal path distances. Unicast frames are delivered from the source to the destination in the network through the dual paths, providing fault-tolerant communication and reducing redundant unicast traffic for the network. The DSP algorithm can be applied to wired and wireless networks, such as WSNs, to provide seamless fault-tolerant communication for mission-critical and life-critical applications such as fault-tolerant CPSs. The analyzed and simulated results show that the DSP-based approach not only provides fault-tolerant communication, but also improves network traffic performance. For the case study in this paper, when the DSP algorithm was applied to high-availability seamless redundancy (HSR) networks, the proposed DSP-based approach reduced the network traffic by 80% to 88% compared with the standard HSR protocol, thus improving network traffic performance.
Adaptive sensor-fault tolerant control for a class of multivariable uncertain nonlinear systems.
Khebbache, Hicham; Tadjine, Mohamed; Labiod, Salim; Boulkroune, Abdesselem
2015-03-01
This paper deals with the active fault tolerant control (AFTC) problem for a class of multiple-input multiple-output (MIMO) uncertain nonlinear systems subject to sensor faults and external disturbances. The proposed AFTC method can tolerate three additive (bias, drift and loss of accuracy) and one multiplicative (loss of effectiveness) sensor faults. By employing backstepping technique, a novel adaptive backstepping-based AFTC scheme is developed using the fact that sensor faults and system uncertainties (including external disturbances and unexpected nonlinear functions caused by sensor faults) can be on-line estimated and compensated via robust adaptive schemes. The stability analysis of the closed-loop system is rigorously proven using a Lyapunov approach. The effectiveness of the proposed controller is illustrated by two simulation examples. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
Fault-tolerant cooperative output regulation for multi-vehicle systems with sensor faults
NASA Astrophysics Data System (ADS)
Qin, Liguo; He, Xiao; Zhou, D. H.
2017-10-01
This paper presents a unified framework of fault diagnosis and fault-tolerant cooperative output regulation (FTCOR) for a linear discrete-time multi-vehicle system with sensor faults. The FTCOR control law is designed through three steps. A cooperative output regulation (COR) controller is designed based on the internal mode principle when there are no sensor faults. A sufficient condition on the existence of the COR controller is given based on the discrete-time algebraic Riccati equation (DARE). Then, a decentralised fault diagnosis scheme is designed to cope with sensor faults occurring in followers. A residual generator is developed to detect sensor faults of each follower, and a bank of fault-matching estimators are proposed to isolate and estimate sensor faults of each follower. Unlike the current distributed fault diagnosis for multi-vehicle systems, the presented decentralised fault diagnosis scheme in each vehicle reduces the communication and computation load by only using the information of the vehicle. By combing the sensor fault estimation and the COR control law, an FTCOR controller is proposed. Finally, the simulation results demonstrate the effectiveness of the FTCOR controller.
Experimental Robot Position Sensor Fault Tolerance Using Accelerometers and Joint Torque Sensors
NASA Technical Reports Server (NTRS)
Aldridge, Hal A.; Juang, Jer-Nan
1997-01-01
Robot systems in critical applications, such as those in space and nuclear environments, must be able to operate during component failure to complete important tasks. One failure mode that has received little attention is the failure of joint position sensors. Current fault tolerant designs require the addition of directly redundant position sensors which can affect joint design. The proposed method uses joint torque sensors found in most existing advanced robot designs along with easily locatable, lightweight accelerometers to provide a joint position sensor fault recovery mode. This mode uses the torque sensors along with a virtual passive control law for stability and accelerometers for joint position information. Two methods for conversion from Cartesian acceleration to joint position based on robot kinematics, not integration, are presented. The fault tolerant control method was tested on several joints of a laboratory robot. The controllers performed well with noisy, biased data and a model with uncertain parameters.
Machine learning techniques for fault isolation and sensor placement
NASA Technical Reports Server (NTRS)
Carnes, James R.; Fisher, Douglas H.
1993-01-01
Fault isolation and sensor placement are vital for monitoring and diagnosis. A sensor conveys information about a system's state that guides troubleshooting if problems arise. We are using machine learning methods to uncover behavioral patterns over snapshots of system simulations that will aid fault isolation and sensor placement, with an eye towards minimality, fault coverage, and noise tolerance.
Sensor fault-tolerant control for gear-shifting engaging process of automated manual transmission
NASA Astrophysics Data System (ADS)
Li, Liang; He, Kai; Wang, Xiangyu; Liu, Yahui
2018-01-01
Angular displacement sensor on the actuator of automated manual transmission (AMT) is sensitive to fault, and the sensor fault will disturb its normal control, which affects the entire gear-shifting process of AMT and results in awful riding comfort. In order to solve this problem, this paper proposes a method of fault-tolerant control for AMT gear-shifting engaging process. By using the measured current of actuator motor and angular displacement of actuator, the gear-shifting engaging load torque table is built and updated before the occurrence of the sensor fault. Meanwhile, residual between estimated and measured angular displacements is used to detect the sensor fault. Once the residual exceeds a determined fault threshold, the sensor fault is detected. Then, switch control is triggered, and the current observer and load torque table estimates an actual gear-shifting position to replace the measured one to continue controlling the gear-shifting process. Numerical and experiment tests are carried out to evaluate the reliability and feasibility of proposed methods, and the results show that the performance of estimation and control is satisfactory.
A General theory of Signal Integration for Fault-Tolerant Dynamic Distributed Sensor Networks
1993-10-01
related to a) the architecture and fault- tolerance of the distributed sensor network, b) the proper synchronisation of sensor signals, c) the...Computational complexities of the problem of distributed detection. 5) Issues related to recording of events and synchronization in distributed sensor...Intervals for Synchronization in Real Time Distributed Systems", Submitted to Electronic Encyclopedia. 3. V. G. Hegde and S. S. Iyengar "Efficient
Bounemeur, Abdelhamid; Chemachema, Mohamed; Essounbouli, Najib
2018-05-10
In this paper, an active fuzzy fault tolerant tracking control (AFFTTC) scheme is developed for a class of multi-input multi-output (MIMO) unknown nonlinear systems in the presence of unknown actuator faults, sensor failures and external disturbance. The developed control scheme deals with four kinds of faults for both sensors and actuators. The bias, drift, and loss of accuracy additive faults are considered along with the loss of effectiveness multiplicative fault. A fuzzy adaptive controller based on back-stepping design is developed to deal with actuator failures and unknown system dynamics. However, an additional robust control term is added to deal with sensor faults, approximation errors, and external disturbances. Lyapunov theory is used to prove the stability of the closed loop system. Numerical simulations on a quadrotor are presented to show the effectiveness of the proposed approach. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.
A Fault Tolerant System for an Integrated Avionics Sensor Configuration
NASA Technical Reports Server (NTRS)
Caglayan, A. K.; Lancraft, R. E.
1984-01-01
An aircraft sensor fault tolerant system methodology for the Transport Systems Research Vehicle in a Microwave Landing System (MLS) environment is described. The fault tolerant system provides reliable estimates in the presence of possible failures both in ground-based navigation aids, and in on-board flight control and inertial sensors. Sensor failures are identified by utilizing the analytic relationships between the various sensors arising from the aircraft point mass equations of motion. The estimation and failure detection performance of the software implementation (called FINDS) of the developed system was analyzed on a nonlinear digital simulation of the research aircraft. Simulation results showing the detection performance of FINDS, using a dual redundant sensor compliment, are presented for bias, hardover, null, ramp, increased noise and scale factor failures. In general, the results show that FINDS can distinguish between normal operating sensor errors and failures while providing an excellent detection speed for bias failures in the MLS, indicated airspeed, attitude and radar altimeter sensors.
Fault tolerant multi-sensor fusion based on the information gain
NASA Astrophysics Data System (ADS)
Hage, Joelle Al; El Najjar, Maan E.; Pomorski, Denis
2017-01-01
In the last decade, multi-robot systems are used in several applications like for example, the army, the intervention areas presenting danger to human life, the management of natural disasters, the environmental monitoring, exploration and agriculture. The integrity of localization of the robots must be ensured in order to achieve their mission in the best conditions. Robots are equipped with proprioceptive (encoders, gyroscope) and exteroceptive sensors (Kinect). However, these sensors could be affected by various faults types that can be assimilated to erroneous measurements, bias, outliers, drifts,… In absence of a sensor fault diagnosis step, the integrity and the continuity of the localization are affected. In this work, we present a muti-sensors fusion approach with Fault Detection and Exclusion (FDE) based on the information theory. In this context, we are interested by the information gain given by an observation which may be relevant when dealing with the fault tolerance aspect. Moreover, threshold optimization based on the quantity of information given by a decision on the true hypothesis is highlighted.
NASA Technical Reports Server (NTRS)
Hruby, R. J.; Bjorkman, W. S.
1977-01-01
Flight test results of the strapdown inertial reference unit (SIRU) navigation system are presented. The fault-tolerant SIRU navigation system features a redundant inertial sensor unit and dual computers. System software provides for detection and isolation of inertial sensor failures and continued operation in the event of failures. Flight test results include assessments of the system's navigational performance and fault tolerance.
Flight test results of the strapdown hexad inertial reference unit (SIRU). Volume 2: Test report
NASA Technical Reports Server (NTRS)
Hruby, R. J.; Bjorkman, W. S.
1977-01-01
Results of flight tests of the Strapdown Inertial Reference Unit (SIRU) navigation system are presented. The fault tolerant SIRU navigation system features a redundant inertial sensor unit and dual computers. System software provides for detection and isolation of inertial sensor failures and continued operation in the event of failures. Flight test results include assessments of the system's navigational performance and fault tolerance. Performance shortcomings are analyzed.
Fault Diagnostics for Turbo-Shaft Engine Sensors Based on a Simplified On-Board Model
Lu, Feng; Huang, Jinquan; Xing, Yaodong
2012-01-01
Combining a simplified on-board turbo-shaft model with sensor fault diagnostic logic, a model-based sensor fault diagnosis method is proposed. The existing fault diagnosis method for turbo-shaft engine key sensors is mainly based on a double redundancies technique, and this can't be satisfied in some occasions as lack of judgment. The simplified on-board model provides the analytical third channel against which the dual channel measurements are compared, while the hardware redundancy will increase the structure complexity and weight. The simplified turbo-shaft model contains the gas generator model and the power turbine model with loads, this is built up via dynamic parameters method. Sensor fault detection, diagnosis (FDD) logic is designed, and two types of sensor failures, such as the step faults and the drift faults, are simulated. When the discrepancy among the triplex channels exceeds a tolerance level, the fault diagnosis logic determines the cause of the difference. Through this approach, the sensor fault diagnosis system achieves the objectives of anomaly detection, sensor fault diagnosis and redundancy recovery. Finally, experiments on this method are carried out on a turbo-shaft engine, and two types of faults under different channel combinations are presented. The experimental results show that the proposed method for sensor fault diagnostics is efficient. PMID:23112645
Fault diagnostics for turbo-shaft engine sensors based on a simplified on-board model.
Lu, Feng; Huang, Jinquan; Xing, Yaodong
2012-01-01
Combining a simplified on-board turbo-shaft model with sensor fault diagnostic logic, a model-based sensor fault diagnosis method is proposed. The existing fault diagnosis method for turbo-shaft engine key sensors is mainly based on a double redundancies technique, and this can't be satisfied in some occasions as lack of judgment. The simplified on-board model provides the analytical third channel against which the dual channel measurements are compared, while the hardware redundancy will increase the structure complexity and weight. The simplified turbo-shaft model contains the gas generator model and the power turbine model with loads, this is built up via dynamic parameters method. Sensor fault detection, diagnosis (FDD) logic is designed, and two types of sensor failures, such as the step faults and the drift faults, are simulated. When the discrepancy among the triplex channels exceeds a tolerance level, the fault diagnosis logic determines the cause of the difference. Through this approach, the sensor fault diagnosis system achieves the objectives of anomaly detection, sensor fault diagnosis and redundancy recovery. Finally, experiments on this method are carried out on a turbo-shaft engine, and two types of faults under different channel combinations are presented. The experimental results show that the proposed method for sensor fault diagnostics is efficient.
The Design and Semi-Physical Simulation Test of Fault-Tolerant Controller for Aero Engine
NASA Astrophysics Data System (ADS)
Liu, Yuan; Zhang, Xin; Zhang, Tianhong
2017-11-01
A new fault-tolerant control method for aero engine is proposed, which can accurately diagnose the sensor fault by Kalman filter banks and reconstruct the signal by real-time on-board adaptive model combing with a simplified real-time model and an improved Kalman filter. In order to verify the feasibility of the method proposed, a semi-physical simulation experiment has been carried out. Besides the real I/O interfaces, controller hardware and the virtual plant model, semi-physical simulation system also contains real fuel system. Compared with the hardware-in-the-loop (HIL) simulation, semi-physical simulation system has a higher degree of confidence. In order to meet the needs of semi-physical simulation, a rapid prototyping controller with fault-tolerant control ability based on NI CompactRIO platform is designed and verified on the semi-physical simulation test platform. The result shows that the controller can realize the aero engine control safely and reliably with little influence on controller performance in the event of fault on sensor.
Sliding Mode Fault Tolerant Control with Adaptive Diagnosis for Aircraft Engines
NASA Astrophysics Data System (ADS)
Xiao, Lingfei; Du, Yanbin; Hu, Jixiang; Jiang, Bin
2018-03-01
In this paper, a novel sliding mode fault tolerant control method is presented for aircraft engine systems with uncertainties and disturbances on the basis of adaptive diagnostic observer. By taking both sensors faults and actuators faults into account, the general model of aircraft engine control systems which is subjected to uncertainties and disturbances, is considered. Then, the corresponding augmented dynamic model is established in order to facilitate the fault diagnosis and fault tolerant controller design. Next, a suitable detection observer is designed to detect the faults effectively. Through creating an adaptive diagnostic observer and based on sliding mode strategy, the sliding mode fault tolerant controller is constructed. Robust stabilization is discussed and the closed-loop system can be stabilized robustly. It is also proven that the adaptive diagnostic observer output errors and the estimations of faults converge to a set exponentially, and the converge rate greater than some value which can be adjusted by choosing designable parameters properly. The simulation on a twin-shaft aircraft engine verifies the applicability of the proposed fault tolerant control method.
Fault Tolerance in ZigBee Wireless Sensor Networks
NASA Technical Reports Server (NTRS)
Alena, Richard; Gilstrap, Ray; Baldwin, Jarren; Stone, Thom; Wilson, Pete
2011-01-01
Wireless sensor networks (WSN) based on the IEEE 802.15.4 Personal Area Network standard are finding increasing use in the home automation and emerging smart energy markets. The network and application layers, based on the ZigBee 2007 PRO Standard, provide a convenient framework for component-based software that supports customer solutions from multiple vendors. This technology is supported by System-on-a-Chip solutions, resulting in extremely small and low-power nodes. The Wireless Connections in Space Project addresses the aerospace flight domain for both flight-critical and non-critical avionics. WSNs provide the inherent fault tolerance required for aerospace applications utilizing such technology. The team from Ames Research Center has developed techniques for assessing the fault tolerance of ZigBee WSNs challenged by radio frequency (RF) interference or WSN node failure.
[Advanced Development for Space Robotics With Emphasis on Fault Tolerance Technology
NASA Technical Reports Server (NTRS)
Tesar, Delbert
1997-01-01
This report describes work developing fault tolerant redundant robotic architectures and adaptive control strategies for robotic manipulator systems which can dynamically accommodate drastic robot manipulator mechanism, sensor or control failures and maintain stable end-point trajectory control with minimum disturbance. Kinematic designs of redundant, modular, reconfigurable arms for fault tolerance were pursued at a fundamental level. The approach developed robotic testbeds to evaluate disturbance responses of fault tolerant concepts in robotic mechanisms and controllers. The development was implemented in various fault tolerant mechanism testbeds including duality in the joint servo motor modules, parallel and serial structural architectures, and dual arms. All have real-time adaptive controller technologies to react to mechanism or controller disturbances (failures) to perform real-time reconfiguration to continue the task operations. The developments fall into three main areas: hardware, software, and theoretical.
Proprioceptive Sensors' Fault Tolerant Control Strategy for an Autonomous Vehicle.
Boukhari, Mohamed Riad; Chaibet, Ahmed; Boukhnifer, Moussa; Glaser, Sébastien
2018-06-09
In this contribution, a fault-tolerant control strategy for the longitudinal dynamics of an autonomous vehicle is presented. The aim is to be able to detect potential failures of the vehicle's speed sensor and then to keep the vehicle in a safe state. For this purpose, the separation principle, composed of a static output feedback controller and fault estimation observers, is designed. Indeed, two observer techniques were proposed: the proportional and integral observer and the descriptor observer. The effectiveness of the proposed scheme is validated by means of the experimental demonstrator of the VEDECOM (Véhicle Décarboné et Communinicant) Institut.
Gyro-based Maximum-Likelihood Thruster Fault Detection and Identification
NASA Technical Reports Server (NTRS)
Wilson, Edward; Lages, Chris; Mah, Robert; Clancy, Daniel (Technical Monitor)
2002-01-01
When building smaller, less expensive spacecraft, there is a need for intelligent fault tolerance vs. increased hardware redundancy. If fault tolerance can be achieved using existing navigation sensors, cost and vehicle complexity can be reduced. A maximum likelihood-based approach to thruster fault detection and identification (FDI) for spacecraft is developed here and applied in simulation to the X-38 space vehicle. The system uses only gyro signals to detect and identify hard, abrupt, single and multiple jet on- and off-failures. Faults are detected within one second and identified within one to five accords,
Fault tolerant operation of switched reluctance machine
NASA Astrophysics Data System (ADS)
Wang, Wei
The energy crisis and environmental challenges have driven industry towards more energy efficient solutions. With nearly 60% of electricity consumed by various electric machines in industry sector, advancement in the efficiency of the electric drive system is of vital importance. Adjustable speed drive system (ASDS) provides excellent speed regulation and dynamic performance as well as dramatically improved system efficiency compared with conventional motors without electronics drives. Industry has witnessed tremendous grow in ASDS applications not only as a driving force but also as an electric auxiliary system for replacing bulky and low efficiency auxiliary hydraulic and mechanical systems. With the vast penetration of ASDS, its fault tolerant operation capability is more widely recognized as an important feature of drive performance especially for aerospace, automotive applications and other industrial drive applications demanding high reliability. The Switched Reluctance Machine (SRM), a low cost, highly reliable electric machine with fault tolerant operation capability, has drawn substantial attention in the past three decades. Nevertheless, SRM is not free of fault. Certain faults such as converter faults, sensor faults, winding shorts, eccentricity and position sensor faults are commonly shared among all ASDS. In this dissertation, a thorough understanding of various faults and their influence on transient and steady state performance of SRM is developed via simulation and experimental study, providing necessary knowledge for fault detection and post fault management. Lumped parameter models are established for fast real time simulation and drive control. Based on the behavior of the faults, a fault detection scheme is developed for the purpose of fast and reliable fault diagnosis. In order to improve the SRM power and torque capacity under faults, the maximum torque per ampere excitation are conceptualized and validated through theoretical analysis and experiments. With the proposed optimal waveform, torque production is greatly improved under the same Root Mean Square (RMS) current constraint. Additionally, position sensorless operation methods under phase faults are investigated to account for the combination of physical position sensor and phase winding faults. A comprehensive solution for position sensorless operation under single and multiple phases fault are proposed and validated through experiments. Continuous position sensorless operation with seamless transition between various numbers of phase fault is achieved.
NASA Technical Reports Server (NTRS)
Kobayashi, Takahisa; Simon, Donald L.
2008-01-01
In this paper, a baseline system which utilizes dual-channel sensor measurements for aircraft engine on-line diagnostics is developed. This system is composed of a linear on-board engine model (LOBEM) and fault detection and isolation (FDI) logic. The LOBEM provides the analytical third channel against which the dual-channel measurements are compared. When the discrepancy among the triplex channels exceeds a tolerance level, the FDI logic determines the cause of the discrepancy. Through this approach, the baseline system achieves the following objectives: (1) anomaly detection, (2) component fault detection, and (3) sensor fault detection and isolation. The performance of the baseline system is evaluated in a simulation environment using faults in sensors and components.
Fault-Tolerant Algorithms for Connectivity Restoration in Wireless Sensor Networks.
Zeng, Yali; Xu, Li; Chen, Zhide
2015-12-22
As wireless sensor network (WSN) is often deployed in a hostile environment, nodes in the networks are prone to large-scale failures, resulting in the network not working normally. In this case, an effective restoration scheme is needed to restore the faulty network timely. Most of existing restoration schemes consider more about the number of deployed nodes or fault tolerance alone, but fail to take into account the fact that network coverage and topology quality are also important to a network. To address this issue, we present two algorithms named Full 2-Connectivity Restoration Algorithm (F2CRA) and Partial 3-Connectivity Restoration Algorithm (P3CRA), which restore a faulty WSN in different aspects. F2CRA constructs the fan-shaped topology structure to reduce the number of deployed nodes, while P3CRA constructs the dual-ring topology structure to improve the fault tolerance of the network. F2CRA is suitable when the restoration cost is given the priority, and P3CRA is suitable when the network quality is considered first. Compared with other algorithms, these two algorithms ensure that the network has stronger fault-tolerant function, larger coverage area and better balanced load after the restoration.
Real time health monitoring and control system methodology for flexible space structures
NASA Astrophysics Data System (ADS)
Jayaram, Sanjay
This dissertation is concerned with the Near Real-time Autonomous Health Monitoring of Flexible Space Structures. The dynamics of multi-body flexible systems is uncertain due to factors such as high non-linearity, consideration of higher modal frequencies, high dimensionality, multiple inputs and outputs, operational constraints, as well as unexpected failures of sensors and/or actuators. Hence a systematic framework of developing a high fidelity, dynamic model of a flexible structural system needs to be understood. The fault detection mechanism that will be an integrated part of an autonomous health monitoring system comprises the detection of abnormalities in the sensors and/or actuators and correcting these detected faults (if possible). Applying the robust control law and the robust measures that are capable of detecting and recovering/replacing the actuators rectifies the actuator faults. The fault tolerant concept applied to the sensors will be in the form of an Extended Kalman Filter (EKF). The EKF is going to weigh the information coming from multiple sensors (redundant sensors used to measure the same information) and automatically identify the faulty sensors and weigh the best estimate from the remaining sensors. The mechanization is comprised of instrumenting flexible deployable panels (solar array) with multiple angular position and rate sensors connected to the data acquisition system. The sensors will give position and rate information of the solar panel in all three axes (i.e. roll, pitch and yaw). The position data corresponds to the steady state response and the rate data will give better insight on the transient response of the system. This is a critical factor for real-time autonomous health monitoring. MATLAB (and/or C++) software will be used for high fidelity modeling and fault tolerant mechanism.
Multi-Sensor Fusion with Interaction Multiple Model and Chi-Square Test Tolerant Filter.
Yang, Chun; Mohammadi, Arash; Chen, Qing-Wei
2016-11-02
Motivated by the key importance of multi-sensor information fusion algorithms in the state-of-the-art integrated navigation systems due to recent advancements in sensor technologies, telecommunication, and navigation systems, the paper proposes an improved and innovative fault-tolerant fusion framework. An integrated navigation system is considered consisting of four sensory sub-systems, i.e., Strap-down Inertial Navigation System (SINS), Global Navigation System (GPS), the Bei-Dou2 (BD2) and Celestial Navigation System (CNS) navigation sensors. In such multi-sensor applications, on the one hand, the design of an efficient fusion methodology is extremely constrained specially when no information regarding the system's error characteristics is available. On the other hand, the development of an accurate fault detection and integrity monitoring solution is both challenging and critical. The paper addresses the sensitivity issues of conventional fault detection solutions and the unavailability of a precisely known system model by jointly designing fault detection and information fusion algorithms. In particular, by using ideas from Interacting Multiple Model (IMM) filters, the uncertainty of the system will be adjusted adaptively by model probabilities and using the proposed fuzzy-based fusion framework. The paper also addresses the problem of using corrupted measurements for fault detection purposes by designing a two state propagator chi-square test jointly with the fusion algorithm. Two IMM predictors, running in parallel, are used and alternatively reactivated based on the received information form the fusion filter to increase the reliability and accuracy of the proposed detection solution. With the combination of the IMM and the proposed fusion method, we increase the failure sensitivity of the detection system and, thereby, significantly increase the overall reliability and accuracy of the integrated navigation system. Simulation results indicate that the proposed fault tolerant fusion framework provides superior performance over its traditional counterparts.
Multi-Sensor Fusion with Interaction Multiple Model and Chi-Square Test Tolerant Filter
Yang, Chun; Mohammadi, Arash; Chen, Qing-Wei
2016-01-01
Motivated by the key importance of multi-sensor information fusion algorithms in the state-of-the-art integrated navigation systems due to recent advancements in sensor technologies, telecommunication, and navigation systems, the paper proposes an improved and innovative fault-tolerant fusion framework. An integrated navigation system is considered consisting of four sensory sub-systems, i.e., Strap-down Inertial Navigation System (SINS), Global Navigation System (GPS), the Bei-Dou2 (BD2) and Celestial Navigation System (CNS) navigation sensors. In such multi-sensor applications, on the one hand, the design of an efficient fusion methodology is extremely constrained specially when no information regarding the system’s error characteristics is available. On the other hand, the development of an accurate fault detection and integrity monitoring solution is both challenging and critical. The paper addresses the sensitivity issues of conventional fault detection solutions and the unavailability of a precisely known system model by jointly designing fault detection and information fusion algorithms. In particular, by using ideas from Interacting Multiple Model (IMM) filters, the uncertainty of the system will be adjusted adaptively by model probabilities and using the proposed fuzzy-based fusion framework. The paper also addresses the problem of using corrupted measurements for fault detection purposes by designing a two state propagator chi-square test jointly with the fusion algorithm. Two IMM predictors, running in parallel, are used and alternatively reactivated based on the received information form the fusion filter to increase the reliability and accuracy of the proposed detection solution. With the combination of the IMM and the proposed fusion method, we increase the failure sensitivity of the detection system and, thereby, significantly increase the overall reliability and accuracy of the integrated navigation system. Simulation results indicate that the proposed fault tolerant fusion framework provides superior performance over its traditional counterparts. PMID:27827832
Robot Position Sensor Fault Tolerance
NASA Technical Reports Server (NTRS)
Aldridge, Hal A.
1997-01-01
Robot systems in critical applications, such as those in space and nuclear environments, must be able to operate during component failure to complete important tasks. One failure mode that has received little attention is the failure of joint position sensors. Current fault tolerant designs require the addition of directly redundant position sensors which can affect joint design. A new method is proposed that utilizes analytical redundancy to allow for continued operation during joint position sensor failure. Joint torque sensors are used with a virtual passive torque controller to make the robot joint stable without position feedback and improve position tracking performance in the presence of unknown link dynamics and end-effector loading. Two Cartesian accelerometer based methods are proposed to determine the position of the joint. The joint specific position determination method utilizes two triaxial accelerometers attached to the link driven by the joint with the failed position sensor. The joint specific method is not computationally complex and the position error is bounded. The system wide position determination method utilizes accelerometers distributed on different robot links and the end-effector to determine the position of sets of multiple joints. The system wide method requires fewer accelerometers than the joint specific method to make all joint position sensors fault tolerant but is more computationally complex and has lower convergence properties. Experiments were conducted on a laboratory manipulator. Both position determination methods were shown to track the actual position satisfactorily. A controller using the position determination methods and the virtual passive torque controller was able to servo the joints to a desired position during position sensor failure.
A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors.
Zhang, Jilin; Tu, Hangdi; Ren, Yongjian; Wan, Jian; Zhou, Li; Li, Mingwei; Wang, Jue; Yu, Lifeng; Zhao, Chang; Zhang, Lei
2017-09-21
In order to utilize the distributed characteristic of sensors, distributed machine learning has become the mainstream approach, but the different computing capability of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. Our paper describes a reasonable parameter communication optimization strategy to balance the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose the Dynamic Finite Fault Tolerance (DFFT). Based on the DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named Dynamic Synchronous Parallel Strategy (DSP), which uses the performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, ensures the accuracy of the machine learning model, and avoids the situation that the model training is disturbed by any tasks unrelated to the sensors.
Zeghlache, Samir; Benslimane, Tarak; Bouguerra, Abderrahmen
2017-11-01
In this paper, a robust controller for a three degree of freedom (3 DOF) helicopter control is proposed in presence of actuator and sensor faults. For this purpose, Interval type-2 fuzzy logic control approach (IT2FLC) and sliding mode control (SMC) technique are used to design a controller, named active fault tolerant interval type-2 Fuzzy Sliding mode controller (AFTIT2FSMC) based on non-linear adaptive observer to estimate and detect the system faults for each subsystem of the 3-DOF helicopter. The proposed control scheme allows avoiding difficult modeling, attenuating the chattering effect of the SMC, reducing the rules number of the fuzzy controller. Exponential stability of the closed loop is guaranteed by using the Lyapunov method. The simulation results show that the AFTIT2FSMC can greatly alleviate the chattering effect, providing good tracking performance, even in presence of actuator and sensor faults. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
On the design of fault-tolerant robotic manipulator systems
NASA Technical Reports Server (NTRS)
Tesar, Delbert
1993-01-01
Robotic systems are finding increasing use in space applications. Many of these devices are going to be operational on board the Space Station Freedom. Fault tolerance has been deemed necessary because of the criticality of the tasks and the inaccessibility of the systems to maintenance and repair. Design for fault tolerance in manipulator systems is an area within robotics that is without precedence in the literature. In this paper, we will attempt to lay down the foundations for such a technology. Design for fault tolerance demands new and special approaches to design, often at considerable variance from established design practices. These design aspects, together with reliability evaluation and modeling tools, are presented. Mechanical architectures that employ protective redundancies at many levels and have a modular architecture are then studied in detail. Once a mechanical architecture for fault tolerance has been derived, the chronological stages of operational fault tolerance are investigated. Failure detection, isolation, and estimation methods are surveyed, and such methods for robot sensors and actuators are derived. Failure recovery methods are also presented for each of the protective layers of redundancy. Failure recovery tactics often span all of the layers of a control hierarchy. Thus, a unified framework for decision-making and control, which orchestrates both the nominal redundancy management tasks and the failure management tasks, has been derived. The well-developed field of fault-tolerant computers is studied next, and some design principles relevant to the design of fault-tolerant robot controllers are abstracted. Conclusions are drawn, and a road map for the design of fault-tolerant manipulator systems is laid out with recommendations for a 10 DOF arm with dual actuators at each joint.
NASA Technical Reports Server (NTRS)
Yates, Amy M.; Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Gonzalez, Oscar R.; Gray, W. Steven
2010-01-01
Safety-critical distributed flight control systems require robustness in the presence of faults. In general, these systems consist of a number of input/output (I/O) and computation nodes interacting through a fault-tolerant data communication system. The communication system transfers sensor data and control commands and can handle most faults under typical operating conditions. However, the performance of the closed-loop system can be adversely affected as a result of operating in harsh environments. In particular, High-Intensity Radiated Field (HIRF) environments have the potential to cause random fault manifestations in individual avionic components and to generate simultaneous system-wide communication faults that overwhelm existing fault management mechanisms. This paper presents the design of an experiment conducted at the NASA Langley Research Center's HIRF Laboratory to statistically characterize the faults that a HIRF environment can trigger on a single node of a distributed flight control system.
Health management and controls for Earth-to-orbit propulsion systems
NASA Astrophysics Data System (ADS)
Bickford, R. L.
1995-03-01
Avionics and health management technologies increase the safety and reliability while decreasing the overall cost for Earth-to-orbit (ETO) propulsion systems. New ETO propulsion systems will depend on highly reliable fault tolerant flight avionics, advanced sensing systems and artificial intelligence aided software to ensure critical control, safety and maintenance requirements are met in a cost effective manner. Propulsion avionics consist of the engine controller, actuators, sensors, software and ground support elements. In addition to control and safety functions, these elements perform system monitoring for health management. Health management is enhanced by advanced sensing systems and algorithms which provide automated fault detection and enable adaptive control and/or maintenance approaches. Aerojet is developing advanced fault tolerant rocket engine controllers which provide very high levels of reliability. Smart sensors and software systems which significantly enhance fault coverage and enable automated operations are also under development. Smart sensing systems, such as flight capable plume spectrometers, have reached maturity in ground-based applications and are suitable for bridging to flight. Software to detect failed sensors has reached similar maturity. This paper will discuss fault detection and isolation for advanced rocket engine controllers as well as examples of advanced sensing systems and software which significantly improve component failure detection for engine system safety and health management.
Adaptive and technology-independent architecture for fault-tolerant distributed AAL solutions.
Schmidt, Michael; Obermaisser, Roman
2018-04-01
Today's architectures for Ambient Assisted Living (AAL) must cope with a variety of challenges like flawless sensor integration and time synchronization (e.g. for sensor data fusion) while abstracting from the underlying technologies at the same time. Furthermore, an architecture for AAL must be capable to manage distributed application scenarios in order to support elderly people in all situations of their everyday life. This encompasses not just life at home but in particular the mobility of elderly people (e.g. when going for a walk or having sports) as well. Within this paper we will introduce a novel architecture for distributed AAL solutions whose design follows a modern Microservices approach by providing small core services instead of a monolithic application framework. The architecture comprises core services for sensor integration, and service discovery while supporting several communication models (periodic, sporadic, streaming). We extend the state-of-the-art by introducing a fault-tolerance model for our architecture on the basis of a fault-hypothesis describing the fault-containment regions (FCRs) with their respective failure modes and failure rates in order to support safety-critical AAL applications. Copyright © 2017 Elsevier Ltd. All rights reserved.
A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors
Zhang, Jilin; Tu, Hangdi; Ren, Yongjian; Wan, Jian; Zhou, Li; Li, Mingwei; Wang, Jue; Yu, Lifeng; Zhao, Chang; Zhang, Lei
2017-01-01
In order to utilize the distributed characteristic of sensors, distributed machine learning has become the mainstream approach, but the different computing capability of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. Our paper describes a reasonable parameter communication optimization strategy to balance the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose the Dynamic Finite Fault Tolerance (DFFT). Based on the DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named Dynamic Synchronous Parallel Strategy (DSP), which uses the performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, ensures the accuracy of the machine learning model, and avoids the situation that the model training is disturbed by any tasks unrelated to the sensors. PMID:28934163
A Fault Tolerance Mechanism for On-Road Sensor Networks
Feng, Lei; Guo, Shaoyong; Sun, Jialu; Yu, Peng; Li, Wenjing
2016-01-01
On-Road Sensor Networks (ORSNs) play an important role in capturing traffic flow data for predicting short-term traffic patterns, driving assistance and self-driving vehicles. However, this kind of network is prone to large-scale communication failure if a few sensors physically fail. In this paper, to ensure that the network works normally, an effective fault-tolerance mechanism for ORSNs which mainly consists of backup on-road sensor deployment, redundant cluster head deployment and an adaptive failure detection and recovery method is proposed. Firstly, based on the N − x principle and the sensors’ failure rate, this paper formulates the backup sensor deployment problem in the form of a two-objective optimization, which explains the trade-off between the cost and fault resumption. In consideration of improving the network resilience further, this paper introduces a redundant cluster head deployment model according to the coverage constraint. Then a common solving method combining integer-continuing and sequential quadratic programming is explored to determine the optimal location of these two deployment problems. Moreover, an Adaptive Detection and Resume (ADR) protocol is deigned to recover the system communication through route and cluster adjustment if there is a backup on-road sensor mismatch. The final experiments show that our proposed mechanism can achieve an average 90% recovery rate and reduce the average number of failed sensors at most by 35.7%. PMID:27918483
An Autonomous Self-Aware and Adaptive Fault Tolerant Routing Technique for Wireless Sensor Networks
Abba, Sani; Lee, Jeong-A
2015-01-01
We propose an autonomous self-aware and adaptive fault-tolerant routing technique (ASAART) for wireless sensor networks. We address the limitations of self-healing routing (SHR) and self-selective routing (SSR) techniques for routing sensor data. We also examine the integration of autonomic self-aware and adaptive fault detection and resiliency techniques for route formation and route repair to provide resilience to errors and failures. We achieved this by using a combined continuous and slotted prioritized transmission back-off delay to obtain local and global network state information, as well as multiple random functions for attaining faster routing convergence and reliable route repair despite transient and permanent node failure rates and efficient adaptation to instantaneous network topology changes. The results of simulations based on a comparison of the ASAART with the SHR and SSR protocols for five different simulated scenarios in the presence of transient and permanent node failure rates exhibit a greater resiliency to errors and failure and better routing performance in terms of the number of successfully delivered network packets, end-to-end delay, delivered MAC layer packets, packet error rate, as well as efficient energy conservation in a highly congested, faulty, and scalable sensor network. PMID:26295236
An Autonomous Self-Aware and Adaptive Fault Tolerant Routing Technique for Wireless Sensor Networks.
Abba, Sani; Lee, Jeong-A
2015-08-18
We propose an autonomous self-aware and adaptive fault-tolerant routing technique (ASAART) for wireless sensor networks. We address the limitations of self-healing routing (SHR) and self-selective routing (SSR) techniques for routing sensor data. We also examine the integration of autonomic self-aware and adaptive fault detection and resiliency techniques for route formation and route repair to provide resilience to errors and failures. We achieved this by using a combined continuous and slotted prioritized transmission back-off delay to obtain local and global network state information, as well as multiple random functions for attaining faster routing convergence and reliable route repair despite transient and permanent node failure rates and efficient adaptation to instantaneous network topology changes. The results of simulations based on a comparison of the ASAART with the SHR and SSR protocols for five different simulated scenarios in the presence of transient and permanent node failure rates exhibit a greater resiliency to errors and failure and better routing performance in terms of the number of successfully delivered network packets, end-to-end delay, delivered MAC layer packets, packet error rate, as well as efficient energy conservation in a highly congested, faulty, and scalable sensor network.
Fault Tolerant Airborne Sensor Networks for Air Operations
2008-02-01
lives affected by undetected targets. The network is said to have expired when there is no longer a single surviving sensor-pair. Tasking process...tasking a finite number of cooperative agents to randomly emerging targets for their removal. Faults occur when some agents engaged in a mission are...expired. Agents are subject to threat at a level determined by the number of targets present. On the other hand, the rate at which a target is removed
High Speed Operation and Testing of a Fault Tolerant Magnetic Bearing
NASA Technical Reports Server (NTRS)
DeWitt, Kenneth; Clark, Daniel
2004-01-01
Research activities undertaken to upgrade the fault-tolerant facility, continue testing high-speed fault-tolerant operation, and assist in the commission of the high temperature (1000 degrees F) thrust magnetic bearing as described. The fault-tolerant magnetic bearing test facility was upgraded to operate to 40,000 RPM. The necessary upgrades included new state-of-the art position sensors with high frequency modulation and new power edge filtering of amplifier outputs. A comparison study of the new sensors and the previous system was done as well as a noise assessment of the sensor-to-controller signals. Also a comparison study of power edge filtering for amplifier-to-actuator signals was done; this information is valuable for all position sensing and motor actuation applications. After these facility upgrades were completed, the rig is believed to have capabilities for 40,000 RPM operation, though this has yet to be demonstrated. Other upgrades included verification and upgrading of safety shielding, and upgrading control algorithms. The rig will now also be used to demonstrate motoring capabilities and control algorithms are in the process of being created. Recently an extreme temperature thrust magnetic bearing was designed from the ground up. The thrust bearing was designed to fit within the existing high temperature facility. The retrofit began near the end of the summer, 04, and continues currently. Contract staff authored a NASA-TM entitled "An Overview of Magnetic Bearing Technology for Gas Turbine Engines", containing a compilation of bearing data as it pertains to operation in the regime of the gas turbine engine and a presentation of how magnetic bearings can become a viable candidate for use in future engine technology.
A Decentralized Adaptive Approach to Fault Tolerant Flight Control
NASA Technical Reports Server (NTRS)
Wu, N. Eva; Nikulin, Vladimir; Heimes, Felix; Shormin, Victor
2000-01-01
This paper briefly reports some results of our study on the application of a decentralized adaptive control approach to a 6 DOF nonlinear aircraft model. The simulation results showed the potential of using this approach to achieve fault tolerant control. Based on this observation and some analysis, the paper proposes a multiple channel adaptive control scheme that makes use of the functionally redundant actuating and sensing capabilities in the model, and explains how to implement the scheme to tolerate actuator and sensor failures. The conditions, under which the scheme is applicable, are stated in the paper.
Analytical sensor redundancy assessment
NASA Technical Reports Server (NTRS)
Mulcare, D. B.; Downing, L. E.; Smith, M. K.
1988-01-01
The rationale and mechanization of sensor fault tolerance based on analytical redundancy principles are described. The concept involves the substitution of software procedures, such as an observer algorithm, to supplant additional hardware components. The observer synthesizes values of sensor states in lieu of their direct measurement. Such information can then be used, for example, to determine which of two disagreeing sensors is more correct, thus enhancing sensor fault survivability. Here a stability augmentation system is used as an example application, with required modifications being made to a quadruplex digital flight control system. The impact on software structure and the resultant revalidation effort are illustrated as well. Also, the use of an observer algorithm for wind gust filtering of the angle-of-attack sensor signal is presented.
NASA Technical Reports Server (NTRS)
Kobayashi, Takahisa; Simon, Donald L.
2008-01-01
In this paper, an enhanced on-line diagnostic system which utilizes dual-channel sensor measurements is developed for the aircraft engine application. The enhanced system is composed of a nonlinear on-board engine model (NOBEM), the hybrid Kalman filter (HKF) algorithm, and fault detection and isolation (FDI) logic. The NOBEM provides the analytical third channel against which the dual-channel measurements are compared. The NOBEM is further utilized as part of the HKF algorithm which estimates measured engine parameters. Engine parameters obtained from the dual-channel measurements, the NOBEM, and the HKF are compared against each other. When the discrepancy among the signals exceeds a tolerance level, the FDI logic determines the cause of discrepancy. Through this approach, the enhanced system achieves the following objectives: 1) anomaly detection, 2) component fault detection, and 3) sensor fault detection and isolation. The performance of the enhanced system is evaluated in a simulation environment using faults in sensors and components, and it is compared to an existing baseline system.
System Wide Joint Position Sensor Fault Tolerance in Robot Systems Using Cartesian Accelerometers
NASA Technical Reports Server (NTRS)
Aldridge, Hal A.; Juang, Jer-Nan
1997-01-01
Joint position sensors are necessary for most robot control systems. A single position sensor failure in a normal robot system can greatly degrade performance. This paper presents a method to obtain position information from Cartesian accelerometers without integration. Depending on the number and location of the accelerometers. the proposed system can tolerate the loss of multiple position sensors. A solution technique suitable for real-time implementation is presented. Simulations were conducted using 5 triaxial accelerometers to recover from the loss of up to 4 joint position sensors on a 7 degree of freedom robot moving in general three dimensional space. The simulations show good estimation performance using non-ideal accelerometer measurements.
Fault tolerant system based on IDDQ testing
NASA Astrophysics Data System (ADS)
Guibane, Badi; Hamdi, Belgacem; Mtibaa, Abdellatif; Bensalem, Brahim
2018-06-01
Offline test is essential to ensure good manufacturing quality. However, for permanent or transient faults that occur during the use of the integrated circuit in an application, an online integrated test is needed as well. This procedure should ensure the detection and possibly the correction or the masking of these faults. This requirement of self-correction is sometimes necessary, especially in critical applications that require high security such as automotive, space or biomedical applications. We propose a fault-tolerant design for analogue and mixed-signal design complementary metal oxide (CMOS) circuits based on the quiescent current supply (IDDQ) testing. A defect can cause an increase in current consumption. IDDQ testing technique is based on the measurement of power supply current to distinguish between functional and failed circuits. The technique has been an effective testing method for detecting physical defects such as gate-oxide shorts, floating gates (open) and bridging defects in CMOS integrated circuits. An architecture called BICS (Built In Current Sensor) is used for monitoring the supply current (IDDQ) of the connected integrated circuit. If the measured current is not within the normal range, a defect is signalled and the system switches connection from the defective to a functional integrated circuit. The fault-tolerant technique is composed essentially by a double mirror built-in current sensor, allowing the detection of abnormal current consumption and blocks allowing the connection to redundant circuits, if a defect occurs. Spices simulations are performed to valid the proposed design.
Wu, Zhao; Xiong, Naixue; Huang, Yannong; Xu, Degang; Hu, Chunyang
2015-01-01
The services composition technology provides flexible methods for building service composition applications (SCAs) in wireless sensor networks (WSNs). The high reliability and high performance of SCAs help services composition technology promote the practical application of WSNs. The optimization methods for reliability and performance used for traditional software systems are mostly based on the instantiations of software components, which are inapplicable and inefficient in the ever-changing SCAs in WSNs. In this paper, we consider the SCAs with fault tolerance in WSNs. Based on a Universal Generating Function (UGF) we propose a reliability and performance model of SCAs in WSNs, which generalizes a redundancy optimization problem to a multi-state system. Based on this model, an efficient optimization algorithm for reliability and performance of SCAs in WSNs is developed based on a Genetic Algorithm (GA) to find the optimal structure of SCAs with fault-tolerance in WSNs. In order to examine the feasibility of our algorithm, we have evaluated the performance. Furthermore, the interrelationships between the reliability, performance and cost are investigated. In addition, a distinct approach to determine the most suitable parameters in the suggested algorithm is proposed. PMID:26561818
A fault tolerant 80960 engine controller
NASA Technical Reports Server (NTRS)
Reichmuth, D. M.; Gage, M. L.; Paterson, E. S.; Kramer, D. D.
1993-01-01
The paper describes the design of the 80960 Fault Tolerant Engine Controller for the supervision of engine operations, which was designed for the NASA Marshall Space Center. Consideration is given to the major electronic components of the controller, including the engine controller, effectors, and the sensors, as well as to the controller hardware, the controller module and the communications module, and the controller software. The architecture of the controller hardware allows modifications to be made to fit the requirements of any new propulsion systems. Multiple flow diagrams are presented illustrating the controller's operations.
NASA Technical Reports Server (NTRS)
Jansen, Mark; Montague, Gerald; Provenza, Andrew; Palazzolo, Alan
2004-01-01
Closed loop operation of a single, high temperature magnetic radial bearing to 30,000 RPM (2.25 million DN) and 540 C (1000 F) is discussed. Also, high temperature, fault tolerant operation for the three axis system is examined. A novel, hydrostatic backup bearing system was employed to attain high speed, high temperature, lubrication free support of the entire rotor system. The hydrostatic bearings were made of a high lubricity material and acted as journal-type backup bearings. New, high temperature displacement sensors were successfully employed to monitor shaft position throughout the entire temperature range and are described in this paper. Control of the system was accomplished through a stand alone, high speed computer controller and it was used to run both the fault-tolerant PID and active vibration control algorithms.
Guidance, Navigation, and Control System Design in a Mass Reduction Exercise
NASA Technical Reports Server (NTRS)
Crain, Timothy; Begly, Michael; Jackson, Mark; Broome, Joel
2008-01-01
Early Orion GN&C system designs optimized for robustness, simplicity, and utilization of commercially available components. During the System Definition Review (SDR), all subsystems on Orion were asked to re-optimize with component mass and steady state power as primary design metrics. The objective was to create a mass reserve in the Orion point of departure vehicle design prior to beginning the PDR analysis cycle. The Orion GN&C subsystem team transitioned from a philosophy of absolute 2 fault tolerance for crew safety and 1 fault tolerance for mission success to an approach of 1 fault tolerance for crew safety and risk based redundancy to meet probability allocations of loss of mission and loss of crew. This paper will discuss the analyses, rationale, and end results of this activity regarding Orion navigation sensor hardware, control effectors, and trajectory design.
Partitioning in Avionics Architectures: Requirements, Mechanisms, and Assurance
NASA Technical Reports Server (NTRS)
Rushby, John
1999-01-01
Automated aircraft control has traditionally been divided into distinct "functions" that are implemented separately (e.g., autopilot, autothrottle, flight management); each function has its own fault-tolerant computer system, and dependencies among different functions are generally limited to the exchange of sensor and control data. A by-product of this "federated" architecture is that faults are strongly contained within the computer system of the function where they occur and cannot readily propagate to affect the operation of other functions. More modern avionics architectures contemplate supporting multiple functions on a single, shared, fault-tolerant computer system where natural fault containment boundaries are less sharply defined. Partitioning uses appropriate hardware and software mechanisms to restore strong fault containment to such integrated architectures. This report examines the requirements for partitioning, mechanisms for their realization, and issues in providing assurance for partitioning. Because partitioning shares some concerns with computer security, security models are reviewed and compared with the concerns of partitioning.
Model-Based Fault Tolerant Control
NASA Technical Reports Server (NTRS)
Kumar, Aditya; Viassolo, Daniel
2008-01-01
The Model Based Fault Tolerant Control (MBFTC) task was conducted under the NASA Aviation Safety and Security Program. The goal of MBFTC is to develop and demonstrate real-time strategies to diagnose and accommodate anomalous aircraft engine events such as sensor faults, actuator faults, or turbine gas-path component damage that can lead to in-flight shutdowns, aborted take offs, asymmetric thrust/loss of thrust control, or engine surge/stall events. A suite of model-based fault detection algorithms were developed and evaluated. Based on the performance and maturity of the developed algorithms two approaches were selected for further analysis: (i) multiple-hypothesis testing, and (ii) neural networks; both used residuals from an Extended Kalman Filter to detect the occurrence of the selected faults. A simple fusion algorithm was implemented to combine the results from each algorithm to obtain an overall estimate of the identified fault type and magnitude. The identification of the fault type and magnitude enabled the use of an online fault accommodation strategy to correct for the adverse impact of these faults on engine operability thereby enabling continued engine operation in the presence of these faults. The performance of the fault detection and accommodation algorithm was extensively tested in a simulation environment.
Trust index based fault tolerant multiple event localization algorithm for WSNs.
Xu, Xianghua; Gao, Xueyong; Wan, Jian; Xiong, Naixue
2011-01-01
This paper investigates the use of wireless sensor networks for multiple event source localization using binary information from the sensor nodes. The events could continually emit signals whose strength is attenuated inversely proportional to the distance from the source. In this context, faults occur due to various reasons and are manifested when a node reports a wrong decision. In order to reduce the impact of node faults on the accuracy of multiple event localization, we introduce a trust index model to evaluate the fidelity of information which the nodes report and use in the event detection process, and propose the Trust Index based Subtract on Negative Add on Positive (TISNAP) localization algorithm, which reduces the impact of faulty nodes on the event localization by decreasing their trust index, to improve the accuracy of event localization and performance of fault tolerance for multiple event source localization. The algorithm includes three phases: first, the sink identifies the cluster nodes to determine the number of events occurred in the entire region by analyzing the binary data reported by all nodes; then, it constructs the likelihood matrix related to the cluster nodes and estimates the location of all events according to the alarmed status and trust index of the nodes around the cluster nodes. Finally, the sink updates the trust index of all nodes according to the fidelity of their information in the previous reporting cycle. The algorithm improves the accuracy of localization and performance of fault tolerance in multiple event source localization. The experiment results show that when the probability of node fault is close to 50%, the algorithm can still accurately determine the number of the events and have better accuracy of localization compared with other algorithms.
Trust Index Based Fault Tolerant Multiple Event Localization Algorithm for WSNs
Xu, Xianghua; Gao, Xueyong; Wan, Jian; Xiong, Naixue
2011-01-01
This paper investigates the use of wireless sensor networks for multiple event source localization using binary information from the sensor nodes. The events could continually emit signals whose strength is attenuated inversely proportional to the distance from the source. In this context, faults occur due to various reasons and are manifested when a node reports a wrong decision. In order to reduce the impact of node faults on the accuracy of multiple event localization, we introduce a trust index model to evaluate the fidelity of information which the nodes report and use in the event detection process, and propose the Trust Index based Subtract on Negative Add on Positive (TISNAP) localization algorithm, which reduces the impact of faulty nodes on the event localization by decreasing their trust index, to improve the accuracy of event localization and performance of fault tolerance for multiple event source localization. The algorithm includes three phases: first, the sink identifies the cluster nodes to determine the number of events occurred in the entire region by analyzing the binary data reported by all nodes; then, it constructs the likelihood matrix related to the cluster nodes and estimates the location of all events according to the alarmed status and trust index of the nodes around the cluster nodes. Finally, the sink updates the trust index of all nodes according to the fidelity of their information in the previous reporting cycle. The algorithm improves the accuracy of localization and performance of fault tolerance in multiple event source localization. The experiment results show that when the probability of node fault is close to 50%, the algorithm can still accurately determine the number of the events and have better accuracy of localization compared with other algorithms. PMID:22163972
Mixed H2/H∞-Based Fusion Estimation for Energy-Limited Multi-Sensors in Wearable Body Networks
Li, Chao; Zhang, Zhenjiang; Chao, Han-Chieh
2017-01-01
In wireless sensor networks, sensor nodes collect plenty of data for each time period. If all of data are transmitted to a Fusion Center (FC), the power of sensor node would run out rapidly. On the other hand, the data also needs a filter to remove the noise. Therefore, an efficient fusion estimation model, which can save the energy of the sensor nodes while maintaining higher accuracy, is needed. This paper proposes a novel mixed H2/H∞-based energy-efficient fusion estimation model (MHEEFE) for energy-limited Wearable Body Networks. In the proposed model, the communication cost is firstly reduced efficiently while keeping the estimation accuracy. Then, the parameters in quantization method are discussed, and we confirm them by an optimization method with some prior knowledge. Besides, some calculation methods of important parameters are researched which make the final estimates more stable. Finally, an iteration-based weight calculation algorithm is presented, which can improve the fault tolerance of the final estimate. In the simulation, the impacts of some pivotal parameters are discussed. Meanwhile, compared with the other related models, the MHEEFE shows a better performance in accuracy, energy-efficiency and fault tolerance. PMID:29280950
Fault tolerant attitude sensing and force feedback control for unmanned aerial vehicles
NASA Astrophysics Data System (ADS)
Jagadish, Chirag
Two aspects of an unmanned aerial vehicle are studied in this work. One is fault tolerant attitude determination and the other is to provide force feedback to the joy-stick of the UAV so as to prevent faulty inputs from the pilot. Determination of attitude plays an important role in control of aerial vehicles. One way of defining the attitude is through Euler angles. These angles can be determined based on the measurements of the projections of the gravity and earth magnetic fields on the three body axes of the vehicle. Attitude determination in unmanned aerial vehicles poses additional challenges due to limitations of space, payload, power and cost. Therefore it provides for almost no room for any bulky sensors or extra sensor hardware for backup and as such leaves no room for sensor fault issues either. In the face of these limitations, this study proposes a fault tolerant computing of Euler angles by utilizing multiple different computation methods, with each method utilizing a different subset of the available sensor measurement data. Twenty-five such methods have been presented in this document. The capability of computing the Euler angles in multiple ways provides a diversified redundancy required for fault tolerance. The proposed approach can identify certain sets of sensor failures and even separate the reference fields from the disturbances. A bank-to-turn maneuver of the NASA GTM UAV is used to demonstrate the fault tolerance provided by the proposed method as well as to demonstrate the method of determining the correct Euler angles despite interferences by inertial acceleration disturbances. Attitude computation is essential for stability. But as of today most UAVs are commanded remotely by human pilots. While basic stability control is entrusted to machine or the on-board automatic controller, overall guidance is usually with humans. It is therefore the pilot who sets the command/references through a joy-stick. While this is a good compromise between complete automation and complete human control, it still poses some unique challenges. Pilots of manned aircraft are present inside the cockpit of the aircraft they fly and thus have a better feel of the flying environment and also the limitations of the flight. The same might not be true for UAV pilots stationed on the ground. A major handicap is that visual feedback is the only one available for the UAV pilot. An additional parameter like force feedback on the remote control joy-stick can help the UAV pilot to physically feel the limitation of the safe flight envelope. This can make the flying itself easier and safer. A method proposed here is to design a joy-stick assembly with an additional actuator. This actuator is controlled so as to generate a force feedback on the joy-stick. The control developed for this system is such that the actuator allows free movement for the pilot as long as the UAV is within the safe flight envelope. On the other hand, if it is outside this safe range, the actuator opposes the pilot's applied torque and prevents him/her from giving erroneous commands to the UAV.
Dynamical jumping real-time fault-tolerant routing protocol for wireless sensor networks.
Wu, Guowei; Lin, Chi; Xia, Feng; Yao, Lin; Zhang, He; Liu, Bing
2010-01-01
In time-critical wireless sensor network (WSN) applications, a high degree of reliability is commonly required. A dynamical jumping real-time fault-tolerant routing protocol (DMRF) is proposed in this paper. Each node utilizes the remaining transmission time of the data packets and the state of the forwarding candidate node set to dynamically choose the next hop. Once node failure, network congestion or void region occurs, the transmission mode will switch to jumping transmission mode, which can reduce the transmission time delay, guaranteeing the data packets to be sent to the destination node within the specified time limit. By using feedback mechanism, each node dynamically adjusts the jumping probabilities to increase the ratio of successful transmission. Simulation results show that DMRF can not only efficiently reduce the effects of failure nodes, congestion and void region, but also yield higher ratio of successful transmission, smaller transmission delay and reduced number of control packets.
NASA Technical Reports Server (NTRS)
Caglayan, A. K.; Godiwala, P. M.
1985-01-01
The performance analysis results of a fault inferring nonlinear detection system (FINDS) using sensor flight data for the NASA ATOPS B-737 aircraft in a Microwave Landing System (MLS) environment is presented. First, a statistical analysis of the flight recorded sensor data was made in order to determine the characteristics of sensor inaccuracies. Next, modifications were made to the detection and decision functions in the FINDS algorithm in order to improve false alarm and failure detection performance under real modelling errors present in the flight data. Finally, the failure detection and false alarm performance of the FINDS algorithm were analyzed by injecting bias failures into fourteen sensor outputs over six repetitive runs of the five minute flight data. In general, the detection speed, failure level estimation, and false alarm performance showed a marked improvement over the previously reported simulation runs. In agreement with earlier results, detection speed was faster for filter measurement sensors soon as MLS than for filter input sensors such as flight control accelerometers.
NASA Technical Reports Server (NTRS)
Harper, Richard E.; Babikyan, Carol A.; Butler, Bryan P.; Clasen, Robert J.; Harris, Chris H.; Lala, Jaynarayan H.; Masotto, Thomas K.; Nagle, Gail A.; Prizant, Mark J.; Treadwell, Steven
1994-01-01
The Army Avionics Research and Development Activity (AVRADA) is pursuing programs that would enable effective and efficient management of large amounts of situational data that occurs during tactical rotorcraft missions. The Computer Aided Low Altitude Night Helicopter Flight Program has identified automated Terrain Following/Terrain Avoidance, Nap of the Earth (TF/TA, NOE) operation as key enabling technology for advanced tactical rotorcraft to enhance mission survivability and mission effectiveness. The processing of critical information at low altitudes with short reaction times is life-critical and mission-critical necessitating an ultra-reliable/high throughput computing platform for dependable service for flight control, fusion of sensor data, route planning, near-field/far-field navigation, and obstacle avoidance operations. To address these needs the Army Fault Tolerant Architecture (AFTA) is being designed and developed. This computer system is based upon the Fault Tolerant Parallel Processor (FTPP) developed by Charles Stark Draper Labs (CSDL). AFTA is hard real-time, Byzantine, fault-tolerant parallel processor which is programmed in the ADA language. This document describes the results of the Detailed Design (Phase 2 and 3 of a 3-year project) of the AFTA development. This document contains detailed descriptions of the program objectives, the TF/TA NOE application requirements, architecture, hardware design, operating systems design, systems performance measurements and analytical models.
Development and Evaluation of Fault-Tolerant Flight Control Systems
NASA Technical Reports Server (NTRS)
Song, Yong D.; Gupta, Kajal (Technical Monitor)
2004-01-01
The research is concerned with developing a new approach to enhancing fault tolerance of flight control systems. The original motivation for fault-tolerant control comes from the need for safe operation of control elements (e.g. actuators) in the event of hardware failures in high reliability systems. One such example is modem space vehicle subjected to actuator/sensor impairments. A major task in flight control is to revise the control policy to balance impairment detectability and to achieve sufficient robustness. This involves careful selection of types and parameters of the controllers and the impairment detecting filters used. It also involves a decision, upon the identification of some failures, on whether and how a control reconfiguration should take place in order to maintain a certain system performance level. In this project new flight dynamic model under uncertain flight conditions is considered, in which the effects of both ramp and jump faults are reflected. Stabilization algorithms based on neural network and adaptive method are derived. The control algorithms are shown to be effective in dealing with uncertain dynamics due to external disturbances and unpredictable faults. The overall strategy is easy to set up and the computation involved is much less as compared with other strategies. Computer simulation software is developed. A serious of simulation studies have been conducted with varying flight conditions.
NASA Astrophysics Data System (ADS)
Shi, Yongli; Wu, Zhong; Zhi, Kangyi; Xiong, Jun
2018-03-01
In order to realize reliable commutation of brushless DC motors (BLDCMs), a simple approach is proposed to detect and correct signal faults of Hall position sensors in this paper. First, the time instant of the next jumping edge for Hall signals is predicted by using prior information of pulse intervals in the last electrical period. Considering the possible errors between the predicted instant and the real one, a confidence interval is set by using the predicted value and a suitable tolerance for the next pulse edge. According to the relationship between the real pulse edge and the confidence interval, Hall signals can be judged and the signal faults can be corrected. Experimental results of a BLDCM at steady speed demonstrate the effectiveness of the approach.
Flight elements: Fault detection and fault management
NASA Technical Reports Server (NTRS)
Lum, H.; Patterson-Hine, A.; Edge, J. T.; Lawler, D.
1990-01-01
Fault management for an intelligent computational system must be developed using a top down integrated engineering approach. An approach proposed includes integrating the overall environment involving sensors and their associated data; design knowledge capture; operations; fault detection, identification, and reconfiguration; testability; causal models including digraph matrix analysis; and overall performance impacts on the hardware and software architecture. Implementation of the concept to achieve a real time intelligent fault detection and management system will be accomplished via the implementation of several objectives, which are: Development of fault tolerant/FDIR requirement and specification from a systems level which will carry through from conceptual design through implementation and mission operations; Implementation of monitoring, diagnosis, and reconfiguration at all system levels providing fault isolation and system integration; Optimize system operations to manage degraded system performance through system integration; and Lower development and operations costs through the implementation of an intelligent real time fault detection and fault management system and an information management system.
Fault Tolerant Computer Network Study
1980-04-01
2. 1.2. 2 Air Data The air data function processes air pressures, temperature , and angle- of-attack measurements, and provides calibrated airspeed...attitude direction indicator. 2.1.5.2 Fixtaking Sensors used for fixtaking include the radar (in ground map mode), head- up display (for visual...VFR interdiction mission. The radar (ground map mode) is also the primary sensor at night and in adverse weather if the target presents a
Extreme temperature robust optical sensor designs and fault-tolerant signal processing
Riza, Nabeel Agha [Oviedo, FL; Perez, Frank [Tujunga, CA
2012-01-17
Silicon Carbide (SiC) probe designs for extreme temperature and pressure sensing uses a single crystal SiC optical chip encased in a sintered SiC material probe. The SiC chip may be protected for high temperature only use or exposed for both temperature and pressure sensing. Hybrid signal processing techniques allow fault-tolerant extreme temperature sensing. Wavelength peak-to-peak (or null-to-null) collective spectrum spread measurement to detect wavelength peak/null shift measurement forms a coarse-fine temperature measurement using broadband spectrum monitoring. The SiC probe frontend acts as a stable emissivity Black-body radiator and monitoring the shift in radiation spectrum enables a pyrometer. This application combines all-SiC pyrometry with thick SiC etalon laser interferometry within a free-spectral range to form a coarse-fine temperature measurement sensor. RF notch filtering techniques improve the sensitivity of the temperature measurement where fine spectral shift or spectrum measurements are needed to deduce temperature.
Design of LPV fault-tolerant controller for pitch system of wind turbine
NASA Astrophysics Data System (ADS)
Wu, Dinghui; Zhang, Xiaolin
2017-07-01
To address failures of wind turbine pitch-angle sensors, traditional wind turbine linear parameter varying (LPV) model is transformed into a double-layer convex polyhedron LPV model. On the basis of this model, when the plurality of the sensor undergoes failure and details of the failure are inconvenient to obtain, each sub-controller is designed using distributed thought and gain scheduling method. The final controller is obtained using all of the sub-controllers by a convex combination. The design method corrects the errors of the linear model, improves the linear degree of the system, and solves the problem of multiple pitch angle faults to ensure stable operation of the wind turbine.
Interplanetary Radiation and Fault Tolerant Mini-Star Tracker System
NASA Technical Reports Server (NTRS)
Rakoczy, John; Paceley, Pete
2015-01-01
The Charles Stark Draper Laboratory, Inc. is partnering with the NASA Marshall Space Flight Center (MSFC) Engineering Directorate's Avionics Design Division and Flight Mechanics & Analysis Division to develop and test a prototype small, low-weight, low-power, radiation-hardened, fault-tolerant mini-star tracker (fig. 1). The project is expected to enable Draper Laboratory and its small business partner, L-1 Standards and Technologies, Inc., to develop a new guidance, navigation, and control sensor product for the growing small sat technology market. The project also addresses MSFC's need for sophisticated small sat technologies to support a variety of science missions in Earth orbit and beyond. The prototype star tracker will be tested on the night sky on MSFC's Automated Lunar and Meteor Observatory (ALAMO) telescope. The specific goal of the project is to address the need for a compact, low size, weight, and power, yet radiation hardened and fault tolerant star tracker system that can be used as a stand-alone attitude determination system or incorporated into a complete attitude determination and control system for emerging interplanetary and operational CubeSat and small sat missions.
2013-05-01
representation of a centralized control system on a turbine engine. All actuators and sensors are point-to-point cabled to the controller ( FADEC ) which...electronics themselves. Figure 1: Centralized Control System Each function resides within the FADEC and uses Unique Point-to-Point Analog...distributed control system on the same turbine engine. The actuators and sensors interface to Smart Nodes which, in turn communicate to the FADEC via
A real-time, practical sensor fault-tolerant module for robust EMG pattern recognition.
Zhang, Xiaorong; Huang, He
2015-02-19
Unreliability of surface EMG recordings over time is a challenge for applying the EMG pattern recognition (PR)-controlled prostheses in clinical practice. Our previous study proposed a sensor fault-tolerant module (SFTM) by utilizing redundant information in multiple EMG signals. The SFTM consists of multiple sensor fault detectors and a self-recovery mechanism that can identify anomaly in EMG signals and remove the recordings of the disturbed signals from the input of the pattern classifier to recover the PR performance. While the proposed SFTM has shown great promise, the previous design is impractical. A practical SFTM has to be fast enough, lightweight, automatic, and robust under different conditions with or without disturbances. This paper presented a real-time, practical SFTM towards robust EMG PR. A novel fast LDA retraining algorithm and a fully automatic sensor fault detector based on outlier detection were developed, which allowed the SFTM to promptly detect disturbances and recover the PR performance immediately. These components of SFTM were then integrated with the EMG PR module and tested on five able-bodied subjects and a transradial amputee in real-time for classifying multiple hand and wrist motions under different conditions with different disturbance types and levels. The proposed fast LDA retraining algorithm significantly shortened the retraining time from nearly 1 s to less than 4 ms when tested on the embedded system prototype, which demonstrated the feasibility of a nearly "zero-delay" SFTM that is imperceptible to the users. The results of the real-time tests suggested that the SFTM was able to handle different types of disturbances investigated in this study and significantly improve the classification performance when one or multiple EMG signals were disturbed. In addition, the SFTM could also maintain the system's classification performance when there was no disturbance. This paper presented a real-time, lightweight, and automatic SFTM, which paved the way for reliable and robust EMG PR for prosthesis control.
Improving Security for SCADA Sensor Networks with Reputation Systems and Self-Organizing Maps.
Moya, José M; Araujo, Alvaro; Banković, Zorana; de Goyeneche, Juan-Mariano; Vallejo, Juan Carlos; Malagón, Pedro; Villanueva, Daniel; Fraga, David; Romero, Elena; Blesa, Javier
2009-01-01
The reliable operation of modern infrastructures depends on computerized systems and Supervisory Control and Data Acquisition (SCADA) systems, which are also based on the data obtained from sensor networks. The inherent limitations of the sensor devices make them extremely vulnerable to cyberwarfare/cyberterrorism attacks. In this paper, we propose a reputation system enhanced with distributed agents, based on unsupervised learning algorithms (self-organizing maps), in order to achieve fault tolerance and enhanced resistance to previously unknown attacks. This approach has been extensively simulated and compared with previous proposals.
Improving Security for SCADA Sensor Networks with Reputation Systems and Self-Organizing Maps
Moya, José M.; Araujo, Álvaro; Banković, Zorana; de Goyeneche, Juan-Mariano; Vallejo, Juan Carlos; Malagón, Pedro; Villanueva, Daniel; Fraga, David; Romero, Elena; Blesa, Javier
2009-01-01
The reliable operation of modern infrastructures depends on computerized systems and Supervisory Control and Data Acquisition (SCADA) systems, which are also based on the data obtained from sensor networks. The inherent limitations of the sensor devices make them extremely vulnerable to cyberwarfare/cyberterrorism attacks. In this paper, we propose a reputation system enhanced with distributed agents, based on unsupervised learning algorithms (self-organizing maps), in order to achieve fault tolerance and enhanced resistance to previously unknown attacks. This approach has been extensively simulated and compared with previous proposals. PMID:22291569
Making real-time reactive systems reliable
NASA Technical Reports Server (NTRS)
Marzullo, Keith; Wood, Mark
1990-01-01
A reactive system is characterized by a control program that interacts with an environment (or controlled program). The control program monitors the environment and reacts to significant events by sending commands to the environment. This structure is quite general. Not only are most embedded real time systems reactive systems, but so are monitoring and debugging systems and distributed application management systems. Since reactive systems are usually long running and may control physical equipment, fault tolerance is vital. The research tries to understand the principal issues of fault tolerance in real time reactive systems and to build tools that allow a programmer to design reliable, real time reactive systems. In order to make real time reactive systems reliable, several issues must be addressed: (1) How can a control program be built to tolerate failures of sensors and actuators. To achieve this, a methodology was developed for transforming a control program that references physical value into one that tolerates sensors that can fail and can return inaccurate values; (2) How can the real time reactive system be built to tolerate failures of the control program. Towards this goal, whether the techniques presented can be extended to real time reactive systems is investigated; and (3) How can the environment be specified in a way that is useful for writing a control program. Towards this goal, whether a system with real time constraints can be expressed as an equivalent system without such constraints is also investigated.
Information Weighted Consensus for Distributed Estimation in Vision Networks
ERIC Educational Resources Information Center
Kamal, Ahmed Tashrif
2013-01-01
Due to their high fault-tolerance, ease of installation and scalability to large networks, distributed algorithms have recently gained immense popularity in the sensor networks community, especially in computer vision. Multi-target tracking in a camera network is one of the fundamental problems in this domain. Distributed estimation algorithms…
A probabilistic dynamic energy model for ad-hoc wireless sensors network with varying topology
NASA Astrophysics Data System (ADS)
Al-Husseini, Amal
In this dissertation we investigate the behavior of Wireless Sensor Networks (WSNs) from the degree distribution and evolution perspective. In specific, we focus on implementation of a scale-free degree distribution topology for energy efficient WSNs. WSNs is an emerging technology that finds its applications in different areas such as environment monitoring, agricultural crop monitoring, forest fire monitoring, and hazardous chemical monitoring in war zones. This technology allows us to collect data without human presence or intervention. Energy conservation/efficiency is one of the major issues in prolonging the active life WSNs. Recently, many energy aware and fault tolerant topology control algorithms have been presented, but there is dearth of research focused on energy conservation/efficiency of WSNs. Therefore, we study energy efficiency and fault-tolerance in WSNs from the degree distribution and evolution perspective. Self-organization observed in natural and biological systems has been directly linked to their degree distribution. It is widely known that scale-free distribution bestows robustness, fault-tolerance, and access efficiency to system. Fascinated by these properties, we propose two complex network theoretic self-organizing models for adaptive WSNs. In particular, we focus on adopting the Barabasi and Albert scale-free model to fit into the constraints and limitations of WSNs. We developed simulation models to conduct numerical experiments and network analysis. The main objective of studying these models is to find ways to reducing energy usage of each node and balancing the overall network energy disrupted by faulty communication among nodes. The first model constructs the wireless sensor network relative to the degree (connectivity) and remaining energy of every individual node. We observed that it results in a scale-free network structure which has good fault tolerance properties in face of random node failures. The second model considers additional constraints on the maximum degree of each node as well as the energy consumption relative to degree changes. This gives more realistic results from a dynamical network perspective. It results in balanced network-wide energy consumption. The results show that networks constructed using the proposed approach have good properties for different centrality measures. The outcomes of the presented research are beneficial to building WSN control models with greater self-organization properties which leads to optimal energy consumption.
Health management and controls for earth to orbit propulsion systems
NASA Technical Reports Server (NTRS)
Bickford, R. L.
1992-01-01
Fault detection and isolation for advanced rocket engine controllers are discussed focusing on advanced sensing systems and software which significantly improve component failure detection for engine safety and health management. Aerojet's Space Transportation Main Engine controller for the National Launch System is the state of the art in fault tolerant engine avionics. Health management systems provide high levels of automated fault coverage and significantly improve vehicle delivered reliability and lower preflight operations costs. Key technologies, including the sensor data validation algorithms and flight capable spectrometers, have been demonstrated in ground applications and are found to be suitable for bridging programs into flight applications.
Interconnection requirements in avionic systems
NASA Astrophysics Data System (ADS)
Vergnolle, Claude; Houssay, Bruno
1991-04-01
The future aircraft generation will have thousand smart electromagnetic sensors distributed allover. Each sensor is connected with fibers links to the main-frame computer in charge of the real time signal''s correlation. Such a computer must be compactly built and massively parallel: it needs the use of 3 D optical free-space interconnect between neighbouring boards and reconfigurable interconnects via holographic backplane. The optical interconnect facilities will be also used to build fault-tolerant computer through large redundancy.
A Log-Scaling Fault Tolerant Agreement Algorithm for a Fault Tolerant MPI
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hursey, Joshua J; Naughton, III, Thomas J; Vallee, Geoffroy R
The lack of fault tolerance is becoming a limiting factor for application scalability in HPC systems. The MPI does not provide standardized fault tolerance interfaces and semantics. The MPI Forum's Fault Tolerance Working Group is proposing a collective fault tolerant agreement algorithm for the next MPI standard. Such algorithms play a central role in many fault tolerant applications. This paper combines a log-scaling two-phase commit agreement algorithm with a reduction operation to provide the necessary functionality for the new collective without any additional messages. Error handling mechanisms are described that preserve the fault tolerance properties while maintaining overall scalability.
Fallback options for airgap sensor fault of an electromagnetic suspension system
NASA Astrophysics Data System (ADS)
Michail, Konstantinos; Zolotas, Argyrios C.; Goodall, Roger M.
2013-06-01
The paper presents a method to recover the performance of an electromagnetic suspension under faulty airgap sensor. The proposed control scheme is a combination of classical control loops, a Kalman Estimator and analytical redundancy (for the airgap signal). In this way redundant airgap sensors are not essential for reliable operation of this system. When the airgap sensor fails the required signal is recovered using a combination of a Kalman estimator and analytical redundancy. The performance of the suspension is optimised using genetic algorithms and some preliminary robustness issues to load and operating airgap variations are discussed. Simulations on a realistic model of such type of suspension illustrate the efficacy of the proposed sensor tolerant control method.
Test experience on an ultrareliable computer communication network
NASA Technical Reports Server (NTRS)
Abbott, L. W.
1984-01-01
The dispersed sensor processing mesh (DSPM) is an experimental, ultrareliable, fault-tolerant computer communications network that exhibits an organic-like ability to regenerate itself after suffering damage. The regeneration is accomplished by two routines - grow and repair. This paper discusses the DSPM concept for achieving fault tolerance and provides a brief description of the mechanization of both the experiment and the six-node experimental network. The main topic of this paper is the system performance of the growth algorithm contained in the grow routine. The characteristics imbued to DSPM by the growth algorithm are also discussed. Data from an experimental DSPM network and software simulation of larger DSPM-type networks are used to examine the inherent limitation on growth time by the growth algorithm and the relationship of growth time to network size and topology.
An experimental investigation of fault tolerant software structures in an avionics application
NASA Technical Reports Server (NTRS)
Caglayan, Alper K.; Eckhardt, Dave E., Jr.
1989-01-01
The objective of this experimental investigation is to compare the functional performance and software reliability of competing fault tolerant software structures utilizing software diversity. In this experiment, three versions of the redundancy management software for a skewed sensor array have been developed using three diverse failure detection and isolation algorithms and incorporated into various N-version, recovery block and hybrid software structures. The empirical results show that, for maximum functional performance improvement in the selected application domain, the results of diverse algorithms should be voted before being processed by multiple versions without enforced diversity. Results also suggest that when the reliability gain with an N-version structure is modest, recovery block structures are more feasible since higher reliability can be obtained using an acceptance check with a modest reliability.
Further Structural Intelligence for Sensors Cluster Technology in Manufacturing
Mekid, Samir
2006-01-01
With the ever increasing complex sensing and actuating tasks in manufacturing plants, intelligent sensors cluster in hybrid networks becomes a rapidly expanding area. They play a dominant role in many fields from macro and micro scale. Global object control and the ability to self organize into fault-tolerant and scalable systems are expected for high level applications. In this paper, new structural concepts of intelligent sensors and networks with new intelligent agents are presented. Embedding new functionalities to dynamically manage cooperative agents for autonomous machines are interesting key enabling technologies most required in manufacturing for zero defects production.
Integrated rate isolation sensor
NASA Technical Reports Server (NTRS)
Brady, Tye (Inventor); Henderson, Timothy (Inventor); Phillips, Richard (Inventor); Zimpfer, Doug (Inventor); Crain, Tim (Inventor)
2012-01-01
In one embodiment, a system for providing fault-tolerant inertial measurement data includes a sensor for measuring an inertial parameter and a processor. The sensor has less accuracy than a typical inertial measurement unit (IMU). The processor detects whether a difference exists between a first data stream received from a first inertial measurement unit and a second data stream received from a second inertial measurement unit. Upon detecting a difference, the processor determines whether at least one of the first or second inertial measurement units has failed by comparing each of the first and second data streams to the inertial parameter.
NASA Technical Reports Server (NTRS)
Brunelle, J. E.; Eckhardt, D. E., Jr.
1985-01-01
Results are presented of an experiment conducted in the NASA Avionics Integrated Research Laboratory (AIRLAB) to investigate the implementation of fault-tolerant software techniques on fault-tolerant computer architectures, in particular the Software Implemented Fault Tolerance (SIFT) computer. The N-version programming and recovery block techniques were implemented on a portion of the SIFT operating system. The results indicate that, to effectively implement fault-tolerant software design techniques, system requirements will be impacted and suggest that retrofitting fault-tolerant software on existing designs will be inefficient and may require system modification.
Autonomous control system reconfiguration for spacecraft with non-redundant actuators
NASA Astrophysics Data System (ADS)
Grossman, Walter
1995-05-01
The Small Satellite Technology Initiative (SSTI) 'CLARK' spacecraft is required to be single-failure tolerant, i.e., no failure of any single component or subsystem shall result in complete mission loss. Fault tolerance is usually achieved by implementing redundant subsystems. Fault tolerant systems are therefore heavier and cost more to build and launch than non-redundent, non fault-tolerant spacecraft. The SSTI CLARK satellite Attitude Determination and Control System (ADACS) achieves single-fault tolerance without redundancy. The attitude determination system system uses a Kalman Filter which is inherently robust to loss of any single attitude sensor. The attitude control system uses three orthogonal reaction wheels for attitude control and three magnetic dipoles for momentum control. The nominal six-actuator control system functions by projecting the attitude correction torque onto the reaction wheels while a slower momentum management outer loop removes the excess momentum in the direction normal to the local B field. The actuators are not redundant so the nominal control law cannot be implemented in the event of a loss of a single actuator (dipole or reaction wheel). The spacecraft dynamical state (attitude, angular rate, and momentum) is controllable from any five-element subset of the six actuators. With loss of an actuator the instantaneous control authority may not span R(3) but the controllability gramian integral(limits between t,0) Phi(t, tau)B(tau )B(prime)(tau) Phi(prime)(t, tau)d tau retains full rank. Upon detection of an actuator failure the control torque is decomposed onto the remaining active axes. The attitude control torque is effected and the over-orbit momentum is controlled. The resulting control system performance approaches that of the nominal system.
The application of micromachined sensors to manned space systems
NASA Technical Reports Server (NTRS)
Bordano, Aldo; Havey, Gary; Wald, Jerry; Nasr, Hatem
1993-01-01
Micromachined sensors promise significant system advantages to manned space vehicles. Vehicle Health Monitoring (VHM) is a critical need for most future space systems. Micromachined sensors play a significant role in advancing the application of VHM in future space vehicles. This paper addresses the requirements that future VHM systems place on micromachined sensors such as: system integration, performance, size, weight, power, redundancy, reliability and fault tolerance. Current uses of micromachined sensors in commercial, military and space systems are used to document advantages that are gained and lessons learned. Based on these successes, the future use of micromachined sensors in space programs is discussed in terms of future directions and issues that need to be addressed such as how commercial and military sensors can meet future space system requirements.
Practical comparison of distributed ledger technologies for IoT
NASA Astrophysics Data System (ADS)
Red, Val A.
2017-05-01
Existing distributed ledger implementations - specifically, several blockchain implementations - embody a cacophony of divergent capabilities augmenting innovations of cryptographic hashes, consensus mechanisms, and asymmetric cryptography in a wide variety of applications. Whether specifically designed for cryptocurrency or otherwise, several distributed ledgers rely upon modular mechanisms such as consensus or smart contracts. These components, however, can vary substantially among implementations; differences involving proof-of-work, practical byzantine fault tolerance, and other consensus approaches exemplify distinct distributed ledger variations. Such divergence results in unique combinations of modules, performance, latency, and fault tolerance. As implementations continue to develop rapidly due to the emerging nature of blockchain technologies, this paper encapsulates a snapshot of sensor and internet of things (IoT) specific implementations of blockchain as of the end of 2016. Several technical risks and divergent approaches preclude standardization of a blockchain for sensors and IoT in the foreseeable future; such issues will be assessed alongside the practicality of IoT applications among Hyperledger, Iota, and Ethereum distributed ledger implementations suggested for IoT. This paper contributes a comparison of existing distributed ledger implementations intended for practical sensor and IoT utilization. A baseline for characterizing distributed ledger implementations in the context of IoT and sensors is proposed. Technical approaches and performance are compared considering IoT size, weight, and power limitations. Consensus and smart contracts, if applied, are also analyzed for the respective implementations' practicality and security. Overall, the maturity of distributed ledgers with respect to sensor and IoT applicability will be analyzed for enterprise interoperability.
FTAPE: A fault injection tool to measure fault tolerance
NASA Technical Reports Server (NTRS)
Tsai, Timothy K.; Iyer, Ravishankar K.
1995-01-01
The paper introduces FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The tool combines system-wide fault injection with a controllable workload. A workload generator is used to create high stress conditions for the machine. Faults are injected based on this workload activity in order to ensure a high level of fault propagation. The errors/fault ratio and performance degradation are presented as measures of fault tolerance.
A Fault-tolerant RISC Microprocessor for Spacecraft Applications
NASA Technical Reports Server (NTRS)
Timoc, Constantin; Benz, Harry
1990-01-01
Viewgraphs on a fault-tolerant RISC microprocessor for spacecraft applications are presented. Topics covered include: reduced instruction set computer; fault tolerant registers; fault tolerant ALU; and double rail CMOS logic.
Test experience on an ultrareliable computer communication network
NASA Technical Reports Server (NTRS)
Abbott, L. W.
1984-01-01
The dispersed sensor processing mesh (DSPM) is an experimental, ultra-reliable, fault-tolerant computer communications network that exhibits an organic-like ability to regenerate itself after suffering damage. The regeneration is accomplished by two routines - grow and repair. This paper discusses the DSPM concept for achieving fault tolerance and provides a brief description of the mechanization of both the experiment and the six-node experimental network. The main topic of this paper is the system performance of the growth algorithm contained in the grow routine. The characteristics imbued to DSPM by the growth algorithm are also discussed. Data from an experimental DSPM network and software simulation of larger DSPM-type networks are used to examine the inherent limitation on growth time by the growth algorithm and the relationship of growth time to network size and topology.
NASA Technical Reports Server (NTRS)
Tapia, Moiez A.
1993-01-01
The study of a comparative analysis of distinct multiplex and fault-tolerant configurations for a PLC-based safety system from a reliability point of view is presented. It considers simplex, duplex and fault-tolerant triple redundancy configurations. The standby unit in case of a duplex configuration has a failure rate which is k times the failure rate of the standby unit, the value of k varying from 0 to 1. For distinct values of MTTR and MTTF of the main unit, MTBF and availability for these configurations are calculated. The effect of duplexing only the PLC module or only the sensors and the actuators module, on the MTBF of the configuration, is also presented. The results are summarized and merits and demerits of various configurations under distinct environments are discussed.
Analysis of a hardware and software fault tolerant processor for critical applications
NASA Technical Reports Server (NTRS)
Dugan, Joanne B.
1993-01-01
Computer systems for critical applications must be designed to tolerate software faults as well as hardware faults. A unified approach to tolerating hardware and software faults is characterized by classifying faults in terms of duration (transient or permanent) rather than source (hardware or software). Errors arising from transient faults can be handled through masking or voting, but errors arising from permanent faults require system reconfiguration to bypass the failed component. Most errors which are caused by software faults can be considered transient, in that they are input-dependent. Software faults are triggered by a particular set of inputs. Quantitative dependability analysis of systems which exhibit a unified approach to fault tolerance can be performed by a hierarchical combination of fault tree and Markov models. A methodology for analyzing hardware and software fault tolerant systems is applied to the analysis of a hypothetical system, loosely based on the Fault Tolerant Parallel Processor. The models consider both transient and permanent faults, hardware and software faults, independent and related software faults, automatic recovery, and reconfiguration.
Furquim, Gustavo; Filho, Geraldo P R; Jalali, Roozbeh; Pessin, Gustavo; Pazzi, Richard W; Ueyama, Jó
2018-03-19
The rise in the number and intensity of natural disasters is a serious problem that affects the whole world. The consequences of these disasters are significantly worse when they occur in urban districts because of the casualties and extent of the damage to goods and property that is caused. Until now feasible methods of dealing with this have included the use of wireless sensor networks (WSNs) for data collection and machine-learning (ML) techniques for forecasting natural disasters. However, there have recently been some promising new innovations in technology which have supplemented the task of monitoring the environment and carrying out the forecasting. One of these schemes involves adopting IP-based (Internet Protocol) sensor networks, by using emerging patterns for IoT. In light of this, in this study, an attempt has been made to set out and describe the results achieved by SENDI (System for dEtecting and forecasting Natural Disasters based on IoT). SENDI is a fault-tolerant system based on IoT, ML and WSN for the detection and forecasting of natural disasters and the issuing of alerts. The system was modeled by means of ns-3 and data collected by a real-world WSN installed in the town of São Carlos - Brazil, which carries out the data collection from rivers in the region. The fault-tolerance is embedded in the system by anticipating the risk of communication breakdowns and the destruction of the nodes during disasters. It operates by adding intelligence to the nodes to carry out the data distribution and forecasting, even in extreme situations. A case study is also included for flash flood forecasting and this makes use of the ns-3 SENDI model and data collected by WSN.
Furquim, Gustavo; Filho, Geraldo P. R.; Pessin, Gustavo; Pazzi, Richard W.
2018-01-01
The rise in the number and intensity of natural disasters is a serious problem that affects the whole world. The consequences of these disasters are significantly worse when they occur in urban districts because of the casualties and extent of the damage to goods and property that is caused. Until now feasible methods of dealing with this have included the use of wireless sensor networks (WSNs) for data collection and machine-learning (ML) techniques for forecasting natural disasters. However, there have recently been some promising new innovations in technology which have supplemented the task of monitoring the environment and carrying out the forecasting. One of these schemes involves adopting IP-based (Internet Protocol) sensor networks, by using emerging patterns for IoT. In light of this, in this study, an attempt has been made to set out and describe the results achieved by SENDI (System for dEtecting and forecasting Natural Disasters based on IoT). SENDI is a fault-tolerant system based on IoT, ML and WSN for the detection and forecasting of natural disasters and the issuing of alerts. The system was modeled by means of ns-3 and data collected by a real-world WSN installed in the town of São Carlos - Brazil, which carries out the data collection from rivers in the region. The fault-tolerance is embedded in the system by anticipating the risk of communication breakdowns and the destruction of the nodes during disasters. It operates by adding intelligence to the nodes to carry out the data distribution and forecasting, even in extreme situations. A case study is also included for flash flood forecasting and this makes use of the ns-3 SENDI model and data collected by WSN. PMID:29562657
NASA Astrophysics Data System (ADS)
Belapurkar, Rohit K.
Future aircraft engine control systems will be based on a distributed architecture, in which, the sensors and actuators will be connected to the Full Authority Digital Engine Control (FADEC) through an engine area network. Distributed engine control architecture will allow the implementation of advanced, active control techniques along with achieving weight reduction, improvement in performance and lower life cycle cost. The performance of a distributed engine control system is predominantly dependent on the performance of the communication network. Due to the serial data transmission policy, network-induced time delays and sampling jitter are introduced between the sensor/actuator nodes and the distributed FADEC. Communication network faults and transient node failures may result in data dropouts, which may not only degrade the control system performance but may even destabilize the engine control system. Three different architectures for a turbine engine control system based on a distributed framework are presented. A partially distributed control system for a turbo-shaft engine is designed based on ARINC 825 communication protocol. Stability conditions and control design methodology are developed for the proposed partially distributed turbo-shaft engine control system to guarantee the desired performance under the presence of network-induced time delay and random data loss due to transient sensor/actuator failures. A fault tolerant control design methodology is proposed to benefit from the availability of an additional system bandwidth and from the broadcast feature of the data network. It is shown that a reconfigurable fault tolerant control design can help to reduce the performance degradation in presence of node failures. A T-700 turbo-shaft engine model is used to validate the proposed control methodology based on both single input and multiple-input multiple-output control design techniques.
Agent Based Fault Tolerance for the Mobile Environment
NASA Astrophysics Data System (ADS)
Park, Taesoon
This paper presents a fault-tolerance scheme based on mobile agents for the reliable mobile computing systems. Mobility of the agent is suitable to trace the mobile hosts and the intelligence of the agent makes it efficient to support the fault tolerance services. This paper presents two approaches to implement the mobile agent based fault tolerant service and their performances are evaluated and compared with other fault-tolerant schemes.
Sequential behavior and its inherent tolerance to memory faults.
NASA Technical Reports Server (NTRS)
Meyer, J. F.
1972-01-01
Representation of a memory fault of a sequential machine M by a function mu on the states of M and the result of the fault by an appropriately determined machine M(mu). Given some sequential behavior B, its inherent tolerance to memory faults can then be measured in terms of the minimum memory redundancy required to realize B with a state-assigned machine having fault tolerance type tau and fault tolerance level t. A behavior having maximum inherent tolerance is exhibited, and it is shown that behaviors of the same size can have different inherent tolerance.
NASA Technical Reports Server (NTRS)
Joshi, Suresh M.
2012-01-01
This paper explores a class of multiple-model-based fault detection and identification (FDI) methods for bias-type faults in actuators and sensors. These methods employ banks of Kalman-Bucy filters to detect the faults, determine the fault pattern, and estimate the fault values, wherein each Kalman-Bucy filter is tuned to a different failure pattern. Necessary and sufficient conditions are presented for identifiability of actuator faults, sensor faults, and simultaneous actuator and sensor faults. It is shown that FDI of simultaneous actuator and sensor faults is not possible using these methods when all sensors have biases.
Identifiability of Additive Actuator and Sensor Faults by State Augmentation
NASA Technical Reports Server (NTRS)
Joshi, Suresh; Gonzalez, Oscar R.; Upchurch, Jason M.
2014-01-01
A class of fault detection and identification (FDI) methods for bias-type actuator and sensor faults is explored in detail from the point of view of fault identifiability. The methods use state augmentation along with banks of Kalman-Bucy filters for fault detection, fault pattern determination, and fault value estimation. A complete characterization of conditions for identifiability of bias-type actuator faults, sensor faults, and simultaneous actuator and sensor faults is presented. It is shown that FDI of simultaneous actuator and sensor faults is not possible using these methods when all sensors have unknown biases. The fault identifiability conditions are demonstrated via numerical examples. The analytical and numerical results indicate that caution must be exercised to ensure fault identifiability for different fault patterns when using such methods.
Monitoring and Control Interface Based on Virtual Sensors
Escobar, Ricardo F.; Adam-Medina, Manuel; García-Beltrán, Carlos D.; Olivares-Peregrino, Víctor H.; Juárez-Romero, David; Guerrero-Ramírez, Gerardo V.
2014-01-01
In this article, a toolbox based on a monitoring and control interface (MCI) is presented and applied in a heat exchanger. The MCI was programed in order to realize sensor fault detection and isolation and fault tolerance using virtual sensors. The virtual sensors were designed from model-based high-gain observers. To develop the control task, different kinds of control laws were included in the monitoring and control interface. These control laws are PID, MPC and a non-linear model-based control law. The MCI helps to maintain the heat exchanger under operation, even if a temperature outlet sensor fault occurs; in the case of outlet temperature sensor failure, the MCI will display an alarm. The monitoring and control interface is used as a practical tool to support electronic engineering students with heat transfer and control concepts to be applied in a double-pipe heat exchanger pilot plant. The method aims to teach the students through the observation and manipulation of the main variables of the process and by the interaction with the monitoring and control interface (MCI) developed in LabVIEW©. The MCI provides the electronic engineering students with the knowledge of heat exchanger behavior, since the interface is provided with a thermodynamic model that approximates the temperatures and the physical properties of the fluid (density and heat capacity). An advantage of the interface is the easy manipulation of the actuator for an automatic or manual operation. Another advantage of the monitoring and control interface is that all algorithms can be manipulated and modified by the users. PMID:25365462
Coordinated Fault Tolerance for High-Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dongarra, Jack; Bosilca, George; et al.
2013-04-08
Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack—from the application, math libraries, and programming language runtime to other common system software such as jobs schedulers, resource managers, and monitoring tools.
Physical fault tolerance of nanoelectronics.
Szkopek, Thomas; Roychowdhury, Vwani P; Antoniadis, Dimitri A; Damoulakis, John N
2011-04-29
The error rate in complementary transistor circuits is suppressed exponentially in electron number, arising from an intrinsic physical implementation of fault-tolerant error correction. Contrariwise, explicit assembly of gates into the most efficient known fault-tolerant architecture is characterized by a subexponential suppression of error rate with electron number, and incurs significant overhead in wiring and complexity. We conclude that it is more efficient to prevent logical errors with physical fault tolerance than to correct logical errors with fault-tolerant architecture.
Analysis of typical fault-tolerant architectures using HARP
NASA Technical Reports Server (NTRS)
Bavuso, Salvatore J.; Bechta Dugan, Joanne; Trivedi, Kishor S.; Rothmann, Elizabeth M.; Smith, W. Earl
1987-01-01
Difficulties encountered in the modeling of fault-tolerant systems are discussed. The Hybrid Automated Reliability Predictor (HARP) approach to modeling fault-tolerant systems is described. The HARP is written in FORTRAN, consists of nearly 30,000 lines of codes and comments, and is based on behavioral decomposition. Using the behavioral decomposition, the dependability model is divided into fault-occurrence/repair and fault/error-handling models; the characteristics and combining of these two models are examined. Examples in which the HARP is applied to the modeling of some typical fault-tolerant systems, including a local-area network, two fault-tolerant computer systems, and a flight control system, are presented.
A fault-tolerant avionics suite for an entry research vehicle
NASA Technical Reports Server (NTRS)
Dzwonczyk, Mark; Stone, Howard
1988-01-01
A highly-reliable avionics suite has been designed for an Entry Research Vehicle. The autonomous spacecraft would be deployed from the Space Shuttle Orbiter and perform a variety of aerodynamic and propulsive maneuvers which may be required for future space transportation system vehicles. The flight electronics consist of a central fault-tolerant processor, which is resilient to all first failures, reliably cross-strapped to redundant and distributed sets of sensors and effectors. This paper describes the preliminary design and analysis of the architecture which resulted from a fifteen month study by the Charles Stark Draper Laboratory for the NASA Langley Research Center. After a brief introduction to the design task, the architecture of the central flight computer and its interface to the vehicle are discussed. Following this, the method and results of the baseline reliability study for the avionic suite are presented.
A fault-tolerant avionics suite for an entry research vehicle
NASA Astrophysics Data System (ADS)
Dzwonczyk, Mark; Stone, Howard
A highly-reliable avionics suite has been designed for an Entry Research Vehicle. The autonomous spacecraft would be deployed from the Space Shuttle Orbiter and perform a variety of aerodynamic and propulsive maneuvers which may be required for future space transportation system vehicles. The flight electronics consist of a central fault-tolerant processor, which is resilient to all first failures, reliably cross-strapped to redundant and distributed sets of sensors and effectors. This paper describes the preliminary design and analysis of the architecture which resulted from a fifteen month study by the Charles Stark Draper Laboratory for the NASA Langley Research Center. After a brief introduction to the design task, the architecture of the central flight computer and its interface to the vehicle are discussed. Following this, the method and results of the baseline reliability study for the avionic suite are presented.
What does fault tolerant Deep Learning need from MPI?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amatya, Vinay C.; Vishnu, Abhinav; Siegel, Charles M.
Deep Learning (DL) algorithms have become the {\\em de facto} Machine Learning (ML) algorithm for large scale data analysis. DL algorithms are computationally expensive -- even distributed DL implementations which use MPI require days of training (model learning) time on commonly studied datasets. Long running DL applications become susceptible to faults -- requiring development of a fault tolerant system infrastructure, in addition to fault tolerant DL algorithms. This raises an important question: {\\em What is needed from MPI for designing fault tolerant DL implementations?} In this paper, we address this problem for permanent faults. We motivate the need for amore » fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); a need (or lack thereof) for check-pointing of any critical data structures; and most importantly, consideration for several fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe for using ULFM-based implementation. Our evaluation using the ImageNet dataset and AlexNet neural network topology demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI based ULFM.« less
Modeling, Detection, and Disambiguation of Sensor Faults for Aerospace Applications
NASA Technical Reports Server (NTRS)
Balaban, Edward; Saxena, Abhinav; Bansal, Prasun; Goebel, Kai F.; Curran, Simon
2009-01-01
Sensor faults continue to be a major hurdle for systems health management to reach its full potential. At the same time, few recorded instances of sensor faults exist. It is equally difficult to seed particular sensor faults. Therefore, research is underway to better understand the different fault modes seen in sensors and to model the faults. The fault models can then be used in simulated sensor fault scenarios to ensure that algorithms can distinguish between sensor faults and system faults. The paper illustrates the work with data collected from an electro-mechanical actuator in an aerospace setting, equipped with temperature, vibration, current, and position sensors. The most common sensor faults, such as bias, drift, scaling, and dropout were simulated and injected into the experimental data, with the goal of making these simulations as realistic as feasible. A neural network based classifier was then created and tested on both experimental data and the more challenging randomized data sequences. Additional studies were also conducted to determine sensitivity of detection and disambiguation efficacy to severity of fault conditions.
Fault-tolerant processing system
NASA Technical Reports Server (NTRS)
Palumbo, Daniel L. (Inventor)
1996-01-01
A fault-tolerant, fiber optic interconnect, or backplane, which serves as a via for data transfer between modules. Fault tolerance algorithms are embedded in the backplane by dividing the backplane into a read bus and a write bus and placing a redundancy management unit (RMU) between the read bus and the write bus so that all data transmitted by the write bus is subjected to the fault tolerance algorithms before the data is passed for distribution to the read bus. The RMU provides both backplane control and fault tolerance.
A methodology for testing fault-tolerant software
NASA Technical Reports Server (NTRS)
Andrews, D. M.; Mahmood, A.; Mccluskey, E. J.
1985-01-01
A methodology for testing fault tolerant software is presented. There are problems associated with testing fault tolerant software because many errors are masked or corrected by voters, limiter, or automatic channel synchronization. This methodology illustrates how the same strategies used for testing fault tolerant hardware can be applied to testing fault tolerant software. For example, one strategy used in testing fault tolerant hardware is to disable the redundancy during testing. A similar testing strategy is proposed for software, namely, to move the major emphasis on testing earlier in the development cycle (before the redundancy is in place) thus reducing the possibility that undetected errors will be masked when limiters and voters are added.
Eigenstructure Assignment for Fault Tolerant Flight Control Design
NASA Technical Reports Server (NTRS)
Sobel, Kenneth; Joshi, Suresh (Technical Monitor)
2002-01-01
In recent years, fault tolerant flight control systems have gained an increased interest for high performance military aircraft as well as civil aircraft. Fault tolerant control systems can be described as either active or passive. An active fault tolerant control system has to either reconfigure or adapt the controller in response to a failure. One approach is to reconfigure the controller based upon detection and identification of the failure. Another approach is to use direct adaptive control to adjust the controller without explicitly identifying the failure. In contrast, a passive fault tolerant control system uses a fixed controller which achieves acceptable performance for a presumed set of failures. We have obtained a passive fault tolerant flight control law for the F/A-18 aircraft which achieves acceptable handling qualities for a class of control surface failures. The class of failures includes the symmetric failure of any one control surface being stuck at its trim value. A comparison was made of an eigenstructure assignment gain designed for the unfailed aircraft with a fault tolerant multiobjective optimization gain. We have shown that time responses for the unfailed aircraft using the eigenstructure assignment gain and the fault tolerant gain are identical. Furthermore, the fault tolerant gain achieves MIL-F-8785C specifications for all failure conditions.
NASA Astrophysics Data System (ADS)
Ushaq, Muhammad; Fang, Jiancheng
2013-10-01
Integrated navigation systems for various applications, generally employs the centralized Kalman filter (CKF) wherein all measured sensor data are communicated to a single central Kalman filter. The advantage of CKF is that there is a minimal loss of information and high precision under benign conditions. But CKF may suffer computational overloading, and poor fault tolerance. The alternative is the federated Kalman filter (FKF) wherein the local estimates can deliver optimal or suboptimal state estimate as per certain information fusion criterion. FKF has enhanced throughput and multiple level fault detection capability. The Standard CKF or FKF require that the system noise and the measurement noise are zero-mean and Gaussian. Moreover it is assumed that covariance of system and measurement noises remain constant. But if the theoretical and actual statistical features employed in Kalman filter are not compatible, the Kalman filter does not render satisfactory solutions and divergence problems also occur. To resolve such problems, in this paper, an adaptive Kalman filter scheme strengthened with fuzzy inference system (FIS) is employed to adapt the statistical features of contributing sensors, online, in the light of real system dynamics and varying measurement noises. The excessive faults are detected and isolated by employing Chi Square test method. As a case study, the presented scheme has been implemented on Strapdown Inertial Navigation System (SINS) integrated with the Celestial Navigation System (CNS), GPS and Doppler radar using FKF. Collectively the overall system can be termed as SINS/CNS/GPS/Doppler integrated navigation system. The simulation results have validated the effectiveness of the presented scheme with significantly enhanced precision, reliability and fault tolerance. Effectiveness of the scheme has been tested against simulated abnormal errors/noises during different time segments of flight. It is believed that the presented scheme can be applied to the navigation system of aircraft or unmanned aerial vehicle (UAV).
Fault Accommodation in Control of Flexible Systems
NASA Technical Reports Server (NTRS)
Maghami, Peiman G.; Sparks, Dean W., Jr.; Lim, Kyong B.
1998-01-01
New synthesis techniques for the design of fault accommodating controllers for flexible systems are developed. Three robust control design strategies, static dissipative, dynamic dissipative and mu-synthesis, are used in the approach. The approach provides techniques for designing controllers that maximize, in some sense, the tolerance of the closed-loop system against faults in actuators and sensors, while guaranteeing performance robustness at a specified performance level, measured in terms of the proximity of the closed-loop poles to the imaginary axis (the degree of stability). For dissipative control designs, nonlinear programming is employed to synthesize the controllers, whereas in mu-synthesis, the traditional D-K iteration is used. To demonstrate the feasibility of the proposed techniques, they are applied to the control design of a structural model of a flexible laboratory test structure.
Multiple incipient sensor faults diagnosis with application to high-speed railway traction devices.
Wu, Yunkai; Jiang, Bin; Lu, Ningyun; Yang, Hao; Zhou, Yang
2017-03-01
This paper deals with the problem of incipient fault diagnosis for a class of Lipschitz nonlinear systems with sensor biases and explores further results of total measurable fault information residual (ToMFIR). Firstly, state and output transformations are introduced to transform the original system into two subsystems. The first subsystem is subject to system disturbances and free from sensor faults, while the second subsystem contains sensor faults but without any system disturbances. Sensor faults in the second subsystem are then formed as actuator faults by using a pseudo-actuator based approach. Since the effects of system disturbances on the residual are completely decoupled, multiple incipient sensor faults can be detected by constructing ToMFIR, and the fault detectability condition is then derived for discriminating the detectable incipient sensor faults. Further, a sliding-mode observers (SMOs) based fault isolation scheme is designed to guarantee accurate isolation of multiple sensor faults. Finally, simulation results conducted on a CRH2 high-speed railway traction device are given to demonstrate the effectiveness of the proposed approach. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
A verified design of a fault-tolerant clock synchronization circuit: Preliminary investigations
NASA Technical Reports Server (NTRS)
Miner, Paul S.
1992-01-01
Schneider demonstrates that many fault tolerant clock synchronization algorithms can be represented as refinements of a single proven correct paradigm. Shankar provides mechanical proof that Schneider's schema achieves Byzantine fault tolerant clock synchronization provided that 11 constraints are satisfied. Some of the constraints are assumptions about physical properties of the system and cannot be established formally. Proofs are given that the fault tolerant midpoint convergence function satisfies three of the constraints. A hardware design is presented, implementing the fault tolerant midpoint function, which is shown to satisfy the remaining constraints. The synchronization circuit will recover completely from transient faults provided the maximum fault assumption is not violated. The initialization protocol for the circuit also provides a recovery mechanism from total system failure caused by correlated transient faults.
A distributed fault-tolerant signal processor /FTSP/
NASA Astrophysics Data System (ADS)
Bonneau, R. J.; Evett, R. C.; Young, M. J.
1980-01-01
A digital fault-tolerant signal processor (FTSP), an example of a self-repairing programmable system is analyzed. The design configuration is discussed in terms of fault tolerance, system-level fault detection, isolation and common memory. Special attention is given to the FDIR (fault detection isolation and reconfiguration) logic, noting that the reconfiguration decisions are based on configuration, summary status, end-around tests, and north marker/synchro data. Several mechanisms of fault detection are described which initiate reconfiguration at different levels. It is concluded that the reliability of a signal processor can be significantly enhanced by the use of fault-tolerant techniques.
Fault tolerant software modules for SIFT
NASA Technical Reports Server (NTRS)
Hecht, M.; Hecht, H.
1982-01-01
The implementation of software fault tolerance is investigated for critical modules of the Software Implemented Fault Tolerance (SIFT) operating system to support the computational and reliability requirements of advanced fly by wire transport aircraft. Fault tolerant designs generated for the error reported and global executive are examined. A description of the alternate routines, implementation requirements, and software validation are included.
Fault tree models for fault tolerant hypercube multiprocessors
NASA Technical Reports Server (NTRS)
Boyd, Mark A.; Tuazon, Jezus O.
1991-01-01
Three candidate fault tolerant hypercube architectures are modeled, their reliability analyses are compared, and the resulting implications of these methods of incorporating fault tolerance into hypercube multiprocessors are discussed. In the course of performing the reliability analyses, the use of HARP and fault trees in modeling sequence dependent system behaviors is demonstrated.
The Design of a Fault-Tolerant COTS-Based Bus Architecture for Space Applications
NASA Technical Reports Server (NTRS)
Chau, Savio N.; Alkalai, Leon; Tai, Ann T.
2000-01-01
The high-performance, scalability and miniaturization requirements together with the power, mass and cost constraints mandate the use of commercial-off-the-shelf (COTS) components and standards in the X2000 avionics system architecture for deep-space missions. In this paper, we report our experiences and findings on the design of an IEEE 1394 compliant fault-tolerant COTS-based bus architecture. While the COTS standard IEEE 1394 adequately supports power management, high performance and scalability, its topological criteria impose restrictions on fault tolerance realization. To circumvent the difficulties, we derive a "stack-tree" topology that not only complies with the IEEE 1394 standard but also facilitates fault tolerance realization in a spaceborne system with limited dedicated resource redundancies. Moreover, by exploiting pertinent standard features of the 1394 interface which are not purposely designed for fault tolerance, we devise a comprehensive set of fault detection mechanisms to support the fault-tolerant bus architecture.
Test plan. GCPS task 7, subtask 7.1: IHM development
NASA Technical Reports Server (NTRS)
Greenberg, H. S.
1994-01-01
The overall objective of Task 7 is to identify cost-effective life cycle integrated health management (IHM) approaches for a reusable launch vehicle's primary structure. Acceptable IHM approaches must: eliminate and accommodate faults through robust designs, identify optimum inspection/maintenance periods, automate ground and on-board test and check-out, and accommodate and detect structural faults by providing wide and localized area sensor and test coverage as required. These requirements are elements of our targeted primary structure low cost operations approach using airline-like maintenance by exception philosophies. This development plan will follow an evolutionary path paving the way to the ultimate development of flight-quality production, operations, and vehicle systems. This effort will be focused on maturing the recommended sensor technologies required for localized and wide area health monitoring to a technology readiness level (TRL) of 6 and to establish flight ready system design requirements. The following is a brief list of IHM program objectives: design out faults by analyzing material properties, structural geometry, and load and environment variables and identify failure modes and damage tolerance requirements; design in system robustness while meeting performance objectives (weight limitations) of the reusable launch vehicle primary structure; establish structural integrity margins to preclude the need for test and checkout and predict optimum inspection/maintenance periods through life prediction analysis; identify optimum fault protection system concept definitions combining system robustness and integrity margins established above with cost effective health monitoring technologies; and use coupons, panels, and integrated full scale primary structure test articles to identify, evaluate, and characterize the preferred NDE/NDI/IHM sensor technologies that will be a part of the fault protection system.
Survivable algorithms and redundancy management in NASA's distributed computing systems
NASA Technical Reports Server (NTRS)
Malek, Miroslaw
1992-01-01
The design of survivable algorithms requires a solid foundation for executing them. While hardware techniques for fault-tolerant computing are relatively well understood, fault-tolerant operating systems, as well as fault-tolerant applications (survivable algorithms), are, by contrast, little understood, and much more work in this field is required. We outline some of our work that contributes to the foundation of ultrareliable operating systems and fault-tolerant algorithm design. We introduce our consensus-based framework for fault-tolerant system design. This is followed by a description of a hierarchical partitioning method for efficient consensus. A scheduler for redundancy management is introduced, and application-specific fault tolerance is described. We give an overview of our hybrid algorithm technique, which is an alternative to the formal approach given.
Fault tolerant architectures for integrated aircraft electronics systems, task 2
NASA Technical Reports Server (NTRS)
Levitt, K. N.; Melliar-Smith, P. M.; Schwartz, R. L.
1984-01-01
The architectural basis for an advanced fault tolerant on-board computer to succeed the current generation of fault tolerant computers is examined. The network error tolerant system architecture is studied with particular attention to intercluster configurations and communication protocols, and to refined reliability estimates. The diagnosis of faults, so that appropriate choices for reconfiguration can be made is discussed. The analysis relates particularly to the recognition of transient faults in a system with tasks at many levels of priority. The demand driven data-flow architecture, which appears to have possible application in fault tolerant systems is described and work investigating the feasibility of automatic generation of aircraft flight control programs from abstract specifications is reported.
Emergent Adaptive Noise Reduction from Communal Cooperation of Sensor Grid
NASA Technical Reports Server (NTRS)
Jones, Kennie H.; Jones, Michael G.; Nark, Douglas M.; Lodding, Kenneth N.
2010-01-01
In the last decade, the realization of small, inexpensive, and powerful devices with sensors, computers, and wireless communication has promised the development of massive sized sensor networks with dense deployments over large areas capable of high fidelity situational assessments. However, most management models have been based on centralized control and research has concentrated on methods for passing data from sensor devices to the central controller. Most implementations have been small but, as it is not scalable, this methodology is insufficient for massive deployments. Here, a specific application of a large sensor network for adaptive noise reduction demonstrates a new paradigm where communities of sensor/computer devices assess local conditions and make local decisions from which emerges a global behaviour. This approach obviates many of the problems of centralized control as it is not prone to single point of failure and is more scalable, efficient, robust, and fault tolerant
Scarola, Kenneth; Jamison, David S.; Manazir, Richard M.; Rescorl, Robert L.; Harmon, Daryl L.
1998-01-01
A method for generating a validated measurement of a process parameter at a point in time by using a plurality of individual sensor inputs from a scan of said sensors at said point in time. The sensor inputs from said scan are stored and a first validation pass is initiated by computing an initial average of all stored sensor inputs. Each sensor input is deviation checked by comparing each input including a preset tolerance against the initial average input. If the first deviation check is unsatisfactory, the sensor which produced the unsatisfactory input is flagged as suspect. It is then determined whether at least two of the inputs have not been flagged as suspect and are therefore considered good inputs. If two or more inputs are good, a second validation pass is initiated by computing a second average of all the good sensor inputs, and deviation checking the good inputs by comparing each good input including a present tolerance against the second average. If the second deviation check is satisfactory, the second average is displayed as the validated measurement and the suspect sensor as flagged as bad. A validation fault occurs if at least two inputs are not considered good, or if the second deviation check is not satisfactory. In the latter situation the inputs from each of all the sensors are compared against the last validated measurement and the value from the sensor input that deviates the least from the last valid measurement is displayed.
NASA Technical Reports Server (NTRS)
Harper, Richard E.; Butler, Bryan P.
1990-01-01
The Draper fault-tolerant processor with fault-tolerant shared memory (FTP/FTSM), which is designed to allow application tasks to continue execution during the memory alignment process, is described. Processor performance is not affected by memory alignment. In addition, the FTP/FTSM incorporates a hardware scrubber device to perform the memory alignment quickly during unused memory access cycles. The FTP/FTSM architecture is described, followed by an estimate of the time required for channel reintegration.
Parallel and distributed computation for fault-tolerant object recognition
NASA Technical Reports Server (NTRS)
Wechsler, Harry
1988-01-01
The distributed associative memory (DAM) model is suggested for distributed and fault-tolerant computation as it relates to object recognition tasks. The fault-tolerance is with respect to geometrical distortions (scale and rotation), noisy inputs, occulsion/overlap, and memory faults. An experimental system was developed for fault-tolerant structure recognition which shows the feasibility of such an approach. The approach is futher extended to the problem of multisensory data integration and applied successfully to the recognition of colored polyhedral objects.
Multiple sensor fault diagnosis for dynamic processes.
Li, Cheng-Chih; Jeng, Jyh-Cheng
2010-10-01
Modern industrial plants are usually large scaled and contain a great amount of sensors. Sensor fault diagnosis is crucial and necessary to process safety and optimal operation. This paper proposes a systematic approach to detect, isolate and identify multiple sensor faults for multivariate dynamic systems. The current work first defines deviation vectors for sensor observations, and further defines and derives the basic sensor fault matrix (BSFM), consisting of the normalized basic fault vectors, by several different methods. By projecting a process deviation vector to the space spanned by BSFM, this research uses a vector with the resulted weights on each direction for multiple sensor fault diagnosis. This study also proposes a novel monitoring index and derives corresponding sensor fault detectability. The study also utilizes that vector to isolate and identify multiple sensor faults, and discusses the isolatability and identifiability. Simulation examples and comparison with two conventional PCA-based contribution plots are presented to demonstrate the effectiveness of the proposed methodology. Copyright © 2010 ISA. Published by Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Smith, T. B., Jr.; Lala, J. H.
1983-01-01
The basic organization of the fault tolerant multiprocessor, (FTMP) is that of a general purpose homogeneous multiprocessor. Three processors operate on a shared system (memory and I/O) bus. Replication and tight synchronization of all elements and hardware voting is employed to detect and correct any single fault. Reconfiguration is then employed to repair a fault. Multiple faults may be tolerated as a sequence of single faults with repair between fault occurrences.
cost and benefits optimization model for fault-tolerant aircraft electronic systems
NASA Technical Reports Server (NTRS)
1983-01-01
The factors involved in economic assessment of fault tolerant systems (FTS) and fault tolerant flight control systems (FTFCS) are discussed. Algorithms for optimization and economic analysis of FTFCS are documented.
Validation Methods for Fault-Tolerant avionics and control systems, working group meeting 1
NASA Technical Reports Server (NTRS)
1979-01-01
The proceedings of the first working group meeting on validation methods for fault tolerant computer design are presented. The state of the art in fault tolerant computer validation was examined in order to provide a framework for future discussions concerning research issues for the validation of fault tolerant avionics and flight control systems. The development of positions concerning critical aspects of the validation process are given.
Multi-hop routing mechanism for reliable sensor computing.
Chen, Jiann-Liang; Ma, Yi-Wei; Lai, Chia-Ping; Hu, Chia-Cheng; Huang, Yueh-Min
2009-01-01
Current research on routing in wireless sensor computing concentrates on increasing the service lifetime, enabling scalability for large number of sensors and supporting fault tolerance for battery exhaustion and broken nodes. A sensor node is naturally exposed to various sources of unreliable communication channels and node failures. Sensor nodes have many failure modes, and each failure degrades the network performance. This work develops a novel mechanism, called Reliable Routing Mechanism (RRM), based on a hybrid cluster-based routing protocol to specify the best reliable routing path for sensor computing. Table-driven intra-cluster routing and on-demand inter-cluster routing are combined by changing the relationship between clusters for sensor computing. Applying a reliable routing mechanism in sensor computing can improve routing reliability, maintain low packet loss, minimize management overhead and save energy consumption. Simulation results indicate that the reliability of the proposed RRM mechanism is around 25% higher than that of the Dynamic Source Routing (DSR) and ad hoc On-demand Distance Vector routing (AODV) mechanisms.
Zhao, Kaihui; Li, Peng; Zhang, Changfan; Li, Xiangfei; He, Jing; Lin, Yuliang
2017-12-06
This paper proposes a new scheme of reconstructing current sensor faults and estimating unknown load disturbance for a permanent magnet synchronous motor (PMSM)-driven system. First, the original PMSM system is transformed into two subsystems; the first subsystem has unknown system load disturbances, which are unrelated to sensor faults, and the second subsystem has sensor faults, but is free from unknown load disturbances. Introducing a new state variable, the augmented subsystem that has sensor faults can be transformed into having actuator faults. Second, two sliding mode observers (SMOs) are designed: the unknown load disturbance is estimated by the first SMO in the subsystem, which has unknown load disturbance, and the sensor faults can be reconstructed using the second SMO in the augmented subsystem, which has sensor faults. The gains of the proposed SMOs and their stability analysis are developed via the solution of linear matrix inequality (LMI). Finally, the effectiveness of the proposed scheme was verified by simulations and experiments. The results demonstrate that the proposed scheme can reconstruct current sensor faults and estimate unknown load disturbance for the PMSM-driven system.
Advanced cloud fault tolerance system
NASA Astrophysics Data System (ADS)
Sumangali, K.; Benny, Niketa
2017-11-01
Cloud computing has become a prevalent on-demand service on the internet to store, manage and process data. A pitfall that accompanies cloud computing is the failures that can be encountered in the cloud. To overcome these failures, we require a fault tolerance mechanism to abstract faults from users. We have proposed a fault tolerant architecture, which is a combination of proactive and reactive fault tolerance. This architecture essentially increases the reliability and the availability of the cloud. In the future, we would like to compare evaluations of our proposed architecture with existing architectures and further improve it.
ROBUS-2: A Fault-Tolerant Broadcast Communication System
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.
2005-01-01
The Reliable Optical Bus (ROBUS) is the core communication system of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER), a general-purpose fault-tolerant integrated modular architecture currently under development at NASA Langley Research Center. The ROBUS is a time-division multiple access (TDMA) broadcast communication system with medium access control by means of time-indexed communication schedule. ROBUS-2 is a developmental version of the ROBUS providing guaranteed fault-tolerant services to the attached processing elements (PEs), in the presence of a bounded number of faults. These services include message broadcast (Byzantine Agreement), dynamic communication schedule update, clock synchronization, and distributed diagnosis (group membership). The ROBUS also features fault-tolerant startup and restart capabilities. ROBUS-2 is tolerant to internal as well as PE faults, and incorporates a dynamic self-reconfiguration capability driven by the internal diagnostic system. This version of the ROBUS is intended for laboratory experimentation and demonstrations of the capability to reintegrate failed nodes, dynamically update the communication schedule, and tolerate and recover from correlated transient faults.
Probabilistic evaluation of on-line checks in fault-tolerant multiprocessor systems
NASA Technical Reports Server (NTRS)
Nair, V. S. S.; Hoskote, Yatin V.; Abraham, Jacob A.
1992-01-01
The analysis of fault-tolerant multiprocessor systems that use concurrent error detection (CED) schemes is much more difficult than the analysis of conventional fault-tolerant architectures. Various analytical techniques have been proposed to evaluate CED schemes deterministically. However, these approaches are based on worst-case assumptions related to the failure of system components. Often, the evaluation results do not reflect the actual fault tolerance capabilities of the system. A probabilistic approach to evaluate the fault detecting and locating capabilities of on-line checks in a system is developed. The various probabilities associated with the checking schemes are identified and used in the framework of the matrix-based model. Based on these probabilistic matrices, estimates for the fault tolerance capabilities of various systems are derived analytically.
Ultrareliable fault-tolerant control systems
NASA Technical Reports Server (NTRS)
Webster, L. D.; Slykhouse, R. A.; Booth, L. A., Jr.; Carson, T. M.; Davis, G. J.; Howard, J. C.
1984-01-01
It is demonstrated that fault-tolerant computer systems, such as on the Shuttles, based on redundant, independent operation are a viable alternative in fault tolerant system designs. The ultrareliable fault-tolerant control system (UFTCS) was developed and tested in laboratory simulations of an UH-1H helicopter. UFTCS includes asymptotically stable independent control elements in a parallel, cross-linked system environment. Static redundancy provides the fault tolerance. A polling is performed among the computers, with results allowing for time-delay channel variations with tight bounds. When compared with the laboratory and actual flight data for the helicopter, the probability of a fault was, for the first 10 hr of flight given a quintuple computer redundancy, found to be 1 in 290 billion. Two weeks of untended Space Station operations would experience a fault probability of 1 in 24 million. Techniques for avoiding channel divergence problems are identified.
Fault recovery characteristics of the fault tolerant multi-processor
NASA Technical Reports Server (NTRS)
Padilla, Peter A.
1990-01-01
The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.
Distributed Fault Detection Based on Credibility and Cooperation for WSNs in Smart Grids.
Shao, Sujie; Guo, Shaoyong; Qiu, Xuesong
2017-04-28
Due to the increasingly important role in monitoring and data collection that sensors play, accurate and timely fault detection is a key issue for wireless sensor networks (WSNs) in smart grids. This paper presents a novel distributed fault detection mechanism for WSNs based on credibility and cooperation. Firstly, a reasonable credibility model of a sensor is established to identify any suspicious status of the sensor according to its own temporal data correlation. Based on the credibility model, the suspicious sensor is then chosen to launch fault diagnosis requests. Secondly, the sending time of fault diagnosis request is discussed to avoid the transmission overhead brought about by unnecessary diagnosis requests and improve the efficiency of fault detection based on neighbor cooperation. The diagnosis reply of a neighbor sensor is analyzed according to its own status. Finally, to further improve the accuracy of fault detection, the diagnosis results of neighbors are divided into several classifications to judge the fault status of the sensors which launch the fault diagnosis requests. Simulation results show that this novel mechanism can achieve high fault detection ratio with a small number of fault diagnoses and low data congestion probability.
Distributed Fault Detection Based on Credibility and Cooperation for WSNs in Smart Grids
Shao, Sujie; Guo, Shaoyong; Qiu, Xuesong
2017-01-01
Due to the increasingly important role in monitoring and data collection that sensors play, accurate and timely fault detection is a key issue for wireless sensor networks (WSNs) in smart grids. This paper presents a novel distributed fault detection mechanism for WSNs based on credibility and cooperation. Firstly, a reasonable credibility model of a sensor is established to identify any suspicious status of the sensor according to its own temporal data correlation. Based on the credibility model, the suspicious sensor is then chosen to launch fault diagnosis requests. Secondly, the sending time of fault diagnosis request is discussed to avoid the transmission overhead brought about by unnecessary diagnosis requests and improve the efficiency of fault detection based on neighbor cooperation. The diagnosis reply of a neighbor sensor is analyzed according to its own status. Finally, to further improve the accuracy of fault detection, the diagnosis results of neighbors are divided into several classifications to judge the fault status of the sensors which launch the fault diagnosis requests. Simulation results show that this novel mechanism can achieve high fault detection ratio with a small number of fault diagnoses and low data congestion probability. PMID:28452925
Performance Monitoring of Diabetic Patient Systems
2001-10-25
a process delay that is due to the dynamics of the glucose sensor. A. Bergman Model The Bergman and AIDA models both utilize a \\minimal model...approxima- tion of the process must be made to achieve reasonable performance. A rst order approximation, ~g(s), of both the Bergman and AIDA models is...Within the IMC framework, both the Bergman and AIDA models can be controlled within acceptable toler- ances. The simulated faults are stochastic
Fault Detection and Isolation for Hydraulic Control
NASA Technical Reports Server (NTRS)
1987-01-01
Pressure sensors and isolation valves act to shut down defective servochannel. Redundant hydraulic system indirectly senses failure in any of its electrical control channels and mechanically isolates hydraulic channel controlled by faulty electrical channel so flat it cannot participate in operating system. With failure-detection and isolation technique, system can sustains two failed channels and still functions at full performance levels. Scheme useful on aircraft or other systems with hydraulic servovalves where failure cannot be tolerated.
Adaptive Control Allocation for Fault Tolerant Overactuated Autonomous Vehicles
2007-11-01
Tolerant Overactuated Autonomous Vehicles Casavola, A.; Garone, E. (2007) Adaptive Control Allocation for Fault Tolerant Overactuated Autonomous ...Adaptive Control Allocation for Fault Tolerant Overactuated Autonomous Vehicles 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6...Tolerant Overactuated Autonomous Vehicles 3.2 - 2 RTO-MP-AVT-145 UNCLASSIFIED/UNLIMITED Control allocation problem (CAP) - Given a virtual input v(t
Study of fault-tolerant software technology
NASA Technical Reports Server (NTRS)
Slivinski, T.; Broglio, C.; Wild, C.; Goldberg, J.; Levitt, K.; Hitt, E.; Webb, J.
1984-01-01
Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance.
Li, Xiangfei; Lin, Yuliang
2017-01-01
This paper proposes a new scheme of reconstructing current sensor faults and estimating unknown load disturbance for a permanent magnet synchronous motor (PMSM)-driven system. First, the original PMSM system is transformed into two subsystems; the first subsystem has unknown system load disturbances, which are unrelated to sensor faults, and the second subsystem has sensor faults, but is free from unknown load disturbances. Introducing a new state variable, the augmented subsystem that has sensor faults can be transformed into having actuator faults. Second, two sliding mode observers (SMOs) are designed: the unknown load disturbance is estimated by the first SMO in the subsystem, which has unknown load disturbance, and the sensor faults can be reconstructed using the second SMO in the augmented subsystem, which has sensor faults. The gains of the proposed SMOs and their stability analysis are developed via the solution of linear matrix inequality (LMI). Finally, the effectiveness of the proposed scheme was verified by simulations and experiments. The results demonstrate that the proposed scheme can reconstruct current sensor faults and estimate unknown load disturbance for the PMSM-driven system. PMID:29211017
Xu, Jun; Wang, Jing; Li, Shiying; Cao, Binggang
2016-01-01
Recently, State of energy (SOE) has become one of the most fundamental parameters for battery management systems in electric vehicles. However, current information is critical in SOE estimation and current sensor is usually utilized to obtain the latest current information. However, if the current sensor fails, the SOE estimation may be confronted with large error. Therefore, this paper attempts to make the following contributions: Current sensor fault detection and SOE estimation method is realized simultaneously. Through using the proportional integral observer (PIO) based method, the current sensor fault could be accurately estimated. By taking advantage of the accurate estimated current sensor fault, the influence caused by the current sensor fault can be eliminated and compensated. As a result, the results of the SOE estimation will be influenced little by the fault. In addition, the simulation and experimental workbench is established to verify the proposed method. The results indicate that the current sensor fault can be estimated accurately. Simultaneously, the SOE can also be estimated accurately and the estimation error is influenced little by the fault. The maximum SOE estimation error is less than 2%, even though the large current error caused by the current sensor fault still exists. PMID:27548183
Xu, Jun; Wang, Jing; Li, Shiying; Cao, Binggang
2016-08-19
Recently, State of energy (SOE) has become one of the most fundamental parameters for battery management systems in electric vehicles. However, current information is critical in SOE estimation and current sensor is usually utilized to obtain the latest current information. However, if the current sensor fails, the SOE estimation may be confronted with large error. Therefore, this paper attempts to make the following contributions: Current sensor fault detection and SOE estimation method is realized simultaneously. Through using the proportional integral observer (PIO) based method, the current sensor fault could be accurately estimated. By taking advantage of the accurate estimated current sensor fault, the influence caused by the current sensor fault can be eliminated and compensated. As a result, the results of the SOE estimation will be influenced little by the fault. In addition, the simulation and experimental workbench is established to verify the proposed method. The results indicate that the current sensor fault can be estimated accurately. Simultaneously, the SOE can also be estimated accurately and the estimation error is influenced little by the fault. The maximum SOE estimation error is less than 2%, even though the large current error caused by the current sensor fault still exists.
Software fault tolerance in computer operating systems
NASA Technical Reports Server (NTRS)
Iyer, Ravishankar K.; Lee, Inhwan
1994-01-01
This chapter provides data and analysis of the dependability and fault tolerance for three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of defects in system software that result in processor failures. The loose coupling between processors which results in the backup execution (the processor state and the sequence of events occurring) being different from the original execution is a major reason for the measured software fault tolerance. The IBM/MVS system fault tolerance almost doubles when recovery routines are provided, in comparison to the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.
A fault-tolerant intelligent robotic control system
NASA Technical Reports Server (NTRS)
Marzwell, Neville I.; Tso, Kam Sing
1993-01-01
This paper describes the concept, design, and features of a fault-tolerant intelligent robotic control system being developed for space and commercial applications that require high dependability. The comprehensive strategy integrates system level hardware/software fault tolerance with task level handling of uncertainties and unexpected events for robotic control. The underlying architecture for system level fault tolerance is the distributed recovery block which protects against application software, system software, hardware, and network failures. Task level fault tolerance provisions are implemented in a knowledge-based system which utilizes advanced automation techniques such as rule-based and model-based reasoning to monitor, diagnose, and recover from unexpected events. The two level design provides tolerance of two or more faults occurring serially at any level of command, control, sensing, or actuation. The potential benefits of such a fault tolerant robotic control system include: (1) a minimized potential for damage to humans, the work site, and the robot itself; (2) continuous operation with a minimum of uncommanded motion in the presence of failures; and (3) more reliable autonomous operation providing increased efficiency in the execution of robotic tasks and decreased demand on human operators for controlling and monitoring the robotic servicing routines.
Fault Diagnosis for Micro-Gas Turbine Engine Sensors via Wavelet Entropy
Yu, Bing; Liu, Dongdong; Zhang, Tianhong
2011-01-01
Sensor fault diagnosis is necessary to ensure the normal operation of a gas turbine system. However, the existing methods require too many resources and this need can’t be satisfied in some occasions. Since the sensor readings are directly affected by sensor state, sensor fault diagnosis can be performed by extracting features of the measured signals. This paper proposes a novel fault diagnosis method for sensors based on wavelet entropy. Based on the wavelet theory, wavelet decomposition is utilized to decompose the signal in different scales. Then the instantaneous wavelet energy entropy (IWEE) and instantaneous wavelet singular entropy (IWSE) are defined based on the previous wavelet entropy theory. Subsequently, a fault diagnosis method for gas turbine sensors is proposed based on the results of a numerically simulated example. Then, experiments on this method are carried out on a real micro gas turbine engine. In the experiment, four types of faults with different magnitudes are presented. The experimental results show that the proposed method for sensor fault diagnosis is efficient. PMID:22163734
Fault diagnosis for micro-gas turbine engine sensors via wavelet entropy.
Yu, Bing; Liu, Dongdong; Zhang, Tianhong
2011-01-01
Sensor fault diagnosis is necessary to ensure the normal operation of a gas turbine system. However, the existing methods require too many resources and this need can't be satisfied in some occasions. Since the sensor readings are directly affected by sensor state, sensor fault diagnosis can be performed by extracting features of the measured signals. This paper proposes a novel fault diagnosis method for sensors based on wavelet entropy. Based on the wavelet theory, wavelet decomposition is utilized to decompose the signal in different scales. Then the instantaneous wavelet energy entropy (IWEE) and instantaneous wavelet singular entropy (IWSE) are defined based on the previous wavelet entropy theory. Subsequently, a fault diagnosis method for gas turbine sensors is proposed based on the results of a numerically simulated example. Then, experiments on this method are carried out on a real micro gas turbine engine. In the experiment, four types of faults with different magnitudes are presented. The experimental results show that the proposed method for sensor fault diagnosis is efficient.
Method and system for environmentally adaptive fault tolerant computing
NASA Technical Reports Server (NTRS)
Copenhaver, Jason L. (Inventor); Jeremy, Ramos (Inventor); Wolfe, Jeffrey M. (Inventor); Brenner, Dean (Inventor)
2010-01-01
A method and system for adapting fault tolerant computing. The method includes the steps of measuring an environmental condition representative of an environment. An on-board processing system's sensitivity to the measured environmental condition is measured. It is determined whether to reconfigure a fault tolerance of the on-board processing system based in part on the measured environmental condition. The fault tolerance of the on-board processing system may be reconfigured based in part on the measured environmental condition.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cheung, Howard; Braun, James E.
This report describes models of building faults created for OpenStudio to support the ongoing development of fault detection and diagnostic (FDD) algorithms at the National Renewable Energy Laboratory. Building faults are operating abnormalities that degrade building performance, such as using more energy than normal operation, failing to maintain building temperatures according to the thermostat set points, etc. Models of building faults in OpenStudio can be used to estimate fault impacts on building performance and to develop and evaluate FDD algorithms. The aim of the project is to develop fault models of typical heating, ventilating and air conditioning (HVAC) equipment inmore » the United States, and the fault models in this report are grouped as control faults, sensor faults, packaged and split air conditioner faults, water-cooled chiller faults, and other uncategorized faults. The control fault models simulate impacts of inappropriate thermostat control schemes such as an incorrect thermostat set point in unoccupied hours and manual changes of thermostat set point due to extreme outside temperature. Sensor fault models focus on the modeling of sensor biases including economizer relative humidity sensor bias, supply air temperature sensor bias, and water circuit temperature sensor bias. Packaged and split air conditioner fault models simulate refrigerant undercharging, condenser fouling, condenser fan motor efficiency degradation, non-condensable entrainment in refrigerant, and liquid line restriction. Other fault models that are uncategorized include duct fouling, excessive infiltration into the building, and blower and pump motor degradation.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cheung, Howard; Braun, James E.
2015-12-31
This report describes models of building faults created for OpenStudio to support the ongoing development of fault detection and diagnostic (FDD) algorithms at the National Renewable Energy Laboratory. Building faults are operating abnormalities that degrade building performance, such as using more energy than normal operation, failing to maintain building temperatures according to the thermostat set points, etc. Models of building faults in OpenStudio can be used to estimate fault impacts on building performance and to develop and evaluate FDD algorithms. The aim of the project is to develop fault models of typical heating, ventilating and air conditioning (HVAC) equipment inmore » the United States, and the fault models in this report are grouped as control faults, sensor faults, packaged and split air conditioner faults, water-cooled chiller faults, and other uncategorized faults. The control fault models simulate impacts of inappropriate thermostat control schemes such as an incorrect thermostat set point in unoccupied hours and manual changes of thermostat set point due to extreme outside temperature. Sensor fault models focus on the modeling of sensor biases including economizer relative humidity sensor bias, supply air temperature sensor bias, and water circuit temperature sensor bias. Packaged and split air conditioner fault models simulate refrigerant undercharging, condenser fouling, condenser fan motor efficiency degradation, non-condensable entrainment in refrigerant, and liquid line restriction. Other fault models that are uncategorized include duct fouling, excessive infiltration into the building, and blower and pump motor degradation.« less
Won, Jongho; Ma, Chris Y. T.; Yau, David K. Y.; ...
2016-06-01
Smart meters are integral to demand response in emerging smart grids, by reporting the electricity consumption of users to serve application needs. But reporting real-time usage information for individual households raises privacy concerns. Existing techniques to guarantee differential privacy (DP) of smart meter users either are not fault tolerant or achieve (possibly partial) fault tolerance at high communication overheads. In this paper, we propose a fault-tolerant protocol for smart metering that can handle general communication failures while ensuring DP with significantly improved efficiency and lower errors compared with the state of the art. Our protocol handles fail-stop faults proactively bymore » using a novel design of future ciphertexts, and distributes trust among the smart meters by sharing secret keys among them. We prove the DP properties of our protocol and analyze its advantages in fault tolerance, accuracy, and communication efficiency relative to competing techniques. We illustrate our analysis by simulations driven by real-world traces of electricity consumption.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Won, Jongho; Ma, Chris Y. T.; Yau, David K. Y.
Smart meters are integral to demand response in emerging smart grids, by reporting the electricity consumption of users to serve application needs. But reporting real-time usage information for individual households raises privacy concerns. Existing techniques to guarantee differential privacy (DP) of smart meter users either are not fault tolerant or achieve (possibly partial) fault tolerance at high communication overheads. In this paper, we propose a fault-tolerant protocol for smart metering that can handle general communication failures while ensuring DP with significantly improved efficiency and lower errors compared with the state of the art. Our protocol handles fail-stop faults proactively bymore » using a novel design of future ciphertexts, and distributes trust among the smart meters by sharing secret keys among them. We prove the DP properties of our protocol and analyze its advantages in fault tolerance, accuracy, and communication efficiency relative to competing techniques. We illustrate our analysis by simulations driven by real-world traces of electricity consumption.« less
Onboard Nonlinear Engine Sensor and Component Fault Diagnosis and Isolation Scheme
NASA Technical Reports Server (NTRS)
Tang, Liang; DeCastro, Jonathan A.; Zhang, Xiaodong
2011-01-01
A method detects and isolates in-flight sensor, actuator, and component faults for advanced propulsion systems. In sharp contrast to many conventional methods, which deal with either sensor fault or component fault, but not both, this method considers sensor fault, actuator fault, and component fault under one systemic and unified framework. The proposed solution consists of two main components: a bank of real-time, nonlinear adaptive fault diagnostic estimators for residual generation, and a residual evaluation module that includes adaptive thresholds and a Transferable Belief Model (TBM)-based residual evaluation scheme. By employing a nonlinear adaptive learning architecture, the developed approach is capable of directly dealing with nonlinear engine models and nonlinear faults without the need of linearization. Software modules have been developed and evaluated with the NASA C-MAPSS engine model. Several typical engine-fault modes, including a subset of sensor/actuator/components faults, were tested with a mild transient operation scenario. The simulation results demonstrated that the algorithm was able to successfully detect and isolate all simulated faults as long as the fault magnitudes were larger than the minimum detectable/isolable sizes, and no misdiagnosis occurred
A Unified Nonlinear Adaptive Approach for Detection and Isolation of Engine Faults
NASA Technical Reports Server (NTRS)
Tang, Liang; DeCastro, Jonathan A.; Zhang, Xiaodong; Farfan-Ramos, Luis; Simon, Donald L.
2010-01-01
A challenging problem in aircraft engine health management (EHM) system development is to detect and isolate faults in system components (i.e., compressor, turbine), actuators, and sensors. Existing nonlinear EHM methods often deal with component faults, actuator faults, and sensor faults separately, which may potentially lead to incorrect diagnostic decisions and unnecessary maintenance. Therefore, it would be ideal to address sensor faults, actuator faults, and component faults under one unified framework. This paper presents a systematic and unified nonlinear adaptive framework for detecting and isolating sensor faults, actuator faults, and component faults for aircraft engines. The fault detection and isolation (FDI) architecture consists of a parallel bank of nonlinear adaptive estimators. Adaptive thresholds are appropriately designed such that, in the presence of a particular fault, all components of the residual generated by the adaptive estimator corresponding to the actual fault type remain below their thresholds. If the faults are sufficiently different, then at least one component of the residual generated by each remaining adaptive estimator should exceed its threshold. Therefore, based on the specific response of the residuals, sensor faults, actuator faults, and component faults can be isolated. The effectiveness of the approach was evaluated using the NASA C-MAPSS turbofan engine model, and simulation results are presented.
Software fault tolerance for real-time avionics systems
NASA Technical Reports Server (NTRS)
Anderson, T.; Knight, J. C.
1983-01-01
Avionics systems have very high reliability requirements and are therefore prime candidates for the inclusion of fault tolerance techniques. In order to provide tolerance to software faults, some form of state restoration is usually advocated as a means of recovery. State restoration can be very expensive for systems which utilize concurrent processes. The concurrency present in most avionics systems and the further difficulties introduced by timing constraints imply that providing tolerance for software faults may be inordinately expensive or complex. A straightforward pragmatic approach to software fault tolerance which is believed to be applicable to many real-time avionics systems is proposed. A classification system for software errors is presented together with approaches to recovery and continued service for each error type.
Optimal Sensor Allocation for Fault Detection and Isolation
NASA Technical Reports Server (NTRS)
Azam, Mohammad; Pattipati, Krishna; Patterson-Hine, Ann
2004-01-01
Automatic fault diagnostic schemes rely on various types of sensors (e.g., temperature, pressure, vibration, etc) to measure the system parameters. Efficacy of a diagnostic scheme is largely dependent on the amount and quality of information available from these sensors. The reliability of sensors, as well as the weight, volume, power, and cost constraints, often makes it impractical to monitor a large number of system parameters. An optimized sensor allocation that maximizes the fault diagnosibility, subject to specified weight, volume, power, and cost constraints is required. Use of optimal sensor allocation strategies during the design phase can ensure better diagnostics at a reduced cost for a system incorporating a high degree of built-in testing. In this paper, we propose an approach that employs multiple fault diagnosis (MFD) and optimization techniques for optimal sensor placement for fault detection and isolation (FDI) in complex systems. Keywords: sensor allocation, multiple fault diagnosis, Lagrangian relaxation, approximate belief revision, multidimensional knapsack problem.
Switch failure diagnosis based on inductor current observation for boost converters
NASA Astrophysics Data System (ADS)
Jamshidpour, E.; Poure, P.; Saadate, S.
2016-09-01
Face to the growing number of applications using DC-DC power converters, the improvement of their reliability is subject to an increasing number of studies. Especially in safety critical applications, designing fault-tolerant converters is becoming mandatory. In this paper, a switch fault-tolerant DC-DC converter is studied. First, some of the fastest Fault Detection Algorithms (FDAs) are recalled. Then, a fast switch FDA is proposed which can detect both types of failures; open circuit fault as well as short circuit fault can be detected in less than one switching period. Second, a fault-tolerant converter which can be reconfigured under those types of fault is introduced. Hardware-In-the-Loop (HIL) results and experimental validations are given to verify the validity of the proposed switch fault-tolerant approach in the case of a single switch DC-DC boost converter with one redundant switch.
Design and Analysis of Linear Fault-Tolerant Permanent-Magnet Vernier Machines
Xu, Liang; Liu, Guohai; Du, Yi; Liu, Hu
2014-01-01
This paper proposes a new linear fault-tolerant permanent-magnet (PM) vernier (LFTPMV) machine, which can offer high thrust by using the magnetic gear effect. Both PMs and windings of the proposed machine are on short mover, while the long stator is only manufactured from iron. Hence, the proposed machine is very suitable for long stroke system applications. The key of this machine is that the magnetizer splits the two movers with modular and complementary structures. Hence, the proposed machine offers improved symmetrical and sinusoidal back electromotive force waveform and reduced detent force. Furthermore, owing to the complementary structure, the proposed machine possesses favorable fault-tolerant capability, namely, independent phases. In particular, differing from the existing fault-tolerant machines, the proposed machine offers fault tolerance without sacrificing thrust density. This is because neither fault-tolerant teeth nor the flux-barriers are adopted. The electromagnetic characteristics of the proposed machine are analyzed using the time-stepping finite-element method, which verifies the effectiveness of the theoretical analysis. PMID:24982959
Design and analysis of linear fault-tolerant permanent-magnet vernier machines.
Xu, Liang; Ji, Jinghua; Liu, Guohai; Du, Yi; Liu, Hu
2014-01-01
This paper proposes a new linear fault-tolerant permanent-magnet (PM) vernier (LFTPMV) machine, which can offer high thrust by using the magnetic gear effect. Both PMs and windings of the proposed machine are on short mover, while the long stator is only manufactured from iron. Hence, the proposed machine is very suitable for long stroke system applications. The key of this machine is that the magnetizer splits the two movers with modular and complementary structures. Hence, the proposed machine offers improved symmetrical and sinusoidal back electromotive force waveform and reduced detent force. Furthermore, owing to the complementary structure, the proposed machine possesses favorable fault-tolerant capability, namely, independent phases. In particular, differing from the existing fault-tolerant machines, the proposed machine offers fault tolerance without sacrificing thrust density. This is because neither fault-tolerant teeth nor the flux-barriers are adopted. The electromagnetic characteristics of the proposed machine are analyzed using the time-stepping finite-element method, which verifies the effectiveness of the theoretical analysis.
Chen, Gang; Song, Yongduan; Lewis, Frank L
2016-05-03
This paper investigates the distributed fault-tolerant control problem of networked Euler-Lagrange systems with actuator and communication link faults. An adaptive fault-tolerant cooperative control scheme is proposed to achieve the coordinated tracking control of networked uncertain Lagrange systems on a general directed communication topology, which contains a spanning tree with the root node being the active target system. The proposed algorithm is capable of compensating for the actuator bias fault, the partial loss of effectiveness actuation fault, the communication link fault, the model uncertainty, and the external disturbance simultaneously. The control scheme does not use any fault detection and isolation mechanism to detect, separate, and identify the actuator faults online, which largely reduces the online computation and expedites the responsiveness of the controller. To validate the effectiveness of the proposed method, a test-bed of multiple robot-arm cooperative control system is developed for real-time verification. Experiments on the networked robot-arms are conduced and the results confirm the benefits and the effectiveness of the proposed distributed fault-tolerant control algorithms.
The Design of a Fault-Tolerant COTS-Based Bus Architecture
NASA Technical Reports Server (NTRS)
Chau, Savio N.; Alkalai, Leon; Burt, John B.; Tai, Ann T.
1999-01-01
In this paper, we report our experiences and findings on the design of a fault-tolerant bus architecture comprised of two COTS buses, the IEEE 1394 and the 12C. This fault-tolerant bus is the backbone system bus for the avionics architecture of the X2000 program at the Jet Propulsion Laboratory. COTS buses are attractive because of the availability of low cost commercial products. However, they are not specifically designed for highly reliable applications such as long-life deep-space missions. The X2000 design team has devised a multi-level fault tolerance approach to compensate for this shortcoming of COTS buses. First, the approach enhances the fault tolerance capabilities of the IEEE 1394 and 12 C buses by adding a layer of fault handling hardware and software. Second, algorithms are developed to enable the IEEE 1394 and the 12 C buses assist each other to isolate and recovery from faults. Third, the set of IEEE 1394 and 12 C buses is duplicated to further enhance system reliability. The X2000 design team has paid special attention to guarantee that all fault tolerance provisions will not cause the bus design to deviate from the commercial standard specifications. Otherwise, the economic attractiveness of using COTS will be diminished. The hardware and software design of the X2000 fault-tolerant bus are being implemented and flight hardware will be delivered to the ST4 and Europa Orbiter missions.
Design study of Software-Implemented Fault-Tolerance (SIFT) computer
NASA Technical Reports Server (NTRS)
Wensley, J. H.; Goldberg, J.; Green, M. W.; Kutz, W. H.; Levitt, K. N.; Mills, M. E.; Shostak, R. E.; Whiting-Okeefe, P. M.; Zeidler, H. M.
1982-01-01
Software-implemented fault tolerant (SIFT) computer design for commercial aviation is reported. A SIFT design concept is addressed. Alternate strategies for physical implementation are considered. Hardware and software design correctness is addressed. System modeling and effectiveness evaluation are considered from a fault-tolerant point of view.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simpson, L.; Britt, J.; Birkmire, R.
ITN Energy Systems, Inc., and Global Solar Energy, Inc., assisted by NREL's PV Manufacturing R&D program, have continued to advance CIGS production technology by developing trajectory-oriented predictive/control models, fault-tolerance control, control platform development, in-situ sensors, and process improvements. Modeling activities included developing physics-based and empirical models for CIGS and sputter-deposition processing, implementing model-based control, and applying predictive models to the construction of new evaporation sources and for control. Model-based control is enabled by implementing reduced or empirical models into a control platform. Reliability improvement activities include implementing preventive maintenance schedules; detecting failed sensors/equipment and reconfiguring to tinue processing; and systematicmore » development of fault prevention and reconfiguration strategies for the full range of CIGS PV production deposition processes. In-situ sensor development activities have resulted in improved control and indicated the potential for enhanced process status monitoring and control of the deposition processes. Substantial process improvements have been made, including significant improvement in CIGS uniformity, thickness control, efficiency, yield, and throughput. In large measure, these gains have been driven by process optimization, which in turn have been enabled by control and reliability improvements due to this PV Manufacturing R&D program.« less
Liu, Xing; Hou, Kun Mean; de Vaulx, Christophe; Shi, Hongling; Gholami, Khalid El
2014-01-01
Operating system (OS) technology is significant for the proliferation of the wireless sensor network (WSN). With an outstanding OS; the constrained WSN resources (processor; memory and energy) can be utilized efficiently. Moreover; the user application development can be served soundly. In this article; a new hybrid; real-time; memory-efficient; energy-efficient; user-friendly and fault-tolerant WSN OS MIROS is designed and implemented. MIROS implements the hybrid scheduler and the dynamic memory allocator. Real-time scheduling can thus be achieved with low memory consumption. In addition; it implements a mid-layer software EMIDE (Efficient Mid-layer Software for User-Friendly Application Development Environment) to decouple the WSN application from the low-level system. The application programming process can consequently be simplified and the application reprogramming performance improved. Moreover; it combines both the software and the multi-core hardware techniques to conserve the energy resources; improve the node reliability; as well as achieve a new debugging method. To evaluate the performance of MIROS; it is compared with the other WSN OSes (TinyOS; Contiki; SOS; openWSN and mantisOS) from different OS concerns. The final evaluation results prove that MIROS is suitable to be used even on the tight resource-constrained WSN nodes. It can support the real-time WSN applications. Furthermore; it is energy efficient; user friendly and fault tolerant. PMID:25248069
Liu, Xing; Hou, Kun Mean; de Vaulx, Christophe; Shi, Hongling; El Gholami, Khalid
2014-09-22
Operating system (OS) technology is significant for the proliferation of the wireless sensor network (WSN). With an outstanding OS; the constrained WSN resources (processor; memory and energy) can be utilized efficiently. Moreover; the user application development can be served soundly. In this article; a new hybrid; real-time; memory-efficient; energy-efficient; user-friendly and fault-tolerant WSN OS MIROS is designed and implemented. MIROS implements the hybrid scheduler and the dynamic memory allocator. Real-time scheduling can thus be achieved with low memory consumption. In addition; it implements a mid-layer software EMIDE (Efficient Mid-layer Software for User-Friendly Application Development Environment) to decouple the WSN application from the low-level system. The application programming process can consequently be simplified and the application reprogramming performance improved. Moreover; it combines both the software and the multi-core hardware techniques to conserve the energy resources; improve the node reliability; as well as achieve a new debugging method. To evaluate the performance of MIROS; it is compared with the other WSN OSes (TinyOS; Contiki; SOS; openWSN and mantisOS) from different OS concerns. The final evaluation results prove that MIROS is suitable to be used even on the tight resource-constrained WSN nodes. It can support the real-time WSN applications. Furthermore; it is energy efficient; user friendly and fault tolerant.
Robust Fault Detection and Isolation for Stochastic Systems
NASA Technical Reports Server (NTRS)
George, Jemin; Gregory, Irene M.
2010-01-01
This paper outlines the formulation of a robust fault detection and isolation scheme that can precisely detect and isolate simultaneous actuator and sensor faults for uncertain linear stochastic systems. The given robust fault detection scheme based on the discontinuous robust observer approach would be able to distinguish between model uncertainties and actuator failures and therefore eliminate the problem of false alarms. Since the proposed approach involves precise reconstruction of sensor faults, it can also be used for sensor fault identification and the reconstruction of true outputs from faulty sensor outputs. Simulation results presented here validate the effectiveness of the robust fault detection and isolation system.
NASA Technical Reports Server (NTRS)
Kobayashi, Takahisa; Simon, Donald L.
2004-01-01
In this paper, an approach for in-flight fault detection and isolation (FDI) of aircraft engine sensors based on a bank of Kalman filters is developed. This approach utilizes multiple Kalman filters, each of which is designed based on a specific fault hypothesis. When the propulsion system experiences a fault, only one Kalman filter with the correct hypothesis is able to maintain the nominal estimation performance. Based on this knowledge, the isolation of faults is achieved. Since the propulsion system may experience component and actuator faults as well, a sensor FDI system must be robust in terms of avoiding misclassifications of any anomalies. The proposed approach utilizes a bank of (m+1) Kalman filters where m is the number of sensors being monitored. One Kalman filter is used for the detection of component and actuator faults while each of the other m filters detects a fault in a specific sensor. With this setup, the overall robustness of the sensor FDI system to anomalies is enhanced. Moreover, numerous component fault events can be accounted for by the FDI system. The sensor FDI system is applied to a commercial aircraft engine simulation, and its performance is evaluated at multiple power settings at a cruise operating point using various fault scenarios.
A comparative study of sensor fault diagnosis methods based on observer for ECAS system
NASA Astrophysics Data System (ADS)
Xu, Xing; Wang, Wei; Zou, Nannan; Chen, Long; Cui, Xiaoli
2017-03-01
The performance and practicality of electronically controlled air suspension (ECAS) system are highly dependent on the state information supplied by kinds of sensors, but faults of sensors occur frequently. Based on a non-linearized 3-DOF 1/4 vehicle model, different methods of fault detection and isolation (FDI) are used to diagnose the sensor faults for ECAS system. The considered approaches include an extended Kalman filter (EKF) with concise algorithm, a strong tracking filter (STF) with robust tracking ability, and the cubature Kalman filter (CKF) with numerical precision. We propose three filters of EKF, STF, and CKF to design a state observer of ECAS system under typical sensor faults and noise. Results show that three approaches can successfully detect and isolate faults respectively despite of the existence of environmental noise, FDI time delay and fault sensitivity of different algorithms are different, meanwhile, compared with EKF and STF, CKF method has best performing FDI of sensor faults for ECAS system.
Reliability Assessment for Low-cost Unmanned Aerial Vehicles
NASA Astrophysics Data System (ADS)
Freeman, Paul Michael
Existing low-cost unmanned aerospace systems are unreliable, and engineers must blend reliability analysis with fault-tolerant control in novel ways. This dissertation introduces the University of Minnesota unmanned aerial vehicle flight research platform, a comprehensive simulation and flight test facility for reliability and fault-tolerance research. An industry-standard reliability assessment technique, the failure modes and effects analysis, is performed for an unmanned aircraft. Particular attention is afforded to the control surface and servo-actuation subsystem. Maintaining effector health is essential for safe flight; failures may lead to loss of control incidents. Failure likelihood, severity, and risk are qualitatively assessed for several effector failure modes. Design changes are recommended to improve aircraft reliability based on this analysis. Most notably, the control surfaces are split, providing independent actuation and dual-redundancy. The simulation models for control surface aerodynamic effects are updated to reflect the split surfaces using a first-principles geometric analysis. The failure modes and effects analysis is extended by using a high-fidelity nonlinear aircraft simulation. A trim state discovery is performed to identify the achievable steady, wings-level flight envelope of the healthy and damaged vehicle. Tolerance of elevator actuator failures is studied using familiar tools from linear systems analysis. This analysis reveals significant inherent performance limitations for candidate adaptive/reconfigurable control algorithms used for the vehicle. Moreover, it demonstrates how these tools can be applied in a design feedback loop to make safety-critical unmanned systems more reliable. Control surface impairments that do occur must be quickly and accurately detected. This dissertation also considers fault detection and identification for an unmanned aerial vehicle using model-based and model-free approaches and applies those algorithms to experimental faulted and unfaulted flight test data. Flight tests are conducted with actuator faults that affect the plant input and sensor faults that affect the vehicle state measurements. A model-based detection strategy is designed and uses robust linear filtering methods to reject exogenous disturbances, e.g. wind, while providing robustness to model variation. A data-driven algorithm is developed to operate exclusively on raw flight test data without physical model knowledge. The fault detection and identification performance of these complementary but different methods is compared. Together, enhanced reliability assessment and multi-pronged fault detection and identification techniques can help to bring about the next generation of reliable low-cost unmanned aircraft.
Quantitative fault tolerant control design for a hydraulic actuator with a leaking piston seal
NASA Astrophysics Data System (ADS)
Karpenko, Mark
Hydraulic actuators are complex fluid power devices whose performance can be degraded in the presence of system faults. In this thesis a linear, fixed-gain, fault tolerant controller is designed that can maintain the positioning performance of an electrohydraulic actuator operating under load with a leaking piston seal and in the presence of parametric uncertainties. Developing a control system tolerant to this class of internal leakage fault is important since a leaking piston seal can be difficult to detect, unless the actuator is disassembled. The designed fault tolerant control law is of low-order, uses only the actuator position as feedback, and can: (i) accommodate nonlinearities in the hydraulic functions, (ii) maintain robustness against typical uncertainties in the hydraulic system parameters, and (iii) keep the positioning performance of the actuator within prescribed tolerances despite an internal leakage fault that can bypass up to 40% of the rated servovalve flow across the actuator piston. Experimental tests verify the functionality of the fault tolerant control under normal and faulty operating conditions. The fault tolerant controller is synthesized based on linear time-invariant equivalent (LTIE) models of the hydraulic actuator using the quantitative feedback theory (QFT) design technique. A numerical approach for identifying LTIE frequency response functions of hydraulic actuators from acceptable input-output responses is developed so that linearizing the hydraulic functions can be avoided. The proposed approach can properly identify the features of the hydraulic actuator frequency response that are important for control system design and requires no prior knowledge about the asymptotic behavior or structure of the LTIE transfer functions. A distributed hardware-in-the-loop (HIL) simulation architecture is constructed that enables the performance of the proposed fault tolerant control law to be further substantiated, under realistic operating conditions. Using the HIL framework, the fault tolerant hydraulic actuator is operated as a flight control actuator against the real-time numerical simulation of a high-performance jet aircraft. A robust electrohydraulic loading system is also designed using QFT so that the in-flight aerodynamic load can be experimentally replicated. The results of the HIL experiments show that using the fault tolerant controller to compensate the internal leakage fault at the actuator level can benefit the flight performance of the airplane.
14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements
Code of Federal Regulations, 2014 CFR
2014-01-01
... 14 Aeronautics and Space 1 2014-01-01 2014-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance Evaluation...
14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements
Code of Federal Regulations, 2011 CFR
2011-01-01
... 14 Aeronautics and Space 1 2011-01-01 2011-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance Evaluation...
14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements
Code of Federal Regulations, 2012 CFR
2012-01-01
... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance Evaluation...
14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements
Code of Federal Regulations, 2010 CFR
2010-01-01
... 14 Aeronautics and Space 1 2010-01-01 2010-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance Evaluation...
14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements
Code of Federal Regulations, 2013 CFR
2013-01-01
... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance Evaluation...
A second generation experiment in fault-tolerant software
NASA Technical Reports Server (NTRS)
Knight, J. C.
1986-01-01
The primary goal was to determine whether the application of fault tolerance to software increases its reliability if the cost of production is the same as for an equivalent nonfault tolerance version derived from the same requirements specification. Software development protocols are discussed. The feasibility of adapting to software design fault tolerance the technique of N-fold Modular Redundancy with majority voting was studied.
Kilgore, Brian D.
2017-01-01
A non-contact, wideband method of sensing dynamic fault slip in laboratory geophysical experiments employs an inexpensive magnetoresistive sensor, a small neodymium rare earth magnet, and user built application-specific wideband signal conditioning. The magnetoresistive sensor generates a voltage proportional to the changing angles of magnetic flux lines, generated by differential motion or rotation of the near-by magnet, through the sensor. The performance of an array of these sensors compares favorably to other conventional position sensing methods employed at multiple locations along a 2 m long × 0.4 m deep laboratory strike-slip fault. For these magnetoresistive sensors, the lack of resonance signals commonly encountered with cantilever-type position sensor mounting, the wide band response (DC to ≈ 100 kHz) that exceeds the capabilities of many traditional position sensors, and the small space required on the sample, make them attractive options for capturing high speed fault slip measurements in these laboratory experiments. An unanticipated observation of this study is the apparent sensitivity of this sensor to high frequency electomagnetic signals associated with fault rupture and (or) rupture propagation, which may offer new insights into the physics of earthquake faulting.
Communal Cooperation in Sensor Networks for Situation Management
NASA Technical Reports Server (NTRS)
Jones, Kennie H.; Lodding, Kenneth N.; Olariu, Stephan; Wilson, Larry; Xin,Chunsheng
2006-01-01
Situation management is a rapidly evolving science where managed sources are processed as realtime streams of events and fused in a way that maximizes comprehension, thus enabling better decisions for action. Sensor networks provide a new technology that promises ubiquitous input and action throughout an environment, which can substantially improve information available to the process. Here we describe a NASA program that requires improvements in sensor networks and situation management. We present an approach for massively deployed sensor networks that does not rely on centralized control but is founded in lessons learned from the way biological ecosystems are organized. In this approach, fully distributed data aggregation and integration can be performed in a scalable fashion where individual motes operate based on local information, making local decisions that achieve globally-meaningful results. This exemplifies the robust, fault-tolerant infrastructure required for successful situation management systems.
Fault tolerance of artificial neural networks with applications in critical systems
NASA Technical Reports Server (NTRS)
Protzel, Peter W.; Palumbo, Daniel L.; Arras, Michael K.
1992-01-01
This paper investigates the fault tolerance characteristics of time continuous recurrent artificial neural networks (ANN) that can be used to solve optimization problems. The principle of operations and performance of these networks are first illustrated by using well-known model problems like the traveling salesman problem and the assignment problem. The ANNs are then subjected to 13 simultaneous 'stuck at 1' or 'stuck at 0' faults for network sizes of up to 900 'neurons'. The effects of these faults is demonstrated and the cause for the observed fault tolerance is discussed. An application is presented in which a network performs a critical task for a real-time distributed processing system by generating new task allocations during the reconfiguration of the system. The performance degradation of the ANN under the presence of faults is investigated by large-scale simulations, and the potential benefits of delegating a critical task to a fault tolerant network are discussed.
Fault-tolerant dynamic task graph scheduling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kurt, Mehmet C.; Krishnamoorthy, Sriram; Agrawal, Kunal
2014-11-16
In this paper, we present an approach to fault tolerant execution of dynamic task graphs scheduled using work stealing. In particular, we focus on selective and localized recovery of tasks in the presence of soft faults. We elicit from the user the basic task graph structure in terms of successor and predecessor relationships. The work stealing-based algorithm to schedule such a task graph is augmented to enable recovery when the data and meta-data associated with a task get corrupted. We use this redundancy, and the knowledge of the task graph structure, to selectively recover from faults with low space andmore » time overheads. We show that the fault tolerant design retains the essential properties of the underlying work stealing-based task scheduling algorithm, and that the fault tolerant execution is asymptotically optimal when task re-execution is taken into account. Experimental evaluation demonstrates the low cost of recovery under various fault scenarios.« less
Design of on-board Bluetooth wireless network system based on fault-tolerant technology
NASA Astrophysics Data System (ADS)
You, Zheng; Zhang, Xiangqi; Yu, Shijie; Tian, Hexiang
2007-11-01
In this paper, the Bluetooth wireless data transmission technology is applied in on-board computer system, to realize wireless data transmission between peripherals of the micro-satellite integrating electronic system, and in view of the high demand of reliability of a micro-satellite, a design of Bluetooth wireless network based on fault-tolerant technology is introduced. The reliability of two fault-tolerant systems is estimated firstly using Markov model, then the structural design of this fault-tolerant system is introduced; several protocols are established to make the system operate correctly, some related problems are listed and analyzed, with emphasis on Fault Auto-diagnosis System, Active-standby switch design and Data-Integrity process.
Spacecraft fault tolerance: The Magellan experience
NASA Technical Reports Server (NTRS)
Kasuda, Rick; Packard, Donna Sexton
1993-01-01
Interplanetary and earth orbiting missions are now imposing unique fault tolerant requirements upon spacecraft design. Mission success is the prime motivator for building spacecraft with fault tolerant systems. The Magellan spacecraft had many such requirements imposed upon its design. Magellan met these requirements by building redundancy into all the major subsystem components and designing the onboard hardware and software with the capability to detect a fault, isolate it to a component, and issue commands to achieve a back-up configuration. This discussion is limited to fault protection, which is the autonomous capability to respond to a fault. The Magellan fault protection design is discussed, as well as the developmental and flight experiences and a summary of the lessons learned.
An optimized implementation of a fault-tolerant clock synchronization circuit
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo
1995-01-01
A fault-tolerant clock synchronization circuit was designed and tested. A comparison to a previous design and the procedure followed to achieve the current optimization are included. The report also includes a description of the system and the results of tests performed to study the synchronization and fault-tolerant characteristics of the implementation.
VLSI Implementation of Fault Tolerance Multiplier based on Reversible Logic Gate
NASA Astrophysics Data System (ADS)
Ahmad, Nabihah; Hakimi Mokhtar, Ahmad; Othman, Nurmiza binti; Fhong Soon, Chin; Rahman, Ab Al Hadi Ab
2017-08-01
Multiplier is one of the essential component in the digital world such as in digital signal processing, microprocessor, quantum computing and widely used in arithmetic unit. Due to the complexity of the multiplier, tendency of errors are very high. This paper aimed to design a 2×2 bit Fault Tolerance Multiplier based on Reversible logic gate with low power consumption and high performance. This design have been implemented using 90nm Complemetary Metal Oxide Semiconductor (CMOS) technology in Synopsys Electronic Design Automation (EDA) Tools. Implementation of the multiplier architecture is by using the reversible logic gates. The fault tolerance multiplier used the combination of three reversible logic gate which are Double Feynman gate (F2G), New Fault Tolerance (NFT) gate and Islam Gate (IG) with the area of 160μm x 420.3μm (67.25 mm2). This design achieved a low power consumption of 122.85μW and propagation delay of 16.99ns. The fault tolerance multiplier proposed achieved a low power consumption and high performance which suitable for application of modern computing as it has a fault tolerance capabilities.
An Integrated Fault Tolerant Robotic Controller System for High Reliability and Safety
NASA Technical Reports Server (NTRS)
Marzwell, Neville I.; Tso, Kam S.; Hecht, Myron
1994-01-01
This paper describes the concepts and features of a fault-tolerant intelligent robotic control system being developed for applications that require high dependability (reliability, availability, and safety). The system consists of two major elements: a fault-tolerant controller and an operator workstation. The fault-tolerant controller uses a strategy which allows for detection and recovery of hardware, operating system, and application software failures.The fault-tolerant controller can be used by itself in a wide variety of applications in industry, process control, and communications. The controller in combination with the operator workstation can be applied to robotic applications such as spaceborne extravehicular activities, hazardous materials handling, inspection and maintenance of high value items (e.g., space vehicles, reactor internals, or aircraft), medicine, and other tasks where a robot system failure poses a significant risk to life or property.
Reliability of Fault Tolerant Control Systems. Part 1
NASA Technical Reports Server (NTRS)
Wu, N. Eva
2001-01-01
This paper reports Part I of a two part effort, that is intended to delineate the relationship between reliability and fault tolerant control in a quantitative manner. Reliability analysis of fault-tolerant control systems is performed using Markov models. Reliability properties, peculiar to fault-tolerant control systems are emphasized. As a consequence, coverage of failures through redundancy management can be severely limited. It is shown that in the early life of a syi1ein composed of highly reliable subsystems, the reliability of the overall system is affine with respect to coverage, and inadequate coverage induces dominant single point failures. The utility of some existing software tools for assessing the reliability of fault tolerant control systems is also discussed. Coverage modeling is attempted in Part II in a way that captures its dependence on the control performance and on the diagnostic resolution.
Distributed asynchronous microprocessor architectures in fault tolerant integrated flight systems
NASA Technical Reports Server (NTRS)
Dunn, W. R.
1983-01-01
The paper discusses the implementation of fault tolerant digital flight control and navigation systems for rotorcraft application. It is shown that in implementing fault tolerance at the systems level using advanced LSI/VLSI technology, aircraft physical layout and flight systems requirements tend to define a system architecture of distributed, asynchronous microprocessors in which fault tolerance can be achieved locally through hardware redundancy and/or globally through application of analytical redundancy. The effects of asynchronism on the execution of dynamic flight software is discussed. It is shown that if the asynchronous microprocessors have knowledge of time, these errors can be significantly reduced through appropiate modifications of the flight software. Finally, the papear extends previous work to show that through the combined use of time referencing and stable flight algorithms, individual microprocessors can be configured to autonomously tolerate intermittent faults.
Modeling Sensor Reliability in Fault Diagnosis Based on Evidence Theory
Yuan, Kaijuan; Xiao, Fuyuan; Fei, Liguo; Kang, Bingyi; Deng, Yong
2016-01-01
Sensor data fusion plays an important role in fault diagnosis. Dempster–Shafer (D-R) evidence theory is widely used in fault diagnosis, since it is efficient to combine evidence from different sensors. However, under the situation where the evidence highly conflicts, it may obtain a counterintuitive result. To address the issue, a new method is proposed in this paper. Not only the statistic sensor reliability, but also the dynamic sensor reliability are taken into consideration. The evidence distance function and the belief entropy are combined to obtain the dynamic reliability of each sensor report. A weighted averaging method is adopted to modify the conflict evidence by assigning different weights to evidence according to sensor reliability. The proposed method has better performance in conflict management and fault diagnosis due to the fact that the information volume of each sensor report is taken into consideration. An application in fault diagnosis based on sensor fusion is illustrated to show the efficiency of the proposed method. The results show that the proposed method improves the accuracy of fault diagnosis from 81.19% to 89.48% compared to the existing methods. PMID:26797611
Simulated fault injection - A methodology to evaluate fault tolerant microprocessor architectures
NASA Technical Reports Server (NTRS)
Choi, Gwan S.; Iyer, Ravishankar K.; Carreno, Victor A.
1990-01-01
A simulation-based fault-injection method for validating fault-tolerant microprocessor architectures is described. The approach uses mixed-mode simulation (electrical/logic analysis), and injects transient errors in run-time to assess the resulting fault impact. As an example, a fault-tolerant architecture which models the digital aspects of a dual-channel real-time jet-engine controller is used. The level of effectiveness of the dual configuration with respect to single and multiple transients is measured. The results indicate 100 percent coverage of single transients. Approximately 12 percent of the multiple transients affect both channels; none result in controller failure since two additional levels of redundancy exist.
Fault-tolerant measurement-based quantum computing with continuous-variable cluster states.
Menicucci, Nicolas C
2014-03-28
A long-standing open question about Gaussian continuous-variable cluster states is whether they enable fault-tolerant measurement-based quantum computation. The answer is yes. Initial squeezing in the cluster above a threshold value of 20.5 dB ensures that errors from finite squeezing acting on encoded qubits are below the fault-tolerance threshold of known qubit-based error-correcting codes. By concatenating with one of these codes and using ancilla-based error correction, fault-tolerant measurement-based quantum computation of theoretically indefinite length is possible with finitely squeezed cluster states.
Multi-version software reliability through fault-avoidance and fault-tolerance
NASA Technical Reports Server (NTRS)
Vouk, Mladen A.; Mcallister, David F.
1989-01-01
A number of experimental and theoretical issues associated with the practical use of multi-version software to provide run-time tolerance to software faults were investigated. A specialized tool was developed and evaluated for measuring testing coverage for a variety of metrics. The tool was used to collect information on the relationships between software faults and coverage provided by the testing process as measured by different metrics (including data flow metrics). Considerable correlation was found between coverage provided by some higher metrics and the elimination of faults in the code. Back-to-back testing was continued as an efficient mechanism for removal of un-correlated faults, and common-cause faults of variable span. Software reliability estimation methods was also continued based on non-random sampling, and the relationship between software reliability and code coverage provided through testing. New fault tolerance models were formulated. Simulation studies of the Acceptance Voting and Multi-stage Voting algorithms were finished and it was found that these two schemes for software fault tolerance are superior in many respects to some commonly used schemes. Particularly encouraging are the safety properties of the Acceptance testing scheme.
Sequoia: A fault-tolerant tightly coupled multiprocessor for transaction processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernstein, P.A.
1988-02-01
The Sequoia computer is a tightly coupled multiprocessor, and thus attains the performance advantages of this style of architecture. It avoids most of the fault-tolerance disadvantages of tight coupling by using a new fault-tolerance design. The Sequoia architecture is similar to other multimicroprocessor architectures, such as those of Encore and Sequent, in that it gives dozens of microprocessors shared access to a large main memory. It resembles the Stratus architecture in its extensive use of hardware fault-detection techniques. It resembles Stratus and Auragen in its ability to quickly recover all processes after a single point failure, transparently to the user.more » However, Sequoia is unique in its combination of a large-scale tightly coupled architecture with a hardware approach to fault tolerance. This article gives an overview of how the hardware architecture and operating systems (OS) work together to provide a high degree of fault tolerance with good system performance.« less
An Indirect Adaptive Control Scheme in the Presence of Actuator and Sensor Failures
NASA Technical Reports Server (NTRS)
Sun, Joy Z.; Josh, Suresh M.
2009-01-01
The problem of controlling a system in the presence of unknown actuator and sensor faults is addressed. The system is assumed to have groups of actuators, and groups of sensors, with each group consisting of multiple redundant similar actuators or sensors. The types of actuator faults considered consist of unknown actuators stuck in unknown positions, as well as reduced actuator effectiveness. The sensor faults considered include unknown biases and outages. The approach employed for fault detection and estimation consists of a bank of Kalman filters based on multiple models, and subsequent control reconfiguration to mitigate the effect of biases caused by failed components as well as to obtain stability and satisfactory performance using the remaining actuators and sensors. Conditions for fault identifiability are presented, and the adaptive scheme is applied to an aircraft flight control example in the presence of actuator failures. Simulation results demonstrate that the method can rapidly and accurately detect faults and estimate the fault values, thus enabling safe operation and acceptable performance in spite of failures.
Improved fault tolerance for air bag release in automobiles
NASA Astrophysics Data System (ADS)
Yeshwanth Kumar, C. H.; Prudhvi Prasad, P.; Uday Shankar, M.; Shanmugasundaram, M.
2017-11-01
In order to increase the reliability of the airbag system in automobiles which in turn increase the safety of the automobile we require improved airbag release system, our project deals with Triple Modular Redundancy (TMR) Technique where we use either three Sensors interfaced with three Microcontrollers which given as input to the software voter which produces majority output which is feed to the air compressor for releasing airbag. This concept was being used, in this project we are increasing reliability and safety of the entire system.
Coordinated Fault-Tolerance for High-Performance Computing Final Project Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Panda, Dhabaleswar Kumar; Beckman, Pete
2011-07-28
With the Coordinated Infrastructure for Fault Tolerance Systems (CIFTS, as the original project came to be called) project, our aim has been to understand and tackle the following broad research questions, the answers to which will help the HEC community analyze and shape the direction of research in the field of fault tolerance and resiliency on future high-end leadership systems. Will availability of global fault information, obtained by fault information exchange between the different HEC software on a system, allow individual system software to better detect, diagnose, and adaptively respond to faults? If fault-awareness is raised throughout the system throughmore » fault information exchange, is it possible to get all system software working together to provide a more comprehensive end-to-end fault management on the system? What are the missing fault-tolerance features that widely used HEC system software lacks today that would inhibit such software from taking advantage of systemwide global fault information? What are the practical limitations of a systemwide approach for end-to-end fault management based on fault awareness and coordination? What mechanisms, tools, and technologies are needed to bring about fault awareness and coordination of responses on a leadership-class system? What standards, outreach, and community interaction are needed for adoption of the concept of fault awareness and coordination for fault management on future systems? Keeping our overall objectives in mind, the CIFTS team has taken a parallel fourfold approach. Our central goal was to design and implement a light-weight, scalable infrastructure with a simple, standardized interface to allow communication of fault-related information through the system and facilitate coordinated responses. This work led to the development of the Fault Tolerance Backplane (FTB) publish-subscribe API specification, together with a reference implementation and several experimental implementations on top of existing publish-subscribe tools. We enhanced the intrinsic fault tolerance capabilities representative implementations of a variety of key HPC software subsystems and integrated them with the FTB. Targeting software subsystems included: MPI communication libraries, checkpoint/restart libraries, resource managers and job schedulers, and system monitoring tools. Leveraging the aforementioned infrastructure, as well as developing and utilizing additional tools, we have examined issues associated with expanded, end-to-end fault response from both system and application viewpoints. From the standpoint of system operations, we have investigated log and root cause analysis, anomaly detection and fault prediction, and generalized notification mechanisms. Our applications work has included libraries for fault-tolerance linear algebra, application frameworks for coupled multiphysics applications, and external frameworks to support the monitoring and response for general applications. Our final goal was to engage the high-end computing community to increase awareness of tools and issues around coordinated end-to-end fault management.« less
NASA Technical Reports Server (NTRS)
Brock, L. D.; Lala, J.
1986-01-01
The Advanced Information Processing System (AIPS) is designed to provide a fault tolerant and damage tolerant data processing architecture for a broad range of aerospace vehicles. The AIPS architecture also has attributes to enhance system effectiveness such as graceful degradation, growth and change tolerance, integrability, etc. Two key building blocks being developed by the AIPS program are a fault and damage tolerant processor and communication network. A proof-of-concept system is now being built and will be tested to demonstrate the validity and performance of the AIPS concepts.
Error Mitigation of Point-to-Point Communication for Fault-Tolerant Computing
NASA Technical Reports Server (NTRS)
Akamine, Robert L.; Hodson, Robert F.; LaMeres, Brock J.; Ray, Robert E.
2011-01-01
Fault tolerant systems require the ability to detect and recover from physical damage caused by the hardware s environment, faulty connectors, and system degradation over time. This ability applies to military, space, and industrial computing applications. The integrity of Point-to-Point (P2P) communication, between two microcontrollers for example, is an essential part of fault tolerant computing systems. In this paper, different methods of fault detection and recovery are presented and analyzed.
Virtual Sensor of Surface Electromyography in a New Extensive Fault-Tolerant Classification System.
de Moura, Karina de O A; Balbinot, Alexandre
2018-05-01
A few prosthetic control systems in the scientific literature obtain pattern recognition algorithms adapted to changes that occur in the myoelectric signal over time and, frequently, such systems are not natural and intuitive. These are some of the several challenges for myoelectric prostheses for everyday use. The concept of the virtual sensor, which has as its fundamental objective to estimate unavailable measures based on other available measures, is being used in other fields of research. The virtual sensor technique applied to surface electromyography can help to minimize these problems, typically related to the degradation of the myoelectric signal that usually leads to a decrease in the classification accuracy of the movements characterized by computational intelligent systems. This paper presents a virtual sensor in a new extensive fault-tolerant classification system to maintain the classification accuracy after the occurrence of the following contaminants: ECG interference, electrode displacement, movement artifacts, power line interference, and saturation. The Time-Varying Autoregressive Moving Average (TVARMA) and Time-Varying Kalman filter (TVK) models are compared to define the most robust model for the virtual sensor. Results of movement classification were presented comparing the usual classification techniques with the method of the degraded signal replacement and classifier retraining. The experimental results were evaluated for these five noise types in 16 surface electromyography (sEMG) channel degradation case studies. The proposed system without using classifier retraining techniques recovered of mean classification accuracy was of 4% to 38% for electrode displacement, movement artifacts, and saturation noise. The best mean classification considering all signal contaminants and channel combinations evaluated was the classification using the retraining method, replacing the degraded channel by the virtual sensor TVARMA model. This method recovered the classification accuracy after the degradations, reaching an average of 5.7% below the classification of the clean signal, that is the signal without the contaminants or the original signal. Moreover, the proposed intelligent technique minimizes the impact of the motion classification caused by signal contamination related to degrading events over time. There are improvements in the virtual sensor model and in the algorithm optimization that need further development to provide an increase the clinical application of myoelectric prostheses but already presents robust results to enable research with virtual sensors on biological signs with stochastic behavior.
Virtual Sensor of Surface Electromyography in a New Extensive Fault-Tolerant Classification System
Balbinot, Alexandre
2018-01-01
A few prosthetic control systems in the scientific literature obtain pattern recognition algorithms adapted to changes that occur in the myoelectric signal over time and, frequently, such systems are not natural and intuitive. These are some of the several challenges for myoelectric prostheses for everyday use. The concept of the virtual sensor, which has as its fundamental objective to estimate unavailable measures based on other available measures, is being used in other fields of research. The virtual sensor technique applied to surface electromyography can help to minimize these problems, typically related to the degradation of the myoelectric signal that usually leads to a decrease in the classification accuracy of the movements characterized by computational intelligent systems. This paper presents a virtual sensor in a new extensive fault-tolerant classification system to maintain the classification accuracy after the occurrence of the following contaminants: ECG interference, electrode displacement, movement artifacts, power line interference, and saturation. The Time-Varying Autoregressive Moving Average (TVARMA) and Time-Varying Kalman filter (TVK) models are compared to define the most robust model for the virtual sensor. Results of movement classification were presented comparing the usual classification techniques with the method of the degraded signal replacement and classifier retraining. The experimental results were evaluated for these five noise types in 16 surface electromyography (sEMG) channel degradation case studies. The proposed system without using classifier retraining techniques recovered of mean classification accuracy was of 4% to 38% for electrode displacement, movement artifacts, and saturation noise. The best mean classification considering all signal contaminants and channel combinations evaluated was the classification using the retraining method, replacing the degraded channel by the virtual sensor TVARMA model. This method recovered the classification accuracy after the degradations, reaching an average of 5.7% below the classification of the clean signal, that is the signal without the contaminants or the original signal. Moreover, the proposed intelligent technique minimizes the impact of the motion classification caused by signal contamination related to degrading events over time. There are improvements in the virtual sensor model and in the algorithm optimization that need further development to provide an increase the clinical application of myoelectric prostheses but already presents robust results to enable research with virtual sensors on biological signs with stochastic behavior. PMID:29723994
Learning and diagnosing faults using neural networks
NASA Technical Reports Server (NTRS)
Whitehead, Bruce A.; Kiech, Earl L.; Ali, Moonis
1990-01-01
Neural networks have been employed for learning fault behavior from rocket engine simulator parameters and for diagnosing faults on the basis of the learned behavior. Two problems in applying neural networks to learning and diagnosing faults are (1) the complexity of the sensor data to fault mapping to be modeled by the neural network, which implies difficult and lengthy training procedures; and (2) the lack of sufficient training data to adequately represent the very large number of different types of faults which might occur. Methods are derived and tested in an architecture which addresses these two problems. First, the sensor data to fault mapping is decomposed into three simpler mappings which perform sensor data compression, hypothesis generation, and sensor fusion. Efficient training is performed for each mapping separately. Secondly, the neural network which performs sensor fusion is structured to detect new unknown faults for which training examples were not presented during training. These methods were tested on a task of fault diagnosis by employing rocket engine simulator data. Results indicate that the decomposed neural network architecture can be trained efficiently, can identify faults for which it has been trained, and can detect the occurrence of faults for which it has not been trained.
A survey of NASA and military standards on fault tolerance and reliability applied to robotics
NASA Technical Reports Server (NTRS)
Cavallaro, Joseph R.; Walker, Ian D.
1994-01-01
There is currently increasing interest and activity in the area of reliability and fault tolerance for robotics. This paper discusses the application of Standards in robot reliability, and surveys the literature of relevant existing standards. A bibliography of relevant Military and NASA standards for reliability and fault tolerance is included.
Abstractions for Fault-Tolerant Distributed System Verification
NASA Technical Reports Server (NTRS)
Pike, Lee S.; Maddalon, Jeffrey M.; Miner, Paul S.; Geser, Alfons
2004-01-01
Four kinds of abstraction for the design and analysis of fault tolerant distributed systems are discussed. These abstractions concern system messages, faults, fault masking voting, and communication. The abstractions are formalized in higher order logic, and are intended to facilitate specifying and verifying such systems in higher order theorem provers.
Adaptive robust fault-tolerant control for linear MIMO systems with unmatched uncertainties
NASA Astrophysics Data System (ADS)
Zhang, Kangkang; Jiang, Bin; Yan, Xing-Gang; Mao, Zehui
2017-10-01
In this paper, two novel fault-tolerant control design approaches are proposed for linear MIMO systems with actuator additive faults, multiplicative faults and unmatched uncertainties. For time-varying multiplicative and additive faults, new adaptive laws and additive compensation functions are proposed. A set of conditions is developed such that the unmatched uncertainties are compensated by actuators in control. On the other hand, for unmatched uncertainties with their projection in unmatched space being not zero, based on a (vector) relative degree condition, additive functions are designed to compensate for the uncertainties from output channels in the presence of actuator faults. The developed fault-tolerant control schemes are applied to two aircraft systems to demonstrate the efficiency of the proposed approaches.
Intelligent Wireless Sensor Networks for System Health Monitoring
NASA Technical Reports Server (NTRS)
Alena, Rick
2011-01-01
Wireless sensor networks (WSN) based on the IEEE 802.15.4 Personal Area Network (PAN) standard are finding increasing use in the home automation and emerging smart energy markets. The network and application layers, based on the ZigBee 2007 Standard, provide a convenient framework for component-based software that supports customer solutions from multiple vendors. WSNs provide the inherent fault tolerance required for aerospace applications. The Discovery and Systems Health Group at NASA Ames Research Center has been developing WSN technology for use aboard aircraft and spacecraft for System Health Monitoring of structures and life support systems using funding from the NASA Engineering and Safety Center and Exploration Technology Development and Demonstration Program. This technology provides key advantages for low-power, low-cost ancillary sensing systems particularly across pressure interfaces and in areas where it is difficult to run wires. Intelligence for sensor networks could be defined as the capability of forming dynamic sensor networks, allowing high-level application software to identify and address any sensor that joined the network without the use of any centralized database defining the sensors characteristics. The IEEE 1451 Standard defines methods for the management of intelligent sensor systems and the IEEE 1451.4 section defines Transducer Electronic Datasheets (TEDS), which contain key information regarding the sensor characteristics such as name, description, serial number, calibration information and user information such as location within a vehicle. By locating the TEDS information on the wireless sensor itself and enabling access to this information base from the application software, the application can identify the sensor unambiguously and interpret and present the sensor data stream without reference to any other information. The application software is able to read the status of each sensor module, responding in real-time to changes of PAN configuration, providing the appropriate response for maintaining overall sensor system function, even when sensor modules fail or the WSN is reconfigured. The session will present the architecture and technical feasibility of creating fault-tolerant WSNs for aerospace applications based on our application of the technology to a Structural Health Monitoring testbed. The interim results of WSN development and testing including our software architecture for intelligent sensor management will be discussed in the context of the specific tradeoffs required for effective use. Initial certification measurement techniques and test results gauging WSN susceptibility to Radio Frequency interference are introduced as key challenges for technology adoption. A candidate Developmental and Flight Instrumentation implementation using intelligent sensor networks for wind tunnel and flight tests is developed as a guide to understanding key aspects of the aerospace vehicle design, test and operations life cycle.
Liao, Yi-Hung; Chou, Jung-Chuan; Lin, Chin-Yi
2013-01-01
Fault diagnosis (FD) and data fusion (DF) technologies implemented in the LabVIEW program were used for a ruthenium dioxide pH sensor array. The purpose of the fault diagnosis and data fusion technologies is to increase the reliability of measured data. Data fusion is a very useful statistical method used for sensor arrays in many fields. Fault diagnosis is used to avoid sensor faults and to measure errors in the electrochemical measurement system, therefore, in this study, we use fault diagnosis to remove any faulty sensors in advance, and then proceed with data fusion in the sensor array. The average, self-adaptive and coefficient of variance data fusion methods are used in this study. The pH electrode is fabricated with ruthenium dioxide (RuO2) sensing membrane using a sputtering system to deposit it onto a silicon substrate, and eight RuO2 pH electrodes are fabricated to form a sensor array for this study. PMID:24351636
Liao, Yi-Hung; Chou, Jung-Chuan; Lin, Chin-Yi
2013-12-13
Fault diagnosis (FD) and data fusion (DF) technologies implemented in the LabVIEW program were used for a ruthenium dioxide pH sensor array. The purpose of the fault diagnosis and data fusion technologies is to increase the reliability of measured data. Data fusion is a very useful statistical method used for sensor arrays in many fields. Fault diagnosis is used to avoid sensor faults and to measure errors in the electrochemical measurement system, therefore, in this study, we use fault diagnosis to remove any faulty sensors in advance, and then proceed with data fusion in the sensor array. The average, self-adaptive and coefficient of variance data fusion methods are used in this study. The pH electrode is fabricated with ruthenium dioxide (RuO2) sensing membrane using a sputtering system to deposit it onto a silicon substrate, and eight RuO2 pH electrodes are fabricated to form a sensor array for this study.
A fault-tolerant strategy based on SMC for current-controlled converters
NASA Astrophysics Data System (ADS)
Azer, Peter M.; Marei, Mostafa I.; Sattar, Ahmed A.
2018-05-01
The sliding mode control (SMC) is used to control variable structure systems such as power electronics converters. This paper presents a fault-tolerant strategy based on the SMC for current-controlled AC-DC converters. The proposed SMC is based on three sliding surfaces for the three legs of the AC-DC converter. Two sliding surfaces are assigned to control the phase currents since the input three-phase currents are balanced. Hence, the third sliding surface is considered as an extra degree of freedom which is utilised to control the neutral voltage. This action is utilised to enhance the performance of the converter during open-switch faults. The proposed fault-tolerant strategy is based on allocating the sliding surface of the faulty leg to control the neutral voltage. Consequently, the current waveform is improved. The behaviour of the current-controlled converter during different types of open-switch faults is analysed. Double switch faults include three cases: two upper switch fault; upper and lower switch fault at different legs; and two switches of the same leg. The dynamic performance of the proposed system is evaluated during healthy and open-switch fault operations. Simulation results exhibit the various merits of the proposed SMC-based fault-tolerant strategy.
Strategy Developed for Selecting Optimal Sensors for Monitoring Engine Health
NASA Technical Reports Server (NTRS)
2004-01-01
Sensor indications during rocket engine operation are the primary means of assessing engine performance and health. Effective selection and location of sensors in the operating engine environment enables accurate real-time condition monitoring and rapid engine controller response to mitigate critical fault conditions. These capabilities are crucial to ensure crew safety and mission success. Effective sensor selection also facilitates postflight condition assessment, which contributes to efficient engine maintenance and reduced operating costs. Under the Next Generation Launch Technology program, the NASA Glenn Research Center, in partnership with Rocketdyne Propulsion and Power, has developed a model-based procedure for systematically selecting an optimal sensor suite for assessing rocket engine system health. This optimization process is termed the systematic sensor selection strategy. Engine health management (EHM) systems generally employ multiple diagnostic procedures including data validation, anomaly detection, fault-isolation, and information fusion. The effectiveness of each diagnostic component is affected by the quality, availability, and compatibility of sensor data. Therefore systematic sensor selection is an enabling technology for EHM. Information in three categories is required by the systematic sensor selection strategy. The first category consists of targeted engine fault information; including the description and estimated risk-reduction factor for each identified fault. Risk-reduction factors are used to define and rank the potential merit of timely fault diagnoses. The second category is composed of candidate sensor information; including type, location, and estimated variance in normal operation. The final category includes the definition of fault scenarios characteristic of each targeted engine fault. These scenarios are defined in terms of engine model hardware parameters. Values of these parameters define engine simulations that generate expected sensor values for targeted fault scenarios. Taken together, this information provides an efficient condensation of the engineering experience and engine flow physics needed for sensor selection. The systematic sensor selection strategy is composed of three primary algorithms. The core of the selection process is a genetic algorithm that iteratively improves a defined quality measure of selected sensor suites. A merit algorithm is employed to compute the quality measure for each test sensor suite presented by the selection process. The quality measure is based on the fidelity of fault detection and the level of fault source discrimination provided by the test sensor suite. An inverse engine model, whose function is to derive hardware performance parameters from sensor data, is an integral part of the merit algorithm. The final component is a statistical evaluation algorithm that characterizes the impact of interference effects, such as control-induced sensor variation and sensor noise, on the probability of fault detection and isolation for optimal and near-optimal sensor suites.
NASA Astrophysics Data System (ADS)
Mandal, Shyamapada; Santhi, B.; Sridhar, S.; Vinolia, K.; Swaminathan, P.
2017-06-01
In this paper, an online fault detection and classification method is proposed for thermocouples used in nuclear power plants. In the proposed method, the fault data are detected by the classification method, which classifies the fault data from the normal data. Deep belief network (DBN), a technique for deep learning, is applied to classify the fault data. The DBN has a multilayer feature extraction scheme, which is highly sensitive to a small variation of data. Since the classification method is unable to detect the faulty sensor; therefore, a technique is proposed to identify the faulty sensor from the fault data. Finally, the composite statistical hypothesis test, namely generalized likelihood ratio test, is applied to compute the fault pattern of the faulty sensor signal based on the magnitude of the fault. The performance of the proposed method is validated by field data obtained from thermocouple sensors of the fast breeder test reactor.
Distributed adaptive diagnosis of sensor faults using structural response data
NASA Astrophysics Data System (ADS)
Dragos, Kosmas; Smarsly, Kay
2016-10-01
The reliability and consistency of wireless structural health monitoring (SHM) systems can be compromised by sensor faults, leading to miscalibrations, corrupted data, or even data loss. Several research approaches towards fault diagnosis, referred to as ‘analytical redundancy’, have been proposed that analyze the correlations between different sensor outputs. In wireless SHM, most analytical redundancy approaches require centralized data storage on a server for data analysis, while other approaches exploit the on-board computing capabilities of wireless sensor nodes, analyzing the raw sensor data directly on board. However, using raw sensor data poses an operational constraint due to the limited power resources of wireless sensor nodes. In this paper, a new distributed autonomous approach towards sensor fault diagnosis based on processed structural response data is presented. The inherent correlations among Fourier amplitudes of acceleration response data, at peaks corresponding to the eigenfrequencies of the structure, are used for diagnosis of abnormal sensor outputs at a given structural condition. Representing an entirely data-driven analytical redundancy approach that does not require any a priori knowledge of the monitored structure or of the SHM system, artificial neural networks (ANN) are embedded into the sensor nodes enabling cooperative fault diagnosis in a fully decentralized manner. The distributed analytical redundancy approach is implemented into a wireless SHM system and validated in laboratory experiments, demonstrating the ability of wireless sensor nodes to self-diagnose sensor faults accurately and efficiently with minimal data traffic. Besides enabling distributed autonomous fault diagnosis, the embedded ANNs are able to adapt to the actual condition of the structure, thus ensuring accurate and efficient fault diagnosis even in case of structural changes.
Improved Sensor Fault Detection, Isolation, and Mitigation Using Multiple Observers Approach
Wang, Zheng; Anand, D. M.; Moyne, J.; Tilbury, D. M.
2017-01-01
Traditional Fault Detection and Isolation (FDI) methods analyze a residual signal to detect and isolate sensor faults. The residual signal is the difference between the sensor measurements and the estimated outputs of the system based on an observer. The traditional residual-based FDI methods, however, have some limitations. First, they require that the observer has reached its steady state. In addition, residual-based methods may not detect some sensor faults, such as faults on critical sensors that result in an unobservable system. Furthermore, the system may be in jeopardy if actions required for mitigating the impact of the faulty sensors are not taken before the faulty sensors are identified. The contribution of this paper is to propose three new methods to address these limitations. Faults that occur during the observers' transient state can be detected by analyzing the convergence rate of the estimation error. Open-loop observers, which do not rely on sensor information, are used to detect faults on critical sensors. By switching among different observers, we can potentially mitigate the impact of the faulty sensor during the FDI process. These three methods are systematically integrated with a previously developed residual-based method to provide an improved FDI and mitigation capability framework. The overall approach is validated mathematically, and the effectiveness of the overall approach is demonstrated through simulation on a 5-state suspension system. PMID:28924303
Fault tolerant architectures for integrated aircraft electronics systems
NASA Technical Reports Server (NTRS)
Levitt, K. N.; Melliar-Smith, P. M.; Schwartz, R. L.
1983-01-01
Work into possible architectures for future flight control computer systems is described. Ada for Fault-Tolerant Systems, the NETS Network Error-Tolerant System architecture, and voting in asynchronous systems are covered.
Zhao, Lin; Guan, Dongxue; Landry, René Jr.; Cheng, Jianhua; Sydorenko, Kostyantyn
2015-01-01
Target positioning systems based on MEMS gyros and laser rangefinders (LRs) have extensive prospects due to their advantages of low cost, small size and easy realization. The target positioning accuracy is mainly determined by the LR’s attitude derived by the gyros. However, the attitude error is large due to the inherent noises from isolated MEMS gyros. In this paper, both accelerometer/magnetometer and LR attitude aiding systems are introduced to aid MEMS gyros. A no-reset Federated Kalman Filter (FKF) is employed, which consists of two local Kalman Filters (KF) and a Master Filter (MF). The local KFs are designed by using the Direction Cosine Matrix (DCM)-based dynamic equations and the measurements from the two aiding systems. The KFs can estimate the attitude simultaneously to limit the attitude errors resulting from the gyros. Then, the MF fuses the redundant attitude estimates to yield globally optimal estimates. Simulation and experimental results demonstrate that the FKF-based system can improve the target positioning accuracy effectively and allow for good fault-tolerant capability. PMID:26512672
A lightweight, high strength dexterous manipulator for commercial applications
NASA Technical Reports Server (NTRS)
Marzwell, Neville I.; Schena, Bruce M.; Cohan, Steve M.
1991-01-01
The concept, design, and features are described of a lightweight, high strength, modular robot manipulator being developed for space and commercial applications. The manipulator has seven fully active degrees of freedom and is fully operational in 1 G. Each of the seven joints incorporates a unique drivetrain design which provides zero backlash operation, is insensitive to wear, and is single fault tolerant to motor or servo amplifier failure. Feedback sensors provide position, velocity, torque, and motor winding temperature information at each joint. This sensing system is also designed to be single fault tolerant. The manipulator consists of five modules (not including gripper). These modules join via simple quick-disconnect couplings and self-mating connectors which allow rapid assembly and/or disassembly for reconfiguration, transport, or servicing. The manipulator is a completely enclosed assembly, with no exposed components or wires. Although the initial prototype will not be space qualified, the design is well suited to meeting space requirements. The control system provides dexterous motion by controlling the endpoint location and arm pose simultaneously. Potential applications are discussed.
Predeployment validation of fault-tolerant systems through software-implemented fault insertion
NASA Technical Reports Server (NTRS)
Czeck, Edward W.; Siewiorek, Daniel P.; Segall, Zary Z.
1989-01-01
Fault injection-based automated testing (FIAT) environment, which can be used to experimentally characterize and evaluate distributed realtime systems under fault-free and faulted conditions is described. A survey is presented of validation methodologies. The need for fault insertion based on validation methodologies is demonstrated. The origins and models of faults, and motivation for the FIAT concept are reviewed. FIAT employs a validation methodology which builds confidence in the system through first providing a baseline of fault-free performance data and then characterizing the behavior of the system with faults present. Fault insertion is accomplished through software and allows faults or the manifestation of faults to be inserted by either seeding faults into memory or triggering error detection mechanisms. FIAT is capable of emulating a variety of fault-tolerant strategies and architectures, can monitor system activity, and can automatically orchestrate experiments involving insertion of faults. There is a common system interface which allows ease of use to decrease experiment development and run time. Fault models chosen for experiments on FIAT have generated system responses which parallel those observed in real systems under faulty conditions. These capabilities are shown by two example experiments each using a different fault-tolerance strategy.
Determination of the optimal tolerance for MLC positioning in sliding window and VMAT techniques
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hernandez, V., E-mail: vhernandezmasgrau@gmail.com; Abella, R.; Calvo, J. F.
2015-04-15
Purpose: Several authors have recommended a 2 mm tolerance for multileaf collimator (MLC) positioning in sliding window treatments. In volumetric modulated arc therapy (VMAT) treatments, however, the optimal tolerance for MLC positioning remains unknown. In this paper, the authors present the results of a multicenter study to determine the optimal tolerance for both techniques. Methods: The procedure used is based on dynalog file analysis. The study was carried out using seven Varian linear accelerators from five different centers. Dynalogs were collected from over 100 000 clinical treatments and in-house software was used to compute the number of tolerance faults as amore » function of the user-defined tolerance. Thus, the optimal value for this tolerance, defined as the lowest achievable value, was investigated. Results: Dynalog files accurately predict the number of tolerance faults as a function of the tolerance value, especially for low fault incidences. All MLCs behaved similarly and the Millennium120 and the HD120 models yielded comparable results. In sliding window techniques, the number of beams with an incidence of hold-offs >1% rapidly decreases for a tolerance of 1.5 mm. In VMAT techniques, the number of tolerance faults sharply drops for tolerances around 2 mm. For a tolerance of 2.5 mm, less than 0.1% of the VMAT arcs presented tolerance faults. Conclusions: Dynalog analysis provides a feasible method for investigating the optimal tolerance for MLC positioning in dynamic fields. In sliding window treatments, the tolerance of 2 mm was found to be adequate, although it can be reduced to 1.5 mm. In VMAT treatments, the typically used 5 mm tolerance is excessively high. Instead, a tolerance of 2.5 mm is recommended.« less
Fault-tolerant locomotion of the hexapod robot.
Yang, J M; Kim, J H
1998-01-01
In this paper, we propose a scheme for fault detection and tolerance of the hexapod robot locomotion on even terrain. The fault stability margin is defined to represent potential stability which a gait can have in case a sudden fault event occurs to one leg. Based on this, the fault-tolerant quadruped periodic gaits of the hexapod walking over perfectly even terrain are derived. It is demonstrated that the derived quadruped gait is the optimal one the hexapod can have maintaining fault stability margin nonnegative and a geometric condition should be satisfied for the optimal locomotion. By this scheme, when one leg is in failure, the hexapod robot has the modified tripod gait to continue the optimal locomotion.
Simultaneous Sensor and Process Fault Diagnostics for Propellant Feed System
NASA Technical Reports Server (NTRS)
Cao, J.; Kwan, C.; Figueroa, F.; Xu, R.
2006-01-01
The main objective of this research is to extract fault features from sensor faults and process faults by using advanced fault detection and isolation (FDI) algorithms. A tank system that has some common characteristics to a NASA testbed at Stennis Space Center was used to verify our proposed algorithms. First, a generic tank system was modeled. Second, a mathematical model suitable for FDI has been derived for the tank system. Third, a new and general FDI procedure has been designed to distinguish process faults and sensor faults. Extensive simulations clearly demonstrated the advantages of the new design.
Fault Mitigation Schemes for Future Spaceflight Multicore Processors
NASA Technical Reports Server (NTRS)
Alexander, James W.; Clement, Bradley J.; Gostelow, Kim P.; Lai, John Y.
2012-01-01
Future planetary exploration missions demand significant advances in on-board computing capabilities over current avionics architectures based on a single-core processing element. The state-of-the-art multi-core processor provides much promise in meeting such challenges while introducing new fault tolerance problems when applied to space missions. Software-based schemes are being presented in this paper that can achieve system-level fault mitigation beyond that provided by radiation-hard-by-design (RHBD). For mission and time critical applications such as the Terrain Relative Navigation (TRN) for planetary or small body navigation, and landing, a range of fault tolerance methods can be adapted by the application. The software methods being investigated include Error Correction Code (ECC) for data packet routing between cores, virtual network routing, Triple Modular Redundancy (TMR), and Algorithm-Based Fault Tolerance (ABFT). A robust fault tolerance framework that provides fail-operational behavior under hard real-time constraints and graceful degradation will be demonstrated using TRN executing on a commercial Tilera(R) processor with simulated fault injections.
NASA Technical Reports Server (NTRS)
Harper, Richard E.; Elks, Carl
1995-01-01
An Army Fault Tolerant Architecture (AFTA) has been developed to meet real-time fault tolerant processing requirements of future Army applications. AFTA is the enabling technology that will allow the Army to configure existing processors and other hardware to provide high throughput and ultrahigh reliability necessary for TF/TA/NOE flight control and other advanced Army applications. A comprehensive conceptual study of AFTA has been completed that addresses a wide range of issues including requirements, architecture, hardware, software, testability, producibility, analytical models, validation and verification, common mode faults, VHDL, and a fault tolerant data bus. A Brassboard AFTA for demonstration and validation has been fabricated, and two operating systems and a flight-critical Army application have been ported to it. Detailed performance measurements have been made of fault tolerance and operating system overheads while AFTA was executing the flight application in the presence of faults.
Fault-tolerant rotary actuator
Tesar, Delbert
2006-10-17
A fault-tolerant actuator module, in a single containment shell, containing two actuator subsystems that are either asymmetrically or symmetrically laid out is provided. Fault tolerance in the actuators of the present invention is achieved by the employment of dual sets of equal resources. Dual resources are integrated into single modules, with each having the external appearance and functionality of a single set of resources.
Fault-tolerant wait-free shared objects
NASA Technical Reports Server (NTRS)
Jayanti, Prasad; Chandra, Tushar D.; Toueg, Sam
1992-01-01
A concurrent system consists of processes communicating via shared objects, such as shared variables, queues, etc. The concept of wait-freedom was introduced to cope with process failures: each process that accesses a wait-free object is guaranteed to get a response even if all the other processes crash. However, if a wait-free object 'crashes,' all the processes that access that object are prevented from making progress. In this paper, we introduce the concept of fault-tolerant wait-free objects, and study the problem of implementing them. We give a universal method to construct fault-tolerant wait-free objects, for all types of 'responsive' failures (including one in which faulty objects may 'lie'). In sharp contrast, we prove that many common and interesting types (such as queues, sets, and test&set) have no fault-tolerant wait-free implementations even under the most benign of the 'non-responsive' types of failure. We also introduce several concepts and techniques that are central to the design of fault-tolerant concurrent systems: the concepts of self-implementation and graceful degradation, and techniques to automatically increase the fault-tolerance of implementations. We prove matching lower bounds on the resource complexity of most of our algorithms.
Sensor fault detection and recovery in satellite attitude control
NASA Astrophysics Data System (ADS)
Nasrolahi, Seiied Saeed; Abdollahi, Farzaneh
2018-04-01
This paper proposes an integrated sensor fault detection and recovery for the satellite attitude control system. By introducing a nonlinear observer, the healthy sensor measurements are provided. Considering attitude dynamics and kinematic, a novel observer is developed to detect the fault in angular rate as well as attitude sensors individually or simultaneously. There is no limit on type and configuration of attitude sensors. By designing a state feedback based control signal and Lyapunov stability criterion, the uniformly ultimately boundedness of tracking errors in the presence of sensor faults is guaranteed. Finally, simulation results are presented to illustrate the performance of the integrated scheme.
Salehifar, Mehdi; Moreno-Equilaz, Manuel
2016-01-01
Due to its fault tolerance, a multiphase brushless direct current (BLDC) motor can meet high reliability demand for application in electric vehicles. The voltage-source inverter (VSI) supplying the motor is subjected to open circuit faults. Therefore, it is necessary to design a fault-tolerant (FT) control algorithm with an embedded fault diagnosis (FD) block. In this paper, finite control set-model predictive control (FCS-MPC) is developed to implement the fault-tolerant control algorithm of a five-phase BLDC motor. The developed control method is fast, simple, and flexible. A FD method based on available information from the control block is proposed; this method is simple, robust to common transients in motor and able to localize multiple open circuit faults. The proposed FD and FT control algorithm are embedded in a five-phase BLDC motor drive. In order to validate the theory presented, simulation and experimental results are conducted on a five-phase two-level VSI supplying a five-phase BLDC motor. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Adding Fault Tolerance to NPB Benchmarks Using ULFM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parchman, Zachary W; Vallee, Geoffroy R; Naughton III, Thomas J
2016-01-01
In the world of high-performance computing, fault tolerance and application resilience are becoming some of the primary concerns because of increasing hardware failures and memory corruptions. While the research community has been investigating various options, from system-level solutions to application-level solutions, standards such as the Message Passing Interface (MPI) are also starting to include such capabilities. The current proposal for MPI fault tolerant is centered around the User-Level Failure Mitigation (ULFM) concept, which provides means for fault detection and recovery of the MPI layer. This approach does not address application-level recovery, which is currently left to application developers. In thismore » work, we present a mod- ification of some of the benchmarks of the NAS parallel benchmark (NPB) to include support of the ULFM capabilities as well as application-level strategies and mechanisms for application-level failure recovery. As such, we present: (i) an application-level library to checkpoint and restore data, (ii) extensions of NPB benchmarks for fault tolerance based on different strategies, (iii) a fault injection tool, and (iv) some preliminary results that show the impact of such fault tolerant strategies on the application execution.« less
Evaluation of reliability modeling tools for advanced fault tolerant systems
NASA Technical Reports Server (NTRS)
Baker, Robert; Scheper, Charlotte
1986-01-01
The Computer Aided Reliability Estimation (CARE III) and Automated Reliability Interactice Estimation System (ARIES 82) reliability tools for application to advanced fault tolerance aerospace systems were evaluated. To determine reliability modeling requirements, the evaluation focused on the Draper Laboratories' Advanced Information Processing System (AIPS) architecture as an example architecture for fault tolerance aerospace systems. Advantages and limitations were identified for each reliability evaluation tool. The CARE III program was designed primarily for analyzing ultrareliable flight control systems. The ARIES 82 program's primary use was to support university research and teaching. Both CARE III and ARIES 82 were not suited for determining the reliability of complex nodal networks of the type used to interconnect processing sites in the AIPS architecture. It was concluded that ARIES was not suitable for modeling advanced fault tolerant systems. It was further concluded that subject to some limitations (the difficulty in modeling systems with unpowered spare modules, systems where equipment maintenance must be considered, systems where failure depends on the sequence in which faults occurred, and systems where multiple faults greater than a double near coincident faults must be considered), CARE III is best suited for evaluating the reliability of advanced tolerant systems for air transport.
Integral Sensor Fault Detection and Isolation for Railway Traction Drive.
Garramiola, Fernando; Del Olmo, Jon; Poza, Javier; Madina, Patxi; Almandoz, Gaizka
2018-05-13
Due to the increasing importance of reliability and availability of electric traction drives in Railway applications, early detection of faults has become an important key for Railway traction drive manufacturers. Sensor faults are important sources of failures. Among the different fault diagnosis approaches, in this article an integral diagnosis strategy for sensors in traction drives is presented. Such strategy is composed of an observer-based approach for direct current (DC)-link voltage and catenary current sensors, a frequency analysis approach for motor current phase sensors and a hardware redundancy solution for speed sensors. None of them requires any hardware change requirement in the actual traction drive. All the fault detection and isolation approaches have been validated in a Hardware-in-the-loop platform comprising a Real Time Simulator and a commercial Traction Control Unit for a tram. In comparison to safety-critical systems in Aerospace applications, Railway applications do not need instantaneous detection, and the diagnosis is validated in a short time period for reliable decision. Combining the different approaches and existing hardware redundancy, an integral fault diagnosis solution is provided, to detect and isolate faults in all the sensors installed in the traction drive.
Integral Sensor Fault Detection and Isolation for Railway Traction Drive
del Olmo, Jon; Poza, Javier; Madina, Patxi; Almandoz, Gaizka
2018-01-01
Due to the increasing importance of reliability and availability of electric traction drives in Railway applications, early detection of faults has become an important key for Railway traction drive manufacturers. Sensor faults are important sources of failures. Among the different fault diagnosis approaches, in this article an integral diagnosis strategy for sensors in traction drives is presented. Such strategy is composed of an observer-based approach for direct current (DC)-link voltage and catenary current sensors, a frequency analysis approach for motor current phase sensors and a hardware redundancy solution for speed sensors. None of them requires any hardware change requirement in the actual traction drive. All the fault detection and isolation approaches have been validated in a Hardware-in-the-loop platform comprising a Real Time Simulator and a commercial Traction Control Unit for a tram. In comparison to safety-critical systems in Aerospace applications, Railway applications do not need instantaneous detection, and the diagnosis is validated in a short time period for reliable decision. Combining the different approaches and existing hardware redundancy, an integral fault diagnosis solution is provided, to detect and isolate faults in all the sensors installed in the traction drive. PMID:29757251
NASA Technical Reports Server (NTRS)
Litt, Jonathan; Kurtkaya, Mehmet; Duyar, Ahmet
1994-01-01
This paper presents an application of a fault detection and diagnosis scheme for the sensor faults of a helicopter engine. The scheme utilizes a model-based approach with real time identification and hypothesis testing which can provide early detection, isolation, and diagnosis of failures. It is an integral part of a proposed intelligent control system with health monitoring capabilities. The intelligent control system will allow for accommodation of faults, reduce maintenance cost, and increase system availability. The scheme compares the measured outputs of the engine with the expected outputs of an engine whose sensor suite is functioning normally. If the differences between the real and expected outputs exceed threshold values, a fault is detected. The isolation of sensor failures is accomplished through a fault parameter isolation technique where parameters which model the faulty process are calculated on-line with a real-time multivariable parameter estimation algorithm. The fault parameters and their patterns can then be analyzed for diagnostic and accommodation purposes. The scheme is applied to the detection and diagnosis of sensor faults of a T700 turboshaft engine. Sensor failures are induced in a T700 nonlinear performance simulation and data obtained are used with the scheme to detect, isolate, and estimate the magnitude of the faults.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lumsdaine, Andrew
2013-03-08
The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research with a goal of providing end-to-end fault tolerance on a systemwide basis for applications and other system software. While fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade, it has been treated mostly as a collection of isolated stovepipes. Visibility and response to faults has typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, allowing little flexibility ormore » control on a system-wide basis, making it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. As an example, consider faults such as communication link failures that can be seen by a network library but are not directly visible to the job scheduler, or consider faults related to node failures that can be detected by system monitoring software but are not inherently visible to the resource manager. If information about such faults could be shared by the network libraries or monitoring software, then other system software, such as a resource manager or job scheduler, could ensure that failed nodes or failed network links were excluded from further job allocations and that further diagnosis could be performed. As a founding member and one of the lead developers of the Open MPI project, our efforts over the course of this project have been focused on making Open MPI more robust to failures by supporting various fault tolerance techniques, and using fault information exchange and coordination between MPI and the HPC system software stack from the application, numeric libraries, and programming language runtime to other common system components such as jobs schedulers, resource managers, and monitoring tools.« less
Tutorial: Advanced fault tree applications using HARP
NASA Technical Reports Server (NTRS)
Dugan, Joanne Bechta; Bavuso, Salvatore J.; Boyd, Mark A.
1993-01-01
Reliability analysis of fault tolerant computer systems for critical applications is complicated by several factors. These modeling difficulties are discussed and dynamic fault tree modeling techniques for handling them are described and demonstrated. Several advanced fault tolerant computer systems are described, and fault tree models for their analysis are presented. HARP (Hybrid Automated Reliability Predictor) is a software package developed at Duke University and NASA Langley Research Center that is capable of solving the fault tree models presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gao, Qing, E-mail: qing.gao.chance@gmail.com; Dong, Daoyi, E-mail: daoyidong@gmail.com; Petersen, Ian R., E-mail: i.r.petersen@gmai.com
The purpose of this paper is to solve the fault tolerant filtering and fault detection problem for a class of open quantum systems driven by a continuous-mode bosonic input field in single photon states when the systems are subject to stochastic faults. Optimal estimates of both the system observables and the fault process are simultaneously calculated and characterized by a set of coupled recursive quantum stochastic differential equations.
Final Project Report. Scalable fault tolerance runtime technology for petascale computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnamoorthy, Sriram; Sadayappan, P
With the massive number of components comprising the forthcoming petascale computer systems, hardware failures will be routinely encountered during execution of large-scale applications. Due to the multidisciplinary, multiresolution, and multiscale nature of scientific problems that drive the demand for high end systems, applications place increasingly differing demands on the system resources: disk, network, memory, and CPU. In addition to MPI, future applications are expected to use advanced programming models such as those developed under the DARPA HPCS program as well as existing global address space programming models such as Global Arrays, UPC, and Co-Array Fortran. While there has been amore » considerable amount of work in fault tolerant MPI with a number of strategies and extensions for fault tolerance proposed, virtually none of advanced models proposed for emerging petascale systems is currently fault aware. To achieve fault tolerance, development of underlying runtime and OS technologies able to scale to petascale level is needed. This project has evaluated range of runtime techniques for fault tolerance for advanced programming models.« less
Full-Authority Fault-Tolerant Electronic Engine Control System for Variable Cycle Engines.
1982-04-01
single internally self-checked VLSI micro - processor . The selected configuration is an externally checked pair of com- mercially available...Electronic Engine Control FPMH Failures per Million Hours FTMP Fault Tolerant Multi- Processor FTSC Fault Tolerant Spaceborn Computer GRAMP Generalized...Removal * MTBR Mean Time Between Repair MTTF Mean Time to Failure xiii List of Abbreviations (continued) - NH High Pressure Rotor Speed O&S Operating
A novel redundant INS based on triple rotary inertial measurement units
NASA Astrophysics Data System (ADS)
Chen, Gang; Li, Kui; Wang, Wei; Li, Peng
2016-10-01
Accuracy and reliability are two key performances of inertial navigation system (INS). Rotation modulation (RM) can attenuate the bias of inertial sensors and make it possible for INS to achieve higher navigation accuracy with lower-class sensors. Therefore, the conflict between the accuracy and cost of INS can be eased. Traditional system redundancy and recently researched sensor redundancy are two primary means to improve the reliability of INS. However, how to make the best use of the redundant information from redundant sensors hasn’t been studied adequately, especially in rotational INS. This paper proposed a novel triple rotary unit strapdown inertial navigation system (TRUSINS), which combines RM and sensor redundancy design to enhance the accuracy and reliability of rotational INS. Each rotary unit independently rotates to modulate the errors of two gyros and two accelerometers. Three units can provide double sets of measurements along all three axes of body frame to constitute a couple of INSs which make TRUSINS redundant. Experiments and simulations based on a prototype which is made up of six fiber-optic gyros with drift stability of 0.05° h-1 show that TRUSINS can achieve positioning accuracy of about 0.256 n mile h-1, which is ten times better than that of a normal non-rotational INS with the same level inertial sensors. The theoretical analysis and the experimental results show that due to the advantage of the innovative structure, the designed fault detection and isolation (FDI) strategy can tolerate six sensor faults at most, and is proved to be effective and practical. Therefore, TRUSINS is particularly suitable and highly beneficial for the applications where high accuracy and high reliability is required.
A Novel Online Data-Driven Algorithm for Detecting UAV Navigation Sensor Faults.
Sun, Rui; Cheng, Qi; Wang, Guanyu; Ochieng, Washington Yotto
2017-09-29
The use of Unmanned Aerial Vehicles (UAVs) has increased significantly in recent years. On-board integrated navigation sensors are a key component of UAVs' flight control systems and are essential for flight safety. In order to ensure flight safety, timely and effective navigation sensor fault detection capability is required. In this paper, a novel data-driven Adaptive Neuron Fuzzy Inference System (ANFIS)-based approach is presented for the detection of on-board navigation sensor faults in UAVs. Contrary to the classic UAV sensor fault detection algorithms, based on predefined or modelled faults, the proposed algorithm combines an online data training mechanism with the ANFIS-based decision system. The main advantages of this algorithm are that it allows real-time model-free residual analysis from Kalman Filter (KF) estimates and the ANFIS to build a reliable fault detection system. In addition, it allows fast and accurate detection of faults, which makes it suitable for real-time applications. Experimental results have demonstrated the effectiveness of the proposed fault detection method in terms of accuracy and misdetection rate.
Sensor Selection for Aircraft Engine Performance Estimation and Gas Path Fault Diagnostics
NASA Technical Reports Server (NTRS)
Simon, Donald L.; Rinehart, Aidan W.
2015-01-01
This paper presents analytical techniques for aiding system designers in making aircraft engine health management sensor selection decisions. The presented techniques, which are based on linear estimation and probability theory, are tailored for gas turbine engine performance estimation and gas path fault diagnostics applications. They enable quantification of the performance estimation and diagnostic accuracy offered by different candidate sensor suites. For performance estimation, sensor selection metrics are presented for two types of estimators including a Kalman filter and a maximum a posteriori estimator. For each type of performance estimator, sensor selection is based on minimizing the theoretical sum of squared estimation errors in health parameters representing performance deterioration in the major rotating modules of the engine. For gas path fault diagnostics, the sensor selection metric is set up to maximize correct classification rate for a diagnostic strategy that performs fault classification by identifying the fault type that most closely matches the observed measurement signature in a weighted least squares sense. Results from the application of the sensor selection metrics to a linear engine model are presented and discussed. Given a baseline sensor suite and a candidate list of optional sensors, an exhaustive search is performed to determine the optimal sensor suites for performance estimation and fault diagnostics. For any given sensor suite, Monte Carlo simulation results are found to exhibit good agreement with theoretical predictions of estimation and diagnostic accuracies.
Sensor Selection for Aircraft Engine Performance Estimation and Gas Path Fault Diagnostics
NASA Technical Reports Server (NTRS)
Simon, Donald L.; Rinehart, Aidan W.
2016-01-01
This paper presents analytical techniques for aiding system designers in making aircraft engine health management sensor selection decisions. The presented techniques, which are based on linear estimation and probability theory, are tailored for gas turbine engine performance estimation and gas path fault diagnostics applications. They enable quantification of the performance estimation and diagnostic accuracy offered by different candidate sensor suites. For performance estimation, sensor selection metrics are presented for two types of estimators including a Kalman filter and a maximum a posteriori estimator. For each type of performance estimator, sensor selection is based on minimizing the theoretical sum of squared estimation errors in health parameters representing performance deterioration in the major rotating modules of the engine. For gas path fault diagnostics, the sensor selection metric is set up to maximize correct classification rate for a diagnostic strategy that performs fault classification by identifying the fault type that most closely matches the observed measurement signature in a weighted least squares sense. Results from the application of the sensor selection metrics to a linear engine model are presented and discussed. Given a baseline sensor suite and a candidate list of optional sensors, an exhaustive search is performed to determine the optimal sensor suites for performance estimation and fault diagnostics. For any given sensor suite, Monte Carlo simulation results are found to exhibit good agreement with theoretical predictions of estimation and diagnostic accuracies.
Fault Diagnosis from Raw Sensor Data Using Deep Neural Networks Considering Temporal Coherence.
Zhang, Ran; Peng, Zhen; Wu, Lifeng; Yao, Beibei; Guan, Yong
2017-03-09
Intelligent condition monitoring and fault diagnosis by analyzing the sensor data can assure the safety of machinery. Conventional fault diagnosis and classification methods usually implement pretreatments to decrease noise and extract some time domain or frequency domain features from raw time series sensor data. Then, some classifiers are utilized to make diagnosis. However, these conventional fault diagnosis approaches suffer from the expertise of feature selection and they do not consider the temporal coherence of time series data. This paper proposes a fault diagnosis model based on Deep Neural Networks (DNN). The model can directly recognize raw time series sensor data without feature selection and signal processing. It also takes advantage of the temporal coherence of the data. Firstly, raw time series training data collected by sensors are used to train the DNN until the cost function of DNN gets the minimal value; Secondly, test data are used to test the classification accuracy of the DNN on local time series data. Finally, fault diagnosis considering temporal coherence with former time series data is implemented. Experimental results show that the classification accuracy of bearing faults can get 100%. The proposed fault diagnosis approach is effective in recognizing the type of bearing faults.
Fault Diagnosis from Raw Sensor Data Using Deep Neural Networks Considering Temporal Coherence
Zhang, Ran; Peng, Zhen; Wu, Lifeng; Yao, Beibei; Guan, Yong
2017-01-01
Intelligent condition monitoring and fault diagnosis by analyzing the sensor data can assure the safety of machinery. Conventional fault diagnosis and classification methods usually implement pretreatments to decrease noise and extract some time domain or frequency domain features from raw time series sensor data. Then, some classifiers are utilized to make diagnosis. However, these conventional fault diagnosis approaches suffer from the expertise of feature selection and they do not consider the temporal coherence of time series data. This paper proposes a fault diagnosis model based on Deep Neural Networks (DNN). The model can directly recognize raw time series sensor data without feature selection and signal processing. It also takes advantage of the temporal coherence of the data. Firstly, raw time series training data collected by sensors are used to train the DNN until the cost function of DNN gets the minimal value; Secondly, test data are used to test the classification accuracy of the DNN on local time series data. Finally, fault diagnosis considering temporal coherence with former time series data is implemented. Experimental results show that the classification accuracy of bearing faults can get 100%. The proposed fault diagnosis approach is effective in recognizing the type of bearing faults. PMID:28282936
Universal fault-tolerant quantum computation with only transversal gates and error correction.
Paetznick, Adam; Reichardt, Ben W
2013-08-30
Transversal implementations of encoded unitary gates are highly desirable for fault-tolerant quantum computation. Though transversal gates alone cannot be computationally universal, they can be combined with specially distilled resource states in order to achieve universality. We show that "triorthogonal" stabilizer codes, introduced for state distillation by Bravyi and Haah [Phys. Rev. A 86, 052329 (2012)], admit transversal implementation of the controlled-controlled-Z gate. We then construct a universal set of fault-tolerant gates without state distillation by using only transversal controlled-controlled-Z, transversal Hadamard, and fault-tolerant error correction. We also adapt the distillation procedure of Bravyi and Haah to Toffoli gates, improving on existing Toffoli distillation schemes.
Parameter Transient Behavior Analysis on Fault Tolerant Control System
NASA Technical Reports Server (NTRS)
Belcastro, Christine (Technical Monitor); Shin, Jong-Yeob
2003-01-01
In a fault tolerant control (FTC) system, a parameter varying FTC law is reconfigured based on fault parameters estimated by fault detection and isolation (FDI) modules. FDI modules require some time to detect fault occurrences in aero-vehicle dynamics. This paper illustrates analysis of a FTC system based on estimated fault parameter transient behavior which may include false fault detections during a short time interval. Using Lyapunov function analysis, the upper bound of an induced-L2 norm of the FTC system performance is calculated as a function of a fault detection time and the exponential decay rate of the Lyapunov function.
Fault detection and isolation in motion monitoring system.
Kim, Duk-Jin; Suk, Myoung Hoon; Prabhakaran, B
2012-01-01
Pervasive computing becomes very active research field these days. A watch that can trace human movement to record motion boundary as well as to study of finding social life pattern by one's localized visiting area. Pervasive computing also helps patient monitoring. A daily monitoring system helps longitudinal study of patient monitoring such as Alzheimer's and Parkinson's or obesity monitoring. Due to the nature of monitoring sensor (on-body wireless sensor), however, signal noise or faulty sensors errors can be present at any time. Many research works have addressed these problems any with a large amount of sensor deployment. In this paper, we present the faulty sensor detection and isolation using only two on-body sensors. We have been investigating three different types of sensor errors: the SHORT error, the CONSTANT error, and the NOISY SENSOR error (see more details on section V). Our experimental results show that the success rate of isolating faulty signals are an average of over 91.5% on fault type 1, over 92% on fault type 2, and over 99% on fault type 3 with the fault prior of 30% sensor errors.
Fault Tolerance Middleware for a Multi-Core System
NASA Technical Reports Server (NTRS)
Some, Raphael R.; Springer, Paul L.; Zima, Hans P.; James, Mark; Wagner, David A.
2012-01-01
Fault Tolerance Middleware (FTM) provides a framework to run on a dedicated core of a multi-core system and handles detection of single-event upsets (SEUs), and the responses to those SEUs, occurring in an application running on multiple cores of the processor. This software was written expressly for a multi-core system and can support different kinds of fault strategies, such as introspection, algorithm-based fault tolerance (ABFT), and triple modular redundancy (TMR). It focuses on providing fault tolerance for the application code, and represents the first step in a plan to eventually include fault tolerance in message passing and the FTM itself. In the multi-core system, the FTM resides on a single, dedicated core, separate from the cores used by the application. This is done in order to isolate the FTM from application faults and to allow it to swap out any application core for a substitute. The structure of the FTM consists of an interface to a fault tolerant strategy module, a responder module, a fault manager module, an error factory, and an error mapper that determines the severity of the error. In the present reference implementation, the only fault tolerant strategy implemented is introspection. The introspection code waits for an application node to send an error notification to it. It then uses the error factory to create an error object, and at this time, a severity level is assigned to the error. The introspection code uses its built-in knowledge base to generate a recommended response to the error. Responses might include ignoring the error, logging it, rolling back the application to a previously saved checkpoint, swapping in a new node to replace a bad one, or restarting the application. The original error and recommended response are passed to the top-level fault manager module, which invokes the response. The responder module also notifies the introspection module of the generated response. This provides additional information to the introspection module that it can use in generating its next response. For example, if the responder triggers an application rollback and errors are still occurring, the introspection module may decide to recommend an application restart.
Jeon, Namju; Lee, Hyeongcheol
2016-12-12
An integrated fault-diagnosis algorithm for a motor sensor of in-wheel independent drive electric vehicles is presented. This paper proposes a method that integrates the high- and low-level fault diagnoses to improve the robustness and performance of the system. For the high-level fault diagnosis of vehicle dynamics, a planar two-track non-linear model is first selected, and the longitudinal and lateral forces are calculated. To ensure redundancy of the system, correlation between the sensor and residual in the vehicle dynamics is analyzed to detect and separate the fault of the drive motor system of each wheel. To diagnose the motor system for low-level faults, the state equation of an interior permanent magnet synchronous motor is developed, and a parity equation is used to diagnose the fault of the electric current and position sensors. The validity of the high-level fault-diagnosis algorithm is verified using Carsim and Matlab/Simulink co-simulation. The low-level fault diagnosis is verified through Matlab/Simulink simulation and experiments. Finally, according to the residuals of the high- and low-level fault diagnoses, fault-detection flags are defined. On the basis of this information, an integrated fault-diagnosis strategy is proposed.
Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing
2012-12-14
Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing Matei Zaharia Tathagata Das Haoyuan Li Timothy Hunter Scott Shenker Ion...SUBTITLE Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER...time. However, current programming models for distributed stream processing are relatively low-level often leaving the user to worry about consistency of
Advanced development for space robotics with emphasis on fault tolerance
NASA Technical Reports Server (NTRS)
Tesar, D.; Chladek, J.; Hooper, R.; Sreevijayan, D.; Kapoor, C.; Geisinger, J.; Meaney, M.; Browning, G.; Rackers, K.
1995-01-01
This paper describes the ongoing work in fault tolerance at the University of Texas at Austin. The paper describes the technical goals the group is striving to achieve and includes a brief description of the individual projects focusing on fault tolerance. The ultimate goal is to develop and test technology applicable to all future missions of NASA (lunar base, Mars exploration, planetary surveillance, space station, etc.).
NASA Technical Reports Server (NTRS)
Kobayashi, Takahisa; Simon, Donald L.
2005-01-01
In-flight sensor fault detection and isolation (FDI) is critical to maintaining reliable engine operation during flight. The aircraft engine control system, which computes control commands on the basis of sensor measurements, operates the propulsion systems at the demanded conditions. Any undetected sensor faults, therefore, may cause the control system to drive the engine into an undesirable operating condition. It is critical to detect and isolate failed sensors as soon as possible so that such scenarios can be avoided. A challenging issue in developing reliable sensor FDI systems is to make them robust to changes in engine operating characteristics due to degradation with usage and other faults that can occur during flight. A sensor FDI system that cannot appropriately account for such scenarios may result in false alarms, missed detections, or misclassifications when such faults do occur. To address this issue, an enhanced bank of Kalman filters was developed, and its performance and robustness were demonstrated in a simulation environment. The bank of filters is composed of m + 1 Kalman filters, where m is the number of sensors being used by the control system and, thus, in need of monitoring. Each Kalman filter is designed on the basis of a unique fault hypothesis so that it will be able to maintain its performance if a particular fault scenario, hypothesized by that particular filter, takes place.
Mekki, Hemza; Benzineb, Omar; Boukhetala, Djamel; Tadjine, Mohamed; Benbouzid, Mohamed
2015-07-01
The fault-tolerant control problem belongs to the domain of complex control systems in which inter-control-disciplinary information and expertise are required. This paper proposes an improved faults detection, reconstruction and fault-tolerant control (FTC) scheme for motor systems (MS) with typical faults. For this purpose, a sliding mode controller (SMC) with an integral sliding surface is adopted. This controller can make the output of system to track the desired position reference signal in finite-time and obtain a better dynamic response and anti-disturbance performance. But this controller cannot deal directly with total system failures. However an appropriate combination of the adopted SMC and sliding mode observer (SMO), later it is designed to on-line detect and reconstruct the faults and also to give a sensorless control strategy which can achieve tolerance to a wide class of total additive failures. The closed-loop stability is proved, using the Lyapunov stability theory. Simulation results in healthy and faulty conditions confirm the reliability of the suggested framework. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
SABRE: a bio-inspired fault-tolerant electronic architecture.
Bremner, P; Liu, Y; Samie, M; Dragffy, G; Pipe, A G; Tempesti, G; Timmis, J; Tyrrell, A M
2013-03-01
As electronic devices become increasingly complex, ensuring their reliable, fault-free operation is becoming correspondingly more challenging. It can be observed that, in spite of their complexity, biological systems are highly reliable and fault tolerant. Hence, we are motivated to take inspiration for biological systems in the design of electronic ones. In SABRE (self-healing cellular architectures for biologically inspired highly reliable electronic systems), we have designed a bio-inspired fault-tolerant hierarchical architecture for this purpose. As in biology, the foundation for the whole system is cellular in nature, with each cell able to detect faults in its operation and trigger intra-cellular or extra-cellular repair as required. At the next level in the hierarchy, arrays of cells are configured and controlled as function units in a transport triggered architecture (TTA), which is able to perform partial-dynamic reconfiguration to rectify problems that cannot be solved at the cellular level. Each TTA is, in turn, part of a larger multi-processor system which employs coarser grain reconfiguration to tolerate faults that cause a processor to fail. In this paper, we describe the details of operation of each layer of the SABRE hierarchy, and how these layers interact to provide a high systemic level of fault tolerance.
Fault tolerance in computational grids: perspectives, challenges, and issues.
Haider, Sajjad; Nazir, Babar
2016-01-01
Computational grids are established with the intention of providing shared access to hardware and software based resources with special reference to increased computational capabilities. Fault tolerance is one of the most important issues faced by the computational grids. The main contribution of this survey is the creation of an extended classification of problems that incur in the computational grid environments. The proposed classification will help researchers, developers, and maintainers of grids to understand the types of issues to be anticipated. Moreover, different types of problems, such as omission, interaction, and timing related have been identified that need to be handled on various layers of the computational grid. In this survey, an analysis and examination is also performed pertaining to the fault tolerance and fault detection mechanisms. Our conclusion is that a dependable and reliable grid can only be established when more emphasis is on fault identification. Moreover, our survey reveals that adaptive and intelligent fault identification, and tolerance techniques can improve the dependability of grid working environments.
Farrington, R.B.; Pruett, J.C. Jr.
1984-05-14
A fault detecting apparatus and method are provided for use with an active solar system. The apparatus provides an indication as to whether one or more predetermined faults have occurred in the solar system. The apparatus includes a plurality of sensors, each sensor being used in determining whether a predetermined condition is present. The outputs of the sensors are combined in a pre-established manner in accordance with the kind of predetermined faults to be detected. Indicators communicate with the outputs generated by combining the sensor outputs to give the user of the solar system and the apparatus an indication as to whether a predetermined fault has occurred. Upon detection and indication of any predetermined fault, the user can take appropriate corrective action so that the overall reliability and efficiency of the active solar system are increased.
Farrington, Robert B.; Pruett, Jr., James C.
1986-01-01
A fault detecting apparatus and method are provided for use with an active solar system. The apparatus provides an indication as to whether one or more predetermined faults have occurred in the solar system. The apparatus includes a plurality of sensors, each sensor being used in determining whether a predetermined condition is present. The outputs of the sensors are combined in a pre-established manner in accordance with the kind of predetermined faults to be detected. Indicators communicate with the outputs generated by combining the sensor outputs to give the user of the solar system and the apparatus an indication as to whether a predetermined fault has occurred. Upon detection and indication of any predetermined fault, the user can take appropriate corrective action so that the overall reliability and efficiency of the active solar system are increased.
Robust In-Flight Sensor Fault Diagnostics for Aircraft Engine Based on Sliding Mode Observers
Chang, Xiaodong; Huang, Jinquan; Lu, Feng
2017-01-01
For a sensor fault diagnostic system of aircraft engines, the health performance degradation is an inevitable interference that cannot be neglected. To address this issue, this paper investigates an integrated on-line sensor fault diagnostic scheme for a commercial aircraft engine based on a sliding mode observer (SMO). In this approach, one sliding mode observer is designed for engine health performance tracking, and another for sensor fault reconstruction. Both observers are employed in in-flight applications. The results of the former SMO are analyzed for post-flight updating the baseline model of the latter. This idea is practical and feasible since the updating process does not require the algorithm to be regulated or redesigned, so that ground-based intervention is avoided, and the update process is implemented in an economical and efficient way. With this setup, the robustness of the proposed scheme to the health degradation is much enhanced and the latter SMO is able to fulfill sensor fault reconstruction over the course of the engine life. The proposed sensor fault diagnostic system is applied to a nonlinear simulation of a commercial aircraft engine, and its effectiveness is evaluated in several fault scenarios. PMID:28398255
Robust In-Flight Sensor Fault Diagnostics for Aircraft Engine Based on Sliding Mode Observers.
Chang, Xiaodong; Huang, Jinquan; Lu, Feng
2017-04-11
For a sensor fault diagnostic system of aircraft engines, the health performance degradation is an inevitable interference that cannot be neglected. To address this issue, this paper investigates an integrated on-line sensor fault diagnostic scheme for a commercial aircraft engine based on a sliding mode observer (SMO). In this approach, one sliding mode observer is designed for engine health performance tracking, and another for sensor fault reconstruction. Both observers are employed in in-flight applications. The results of the former SMO are analyzed for post-flight updating the baseline model of the latter. This idea is practical and feasible since the updating process does not require the algorithm to be regulated or redesigned, so that ground-based intervention is avoided, and the update process is implemented in an economical and efficient way. With this setup, the robustness of the proposed scheme to the health degradation is much enhanced and the latter SMO is able to fulfill sensor fault reconstruction over the course of the engine life. The proposed sensor fault diagnostic system is applied to a nonlinear simulation of a commercial aircraft engine, and its effectiveness is evaluated in several fault scenarios.
Software Testbed for Developing and Evaluating Integrated Autonomous Subsystems
NASA Technical Reports Server (NTRS)
Ong, James; Remolina, Emilio; Prompt, Axel; Robinson, Peter; Sweet, Adam; Nishikawa, David
2015-01-01
To implement fault tolerant autonomy in future space systems, it will be necessary to integrate planning, adaptive control, and state estimation subsystems. However, integrating these subsystems is difficult, time-consuming, and error-prone. This paper describes Intelliface/ADAPT, a software testbed that helps researchers develop and test alternative strategies for integrating planning, execution, and diagnosis subsystems more quickly and easily. The testbed's architecture, graphical data displays, and implementations of the integrated subsystems support easy plug and play of alternate components to support research and development in fault-tolerant control of autonomous vehicles and operations support systems. Intelliface/ADAPT controls NASA's Advanced Diagnostics and Prognostics Testbed (ADAPT), which comprises batteries, electrical loads (fans, pumps, and lights), relays, circuit breakers, invertors, and sensors. During plan execution, an experimentor can inject faults into the ADAPT testbed by tripping circuit breakers, changing fan speed settings, and closing valves to restrict fluid flow. The diagnostic subsystem, based on NASA's Hybrid Diagnosis Engine (HyDE), detects and isolates these faults to determine the new state of the plant, ADAPT. Intelliface/ADAPT then updates its model of the ADAPT system's resources and determines whether the current plan can be executed using the reduced resources. If not, the planning subsystem generates a new plan that reschedules tasks, reconfigures ADAPT, and reassigns the use of ADAPT resources as needed to work around the fault. The resource model, planning domain model, and planning goals are expressed using NASA's Action Notation Modeling Language (ANML). Parts of the ANML model are generated automatically, and other parts are constructed by hand using the Planning Model Integrated Development Environment, a visual Eclipse-based IDE that accelerates ANML model development. Because native ANML planners are currently under development and not yet sufficiently capable, the ANML model is translated into the New Domain Definition Language (NDDL) and sent to NASA's EUROPA planning system for plan generation. The adaptive controller executes the new plan, using augmented, hierarchical finite state machines to select and sequence actions based on the state of the ADAPT system. Real-time sensor data, commands, and plans are displayed in information-dense arrays of timelines and graphs that zoom and scroll in unison. A dynamic schematic display uses color to show the real-time fault state and utilization of the system components and resources. An execution manager coordinates the activities of the other subsystems. The subsystems are integrated using the Internet Communications Engine (ICE). an object-oriented toolkit for building distributed applications.
Close range fault tolerant noncontacting position sensor
Bingham, D.N.; Anderson, A.A.
1996-02-20
A method and system are disclosed for locating the three dimensional coordinates of a moving or stationary object in real time. The three dimensional coordinates of an object in half space or full space are determined based upon the time of arrival or phase of the wave front measured by a plurality of receiver elements and an established vector magnitudes proportional to the measured time of arrival or phase at each receiver element. The coordinates of the object are calculated by solving a matrix equation or a set of closed form algebraic equations. 3 figs.
NASA Technical Reports Server (NTRS)
1990-01-01
The present conference on digital avionics discusses vehicle-management systems, spacecraft avionics, special vehicle avionics, communication/navigation/identification systems, software qualification and quality assurance, launch-vehicle avionics, Ada applications, sensor and signal processing, general aviation avionics, automated software development, design-for-testability techniques, and avionics-software engineering. Also discussed are optical technology and systems, modular avionics, fault-tolerant avionics, commercial avionics, space systems, data buses, crew-station technology, embedded processors and operating systems, AI and expert systems, data links, and pilot/vehicle interfaces.
NASA Technical Reports Server (NTRS)
Miner, Paul S.
1993-01-01
A critical function in a fault-tolerant computer architecture is the synchronization of the redundant computing elements. The synchronization algorithm must include safeguards to ensure that failed components do not corrupt the behavior of good clocks. Reasoning about fault-tolerant clock synchronization is difficult because of the possibility of subtle interactions involving failed components. Therefore, mechanical proof systems are used to ensure that the verification of the synchronization system is correct. In 1987, Schneider presented a general proof of correctness for several fault-tolerant clock synchronization algorithms. Subsequently, Shankar verified Schneider's proof by using the mechanical proof system EHDM. This proof ensures that any system satisfying its underlying assumptions will provide Byzantine fault-tolerant clock synchronization. The utility of Shankar's mechanization of Schneider's theory for the verification of clock synchronization systems is explored. Some limitations of Shankar's mechanically verified theory were encountered. With minor modifications to the theory, a mechanically checked proof is provided that removes these limitations. The revised theory also allows for proven recovery from transient faults. Use of the revised theory is illustrated with the verification of an abstract design of a clock synchronization system.
Experiments in fault tolerant software reliability
NASA Technical Reports Server (NTRS)
Mcallister, David F.; Tai, K. C.; Vouk, Mladen A.
1987-01-01
The reliability of voting was evaluated in a fault-tolerant software system for small output spaces. The effectiveness of the back-to-back testing process was investigated. Version 3.0 of the RSDIMU-ATS, a semi-automated test bed for certification testing of RSDIMU software, was prepared and distributed. Software reliability estimation methods based on non-random sampling are being studied. The investigation of existing fault-tolerance models was continued and formulation of new models was initiated.
Fault-tolerant onboard digital information switching and routing for communications satellites
NASA Technical Reports Server (NTRS)
Shalkhauser, Mary JO; Quintana, Jorge A.; Soni, Nitin J.; Kim, Heechul
1993-01-01
The NASA Lewis Research Center is developing an information-switching processor for future meshed very-small-aperture terminal (VSAT) communications satellites. The information-switching processor will switch and route baseband user data onboard the VSAT satellite to connect thousands of Earth terminals. Fault tolerance is a critical issue in developing information-switching processor circuitry that will provide and maintain reliable communications services. In parallel with the conceptual development of the meshed VSAT satellite network architecture, NASA designed and built a simple test bed for developing and demonstrating baseband switch architectures and fault-tolerance techniques. The meshed VSAT architecture and the switching demonstration test bed are described, and the initial switching architecture and the fault-tolerance techniques that were developed and tested are discussed.
Intelligent fault-tolerant controllers
NASA Technical Reports Server (NTRS)
Huang, Chien Y.
1987-01-01
A system with fault tolerant controls is one that can detect, isolate, and estimate failures and perform necessary control reconfiguration based on this new information. Artificial intelligence (AI) is concerned with semantic processing, and it has evolved to include the topics of expert systems and machine learning. This research represents an attempt to apply AI to fault tolerant controls, hence, the name intelligent fault tolerant control (IFTC). A generic solution to the problem is sought, providing a system based on logic in addition to analytical tools, and offering machine learning capabilities. The advantages are that redundant system specific algorithms are no longer needed, that reasonableness is used to quickly choose the correct control strategy, and that the system can adapt to new situations by learning about its effects on system dynamics.
Hybrid routing technique for a fault-tolerant, integrated information network
NASA Technical Reports Server (NTRS)
Meredith, B. D.
1986-01-01
The evolutionary growth of the space station and the diverse activities onboard are expected to require a hierarchy of integrated, local area networks capable of supporting data, voice, and video communications. In addition, fault-tolerant network operation is necessary to protect communications between critical systems attached to the net and to relieve the valuable human resources onboard the space station of time-critical data system repair tasks. A key issue for the design of the fault-tolerant, integrated network is the development of a robust routing algorithm which dynamically selects the optimum communication paths through the net. A routing technique is described that adapts to topological changes in the network to support fault-tolerant operation and system evolvability.
NASA Astrophysics Data System (ADS)
Cheng, Xiang-Qin; Qu, Jing-Yuan; Yan, Zhe-Ping; Bian, Xin-Qian
2010-03-01
In order to improve the security and reliability for autonomous underwater vehicle (AUV) navigation, an H∞ robust fault-tolerant controller was designed after analyzing variations in state-feedback gain. Operating conditions and the design method were then analyzed so that the control problem could be expressed as a mathematical optimization problem. This permitted the use of linear matrix inequalities (LMI) to solve for the H∞ controller for the system. When considering different actuator failures, these conditions were then also mathematically expressed, allowing the H∞ robust controller to solve for these events and thus be fault-tolerant. Finally, simulation results showed that the H∞ robust fault-tolerant controller could provide precise AUV navigation control with strong robustness.
Provable Transient Recovery for Frame-Based, Fault-Tolerant Computing Systems
NASA Technical Reports Server (NTRS)
DiVito, Ben L.; Butler, Ricky W.
1992-01-01
We present a formal verification of the transient fault recovery aspects of the Reliable Computing Platform (RCP), a fault-tolerant computing system architecture for digital flight control applications. The RCP uses NMR-style redundancy to mask faults and internal majority voting to purge the effects of transient faults. The system design has been formally specified and verified using the EHDM verification system. Our formalization accommodates a wide variety of voting schemes for purging the effects of transients.
NASA Technical Reports Server (NTRS)
Harper, R. E.; Alger, L. S.; Babikyan, C. A.; Butler, B. P.; Friend, S. A.; Ganska, R. J.; Lala, J. H.; Masotto, T. K.; Meyer, A. J.; Morton, D. P.
1992-01-01
Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions.
Hao, Li-Ying; Park, Ju H; Ye, Dan
2017-09-01
In this paper, a new robust fault-tolerant compensation control method for uncertain linear systems over networks is proposed, where only quantized signals are assumed to be available. This approach is based on the integral sliding mode (ISM) method where two kinds of integral sliding surfaces are constructed. One is the continuous-state-dependent surface with the aim of sliding mode stability analysis and the other is the quantization-state-dependent surface, which is used for ISM controller design. A scheme that combines the adaptive ISM controller and quantization parameter adjustment strategy is then proposed. Through utilizing H ∞ control analytical technique, once the system is in the sliding mode, the nature of performing disturbance attenuation and fault tolerance from the initial time can be found without requiring any fault information. Finally, the effectiveness of our proposed ISM control fault-tolerant schemes against quantization errors is demonstrated in the simulation.
Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer
NASA Technical Reports Server (NTRS)
Goldberg, J.; Kautz, W. H.; Melliar-Smith, P. M.; Green, M. W.; Levitt, K. N.; Schwartz, R. L.; Weinstock, C. B.
1984-01-01
SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor to processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness.
A fault tolerant gait for a hexapod robot over uneven terrain.
Yang, J M; Kim, J H
2000-01-01
The fault tolerant gait of legged robots in static walking is a gait which maintains its stability against a fault event preventing a leg from having the support state. In this paper, a fault tolerant quadruped gait is proposed for a hexapod traversing uneven terrain with forbidden regions, which do not offer viable footholds but can be stepped over. By comparing performance of straight-line motion and crab walking over even terrain, it is shown that the proposed gait has better mobility and terrain adaptability than previously developed gaits. Based on the proposed gait, we present a method for the generation of the fault tolerant locomotion of a hexapod over uneven terrain with forbidden regions. The proposed method minimizes the number of legs on the ground during walking, and foot adjustment algorithm is used for avoiding steps on forbidden regions. The effectiveness of the proposed strategy over uneven terrain is demonstrated with a computer simulation.
NASA Tech Briefs, October 2011
NASA Technical Reports Server (NTRS)
2011-01-01
Topics covered include: Laser Truss Sensor for Segmented Telescope Phasing; Qualifications of Bonding Process of Temperature Sensors to Deep-Space Missions; Optical Sensors for Monitoring Gamma and Neutron Radiation; Compliant Tactile Sensors; Cytometer on a Chip; Measuring Input Thresholds on an Existing Board; Scanning and Defocusing Properties of Microstrip Reflectarray Antennas; Cable Tester Box; Programmable Oscillator; Fault-Tolerant, Radiation-Hard DSP; Sub-Shot Noise Power Source for Microelectronics; Asynchronous Message Service Reference Implementation; Zero-Copy Objects System; Delay and Disruption Tolerant Networking MACHETE Model; Contact Graph Routing; Parallel Eclipse Project Checkout; Technique for Configuring an Actively Cooled Thermal Shield in a Flight System; Use of Additives to Improve Performance of Methyl Butyrate-Based Lithium-Ion Electrolytes; Li-Ion Cells Employing Electrolytes with Methyl Propionate and Ethyl Butyrate Co-Solvents; Improved Devices for Collecting Sweat for Chemical Analysis; Tissue Photolithography; Method for Impeding Degradation of Porous Silicon Structures; External Cooling Coupled to Reduced Extremity Pressure Device; A Zero-Gravity Cup for Drinking Beverages in Microgravity; Co-Flow Hollow Cathode Technology; Programmable Aperture with MEMS Microshutter Arrays; Polished Panel Optical Receiver for Simultaneous RF/Optical Telemetry with Large DSN Antennas; Adaptive System Modeling for Spacecraft Simulation; Lidar-Based Navigation Algorithm for Safe Lunar Landing; Tracking Object Existence From an Autonomous Patrol Vehicle; Rad-Hard, Miniaturized, Scalable, High-Voltage Switching Module for Power Applications; and Architecture for a 1-GHz Digital RADAR.
Smart Sensor for Online Detection of Multiple-Combined Faults in VSD-Fed Induction Motors
Garcia-Ramirez, Armando G.; Osornio-Rios, Roque A.; Granados-Lieberman, David; Garcia-Perez, Arturo; Romero-Troncoso, Rene J.
2012-01-01
Induction motors fed through variable speed drives (VSD) are widely used in different industrial processes. Nowadays, the industry demands the integration of smart sensors to improve the fault detection in order to reduce cost, maintenance and power consumption. Induction motors can develop one or more faults at the same time that can be produce severe damages. The combined fault identification in induction motors is a demanding task, but it has been rarely considered in spite of being a common situation, because it is difficult to identify two or more faults simultaneously. This work presents a smart sensor for online detection of simple and multiple-combined faults in induction motors fed through a VSD in a wide frequency range covering low frequencies from 3 Hz and high frequencies up to 60 Hz based on a primary sensor being a commercially available current clamp or a hall-effect sensor. The proposed smart sensor implements a methodology based on the fast Fourier transform (FFT), RMS calculation and artificial neural networks (ANN), which are processed online using digital hardware signal processing based on field programmable gate array (FPGA).
Algorithm-Based Fault Tolerance Integrated with Replication
NASA Technical Reports Server (NTRS)
Some, Raphael; Rennels, David
2008-01-01
In a proposed approach to programming and utilization of commercial off-the-shelf computing equipment, a combination of algorithm-based fault tolerance (ABFT) and replication would be utilized to obtain high degrees of fault tolerance without incurring excessive costs. The basic idea of the proposed approach is to integrate ABFT with replication such that the algorithmic portions of computations would be protected by ABFT, and the logical portions by replication. ABFT is an extremely efficient, inexpensive, high-coverage technique for detecting and mitigating faults in computer systems used for algorithmic computations, but does not protect against errors in logical operations surrounding algorithms.
Detection of faults and software reliability analysis
NASA Technical Reports Server (NTRS)
Knight, J. C.
1986-01-01
Multiversion or N-version programming was proposed as a method of providing fault tolerance in software. The approach requires the separate, independent preparation of multiple versions of a piece of software for some application. Specific topics addressed are: failure probabilities in N-version systems, consistent comparison in N-version systems, descriptions of the faults found in the Knight and Leveson experiment, analytic models of comparison testing, characteristics of the input regions that trigger faults, fault tolerance through data diversity, and the relationship between failures caused by automatically seeded faults.
Jeon, Namju; Lee, Hyeongcheol
2016-01-01
An integrated fault-diagnosis algorithm for a motor sensor of in-wheel independent drive electric vehicles is presented. This paper proposes a method that integrates the high- and low-level fault diagnoses to improve the robustness and performance of the system. For the high-level fault diagnosis of vehicle dynamics, a planar two-track non-linear model is first selected, and the longitudinal and lateral forces are calculated. To ensure redundancy of the system, correlation between the sensor and residual in the vehicle dynamics is analyzed to detect and separate the fault of the drive motor system of each wheel. To diagnose the motor system for low-level faults, the state equation of an interior permanent magnet synchronous motor is developed, and a parity equation is used to diagnose the fault of the electric current and position sensors. The validity of the high-level fault-diagnosis algorithm is verified using Carsim and Matlab/Simulink co-simulation. The low-level fault diagnosis is verified through Matlab/Simulink simulation and experiments. Finally, according to the residuals of the high- and low-level fault diagnoses, fault-detection flags are defined. On the basis of this information, an integrated fault-diagnosis strategy is proposed. PMID:27973431
Energy-efficient fault tolerance in multiprocessor real-time systems
NASA Astrophysics Data System (ADS)
Guo, Yifeng
The recent progress in the multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications as well as industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4 -- 8 processors are common, and it is expected that many-core systems with dozens of processing cores will be available in near future. For such systems, in addition to general temporal requirement common for all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, due to the fact that tolerating faults (e.g., permanent/transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve the system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem where tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP). A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is investigated, where tasks' main copies are executed ASAP while backup copies ALAP to reduce the overlapped execution of main and backup copies of the same task and thus reduce energy consumption. All proposed techniques are evaluated through extensive simulations and compared with other state-of-the-art approaches. The simulation results confirm that the proposed schemes can preserve the system reliability while still achieving substantial energy savings. Finally, for both SS and POED based Energy-Efficient Fault-Tolerant (EEFT) schemes, a series of recovery strategies are designed when more than one (transient and permanent) faults need to be tolerated.
NASA Astrophysics Data System (ADS)
Lidar, Daniel A.; Brun, Todd A.
2013-09-01
Prologue; Preface; Part I. Background: 1. Introduction to decoherence and noise in open quantum systems Daniel Lidar and Todd Brun; 2. Introduction to quantum error correction Dave Bacon; 3. Introduction to decoherence-free subspaces and noiseless subsystems Daniel Lidar; 4. Introduction to quantum dynamical decoupling Lorenza Viola; 5. Introduction to quantum fault tolerance Panos Aliferis; Part II. Generalized Approaches to Quantum Error Correction: 6. Operator quantum error correction David Kribs and David Poulin; 7. Entanglement-assisted quantum error-correcting codes Todd Brun and Min-Hsiu Hsieh; 8. Continuous-time quantum error correction Ognyan Oreshkov; Part III. Advanced Quantum Codes: 9. Quantum convolutional codes Mark Wilde; 10. Non-additive quantum codes Markus Grassl and Martin Rötteler; 11. Iterative quantum coding systems David Poulin; 12. Algebraic quantum coding theory Andreas Klappenecker; 13. Optimization-based quantum error correction Andrew Fletcher; Part IV. Advanced Dynamical Decoupling: 14. High order dynamical decoupling Zhen-Yu Wang and Ren-Bao Liu; 15. Combinatorial approaches to dynamical decoupling Martin Rötteler and Pawel Wocjan; Part V. Alternative Quantum Computation Approaches: 16. Holonomic quantum computation Paolo Zanardi; 17. Fault tolerance for holonomic quantum computation Ognyan Oreshkov, Todd Brun and Daniel Lidar; 18. Fault tolerant measurement-based quantum computing Debbie Leung; Part VI. Topological Methods: 19. Topological codes Héctor Bombín; 20. Fault tolerant topological cluster state quantum computing Austin Fowler and Kovid Goyal; Part VII. Applications and Implementations: 21. Experimental quantum error correction Dave Bacon; 22. Experimental dynamical decoupling Lorenza Viola; 23. Architectures Jacob Taylor; 24. Error correction in quantum communication Mark Wilde; Part VIII. Critical Evaluation of Fault Tolerance: 25. Hamiltonian methods in QEC and fault tolerance Eduardo Novais, Eduardo Mucciolo and Harold Baranger; 26. Critique of fault-tolerant quantum information processing Robert Alicki; References; Index.
Distributed fault detection over sensor networks with Markovian switching topologies
NASA Astrophysics Data System (ADS)
Ge, Xiaohua; Han, Qing-Long
2014-05-01
This paper deals with the distributed fault detection for discrete-time Markov jump linear systems over sensor networks with Markovian switching topologies. The sensors are scatteredly deployed in the sensor field and the fault detectors are physically distributed via a communication network. The system dynamics changes and sensing topology variations are modeled by a discrete-time Markov chain with incomplete mode transition probabilities. Each of these sensor nodes firstly collects measurement outputs from its all underlying neighboring nodes, processes these data in accordance with the Markovian switching topologies, and then transmits the processed data to the remote fault detector node. Network-induced delays and accumulated data packet dropouts are incorporated in the data transmission between the sensor nodes and the distributed fault detector nodes through the communication network. To generate localized residual signals, mode-independent distributed fault detection filters are proposed. By means of the stochastic Lyapunov functional approach, the residual system performance analysis is carried out such that the overall residual system is stochastically stable and the error between each residual signal and the fault signal is made as small as possible. Furthermore, a sufficient condition on the existence of the mode-independent distributed fault detection filters is derived in the simultaneous presence of incomplete mode transition probabilities, Markovian switching topologies, network-induced delays, and accumulated data packed dropouts. Finally, a stirred-tank reactor system is given to show the effectiveness of the developed theoretical results.
Optimal Management of Redundant Control Authority for Fault Tolerance
NASA Technical Reports Server (NTRS)
Wu, N. Eva; Ju, Jianhong
2000-01-01
This paper is intended to demonstrate the feasibility of a solution to a fault tolerant control problem. It explains, through a numerical example, the design and the operation of a novel scheme for fault tolerant control. The fundamental principle of the scheme was formalized in [5] based on the notion of normalized nonspecificity. The novelty lies with the use of a reliability criterion for redundancy management, and therefore leads to a high overall system reliability.
NASA Technical Reports Server (NTRS)
Brenner, Richard; Lala, Jaynarayan H.; Nagle, Gail A.; Schor, Andrei; Turkovich, John
1994-01-01
This program demonstrated the integration of a number of technologies that can increase the availability and reliability of launch vehicles while lowering costs. Availability is increased with an advanced guidance algorithm that adapts trajectories in real-time. Reliability is increased with fault-tolerant computers and communication protocols. Costs are reduced by automatically generating code and documentation. This program was realized through the cooperative efforts of academia, industry, and government. The NASA-LaRC coordinated the effort, while Draper performed the integration. Georgia Institute of Technology supplied a weak Hamiltonian finite element method for optimal control problems. Martin Marietta used MATLAB to apply this method to a launch vehicle (FENOC). Draper supplied the fault-tolerant computing and software automation technology. The fault-tolerant technology includes sequential and parallel fault-tolerant processors (FTP & FTPP) and authentication protocols (AP) for communication. Fault-tolerant technology was incrementally incorporated. Development culminated with a heterogeneous network of workstations and fault-tolerant computers using AP. Draper's software automation system, ASTER, was used to specify a static guidance system based on FENOC, navigation, flight control (GN&C), models, and the interface to a user interface for mission control. ASTER generated Ada code for GN&C and C code for models. An algebraic transform engine (ATE) was developed to automatically translate MATLAB scripts into ASTER.
Fault Injection Campaign for a Fault Tolerant Duplex Framework
NASA Technical Reports Server (NTRS)
Sacco, Gian Franco; Ferraro, Robert D.; von llmen, Paul; Rennels, Dave A.
2007-01-01
Fault tolerance is an efficient approach adopted to avoid or reduce the damage of a system failure. In this work we present the results of a fault injection campaign we conducted on the Duplex Framework (DF). The DF is a software developed by the UCLA group [1, 2] that uses a fault tolerant approach and allows to run two replicas of the same process on two different nodes of a commercial off-the-shelf (COTS) computer cluster. A third process running on a different node, constantly monitors the results computed by the two replicas, and eventually restarts the two replica processes if an inconsistency in their computation is detected. This approach is very cost efficient and can be adopted to control processes on spacecrafts where the fault rate produced by cosmic rays is not very high.
Fault tolerant control of multivariable processes using auto-tuning PID controller.
Yu, Ding-Li; Chang, T K; Yu, Ding-Wen
2005-02-01
Fault tolerant control of dynamic processes is investigated in this paper using an auto-tuning PID controller. A fault tolerant control scheme is proposed composing an auto-tuning PID controller based on an adaptive neural network model. The model is trained online using the extended Kalman filter (EKF) algorithm to learn system post-fault dynamics. Based on this model, the PID controller adjusts its parameters to compensate the effects of the faults, so that the control performance is recovered from degradation. The auto-tuning algorithm for the PID controller is derived with the Lyapunov method and therefore, the model predicted tracking error is guaranteed to converge asymptotically. The method is applied to a simulated two-input two-output continuous stirred tank reactor (CSTR) with various faults, which demonstrate the applicability of the developed scheme to industrial processes.
NASA Technical Reports Server (NTRS)
Harper, Richard
1989-01-01
In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.
Current Sensor Fault Reconstruction for PMSM Drives
Huang, Gang; Luo, Yi-Ping; Zhang, Chang-Fan; He, Jing; Huang, Yi-Shan
2016-01-01
This paper deals with a current sensor fault reconstruction algorithm for the torque closed-loop drive system of an interior PMSM. First, sensor faults are equated to actuator ones by a new introduced state variable. Then, in αβ coordinates, based on the motor model with active flux linkage, a current observer is constructed with a specific sliding mode equivalent control methodology to eliminate the effects of unknown disturbances, and the phase current sensor faults are reconstructed by means of an adaptive method. Finally, an αβ axis current fault processing module is designed based on the reconstructed value. The feasibility and effectiveness of the proposed method are verified by simulation and experimental tests on the RT-LAB platform. PMID:26840317
Reconfigurable tree architectures using subtree oriented fault tolerance
NASA Technical Reports Server (NTRS)
Lowrie, Matthew B.
1987-01-01
An approach to the design of reconfigurable tree architecture is presented in which spare processors are allocated at the leaves. The approach is unique in that spares are associated with subtrees and sharing of spares between these subtrees can occur. The Subtree Oriented Fault Tolerance (SOFT) approach is more reliable than previous approaches capable of tolerating link and switch failures for both single chip and multichip tree implementations while reducing redundancy in terms of both spare processors and links. VLSI layout is 0(n) for binary trees and is directly extensible to N-ary trees and fault tolerance through performance degradation.
Airborne Advanced Reconfigurable Computer System (ARCS)
NASA Technical Reports Server (NTRS)
Bjurman, B. E.; Jenkins, G. M.; Masreliez, C. J.; Mcclellan, K. L.; Templeman, J. E.
1976-01-01
A digital computer subsystem fault-tolerant concept was defined, and the potential benefits and costs of such a subsystem were assessed when used as the central element of a new transport's flight control system. The derived advanced reconfigurable computer system (ARCS) is a triple-redundant computer subsystem that automatically reconfigures, under multiple fault conditions, from triplex to duplex to simplex operation, with redundancy recovery if the fault condition is transient. The study included criteria development covering factors at the aircraft's operation level that would influence the design of a fault-tolerant system for commercial airline use. A new reliability analysis tool was developed for evaluating redundant, fault-tolerant system availability and survivability; and a stringent digital system software design methodology was used to achieve design/implementation visibility.
A benchmark for fault tolerant flight control evaluation
NASA Astrophysics Data System (ADS)
Smaili, H.; Breeman, J.; Lombaerts, T.; Stroosma, O.
2013-12-01
A large transport aircraft simulation benchmark (REconfigurable COntrol for Vehicle Emergency Return - RECOVER) has been developed within the GARTEUR (Group for Aeronautical Research and Technology in Europe) Flight Mechanics Action Group 16 (FM-AG(16)) on Fault Tolerant Control (2004 2008) for the integrated evaluation of fault detection and identification (FDI) and reconfigurable flight control strategies. The benchmark includes a suitable set of assessment criteria and failure cases, based on reconstructed accident scenarios, to assess the potential of new adaptive control strategies to improve aircraft survivability. The application of reconstruction and modeling techniques, based on accident flight data, has resulted in high-fidelity nonlinear aircraft and fault models to evaluate new Fault Tolerant Flight Control (FTFC) concepts and their real-time performance to accommodate in-flight failures.
Fault-tolerant communication channel structures
NASA Technical Reports Server (NTRS)
Tai, Ann T. (Inventor); Alkalai, Leon (Inventor); Chau, Savio N. (Inventor)
2006-01-01
Systems and techniques for implementing fault-tolerant communication channels and features in communication systems. Selected commercial-off-the-shelf devices can be integrated in such systems to reduce the cost.
Step-by-step magic state encoding for efficient fault-tolerant quantum computation
Goto, Hayato
2014-01-01
Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, the fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is a current major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. For the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than previous approaches based on the distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation. PMID:25511387
Step-by-step magic state encoding for efficient fault-tolerant quantum computation.
Goto, Hayato
2014-12-16
Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, the fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is a current major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. For the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than previous approaches based on the distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation.
Study of fault tolerant software technology for dynamic systems
NASA Technical Reports Server (NTRS)
Caglayan, A. K.; Zacharias, G. L.
1985-01-01
The major aim of this study is to investigate the feasibility of using systems-based failure detection isolation and compensation (FDIC) techniques in building fault-tolerant software and extending them, whenever possible, to the domain of software fault tolerance. First, it is shown that systems-based FDIC methods can be extended to develop software error detection techniques by using system models for software modules. In particular, it is demonstrated that systems-based FDIC techniques can yield consistency checks that are easier to implement than acceptance tests based on software specifications. Next, it is shown that systems-based failure compensation techniques can be generalized to the domain of software fault tolerance in developing software error recovery procedures. Finally, the feasibility of using fault-tolerant software in flight software is investigated. In particular, possible system and version instabilities, and functional performance degradation that may occur in N-Version programming applications to flight software are illustrated. Finally, a comparative analysis of N-Version and recovery block techniques in the context of generic blocks in flight software is presented.
Preliminary design of the redundant software experiment
NASA Technical Reports Server (NTRS)
Campbell, Roy; Deimel, Lionel; Eckhardt, Dave, Jr.; Kelly, John; Knight, John; Lauterbach, Linda; Lee, Larry; Mcallister, Dave; Mchugh, John
1985-01-01
The goal of the present experiment is to characterize the fault distributions of highly reliable software replicates, constructed using techniques and environments which are similar to those used in comtemporary industrial software facilities. The fault distributions and their effect on the reliability of fault tolerant configurations of the software will be determined through extensive life testing of the replicates against carefully constructed randomly generated test data. Each detected error will be carefully analyzed to provide insight in to their nature and cause. A direct objective is to develop techniques for reducing the intensity of coincident errors, thus increasing the reliability gain which can be achieved with fault tolerance. Data on the reliability gains realized, and the cost of the fault tolerant configurations can be used to design a companion experiment to determine the cost effectiveness of the fault tolerant strategy. Finally, the data and analysis produced by this experiment will be valuable to the software engineering community as a whole because it will provide a useful insight into the nature and cause of hard to find, subtle faults which escape standard software engineering validation techniques and thus persist far into the software life cycle.
Experimental Demonstration of Fault-Tolerant State Preparation with Superconducting Qubits.
Takita, Maika; Cross, Andrew W; Córcoles, A D; Chow, Jerry M; Gambetta, Jay M
2017-11-03
Robust quantum computation requires encoding delicate quantum information into degrees of freedom that are hard for the environment to change. Quantum encodings have been demonstrated in many physical systems by observing and correcting storage errors, but applications require not just storing information; we must accurately compute even with faulty operations. The theory of fault-tolerant quantum computing illuminates a way forward by providing a foundation and collection of techniques for limiting the spread of errors. Here we implement one of the smallest quantum codes in a five-qubit superconducting transmon device and demonstrate fault-tolerant state preparation. We characterize the resulting code words through quantum process tomography and study the free evolution of the logical observables. Our results are consistent with fault-tolerant state preparation in a protected qubit subspace.
A Voyager attitude control perspective on fault tolerant systems
NASA Technical Reports Server (NTRS)
Rasmussen, R. D.; Litty, E. C.
1981-01-01
In current spacecraft design, a trend can be observed to achieve greater fault tolerance through the application of on-board software dedicated to detecting and isolating failures. Whether fault tolerance through software can meet the desired objectives depends on very careful consideration and control of the system in which the software is imbedded. The considered investigation has the objective to provide some of the insight needed for the required analysis of the system. A description is given of the techniques which have been developed in this connection during the development of the Voyager spacecraft. The Voyager Galileo Attitude and Articulation Control Subsystem (AACS) fault tolerant design is discussed to emphasize basic lessons learned from this experience. The central driver of hardware redundancy implementation on Voyager was known as the 'single point failure criterion'.
Sensor fault diagnosis of aero-engine based on divided flight status.
Zhao, Zhen; Zhang, Jun; Sun, Yigang; Liu, Zhexu
2017-11-01
Fault diagnosis and safety analysis of an aero-engine have attracted more and more attention in modern society, whose safety directly affects the flight safety of an aircraft. In this paper, the problem concerning sensor fault diagnosis is investigated for an aero-engine during the whole flight process. Considering that the aero-engine is always working in different status through the whole flight process, a flight status division-based sensor fault diagnosis method is presented to improve fault diagnosis precision for the aero-engine. First, aero-engine status is partitioned according to normal sensor data during the whole flight process through the clustering algorithm. Based on that, a diagnosis model is built for each status using the principal component analysis algorithm. Finally, the sensors are monitored using the built diagnosis models by identifying the aero-engine status. The simulation result illustrates the effectiveness of the proposed method.
Sensor fault diagnosis of aero-engine based on divided flight status
NASA Astrophysics Data System (ADS)
Zhao, Zhen; Zhang, Jun; Sun, Yigang; Liu, Zhexu
2017-11-01
Fault diagnosis and safety analysis of an aero-engine have attracted more and more attention in modern society, whose safety directly affects the flight safety of an aircraft. In this paper, the problem concerning sensor fault diagnosis is investigated for an aero-engine during the whole flight process. Considering that the aero-engine is always working in different status through the whole flight process, a flight status division-based sensor fault diagnosis method is presented to improve fault diagnosis precision for the aero-engine. First, aero-engine status is partitioned according to normal sensor data during the whole flight process through the clustering algorithm. Based on that, a diagnosis model is built for each status using the principal component analysis algorithm. Finally, the sensors are monitored using the built diagnosis models by identifying the aero-engine status. The simulation result illustrates the effectiveness of the proposed method.
Toward a Fault Tolerant Architecture for Vital Medical-Based Wearable Computing.
Abdali-Mohammadi, Fardin; Bajalan, Vahid; Fathi, Abdolhossein
2015-12-01
Advancements in computers and electronic technologies have led to the emergence of a new generation of efficient small intelligent systems. The products of such technologies might include Smartphones and wearable devices, which have attracted the attention of medical applications. These products are used less in critical medical applications because of their resource constraint and failure sensitivity. This is due to the fact that without safety considerations, small-integrated hardware will endanger patients' lives. Therefore, proposing some principals is required to construct wearable systems in healthcare so that the existing concerns are dealt with. Accordingly, this paper proposes an architecture for constructing wearable systems in critical medical applications. The proposed architecture is a three-tier one, supporting data flow from body sensors to cloud. The tiers of this architecture include wearable computers, mobile computing, and mobile cloud computing. One of the features of this architecture is its high possible fault tolerance due to the nature of its components. Moreover, the required protocols are presented to coordinate the components of this architecture. Finally, the reliability of this architecture is assessed by simulating the architecture and its components, and other aspects of the proposed architecture are discussed.
Parallel Architectures for Planetary Exploration Requirements (PAPER)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet; Sen, Ranjan K.
1989-01-01
The Parallel Architectures for Planetary Exploration Requirements (PAPER) project is essentially research oriented towards technology insertion issues for NASA's unmanned planetary probes. It was initiated to complement and augment the long-term efforts for space exploration with particular reference to NASA/LaRC's (NASA Langley Research Center) research needs for planetary exploration missions of the mid and late 1990s. The requirements for space missions as given in the somewhat dated Advanced Information Processing Systems (AIPS) requirements document are contrasted with the new requirements from JPL/Caltech involving sensor data capture and scene analysis. It is shown that more stringent requirements have arisen as a result of technological advancements. Two possible architectures, the AIPS Proof of Concept (POC) configuration and the MAX Fault-tolerant dataflow multiprocessor, were evaluated. The main observation was that the AIPS design is biased towards fault tolerance and may not be an ideal architecture for planetary and deep space probes due to high cost and complexity. The MAX concepts appears to be a promising candidate, except that more detailed information is required. The feasibility for adding neural computation capability to this architecture needs to be studied. Key impact issues for architectural design of computing systems meant for planetary missions were also identified.
Closed-Loop HIRF Experiments Performed on a Fault Tolerant Flight Control Computer
NASA Technical Reports Server (NTRS)
Belcastro, Celeste M.
1997-01-01
ABSTRACT Closed-loop HIRF experiments were performed on a fault tolerant flight control computer (FCC) at the NASA Langley Research Center. The FCC used in the experiments was a quad-redundant flight control computer executing B737 Autoland control laws. The FCC was placed in one of the mode-stirred reverberation chambers in the HIRF Laboratory and interfaced to a computer simulation of the B737 flight dynamics, engines, sensors, actuators, and atmosphere in the Closed-Loop Systems Laboratory. Disturbances to the aircraft associated with wind gusts and turbulence were simulated during tests. Electrical isolation between the FCC under test and the simulation computer was achieved via a fiber optic interface for the analog and discrete signals. Closed-loop operation of the FCC enabled flight dynamics and atmospheric disturbances affecting the aircraft to be represented during tests. Upset was induced in the FCC as a result of exposure to HIRF, and the effect of upset on the simulated flight of the aircraft was observed and recorded. This paper presents a description of these closed- loop HIRF experiments, upset data obtained from the FCC during these experiments, and closed-loop effects on the simulated flight of the aircraft.
Copilot: Monitoring Embedded Systems
NASA Technical Reports Server (NTRS)
Pike, Lee; Wegmann, Nis; Niller, Sebastian; Goodloe, Alwyn
2012-01-01
Runtime verification (RV) is a natural fit for ultra-critical systems, where correctness is imperative. In ultra-critical systems, even if the software is fault-free, because of the inherent unreliability of commodity hardware and the adversity of operational environments, processing units (and their hosted software) are replicated, and fault-tolerant algorithms are used to compare the outputs. We investigate both software monitoring in distributed fault-tolerant systems, as well as implementing fault-tolerance mechanisms using RV techniques. We describe the Copilot language and compiler, specifically designed for generating monitors for distributed, hard real-time systems. We also describe two case-studies in which we generated Copilot monitors in avionics systems.
Kim, MinJeong; Liu, Hongbin; Kim, Jeong Tai; Yoo, ChangKyoo
2014-08-15
Sensor faults in metro systems provide incorrect information to indoor air quality (IAQ) ventilation systems, resulting in the miss-operation of ventilation systems and adverse effects on passenger health. In this study, a new sensor validation method is proposed to (1) detect, identify and repair sensor faults and (2) evaluate the influence of sensor reliability on passenger health risk. To address the dynamic non-Gaussianity problem of IAQ data, dynamic independent component analysis (DICA) is used. To detect and identify sensor faults, the DICA-based squared prediction error and sensor validity index are used, respectively. To restore the faults to normal measurements, a DICA-based iterative reconstruction algorithm is proposed. The comprehensive indoor air-quality index (CIAI) that evaluates the influence of the current IAQ on passenger health is then compared using the faulty and reconstructed IAQ data sets. Experimental results from a metro station showed that the DICA-based method can produce an improved IAQ level in the metro station and reduce passenger health risk since it more accurately validates sensor faults than do conventional methods. Copyright © 2014 Elsevier B.V. All rights reserved.
Talebi, H A; Khorasani, K; Tafazoli, S
2009-01-01
This paper presents a robust fault detection and isolation (FDI) scheme for a general class of nonlinear systems using a neural-network-based observer strategy. Both actuator and sensor faults are considered. The nonlinear system considered is subject to both state and sensor uncertainties and disturbances. Two recurrent neural networks are employed to identify general unknown actuator and sensor faults, respectively. The neural network weights are updated according to a modified backpropagation scheme. Unlike many previous methods developed in the literature, our proposed FDI scheme does not rely on availability of full state measurements. The stability of the overall FDI scheme in presence of unknown sensor and actuator faults as well as plant and sensor noise and uncertainties is shown by using the Lyapunov's direct method. The stability analysis developed requires no restrictive assumptions on the system and/or the FDI algorithm. Magnetorquer-type actuators and magnetometer-type sensors that are commonly employed in the attitude control subsystem (ACS) of low-Earth orbit (LEO) satellites for attitude determination and control are considered in our case studies. The effectiveness and capabilities of our proposed fault diagnosis strategy are demonstrated and validated through extensive simulation studies.
Fault tolerant programmable digital attitude control electronics study
NASA Technical Reports Server (NTRS)
Sorensen, A. A.
1974-01-01
The attitude control electronics mechanization study to develop a fault tolerant autonomous concept for a three axis system is reported. Programmable digital electronics are compared to general purpose digital computers. The requirements, constraints, and tradeoffs are discussed. It is concluded that: (1) general fault tolerance can be achieved relatively economically, (2) recovery times of less than one second can be obtained, (3) the number of faulty behavior patterns must be limited, and (4) adjoined processes are the best indicators of faulty operation.
1988-08-20
34 William A. Link, Patuxent Wildlife Research Center "Increasing reliability of multiversion fault-tolerant software design by modulation," Junryo 3... Multiversion lault-Tolerant Software Design by Modularization Junryo Miyashita Department of Computer Science California state University at san Bernardino Fault...They shall beE refered to as " multiversion fault-tolerant software design". Onel problem of developing multi-versions of a program is the high cost
Refinement for fault-tolerance: An aircraft hand-off protocol
NASA Technical Reports Server (NTRS)
Marzullo, Keith; Schneider, Fred B.; Dehn, Jon
1994-01-01
Part of the Advanced Automation System (AAS) for air-traffic control is a protocol to permit flight hand-off from one air-traffic controller to another. The protocol must be fault-tolerant and, therefore, is subtle -- an ideal candidate for the application of formal methods. This paper describes a formal method for deriving fault-tolerant protocols that is based on refinement and proof outlines. The AAS hand-off protocol was actually derived using this method; that derivation is given.
Testing For EM Upsets In Aircraft Control Computers
NASA Technical Reports Server (NTRS)
Belcastro, Celeste M.
1994-01-01
Effects of transient electrical signals evaluated in laboratory tests. Method of evaluating nominally fault-tolerant, aircraft-type digital-computer-based control system devised. Provides for evaluation of susceptibility of system to upset and evaluation of integrity of control when system subjected to transient electrical signals like those induced by electromagnetic (EM) source, in this case lightning. Beyond aerospace applications, fault-tolerant control systems becoming more wide-spread in industry; such as in automobiles. Method supports practical, systematic tests for evaluation of designs of fault-tolerant control systems.
Integrated INS/GPS Navigation from a Popular Perspective
NASA Technical Reports Server (NTRS)
Omerbashich, Mensur
2002-01-01
Inertial navigation, blended with other navigation aids, Global Positioning System (GPS) in particular, has gained significance due to enhanced navigation and inertial reference performance and dissimilarity for fault tolerance and anti-jamming. Relatively new concepts based upon using Differential GPS (DGPS) blended with Inertial (and visual) Navigation Sensors (INS) offer the possibility of low cost, autonomous aircraft landing. The FAA has decided to implement the system in a sophisticated form as a new standard navigation tool during this decade. There have been a number of new inertial sensor concepts in the recent past that emphasize increased accuracy of INS/GPS versus INS and reliability of navigation, as well as lower size and weight, and higher power, fault tolerance, and long life. The principles of GPS are not discussed; rather the attention is directed towards general concepts and comparative advantages. A short introduction to the problems faced in kinematics is presented. The intention is to relate the basic principles of kinematics to probably the most used navigation method in the future-INS/GPS. An example of the airborne INS is presented, with emphasis on how it works. The discussion of the error types and sources in navigation, and of the role of filters in optimal estimation of the errors then follows. The main question this paper is trying to answer is 'What are the benefits of the integration of INS and GPS and how is this, navigation concept of the future achieved in reality?' The main goal is to communicate the idea about what stands behind a modern navigation method.
Design of the Protocol Processor for the ROBUS-2 Communication System
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.
2005-01-01
The ROBUS-2 Protocol Processor (RPP) is a custom-designed hardware component implementing the functionality of the ROBUS-2 fault-tolerant communication system. The Reliable Optical Bus (ROBUS) is the core communication system of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER), a general-purpose fault tolerant integrated modular architecture currently under development at NASA Langley Research Center. ROBUS is a time-division multiple access (TDMA) broadcast communication system with medium access control by means of time-indexed communication schedule. ROBUS-2 is a developmental version of the ROBUS providing guaranteed fault-tolerant services to the attached processing elements (PEs), in the presence of a bounded number of faults. These services include message broadcast (Byzantine Agreement), dynamic communication schedule update, time reference (clock synchronization), and distributed diagnosis (group membership). ROBUS also features fault-tolerant startup and restart capabilities. ROBUS-2 tolerates internal as well as PE faults, and incorporates a dynamic self-reconfiguration capability driven by the internal diagnostic system. ROBUS consists of RPPs connected to each other by a lower-level physical communication network. The RPP has a pipelined architecture and the design is parameterized in the behavioral and structural domains. The design of the RPP enables the bus to achieve a PE-message throughput that approaches the available bandwidth at the physical layer.
Shen, H; Xu, Y; Dickinson, B T
2014-11-18
Inspired by sensing strategies observed in birds and bats, a new attitude control concept of directly using real-time pressure and shear stresses has recently been studied. It was shown that with an array of onboard airflow sensors, small unmanned aircraft systems can promptly respond to airflow changes and improve flight performances. In this paper, a mapping function is proposed to compute aerodynamic moments from the real-time pressure and shear data in a practical and computationally tractable formulation. Since many microscale airflow sensors are embedded on the small unmanned aircraft system surface, it is highly possible that certain sensors may fail. Here, an adaptive control system is developed that is robust to sensor failure as well as other numerical mismatches in calculating real-time aerodynamic moments. The advantages of the proposed method are shown in the following simulation cases: (i) feedback pressure and wall shear data from a distributed array of 45 airflow sensors; (ii) 50% failure of the symmetrically distributed airflow sensor array; and (iii) failure of all the airflow sensors on one wing. It is shown that even if 50% of the airflow sensors have failures, the aircraft is still stable and able to track the attitude commands.
Optimal fault-tolerant control strategy of a solid oxide fuel cell system
NASA Astrophysics Data System (ADS)
Wu, Xiaojuan; Gao, Danhui
2017-10-01
For solid oxide fuel cell (SOFC) development, load tracking, heat management, air excess ratio constraint, high efficiency, low cost and fault diagnosis are six key issues. However, no literature studies the control techniques combining optimization and fault diagnosis for the SOFC system. An optimal fault-tolerant control strategy is presented in this paper, which involves four parts: a fault diagnosis module, a switching module, two backup optimizers and a controller loop. The fault diagnosis part is presented to identify the SOFC current fault type, and the switching module is used to select the appropriate backup optimizer based on the diagnosis result. NSGA-II and TOPSIS are employed to design the two backup optimizers under normal and air compressor fault states. PID algorithm is proposed to design the control loop, which includes a power tracking controller, an anode inlet temperature controller, a cathode inlet temperature controller and an air excess ratio controller. The simulation results show the proposed optimal fault-tolerant control method can track the power, temperature and air excess ratio at the desired values, simultaneously achieving the maximum efficiency and the minimum unit cost in the case of SOFC normal and even in the air compressor fault.
NASA Technical Reports Server (NTRS)
Padilla, Peter A.
1991-01-01
An investigation was made in AIRLAB of the fault handling performance of the Fault Tolerant MultiProcessor (FTMP). Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once in every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. Byzantine faults behave such that the faulted unit points to a working unit as the source of errors. The design's problems involve: (1) the design and interface between the simplex error detection hardware and the error processing software, (2) the functional capabilities of the FTMP system bus, and (3) the communication requirements of a multiprocessor architecture. These weak areas in the FTMP's design increase the probability that, for any hardware fault, a good line replacement unit (LRU) is mistakenly disabled by the fault management software.
Formal Techniques for Synchronized Fault-Tolerant Systems
NASA Technical Reports Server (NTRS)
DiVito, Ben L.; Butler, Ricky W.
1992-01-01
We present the formal verification of synchronizing aspects of the Reliable Computing Platform (RCP), a fault-tolerant computing system for digital flight control applications. The RCP uses NMR-style redundancy to mask faults and internal majority voting to purge the effects of transient faults. The system design has been formally specified and verified using the EHDM verification system. Our formalization is based on an extended state machine model incorporating snapshots of local processors clocks.
Fault-tolerant Control of a Cyber-physical System
NASA Astrophysics Data System (ADS)
Roxana, Rusu-Both; Eva-Henrietta, Dulf
2017-10-01
Cyber-physical systems represent a new emerging field in automatic control. The fault system is a key component, because modern, large scale processes must meet high standards of performance, reliability and safety. Fault propagation in large scale chemical processes can lead to loss of production, energy, raw materials and even environmental hazard. The present paper develops a multi-agent fault-tolerant control architecture using robust fractional order controllers for a (13C) cryogenic separation column cascade. The JADE (Java Agent DEvelopment Framework) platform was used to implement the multi-agent fault tolerant control system while the operational model of the process was implemented in Matlab/SIMULINK environment. MACSimJX (Multiagent Control Using Simulink with Jade Extension) toolbox was used to link the control system and the process model. In order to verify the performance and to prove the feasibility of the proposed control architecture several fault simulation scenarios were performed.
Fault-tolerant building-block computer study
NASA Technical Reports Server (NTRS)
Rennels, D. A.
1978-01-01
Ultra-reliable core computers are required for improving the reliability of complex military systems. Such computers can provide reliable fault diagnosis, failure circumvention, and, in some cases serve as an automated repairman for their host systems. A small set of building-block circuits which can be implemented as single very large integration devices, and which can be used with off-the-shelf microprocessors and memories to build self checking computer modules (SCCM) is described. Each SCCM is a microcomputer which is capable of detecting its own faults during normal operation and is described to communicate with other identical modules over one or more Mil Standard 1553A buses. Several SCCMs can be connected into a network with backup spares to provide fault-tolerant operation, i.e. automated recovery from faults. Alternative fault-tolerant SCCM configurations are discussed along with the cost and reliability associated with their implementation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simpson, L.
ITN Energy Systems, Inc., and Global Solar Energy, Inc., with the assistance of NREL's PV Manufacturing R&D program, have continued the advancement of CIGS production technology through the development of trajectory-oriented predictive/control models, fault-tolerance control, control-platform development, in-situ sensors, and process improvements. Modeling activities to date include the development of physics-based and empirical models for CIGS and sputter-deposition processing, implementation of model-based control, and application of predictive models to the construction of new evaporation sources and for control. Model-based control is enabled through implementation of reduced or empirical models into a control platform. Reliability improvement activities include implementation of preventivemore » maintenance schedules; detection of failed sensors/equipment and reconfiguration to continue processing; and systematic development of fault prevention and reconfiguration strategies for the full range of CIGS PV production deposition processes. In-situ sensor development activities have resulted in improved control and indicated the potential for enhanced process status monitoring and control of the deposition processes. Substantial process improvements have been made, including significant improvement in CIGS uniformity, thickness control, efficiency, yield, and throughput. In large measure, these gains have been driven by process optimization, which, in turn, have been enabled by control and reliability improvements due to this PV Manufacturing R&D program. This has resulted in substantial improvements of flexible CIGS PV module performance and efficiency.« less
Application of a Bank of Kalman Filters for Aircraft Engine Fault Diagnostics
NASA Technical Reports Server (NTRS)
Kobayashi, Takahisa; Simon, Donald L.
2003-01-01
In this paper, a bank of Kalman filters is applied to aircraft gas turbine engine sensor and actuator fault detection and isolation (FDI) in conjunction with the detection of component faults. This approach uses multiple Kalman filters, each of which is designed for detecting a specific sensor or actuator fault. In the event that a fault does occur, all filters except the one using the correct hypothesis will produce large estimation errors, thereby isolating the specific fault. In the meantime, a set of parameters that indicate engine component performance is estimated for the detection of abrupt degradation. The proposed FDI approach is applied to a nonlinear engine simulation at nominal and aged conditions, and the evaluation results for various engine faults at cruise operating conditions are given. The ability of the proposed approach to reliably detect and isolate sensor and actuator faults is demonstrated.
A Design of Finite Memory Residual Generation Filter for Sensor Fault Detection
NASA Astrophysics Data System (ADS)
Kim, Pyung Soo
2017-04-01
In the current paper, a residual generation filter with finite memory structure is proposed for sensor fault detection. The proposed finite memory residual generation filter provides the residual by real-time filtering of fault vector using only the most recent finite measurements and inputs on the window. It is shown that the residual given by the proposed residual generation filter provides the exact fault for noisefree systems. The proposed residual generation filter is specified to the digital filter structure for the amenability to hardware implementation. Finally, to illustrate the capability of the proposed residual generation filter, extensive simulations are performed for the discretized DC motor system with two types of sensor faults, incipient soft bias-type fault and abrupt bias-type fault. In particular, according to diverse noise levels and windows lengths, meaningful simulation results are given for the abrupt bias-type fault.
A fault isolation method based on the incidence matrix of an augmented system
NASA Astrophysics Data System (ADS)
Chen, Changxiong; Chen, Liping; Ding, Jianwan; Wu, Yizhong
2018-03-01
A new approach is proposed for isolating faults and fast identifying the redundant sensors of a system in this paper. By introducing fault signal as additional state variable, an augmented system model is constructed by the original system model, fault signals and sensor measurement equations. The structural properties of an augmented system model are provided in this paper. From the viewpoint of evaluating fault variables, the calculating correlations of the fault variables in the system can be found, which imply the fault isolation properties of the system. Compared with previous isolation approaches, the highlights of the new approach are that it can quickly find the faults which can be isolated using exclusive residuals, at the same time, and can identify the redundant sensors in the system, which are useful for the design of diagnosis system. The simulation of a four-tank system is reported to validate the proposed method.
Evaluation of fault-tolerant parallel-processor architectures over long space missions
NASA Technical Reports Server (NTRS)
Johnson, Sally C.
1989-01-01
The impact of a five year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with .99 probability and that the probability of system failure during one-half hour of full operation be less than 10(-7). The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP) is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration.
Measurement and analysis of operating system fault tolerance
NASA Technical Reports Server (NTRS)
Lee, I.; Tang, D.; Iyer, R. K.
1992-01-01
This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.
Biomimetic Models for An Ecological Approach to Massively-Deployed Sensor Networks
NASA Technical Reports Server (NTRS)
Jones, Kennie H.; Lodding, Kenneth N.; Olariu, Stephan; Wilson, Larry; Xin, Chunsheng
2005-01-01
Promises of ubiquitous control of the physical environment by massively-deployed wireless sensor networks open avenues for new applications that will redefine the way we live and work. Due to small size and low cost of sensor devices, visionaries promise systems enabled by deployment of massive numbers of sensors ubiquitous throughout our environment working in concert. Recent research has concentrated on developing techniques for performing relatively simple tasks with minimal energy expense, assuming some form of centralized control. Unfortunately, centralized control is not conducive to parallel activities and does not scale to massive size networks. Execution of simple tasks in sparse networks will not lead to the sophisticated applications predicted. We propose a new way of looking at massively-deployed sensor networks, motivated by lessons learned from the way biological ecosystems are organized. We demonstrate that in such a model, fully distributed data aggregation can be performed in a scalable fashion in massively deployed sensor networks, where motes operate on local information, making local decisions that are aggregated across the network to achieve globally-meaningful effects. We show that such architectures may be used to facilitate communication and synchronization in a fault-tolerant manner, while balancing workload and required energy expenditure throughout the network.
Study of a unified hardware and software fault-tolerant architecture
NASA Technical Reports Server (NTRS)
Lala, Jaynarayan; Alger, Linda; Friend, Steven; Greeley, Gregory; Sacco, Stephen; Adams, Stuart
1989-01-01
A unified architectural concept, called the Fault Tolerant Processor Attached Processor (FTP-AP), that can tolerate hardware as well as software faults is proposed for applications requiring ultrareliable computation capability. An emulation of the FTP-AP architecture, consisting of a breadboard Motorola 68010-based quadruply redundant Fault Tolerant Processor, four VAX 750s as attached processors, and four versions of a transport aircraft yaw damper control law, is used as a testbed in the AIRLAB to examine a number of critical issues. Solutions of several basic problems associated with N-Version software are proposed and implemented on the testbed. This includes a confidence voter to resolve coincident errors in N-Version software. A reliability model of N-Version software that is based upon the recent understanding of software failure mechanisms is also developed. The basic FTP-AP architectural concept appears suitable for hosting N-Version application software while at the same time tolerating hardware failures. Architectural enhancements for greater efficiency, software reliability modeling, and N-Version issues that merit further research are identified.
Software dependability in the Tandem GUARDIAN system
NASA Technical Reports Server (NTRS)
Lee, Inhwan; Iyer, Ravishankar K.
1995-01-01
Based on extensive field failure data for Tandem's GUARDIAN operating system this paper discusses evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup processes to take over. The paper categorizes the underlying causes of software failures and evaluates the effectiveness of the process pair technique in tolerating software faults. A model to describe the impact of software faults on the reliability of an overall system is proposed. The model is used to evaluate the significance of key factors that determine software dependability and to identify areas for improvement. An analysis of the data shows that about 77% of processor failures that are initially considered due to software are confirmed as software problems. The analysis shows that the use of process pairs to provide checkpointing and restart (originally intended for tolerating hardware faults) allows the system to tolerate about 75% of reported software faults that result in processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. Over two-thirds (72%) of measured software failures are recurrences of previously reported faults. Modeling, based on the data, shows that, in addition to reducing the number of software faults, software dependability can be enhanced by reducing the recurrence rate.
General linear codes for fault-tolerant matrix operations on processor arrays
NASA Technical Reports Server (NTRS)
Nair, V. S. S.; Abraham, J. A.
1988-01-01
Various checksum codes have been suggested for fault-tolerant matrix computations on processor arrays. Use of these codes is limited due to potential roundoff and overflow errors. Numerical errors may also be misconstrued as errors due to physical faults in the system. In this a set of linear codes is identified which can be used for fault-tolerant matrix operations such as matrix addition, multiplication, transposition, and LU-decomposition, with minimum numerical error. Encoding schemes are given for some of the example codes which fall under the general set of codes. With the help of experiments, a rule of thumb for the selection of a particular code for a given application is derived.
Fault-tolerant, high-level quantum circuits: form, compilation and description
NASA Astrophysics Data System (ADS)
Paler, Alexandru; Polian, Ilia; Nemoto, Kae; Devitt, Simon J.
2017-06-01
Fault-tolerant quantum error correction is a necessity for any quantum architecture destined to tackle interesting, large-scale problems. Its theoretical formalism has been well founded for nearly two decades. However, we still do not have an appropriate compiler to produce a fault-tolerant, error-corrected description from a higher-level quantum circuit for state-of the-art hardware models. There are many technical hurdles, including dynamic circuit constructions that occur when constructing fault-tolerant circuits with commonly used error correcting codes. We introduce a package that converts high-level quantum circuits consisting of commonly used gates into a form employing all decompositions and ancillary protocols needed for fault-tolerant error correction. We call this form the (I)initialisation, (C)NOT, (M)measurement form (ICM) and consists of an initialisation layer of qubits into one of four distinct states, a massive, deterministic array of CNOT operations and a series of time-ordered X- or Z-basis measurements. The form allows a more flexible approach towards circuit optimisation. At the same time, the package outputs a standard circuit or a canonical geometric description which is a necessity for operating current state-of-the-art hardware architectures using topological quantum codes.
Software Fault Tolerance: A Tutorial
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo
2000-01-01
Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems. The root cause of software design errors is the complexity of the systems. Compounding the problems in building correct software is the difficulty in assessing the correctness of software for highly complex systems. After a brief overview of the software development processes, we note how hard-to-detect design faults are likely to be introduced during development and how software faults tend to be state-dependent and activated by particular input sequences. Although component reliability is an important quality measure for system level analysis, software reliability is hard to characterize and the use of post-verification reliability estimates remains a controversial issue. For some applications software safety is more important than reliability, and fault tolerance techniques used in those applications are aimed at preventing catastrophes. Single version software fault tolerance techniques discussed include system structuring and closure, atomic actions, inline fault detection, exception handling, and others. Multiversion techniques are based on the assumption that software built differently should fail differently and thus, if one of the redundant versions fails, it is expected that at least one of the other versions will provide an acceptable output. Recovery blocks, N-version programming, and other multiversion techniques are reviewed.
Adaptive control of 5 DOF upper-limb exoskeleton robot with improved safety.
Kang, Hao-Bo; Wang, Jian-Hui
2013-11-01
This paper studies an adaptive control strategy for a class of 5 DOF upper-limb exoskeleton robot with a special safety consideration. The safety requirement plays a critical role in the clinical treatment when assisting patients with shoulder, elbow and wrist joint movements. With the objective of assuring the tracking performance of the pre-specified operations, the proposed adaptive controller is firstly designed to be robust to the model uncertainties. To further improve the safety and fault-tolerance in the presence of unknown large parameter variances or even actuator faults, the adaptive controller is on-line updated according to the information provided by an adaptive observer without additional sensors. An output tracking performance is well achieved with a tunable error bound. The experimental example also verifies the effectiveness of the proposed control scheme. © 2013 ISA. Published by ISA. All rights reserved.
Three-dimensional modeling, estimation, and fault diagnosis of spacecraft air contaminants.
Narayan, A P; Ramirez, W F
1998-01-01
A description is given of the design and implementation of a method to track the presence of air contaminants aboard a spacecraft using an accurate physical model and of a procedure that would raise alarms when certain tolerance levels are exceeded. Because our objective is to monitor the contaminants in real time, we make use of a state estimation procedure that filters measurements from a sensor system and arrives at an optimal estimate of the state of the system. The model essentially consists of a convection-diffusion equation in three dimensions, solved implicitly using the principle of operator splitting, and uses a flowfield obtained by the solution of the Navier-Stokes equations for the cabin geometry, assuming steady-state conditions. A novel implicit Kalman filter has been used for fault detection, a procedure that is an efficient way to track the state of the system and that uses the sparse nature of the state transition matrices.
Identifiability of Additive, Time-Varying Actuator and Sensor Faults by State Augmentation
NASA Technical Reports Server (NTRS)
Upchurch, Jason M.; Gonzalez, Oscar R.; Joshi, Suresh M.
2014-01-01
Recent work has provided a set of necessary and sucient conditions for identifiability of additive step faults (e.g., lock-in-place actuator faults, constant bias in the sensors) using state augmentation. This paper extends these results to an important class of faults which may affect linear, time-invariant systems. In particular, the faults under consideration are those which vary with time and affect the system dynamics additively. Such faults may manifest themselves in aircraft as, for example, control surface oscillations, control surface runaway, and sensor drift. The set of necessary and sucient conditions presented in this paper are general, and apply when a class of time-varying faults affects arbitrary combinations of actuators and sensors. The results in the main theorems are illustrated by two case studies, which provide some insight into how the conditions may be used to check the theoretical identifiability of fault configurations of interest for a given system. It is shown that while state augmentation can be used to identify certain fault configurations, other fault configurations are theoretically impossible to identify using state augmentation, giving practitioners valuable insight into such situations. That is, the limitations of state augmentation for a given system and configuration of faults are made explicit. Another limitation of model-based methods is that there can be large numbers of fault configurations, thus making identification of all possible configurations impractical. However, the theoretical identifiability of known, credible fault configurations can be tested using the theorems presented in this paper, which can then assist the efforts of fault identification practitioners.
NASA Astrophysics Data System (ADS)
Fauji, Shantanu
We consider the problem of energy efficient and fault tolerant in--network aggregation for wireless sensor networks (WSNs). In-network aggregation is the process of aggregation while collecting data from sensors to the base station. This process should be energy efficient due to the limited energy at the sensors and tolerant to the high failure rates common in sensor networks. Tree based in--network aggregation protocols, although energy efficient, are not robust to network failures. Multipath routing protocols are robust to failures to a certain degree but are not energy efficient due to the overhead in the maintenance of multiple paths. We propose a new protocol for in-network aggregation in WSNs, which is energy efficient, achieves high lifetime, and is robust to the changes in the network topology. Our protocol, gossip--based protocol for in-network aggregation (GPIA) is based on the spreading of information via gossip. GPIA is not only adaptive to failures and changes in the network topology, but is also energy efficient. Energy efficiency of GPIA comes from all the nodes being capable of selective message reception and detecting convergence of the aggregation early. We experimentally show that GPIA provides significant improvement over some other competitors like the Ridesharing, Synopsis Diffusion and the pure version of gossip. GPIA shows ten fold, five fold and two fold improvement over the pure gossip, the synopsis diffusion and Ridesharing protocols in terms of network lifetime, respectively. Further, GPIA retains gossip's robustness to failures and improves upon the accuracy of synopsis diffusion and Ridesharing.
Zhang, Feng; Xu, Yuetong; Chou, Jarong
2016-01-01
The service of sensor device in Emerging Sensor Networks (ESNs) is the extension of traditional Web services. Through the sensor network, the service of sensor device can communicate directly with the entity in the geographic environment, and even impact the geographic entity directly. The interaction between the sensor device in ESNs and geographic environment is very complex, and the interaction modeling is a challenging problem. This paper proposed a novel Petri Nets-based modeling method for the interaction between the sensor device and the geographic environment. The feature of the sensor device service in ESNs is more easily affected by the geographic environment than the traditional Web service. Therefore, the response time, the fault-tolerant ability and the resource consumption become important factors in the performance of the whole sensor application system. Thus, this paper classified IoT services as Sensing services and Controlling services according to the interaction between IoT service and geographic entity, and classified GIS services as data services and processing services. Then, this paper designed and analyzed service algebra and Colored Petri Nets model to modeling the geo-feature, IoT service, GIS service and the interaction process between the sensor and the geographic enviroment. At last, the modeling process is discussed by examples. PMID:27681730
2000-06-01
real - time operating system and design of a human-computer interface (HCI) for a triple modular redundant (TMR) fault-tolerant microprocessor for use in space-based applications. Once disadvantage of using COTS hardware components is their susceptibility to the radiation effects present in the space environment. and specifically, radiation-induced single-event upsets (SEUs). In the event of an SEU, a fault-tolerant system can mitigate the effects of the upset and continue to process from the last known correct system state. The TMR basic hardware
Fault tolerant linear actuator
Tesar, Delbert
2004-09-14
In varying embodiments, the fault tolerant linear actuator of the present invention is a new and improved linear actuator with fault tolerance and positional control that may incorporate velocity summing, force summing, or a combination of the two. In one embodiment, the invention offers a velocity summing arrangement with a differential gear between two prime movers driving a cage, which then drives a linear spindle screw transmission. Other embodiments feature two prime movers driving separate linear spindle screw transmissions, one internal and one external, in a totally concentric and compact integrated module.
Fault detection, isolation, and diagnosis of self-validating multifunctional sensors.
Yang, Jing-Li; Chen, Yin-Sheng; Zhang, Li-Li; Sun, Zhen
2016-06-01
A novel fault detection, isolation, and diagnosis (FDID) strategy for self-validating multifunctional sensors is presented in this paper. The sparse non-negative matrix factorization-based method can effectively detect faults by using the squared prediction error (SPE) statistic, and the variables contribution plots based on SPE statistic can help to locate and isolate the faulty sensitive units. The complete ensemble empirical mode decomposition is employed to decompose the fault signals to a series of intrinsic mode functions (IMFs) and a residual. The sample entropy (SampEn)-weighted energy values of each IMFs and the residual are estimated to represent the characteristics of the fault signals. Multi-class support vector machine is introduced to identify the fault mode with the purpose of diagnosing status of the faulty sensitive units. The performance of the proposed strategy is compared with other fault detection strategies such as principal component analysis, independent component analysis, and fault diagnosis strategies such as empirical mode decomposition coupled with support vector machine. The proposed strategy is fully evaluated in a real self-validating multifunctional sensors experimental system, and the experimental results demonstrate that the proposed strategy provides an excellent solution to the FDID research topic of self-validating multifunctional sensors.
NASA Technical Reports Server (NTRS)
Izenson, Michael G.; Crowley, Christopher J.
2005-01-01
A compact, lightweight heat exchanger has been designed to be fault-tolerant in the sense that a single-point leak would not cause mixing of heat-transfer fluids. This particular heat exchanger is intended to be part of the temperature-regulation system for habitable modules of the International Space Station and to function with water and ammonia as the heat-transfer fluids. The basic fault-tolerant design is adaptable to other heat-transfer fluids and heat exchangers for applications in which mixing of heat-transfer fluids would pose toxic, explosive, or other hazards: Examples could include fuel/air heat exchangers for thermal management on aircraft, process heat exchangers in the cryogenic industry, and heat exchangers used in chemical processing. The reason this heat exchanger can tolerate a single-point leak is that the heat-transfer fluids are everywhere separated by a vented volume and at least two seals. The combination of fault tolerance, compactness, and light weight is implemented in a unique heat-exchanger core configuration: Each fluid passage is entirely surrounded by a vented region bridged by solid structures through which heat is conducted between the fluids. Precise, proprietary fabrication techniques make it possible to manufacture the vented regions and heat-conducting structures with very small dimensions to obtain a very large coefficient of heat transfer between the two fluids. A large heat-transfer coefficient favors compact design by making it possible to use a relatively small core for a given heat-transfer rate. Calculations and experiments have shown that in most respects, the fault-tolerant heat exchanger can be expected to equal or exceed the performance of the non-fault-tolerant heat exchanger that it is intended to supplant (see table). The only significant disadvantages are a slight weight penalty and a small decrease in the mass-specific heat transfer.
The cost of software fault tolerance
NASA Technical Reports Server (NTRS)
Migneault, G. E.
1982-01-01
The proposed use of software fault tolerance techniques as a means of reducing software costs in avionics and as a means of addressing the issue of system unreliability due to faults in software is examined. A model is developed to provide a view of the relationships among cost, redundancy, and reliability which suggests strategies for software development and maintenance which are not conventional.
Fiber Bragg grating sensor for fault detection in high voltage overhead transmission lines
NASA Astrophysics Data System (ADS)
Moghadas, Amin
2011-12-01
A fiber optic based sensor capable of fault detection in both radial and network overhead transmission power line systems is investigated. Bragg wavelength shift is used to measure the fault current and detect fault in power systems. Magnetic fields generated by currents in the overhead transmission lines cause a strain in magnetostrictive material which is then detected by fiber Bragg grating (FBG) sensors. The Fiber Bragg interrogator senses the reflected FBG signals, and the Bragg wavelength shift is calculated and the signals are processed. A broadband light source in the control room scans the shift in the reflected signals. Any surge in the magnetic field relates to an increased fault current at a certain location. Also, fault location can be precisely defined with an artificial neural network (ANN) algorithm. This algorithm can be easily coordinated with other protective devices. It is shown that the faults in the overhead transmission line cause a detectable wavelength shift on the reflected signal of FBG sensors and can be used to detect and classify different kind of faults. The proposed method has been extensively tested by simulation and results confirm that the proposed scheme is able to detect different kinds of fault in both radial and network system.
Computer-Aided Reliability Estimation
NASA Technical Reports Server (NTRS)
Bavuso, S. J.; Stiffler, J. J.; Bryant, L. A.; Petersen, P. L.
1986-01-01
CARE III (Computer-Aided Reliability Estimation, Third Generation) helps estimate reliability of complex, redundant, fault-tolerant systems. Program specifically designed for evaluation of fault-tolerant avionics systems. However, CARE III general enough for use in evaluation of other systems as well.
Fault-Tolerant Control For A Robotic Inspection System
NASA Technical Reports Server (NTRS)
Tso, Kam Sing
1995-01-01
Report describes first phase of continuing program of research on fault-tolerant control subsystem of telerobotic visual-inspection system. Goal of program to develop robotic system for remotely controlled visual inspection of structures in outer space.
Characterization of the faulted behavior of digital computers and fault tolerant systems
NASA Technical Reports Server (NTRS)
Bavuso, Salvatore J.; Miner, Paul S.
1989-01-01
A development status evaluation is presented for efforts conducted at NASA-Langley since 1977, toward the characterization of the latent fault in digital fault-tolerant systems. Attention is given to the practical, high speed, generalized gate-level logic system simulator developed, as well as to the validation methodology used for the simulator, on the basis of faultable software and hardware simulations employing a prototype MIL-STD-1750A processor. After validation, latency tests will be performed.
Fly-By-Light/Power-By-Wire Fault-Tolerant Fiber-Optic Backplane
NASA Technical Reports Server (NTRS)
Malekpour, Mahyar R.
2002-01-01
The design and development of a fault-tolerant fiber-optic backplane to demonstrate feasibility of such architecture is presented. The simulation results of test cases on the backplane in the advent of induced faults are presented, and the fault recovery capability of the architecture is demonstrated. The architecture was designed, developed, and implemented using the Very High Speed Integrated Circuits (VHSIC) Hardware Description Language (VHDL). The architecture was synthesized and implemented in hardware using Field Programmable Gate Arrays (FPGA) on multiple prototype boards.
Li, Dan; Hu, Xiaoguang
2017-03-01
Because of the high availability requirements from weapon equipment, an in-depth study has been conducted on the real-time fault-tolerance of the widely applied Compact PCI (CPCI) bus measurement and control system. A redundancy design method that uses heartbeat detection to connect the primary and alternate devices has been developed. To address the low successful execution rate and relatively large waste of time slices in the primary version of the task software, an improved algorithm for real-time fault-tolerant scheduling is proposed based on the Basic Checking available time Elimination idle time (BCE) algorithm, applying a single-neuron self-adaptive proportion sum differential (PSD) controller. The experimental validation results indicate that this system has excellent redundancy and fault-tolerance, and the newly developed method can effectively improve the system availability. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Multiple Embedded Processors for Fault-Tolerant Computing
NASA Technical Reports Server (NTRS)
Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy
2005-01-01
A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
Analysis of fault-tolerant neurocontrol architectures
NASA Technical Reports Server (NTRS)
Troudet, T.; Merrill, W.
1992-01-01
The fault-tolerance of analog parallel distributed implementations of a multivariable aircraft neurocontroller is analyzed by simulating weight and neuron failures in a simplified scheme of analog processing based on the functional architecture of the ETANN chip (Electrically Trainable Artificial Neural Network). The neural information processing is found to be only partially distributed throughout the set of weights of the neurocontroller synthesized with the backpropagation algorithm. Although the degree of distribution of the neural processing, and consequently the fault-tolerance of the neurocontroller, could be enhanced using Locally Distributed Weight and Neuron Approaches, a satisfactory level of fault-tolerance could only be obtained by retraining the degrated VLSI neurocontroller. The possibility of maintaining neurocontrol performance and stability in the presence of single weight of neuron failures was demonstrated through an automated retraining procedure of the neurocontroller based on a pre-programmed choice and sequence of the training parameters.
Evaluating and extending user-level fault tolerance in MPI applications
Laguna, Ignacio; Richards, David F.; Gamblin, Todd; ...
2016-01-11
The user-level failure mitigation (ULFM) interface has been proposed to provide fault-tolerant semantics in the Message Passing Interface (MPI). Previous work presented performance evaluations of ULFM; yet questions related to its programability and applicability, especially to non-trivial, bulk synchronous applications, remain unanswered. In this article, we present our experiences on using ULFM in a case study with a large, highly scalable, bulk synchronous molecular dynamics application to shed light on the advantages and difficulties of this interface to program fault-tolerant MPI applications. We found that, although ULFM is suitable for master–worker applications, it provides few benefits for more common bulkmore » synchronous MPI applications. Furthermore, to address these limitations, we introduce a new, simpler fault-tolerant interface for complex, bulk synchronous MPI programs with better applicability and support than ULFM for application-level recovery mechanisms, such as global rollback.« less
Machine-checked proofs of the design and implementation of a fault-tolerant circuit
NASA Technical Reports Server (NTRS)
Bevier, William R.; Young, William D.
1990-01-01
A formally verified implementation of the 'oral messages' algorithm of Pease, Shostak, and Lamport is described. An abstract implementation of the algorithm is verified to achieve interactive consistency in the presence of faults. This abstract characterization is then mapped down to a hardware level implementation which inherits the fault-tolerant characteristics of the abstract version. All steps in the proof were checked with the Boyer-Moore theorem prover. A significant results is the demonstration of a fault-tolerant device that is formally specified and whose implementation is proved correct with respect to this specification. A significant simplifying assumption is that the redundant processors behave synchronously. A mechanically checked proof that the oral messages algorithm is 'optimal' in the sense that no algorithm which achieves agreement via similar message passing can tolerate a larger proportion of faulty processor is also described.
Liu, Guohai; Yang, Junqin; Chen, Ming; Chen, Qian
2014-01-01
A fault-tolerant permanent-magnet vernier (FT-PMV) machine is designed for direct-drive applications, incorporating the merits of high torque density and high reliability. Based on the so-called magnetic gearing effect, PMV machines have the ability of high torque density by introducing the flux-modulation poles (FMPs). This paper investigates the fault-tolerant characteristic of PMV machines and provides a design method, which is able to not only meet the fault-tolerant requirements but also keep the ability of high torque density. The operation principle of the proposed machine has been analyzed. The design process and optimization are presented specifically, such as the combination of slots and poles, the winding distribution, and the dimensions of PMs and teeth. By using the time-stepping finite element method (TS-FEM), the machine performances are evaluated. Finally, the FT-PMV machine is manufactured, and the experimental results are presented to validate the theoretical analysis.
Sensor placement for diagnosability in space-borne systems - A model-based reasoning approach
NASA Technical Reports Server (NTRS)
Chien, Steve; Doyle, Richard; Rouquette, Nicolas
1992-01-01
This paper presents an approach to evaluating sensor placements on the basis of how well they are able to discriminate between a given fault and normal operating modes and/or other fault modes. In this approach, a model of the system in both normal operations and fault modes is used to evaluate possible sensor placements upon the basis of three criteria. Discriminability measures how much of a divergence in expected sensor readings the two system modes can be expected to produce. Accuracy measures confidence in the particular model predictions. Timeliness measures how long after the fault occurrence the expected divergence will take place. These three metrics then can be used to form a recommendation for a sensor placement. This paper describes how these measures can be computed and illustrated these methods with a brief example.
Faulty node detection in wireless sensor networks using a recurrent neural network
NASA Astrophysics Data System (ADS)
Atiga, Jamila; Mbarki, Nour Elhouda; Ejbali, Ridha; Zaied, Mourad
2018-04-01
The wireless sensor networks (WSN) consist of a set of sensors that are more and more used in surveillance applications on a large scale in different areas: military, Environment, Health ... etc. Despite the minimization and the reduction of the manufacturing costs of the sensors, they can operate in places difficult to access without the possibility of reloading of battery, they generally have limited resources in terms of power of emission, of processing capacity, data storage and energy. These sensors can be used in a hostile environment, such as, for example, on a field of battle, in the presence of fires, floods, earthquakes. In these environments the sensors can fail, even in a normal operation. It is therefore necessary to develop algorithms tolerant and detection of defects of the nodes for the network of sensor without wires, therefore, the faults of the sensor can reduce the quality of the surveillance if they are not detected. The values that are measured by the sensors are used to estimate the state of the monitored area. We used the Non-linear Auto- Regressive with eXogeneous (NARX), the recursive architecture of the neural network, to predict the state of a node of a sensor from the previous values described by the functions of time series. The experimental results have verified that the prediction of the State is enhanced by our proposed model.
Gait planning for a quadruped robot with one faulty actuator
NASA Astrophysics Data System (ADS)
Chen, Xianbao; Gao, Feng; Qi, Chenkun; Tian, Xinghua
2015-01-01
Fault tolerance is essential for quadruped robots when they work in remote areas or hazardous environments. Many fault-tolerant gaits planning method proposed in the past decade constrained more degrees of freedom(DOFs) of a robot than necessary. Thus a novel method to realize the fault-tolerant walking is proposed. The mobility of the robot is analyzed first by using the screw theory. The result shows that the translation of the center of body(CoB) can be kept with one faulty actuator if the rotations of the body are controlled. Thus the DOFs of the robot body are divided into two parts: the translation of the CoB and the rotation of the body. The kinematic model of the whole robot is built, the algorithm is developed to actively control the body orientations at the velocity level so that the planned CoB trajectory can be realized in spite of the constraint of the faulty actuator. This gait has a similar generation sequence with the normal gait and can be applied to the robot at any position. Simulations and experiments of the fault-tolerant gait with one faulty actuator are carried out. The CoB errors and the body rotation angles are measured. Comparing to the traditional fault-tolerant gait they can be reduced by at least 50%. A fault-tolerant gait planning algorithm is presented, which not only realizes the walking of a quadruped robot with a faulty actuator, but also efficiently improves the walking performances by taking full advantage of the remaining operational actuators according to the results of the simulations and experiments.
Optimal Sensor Location Design for Reliable Fault Detection in Presence of False Alarms
Yang, Fan; Xiao, Deyun; Shah, Sirish L.
2009-01-01
To improve fault detection reliability, sensor location should be designed according to an optimization criterion with constraints imposed by issues of detectability and identifiability. Reliability requires the minimization of undetectability and false alarm probability due to random factors on sensor readings, which is not only related with sensor readings but also affected by fault propagation. This paper introduces the reliability criteria expression based on the missed/false alarm probability of each sensor and system topology or connectivity derived from the directed graph. The algorithm for the optimization problem is presented as a heuristic procedure. Finally, a boiler system is illustrated using the proposed method. PMID:22291524
NASA Technical Reports Server (NTRS)
Simon, Dan; Simon, Donald L.
2009-01-01
Given a system which can fail in 1 or n different ways, a fault detection and isolation (FDI) algorithm uses sensor data in order to determine which fault is the most likely to have occurred. The effectiveness of an FDI algorithm can be quantified by a confusion matrix, which i ndicates the probability that each fault is isolated given that each fault has occurred. Confusion matrices are often generated with simulation data, particularly for complex systems. In this paper we perform FDI using sums of squares of sensor residuals (SSRs). We assume that the sensor residuals are Gaussian, which gives the SSRs a chi-squared distribution. We then generate analytic lower and upper bounds on the confusion matrix elements. This allows for the generation of optimal sensor sets without numerical simulations. The confusion matrix bound s are verified with simulated aircraft engine data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Qishi; Zhu, Mengxia; Rao, Nageswara S
We propose an intelligent decision support system based on sensor and computer networks that incorporates various component techniques for sensor deployment, data routing, distributed computing, and information fusion. The integrated system is deployed in a distributed environment composed of both wireless sensor networks for data collection and wired computer networks for data processing in support of homeland security defense. We present the system framework and formulate the analytical problems and develop approximate or exact solutions for the subtasks: (i) sensor deployment strategy based on a two-dimensional genetic algorithm to achieve maximum coverage with cost constraints; (ii) data routing scheme tomore » achieve maximum signal strength with minimum path loss, high energy efficiency, and effective fault tolerance; (iii) network mapping method to assign computing modules to network nodes for high-performance distributed data processing; and (iv) binary decision fusion rule that derive threshold bounds to improve system hit rate and false alarm rate. These component solutions are implemented and evaluated through either experiments or simulations in various application scenarios. The extensive results demonstrate that these component solutions imbue the integrated system with the desirable and useful quality of intelligence in decision making.« less
Fault detection and fault tolerance in robotics
NASA Technical Reports Server (NTRS)
Visinsky, Monica; Walker, Ian D.; Cavallaro, Joseph R.
1992-01-01
Robots are used in inaccessible or hazardous environments in order to alleviate some of the time, cost and risk involved in preparing men to endure these conditions. In order to perform their expected tasks, the robots are often quite complex, thus increasing their potential for failures. If men must be sent into these environments to repair each component failure in the robot, the advantages of using the robot are quickly lost. Fault tolerant robots are needed which can effectively cope with failures and continue their tasks until repairs can be realistically scheduled. Before fault tolerant capabilities can be created, methods of detecting and pinpointing failures must be perfected. This paper develops a basic fault tree analysis of a robot in order to obtain a better understanding of where failures can occur and how they contribute to other failures in the robot. The resulting failure flow chart can also be used to analyze the resiliency of the robot in the presence of specific faults. By simulating robot failures and fault detection schemes, the problems involved in detecting failures for robots are explored in more depth.
Generic Sensor Failure Modeling for Cooperative Systems.
Jäger, Georg; Zug, Sebastian; Casimiro, António
2018-03-20
The advent of cooperative systems entails a dynamic composition of their components. As this contrasts current, statically composed systems, new approaches for maintaining their safety are required. In that endeavor, we propose an integration step that evaluates the failure model of shared information in relation to an application's fault tolerance and thereby promises maintainability of such system's safety. However, it also poses new requirements on failure models, which are not fulfilled by state-of-the-art approaches. Consequently, this work presents a mathematically defined generic failure model as well as a processing chain for automatically extracting such failure models from empirical data. By examining data of an Sharp GP2D12 distance sensor, we show that the generic failure model not only fulfills the predefined requirements, but also models failure characteristics appropriately when compared to traditional techniques.
Generic Sensor Failure Modeling for Cooperative Systems
Jäger, Georg; Zug, Sebastian
2018-01-01
The advent of cooperative systems entails a dynamic composition of their components. As this contrasts current, statically composed systems, new approaches for maintaining their safety are required. In that endeavor, we propose an integration step that evaluates the failure model of shared information in relation to an application’s fault tolerance and thereby promises maintainability of such system’s safety. However, it also poses new requirements on failure models, which are not fulfilled by state-of-the-art approaches. Consequently, this work presents a mathematically defined generic failure model as well as a processing chain for automatically extracting such failure models from empirical data. By examining data of an Sharp GP2D12 distance sensor, we show that the generic failure model not only fulfills the predefined requirements, but also models failure characteristics appropriately when compared to traditional techniques. PMID:29558435
SFTP: A Secure and Fault-Tolerant Paradigm against Blackhole Attack in MANET
NASA Astrophysics Data System (ADS)
KumarRout, Jitendra; Kumar Bhoi, Sourav; Kumar Panda, Sanjaya
2013-02-01
Security issues in MANET are a challenging task nowadays. MANETs are vulnerable to passive attacks and active attacks because of a limited number of resources and lack of centralized authority. Blackhole attack is an attack in network layer which degrade the network performance by dropping the packets. In this paper, we have proposed a Secure Fault-Tolerant Paradigm (SFTP) which checks the Blackhole attack in the network. The three phases used in SFTP algorithm are designing of coverage area to find the area of coverage, Network Connection algorithm to design a fault-tolerant model and Route Discovery algorithm to discover the route and data delivery from source to destination. SFTP gives better network performance by making the network fault free.
Using Performance Tools to Support Experiments in HPC Resilience
DOE Office of Scientific and Technical Information (OSTI.GOV)
Naughton, III, Thomas J; Boehm, Swen; Engelmann, Christian
2014-01-01
The high performance computing (HPC) community is working to address fault tolerance and resilience concerns for current and future large scale computing platforms. This is driving enhancements in the programming environ- ments, specifically research on enhancing message passing libraries to support fault tolerant computing capabilities. The community has also recognized that tools for resilience experimentation are greatly lacking. However, we argue that there are several parallels between performance tools and resilience tools . As such, we believe the rich set of HPC performance-focused tools can be extended (repurposed) to benefit the resilience community. In this paper, we describe the initialmore » motivation to leverage standard HPC per- formance analysis techniques to aid in developing diagnostic tools to assist fault tolerance experiments for HPC applications. These diagnosis procedures help to provide context for the system when the errors (failures) occurred. We describe our initial work in leveraging an MPI performance trace tool to assist in provid- ing global context during fault injection experiments. Such tools will assist the HPC resilience community as they extend existing and new application codes to support fault tolerances.« less
Adaptive Fault-Tolerant Control of Uncertain Nonlinear Large-Scale Systems With Unknown Dead Zone.
Chen, Mou; Tao, Gang
2016-08-01
In this paper, an adaptive neural fault-tolerant control scheme is proposed and analyzed for a class of uncertain nonlinear large-scale systems with unknown dead zone and external disturbances. To tackle the unknown nonlinear interaction functions in the large-scale system, the radial basis function neural network (RBFNN) is employed to approximate them. To further handle the unknown approximation errors and the effects of the unknown dead zone and external disturbances, integrated as the compounded disturbances, the corresponding disturbance observers are developed for their estimations. Based on the outputs of the RBFNN and the disturbance observer, the adaptive neural fault-tolerant control scheme is designed for uncertain nonlinear large-scale systems by using a decentralized backstepping technique. The closed-loop stability of the adaptive control system is rigorously proved via Lyapunov analysis and the satisfactory tracking performance is achieved under the integrated effects of unknown dead zone, actuator fault, and unknown external disturbances. Simulation results of a mass-spring-damper system are given to illustrate the effectiveness of the proposed adaptive neural fault-tolerant control scheme for uncertain nonlinear large-scale systems.
A Primer on Architectural Level Fault Tolerance
NASA Technical Reports Server (NTRS)
Butler, Ricky W.
2008-01-01
This paper introduces the fundamental concepts of fault tolerant computing. Key topics covered are voting, fault detection, clock synchronization, Byzantine Agreement, diagnosis, and reliability analysis. Low level mechanisms such as Hamming codes or low level communications protocols are not covered. The paper is tutorial in nature and does not cover any topic in detail. The focus is on rationale and approach rather than detailed exposition.
Reliability model derivation of a fault-tolerant, dual, spare-switching, digital computer system
NASA Technical Reports Server (NTRS)
1974-01-01
A computer based reliability projection aid, tailored specifically for application in the design of fault-tolerant computer systems, is described. Its more pronounced characteristics include the facility for modeling systems with two distinct operational modes, measuring the effect of both permanent and transient faults, and calculating conditional system coverage factors. The underlying conceptual principles, mathematical models, and computer program implementation are presented.
Fault-tolerant arithmetic via time-shared TMR
NASA Astrophysics Data System (ADS)
Swartzlander, Earl E.
1999-11-01
Fault tolerance is increasingly important as society has come to depend on computers for more and more aspects of daily life. The current concern about the Y2K problems indicates just how much we depend on accurate computers. This paper describes work on time- shared TMR, a technique which is used to provide arithmetic operations that produce correct results in spite of circuit faults.
Implementation of an experimental fault-tolerant memory system
NASA Technical Reports Server (NTRS)
Carter, W. C.; Mccarthy, C. E.
1976-01-01
The experimental fault-tolerant memory system described in this paper has been designed to enable the modular addition of spares, to validate the theoretical fault-secure and self-testing properties of the translator/corrector, to provide a basis for experiments using the new testing and correction processes for recovery, and to determine the practicality of such systems. The hardware design and implementation are described, together with methods of fault insertion. The hardware/software interface, including a restricted single error correction/double error detection (SEC/DED) code, is specified. Procedures are carefully described which, (1) test for specified physical faults, (2) ensure that single error corrections are not miscorrections due to triple faults, and (3) enable recovery from double errors.
Peer-to-peer model for the area coverage and cooperative control of mobile sensor networks
NASA Astrophysics Data System (ADS)
Tan, Jindong; Xi, Ning
2004-09-01
This paper presents a novel model and distributed algorithms for the cooperation and redeployment of mobile sensor networks. A mobile sensor network composes of a collection of wireless connected mobile robots equipped with a variety of sensors. In such a sensor network, each mobile node has sensing, computation, communication, and locomotion capabilities. The locomotion ability enhances the autonomous deployment of the system. The system can be rapidly deployed to hostile environment, inaccessible terrains or disaster relief operations. The mobile sensor network is essentially a cooperative multiple robot system. This paper first presents a peer-to-peer model to define the relationship between neighboring communicating robots. Delaunay Triangulation and Voronoi diagrams are used to define the geometrical relationship between sensor nodes. This distributed model allows formal analysis for the fusion of spatio-temporal sensory information of the network. Based on the distributed model, this paper discusses a fault tolerant algorithm for autonomous self-deployment of the mobile robots. The algorithm considers the environment constraints, the presence of obstacles and the nonholonomic constraints of the robots. The distributed algorithm enables the system to reconfigure itself such that the area covered by the system can be enlarged. Simulation results have shown the effectiveness of the distributed model and deployment algorithms.
Hua, Yongzhao; Dong, Xiwang; Li, Qingdong; Ren, Zhang
2017-11-01
This paper investigates the fault-tolerant time-varying formation control problems for high-order linear multi-agent systems in the presence of actuator failures. Firstly, a fully distributed formation control protocol is presented to compensate for the influences of both bias fault and loss of effectiveness fault. Using the adaptive online updating strategies, no global knowledge about the communication topology is required and the bounds of actuator failures can be unknown. Then an algorithm is proposed to determine the control parameters of the fault-tolerant formation protocol, where the time-varying formation feasible conditions and an approach to expand the feasible formation set are given. Furthermore, the stability of the proposed algorithm is proven based on the Lyapunov-like theory. Finally, two simulation examples are given to demonstrate the effectiveness of the theoretical results. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
An uncertainty-based distributed fault detection mechanism for wireless sensor networks.
Yang, Yang; Gao, Zhipeng; Zhou, Hang; Qiu, Xuesong
2014-04-25
Exchanging too many messages for fault detection will cause not only a degradation of the network quality of service, but also represents a huge burden on the limited energy of sensors. Therefore, we propose an uncertainty-based distributed fault detection through aided judgment of neighbors for wireless sensor networks. The algorithm considers the serious influence of sensing measurement loss and therefore uses Markov decision processes for filling in missing data. Most important of all, fault misjudgments caused by uncertainty conditions are the main drawbacks of traditional distributed fault detection mechanisms. We draw on the experience of evidence fusion rules based on information entropy theory and the degree of disagreement function to increase the accuracy of fault detection. Simulation results demonstrate our algorithm can effectively reduce communication energy overhead due to message exchanges and provide a higher detection accuracy ratio.
Fault Tolerant Real-Time Systems
1993-09-30
The ART (Advanced Real-Time Technology) Project of Carnegie Mellon University is engaged in wide ranging research on hard real - time systems . The...including hardware and software fault tolerance using temporal redundancy and analytic redundancy to permit the construction of real - time systems whose
A fault-tolerant multiprocessor architecture for aircraft, volume 1. [autopilot configuration
NASA Technical Reports Server (NTRS)
Smith, T. B.; Hopkins, A. L.; Taylor, W.; Ausrotas, R. A.; Lala, J. H.; Hanley, L. D.; Martin, J. H.
1978-01-01
A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed.
Fault-tolerant linear optical quantum computing with small-amplitude coherent States.
Lund, A P; Ralph, T C; Haselgrove, H L
2008-01-25
Quantum computing using two coherent states as a qubit basis is a proposed alternative architecture with lower overheads but has been questioned as a practical way of performing quantum computing due to the fragility of diagonal states with large coherent amplitudes. We show that using error correction only small amplitudes (alpha>1.2) are required for fault-tolerant quantum computing. We study fault tolerance under the effects of small amplitudes and loss using a Monte Carlo simulation. The first encoding level resources are orders of magnitude lower than the best single photon scheme.
Enhanced fault-tolerant quantum computing in d-level systems.
Campbell, Earl T
2014-12-05
Error-correcting codes protect quantum information and form the basis of fault-tolerant quantum computing. Leading proposals for fault-tolerant quantum computation require codes with an exceedingly rare property, a transversal non-Clifford gate. Codes with the desired property are presented for d-level qudit systems with prime d. The codes use n=d-1 qudits and can detect up to ∼d/3 errors. We quantify the performance of these codes for one approach to quantum computation known as magic-state distillation. Unlike prior work, we find performance is always enhanced by increasing d.
NASA Technical Reports Server (NTRS)
Gault, J. W. (Editor); Trivedi, K. S. (Editor); Clary, J. B. (Editor)
1980-01-01
The validation process comprises the activities required to insure the agreement of system realization with system specification. A preliminary validation methodology for fault tolerant systems documented. A general framework for a validation methodology is presented along with a set of specific tasks intended for the validation of two specimen system, SIFT and FTMP. Two major areas of research are identified. First, are those activities required to support the ongoing development of the validation process itself, and second, are those activities required to support the design, development, and understanding of fault tolerant systems.
Modeling and Simulation Reliable Spacecraft On-Board Computing
NASA Technical Reports Server (NTRS)
Park, Nohpill
1999-01-01
The proposed project will investigate modeling and simulation-driven testing and fault tolerance schemes for Spacecraft On-Board Computing, thereby achieving reliable spacecraft telecommunication. A spacecraft communication system has inherent capabilities of providing multipoint and broadcast transmission, connectivity between any two distant nodes within a wide-area coverage, quick network configuration /reconfiguration, rapid allocation of space segment capacity, and distance-insensitive cost. To realize the capabilities above mentioned, both the size and cost of the ground-station terminals have to be reduced by using reliable, high-throughput, fast and cost-effective on-board computing system which has been known to be a critical contributor to the overall performance of space mission deployment. Controlled vulnerability of mission data (measured in sensitivity), improved performance (measured in throughput and delay) and fault tolerance (measured in reliability) are some of the most important features of these systems. The system should be thoroughly tested and diagnosed before employing a fault tolerance into the system. Testing and fault tolerance strategies should be driven by accurate performance models (i.e. throughput, delay, reliability and sensitivity) to find an optimal solution in terms of reliability and cost. The modeling and simulation tools will be integrated with a system architecture module, a testing module and a module for fault tolerance all of which interacting through a centered graphical user interface.
An improved fault-tolerant control scheme for PWM inverter-fed induction motor-based EVs.
Tabbache, Bekheïra; Benbouzid, Mohamed; Kheloui, Abdelaziz; Bourgeot, Jean-Matthieu; Mamoune, Abdeslam
2013-11-01
This paper proposes an improved fault-tolerant control scheme for PWM inverter-fed induction motor-based electric vehicles. The proposed strategy deals with power switch (IGBTs) failures mitigation within a reconfigurable induction motor control. To increase the vehicle powertrain reliability regarding IGBT open-circuit failures, 4-wire and 4-leg PWM inverter topologies are investigated and their performances discussed in a vehicle context. The proposed fault-tolerant topologies require only minimum hardware modifications to the conventional off-the-shelf six-switch three-phase drive, mitigating the IGBTs failures by specific inverter control. Indeed, the two topologies exploit the induction motor neutral accessibility for fault-tolerant purposes. The 4-wire topology uses then classical hysteresis controllers to account for the IGBT failures. The 4-leg topology, meanwhile, uses a specific 3D space vector PWM to handle vehicle requirements in terms of size (DC bus capacitors) and cost (IGBTs number). Experiments on an induction motor drive and simulations on an electric vehicle are carried-out using a European urban driving cycle to show that the proposed fault-tolerant control approach is effective and provides a simple configuration with high performance in terms of speed and torque responses. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.
Xu, Shi-Zhou; Wang, Chun-Jie; Lin, Fang-Li; Li, Shi-Xiang
2017-10-31
The multi-device open-circuit fault is a common fault of ANPC (Active Neutral-Point Clamped) three-level inverter and effect the operation stability of the whole system. To improve the operation stability, this paper summarized the main solutions currently firstly and analyzed all the possible states of multi-device open-circuit fault. Secondly, an order-reduction optimal control strategy was proposed under multi-device open-circuit fault to realize fault-tolerant control based on the topology and control requirement of ANPC three-level inverter and operation stability. This control strategy can solve the faults with different operation states, and can works in order-reduction state under specific open-circuit faults with specific combined devices, which sacrifices the control quality to obtain the stability priority control. Finally, the simulation and experiment proved the effectiveness of the proposed strategy.
Arun Dominic, D; Chelliah, Thanga Raj
2014-09-01
To obtain high dynamic performance on induction motor drives (IMD), variable voltage and variable frequency operation has to be performed by measuring speed of rotation and stator currents through sensors and fed back them to the controllers. When the sensors are undergone a fault, the stability of control system, may be designed for an industrial process, is disturbed. This paper studies the negative effects on a 12.5 hp induction motor drives when the field oriented control system is subjected to sensor faults. To illustrate the importance of this study mine hoist load diagram is considered as shaft load of the tested machine. The methods to recover the system from sensor faults are discussed. In addition, the various speed sensorless schemes are reviewed comprehensively. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Li, Zhixiong; Yan, Xinping; Wang, Xuping; Peng, Zhongxiao
2016-06-01
In the complex gear transmission systems, in wind turbines a crack is one of the most common failure modes and can be fatal to the wind turbine power systems. A single sensor may suffer with issues relating to its installation position and direction, resulting in the collection of weak dynamic responses of the cracked gear. A multi-channel sensor system is hence applied in the signal acquisition and the blind source separation (BSS) technologies are employed to optimally process the information collected from multiple sensors. However, literature review finds that most of the BSS based fault detectors did not address the dependence/correlation between different moving components in the gear systems; particularly, the popular used independent component analysis (ICA) assumes mutual independence of different vibration sources. The fault detection performance may be significantly influenced by the dependence/correlation between vibration sources. In order to address this issue, this paper presents a new method based on the supervised order tracking bounded component analysis (SOTBCA) for gear crack detection in wind turbines. The bounded component analysis (BCA) is a state of art technology for dependent source separation and is applied limitedly to communication signals. To make it applicable for vibration analysis, in this work, the order tracking has been appropriately incorporated into the BCA framework to eliminate the noise and disturbance signal components. Then an autoregressive (AR) model built with prior knowledge about the crack fault is employed to supervise the reconstruction of the crack vibration source signature. The SOTBCA only outputs one source signal that has the closest distance with the AR model. Owing to the dependence tolerance ability of the BCA framework, interfering vibration sources that are dependent/correlated with the crack vibration source could be recognized by the SOTBCA, and hence, only useful fault information could be preserved in the reconstructed signal. The crack failure thus could be precisely identified by the cyclic spectral correlation analysis. A series of numerical simulations and experimental tests have been conducted to illustrate the advantages of the proposed SOTBCA method for fatigue crack detection. Comparisons to three representative techniques, i.e. Erdogan's BCA (E-BCA), joint approximate diagonalization of eigen-matrices (JADE), and FastICA, have demonstrated the effectiveness of the SOTBCA. Hence the proposed approach is suitable for accurate gear crack detection in practical applications.
Fault-tolerant reactor protection system
Gaubatz, Donald C.
1997-01-01
A reactor protection system having four divisions, with quad redundant sensors for each scram parameter providing input to four independent microprocessor-based electronic chassis. Each electronic chassis acquires the scram parameter data from its own sensor, digitizes the information, and then transmits the sensor reading to the other three electronic chassis via optical fibers. To increase system availability and reduce false scrams, the reactor protection system employs two levels of voting on a need for reactor scram. The electronic chassis perform software divisional data processing, vote 2/3 with spare based upon information from all four sensors, and send the divisional scram signals to the hardware logic panel, which performs a 2/4 division vote on whether or not to initiate a reactor scram. Each chassis makes a divisional scram decision based on data from all sensors. Each division performs independently of the others (asynchronous operation). All communications between the divisions are asynchronous. Each chassis substitutes its own spare sensor reading in the 2/3 vote if a sensor reading from one of the other chassis is faulty or missing. Therefore the presence of at least two valid sensor readings in excess of a set point is required before terminating the output to the hardware logic of a scram inhibition signal even when one of the four sensors is faulty or when one of the divisions is out of service.
Fault-tolerant reactor protection system
Gaubatz, D.C.
1997-04-15
A reactor protection system is disclosed having four divisions, with quad redundant sensors for each scram parameter providing input to four independent microprocessor-based electronic chassis. Each electronic chassis acquires the scram parameter data from its own sensor, digitizes the information, and then transmits the sensor reading to the other three electronic chassis via optical fibers. To increase system availability and reduce false scrams, the reactor protection system employs two levels of voting on a need for reactor scram. The electronic chassis perform software divisional data processing, vote 2/3 with spare based upon information from all four sensors, and send the divisional scram signals to the hardware logic panel, which performs a 2/4 division vote on whether or not to initiate a reactor scram. Each chassis makes a divisional scram decision based on data from all sensors. Each division performs independently of the others (asynchronous operation). All communications between the divisions are asynchronous. Each chassis substitutes its own spare sensor reading in the 2/3 vote if a sensor reading from one of the other chassis is faulty or missing. Therefore the presence of at least two valid sensor readings in excess of a set point is required before terminating the output to the hardware logic of a scram inhibition signal even when one of the four sensors is faulty or when one of the divisions is out of service. 16 figs.
Simple Random Sampling-Based Probe Station Selection for Fault Detection in Wireless Sensor Networks
Huang, Rimao; Qiu, Xuesong; Rui, Lanlan
2011-01-01
Fault detection for wireless sensor networks (WSNs) has been studied intensively in recent years. Most existing works statically choose the manager nodes as probe stations and probe the network at a fixed frequency. This straightforward solution leads however to several deficiencies. Firstly, by only assigning the fault detection task to the manager node the whole network is out of balance, and this quickly overloads the already heavily burdened manager node, which in turn ultimately shortens the lifetime of the whole network. Secondly, probing with a fixed frequency often generates too much useless network traffic, which results in a waste of the limited network energy. Thirdly, the traditional algorithm for choosing a probing node is too complicated to be used in energy-critical wireless sensor networks. In this paper, we study the distribution characters of the fault nodes in wireless sensor networks, validate the Pareto principle that a small number of clusters contain most of the faults. We then present a Simple Random Sampling-based algorithm to dynamic choose sensor nodes as probe stations. A dynamic adjusting rule for probing frequency is also proposed to reduce the number of useless probing packets. The simulation experiments demonstrate that the algorithm and adjusting rule we present can effectively prolong the lifetime of a wireless sensor network without decreasing the fault detected rate. PMID:22163789
Huang, Rimao; Qiu, Xuesong; Rui, Lanlan
2011-01-01
Fault detection for wireless sensor networks (WSNs) has been studied intensively in recent years. Most existing works statically choose the manager nodes as probe stations and probe the network at a fixed frequency. This straightforward solution leads however to several deficiencies. Firstly, by only assigning the fault detection task to the manager node the whole network is out of balance, and this quickly overloads the already heavily burdened manager node, which in turn ultimately shortens the lifetime of the whole network. Secondly, probing with a fixed frequency often generates too much useless network traffic, which results in a waste of the limited network energy. Thirdly, the traditional algorithm for choosing a probing node is too complicated to be used in energy-critical wireless sensor networks. In this paper, we study the distribution characters of the fault nodes in wireless sensor networks, validate the Pareto principle that a small number of clusters contain most of the faults. We then present a Simple Random Sampling-based algorithm to dynamic choose sensor nodes as probe stations. A dynamic adjusting rule for probing frequency is also proposed to reduce the number of useless probing packets. The simulation experiments demonstrate that the algorithm and adjusting rule we present can effectively prolong the lifetime of a wireless sensor network without decreasing the fault detected rate.
NASA Technical Reports Server (NTRS)
Vanschalkwyk, Christiaan Mauritz
1991-01-01
Many applications require that a control system must be tolerant to the failure of its components. This is especially true for large space-based systems that must work unattended and with long periods between maintenance. Fault tolerance can be obtained by detecting the failure of the control system component, determining which component has failed, and reconfiguring the system so that the failed component is isolated from the controller. Component failure detection experiments that were conducted on an experimental space structure, the NASA Langley Mini-Mast are presented. Two methodologies for failure detection and isolation (FDI) exist that do not require the specification of failure modes and are applicable to both actuators and sensors. These methods are known as the Failure Detection Filter and the method of Generalized Parity Relations. The latter method was applied to three different sensor types on the Mini-Mast. Failures were simulated in input-output data that were recorded during operation of the Mini-Mast. Both single and double sensor parity relations were tested and the effect of several design parameters on the performance of these relations is discussed. The detection of actuator failures is also treated. It is shown that in all the cases it is possible to identify the parity relations directly from input-output data. Frequency domain analysis is used to explain the behavior of the parity relations.
A method based on multi-sensor data fusion for fault detection of planetary gearboxes.
Lei, Yaguo; Lin, Jing; He, Zhengjia; Kong, Detong
2012-01-01
Studies on fault detection and diagnosis of planetary gearboxes are quite limited compared with those of fixed-axis gearboxes. Different from fixed-axis gearboxes, planetary gearboxes exhibit unique behaviors, which invalidate fault diagnosis methods that work well for fixed-axis gearboxes. It is a fact that for systems as complex as planetary gearboxes, multiple sensors mounted on different locations provide complementary information on the health condition of the systems. On this basis, a fault detection method based on multi-sensor data fusion is introduced in this paper. In this method, two features developed for planetary gearboxes are used to characterize the gear health conditions, and an adaptive neuro-fuzzy inference system (ANFIS) is utilized to fuse all features from different sensors. In order to demonstrate the effectiveness of the proposed method, experiments are carried out on a planetary gearbox test rig, on which multiple accelerometers are mounted for data collection. The comparisons between the proposed method and the methods based on individual sensors show that the former achieves much higher accuracies in detecting planetary gearbox faults.
Advanced Information Processing System - Fault detection and error handling
NASA Technical Reports Server (NTRS)
Lala, J. H.
1985-01-01
The Advanced Information Processing System (AIPS) is designed to provide a fault tolerant and damage tolerant data processing architecture for a broad range of aerospace vehicles, including tactical and transport aircraft, and manned and autonomous spacecraft. A proof-of-concept (POC) system is now in the detailed design and fabrication phase. This paper gives an overview of a preliminary fault detection and error handling philosophy in AIPS.
Catastrophic Fault Recovery with Self-Reconfigurable Chips
NASA Technical Reports Server (NTRS)
Zheng, Will Hua; Marzwell, Neville I.; Chau, Savio N.
2006-01-01
Mission critical systems typically employ multi-string redundancy to cope with possible hardware failure. Such systems are only as fault tolerant as there are many redundant strings. Once a particular critical component exhausts its redundant spares, the multi-string architecture cannot tolerate any further hardware failure. This paper aims at addressing such catastrophic faults through the use of 'Self-Reconfigurable Chips' as a last resort effort to 'repair' a faulty critical component.
Fenix, A Fault Tolerant Programming Framework for MPI Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gamel, Marc; Teranihi, Keita; Valenzuela, Eric
2016-10-05
Fenix provides APIs to allow the users to add fault tolerance capability to MPI-based parallel programs in a transparent manner. Fenix-enabled programs can run through process failures during program execution using a pool of spare processes accommodated by Fenix.
Li, Xiao-Jian; Yang, Guang-Hong
2018-01-01
This paper is concerned with the adaptive decentralized fault-tolerant tracking control problem for a class of uncertain interconnected nonlinear systems with unknown strong interconnections. An algebraic graph theory result is introduced to address the considered interconnections. In addition, to achieve the desirable tracking performance, a neural-network-based robust adaptive decentralized fault-tolerant control (FTC) scheme is given to compensate the actuator faults and system uncertainties. Furthermore, via the Lyapunov analysis method, it is proven that all the signals of the resulting closed-loop system are semiglobally bounded, and the tracking errors of each subsystem exponentially converge to a compact set, whose radius is adjustable by choosing different controller design parameters. Finally, the effectiveness and advantages of the proposed FTC approach are illustrated with two simulated examples.
Advanced information processing system: Fault injection study and results
NASA Technical Reports Server (NTRS)
Burkhardt, Laura F.; Masotto, Thomas K.; Lala, Jaynarayan H.
1992-01-01
The objective of the AIPS program is to achieve a validated fault tolerant distributed computer system. The goals of the AIPS fault injection study were: (1) to present the fault injection study components addressing the AIPS validation objective; (2) to obtain feedback for fault removal from the design implementation; (3) to obtain statistical data regarding fault detection, isolation, and reconfiguration responses; and (4) to obtain data regarding the effects of faults on system performance. The parameters are described that must be varied to create a comprehensive set of fault injection tests, the subset of test cases selected, the test case measurements, and the test case execution. Both pin level hardware faults using a hardware fault injector and software injected memory mutations were used to test the system. An overview is provided of the hardware fault injector and the associated software used to carry out the experiments. Detailed specifications are given of fault and test results for the I/O Network and the AIPS Fault Tolerant Processor, respectively. The results are summarized and conclusions are given.
Depth optimal sorting networks resistant to k passive faults
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piotrow, M.
In this paper, we study the problem of constructing a sorting network that is tolerant to faults and whose running time (i.e. depth) is as small as possible. We consider the scenario of worst-case comparator faults and follow the model of passive comparator failure proposed by Yao and Yao, in which a faulty comparator outputs directly its inputs without comparison. Our main result is the first construction of an N-input, k-fault-tolerant sorting network that is of an asymptotically optimal depth {theta}(log N+k). That improves over the recent result of Leighton and Ma, whose network is of depth O(log N +more » k log log N/log k). Actually, we present a fault-tolerant correction network that can be added after any N-input sorting network to correct its output in the presence of at most k faulty comparators. Since the depth of the network is O(log N + k) and the constants hidden behind the {open_quotes}O{close_quotes} notation are not big, the construction can be of practical use. Developing the techniques necessary to show the main result, we construct a fault-tolerant network for the insertion problem. As a by-product, we get an N-input, O(log N)-depth INSERT-network that is tolerant to random faults, thereby answering a question posed by Ma in his PhD thesis. The results are based on a new notion of constant delay comparator networks, that is, networks in which each register is used (compared) only in a period of time of a constant length. Copies of such networks can be put one after another with only a constant increase in depth per copy.« less
An Uncertainty-Based Distributed Fault Detection Mechanism for Wireless Sensor Networks
Yang, Yang; Gao, Zhipeng; Zhou, Hang; Qiu, Xuesong
2014-01-01
Exchanging too many messages for fault detection will cause not only a degradation of the network quality of service, but also represents a huge burden on the limited energy of sensors. Therefore, we propose an uncertainty-based distributed fault detection through aided judgment of neighbors for wireless sensor networks. The algorithm considers the serious influence of sensing measurement loss and therefore uses Markov decision processes for filling in missing data. Most important of all, fault misjudgments caused by uncertainty conditions are the main drawbacks of traditional distributed fault detection mechanisms. We draw on the experience of evidence fusion rules based on information entropy theory and the degree of disagreement function to increase the accuracy of fault detection. Simulation results demonstrate our algorithm can effectively reduce communication energy overhead due to message exchanges and provide a higher detection accuracy ratio. PMID:24776937
Cost and benefits design optimization model for fault tolerant flight control systems
NASA Technical Reports Server (NTRS)
Rose, J.
1982-01-01
Requirements and specifications for a method of optimizing the design of fault-tolerant flight control systems are provided. Algorithms that could be used for developing new and modifying existing computer programs are also provided, with recommendations for follow-on work.
Software reliability models for fault-tolerant avionics computers and related topics
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1987-01-01
Software reliability research is briefly described. General research topics are reliability growth models, quality of software reliability prediction, the complete monotonicity property of reliability growth, conceptual modelling of software failure behavior, assurance of ultrahigh reliability, and analysis techniques for fault-tolerant systems.
Design and Experimental Validation for Direct-Drive Fault-Tolerant Permanent-Magnet Vernier Machines
Liu, Guohai; Yang, Junqin; Chen, Ming; Chen, Qian
2014-01-01
A fault-tolerant permanent-magnet vernier (FT-PMV) machine is designed for direct-drive applications, incorporating the merits of high torque density and high reliability. Based on the so-called magnetic gearing effect, PMV machines have the ability of high torque density by introducing the flux-modulation poles (FMPs). This paper investigates the fault-tolerant characteristic of PMV machines and provides a design method, which is able to not only meet the fault-tolerant requirements but also keep the ability of high torque density. The operation principle of the proposed machine has been analyzed. The design process and optimization are presented specifically, such as the combination of slots and poles, the winding distribution, and the dimensions of PMs and teeth. By using the time-stepping finite element method (TS-FEM), the machine performances are evaluated. Finally, the FT-PMV machine is manufactured, and the experimental results are presented to validate the theoretical analysis. PMID:25045729
Self-adaptive Fault-Tolerance of HLA-Based Simulations in the Grid Environment
NASA Astrophysics Data System (ADS)
Huang, Jijie; Chai, Xudong; Zhang, Lin; Li, Bo Hu
The objects of a HLA-based simulation can access model services to update their attributes. However, the grid server may be overloaded and refuse the model service to handle objects accesses. Because these objects have been accessed this model service during last simulation loop and their medium state are stored in this server, this may terminate the simulation. A fault-tolerance mechanism must be introduced into simulations. But the traditional fault-tolerance methods cannot meet the above needs because the transmission latency between a federate and the RTI in grid environment varies from several hundred milliseconds to several seconds. By adding model service URLs to the OMT and expanding the HLA services and model services with some interfaces, this paper proposes a self-adaptive fault-tolerance mechanism of simulations according to the characteristics of federates accessing model services. Benchmark experiments indicate that the expanded HLA/RTI can make simulations self-adaptively run in the grid environment.
Error rates and resource overheads of encoded three-qubit gates
NASA Astrophysics Data System (ADS)
Takagi, Ryuji; Yoder, Theodore J.; Chuang, Isaac L.
2017-10-01
A non-Clifford gate is required for universal quantum computation, and, typically, this is the most error-prone and resource-intensive logical operation on an error-correcting code. Small, single-qubit rotations are popular choices for this non-Clifford gate, but certain three-qubit gates, such as Toffoli or controlled-controlled-Z (ccz), are equivalent options that are also more suited for implementing some quantum algorithms, for instance, those with coherent classical subroutines. Here, we calculate error rates and resource overheads for implementing logical ccz with pieceable fault tolerance, a nontransversal method for implementing logical gates. We provide a comparison with a nonlocal magic-state scheme on a concatenated code and a local magic-state scheme on the surface code. We find the pieceable fault-tolerance scheme particularly advantaged over magic states on concatenated codes and in certain regimes over magic states on the surface code. Our results suggest that pieceable fault tolerance is a promising candidate for fault tolerance in a near-future quantum computer.
Reliability of Fault Tolerant Control Systems. Part 2
NASA Technical Reports Server (NTRS)
Wu, N. Eva
2000-01-01
This paper reports Part II of a two part effort that is intended to delineate the relationship between reliability and fault tolerant control in a quantitative manner. Reliability properties peculiar to fault-tolerant control systems are emphasized, such as the presence of analytic redundancy in high proportion, the dependence of failures on control performance, and high risks associated with decisions in redundancy management due to multiple sources of uncertainties and sometimes large processing requirements. As a consequence, coverage of failures through redundancy management can be severely limited. The paper proposes to formulate the fault tolerant control problem as an optimization problem that maximizes coverage of failures through redundancy management. Coverage modeling is attempted in a way that captures its dependence on the control performance and on the diagnostic resolution. Under the proposed redundancy management policy, it is shown that an enhanced overall system reliability can be achieved with a control law of a superior robustness, with an estimator of a higher resolution, and with a control performance requirement of a lesser stringency.
Nonuniform code concatenation for universal fault-tolerant quantum computing
NASA Astrophysics Data System (ADS)
Nikahd, Eesa; Sedighi, Mehdi; Saheb Zamani, Morteza
2017-09-01
Using transversal gates is a straightforward and efficient technique for fault-tolerant quantum computing. Since transversal gates alone cannot be computationally universal, they must be combined with other approaches such as magic state distillation, code switching, or code concatenation to achieve universality. In this paper we propose an alternative approach for universal fault-tolerant quantum computing, mainly based on the code concatenation approach proposed in [T. Jochym-O'Connor and R. Laflamme, Phys. Rev. Lett. 112, 010505 (2014), 10.1103/PhysRevLett.112.010505], but in a nonuniform fashion. The proposed approach is described based on nonuniform concatenation of the 7-qubit Steane code with the 15-qubit Reed-Muller code, as well as the 5-qubit code with the 15-qubit Reed-Muller code, which lead to two 49-qubit and 47-qubit codes, respectively. These codes can correct any arbitrary single physical error with the ability to perform a universal set of fault-tolerant gates, without using magic state distillation.
NASA Astrophysics Data System (ADS)
Takagi, R.; Okada, T.; Yoshida, K.; Townend, J.; Boese, C. M.; Baratin, L. M.; Chamberlain, C. J.; Savage, M. K.
2016-12-01
We estimate shear wave velocity anisotropy in shallow crust near the Alpine fault using seismic interferometry of borehole vertical arrays. We utilized four borehole observations: two sensors are deployed in two boreholes of the Deep Fault Drilling Project in the hanging wall side, and the other two sites are located in the footwall side. Surface sensors deployed just above each borehole are used to make vertical arrays. Crosscorrelating rotated horizontal seismograms observed by the borehole and surface sensors, we extracted polarized shear waves propagating from the bottom to the surface of each borehole. The extracted shear waves show polarization angle dependence of travel time, indicating shear wave anisotropy between the two sensors. In the hanging wall side, the estimated fast shear wave directions are parallel to the Alpine fault. Strong anisotropy of 20% is observed at the site within 100 m from the Alpine fault. The hanging wall consists of mylonite and schist characterized by fault parallel foliation. In addition, an acoustic borehole imaging reveals fractures parallel to the Alpine fault. The fault parallel anisotropy suggest structural anisotropy is predominant in the hanging wall, demonstrating consistency of geological and seismological observations. In the footwall side, on the other hand, the angle between the fast direction and the strike of the Alpine fault is 33-40 degrees. Since the footwall is composed of granitoid that may not have planar structure, stress induced anisotropy is possibly predominant. The direction of maximum horizontal stress (SHmax) estimated by focal mechanisms of regional earthquakes is 55 degrees of the Alpine fault. Possible interpretation of the difference between the fast direction and SHmax direction is depth rotation of stress field near the Alpine fault. Similar depth rotation of stress field is also observed in the SAFOD borehole at the San Andreas fault.
NASA Technical Reports Server (NTRS)
Bobin, V.; Whitaker, S.
1990-01-01
This paper reports a design technique to make Complex CMOS Gates fail-safe for a class of faults. Two classes of faults are defined. The fail-safe design presented has limited fault-tolerance capability. Multiple faults are also covered.
Data-based fault-tolerant control for affine nonlinear systems with actuator faults.
Xie, Chun-Hua; Yang, Guang-Hong
2016-09-01
This paper investigates the fault-tolerant control (FTC) problem for unknown nonlinear systems with actuator faults including stuck, outage, bias and loss of effectiveness. The upper bounds of stuck faults, bias faults and loss of effectiveness faults are unknown. A new data-based FTC scheme is proposed. It consists of the online estimations of the bounds and a state-dependent function. The estimations are adjusted online to compensate automatically the actuator faults. The state-dependent function solved by using real system data helps to stabilize the system. Furthermore, all signals in the resulting closed-loop system are uniformly bounded and the states converge asymptotically to zero. Compared with the existing results, the proposed approach is data-based. Finally, two simulation examples are provided to show the effectiveness of the proposed approach. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
Advanced I&C for Fault-Tolerant Supervisory Control of Small Modular Reactors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cole, Daniel G.
In this research, we have developed a supervisory control approach to enable automated control of SMRs. By design the supervisory control system has an hierarchical, interconnected, adaptive control architecture. A considerable advantage to this architecture is that it allows subsystems to communicate at different/finer granularity, facilitates monitoring of process at the modular and plant levels, and enables supervisory control. We have investigated the deployment of automation, monitoring, and data collection technologies to enable operation of multiple SMRs. Each unit's controller collects and transfers information from local loops and optimize that unit’s parameters. Information is passed from the each SMR unitmore » controller to the supervisory controller, which supervises the actions of SMR units and manage plant processes. The information processed at the supervisory level will provide operators the necessary information needed for reactor, unit, and plant operation. In conjunction with the supervisory effort, we have investigated techniques for fault-tolerant networks, over which information is transmitted between local loops and the supervisory controller to maintain a safe level of operational normalcy in the presence of anomalies. The fault-tolerance of the supervisory control architecture, the network that supports it, and the impact of fault-tolerance on multi-unit SMR plant control has been a second focus of this research. To this end, we have investigated the deployment of advanced automation, monitoring, and data collection and communications technologies to enable operation of multiple SMRs. We have created a fault-tolerant multi-unit SMR supervisory controller that collects and transfers information from local loops, supervise their actions, and adaptively optimize the controller parameters. The goal of this research has been to develop the methodologies and procedures for fault-tolerant supervisory control of small modular reactors. To achieve this goal, we have identified the following objectives. These objective are an ordered approach to the research: I) Development of a supervisory digital I&C system II) Fault-tolerance of the supervisory control architecture III) Automated decision making and online monitoring.« less
NASA Astrophysics Data System (ADS)
Ran, Dechao; Chen, Xiaoqian; de Ruiter, Anton; Xiao, Bing
2018-04-01
This study presents an adaptive second-order sliding control scheme to solve the attitude fault tolerant control problem of spacecraft subject to system uncertainties, external disturbances and reaction wheel faults. A novel fast terminal sliding mode is preliminarily designed to guarantee that finite-time convergence of the attitude errors can be achieved globally. Based on this novel sliding mode, an adaptive second-order observer is then designed to reconstruct the system uncertainties and the actuator faults. One feature of the proposed observer is that the design of the observer does not necessitate any priori information of the upper bounds of the system uncertainties and the actuator faults. In view of the reconstructed information supplied by the designed observer, a second-order sliding mode controller is developed to accomplish attitude maneuvers with great robustness and precise tracking accuracy. Theoretical stability analysis proves that the designed fault tolerant control scheme can achieve finite-time stability of the closed-loop system, even in the presence of reaction wheel faults and system uncertainties. Numerical simulations are also presented to demonstrate the effectiveness and superiority of the proposed control scheme over existing methodologies.
Low-Power Fault Tolerance for Spacecraft FPGA-Based Numerical Computing
2006-09-01
Ranganathan , “Power Management – Guest Lecture for CS4135, NPS,” Naval Postgraduate School, Nov 2004 [32] R. L. Phelps, “Operational Experiences with the...4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188) Washington DC 20503. 1. AGENCY USE ONLY (Leave blank) 2...undesirable, are not necessarily harmful. Our intent is to prevent errors by properly managing faults. This research focuses on developing fault-tolerant
A high precision position sensor design and its signal processing algorithm for a maglev train.
Xue, Song; Long, Zhiqiang; He, Ning; Chang, Wensen
2012-01-01
High precision positioning technology for a kind of high speed maglev train with an electromagnetic suspension (EMS) system is studied. At first, the basic structure and functions of the position sensor are introduced and some key techniques to enhance the positioning precision are designed. Then, in order to further improve the positioning signal quality and the fault-tolerant ability of the sensor, a new kind of discrete-time tracking differentiator (TD) is proposed based on nonlinear optimal control theory. This new TD has good filtering and differentiating performances and a small calculation load. It is suitable for real-time signal processing. The stability, convergence property and frequency characteristics of the TD are studied and analyzed thoroughly. The delay constant of the TD is figured out and an effective time delay compensation algorithm is proposed. Based on the TD technology, a filtering process is introduced in to improve the positioning signal waveform when the sensor is under bad working conditions, and a two-sensor switching algorithm is designed to eliminate the positioning errors caused by the joint gaps of the long stator. The effectiveness and stability of the sensor and its signal processing algorithms are proved by the experiments on a test train during a long-term test run.
A High Precision Position Sensor Design and Its Signal Processing Algorithm for a Maglev Train
Xue, Song; Long, Zhiqiang; He, Ning; Chang, Wensen
2012-01-01
High precision positioning technology for a kind of high speed maglev train with an electromagnetic suspension (EMS) system is studied. At first, the basic structure and functions of the position sensor are introduced and some key techniques to enhance the positioning precision are designed. Then, in order to further improve the positioning signal quality and the fault-tolerant ability of the sensor, a new kind of discrete-time tracking differentiator (TD) is proposed based on nonlinear optimal control theory. This new TD has good filtering and differentiating performances and a small calculation load. It is suitable for real-time signal processing. The stability, convergence property and frequency characteristics of the TD are studied and analyzed thoroughly. The delay constant of the TD is figured out and an effective time delay compensation algorithm is proposed. Based on the TD technology, a filtering process is introduced in to improve the positioning signal waveform when the sensor is under bad working conditions, and a two-sensor switching algorithm is designed to eliminate the positioning errors caused by the joint gaps of the long stator. The effectiveness and stability of the sensor and its signal processing algorithms are proved by the experiments on a test train during a long-term test run. PMID:22778582
NASA Astrophysics Data System (ADS)
Hassanabadi, Amir Hossein; Shafiee, Masoud; Puig, Vicenc
2018-01-01
In this paper, sensor fault diagnosis of a singular delayed linear parameter varying (LPV) system is considered. In the considered system, the model matrices are dependent on some parameters which are real-time measurable. The case of inexact parameter measurements is considered which is close to real situations. Fault diagnosis in this system is achieved via fault estimation. For this purpose, an augmented system is created by including sensor faults as additional system states. Then, an unknown input observer (UIO) is designed which estimates both the system states and the faults in the presence of measurement noise, disturbances and uncertainty induced by inexact measured parameters. Error dynamics and the original system constitute an uncertain system due to inconsistencies between real and measured values of the parameters. Then, the robust estimation of the system states and the faults are achieved with H∞ performance and formulated with a set of linear matrix inequalities (LMIs). The designed UIO is also applicable for fault diagnosis of singular delayed LPV systems with unmeasurable scheduling variables. The efficiency of the proposed approach is illustrated with an example.
Sensor Data Fusion with Z-Numbers and Its Application in Fault Diagnosis
Jiang, Wen; Xie, Chunhe; Zhuang, Miaoyan; Shou, Yehang; Tang, Yongchuan
2016-01-01
Sensor data fusion technology is widely employed in fault diagnosis. The information in a sensor data fusion system is characterized by not only fuzziness, but also partial reliability. Uncertain information of sensors, including randomness, fuzziness, etc., has been extensively studied recently. However, the reliability of a sensor is often overlooked or cannot be analyzed adequately. A Z-number, Z = (A, B), can represent the fuzziness and the reliability of information simultaneously, where the first component A represents a fuzzy restriction on the values of uncertain variables and the second component B is a measure of the reliability of A. In order to model and process the uncertainties in a sensor data fusion system reasonably, in this paper, a novel method combining the Z-number and Dempster–Shafer (D-S) evidence theory is proposed, where the Z-number is used to model the fuzziness and reliability of the sensor data and the D-S evidence theory is used to fuse the uncertain information of Z-numbers. The main advantages of the proposed method are that it provides a more robust measure of reliability to the sensor data, and the complementary information of multi-sensors reduces the uncertainty of the fault recognition, thus enhancing the reliability of fault detection. PMID:27649193
Method and system for diagnostics of apparatus
NASA Technical Reports Server (NTRS)
Gorinevsky, Dimitry (Inventor)
2012-01-01
Proposed is a method, implemented in software, for estimating fault state of an apparatus outfitted with sensors. At each execution period the method processes sensor data from the apparatus to obtain a set of parity parameters, which are further used for estimating fault state. The estimation method formulates a convex optimization problem for each fault hypothesis and employs a convex solver to compute fault parameter estimates and fault likelihoods for each fault hypothesis. The highest likelihoods and corresponding parameter estimates are transmitted to a display device or an automated decision and control system. The obtained accurate estimate of fault state can be used to improve safety, performance, or maintenance processes for the apparatus.
Fault detection and isolation for complex system
NASA Astrophysics Data System (ADS)
Jing, Chan Shi; Bayuaji, Luhur; Samad, R.; Mustafa, M.; Abdullah, N. R. H.; Zain, Z. M.; Pebrianti, Dwi
2017-07-01
Fault Detection and Isolation (FDI) is a method to monitor, identify, and pinpoint the type and location of system fault in a complex multiple input multiple output (MIMO) non-linear system. A two wheel robot is used as a complex system in this study. The aim of the research is to construct and design a Fault Detection and Isolation algorithm. The proposed method for the fault identification is using hybrid technique that combines Kalman filter and Artificial Neural Network (ANN). The Kalman filter is able to recognize the data from the sensors of the system and indicate the fault of the system in the sensor reading. Error prediction is based on the fault magnitude and the time occurrence of fault. Additionally, Artificial Neural Network (ANN) is another algorithm used to determine the type of fault and isolate the fault in the system.
Fault-tolerant computer study. [logic designs for building block circuits
NASA Technical Reports Server (NTRS)
Rennels, D. A.; Avizienis, A. A.; Ercegovac, M. D.
1981-01-01
A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed.
Jin, Long; Liao, Bolin; Liu, Mei; Xiao, Lin; Guo, Dongsheng; Yan, Xiaogang
2017-01-01
By incorporating the physical constraints in joint space, a different-level simultaneous minimization scheme, which takes both the robot kinematics and robot dynamics into account, is presented and investigated for fault-tolerant motion planning of redundant manipulator in this paper. The scheme is reformulated as a quadratic program (QP) with equality and bound constraints, which is then solved by a discrete-time recurrent neural network. Simulative verifications based on a six-link planar redundant robot manipulator substantiate the efficacy and accuracy of the presented acceleration fault-tolerant scheme, the resultant QP and the corresponding discrete-time recurrent neural network. PMID:28955217
Self-stabilizing byzantine-fault-tolerant clock synchronization system and method
NASA Technical Reports Server (NTRS)
Malekpour, Mahyar R. (Inventor)
2012-01-01
Systems and methods for rapid Byzantine-fault-tolerant self-stabilizing clock synchronization are provided. The systems and methods are based on a protocol comprising a state machine and a set of monitors that execute once every local oscillator tick. The protocol is independent of specific application specific requirements. The faults are assumed to be arbitrary and/or malicious. All timing measures of variables are based on the node's local clock and thus no central clock or externally generated pulse is used. Instances of the protocol are shown to tolerate bursts of transient failures and deterministically converge with a linear convergence time with respect to the synchronization period as predicted.
NASA Astrophysics Data System (ADS)
Ishak, D.; Zhu, Z. Q.; Howe, D.
2005-05-01
The electromagnetic performance of fault-tolerant three-phase permanent magnet brushless dc motors, in which the wound teeth are wider than the unwound teeth and their tooth tips span approximately one pole pitch and which have similar numbers of slots and poles, is investigated. It is shown that they have a more trapezoidal phase back-emf wave form, a higher torque capability, and a lower torque ripple than similar fault-tolerant machines with equal tooth widths. However, these benefits gradually diminish as the pole number is increased, due to the effect of interpole leakage flux.
Fault Tolerant Frequent Pattern Mining
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shohdy, Sameh; Vishnu, Abhinav; Agrawal, Gagan
FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large scale datasets. While several researchers have designed distributed memory FP-Growth algorithms, it is pivotal to consider fault tolerant FP-Growth, which can address the increasing fault rates in large scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and MPI advanced features to guarantee an O(1) space complexity, achieved by using the dataset memory space itself for checkpointing. We also propose a recovery algorithm that can use in-memory and disk-based checkpointing,more » though in many cases the recovery can be completed without any disk access, and incurring no memory overhead for checkpointing. We evaluate our FT algorithm on a large scale InfiniBand cluster with several large datasets using up to 2K cores. Our evaluation demonstrates excellent efficiency for checkpointing and recovery in comparison to the disk-based approach. We have also observed 20x average speed-up in comparison to Spark, establishing that a well designed algorithm can easily outperform a solution based on a general fault-tolerant programming model.« less
Application of the Systematic Sensor Selection Strategy for Turbofan Engine Diagnostics
NASA Technical Reports Server (NTRS)
Sowers, T. Shane; Kopasakis, George; Simon, Donald L.
2008-01-01
The data acquired from available system sensors forms the foundation upon which any health management system is based, and the available sensor suite directly impacts the overall diagnostic performance that can be achieved. While additional sensors may provide improved fault diagnostic performance, there are other factors that also need to be considered such as instrumentation cost, weight, and reliability. A systematic sensor selection approach is desired to perform sensor selection from a holistic system-level perspective as opposed to performing decisions in an ad hoc or heuristic fashion. The Systematic Sensor Selection Strategy is a methodology that optimally selects a sensor suite from a pool of sensors based on the system fault diagnostic approach, with the ability of taking cost, weight, and reliability into consideration. This procedure was applied to a large commercial turbofan engine simulation. In this initial study, sensor suites tailored for improved diagnostic performance are constructed from a prescribed collection of candidate sensors. The diagnostic performance of the best performing sensor suites in terms of fault detection and identification are demonstrated, with a discussion of the results and implications for future research.
Application of the Systematic Sensor Selection Strategy for Turbofan Engine Diagnostics
NASA Technical Reports Server (NTRS)
Sowers, T. Shane; Kopasakis, George; Simon, Donald L.
2008-01-01
The data acquired from available system sensors forms the foundation upon which any health management system is based, and the available sensor suite directly impacts the overall diagnostic performance that can be achieved. While additional sensors may provide improved fault diagnostic performance there are other factors that also need to be considered such as instrumentation cost, weight, and reliability. A systematic sensor selection approach is desired to perform sensor selection from a holistic system-level perspective as opposed to performing decisions in an ad hoc or heuristic fashion. The Systematic Sensor Selection Strategy is a methodology that optimally selects a sensor suite from a pool of sensors based on the system fault diagnostic approach, with the ability of taking cost, weight and reliability into consideration. This procedure was applied to a large commercial turbofan engine simulation. In this initial study, sensor suites tailored for improved diagnostic performance are constructed from a prescribed collection of candidate sensors. The diagnostic performance of the best performing sensor suites in terms of fault detection and identification are demonstrated, with a discussion of the results and implications for future research.
Integrated microelectronics for smart textiles.
Lauterbach, Christl; Glaser, Rupert; Savio, Domnic; Schnell, Markus; Weber, Werner
2005-01-01
The combination of textile fabrics with microelectronics will lead to completely new applications, thus achieving elements of ambient intelligence. The integration of sensor or actuator networks, using fabrics with conductive fibres as a textile motherboard enable the fabrication of large active areas. In this paper we describe an integration technology for the fabrication of a "smart textile" based on a wired peer-to-peer network of microcontrollers with integrated sensors or actuators. A self-organizing and fault-tolerant architecture is accomplished which detects the physical shape of the network. Routing paths are formed for data transmission, automatically circumventing defective or missing areas. The network architecture allows the smart textiles to be produced by reel-to-reel processes, cut into arbitrary shapes subsequently and implemented in systems at low installation costs. The possible applications are manifold, ranging from alarm systems to intelligent guidance systems, passenger recognition in car seats, air conditioning control in interior lining and smart wallpaper with software-defined light switches.
NASA Technical Reports Server (NTRS)
Clune, E.; Segall, Z.; Siewiorek, D.
1984-01-01
A program of experiments has been conducted at NASA-Langley to test the fault-free performance of a Fault-Tolerant Multiprocessor (FTMP) avionics system for next-generation aircraft. Baseline measurements of an operating FTMP system were obtained with respect to the following parameters: instruction execution time, frame size, and the variation of clock ticks. The mechanisms of frame stretching were also investigated. The experimental results are summarized in a table. Areas of interest for future tests are identified, with emphasis given to the implementation of a synthetic workload generation mechanism on FTMP.
Fault tolerant features and experiments of ANTS distributed real-time system
NASA Astrophysics Data System (ADS)
Dominic-Savio, Patrick; Lo, Jien-Chung; Tufts, Donald W.
1995-01-01
The ANTS project at the University of Rhode Island introduces the concept of Active Nodal Task Seeking (ANTS) as a way to efficiently design and implement dependable, high-performance, distributed computing. This paper presents the fault tolerant design features that have been incorporated in the ANTS experimental system implementation. The results of performance evaluations and fault injection experiments are reported. The fault-tolerant version of ANTS categorizes all computing nodes into three groups. They are: the up-and-running green group, the self-diagnosing yellow group and the failed red group. Each available computing node will be placed in the yellow group periodically for a routine diagnosis. In addition, for long-life missions, ANTS uses a monitoring scheme to identify faulty computing nodes. In this monitoring scheme, the communication pattern of each computing node is monitored by two other nodes.
Robust Gain-Scheduled Fault Tolerant Control for a Transport Aircraft
NASA Technical Reports Server (NTRS)
Shin, Jong-Yeob; Gregory, Irene
2007-01-01
This paper presents an application of robust gain-scheduled control concepts using a linear parameter-varying (LPV) control synthesis method to design fault tolerant controllers for a civil transport aircraft. To apply the robust LPV control synthesis method, the nonlinear dynamics must be represented by an LPV model, which is developed using the function substitution method over the entire flight envelope. The developed LPV model associated with the aerodynamic coefficient uncertainties represents nonlinear dynamics including those outside the equilibrium manifold. Passive and active fault tolerant controllers (FTC) are designed for the longitudinal dynamics of the Boeing 747-100/200 aircraft in the presence of elevator failure. Both FTC laws are evaluated in the full nonlinear aircraft simulation in the presence of the elevator fault and the results are compared to show pros and cons of each control law.
Li, Yunji; Wu, QingE; Peng, Li
2018-01-23
In this paper, a synthesized design of fault-detection filter and fault estimator is considered for a class of discrete-time stochastic systems in the framework of event-triggered transmission scheme subject to unknown disturbances and deception attacks. A random variable obeying the Bernoulli distribution is employed to characterize the phenomena of the randomly occurring deception attacks. To achieve a fault-detection residual is only sensitive to faults while robust to disturbances, a coordinate transformation approach is exploited. This approach can transform the considered system into two subsystems and the unknown disturbances are removed from one of the subsystems. The gain of fault-detection filter is derived by minimizing an upper bound of filter error covariance. Meanwhile, system faults can be reconstructed by the remote fault estimator. An recursive approach is developed to obtain fault estimator gains as well as guarantee the fault estimator performance. Furthermore, the corresponding event-triggered sensor data transmission scheme is also presented for improving working-life of the wireless sensor node when measurement information are aperiodically transmitted. Finally, a scaled version of an industrial system consisting of local PC, remote estimator and wireless sensor node is used to experimentally evaluate the proposed theoretical results. In particular, a novel fault-alarming strategy is proposed so that the real-time capacity of fault-detection is guaranteed when the event condition is triggered.
Health Monitoring of a Satellite System
NASA Technical Reports Server (NTRS)
Chen, Robert H.; Ng, Hok K.; Speyer, Jason L.; Guntur, Lokeshkumar S.; Carpenter, Russell
2004-01-01
A health monitoring system based on analytical redundancy is developed for satellites on elliptical orbits. First, the dynamics of the satellite including orbital mechanics and attitude dynamics is modelled as a periodic system. Then, periodic fault detection filters are designed to detect and identify the satellite's actuator and sensor faults. In addition, parity equations are constructed using the algebraic redundant relationship among the actuators and sensors. Furthermore, a residual processor is designed to generate the probability of each of the actuator and sensor faults by using a sequential probability test. Finally, the health monitoring system, consisting of periodic fault detection lters, parity equations and residual processor, is evaluated in the simulation in the presence of disturbances and uncertainty.
Vehicle Fault Diagnose Based on Smart Sensor
NASA Astrophysics Data System (ADS)
Zhining, Li; Peng, Wang; Jianmin, Mei; Jianwei, Li; Fei, Teng
In the vehicle's traditional fault diagnose system, we usually use a computer system with a A/D card and with many sensors connected to it. The disadvantage of this system is that these sensor can hardly be shared with control system and other systems, there are too many connect lines and the electro magnetic compatibility(EMC) will be affected. In this paper, smart speed sensor, smart acoustic press sensor, smart oil press sensor, smart acceleration sensor and smart order tracking sensor were designed to solve this problem. With the CAN BUS these smart sensors, fault diagnose computer and other computer could be connected together to establish a network system which can monitor and control the vehicle's diesel and other system without any duplicate sensor. The hard and soft ware of the smart sensor system was introduced, the oil press, vibration and acoustic signal are resampled by constant angle increment to eliminate the influence of the rotate speed. After the resample, the signal in every working cycle could be averaged in angle domain and do other analysis like order spectrum.
NASA Technical Reports Server (NTRS)
Wilson, Edward; Sutter, David W.; Berkovitz, Dustin; Betts, Bradley J.; Kong, Edmund; delMundo, Rommel; Lages, Christopher R.; Mah, Robert W.; Papasin, Richard
2003-01-01
By analyzing the motions of a thruster-controlled spacecraft, it is possible to provide on-line (1) thruster fault detection and isolation (FDI), and (2) vehicle mass- and thruster-property identification (ID). Technologies developed recently at NASA Ames have significantly improved the speed and accuracy of these ID and FDI capabilities, making them feasible for application to a broad class of spacecraft. Since these technologies use existing sensors, the improved system robustness and performance that comes with the thruster fault tolerance and system ID can be achieved through a software-only implementation. This contrasts with the added cost, mass, and hardware complexity commonly required by FDI. Originally developed in partnership with NASA - Johnson Space Center to provide thruster FDI capability for the X-38 during re-entry, these technologies are most recently being applied to the MIT SPHERES experimental spacecraft to fly on the International Space Station in 2004. The model-based FDI uses a maximum-likelihood calculation at its core, while the ID is based upon recursive least squares estimation. Flight test results from the SPHERES implementation, as flown aboard the NASA KC-1 35A 0-g simulator aircraft in November 2003 are presented.
Definition and trade-off study of reconfigurable airborne digital computer system organizations
NASA Technical Reports Server (NTRS)
Conn, R. B.
1974-01-01
A highly-reliable, fault-tolerant reconfigurable computer system for aircraft applications was developed. The development and application reliability and fault-tolerance assessment techniques are described. Particular emphasis is placed on the needs of an all-digital, fly-by-wire control system appropriate for a passenger-carrying airplane.
Current Sensor Fault Diagnosis Based on a Sliding Mode Observer for PMSM Driven Systems
Huang, Gang; Luo, Yi-Ping; Zhang, Chang-Fan; Huang, Yi-Shan; Zhao, Kai-Hui
2015-01-01
This paper proposes a current sensor fault detection method based on a sliding mode observer for the torque closed-loop control system of interior permanent magnet synchronous motors. First, a sliding mode observer based on the extended flux linkage is built to simplify the motor model, which effectively eliminates the phenomenon of salient poles and the dependence on the direct axis inductance parameter, and can also be used for real-time calculation of feedback torque. Then a sliding mode current observer is constructed in αβ coordinates to generate the fault residuals of the phase current sensors. The method can accurately identify abrupt gain faults and slow-variation offset faults in real time in faulty sensors, and the generated residuals of the designed fault detection system are not affected by the unknown input, the structure of the observer, and the theoretical derivation and the stability proof process are concise and simple. The RT-LAB real-time simulation is used to build a simulation model of the hardware in the loop. The simulation and experimental results demonstrate the feasibility and effectiveness of the proposed method. PMID:25970258
Zhang, Dapeng; Long, Zhiqiang; Xue, Song; Zhang, Junge
2012-01-01
This paper studies an absolute positioning sensor for a high-speed maglev train and its fault diagnosis method. The absolute positioning sensor is an important sensor for the high-speed maglev train to accomplish its synchronous traction. It is used to calibrate the error of the relative positioning sensor which is used to provide the magnetic phase signal. On the basis of the analysis for the principle of the absolute positioning sensor, the paper describes the design of the sending and receiving coils and realizes the hardware and the software for the sensor. In order to enhance the reliability of the sensor, a support vector machine is used to recognize the fault characters, and the signal flow method is used to locate the faulty parts. The diagnosis information not only can be sent to an upper center control computer to evaluate the reliability of the sensors, but also can realize on-line diagnosis for debugging and the quick detection when the maglev train is off-line. The absolute positioning sensor we study has been used in the actual project.
Advanced information processing system
NASA Technical Reports Server (NTRS)
Lala, J. H.
1984-01-01
Design and performance details of the advanced information processing system (AIPS) for fault and damage tolerant data processing on aircraft and spacecraft are presented. AIPS comprises several computers distributed throughout the vehicle and linked by a damage tolerant data bus. Most I/O functions are available to all the computers, which run in a TDMA mode. Each computer performs separate specific tasks in normal operation and assumes other tasks in degraded modes. Redundant software assures that all fault monitoring, logging and reporting are automated, together with control functions. Redundant duplex links and damage-spread limitation provide the fault tolerance. Details of an advanced design of a laboratory-scale proof-of-concept system are described, including functional operations.
Fault detection and diagnosis in a spacecraft attitude determination system
NASA Astrophysics Data System (ADS)
Pirmoradi, F. N.; Sassani, F.; de Silva, C. W.
2009-09-01
This paper presents a new scheme for fault detection and diagnosis (FDD) in spacecraft attitude determination (AD) sensors. An integrated attitude determination system, which includes measurements of rate and angular position using rate gyros and vector sensors, is developed. Measurement data from all sensors are fused by a linearized Kalman filter, which is designed based on the system kinematics, to provide attitude estimation and the values of the gyro bias. Using this information the erroneous sensor measurements are corrected, and unbounded sensor measurement errors are avoided. The resulting bias-free data are used in the FDD scheme. The FDD algorithm uses model-based state estimation, combining the information from the rotational dynamics and kinematics of a spacecraft with the sensor measurements to predict the future sensor outputs. Fault isolation is performed through extended Kalman filters (EKFs). The innovation sequences of EKFs are monitored by several statistical tests to detect the presence of a failure and to localize the failures in all AD sensors. The isolation procedure is developed in two phases. In the first phase, two EKFs are designed, which use subsets of measurements to provide state estimates and form residuals, which are used to verify the source of the fault. In the second phase of isolation, testing of multiple hypotheses is performed. The generalized likelihood ratio test is utilized to identify the faulty components. In the scheme developed in this paper a relatively small number of hypotheses is used, which results in faster isolation and highly distinguishable fault signatures. An important feature of the developed FDD scheme is that it can provide attitude estimations even if only one type of sensors is functioning properly.
Fault-tolerance of a neural network solving the traveling salesman problem
NASA Technical Reports Server (NTRS)
Protzel, P.; Palumbo, D.; Arras, M.
1989-01-01
This study presents the results of a fault-injection experiment that stimulates a neural network solving the Traveling Salesman Problem (TSP). The network is based on a modified version of Hopfield's and Tank's original method. We define a performance characteristic for the TSP that allows an overall assessment of the solution quality for different city-distributions and problem sizes. Five different 10-, 20-, and 30- city cases are sued for the injection of up to 13 simultaneous stuck-at-0 and stuck-at-1 faults. The results of more than 4000 simulation-runs show the extreme fault-tolerance of the network, especially with respect to stuck-at-0 faults. One possible explanation for the overall surprising result is the redundancy of the problem representation.
Software reliability through fault-avoidance and fault-tolerance
NASA Technical Reports Server (NTRS)
Vouk, Mladen A.; Mcallister, David F.
1993-01-01
Strategies and tools for the testing, risk assessment and risk control of dependable software-based systems were developed. Part of this project consists of studies to enable the transfer of technology to industry, for example the risk management techniques for safety-concious systems. Theoretical investigations of Boolean and Relational Operator (BRO) testing strategy were conducted for condition-based testing. The Basic Graph Generation and Analysis tool (BGG) was extended to fully incorporate several variants of the BRO metric. Single- and multi-phase risk, coverage and time-based models are being developed to provide additional theoretical and empirical basis for estimation of the reliability and availability of large, highly dependable software. A model for software process and risk management was developed. The use of cause-effect graphing for software specification and validation was investigated. Lastly, advanced software fault-tolerance models were studied to provide alternatives and improvements in situations where simple software fault-tolerance strategies break down.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fang, Aiman; Laguna, Ignacio; Sato, Kento
Future high-performance computing systems may face frequent failures with their rapid increase in scale and complexity. Resilience to faults has become a major challenge for large-scale applications running on supercomputers, which demands fault tolerance support for prevalent MPI applications. Among failure scenarios, process failures are one of the most severe issues as they usually lead to termination of applications. However, the widely used MPI implementations do not provide mechanisms for fault tolerance. We propose FTA-MPI (Fault Tolerance Assistant MPI), a programming model that provides support for failure detection, failure notification and recovery. Specifically, FTA-MPI exploits a try/catch model that enablesmore » failure localization and transparent recovery of process failures in MPI applications. We demonstrate FTA-MPI with synthetic applications and a molecular dynamics code CoMD, and show that FTA-MPI provides high programmability for users and enables convenient and flexible recovery of process failures.« less
Dong, Hongyang; Hu, Qinglei; Ma, Guangfu
2016-03-01
Study results of developing control system for spacecraft formation proximity operations between a target and a chaser are presented. In particular, a coupled model using dual quaternion is employed to describe the proximity problem of spacecraft formation, and a nonlinear adaptive fault-tolerant feedback control law is developed to enable the chaser spacecraft to track the position and attitude of the target even though its actuator occurs fault. Multiple-task capability of the proposed control system is further demonstrated in the presence of disturbances and parametric uncertainties as well. In addition, the practical finite-time stability feature of the closed-loop system is guaranteed theoretically under the designed control law. Numerical simulation of the proposed method is presented to demonstrate the advantages with respect to interference suppression, fast tracking, fault tolerant and practical finite-time stability. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Magar, Kaman Thapa; Reich, Gregory W; Kondash, Corey; Slinker, Keith; Pankonien, Alexander M; Baur, Jeffery W; Smyers, Brian
2016-11-10
Distributed arrays of artificial hair sensors have bio-like sensing capabilities to obtain spatial and temporal surface flow information which is an important aspect of an effective fly-by-feel system. The spatiotemporal surface flow measurement enables further exploration of additional flow features such as flow stagnation, separation, and reattachment points. Due to their inherent robustness and fault tolerant capability, distributed arrays of hair sensors are well equipped to assess the aerodynamic and flow states in adverse conditions. In this paper, a local flow measurement from an array of artificial hair sensors in a wind tunnel experiment is used with a feedforward artificial neural network to predict aerodynamic parameters such as lift coefficient, moment coefficient, free-stream velocity, and angle of attack on an airfoil. We find the prediction error within 6% and 10% for lift and moment coefficients. The error for free-stream velocity and angle of attack were within 0.12 mph and 0.37 degrees. Knowledge of these parameters are key to finding the real time forces and moments which paves the way for effective control design to increase flight agility, stability, and maneuverability.
NASA Technical Reports Server (NTRS)
Ly, U. L.; Ho, J. K.
1986-01-01
A systematic procedure for the synthesis of fault tolerant control laws to actuator failure has been presented. Two design methods were used to synthesize fault tolerant controllers: the conventional LQ design method and a direct feedback controller design method SANDY. The latter method is used primarily to streamline the full-state Q feedback design into a practical implementable output feedback controller structure. To achieve robustness to control actuator failure, the redundant surfaces are properly balanced according to their control effectiveness. A simple gain schedule based on the landing gear up/down logic involving only three gains was developed to handle three design flight conditions: Mach .25 and Mach .60 at 5000 ft and Mach .90 at 20,000 ft. The fault tolerant control law developed in this study provides good stability augmentation and performance for the relaxed static stability aircraft. The augmented aircraft responses are found to be invariant to the presence of a failure. Furthermore, single-loop stability margins of +6 dB in gain and +30 deg in phase were achieved along with -40 dB/decade rolloff at high frequency.
High-Threshold Fault-Tolerant Quantum Computation with Analog Quantum Error Correction
NASA Astrophysics Data System (ADS)
Fukui, Kosuke; Tomita, Akihisa; Okamoto, Atsushi; Fujii, Keisuke
2018-04-01
To implement fault-tolerant quantum computation with continuous variables, the Gottesman-Kitaev-Preskill (GKP) qubit has been recognized as an important technological element. However, it is still challenging to experimentally generate the GKP qubit with the required squeezing level, 14.8 dB, of the existing fault-tolerant quantum computation. To reduce this requirement, we propose a high-threshold fault-tolerant quantum computation with GKP qubits using topologically protected measurement-based quantum computation with the surface code. By harnessing analog information contained in the GKP qubits, we apply analog quantum error correction to the surface code. Furthermore, we develop a method to prevent the squeezing level from decreasing during the construction of the large-scale cluster states for the topologically protected, measurement-based, quantum computation. We numerically show that the required squeezing level can be relaxed to less than 10 dB, which is within the reach of the current experimental technology. Hence, this work can considerably alleviate this experimental requirement and take a step closer to the realization of large-scale quantum computation.
Active Fault Tolerant Control for Ultrasonic Piezoelectric Motor
NASA Astrophysics Data System (ADS)
Boukhnifer, Moussa
2012-07-01
Ultrasonic piezoelectric motor technology is an important system component in integrated mechatronics devices working on extreme operating conditions. Due to these constraints, robustness and performance of the control interfaces should be taken into account in the motor design. In this paper, we apply a new architecture for a fault tolerant control using Youla parameterization for an ultrasonic piezoelectric motor. The distinguished feature of proposed controller architecture is that it shows structurally how the controller design for performance and robustness may be done separately which has the potential to overcome the conflict between performance and robustness in the traditional feedback framework. A fault tolerant control architecture includes two parts: one part for performance and the other part for robustness. The controller design works in such a way that the feedback control system will be solely controlled by the proportional plus double-integral
Detection of faults and software reliability analysis
NASA Technical Reports Server (NTRS)
Knight, J. C.
1987-01-01
Specific topics briefly addressed include: the consistent comparison problem in N-version system; analytic models of comparison testing; fault tolerance through data diversity; and the relationship between failures caused by automatically seeded faults.
Yazdani, Sahar; Haeri, Mohammad
2017-11-01
In this work, we study the flocking problem of multi-agent systems with uncertain dynamics subject to actuator failure and external disturbances. By considering some standard assumptions, we propose a robust adaptive fault tolerant protocol for compensating of the actuator bias fault, the partial loss of actuator effectiveness fault, the model uncertainties, and external disturbances. Under the designed protocol, velocity convergence of agents to that of virtual leader is guaranteed while the connectivity preservation of network and collision avoidance among agents are ensured as well. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Haakensen, Erik Edward
1998-01-01
The desire for low-cost reliable computing is increasing. Most current fault tolerant computing solutions are not very flexible, i.e., they cannot adapt to reliability requirements of newly emerging applications in business, commerce, and manufacturing. It is important that users have a flexible, reliable platform to support both critical and noncritical applications. Chameleon, under development at the Center for Reliable and High-Performance Computing at the University of Illinois, is a software framework. for supporting cost-effective adaptable networked fault tolerant service. This thesis details a simulation of fault injection, detection, and recovery in Chameleon. The simulation was written in C++ using the DEPEND simulation library. The results obtained from the simulation included the amount of overhead incurred by the fault detection and recovery mechanisms supported by Chameleon. In addition, information about fault scenarios from which Chameleon cannot recover was gained. The results of the simulation showed that both critical and noncritical applications can be executed in the Chameleon environment with a fairly small amount of overhead. No single point of failure from which Chameleon could not recover was found. Chameleon was also found to be capable of recovering from several multiple failure scenarios.
Mini-Ckpts: Surviving OS Failures in Persistent Memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fiala, David; Mueller, Frank; Ferreira, Kurt Brian
Concern is growing in the high-performance computing (HPC) community on the reliability of future extreme-scale systems. Current efforts have focused on application fault-tolerance rather than the operating system (OS), despite the fact that recent studies have suggested that failures in OS memory are more likely. The OS is critical to a system's correct and efficient operation of the node and processes it governs -- and in HPC also for any other nodes a parallelized application runs on and communicates with: Any single node failure generally forces all processes of this application to terminate due to tight communication in HPC. Therefore,more » the OS itself must be capable of tolerating failures. In this work, we introduce mini-ckpts, a framework which enables application survival despite the occurrence of a fatal OS failure or crash. Mini-ckpts achieves this tolerance by ensuring that the critical data describing a process is preserved in persistent memory prior to the failure. Following the failure, the OS is rejuvenated via a warm reboot and the application continues execution effectively making the failure and restart transparent. The mini-ckpts rejuvenation and recovery process is measured to take between three to six seconds and has a failure-free overhead of between 3-5% for a number of key HPC workloads. In contrast to current fault-tolerance methods, this work ensures that the operating and runtime system can continue in the presence of faults. This is a much finer-grained and dynamic method of fault-tolerance than the current, coarse-grained, application-centric methods. Handling faults at this level has the potential to greatly reduce overheads and enables mitigation of additional fault scenarios.« less
Fault tolerant high-performance PACS network design and implementation
NASA Astrophysics Data System (ADS)
Chimiak, William J.; Boehme, Johannes M.
1998-07-01
The Wake Forest University School of Medicine and the Wake Forest University/Baptist Medical Center (WFUBMC) are implementing a second generation PACS. The first generation PACS provided helpful information about the functional and temporal requirements of the system. It highlighted the importance of image retrieval speed, system availability, RIS/HIS integration, the ability to rapidly view images on any PACS workstation, network bandwidth, equipment redundancy, and the ability for the system to evolve using standards-based components. This paper deals with the network design and implementation of the PACS. The physical layout of the hospital areas served by the PACS, the choice of network equipment and installation issues encountered are addressed. Efforts to optimize fault tolerance are discussed. The PACS network is a gigabit, mixed-media network based on LAN emulation over ATM (LANE) with a rapid migration from LANE to Multiple Protocols Over ATM (MPOA) planned. Two fault-tolerant backbone ATM switches serve to distribute network accesses with two load-balancing 622 megabit per second (Mbps) OC-12 interconnections. The switch was sized to be upgradable to provide a 2.54 Gbps OC-48 interconnection with an OC-12 interconnection as a load-balancing backup. Modalities connect with legacy network interface cards to a switched-ethernet device. This device has two 155 Mbps OC-3 load-balancing uplinks to each of the backbone ATM switches of the PACS. This provides a fault-tolerant logical connection to the modality servers which pass verified DICOM images to the PACS servers and proper PACS diagnostic workstations. Where fiber pulls were prohibitively expensive, edge ATM switches were installed with an OC-12 uplink to a backbone ATM switches. The PACS and data base servers are fault-tolerant, hot-swappable Sun Enterprise Servers with an OC-12 connection to a backbone ATM switch and a fast-ethernet connection to a back-up network. The workstations come with 10/100 BASET autosense cards. A redundant switched-ethernet network will be installed to provide yet another degree of network fault-tolerance. The switched-ethernet devices are connected to each of the backbone ATM switches with two-load-balancing OC-3 connections to provide fault-tolerant connectivity in the event of a primary network failure.
Attitude Control System Design for the Solar Dynamics Observatory
NASA Technical Reports Server (NTRS)
Starin, Scott R.; Bourkland, Kristin L.; Kuo-Chia, Liu; Mason, Paul A. C.; Vess, Melissa F.; Andrews, Stephen F.; Morgenstern, Wendy M.
2005-01-01
The Solar Dynamics Observatory mission, part of the Living With a Star program, will place a geosynchronous satellite in orbit to observe the Sun and relay data to a dedicated ground station at all times. SDO remains Sun- pointing throughout most of its mission for the instruments to take measurements of the Sun. The SDO attitude control system is a single-fault tolerant design. Its fully redundant attitude sensor complement includes 16 coarse Sun sensors, a digital Sun sensor, 3 two-axis inertial reference units, 2 star trackers, and 4 guide telescopes. Attitude actuation is performed using 4 reaction wheels and 8 thrusters, and a single main engine nominally provides velocity-change thrust. The attitude control software has five nominal control modes-3 wheel-based modes and 2 thruster-based modes. A wheel-based Safehold running in the attitude control electronics box improves the robustness of the system as a whole. All six modes are designed on the same basic proportional-integral-derivative attitude error structure, with more robust modes setting their integral gains to zero. The paper details the mode designs and their uses.
Fault Isolation Filter for Networked Control System with Event-Triggered Sampling Scheme
Li, Shanbin; Sauter, Dominique; Xu, Bugong
2011-01-01
In this paper, the sensor data is transmitted only when the absolute value of difference between the current sensor value and the previously transmitted one is greater than the given threshold value. Based on this send-on-delta scheme which is one of the event-triggered sampling strategies, a modified fault isolation filter for a discrete-time networked control system with multiple faults is then implemented by a particular form of the Kalman filter. The proposed fault isolation filter improves the resource utilization with graceful fault estimation performance degradation. An illustrative example is given to show the efficiency of the proposed method. PMID:22346590
Intelligent neuroprocessors for in-situ launch vehicle propulsion systems health management
NASA Technical Reports Server (NTRS)
Gulati, S.; Tawel, R.; Thakoor, A. P.
1993-01-01
Efficacy of existing on-board propulsion systems health management systems (HMS) are severely impacted by computational limitations (e.g., low sampling rates); paradigmatic limitations (e.g., low-fidelity logic/parameter redlining only, false alarms due to noisy/corrupted sensor signatures, preprogrammed diagnostics only); and telemetry bandwidth limitations on space/ground interactions. Ultra-compact/light, adaptive neural networks with massively parallel, asynchronous, fast reconfigurable and fault-tolerant information processing properties have already demonstrated significant potential for inflight diagnostic analyses and resource allocation with reduced ground dependence. In particular, they can automatically exploit correlation effects across multiple sensor streams (plume analyzer, flow meters, vibration detectors, etc.) so as to detect anomaly signatures that cannot be determined from the exploitation of single sensor. Furthermore, neural networks have already demonstrated the potential for impacting real-time fault recovery in vehicle subsystems by adaptively regulating combustion mixture/power subsystems and optimizing resource utilization under degraded conditions. A class of high-performance neuroprocessors, developed at JPL, that have demonstrated potential for next-generation HMS for a family of space transportation vehicles envisioned for the next few decades, including HLLV, NLS, and space shuttle is presented. Of fundamental interest are intelligent neuroprocessors for real-time plume analysis, optimizing combustion mixture-ratio, and feedback to hydraulic, pneumatic control systems. This class includes concurrently asynchronous reprogrammable, nonvolatile, analog neural processors with high speed, high bandwidth electronic/optical I/O interfaced, with special emphasis on NASA's unique requirements in terms of performance, reliability, ultra-high density ultra-compactness, ultra-light weight devices, radiation hardened devices, power stringency, and long life terms.
Combining dynamical decoupling with fault-tolerant quantum computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ng, Hui Khoon; Preskill, John; Lidar, Daniel A.
2011-07-15
We study how dynamical decoupling (DD) pulse sequences can improve the reliability of quantum computers. We prove upper bounds on the accuracy of DD-protected quantum gates and derive sufficient conditions for DD-protected gates to outperform unprotected gates. Under suitable conditions, fault-tolerant quantum circuits constructed from DD-protected gates can tolerate stronger noise and have a lower overhead cost than fault-tolerant circuits constructed from unprotected gates. Our accuracy estimates depend on the dynamics of the bath that couples to the quantum computer and can be expressed either in terms of the operator norm of the bath's Hamiltonian or in terms of themore » power spectrum of bath correlations; we explain in particular how the performance of recursively generated concatenated pulse sequences can be analyzed from either viewpoint. Our results apply to Hamiltonian noise models with limited spatial correlations.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duan, Sisi; Li, Yun; Levitt, Karl N.
Consensus is a fundamental approach to implementing fault-tolerant services through replication where there exists a tradeoff between the cost and the resilience. For instance, Crash Fault Tolerant (CFT) protocols have a low cost but can only handle crash failures while Byzantine Fault Tolerant (BFT) protocols handle arbitrary failures but have a higher cost. Hybrid protocols enjoy the benefits of both high performance without failures and high resiliency under failures by switching among different subprotocols. However, it is challenging to determine which subprotocols should be used. We propose a moving target approach to switch among protocols according to the existing systemmore » and network vulnerability. At the core of our approach is a formalized cost model that evaluates the vulnerability and performance of consensus protocols based on real-time Intrusion Detection System (IDS) signals. Based on the evaluation results, we demonstrate that a safe, cheap, and unpredictable protocol is always used and a high IDS error rate can be tolerated.« less
Fault-Tolerant, Real-Time, Multi-Core Computer System
NASA Technical Reports Server (NTRS)
Gostelow, Kim P.
2012-01-01
A document discusses a fault-tolerant, self-aware, low-power, multi-core computer for space missions with thousands of simple cores, achieving speed through concurrency. The proposed machine decides how to achieve concurrency in real time, rather than depending on programmers. The driving features of the system are simple hardware that is modular in the extreme, with no shared memory, and software with significant runtime reorganizing capability. The document describes a mechanism for moving ongoing computations and data that is based on a functional model of execution. Because there is no shared memory, the processor connects to its neighbors through a high-speed data link. Messages are sent to a neighbor switch, which in turn forwards that message on to its neighbor until reaching the intended destination. Except for the neighbor connections, processors are isolated and independent of each other. The processors on the periphery also connect chip-to-chip, thus building up a large processor net. There is no particular topology to the larger net, as a function at each processor allows it to forward a message in the correct direction. Some chip-to-chip connections are not necessarily nearest neighbors, providing short cuts for some of the longer physical distances. The peripheral processors also provide the connections to sensors, actuators, radios, science instruments, and other devices with which the computer system interacts.
NASA Technical Reports Server (NTRS)
Cason, R. L.; Mcstay, J. J.; Heymann, A. P., Sr.
1979-01-01
Inexpensive system automatically indicates location of short-circuited section of power cable. Monitor does not require that cable be disconnected from its power source or that test signals be applied. Instead, ground-current sensors are installed in manholes or at other selected locations along cable run. When fault occurs, sensors transmit information about fault location to control center. Repair crew can be sent to location and cable can be returned to service with minimum of downtime.
Vibration Sensor Data Denoising Using a Time-Frequency Manifold for Machinery Fault Diagnosis
He, Qingbo; Wang, Xiangxiang; Zhou, Qiang
2014-01-01
Vibration sensor data from a mechanical system are often associated with important measurement information useful for machinery fault diagnosis. However, in practice the existence of background noise makes it difficult to identify the fault signature from the sensing data. This paper introduces the time-frequency manifold (TFM) concept into sensor data denoising and proposes a novel denoising method for reliable machinery fault diagnosis. The TFM signature reflects the intrinsic time-frequency structure of a non-stationary signal. The proposed method intends to realize data denoising by synthesizing the TFM using time-frequency synthesis and phase space reconstruction (PSR) synthesis. Due to the merits of the TFM in noise suppression and resolution enhancement, the denoised signal would have satisfactory denoising effects, as well as inherent time-frequency structure keeping. Moreover, this paper presents a clustering-based statistical parameter to evaluate the proposed method, and also presents a new diagnostic approach, called frequency probability time series (FPTS) spectral analysis, to show its effectiveness in fault diagnosis. The proposed TFM-based data denoising method has been employed to deal with a set of vibration sensor data from defective bearings, and the results verify that for machinery fault diagnosis the method is superior to two traditional denoising methods. PMID:24379045
Fault Tolerant Software Technology for Distributed Computer Systems
1989-03-01
RAY.) &-TR-88-296 I Fin;.’ Technical Report ,r 19,39 i A28 3329 F’ULT TOLERANT SOFTWARE TECHNOLOGY FOR DISTRIBUTED COMPUTER SYSTEMS Georgia Institute...GrfisABN 34-70IiWftlI NO0. IN?3. NO IACCESSION NO. 158 21 7 11. TITLE (Incld security Cassification) FAULT TOLERANT SOFTWARE FOR DISTRIBUTED COMPUTER ...Technology for Distributed Computing Systems," a two year effort performed at Georgia Institute of Technology as part of the Clouds Project. The Clouds
Combined methods of tolerance increasing for embedded SRAM
NASA Astrophysics Data System (ADS)
Shchigorev, L. A.; Shagurin, I. I.
2016-10-01
The abilities of combined use of different methods of fault tolerance increasing for SRAM such as error detection and correction codes, parity bits, and redundant elements are considered. Area penalties due to using combinations of these methods are investigated. Estimation is made for different configurations of 4K x 128 RAM memory block for 28 nm manufacturing process. Evaluation of the effectiveness of the proposed combinations is also reported. The results of these investigations can be useful for designing fault-tolerant “system on chips”.
An approximation formula for a class of fault-tolerant computers
NASA Technical Reports Server (NTRS)
White, A. L.
1986-01-01
An approximation formula is derived for the probability of failure for fault-tolerant process-control computers. These computers use redundancy and reconfiguration to achieve high reliability. Finite-state Markov models capture the dynamic behavior of component failure and system recovery, and the approximation formula permits an estimation of system reliability by an easy examination of the model.
NASA Technical Reports Server (NTRS)
Shalkhauser, Mary JO; Quintana, Jorge A.; Soni, Nitin J.
1994-01-01
The NASA Lewis Research Center is developing a multichannel communication signal processing satellite (MCSPS) system which will provide low data rate, direct to user, commercial communications services. The focus of current space segment developments is a flexible, high-throughput, fault tolerant onboard information switching processor. This information switching processor (ISP) is a destination-directed packet switch which performs both space and time switching to route user information among numerous user ground terminals. Through both industry study contracts and in-house investigations, several packet switching architectures were examined. A contention-free approach, the shared memory per beam architecture, was selected for implementation. The shared memory per beam architecture, fault tolerance insertion, implementation, and demonstration plans are described.
Formal design specification of a Processor Interface Unit
NASA Technical Reports Server (NTRS)
Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.
1992-01-01
This report describes work to formally specify the requirements and design of a processor interface unit (PIU), a single-chip subsystem providing memory-interface bus-interface, and additional support services for a commercial microprocessor within a fault-tolerant computer system. This system, the Fault-Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance-free operation, or both. The need for high-quality design assurance in such applications is an undisputed fact, given the disastrous consequences that even a single design flaw can produce. Thus, the further development and application of formal methods to fault-tolerant systems is of critical importance as these systems see increasing use in modern society.
NASA Technical Reports Server (NTRS)
Butler, Ricky W.; Johnson, Sally C.
1995-01-01
This paper presents a step-by-step tutorial of the methods and the tools that were used for the reliability analysis of fault-tolerant systems. The approach used in this paper is the Markov (or semi-Markov) state-space method. The paper is intended for design engineers with a basic understanding of computer architecture and fault tolerance, but little knowledge of reliability modeling. The representation of architectural features in mathematical models is emphasized. This paper does not present details of the mathematical solution of complex reliability models. Instead, it describes the use of several recently developed computer programs SURE, ASSIST, STEM, and PAWS that automate the generation and the solution of these models.
Cascading Policies Provide Fault Tolerance for Pervasive Clinical Communications.
Williams, Rose; Jalan, Srikant; Stern, Edie; Lussier, Yves A
2005-03-21
We implemented an end-to-end notification system that pushed urgent clinical laboratory results to Blackberry 7510 devices over the Nextel cellular network. We designed our system to use user roles and notification policies to abstract and execute clinical notification procedures. We anticipated some problems with dropped and non-delivered messages when the device was out-of-network, however, we did not expect the same problems in other situations like device reconnection to the network. We addressed these problems by creating cascading "fault tolerance" policies to drive notification escalation when messages timed-out or delivery failed. This paper describes our experience in providing an adaptable, fault tolerant pervasive notification system for delivering secure, critical, time-sensitive patient laboratory results.
A Unified Fault-Tolerance Protocol
NASA Technical Reports Server (NTRS)
Miner, Paul; Gedser, Alfons; Pike, Lee; Maddalon, Jeffrey
2004-01-01
Davies and Wakerly show that Byzantine fault tolerance can be achieved by a cascade of broadcasts and middle value select functions. We present an extension of the Davies and Wakerly protocol, the unified protocol, and its proof of correctness. We prove that it satisfies validity and agreement properties for communication of exact values. We then introduce bounded communication error into the model. Inexact communication is inherent for clock synchronization protocols. We prove that validity and agreement properties hold for inexact communication, and that exact communication is a special case. As a running example, we illustrate the unified protocol using the SPIDER family of fault-tolerant architectures. In particular we demonstrate that the SPIDER interactive consistency, distributed diagnosis, and clock synchronization protocols are instances of the unified protocol.
2009 fault tolerance for extreme-scale computing workshop, Albuquerque, NM - March 19-20, 2009.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Katz, D. S.; Daly, J.; DeBardeleben, N.
2009-02-01
This is a report on the third in a series of petascale workshops co-sponsored by Blue Waters and TeraGrid to address challenges and opportunities for making effective use of emerging extreme-scale computing. This workshop was held to discuss fault tolerance on large systems for running large, possibly long-running applications. The main point of the workshop was to have systems people, middleware people (including fault-tolerance experts), and applications people talk about the issues and figure out what needs to be done, mostly at the middleware and application levels, to run such applications on the emerging petascale systems, without having faults causemore » large numbers of application failures. The workshop found that there is considerable interest in fault tolerance, resilience, and reliability of high-performance computing (HPC) systems in general, at all levels of HPC. The only way to recover from faults is through the use of some redundancy, either in space or in time. Redundancy in time, in the form of writing checkpoints to disk and restarting at the most recent checkpoint after a fault that cause an application to crash/halt, is the most common tool used in applications today, but there are questions about how long this can continue to be a good solution as systems and memories grow faster than I/O bandwidth to disk. There is interest in both modifications to this, such as checkpoints to memory, partial checkpoints, and message logging, and alternative ideas, such as in-memory recovery using residues. We believe that systematic exploration of these ideas holds the most promise for the scientific applications community. Fault tolerance has been an issue of discussion in the HPC community for at least the past 10 years; but much like other issues, the community has managed to put off addressing it during this period. There is a growing recognition that as systems continue to grow to petascale and beyond, the field is approaching the point where we don't have any choice but to address this through R&D efforts.« less
NASA Astrophysics Data System (ADS)
Wang, H.; Jing, X. J.
2017-07-01
This paper presents a virtual beam based approach suitable for conducting diagnosis of multiple faults in complex structures with limited prior knowledge of the faults involved. The "virtual beam", a recently-proposed concept for fault detection in complex structures, is applied, which consists of a chain of sensors representing a vibration energy transmission path embedded in the complex structure. Statistical tests and adaptive threshold are particularly adopted for fault detection due to limited prior knowledge of normal operational conditions and fault conditions. To isolate the multiple faults within a specific structure or substructure of a more complex one, a 'biased running' strategy is developed and embedded within the bacterial-based optimization method to construct effective virtual beams and thus to improve the accuracy of localization. The proposed method is easy and efficient to implement for multiple fault localization with limited prior knowledge of normal conditions and faults. With extensive experimental results, it is validated that the proposed method can localize both single fault and multiple faults more effectively than the classical trust index subtract on negative add on positive (TI-SNAP) method.
Zhang, Dapeng; Long, Zhiqiang; Xue, Song; Zhang, Junge
2012-01-01
This paper studies an absolute positioning sensor for a high-speed maglev train and its fault diagnosis method. The absolute positioning sensor is an important sensor for the high-speed maglev train to accomplish its synchronous traction. It is used to calibrate the error of the relative positioning sensor which is used to provide the magnetic phase signal. On the basis of the analysis for the principle of the absolute positioning sensor, the paper describes the design of the sending and receiving coils and realizes the hardware and the software for the sensor. In order to enhance the reliability of the sensor, a support vector machine is used to recognize the fault characters, and the signal flow method is used to locate the faulty parts. The diagnosis information not only can be sent to an upper center control computer to evaluate the reliability of the sensors, but also can realize on-line diagnosis for debugging and the quick detection when the maglev train is off-line. The absolute positioning sensor we study has been used in the actual project. PMID:23112619
Editing wild points in isolation - Fast agreement for reliable systems (Preliminary version)
NASA Technical Reports Server (NTRS)
Kearns, Phil; Evans, Carol
1989-01-01
Consideration is given to the intuitively appealing notion of discarding sensor values which are strongly suspected of being erroneous in a modified approximate agreement protocol. Approximate agreement with editing imposes a time bound upon the convergence of the protocol - no such bound was possible for the original approximate agreement protocol. This new approach is potentially useful in the construction of asynchronous fault tolerant systems. The main result is that a wild-point replacement technique called t-worst editing can be shown to guarantee convergence of the approximate agreement protocol to a valid agreement value. Results are presented for a four-processor synchronous system in which a single processor may be faulty.
NASA Technical Reports Server (NTRS)
Rash, James L. (Editor); Dent, Carolyn P. (Editor)
1989-01-01
Theoretical and implementation aspects of AI systems for space applications are discussed in reviews and reports. Sections are devoted to planning and scheduling, fault isolation and diagnosis, data management, modeling and simulation, and development tools and methods. Particular attention is given to a situated reasoning architecture for space repair and replace tasks, parallel plan execution with self-processing networks, the electrical diagnostics expert system for Spacelab life-sciences experiments, diagnostic tolerance for missing sensor data, the integration of perception and reasoning in fast neural modules, a connectionist model for dynamic control, and applications of fuzzy sets to the development of rule-based expert systems.
A Fault Diagnosis Methodology for Gear Pump Based on EEMD and Bayesian Network
Liu, Zengkai; Liu, Yonghong; Shan, Hongkai; Cai, Baoping; Huang, Qing
2015-01-01
This paper proposes a fault diagnosis methodology for a gear pump based on the ensemble empirical mode decomposition (EEMD) method and the Bayesian network. Essentially, the presented scheme is a multi-source information fusion based methodology. Compared with the conventional fault diagnosis with only EEMD, the proposed method is able to take advantage of all useful information besides sensor signals. The presented diagnostic Bayesian network consists of a fault layer, a fault feature layer and a multi-source information layer. Vibration signals from sensor measurement are decomposed by the EEMD method and the energy of intrinsic mode functions (IMFs) are calculated as fault features. These features are added into the fault feature layer in the Bayesian network. The other sources of useful information are added to the information layer. The generalized three-layer Bayesian network can be developed by fully incorporating faults and fault symptoms as well as other useful information such as naked eye inspection and maintenance records. Therefore, diagnostic accuracy and capacity can be improved. The proposed methodology is applied to the fault diagnosis of a gear pump and the structure and parameters of the Bayesian network is established. Compared with artificial neural network and support vector machine classification algorithms, the proposed model has the best diagnostic performance when sensor data is used only. A case study has demonstrated that some information from human observation or system repair records is very helpful to the fault diagnosis. It is effective and efficient in diagnosing faults based on uncertain, incomplete information. PMID:25938760
A Fault Diagnosis Methodology for Gear Pump Based on EEMD and Bayesian Network.
Liu, Zengkai; Liu, Yonghong; Shan, Hongkai; Cai, Baoping; Huang, Qing
2015-01-01
This paper proposes a fault diagnosis methodology for a gear pump based on the ensemble empirical mode decomposition (EEMD) method and the Bayesian network. Essentially, the presented scheme is a multi-source information fusion based methodology. Compared with the conventional fault diagnosis with only EEMD, the proposed method is able to take advantage of all useful information besides sensor signals. The presented diagnostic Bayesian network consists of a fault layer, a fault feature layer and a multi-source information layer. Vibration signals from sensor measurement are decomposed by the EEMD method and the energy of intrinsic mode functions (IMFs) are calculated as fault features. These features are added into the fault feature layer in the Bayesian network. The other sources of useful information are added to the information layer. The generalized three-layer Bayesian network can be developed by fully incorporating faults and fault symptoms as well as other useful information such as naked eye inspection and maintenance records. Therefore, diagnostic accuracy and capacity can be improved. The proposed methodology is applied to the fault diagnosis of a gear pump and the structure and parameters of the Bayesian network is established. Compared with artificial neural network and support vector machine classification algorithms, the proposed model has the best diagnostic performance when sensor data is used only. A case study has demonstrated that some information from human observation or system repair records is very helpful to the fault diagnosis. It is effective and efficient in diagnosing faults based on uncertain, incomplete information.
Bae, Sungwoo; Kim, Myungchin
2016-01-01
In order to realize a true WoT environment, a reliable power circuit is required to ensure interconnections among a range of WoT devices. This paper presents research on sensors and their effects on the reliability and response characteristics of power circuits in WoT devices. The presented research can be used in various power circuit applications, such as energy harvesting interfaces, photovoltaic systems, and battery management systems for the WoT devices. As power circuits rely on the feedback from voltage/current sensors, the system performance is likely to be affected by the sensor failure rates, sensor dynamic characteristics, and their interface circuits. This study investigated how the operational availability of the power circuits is affected by the sensor failure rates by performing a quantitative reliability analysis. In the analysis process, this paper also includes the effects of various reconstruction and estimation techniques used in power processing circuits (e.g., energy harvesting circuits and photovoltaic systems). This paper also reports how the transient control performance of power circuits is affected by sensor interface circuits. With the frequency domain stability analysis and circuit simulation, it was verified that the interface circuit dynamics may affect the transient response characteristics of power circuits. The verification results in this paper showed that the reliability and control performance of the power circuits can be affected by the sensor types, fault tolerant approaches against sensor failures, and the response characteristics of the sensor interfaces. The analysis results were also verified by experiments using a power circuit prototype. PMID:27608020
Critical fault patterns determination in fault-tolerant computer systems
NASA Technical Reports Server (NTRS)
Mccluskey, E. J.; Losq, J.
1978-01-01
The method proposed tries to enumerate all the critical fault-patterns (successive occurrences of failures) without analyzing every single possible fault. The conditions for the system to be operating in a given mode can be expressed in terms of the static states. Thus, one can find all the system states that correspond to a given critical mode of operation. The next step consists in analyzing the fault-detection mechanisms, the diagnosis algorithm and the process of switch control. From them, one can find all the possible system configurations that can result from a failure occurrence. Thus, one can list all the characteristics, with respect to detection, diagnosis, and switch control, that failures must have to constitute critical fault-patterns. Such an enumeration of the critical fault-patterns can be directly used to evaluate the overall system tolerance to failures. Present research is focused on how to efficiently make use of these system-level characteristics to enumerate all the failures that verify these characteristics.
NASA Astrophysics Data System (ADS)
Farid, Yousef; Majd, Vahid Johari; Ehsani-Seresht, Abbas
2018-05-01
In this paper, a novel fault accommodation strategy is proposed for the legged robots subject to the actuator faults including actuation bias and effective gain degradation as well as the actuator saturation. First, the combined dynamics of two coupled subsystems consisting of the dynamics of the legs subsystem and the body subsystem are developed. Then, the interaction of the robot with the environment is formulated as the contact force optimization problem with equality and inequality constraints. The desired force is obtained by a dynamic model. A robust super twisting fault estimator is proposed to precisely estimate the defective torque amplitude of the faulty actuator in finite time. Defining a novel fractional sliding surface, a fractional nonsingular terminal sliding mode control law is developed. Moreover, by introducing a suitable auxiliary system and using its state vector in the designed controller, the proposed fault-tolerant control (FTC) scheme guarantees the finite-time stability of the closed-loop control system. The robustness and finite-time convergence of the proposed control law is established using the Lyapunov stability theory. Finally, numerical simulations are performed on a quadruped robot to demonstrate the stable walking of the robot with and without actuator faults, and actuator saturation constraints, and the results are compared to results with an integer order fault-tolerant controller.
Using certification trails to achieve software fault tolerance
NASA Technical Reports Server (NTRS)
Sullivan, Gregory F.; Masson, Gerald M.
1993-01-01
A conceptually novel and powerful technique to achieve fault tolerance in hardware and software systems is introduced. When used for software fault tolerance, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance was formalized and it was illustrated by applying it to the fundamental problem of finding a minimum spanning tree. Cases in which the second phase can be run concorrectly with the first and act as a monitor are discussed. The certification trail approach was compared to other approaches to fault tolerance. Because of space limitations we have omitted examples of our technique applied to the Huffman tree, and convex hull problems. These can be found in the full version of this paper.
Reliability and availability evaluation of Wireless Sensor Networks for industrial applications.
Silva, Ivanovitch; Guedes, Luiz Affonso; Portugal, Paulo; Vasques, Francisco
2012-01-01
Wireless Sensor Networks (WSN) currently represent the best candidate to be adopted as the communication solution for the last mile connection in process control and monitoring applications in industrial environments. Most of these applications have stringent dependability (reliability and availability) requirements, as a system failure may result in economic losses, put people in danger or lead to environmental damages. Among the different type of faults that can lead to a system failure, permanent faults on network devices have a major impact. They can hamper communications over long periods of time and consequently disturb, or even disable, control algorithms. The lack of a structured approach enabling the evaluation of permanent faults, prevents system designers to optimize decisions that minimize these occurrences. In this work we propose a methodology based on an automatic generation of a fault tree to evaluate the reliability and availability of Wireless Sensor Networks, when permanent faults occur on network devices. The proposal supports any topology, different levels of redundancy, network reconfigurations, criticality of devices and arbitrary failure conditions. The proposed methodology is particularly suitable for the design and validation of Wireless Sensor Networks when trying to optimize its reliability and availability requirements.
Reliability and Availability Evaluation of Wireless Sensor Networks for Industrial Applications
Silva, Ivanovitch; Guedes, Luiz Affonso; Portugal, Paulo; Vasques, Francisco
2012-01-01
Wireless Sensor Networks (WSN) currently represent the best candidate to be adopted as the communication solution for the last mile connection in process control and monitoring applications in industrial environments. Most of these applications have stringent dependability (reliability and availability) requirements, as a system failure may result in economic losses, put people in danger or lead to environmental damages. Among the different type of faults that can lead to a system failure, permanent faults on network devices have a major impact. They can hamper communications over long periods of time and consequently disturb, or even disable, control algorithms. The lack of a structured approach enabling the evaluation of permanent faults, prevents system designers to optimize decisions that minimize these occurrences. In this work we propose a methodology based on an automatic generation of a fault tree to evaluate the reliability and availability of Wireless Sensor Networks, when permanent faults occur on network devices. The proposal supports any topology, different levels of redundancy, network reconfigurations, criticality of devices and arbitrary failure conditions. The proposed methodology is particularly suitable for the design and validation of Wireless Sensor Networks when trying to optimize its reliability and availability requirements. PMID:22368497
NASA Astrophysics Data System (ADS)
Wang, H.; Jing, X. J.
2017-02-01
This paper proposes a novel method for the fault diagnosis of complex structures based on an optimized virtual beam-like structure approach. A complex structure can be regarded as a combination of numerous virtual beam-like structures considering the vibration transmission path from vibration sources to each sensor. The structural 'virtual beam' consists of a sensor chain automatically obtained by an Improved Bacterial Optimization Algorithm (IBOA). The biologically inspired optimization method (i.e. IBOA) is proposed for solving the discrete optimization problem associated with the selection of the optimal virtual beam for fault diagnosis. This novel virtual beam-like-structure approach needs less or little prior knowledge. Neither does it require stationary response data, nor is it confined to a specific structure design. It is easy to implement within a sensor network attached to the monitored structure. The proposed fault diagnosis method has been tested on the detection of loosening screws located at varying positions in a real satellite-like model. Compared with empirical methods, the proposed virtual beam-like structure method has proved to be very effective and more reliable for fault localization.
Abbaspour, Alireza; Aboutalebi, Payam; Yen, Kang K; Sargolzaei, Arman
2017-03-01
A new online detection strategy is developed to detect faults in sensors and actuators of unmanned aerial vehicle (UAV) systems. In this design, the weighting parameters of the Neural Network (NN) are updated by using the Extended Kalman Filter (EKF). Online adaptation of these weighting parameters helps to detect abrupt, intermittent, and incipient faults accurately. We apply the proposed fault detection system to a nonlinear dynamic model of the WVU YF-22 unmanned aircraft for its evaluation. The simulation results show that the new method has better performance in comparison with conventional recurrent neural network-based fault detection strategies. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
A survey of provably correct fault-tolerant clock synchronization techniques
NASA Technical Reports Server (NTRS)
Butler, Ricky W.
1988-01-01
Six provably correct fault-tolerant clock synchronization algorithms are examined. These algorithms are all presented in the same notation to permit easier comprehension and comparison. The advantages and disadvantages of the different techniques are examined and issues related to the implementation of these algorithms are discussed. The paper argues for the use of such algorithms in life-critical applications.
Fault Tolerance for VLSI Multicomputers
1985-08-01
that consists of hundreds or thousands of VLSI computation nodes interconnected by dedicated links. Some important applications of high-end computers...technology, and intended applications . A proposed fault tolerance scheme combines hardware that performs error detection and system-level protocols for...order to recover from the error and resume correct operation, a valid system state must be restored. A low-overhead, application -transparent error
Fault-Tolerant Computing: An Overview
1991-06-01
Addison Wesley:, Reading, MA) 1984. [8] J. Wakerly , Error Detecting Codes, Self-Checking Circuits and Applications , (Elsevier North Holland, Inc.- New York... applicable to bit-sliced organi- zations of hardware. In the first time step, the normal computation is performed on the operands and the results...for error detection and fault tolerance in parallel processor systems while perform- ing specific computation-intensive applications [111. Contrary to
NASA Technical Reports Server (NTRS)
Tai, Ann T.; Chau, Savio N.; Alkalai, Leon
2000-01-01
Using COTS products, standards and intellectual properties (IPs) for all the system and component interfaces is a crucial step toward significant reduction of both system cost and development cost as the COTS interfaces enable other COTS products and IPs to be readily accommodated by the target system architecture. With respect to the long-term survivable systems for deep-space missions, the major challenge for us is, under stringent power and mass constraints, to achieve ultra-high reliability of the system comprising COTS products and standards that are not developed for mission-critical applications. The spirit of our solution is to exploit the pertinent standard features of a COTS product to circumvent its shortcomings, though these standard features may not be originally designed for highly reliable systems. In this paper, we discuss our experiences and findings on the design of an IEEE 1394 compliant fault-tolerant COTS-based bus architecture. We first derive and qualitatively analyze a -'stacktree topology" that not only complies with IEEE 1394 but also enables the implementation of a fault-tolerant bus architecture without node redundancy. We then present a quantitative evaluation that demonstrates significant reliability improvement from the COTS-based fault tolerance.
Design and Implementation of Replicated Object Layer
NASA Technical Reports Server (NTRS)
Koka, Sudhir
1996-01-01
One of the widely used techniques for construction of fault tolerant applications is the replication of resources so that if one copy fails sufficient copies may still remain operational to allow the application to continue to function. This thesis involves the design and implementation of an object oriented framework for replicating data on multiple sites and across different platforms. Our approach, called the Replicated Object Layer (ROL) provides a mechanism for consistent replication of data over dynamic networks. ROL uses the Reliable Multicast Protocol (RMP) as a communication protocol that provides for reliable delivery, serialization and fault tolerance. Besides providing type registration, this layer facilitates distributed atomic transactions on replicated data. A novel algorithm called the RMP Commit Protocol, which commits transactions efficiently in reliable multicast environment is presented. ROL provides recovery procedures to ensure that site and communication failures do not corrupt persistent data, and male the system fault tolerant to network partitions. ROL will facilitate building distributed fault tolerant applications by performing the burdensome details of replica consistency operations, and making it completely transparent to the application.Replicated databases are a major class of applications which could be built on top of ROL.
NASA Technical Reports Server (NTRS)
Lala, J. H.; Smith, T. B., III
1983-01-01
The experimental test and evaluation of the Fault-Tolerant Multiprocessor (FTMP) is described. Major objectives of this exercise include expanding validation envelope, building confidence in the system, revealing any weaknesses in the architectural concepts and in their execution in hardware and software, and in general, stressing the hardware and software. To this end, pin-level faults were injected into one LRU of the FTMP and the FTMP response was measured in terms of fault detection, isolation, and recovery times. A total of 21,055 stuck-at-0, stuck-at-1 and invert-signal faults were injected in the CPU, memory, bus interface circuits, Bus Guardian Units, and voters and error latches. Of these, 17,418 were detected. At least 80 percent of undetected faults are estimated to be on unused pins. The multiprocessor identified all detected faults correctly and recovered successfully in each case. Total recovery time for all faults averaged a little over one second. This can be reduced to half a second by including appropriate self-tests.
A formally verified algorithm for interactive consistency under a hybrid fault model
NASA Technical Reports Server (NTRS)
Lincoln, Patrick; Rushby, John
1993-01-01
Consistent distribution of single-source data to replicated computing channels is a fundamental problem in fault-tolerant system design. The 'Oral Messages' (OM) algorithm solves this problem of Interactive Consistency (Byzantine Agreement) assuming that all faults are worst-cass. Thambidurai and Park introduced a 'hybrid' fault model that distinguished three fault modes: asymmetric (Byzantine), symmetric, and benign; they also exhibited, along with an informal 'proof of correctness', a modified version of OM. Unfortunately, their algorithm is flawed. The discipline of mechanically checked formal verification eventually enabled us to develop a correct algorithm for Interactive Consistency under the hybrid fault model. This algorithm withstands $a$ asymmetric, $s$ symmetric, and $b$ benign faults simultaneously, using $m+1$ rounds, provided $n is greater than 2a + 2s + b + m$, and $m\\geg a$. We present this algorithm, discuss its subtle points, and describe its formal specification and verification in PVS. We argue that formal verification systems such as PVS are now sufficiently effective that their application to fault-tolerance algorithms should be considered routine.
NASA Technical Reports Server (NTRS)
Lewicki, David George; Lambert, Nicholas A.; Wagoner, Robert S.
2015-01-01
The diagnostics capability of micro-electro-mechanical systems (MEMS) based rotating accelerometer sensors in detecting gear tooth crack failures in helicopter main-rotor transmissions was evaluated. MEMS sensors were installed on a pre-notched OH-58C spiral-bevel pinion gear. Endurance tests were performed and the gear was run to tooth fracture failure. Results from the MEMS sensor were compared to conventional accelerometers mounted on the transmission housing. Most of the four stationary accelerometers mounted on the gear box housing and most of the CI's used gave indications of failure at the end of the test. The MEMS system performed well and lasted the entire test. All MEMS accelerometers gave an indication of failure at the end of the test. The MEMS systems performed as well, if not better, than the stationary accelerometers mounted on the gear box housing with regards to gear tooth fault detection. For both the MEMS sensors and stationary sensors, the fault detection time was not much sooner than the actual tooth fracture time. The MEMS sensor spectrum data showed large first order shaft frequency sidebands due to the measurement rotating frame of reference. The method of constructing a pseudo tach signal from periodic characteristics of the vibration data was successful in deriving a TSA signal without an actual tach and proved as an effective way to improve fault detection for the MEMS.
NASA Astrophysics Data System (ADS)
Liu, Jie; Hu, Youmin; Wang, Yan; Wu, Bo; Fan, Jikai; Hu, Zhongxu
2018-05-01
The diagnosis of complicated fault severity problems in rotating machinery systems is an important issue that affects the productivity and quality of manufacturing processes and industrial applications. However, it usually suffers from several deficiencies. (1) A considerable degree of prior knowledge and expertise is required to not only extract and select specific features from raw sensor signals, and but also choose a suitable fusion for sensor information. (2) Traditional artificial neural networks with shallow architectures are usually adopted and they have a limited ability to learn the complex and variable operating conditions. In multi-sensor-based diagnosis applications in particular, massive high-dimensional and high-volume raw sensor signals need to be processed. In this paper, an integrated multi-sensor fusion-based deep feature learning (IMSFDFL) approach is developed to identify the fault severity in rotating machinery processes. First, traditional statistics and energy spectrum features are extracted from multiple sensors with multiple channels and combined. Then, a fused feature vector is constructed from all of the acquisition channels. Further, deep feature learning with stacked auto-encoders is used to obtain the deep features. Finally, the traditional softmax model is applied to identify the fault severity. The effectiveness of the proposed IMSFDFL approach is primarily verified by a one-stage gearbox experimental platform that uses several accelerometers under different operating conditions. This approach can identify fault severity more effectively than the traditional approaches.
NASA Astrophysics Data System (ADS)
Devaraj, Rajesh; Sarkar, Arnab; Biswas, Santosh
2015-11-01
In the article 'Supervisory control for fault-tolerant scheduling of real-time multiprocessor systems with aperiodic tasks', Park and Cho presented a systematic way of computing a largest fault-tolerant and schedulable language that provides information on whether the scheduler (i.e., supervisor) should accept or reject a newly arrived aperiodic task. The computation of such a language is mainly dependent on the task execution model presented in their paper. However, the task execution model is unable to capture the situation when the fault of a processor occurs even before the task has arrived. Consequently, a task execution model that does not capture this fact may possibly be assigned for execution on a faulty processor. This problem has been illustrated with an appropriate example. Then, the task execution model of Park and Cho has been modified to strengthen the requirement that none of the tasks are assigned for execution on a faulty processor.
NASA Astrophysics Data System (ADS)
Liu, Guohai; Gong, Wensheng; Chen, Qian; Jian, Linni; Shen, Yue; Zhao, Wenxiang
2012-04-01
In this paper, a novel in-wheel permanent-magnet (PM) motor for four-wheel-driving electrical vehicles is proposed. It adopts an outer-rotor topology, which can help generate a large drive torque, in order to achieve prominent dynamic performance of the vehicle. Moreover, by adopting single-layer concentrated-windings, fault-tolerant teeth, and the optimal combination of slot and pole numbers, the proposed motor inherently offers negligible electromagnetic coupling between different phase windings, hence, it possesses a fault-tolerant characteristic. Meanwhile, the phase back electromotive force waveforms can be designed to be sinusoidal by employing PMs with a trapezoidal shape, eccentric armature teeth, and unequal tooth widths. The electromagnetic performance is comprehensively investigated and the optimal design is conducted by using the finite-element method.
Using concatenated quantum codes for universal fault-tolerant quantum gates.
Jochym-O'Connor, Tomas; Laflamme, Raymond
2014-01-10
We propose a method for universal fault-tolerant quantum computation using concatenated quantum error correcting codes. The concatenation scheme exploits the transversal properties of two different codes, combining them to provide a means to protect against low-weight arbitrary errors. We give the required properties of the error correcting codes to ensure universal fault tolerance and discuss a particular example using the 7-qubit Steane and 15-qubit Reed-Muller codes. Namely, other than computational basis state preparation as required by the DiVincenzo criteria, our scheme requires no special ancillary state preparation to achieve universality, as opposed to schemes such as magic state distillation. We believe that optimizing the codes used in such a scheme could provide a useful alternative to state distillation schemes that exhibit high overhead costs.
Li, Ying
2016-09-16
Fault-tolerant quantum computing in systems composed of both Majorana fermions and topologically unprotected quantum systems, e.g., superconducting circuits or quantum dots, is studied in this Letter. Errors caused by topologically unprotected quantum systems need to be corrected with error-correction schemes, for instance, the surface code. We find that the error-correction performance of such a hybrid topological quantum computer is not superior to a normal quantum computer unless the topological charge of Majorana fermions is insusceptible to noise. If errors changing the topological charge are rare, the fault-tolerance threshold is much higher than the threshold of a normal quantum computer and a surface-code logical qubit could be encoded in only tens of topological qubits instead of about 1,000 normal qubits.
Economic modeling of fault tolerant flight control systems in commercial applications
NASA Technical Reports Server (NTRS)
Finelli, G. B.
1982-01-01
This paper describes the current development of a comprehensive model which will supply the assessment and analysis capability to investigate the economic viability of Fault Tolerant Flight Control Systems (FTFCS) for commercial aircraft of the 1990's and beyond. An introduction to the unique attributes of fault tolerance and how they will influence aircraft operations and consequent airline costs and benefits is presented. Specific modeling issues and elements necessary for accurate assessment of all costs affected by ownership and operation of FTFCS are delineated. Trade-off factors are presented, aimed at exposing economically optimal realizations of system implementations, resource allocation, and operating policies. A trade-off example is furnished to graphically display some of the analysis capabilities of the comprehensive simulation model now being developed.
Problems related to the integration of fault tolerant aircraft electronic systems
NASA Technical Reports Server (NTRS)
Bannister, J. A.; Adlakha, V.; Triyedi, K.; Alspaugh, T. A., Jr.
1982-01-01
Problems related to the design of the hardware for an integrated aircraft electronic system are considered. Taxonomies of concurrent systems are reviewed and a new taxonomy is proposed. An informal methodology intended to identify feasible regions of the taxonomic design space is described. Specific tools are recommended for use in the methodology. Based on the methodology, a preliminary strawman integrated fault tolerant aircraft electronic system is proposed. Next, problems related to the programming and control of inegrated aircraft electronic systems are discussed. Issues of system resource management, including the scheduling and allocation of real time periodic tasks in a multiprocessor environment, are treated in detail. The role of software design in integrated fault tolerant aircraft electronic systems is discussed. Conclusions and recommendations for further work are included.
Fault Diagnosis Based on Chemical Sensor Data with an Active Deep Neural Network
Jiang, Peng; Hu, Zhixin; Liu, Jun; Yu, Shanen; Wu, Feng
2016-01-01
Big sensor data provide significant potential for chemical fault diagnosis, which involves the baseline values of security, stability and reliability in chemical processes. A deep neural network (DNN) with novel active learning for inducing chemical fault diagnosis is presented in this study. It is a method using large amount of chemical sensor data, which is a combination of deep learning and active learning criterion to target the difficulty of consecutive fault diagnosis. DNN with deep architectures, instead of shallow ones, could be developed through deep learning to learn a suitable feature representation from raw sensor data in an unsupervised manner using stacked denoising auto-encoder (SDAE) and work through a layer-by-layer successive learning process. The features are added to the top Softmax regression layer to construct the discriminative fault characteristics for diagnosis in a supervised manner. Considering the expensive and time consuming labeling of sensor data in chemical applications, in contrast to the available methods, we employ a novel active learning criterion for the particularity of chemical processes, which is a combination of Best vs. Second Best criterion (BvSB) and a Lowest False Positive criterion (LFP), for further fine-tuning of diagnosis model in an active manner rather than passive manner. That is, we allow models to rank the most informative sensor data to be labeled for updating the DNN parameters during the interaction phase. The effectiveness of the proposed method is validated in two well-known industrial datasets. Results indicate that the proposed method can obtain superior diagnosis accuracy and provide significant performance improvement in accuracy and false positive rate with less labeled chemical sensor data by further active learning compared with existing methods. PMID:27754386
Fault Diagnosis Based on Chemical Sensor Data with an Active Deep Neural Network.
Jiang, Peng; Hu, Zhixin; Liu, Jun; Yu, Shanen; Wu, Feng
2016-10-13
Big sensor data provide significant potential for chemical fault diagnosis, which involves the baseline values of security, stability and reliability in chemical processes. A deep neural network (DNN) with novel active learning for inducing chemical fault diagnosis is presented in this study. It is a method using large amount of chemical sensor data, which is a combination of deep learning and active learning criterion to target the difficulty of consecutive fault diagnosis. DNN with deep architectures, instead of shallow ones, could be developed through deep learning to learn a suitable feature representation from raw sensor data in an unsupervised manner using stacked denoising auto-encoder (SDAE) and work through a layer-by-layer successive learning process. The features are added to the top Softmax regression layer to construct the discriminative fault characteristics for diagnosis in a supervised manner. Considering the expensive and time consuming labeling of sensor data in chemical applications, in contrast to the available methods, we employ a novel active learning criterion for the particularity of chemical processes, which is a combination of Best vs. Second Best criterion (BvSB) and a Lowest False Positive criterion (LFP), for further fine-tuning of diagnosis model in an active manner rather than passive manner. That is, we allow models to rank the most informative sensor data to be labeled for updating the DNN parameters during the interaction phase. The effectiveness of the proposed method is validated in two well-known industrial datasets. Results indicate that the proposed method can obtain superior diagnosis accuracy and provide significant performance improvement in accuracy and false positive rate with less labeled chemical sensor data by further active learning compared with existing methods.
NASA Astrophysics Data System (ADS)
Wang, Xiaohua; Rong, Mingzhe; Qiu, Juan; Liu, Dingxin; Su, Biao; Wu, Yi
A new type of algorithm for predicting the mechanical faults of a vacuum circuit breaker (VCB) based on an artificial neural network (ANN) is proposed in this paper. There are two types of mechanical faults in a VCB: operation mechanism faults and tripping circuit faults. An angle displacement sensor is used to measure the main axle angle displacement which reflects the displacement of the moving contact, to obtain the state of the operation mechanism in the VCB, while a Hall current sensor is used to measure the trip coil current, which reflects the operation state of the tripping circuit. Then an ANN prediction algorithm based on a sliding time window is proposed in this paper and successfully used to predict mechanical faults in a VCB. The research results in this paper provide a theoretical basis for the realization of online monitoring and fault diagnosis of a VCB.
The Design of Fault Tolerant Quantum Dot Cellular Automata Based Logic
NASA Technical Reports Server (NTRS)
Armstrong, C. Duane; Humphreys, William M.; Fijany, Amir
2002-01-01
As transistor geometries are reduced, quantum effects begin to dominate device performance. At some point, transistors cease to have the properties that make them useful computational components. New computing elements must be developed in order to keep pace with Moore s Law. Quantum dot cellular automata (QCA) represent an alternative paradigm to transistor-based logic. QCA architectures that are robust to manufacturing tolerances and defects must be developed. We are developing software that allows the exploration of fault tolerant QCA gate architectures by automating the specification, simulation, analysis and documentation processes.
AGSM Functional Fault Models for Fault Isolation Project
NASA Technical Reports Server (NTRS)
Harp, Janicce Leshay
2014-01-01
This project implements functional fault models to automate the isolation of failures during ground systems operations. FFMs will also be used to recommend sensor placement to improve fault isolation capabilities. The project enables the delivery of system health advisories to ground system operators.
Hao, Li-Ying; Yang, Guang-Hong
2013-09-01
This paper is concerned with the problem of robust fault-tolerant compensation control problem for uncertain linear systems subject to both state and input signal quantization. By incorporating novel matrix full-rank factorization technique with sliding surface design successfully, the total failure of certain actuators can be coped with, under a special actuator redundancy assumption. In order to compensate for quantization errors, an adjustment range of quantization sensitivity for a dynamic uniform quantizer is given through the flexible choices of design parameters. Comparing with the existing results, the derived inequality condition leads to the fault tolerance ability stronger and much wider scope of applicability. With a static adjustment policy of quantization sensitivity, an adaptive sliding mode controller is then designed to maintain the sliding mode, where the gain of the nonlinear unit vector term is updated automatically to compensate for the effects of actuator faults, quantization errors, exogenous disturbances and parameter uncertainties without the need for a fault detection and isolation (FDI) mechanism. Finally, the effectiveness of the proposed design method is illustrated via a model of a rocket fairing structural-acoustic. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.
A signal-based fault detection and classification method for heavy haul wagons
NASA Astrophysics Data System (ADS)
Li, Chunsheng; Luo, Shihui; Cole, Colin; Spiryagin, Maksym; Sun, Yanquan
2017-12-01
This paper proposes a signal-based fault detection and isolation (FDI) system for heavy haul wagons considering the special requirements of low cost and robustness. The sensor network of the proposed system consists of just two accelerometers mounted on the front left and rear right of the carbody. Seven fault indicators (FIs) are proposed based on the cross-correlation analyses of the sensor-collected acceleration signals. Bolster spring fault conditions are focused on in this paper, including two different levels (small faults and moderate faults) and two locations (faults in the left and right bolster springs of the first bogie). A fully detailed dynamic model of a typical 40t axle load heavy haul wagon is developed to evaluate the deterioration of dynamic behaviour under proposed fault conditions and demonstrate the detectability of the proposed FDI method. Even though the fault conditions considered in this paper did not deteriorate the wagon dynamic behaviour dramatically, the proposed FIs show great sensitivity to the bolster spring faults. The most effective and efficient FIs are chosen for fault detection and classification. Analysis results indicate that it is possible to detect changes in bolster stiffness of ±25% and identify the fault location.
FTMP - A highly reliable Fault-Tolerant Multiprocessor for aircraft
NASA Technical Reports Server (NTRS)
Hopkins, A. L., Jr.; Smith, T. B., III; Lala, J. H.
1978-01-01
The FTMP (Fault-Tolerant Multiprocessor) is a complex multiprocessor computer that employs a form of redundancy related to systems considered by Mathur (1971), in which each major module can substitute for any other module of the same type. Despite the conceptual simplicity of the redundancy form, the implementation has many intricacies owing partly to the low target failure rate, and partly to the difficulty of eliminating single-fault vulnerability. An extensive analysis of the computer through the use of such modeling techniques as Markov processes and combinatorial mathematics shows that for random hard faults the computer can meet its requirements. It is also shown that the maintenance scheduled at intervals of 200 hr or more can be adequate most of the time.
Redundancy management for efficient fault recovery in NASA's distributed computing system
NASA Technical Reports Server (NTRS)
Malek, Miroslaw; Pandya, Mihir; Yau, Kitty
1991-01-01
The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance.
Robust Fault Detection for Aircraft Using Mixed Structured Singular Value Theory and Fuzzy Logic
NASA Technical Reports Server (NTRS)
Collins, Emmanuel G.
2000-01-01
The purpose of fault detection is to identify when a fault or failure has occurred in a system such as an aircraft or expendable launch vehicle. The faults may occur in sensors, actuators, structural components, etc. One of the primary approaches to model-based fault detection relies on analytical redundancy. That is the output of a computer-based model (actually a state estimator) is compared with the sensor measurements of the actual system to determine when a fault has occurred. Unfortunately, the state estimator is based on an idealized mathematical description of the underlying plant that is never totally accurate. As a result of these modeling errors, false alarms can occur. This research uses mixed structured singular value theory, a relatively recent and powerful robustness analysis tool, to develop robust estimators and demonstrates the use of these estimators in fault detection. To allow qualitative human experience to be effectively incorporated into the detection process fuzzy logic is used to predict the seriousness of the fault that has occurred.
A cognitive information processing framework for distributed sensor networks
NASA Astrophysics Data System (ADS)
Wang, Feiyi; Qi, Hairong
2004-09-01
In this paper, we present a cognitive agent framework (CAF) based on swarm intelligence and self-organization principles, and demonstrate it through collaborative processing for target classification in sensor networks. The framework involves integrated designs to provide both cognitive behavior at the organization level to conquer complexity and reactive behavior at the individual agent level to retain simplicity. The design tackles various problems in the current information processing systems, including overly complex systems, maintenance difficulties, increasing vulnerability to attack, lack of capability to tolerate faults, and inability to identify and cope with low-frequency patterns. An important and distinguishing point of the presented work from classical AI research is that the acquired intelligence does not pertain to distinct individuals but to groups. It also deviates from multi-agent systems (MAS) due to sheer quantity of extremely simple agents we are able to accommodate, to the degree that some loss of coordination messages and behavior of faulty/compromised agents will not affect the collective decision made by the group.
GTRF: a game theory approach for regulating node behavior in real-time wireless sensor networks.
Lin, Chi; Wu, Guowei; Pirozmand, Poria
2015-06-04
The selfish behaviors of nodes (or selfish nodes) cause packet loss, network congestion or even void regions in real-time wireless sensor networks, which greatly decrease the network performance. Previous methods have focused on detecting selfish nodes or avoiding selfish behavior, but little attention has been paid to regulating selfish behavior. In this paper, a Game Theory-based Real-time & Fault-tolerant (GTRF) routing protocol is proposed. GTRF is composed of two stages. In the first stage, a game theory model named VA is developed to regulate nodes' behaviors and meanwhile balance energy cost. In the second stage, a jumping transmission method is adopted, which ensures that real-time packets can be successfully delivered to the sink before a specific deadline. We prove that GTRF theoretically meets real-time requirements with low energy cost. Finally, extensive simulations are conducted to demonstrate the performance of our scheme. Simulation results show that GTRF not only balances the energy cost of the network, but also prolongs network lifetime.
A Self-Stabilizing Hybrid Fault-Tolerant Synchronization Protocol
NASA Technical Reports Server (NTRS)
Malekpour, Mahyar R.
2015-01-01
This paper presents a strategy for solving the Byzantine general problem for self-stabilizing a fully connected network from an arbitrary state and in the presence of any number of faults with various severities including any number of arbitrary (Byzantine) faulty nodes. The strategy consists of two parts: first, converting Byzantine faults into symmetric faults, and second, using a proven symmetric-fault tolerant algorithm to solve the general case of the problem. A protocol (algorithm) is also present that tolerates symmetric faults, provided that there are more good nodes than faulty ones. The solution applies to realizable systems, while allowing for differences in the network elements, provided that the number of arbitrary faults is not more than a third of the network size. The only constraint on the behavior of a node is that the interactions with other nodes are restricted to defined links and interfaces. The solution does not rely on assumptions about the initial state of the system and no central clock nor centrally generated signal, pulse, or message is used. Nodes are anonymous, i.e., they do not have unique identities. A mechanical verification of a proposed protocol is also present. A bounded model of the protocol is verified using the Symbolic Model Verifier (SMV). The model checking effort is focused on verifying correctness of the bounded model of the protocol as well as confirming claims of determinism and linear convergence with respect to the self-stabilization period.
Heredia, Guillermo; Caballero, Fernando; Maza, Iván; Merino, Luis; Viguria, Antidio; Ollero, Aníbal
2009-01-01
This paper presents a method to increase the reliability of Unmanned Aerial Vehicle (UAV) sensor Fault Detection and Identification (FDI) in a multi-UAV context. Differential Global Positioning System (DGPS) and inertial sensors are used for sensor FDI in each UAV. The method uses additional position estimations that augment individual UAV FDI system. These additional estimations are obtained using images from the same planar scene taken from two different UAVs. Since accuracy and noise level of the estimation depends on several factors, dynamic replanning of the multi-UAV team can be used to obtain a better estimation in case of faults caused by slow growing errors of absolute position estimation that cannot be detected by using local FDI in the UAVs. Experimental results with data from two real UAVs are also presented.