algorithm-based fault tolerance: Topics by Science.gov

Sample records for algorithm-based fault tolerance

Rule-based fault diagnosis of hall sensors and fault-tolerant control of PMSM

NASA Astrophysics Data System (ADS)

Song, Ziyou; Li, Jianqiu; Ouyang, Minggao; Gu, Jing; Feng, Xuning; Lu, Dongbin

2013-07-01

Hall sensors are widely used to estimate the rotor phase of permanent magnet synchronous motors (PMSMs). Rotor position is an essential parameter of the PMSM control algorithm, so Hall sensor faults are very dangerous. However, there is scarcely any research on fault diagnosis and fault-tolerant control of the Hall sensors used in PMSMs. From this standpoint, the Hall sensor faults that may occur during PMSM operation are analyzed theoretically. Based on the analysis results, a fault diagnosis algorithm for Hall sensors, built on three rules, is proposed to classify the fault phenomena accurately. Rotor phase estimation algorithms based on one or two Hall sensor(s) are used to construct the fault-tolerant control algorithm. The fault diagnosis algorithm can detect 60 Hall fault phenomena in total, and all detections can be completed within 1/138 of a rotor rotation period. The fault-tolerant control algorithm achieves smooth torque production, that is, the same control effect as the normal control mode (with three Hall sensors). Finally, a PMSM bench test verifies the accuracy and rapidity of the fault diagnosis and fault-tolerant control strategies. The fault diagnosis algorithm detects all Hall sensor faults promptly, and the fault-tolerant control algorithm allows the PMSM to withstand failures of one or two Hall sensor(s). In addition, the transitions between healthy control and fault-tolerant control conditions are smooth, without any additional noise or harshness. The proposed algorithms can deal with the Hall sensor faults of PMSMs in real applications and can be used to realize fault diagnosis and fault-tolerant control of PMSMs.
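The abstract above does not spell out its three diagnosis rules, so the following is only a generic sketch of how Hall-code plausibility checks can work, not the authors' algorithm. It assumes three sensors spaced 120 electrical degrees apart, for which the codes 000 and 111 never occur in healthy operation and consecutive codes differ in exactly one bit. The ordering in `SEQUENCE` and the name `classify_transition` are illustrative assumptions; the real ordering depends on sensor wiring.

```python
# One common ordering of the six valid codes (Ha, Hb, Hc).
# Assumption: the actual order depends on how the sensors are wired.
SEQUENCE = [0b101, 0b100, 0b110, 0b010, 0b011, 0b001]


def classify_transition(prev, cur):
    """Classify a Hall-code transition as 'ok', 'invalid_state', or 'illegal_jump'."""
    if prev not in SEQUENCE:
        raise ValueError("prev must be one of the six valid Hall codes")
    if cur not in SEQUENCE:
        # 0b000 and 0b111 cannot appear with healthy, 120-degree-spaced sensors.
        return "invalid_state"
    step = (SEQUENCE.index(cur) - SEQUENCE.index(prev)) % 6
    # Healthy operation: stay in the same sector or move to an adjacent one
    # (step 1 is one rotation direction, step 5 is the other).
    return "ok" if step in (0, 1, 5) else "illegal_jump"
```

For instance, a sensor stuck high turns the healthy code 110 into 111, which this check reports as `invalid_state`.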
Survivable algorithms and redundancy management in NASA's distributed computing systems

NASA Technical Reports Server (NTRS)

Malek, Miroslaw

1992-01-01

The design of survivable algorithms requires a solid foundation for executing them. While hardware techniques for fault-tolerant computing are relatively well understood, fault-tolerant operating systems, as well as fault-tolerant applications (survivable algorithms), are, by contrast, little understood, and much more work in this field is required. We outline some of our work that contributes to the foundation of ultrareliable operating systems and fault-tolerant algorithm design. We introduce our consensus-based framework for fault-tolerant system design. This is followed by a description of a hierarchical partitioning method for efficient consensus. A scheduler for redundancy management is introduced, and application-specific fault tolerance is described. We give an overview of our hybrid algorithm technique, which is an alternative to the formal approach given.
What does fault tolerant Deep Learning need from MPI?

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amatya, Vinay C.; Vishnu, Abhinav; Siegel, Charles M.

Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithm for large scale data analysis. DL algorithms are computationally expensive -- even distributed DL implementations which use MPI require days of training (model learning) time on commonly studied datasets. Long running DL applications become susceptible to faults -- requiring development of a fault tolerant system infrastructure, in addition to fault tolerant DL algorithms. This raises an important question: What is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); a need (or lack thereof) for check-pointing of any critical data structures; and most importantly, consideration for several fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe for using ULFM-based implementation. Our evaluation using the ImageNet dataset and AlexNet neural network topology demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI based ULFM.
A Novel Dual Separate Paths (DSP) Algorithm Providing Fault-Tolerant Communication for Wireless Sensor Networks.

PubMed

Tien, Nguyen Xuan; Kim, Semog; Rhee, Jong Myung; Park, Sang Yoon

2017-07-25

Fault tolerance has long been a major concern for sensor communications in fault-tolerant cyber physical systems (CPSs). Network failure problems often occur in wireless sensor networks (WSNs) due to various factors such as the insufficient power of sensor nodes, the dislocation of sensor nodes, the unstable state of wireless links, and unpredictable environmental interference. Fault tolerance is thus one of the key requirements for data communications in WSN applications. This paper proposes a novel path redundancy-based algorithm, called dual separate paths (DSP), that provides fault-tolerant communication with the improvement of the network traffic performance for WSN applications, such as fault-tolerant CPSs. The proposed DSP algorithm establishes two separate paths between a source and a destination in a network based on the network topology information. These paths are node-disjoint paths and have optimal path distances. Unicast frames are delivered from the source to the destination in the network through the dual paths, providing fault-tolerant communication and reducing redundant unicast traffic for the network. The DSP algorithm can be applied to wired and wireless networks, such as WSNs, to provide seamless fault-tolerant communication for mission-critical and life-critical applications such as fault-tolerant CPSs. The analyzed and simulated results show that the DSP-based approach not only provides fault-tolerant communication, but also improves network traffic performance. For the case study in this paper, when the DSP algorithm was applied to high-availability seamless redundancy (HSR) networks, the proposed DSP-based approach reduced the network traffic by 80% to 88% compared with the standard HSR protocol, thus improving network traffic performance.
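The idea of establishing two node-disjoint paths from topology information can be sketched with a standard technique that is not taken from the paper: split every node into an "in" and an "out" half joined by a unit-capacity edge, then push two units of flow with breadth-first augmenting paths. The function name `two_node_disjoint_paths` and the adjacency-dict input are assumptions for illustration. This sketch finds some pair of node-disjoint paths and does not reproduce the optimal-distance criterion that the DSP algorithm claims.

```python
from collections import deque


def two_node_disjoint_paths(adj, src, dst):
    """Return two node-disjoint src->dst paths, or None if they do not exist.

    `adj` maps every node to a list of neighbours (undirected, no duplicates).
    """
    if src == dst:
        raise ValueError("src and dst must differ")
    cap = {}

    def add_edge(u, v, c):
        cap.setdefault(u, {})[v] = c
        cap.setdefault(v, {}).setdefault(u, 0)   # residual (reverse) edge

    for v in adj:
        # Node capacity 1 keeps two paths from sharing an intermediate node.
        add_edge((v, "in"), (v, "out"), 2 if v in (src, dst) else 1)
        for w in adj[v]:
            add_edge((v, "out"), (w, "in"), 1)

    s, t = (src, "in"), (dst, "out")
    for _ in range(2):
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return None
        v = t
        while parent[v] is not None:             # augment along the BFS path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u

    paths = []
    for _ in range(2):
        path, v = [src], src
        while v != dst:
            # A link carries flow exactly when its residual capacity is 0.
            w = next(w for w in adj[v] if cap[(v, "out")][(w, "in")] == 0)
            cap[(v, "out")][(w, "in")] = 1       # mark the link as consumed
            path.append(w)
            v = w
        paths.append(path)
    return paths
```

On a four-node ring `{0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}`, the call `two_node_disjoint_paths(adj, 0, 2)` yields one path through node 1 and one through node 3. On a simple chain it returns `None`, because every route shares the middle node.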
Algorithm-Based Fault Tolerance Integrated with Replication

NASA Technical Reports Server (NTRS)

Some, Raphael; Rennels, David

2008-01-01

In a proposed approach to programming and utilization of commercial off-the-shelf computing equipment, a combination of algorithm-based fault tolerance (ABFT) and replication would be utilized to obtain high degrees of fault tolerance without incurring excessive costs. The basic idea of the proposed approach is to integrate ABFT with replication such that the algorithmic portions of computations would be protected by ABFT, and the logical portions by replication. ABFT is an extremely efficient, inexpensive, high-coverage technique for detecting and mitigating faults in computer systems used for algorithmic computations, but does not protect against errors in logical operations surrounding algorithms.
Fault Tolerant Frequent Pattern Mining

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shohdy, Sameh; Vishnu, Abhinav; Agrawal, Gagan

FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large scale datasets. While several researchers have designed distributed memory FP-Growth algorithms, it is pivotal to consider fault tolerant FP-Growth, which can address the increasing fault rates in large scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and MPI advanced features to guarantee an O(1) space complexity, achieved by using the dataset memory space itself for checkpointing. We also propose a recovery algorithm that can use in-memory and disk-based checkpointing, though in many cases the recovery can be completed without any disk access, and incurring no memory overhead for checkpointing. We evaluate our FT algorithm on a large scale InfiniBand cluster with several large datasets using up to 2K cores. Our evaluation demonstrates excellent efficiency for checkpointing and recovery in comparison to the disk-based approach. We have also observed 20x average speed-up in comparison to Spark, establishing that a well designed algorithm can easily outperform a solution based on a general fault-tolerant programming model.
A Log-Scaling Fault Tolerant Agreement Algorithm for a Fault Tolerant MPI

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hursey, Joshua J; Naughton, III, Thomas J; Vallee, Geoffroy R

The lack of fault tolerance is becoming a limiting factor for application scalability in HPC systems. The MPI does not provide standardized fault tolerance interfaces and semantics. The MPI Forum's Fault Tolerance Working Group is proposing a collective fault tolerant agreement algorithm for the next MPI standard. Such algorithms play a central role in many fault tolerant applications. This paper combines a log-scaling two-phase commit agreement algorithm with a reduction operation to provide the necessary functionality for the new collective without any additional messages. Error handling mechanisms are described that preserve the fault tolerance properties while maintaining overall scalability.
Model-Based Fault Tolerant Control

NASA Technical Reports Server (NTRS)

Kumar, Aditya; Viassolo, Daniel

2008-01-01

The Model Based Fault Tolerant Control (MBFTC) task was conducted under the NASA Aviation Safety and Security Program. The goal of MBFTC is to develop and demonstrate real-time strategies to diagnose and accommodate anomalous aircraft engine events such as sensor faults, actuator faults, or turbine gas-path component damage that can lead to in-flight shutdowns, aborted take offs, asymmetric thrust/loss of thrust control, or engine surge/stall events. A suite of model-based fault detection algorithms were developed and evaluated. Based on the performance and maturity of the developed algorithms two approaches were selected for further analysis: (i) multiple-hypothesis testing, and (ii) neural networks; both used residuals from an Extended Kalman Filter to detect the occurrence of the selected faults. A simple fusion algorithm was implemented to combine the results from each algorithm to obtain an overall estimate of the identified fault type and magnitude. The identification of the fault type and magnitude enabled the use of an online fault accommodation strategy to correct for the adverse impact of these faults on engine operability thereby enabling continued engine operation in the presence of these faults. The performance of the fault detection and accommodation algorithm was extensively tested in a simulation environment.
Fault-tolerant dynamic task graph scheduling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kurt, Mehmet C.; Krishnamoorthy, Sriram; Agrawal, Kunal

2014-11-16

In this paper, we present an approach to fault tolerant execution of dynamic task graphs scheduled using work stealing. In particular, we focus on selective and localized recovery of tasks in the presence of soft faults. We elicit from the user the basic task graph structure in terms of successor and predecessor relationships. The work stealing-based algorithm to schedule such a task graph is augmented to enable recovery when the data and meta-data associated with a task get corrupted. We use this redundancy, and the knowledge of the task graph structure, to selectively recover from faults with low space and time overheads. We show that the fault tolerant design retains the essential properties of the underlying work stealing-based task scheduling algorithm, and that the fault tolerant execution is asymptotically optimal when task re-execution is taken into account. Experimental evaluation demonstrates the low cost of recovery under various fault scenarios.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Katti, Amogh; Di Fatta, Giuseppe; Naughton III, Thomas J

Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus.
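To give a feel for how gossip can spread a list of failed processes, the toy simulation below counts the cycles until every live process holds the same list. It is a plain push-gossip sketch, not either of the two algorithms from the paper. The function name `gossip_cycles`, the rule that each failure is first noticed by a single random live process, and the `max_cycles` guard are all assumptions made for illustration.

```python
import random


def gossip_cycles(n, failed, seed=0, max_cycles=10_000):
    """Count push-gossip cycles until all live processes agree on `failed`."""
    rng = random.Random(seed)
    failed = set(failed)
    alive = [p for p in range(n) if p not in failed]
    if not alive:
        raise ValueError("at least one process must be alive")
    # Assumption: each failure is initially detected by one random live process.
    known = {p: set() for p in alive}
    for f in sorted(failed):
        known[rng.choice(alive)].add(f)
    cycles = 0
    while any(known[p] != failed for p in alive):
        if cycles == max_cycles:
            raise RuntimeError("no consensus within max_cycles")
        cycles += 1
        # Each live process pushes the list it held at the start of the
        # cycle to one randomly chosen live peer, which merges it.
        snapshot = {p: set(s) for p, s in known.items()}
        for p in alive:
            known[rng.choice(alive)] |= snapshot[p]
    return cycles
```

Running `gossip_cycles(n, {0})` for increasing `n` (for example 64, 256, 1024) is one way to observe how the cycle count grows with system size in this simplified model.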
Fault diagnosis and fault-tolerant finite control set-model predictive control of a multiphase voltage-source inverter supplying BLDC motor.

PubMed

Salehifar, Mehdi; Moreno-Equilaz, Manuel

2016-01-01

Due to its fault tolerance, a multiphase brushless direct current (BLDC) motor can meet high reliability demand for application in electric vehicles. The voltage-source inverter (VSI) supplying the motor is subjected to open circuit faults. Therefore, it is necessary to design a fault-tolerant (FT) control algorithm with an embedded fault diagnosis (FD) block. In this paper, finite control set-model predictive control (FCS-MPC) is developed to implement the fault-tolerant control algorithm of a five-phase BLDC motor. The developed control method is fast, simple, and flexible. An FD method based on available information from the control block is proposed; this method is simple, robust to common transients in the motor, and able to localize multiple open circuit faults. The proposed FD and FT control algorithms are embedded in a five-phase BLDC motor drive. In order to validate the theory presented, simulations and experiments are conducted on a five-phase two-level VSI supplying a five-phase BLDC motor. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Fault tolerant control of multivariable processes using auto-tuning PID controller.

PubMed

Yu, Ding-Li; Chang, T K; Yu, Ding-Wen

2005-02-01

Fault tolerant control of dynamic processes is investigated in this paper using an auto-tuning PID controller. A fault tolerant control scheme is proposed composing an auto-tuning PID controller based on an adaptive neural network model. The model is trained online using the extended Kalman filter (EKF) algorithm to learn system post-fault dynamics. Based on this model, the PID controller adjusts its parameters to compensate the effects of the faults, so that the control performance is recovered from degradation. The auto-tuning algorithm for the PID controller is derived with the Lyapunov method and therefore, the model predicted tracking error is guaranteed to converge asymptotically. The method is applied to a simulated two-input two-output continuous stirred tank reactor (CSTR) with various faults, which demonstrate the applicability of the developed scheme to industrial processes.
Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks.

PubMed

Ho, Kevin I-J; Leung, Chi-Sing; Sum, John

2010-06-01

In the last two decades, many online fault/noise injection algorithms have been developed to attain a fault tolerant neural network. However, not much theoretical work related to their convergence and objective functions has been reported. This paper studies six common fault/noise-injection-based online learning algorithms for radial basis function (RBF) networks, namely 1) injecting additive input noise, 2) injecting additive/multiplicative weight noise, 3) injecting multiplicative node noise, 4) injecting multiweight fault (random disconnection of weights), 5) injecting multinode fault during training, and 6) weight decay with injecting multinode fault. Based on the Gladyshev theorem, we show that the convergence of these six online algorithms is almost sure. Moreover, their true objective functions being minimized are derived. For injecting additive input noise during training, the objective function is identical to that of the Tikhonov regularizer approach. For injecting additive/multiplicative weight noise during training, the objective function is the simple mean square training error. Thus, injecting additive/multiplicative weight noise during training cannot improve the fault tolerance of an RBF network. Similar to injecting additive input noise, the objective functions of other fault/noise-injection-based online algorithms contain a mean square error term and a specialized regularization term.
Algorithm-Based Fault Tolerance for Numerical Subroutines

NASA Technical Reports Server (NTRS)

Tumon, Michael; Granat, Robert; Lou, John

2007-01-01

A software library implements a new methodology of detecting faults in numerical subroutines, thus enabling application programs that contain the subroutines to recover transparently from single-event upsets. The software library in question is fault-detecting middleware that is wrapped around the numerical subroutines. Conventional serial versions (based on LAPACK and FFTW) and a parallel version (based on ScaLAPACK) exist. The source code of the application program that contains the numerical subroutines is not modified, and the middleware is transparent to the user. The methodology used is a type of algorithm-based fault tolerance (ABFT). In ABFT, a checksum is computed before a computation and compared with the checksum of the computational result; an error is declared if the difference between the checksums exceeds some threshold. Novel normalization methods are used in the checksum comparison to ensure correct fault detections independent of algorithm inputs. In tests of this software reported in the peer-reviewed literature, this library was shown to enable detection of 99.9 percent of significant faults while generating no false alarms.
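The checksum comparison described above can be illustrated with the classic column-checksum identity for matrix multiplication: the column sums of A·B equal the column sums of A multiplied by B. The sketch below is a minimal pure-Python illustration and is not the library's code. The names `checked_matmul` and `rtol`, the simple magnitude-based scaling, and the `multiply` hook used to inject a fault are all assumptions; the library's own normalization methods are not described in the abstract.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def column_sums(m):
    """Sum each column of a matrix given as a list of rows."""
    return [sum(row[j] for row in m) for j in range(len(m[0]))]


def checked_matmul(a, b, rtol=1e-9, multiply=matmul):
    """Multiply a by b and check the result against a column checksum.

    Returns (product, ok). `rtol` is an illustrative relative threshold.
    """
    # Checksum predicted from the inputs, before the main computation.
    expected = matmul([column_sums(a)], b)[0]
    c = multiply(a, b)
    # Checksum of the computed result.
    actual = column_sums(c)
    # Scale the threshold by the checksum magnitude (a simple stand-in
    # for the normalization mentioned in the abstract).
    scale = max(1.0, max(abs(x) for x in expected))
    ok = all(abs(e - s) <= rtol * scale for e, s in zip(expected, actual))
    return c, ok
```

Passing a deliberately corrupted `multiply` function, for example one that adds 1.0 to a single output element, makes `ok` come back `False`, which mimics the detection of a single-event upset in the result.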
Redundant and fault-tolerant algorithms for real-time measurement and control systems for weapon equipment.

PubMed

Li, Dan; Hu, Xiaoguang

2017-03-01

Because of the high availability requirements from weapon equipment, an in-depth study has been conducted on the real-time fault-tolerance of the widely applied Compact PCI (CPCI) bus measurement and control system. A redundancy design method that uses heartbeat detection to connect the primary and alternate devices has been developed. To address the low successful execution rate and relatively large waste of time slices in the primary version of the task software, an improved algorithm for real-time fault-tolerant scheduling is proposed based on the Basic Checking available time Elimination idle time (BCE) algorithm, applying a single-neuron self-adaptive proportion sum differential (PSD) controller. The experimental validation results indicate that this system has excellent redundancy and fault-tolerance, and the newly developed method can effectively improve the system availability. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Fault-tolerant processing system

NASA Technical Reports Server (NTRS)

Palumbo, Daniel L. (Inventor)

1996-01-01

A fault-tolerant, fiber optic interconnect, or backplane, which serves as a via for data transfer between modules. Fault tolerance algorithms are embedded in the backplane by dividing the backplane into a read bus and a write bus and placing a redundancy management unit (RMU) between the read bus and the write bus so that all data transmitted by the write bus is subjected to the fault tolerance algorithms before the data is passed for distribution to the read bus. The RMU provides both backplane control and fault tolerance.
Fault Mitigation Schemes for Future Spaceflight Multicore Processors

NASA Technical Reports Server (NTRS)

Alexander, James W.; Clement, Bradley J.; Gostelow, Kim P.; Lai, John Y.

2012-01-01

Future planetary exploration missions demand significant advances in on-board computing capabilities over current avionics architectures based on a single-core processing element. The state-of-the-art multi-core processor provides much promise in meeting such challenges while introducing new fault tolerance problems when applied to space missions. Software-based schemes are being presented in this paper that can achieve system-level fault mitigation beyond that provided by radiation-hard-by-design (RHBD). For mission and time critical applications such as the Terrain Relative Navigation (TRN) for planetary or small body navigation, and landing, a range of fault tolerance methods can be adapted by the application. The software methods being investigated include Error Correction Code (ECC) for data packet routing between cores, virtual network routing, Triple Modular Redundancy (TMR), and Algorithm-Based Fault Tolerance (ABFT). A robust fault tolerance framework that provides fail-operational behavior under hard real-time constraints and graceful degradation will be demonstrated using TRN executing on a commercial Tilera(R) processor with simulated fault injections.
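Of the software methods listed above, Triple Modular Redundancy is the simplest to show in code: run the computation three times and accept the value that at least two replicas agree on. The voter below is a generic sketch with a hypothetical name, `tmr_vote`. It is not drawn from the framework in the paper and assumes replica outputs are hashable values that can be compared for exact equality.

```python
from collections import Counter


def tmr_vote(replica_outputs):
    """Return the value produced by at least two of three replicas.

    Raises ValueError if the input is not three values or no majority exists.
    """
    if len(replica_outputs) != 3:
        raise ValueError("TMR expects exactly three replica outputs")
    value, count = Counter(replica_outputs).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: all three replicas disagree")
    return value
```

A single corrupted replica is outvoted (`tmr_vote([7, 7, 9])` returns 7), while three mutually different outputs are reported as an error rather than guessed at.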
Intelligent fault-tolerant controllers

NASA Technical Reports Server (NTRS)

Huang, Chien Y.

1987-01-01

A system with fault tolerant controls is one that can detect, isolate, and estimate failures and perform necessary control reconfiguration based on this new information. Artificial intelligence (AI) is concerned with semantic processing, and it has evolved to include the topics of expert systems and machine learning. This research represents an attempt to apply AI to fault tolerant controls, hence, the name intelligent fault tolerant control (IFTC). A generic solution to the problem is sought, providing a system based on logic in addition to analytical tools, and offering machine learning capabilities. The advantages are that redundant system specific algorithms are no longer needed, that reasonableness is used to quickly choose the correct control strategy, and that the system can adapt to new situations by learning about its effects on system dynamics.
Distributed fault-tolerant time-varying formation control for high-order linear multi-agent systems with actuator failures.

PubMed

Hua, Yongzhao; Dong, Xiwang; Li, Qingdong; Ren, Zhang

2017-11-01

This paper investigates the fault-tolerant time-varying formation control problems for high-order linear multi-agent systems in the presence of actuator failures. Firstly, a fully distributed formation control protocol is presented to compensate for the influences of both bias fault and loss of effectiveness fault. Using the adaptive online updating strategies, no global knowledge about the communication topology is required and the bounds of actuator failures can be unknown. Then an algorithm is proposed to determine the control parameters of the fault-tolerant formation protocol, where the time-varying formation feasible conditions and an approach to expand the feasible formation set are given. Furthermore, the stability of the proposed algorithm is proven based on the Lyapunov-like theory. Finally, two simulation examples are given to demonstrate the effectiveness of the theoretical results. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Optimizing the Reliability and Performance of Service Composition Applications with Fault Tolerance in Wireless Sensor Networks

PubMed Central

Wu, Zhao; Xiong, Naixue; Huang, Yannong; Xu, Degang; Hu, Chunyang

2015-01-01

The services composition technology provides flexible methods for building service composition applications (SCAs) in wireless sensor networks (WSNs). The high reliability and high performance of SCAs help services composition technology promote the practical application of WSNs. The optimization methods for reliability and performance used for traditional software systems are mostly based on the instantiations of software components, which are inapplicable and inefficient in the ever-changing SCAs in WSNs. In this paper, we consider the SCAs with fault tolerance in WSNs. Based on a Universal Generating Function (UGF) we propose a reliability and performance model of SCAs in WSNs, which generalizes a redundancy optimization problem to a multi-state system. Based on this model, an efficient optimization algorithm for reliability and performance of SCAs in WSNs is developed based on a Genetic Algorithm (GA) to find the optimal structure of SCAs with fault-tolerance in WSNs. In order to examine the feasibility of our algorithm, we have evaluated the performance. Furthermore, the interrelationships between the reliability, performance and cost are investigated. In addition, a distinct approach to determine the most suitable parameters in the suggested algorithm is proposed. PMID:26561818

Analysis and design of algorithm-based fault-tolerant systems

NASA Technical Reports Server (NTRS)

Nair, V. S. Sukumaran

1990-01-01

An important consideration in the design of high performance multiprocessor systems is to ensure the correctness of the results computed in the presence of transient and intermittent failures. Concurrent error detection and correction have been applied to such systems in order to achieve reliability. Algorithm Based Fault Tolerance (ABFT) was suggested as a cost-effective concurrent error detection scheme. The research was motivated by the complexity involved in the analysis and design of ABFT systems. To that end, a matrix-based model was developed and, based on that, algorithms for both the design and analysis of ABFT systems are formulated. These algorithms are less complex than the existing ones. In order to reduce the complexity further, a hierarchical approach is developed for the analysis of large systems.
A Novel Wide-Area Backup Protection Based on Fault Component Current Distribution and Improved Evidence Theory

PubMed Central

Zhang, Zhe; Kong, Xiangping; Yin, Xianggen; Yang, Zengli; Wang, Lijun

2014-01-01

In order to solve the problems of the existing wide-area backup protection (WABP) algorithms, the paper proposes a novel WABP algorithm based on the distribution characteristics of fault component current and improved Dempster/Shafer (D-S) evidence theory. When a fault occurs, slave substations transmit to master station the amplitudes of fault component currents of transmission lines which are the closest to fault element. Then master substation identifies suspicious faulty lines according to the distribution characteristics of fault component current. After that, the master substation will identify the actual faulty line with improved D-S evidence theory based on the action states of traditional protections and direction components of these suspicious faulty lines. The simulation examples based on IEEE 10-generator-39-bus system show that the proposed WABP algorithm has an excellent performance. The algorithm has low requirement of sampling synchronization, small wide-area communication flow, and high fault tolerance. PMID:25050399
Trust index based fault tolerant multiple event localization algorithm for WSNs.

PubMed

Xu, Xianghua; Gao, Xueyong; Wan, Jian; Xiong, Naixue

2011-01-01

This paper investigates the use of wireless sensor networks for multiple event source localization using binary information from the sensor nodes. The events could continually emit signals whose strength is attenuated inversely proportional to the distance from the source. In this context, faults occur due to various reasons and are manifested when a node reports a wrong decision. In order to reduce the impact of node faults on the accuracy of multiple event localization, we introduce a trust index model to evaluate the fidelity of information which the nodes report and use in the event detection process, and propose the Trust Index based Subtract on Negative Add on Positive (TISNAP) localization algorithm, which reduces the impact of faulty nodes on the event localization by decreasing their trust index, to improve the accuracy of event localization and performance of fault tolerance for multiple event source localization. The algorithm includes three phases: first, the sink identifies the cluster nodes to determine the number of events occurred in the entire region by analyzing the binary data reported by all nodes; then, it constructs the likelihood matrix related to the cluster nodes and estimates the location of all events according to the alarmed status and trust index of the nodes around the cluster nodes. Finally, the sink updates the trust index of all nodes according to the fidelity of their information in the previous reporting cycle. The algorithm improves the accuracy of localization and performance of fault tolerance in multiple event source localization. The experiment results show that when the probability of node fault is close to 50%, the algorithm can still accurately determine the number of the events and have better accuracy of localization compared with other algorithms.
Trust Index Based Fault Tolerant Multiple Event Localization Algorithm for WSNs

PubMed Central

Xu, Xianghua; Gao, Xueyong; Wan, Jian; Xiong, Naixue

2011-01-01

This paper investigates the use of wireless sensor networks for multiple event source localization using binary information from the sensor nodes. The events could continually emit signals whose strength is attenuated inversely proportional to the distance from the source. In this context, faults occur due to various reasons and are manifested when a node reports a wrong decision. In order to reduce the impact of node faults on the accuracy of multiple event localization, we introduce a trust index model to evaluate the fidelity of information which the nodes report and use in the event detection process, and propose the Trust Index based Subtract on Negative Add on Positive (TISNAP) localization algorithm, which reduces the impact of faulty nodes on the event localization by decreasing their trust index, to improve the accuracy of event localization and performance of fault tolerance for multiple event source localization. The algorithm includes three phases: first, the sink identifies the cluster nodes to determine the number of events occurred in the entire region by analyzing the binary data reported by all nodes; then, it constructs the likelihood matrix related to the cluster nodes and estimates the location of all events according to the alarmed status and trust index of the nodes around the cluster nodes. Finally, the sink updates the trust index of all nodes according to the fidelity of their information in the previous reporting cycle. The algorithm improves the accuracy of localization and performance of fault tolerance in multiple event source localization. The experiment results show that when the probability of node fault is close to 50%, the algorithm can still accurately determine the number of the events and have better accuracy of localization compared with other algorithms. PMID:22163972
A Regularizer Approach for RBF Networks Under the Concurrent Weight Failure Situation.

PubMed

Leung, Chi-Sing; Wan, Wai Yan; Feng, Ruibin

2017-06-01

Many existing results on fault-tolerant algorithms focus on the single fault source situation, where a trained network is affected by one kind of weight failure. In fact, a trained network may be affected by multiple kinds of weight failure. This paper first studies how the open weight fault and the multiplicative weight noise degrade the performance of radial basis function (RBF) networks. Afterward, we define the objective function for training fault-tolerant RBF networks. Based on the objective function, we then develop two learning algorithms, one batch mode and one online mode. In addition, the convergence conditions of our online algorithm are investigated. Finally, we develop a formula to estimate the test set error of faulty networks trained with our approach. This formula helps us to optimize some tuning parameters, such as RBF width.
Cost and benefits optimization model for fault-tolerant aircraft electronic systems

NASA Technical Reports Server (NTRS)

1983-01-01

The factors involved in economic assessment of fault tolerant systems (FTS) and fault tolerant flight control systems (FTFCS) are discussed. Algorithms for optimization and economic analysis of FTFCS are documented.
An improved ant colony optimization algorithm with fault tolerance for job scheduling in grid computing systems

PubMed Central

Idris, Hajara; Junaidu, Sahalu B.; Adewumi, Aderemi O.

2017-01-01

The Grid scheduler schedules user jobs on the best available resource in terms of resource characteristics by optimizing job execution time. Resource failure in Grid is no longer an exception but a regularly occurring event, as resources are increasingly being used by the scientific community to solve computationally intensive problems which typically run for days or even months. It is therefore absolutely essential that these long-running applications are able to tolerate failures and avoid re-computations from scratch after resource failure has occurred, to satisfy the user’s Quality of Service (QoS) requirement. Job Scheduling with Fault Tolerance in Grid Computing using Ant Colony Optimization is proposed to ensure that jobs are executed successfully even when resource failure has occurred. The technique employed in this paper is the use of resource failure rate, as well as a checkpoint-based rollback recovery strategy. Checkpointing aims at reducing the amount of work that is lost upon failure of the system by immediately saving the state of the system. A comparison of the proposed approach with an existing Ant Colony Optimization (ACO) algorithm is discussed. The experimental results of the implemented Fault Tolerance scheduling algorithm show that there is an improvement in the user’s QoS requirement over the existing ACO algorithm, which has no fault tolerance integrated in it. The performance of the two algorithms was measured in terms of the three main scheduling performance metrics: makespan, throughput and average turnaround time. PMID:28545075
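The benefit of checkpoint-based rollback recovery can be illustrated with a toy model. The sketch below is not the scheduler from the paper; it only counts how many work units a job executes when a failure forces it back to its last saved state. The function name `units_executed`, the unit-step execution model, and the way failures are specified are assumptions made for the example.

```python
def units_executed(total, failures, checkpoint_every=None):
    """Count executed work units, including work redone after failures.

    total: units of useful work the job needs.
    failures: execution-step numbers (1-based) at which a failure strikes.
    checkpoint_every: save state after every k completed units, or None
    for no checkpointing (a failure then restarts the job from scratch).
    """
    failures = set(failures)
    progress = saved = executed = 0
    while progress < total:
        executed += 1
        if executed in failures:
            progress = saved          # roll back to the last checkpoint
            continue
        progress += 1
        if checkpoint_every and progress % checkpoint_every == 0:
            saved = progress          # take a checkpoint
    return executed

without_ckpt = units_executed(10, failures={8})
with_ckpt = units_executed(10, failures={8}, checkpoint_every=5)
```

In this model a single failure at step 8 costs eight extra steps without checkpointing but only three with a checkpoint every five units; the cost of taking the checkpoint itself is ignored here.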
Switch failure diagnosis based on inductor current observation for boost converters

NASA Astrophysics Data System (ADS)

Jamshidpour, E.; Poure, P.; Saadate, S.

2016-09-01

Faced with the growing number of applications using DC-DC power converters, the improvement of their reliability is the subject of an increasing number of studies. Especially in safety-critical applications, designing fault-tolerant converters is becoming mandatory. In this paper, a switch fault-tolerant DC-DC converter is studied. First, some of the fastest Fault Detection Algorithms (FDAs) are recalled. Then, a fast switch FDA is proposed which can detect both types of failure: open-circuit and short-circuit faults can each be detected in less than one switching period. Second, a fault-tolerant converter which can be reconfigured under those types of fault is introduced. Hardware-In-the-Loop (HIL) results and experimental validations are given to verify the validity of the proposed switch fault-tolerant approach in the case of a single-switch DC-DC boost converter with one redundant switch.
Advanced information processing system: Hosting of advanced guidance, navigation and control algorithms on AIPS using ASTER

NASA Technical Reports Server (NTRS)

Brenner, Richard; Lala, Jaynarayan H.; Nagle, Gail A.; Schor, Andrei; Turkovich, John

1994-01-01

This program demonstrated the integration of a number of technologies that can increase the availability and reliability of launch vehicles while lowering costs. Availability is increased with an advanced guidance algorithm that adapts trajectories in real-time. Reliability is increased with fault-tolerant computers and communication protocols. Costs are reduced by automatically generating code and documentation. This program was realized through the cooperative efforts of academia, industry, and government. The NASA-LaRC coordinated the effort, while Draper performed the integration. Georgia Institute of Technology supplied a weak Hamiltonian finite element method for optimal control problems. Martin Marietta used MATLAB to apply this method to a launch vehicle (FENOC). Draper supplied the fault-tolerant computing and software automation technology. The fault-tolerant technology includes sequential and parallel fault-tolerant processors (FTP & FTPP) and authentication protocols (AP) for communication. Fault-tolerant technology was incrementally incorporated. Development culminated with a heterogeneous network of workstations and fault-tolerant computers using AP. Draper's software automation system, ASTER, was used to specify a static guidance system based on FENOC, navigation, flight control (GN&C), models, and the interface to a user interface for mission control. ASTER generated Ada code for GN&C and C code for models. An algebraic transform engine (ATE) was developed to automatically translate MATLAB scripts into ASTER.
Machine-checked proofs of the design and implementation of a fault-tolerant circuit

NASA Technical Reports Server (NTRS)

Bevier, William R.; Young, William D.

1990-01-01

A formally verified implementation of the 'oral messages' algorithm of Pease, Shostak, and Lamport is described. An abstract implementation of the algorithm is verified to achieve interactive consistency in the presence of faults. This abstract characterization is then mapped down to a hardware level implementation which inherits the fault-tolerant characteristics of the abstract version. All steps in the proof were checked with the Boyer-Moore theorem prover. A significant result is the demonstration of a fault-tolerant device that is formally specified and whose implementation is proved correct with respect to this specification. A significant simplifying assumption is that the redundant processors behave synchronously. A mechanically checked proof that the oral messages algorithm is 'optimal', in the sense that no algorithm which achieves agreement via similar message passing can tolerate a larger proportion of faulty processors, is also described.
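For readers unfamiliar with the oral messages algorithm, a minimal recursive sketch of OM(m) is given below. It is a plain software simulation, not the verified hardware design described in the abstract. The behaviour of a faulty node (sending a bit that depends on the receiver's id), the default value 0 used when there is no majority, and the function names are assumptions made for the example.

```python
def send(sender, receiver, value, faulty):
    """A loyal sender relays `value`; a faulty one (hypothetical model)
    sends a receiver-dependent bit regardless of what it was given."""
    return receiver % 2 if sender in faulty else value

def majority(votes, default=0):
    """Return the strict-majority value, or `default` if there is none."""
    for v in set(votes):
        if votes.count(v) * 2 > len(votes):
            return v
    return default

def om(commander, lieutenants, value, m, faulty):
    """Simulate OM(m); return {lieutenant: value it settles on}."""
    received = {l: send(commander, l, value, faulty) for l in lieutenants}
    if m == 0:
        return received
    # Each lieutenant relays what it received to the others via OM(m-1).
    relayed = {
        i: om(i, [l for l in lieutenants if l != i], received[i], m - 1, faulty)
        for i in lieutenants
    }
    return {
        l: majority([received[l]] + [relayed[i][l] for i in lieutenants if i != l])
        for l in lieutenants
    }

# Four nodes, one fault: the classic n > 3m case with m = 1.
loyal_commander = om(0, [1, 2, 3], value=1, m=1, faulty={3})
faulty_commander = om(0, [1, 2, 3], value=1, m=1, faulty={0})
```

With a loyal commander the loyal lieutenants 1 and 2 adopt the commander's value, and with a faulty commander all three lieutenants still agree with one another, which are the two interactive-consistency conditions.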
A survey of provably correct fault-tolerant clock synchronization techniques

NASA Technical Reports Server (NTRS)

Butler, Ricky W.

1988-01-01

Six provably correct fault-tolerant clock synchronization algorithms are examined. These algorithms are all presented in the same notation to permit easier comprehension and comparison. The advantages and disadvantages of the different techniques are examined and issues related to the implementation of these algorithms are discussed. The paper argues for the use of such algorithms in life-critical applications.
Risk intelligence: making profit from uncertainty in data processing system.

PubMed

Zheng, Si; Liao, Xiangke; Liu, Xiaodong

2014-01-01

In extreme-scale data processing systems, fault tolerance is an essential and indispensable part. Proactive fault tolerance schemes (such as speculative execution in the MapReduce framework) are introduced to dramatically improve the response time of job executions when failure becomes the norm rather than the exception. Efficient proactive fault tolerance schemes require precise knowledge of the task executions, which has been an open challenge for decades. To address this issue, in this paper we design and implement RiskI, a profile-based prediction algorithm in conjunction with a risk-aware task assignment algorithm, to accelerate task executions, taking the uncertain nature of tasks into account. Our design demonstrates that this inherent uncertainty brings not only great challenges, but also new opportunities. With a careful design, we can benefit from such uncertainties. We implement the idea in Hadoop 0.21.0 systems and the experimental results show that, compared with the traditional LATE algorithm, the response time can be improved by 46% with the same system throughput.
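The LATE baseline mentioned above picks speculative tasks by estimated time to completion. The sketch below shows only that core heuristic, as commonly described (time left = remaining progress divided by observed progress rate); it is not RiskI, and it leaves out LATE's thresholds and caps. The function names and the `tasks` dictionary layout are assumptions made for the example.

```python
def estimated_time_left(progress, elapsed):
    """progress is a score in (0, 1); elapsed is the running time so far."""
    rate = progress / elapsed
    return (1.0 - progress) / rate

def pick_speculative_task(tasks):
    """tasks: {name: (progress, elapsed)}.

    Return the running task expected to finish last, or None if no
    task has a usable progress estimate."""
    left = {
        name: estimated_time_left(p, t)
        for name, (p, t) in tasks.items()
        if 0 < p < 1 and t > 0
    }
    return max(left, key=left.get) if left else None

tasks = {"a": (0.9, 90), "b": (0.2, 80), "c": (0.5, 50)}
straggler = pick_speculative_task(tasks)
```

Here task "b" has made the least progress per unit time, so it has the longest estimated time left and would be the candidate for a backup copy.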
Risk Intelligence: Making Profit from Uncertainty in Data Processing System

PubMed Central

Liao, Xiangke; Liu, Xiaodong

2014-01-01

In extreme-scale data processing systems, fault tolerance is an essential and indispensable part. Proactive fault tolerance schemes (such as speculative execution in the MapReduce framework) are introduced to dramatically improve the response time of job executions when failure becomes the norm rather than the exception. Efficient proactive fault tolerance schemes require precise knowledge of the task executions, which has been an open challenge for decades. To address this issue, in this paper we design and implement RiskI, a profile-based prediction algorithm in conjunction with a risk-aware task assignment algorithm, to accelerate task executions, taking the uncertain nature of tasks into account. Our design demonstrates that this inherent uncertainty brings not only great challenges, but also new opportunities. With a careful design, we can benefit from such uncertainties. We implement the idea in Hadoop 0.21.0 systems and the experimental results show that, compared with the traditional LATE algorithm, the response time can be improved by 46% with the same system throughput. PMID:24883392
Analysis of fault-tolerant neurocontrol architectures

NASA Technical Reports Server (NTRS)

Troudet, T.; Merrill, W.

1992-01-01

The fault-tolerance of analog parallel distributed implementations of a multivariable aircraft neurocontroller is analyzed by simulating weight and neuron failures in a simplified scheme of analog processing based on the functional architecture of the ETANN chip (Electrically Trainable Artificial Neural Network). The neural information processing is found to be only partially distributed throughout the set of weights of the neurocontroller synthesized with the backpropagation algorithm. Although the degree of distribution of the neural processing, and consequently the fault-tolerance of the neurocontroller, could be enhanced using Locally Distributed Weight and Neuron Approaches, a satisfactory level of fault-tolerance could only be obtained by retraining the degraded VLSI neurocontroller. The possibility of maintaining neurocontrol performance and stability in the presence of single weight or neuron failures was demonstrated through an automated retraining procedure of the neurocontroller based on a pre-programmed choice and sequence of the training parameters.
A formally verified algorithm for interactive consistency under a hybrid fault model

NASA Technical Reports Server (NTRS)

Lincoln, Patrick; Rushby, John

1993-01-01

Consistent distribution of single-source data to replicated computing channels is a fundamental problem in fault-tolerant system design. The 'Oral Messages' (OM) algorithm solves this problem of Interactive Consistency (Byzantine Agreement) assuming that all faults are worst-case. Thambidurai and Park introduced a 'hybrid' fault model that distinguished three fault modes: asymmetric (Byzantine), symmetric, and benign; they also exhibited, along with an informal 'proof of correctness', a modified version of OM. Unfortunately, their algorithm is flawed. The discipline of mechanically checked formal verification eventually enabled us to develop a correct algorithm for Interactive Consistency under the hybrid fault model. This algorithm withstands $a$ asymmetric, $s$ symmetric, and $b$ benign faults simultaneously, using $m+1$ rounds, provided $n > 2a + 2s + b + m$ and $m \geq a$. We present this algorithm, discuss its subtle points, and describe its formal specification and verification in PVS. We argue that formal verification systems such as PVS are now sufficiently effective that their application to fault-tolerance algorithms should be considered routine.
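The resilience condition quoted above is easy to check mechanically. The helper below simply encodes the two inequalities from the abstract; the function names are illustrative and not taken from the paper or from PVS.

```python
def hybrid_om_tolerates(n, a, s, b, m):
    """True if n channels running m + 1 rounds withstand a asymmetric,
    s symmetric and b benign faults: n > 2a + 2s + b + m and m >= a."""
    return n > 2 * a + 2 * s + b + m and m >= a

def min_channels(a, s, b, m):
    """Smallest n satisfying the bound for the given fault counts,
    or None when m < a and no n is sufficient."""
    return 2 * a + 2 * s + b + m + 1 if m >= a else None

# With only asymmetric faults and m = a the bound reduces to the
# classical Byzantine requirement n > 3a.
classic = hybrid_om_tolerates(n=4, a=1, s=0, b=0, m=1)
```

For example, one fault of each kind with `m = 1` requires at least seven channels, whereas treating all three faults as Byzantine under the classical bound would require ten.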
Adaptive-gain fast super-twisting sliding mode fault tolerant control for a reusable launch vehicle in reentry phase.

PubMed

Zhang, Yao; Tang, Shengjing; Guo, Jie

2017-11-01

In this paper, a novel adaptive-gain fast super-twisting (AGFST) sliding mode attitude control synthesis is carried out for a reusable launch vehicle (RLV) subject to actuator faults and unknown disturbances. Based on the fast nonsingular terminal sliding mode surface (FNTSMS) and the adaptive-gain fast super-twisting algorithm, an adaptive fault tolerant control law for attitude stabilization is derived to protect against actuator faults and unknown uncertainties. First, a second-order nonlinear control-oriented model for the RLV is established by the feedback linearization method. On this basis, a fast nonsingular terminal sliding mode (FNTSM) manifold is designed, which provides fast finite-time global convergence and avoids the singularity problem as well as the chattering phenomenon. Based on the merits of the standard super-twisting (ST) algorithm and a fast reaching law with adaptation, a novel adaptive-gain fast super-twisting (AGFST) algorithm is proposed for the finite-time fault tolerant attitude control problem of the RLV without any knowledge of the bounds of uncertainties and actuator faults. The important features of the AGFST algorithm include not overestimating the values of the control gains and a faster convergence speed than the standard ST algorithm. A formal proof of the finite-time stability of the closed-loop system is derived using the Lyapunov function technique. An estimate of the convergence time and an accurate expression of the convergence region are also provided. Finally, simulations are presented to illustrate the effectiveness and superiority of the proposed control scheme. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Scalable and fault tolerant orthogonalization based on randomized distributed data aggregation

PubMed Central

Gansterer, Wilfried N.; Niederbrucker, Gerhard; Straková, Hana; Schulze Grotthoff, Stefan

2013-01-01

The construction of distributed algorithms for matrix computations built on top of distributed data aggregation algorithms with randomized communication schedules is investigated. For this purpose, a new aggregation algorithm for summing or averaging distributed values, the push-flow algorithm, is developed, which achieves superior resilience properties with respect to failures compared to existing aggregation methods. It is illustrated that on a hypercube topology it asymptotically requires the same number of iterations as the optimal all-to-all reduction operation and that it scales well with the number of nodes. Orthogonalization is studied as a prototypical matrix computation task. A new fault tolerant distributed orthogonalization method rdmGS, which can produce accurate results even in the presence of node failures, is built on top of distributed data aggregation algorithms. PMID:24748902
SFTP: A Secure and Fault-Tolerant Paradigm against Blackhole Attack in MANET

NASA Astrophysics Data System (ADS)

KumarRout, Jitendra; Kumar Bhoi, Sourav; Kumar Panda, Sanjaya

2013-02-01

Security issues in MANET are a challenging task nowadays. MANETs are vulnerable to passive attacks and active attacks because of a limited number of resources and lack of centralized authority. The Blackhole attack is an attack at the network layer which degrades network performance by dropping packets. In this paper, we have proposed a Secure Fault-Tolerant Paradigm (SFTP) which checks the Blackhole attack in the network. The three phases used in the SFTP algorithm are designing of coverage area to find the area of coverage, a Network Connection algorithm to design a fault-tolerant model, and a Route Discovery algorithm to discover the route and deliver data from source to destination. SFTP gives better network performance by making the network fault free.
Fault-tolerant clock synchronization in distributed systems

NASA Technical Reports Server (NTRS)

Ramanathan, Parameswaran; Shin, Kang G.; Butler, Ricky W.

1990-01-01

Existing fault-tolerant clock synchronization algorithms are compared and contrasted. These include the following: software synchronization algorithms, such as convergence-averaging, convergence-nonaveraging, and consistency algorithms, as well as probabilistic synchronization; hardware synchronization algorithms; and hybrid synchronization. The worst-case clock skews guaranteed by representative algorithms are compared, along with other important aspects such as time, message, and cost overhead imposed by the algorithms. More recent developments such as hardware-assisted software synchronization and algorithms for synchronizing large, partially connected distributed systems are especially emphasized.
Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

NASA Technical Reports Server (NTRS)

Harper, Richard

1989-01-01

In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

A modified NARMAX model-based self-tuner with fault tolerance for unknown nonlinear stochastic hybrid systems with an input-output direct feed-through term.

PubMed

Tsai, Jason S-H; Hsu, Wen-Teng; Lin, Long-Guei; Guo, Shu-Mei; Tann, Joseph W

2014-01-01

A modified nonlinear autoregressive moving average with exogenous inputs (NARMAX) model-based state-space self-tuner with fault tolerance is proposed in this paper for the unknown nonlinear stochastic hybrid system with a direct transmission matrix from input to output. Through the off-line observer/Kalman filter identification method, one has a good initial guess of the modified NARMAX model to reduce the on-line system identification process time. Then, based on the modified NARMAX-based system identification, a corresponding adaptive digital control scheme is presented for the unknown continuous-time nonlinear system, with an input-output direct transmission term, which also has measurement and system noises and inaccessible system states. In addition, an effective state-space self-tuner with a fault tolerance scheme is presented for the unknown multivariable stochastic system. A quantitative criterion is suggested by comparing the innovation process error estimated by the Kalman filter estimation algorithm, so that a weighting matrix resetting technique, which adjusts and resets the covariance matrices of the parameter estimate obtained by the Kalman filter estimation algorithm, is utilized to achieve the parameter estimation for faulty system recovery. Consequently, the proposed method can effectively cope with partially abrupt and/or gradual system faults and input failures by the fault detection. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.
Distributed Fault-Tolerant Control of Networked Uncertain Euler-Lagrange Systems Under Actuator Faults.

PubMed

Chen, Gang; Song, Yongduan; Lewis, Frank L

2016-05-03

This paper investigates the distributed fault-tolerant control problem of networked Euler-Lagrange systems with actuator and communication link faults. An adaptive fault-tolerant cooperative control scheme is proposed to achieve the coordinated tracking control of networked uncertain Lagrange systems on a general directed communication topology, which contains a spanning tree with the root node being the active target system. The proposed algorithm is capable of compensating for the actuator bias fault, the partial loss of effectiveness actuation fault, the communication link fault, the model uncertainty, and the external disturbance simultaneously. The control scheme does not use any fault detection and isolation mechanism to detect, separate, and identify the actuator faults online, which largely reduces the online computation and expedites the responsiveness of the controller. To validate the effectiveness of the proposed method, a test-bed of a multiple robot-arm cooperative control system is developed for real-time verification. Experiments on the networked robot-arms are conducted and the results confirm the benefits and the effectiveness of the proposed distributed fault-tolerant control algorithms.
A fault tolerant gait for a hexapod robot over uneven terrain.

PubMed

Yang, J M; Kim, J H

2000-01-01

The fault tolerant gait of legged robots in static walking is a gait which maintains its stability against a fault event preventing a leg from having the support state. In this paper, a fault tolerant quadruped gait is proposed for a hexapod traversing uneven terrain with forbidden regions, which do not offer viable footholds but can be stepped over. By comparing performance of straight-line motion and crab walking over even terrain, it is shown that the proposed gait has better mobility and terrain adaptability than previously developed gaits. Based on the proposed gait, we present a method for the generation of the fault tolerant locomotion of a hexapod over uneven terrain with forbidden regions. The proposed method minimizes the number of legs on the ground during walking, and foot adjustment algorithm is used for avoiding steps on forbidden regions. The effectiveness of the proposed strategy over uneven terrain is demonstrated with a computer simulation.
Reliable and Efficient Parallel Processing Algorithms and Architectures for Modern Signal Processing. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Liu, Kuojuey Ray

1990-01-01

Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on a triangular array and another based on a rectangular array, are presented for the multi-phase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered.
Multi-version software reliability through fault-avoidance and fault-tolerance

NASA Technical Reports Server (NTRS)

Vouk, Mladen A.; Mcallister, David F.

1989-01-01

A number of experimental and theoretical issues associated with the practical use of multi-version software to provide run-time tolerance to software faults were investigated. A specialized tool was developed and evaluated for measuring testing coverage for a variety of metrics. The tool was used to collect information on the relationships between software faults and coverage provided by the testing process as measured by different metrics (including data flow metrics). Considerable correlation was found between coverage provided by some higher metrics and the elimination of faults in the code. Back-to-back testing was continued as an efficient mechanism for removal of un-correlated faults, and common-cause faults of variable span. Work was also continued on software reliability estimation methods based on non-random sampling, and on the relationship between software reliability and code coverage provided through testing. New fault tolerance models were formulated. Simulation studies of the Acceptance Voting and Multi-stage Voting algorithms were finished and it was found that these two schemes for software fault tolerance are superior in many respects to some commonly used schemes. Particularly encouraging are the safety properties of the Acceptance testing scheme.
Fault-Tolerant Algorithms for Connectivity Restoration in Wireless Sensor Networks.

PubMed

Zeng, Yali; Xu, Li; Chen, Zhide

2015-12-22

As wireless sensor networks (WSNs) are often deployed in hostile environments, their nodes are prone to large-scale failures, which prevent the network from working normally. In this case, an effective restoration scheme is needed to restore the faulty network in a timely manner. Most existing restoration schemes focus on the number of deployed nodes or on fault tolerance alone, but fail to take into account the fact that network coverage and topology quality are also important to a network. To address this issue, we present two algorithms named Full 2-Connectivity Restoration Algorithm (F2CRA) and Partial 3-Connectivity Restoration Algorithm (P3CRA), which restore a faulty WSN in different aspects. F2CRA constructs the fan-shaped topology structure to reduce the number of deployed nodes, while P3CRA constructs the dual-ring topology structure to improve the fault tolerance of the network. F2CRA is suitable when the restoration cost is given priority, and P3CRA is suitable when the network quality is considered first. Compared with other algorithms, these two algorithms ensure that the network has stronger fault tolerance, larger coverage area and better balanced load after the restoration.
Message Efficient Checkpointing and Rollback Recovery in Heterogeneous Mobile Networks

NASA Astrophysics Data System (ADS)

Jaggi, Parmeet Kaur; Singh, Awadhesh Kumar

2016-06-01

Heterogeneous networks provide an appealing way of expanding the computing capability of mobile networks by combining infrastructure-less mobile ad-hoc networks with the infrastructure-based cellular mobile networks. The nodes in such a network range from low-power nodes to macro base stations and thus, vary greatly in their capabilities such as computation power and battery power. The nodes are susceptible to different types of transient and permanent failures and therefore, the algorithms designed for such networks need to be fault-tolerant. The article presents a checkpointing algorithm for the rollback recovery of mobile hosts in a heterogeneous mobile network. Checkpointing is a well established approach to provide fault tolerance in static and cellular mobile distributed systems. However, the use of checkpointing for fault tolerance in a heterogeneous environment remains to be explored. The proposed protocol is based on the results of zigzag paths and zigzag cycles by Netzer-Xu. Considering the heterogeneity prevalent in the network, an uncoordinated checkpointing technique is employed. Yet, useless checkpoints are avoided without causing a high message overhead.
Certification trails for data structures

NASA Technical Reports Server (NTRS)

Sullivan, Gregory F.; Masson, Gerald M.

1993-01-01

Certification trails are a recently introduced and promising approach to fault detection and fault tolerance. The applicability of the certification trail technique is significantly generalized. Previously, certification trails had to be customized to each algorithm application; here, trails appropriate to wide classes of algorithms were developed. These certification trails are based on common data-structure operations such as those carried out using balanced binary trees and heaps. Any algorithm using these sets of operations can therefore employ the certification trail method to achieve software fault tolerance. To exemplify the scope of the generalization of the certification trail technique provided, constructions of trails for abstract data types such as priority queues and union-find structures are given. These trails are applicable to any data-structure implementation of the abstract data type. It is also shown that these ideas lead naturally to monitors for data-structure operations.
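The general shape of a certification trail can be illustrated with sorting, an example chosen here for brevity rather than taken from the abstract. A first execution produces the output together with a trail (here, the sorting permutation); a second, simpler execution recomputes the output from the input and the trail while checking the trail's validity, and the two outputs are compared. All function names are hypothetical.

```python
def sort_with_trail(items):
    """First execution: sort and emit the permutation used as a trail."""
    trail = sorted(range(len(items)), key=lambda i: items[i])
    return [items[i] for i in trail], trail

def sort_using_trail(items, trail):
    """Second execution: linear-time recomputation that rejects a bad trail."""
    n = len(items)
    if len(trail) != n:
        raise ValueError("trail has the wrong length")
    seen = [False] * n
    for i in trail:
        if not isinstance(i, int) or not 0 <= i < n or seen[i]:
            raise ValueError("trail is not a permutation of the indices")
        seen[i] = True
    out = [items[i] for i in trail]
    if any(out[k] > out[k + 1] for k in range(n - 1)):
        raise ValueError("trail does not yield a sorted sequence")
    return out

def certified_sort(items):
    """Run both executions and accept only if they agree."""
    first, trail = sort_with_trail(items)
    second = sort_using_trail(items, trail)
    if first != second:
        raise RuntimeError("executions disagree: fault detected")
    return first

result = certified_sort([3, 1, 2])
```

The second execution never trusts the trail: a corrupted trail either fails the permutation or ordering check or leads to a mismatch between the two outputs, so a fault in the first execution is detected rather than propagated.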
Design for dependability: A simulation-based approach. Ph.D. Thesis, 1993

NASA Technical Reports Server (NTRS)

Goswami, Kumar K.

1994-01-01

This research addresses issues in simulation-based system level dependability analysis of fault-tolerant computer systems. The issues and difficulties of providing a general simulation-based approach for system level analysis are discussed and a methodology that addresses and tackles these issues is presented. The proposed methodology is designed to permit the study of a wide variety of architectures under various fault conditions. It permits detailed functional modeling of architectural features such as sparing policies, repair schemes, routing algorithms as well as other fault-tolerant mechanisms, and it allows the execution of actual application software. One key benefit of this approach is that the behavior of a system under faults does not have to be pre-defined as is normally done. Instead, a system can be simulated in detail and injected with faults to determine its failure modes. The thesis describes how object-oriented design is used to incorporate this methodology into a general purpose design and fault injection package called DEPEND. A software model is presented that uses abstractions of application programs to study the behavior and effect of software on hardware faults in the early design stage when actual code is not available. Finally, an acceleration technique that combines hierarchical simulation, time acceleration algorithms and hybrid simulation to reduce simulation time is introduced.
Lattice surgery on the Raussendorf lattice

NASA Astrophysics Data System (ADS)

Herr, Daniel; Paler, Alexandru; Devitt, Simon J.; Nori, Franco

2018-07-01

Lattice surgery is a method to perform quantum computation fault-tolerantly by using operations on boundary qubits between different patches of the planar code. This technique allows for universal planar code computation without eliminating the intrinsic two-dimensional nearest-neighbor properties of the surface code that ease physical hardware implementations. Lattice surgery approaches to algorithmic compilation and optimization have been demonstrated to be more resource efficient for resource-intensive components of a fault-tolerant algorithm, and consequently may be preferable over braid-based logic. Lattice surgery can be extended to the Raussendorf lattice, providing a measurement-based approach to the surface code. In this paper we describe how lattice surgery can be performed on the Raussendorf lattice and therefore give a viable alternative to computation using braiding in measurement-based implementations of topological codes.
A verified design of a fault-tolerant clock synchronization circuit: Preliminary investigations

NASA Technical Reports Server (NTRS)

Miner, Paul S.

1992-01-01

Schneider demonstrates that many fault tolerant clock synchronization algorithms can be represented as refinements of a single proven correct paradigm. Shankar provides mechanical proof that Schneider's schema achieves Byzantine fault tolerant clock synchronization provided that 11 constraints are satisfied. Some of the constraints are assumptions about physical properties of the system and cannot be established formally. Proofs are given that the fault tolerant midpoint convergence function satisfies three of the constraints. A hardware design is presented, implementing the fault tolerant midpoint function, which is shown to satisfy the remaining constraints. The synchronization circuit will recover completely from transient faults provided the maximum fault assumption is not violated. The initialization protocol for the circuit also provides a recovery mechanism from total system failure caused by correlated transient faults.
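The fault-tolerant midpoint convergence function named in this abstract can be illustrated with a short sketch. It assumes the usual textbook form: discard the f largest and f smallest clock readings, then take the midpoint of the extremes that remain. The function name, the 3f + 1 check, and the sample readings are illustrative assumptions, not details taken from the report.

```python
def fault_tolerant_midpoint(readings, max_faults):
    """Return the midpoint of the readings that survive trimming.

    Discards the max_faults largest and max_faults smallest readings, then
    averages the smallest and largest of what remains.
    """
    n = len(readings)
    # Assumed bound: at least 3f + 1 clocks are needed to tolerate f Byzantine faults.
    if n < 3 * max_faults + 1:
        raise ValueError("need at least 3f + 1 readings to tolerate f faults")
    ordered = sorted(readings)
    survivors = ordered[max_faults:n - max_faults]
    return (survivors[0] + survivors[-1]) / 2.0


# Four clocks, one of which (the 950.0 reading) is wildly wrong.
readings = [100.2, 99.9, 100.1, 950.0]
adjusted = fault_tolerant_midpoint(readings, max_faults=1)
```

Because the extreme values are dropped before the midpoint is taken, a single faulty clock cannot pull the result outside the range spanned by the good clocks.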
The Design of a Fault-Tolerant COTS-Based Bus Architecture

NASA Technical Reports Server (NTRS)

Chau, Savio N.; Alkalai, Leon; Burt, John B.; Tai, Ann T.

1999-01-01

In this paper, we report our experiences and findings on the design of a fault-tolerant bus architecture comprised of two COTS buses, the IEEE 1394 and the I2C. This fault-tolerant bus is the backbone system bus for the avionics architecture of the X2000 program at the Jet Propulsion Laboratory. COTS buses are attractive because of the availability of low cost commercial products. However, they are not specifically designed for highly reliable applications such as long-life deep-space missions. The X2000 design team has devised a multi-level fault tolerance approach to compensate for this shortcoming of COTS buses. First, the approach enhances the fault tolerance capabilities of the IEEE 1394 and I2C buses by adding a layer of fault handling hardware and software. Second, algorithms are developed to enable the IEEE 1394 and the I2C buses to assist each other in isolating and recovering from faults. Third, the set of IEEE 1394 and I2C buses is duplicated to further enhance system reliability. The X2000 design team has paid special attention to guarantee that all fault tolerance provisions will not cause the bus design to deviate from the commercial standard specifications. Otherwise, the economic attractiveness of using COTS will be diminished. The hardware and software designs of the X2000 fault-tolerant bus are being implemented and flight hardware will be delivered to the ST4 and Europa Orbiter missions.
Redundancy management for efficient fault recovery in NASA's distributed computing system

NASA Technical Reports Server (NTRS)

Malek, Miroslaw; Pandya, Mihir; Yau, Kitty

1991-01-01

The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance.
Verification of fault-tolerant clock synchronization systems. M.S. Thesis - College of William and Mary, 1992

NASA Technical Reports Server (NTRS)

Miner, Paul S.

1993-01-01

A critical function in a fault-tolerant computer architecture is the synchronization of the redundant computing elements. The synchronization algorithm must include safeguards to ensure that failed components do not corrupt the behavior of good clocks. Reasoning about fault-tolerant clock synchronization is difficult because of the possibility of subtle interactions involving failed components. Therefore, mechanical proof systems are used to ensure that the verification of the synchronization system is correct. In 1987, Schneider presented a general proof of correctness for several fault-tolerant clock synchronization algorithms. Subsequently, Shankar verified Schneider's proof by using the mechanical proof system EHDM. This proof ensures that any system satisfying its underlying assumptions will provide Byzantine fault-tolerant clock synchronization. The utility of Shankar's mechanization of Schneider's theory for the verification of clock synchronization systems is explored. Some limitations of Shankar's mechanically verified theory were encountered. With minor modifications to the theory, a mechanically checked proof is provided that removes these limitations. The revised theory also allows for proven recovery from transient faults. Use of the revised theory is illustrated with the verification of an abstract design of a clock synchronization system.
Data-based fault-tolerant control of high-speed trains with traction/braking notch nonlinearities and actuator failures.

PubMed

Song, Qi; Song, Yong-Duan

2011-12-01

This paper investigates the position and velocity tracking control problem of high-speed trains with multiple vehicles connected through couplers. A dynamic model reflecting nonlinear and elastic impacts between adjacent vehicles as well as traction/braking nonlinearities and actuation faults is derived. Neuroadaptive fault-tolerant control algorithms are developed to account for various factors such as input nonlinearities, actuator failures, and uncertain impacts of in-train forces in the system simultaneously. The resultant control scheme is essentially independent of system model and is primarily data-driven because with the appropriate input-output data, the proposed control algorithms are capable of automatically generating the intermediate control parameters, neuro-weights, and the compensation signals, literally producing the traction/braking force based upon input and response data only--the whole process does not require precise information on system model or system parameter, nor human intervention. The effectiveness of the proposed approach is also confirmed through numerical simulations.
Development and Evaluation of Fault-Tolerant Flight Control Systems

NASA Technical Reports Server (NTRS)

Song, Yong D.; Gupta, Kajal (Technical Monitor)

2004-01-01

The research is concerned with developing a new approach to enhancing fault tolerance of flight control systems. The original motivation for fault-tolerant control comes from the need for safe operation of control elements (e.g. actuators) in the event of hardware failures in high reliability systems. One such example is a modern space vehicle subjected to actuator/sensor impairments. A major task in flight control is to revise the control policy to balance impairment detectability and to achieve sufficient robustness. This involves careful selection of types and parameters of the controllers and the impairment detecting filters used. It also involves a decision, upon the identification of some failures, on whether and how a control reconfiguration should take place in order to maintain a certain system performance level. In this project a new flight dynamic model under uncertain flight conditions is considered, in which the effects of both ramp and jump faults are reflected. Stabilization algorithms based on neural network and adaptive methods are derived. The control algorithms are shown to be effective in dealing with uncertain dynamics due to external disturbances and unpredictable faults. The overall strategy is easy to set up and the computation involved is much less than with other strategies. Computer simulation software is developed. A series of simulation studies has been conducted with varying flight conditions.
Optimal fault-tolerant control strategy of a solid oxide fuel cell system

NASA Astrophysics Data System (ADS)

Wu, Xiaojuan; Gao, Danhui

2017-10-01

For solid oxide fuel cell (SOFC) development, load tracking, heat management, air excess ratio constraint, high efficiency, low cost and fault diagnosis are six key issues. However, no literature studies control techniques that combine optimization and fault diagnosis for the SOFC system. An optimal fault-tolerant control strategy is presented in this paper, which involves four parts: a fault diagnosis module, a switching module, two backup optimizers and a controller loop. The fault diagnosis part is presented to identify the SOFC current fault type, and the switching module is used to select the appropriate backup optimizer based on the diagnosis result. NSGA-II and TOPSIS are employed to design the two backup optimizers under normal and air compressor fault states. A PID algorithm is proposed to design the control loop, which includes a power tracking controller, an anode inlet temperature controller, a cathode inlet temperature controller and an air excess ratio controller. The simulation results show the proposed optimal fault-tolerant control method can track the power, temperature and air excess ratio at the desired values, simultaneously achieving the maximum efficiency and the minimum unit cost both in normal SOFC operation and under an air compressor fault.
Multi-Sensor Fusion with Interaction Multiple Model and Chi-Square Test Tolerant Filter.

PubMed

Yang, Chun; Mohammadi, Arash; Chen, Qing-Wei

2016-11-02

Motivated by the key importance of multi-sensor information fusion algorithms in the state-of-the-art integrated navigation systems due to recent advancements in sensor technologies, telecommunication, and navigation systems, the paper proposes an improved and innovative fault-tolerant fusion framework. An integrated navigation system is considered consisting of four sensory sub-systems, i.e., Strap-down Inertial Navigation System (SINS), Global Positioning System (GPS), the Bei-Dou2 (BD2) and Celestial Navigation System (CNS) navigation sensors. In such multi-sensor applications, on the one hand, the design of an efficient fusion methodology is extremely constrained, especially when no information regarding the system's error characteristics is available. On the other hand, the development of an accurate fault detection and integrity monitoring solution is both challenging and critical. The paper addresses the sensitivity issues of conventional fault detection solutions and the unavailability of a precisely known system model by jointly designing fault detection and information fusion algorithms. In particular, by using ideas from Interacting Multiple Model (IMM) filters, the uncertainty of the system will be adjusted adaptively by model probabilities and using the proposed fuzzy-based fusion framework. The paper also addresses the problem of using corrupted measurements for fault detection purposes by designing a two state propagator chi-square test jointly with the fusion algorithm. Two IMM predictors, running in parallel, are used and alternately reactivated based on the received information from the fusion filter to increase the reliability and accuracy of the proposed detection solution. With the combination of the IMM and the proposed fusion method, we increase the failure sensitivity of the detection system and, thereby, significantly increase the overall reliability and accuracy of the integrated navigation system. Simulation results indicate that the proposed fault tolerant fusion framework provides superior performance over its traditional counterparts.
Multi-Sensor Fusion with Interaction Multiple Model and Chi-Square Test Tolerant Filter

PubMed Central

Yang, Chun; Mohammadi, Arash; Chen, Qing-Wei

2016-01-01

Motivated by the key importance of multi-sensor information fusion algorithms in the state-of-the-art integrated navigation systems due to recent advancements in sensor technologies, telecommunication, and navigation systems, the paper proposes an improved and innovative fault-tolerant fusion framework. An integrated navigation system is considered consisting of four sensory sub-systems, i.e., Strap-down Inertial Navigation System (SINS), Global Positioning System (GPS), the Bei-Dou2 (BD2) and Celestial Navigation System (CNS) navigation sensors. In such multi-sensor applications, on the one hand, the design of an efficient fusion methodology is extremely constrained, especially when no information regarding the system’s error characteristics is available. On the other hand, the development of an accurate fault detection and integrity monitoring solution is both challenging and critical. The paper addresses the sensitivity issues of conventional fault detection solutions and the unavailability of a precisely known system model by jointly designing fault detection and information fusion algorithms. In particular, by using ideas from Interacting Multiple Model (IMM) filters, the uncertainty of the system will be adjusted adaptively by model probabilities and using the proposed fuzzy-based fusion framework. The paper also addresses the problem of using corrupted measurements for fault detection purposes by designing a two state propagator chi-square test jointly with the fusion algorithm. Two IMM predictors, running in parallel, are used and alternately reactivated based on the received information from the fusion filter to increase the reliability and accuracy of the proposed detection solution. With the combination of the IMM and the proposed fusion method, we increase the failure sensitivity of the detection system and, thereby, significantly increase the overall reliability and accuracy of the integrated navigation system. Simulation results indicate that the proposed fault tolerant fusion framework provides superior performance over its traditional counterparts. PMID:27827832
Cost and benefits design optimization model for fault tolerant flight control systems

NASA Technical Reports Server (NTRS)

Rose, J.

1982-01-01

Requirements and specifications for a method of optimizing the design of fault-tolerant flight control systems are provided. Algorithms that could be used for developing new and modifying existing computer programs are also provided, with recommendations for follow-on work.

Fault Tolerance Middleware for a Multi-Core System

NASA Technical Reports Server (NTRS)

Some, Raphael R.; Springer, Paul L.; Zima, Hans P.; James, Mark; Wagner, David A.

2012-01-01

Fault Tolerance Middleware (FTM) provides a framework to run on a dedicated core of a multi-core system and handles detection of single-event upsets (SEUs), and the responses to those SEUs, occurring in an application running on multiple cores of the processor. This software was written expressly for a multi-core system and can support different kinds of fault strategies, such as introspection, algorithm-based fault tolerance (ABFT), and triple modular redundancy (TMR). It focuses on providing fault tolerance for the application code, and represents the first step in a plan to eventually include fault tolerance in message passing and the FTM itself. In the multi-core system, the FTM resides on a single, dedicated core, separate from the cores used by the application. This is done in order to isolate the FTM from application faults and to allow it to swap out any application core for a substitute. The structure of the FTM consists of an interface to a fault tolerant strategy module, a responder module, a fault manager module, an error factory, and an error mapper that determines the severity of the error. In the present reference implementation, the only fault tolerant strategy implemented is introspection. The introspection code waits for an application node to send an error notification to it. It then uses the error factory to create an error object, and at this time, a severity level is assigned to the error. The introspection code uses its built-in knowledge base to generate a recommended response to the error. Responses might include ignoring the error, logging it, rolling back the application to a previously saved checkpoint, swapping in a new node to replace a bad one, or restarting the application. The original error and recommended response are passed to the top-level fault manager module, which invokes the response. The responder module also notifies the introspection module of the generated response. This provides additional information to the introspection module that it can use in generating its next response. For example, if the responder triggers an application rollback and errors are still occurring, the introspection module may decide to recommend an application restart.
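The feedback loop between the responder and the introspection module can be sketched as a small escalation policy. Everything here is hypothetical: the class and method names, the severity-to-response mapping, and the one-step escalation rule are assumptions made for illustration, not the FTM's actual interface or knowledge base.

```python
from enum import IntEnum


class Response(IntEnum):
    # Ordered from least to most disruptive; this ordering is an assumption.
    IGNORE = 0
    LOG = 1
    ROLLBACK = 2
    SWAP_NODE = 3
    RESTART = 4


class Introspector:
    """Toy stand-in for the introspection strategy (hypothetical API)."""

    def __init__(self):
        self.last_response = {}  # node id -> last response that was invoked

    def recommend(self, node, severity):
        # Assumed mapping: severity 0..4 selects the response of the same rank.
        baseline = Response(min(max(severity, 0), int(Response.RESTART)))
        previous = self.last_response.get(node)
        # If errors persist after a response at least this strong, go one step further.
        if previous is not None and previous >= baseline:
            baseline = Response(min(previous + 1, int(Response.RESTART)))
        return baseline

    def notify(self, node, response):
        # Called once the recommended response has actually been invoked.
        self.last_response[node] = response


ftm = Introspector()
first = ftm.recommend("core3", severity=2)
ftm.notify("core3", first)
second = ftm.recommend("core3", severity=2)  # same error again after a rollback
```

In this sketch a repeated error after a rollback escalates to swapping the node; a real knowledge base could equally jump straight to a restart, as the abstract's example suggests.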
Software-Implemented Fault Tolerance in Communications Systems

NASA Technical Reports Server (NTRS)

Gantenbein, Rex E.

1994-01-01

Software-implemented fault tolerance (SIFT) is used in many computer-based command, control, and communications (C(3)) systems to provide the nearly continuous availability that they require. In the communications subsystem of Space Station Alpha, SIFT algorithms are used to detect and recover from failures in the data and command link between the Station and its ground support. The paper presents a review of these algorithms and discusses how such techniques can be applied to similar systems found in applications such as manufacturing control, military communications, and programmable devices such as pacemakers. With support from the Tracking and Communication Division of NASA's Johnson Space Center, researchers at the University of Wyoming are developing a testbed for evaluating the effectiveness of these algorithms prior to their deployment. This testbed will be capable of simulating a variety of C(3) system failures and recording the response of the Space Station SIFT algorithms to these failures. The design of this testbed and the applicability of the approach in other environments is described.
Verification of the FtCayuga fault-tolerant microprocessor system. Volume 1: A case study in theorem prover-based verification

NASA Technical Reports Server (NTRS)

Srivas, Mandayam; Bickford, Mark

1991-01-01

The design and formal verification of a hardware system for a task that is an important component of a fault tolerant computer architecture for flight control systems is presented. The hardware system implements an algorithm for obtaining interactive consistency (Byzantine agreement) among four microprocessors as a special instruction on the processors. The property verified ensures that an execution of the special instruction by the processors correctly accomplishes interactive consistency, provided certain preconditions hold. An assumption is made that the processors execute synchronously. For verification, the authors used a computer aided design hardware design verification tool, Spectool, and the theorem prover, Clio. A major contribution of the work is the demonstration of a significant fault tolerant hardware design that is mechanically verified by a theorem prover.
Verification of the FtCayuga fault-tolerant microprocessor system. Volume 2: Formal specification and correctness theorems

NASA Technical Reports Server (NTRS)

Bickford, Mark; Srivas, Mandayam

1991-01-01

Presented here is a formal specification and verification of a property of a quadruplicately redundant fault tolerant microprocessor system design. A complete listing of the formal specification of the system and the correctness theorems that are proved is given. The system performs the task of obtaining interactive consistency among the processors using a special instruction on the processors. The design is based on an algorithm proposed by Pease, Shostak, and Lamport. The property verified ensures that an execution of the special instruction by the processors correctly accomplishes interactive consistency, provided certain preconditions hold. The verification used a computer aided design verification tool, Spectool, and the theorem prover, Clio. A major contribution of the work is the demonstration of a significant fault tolerant hardware design that is mechanically verified by a theorem prover.
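To make interactive consistency among four processors concrete, here is a minimal software sketch of a two-round exchange with majority voting that tolerates one faulty processor. It is an illustration in the spirit of the Pease, Shostak, and Lamport algorithm, not the verified hardware design; the function names, the default value 0, and the faulty-sender model (`lie`) are assumptions.

```python
from collections import Counter


def majority(values, default=0):
    """Return the strict-majority value, or default if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default


def interactive_consistency(private, faulty, lie):
    """Two-round exchange among n = 4 processors with at most one fault.

    private: each processor's private value.
    faulty: index of the faulty processor, or None.
    lie(sender, receiver, true_value): what the faulty sender transmits.
    Returns {processor: agreed vector} for every non-faulty processor.
    """
    n = len(private)

    def send(sender, receiver, value):
        return lie(sender, receiver, value) if sender == faulty else value

    # Round 1: everyone sends its private value to everyone else.
    direct = {(s, r): send(s, r, private[s])
              for s in range(n) for r in range(n) if s != r}
    # Round 2: everyone relays what it heard from each other processor.
    relayed = {(q, r, s): send(q, r, direct[(s, q)])
               for s in range(n) for q in range(n) for r in range(n)
               if len({s, q, r}) == 3}

    result = {}
    for p in range(n):
        if p == faulty:
            continue
        vector = []
        for s in range(n):
            if s == p:
                vector.append(private[p])
                continue
            reports = [direct[(s, p)]] + [relayed[(q, p, s)]
                                          for q in range(n) if q not in (s, p)]
            vector.append(majority(reports))
        result[p] = vector
    return result


# Processor 2 is faulty and tells every receiver something different.
vectors = interactive_consistency(
    [10, 20, 30, 40], faulty=2, lie=lambda s, r, v: v + r)
```

Each good processor ends with the same vector: the true values of the other good processors, and a common default for the faulty one whose reports never reach a majority.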
Induction machine bearing faults detection based on a multi-dimensional MUSIC algorithm and maximum likelihood estimation.

PubMed

Elbouchikhi, Elhoussin; Choqueuse, Vincent; Benbouzid, Mohamed

2016-07-01

Condition monitoring of electric drives is of paramount importance since it helps enhance system reliability and availability. Moreover, knowledge of the fault mode behavior is extremely important in order to improve system protection and fault-tolerant control. Fault detection and diagnosis in squirrel cage induction machines based on motor current signature analysis (MCSA) has been widely investigated. Several high resolution spectral estimation techniques have been developed and used to detect induction machine abnormal operating conditions. This paper focuses on the application of MCSA for the detection of abnormal mechanical conditions that may lead to induction machine failure. In fact, this paper is devoted to the detection of single-point defects in bearings based on parametric spectral estimation. A multi-dimensional MUSIC (MD MUSIC) algorithm has been developed for bearing fault detection based on bearing fault characteristic frequencies. This method has been used to estimate the fundamental frequency and the fault related frequency. Then, an amplitude estimator of the fault characteristic frequencies has been proposed and a fault indicator has been derived for fault severity measurement. The proposed bearing fault detection approach is assessed using simulated stator current data, issued from a coupled electromagnetic circuits approach for air-gap eccentricity emulating bearing faults. Then, experimental data are used for validation purposes. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
Distributed asynchronous microprocessor architectures in fault tolerant integrated flight systems

NASA Technical Reports Server (NTRS)

Dunn, W. R.

1983-01-01

The paper discusses the implementation of fault tolerant digital flight control and navigation systems for rotorcraft application. It is shown that in implementing fault tolerance at the systems level using advanced LSI/VLSI technology, aircraft physical layout and flight systems requirements tend to define a system architecture of distributed, asynchronous microprocessors in which fault tolerance can be achieved locally through hardware redundancy and/or globally through application of analytical redundancy. The effects of asynchronism on the execution of dynamic flight software are discussed. It is shown that if the asynchronous microprocessors have knowledge of time, these errors can be significantly reduced through appropriate modifications of the flight software. Finally, the paper extends previous work to show that through the combined use of time referencing and stable flight algorithms, individual microprocessors can be configured to autonomously tolerate intermittent faults.
Reliability Assessment for Low-cost Unmanned Aerial Vehicles

NASA Astrophysics Data System (ADS)

Freeman, Paul Michael

Existing low-cost unmanned aerospace systems are unreliable, and engineers must blend reliability analysis with fault-tolerant control in novel ways. This dissertation introduces the University of Minnesota unmanned aerial vehicle flight research platform, a comprehensive simulation and flight test facility for reliability and fault-tolerance research. An industry-standard reliability assessment technique, the failure modes and effects analysis, is performed for an unmanned aircraft. Particular attention is afforded to the control surface and servo-actuation subsystem. Maintaining effector health is essential for safe flight; failures may lead to loss of control incidents. Failure likelihood, severity, and risk are qualitatively assessed for several effector failure modes. Design changes are recommended to improve aircraft reliability based on this analysis. Most notably, the control surfaces are split, providing independent actuation and dual-redundancy. The simulation models for control surface aerodynamic effects are updated to reflect the split surfaces using a first-principles geometric analysis. The failure modes and effects analysis is extended by using a high-fidelity nonlinear aircraft simulation. A trim state discovery is performed to identify the achievable steady, wings-level flight envelope of the healthy and damaged vehicle. Tolerance of elevator actuator failures is studied using familiar tools from linear systems analysis. This analysis reveals significant inherent performance limitations for candidate adaptive/reconfigurable control algorithms used for the vehicle. Moreover, it demonstrates how these tools can be applied in a design feedback loop to make safety-critical unmanned systems more reliable. Control surface impairments that do occur must be quickly and accurately detected. This dissertation also considers fault detection and identification for an unmanned aerial vehicle using model-based and model-free approaches and applies those algorithms to experimental faulted and unfaulted flight test data. Flight tests are conducted with actuator faults that affect the plant input and sensor faults that affect the vehicle state measurements. A model-based detection strategy is designed and uses robust linear filtering methods to reject exogenous disturbances, e.g. wind, while providing robustness to model variation. A data-driven algorithm is developed to operate exclusively on raw flight test data without physical model knowledge. The fault detection and identification performance of these complementary but different methods is compared. Together, enhanced reliability assessment and multi-pronged fault detection and identification techniques can help to bring about the next generation of reliable low-cost unmanned aircraft.
Fault-tolerant optimised tracking control for unknown discrete-time linear systems using a combined reinforcement learning and residual compensation methodology

NASA Astrophysics Data System (ADS)

Han, Ke-Zhen; Feng, Jian; Cui, Xiaohong

2017-10-01

This paper considers the fault-tolerant optimised tracking control (FTOTC) problem for unknown discrete-time linear systems. A research scheme is proposed on the basis of data-based parity space identification, reinforcement learning and residual compensation techniques. The main characteristic of this research scheme lies in the parity-space-identification-based simultaneous tracking control and residual compensation. The technical approach consists of four main parts: apply a subspace aided method to design an observer-based residual generator; use a reinforcement Q-learning approach to solve for the optimised tracking control policy; rely on robust H∞ theory to achieve noise attenuation; adopt fault estimation triggered by the residual generator to perform fault compensation. To clarify the design and implementation procedures, an integrated algorithm is further constructed to link up these four functional units. A detailed analysis and proof are subsequently given to explain the guaranteed FTOTC performance of the proposed scheme. Finally, a case simulation is provided to verify its effectiveness.
A novel N-input voting algorithm for X-by-wire fault-tolerant systems.

PubMed

Karimi, Abbas; Zarafshan, Faraneh; Al-Haddad, S A R; Ramli, Abdul Rahman

2014-01-01

Voting is an important operation in the multichannel computation paradigm and in the realization of ultrareliable and real-time control systems; it arbitrates among the results of N redundant variants. These systems include N-modular redundant (NMR) hardware systems and diversely designed software systems based on N-version programming (NVP). Depending on the characteristics of the application and the type of selected voter, the voting algorithms can be implemented for either hardware or software systems. In this paper, a novel voting algorithm is introduced for real-time fault-tolerant control systems, appropriate for applications in which N is large. Its behavior is then implemented in software under different scenarios of error injection on the system inputs. The results, evaluated through plots and statistical computations, demonstrate that this novel algorithm does not have the limitations of some popular voting algorithms such as median and weighted; moreover, it is able to significantly increase the reliability and availability of the system in the best case to 2489.7% and 626.74%, respectively, and in the worst case to 3.84% and 1.55%, respectively.
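For context, the two baseline voters the abstract names (median and weighted) can be sketched as follows. This is not the paper's novel algorithm, which the abstract does not describe. The weighting formula, the `scale` parameter, and the sample channel values are illustrative assumptions.

```python
import statistics


def median_voter(inputs):
    """Select the median of the N redundant results."""
    return statistics.median(inputs)


def weighted_average_voter(inputs, scale=1.0):
    """Average the inputs, down-weighting values far from the median.

    The weight 1 / (1 + (distance / scale)^2) is an illustrative choice,
    not a formula taken from the paper.
    """
    centre = statistics.median(inputs)
    weights = [1.0 / (1.0 + ((x - centre) / scale) ** 2) for x in inputs]
    return sum(w * x for w, x in zip(weights, inputs)) / sum(weights)


# Five redundant channels; the last one carries an injected error.
channels = [10.1, 9.9, 10.0, 10.2, 57.0]
voted_median = median_voter(channels)
voted_weighted = weighted_average_voter(channels)
```

Both voters keep the output close to the agreeing channels. The median does so by selecting one input, while the weighted voter blends all of them, so a small share of the faulty value still leaks into its result.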
Markov chain algorithms: a template for building future robust low-power systems

PubMed Central

Deka, Biplab; Birklykke, Alex A.; Duwe, Henry; Mansinghka, Vikash K.; Kumar, Rakesh

2014-01-01

Although computational systems are looking towards post CMOS devices in the pursuit of lower power, the expected inherent unreliability of such devices makes it difficult to design robust systems without additional power overheads for guaranteeing robustness. As such, algorithmic structures with inherent ability to tolerate computational errors are of significant interest. We propose to cast applications as stochastic algorithms based on Markov chains (MCs) as such algorithms are both sufficiently general and tolerant to transition errors. We show with four example applications—Boolean satisfiability, sorting, low-density parity-check decoding and clustering—how applications can be cast as MC algorithms. Using algorithmic fault injection techniques, we demonstrate the robustness of these implementations to transition errors with high error rates. Based on these results, we make a case for using MCs as an algorithmic template for future robust low-power systems. PMID:24842030
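One way to picture sorting cast as a Markov chain is a random adjacent compare-and-swap, where each step is one transition and a fault inverts the comparison. The sketch below is an assumption-laden illustration, not the authors' construction. The function names, the error model, and the step count are all hypothetical.

```python
import random


def markov_chain_sort(values, steps, error_rate=0.0, seed=0):
    """Sort by repeatedly applying a random adjacent compare-and-swap.

    Each step is one transition of the chain. With probability error_rate
    the comparison result is inverted, modelling a transition error.
    """
    rng = random.Random(seed)
    data = list(values)
    for _ in range(steps):
        i = rng.randrange(len(data) - 1)
        out_of_order = data[i] > data[i + 1]
        if rng.random() < error_rate:
            out_of_order = not out_of_order  # injected fault
        if out_of_order:
            data[i], data[i + 1] = data[i + 1], data[i]
    return data


def inversions(data):
    """Count out-of-order pairs; 0 means fully sorted."""
    return sum(
        1
        for i in range(len(data))
        for j in range(i + 1, len(data))
        if data[i] > data[j]
    )


start = [7, 3, 5, 0, 6, 1, 4, 2]
clean = markov_chain_sort(start, steps=5000)
noisy = markov_chain_sort(start, steps=5000, error_rate=0.05)
noisy_disorder = inversions(noisy)  # how far the faulty run is from sorted
```

Without errors the sorted order is an absorbing state. With errors the chain keeps being perturbed and corrected, so the inversion count indicates how far a faulty run drifts from the sorted order.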
Fault tolerance in space-based digital signal processing and switching systems: Protecting up-link processing resources, demultiplexer, demodulator, and decoder

NASA Technical Reports Server (NTRS)

Redinbo, Robert

1994-01-01

Fault tolerance features in the first three major subsystems appearing in the next generation of communications satellites are described. These satellites will contain extensive but efficient high-speed processing and switching capabilities to support the low signal strengths associated with very small aperture terminals. The terminals' numerous data channels are combined through frequency division multiplexing (FDM) on the up-links and are protected individually by forward error-correcting (FEC) binary convolutional codes. The front-end processing resources, demultiplexer, demodulators, and FEC decoders extract all data channels which are then switched individually, multiplexed, and remodulated before retransmission to earth terminals through narrow beam spot antennas. Algorithm based fault tolerance (ABFT) techniques, which relate real number parity values with data flows and operations, are used to protect the data processing operations. The additional checking features utilize resources that can be substituted for normal processing elements when resource reconfiguration is required to replace a failed unit.
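The idea of attaching real-number parity values to a computation can be illustrated with the classic checksum scheme for matrix multiplication, a widely taught ABFT construction. This is a generic sketch rather than the satellite signal-processing design described above. The helper names and the tolerance are assumptions.

```python
def matmul(a, b):
    """Plain matrix product on lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def with_column_checksum(a):
    """Append a row holding the sum of each column."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append a column holding the sum of each row."""
    return [row + [sum(row)] for row in b]


def locate_error(c_full, tol=1e-9):
    """Return (row, col) of a single corrupted data element, or None."""
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    return None


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
# The product of the checksum-extended inputs carries its own parity row and column.
c_full = matmul(with_column_checksum(a), with_row_checksum(b))
clean_result = locate_error(c_full)

c_full[0][1] += 0.5  # inject a fault into one product element
row, col = locate_error(c_full)
# Rebuild the corrupted element from the row parity and the other row entries.
c_full[row][col] = c_full[row][-1] - sum(
    v for j, v in enumerate(c_full[row][:-1]) if j != col)
```

A single corrupted element breaks exactly one row check and one column check, which both locates it and allows it to be recomputed from the parity value.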
Development of an interface for an ultrareliable fault-tolerant control system and an electronic servo-control unit

NASA Technical Reports Server (NTRS)

Shaver, Charles; Williamson, Michael

1986-01-01

The NASA Ames Research Center sponsors a research program for the investigation of Intelligent Flight Control Actuation systems. The use of artificial intelligence techniques in conjunction with algorithmic techniques for autonomous, decentralized fault management of flight-control actuation systems is explored under this program. The design, development, and operation of the interface for laboratory investigation of this program is documented. The interface, architecturally based on the Intel 8751 microcontroller, is an interrupt-driven system designed to receive a digital message from an ultrareliable fault-tolerant control system (UFTCS). The interface links the UFTCS to an electronic servo-control unit, which controls a set of hydraulic actuators. It was necessary to build a UFTCS emulator (also based on the Intel 8751) to provide signal sources for testing the equipment.
Sequential Test Strategies for Multiple Fault Isolation

NASA Technical Reports Server (NTRS)

Shakeri, M.; Pattipati, Krishna R.; Raghavan, V.; Patterson-Hine, Ann; Kell, T.

1997-01-01

In this paper, we consider the problem of constructing near optimal test sequencing algorithms for diagnosing multiple faults in redundant (fault-tolerant) systems. The computational complexity of solving the optimal multiple-fault isolation problem is super-exponential, that is, it is much more difficult than the single-fault isolation problem, which, by itself, is NP-hard. By employing concepts from information theory and Lagrangian relaxation, we present several static and dynamic (on-line or interactive) test sequencing algorithms for the multiple fault isolation problem that provide a trade-off between the degree of suboptimality and computational complexity. Furthermore, we present novel diagnostic strategies that generate a static diagnostic directed graph (digraph), instead of a static diagnostic tree, for multiple fault diagnosis. Using this approach, the storage complexity of the overall diagnostic strategy reduces substantially. Computational results based on real-world systems indicate that the size of a static multiple fault strategy is strictly related to the structure of the system, and that the use of an on-line multiple fault strategy can diagnose faults in systems with as many as 10,000 failure sources.
Sliding mode fault tolerant control dealing with modeling uncertainties and actuator faults.

PubMed

Wang, Tao; Xie, Wenfang; Zhang, Youmin

2012-05-01

In this paper, two sliding mode control algorithms are developed for nonlinear systems with both modeling uncertainties and actuator faults. The first algorithm is developed under an assumption that the uncertainty bounds are known. Different design parameters are utilized to deal with modeling uncertainties and actuator faults, respectively. The second algorithm is an adaptive version of the first one, which is developed to accommodate uncertainties and faults without utilizing exact bounds information. The stability of the overall control systems is proved by using a Lyapunov function. The effectiveness of the developed algorithms has been verified on a nonlinear longitudinal model of Boeing 747-100/200. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
Copilot: Monitoring Embedded Systems

NASA Technical Reports Server (NTRS)

Pike, Lee; Wegmann, Nis; Niller, Sebastian; Goodloe, Alwyn

2012-01-01

Runtime verification (RV) is a natural fit for ultra-critical systems, where correctness is imperative. In ultra-critical systems, even if the software is fault-free, because of the inherent unreliability of commodity hardware and the adversity of operational environments, processing units (and their hosted software) are replicated, and fault-tolerant algorithms are used to compare the outputs. We investigate both software monitoring in distributed fault-tolerant systems, as well as implementing fault-tolerance mechanisms using RV techniques. We describe the Copilot language and compiler, specifically designed for generating monitors for distributed, hard real-time systems. We also describe two case-studies in which we generated Copilot monitors in avionics systems.
Evolutionary Based Techniques for Fault Tolerant Field Programmable Gate Arrays

NASA Technical Reports Server (NTRS)

Larchev, Gregory V.; Lohn, Jason D.

2006-01-01

The use of SRAM-based Field Programmable Gate Arrays (FPGAs) is becoming more and more prevalent in space applications. Commercial-grade FPGAs are potentially susceptible to permanently debilitating Single-Event Latchups (SELs). Repair methods based on Evolutionary Algorithms may be applied to FPGA circuits to enable successful fault recovery. This paper presents the experimental results of applying such methods to repair four commonly used circuits (quadrature decoder, 3-by-3-bit multiplier, 3-by-3-bit adder, 440-7 decoder) into which a number of simulated faults have been introduced. The results suggest that evolutionary repair techniques can improve the process of fault recovery when used instead of or as a supplement to Triple Modular Redundancy (TMR), which is currently the predominant method for mitigating FPGA faults.
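For reference, the TMR baseline named in this abstract masks a single faulty replica by majority voting. A minimal sketch, assuming three redundant modules whose outputs are plain integers; the function names are hypothetical and do not come from the paper.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three redundant integer outputs."""
    return (a & b) | (a & c) | (b & c)

def disagreeing_modules(a, b, c):
    """Indices (0, 1, 2) of modules whose output differs from the voted value."""
    voted = tmr_vote(a, b, c)
    return [i for i, value in enumerate((a, b, c)) if value != voted]
```

The voter hides any fault confined to one replica, but it does not repair that replica, which is the gap the evolutionary repair methods above are meant to address.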
Parallel and fault-tolerant algorithms for hypercube multiprocessors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aykanat, C.

1988-01-01

Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vector hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.
Hybrid routing technique for a fault-tolerant, integrated information network

NASA Technical Reports Server (NTRS)

Meredith, B. D.

1986-01-01

The evolutionary growth of the space station and the diverse activities onboard are expected to require a hierarchy of integrated, local area networks capable of supporting data, voice, and video communications. In addition, fault-tolerant network operation is necessary to protect communications between critical systems attached to the net and to relieve the valuable human resources onboard the space station of time-critical data system repair tasks. A key issue for the design of the fault-tolerant, integrated network is the development of a robust routing algorithm which dynamically selects the optimum communication paths through the net. A routing technique is described that adapts to topological changes in the network to support fault-tolerant operation and system evolvability.
Havens: Explicit Reliable Memory Regions for HPC Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hukerikar, Saurabh; Engelmann, Christian

2016-01-01

Supporting error resilience in future exascale-class supercomputing systems is a critical challenge. Due to transistor scaling trends and increasing memory density, scientific simulations are expected to experience more interruptions caused by transient errors in the system memory. Existing hardware-based detection and recovery techniques will be inadequate to manage the presence of high memory fault rates. In this paper we propose a partial memory protection scheme based on region-based memory management. We define the concept of regions called havens that provide fault protection for program objects. We provide reliability for the regions through a software-based parity protection mechanism. Our approach enables critical program objects to be placed in these havens. The fault coverage provided by our approach is application agnostic, unlike algorithm-based fault tolerance techniques.
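The general idea of software-based parity over a memory region can be sketched as follows. This is a simplified, detection-only illustration under assumed names (`ParityRegion`, `write`, `verify`); it is not the havens interface or the authors' implementation.

```python
from functools import reduce
from operator import xor

class ParityRegion:
    """Hypothetical word-addressed region guarded by one XOR parity word."""

    def __init__(self, words):
        self.words = list(words)
        self.parity = reduce(xor, self.words, 0)

    def write(self, index, value):
        # Update parity incrementally: cancel the old word, fold in the new one.
        self.parity ^= self.words[index] ^ value
        self.words[index] = value

    def verify(self):
        """True if the stored parity still matches the region contents."""
        return reduce(xor, self.words, 0) == self.parity
```

Because the check depends only on the stored words, it works the same way for any program object placed in the region, which is the sense in which such protection is application agnostic.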
Gait planning for a quadruped robot with one faulty actuator

NASA Astrophysics Data System (ADS)

Chen, Xianbao; Gao, Feng; Qi, Chenkun; Tian, Xinghua

2015-01-01

Fault tolerance is essential for quadruped robots when they work in remote areas or hazardous environments. Many fault-tolerant gait planning methods proposed in the past decade constrain more degrees of freedom (DOFs) of a robot than necessary. Thus a novel method to realize fault-tolerant walking is proposed. The mobility of the robot is analyzed first by using screw theory. The result shows that the translation of the center of body (CoB) can be kept with one faulty actuator if the rotations of the body are controlled. Thus the DOFs of the robot body are divided into two parts: the translation of the CoB and the rotation of the body. The kinematic model of the whole robot is built, and an algorithm is developed to actively control the body orientations at the velocity level so that the planned CoB trajectory can be realized in spite of the constraint of the faulty actuator. This gait has a generation sequence similar to that of the normal gait and can be applied to the robot at any position. Simulations and experiments of the fault-tolerant gait with one faulty actuator are carried out. The CoB errors and the body rotation angles are measured. Compared to the traditional fault-tolerant gait, they can be reduced by at least 50%. A fault-tolerant gait planning algorithm is presented, which not only realizes the walking of a quadruped robot with a faulty actuator, but also efficiently improves the walking performance by taking full advantage of the remaining operational actuators, according to the results of the simulations and experiments.

Agent Based Fault Tolerance for the Mobile Environment

NASA Astrophysics Data System (ADS)

Park, Taesoon

This paper presents a fault-tolerance scheme based on mobile agents for reliable mobile computing systems. The mobility of the agent makes it suitable for tracing the mobile hosts, and the intelligence of the agent makes it efficient in supporting the fault tolerance services. This paper presents two approaches to implementing the mobile agent based fault-tolerant service; their performance is evaluated and compared with other fault-tolerant schemes.
Allocating application to group of consecutive processors in fault-tolerant deadlock-free routing path defined by routers obeying same rules for path selection

DOEpatents

Leung, Vitus J [Albuquerque, NM]; Phillips, Cynthia A [Albuquerque, NM]; Bender, Michael A [East Northport, NY]; Bunde, David P [Urbana, IL]

2009-07-21

In a multiple processor computing apparatus, directional routing restrictions and a logical channel construct permit fault tolerant, deadlock-free routing. Processor allocation can be performed by creating a linear ordering of the processors based on routing rules used for routing communications between the processors. The linear ordering can assume a loop configuration, and bin-packing is applied to this loop configuration. The interconnection of the processors can be conceptualized as a generally rectangular 3-dimensional grid, and the MC allocation algorithm is applied with respect to the 3-dimensional grid.
A Performance Prediction Model for a Fault-Tolerant Computer During Recovery and Restoration

NASA Technical Reports Server (NTRS)

Obando, Rodrigo A.; Stoughton, John W.

1995-01-01

The modeling and design of a fault-tolerant multiprocessor system is addressed. Of interest is the behavior of the system during recovery and restoration after a fault has occurred. The multiprocessor systems are based on the Algorithm to Architecture Mapping Model (ATAMM) and the fault considered is the death of a processor. The developed model is useful in the determination of performance bounds of the system during recovery and restoration. The performance bounds include time to recover from the fault, time to restore the system, and determination of any permanent delay in the input to output latency after the system has regained steady state. Implementation of an ATAMM based computer was developed for a four-processor generic VHSIC spaceborne computer (GVSC) as the target system. A simulation of the GVSC was also written on the code used in the ATAMM Multicomputer Operating System (AMOS). The simulation is used to verify the new model for tracking the propagation of the delay through the system and predicting the behavior of the transient state of recovery and restoration. The model is shown to accurately predict the transient behavior of an ATAMM based multicomputer during recovery and restoration.
Fault-tolerant wait-free shared objects

NASA Technical Reports Server (NTRS)

Jayanti, Prasad; Chandra, Tushar D.; Toueg, Sam

1992-01-01

A concurrent system consists of processes communicating via shared objects, such as shared variables, queues, etc. The concept of wait-freedom was introduced to cope with process failures: each process that accesses a wait-free object is guaranteed to get a response even if all the other processes crash. However, if a wait-free object 'crashes,' all the processes that access that object are prevented from making progress. In this paper, we introduce the concept of fault-tolerant wait-free objects, and study the problem of implementing them. We give a universal method to construct fault-tolerant wait-free objects, for all types of 'responsive' failures (including one in which faulty objects may 'lie'). In sharp contrast, we prove that many common and interesting types (such as queues, sets, and test&set) have no fault-tolerant wait-free implementations even under the most benign of the 'non-responsive' types of failure. We also introduce several concepts and techniques that are central to the design of fault-tolerant concurrent systems: the concepts of self-implementation and graceful degradation, and techniques to automatically increase the fault-tolerance of implementations. We prove matching lower bounds on the resource complexity of most of our algorithms.
Method and apparatus for fault tolerance

NASA Technical Reports Server (NTRS)

Masson, Gerald M. (Inventor); Sullivan, Gregory F. (Inventor)

1993-01-01

A method and apparatus for achieving fault tolerance in a computer system having at least a first central processing unit and a second central processing unit. The method comprises the steps of first executing a first algorithm in the first central processing unit on input which produces a first output as well as a certification trail. Next, executing a second algorithm in the second central processing unit on the input and on at least a portion of the certification trail which produces a second output. The second algorithm has a faster execution time than the first algorithm for a given input. Then, comparing the first and second outputs such that an error result is produced if the first and second outputs are not the same. The step of executing a first algorithm and the step of executing a second algorithm preferably takes place over essentially the same time period.
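The steps of this method can be illustrated with sorting, where the certification trail is the sorting permutation. This is a sketch under assumed names (`first_algorithm`, `second_algorithm`, `run_with_trail`) and runs both algorithms in one process for simplicity, whereas the method above assigns them to separate central processing units.

```python
def first_algorithm(data):
    """Sort the input and emit a certification trail: the sorting permutation."""
    trail = sorted(range(len(data)), key=lambda i: data[i])
    return [data[i] for i in trail], trail

def second_algorithm(data, trail):
    """Rebuild the sorted output in linear time from the input and the trail.

    Raises ValueError if the trail is not a valid certificate.
    """
    if len(trail) != len(data):
        raise ValueError("trail has wrong length")
    seen = [False] * len(data)
    for i in trail:
        if not (0 <= i < len(data)) or seen[i]:
            raise ValueError("trail is not a permutation")
        seen[i] = True
    out = [data[i] for i in trail]
    if any(x > y for x, y in zip(out, out[1:])):
        raise ValueError("trail does not sort the input")
    return out

def run_with_trail(data):
    """Run both algorithms and compare their outputs."""
    first_out, trail = first_algorithm(data)
    second_out = second_algorithm(data, trail)
    if first_out != second_out:
        raise RuntimeError("outputs differ: fault detected")
    return first_out
```

The second algorithm never trusts the trail blindly: a corrupted trail is either rejected outright or leads to an output that fails the final comparison.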
Fault-tolerant nonlinear adaptive flight control using sliding mode online learning.

PubMed

Krüger, Thomas; Schnetter, Philipp; Placzek, Robin; Vörsmann, Peter

2012-08-01

An expanded nonlinear model inversion flight control strategy using sliding mode online learning for neural networks is presented. The proposed control strategy is implemented for a small unmanned aircraft system (UAS). This class of aircraft is very susceptible towards nonlinearities like atmospheric turbulence, model uncertainties and of course system failures. Therefore, these systems mark a sensible testbed to evaluate fault-tolerant, adaptive flight control strategies. Within this work the concept of feedback linearization is combined with feed forward neural networks to compensate for inversion errors and other nonlinear effects. Backpropagation-based adaption laws of the network weights are used for online training. Within these adaption laws the standard gradient descent backpropagation algorithm is augmented with the concept of sliding mode control (SMC). Implemented as a learning algorithm, this nonlinear control strategy treats the neural network as a controlled system and allows a stable, dynamic calculation of the learning rates. While considering the system's stability, this robust online learning method therefore offers a higher speed of convergence, especially in the presence of external disturbances. The SMC-based flight controller is tested and compared with the standard gradient descent backpropagation algorithm in the presence of system failures. Copyright © 2012 Elsevier Ltd. All rights reserved.
Onboard FPGA-based SAR processing for future spaceborne systems

NASA Technical Reports Server (NTRS)

Le, Charles; Chan, Samuel; Cheng, Frank; Fang, Winston; Fischman, Mark; Hensley, Scott; Johnson, Robert; Jourdan, Michael; Marina, Miguel; Parham, Bruce;

2004-01-01

We present a real-time, high-performance and fault-tolerant FPGA-based hardware architecture for the processing of synthetic aperture radar (SAR) images in future spaceborne systems. In particular, we will discuss the integrated design approach, from top-level algorithm specifications and system requirements, design methodology, functional verification and performance validation, down to hardware design and implementation.

On-line node fault injection training algorithm for MLP networks: objective function and convergence analysis.

PubMed

Sum, John Pui-Fai; Leung, Chi-Sing; Ho, Kevin I-J

2012-02-01

Improving the fault tolerance of a neural network has been studied for more than two decades, and various training algorithms have since been proposed. The on-line node fault injection-based algorithm is one of these algorithms, in which hidden nodes randomly output zeros during training. While the idea is simple, theoretical analyses of this algorithm are far from complete. This paper presents its objective function and the convergence proof. We consider three cases for multilayer perceptrons (MLPs). They are: (1) MLPs with a single linear output node; (2) MLPs with multiple linear output nodes; and (3) MLPs with a single sigmoid output node. For the convergence proof, we show that the algorithm converges with probability one. For the objective function, we show that the corresponding objective functions of cases (1) and (2) are of the same form. They both consist of a mean square error term, a regularizer term, and a weight decay term. For case (3), the objective function is slightly different from that of cases (1) and (2). With the objective functions derived, we can compare the similarities and differences among various algorithms and various cases.
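The training idea described here, hidden nodes randomly outputting zero during each on-line update, can be sketched for case (1), an MLP with a single linear output node. The tanh hidden activation, the squared-error loss, and all names and parameters (`train_step`, `fault_rate`, `lr`) are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random

def train_step(w_hidden, w_out, x, target, fault_rate, lr, rng):
    """One on-line update; each hidden node outputs zero with prob. fault_rate.

    w_hidden is a list of per-node weight lists, w_out a list of output
    weights; both are updated in place. Returns the output error.
    """
    # Forward pass with injected node faults (mask 0.0 means the node is faulty).
    mask = [0.0 if rng.random() < fault_rate else 1.0 for _ in w_out]
    h = [m * math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
         for m, w in zip(mask, w_hidden)]
    y = sum(v * hj for v, hj in zip(w_out, h))
    err = y - target
    # Gradient step on 0.5 * err**2; a faulty node contributes zero gradient.
    for j, w in enumerate(w_hidden):
        grad_h = err * w_out[j] * mask[j] * (1.0 - h[j] ** 2)
        for i in range(len(w)):
            w[i] -= lr * grad_h * x[i]
        w_out[j] -= lr * err * h[j]
    return err
```

Calling `train_step` once per training sample with a fresh random mask gives the on-line behaviour; setting `fault_rate` to zero recovers ordinary gradient descent.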
Generating a fault-tolerant global clock using high-speed control signals for the MetaNet architecture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ofek, Y.

1994-05-01

This work describes a new technique, based on exchanging control signals between neighboring nodes, for constructing a stable and fault-tolerant global clock in a distributed system with an arbitrary topology. It is shown that it is possible to construct a global clock reference with time step that is much smaller than the propagation delay over the network's links. The synchronization algorithm ensures that the global clock 'tick' has a stable periodicity, and therefore, it is possible to tolerate failures of links and clocks that operate faster and/or slower than nominally specified, as well as hard failures. The approach taken in this work is to generate a global clock from the ensemble of the local transmission clocks and not to directly synchronize these high-speed clocks. The steady-state algorithm, which generates the global clock, is executed in hardware by the network interface of each node. At the network interface, it is possible to measure accurately the propagation delay between neighboring nodes with a small error or uncertainty and thereby to achieve global synchronization that is proportional to these error measurements. It is shown that the local clock drift (or rate uncertainty) has only a secondary effect on the maximum global clock rate. The synchronization algorithm can tolerate any physical failure. 18 refs.
Cost-effective solutions to maintaining smart grid reliability

NASA Astrophysics Data System (ADS)

Qin, Qiu

As aging power systems increasingly operate closer to their capacity and thermal limits, maintaining sufficient reliability has been of great concern to government agencies, utility companies and users. This dissertation focuses on improving the reliability of transmission and distribution systems. Based on the wide area measurements, multiple model algorithms are developed to diagnose transmission line three-phase short to ground faults in the presence of protection misoperations. The multiple model algorithms utilize the electric network dynamics to provide prompt and reliable diagnosis outcomes. Computational complexity of the diagnosis algorithm is reduced by using a two-step heuristic. The multiple model algorithm is incorporated into a hybrid simulation framework, which consists of both continuous state simulation and discrete event simulation, to study the operation of transmission systems. With hybrid simulation, a line switching strategy for enhancing the tolerance to protection misoperations is studied based on the concept of security index, which involves the faulted mode probability and stability coverage. Local measurements are used to track the generator state, and faulty mode probabilities are calculated in the multiple model algorithms. FACTS devices are considered as controllers for the transmission system. The placement of FACTS devices into power systems is investigated with a criterion of maintaining a prescribed level of control reconfigurability. Control reconfigurability measures the small signal combined controllability and observability of a power system with an additional requirement on fault tolerance. For the distribution systems, a hierarchical framework, including a high level recloser allocation scheme and a low level recloser placement scheme, is presented. The impacts of recloser placement on the reliability indices are analyzed. Evaluation of reliability indices in the placement process is carried out via discrete event simulation. The reliability requirements are described with probabilities and evaluated from the empirical distributions of reliability indices.
Integrated multiple-model adaptive fault identification and reconfigurable fault-tolerant control for Lead-Wing close formation systems

NASA Astrophysics Data System (ADS)

Liu, Chun; Jiang, Bin; Zhang, Ke

2018-03-01

This paper investigates the attitude and position tracking control problem for Lead-Wing close formation systems in the presence of loss of effectiveness and lock-in-place or hardover failure. In close formation flight, Wing unmanned aerial vehicle movements are influenced by vortex effects of the neighbouring Lead unmanned aerial vehicle. This situation allows modelling of aerodynamic coupling vortex-effects and linearisation based on optimal close formation geometry. The linearised Lead-Wing close formation model is transformed into nominal robust H-infinity models with respect to Mach hold, Heading hold, and Altitude hold autopilots; a static feedback H-infinity controller is designed to guarantee effective tracking of attitude and position while manoeuvring the Lead unmanned aerial vehicle. Based on the H-infinity control design, an integrated multiple-model adaptive fault identification and reconfigurable fault-tolerant control scheme is developed to guarantee asymptotic stability of closed-loop systems, error signal boundedness, and attitude and position tracking properties. Simulation results for Lead-Wing close formation systems validate the efficiency of the proposed integrated multiple-model adaptive control algorithm.
Application of cluster technology in location-based service

NASA Astrophysics Data System (ADS)

Chen, Jing; Wang, Xiaoman; Gong, Jianya

2005-10-01

This paper introduces the principle, algorithm, and implementation of load balancing technology. It also designs a clustering method for Location-Based Service (LBS) applications and explains its functional characteristics and overall system structure, followed by some experimental comparisons showing that cluster technology can ensure an LBS's continuous running and provide fault tolerance and sharing within the cluster.
About problematic peculiarities of Fault Tolerance digital regulation organization

NASA Astrophysics Data System (ADS)

Rakov, V. I.; Zakharova, O. V.

2018-05-01

Solutions are offered, in three directions, to problems of estimating the working capacity of regulation chains and of preventing situations in which it is violated. The first direction is developing methods of representing the regulation loop (circuit) by combining diffuse components, and forming algorithmic tooling for building serviceability-assessment predicates separately for the components and for the regulation loops (circuits, contours) in general. The second direction is creating methods of Fault Tolerance redundancy in the process of complex assessment of current values of control actions, closure errors and their regulated parameters. The third direction is creating methods of comparing the processes of change of control actions, closure errors and regulated parameters with their standard models or their surroundings. This direction allows one to develop methods and algorithmic tools aimed at preventing loss of serviceability and effectiveness of not only a separate digital regulator, but also the whole complex of Fault Tolerance regulation.
FTAPE: A fault injection tool to measure fault tolerance

NASA Technical Reports Server (NTRS)

Tsai, Timothy K.; Iyer, Ravishankar K.

1995-01-01

The paper introduces FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The tool combines system-wide fault injection with a controllable workload. A workload generator is used to create high stress conditions for the machine. Faults are injected based on this workload activity in order to ensure a high level of fault propagation. The errors/fault ratio and performance degradation are presented as measures of fault tolerance.
A Self-Stabilizing Hybrid Fault-Tolerant Synchronization Protocol

NASA Technical Reports Server (NTRS)

Malekpour, Mahyar R.

2015-01-01

This paper presents a strategy for solving the Byzantine generals problem for self-stabilizing a fully connected network from an arbitrary state and in the presence of any number of faults with various severities, including any number of arbitrary (Byzantine) faulty nodes. The strategy consists of two parts: first, converting Byzantine faults into symmetric faults, and second, using a proven symmetric-fault tolerant algorithm to solve the general case of the problem. A protocol (algorithm) is also presented that tolerates symmetric faults, provided that there are more good nodes than faulty ones. The solution applies to realizable systems, while allowing for differences in the network elements, provided that the number of arbitrary faults is not more than a third of the network size. The only constraint on the behavior of a node is that the interactions with other nodes are restricted to defined links and interfaces. The solution does not rely on assumptions about the initial state of the system, and no central clock or centrally generated signal, pulse, or message is used. Nodes are anonymous, i.e., they do not have unique identities. A mechanical verification of a proposed protocol is also presented. A bounded model of the protocol is verified using the Symbolic Model Verifier (SMV). The model checking effort is focused on verifying correctness of the bounded model of the protocol as well as confirming claims of determinism and linear convergence with respect to the self-stabilization period.
Fault-tolerant composite Householder reflection

NASA Astrophysics Data System (ADS)

Torosov, Boyan T.; Kyoseva, Elica; Vitanov, Nikolay V.

2015-07-01

We propose a fault-tolerant implementation of the quantum Householder reflection, which is a key operation in various quantum algorithms, quantum-state engineering, generation of arbitrary unitaries, and entanglement characterization. We construct this operation using the modular approach of composite pulses and a relation between the Householder reflection and the quantum phase gate. The proposed implementation is highly insensitive to variations in the experimental parameters, which makes it suitable for high-fidelity quantum information processing.
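For orientation, the operation in question is usually written as follows. The notation here (a normalized state vector and a phase angle) is assumed for illustration and is not quoted from the paper.

```latex
% Standard Householder reflection defined by a normalized vector |v>
\[
  M(v) = \mathbb{1} - 2\,|v\rangle\langle v| , \qquad \langle v | v \rangle = 1 .
\]
% Generalized (phase-dependent) form; it reduces to M(v) at \varphi = \pi
\[
  M(v;\varphi) = \mathbb{1} + \left(e^{i\varphi} - 1\right)|v\rangle\langle v| .
\]
```

In the second form the state $|v\rangle$ acquires the phase $e^{i\varphi}$ while every state orthogonal to it is left unchanged, which is how the operation connects to a phase gate.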
Test experience on an ultrareliable computer communication network

NASA Technical Reports Server (NTRS)

Abbott, L. W.

1984-01-01

The dispersed sensor processing mesh (DSPM) is an experimental, ultrareliable, fault-tolerant computer communications network that exhibits an organic-like ability to regenerate itself after suffering damage. The regeneration is accomplished by two routines - grow and repair. This paper discusses the DSPM concept for achieving fault tolerance and provides a brief description of the mechanization of both the experiment and the six-node experimental network. The main topic of this paper is the system performance of the growth algorithm contained in the grow routine. The characteristics imbued to DSPM by the growth algorithm are also discussed. Data from an experimental DSPM network and software simulation of larger DSPM-type networks are used to examine the inherent limitation on growth time by the growth algorithm and the relationship of growth time to network size and topology.
Special Issue on a Fault Tolerant Network on Chip Architecture

NASA Astrophysics Data System (ADS)

Janidarmian, Majid; Tinati, Melika; Khademzadeh, Ahmad; Ghavibazou, Maryam; Fekr, Atena Roshan

2010-06-01

In this paper, a fast and efficient spare switch selection algorithm is presented for FERNA, a reliable NoC architecture based on a specific application mapped onto a mesh topology. Based on the ring concept used in FERNA, this algorithm achieves results equivalent to those of an exhaustive algorithm with much less run time while improving two parameters. Inputs of the FERNA algorithm for minimizing the response time of the system and the extra communication cost are derived from high transaction level simulation using SystemC TLM and from mathematical formulation, respectively. The results demonstrate that improving the above-mentioned parameters advances whole-system reliability, which is calculated analytically. The mapping algorithm has also been investigated as a factor affecting the extra bandwidth requirement and system reliability.
Probabilistic evaluation of on-line checks in fault-tolerant multiprocessor systems

NASA Technical Reports Server (NTRS)

Nair, V. S. S.; Hoskote, Yatin V.; Abraham, Jacob A.

1992-01-01

The analysis of fault-tolerant multiprocessor systems that use concurrent error detection (CED) schemes is much more difficult than the analysis of conventional fault-tolerant architectures. Various analytical techniques have been proposed to evaluate CED schemes deterministically. However, these approaches are based on worst-case assumptions related to the failure of system components. Often, the evaluation results do not reflect the actual fault tolerance capabilities of the system. A probabilistic approach to evaluate the fault detecting and locating capabilities of on-line checks in a system is developed. The various probabilities associated with the checking schemes are identified and used in the framework of the matrix-based model. Based on these probabilistic matrices, estimates for the fault tolerance capabilities of various systems are derived analytically.
Experimental evaluation of the certification-trail method

NASA Technical Reports Server (NTRS)

Sullivan, Gregory F.; Wilson, Dwight S.; Masson, Gerald M.; Itoh, Mamoru; Smith, Warren W.; Kay, Jonathan S.

1993-01-01

Certification trails are a recently introduced and promising approach to fault-detection and fault-tolerance. A comprehensive attempt to assess experimentally the performance and overall value of the method is reported. The method is applied to algorithms for the following problems: Huffman tree, shortest path, minimum spanning tree, sorting, and convex hull. Our results reveal many cases in which an approach using certification-trails allows for significantly faster overall program execution time than a basic time redundancy-approach. Algorithms for the answer-validation problem for abstract data types were also examined. This kind of problem provides a basis for applying the certification-trail method to wide classes of algorithms. Answer-validation solutions for two types of priority queues were implemented and analyzed. In both cases, the algorithm which performs answer-validation is substantially faster than the original algorithm for computing the answer. Next, a probabilistic model and analysis which enables comparison between the certification-trail method and the time-redundancy approach were presented. The analysis reveals some substantial and sometimes surprising advantages for the certification-trail method. Finally, the work our group performed on the design and implementation of fault injection testbeds for experimental analysis of the certification trail technique is discussed. This work employs two distinct methodologies, software fault injection (modification of instruction, data, and stack segments of programs on a Sun Sparcstation ELC and on an IBM 386 PC) and hardware fault injection (control, address, and data lines of a Motorola MC68000-based target system pulsed at logical zero/one values). Our results indicate the viability of the certification trail technique. It is also believed that the tools developed provide a solid base for additional exploration.

Fault-tolerant measurement-based quantum computing with continuous-variable cluster states.

PubMed

Menicucci, Nicolas C

2014-03-28

A long-standing open question about Gaussian continuous-variable cluster states is whether they enable fault-tolerant measurement-based quantum computation. The answer is yes. Initial squeezing in the cluster above a threshold value of 20.5 dB ensures that errors from finite squeezing acting on encoded qubits are below the fault-tolerance threshold of known qubit-based error-correcting codes. By concatenating with one of these codes and using ancilla-based error correction, fault-tolerant measurement-based quantum computation of theoretically indefinite length is possible with finitely squeezed cluster states.
Analytical sensor redundancy assessment

NASA Technical Reports Server (NTRS)

Mulcare, D. B.; Downing, L. E.; Smith, M. K.

1988-01-01

The rationale and mechanization of sensor fault tolerance based on analytical redundancy principles are described. The concept involves the substitution of software procedures, such as an observer algorithm, to supplant additional hardware components. The observer synthesizes values of sensor states in lieu of their direct measurement. Such information can then be used, for example, to determine which of two disagreeing sensors is more correct, thus enhancing sensor fault survivability. Here a stability augmentation system is used as an example application, with required modifications being made to a quadruplex digital flight control system. The impact on software structure and the resultant revalidation effort are illustrated as well. Also, the use of an observer algorithm for wind gust filtering of the angle-of-attack sensor signal is presented.
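
The arbitration between two disagreeing sensors can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the flight-control mechanization from the report: the plant is a hypothetical scalar model with illustrative coefficients `a` and `b`, and the names `predict_state`, `select_reading`, and `track` are invented for the example.

```python
def predict_state(x_hat, u, a=0.9, b=0.1):
    """Model-based prediction of the sensed state (coefficients are illustrative)."""
    return a * x_hat + b * u


def select_reading(reading_a, reading_b, x_hat, tolerance=0.05):
    """Choose which of two redundant sensor readings to trust.

    If the sensors agree within the tolerance, average them. Otherwise
    return the reading closer to the synthesized estimate x_hat.
    """
    if abs(reading_a - reading_b) <= tolerance:
        return 0.5 * (reading_a + reading_b)
    if abs(reading_a - x_hat) <= abs(reading_b - x_hat):
        return reading_a
    return reading_b


def track(x_hat, u, reading_a, reading_b, gain=0.5):
    """One observer step: predict, arbitrate the sensors, then correct."""
    predicted = predict_state(x_hat, u)
    trusted = select_reading(reading_a, reading_b, predicted)
    return predicted + gain * (trusted - predicted), trusted
```

Here the software estimate stands in for a third hardware sensor: it breaks the tie when the two physical sensors disagree.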
Robust fault-tolerant tracking control design for spacecraft under control input saturation.

PubMed

Bustan, Danyal; Pariz, Naser; Sani, Seyyed Kamal Hosseini

2014-07-01

In this paper, a continuous globally stable tracking control algorithm is proposed for a spacecraft in the presence of unknown actuator failure, control input saturation, uncertainty in inertial matrix and external disturbances. The design method is based on variable structure control and has the following properties: (1) fast and accurate response in the presence of bounded disturbances; (2) robust to the partial loss of actuator effectiveness; (3) explicit consideration of control input saturation; and (4) robust to uncertainty in inertial matrix. In contrast to traditional fault-tolerant control methods, the proposed controller does not require knowledge of the actuator faults and is implemented without explicit fault detection and isolation processes. In the proposed controller a single parameter is adjusted dynamically in such a way that it is possible to prove that both attitude and angular velocity errors will tend to zero asymptotically. The stability proof is based on a Lyapunov analysis and the properties of the singularity-free quaternion representation of spacecraft dynamics. Results of numerical simulations show that the proposed controller is successful in achieving high attitude performance in the presence of external disturbances, actuator failures, and control input saturation. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
Design and Implementation of a Distributed Version of the NASA Engine Performance Program

NASA Technical Reports Server (NTRS)

Cours, Jeffrey T.

1994-01-01

Distributed NEPP is a new version of the NASA Engine Performance Program that runs in parallel on a collection of Unix workstations connected through a network. The program is fault-tolerant, efficient, and shows significant speed-up in a multi-user, heterogeneous environment. This report describes the issues involved in designing distributed NEPP, the algorithms the program uses, and the performance distributed NEPP achieves. It develops an analytical model to predict and measure the performance of the simple distribution, multiple distribution, and fault-tolerant distribution algorithms that distributed NEPP incorporates. Finally, the appendices explain how to use distributed NEPP and document the organization of the program's source code.
Error rates and resource overheads of encoded three-qubit gates

NASA Astrophysics Data System (ADS)

Takagi, Ryuji; Yoder, Theodore J.; Chuang, Isaac L.

2017-10-01

A non-Clifford gate is required for universal quantum computation, and, typically, this is the most error-prone and resource-intensive logical operation on an error-correcting code. Small, single-qubit rotations are popular choices for this non-Clifford gate, but certain three-qubit gates, such as Toffoli or controlled-controlled-Z (ccz), are equivalent options that are also more suited for implementing some quantum algorithms, for instance, those with coherent classical subroutines. Here, we calculate error rates and resource overheads for implementing logical ccz with pieceable fault tolerance, a nontransversal method for implementing logical gates. We provide a comparison with a nonlocal magic-state scheme on a concatenated code and a local magic-state scheme on the surface code. We find the pieceable fault-tolerance scheme particularly advantaged over magic states on concatenated codes and in certain regimes over magic states on the surface code. Our results suggest that pieceable fault tolerance is a promising candidate for fault tolerance in a near-future quantum computer.
The Design of a Fault-Tolerant COTS-Based Bus Architecture for Space Applications

NASA Technical Reports Server (NTRS)

Chau, Savio N.; Alkalai, Leon; Tai, Ann T.

2000-01-01

The high-performance, scalability and miniaturization requirements together with the power, mass and cost constraints mandate the use of commercial-off-the-shelf (COTS) components and standards in the X2000 avionics system architecture for deep-space missions. In this paper, we report our experiences and findings on the design of an IEEE 1394 compliant fault-tolerant COTS-based bus architecture. While the COTS standard IEEE 1394 adequately supports power management, high performance and scalability, its topological criteria impose restrictions on fault tolerance realization. To circumvent the difficulties, we derive a "stack-tree" topology that not only complies with the IEEE 1394 standard but also facilitates fault tolerance realization in a spaceborne system with limited dedicated resource redundancies. Moreover, by exploiting pertinent standard features of the 1394 interface which are not purposely designed for fault tolerance, we devise a comprehensive set of fault detection mechanisms to support the fault-tolerant bus architecture.
Managing Network Partitions in Structured P2P Networks

NASA Astrophysics Data System (ADS)

Shafaat, Tallat M.; Ghodsi, Ali; Haridi, Seif

Structured overlay networks form a major class of peer-to-peer systems, which are touted for their abilities to scale, tolerate failures, and self-manage. Any long-lived Internet-scale distributed system is destined to face network partitions. Consequently, the problem of network partitions and mergers is highly related to fault tolerance and self-management in large-scale systems. This makes resilience to network partitions a crucial requirement for building any structured peer-to-peer system. Despite this, the problem has hardly been studied in the context of structured peer-to-peer systems. Structured overlays have mainly been studied under churn (frequent joins/failures), which as a side effect solves the problem of network partitions, as it is similar to massive node failures. Yet, the crucial aspect of network mergers has been ignored. In fact, it has been claimed that ring-based structured overlay networks, which constitute the majority of the structured overlays, are intrinsically ill-suited for merging rings. In this chapter, we motivate the problem of network partitions and mergers in structured overlays. We discuss how a structured overlay can automatically detect a network partition and merger. We present an algorithm for merging multiple similar ring-based overlays when the underlying network merges. We examine the solution in dynamic conditions, showing how our solution is resilient to churn during the merger, something widely believed to be difficult or impossible. We evaluate the algorithm for various scenarios and show that even when falsely detecting a merger, the algorithm quickly terminates and does not clutter the network with many messages. The algorithm is flexible, as the tradeoff between message complexity and time complexity can be adjusted by a parameter.
A set-associative, fault-tolerant cache design

NASA Technical Reports Server (NTRS)

Lamet, Dan; Frenzel, James F.

1992-01-01

The design of a defect-tolerant control circuit for a set-associative cache memory is presented. The circuit maintains the stack ordering necessary for implementing the Least Recently Used (LRU) replacement algorithm. A discussion of programming techniques for bypassing defective blocks is included.
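
A software model of the idea can be sketched in Python as follows. It is an assumption-laden illustration, not the circuit described in the report: the class name `LruSet`, the `defective` parameter, and the choice to model the stack as a list are all hypothetical. Defective blocks are left out of the LRU stack, so they can never be chosen as a victim.

```python
class LruSet:
    """One cache set with LRU stack ordering that bypasses defective blocks."""

    def __init__(self, ways, defective=()):
        bad = set(defective)
        # Only usable ways ever enter the LRU stack.
        self.stack = [w for w in range(ways) if w not in bad]
        if not self.stack:
            raise ValueError("set has no usable blocks")
        self.tags = {}  # way -> tag currently stored there

    def _touch(self, way):
        """Move a way to the most-recently-used position (index 0)."""
        self.stack.remove(way)
        self.stack.insert(0, way)

    def access(self, tag):
        """Look up a (non-None) tag and return (way, hit)."""
        for way in self.stack:
            if self.tags.get(way) == tag:
                self._touch(way)
                return way, True
        victim = self.stack[-1]  # least recently used usable way
        self.tags[victim] = tag
        self._touch(victim)
        return victim, False
```

With `LruSet(4, defective=[2])`, the set behaves like a three-way set: way 2 is never allocated, and replacement order among the remaining ways is ordinary LRU.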
Test experience on an ultrareliable computer communication network

NASA Technical Reports Server (NTRS)

Abbott, L. W.

1984-01-01

The dispersed sensor processing mesh (DSPM) is an experimental, ultra-reliable, fault-tolerant computer communications network that exhibits an organic-like ability to regenerate itself after suffering damage. The regeneration is accomplished by two routines - grow and repair. This paper discusses the DSPM concept for achieving fault tolerance and provides a brief description of the mechanization of both the experiment and the six-node experimental network. The main topic of this paper is the system performance of the growth algorithm contained in the grow routine. The characteristics imbued to DSPM by the growth algorithm are also discussed. Data from an experimental DSPM network and software simulation of larger DSPM-type networks are used to examine the inherent limitation on growth time by the growth algorithm and the relationship of growth time to network size and topology.
Distributed Evaluation Functions for Fault Tolerant Multi-Rover Systems

NASA Technical Reports Server (NTRS)

Agogino, Adrian; Turner, Kagan

2005-01-01

The ability to evolve fault tolerant control strategies for large collections of agents is critical to the successful application of evolutionary strategies to domains where failures are common. Furthermore, while evolutionary algorithms have been highly successful in discovering single-agent control strategies, extending such algorithms to multiagent domains has proven to be difficult. In this paper we present a method for shaping evaluation functions for agents that provide control strategies that both are tolerant to different types of failures and lead to coordinated behavior in a multi-agent setting. This method relies neither on a centralized strategy (susceptible to single points of failure) nor on a distributed strategy where each agent uses a system-wide evaluation function (severe credit assignment problem). In a multi-rover problem, we show that agents using our agent-specific evaluation perform up to 500% better than agents using the system evaluation. In addition we show that agents are still able to maintain a high level of performance when up to 60% of the agents fail due to actuator, communication or controller faults.
Method and system for environmentally adaptive fault tolerant computing

NASA Technical Reports Server (NTRS)

Copenhaver, Jason L. (Inventor); Jeremy, Ramos (Inventor); Wolfe, Jeffrey M. (Inventor); Brenner, Dean (Inventor)

2010-01-01

A method and system for adapting fault tolerant computing. The method includes the steps of measuring an environmental condition representative of an environment. An on-board processing system's sensitivity to the measured environmental condition is measured. It is determined whether to reconfigure a fault tolerance of the on-board processing system based in part on the measured environmental condition. The fault tolerance of the on-board processing system may be reconfigured based in part on the measured environmental condition.
Optimal Fault-Tolerant Control for Discrete-Time Nonlinear Strict-Feedback Systems Based on Adaptive Critic Design.

PubMed

Wang, Zhanshan; Liu, Lei; Wu, Yanming; Zhang, Huaguang

2018-06-01

This paper investigates the problem of optimal fault-tolerant control (FTC) for a class of unknown nonlinear discrete-time systems with actuator fault in the framework of adaptive critic design (ACD). A pivotal highlight is the adaptive auxiliary signal of the actuator fault, which is designed to offset the effect of the fault. The considered systems are in strict-feedback forms and involve unknown nonlinear functions, which will result in the causal problem. To solve this problem, the original nonlinear systems are transformed into a novel system by employing the diffeomorphism theory. In addition, action neural networks (ANNs) are utilized to approximate a predefined unknown function in the backstepping design procedure. By combining the strategic utility function with the ACD technique, a reinforcement learning algorithm is proposed to set up an optimal FTC, in which the critic neural networks (CNNs) provide an approximate structure of the cost function. This approach not only guarantees the stability of the systems but also achieves optimal control performance. Finally, two simulation examples are used to show the effectiveness of the proposed optimal FTC strategy.
Analysis of typical fault-tolerant architectures using HARP

NASA Technical Reports Server (NTRS)

Bavuso, Salvatore J.; Bechta Dugan, Joanne; Trivedi, Kishor S.; Rothmann, Elizabeth M.; Smith, W. Earl

1987-01-01

Difficulties encountered in the modeling of fault-tolerant systems are discussed. The Hybrid Automated Reliability Predictor (HARP) approach to modeling fault-tolerant systems is described. The HARP is written in FORTRAN, consists of nearly 30,000 lines of code and comments, and is based on behavioral decomposition. Using the behavioral decomposition, the dependability model is divided into fault-occurrence/repair and fault/error-handling models; the characteristics and combining of these two models are examined. Examples in which the HARP is applied to the modeling of some typical fault-tolerant systems, including a local-area network, two fault-tolerant computer systems, and a flight control system, are presented.
Locating and decoding barcodes in fuzzy images captured by smart phones

NASA Astrophysics Data System (ADS)

Deng, Wupeng; Hu, Jiwei; Liu, Quan; Lou, Ping

2017-07-01

With the development of barcodes for commercial use, the demand for detecting barcodes by smart phone is becoming increasingly pressing. The low quality of barcode images captured by mobile phones often lowers the decoding and recognition rates. This paper focuses on locating and decoding EAN-13 barcodes in fuzzy images. We present a more accurate locating algorithm based on segment length and a highly fault-tolerant algorithm for decoding barcodes. Unlike existing approaches, our location algorithm is based on the edge segment length of EAN-13 barcodes, while our decoding algorithm tolerates fuzzy regions in the barcode image. Experiments are performed on damaged, contaminated and scratched digital images, and yield quite promising results for EAN-13 barcode location and decoding.
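
The abstract does not describe the decoding algorithm itself. As background only, one basic safeguard that any EAN-13 decoder can apply to a candidate read is the standard check digit, sketched below in Python. The function names are hypothetical, and this is not the paper's method.

```python
def ean13_check_digit(first12):
    """Compute the EAN-13 check digit from the first 12 digits.

    Digits are weighted 1, 3, 1, 3, ... from the left. The check digit
    brings the weighted sum up to a multiple of 10.
    """
    digits = [int(c) for c in first12]
    if len(digits) != 12:
        raise ValueError("expected exactly 12 digits")
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return (10 - total % 10) % 10


def is_valid_ean13(code):
    """Return True if a 13-character string is a well-formed EAN-13 code."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])
```

A decoder working on a blurred image could use `is_valid_ean13` to reject candidate reads whose digits are inconsistent, which catches any single misread digit.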
A fault-tolerant intelligent robotic control system

NASA Technical Reports Server (NTRS)

Marzwell, Neville I.; Tso, Kam Sing

1993-01-01

This paper describes the concept, design, and features of a fault-tolerant intelligent robotic control system being developed for space and commercial applications that require high dependability. The comprehensive strategy integrates system level hardware/software fault tolerance with task level handling of uncertainties and unexpected events for robotic control. The underlying architecture for system level fault tolerance is the distributed recovery block which protects against application software, system software, hardware, and network failures. Task level fault tolerance provisions are implemented in a knowledge-based system which utilizes advanced automation techniques such as rule-based and model-based reasoning to monitor, diagnose, and recover from unexpected events. The two level design provides tolerance of two or more faults occurring serially at any level of command, control, sensing, or actuation. The potential benefits of such a fault tolerant robotic control system include: (1) a minimized potential for damage to humans, the work site, and the robot itself; (2) continuous operation with a minimum of uncommanded motion in the presence of failures; and (3) more reliable autonomous operation providing increased efficiency in the execution of robotic tasks and decreased demand on human operators for controlling and monitoring the robotic servicing routines.
A fault-tolerant strategy based on SMC for current-controlled converters

NASA Astrophysics Data System (ADS)

Azer, Peter M.; Marei, Mostafa I.; Sattar, Ahmed A.

2018-05-01

Sliding mode control (SMC) is used to control variable structure systems such as power electronics converters. This paper presents a fault-tolerant strategy based on the SMC for current-controlled AC-DC converters. The proposed SMC is based on three sliding surfaces for the three legs of the AC-DC converter. Two sliding surfaces are assigned to control the phase currents since the input three-phase currents are balanced. Hence, the third sliding surface is considered as an extra degree of freedom which is utilised to control the neutral voltage. This action is utilised to enhance the performance of the converter during open-switch faults. The proposed fault-tolerant strategy is based on allocating the sliding surface of the faulty leg to control the neutral voltage. Consequently, the current waveform is improved. The behaviour of the current-controlled converter during different types of open-switch faults is analysed. Double switch faults include three cases: faults in two upper switches; faults in an upper and a lower switch on different legs; and faults in two switches of the same leg. The dynamic performance of the proposed system is evaluated during healthy and open-switch fault operations. Simulation results exhibit the various merits of the proposed SMC-based fault-tolerant strategy.
Analysis of a hardware and software fault tolerant processor for critical applications

NASA Technical Reports Server (NTRS)

Dugan, Joanne B.

1993-01-01

Computer systems for critical applications must be designed to tolerate software faults as well as hardware faults. A unified approach to tolerating hardware and software faults is characterized by classifying faults in terms of duration (transient or permanent) rather than source (hardware or software). Errors arising from transient faults can be handled through masking or voting, but errors arising from permanent faults require system reconfiguration to bypass the failed component. Most errors which are caused by software faults can be considered transient, in that they are input-dependent. Software faults are triggered by a particular set of inputs. Quantitative dependability analysis of systems which exhibit a unified approach to fault tolerance can be performed by a hierarchical combination of fault tree and Markov models. A methodology for analyzing hardware and software fault tolerant systems is applied to the analysis of a hypothetical system, loosely based on the Fault Tolerant Parallel Processor. The models consider both transient and permanent faults, hardware and software faults, independent and related software faults, automatic recovery, and reconfiguration.
QCCM Center for Quantum Algorithms

DTIC Science & Technology

2008-10-17

algorithms (e.g., quantum walks and adiabatic computing), as well as theoretical advances relating algorithms to physical implementations (e.g...Park, NC 27709-2211 15. SUBJECT TERMS Quantum algorithms, quantum computing, fault-tolerant error correction Richard Cleve MITACS East Academic...0511200 Algebraic results on quantum automata A. Ambainis, M. Beaudry, M. Golovkins, A. Kikusts, M. Mercer, D. Thrien Theory of Computing Systems 39(2006
A programmable two-qubit quantum processor in silicon.

PubMed

Watson, T F; Philips, S G J; Kawakami, E; Ward, D R; Scarlino, P; Veldhorst, M; Savage, D E; Lagally, M G; Friesen, Mark; Coppersmith, S N; Eriksson, M A; Vandersypen, L M K

2018-03-29

Now that it is possible to achieve measurement and control fidelities for individual quantum bits (qubits) above the threshold for fault tolerance, attention is moving towards the difficult task of scaling up the number of physical qubits to the large numbers that are needed for fault-tolerant quantum computing. In this context, quantum-dot-based spin qubits could have substantial advantages over other types of qubit owing to their potential for all-electrical operation and ability to be integrated at high density onto an industrial platform. Initialization, readout and single- and two-qubit gates have been demonstrated in various quantum-dot-based qubit representations. However, as seen with small-scale demonstrations of quantum computers using other types of qubit, combining these elements leads to challenges related to qubit crosstalk, state leakage, calibration and control hardware. Here we overcome these challenges by using carefully designed control techniques to demonstrate a programmable two-qubit quantum processor in a silicon device that can perform the Deutsch-Jozsa algorithm and the Grover search algorithm, canonical examples of quantum algorithms that outperform their classical analogues. We characterize the entanglement in our processor by using quantum-state tomography of Bell states, measuring state fidelities of 85-89 per cent and concurrences of 73-82 per cent. These results pave the way for larger-scale quantum computers that use spins confined to quantum dots.
A distributed fault-tolerant signal processor /FTSP/

NASA Astrophysics Data System (ADS)

Bonneau, R. J.; Evett, R. C.; Young, M. J.

1980-01-01

A digital fault-tolerant signal processor (FTSP), an example of a self-repairing programmable system, is analyzed. The design configuration is discussed in terms of fault tolerance, system-level fault detection, isolation and common memory. Special attention is given to the FDIR (fault detection, isolation and reconfiguration) logic, noting that the reconfiguration decisions are based on configuration, summary status, end-around tests, and north marker/synchro data. Several mechanisms of fault detection are described which initiate reconfiguration at different levels. It is concluded that the reliability of a signal processor can be significantly enhanced by the use of fault-tolerant techniques.

Chance of Vulnerability Reduction in Application-Specific NoC through Distance Aware Mapping Algorithm

NASA Astrophysics Data System (ADS)

Janidarmian, Majid; Fekr, Atena Roshan; Bokharaei, Vahhab Samadi

2011-08-01

The mapping algorithm, which determines which core should be linked to which router, is one of the key issues in the design flow of a network-on-chip (NoC). To achieve an application-specific NoC design procedure that minimizes the communication cost and improves fault tolerance, a heuristic mapping algorithm that produces a set of different mappings in a reasonable time is first presented. This algorithm allows designers to identify the set of most promising solutions in a large design space; these solutions have low communication costs, and in some cases optimum ones. Another evaluated parameter, the vulnerability index, is then considered as a basis for estimating the fault-tolerance property of all produced mappings. Finally, to yield a mapping that considers the trade-off between these two parameters, a linear function is defined and introduced. It is also observed that more flexibility to prioritize solutions within the design space is possible by adjusting a set of if-then rules in fuzzy logic.
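
The trade-off step can be sketched in Python as follows. The abstract does not give the linear function or the definition of the vulnerability index, so everything here is an assumption for illustration: communication cost is taken as traffic volume times Manhattan hop distance on a mesh, the vulnerability index is supplied as a precomputed number per mapping, and `weight` is a hypothetical tuning parameter.

```python
def manhattan(a, b):
    """Hop distance between two routers at (x, y) positions on a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def communication_cost(mapping, traffic):
    """Sum of traffic volume times hop distance over all communicating pairs.

    mapping: core name -> (x, y) router position
    traffic: (source core, destination core) -> volume
    """
    return sum(vol * manhattan(mapping[src], mapping[dst])
               for (src, dst), vol in traffic.items())


def rank_mappings(candidates, traffic, vulnerability, weight=0.5):
    """Rank candidate mappings by a linear combination of two criteria.

    vulnerability[i] is a precomputed index for candidates[i] (lower is
    better). Both criteria are normalized by their maximum before mixing.
    Returns candidate indices, best first.
    """
    costs = [communication_cost(m, traffic) for m in candidates]
    max_cost = max(costs) or 1
    max_vuln = max(vulnerability) or 1
    scores = [weight * c / max_cost + (1 - weight) * v / max_vuln
              for c, v in zip(costs, vulnerability)]
    return sorted(range(len(candidates)), key=scores.__getitem__)
```

Setting `weight` near 1 favors low communication cost, while setting it near 0 favors low vulnerability.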
A survey of an introduction to fault diagnosis algorithms

NASA Technical Reports Server (NTRS)

Mathur, F. P.

1972-01-01

This report surveys the field of diagnosis and introduces some of the key algorithms and heuristics currently in use. Fault diagnosis is an important and a rapidly growing discipline. This is important in the design of self-repairable computers because the present diagnosis resolution of its fault-tolerant computer is limited to a functional unit or processor. Better resolution is necessary before failed units can become partially reusable. The approach that holds the greatest promise is that of resident microdiagnostics; however, that presupposes a microprogrammable architecture for the computer being self-diagnosed. The presentation is tutorial and contains examples. An extensive bibliography of some 220 entries is included.
Decentralized Sliding Mode Observer Based Dual Closed-Loop Fault Tolerant Control for Reconfigurable Manipulator against Actuator Failure.

PubMed

Zhao, Bo; Li, Chenghao; Liu, Derong; Li, Yuanchun

2015-01-01

This paper considers a decentralized fault tolerant control (DFTC) scheme for reconfigurable manipulators. With the appearance of norm-bounded failure, a dual closed-loop trajectory tracking control algorithm is proposed on the basis of the Lyapunov stability theory. Characterized by the modularization property, the actuator failure is estimated by the proposed decentralized sliding mode observer (DSMO). Moreover, the actuator failure can be treated in view of the local joint information, so its control performance degradation is independent of other normal joints. In addition, the presented DFTC scheme is significantly simplified in terms of the structure of the controller due to its dual closed-loop architecture, and its feasibility is highly reflected in the control of reconfigurable manipulators. Finally, the effectiveness of the proposed DFTC scheme is demonstrated using simulations.
Decentralized Sliding Mode Observer Based Dual Closed-Loop Fault Tolerant Control for Reconfigurable Manipulator against Actuator Failure

PubMed Central

Zhao, Bo; Li, Yuanchun

2015-01-01

This paper considers a decentralized fault tolerant control (DFTC) scheme for reconfigurable manipulators. With the appearance of norm-bounded failure, a dual closed-loop trajectory tracking control algorithm is proposed on the basis of the Lyapunov stability theory. Characterized by the modularization property, the actuator failure is estimated by the proposed decentralized sliding mode observer (DSMO). Moreover, the actuator failure can be treated in view of the local joint information, so its control performance degradation is independent of other normal joints. In addition, the presented DFTC scheme is significantly simplified in terms of the structure of the controller due to its dual closed-loop architecture, and its feasibility is highly reflected in the control of reconfigurable manipulators. Finally, the effectiveness of the proposed DFTC scheme is demonstrated using simulations. PMID:26181826
A Note on Inconsistent Axioms in Rushby's Systematic Formal Verification for Fault-Tolerant Time-Triggered Algorithms

NASA Technical Reports Server (NTRS)

Pike, Lee

2005-01-01

I describe some inconsistencies in John Rushby's axiomatization of time-triggered algorithms that he presents in these transactions and that he formally specifies and verifies in a mechanical theorem-prover. I also present corrections for these inconsistencies.
Critical fault patterns determination in fault-tolerant computer systems

NASA Technical Reports Server (NTRS)

Mccluskey, E. J.; Losq, J.

1978-01-01

The proposed method tries to enumerate all the critical fault-patterns (successive occurrences of failures) without analyzing every single possible fault. The conditions for the system to be operating in a given mode can be expressed in terms of the static states. Thus, one can find all the system states that correspond to a given critical mode of operation. The next step consists of analyzing the fault-detection mechanisms, the diagnosis algorithm and the process of switch control. From them, one can find all the possible system configurations that can result from a failure occurrence. Thus, one can list all the characteristics, with respect to detection, diagnosis, and switch control, that failures must have to constitute critical fault-patterns. Such an enumeration of the critical fault-patterns can be directly used to evaluate the overall system tolerance to failures. Present research is focused on how to efficiently make use of these system-level characteristics to enumerate all the failures that satisfy these characteristics.
High Speed, High Temperature, Fault Tolerant Operation of a Combination Magnetic-Hydrostatic Bearing Rotor Support System for Turbomachinery

NASA Technical Reports Server (NTRS)

Jansen, Mark; Montague, Gerald; Provenza, Andrew; Palazzolo, Alan

2004-01-01

Closed loop operation of a single, high temperature magnetic radial bearing to 30,000 RPM (2.25 million DN) and 540 C (1000 F) is discussed. Also, high temperature, fault tolerant operation for the three axis system is examined. A novel, hydrostatic backup bearing system was employed to attain high speed, high temperature, lubrication free support of the entire rotor system. The hydrostatic bearings were made of a high lubricity material and acted as journal-type backup bearings. New, high temperature displacement sensors were successfully employed to monitor shaft position throughout the entire temperature range and are described in this paper. Control of the system was accomplished through a stand alone, high speed computer controller and it was used to run both the fault-tolerant PID and active vibration control algorithms.
Design of on-board Bluetooth wireless network system based on fault-tolerant technology

NASA Astrophysics Data System (ADS)

You, Zheng; Zhang, Xiangqi; Yu, Shijie; Tian, Hexiang

2007-11-01

In this paper, Bluetooth wireless data transmission technology is applied in an on-board computer system to realize wireless data transmission between peripherals of the micro-satellite integrated electronic system. In view of the high reliability demanded of a micro-satellite, a design for a Bluetooth wireless network based on fault-tolerant technology is introduced. The reliability of two fault-tolerant systems is first estimated using a Markov model, and then the structural design of the fault-tolerant system is introduced. Several protocols are established to make the system operate correctly, and some related problems are listed and analyzed, with emphasis on the Fault Auto-diagnosis System, the Active-standby switch design, and the Data-Integrity process.
Algorithms and Libraries

NASA Technical Reports Server (NTRS)

Dongarra, Jack

1998-01-01

This exploratory study initiated our inquiry into algorithms and applications that would benefit from a latency-tolerant approach to algorithm building, including the construction of new algorithms where appropriate. In a multithreaded execution, when a processor reaches a point where remote memory access is necessary, the request is sent out on the network and a context switch occurs to a new thread of computation. This effectively masks a long and unpredictable latency due to remote loads, thereby providing tolerance to remote access latency. We began to develop standards to profile various algorithm and application parameters, such as the degree of parallelism, granularity, precision, instruction set mix, interprocessor communication, latency, etc. These tools will continue to develop and evolve as the Information Power Grid environment matures. To provide a richer context for this research, the project also focused on issues of fault tolerance and computation migration of numerical algorithms and software. During the initial phase we tried to increase our understanding of the bottlenecks in single-processor performance. Our work began by developing an approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units. Based on the results we achieved in this study, we are planning to study other architectures of interest, including development of cost models and code generators appropriate to these architectures.
An experimental investigation of fault tolerant software structures in an avionics application

NASA Technical Reports Server (NTRS)

Caglayan, Alper K.; Eckhardt, Dave E., Jr.

1989-01-01

The objective of this experimental investigation is to compare the functional performance and software reliability of competing fault tolerant software structures utilizing software diversity. In this experiment, three versions of the redundancy management software for a skewed sensor array have been developed using three diverse failure detection and isolation algorithms and incorporated into various N-version, recovery block and hybrid software structures. The empirical results show that, for maximum functional performance improvement in the selected application domain, the results of diverse algorithms should be voted before being processed by multiple versions without enforced diversity. Results also suggest that when the reliability gain with an N-version structure is modest, recovery block structures are more feasible since higher reliability can be obtained using an acceptance check with a modest reliability.
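The two software structures compared above can be sketched as follows. This is a minimal illustration, not the redundancy management software from the experiment: the three versions, the integer square root task, and the acceptance check are all hypothetical stand-ins chosen so that results can be compared exactly.

```python
import math
from collections import Counter

def n_version_vote(versions, x):
    """Run every version on x and return the strict-majority result."""
    results = [version(x) for version in versions]
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among versions")
    return value

def recovery_block(alternates, acceptance_test, x):
    """Try alternates in order; return the first result that passes the test."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue  # a crashing alternate counts as a rejected one
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates rejected")

# Hypothetical diverse versions of an integer square root; one is faulty.
def version_a(x):
    return math.isqrt(x)

def version_b(x):
    return int(x ** 0.5)

def version_faulty(x):
    return x // 2

def acceptable(x, y):
    """Acceptance check that does not recompute the result."""
    return y * y <= x < (y + 1) * (y + 1)

voted = n_version_vote([version_a, version_faulty, version_b], 49)
recovered = recovery_block([version_faulty, version_a], acceptable, 49)
```

The voter needs every version to run on every input, while the recovery block runs later alternates only when the acceptance check rejects an earlier one, so its reliability depends on the quality of that check.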
Health management and controls for Earth-to-orbit propulsion systems

NASA Astrophysics Data System (ADS)

Bickford, R. L.

1995-03-01

Avionics and health management technologies increase the safety and reliability while decreasing the overall cost for Earth-to-orbit (ETO) propulsion systems. New ETO propulsion systems will depend on highly reliable fault tolerant flight avionics, advanced sensing systems and artificial intelligence aided software to ensure critical control, safety and maintenance requirements are met in a cost effective manner. Propulsion avionics consist of the engine controller, actuators, sensors, software and ground support elements. In addition to control and safety functions, these elements perform system monitoring for health management. Health management is enhanced by advanced sensing systems and algorithms which provide automated fault detection and enable adaptive control and/or maintenance approaches. Aerojet is developing advanced fault tolerant rocket engine controllers which provide very high levels of reliability. Smart sensors and software systems which significantly enhance fault coverage and enable automated operations are also under development. Smart sensing systems, such as flight capable plume spectrometers, have reached maturity in ground-based applications and are suitable for bridging to flight. Software to detect failed sensors has reached similar maturity. This paper will discuss fault detection and isolation for advanced rocket engine controllers as well as examples of advanced sensing systems and software which significantly improve component failure detection for engine system safety and health management.
Fault Location Based on Synchronized Measurements: A Comprehensive Survey

PubMed Central

Al-Mohammed, A. H.; Abido, M. A.

2014-01-01

This paper presents a comprehensive survey on transmission and distribution fault location algorithms that utilize synchronized measurements. Algorithms based on two-end synchronized measurements and fault location algorithms on three-terminal and multiterminal lines are reviewed. Series capacitors equipped with metal oxide varistors (MOVs), when set on a transmission line, create certain problems for line fault locators and, therefore, fault location on series-compensated lines is discussed. The paper reports the work carried out on adaptive fault location algorithms aiming at achieving better fault location accuracy. Work associated with fault location on power system networks, although limited, is also summarized. Additionally, the nonstandard high-frequency-related fault location techniques based on wavelet transform are discussed. Finally, the paper highlights the area for future research. PMID:24701191
AUV Positioning Method Based on Tightly Coupled SINS/LBL for Underwater Acoustic Multipath Propagation.

PubMed

Zhang, Tao; Shi, Hongfei; Chen, Liping; Li, Yao; Tong, Jinwu

2016-03-11

This paper studies an AUV (Autonomous Underwater Vehicle) positioning method based on a tightly coupled SINS (Strapdown Inertial Navigation System)/LBL (Long Base Line) algorithm. The algorithm mainly comprises a SINS-assisted method for searching for the optimum slant range among underwater acoustic multipath propagation paths, a SINS/LBL tightly coupled model, and a multi-sensor information fusion algorithm. The fuzzy correlation peak problem caused by underwater LBL acoustic multipath propagation can be solved using SINS positional information, thus improving LBL positioning accuracy. Moreover, introducing SINS-centered LBL locating information can compensate for the accumulated AUV position error effectively and regularly. Compared with a loosely coupled algorithm, the tightly coupled algorithm can still provide accurate location information when fewer than four hydrophones are available (or within the signal receiving range). Therefore, the effective positional calibration area of the tightly coupled system based on the LBL array is wider, and the system has higher reliability and fault tolerance than a loosely coupled one. It is more applicable to AUV positioning based on SINS/LBL.
AUV Positioning Method Based on Tightly Coupled SINS/LBL for Underwater Acoustic Multipath Propagation

PubMed Central

Zhang, Tao; Shi, Hongfei; Chen, Liping; Li, Yao; Tong, Jinwu

2016-01-01

This paper studies an AUV (Autonomous Underwater Vehicle) positioning method based on a tightly coupled SINS (Strapdown Inertial Navigation System)/LBL (Long Base Line) algorithm. The algorithm mainly comprises a SINS-assisted method for searching for the optimum slant range among underwater acoustic multipath propagation paths, a SINS/LBL tightly coupled model, and a multi-sensor information fusion algorithm. The fuzzy correlation peak problem caused by underwater LBL acoustic multipath propagation can be solved using SINS positional information, thus improving LBL positioning accuracy. Moreover, introducing SINS-centered LBL locating information can compensate for the accumulated AUV position error effectively and regularly. Compared with a loosely coupled algorithm, the tightly coupled algorithm can still provide accurate location information when fewer than four hydrophones are available (or within the signal receiving range). Therefore, the effective positional calibration area of the tightly coupled system based on the LBL array is wider, and the system has higher reliability and fault tolerance than a loosely coupled one. It is more applicable to AUV positioning based on SINS/LBL. PMID:26978361
Applications of an architecture design and assessment system (ADAS)

NASA Technical Reports Server (NTRS)

Gray, F. Gail; Debrunner, Linda S.; White, Tennis S.

1988-01-01

A new Architecture Design and Assessment System (ADAS) tool package is introduced, and a range of possible applications is illustrated. ADAS was used to evaluate the performance of an advanced fault-tolerant computer architecture in a modern flight control application. Bottlenecks were identified and possible solutions suggested. The tool was also used to inject faults into the architecture and evaluate the synchronization algorithm, and improvements are suggested. Finally, ADAS was used as a front end research tool to aid in the design of reconfiguration algorithms in a distributed array architecture.
Investigation, Development, and Evaluation of Performance Proving for Fault-tolerant Computers

NASA Technical Reports Server (NTRS)

Levitt, K. N.; Schwartz, R.; Hare, D.; Moore, J. S.; Melliar-Smith, P. M.; Shostak, R. E.; Boyer, R. S.; Green, M. W.; Elliott, W. D.

1983-01-01

A number of methodologies for verifying systems and computer based tools that assist users in verifying their systems were developed. These tools were applied to verify in part the SIFT ultrareliable aircraft computer. Topics covered included: STP theorem prover; design verification of SIFT; high level language code verification; assembly language level verification; numerical algorithm verification; verification of flight control programs; and verification of hardware logic.
Study of fault tolerant software technology for dynamic systems

NASA Technical Reports Server (NTRS)

Caglayan, A. K.; Zacharias, G. L.

1985-01-01

The major aim of this study is to investigate the feasibility of using systems-based failure detection isolation and compensation (FDIC) techniques in building fault-tolerant software and extending them, whenever possible, to the domain of software fault tolerance. First, it is shown that systems-based FDIC methods can be extended to develop software error detection techniques by using system models for software modules. In particular, it is demonstrated that systems-based FDIC techniques can yield consistency checks that are easier to implement than acceptance tests based on software specifications. Next, it is shown that systems-based failure compensation techniques can be generalized to the domain of software fault tolerance in developing software error recovery procedures. Finally, the feasibility of using fault-tolerant software in flight software is investigated. In particular, possible system and version instabilities, and functional performance degradation that may occur in N-Version programming applications to flight software are illustrated. Finally, a comparative analysis of N-Version and recovery block techniques in the context of generic blocks in flight software is presented.
Fault-tolerant clock synchronization validation methodology. [in computer systems]

NASA Technical Reports Server (NTRS)

Butler, Ricky W.; Palumbo, Daniel L.; Johnson, Sally C.

1987-01-01

A validation method for the synchronization subsystem of a fault-tolerant computer system is presented. The high reliability requirement of flight-crucial systems precludes the use of most traditional validation methods. The method presented utilizes formal design proof to uncover design and coding errors and experimentation to validate the assumptions of the design proof. The experimental method is described and illustrated by validating the clock synchronization system of the Software Implemented Fault Tolerance computer. The design proof of the algorithm includes a theorem that defines the maximum skew between any two nonfaulty clocks in the system in terms of specific system parameters. Most of these parameters are deterministic. One crucial parameter is the upper bound on the clock read error, which is stochastic. The probability that this upper bound is exceeded is calculated from data obtained by the measurement of system parameters. This probability is then included in a detailed reliability analysis of the system.
A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors.

PubMed

Zhang, Jilin; Tu, Hangdi; Ren, Yongjian; Wan, Jian; Zhou, Li; Li, Mingwei; Wang, Jue; Yu, Lifeng; Zhao, Chang; Zhang, Lei

2017-09-21

In order to utilize the distributed characteristic of sensors, distributed machine learning has become the mainstream approach, but the different computing capability of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. Our paper describes a reasonable parameter communication optimization strategy to balance the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose the Dynamic Finite Fault Tolerance (DFFT). Based on the DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named Dynamic Synchronous Parallel Strategy (DSP), which uses the performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, ensures the accuracy of the machine learning model, and avoids the situation that the model training is disturbed by any tasks unrelated to the sensors.
Is the Multigrid Method Fault Tolerant? The Two-Grid Case

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ainsworth, Mark; Glusa, Christian

2016-06-30

The predicted reduced resiliency of next-generation high performance computers means that it will become necessary to take into account the effects of randomly occurring faults on numerical methods. Further, in the event of a hard fault occurring, a decision has to be made as to what remedial action should be taken in order to resume the execution of the algorithm. The action that is chosen can have a dramatic effect on the performance and characteristics of the scheme. Ideally, the resulting algorithm should be subjected to the same kind of mathematical analysis that was applied to the original, deterministic variant. The purpose of this work is to provide an analysis of the behaviour of the multigrid algorithm in the presence of faults. Multigrid is arguably the method of choice for the solution of large-scale linear algebra problems arising from discretization of partial differential equations and it is of considerable importance to anticipate its behaviour on an exascale machine. The analysis of resilience of algorithms is in its infancy and the current work is perhaps the first to provide a mathematical model for faults and analyse the behaviour of a state-of-the-art algorithm under the model. It is shown that the Two Grid Method fails to be resilient to faults. Attention is then turned to identifying the minimal necessary remedial action required to restore the rate of convergence to that enjoyed by the ideal fault-free method.

Eigenstructure Assignment for Fault Tolerant Flight Control Design

NASA Technical Reports Server (NTRS)

Sobel, Kenneth; Joshi, Suresh (Technical Monitor)

2002-01-01

In recent years, fault tolerant flight control systems have gained an increased interest for high performance military aircraft as well as civil aircraft. Fault tolerant control systems can be described as either active or passive. An active fault tolerant control system has to either reconfigure or adapt the controller in response to a failure. One approach is to reconfigure the controller based upon detection and identification of the failure. Another approach is to use direct adaptive control to adjust the controller without explicitly identifying the failure. In contrast, a passive fault tolerant control system uses a fixed controller which achieves acceptable performance for a presumed set of failures. We have obtained a passive fault tolerant flight control law for the F/A-18 aircraft which achieves acceptable handling qualities for a class of control surface failures. The class of failures includes the symmetric failure of any one control surface being stuck at its trim value. A comparison was made of an eigenstructure assignment gain designed for the unfailed aircraft with a fault tolerant multiobjective optimization gain. We have shown that time responses for the unfailed aircraft using the eigenstructure assignment gain and the fault tolerant gain are identical. Furthermore, the fault tolerant gain achieves MIL-F-8785C specifications for all failure conditions.
VLSI Implementation of Fault Tolerance Multiplier based on Reversible Logic Gate

NASA Astrophysics Data System (ADS)

Ahmad, Nabihah; Hakimi Mokhtar, Ahmad; Othman, Nurmiza binti; Fhong Soon, Chin; Rahman, Ab Al Hadi Ab

2017-08-01

The multiplier is an essential component in the digital world, used in digital signal processing, microprocessors, and quantum computing, and widely used in arithmetic units. Due to the complexity of the multiplier, the tendency for errors is very high. This paper aims to design a 2×2-bit fault-tolerant multiplier based on reversible logic gates with low power consumption and high performance. The design has been implemented using 90nm Complementary Metal Oxide Semiconductor (CMOS) technology in Synopsys Electronic Design Automation (EDA) tools. The multiplier architecture is implemented using reversible logic gates. The fault-tolerant multiplier uses a combination of three reversible logic gates, namely the Double Feynman gate (F2G), the New Fault Tolerance (NFT) gate and the Islam Gate (IG), with an area of 160μm x 420.3μm (0.06725 mm2). The design achieves a low power consumption of 122.85μW and a propagation delay of 16.99ns. The proposed fault-tolerant multiplier achieves low power consumption and high performance, which makes it suitable for modern computing applications, as it has fault tolerance capabilities.
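The fault tolerance of gates such as F2G comes from parity preservation, which can be checked exhaustively. The sketch below assumes the usual definition of the Double Feynman gate from the reversible logic literature, (A, B, C) -> (A, A xor B, A xor C); the abstract itself does not spell out the gate functions, and the NFT and Islam gates are not modeled here.

```python
from itertools import product

def f2g(a, b, c):
    """Double Feynman gate (assumed definition): (A, B, C) -> (A, A^B, A^C)."""
    return a, a ^ b, a ^ c

def parity(bits):
    """XOR of all bits in the tuple or list."""
    result = 0
    for bit in bits:
        result ^= bit
    return result

inputs = list(product((0, 1), repeat=3))
outputs = [f2g(*bits) for bits in inputs]

# Reversible: the 8 input patterns map onto 8 distinct output patterns.
is_reversible = len(set(outputs)) == len(inputs)

# Parity preserving: input parity always equals output parity.
is_parity_preserving = all(parity(i) == parity(o) for i, o in zip(inputs, outputs))

def single_bit_fault_detected(bits, flipped_index):
    """Flip one output bit and report whether the parity check catches it."""
    out = list(f2g(*bits))
    out[flipped_index] ^= 1
    return parity(bits) != parity(out)

all_single_faults_detected = all(
    single_bit_fault_detected(bits, k) for bits in inputs for k in range(3)
)
```

Because parity is preserved gate by gate, a circuit built only from such gates can be checked with a single comparison of input and output parity; a single flipped output bit always breaks the match.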
Design and Implementation of Replicated Object Layer

NASA Technical Reports Server (NTRS)

Koka, Sudhir

1996-01-01

One of the widely used techniques for construction of fault tolerant applications is the replication of resources so that if one copy fails sufficient copies may still remain operational to allow the application to continue to function. This thesis involves the design and implementation of an object oriented framework for replicating data on multiple sites and across different platforms. Our approach, called the Replicated Object Layer (ROL), provides a mechanism for consistent replication of data over dynamic networks. ROL uses the Reliable Multicast Protocol (RMP) as a communication protocol that provides for reliable delivery, serialization and fault tolerance. Besides providing type registration, this layer facilitates distributed atomic transactions on replicated data. A novel algorithm called the RMP Commit Protocol, which commits transactions efficiently in a reliable multicast environment, is presented. ROL provides recovery procedures to ensure that site and communication failures do not corrupt persistent data, and make the system fault tolerant to network partitions. ROL will facilitate building distributed fault tolerant applications by performing the burdensome details of replica consistency operations, and making it completely transparent to the application. Replicated databases are a major class of applications which could be built on top of ROL.
A Performance Prediction Model for a Fault-Tolerant Computer During Recovery and Restoration. Ph.D. Thesis Report, 1 Jan. - 31 Dec. 1992

NASA Technical Reports Server (NTRS)

Stoughton, John W.; Obando, Rodrigo A.

1993-01-01

The modeling and design of a fault-tolerant multiprocessor system is addressed. In particular, the behavior of the system during recovery and restoration after a fault has occurred is investigated. Given that a multicomputer system is designed using the Algorithm to Architecture to Mapping Model (ATAMM), and that a fault (death of a computing resource) occurs during its normal steady-state operation, a model is presented as a viable research tool for predicting the performance bounds of the system during its recovery and restoration phases. Furthermore, the bounds of the performance behavior of the system during this transient mode can be assessed. These bounds include: time to recover from the fault (t(sub rec)), time to restore the system (t(sub rec)) and whether there is a permanent delay in the system's Time Between Input and Output (TBIO) after the system has reached a steady state. An implementation of an ATAMM based computer was developed with the Generic VHSIC Spaceborne Computer (GVSC) as the target system. A simulation of the GVSC was also written based on the code used in ATAMM Multicomputer Operating System (AMOS). The simulation is in turn used to validate the new model in the usefulness and accuracy in tracking the propagation of the delay through the system and predicting the behavior in the transient state of recovery and restoration. The model is validated as an accurate method to predict the transient behavior of an ATAMM based multicomputer during recovery and restoration.
Parameter Transient Behavior Analysis on Fault Tolerant Control System

NASA Technical Reports Server (NTRS)

Belcastro, Christine (Technical Monitor); Shin, Jong-Yeob

2003-01-01

In a fault tolerant control (FTC) system, a parameter varying FTC law is reconfigured based on fault parameters estimated by fault detection and isolation (FDI) modules. FDI modules require some time to detect fault occurrences in aero-vehicle dynamics. This paper illustrates analysis of a FTC system based on estimated fault parameter transient behavior which may include false fault detections during a short time interval. Using Lyapunov function analysis, the upper bound of an induced-L2 norm of the FTC system performance is calculated as a function of a fault detection time and the exponential decay rate of the Lyapunov function.
A Test Generation Framework for Distributed Fault-Tolerant Algorithms

NASA Technical Reports Server (NTRS)

Goodloe, Alwyn; Bushnell, David; Miner, Paul; Pasareanu, Corina S.

2009-01-01

Heavyweight formal methods such as theorem proving have been successfully applied to the analysis of safety critical fault-tolerant systems. Typically, the models and proofs performed during such analysis do not inform the testing process of actual implementations. We propose a framework for generating test vectors from specifications written in the Prototype Verification System (PVS). The methodology uses a translator to produce a Java prototype from a PVS specification. Symbolic (Java) PathFinder is then employed to generate a collection of test cases. A small example is employed to illustrate how the framework can be used in practice.
Ultrareliable fault-tolerant control systems

NASA Technical Reports Server (NTRS)

Webster, L. D.; Slykhouse, R. A.; Booth, L. A., Jr.; Carson, T. M.; Davis, G. J.; Howard, J. C.

1984-01-01

It is demonstrated that fault-tolerant computer systems, such as on the Shuttles, based on redundant, independent operation are a viable alternative in fault tolerant system designs. The ultrareliable fault-tolerant control system (UFTCS) was developed and tested in laboratory simulations of a UH-1H helicopter. UFTCS includes asymptotically stable independent control elements in a parallel, cross-linked system environment. Static redundancy provides the fault tolerance. A polling is performed among the computers, with results allowing for time-delay channel variations with tight bounds. When compared with the laboratory and actual flight data for the helicopter, the probability of a fault was, for the first 10 hr of flight given a quintuple computer redundancy, found to be 1 in 290 billion. Two weeks of untended Space Station operations would experience a fault probability of 1 in 24 million. Techniques for avoiding channel divergence problems are identified.
Fault Tolerant Parallel Implementations of Iterative Algorithms for Optimal Control Problems

DTIC Science & Technology

1988-01-21

p/.V)] steps, but did not discuss any specific parallel implementation. Gajski [5] improved upon this result by performing the SIMD computation in...N = p2. our approach reduces to that of [5], except that Gajski presents the coefficient computation and partial solution phases as a single... the SIMD algorithm presented by Gajski [5] can be most efficiently mapped to a unidirectional ring network with broadcasting capability. Based
High-throughput state-machine replication using software transactional memory.

PubMed

Zhao, Wenbing; Yang, William; Zhang, Honglei; Yang, Jack; Luo, Xiong; Zhu, Yueqin; Yang, Mary; Luo, Chaomin

2016-11-01

State-machine replication is a common way of constructing general purpose fault tolerance systems. To ensure replica consistency, requests must be executed sequentially according to some total order at all non-faulty replicas. Unfortunately, this could severely limit the system throughput. This issue has been partially addressed by identifying non-conflicting requests based on application semantics and executing these requests concurrently. However, identifying and tracking non-conflicting requests requires intimate knowledge of application design and implementation, and a custom fault tolerance solution developed for one application cannot be easily adopted by other applications. Software transactional memory offers a new way of constructing concurrent programs. In this article, we present the mechanisms needed to retrofit existing concurrency control algorithms designed for software transactional memory for state-machine replication. The main benefit of using software transactional memory in state-machine replication is that general purpose concurrency control mechanisms can be designed without deep knowledge of application semantics. As such, new fault tolerance systems based on state-machine replication with excellent throughput can be easily designed and maintained. In this article, we introduce three different concurrency control mechanisms for state-machine replication using software transactional memory, namely, ordered strong strict two-phase locking, conventional timestamp-based multiversion concurrency control, and speculative timestamp-based multiversion concurrency control. Our experiments show that the speculative timestamp-based multiversion concurrency control mechanism has the best performance in all types of workload, whereas the conventional timestamp-based multiversion concurrency control offers the worst performance due to a high abort rate in the presence of even moderate contention between transactions. The ordered strong strict two-phase locking mechanism offers the simplest solution, with excellent performance in low contention workloads and fairly good performance in high contention workloads.
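The baseline that the article improves on, sequential execution in a total order, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' mechanism: the `Replica` class, the request format, and the operations are hypothetical, and the total order is assumed to be supplied by some agreement protocol in the form of sequence numbers.

```python
class Replica:
    """Executes requests strictly in sequence-number order."""

    def __init__(self):
        self.state = {}
        self.next_seq = 0
        self.pending = {}  # requests that arrived ahead of their turn

    def deliver(self, seq, request):
        """Buffer the request, then execute every request that is now in order."""
        self.pending[seq] = request
        while self.next_seq in self.pending:
            key, op = self.pending.pop(self.next_seq)
            self.state[key] = op(self.state.get(key, 0))
            self.next_seq += 1

def add5(value):
    return value + 5

def double(value):
    return value * 2

def add1(value):
    return value + 1

# The agreed total order: sequence number -> (key, deterministic operation).
log = {0: ("x", add5), 1: ("x", double), 2: ("y", add1)}

a, b = Replica(), Replica()
for seq in (0, 1, 2):  # replica a receives requests in order
    a.deliver(seq, log[seq])
for seq in (2, 0, 1):  # replica b receives them out of order
    b.deliver(seq, log[seq])
```

Both replicas reach the same state because execution order is fixed by the sequence numbers rather than by arrival order. Requests 0 and 1 conflict on key `x`, while request 2 touches only `y` and could safely run concurrently; finding such opportunities without application knowledge is what the concurrency control mechanisms in the article are for.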
High-throughput state-machine replication using software transactional memory

PubMed Central

Yang, William; Zhang, Honglei; Yang, Jack; Luo, Xiong; Zhu, Yueqin; Yang, Mary; Luo, Chaomin

2017-01-01

State-machine replication is a common way of constructing general purpose fault tolerance systems. To ensure replica consistency, requests must be executed sequentially according to some total order at all non-faulty replicas. Unfortunately, this could severely limit the system throughput. This issue has been partially addressed by identifying non-conflicting requests based on application semantics and executing these requests concurrently. However, identifying and tracking non-conflicting requests requires intimate knowledge of application design and implementation, and a custom fault tolerance solution developed for one application cannot be easily adopted by other applications. Software transactional memory offers a new way of constructing concurrent programs. In this article, we present the mechanisms needed to retrofit existing concurrency control algorithms designed for software transactional memory for state-machine replication. The main benefit of using software transactional memory in state-machine replication is that general purpose concurrency control mechanisms can be designed without deep knowledge of application semantics. As such, new fault tolerance systems based on state-machine replication with excellent throughput can be easily designed and maintained. In this article, we introduce three different concurrency control mechanisms for state-machine replication using software transactional memory, namely, ordered strong strict two-phase locking, conventional timestamp-based multiversion concurrency control, and speculative timestamp-based multiversion concurrency control. Our experiments show that the speculative timestamp-based multiversion concurrency control mechanism has the best performance in all types of workload, whereas the conventional timestamp-based multiversion concurrency control offers the worst performance due to a high abort rate in the presence of even moderate contention between transactions. The ordered strong strict two-phase locking mechanism offers the simplest solution, with excellent performance in low contention workloads and fairly good performance in high contention workloads. PMID:29075049
A Byzantine-Fault Tolerant Self-Stabilizing Protocol for Distributed Clock Synchronization Systems

NASA Technical Reports Server (NTRS)

Malekpour, Mahyar R.

2006-01-01

Embedded distributed systems have become an integral part of safety-critical computing applications, necessitating system designs that incorporate fault tolerant clock synchronization in order to achieve ultra-reliable assurance levels. Many efficient clock synchronization protocols do not, however, address Byzantine failures, and most protocols that do tolerate Byzantine failures do not self-stabilize. The Byzantine self-stabilizing clock synchronization algorithms that exist in the literature are based either on unjustifiably strong assumptions about the initial synchrony of the nodes or on the existence of a common pulse at the nodes. The Byzantine self-stabilizing clock synchronization protocol presented here does not rely on any assumptions about the initial state of the clocks. Furthermore, there is neither a central clock nor an externally generated pulse system. The proposed protocol converges deterministically, is scalable, and self-stabilizes in a short amount of time. The convergence time is linear with respect to the self-stabilization period. Proofs of the correctness of the protocol as well as the results of formal verification efforts are reported.
Qualitative Event-Based Diagnosis: Case Study on the Second International Diagnostic Competition

NASA Technical Reports Server (NTRS)

Daigle, Matthew; Roychoudhury, Indranil

2010-01-01

We describe a diagnosis algorithm entered into the Second International Diagnostic Competition. We focus on the first diagnostic problem of the industrial track of the competition in which a diagnosis algorithm must detect, isolate, and identify faults in an electrical power distribution testbed and provide corresponding recovery recommendations. The diagnosis algorithm embodies a model-based approach, centered around qualitative event-based fault isolation. Faults produce deviations in measured values from model-predicted values. The sequence of these deviations is matched to those predicted by the model in order to isolate faults. We augment this approach with model-based fault identification, which determines fault parameters and helps to further isolate faults. We describe the diagnosis approach, provide diagnosis results from running the algorithm on provided example scenarios, and discuss the issues faced, and lessons learned, from implementing the approach.
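The matching step described above, comparing the observed sequence of deviations with the sequences a model predicts for each fault, can be sketched as a prefix filter. The fault names, sensors, and signatures below are hypothetical, and each fault is given a single predicted trace for simplicity; a real model may allow several orderings per fault.

```python
# Hypothetical fault signatures: ordered (measurement, direction) deviations.
SIGNATURES = {
    "relay_stuck_open": [("voltage", "-"), ("current", "-")],
    "load_short": [("current", "+"), ("voltage", "-")],
    "sensor_bias": [("voltage", "-")],
}

def isolate(observed, signatures=SIGNATURES):
    """Return the faults whose predicted trace begins with the observed events."""
    observed = list(observed)
    return sorted(
        fault
        for fault, trace in signatures.items()
        if trace[: len(observed)] == observed
    )

# Candidates shrink as more deviations are observed.
before_any_event = isolate([])
after_first_event = isolate([("voltage", "-")])
after_second_event = isolate([("voltage", "-"), ("current", "-")])
```

Using the order of deviations, not just the set of deviating measurements, is what separates `relay_stuck_open` from `load_short` here: both affect the same two measurements, but in a different sequence and direction.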
Sliding Mode Fault Tolerant Control with Adaptive Diagnosis for Aircraft Engines

NASA Astrophysics Data System (ADS)

Xiao, Lingfei; Du, Yanbin; Hu, Jixiang; Jiang, Bin

2018-03-01

In this paper, a novel sliding mode fault tolerant control method is presented for aircraft engine systems with uncertainties and disturbances on the basis of an adaptive diagnostic observer. By taking both sensor faults and actuator faults into account, a general model of aircraft engine control systems subject to uncertainties and disturbances is considered. Then, the corresponding augmented dynamic model is established in order to facilitate the fault diagnosis and fault tolerant controller design. Next, a suitable detection observer is designed to detect the faults effectively. By creating an adaptive diagnostic observer and applying a sliding mode strategy, the sliding mode fault tolerant controller is constructed. Robust stabilization is discussed, and the closed-loop system can be stabilized robustly. It is also proven that the adaptive diagnostic observer output errors and the fault estimates converge to a set exponentially, and that the convergence rate is greater than some value which can be adjusted by choosing the designable parameters properly. The simulation on a twin-shaft aircraft engine verifies the applicability of the proposed fault tolerant control method.
High Speed Operation and Testing of a Fault Tolerant Magnetic Bearing

NASA Technical Reports Server (NTRS)

DeWitt, Kenneth; Clark, Daniel

2004-01-01

Research activities undertaken to upgrade the fault-tolerant facility, continue testing high-speed fault-tolerant operation, and assist in the commissioning of the high temperature (1000 degrees F) thrust magnetic bearing are described. The fault-tolerant magnetic bearing test facility was upgraded to operate to 40,000 RPM. The necessary upgrades included new state-of-the-art position sensors with high frequency modulation and new power edge filtering of amplifier outputs. A comparison study of the new sensors and the previous system was done as well as a noise assessment of the sensor-to-controller signals. Also a comparison study of power edge filtering for amplifier-to-actuator signals was done; this information is valuable for all position sensing and motor actuation applications. After these facility upgrades were completed, the rig is believed to have capabilities for 40,000 RPM operation, though this has yet to be demonstrated. Other upgrades included verification and upgrading of safety shielding, and upgrading control algorithms. The rig will now also be used to demonstrate motoring capabilities and control algorithms are in the process of being created. Recently an extreme temperature thrust magnetic bearing was designed from the ground up. The thrust bearing was designed to fit within the existing high temperature facility. The retrofit began near the end of the summer, 04, and continues currently. Contract staff authored a NASA-TM entitled "An Overview of Magnetic Bearing Technology for Gas Turbine Engines", containing a compilation of bearing data as it pertains to operation in the regime of the gas turbine engine and a presentation of how magnetic bearings can become a viable candidate for use in future engine technology.
Control algorithm implementation for a redundant degree of freedom manipulator

NASA Technical Reports Server (NTRS)

Cohan, Steve

1991-01-01

This project's purpose is to develop and implement control algorithms for a kinematically redundant robotic manipulator. The manipulator is being developed concurrently by Odetics Inc., under internal research and development funding. This SBIR contract supports algorithm conception, development, and simulation, as well as software implementation and integration with the manipulator hardware. The Odetics Dexterous Manipulator is a lightweight, high strength, modular manipulator being developed for space and commercial applications. It has seven fully active degrees of freedom, is electrically powered, and is fully operational in 1 G. The manipulator consists of five self-contained modules. These modules join via simple quick-disconnect couplings and self-mating connectors which allow rapid assembly/disassembly for reconfiguration, transport, or servicing. Each joint incorporates a unique drive train design which provides zero backlash operation, is insensitive to wear, and is single fault tolerant to motor or servo amplifier failure. The sensing system is also designed to be single fault tolerant. Although the initial prototype is not space qualified, the design is well-suited to meeting space qualification requirements. The control algorithm design approach is to develop a hierarchical system with well defined access and interfaces at each level. The high level endpoint/configuration control algorithm transforms manipulator endpoint position/orientation commands to joint angle commands, providing task space motion. At the same time, the kinematic redundancy is resolved by controlling the configuration (pose) of the manipulator, using several different optimizing criteria. The center level of the hierarchy servos the joints to their commanded trajectories using both linear feedback and model-based nonlinear control techniques. The lowest control level uses sensed joint torque to close torque servo loops, with the goal of improving the manipulator dynamic behavior. The control algorithms are subjected to a dynamic simulation before implementation.
Predeployment validation of fault-tolerant systems through software-implemented fault insertion

NASA Technical Reports Server (NTRS)

Czeck, Edward W.; Siewiorek, Daniel P.; Segall, Zary Z.

1989-01-01

A fault injection-based automated testing (FIAT) environment, which can be used to experimentally characterize and evaluate distributed realtime systems under fault-free and faulted conditions, is described. A survey is presented of validation methodologies. The need for fault insertion based on validation methodologies is demonstrated. The origins and models of faults, and motivation for the FIAT concept are reviewed. FIAT employs a validation methodology which builds confidence in the system through first providing a baseline of fault-free performance data and then characterizing the behavior of the system with faults present. Fault insertion is accomplished through software and allows faults or the manifestation of faults to be inserted by either seeding faults into memory or triggering error detection mechanisms. FIAT is capable of emulating a variety of fault-tolerant strategies and architectures, can monitor system activity, and can automatically orchestrate experiments involving insertion of faults. There is a common system interface which allows ease of use to decrease experiment development and run time. Fault models chosen for experiments on FIAT have generated system responses which parallel those observed in real systems under faulty conditions. These capabilities are shown by two example experiments each using a different fault-tolerance strategy.
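The idea of seeding a fault into memory and comparing against a fault-free baseline can be illustrated with a minimal Python sketch. It is not the FIAT implementation: the simulated memory image, the function names `flip_bit` and `run_experiment`, and the use of a CRC-32 as the error detection mechanism are all assumptions made for illustration.

```python
import zlib


def flip_bit(memory: bytearray, byte_index: int, bit_index: int) -> None:
    """Seed a single-bit fault into a simulated memory image (in place)."""
    memory[byte_index] ^= 1 << bit_index


def run_experiment(payload: bytes, byte_index: int, bit_index: int) -> dict:
    """Record a fault-free baseline, insert one fault, and report detection."""
    memory = bytearray(payload)
    baseline_crc = zlib.crc32(memory)        # fault-free baseline
    flip_bit(memory, byte_index, bit_index)  # software-implemented fault insertion
    faulted_crc = zlib.crc32(memory)         # behavior with the fault present
    return {
        "baseline_crc": baseline_crc,
        "faulted_crc": faulted_crc,
        "detected": baseline_crc != faulted_crc,
    }


if __name__ == "__main__":
    print(run_experiment(b"flight control state", byte_index=3, bit_index=5))
```

A CRC detects every single-bit error, so `detected` is true for any valid byte and bit index here; a real experiment would vary fault location, timing, and type.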
Simulated fault injection - A methodology to evaluate fault tolerant microprocessor architectures

NASA Technical Reports Server (NTRS)

Choi, Gwan S.; Iyer, Ravishankar K.; Carreno, Victor A.

1990-01-01

A simulation-based fault-injection method for validating fault-tolerant microprocessor architectures is described. The approach uses mixed-mode simulation (electrical/logic analysis), and injects transient errors in run-time to assess the resulting fault impact. As an example, a fault-tolerant architecture which models the digital aspects of a dual-channel real-time jet-engine controller is used. The level of effectiveness of the dual configuration with respect to single and multiple transients is measured. The results indicate 100 percent coverage of single transients. Approximately 12 percent of the multiple transients affect both channels; none result in controller failure since two additional levels of redundancy exist.
A benchmark for fault tolerant flight control evaluation

NASA Astrophysics Data System (ADS)

Smaili, H.; Breeman, J.; Lombaerts, T.; Stroosma, O.

2013-12-01

A large transport aircraft simulation benchmark (REconfigurable COntrol for Vehicle Emergency Return - RECOVER) has been developed within the GARTEUR (Group for Aeronautical Research and Technology in Europe) Flight Mechanics Action Group 16 (FM-AG(16)) on Fault Tolerant Control (2004-2008) for the integrated evaluation of fault detection and identification (FDI) and reconfigurable flight control strategies. The benchmark includes a suitable set of assessment criteria and failure cases, based on reconstructed accident scenarios, to assess the potential of new adaptive control strategies to improve aircraft survivability. The application of reconstruction and modeling techniques, based on accident flight data, has resulted in high-fidelity nonlinear aircraft and fault models to evaluate new Fault Tolerant Flight Control (FTFC) concepts and their real-time performance to accommodate in-flight failures.
Specification and Design Methodologies for High-Speed Fault-Tolerant Array Algorithms and Structures for VLSI.

DTIC Science & Technology

1987-06-01

evaluation and chip layout planning for VLSI digital systems. A high-level applicative (functional) language, implemented at UCLA, allows combining of...operating system. 2.1 Introduction The complexity of VLSI requires the application of CAD tools at all levels of the design process. In order to be...effective, these tools must be adaptive to the specific design. In this project we studied a design method based on the use of applicative languages
A programmable two-qubit quantum processor in silicon

NASA Astrophysics Data System (ADS)

Watson, T. F.; Philips, S. G. J.; Kawakami, E.; Ward, D. R.; Scarlino, P.; Veldhorst, M.; Savage, D. E.; Lagally, M. G.; Friesen, Mark; Coppersmith, S. N.; Eriksson, M. A.; Vandersypen, L. M. K.

2018-03-01

Now that it is possible to achieve measurement and control fidelities for individual quantum bits (qubits) above the threshold for fault tolerance, attention is moving towards the difficult task of scaling up the number of physical qubits to the large numbers that are needed for fault-tolerant quantum computing. In this context, quantum-dot-based spin qubits could have substantial advantages over other types of qubit owing to their potential for all-electrical operation and ability to be integrated at high density onto an industrial platform. Initialization, readout and single- and two-qubit gates have been demonstrated in various quantum-dot-based qubit representations. However, as seen with small-scale demonstrations of quantum computers using other types of qubit, combining these elements leads to challenges related to qubit crosstalk, state leakage, calibration and control hardware. Here we overcome these challenges by using carefully designed control techniques to demonstrate a programmable two-qubit quantum processor in a silicon device that can perform the Deutsch–Jozsa algorithm and the Grover search algorithm, canonical examples of quantum algorithms that outperform their classical analogues. We characterize the entanglement in our processor by using quantum-state tomography of Bell states, measuring state fidelities of 85–89 per cent and concurrences of 73–82 per cent. These results pave the way for larger-scale quantum computers that use spins confined to quantum dots.
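For readers unfamiliar with the Grover search on two qubits, the following Python sketch simulates the ideal, noiseless state vector. It is only a textbook illustration, not the control sequence used on the silicon device; the function name `grover_two_qubit` is hypothetical. With four basis states, a single oracle call followed by one diffusion step concentrates all probability on the marked state.

```python
from typing import List


def grover_two_qubit(marked: int) -> List[float]:
    """Return measurement probabilities after one ideal Grover iteration."""
    n_states = 4                         # two qubits -> four basis states
    amps = [0.5] * n_states              # uniform superposition, amplitude 1/2 each
    amps[marked] = -amps[marked]         # oracle: phase-flip the marked state
    mean = sum(amps) / n_states          # diffusion: reflect about the mean
    amps = [2.0 * mean - a for a in amps]
    return [a * a for a in amps]         # Born rule for real amplitudes


if __name__ == "__main__":
    print(grover_two_qubit(marked=2))
```

A classical search over four items needs more than two queries on average, while the ideal two-qubit Grover circuit needs one; measured fidelities on real hardware are lower than this idealized result.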

Fault tolerant, radiation hard, high performance digital signal processor

NASA Technical Reports Server (NTRS)

Holmann, Edgar; Linscott, Ivan R.; Maurer, Michael J.; Tyler, G. L.; Libby, Vibeke

1990-01-01

An architecture has been developed for a high-performance VLSI digital signal processor that is highly reliable, fault-tolerant, and radiation-hard. The signal processor, part of a spacecraft receiver designed to support uplink radio science experiments at the outer planets, organizes the connections between redundant arithmetic resources, register files, and memory through a shuffle exchange communication network. The configuration of the network and the state of the processor resources are all under microprogram control, which both maps the resources according to algorithmic needs and reconfigures the processing should a failure occur. In addition, the microprogram is reloadable through the uplink to accommodate changes in the science objectives throughout the course of the mission. The processor will be implemented with silicon compiler tools, and its design will be verified through silicon compilation simulation at all levels from the resources to full functionality. By blending reconfiguration with redundancy the processor implementation is fault-tolerant and reliable, and possesses the long expected lifetime needed for a spacecraft mission to the outer planets.
Reliable communication in the presence of failures

NASA Technical Reports Server (NTRS)

Birman, Kenneth P.; Joseph, Thomas A.

1987-01-01

The design and correctness of a communication facility for a distributed computer system are reported on. The facility provides support for fault-tolerant process groups in the form of a family of reliable multicast protocols that can be used in both local- and wide-area networks. These protocols attain high levels of concurrency, while respecting application-specific delivery ordering constraints, and have varying cost and performance that depend on the degree of ordering desired. In particular, a protocol that enforces causal delivery orderings is introduced and shown to be a valuable alternative to conventional asynchronous communication protocols. The facility also ensures that the processes belonging to a fault-tolerant process group will observe consistent orderings of events affecting the group as a whole, including process failures, recoveries, migration, and dynamic changes to group properties like member rankings. A review of several uses for the protocols in the ISIS system, which supports fault-tolerant resilient objects and bulletin boards, illustrates the significant simplification of higher level algorithms made possible by our approach.
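Causal delivery ordering can be illustrated with the well-known vector-clock delivery rule. This is a swapped-in, commonly taught formulation and is not claimed to be the protocol described in the record above; the names `can_deliver` and `deliver` are hypothetical. A message from process `sender` is deliverable when it is the next message expected from that sender and every message it causally depends on has already been delivered locally.

```python
from typing import List


def can_deliver(local: List[int], msg_clock: List[int], sender: int) -> bool:
    """Vector-clock test for causal delivery of one multicast message."""
    for k, (seen, stamped) in enumerate(zip(local, msg_clock)):
        if k == sender:
            if stamped != seen + 1:   # must be the next message from the sender
                return False
        elif stamped > seen:          # depends on a message not yet delivered here
            return False
    return True


def deliver(local: List[int], msg_clock: List[int], sender: int) -> List[int]:
    """Return the receiver's clock after delivering the message."""
    updated = list(local)
    updated[sender] = msg_clock[sender]
    return updated


if __name__ == "__main__":
    local = [0, 0, 0]
    m1 = ([1, 0, 0], 0)   # first multicast from process 0
    m2 = ([1, 1, 0], 1)   # process 1 multicasts after delivering m1
    print(can_deliver(local, *m2))   # m2 arrives early and must be buffered
    local = deliver(local, *m1)
    print(can_deliver(local, *m2))   # now deliverable
```

A receiver buffers any message that fails the test and retries it after each successful delivery.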
Decoding small surface codes with feedforward neural networks

NASA Astrophysics Data System (ADS)

Varsamopoulos, Savvas; Criger, Ben; Bertels, Koen

2018-01-01

Surface codes reach high error thresholds when decoded with known algorithms, but the decoding time will likely exceed the available time budget, especially for near-term implementations. To decrease the decoding time, we reduce the decoding problem to a classification problem that a feedforward neural network can solve. We investigate quantum error correction and fault tolerance at small code distances using neural network-based decoders, demonstrating that the neural network can generalize to inputs that were not provided during training and that they can reach similar or better decoding performance compared to previous algorithms. We conclude by discussing the time required by a feedforward neural network decoder in hardware.
Improving Security for SCADA Sensor Networks with Reputation Systems and Self-Organizing Maps.

PubMed

Moya, José M; Araujo, Alvaro; Banković, Zorana; de Goyeneche, Juan-Mariano; Vallejo, Juan Carlos; Malagón, Pedro; Villanueva, Daniel; Fraga, David; Romero, Elena; Blesa, Javier

2009-01-01

The reliable operation of modern infrastructures depends on computerized systems and Supervisory Control and Data Acquisition (SCADA) systems, which are also based on the data obtained from sensor networks. The inherent limitations of the sensor devices make them extremely vulnerable to cyberwarfare/cyberterrorism attacks. In this paper, we propose a reputation system enhanced with distributed agents, based on unsupervised learning algorithms (self-organizing maps), in order to achieve fault tolerance and enhanced resistance to previously unknown attacks. This approach has been extensively simulated and compared with previous proposals.
Quantum neuromorphic hardware for quantum artificial intelligence

NASA Astrophysics Data System (ADS)

Prati, Enrico

2017-08-01

The development of machine learning methods based on deep learning boosted the field of artificial intelligence towards unprecedented achievements and application in several fields. Such prominent results were made in parallel with the first successful demonstrations of fault tolerant hardware for quantum information processing. To what extent deep learning can take advantage of the existence of hardware based on qubits behaving as a universal quantum computer is an open question under investigation. Here I review the convergence between the two fields towards implementation of advanced quantum algorithms, including quantum deep learning.
Improving Security for SCADA Sensor Networks with Reputation Systems and Self-Organizing Maps

PubMed Central

Moya, José M.; Araujo, Álvaro; Banković, Zorana; de Goyeneche, Juan-Mariano; Vallejo, Juan Carlos; Malagón, Pedro; Villanueva, Daniel; Fraga, David; Romero, Elena; Blesa, Javier

2009-01-01

The reliable operation of modern infrastructures depends on computerized systems and Supervisory Control and Data Acquisition (SCADA) systems, which are also based on the data obtained from sensor networks. The inherent limitations of the sensor devices make them extremely vulnerable to cyberwarfare/cyberterrorism attacks. In this paper, we propose a reputation system enhanced with distributed agents, based on unsupervised learning algorithms (self-organizing maps), in order to achieve fault tolerance and enhanced resistance to previously unknown attacks. This approach has been extensively simulated and compared with previous proposals. PMID:22291569
Software fault tolerance in computer operating systems

NASA Technical Reports Server (NTRS)

Iyer, Ravishankar K.; Lee, Inhwan

1994-01-01

This chapter provides data and analysis of the dependability and fault tolerance for three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of defects in system software that result in processor failures. The loose coupling between processors which results in the backup execution (the processor state and the sequence of events occurring) being different from the original execution is a major reason for the measured software fault tolerance. The IBM/MVS system fault tolerance almost doubles when recovery routines are provided, in comparison to the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.
Advanced development for space robotics with emphasis on fault tolerance

NASA Technical Reports Server (NTRS)

Tesar, D.; Chladek, J.; Hooper, R.; Sreevijayan, D.; Kapoor, C.; Geisinger, J.; Meaney, M.; Browning, G.; Rackers, K.

1995-01-01

This paper describes the ongoing work in fault tolerance at the University of Texas at Austin. The paper describes the technical goals the group is striving to achieve and includes a brief description of the individual projects focusing on fault tolerance. The ultimate goal is to develop and test technology applicable to all future missions of NASA (lunar base, Mars exploration, planetary surveillance, space station, etc.).
Adaptive robust fault-tolerant control for linear MIMO systems with unmatched uncertainties

NASA Astrophysics Data System (ADS)

Zhang, Kangkang; Jiang, Bin; Yan, Xing-Gang; Mao, Zehui

2017-10-01

In this paper, two novel fault-tolerant control design approaches are proposed for linear MIMO systems with actuator additive faults, multiplicative faults and unmatched uncertainties. For time-varying multiplicative and additive faults, new adaptive laws and additive compensation functions are proposed. A set of conditions is developed such that the unmatched uncertainties are compensated by actuators in control. On the other hand, for unmatched uncertainties with their projection in unmatched space being not zero, based on a (vector) relative degree condition, additive functions are designed to compensate for the uncertainties from output channels in the presence of actuator faults. The developed fault-tolerant control schemes are applied to two aircraft systems to demonstrate the efficiency of the proposed approaches.
A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors

PubMed Central

Zhang, Jilin; Tu, Hangdi; Ren, Yongjian; Wan, Jian; Zhou, Li; Li, Mingwei; Wang, Jue; Yu, Lifeng; Zhao, Chang; Zhang, Lei

2017-01-01

In order to utilize the distributed characteristic of sensors, distributed machine learning has become the mainstream approach, but the different computing capability of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. Our paper describes a reasonable parameter communication optimization strategy to balance the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose the Dynamic Finite Fault Tolerance (DFFT). Based on the DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named Dynamic Synchronous Parallel Strategy (DSP), which uses the performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, ensures the accuracy of the machine learning model, and avoids the situation that the model training is disturbed by any tasks unrelated to the sensors. PMID:28934163
Development and evaluation of a Fault-Tolerant Multiprocessor (FTMP) computer. Volume 2: FTMP software

NASA Technical Reports Server (NTRS)

Lala, J. H.; Smith, T. B., III

1983-01-01

The software developed for the Fault-Tolerant Multiprocessor (FTMP) is described. The FTMP executive is a timer-interrupt driven dispatcher that schedules iterative tasks which run at 3.125, 12.5, and 25 Hz. Major tasks which run under the executive include system configuration control, flight control, and display. The flight control task includes autopilot and autoland functions for a jet transport aircraft. System Displays include status displays of all hardware elements (processors, memories, I/O ports, buses), failure log displays showing transient and hard faults, and an autopilot display. All software is in a higher order language (AED, an ALGOL derivative). The executive is a fully distributed general purpose executive which automatically balances the load among available processor triads. Provisions for graceful performance degradation under processing overload are an integral part of the scheduling algorithms.
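The three iteration rates quoted above are related by powers of two (25 Hz, 25/2 Hz, 25/8 Hz), which suggests a frame-based dispatcher. The Python sketch below assumes a 25 Hz base frame driven by the timer interrupt; this assumption, the `RATE_DIVISORS` table, and the function `tasks_due` are illustrative only and do not reproduce the FTMP executive, its load balancing across processor triads, or its overload handling.

```python
from typing import Dict, List

# Assumed mapping from rate group to frame divisor, with a 25 Hz base frame.
RATE_DIVISORS: Dict[str, int] = {
    "25 Hz": 1,      # runs every frame
    "12.5 Hz": 2,    # runs every 2nd frame
    "3.125 Hz": 8,   # runs every 8th frame
}


def tasks_due(frame: int) -> List[str]:
    """Return the rate groups to dispatch on a given timer-interrupt frame."""
    return [name for name, divisor in RATE_DIVISORS.items() if frame % divisor == 0]


if __name__ == "__main__":
    for frame in range(8):
        print(frame, tasks_due(frame))
```

Over any eight consecutive frames the fastest group runs eight times, the middle group four times, and the slowest group once, matching the ratios of the stated rates.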
Fault-tolerant locomotion of the hexapod robot.

PubMed

Yang, J M; Kim, J H

1998-01-01

In this paper, we propose a scheme for fault detection and tolerance of the hexapod robot locomotion on even terrain. The fault stability margin is defined to represent potential stability which a gait can have in case a sudden fault event occurs to one leg. Based on this, the fault-tolerant quadruped periodic gaits of the hexapod walking over perfectly even terrain are derived. It is demonstrated that the derived quadruped gait is the optimal one the hexapod can have maintaining fault stability margin nonnegative and a geometric condition should be satisfied for the optimal locomotion. By this scheme, when one leg is in failure, the hexapod robot has the modified tripod gait to continue the optimal locomotion.
Gyro-based Maximum-Likelihood Thruster Fault Detection and Identification

NASA Technical Reports Server (NTRS)

Wilson, Edward; Lages, Chris; Mah, Robert; Clancy, Daniel (Technical Monitor)

2002-01-01

When building smaller, less expensive spacecraft, there is a need for intelligent fault tolerance vs. increased hardware redundancy. If fault tolerance can be achieved using existing navigation sensors, cost and vehicle complexity can be reduced. A maximum likelihood-based approach to thruster fault detection and identification (FDI) for spacecraft is developed here and applied in simulation to the X-38 space vehicle. The system uses only gyro signals to detect and identify hard, abrupt, single and multiple jet on- and off-failures. Faults are detected within one second and identified within one to five seconds.
Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lifflander, Jonathan; Meneses, Esteban; Menon, Harshita

2014-09-22

Deterministic replay of a parallel application is commonly used for discovering bugs or to recover from a hard fault with message-logging fault tolerance. For message passing programs, a major source of overhead during forward execution is recording the order in which messages are sent and received. During replay, this ordering must be used to deterministically reproduce the execution. Previous work in replay algorithms often makes minimal assumptions about the programming model and application in order to maintain generality. However, in many cases, only a partial order must be recorded due to determinism intrinsic in the code, ordering constraints imposed by the execution model, and events that are commutative (their relative execution order during replay does not need to be reproduced exactly). In this paper, we present a novel algebraic framework for reasoning about the minimum dependencies required to represent the partial order for different concurrent orderings and interleavings. By exploiting this theory, we improve on an existing scalable message-logging fault tolerance scheme. The improved scheme scales to 131,072 cores on an IBM BlueGene/P with up to 2x lower overhead than one that records a total order.
Experiments in fault tolerant software reliability

NASA Technical Reports Server (NTRS)

Mcallister, David F.; Tai, K. C.; Vouk, Mladen A.

1987-01-01

The reliability of voting was evaluated in a fault-tolerant software system for small output spaces. The effectiveness of the back-to-back testing process was investigated. Version 3.0 of the RSDIMU-ATS, a semi-automated test bed for certification testing of RSDIMU software, was prepared and distributed. Software reliability estimation methods based on non-random sampling are being studied. The investigation of existing fault-tolerance models was continued and formulation of new models was initiated.
Mixed H2/H∞-Based Fusion Estimation for Energy-Limited Multi-Sensors in Wearable Body Networks

PubMed Central

Li, Chao; Zhang, Zhenjiang; Chao, Han-Chieh

2017-01-01

In wireless sensor networks, sensor nodes collect plenty of data for each time period. If all of the data are transmitted to a Fusion Center (FC), the power of the sensor nodes would run out rapidly. On the other hand, the data also need a filter to remove the noise. Therefore, an efficient fusion estimation model, which can save the energy of the sensor nodes while maintaining higher accuracy, is needed. This paper proposes a novel mixed H2/H∞-based energy-efficient fusion estimation model (MHEEFE) for energy-limited Wearable Body Networks. In the proposed model, the communication cost is firstly reduced efficiently while keeping the estimation accuracy. Then, the parameters in the quantization method are discussed, and we confirm them by an optimization method with some prior knowledge. Besides, some calculation methods of important parameters are researched which make the final estimates more stable. Finally, an iteration-based weight calculation algorithm is presented, which can improve the fault tolerance of the final estimate. In the simulation, the impacts of some pivotal parameters are discussed. Meanwhile, compared with the other related models, the MHEEFE shows a better performance in accuracy, energy-efficiency and fault tolerance. PMID:29280950
Provable Transient Recovery for Frame-Based, Fault-Tolerant Computing Systems

NASA Technical Reports Server (NTRS)

DiVito, Ben L.; Butler, Ricky W.

1992-01-01

We present a formal verification of the transient fault recovery aspects of the Reliable Computing Platform (RCP), a fault-tolerant computing system architecture for digital flight control applications. The RCP uses NMR-style redundancy to mask faults and internal majority voting to purge the effects of transient faults. The system design has been formally specified and verified using the EHDM verification system. Our formalization accommodates a wide variety of voting schemes for purging the effects of transients.
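The masking and purging roles of majority voting in N-modular redundancy (NMR) can be shown with a small Python sketch. It is a generic illustration under simplifying assumptions (exact-match voting on replicated values), not the RCP design or its formally verified voting schemes; `majority_vote` and `purge_transients` are hypothetical names.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value held by a strict majority of replicas, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def purge_transients(states: List[Hashable]) -> List[Hashable]:
    """Overwrite every replica's state with the voted value when one exists."""
    voted = majority_vote(states)
    return list(states) if voted is None else [voted] * len(states)


if __name__ == "__main__":
    replica_states = [7, 7, 9]               # third replica hit by a transient
    print(majority_vote(replica_states))     # fault is masked at the output
    print(purge_transients(replica_states))  # corrupted state is replaced
```

Voting on outputs masks a faulty replica immediately, while voting on internal state is what removes the lingering effect of a transient so that the replica can rejoin the majority.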
cOSPREY: A Cloud-Based Distributed Algorithm for Large-Scale Computational Protein Design

PubMed Central

Pan, Yuchao; Dong, Yuxi; Zhou, Jingtian; Hallen, Mark; Donald, Bruce R.; Xu, Wei

2016-01-01

Finding the global minimum energy conformation (GMEC) of a huge combinatorial search space is the key challenge in computational protein design (CPD) problems. Traditional algorithms lack a scalable and efficient distributed design scheme, preventing researchers from taking full advantage of current cloud infrastructures. We design cloud OSPREY (cOSPREY), an extension to a widely used protein design software OSPREY, to allow the original design framework to scale to the commercial cloud infrastructures. We propose several novel designs to integrate both algorithm and system optimizations, such as GMEC-specific pruning, state search partitioning, asynchronous algorithm state sharing, and fault tolerance. We evaluate cOSPREY on three different cloud platforms using different technologies and show that it can solve a number of large-scale protein design problems that have not been possible with previous approaches. PMID:27154509
Optimal Management of Redundant Control Authority for Fault Tolerance

NASA Technical Reports Server (NTRS)

Wu, N. Eva; Ju, Jianhong

2000-01-01

This paper is intended to demonstrate the feasibility of a solution to a fault tolerant control problem. It explains, through a numerical example, the design and the operation of a novel scheme for fault tolerant control. The fundamental principle of the scheme was formalized in [5] based on the notion of normalized nonspecificity. The novelty lies with the use of a reliability criterion for redundancy management, and therefore leads to a high overall system reliability.
Data-based fault-tolerant control for affine nonlinear systems with actuator faults.

PubMed

Xie, Chun-Hua; Yang, Guang-Hong

2016-09-01

This paper investigates the fault-tolerant control (FTC) problem for unknown nonlinear systems with actuator faults including stuck, outage, bias and loss of effectiveness. The upper bounds of stuck faults, bias faults and loss of effectiveness faults are unknown. A new data-based FTC scheme is proposed. It consists of the online estimations of the bounds and a state-dependent function. The estimations are adjusted online to automatically compensate for the actuator faults. The state-dependent function, solved by using real system data, helps to stabilize the system. Furthermore, all signals in the resulting closed-loop system are uniformly bounded and the states converge asymptotically to zero. Compared with the existing results, the proposed approach is data-based. Finally, two simulation examples are provided to show the effectiveness of the proposed approach.

Study of a unified hardware and software fault-tolerant architecture

NASA Technical Reports Server (NTRS)

Lala, Jaynarayan; Alger, Linda; Friend, Steven; Greeley, Gregory; Sacco, Stephen; Adams, Stuart

1989-01-01

A unified architectural concept, called the Fault Tolerant Processor Attached Processor (FTP-AP), that can tolerate hardware as well as software faults is proposed for applications requiring ultrareliable computation capability. An emulation of the FTP-AP architecture, consisting of a breadboard Motorola 68010-based quadruply redundant Fault Tolerant Processor, four VAX 750s as attached processors, and four versions of a transport aircraft yaw damper control law, is used as a testbed in the AIRLAB to examine a number of critical issues. Solutions of several basic problems associated with N-Version software are proposed and implemented on the testbed. This includes a confidence voter to resolve coincident errors in N-Version software. A reliability model of N-Version software that is based upon the recent understanding of software failure mechanisms is also developed. The basic FTP-AP architectural concept appears suitable for hosting N-Version application software while at the same time tolerating hardware failures. Architectural enhancements for greater efficiency, software reliability modeling, and N-Version issues that merit further research are identified.
A fault-tolerant control architecture for unmanned aerial vehicles

NASA Astrophysics Data System (ADS)

Drozeski, Graham R.

Research has presented several approaches to achieve varying degrees of fault-tolerance in unmanned aircraft. Approaches in reconfigurable flight control are generally divided into two categories: those which incorporate multiple non-adaptive controllers and switch between them based on the output of a fault detection and identification element, and those that employ a single adaptive controller capable of compensating for a variety of fault modes. Regardless of the approach for reconfigurable flight control, certain fault modes dictate system restructuring in order to prevent a catastrophic failure. System restructuring enables active control of actuation not employed by the nominal system to recover controllability of the aircraft. After system restructuring, continued operation requires the generation of flight paths that adhere to an altered flight envelope. The control architecture developed in this research employs a multi-tiered hierarchy to allow unmanned aircraft to generate and track safe flight paths despite the occurrence of potentially catastrophic faults. The hierarchical architecture increases the level of autonomy of the system by integrating five functionalities with the baseline system: fault detection and identification, active system restructuring, reconfigurable flight control, reconfigurable path planning, and mission adaptation. Fault detection and identification algorithms continually monitor aircraft performance and issue fault declarations. When the severity of a fault exceeds the capability of the baseline flight controller, active system restructuring expands the controllability of the aircraft using unconventional control strategies not exploited by the baseline controller. Each of the reconfigurable flight controllers and the baseline controller employs a proven adaptive neural network control strategy. A reconfigurable path planner employs an adaptive model of the vehicle to re-shape the desired flight path. Generation of the revised flight path is posed as a linear program constrained by the response of the degraded system. Finally, a mission adaptation component estimates limitations on the closed-loop performance of the aircraft and adjusts the aircraft mission accordingly. A combination of simulation and flight test results using two unmanned helicopters validates the utility of the hierarchical architecture.
Soft-Fault Detection Technologies Developed for Electrical Power Systems

NASA Technical Reports Server (NTRS)

Button, Robert M.

2004-01-01

The NASA Glenn Research Center, partner universities, and defense contractors are working to develop intelligent power management and distribution (PMAD) technologies for future spacecraft and launch vehicles. The goals are to provide higher performance (efficiency, transient response, and stability), higher fault tolerance, and higher reliability through the application of digital control and communication technologies. It is also expected that these technologies will eventually reduce the design, development, manufacturing, and integration costs for large, electrical power systems for space vehicles. The main focus of this research has been to incorporate digital control, communications, and intelligent algorithms into power electronic devices such as direct-current to direct-current (dc-dc) converters and protective switchgear. These technologies, in turn, will enable revolutionary changes in the way electrical power systems are designed, developed, configured, and integrated in aerospace vehicles and satellites. Initial successes in integrating modern, digital controllers have proven that transient response performance can be improved using advanced nonlinear control algorithms. One technology being developed includes the detection of "soft faults," those not typically covered by current systems in use today. Soft faults include arcing faults, corona discharge faults, and undetected leakage currents. Using digital control and advanced signal analysis algorithms, we have shown that it is possible to reliably detect arcing faults in high-voltage dc power distribution systems (see the preceding photograph). Another research effort has shown that low-level leakage faults and cable degradation can be detected by analyzing power system parameters over time. This additional fault detection capability will result in higher reliability for long-lived power systems such as reusable launch vehicles and space exploration missions.
Hypothetical Scenario Generator for Fault-Tolerant Diagnosis

NASA Technical Reports Server (NTRS)

James, Mark

2007-01-01

The Hypothetical Scenario Generator for Fault-tolerant Diagnostics (HSG) is an algorithm being developed in conjunction with other components of artificial-intelligence systems for automated diagnosis and prognosis of faults in spacecraft, aircraft, and other complex engineering systems. By incorporating prognostic capabilities along with advanced diagnostic capabilities, these developments hold promise to increase the safety and affordability of the affected engineering systems by making it possible to obtain timely and accurate information on the statuses of the systems and predicting impending failures well in advance. The HSG is a specific instance of a hypothetical-scenario generator that implements an innovative approach for performing diagnostic reasoning when data are missing. The special purpose served by the HSG is to (1) look for all possible ways in which the present state of the engineering system can be mapped with respect to a given model and (2) generate a prioritized set of future possible states and the scenarios of which they are parts.
Use of non-adiabatic geometric phase for quantum computing by NMR.

PubMed

Das, Ranabir; Kumar, S K Karthick; Kumar, Anil

2005-12-01

Geometric phases have attracted researchers' interest for their potential applications in many areas of science. One of them is fault-tolerant quantum computation. A preliminary requisite of quantum computation is the implementation of controlled dynamics of qubits. In controlled dynamics, one qubit undergoes coherent evolution and acquires an appropriate phase, depending on the state of other qubits. If the evolution is geometric, then the phase acquired depends only on the geometry of the path executed, and is robust against certain types of error. This phenomenon leads to an inherently fault-tolerant quantum computation. Here we suggest a technique of using non-adiabatic geometric phase for quantum computation, using selective excitation. In a two-qubit system, we selectively evolve a suitable subsystem where the control qubit is in state |1⟩, through a closed circuit. By this evolution, the target qubit gains a phase controlled by the state of the control qubit. Using the non-adiabatic geometric phase we demonstrate implementation of the Deutsch-Jozsa algorithm and Grover's search algorithm in a two-qubit system.
A Novel Online Data-Driven Algorithm for Detecting UAV Navigation Sensor Faults.

PubMed

Sun, Rui; Cheng, Qi; Wang, Guanyu; Ochieng, Washington Yotto

2017-09-29

The use of Unmanned Aerial Vehicles (UAVs) has increased significantly in recent years. On-board integrated navigation sensors are a key component of UAVs' flight control systems and are essential for flight safety. In order to ensure flight safety, timely and effective navigation sensor fault detection capability is required. In this paper, a novel data-driven Adaptive Neuron Fuzzy Inference System (ANFIS)-based approach is presented for the detection of on-board navigation sensor faults in UAVs. Contrary to the classic UAV sensor fault detection algorithms, based on predefined or modelled faults, the proposed algorithm combines an online data training mechanism with the ANFIS-based decision system. The main advantages of this algorithm are that it allows real-time model-free residual analysis from Kalman Filter (KF) estimates and the ANFIS to build a reliable fault detection system. In addition, it allows fast and accurate detection of faults, which makes it suitable for real-time applications. Experimental results have demonstrated the effectiveness of the proposed fault detection method in terms of accuracy and misdetection rate.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Katti, Amogh; Di Fatta, Giuseppe; Naughton, Thomas

Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a failure detection and consensus algorithm. This paper presents three novel failure detection and consensus algorithms using Gossiping. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in all algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus. The third approach is a three-phase distributed failure detection and consensus algorithm and provides consistency guarantees even in very large and extreme-scale systems while at the same time being memory and bandwidth efficient.
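As a rough illustration of gossip-based agreement on a failed-process list, the following minimal sketch simulates a push-style gossip round by round. It is not one of the three algorithms from the paper; the function name, the assumption that each failure is first noticed by a single live process, and the uniform random choice of peers are all hypothetical simplifications.

```python
import random


def gossip_cycles_to_consensus(n, failed, seed=0):
    """Count gossip cycles until every live process holds the full failed list.

    Assumptions (not from the paper): push-only gossip, one random peer per
    cycle, and each failure is initially detected by exactly one live process.
    """
    failed = set(failed)
    rng = random.Random(seed)
    alive = [p for p in range(n) if p not in failed]
    known = {p: set() for p in alive}
    for f in sorted(failed):
        known[rng.choice(alive)].add(f)  # initial local detection

    cycles = 0
    while any(known[p] != failed for p in alive):
        cycles += 1
        # Every live process pushes a snapshot of its list to one random live peer.
        messages = [(rng.choice(alive), set(known[p])) for p in alive]
        for target, suspects in messages:
            known[target] |= suspects  # receiver merges the lists
    return cycles


if __name__ == "__main__":
    for size in (16, 64, 256):
        print(size, gossip_cycles_to_consensus(size, {1, 5}))
```

Running the script for growing system sizes gives a feel for how slowly the cycle count grows, which is the kind of scaling behaviour the abstract reports.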
Formal Techniques for Synchronized Fault-Tolerant Systems

NASA Technical Reports Server (NTRS)

DiVito, Ben L.; Butler, Ricky W.

1992-01-01

We present the formal verification of synchronizing aspects of the Reliable Computing Platform (RCP), a fault-tolerant computing system for digital flight control applications. The RCP uses NMR-style redundancy to mask faults and internal majority voting to purge the effects of transient faults. The system design has been formally specified and verified using the EHDM verification system. Our formalization is based on an extended state machine model incorporating snapshots of local processor clocks.
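The fault-masking idea behind N-modular redundancy can be sketched with a simple majority voter. This is only an illustration of the general technique; the abstract does not give the RCP's voting details, and the function name and the strict-majority rule are assumptions.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of redundant replicas.

    Raises ValueError when no value has a strict majority, so a caller can
    treat that case as an unmasked fault.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no strict majority among replica outputs")
    return value


if __name__ == "__main__":
    # One of four replicas is faulty; its output is outvoted.
    print(majority_vote([7, 7, 9, 7]))
```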
Aircraft Engine On-Line Diagnostics Through Dual-Channel Sensor Measurements: Development of an Enhanced System

NASA Technical Reports Server (NTRS)

Kobayashi, Takahisa; Simon, Donald L.

2008-01-01

In this paper, an enhanced on-line diagnostic system which utilizes dual-channel sensor measurements is developed for the aircraft engine application. The enhanced system is composed of a nonlinear on-board engine model (NOBEM), the hybrid Kalman filter (HKF) algorithm, and fault detection and isolation (FDI) logic. The NOBEM provides the analytical third channel against which the dual-channel measurements are compared. The NOBEM is further utilized as part of the HKF algorithm which estimates measured engine parameters. Engine parameters obtained from the dual-channel measurements, the NOBEM, and the HKF are compared against each other. When the discrepancy among the signals exceeds a tolerance level, the FDI logic determines the cause of discrepancy. Through this approach, the enhanced system achieves the following objectives: 1) anomaly detection, 2) component fault detection, and 3) sensor fault detection and isolation. The performance of the enhanced system is evaluated in a simulation environment using faults in sensors and components, and it is compared to an existing baseline system.
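The comparison of two sensor channels against a model-based third channel can be sketched as follows. This is a heavily simplified illustration, not the paper's FDI logic: it ignores the hybrid Kalman filter, uses a single scalar tolerance, and the function name and return labels are hypothetical.

```python
def classify(channel_a, channel_b, model, tol):
    """Compare dual-channel readings with a model estimate (analytical third channel).

    Returns a coarse label; thresholds and labels are illustrative assumptions.
    """
    a_vs_b = abs(channel_a - channel_b) <= tol
    a_vs_model = abs(channel_a - model) <= tol
    b_vs_model = abs(channel_b - model) <= tol

    if a_vs_b:
        # Sensors agree with each other; a mismatch with the model points
        # away from the sensors and toward the engine or the model.
        return "healthy" if (a_vs_model and b_vs_model) else "component fault or anomaly"
    if a_vs_model and not b_vs_model:
        return "sensor fault: channel B"
    if b_vs_model and not a_vs_model:
        return "sensor fault: channel A"
    return "undetermined"


if __name__ == "__main__":
    print(classify(100.0, 120.0, 100.3, tol=1.0))
```

The model estimate acts as a tie-breaker: two disagreeing channels alone can show that something is wrong, but not which channel is at fault.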
Rigorously modeling self-stabilizing fault-tolerant circuits: An ultra-robust clocking scheme for systems-on-chip.

PubMed

Dolev, Danny; Függer, Matthias; Posch, Markus; Schmid, Ulrich; Steininger, Andreas; Lenzen, Christoph

2014-06-01

We present the first implementation of a distributed clock generation scheme for Systems-on-Chip that recovers from an unbounded number of arbitrary transient faults despite a large number of arbitrary permanent faults. We devise self-stabilizing hardware building blocks and a hybrid synchronous/asynchronous state machine enabling metastability-free transitions of the algorithm's states. We provide a comprehensive modeling approach that permits to prove, given correctness of the constructed low-level building blocks, the high-level properties of the synchronization algorithm (which have been established in a more abstract model). We believe this approach to be of interest in its own right, since this is the first technique permitting to mathematically verify, at manageable complexity, high-level properties of a fault-prone system in terms of its very basic components. We evaluate a prototype implementation, which has been designed in VHDL, using the Petrify tool in conjunction with some extensions, and synthesized for an Altera Cyclone FPGA.
Rigorously modeling self-stabilizing fault-tolerant circuits: An ultra-robust clocking scheme for systems-on-chip

PubMed Central

Dolev, Danny; Függer, Matthias; Posch, Markus; Schmid, Ulrich; Steininger, Andreas; Lenzen, Christoph

2014-01-01

We present the first implementation of a distributed clock generation scheme for Systems-on-Chip that recovers from an unbounded number of arbitrary transient faults despite a large number of arbitrary permanent faults. We devise self-stabilizing hardware building blocks and a hybrid synchronous/asynchronous state machine enabling metastability-free transitions of the algorithm's states. We provide a comprehensive modeling approach that permits to prove, given correctness of the constructed low-level building blocks, the high-level properties of the synchronization algorithm (which have been established in a more abstract model). We believe this approach to be of interest in its own right, since this is the first technique permitting to mathematically verify, at manageable complexity, high-level properties of a fault-prone system in terms of its very basic components. We evaluate a prototype implementation, which has been designed in VHDL, using the Petrify tool in conjunction with some extensions, and synthesized for an Altera Cyclone FPGA. PMID:26516290
Evolvable Hardware for Space Applications

NASA Technical Reports Server (NTRS)

Lohn, Jason; Globus, Al; Hornby, Gregory; Larchev, Gregory; Kraus, William

2004-01-01

This article surveys the research of the Evolvable Systems Group at NASA Ames Research Center. Over the past few years, our group has developed the ability to use evolutionary algorithms in a variety of NASA applications, including spacecraft antenna design, fault tolerance for programmable logic chips, atomic force field parameter fitting, analog circuit design, and earth observing satellite scheduling. In some of these applications, evolutionary algorithms match or improve on human performance.
Refinement for fault-tolerance: An aircraft hand-off protocol

NASA Technical Reports Server (NTRS)

Marzullo, Keith; Schneider, Fred B.; Dehn, Jon

1994-01-01

Part of the Advanced Automation System (AAS) for air-traffic control is a protocol to permit flight hand-off from one air-traffic controller to another. The protocol must be fault-tolerant and, therefore, is subtle -- an ideal candidate for the application of formal methods. This paper describes a formal method for deriving fault-tolerant protocols that is based on refinement and proof outlines. The AAS hand-off protocol was actually derived using this method; that derivation is given.
Testing For EM Upsets In Aircraft Control Computers

NASA Technical Reports Server (NTRS)

Belcastro, Celeste M.

1994-01-01

Effects of transient electrical signals evaluated in laboratory tests. Method of evaluating nominally fault-tolerant, aircraft-type digital-computer-based control system devised. Provides for evaluation of susceptibility of system to upset and evaluation of integrity of control when system subjected to transient electrical signals like those induced by electromagnetic (EM) source, in this case lightning. Beyond aerospace applications, fault-tolerant control systems becoming more wide-spread in industry; such as in automobiles. Method supports practical, systematic tests for evaluation of designs of fault-tolerant control systems.
Certification of computational results

NASA Technical Reports Server (NTRS)

Sullivan, Gregory F.; Wilson, Dwight S.; Masson, Gerald M.

1993-01-01

A conceptually novel and powerful technique to achieve fault detection and fault tolerance in hardware and software systems is described. When used for software fault detection, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are compared and if they agree the results are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance is formalized and realizations of it are illustrated by considering algorithms for the following problems: convex hull, sorting, and shortest path. Cases in which the second phase can be run concurrently with the first and act as a monitor are discussed. The certification trail approach is compared to other approaches to fault tolerance.
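The three phases can be sketched for the sorting problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the trail is taken to be the sorting permutation, and the function names are hypothetical. The second phase runs in linear time and either returns a correct sorted list or raises an error, even if the trail is corrupted.

```python
def first_phase(data):
    """Sort normally and leave a certification trail (here: the sorting permutation)."""
    trail = sorted(range(len(data)), key=data.__getitem__)
    return [data[i] for i in trail], trail


def second_phase(data, trail):
    """Re-solve using the trail in O(n); reject any trail that is not trustworthy."""
    n = len(data)
    if len(trail) != n:
        raise ValueError("trail has wrong length")
    seen = [False] * n
    for i in trail:
        if not isinstance(i, int) or not 0 <= i < n or seen[i]:
            raise ValueError("trail is not a permutation of the input positions")
        seen[i] = True
    result = [data[i] for i in trail]
    if any(result[k] > result[k + 1] for k in range(n - 1)):
        raise ValueError("trail does not yield a sorted sequence")
    return result


def certified_sort(data):
    """Final phase: accept the result only if both phases agree."""
    result_1, trail = first_phase(data)
    result_2 = second_phase(data, trail)
    if result_1 != result_2:
        raise ValueError("phase results disagree")
    return result_1


if __name__ == "__main__":
    print(certified_sort([3, 1, 2]))
```

The permutation check matters: without it, a faulty trail that repeats an index could produce a sorted list that is not a rearrangement of the input.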
Quantum Error Correction

NASA Astrophysics Data System (ADS)

Lidar, Daniel A.; Brun, Todd A.

2013-09-01

Prologue; Preface; Part I. Background: 1. Introduction to decoherence and noise in open quantum systems Daniel Lidar and Todd Brun; 2. Introduction to quantum error correction Dave Bacon; 3. Introduction to decoherence-free subspaces and noiseless subsystems Daniel Lidar; 4. Introduction to quantum dynamical decoupling Lorenza Viola; 5. Introduction to quantum fault tolerance Panos Aliferis; Part II. Generalized Approaches to Quantum Error Correction: 6. Operator quantum error correction David Kribs and David Poulin; 7. Entanglement-assisted quantum error-correcting codes Todd Brun and Min-Hsiu Hsieh; 8. Continuous-time quantum error correction Ognyan Oreshkov; Part III. Advanced Quantum Codes: 9. Quantum convolutional codes Mark Wilde; 10. Non-additive quantum codes Markus Grassl and Martin Rötteler; 11. Iterative quantum coding systems David Poulin; 12. Algebraic quantum coding theory Andreas Klappenecker; 13. Optimization-based quantum error correction Andrew Fletcher; Part IV. Advanced Dynamical Decoupling: 14. High order dynamical decoupling Zhen-Yu Wang and Ren-Bao Liu; 15. Combinatorial approaches to dynamical decoupling Martin Rötteler and Pawel Wocjan; Part V. Alternative Quantum Computation Approaches: 16. Holonomic quantum computation Paolo Zanardi; 17. Fault tolerance for holonomic quantum computation Ognyan Oreshkov, Todd Brun and Daniel Lidar; 18. Fault tolerant measurement-based quantum computing Debbie Leung; Part VI. Topological Methods: 19. Topological codes Héctor Bombín; 20. Fault tolerant topological cluster state quantum computing Austin Fowler and Kovid Goyal; Part VII. Applications and Implementations: 21. Experimental quantum error correction Dave Bacon; 22. Experimental dynamical decoupling Lorenza Viola; 23. Architectures Jacob Taylor; 24. Error correction in quantum communication Mark Wilde; Part VIII. Critical Evaluation of Fault Tolerance: 25. Hamiltonian methods in QEC and fault tolerance Eduardo Novais, Eduardo Mucciolo and Harold Baranger; 26. Critique of fault-tolerant quantum information processing Robert Alicki; References; Index.
Adaptive algorithm of magnetic heading detection

NASA Astrophysics Data System (ADS)

Liu, Gong-Xu; Shi, Ling-Feng

2017-11-01

Magnetic data obtained from a magnetic sensor usually fluctuate in a certain range, which makes it difficult to estimate the magnetic heading accurately. In fact, magnetic heading information is usually submerged in noise because of all kinds of electromagnetic interference and the diversity of the pedestrian’s motion states. In order to solve this problem, a new adaptive algorithm based on the (typically) right-angled corridors of a building or residential buildings is put forward to process heading information. First, a 3D indoor localization platform is set up based on MPU9250. Then, several groups of data are measured by changing the experimental environment and pedestrian’s motion pace. The raw data from the attached inertial measurement unit are calibrated and arranged into a time-stamped array and written to a data file. Later, the data file is imported into MATLAB for processing and analysis using the proposed adaptive algorithm. Finally, the algorithm is verified by comparison with the existing algorithm. The experimental results show that the algorithm has strong robustness and good fault tolerance, which can detect the heading information accurately and in real-time.
Achieving Agreement in Three Rounds With Bounded-Byzantine Faults

NASA Technical Reports Server (NTRS)

Malekpour, Mahyar R.

2015-01-01

A three-round algorithm is presented that guarantees agreement in a system of K ≥ 3F + 1 nodes provided each faulty node induces no more than F faults and each good node experiences no more than F faults, where F is the maximum number of simultaneous faults in the network. The algorithm is based on the Oral Message algorithm of Lamport et al. and is scalable with respect to the number of nodes in the system and applies equally to the traditional node-fault model as well as the link-fault model. We also present a mechanical verification of the algorithm focusing on verifying the correctness of a bounded model of the algorithm as well as confirming claims of determinism.
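The node-count condition stated in the abstract can be turned into a small worked calculation. The sketch below covers only that bound; it does not implement the three-round algorithm, and the helper names are hypothetical.

```python
def min_nodes(faults):
    """Smallest system size K that satisfies K >= 3F + 1 for F simultaneous faults."""
    if faults < 0:
        raise ValueError("fault count must be non-negative")
    return 3 * faults + 1


def max_tolerated_faults(nodes):
    """Largest F for which a system of K nodes still satisfies K >= 3F + 1."""
    if nodes < 1:
        raise ValueError("system must contain at least one node")
    return (nodes - 1) // 3


if __name__ == "__main__":
    for k in (4, 6, 7, 10):
        print(k, "nodes tolerate up to", max_tolerated_faults(k), "fault(s)")
```

For example, growing a system from 4 to 6 nodes does not raise the tolerated fault count; a seventh node is needed before F = 2 satisfies the bound.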
A data structure and algorithm for fault diagnosis

NASA Technical Reports Server (NTRS)

Bosworth, Edward L., Jr.

1987-01-01

Results of preliminary research on the design of a knowledge based fault diagnosis system for use with on-orbit spacecraft such as the Hubble Space Telescope are presented. A candidate data structure and associated search algorithm from which the knowledge based system can evolve is discussed. This algorithmic approach will then be examined in view of its inability to diagnose certain common faults. From that critique, a design for the corresponding knowledge based system will be given.
Fault-Tolerant Control of ANPC Three-Level Inverter Based on Order-Reduction Optimal Control Strategy under Multi-Device Open-Circuit Fault.

PubMed

Xu, Shi-Zhou; Wang, Chun-Jie; Lin, Fang-Li; Li, Shi-Xiang

2017-10-31

The multi-device open-circuit fault is a common fault of the ANPC (Active Neutral-Point Clamped) three-level inverter and affects the operation stability of the whole system. To improve the operation stability, this paper first summarizes the main existing solutions and analyzes all the possible states of multi-device open-circuit faults. Second, an order-reduction optimal control strategy is proposed under multi-device open-circuit fault to realize fault-tolerant control based on the topology and control requirements of the ANPC three-level inverter and operation stability. This control strategy can handle faults in different operation states, and can work in an order-reduction state under specific open-circuit faults with specific combined devices, which sacrifices control quality to give priority to stability. Finally, simulation and experiment prove the effectiveness of the proposed strategy.

Self-stabilizing byzantine-fault-tolerant clock synchronization system and method

NASA Technical Reports Server (NTRS)

Malekpour, Mahyar R. (Inventor)

2012-01-01

Systems and methods for rapid Byzantine-fault-tolerant self-stabilizing clock synchronization are provided. The systems and methods are based on a protocol comprising a state machine and a set of monitors that execute once every local oscillator tick. The protocol is independent of application-specific requirements. The faults are assumed to be arbitrary and/or malicious. All timing measures of variables are based on the node's local clock and thus no central clock or externally generated pulse is used. Instances of the protocol are shown to tolerate bursts of transient failures and deterministically converge with a linear convergence time with respect to the synchronization period as predicted.
Adding Fault Tolerance to NPB Benchmarks Using ULFM

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parchman, Zachary W; Vallee, Geoffroy R; Naughton III, Thomas J

2016-01-01

In the world of high-performance computing, fault tolerance and application resilience are becoming some of the primary concerns because of increasing hardware failures and memory corruptions. While the research community has been investigating various options, from system-level solutions to application-level solutions, standards such as the Message Passing Interface (MPI) are also starting to include such capabilities. The current proposal for MPI fault tolerance is centered around the User-Level Failure Mitigation (ULFM) concept, which provides means for fault detection and recovery of the MPI layer. This approach does not address application-level recovery, which is currently left to application developers. In this work, we present a modification of some of the benchmarks of the NAS parallel benchmark (NPB) to include support of the ULFM capabilities as well as application-level strategies and mechanisms for application-level failure recovery. As such, we present: (i) an application-level library to checkpoint and restore data, (ii) extensions of NPB benchmarks for fault tolerance based on different strategies, (iii) a fault injection tool, and (iv) some preliminary results that show the impact of such fault tolerant strategies on the application execution.
Neural-Network-Based Adaptive Decentralized Fault-Tolerant Control for a Class of Interconnected Nonlinear Systems.

PubMed

Li, Xiao-Jian; Yang, Guang-Hong

2018-01-01

This paper is concerned with the adaptive decentralized fault-tolerant tracking control problem for a class of uncertain interconnected nonlinear systems with unknown strong interconnections. An algebraic graph theory result is introduced to address the considered interconnections. In addition, to achieve the desirable tracking performance, a neural-network-based robust adaptive decentralized fault-tolerant control (FTC) scheme is given to compensate the actuator faults and system uncertainties. Furthermore, via the Lyapunov analysis method, it is proven that all the signals of the resulting closed-loop system are semiglobally bounded, and the tracking errors of each subsystem exponentially converge to a compact set, whose radius is adjustable by choosing different controller design parameters. Finally, the effectiveness and advantages of the proposed FTC approach are illustrated with two simulated examples.
Achieving Agreement in Three Rounds with Bounded-Byzantine Faults

NASA Technical Reports Server (NTRS)

Malekpour, Mahyar R.

2017-01-01

A three-round algorithm is presented that guarantees agreement in a system of K ≥ 3F + 1 nodes provided each faulty node induces no more than F faults and each good node experiences no more than F faults, where F is the maximum number of simultaneous faults in the network. The algorithm is based on the Oral Message algorithm of Lamport, Shostak, and Pease and is scalable with respect to the number of nodes in the system and applies equally to the traditional node-fault model as well as the link-fault model. We also present a mechanical verification of the algorithm focusing on verifying the correctness of a bounded model of the algorithm as well as confirming claims of determinism.
Research on Mechanical Fault Prediction Algorithm for Circuit Breaker Based on Sliding Time Window and ANN

NASA Astrophysics Data System (ADS)

Wang, Xiaohua; Rong, Mingzhe; Qiu, Juan; Liu, Dingxin; Su, Biao; Wu, Yi

A new type of algorithm for predicting the mechanical faults of a vacuum circuit breaker (VCB) based on an artificial neural network (ANN) is proposed in this paper. There are two types of mechanical faults in a VCB: operation mechanism faults and tripping circuit faults. An angle displacement sensor is used to measure the main axle angle displacement which reflects the displacement of the moving contact, to obtain the state of the operation mechanism in the VCB, while a Hall current sensor is used to measure the trip coil current, which reflects the operation state of the tripping circuit. Then an ANN prediction algorithm based on a sliding time window is proposed in this paper and successfully used to predict mechanical faults in a VCB. The research results in this paper provide a theoretical basis for the realization of online monitoring and fault diagnosis of a VCB.
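The sliding-time-window idea can be sketched as the data-preparation step that turns a measured series into training pairs for a predictor. The abstract does not specify the window length, the network, or the features, so the function name, the one-step-ahead target, and the sample readings below are assumptions for illustration only.

```python
def make_windows(series, width):
    """Split a time series into (input window, next value) training pairs.

    Each input is `width` consecutive samples; the target is the sample that
    immediately follows the window (one-step-ahead prediction, an assumption).
    """
    if width < 1 or width >= len(series):
        raise ValueError("width must be between 1 and len(series) - 1")
    count = len(series) - width
    inputs = [list(series[i:i + width]) for i in range(count)]
    targets = [series[i + width] for i in range(count)]
    return inputs, targets


if __name__ == "__main__":
    # Hypothetical trip-coil current peaks recorded over successive operations.
    readings = [1.00, 1.02, 1.01, 1.05, 1.09, 1.15]
    for window, target in zip(*make_windows(readings, width=3)):
        print(window, "->", target)
```

The resulting pairs could then be fed to any regression model, with the window sliding forward as new operations are recorded.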
Software dependability in the Tandem GUARDIAN system

NASA Technical Reports Server (NTRS)

Lee, Inhwan; Iyer, Ravishankar K.

1995-01-01

Based on extensive field failure data for Tandem's GUARDIAN operating system this paper discusses evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup processes to take over. The paper categorizes the underlying causes of software failures and evaluates the effectiveness of the process pair technique in tolerating software faults. A model to describe the impact of software faults on the reliability of an overall system is proposed. The model is used to evaluate the significance of key factors that determine software dependability and to identify areas for improvement. An analysis of the data shows that about 77% of processor failures that are initially considered due to software are confirmed as software problems. The analysis shows that the use of process pairs to provide checkpointing and restart (originally intended for tolerating hardware faults) allows the system to tolerate about 75% of reported software faults that result in processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. Over two-thirds (72%) of measured software failures are recurrences of previously reported faults. Modeling, based on the data, shows that, in addition to reducing the number of software faults, software dependability can be enhanced by reducing the recurrence rate.
Software reliability through fault-avoidance and fault-tolerance

NASA Technical Reports Server (NTRS)

Vouk, Mladen A.; Mcallister, David F.

1993-01-01

Strategies and tools for the testing, risk assessment and risk control of dependable software-based systems were developed. Part of this project consists of studies to enable the transfer of technology to industry, for example the risk management techniques for safety-conscious systems. Theoretical investigations of Boolean and Relational Operator (BRO) testing strategy were conducted for condition-based testing. The Basic Graph Generation and Analysis tool (BGG) was extended to fully incorporate several variants of the BRO metric. Single- and multi-phase risk, coverage and time-based models are being developed to provide additional theoretical and empirical basis for estimation of the reliability and availability of large, highly dependable software. A model for software process and risk management was developed. The use of cause-effect graphing for software specification and validation was investigated. Lastly, advanced software fault-tolerance models were studied to provide alternatives and improvements in situations where simple software fault-tolerance strategies break down.
Diagnostic emulation: Implementation and user's guide

NASA Technical Reports Server (NTRS)

Becher, Bernice

1987-01-01

The Diagnostic Emulation Technique was developed within the System Validation Methods Branch as a part of the development of methods for the analysis of the reliability of highly reliable, fault tolerant digital avionics systems. This is a general technique which allows for the emulation of a digital hardware system. The technique is general in the sense that it is completely independent of the particular target hardware which is being emulated. Parts of the system are described and emulated at the logic or gate level, while other parts of the system are described and emulated at the functional level. This algorithm allows for the insertion of faults into the system, and for the observation of the response of the system to these faults. This allows for controlled and accelerated testing of system reaction to hardware failures in the target machine. This document describes in detail how the algorithm was implemented at NASA Langley Research Center and gives instructions for using the system.
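The idea of inserting faults into a gate-level description and observing the response can be illustrated with a minimal sketch. It is not the NASA Langley implementation: the full-adder netlist, the net names (`x1`, `a1`, `a2`) and the `stuck` argument are all assumptions introduced here for illustration, and only single stuck-at faults are modeled.

```python
from itertools import product


def full_adder(a, b, cin, stuck=None):
    """Gate-level full adder. `stuck` maps a (hypothetical) net name to a forced 0/1 value."""
    stuck = stuck or {}

    def net(name, value):
        # An injected stuck-at fault overrides the value the gate would drive.
        return stuck.get(name, value)

    x1 = net("x1", a ^ b)        # first XOR gate
    s = net("sum", x1 ^ cin)     # second XOR gate
    a1 = net("a1", a & b)        # first AND gate
    a2 = net("a2", x1 & cin)     # second AND gate
    cout = net("cout", a1 | a2)  # OR gate
    return s, cout


def detecting_vectors(fault):
    """Input vectors whose outputs differ between the fault-free and the faulty circuit."""
    return [v for v in product((0, 1), repeat=3)
            if full_adder(*v) != full_adder(*v, stuck=fault)]


if __name__ == "__main__":
    # Net a1 stuck at 0 is only visible when both a and b are 1.
    print(detecting_vectors({"a1": 0}))
```

Enumerating all input vectors is only feasible for a toy circuit like this one; it serves to show how a fault-free and a faulty run are compared.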
Integral Sliding Mode Fault-Tolerant Control for Uncertain Linear Systems Over Networks With Signals Quantization.

PubMed

Hao, Li-Ying; Park, Ju H; Ye, Dan

2017-09-01

In this paper, a new robust fault-tolerant compensation control method for uncertain linear systems over networks is proposed, where only quantized signals are assumed to be available. This approach is based on the integral sliding mode (ISM) method where two kinds of integral sliding surfaces are constructed. One is the continuous-state-dependent surface with the aim of sliding mode stability analysis and the other is the quantization-state-dependent surface, which is used for ISM controller design. A scheme that combines the adaptive ISM controller and quantization parameter adjustment strategy is then proposed. Through utilizing H∞ control analytical technique, once the system is in the sliding mode, the nature of performing disturbance attenuation and fault tolerance from the initial time can be found without requiring any fault information. Finally, the effectiveness of our proposed ISM control fault-tolerant schemes against quantization errors is demonstrated in the simulation.
Fault tolerance in computational grids: perspectives, challenges, and issues.

PubMed

Haider, Sajjad; Nazir, Babar

2016-01-01

Computational grids are established with the intention of providing shared access to hardware and software based resources with special reference to increased computational capabilities. Fault tolerance is one of the most important issues faced by the computational grids. The main contribution of this survey is the creation of an extended classification of problems that occur in computational grid environments. The proposed classification will help researchers, developers, and maintainers of grids to understand the types of issues to be anticipated. Moreover, different types of problems, such as omission, interaction, and timing-related problems, have been identified that need to be handled on various layers of the computational grid. In this survey, an analysis and examination is also performed pertaining to the fault tolerance and fault detection mechanisms. Our conclusion is that a dependable and reliable grid can only be established when more emphasis is on fault identification. Moreover, our survey reveals that adaptive and intelligent fault identification and tolerance techniques can improve the dependability of grid working environments.
Fault-tolerant software - Experiment with the SIFT operating system. [Software Implemented Fault Tolerance computer]

NASA Technical Reports Server (NTRS)

Brunelle, J. E.; Eckhardt, D. E., Jr.

1985-01-01

Results are presented of an experiment conducted in the NASA Avionics Integrated Research Laboratory (AIRLAB) to investigate the implementation of fault-tolerant software techniques on fault-tolerant computer architectures, in particular the Software Implemented Fault Tolerance (SIFT) computer. The N-version programming and recovery block techniques were implemented on a portion of the SIFT operating system. The results indicate that, to effectively implement fault-tolerant software design techniques, system requirements will be impacted and suggest that retrofitting fault-tolerant software on existing designs will be inefficient and may require system modification.
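For readers unfamiliar with the two techniques named in this abstract, the following is a minimal sketch of N-version programming (majority voting over independently written versions) and of a recovery block (alternates tried in order against an acceptance test). It is not the SIFT experiment's code: the integer square root versions, the deliberately faulty `isqrt_b`, and the acceptance test are hypothetical. A full recovery block also restores the program state from a checkpoint before trying the next alternate; that step is omitted because the functions here have no side effects.

```python
from collections import Counter


def n_version_vote(versions, x):
    """Run every version and return the strict-majority result, or None if there is none."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            continue  # a crashed version simply loses its vote
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(versions) // 2 else None


def recovery_block(alternates, acceptance_test, x):
    """Try alternates in order and return the first result that passes the acceptance test."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


# Hypothetical versions of an integer square root; isqrt_b is deliberately wrong.
def isqrt_a(n):
    return int(n ** 0.5)


def isqrt_b(n):
    return n // 2


def isqrt_c(n):
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


def accept(n, r):
    """Acceptance test: r is the floor of the square root of n."""
    return r * r <= n < (r + 1) * (r + 1)


if __name__ == "__main__":
    print(n_version_vote([isqrt_a, isqrt_b, isqrt_c], 16))
    print(recovery_block([isqrt_b, isqrt_a], accept, 16))
```

The voter masks the faulty version as long as a strict majority agrees, while the recovery block relies entirely on the quality of its acceptance test.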
Fenix, A Fault Tolerant Programming Framework for MPI Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gamel, Marc; Teranihi, Keita; Valenzuela, Eric

2016-10-05

Fenix provides APIs to allow the users to add fault tolerance capability to MPI-based parallel programs in a transparent manner. Fenix-enabled programs can run through process failures during program execution using a pool of spare processes accommodated by Fenix.
Sliding mode based fault detection, reconstruction and fault tolerant control scheme for motor systems.

PubMed

Mekki, Hemza; Benzineb, Omar; Boukhetala, Djamel; Tadjine, Mohamed; Benbouzid, Mohamed

2015-07-01

The fault-tolerant control problem belongs to the domain of complex control systems in which inter-control-disciplinary information and expertise are required. This paper proposes an improved fault detection, reconstruction and fault-tolerant control (FTC) scheme for motor systems (MS) with typical faults. For this purpose, a sliding mode controller (SMC) with an integral sliding surface is adopted. This controller can make the system output track the desired position reference signal in finite time and obtain a better dynamic response and anti-disturbance performance. However, this controller cannot deal directly with total system failures. The adopted SMC is therefore combined with a sliding mode observer (SMO); the latter is designed to detect and reconstruct the faults on-line and to give a sensorless control strategy that can achieve tolerance to a wide class of total additive failures. The closed-loop stability is proved using Lyapunov stability theory. Simulation results in healthy and faulty conditions confirm the reliability of the suggested framework. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Aircraft applications of fault detection and isolation techniques

NASA Astrophysics Data System (ADS)

Marcos Esteban, Andres

In this thesis the problems of fault detection & isolation and fault tolerant systems are studied from the perspective of LTI frequency-domain, model-based techniques. Emphasis is placed on the applicability of these LTI techniques to nonlinear models, especially to aerospace systems. Two applications of H∞ LTI fault diagnosis are given using an open-loop (no controller) design approach: one for the longitudinal motion of a Boeing 747-100/200 aircraft, the other for a turbofan jet engine. An algorithm formalizing a robust identification approach based on model validation ideas is also given and applied to the previous jet engine. A general linear fractional transformation formulation is given in terms of the Youla and Dual Youla parameterizations for the integrated (control and diagnosis filter) approach. This formulation provides better insight into the trade-off between the control and the diagnosis objectives. It also provides the basic groundwork towards the development of nested schemes for the integrated approach. These nested structures allow iterative improvements on the control/filter Youla parameters based on successive identification of the system uncertainty (as given by the Dual Youla parameter). The thesis concludes with an application of H∞ LTI techniques to the integrated design for the longitudinal motion of the previous Boeing 747-100/200 model.
Adaptive sensor-fault tolerant control for a class of multivariable uncertain nonlinear systems.

PubMed

Khebbache, Hicham; Tadjine, Mohamed; Labiod, Salim; Boulkroune, Abdesselem

2015-03-01

This paper deals with the active fault tolerant control (AFTC) problem for a class of multiple-input multiple-output (MIMO) uncertain nonlinear systems subject to sensor faults and external disturbances. The proposed AFTC method can tolerate three additive (bias, drift and loss of accuracy) and one multiplicative (loss of effectiveness) sensor faults. By employing backstepping technique, a novel adaptive backstepping-based AFTC scheme is developed using the fact that sensor faults and system uncertainties (including external disturbances and unexpected nonlinear functions caused by sensor faults) can be on-line estimated and compensated via robust adaptive schemes. The stability analysis of the closed-loop system is rigorously proven using a Lyapunov approach. The effectiveness of the proposed controller is illustrated by two simulation examples. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
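The four sensor fault types listed in this abstract can be made concrete with a small sketch. The parameterization below is a common textbook one and is an assumption here, not the paper's model: the function name, the default magnitudes and the uniform noise used for loss of accuracy are all hypothetical.

```python
import random


def faulty_reading(true_value, t, fault, *, bias=0.5, drift_rate=0.1,
                   noise_amp=0.2, gain=0.7, rng=None):
    """Return the sensor output at time t under an assumed fault model.

    Additive faults:      bias, drift, loss of accuracy.
    Multiplicative fault: loss of effectiveness (0 < gain < 1).
    """
    if fault == "none":
        return true_value
    if fault == "bias":
        return true_value + bias                # constant offset
    if fault == "drift":
        return true_value + drift_rate * t      # offset growing with time
    if fault == "loss_of_accuracy":
        rng = rng or random
        return true_value + rng.uniform(-noise_amp, noise_amp)  # bounded random error
    if fault == "loss_of_effectiveness":
        return gain * true_value                # scaled-down reading
    raise ValueError(f"unknown fault type: {fault}")


if __name__ == "__main__":
    for kind in ("none", "bias", "drift", "loss_of_effectiveness"):
        print(kind, faulty_reading(2.0, 10.0, kind))
```

An active fault-tolerant controller of the kind described would estimate these fault terms on-line and compensate for them; the sketch only shows what is being estimated.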
Information Weighted Consensus for Distributed Estimation in Vision Networks

ERIC Educational Resources Information Center

Kamal, Ahmed Tashrif

2013-01-01

Due to their high fault-tolerance, ease of installation and scalability to large networks, distributed algorithms have recently gained immense popularity in the sensor networks community, especially in computer vision. Multi-target tracking in a camera network is one of the fundamental problems in this domain. Distributed estimation algorithms…
Reliability model derivation of a fault-tolerant, dual, spare-switching, digital computer system

NASA Technical Reports Server (NTRS)

1974-01-01

A computer based reliability projection aid, tailored specifically for application in the design of fault-tolerant computer systems, is described. Its more pronounced characteristics include the facility for modeling systems with two distinct operational modes, measuring the effect of both permanent and transient faults, and calculating conditional system coverage factors. The underlying conceptual principles, mathematical models, and computer program implementation are presented.
Quantitative fault tolerant control design for a hydraulic actuator with a leaking piston seal

NASA Astrophysics Data System (ADS)

Karpenko, Mark

Hydraulic actuators are complex fluid power devices whose performance can be degraded in the presence of system faults. In this thesis a linear, fixed-gain, fault tolerant controller is designed that can maintain the positioning performance of an electrohydraulic actuator operating under load with a leaking piston seal and in the presence of parametric uncertainties. Developing a control system tolerant to this class of internal leakage fault is important since a leaking piston seal can be difficult to detect, unless the actuator is disassembled. The designed fault tolerant control law is of low-order, uses only the actuator position as feedback, and can: (i) accommodate nonlinearities in the hydraulic functions, (ii) maintain robustness against typical uncertainties in the hydraulic system parameters, and (iii) keep the positioning performance of the actuator within prescribed tolerances despite an internal leakage fault that can bypass up to 40% of the rated servovalve flow across the actuator piston. Experimental tests verify the functionality of the fault tolerant control under normal and faulty operating conditions. The fault tolerant controller is synthesized based on linear time-invariant equivalent (LTIE) models of the hydraulic actuator using the quantitative feedback theory (QFT) design technique. A numerical approach for identifying LTIE frequency response functions of hydraulic actuators from acceptable input-output responses is developed so that linearizing the hydraulic functions can be avoided. The proposed approach can properly identify the features of the hydraulic actuator frequency response that are important for control system design and requires no prior knowledge about the asymptotic behavior or structure of the LTIE transfer functions. 
A distributed hardware-in-the-loop (HIL) simulation architecture is constructed that enables the performance of the proposed fault tolerant control law to be further substantiated, under realistic operating conditions. Using the HIL framework, the fault tolerant hydraulic actuator is operated as a flight control actuator against the real-time numerical simulation of a high-performance jet aircraft. A robust electrohydraulic loading system is also designed using QFT so that the in-flight aerodynamic load can be experimentally replicated. The results of the HIL experiments show that using the fault tolerant controller to compensate the internal leakage fault at the actuator level can benefit the flight performance of the airplane.
Fault-tolerant cooperative output regulation for multi-vehicle systems with sensor faults

NASA Astrophysics Data System (ADS)

Qin, Liguo; He, Xiao; Zhou, D. H.

2017-10-01

This paper presents a unified framework of fault diagnosis and fault-tolerant cooperative output regulation (FTCOR) for a linear discrete-time multi-vehicle system with sensor faults. The FTCOR control law is designed through three steps. A cooperative output regulation (COR) controller is designed based on the internal model principle when there are no sensor faults. A sufficient condition on the existence of the COR controller is given based on the discrete-time algebraic Riccati equation (DARE). Then, a decentralised fault diagnosis scheme is designed to cope with sensor faults occurring in followers. A residual generator is developed to detect sensor faults of each follower, and a bank of fault-matching estimators is proposed to isolate and estimate sensor faults of each follower. Unlike the current distributed fault diagnosis for multi-vehicle systems, the presented decentralised fault diagnosis scheme in each vehicle reduces the communication and computation load by only using the information of the vehicle. By combining the sensor fault estimation and the COR control law, an FTCOR controller is proposed. Finally, the simulation results demonstrate the effectiveness of the FTCOR controller.
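A residual generator of the kind mentioned above can be sketched for a single scalar discrete-time system. This is only an illustration under strong assumptions: the plant parameters, observer gain, threshold and step fault are hypothetical, there is no noise, and nothing here reproduces the paper's multi-vehicle design or its fault-matching estimators.

```python
def first_alarm(steps=40, fault_step=20, fault_size=1.0, threshold=0.5):
    """Simulate x[k+1] = a*x[k] + b*u with a Luenberger observer and
    return the first step at which |residual| exceeds the threshold."""
    a, b, gain, u = 0.9, 1.0, 0.5, 1.0    # hypothetical plant, observer gain and input
    x = x_hat = 0.0                        # plant state and its estimate
    for k in range(steps):
        fault = fault_size if k >= fault_step else 0.0
        y = x + fault                      # sensor output with an additive fault
        residual = y - x_hat               # measured minus predicted output
        if abs(residual) > threshold:
            return k                       # sensor fault detected
        x = a * x + b * u                  # plant update
        x_hat = a * x_hat + b * u + gain * residual  # observer update
    return None


if __name__ == "__main__":
    print(first_alarm())
```

In the noise-free case the residual stays at zero until the fault appears, so the alarm fires at the fault step; with measurement noise the threshold would have to be chosen to balance false alarms against missed detections.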
Health management and controls for earth to orbit propulsion systems

NASA Technical Reports Server (NTRS)

Bickford, R. L.

1992-01-01

Fault detection and isolation for advanced rocket engine controllers are discussed focusing on advanced sensing systems and software which significantly improve component failure detection for engine safety and health management. Aerojet's Space Transportation Main Engine controller for the National Launch System is the state of the art in fault tolerant engine avionics. Health management systems provide high levels of automated fault coverage and significantly improve vehicle delivered reliability and lower preflight operations costs. Key technologies, including the sensor data validation algorithms and flight capable spectrometers, have been demonstrated in ground applications and are found to be suitable for bridging programs into flight applications.

Application of Fault-Tolerant Computing For Spacecraft Using Commercial-Off-The-Shelf Microprocessors

DTIC Science & Technology

2000-06-01

real-time operating system and design of a human-computer interface (HCI) for a triple modular redundant (TMR) fault-tolerant microprocessor for use in space-based applications. One disadvantage of using COTS hardware components is their susceptibility to the radiation effects present in the space environment, specifically radiation-induced single-event upsets (SEUs). In the event of an SEU, a fault-tolerant system can mitigate the effects of the upset and continue to process from the last known correct system state. The TMR basic hardware
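The masking effect of triple modular redundancy on a single-event upset can be shown with a bitwise majority voter. This is a generic sketch of the TMR voting principle, not the design described in the report; the function names and the 8-bit example word are assumptions.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant words: each output bit is the
    value held by at least two of the three copies."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of a word."""
    return word ^ (1 << bit)


if __name__ == "__main__":
    good = 0b10110010
    upset = flip_bit(good, 3)              # one copy is corrupted
    print(bin(tmr_vote(good, upset, good)))
```

The voter masks any upset that leaves at least two correct copies of each bit, so it also tolerates upsets in two copies as long as they hit different bit positions.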
Introduction

NASA Astrophysics Data System (ADS)

de Laat, Cees; Develder, Chris; Jukan, Admela; Mambretti, Joe

This topic is devoted to communication issues in scalable compute and storage systems, such as parallel computers, networks of workstations, and clusters. All aspects of communication in modern systems were solicited, including advances in the design, implementation, and evaluation of interconnection networks, network interfaces, system and storage area networks, on-chip interconnects, communication protocols, routing and communication algorithms, and communication aspects of parallel and distributed algorithms. In total 15 papers were submitted to this topic of which we selected the 7 strongest papers. We grouped the papers in two sessions of 3 papers each and one paper was selected for the best paper session. We noted a number of papers dealing with changing topologies, stability and forwarding convergence in source routing based cluster interconnect network architectures. We grouped these for the first session. The authors of the paper titled: “Implementing a Change Assimilation Mechanism for Source Routing Interconnects” propose a mechanism that can obtain the new topology, and compute and distribute a new set of fabric paths to the source routed network end points to minimize the impact on the forwarding service. The article entitled “Dependability Analysis of a Fault-tolerant Network Reconfiguration Strateg” reports on a case study analyzing the effects of network size, mean time to node failure, mean time to node repair, mean time to network repair and coverage of the failure when using a 2D mesh network with a fault-tolerant mechanism (similar to the one used in the BlueGene/L system), that is able to remove rows and/or columns in the presence of failures. The last paper in this session: “RecTOR: A New and Efficient Method for Dynamic Network Reconfiguration” presents a new dynamic reconfiguration method, that ensures deadlock-freedom during the reconfiguration without causing performance degradation such as increased latency or decreased throughput. 
The second session groups 3 papers presenting methods, protocols and architectures that enhance capacities in the Networks. The paper titled: “NIC-assisted Cache-Efficient Receive Stack for Message Passing over Ethernet” presents the addition of multiqueue support in the Open-MX receive stack so that all incoming packets for the same process are treated on the same core. It then introduces the idea of binding the target end process near its dedicated receive queue. In general this multiqueue receive stack performs better than the original single queue stack, especially on large communication patterns where multiple processes are involved and manual binding is difficult. The authors of: “A Multipath Fault-Tolerant Routing Method for High-Speed Interconnection Networks” focus on the problem of fault tolerance for high-speed interconnection networks by designing a fault tolerant routing method. The goal was to solve a certain number of link and node failures, considering its impact, and occurrence probability. Their experiments show that their method allows applications to successfully finalize their execution in the presence of several faults, with an average performance value of 97% with respect to the fault-free scenarios. The paper: “Hardware implementation study of the Self-Clocked Fair Queuing Credit Aware (SCFQ-CA) and Deficit Round Robin Credit Aware (DRR-CA) scheduling algorithms” proposes specific implementations of the two schedulers taking into account the characteristics of current high-performance networks. A comparison is presented on the complexity of these two algorithms in terms of silicon area and computation delay. Finally we selected one paper for the special paper session: “A Case Study of Communication Optimizations on 3D Mesh Interconnects”. In this paper the authors present topology aware mapping as a technique to optimize communication on 3-dimensional mesh interconnects and hence improve performance. 
Results are presented for OpenAtom on up to 16,384 processors of Blue Gene/L, 8,192 processors of Blue Gene/P and 2,048 processors of Cray XT3.
Self-adaptive Fault-Tolerance of HLA-Based Simulations in the Grid Environment

NASA Astrophysics Data System (ADS)

Huang, Jijie; Chai, Xudong; Zhang, Lin; Li, Bo Hu

The objects of an HLA-based simulation can access model services to update their attributes. However, the grid server may be overloaded and may refuse to let the model service handle object accesses. Because these objects accessed this model service during the last simulation loop and their intermediate states are stored on this server, such a refusal may terminate the simulation. A fault-tolerance mechanism must therefore be introduced into simulations. Traditional fault-tolerance methods cannot meet these needs, because the transmission latency between a federate and the RTI in a grid environment varies from several hundred milliseconds to several seconds. By adding model service URLs to the OMT and extending the HLA services and model services with some interfaces, this paper proposes a self-adaptive fault-tolerance mechanism for simulations according to the characteristics of federates accessing model services. Benchmark experiments indicate that the extended HLA/RTI can make simulations run self-adaptively in the grid environment.
Nonuniform code concatenation for universal fault-tolerant quantum computing

NASA Astrophysics Data System (ADS)

Nikahd, Eesa; Sedighi, Mehdi; Saheb Zamani, Morteza

2017-09-01

Using transversal gates is a straightforward and efficient technique for fault-tolerant quantum computing. Since transversal gates alone cannot be computationally universal, they must be combined with other approaches such as magic state distillation, code switching, or code concatenation to achieve universality. In this paper we propose an alternative approach for universal fault-tolerant quantum computing, mainly based on the code concatenation approach proposed in [T. Jochym-O'Connor and R. Laflamme, Phys. Rev. Lett. 112, 010505 (2014), 10.1103/PhysRevLett.112.010505], but in a nonuniform fashion. The proposed approach is described based on nonuniform concatenation of the 7-qubit Steane code with the 15-qubit Reed-Muller code, as well as the 5-qubit code with the 15-qubit Reed-Muller code, which lead to two 49-qubit and 47-qubit codes, respectively. These codes can correct any arbitrary single physical error with the ability to perform a universal set of fault-tolerant gates, without using magic state distillation.
Determination of the optimal tolerance for MLC positioning in sliding window and VMAT techniques

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hernandez, V., E-mail: vhernandezmasgrau@gmail.com; Abella, R.; Calvo, J. F.

2015-04-15

Purpose: Several authors have recommended a 2 mm tolerance for multileaf collimator (MLC) positioning in sliding window treatments. In volumetric modulated arc therapy (VMAT) treatments, however, the optimal tolerance for MLC positioning remains unknown. In this paper, the authors present the results of a multicenter study to determine the optimal tolerance for both techniques. Methods: The procedure used is based on dynalog file analysis. The study was carried out using seven Varian linear accelerators from five different centers. Dynalogs were collected from over 100 000 clinical treatments and in-house software was used to compute the number of tolerance faults as a function of the user-defined tolerance. Thus, the optimal value for this tolerance, defined as the lowest achievable value, was investigated. Results: Dynalog files accurately predict the number of tolerance faults as a function of the tolerance value, especially for low fault incidences. All MLCs behaved similarly and the Millennium120 and the HD120 models yielded comparable results. In sliding window techniques, the number of beams with an incidence of hold-offs >1% rapidly decreases for a tolerance of 1.5 mm. In VMAT techniques, the number of tolerance faults sharply drops for tolerances around 2 mm. For a tolerance of 2.5 mm, less than 0.1% of the VMAT arcs presented tolerance faults. Conclusions: Dynalog analysis provides a feasible method for investigating the optimal tolerance for MLC positioning in dynamic fields. In sliding window treatments, the tolerance of 2 mm was found to be adequate, although it can be reduced to 1.5 mm. In VMAT treatments, the typically used 5 mm tolerance is excessively high. Instead, a tolerance of 2.5 mm is recommended.
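The analysis described in the Methods, counting tolerance faults as a function of the user-defined tolerance and then looking for the lowest workable value, might be sketched as follows. This is not the authors' in-house software: the function names are hypothetical, the deviation values are made-up numbers rather than dynalog data, and a fault is simplified to "the largest leaf position error of a beam exceeds the tolerance".

```python
def fault_fraction(max_deviation_mm, tolerance_mm):
    """Fraction of beams whose largest leaf position error exceeds the tolerance."""
    faults = sum(1 for d in max_deviation_mm if d > tolerance_mm)
    return faults / len(max_deviation_mm)


def lowest_tolerance(max_deviation_mm, candidates_mm, max_fraction):
    """Smallest candidate tolerance that keeps the fault fraction at or below max_fraction."""
    for tol in sorted(candidates_mm):
        if fault_fraction(max_deviation_mm, tol) <= max_fraction:
            return tol
    return None


if __name__ == "__main__":
    # Made-up per-beam maximum deviations in mm, for illustration only.
    deviations = [0.4, 0.8, 1.2, 1.7, 2.1, 0.9, 1.4, 0.6, 2.6, 1.1]
    for tol in (1.0, 1.5, 2.0, 2.5, 3.0):
        print(tol, fault_fraction(deviations, tol))
    print(lowest_tolerance(deviations, [1.0, 1.5, 2.0, 2.5, 3.0], 0.1))
```

The acceptable fault fraction is a clinical choice; the sketch takes it as an input rather than suggesting a value.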
A real-time, practical sensor fault-tolerant module for robust EMG pattern recognition.

PubMed

Zhang, Xiaorong; Huang, He

2015-02-19

Unreliability of surface EMG recordings over time is a challenge for applying the EMG pattern recognition (PR)-controlled prostheses in clinical practice. Our previous study proposed a sensor fault-tolerant module (SFTM) by utilizing redundant information in multiple EMG signals. The SFTM consists of multiple sensor fault detectors and a self-recovery mechanism that can identify anomaly in EMG signals and remove the recordings of the disturbed signals from the input of the pattern classifier to recover the PR performance. While the proposed SFTM has shown great promise, the previous design is impractical. A practical SFTM has to be fast enough, lightweight, automatic, and robust under different conditions with or without disturbances. This paper presented a real-time, practical SFTM towards robust EMG PR. A novel fast LDA retraining algorithm and a fully automatic sensor fault detector based on outlier detection were developed, which allowed the SFTM to promptly detect disturbances and recover the PR performance immediately. These components of SFTM were then integrated with the EMG PR module and tested on five able-bodied subjects and a transradial amputee in real-time for classifying multiple hand and wrist motions under different conditions with different disturbance types and levels. The proposed fast LDA retraining algorithm significantly shortened the retraining time from nearly 1 s to less than 4 ms when tested on the embedded system prototype, which demonstrated the feasibility of a nearly "zero-delay" SFTM that is imperceptible to the users. The results of the real-time tests suggested that the SFTM was able to handle different types of disturbances investigated in this study and significantly improve the classification performance when one or multiple EMG signals were disturbed. In addition, the SFTM could also maintain the system's classification performance when there was no disturbance. 
This paper presented a real-time, lightweight, and automatic SFTM, which paved the way for reliable and robust EMG PR for prosthesis control.
An improved fault-tolerant control scheme for PWM inverter-fed induction motor-based EVs.

PubMed

Tabbache, Bekheïra; Benbouzid, Mohamed; Kheloui, Abdelaziz; Bourgeot, Jean-Matthieu; Mamoune, Abdeslam

2013-11-01

This paper proposes an improved fault-tolerant control scheme for PWM inverter-fed induction motor-based electric vehicles. The proposed strategy deals with the mitigation of power switch (IGBT) failures within a reconfigurable induction motor control. To increase the vehicle powertrain reliability regarding IGBT open-circuit failures, 4-wire and 4-leg PWM inverter topologies are investigated and their performances discussed in a vehicle context. The proposed fault-tolerant topologies require only minimum hardware modifications to the conventional off-the-shelf six-switch three-phase drive, mitigating the IGBT failures by specific inverter control. Indeed, the two topologies exploit the induction motor neutral accessibility for fault-tolerant purposes. The 4-wire topology then uses classical hysteresis controllers to account for the IGBT failures. The 4-leg topology, meanwhile, uses a specific 3D space vector PWM to handle vehicle requirements in terms of size (DC bus capacitors) and cost (number of IGBTs). Experiments on an induction motor drive and simulations on an electric vehicle are carried out using a European urban driving cycle to show that the proposed fault-tolerant control approach is effective and provides a simple configuration with high performance in terms of speed and torque responses. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.
A Fault-tolerant RISC Microprocessor for Spacecraft Applications

NASA Technical Reports Server (NTRS)

Timoc, Constantin; Benz, Harry

1990-01-01

Viewgraphs on a fault-tolerant RISC microprocessor for spacecraft applications are presented. Topics covered include: reduced instruction set computer; fault tolerant registers; fault tolerant ALU; and double rail CMOS logic.
Fault Tolerance in ZigBee Wireless Sensor Networks

NASA Technical Reports Server (NTRS)

Alena, Richard; Gilstrap, Ray; Baldwin, Jarren; Stone, Thom; Wilson, Pete

2011-01-01

Wireless sensor networks (WSN) based on the IEEE 802.15.4 Personal Area Network standard are finding increasing use in the home automation and emerging smart energy markets. The network and application layers, based on the ZigBee 2007 PRO Standard, provide a convenient framework for component-based software that supports customer solutions from multiple vendors. This technology is supported by System-on-a-Chip solutions, resulting in extremely small and low-power nodes. The Wireless Connections in Space Project addresses the aerospace flight domain for both flight-critical and non-critical avionics. WSNs provide the inherent fault tolerance required for aerospace applications utilizing such technology. The team from Ames Research Center has developed techniques for assessing the fault tolerance of ZigBee WSNs challenged by radio frequency (RF) interference or WSN node failure.
Measurement and analysis of operating system fault tolerance

NASA Technical Reports Server (NTRS)

Lee, I.; Tang, D.; Iyer, R. K.

1992-01-01

This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.
Step-by-step magic state encoding for efficient fault-tolerant quantum computation

PubMed Central

Goto, Hayato

2014-01-01

Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, the fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is a current major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. For the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than previous approaches based on the distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation. PMID:25511387
Step-by-step magic state encoding for efficient fault-tolerant quantum computation.

PubMed

Goto, Hayato

2014-12-16

Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, the fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is a current major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. For the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than previous approaches based on the distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation.
Adaptive extended-state observer-based fault tolerant attitude control for spacecraft with reaction wheels

NASA Astrophysics Data System (ADS)

Ran, Dechao; Chen, Xiaoqian; de Ruiter, Anton; Xiao, Bing

2018-04-01

This study presents an adaptive second-order sliding control scheme to solve the attitude fault tolerant control problem of spacecraft subject to system uncertainties, external disturbances and reaction wheel faults. A novel fast terminal sliding mode is preliminarily designed to guarantee that finite-time convergence of the attitude errors can be achieved globally. Based on this novel sliding mode, an adaptive second-order observer is then designed to reconstruct the system uncertainties and the actuator faults. One feature of the proposed observer is that its design does not require any a priori information on the upper bounds of the system uncertainties and the actuator faults. In view of the reconstructed information supplied by the designed observer, a second-order sliding mode controller is developed to accomplish attitude maneuvers with great robustness and precise tracking accuracy. Theoretical stability analysis proves that the designed fault tolerant control scheme can achieve finite-time stability of the closed-loop system, even in the presence of reaction wheel faults and system uncertainties. Numerical simulations are also presented to demonstrate the effectiveness and superiority of the proposed control scheme over existing methodologies.
Hybrid information privacy system: integration of chaotic neural network and RSA coding

NASA Astrophysics Data System (ADS)

Hsu, Ming-Kai; Willey, Jeff; Lee, Ting N.; Szu, Harold H.

2005-03-01

Email is used worldwide, yet most messages are easily hacked. In this paper, we propose a free, fast and convenient hybrid privacy system to protect email communication. The privacy system is implemented by combining a private security RSA algorithm with a specific chaos neural network encryption process. The receiver can decrypt a received email as long as it can reproduce the specified chaos neural network series, the so-called spatial-temporal keys. The chaotic typing and initial seed value of the chaos neural network series, encrypted by the RSA algorithm, can reproduce the spatial-temporal keys. The encrypted chaotic typing and initial seed value are hidden in a watermark mixed nonlinearly with the message media and wrapped with convolution error correction codes for wireless 3rd generation cellular phones. The message media can be an arbitrary image. Pattern noise has to be considered during transmission, as it could change the spatial-temporal keys. Since any modification of the chaotic typing or initial seed value of the chaos neural network series is not acceptable, the RSA codec system must be robust and fault-tolerant over the wireless channel. The robust and fault-tolerant properties of chaos neural networks (CNN) were proved by a field theory of Associative Memory by Szu in 1997. The 1-D chaos generating nodes from the logistic map having arbitrarily negative slope a = p/q generating the N-shaped sigmoid was given first by Szu in 1992. In this paper, we simulate the robustness and fault tolerance of CNN under additive noise and pattern noise. We also implement a private version of RSA coding and chaos encryption process on messages.
Low-Power Fault Tolerance for Spacecraft FPGA-Based Numerical Computing

DTIC Science & Technology

2006-09-01

...undesirable, are not necessarily harmful. Our intent is to prevent errors by properly managing faults. This research focuses on developing fault-tolerant
Epidemic failure detection and consensus for extreme parallelism

DOE PAGES

Katti, Amogh; Di Fatta, Giuseppe; Naughton, Thomas; ...

2017-02-01

Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a failure detection and consensus algorithm. This paper presents three novel failure detection and consensus algorithms using Gossiping. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in all algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus. The third approach is a three-phase distributed failure detection and consensus algorithm and provides consistency guarantees even in very large and extreme-scale systems while at the same time being memory and bandwidth efficient.
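The logarithmic scaling reported above can be illustrated with a toy simulation. The sketch below is not one of the paper's three algorithms; it is a plain push-gossip model under stated assumptions: a single process initially knows the full failed set, every alive process pushes its knowledge to one uniformly random peer per cycle, and messages are never lost. The function name `gossip_cycles` and its parameters are hypothetical.

```python
import random


def gossip_cycles(n_alive, failed, seed=0):
    """Count push-gossip cycles until every alive process knows the failed set.

    Toy model only: process 0 is assumed to detect all failures up front,
    and each process pushes to one random peer per cycle.
    """
    rng = random.Random(seed)
    failed = frozenset(failed)
    known = [set() for _ in range(n_alive)]
    known[0] = set(failed)
    cycles = 0
    while any(k != failed for k in known):
        # Use a snapshot so information spreads by one hop per cycle.
        snapshot = [set(k) for k in known]
        for sender in range(n_alive):
            target = rng.randrange(n_alive)
            known[target] |= snapshot[sender]
        cycles += 1
    return cycles


if __name__ == "__main__":
    for n in (16, 64, 256, 1024):
        print(n, gossip_cycles(n, {"p_a", "p_b"}))
```

Because the number of informed processes can at most double per cycle in this model, at least log2(n) cycles are needed; running the loop at the bottom shows how the measured count grows with system size.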
Coordinated Fault-Tolerance for High-Performance Computing Final Project Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Panda, Dhabaleswar Kumar; Beckman, Pete

2011-07-28

With the Coordinated Infrastructure for Fault Tolerance Systems (CIFTS, as the original project came to be called) project, our aim has been to understand and tackle the following broad research questions, the answers to which will help the HEC community analyze and shape the direction of research in the field of fault tolerance and resiliency on future high-end leadership systems. Will availability of global fault information, obtained by fault information exchange between the different HEC software on a system, allow individual system software to better detect, diagnose, and adaptively respond to faults? If fault-awareness is raised throughout the system through fault information exchange, is it possible to get all system software working together to provide a more comprehensive end-to-end fault management on the system? What are the missing fault-tolerance features that widely used HEC system software lacks today that would inhibit such software from taking advantage of systemwide global fault information? What are the practical limitations of a systemwide approach for end-to-end fault management based on fault awareness and coordination? What mechanisms, tools, and technologies are needed to bring about fault awareness and coordination of responses on a leadership-class system? What standards, outreach, and community interaction are needed for adoption of the concept of fault awareness and coordination for fault management on future systems? Keeping our overall objectives in mind, the CIFTS team has taken a parallel fourfold approach. Our central goal was to design and implement a light-weight, scalable infrastructure with a simple, standardized interface to allow communication of fault-related information through the system and facilitate coordinated responses. This work led to the development of the Fault Tolerance Backplane (FTB) publish-subscribe API specification, together with a reference implementation and several experimental implementations on top of existing publish-subscribe tools. We enhanced the intrinsic fault tolerance capabilities of representative implementations of a variety of key HPC software subsystems and integrated them with the FTB. Targeted software subsystems included MPI communication libraries, checkpoint/restart libraries, resource managers and job schedulers, and system monitoring tools. Leveraging the aforementioned infrastructure, as well as developing and utilizing additional tools, we have examined issues associated with expanded, end-to-end fault response from both system and application viewpoints. From the standpoint of system operations, we have investigated log and root cause analysis, anomaly detection and fault prediction, and generalized notification mechanisms. Our applications work has included libraries for fault-tolerant linear algebra, application frameworks for coupled multiphysics applications, and external frameworks to support the monitoring and response for general applications. Our final goal was to engage the high-end computing community to increase awareness of tools and issues around coordinated end-to-end fault management.
Sequential behavior and its inherent tolerance to memory faults.

NASA Technical Reports Server (NTRS)

Meyer, J. F.

1972-01-01

Representation of a memory fault of a sequential machine M by a function mu on the states of M and the result of the fault by an appropriately determined machine M(mu). Given some sequential behavior B, its inherent tolerance to memory faults can then be measured in terms of the minimum memory redundancy required to realize B with a state-assigned machine having fault tolerance type tau and fault tolerance level t. A behavior having maximum inherent tolerance is exhibited, and it is shown that behaviors of the same size can have different inherent tolerance.
The Design of Fault Tolerant Quantum Dot Cellular Automata Based Logic

NASA Technical Reports Server (NTRS)

Armstrong, C. Duane; Humphreys, William M.; Fijany, Amir

2002-01-01

As transistor geometries are reduced, quantum effects begin to dominate device performance. At some point, transistors cease to have the properties that make them useful computational components. New computing elements must be developed in order to keep pace with Moore's Law. Quantum dot cellular automata (QCA) represent an alternative paradigm to transistor-based logic. QCA architectures that are robust to manufacturing tolerances and defects must be developed. We are developing software that allows the exploration of fault tolerant QCA gate architectures by automating the specification, simulation, analysis and documentation processes.
COTS-Based Fault Tolerance in Deep Space: Qualitative and Quantitative Analyses of a Bus Network Architecture

NASA Technical Reports Server (NTRS)

Tai, Ann T.; Chau, Savio N.; Alkalai, Leon

2000-01-01

Using COTS products, standards and intellectual properties (IPs) for all the system and component interfaces is a crucial step toward significant reduction of both system cost and development cost as the COTS interfaces enable other COTS products and IPs to be readily accommodated by the target system architecture. With respect to the long-term survivable systems for deep-space missions, the major challenge for us is, under stringent power and mass constraints, to achieve ultra-high reliability of the system comprising COTS products and standards that are not developed for mission-critical applications. The spirit of our solution is to exploit the pertinent standard features of a COTS product to circumvent its shortcomings, though these standard features may not be originally designed for highly reliable systems. In this paper, we discuss our experiences and findings on the design of an IEEE 1394 compliant fault-tolerant COTS-based bus architecture. We first derive and qualitatively analyze a "stack-tree topology" that not only complies with IEEE 1394 but also enables the implementation of a fault-tolerant bus architecture without node redundancy. We then present a quantitative evaluation that demonstrates significant reliability improvement from the COTS-based fault tolerance.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Duan, Sisi; Li, Yun; Levitt, Karl N.

Consensus is a fundamental approach to implementing fault-tolerant services through replication where there exists a tradeoff between the cost and the resilience. For instance, Crash Fault Tolerant (CFT) protocols have a low cost but can only handle crash failures while Byzantine Fault Tolerant (BFT) protocols handle arbitrary failures but have a higher cost. Hybrid protocols enjoy the benefits of both high performance without failures and high resiliency under failures by switching among different subprotocols. However, it is challenging to determine which subprotocols should be used. We propose a moving target approach to switch among protocols according to the existing system and network vulnerability. At the core of our approach is a formalized cost model that evaluates the vulnerability and performance of consensus protocols based on real-time Intrusion Detection System (IDS) signals. Based on the evaluation results, we demonstrate that a safe, cheap, and unpredictable protocol is always used and a high IDS error rate can be tolerated.
Design and experimental validation for direct-drive fault-tolerant permanent-magnet vernier machines.

PubMed

Liu, Guohai; Yang, Junqin; Chen, Ming; Chen, Qian

2014-01-01

A fault-tolerant permanent-magnet vernier (FT-PMV) machine is designed for direct-drive applications, incorporating the merits of high torque density and high reliability. Based on the so-called magnetic gearing effect, PMV machines have the ability of high torque density by introducing the flux-modulation poles (FMPs). This paper investigates the fault-tolerant characteristic of PMV machines and provides a design method, which is able to not only meet the fault-tolerant requirements but also keep the ability of high torque density. The operation principle of the proposed machine has been analyzed. The design process and optimization are presented specifically, such as the combination of slots and poles, the winding distribution, and the dimensions of PMs and teeth. By using the time-stepping finite element method (TS-FEM), the machine performances are evaluated. Finally, the FT-PMV machine is manufactured, and the experimental results are presented to validate the theoretical analysis.
High-Threshold Fault-Tolerant Quantum Computation with Analog Quantum Error Correction

NASA Astrophysics Data System (ADS)

Fukui, Kosuke; Tomita, Akihisa; Okamoto, Atsushi; Fujii, Keisuke

2018-04-01

To implement fault-tolerant quantum computation with continuous variables, the Gottesman-Kitaev-Preskill (GKP) qubit has been recognized as an important technological element. However, it is still challenging to experimentally generate the GKP qubit with the required squeezing level, 14.8 dB, of the existing fault-tolerant quantum computation. To reduce this requirement, we propose a high-threshold fault-tolerant quantum computation with GKP qubits using topologically protected measurement-based quantum computation with the surface code. By harnessing analog information contained in the GKP qubits, we apply analog quantum error correction to the surface code. Furthermore, we develop a method to prevent the squeezing level from decreasing during the construction of the large-scale cluster states for the topologically protected, measurement-based, quantum computation. We numerically show that the required squeezing level can be relaxed to less than 10 dB, which is within the reach of the current experimental technology. Hence, this work can considerably alleviate this experimental requirement and take a step closer to the realization of large-scale quantum computation.
Application of composite dictionary multi-atom matching in gear fault diagnosis.

PubMed

Cui, Lingli; Kang, Chenhui; Wang, Huaqing; Chen, Peng

2011-01-01

The sparse decomposition based on matching pursuit is an adaptive sparse expression method for signals. This paper proposes an idea concerning a composite dictionary multi-atom matching decomposition and reconstruction algorithm, and the introduction of threshold de-noising in the reconstruction algorithm. Based on the structural characteristics of gear fault signals, a composite dictionary combining the impulse time-frequency dictionary and the Fourier dictionary was constituted, and a genetic algorithm was applied to search for the best matching atom. The analysis results of gear fault simulation signals indicated the effectiveness of the hard threshold, and the impulse or harmonic characteristic components could be separately extracted. Meanwhile, the robustness of the composite dictionary multi-atom matching algorithm at different noise levels was investigated. Aiming at the effects of data lengths on the calculation efficiency of the algorithm, an improved segmented decomposition and reconstruction algorithm was proposed, and the calculation efficiency of the decomposition algorithm was significantly enhanced. In addition it is shown that the multi-atom matching algorithm was superior to the single-atom matching algorithm in both calculation efficiency and algorithm robustness. Finally, the above algorithm was applied to gear fault engineering signals, and achieved good results.
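For readers unfamiliar with the underlying technique, the sketch below shows plain single-atom matching pursuit, the baseline the abstract compares against. It is not the paper's composite-dictionary, multi-atom, genetic-search variant. It assumes every dictionary atom has unit norm and uses an exhaustive search over atoms; the function name and arguments are hypothetical.

```python
def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy single-atom matching pursuit over unit-norm atoms.

    Returns (picks, residual), where picks is a list of
    (atom_index, coefficient) pairs in selection order.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_atoms):
        # Correlate the current residual with every atom.
        scores = [sum(r * a for r, a in zip(residual, atom))
                  for atom in dictionary]
        # Select the atom with the largest absolute correlation.
        best = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        coef = scores[best]
        # Subtract that atom's contribution from the residual.
        residual = [r - coef * a
                    for r, a in zip(residual, dictionary[best])]
        picks.append((best, coef))
    return picks, residual


if __name__ == "__main__":
    basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(matching_pursuit([3, 0, -1], basis, 2))
```

Thresholding the coefficients before reconstruction, as the abstract describes, would amount to discarding picks whose absolute coefficient falls below a chosen level.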
Coordinated Fault Tolerance for High-Performance Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dongarra, Jack; Bosilca, George; et al.

2013-04-08

Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack, from the application, math libraries, and programming language runtime to other common system software such as job schedulers, resource managers, and monitoring tools.
Mechanical verification of a schematic Byzantine clock synchronization algorithm

NASA Technical Reports Server (NTRS)

Shankar, Natarajan

1991-01-01

Schneider generalizes a number of protocols for Byzantine fault tolerant clock synchronization and presents a uniform proof for their correctness. The authors present a machine checked proof of this schematic protocol that revises some of the details in Schneider's original analysis. The verification was carried out with the EHDM system developed at the SRI Computer Science Laboratory. The mechanically checked proofs include the verification that the egocentric mean function used in Lamport and Melliar-Smith's Interactive Convergence Algorithm satisfies the requirements of Schneider's protocol.
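The egocentric mean mentioned above can be sketched in a few lines. This is an illustrative reading of the convergence function in the Interactive Convergence Algorithm, not the mechanically verified EHDM specification: each process replaces any clock reading that differs from its own by more than a bound with its own value, then averages. The names `own`, `readings`, and `delta` are assumptions made for this example.

```python
def egocentric_mean(own, readings, delta):
    """Average the clock readings, substituting `own` for any reading
    that differs from `own` by more than `delta`.

    `readings` is assumed to include the process's own clock value.
    """
    adjusted = [r if abs(r - own) <= delta else own for r in readings]
    return sum(adjusted) / len(adjusted)


if __name__ == "__main__":
    # The reading 100.0 is far outside the bound and is replaced by 10.0.
    print(egocentric_mean(10.0, [10.0, 11.0, 9.0, 100.0], delta=2.0))
```

Discarding implausible readings in this way limits how far a faulty clock can pull the average of a correct process.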
An improved CS-LSSVM algorithm-based fault pattern recognition of ship power equipments.

PubMed

Yang, Yifei; Tan, Minjia; Dai, Yuewei

2017-01-01

Fault monitoring signals from ship power equipment usually provide few samples, and in practice the data are non-linear. This paper adopts the least squares support vector machine (LSSVM) to deal with the problem of fault pattern identification in the case of small sample data. Meanwhile, in order to avoid the local extrema and poor convergence precision that arise when optimizing the kernel function parameter and penalty factor of the LSSVM, an improved Cuckoo Search (CS) algorithm is proposed for parameter optimization. Based on a dynamic adaptive strategy, the newly proposed algorithm improves the recognition probability and the searching step length, which can effectively solve the problems of slow searching speed and low calculation accuracy of the CS algorithm. A benchmark example demonstrates that the CS-LSSVM algorithm can accurately and effectively identify the fault pattern types of ship power equipment.
Experimental evaluation of certification trails using abstract data type validation

NASA Technical Reports Server (NTRS)

Wilson, Dwight S.; Sullivan, Gregory F.; Masson, Gerald M.

1993-01-01

Certification trails are a recently introduced and promising approach to fault-detection and fault-tolerance. Recent experimental work reveals many cases in which a certification-trail approach allows for significantly faster program execution time than a basic time-redundancy approach. Algorithms for answer-validation of abstract data types allow a certification trail approach to be used for a wide variety of problems. An attempt to assess the performance of algorithms utilizing certification trails on abstract data types is reported. Specifically, this method was applied to the following problems: heapsort, Huffman tree, shortest path, and skyline. Previous results used certification trails specific to a particular problem and implementation. The approach allows certification trails to be localized to 'data structure modules,' making the use of this technique transparent to the user of such modules.
Design of physical and logical topologies with fault-tolerant ability in wavelength-routed optical network

NASA Astrophysics Data System (ADS)

Chen, Chunfeng; Liu, Hua; Fan, Ge

2005-02-01

In this paper we consider the problem of designing a network of optical cross-connects (OXCs) to provide end-to-end lightpath services to label switched routers (LSRs). Like some previous work, we select the number of OXCs as our objective. Unlike previous studies, we take into account the fault-tolerant characteristic of the logical topology. First, we generate a tree from a randomly generated Prufer number. By adding some edges to the tree, we obtain a physical topology that consists of a certain number of OXCs and the fiber links connecting them. Notably, we limit, for the first time, the number of layers of the tree produced by this method. Then we design the logical topologies based on these physical topologies. In principle, we select the shortest path, with some consideration of link load balancing and the limitation imposed by the SRLG. Notably, we run the routing algorithm for the nodes in increasing order of node degree. With regard to the wavelength assignment problem, we adopt the commonly used graph coloring heuristic. Our problem is clearly computationally intractable, especially when the scale of the network is large. We adopt the taboo search algorithm to find a near-optimal solution to our objective. We present numerical results for up to 1000 LSRs and for a wide range of system parameters such as the number of wavelengths supported by each fiber link and traffic. The results indicate that it is possible to build large-scale optical networks with rich connectivity in a cost-effective manner, using relatively few but properly dimensioned OXCs.
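The tree-generation step described above can be sketched as follows. This is the standard Prufer sequence decoding, shown only to illustrate how a random sequence yields a random labeled tree; it does not reproduce the paper's layer limit, edge-addition step, or routing. Node labels 0..n-1 and the helper names are assumptions made for this example.

```python
import heapq
import random


def prufer_to_tree(seq):
    """Decode a Prufer sequence over labels 0..n-1 into n-1 tree edges."""
    n = len(seq) + 2
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    # A min-heap of current leaves gives the canonical decoding order.
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two nodes remain; join them with the final edge.
    u = heapq.heappop(leaves)
    w = heapq.heappop(leaves)
    edges.append((u, w))
    return edges


def random_tree(n, rng=random):
    """Draw a uniformly random labeled tree on n >= 2 nodes."""
    seq = [rng.randrange(n) for _ in range(n - 2)]
    return prufer_to_tree(seq)


if __name__ == "__main__":
    print(prufer_to_tree([3, 3]))
    print(random_tree(6, random.Random(1)))
```

Extra edges could then be added to the resulting tree to obtain a physical topology with redundant paths, which is the role the abstract assigns to this step.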
Physical fault tolerance of nanoelectronics.

PubMed

Szkopek, Thomas; Roychowdhury, Vwani P; Antoniadis, Dimitri A; Damoulakis, John N

2011-04-29

The error rate in complementary transistor circuits is suppressed exponentially in electron number, arising from an intrinsic physical implementation of fault-tolerant error correction. Contrariwise, explicit assembly of gates into the most efficient known fault-tolerant architecture is characterized by a subexponential suppression of error rate with electron number, and incurs significant overhead in wiring and complexity. We conclude that it is more efficient to prevent logical errors with physical fault tolerance than to correct logical errors with fault-tolerant architecture.
Different-Level Simultaneous Minimization Scheme for Fault Tolerance of Redundant Manipulator Aided with Discrete-Time Recurrent Neural Network

PubMed Central

Jin, Long; Liao, Bolin; Liu, Mei; Xiao, Lin; Guo, Dongsheng; Yan, Xiaogang

2017-01-01

By incorporating the physical constraints in joint space, a different-level simultaneous minimization scheme, which takes both the robot kinematics and robot dynamics into account, is presented and investigated for fault-tolerant motion planning of redundant manipulator in this paper. The scheme is reformulated as a quadratic program (QP) with equality and bound constraints, which is then solved by a discrete-time recurrent neural network. Simulative verifications based on a six-link planar redundant robot manipulator substantiate the efficacy and accuracy of the presented acceleration fault-tolerant scheme, the resultant QP and the corresponding discrete-time recurrent neural network. PMID:28955217
CNN universal machine as classification platform: an ART-like clustering algorithm.

PubMed

Bálya, David

2003-12-01

Fast and robust classification of feature vectors is a crucial task in a number of real-time systems. A cellular neural/nonlinear network universal machine (CNN-UM) can be very efficient as a feature detector. The next step is to post-process the results for object recognition. This paper shows how a robust classification scheme based on adaptive resonance theory (ART) can be mapped to the CNN-UM. Moreover, this mapping is general enough to include different types of feed-forward neural networks. The designed analogic CNN algorithm is capable of classifying the extracted feature vectors keeping the advantages of the ART networks, such as robust, plastic and fault-tolerant behaviors. An analogic algorithm is presented for unsupervised classification with tunable sensitivity and automatic new class creation. The algorithm is extended for supervised classification. The presented binary feature vector classification is implemented on the existing standard CNN-UM chips for fast classification. The experimental evaluation shows promising performance after 100% accuracy on the training set.
PSO/ACO algorithm-based risk assessment of human neural tube defects in Heshun County, China.

PubMed

Liao, Yi Lan; Wang, Jin Feng; Wu, Ji Lei; Wang, Jiao Jiao; Zheng, Xiao Ying

2012-10-01

To develop a new technique for assessing the risk of birth defects, which are a major cause of infant mortality and disability in many parts of the world. The region of interest in this study was Heshun County, the county in China with the highest rate of neural tube defects (NTDs). A hybrid particle swarm optimization/ant colony optimization (PSO/ACO) algorithm was used to quantify the probability of NTDs occurring at villages with no births. The hybrid PSO/ACO algorithm is a form of artificial intelligence adapted for hierarchical classification. It is a powerful technique for modeling complex problems involving impacts of causes. The algorithm was easy to apply, with the accuracy of the results being 69.5%±7.02% at the 95% confidence level. The proposed method is simple to apply, has acceptable fault tolerance, and greatly enhances the accuracy of calculations. Copyright © 2012 The Editorial Board of Biomedical and Environmental Sciences. Published by Elsevier B.V. All rights reserved.
Design of Energy Storage Management System Based on FPGA in Micro-Grid

NASA Astrophysics Data System (ADS)

Liang, Yafeng; Wang, Yanping; Han, Dexiao

2018-01-01

The energy storage system is central to maintaining the stable operation of a smart micro-grid. To address existing problems of energy storage management systems in micro-grids, such as low fault tolerance and a tendency to cause fluctuations in the micro-grid, a new intelligent battery management system based on a field programmable gate array (FPGA) is proposed, taking advantage of the FPGA to combine the battery management system with the intelligent micro-grid control strategy. Finally, to address the problem that inaccurate initialization of weights and thresholds leads to large prediction errors when the battery state of charge is estimated by a neural network, a genetic algorithm is proposed to optimize the neural network method, and an experimental simulation is carried out. The experimental results show that the algorithm has high precision and provides a guarantee for the stable operation of the micro-grid.
A methodology for testing fault-tolerant software

NASA Technical Reports Server (NTRS)

Andrews, D. M.; Mahmood, A.; Mccluskey, E. J.

1985-01-01

A methodology for testing fault tolerant software is presented. There are problems associated with testing fault tolerant software because many errors are masked or corrected by voters, limiters, or automatic channel synchronization. This methodology illustrates how the same strategies used for testing fault tolerant hardware can be applied to testing fault tolerant software. For example, one strategy used in testing fault tolerant hardware is to disable the redundancy during testing. A similar testing strategy is proposed for software, namely, to move the major emphasis on testing earlier in the development cycle (before the redundancy is in place), thus reducing the possibility that undetected errors will be masked when limiters and voters are added.
Observer-Based Adaptive Fault-Tolerant Tracking Control of Nonlinear Nonstrict-Feedback Systems.

PubMed

Wu, Chengwei; Liu, Jianxing; Xiong, Yongyang; Wu, Ligang

2017-06-28

This paper studies an output-based adaptive fault-tolerant control problem for nonlinear systems with nonstrict-feedback form. Neural networks are utilized to identify the unknown nonlinear characteristics in the system. An observer and a general fault model are constructed to estimate the unavailable states and describe the fault, respectively. Adaptive parameters are constructed to overcome the difficulties in the design process for nonstrict-feedback systems. Meanwhile, dynamic surface control technique is introduced to avoid the problem of "explosion of complexity". Furthermore, based on adaptive backstepping control method, an output-based adaptive neural tracking control strategy is developed for the considered system against actuator fault, which can ensure that all the signals in the resulting closed-loop system are bounded, and the system output signal can be regulated to follow the response of the given reference signal with a small error. Finally, the simulation results are provided to validate the effectiveness of the control strategy proposed in this paper.
Software Fault Tolerance: A Tutorial

NASA Technical Reports Server (NTRS)

Torres-Pomales, Wilfredo

2000-01-01

Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems. The root cause of software design errors is the complexity of the systems. Compounding the problems in building correct software is the difficulty in assessing the correctness of software for highly complex systems. After a brief overview of the software development processes, we note how hard-to-detect design faults are likely to be introduced during development and how software faults tend to be state-dependent and activated by particular input sequences. Although component reliability is an important quality measure for system level analysis, software reliability is hard to characterize and the use of post-verification reliability estimates remains a controversial issue. For some applications software safety is more important than reliability, and fault tolerance techniques used in those applications are aimed at preventing catastrophes. Single version software fault tolerance techniques discussed include system structuring and closure, atomic actions, inline fault detection, exception handling, and others. Multiversion techniques are based on the assumption that software built differently should fail differently and thus, if one of the redundant versions fails, it is expected that at least one of the other versions will provide an acceptable output. Recovery blocks, N-version programming, and other multiversion techniques are reviewed.
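The two multiversion techniques named at the end of the abstract can be contrasted with a minimal sketch. It is an illustration under stated assumptions, not code from the tutorial: versions are modeled as pure functions with hashable outputs, a strict majority is required for the vote, and the recovery block omits the checkpoint and state rollback that a real implementation would perform before trying the next alternate. All function names are hypothetical.

```python
from collections import Counter


def n_version_vote(versions, x):
    """Run every version on x and return the strict-majority output."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            pass  # a crashed version casts no vote
    if outputs:
        value, count = Counter(outputs).most_common(1)[0]
        if count > len(versions) // 2:
            return value
    raise RuntimeError("no majority among versions")


def recovery_block(alternates, acceptance_test, x):
    """Try alternates in order; return the first result that passes."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue  # treat a crash like a failed acceptance test
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


if __name__ == "__main__":
    squares = [lambda v: v * v, lambda v: v ** 2, lambda v: v + 1]
    print(n_version_vote(squares, 3))
    print(recovery_block([lambda v: -v, abs], lambda v, r: r >= 0, 5))
```

The voter masks the single faulty version because the other two agree, while the recovery block relies on the acceptance test to reject the faulty primary and fall through to the backup.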
Room temperature high-fidelity holonomic single-qubit gate on a solid-state spin.

PubMed

Arroyo-Camejo, Silvia; Lazariev, Andrii; Hell, Stefan W; Balasubramanian, Gopalakrishnan

2014-09-12

At its most fundamental level, circuit-based quantum computation relies on the application of controlled phase shift operations on quantum registers. While these operations are generally compromised by noise and imperfections, quantum gates based on geometric phase shifts can provide intrinsically fault-tolerant quantum computing. Here we demonstrate the high-fidelity realization of a recently proposed fast (non-adiabatic) and universal (non-Abelian) holonomic single-qubit gate, using an individual solid-state spin qubit under ambient conditions. This fault-tolerant quantum gate provides an elegant means for achieving the fidelity threshold indispensable for implementing quantum error correction protocols. Since we employ a spin qubit associated with a nitrogen-vacancy colour centre in diamond, this system is based on integrable and scalable hardware exhibiting strong analogy to current silicon technology. This quantum gate realization is a promising step towards viable, fault-tolerant quantum computing under ambient conditions.
Adaptive Fault-Tolerant Control of Uncertain Nonlinear Large-Scale Systems With Unknown Dead Zone.

PubMed

Chen, Mou; Tao, Gang

2016-08-01

In this paper, an adaptive neural fault-tolerant control scheme is proposed and analyzed for a class of uncertain nonlinear large-scale systems with unknown dead zone and external disturbances. To tackle the unknown nonlinear interaction functions in the large-scale system, the radial basis function neural network (RBFNN) is employed to approximate them. To further handle the unknown approximation errors and the effects of the unknown dead zone and external disturbances, integrated as the compounded disturbances, the corresponding disturbance observers are developed for their estimations. Based on the outputs of the RBFNN and the disturbance observer, the adaptive neural fault-tolerant control scheme is designed for uncertain nonlinear large-scale systems by using a decentralized backstepping technique. The closed-loop stability of the adaptive control system is rigorously proved via Lyapunov analysis and the satisfactory tracking performance is achieved under the integrated effects of unknown dead zone, actuator fault, and unknown external disturbances. Simulation results of a mass-spring-damper system are given to illustrate the effectiveness of the proposed adaptive neural fault-tolerant control scheme for uncertain nonlinear large-scale systems.
Fault tolerant high-performance PACS network design and implementation

NASA Astrophysics Data System (ADS)

Chimiak, William J.; Boehme, Johannes M.

1998-07-01

The Wake Forest University School of Medicine and the Wake Forest University/Baptist Medical Center (WFUBMC) are implementing a second generation PACS. The first generation PACS provided helpful information about the functional and temporal requirements of the system. It highlighted the importance of image retrieval speed, system availability, RIS/HIS integration, the ability to rapidly view images on any PACS workstation, network bandwidth, equipment redundancy, and the ability for the system to evolve using standards-based components. This paper deals with the network design and implementation of the PACS. The physical layout of the hospital areas served by the PACS, the choice of network equipment and installation issues encountered are addressed. Efforts to optimize fault tolerance are discussed. The PACS network is a gigabit, mixed-media network based on LAN emulation over ATM (LANE) with a rapid migration from LANE to Multiprotocol over ATM (MPOA) planned. Two fault-tolerant backbone ATM switches serve to distribute network accesses with two load-balancing 622 megabit per second (Mbps) OC-12 interconnections. The switch was sized to be upgradable to provide a 2.54 Gbps OC-48 interconnection with an OC-12 interconnection as a load-balancing backup. Modalities connect with legacy network interface cards to a switched-ethernet device. This device has two 155 Mbps OC-3 load-balancing uplinks to each of the backbone ATM switches of the PACS. This provides a fault-tolerant logical connection to the modality servers which pass verified DICOM images to the PACS servers and proper PACS diagnostic workstations. Where fiber pulls were prohibitively expensive, edge ATM switches were installed with an OC-12 uplink to a backbone ATM switch. The PACS and database servers are fault-tolerant, hot-swappable Sun Enterprise Servers with an OC-12 connection to a backbone ATM switch and a fast-ethernet connection to a back-up network. The workstations come with 10/100BASE-T autosense cards. A redundant switched-ethernet network will be installed to provide yet another degree of network fault-tolerance. The switched-ethernet devices are connected to each of the backbone ATM switches with two load-balancing OC-3 connections to provide fault-tolerant connectivity in the event of a primary network failure.

A hybrid robust fault tolerant control based on adaptive joint unscented Kalman filter.

PubMed

Shabbouei Hagh, Yashar; Mohammadi Asl, Reza; Cocquempot, Vincent

2017-01-01

In this paper, a new hybrid robust fault tolerant control scheme is proposed. A robust H∞ control law is used in the non-faulty situation, while a Non-Singular Terminal Sliding Mode (NTSM) controller is activated as soon as an actuator fault is detected. Since a linear robust controller is designed, the system is first linearized through the feedback linearization method. To switch from one controller to the other, a fuzzy-based switching system is used. An Adaptive Joint Unscented Kalman Filter (AJUKF) is used for fault detection and diagnosis. The proposed method is based on the simultaneous estimation of the system states and parameters. In order to show the efficiency of the proposed scheme, a simulated 3-DOF robotic manipulator is used. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
Advanced information processing system: Authentication protocols for network communication

NASA Technical Reports Server (NTRS)

Harper, Richard E.; Adams, Stuart J.; Babikyan, Carol A.; Butler, Bryan P.; Clark, Anne L.; Lala, Jaynarayan H.

1994-01-01

In safety critical I/O and intercomputer communication networks, reliable message transmission is an important concern. Difficulties of communication and fault identification in networks arise primarily because the sender of a transmission cannot be identified with certainty, an intermediate node can corrupt a message without certainty of detection, and a babbling node cannot be identified and silenced without lengthy diagnosis and reconfiguration. Authentication protocols use digital signature techniques to verify the authenticity of messages with high probability. Such protocols appear to provide an efficient solution to many of these problems. The objective of this program is to develop, demonstrate, and evaluate intercomputer communication architectures which employ authentication. As a context for the evaluation, the authentication protocol-based communication concept was demonstrated under this program by hosting a real-time flight critical guidance, navigation and control algorithm on a distributed, heterogeneous, mixed redundancy system of workstations and embedded fault-tolerant computers.
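As a rough illustration of how an authentication tag lets a receiver detect both a corrupted payload and a misattributed sender, the sketch below uses a shared-key HMAC from the Python standard library. This is a stand-in and an assumption of the example: the report describes digital signature techniques, which are asymmetric, whereas HMAC requires sender and receiver to share a secret. The function names, node names, and message contents are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, sender: bytes, payload: bytes) -> bytes:
    """Compute an authentication tag that binds the sender ID to the payload."""
    # Length-prefix the sender ID so (sender, payload) pairs cannot be confused.
    message = len(sender).to_bytes(2, "big") + sender + payload
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, sender: bytes, payload: bytes, tag: bytes) -> bool:
    """Return True only if the tag matches this sender and this payload."""
    return hmac.compare_digest(sign(key, sender, payload), tag)


key = b"shared-secret-for-illustration-only"
tag = sign(key, b"node-A", b"pitch_cmd=2.5")

accepted = verify(key, b"node-A", b"pitch_cmd=2.5", tag)
corrupted = verify(key, b"node-A", b"pitch_cmd=9.5", tag)   # altered in transit
misattributed = verify(key, b"node-B", b"pitch_cmd=2.5", tag)  # wrong sender claimed
```

A relay node that alters the payload, or a node that claims another node's identity, produces a message whose tag no longer verifies, so the receiver can discard it without a lengthy diagnosis step.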
A distributed programming environment for Ada

NASA Technical Reports Server (NTRS)

Brennan, Peter; Mcdonnell, Tom; Mcfarland, Gregory; Timmins, Lawrence J.; Litke, John D.

1986-01-01

Despite considerable commercial exploitation of fault tolerance systems, significant and difficult research problems remain in such areas as fault detection and correction. A research project is described which constructs a distributed computing test bed for loosely coupled computers. The project is constructing a tool kit to support research into distributed control algorithms, including a distributed Ada compiler, distributed debugger, test harnesses, and environment monitors. The Ada compiler is being written in Ada and will implement distributed computing at the subsystem level. The design goal is to provide a variety of control mechanics for distributed programming while retaining total transparency at the code level.
Software fault-tolerance by design diversity DEDIX: A tool for experiments

NASA Technical Reports Server (NTRS)

Avizienis, A.; Gunningberg, P.; Kelly, J. P. J.; Lyu, R. T.; Strigini, L.; Traverse, P. J.; Tso, K. S.; Voges, U.

1986-01-01

The use of multiple versions of a computer program, independently designed from a common specification, to reduce the effects of an error is discussed. If these versions are designed by independent programming teams, it is expected that a fault in one version will not have the same behavior as any fault in the other versions. Since the errors in the output of the versions are different and uncorrelated, it is possible to run the versions concurrently, cross-check their results at prespecified points, and mask errors. A DEsign DIversity eXperiments (DEDIX) testbed was implemented to study the influence of common mode errors which can result in a failure of the entire system. The layered design of DEDIX and its decision algorithm are described.
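The cross-checking step described above can be sketched as a strict-majority voter over the outputs of independently written versions. This is a simplified illustration, not the DEDIX decision algorithm; `majority_vote`, `run_versions`, and the three toy versions are hypothetical names introduced here.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of versions, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def run_versions(versions: Sequence[Callable[[int], Hashable]], x: int):
    """Run every version on the same input and vote on the results."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            # A crashed version contributes a unique marker that can never
            # agree with any other output.
            outputs.append(object())
    return majority_vote(outputs)


# Three hypothetical versions of "absolute value"; the third has a design fault.
def version_a(x: int) -> int:
    return abs(x)


def version_b(x: int) -> int:
    return x if x >= 0 else -x


def version_c(x: int) -> int:
    return x  # faulty for negative inputs


masked = run_versions([version_a, version_b, version_c], -5)
common_mode = run_versions([version_c, version_c, version_a], -5)
```

With one faulty version the error is masked and `masked` is 5. When two versions share the same fault, the voter confidently returns the wrong value (`common_mode` is -5), which is the common-mode failure the testbed was built to study.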
Fault tolerant software modules for SIFT

NASA Technical Reports Server (NTRS)

Hecht, M.; Hecht, H.

1982-01-01

The implementation of software fault tolerance is investigated for critical modules of the Software Implemented Fault Tolerance (SIFT) operating system to support the computational and reliability requirements of advanced fly-by-wire transport aircraft. Fault tolerant designs generated for the error reporter and global executive are examined. A description of the alternate routines, implementation requirements, and software validation is included.
Fault tree models for fault tolerant hypercube multiprocessors

NASA Technical Reports Server (NTRS)

Boyd, Mark A.; Tuazon, Jezus O.

1991-01-01

Three candidate fault tolerant hypercube architectures are modeled, their reliability analyses are compared, and the resulting implications of these methods of incorporating fault tolerance into hypercube multiprocessors are discussed. In the course of performing the reliability analyses, the use of HARP and fault trees in modeling sequence dependent system behaviors is demonstrated.
DEPEND: A simulation-based environment for system level dependability analysis

NASA Technical Reports Server (NTRS)

Goswami, Kumar; Iyer, Ravishankar K.

1992-01-01

The design and evaluation of highly reliable computer systems is a complex issue. Designers mostly develop such systems based on prior knowledge and experience and occasionally from analytical evaluations of simplified designs. A simulation-based environment called DEPEND which is especially geared for the design and evaluation of fault-tolerant architectures is presented. DEPEND is unique in that it exploits the properties of object-oriented programming to provide a flexible framework with which a user can rapidly model and evaluate various fault-tolerant systems. The key features of the DEPEND environment are described, and its capabilities are illustrated with a detailed analysis of a real design. In particular, DEPEND is used to simulate the Unix based Tandem Integrity fault-tolerance and evaluate how well it handles near-coincident errors caused by correlated and latent faults. Issues such as memory scrubbing, re-integration policies, and workload dependent repair times which affect how the system handles near-coincident errors are also evaluated. Issues such as the method used by DEPEND to simulate error latency and the time acceleration technique that provides enormous simulation speed up are also discussed. Unlike any other simulation-based dependability studies, the use of these approaches and the accuracy of the simulation model are validated by comparing the results of the simulations, with measurements obtained from fault injection experiments conducted on a production Tandem Integrity machine.
Energy-efficient fault tolerance in multiprocessor real-time systems

NASA Astrophysics Data System (ADS)

Guo, Yifeng

The recent progress in multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications as well as industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4 to 8 processors are common, and it is expected that many-core systems with dozens of processing cores will be available in the near future. For such systems, in addition to the general temporal requirements common to all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, because tolerating faults (e.g., permanent/transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve the system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with an arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem where tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP). A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is investigated, where tasks' main copies are executed ASAP and backup copies ALAP, to reduce the overlapped execution of main and backup copies of the same task and thus reduce energy consumption. All proposed techniques are evaluated through extensive simulations and compared with other state-of-the-art approaches. The simulation results confirm that the proposed schemes can preserve the system reliability while still achieving substantial energy savings. Finally, for both SS and POED based Energy-Efficient Fault-Tolerant (EEFT) schemes, a series of recovery strategies is designed for the case where more than one (transient and permanent) fault needs to be tolerated.
Comparison result of inversion of gravity data of a fault by particle swarm optimization and Levenberg-Marquardt methods.

PubMed

Toushmalani, Reza

2013-01-01

The purpose of this study was to compare the performance of two methods for gravity inversion of a fault. The first method, particle swarm optimization (PSO), is a heuristic global optimization algorithm based on swarm intelligence; it originates from research on the flocking behavior of birds and fish. The second method, the Levenberg-Marquardt (LM) algorithm, is an approximation to the Newton method that is also used for training ANNs. The paper first discusses the gravity field of a fault, then describes the PSO and LM algorithms and presents their application to solving the inverse problem of a fault. Most importantly, the parameters for the algorithms are given for the individual tests. The inverse solution reveals that the fault model parameters agree quite well with the known results. Better agreement between the predicted model anomaly and the observed gravity anomaly was found with the PSO method than with the LM method.
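For readers unfamiliar with PSO, a minimal global-best variant is sketched below. It is a generic textbook form, not the configuration used in the study: the inertia and acceleration coefficients, swarm size, bounds, and the quadratic `toy_misfit` (standing in for a gravity-anomaly misfit) are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 30,
    n_iters: int = 200,
    inertia: float = 0.7,
    c_personal: float = 1.5,
    c_global: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO; returns (best position, best objective value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    inertia * vel[i][d]
                    + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                    + c_global * r2 * (g_pos[d] - pos[i][d])
                )
                lo, hi = bounds[d]
                # Keep the particle inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_misfit(p: Sequence[float]) -> float:
    # Hypothetical misfit with its minimum at (3, -1).
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


solution, misfit = pso_minimize(toy_misfit, [(-10.0, 10.0), (-10.0, 10.0)])
```

Unlike Levenberg-Marquardt, this search needs no derivatives and no starting guess near the solution, only an objective function and parameter bounds, which is one reason swarm methods are attractive for inversion problems with several local minima.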
Research on rolling element bearing fault diagnosis based on genetic algorithm matching pursuit

NASA Astrophysics Data System (ADS)

Rong, R. W.; Ming, T. F.

2017-12-01

The matching pursuit algorithm is applied to rolling bearing fault diagnosis and, to overcome its slow computation speed, is improved in two respects: the construction of the dictionary and the search for atoms. Specifically, the Gabor function, which reflects time-frequency localization characteristics well, is used to construct the dictionary, and a genetic algorithm is used to speed up the search. A time-frequency analysis method based on the genetic algorithm matching pursuit (GAMP) algorithm is proposed. The way to set property parameters to improve the decomposition results is studied. Simulation and experimental results show that the weak fault features of rolling bearings can be extracted effectively by the proposed method and that the computation speed increases markedly.
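The core matching pursuit loop is short enough to sketch: at each step the atom most correlated with the residual is selected and its contribution subtracted. The sketch below makes two simplifying assumptions that differ from the paper: the dictionary consists of unit-norm sampled cosines rather than Gabor atoms, and the best atom is found by exhaustive search rather than by a genetic algorithm. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def matching_pursuit(
    signal: Sequence[float],
    atoms: Sequence[Sequence[float]],
    n_iters: int,
) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Greedy decomposition; returns [(atom index, coefficient)] and the residual."""
    residual = list(signal)
    picks: List[Tuple[int, float]] = []
    for _ in range(n_iters):
        # Exhaustive search over the dictionary (the step GAMP accelerates).
        scores = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coeff = scores[k]
        residual = [r - coeff * a for r, a in zip(residual, atoms[k])]
        picks.append((k, coeff))
    return picks, residual


# Toy dictionary: unit-norm cosines with 1 to 4 cycles over 64 samples.
N = 64
atoms = [
    normalize([math.cos(2.0 * math.pi * k * n / N) for n in range(N)])
    for k in range(1, 5)
]
# Synthetic signal built from two atoms with known weights.
signal = [3.0 * a + 1.0 * b for a, b in zip(atoms[1], atoms[3])]

picks, residual = matching_pursuit(signal, atoms, n_iters=2)
residual_energy = dot(residual, residual)
```

Each iteration scans the whole dictionary, so the cost grows with the number of atoms; with a large Gabor dictionary that scan dominates the run time, which motivates replacing it with a guided search.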
Specification, Synthesis, and Verification of Software-based Control Protocols for Fault-Tolerant Space Systems

DTIC Science & Technology

2016-08-16

Force Research Laboratory Space Vehicles Directorate, AFRL/RVSV, 3550 Aberdeen Ave SE, Kirtland AFB, NM 87117-5776 ... Official Record Copy: AFRL/RVSV/Richard S. Erwin ... AFRL-RV-PS-TR-2016-0112 ... SPECIFICATION, SYNTHESIS, AND VERIFICATION OF SOFTWARE-BASED CONTROL PROTOCOLS FOR FAULT-TOLERANT
Research, Development and Testing of a Fault-Tolerant FPGA-Based Sequencer for CubeSat Launching Applications

DTIC Science & Technology

2013-03-01

amounts of time and effort to implement. Future testing with commercial, fault-tolerant synthesis software, under a radiation environment, will yield ...initial viewpoint of the author is to take the flash-based FPGA route. This will yield a simple, reconfigurable circuit while providing the added...structure seen in Figure 30. Each of these full adder blocks were replaced in subsequent iterations to yield proper comparison with this baseline
Design and Experimental Validation for Direct-Drive Fault-Tolerant Permanent-Magnet Vernier Machines

PubMed Central

Liu, Guohai; Yang, Junqin; Chen, Ming; Chen, Qian

2014-01-01

A fault-tolerant permanent-magnet vernier (FT-PMV) machine is designed for direct-drive applications, incorporating the merits of high torque density and high reliability. Based on the so-called magnetic gearing effect, PMV machines have the ability of high torque density by introducing the flux-modulation poles (FMPs). This paper investigates the fault-tolerant characteristic of PMV machines and provides a design method, which is able to not only meet the fault-tolerant requirements but also keep the ability of high torque density. The operation principle of the proposed machine has been analyzed. The design process and optimization are presented specifically, such as the combination of slots and poles, the winding distribution, and the dimensions of PMs and teeth. By using the time-stepping finite element method (TS-FEM), the machine performances are evaluated. Finally, the FT-PMV machine is manufactured, and the experimental results are presented to validate the theoretical analysis. PMID:25045729
Reliability modeling of fault-tolerant computer based systems

NASA Technical Reports Server (NTRS)

Bavuso, Salvatore J.

1987-01-01

Digital fault-tolerant computer-based systems have become commonplace in military and commercial avionics. These systems hold the promise of increased availability, reliability, and maintainability over conventional analog-based systems through the application of replicated digital computers arranged in fault-tolerant configurations. Three tightly coupled factors of paramount importance, ultimately determining the viability of these systems, are reliability, safety, and profitability. Reliability, the major driver, affects virtually every aspect of design, packaging, and field operations, and eventually produces profit for commercial applications or increased national security. However, the utilization of digital computer systems makes the task of producing a credible reliability assessment a formidable one for the reliability engineer. The root of the problem lies in the digital computer's unique adaptability to changing requirements, computational power, and ability to test itself efficiently. Addressed here are the nuances of modeling the reliability of systems with large state sizes, in the Markov sense, which result from systems based on replicated redundant hardware, and the modeling of factors which can reduce reliability without concomitant depletion of hardware. Advanced fault-handling models are described and methods of acquiring and measuring parameters for these models are delineated.
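A small worked example may help make the Markov view concrete. The sketch below models triple modular redundancy (TMR) as a three-state chain (three good units, two good units, failed) with a constant per-unit failure rate, and compares numerical integration of the chain with the standard closed form R(t) = 3e^(-2λt) - 2e^(-3λt). A perfect voter and perfect fault coverage are assumed, which is exactly the kind of idealization the abstract warns about; the rate and mission times are hypothetical.

```python
import math


def simplex_reliability(lam: float, t: float) -> float:
    """Reliability of a single unit with constant failure rate lam."""
    return math.exp(-lam * t)


def tmr_reliability(lam: float, t: float) -> float:
    """Closed-form TMR reliability (perfect voter, perfect coverage assumed)."""
    return 3.0 * math.exp(-2.0 * lam * t) - 2.0 * math.exp(-3.0 * lam * t)


def tmr_reliability_markov(lam: float, t: float, steps: int = 10_000) -> float:
    """Euler integration of the chain: 3 good -> 2 good -> failed."""
    p3, p2 = 1.0, 0.0  # state probabilities at time zero
    dt = t / steps
    for _ in range(steps):
        dp3 = -3.0 * lam * p3                   # any of 3 units fails
        dp2 = 3.0 * lam * p3 - 2.0 * lam * p2   # either remaining unit fails
        p3 += dp3 * dt
        p2 += dp2 * dt
    return p3 + p2  # system works while at least two units are good


lam = 1e-3  # hypothetical failures per hour
short_closed = tmr_reliability(lam, 100.0)
short_markov = tmr_reliability_markov(lam, 100.0)
short_simplex = simplex_reliability(lam, 100.0)
long_closed = tmr_reliability(lam, 1000.0)
long_simplex = simplex_reliability(lam, 1000.0)
```

For a short mission (λt = 0.1) TMR is more reliable than a single unit, while for a long mission without repair (λt = 1) it is less reliable, because two of three units must survive. The state count here is tiny; with many replicated components and imperfect coverage the chain grows rapidly, which is the state-size problem the abstract refers to.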
An Ontology for Identifying Cyber Intrusion Induced Faults in Process Control Systems

NASA Astrophysics Data System (ADS)

Hieb, Jeffrey; Graham, James; Guan, Jian

This paper presents an ontological framework that permits formal representations of process control systems, including elements of the process being controlled and the control system itself. A fault diagnosis algorithm based on the ontological model is also presented. The algorithm can identify traditional process elements as well as control system elements (e.g., IP network and SCADA protocol) as fault sources. When these elements are identified as a likely fault source, the possibility exists that the process fault is induced by a cyber intrusion. A laboratory-scale distillation column is used to illustrate the model and the algorithm. Coupled with a well-defined statistical process model, this fault diagnosis approach provides cyber security enhanced fault diagnosis information to plant operators and can help identify that a cyber attack is underway before a major process failure is experienced.
Fault tolerant multi-sensor fusion based on the information gain

NASA Astrophysics Data System (ADS)

Hage, Joelle Al; El Najjar, Maan E.; Pomorski, Denis

2017-01-01

In the last decade, multi-robot systems have been used in several applications, for example the military, intervention in areas presenting danger to human life, the management of natural disasters, environmental monitoring, exploration, and agriculture. The integrity of the localization of the robots must be ensured so that they can achieve their mission in the best conditions. Robots are equipped with proprioceptive sensors (encoders, gyroscope) and exteroceptive sensors (Kinect). However, these sensors can be affected by various fault types that can be assimilated to erroneous measurements, biases, outliers, drifts, etc. In the absence of a sensor fault diagnosis step, the integrity and the continuity of the localization are affected. In this work, we present a multi-sensor fusion approach with Fault Detection and Exclusion (FDE) based on information theory. In this context, we are interested in the information gain given by an observation, which may be relevant when dealing with the fault tolerance aspect. Moreover, threshold optimization based on the quantity of information given by a decision on the true hypothesis is highlighted.
Layered clustering multi-fault diagnosis for hydraulic piston pump

NASA Astrophysics Data System (ADS)

Du, Jun; Wang, Shaoping; Zhang, Haiyan

2013-04-01

Efficient diagnosis is very important for improving the reliability and performance of aircraft hydraulic piston pumps, and it is one of the key technologies in prognostic and health management systems. In practice, due to the harsh working environment and heavy working loads, multiple faults of an aircraft hydraulic pump may occur simultaneously after long-time operation. However, most existing diagnosis methods can only distinguish pump faults that occur individually. Therefore, a new method needs to be developed to realize effective diagnosis of simultaneous multiple faults on aircraft hydraulic pumps. In this paper, a new method based on the layered clustering algorithm is proposed to diagnose multiple faults of an aircraft hydraulic pump that occur simultaneously. Intensive failure mechanism analyses of the five main types of faults are carried out, and based on these analyses the optimal combination and layout of diagnostic sensors is attained. A three-layered diagnosis reasoning engine is designed according to the faults' risk priority number and the characteristics of different fault feature extraction methods. The most serious failures are first distinguished with individual signal processing. For the desultory faults, i.e., swash plate eccentricity and incremental clearance increases between piston and slipper, a clustering diagnosis algorithm based on the statistical average relative power difference (ARPD) is proposed. By effectively enhancing the fault features of these two faults, the ARPDs calculated from vibration signals are employed to complete the hypothesis testing. The ARPDs of the different faults follow different probability distributions. Experimental results demonstrate that, compared with the classical fast Fourier transform-based spectrum diagnosis method, the proposed algorithm can diagnose multiple simultaneous faults with higher precision and reliability.
An uncertainty-based distributed fault detection mechanism for wireless sensor networks.

PubMed

Yang, Yang; Gao, Zhipeng; Zhou, Hang; Qiu, Xuesong

2014-04-25

Exchanging too many messages for fault detection not only degrades the network quality of service but also places a huge burden on the limited energy of sensors. Therefore, we propose an uncertainty-based distributed fault detection mechanism for wireless sensor networks that relies on the aided judgment of neighbors. The algorithm considers the serious influence of sensing measurement loss and therefore uses Markov decision processes for filling in missing data. Most importantly, fault misjudgments caused by uncertainty conditions are the main drawback of traditional distributed fault detection mechanisms. We draw on evidence fusion rules based on information entropy theory and the degree of disagreement function to increase the accuracy of fault detection. Simulation results demonstrate that our algorithm can effectively reduce the communication energy overhead due to message exchanges and provide a higher detection accuracy ratio.
Research Supporting Satellite Communications Technology

NASA Technical Reports Server (NTRS)

Horan, Stephen; Lyman, Raphael

2005-01-01

This report describes the second year of research effort under the grant Research Supporting Satellite Communications Technology. The research program consists of two major projects: Fault Tolerant Link Establishment and the design of an Auto-Configurable Receiver. The Fault Tolerant Link Establishment protocol is being developed to help the designers of satellite clusters manage inter-satellite communications. During this second year, the basic protocol design was validated with an extensive testing program. After this testing was completed, a channel error model was added to the protocol to permit the effects of channel errors to be measured. This error generation was used to test the effects of channel errors on Heartbeat and Token message passing. The C-language source code for the protocol modules was delivered to Goddard Space Flight Center (GSFC) for integration with the GSFC testbed. The need for a receiver autoconfiguration capability arises when a satellite-to-ground transmission is interrupted by an unexpected event: the satellite transponder may reset to an unknown state and begin transmitting in a new mode. During Year 2, we completed testing of these algorithms when noise-induced bit errors were introduced. We also developed and tested an algorithm for estimating the data rate, assuming an NRZ-formatted signal corrupted with additive white Gaussian noise, and we took initial steps in integrating both algorithms into the SDR test bed at GSFC.
Fault tolerant architectures for integrated aircraft electronics systems, task 2

NASA Technical Reports Server (NTRS)

Levitt, K. N.; Melliar-Smith, P. M.; Schwartz, R. L.

1984-01-01

The architectural basis for an advanced fault tolerant on-board computer to succeed the current generation of fault tolerant computers is examined. The network error tolerant system architecture is studied with particular attention to intercluster configurations and communication protocols, and to refined reliability estimates. The diagnosis of faults, so that appropriate choices for reconfiguration can be made, is discussed. The analysis relates particularly to the recognition of transient faults in a system with tasks at many levels of priority. The demand driven data-flow architecture, which appears to have possible application in fault tolerant systems, is described, and work investigating the feasibility of automatic generation of aircraft flight control programs from abstract specifications is reported.

Model-based fault detection and isolation for intermittently active faults with application to motion-based thruster fault detection and isolation for spacecraft

NASA Technical Reports Server (NTRS)

Wilson, Edward (Inventor)

2008-01-01

The present invention is a method for detecting and isolating fault modes in a system having a model describing its behavior and regularly sampled measurements. The models are used to calculate past and present deviations from measurements that would result with no faults present, as well as with one or more potential fault modes present. Algorithms that calculate and store these deviations, along with memory of when said faults, if present, would have an effect on the said actual measurements, are used to detect when a fault is present. Related algorithms are used to exonerate false fault modes and finally to isolate the true fault mode. This invention is presented with application to detection and isolation of thruster faults for a thruster-controlled spacecraft. As a supporting aspect of the invention, a novel, effective, and efficient filtering method for estimating the derivative of a noisy signal is presented.
Fault-tolerance of a neural network solving the traveling salesman problem

NASA Technical Reports Server (NTRS)

Protzel, P.; Palumbo, D.; Arras, M.

1989-01-01

This study presents the results of a fault-injection experiment that simulates a neural network solving the Traveling Salesman Problem (TSP). The network is based on a modified version of Hopfield's and Tank's original method. We define a performance characteristic for the TSP that allows an overall assessment of the solution quality for different city-distributions and problem sizes. Five different 10-, 20-, and 30-city cases are used for the injection of up to 13 simultaneous stuck-at-0 and stuck-at-1 faults. The results of more than 4000 simulation-runs show the extreme fault-tolerance of the network, especially with respect to stuck-at-0 faults. One possible explanation for the overall surprising result is the redundancy of the problem representation.
Demonstration of qubit operations below a rigorous fault tolerance threshold with gate set tomography

DOE PAGES

Blume-Kohout, Robin; Gamble, John King; Nielsen, Erik; ...

2017-02-15

Quantum information processors promise fast algorithms for problems inaccessible to classical computers. But since qubits are noisy and error-prone, they will depend on fault-tolerant quantum error correction (FTQEC) to compute reliably. Quantum error correction can protect against general noise if, and only if, the error in each physical qubit operation is smaller than a certain threshold. The threshold for general errors is quantified by their diamond norm. Until now, qubits have been assessed primarily by randomized benchmarking, which reports a different error rate that is not sensitive to all errors, and cannot be compared directly to diamond norm thresholds. Finally, we use gate set tomography to completely characterize operations on a trapped-Yb+-ion qubit and demonstrate with greater than 95% confidence that they satisfy a rigorous threshold for FTQEC (diamond norm ≤ 6.7 × 10⁻⁴).
Demonstration of qubit operations below a rigorous fault tolerance threshold with gate set tomography

PubMed Central

Blume-Kohout, Robin; Gamble, John King; Nielsen, Erik; Rudinger, Kenneth; Mizrahi, Jonathan; Fortier, Kevin; Maunz, Peter

2017-01-01

Quantum information processors promise fast algorithms for problems inaccessible to classical computers. But since qubits are noisy and error-prone, they will depend on fault-tolerant quantum error correction (FTQEC) to compute reliably. Quantum error correction can protect against general noise if—and only if—the error in each physical qubit operation is smaller than a certain threshold. The threshold for general errors is quantified by their diamond norm. Until now, qubits have been assessed primarily by randomized benchmarking, which reports a different error rate that is not sensitive to all errors, and cannot be compared directly to diamond norm thresholds. Here we use gate set tomography to completely characterize operations on a trapped-Yb+-ion qubit and demonstrate with greater than 95% confidence that they satisfy a rigorous threshold for FTQEC (diamond norm ≤6.7 × 10−4). PMID:28198466
Demonstration of qubit operations below a rigorous fault tolerance threshold with gate set tomography

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blume-Kohout, Robin; Gamble, John King; Nielsen, Erik

Quantum information processors promise fast algorithms for problems inaccessible to classical computers. But since qubits are noisy and error-prone, they will depend on fault-tolerant quantum error correction (FTQEC) to compute reliably. Quantum error correction can protect against general noise if, and only if, the error in each physical qubit operation is smaller than a certain threshold. The threshold for general errors is quantified by their diamond norm. Until now, qubits have been assessed primarily by randomized benchmarking, which reports a different error rate that is not sensitive to all errors, and cannot be compared directly to diamond norm thresholds. Finally, we use gate set tomography to completely characterize operations on a trapped-Yb+-ion qubit and demonstrate with greater than 95% confidence that they satisfy a rigorous threshold for FTQEC (diamond norm ≤6.7 × 10−4).
Rapid recovery from transient faults in the fault-tolerant processor with fault-tolerant shared memory

NASA Technical Reports Server (NTRS)

Harper, Richard E.; Butler, Bryan P.

1990-01-01

The Draper fault-tolerant processor with fault-tolerant shared memory (FTP/FTSM), which is designed to allow application tasks to continue execution during the memory alignment process, is described. Processor performance is not affected by memory alignment. In addition, the FTP/FTSM incorporates a hardware scrubber device to perform the memory alignment quickly during unused memory access cycles. The FTP/FTSM architecture is described, followed by an estimate of the time required for channel reintegration.
Parallel and distributed computation for fault-tolerant object recognition

NASA Technical Reports Server (NTRS)

Wechsler, Harry

1988-01-01

The distributed associative memory (DAM) model is suggested for distributed and fault-tolerant computation as it relates to object recognition tasks. The fault-tolerance is with respect to geometrical distortions (scale and rotation), noisy inputs, occlusion/overlap, and memory faults. An experimental system was developed for fault-tolerant structure recognition which shows the feasibility of such an approach. The approach is further extended to the problem of multisensory data integration and applied successfully to the recognition of colored polyhedral objects.
Common spaceborne multicomputer operating system and development environment

NASA Technical Reports Server (NTRS)

Craymer, L. G.; Lewis, B. F.; Hayes, P. J.; Jones, R. L.

1994-01-01

A preliminary technical specification for a multicomputer operating system is developed. The operating system is targeted for spaceborne flight missions and provides a broad range of real-time functionality, dynamic remote code-patching capability, and system fault tolerance and long-term survivability features. Dataflow concepts are used for representing application algorithms. Functional features are included to ensure real-time predictability for a class of algorithms which require data-driven execution on an iterative steady state basis. The development environment supports the development of algorithm code, design of control parameters, performance analysis, simulation of real-time dataflow applications, and compiling and downloading of the resulting application.
Implementing a C++ Version of the Joint Seismic-Geodetic Algorithm for Finite-Fault Detection and Slip Inversion for Earthquake Early Warning

NASA Astrophysics Data System (ADS)

Smith, D. E.; Felizardo, C.; Minson, S. E.; Boese, M.; Langbein, J. O.; Guillemot, C.; Murray, J. R.

2015-12-01

The earthquake early warning (EEW) systems in California and elsewhere can greatly benefit from algorithms that generate estimates of finite-fault parameters. These estimates could significantly improve real-time shaking calculations and yield important information for immediate disaster response. Minson et al. (2015) determined that combining FinDer's seismic-based algorithm (Böse et al., 2012) with BEFORES' geodetic-based algorithm (Minson et al., 2014) yields a more robust and informative joint solution than using either algorithm alone. FinDer examines the distribution of peak ground accelerations from seismic stations and determines the best finite-fault extent and strike from template matching. BEFORES employs a Bayesian framework to search for the best slip inversion over all possible fault geometries in terms of strike and dip. Using FinDer and BEFORES together generates estimates of finite-fault extent, strike, dip, preferred slip, and magnitude. To yield the quickest, most flexible, and open-source version of the joint algorithm, we translated BEFORES and FinDer from Matlab into C++. We are now developing a C++ Application Protocol Interface for these two algorithms to be connected to the seismic and geodetic data flowing from the EEW system. The interface that is being developed will also enable communication between the two algorithms to generate the joint solution of finite-fault parameters. Once this interface is developed and implemented, the next step will be to run test seismic and geodetic data through the system via the Earthworm module, Tank Player. This will allow us to examine algorithm performance on simulated data and past real events.
Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 1: FTMP principles of operation

NASA Technical Reports Server (NTRS)

Smith, T. B., Jr.; Lala, J. H.

1983-01-01

The basic organization of the fault tolerant multiprocessor (FTMP) is that of a general purpose homogeneous multiprocessor. Three processors operate on a shared system (memory and I/O) bus. Replication and tight synchronization of all elements and hardware voting are employed to detect and correct any single fault. Reconfiguration is then employed to repair a fault. Multiple faults may be tolerated as a sequence of single faults with repair between fault occurrences.
The Design and Semi-Physical Simulation Test of Fault-Tolerant Controller for Aero Engine

NASA Astrophysics Data System (ADS)

Liu, Yuan; Zhang, Xin; Zhang, Tianhong

2017-11-01

A new fault-tolerant control method for aero engines is proposed, which can accurately diagnose sensor faults with Kalman filter banks and reconstruct the signal with a real-time on-board adaptive model that combines a simplified real-time model with an improved Kalman filter. In order to verify the feasibility of the proposed method, a semi-physical simulation experiment has been carried out. Besides the real I/O interfaces, controller hardware and the virtual plant model, the semi-physical simulation system also contains a real fuel system. Compared with hardware-in-the-loop (HIL) simulation, the semi-physical simulation system has a higher degree of confidence. In order to meet the needs of semi-physical simulation, a rapid prototyping controller with fault-tolerant control ability, based on the NI CompactRIO platform, is designed and verified on the semi-physical simulation test platform. The results show that the controller can control the aero engine safely and reliably, with little influence on controller performance in the event of a sensor fault.
Advanced reliability modeling of fault-tolerant computer-based systems

NASA Technical Reports Server (NTRS)

Bavuso, S. J.

1982-01-01

Two methodologies for the reliability assessment of fault tolerant digital computer based systems are discussed. The computer-aided reliability estimation 3 (CARE 3) and gate logic software simulation (GLOSS) are assessment technologies that were developed to mitigate a serious weakness in the design and evaluation process of ultrareliable digital systems. The weak link is based on the unavailability of a sufficiently powerful modeling technique for comparing the stochastic attributes of one system against others. Some of the more interesting attributes are reliability, system survival, safety, and mission success.
Indirect adaptive fuzzy fault-tolerant tracking control for MIMO nonlinear systems with actuator and sensor failures.

PubMed

Bounemeur, Abdelhamid; Chemachema, Mohamed; Essounbouli, Najib

2018-05-10

In this paper, an active fuzzy fault tolerant tracking control (AFFTTC) scheme is developed for a class of multi-input multi-output (MIMO) unknown nonlinear systems in the presence of unknown actuator faults, sensor failures and external disturbance. The developed control scheme deals with four kinds of faults for both sensors and actuators. The bias, drift, and loss of accuracy additive faults are considered along with the loss of effectiveness multiplicative fault. A fuzzy adaptive controller based on back-stepping design is developed to deal with actuator failures and unknown system dynamics. However, an additional robust control term is added to deal with sensor faults, approximation errors, and external disturbances. Lyapunov theory is used to prove the stability of the closed loop system. Numerical simulations on a quadrotor are presented to show the effectiveness of the proposed approach. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.
An Efficient Silent Data Corruption Detection Method with Error-Feedback Control and Even Sampling for HPC Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Di, Sheng; Berrocal, Eduardo; Cappello, Franck

The silent data corruption (SDC) problem is attracting increasing attention because it is expected to have a great impact on exascale HPC applications. SDC faults are hazardous in that they pass unnoticed by hardware and can lead to wrong computation results. In this work, we formulate SDC detection as a runtime one-step-ahead prediction method, leveraging multiple linear prediction methods in order to improve the detection results. The contributions are twofold: (1) we propose an error feedback control model that can reduce the prediction errors for different linear prediction methods, and (2) we propose a spatial-data-based even-sampling method to minimize the detection overheads (including memory and computation cost). We implement our algorithms in the fault tolerance interface, a fault tolerance library with multiple checkpoint levels, such that users can conveniently protect their HPC applications against both SDC errors and fail-stop errors. We evaluate our approach by using large-scale traces from well-known, large-scale HPC applications, as well as by running those HPC applications on a real cluster environment. Experiments show that our error feedback control model can improve detection sensitivity by 34-189% for bit-flip memory errors injected with the bit positions in the range [20,30], without any degradation on detection accuracy. Furthermore, memory size can be reduced by 33% with our spatial-data even-sampling method, with only a slight and graceful degradation in the detection sensitivity.
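The one-step-ahead prediction idea in this abstract can be illustrated with a minimal sketch. It assumes a single linear-extrapolation predictor and a fixed threshold, and it omits the error-feedback control and even-sampling components, whose details the abstract does not give. The function name `detect_sdc`, the threshold, and the choice to substitute the prediction for a suspect value are all assumptions made for illustration.

```python
def detect_sdc(series, threshold):
    """Flag indices whose value deviates from a one-step-ahead linear prediction.

    Hypothetical sketch: the prediction for step t is the straight-line
    extrapolation of the two previous trusted values.
    """
    trusted = list(series[:2])  # assumption: the first two samples are clean
    suspects = []
    for t in range(2, len(series)):
        predicted = 2 * trusted[-1] - trusted[-2]
        if abs(series[t] - predicted) > threshold:
            suspects.append(t)
            # Assumption: carry the prediction forward so that one corrupted
            # sample does not also distort the next two predictions.
            trusted.append(predicted)
        else:
            trusted.append(series[t])
    return suspects
```

Calling `detect_sdc(values, threshold)` on a smoothly varying time series returns the positions that look like corrupted samples; a real detector would need a threshold tuned to the data's normal step-to-step variation.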
Validation Methods for Fault-Tolerant avionics and control systems, working group meeting 1

NASA Technical Reports Server (NTRS)

1979-01-01

The proceedings of the first working group meeting on validation methods for fault tolerant computer design are presented. The state of the art in fault tolerant computer validation was examined in order to provide a framework for future discussions concerning research issues for the validation of fault tolerant avionics and flight control systems. The development of positions concerning critical aspects of the validation process is given.
Dual-quaternion based fault-tolerant control for spacecraft formation flying with finite-time convergence.

PubMed

Dong, Hongyang; Hu, Qinglei; Ma, Guangfu

2016-03-01

Results of a study on developing a control system for spacecraft formation proximity operations between a target and a chaser are presented. In particular, a coupled model using dual quaternions is employed to describe the proximity problem of spacecraft formation, and a nonlinear adaptive fault-tolerant feedback control law is developed to enable the chaser spacecraft to track the position and attitude of the target even when its actuators are faulty. The multiple-task capability of the proposed control system is further demonstrated in the presence of disturbances and parametric uncertainties. In addition, the practical finite-time stability of the closed-loop system is guaranteed theoretically under the designed control law. A numerical simulation of the proposed method is presented to demonstrate its advantages with respect to interference suppression, fast tracking, fault tolerance and practical finite-time stability. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Experimental Robot Position Sensor Fault Tolerance Using Accelerometers and Joint Torque Sensors

NASA Technical Reports Server (NTRS)

Aldridge, Hal A.; Juang, Jer-Nan

1997-01-01

Robot systems in critical applications, such as those in space and nuclear environments, must be able to operate during component failure to complete important tasks. One failure mode that has received little attention is the failure of joint position sensors. Current fault tolerant designs require the addition of directly redundant position sensors which can affect joint design. The proposed method uses joint torque sensors found in most existing advanced robot designs along with easily locatable, lightweight accelerometers to provide a joint position sensor fault recovery mode. This mode uses the torque sensors along with a virtual passive control law for stability and accelerometers for joint position information. Two methods for conversion from Cartesian acceleration to joint position based on robot kinematics, not integration, are presented. The fault tolerant control method was tested on several joints of a laboratory robot. The controllers performed well with noisy, biased data and a model with uncertain parameters.
An Uncertainty-Based Distributed Fault Detection Mechanism for Wireless Sensor Networks

PubMed Central

Yang, Yang; Gao, Zhipeng; Zhou, Hang; Qiu, Xuesong

2014-01-01

Exchanging too many messages for fault detection not only degrades the network quality of service but also places a huge burden on the limited energy of sensors. Therefore, we propose an uncertainty-based distributed fault detection mechanism, aided by the judgment of neighbors, for wireless sensor networks. The algorithm considers the serious influence of sensing measurement loss and therefore uses Markov decision processes for filling in missing data. Most important of all, fault misjudgments caused by uncertainty conditions are the main drawbacks of traditional distributed fault detection mechanisms. We draw on the experience of evidence fusion rules based on information entropy theory and the degree of disagreement function to increase the accuracy of fault detection. Simulation results demonstrate that our algorithm can effectively reduce the communication energy overhead due to message exchanges and provide a higher detection accuracy ratio. PMID:24776937
Advanced cloud fault tolerance system

NASA Astrophysics Data System (ADS)

Sumangali, K.; Benny, Niketa

2017-11-01

Cloud computing has become a prevalent on-demand service on the internet to store, manage and process data. A pitfall that accompanies cloud computing is the failures that can be encountered in the cloud. To overcome these failures, we require a fault tolerance mechanism to abstract faults from users. We have proposed a fault tolerant architecture, which is a combination of proactive and reactive fault tolerance. This architecture essentially increases the reliability and the availability of the cloud. In the future, we would like to compare evaluations of our proposed architecture with existing architectures and further improve it.
Verifiable fault tolerance in measurement-based quantum computation

NASA Astrophysics Data System (ADS)

Fujii, Keisuke; Hayashi, Masahito

2017-09-01

Quantum systems, in general, cannot be simulated efficiently by a classical computer, and hence are useful for solving certain mathematical problems and simulating quantum many-body systems. This also implies, unfortunately, that verification of the output of the quantum systems is not so trivial, since predicting the output is exponentially hard. As another problem, the quantum system is very sensitive to noise and thus needs error correction. Here, we propose a framework for verification of the output of fault-tolerant quantum computation in a measurement-based model. In contrast to existing analyses on fault tolerance, we do not assume any noise model on the resource state, but an arbitrary resource state is tested by using only single-qubit measurements to verify whether or not the output of measurement-based quantum computation on it is correct. Verifiability is provided by a constant time repetition of the original measurement-based quantum computation in appropriate measurement bases. Since full characterization of quantum noise is exponentially hard for large-scale quantum computing systems, our framework provides an efficient way to practically verify the experimental quantum error correction.

ROBUS-2: A Fault-Tolerant Broadcast Communication System

NASA Technical Reports Server (NTRS)

Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.

2005-01-01

The Reliable Optical Bus (ROBUS) is the core communication system of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER), a general-purpose fault-tolerant integrated modular architecture currently under development at NASA Langley Research Center. The ROBUS is a time-division multiple access (TDMA) broadcast communication system with medium access control by means of time-indexed communication schedule. ROBUS-2 is a developmental version of the ROBUS providing guaranteed fault-tolerant services to the attached processing elements (PEs), in the presence of a bounded number of faults. These services include message broadcast (Byzantine Agreement), dynamic communication schedule update, clock synchronization, and distributed diagnosis (group membership). The ROBUS also features fault-tolerant startup and restart capabilities. ROBUS-2 is tolerant to internal as well as PE faults, and incorporates a dynamic self-reconfiguration capability driven by the internal diagnostic system. This version of the ROBUS is intended for laboratory experimentation and demonstrations of the capability to reintegrate failed nodes, dynamically update the communication schedule, and tolerate and recover from correlated transient faults.
High speed, long distance, data transmission multiplexing circuit

DOEpatents

Mariotti, Razvan

1991-01-01

A high speed serial data transmission multiplexing circuit is described, which is operable to accurately transmit data over long distances (up to 3 km), and to multiplex, select and continuously display real time analog signals in a bandwidth from DC to 100 kHz. The circuit is made fault tolerant by use of a programmable flywheel algorithm, which enables the circuit to tolerate one transmission error before losing synchronization of the transmitted frames of data. A method of encoding and framing captured and transmitted data is used which has a low overhead and prevents some particular transmitted data patterns from locking an included detector/decoder circuit.
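The flywheel behaviour mentioned in this abstract, holding frame synchronization through a limited number of errors, can be sketched as a small state machine. This is an illustrative assumption about how such a mechanism typically works, not the patented circuit's logic; the class name `FlywheelSync`, the `tolerated_errors` parameter, and the choice to count consecutive errors are all hypothetical.

```python
class FlywheelSync:
    """Hold frame sync through a bounded run of bad sync words (sketch)."""

    def __init__(self, tolerated_errors=1):
        self.tolerated_errors = tolerated_errors  # the "programmable" part
        self.misses = 0
        self.in_sync = False

    def update(self, sync_word_ok):
        """Feed the result of one frame's sync-word check; return sync state."""
        if sync_word_ok:
            # A good sync word acquires (or confirms) lock and clears the count.
            self.misses = 0
            self.in_sync = True
        elif self.in_sync:
            # A bad sync word is tolerated until the miss budget is exceeded.
            self.misses += 1
            if self.misses > self.tolerated_errors:
                self.in_sync = False
                self.misses = 0
        return self.in_sync
```

With `tolerated_errors=1`, a single corrupted frame leaves the receiver locked, while a second consecutive corrupted frame drops synchronization until the next good sync word arrives.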
Geodetic Finite-Fault-based Earthquake Early Warning Performance for Great Earthquakes Worldwide

NASA Astrophysics Data System (ADS)

Ruhl, C. J.; Melgar, D.; Grapenthin, R.; Allen, R. M.

2017-12-01

GNSS-based earthquake early warning (EEW) algorithms estimate fault-finiteness and unsaturated moment magnitude for the largest, most damaging earthquakes. Because large events are infrequent, algorithms are not regularly exercised and insufficiently tested on few available datasets. The Geodetic Alarm System (G-larmS) is a GNSS-based finite-fault algorithm developed as part of the ShakeAlert EEW system in the western US. Performance evaluations using synthetic earthquakes offshore Cascadia showed that G-larmS satisfactorily recovers magnitude and fault length, providing useful alerts 30-40 s after origin time and timely warnings of ground motion for onshore urban areas. An end-to-end test of the ShakeAlert system demonstrated the need for GNSS data to accurately estimate ground motions in real-time. We replay real data from several subduction-zone earthquakes worldwide to demonstrate the value of GNSS-based EEW for the largest, most damaging events. We compare predicted ground acceleration (PGA) from first-alert-solutions with those recorded in major urban areas. In addition, where applicable, we compare observed tsunami heights to those predicted from the G-larmS solutions. We show that finite-fault inversion based on GNSS-data is essential to achieving the goals of EEW.
Application of majority voting and consensus voting algorithms in N-version software

NASA Astrophysics Data System (ADS)

Tsarev, R. Yu; Durmuş, M. S.; Üstoglu, I.; Morozov, V. A.

2018-05-01

N-version programming is one of the most common techniques used to improve the reliability of software by building in fault tolerance and redundancy and by decreasing common cause failures. N different equivalent software versions are developed by N different and isolated workgroups working from the same software specifications. The versions solve the same task and return results that have to be compared to determine the correct result. Decisions of N different versions are evaluated by a voting algorithm, the so-called voter. In this paper, two of the most commonly used software voting algorithms, the majority voting algorithm and the consensus voting algorithm, are studied. The distinctive features of N-version programming with majority voting and N-version programming with consensus voting are described. These two algorithms make a decision about the correct result on the basis of the agreement matrix. However, if the equivalence relation on the agreement matrix is not satisfied it is impossible to make a decision. It is shown that the agreement matrix can be transformed into an appropriate form by using the Boolean compositions when the equivalence relation is satisfied.
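The difference between the two voters named in this abstract can be shown with a short sketch. It assumes that version outputs agree only when they are exactly equal (a real voter would usually compare within a tolerance and work from the agreement matrix), and it returns `None` when no decision can be made. The function names and the tie-handling rule are assumptions for illustration, not the paper's method.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by more than half of the versions, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

def consensus_vote(outputs):
    """Return the value of the largest agreement group, else None on a tie."""
    if not outputs:
        return None
    ranked = Counter(outputs).most_common()
    # Assumption: a tie between the two largest groups yields no decision.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

For five versions returning `[1, 1, 2, 3, 4]`, majority voting cannot decide because no value has three votes, whereas consensus voting selects `1` as the largest agreement group.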
MPI Enhancements in John the Ripper

NASA Astrophysics Data System (ADS)

Sykes, Edward R.; Lin, Michael; Skoczen, Wesley

2010-11-01

John the Ripper (JtR) is an open source software package commonly used by system administrators to enforce password policy. JtR is designed to attack (i.e., crack) passwords encrypted in a wide variety of commonly used formats. While parallel implementations of JtR exist, there are several limitations to them. This research reports on two distinct algorithms that enhance this password cracking tool using the Message Passing Interface. The first algorithm is a novel approach that uses numerous processors to crack one password by using an innovative approach to workload distribution. In this algorithm the candidate password is distributed to all participating processors and the word list is divided based on probability so that each processor has the same likelihood of cracking the password while eliminating overlapping operations. The second algorithm developed in this research involves dividing the passwords within a password file equally amongst available processors while ensuring load-balanced and fault-tolerant behavior. This paper describes John the Ripper, the design of these two algorithms and preliminary results. Given the same amount of time, the original JtR can crack 29 passwords, whereas our algorithms 1 and 2 can crack an additional 35 and 45 passwords respectively.
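The first algorithm's idea of dividing a word list so that each processor has roughly the same chance of success can be sketched as follows. The abstract does not say how the division is computed, so this uses a simple greedy balancing heuristic as a stand-in; the function name `split_by_probability`, the per-word probability estimates, and the sample words are hypothetical, and no MPI calls are shown.

```python
import heapq

def split_by_probability(word_probs, n_procs):
    """Partition words into n_procs disjoint shares of similar total probability.

    Greedy heuristic (an assumption, not the paper's method): take words from
    most to least likely and give each to the currently lightest share.
    """
    heap = [(0.0, rank) for rank in range(n_procs)]  # (probability mass, rank)
    shares = [[] for _ in range(n_procs)]
    for word, prob in sorted(word_probs.items(), key=lambda kv: -kv[1]):
        load, rank = heapq.heappop(heap)
        shares[rank].append(word)
        heapq.heappush(heap, (load + prob, rank))
    return shares
```

Each word lands in exactly one share, so no two processors test the same candidate, which matches the abstract's goal of eliminating overlapping operations.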
Fault recovery characteristics of the fault tolerant multi-processor

NASA Technical Reports Server (NTRS)

Padilla, Peter A.

1990-01-01

The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.
Implementation Of The Configurable Fault Tolerant System Experiment On NPSAT 1

DTIC Science & Technology

2016-03-01

REPORT TYPE AND DATES COVERED Master’s thesis 4. TITLE AND SUBTITLE IMPLEMENTATION OF THE CONFIGURABLE FAULT TOLERANT SYSTEM EXPERIMENT ON NPSAT...open-source microprocessor without interlocked pipeline stages (MIPS) based processor softcore, a cached memory structure capable of accessing double...data rate type three and secure digital card memories, an interface to the main satellite bus, and XILINX’s soft error mitigation softcore. The
ALLIANCE: An architecture for fault tolerant, cooperative control of heterogeneous mobile robots

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parker, L.E.

1995-02-01

This research addresses the problem of achieving fault tolerant cooperation within small- to medium-sized teams of heterogeneous mobile robots. The author describes a novel behavior-based, fully distributed architecture, called ALLIANCE, that utilizes adaptive action selection to achieve fault tolerant cooperative control in robot missions involving loosely coupled, largely independent tasks. The robots in this architecture possess a variety of high-level functions that they can perform during a mission, and must at all times select an appropriate action based on the requirements of the mission, the activities of other robots, the current environmental conditions, and their own internal states. Since such cooperative teams often work in dynamic and unpredictable environments, the software architecture allows the team members to respond robustly and reliably to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. After presenting ALLIANCE, the author describes in detail experimental results of an implementation of this architecture on a team of physical mobile robots performing a cooperative box pushing demonstration. These experiments illustrate the ability of ALLIANCE to achieve adaptive, fault-tolerant cooperative control amidst dynamic changes in the capabilities of the robot team.
Autonomous Propulsion System Technology Being Developed to Optimize Engine Performance Throughout the Lifecycle

NASA Technical Reports Server (NTRS)

Litt, Jonathan S.

2004-01-01

The goal of the Autonomous Propulsion System Technology (APST) project is to reduce pilot workload under both normal and anomalous conditions. Ongoing work under APST develops and leverages technologies that provide autonomous engine monitoring, diagnosing, and controller adaptation functions, resulting in an integrated suite of algorithms that maintain the propulsion system's performance and safety throughout its life. Engine-to-engine performance variation occurs among new engines because of manufacturing tolerances and assembly practices. As an engine wears, the performance changes as operability limits are reached. In addition to these normal phenomena, other unanticipated events such as sensor failures, bird ingestion, or component faults may occur, affecting pilot workload as well as compromising safety. APST will adapt the controller as necessary to achieve optimal performance for a normal aging engine, and the safety net of APST algorithms will examine and interpret data from a variety of onboard sources to detect, isolate, and if possible, accommodate faults. Situations that cannot be accommodated within the faulted engine itself will be referred to a higher level vehicle management system. This system will have the authority to redistribute the faulted engine's functionality among other engines, or to replan the mission based on this new engine health information. Work is currently underway in the areas of adaptive control to compensate for engine degradation due to aging, data fusion for diagnostics and prognostics of specific sensor and component faults, and foreign object ingestion detection. In addition, a framework is being defined for integrating all the components of APST into a unified system. A multivariable, adaptive, multimode control algorithm has been developed that accommodates degradation-induced thrust disturbances during throttle transients. 
The baseline controller of the engine model currently being investigated has multiple control modes that are selected according to some performance or operational criteria. As the engine degrades, parameters shift from their nominal values. Thus, when a new control mode is swapped in, a variable that is being brought under control might have an excessive initial error. The new adaptive algorithm adjusts the controller gains on the basis of the level of degradation to minimize the disruptive influence of the large error on other variables and to recover the desired thrust response.
New-Sum: A Novel Online ABFT Scheme For General Iterative Methods

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tao, Dingwen; Song, Shuaiwen; Krishnamoorthy, Sriram

Emerging high-performance computing platforms, with large component counts and lower power margins, are anticipated to be more susceptible to soft errors in both logic circuits and memory subsystems. We present an online algorithm-based fault tolerance (ABFT) approach to efficiently detect and recover from soft errors for general iterative methods. We design a novel checksum-based encoding scheme for matrix-vector multiplication that is resilient to both arithmetic and memory errors. Our design decouples the checksum updating process from the actual computation, and allows adaptive checksum overhead control. Building on this new encoding mechanism, we propose two online ABFT designs that can effectively recover from errors when combined with a checkpoint/rollback scheme.
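The general checksum idea behind ABFT for matrix-vector multiplication can be illustrated with a minimal sketch. It uses the classic column-sum checksum, not the New-Sum encoding itself, whose details the abstract does not give. The identity being checked is that the sum of the entries of y = Ax equals (1^T A) x, where 1^T A is computed once in advance. All function names and the tolerance are assumptions.

```python
def column_checksums(A):
    """Precompute c = 1^T A: the sum of each column of A (list of rows)."""
    return [sum(col) for col in zip(*A)]

def matvec(A, x):
    """Plain dense matrix-vector product y = A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def checked_matvec(A, x, checksums, tol=1e-9):
    """Compute y = A x and report whether sum(y) matches checksums . x."""
    y = matvec(A, x)
    expected = sum(c * xi for c, xi in zip(checksums, x))
    # Relative tolerance so floating-point rounding is not reported as an error.
    scale = max(1.0, abs(expected), sum(abs(v) for v in y))
    return y, abs(sum(y) - expected) <= tol * scale
```

If an entry of `A` is corrupted in memory after `column_checksums(A)` has been taken, the check usually fails and the caller can roll back to a checkpoint; the check cannot catch an error that leaves the sum of `y` unchanged, which is one reason practical schemes use more elaborate encodings.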
Adaptive Control Allocation for Fault Tolerant Overactuated Autonomous Vehicles

DTIC Science & Technology

2007-11-01

Casavola, A.; Garone, E. (2007) Adaptive Control Allocation for Fault Tolerant Overactuated Autonomous Vehicles. RTO-MP-AVT-145. ... Control allocation problem (CAP) - Given a virtual input v(t ...
Study of fault-tolerant software technology

NASA Technical Reports Server (NTRS)

Slivinski, T.; Broglio, C.; Wild, C.; Goldberg, J.; Levitt, K.; Hitt, E.; Webb, J.

1984-01-01

Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance.
Fault tolerant control laws

NASA Technical Reports Server (NTRS)

Ly, U. L.; Ho, J. K.

1986-01-01

A systematic procedure for the synthesis of fault tolerant control laws to actuator failure has been presented. Two design methods were used to synthesize fault tolerant controllers: the conventional LQ design method and a direct feedback controller design method SANDY. The latter method is used primarily to streamline the full-state LQ feedback design into a practical, implementable output feedback controller structure. To achieve robustness to control actuator failure, the redundant surfaces are properly balanced according to their control effectiveness. A simple gain schedule based on the landing gear up/down logic involving only three gains was developed to handle three design flight conditions: Mach .25 and Mach .60 at 5000 ft and Mach .90 at 20,000 ft. The fault tolerant control law developed in this study provides good stability augmentation and performance for the relaxed static stability aircraft. The augmented aircraft responses are found to be invariant to the presence of a failure. Furthermore, single-loop stability margins of +6 dB in gain and +30 deg in phase were achieved along with -40 dB/decade rolloff at high frequency.
Depth optimal sorting networks resistant to k passive faults

DOE Office of Scientific and Technical Information (OSTI.GOV)

Piotrow, M.

In this paper, we study the problem of constructing a sorting network that is tolerant to faults and whose running time (i.e. depth) is as small as possible. We consider the scenario of worst-case comparator faults and follow the model of passive comparator failure proposed by Yao and Yao, in which a faulty comparator outputs directly its inputs without comparison. Our main result is the first construction of an N-input, k-fault-tolerant sorting network that is of an asymptotically optimal depth Θ(log N + k). That improves over the recent result of Leighton and Ma, whose network is of depth O(log N + k log log N/log k). Actually, we present a fault-tolerant correction network that can be added after any N-input sorting network to correct its output in the presence of at most k faulty comparators. Since the depth of the network is O(log N + k) and the constants hidden behind the "O" notation are not big, the construction can be of practical use. Developing the techniques necessary to show the main result, we construct a fault-tolerant network for the insertion problem. As a by-product, we get an N-input, O(log N)-depth INSERT-network that is tolerant to random faults, thereby answering a question posed by Ma in his PhD thesis. The results are based on a new notion of constant delay comparator networks, that is, networks in which each register is used (compared) only in a period of time of a constant length. Copies of such networks can be put one after another with only a constant increase in depth per copy.
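The passive failure model described above can be made concrete with a small simulation. The sketch below is only an illustration of that fault model on a standard 4-input sorting network; it does not reproduce the correction network from the paper, and the names `run_network` and `faulty` are hypothetical.

```python
from typing import AbstractSet, List, Sequence, Tuple

# A comparator (i, j) with i < j puts the smaller value on wire i
# and the larger value on wire j.
Comparator = Tuple[int, int]


def run_network(
    network: Sequence[Comparator],
    values: Sequence[int],
    faulty: AbstractSet[int] = frozenset(),
) -> List[int]:
    """Apply a comparator network; comparators whose index is in `faulty`
    fail passively, i.e. they pass both inputs through unchanged."""
    out = list(values)
    for index, (i, j) in enumerate(network):
        if index in faulty:
            continue  # passive fault: no comparison, no exchange
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out


# Standard 5-comparator sorting network for 4 inputs.
SORT4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

healthy = run_network(SORT4, [4, 3, 2, 1])
broken = run_network(SORT4, [4, 3, 2, 1], faulty={3})  # comparator (1, 3) fails
```

With every comparator working the output is sorted, while the single passive fault leaves it unsorted, which is the situation a correction network appended after the sorter is meant to repair.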
Event-triggered decentralized adaptive fault-tolerant control of uncertain interconnected nonlinear systems with actuator failures.

PubMed

Choi, Yun Ho; Yoo, Sung Jin

2018-06-01

This paper investigates the event-triggered decentralized adaptive tracking problem of a class of uncertain interconnected nonlinear systems with unexpected actuator failures. It is assumed that local control signals are transmitted to local actuators with time-varying faults whenever predefined conditions for triggering events are satisfied. Compared with the existing control-input-based event-triggering strategy for adaptive control of uncertain nonlinear systems, the aim of this paper is to propose a tracking-error-based event-triggering strategy in the decentralized adaptive fault-tolerant tracking framework. The proposed approach can relax drastic changes in control inputs caused by actuator faults in the existing triggering strategy. The stability of the proposed event-triggering control system is analyzed in the Lyapunov sense. Finally, simulation comparisons of the proposed and existing approaches are provided to show the effectiveness of the proposed theoretical result in the presence of actuator faults. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.
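To give a feel for what a tracking-error-based triggering rule does, here is a minimal sketch on a scalar integrator plant with a held proportional control signal. The plant, the gain, and the specific rule (re-send the control when the tracking error has changed by at least `delta` since the last transmission) are all assumptions for illustration; the abstract does not state the paper's actual triggering condition, and no adaptive or fault-tolerant element is modeled here.

```python
from typing import Tuple


def simulate_event_triggered(
    steps: int = 1000,
    dt: float = 0.01,
    gain: float = 2.0,
    reference: float = 1.0,
    delta: float = 0.05,
) -> Tuple[float, int]:
    """Integrator plant x' = u with a held proportional control signal.

    The control value is re-sent to the actuator only when the tracking
    error has moved by at least `delta` since the last transmission
    (an assumed rule). Returns the final tracking error and the number
    of transmissions.
    """
    x = 0.0
    error_at_last_event = x - reference
    u = -gain * error_at_last_event   # first transmission at t = 0
    transmissions = 1
    for _ in range(steps):
        x += dt * u                   # forward-Euler step of the plant
        error = x - reference
        if abs(error - error_at_last_event) >= delta:
            error_at_last_event = error
            u = -gain * error         # event: send a fresh control value
            transmissions += 1
    return x - reference, transmissions


final_error, sent = simulate_event_triggered()
```

In this toy setting the actuator receives far fewer updates than there are simulation steps, while the tracking error stays within a band on the order of `delta`.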
Active fault tolerant control based on interval type-2 fuzzy sliding mode controller and non linear adaptive observer for 3-DOF laboratory helicopter.

PubMed

Zeghlache, Samir; Benslimane, Tarak; Bouguerra, Abderrahmen

2017-11-01

In this paper, a robust controller is proposed for a three-degree-of-freedom (3-DOF) helicopter in the presence of actuator and sensor faults. For this purpose, the interval type-2 fuzzy logic control (IT2FLC) approach and the sliding mode control (SMC) technique are used to design a controller, named the active fault-tolerant interval type-2 fuzzy sliding mode controller (AFTIT2FSMC), based on a non-linear adaptive observer that estimates and detects the system faults for each subsystem of the 3-DOF helicopter. The proposed control scheme avoids difficult modeling, attenuates the chattering effect of the SMC, and reduces the number of rules of the fuzzy controller. Exponential stability of the closed loop is guaranteed by using the Lyapunov method. The simulation results show that the AFTIT2FSMC can greatly alleviate the chattering effect while providing good tracking performance, even in the presence of actuator and sensor faults. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Fault detection and isolation for complex system

NASA Astrophysics Data System (ADS)

Jing, Chan Shi; Bayuaji, Luhur; Samad, R.; Mustafa, M.; Abdullah, N. R. H.; Zain, Z. M.; Pebrianti, Dwi

2017-07-01

Fault Detection and Isolation (FDI) is a method to monitor, identify, and pinpoint the type and location of a system fault in a complex multiple-input multiple-output (MIMO) non-linear system. A two-wheel robot is used as the complex system in this study. The aim of the research is to design and construct a Fault Detection and Isolation algorithm. The proposed fault identification method uses a hybrid technique that combines a Kalman filter and an Artificial Neural Network (ANN). The Kalman filter is able to recognize the data from the sensors of the system and indicate the fault of the system in the sensor reading. Error prediction is based on the fault magnitude and the time of occurrence of the fault. Additionally, the ANN is used to determine the type of fault and isolate the fault in the system.
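One common way a Kalman filter can indicate a fault in a sensor reading is by testing its innovation (measurement minus prediction) against the predicted innovation variance. The sketch below shows that idea for a scalar random-walk state; it is an assumed, simplified stand-in, not the robot model or the hybrid Kalman/ANN scheme of the study, and all parameter values are illustrative.

```python
import math
from typing import List, Sequence


def innovation_fault_flags(
    measurements: Sequence[float],
    x0: float = 0.0,
    p0: float = 1.0,
    q: float = 1e-4,
    r: float = 0.01,
    threshold: float = 3.0,
) -> List[bool]:
    """Scalar Kalman filter that flags readings with an oversized innovation.

    q and r are the assumed process and measurement noise variances; a
    reading is flagged when |innovation| exceeds `threshold` standard
    deviations of the predicted innovation.
    """
    x, p = x0, p0
    flags: List[bool] = []
    for z in measurements:
        p = p + q                      # predict step (random-walk state model)
        innovation = z - x
        s = p + r                      # predicted innovation variance
        is_fault = abs(innovation) > threshold * math.sqrt(s)
        flags.append(is_fault)
        if not is_fault:               # do not update with a suspect reading
            k = p / s                  # Kalman gain
            x = x + k * innovation
            p = (1.0 - k) * p
    return flags


readings = [1.0, 1.05, 0.95, 1.02, 0.98, 1.01, 6.0, 1.0]  # spike at index 6
flags = innovation_fault_flags(readings)
```

Only the spike is flagged here; deciding which kind of fault it is would be the job of a separate classifier, the role the abstract assigns to the ANN.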
Characteristic investigation and control of a modular multilevel converter-based HVDC system under single-line-to-ground fault conditions

DOE PAGES

Shi, Xiaojie; Wang, Zhiqiang; Liu, Bo; ...

2014-05-16

This paper presents the analysis and control of a modular multilevel converter (MMC)-based HVDC transmission system under three possible single-line-to-ground fault conditions, with special focus on the investigation of their different fault characteristics. Considering positive-, negative-, and zero-sequence components in both arm voltages and currents, the generalized instantaneous power of a phase unit is derived theoretically according to the equivalent circuit model of the MMC under unbalanced conditions. Based on this model, a novel double-line frequency dc-voltage ripple suppression control is proposed. This controller, together with the negative- and zero-sequence current control, could enhance the overall fault-tolerant capability of the HVDC system without additional cost. To further improve the fault-tolerant capability, the operation performance of the HVDC system with and without single-phase switching is discussed and compared in detail. Lastly, simulation results from a three-phase MMC-HVDC system generated with MATLAB/Simulink are provided to support the theoretical analysis and proposed control schemes.
Interface Circuits for Self-Checking Microprocessors

NASA Technical Reports Server (NTRS)

Rennels, D. A.; Chandramouli, R.

1986-01-01

Fault-tolerant microcomputer concept is based on enhancing a "simple" computer with redundancy and self-checking logic circuits that detect hardware faults. Interface and checking logic and redundant processors confer on a 16-bit microcomputer the ability to check itself for hardware faults. Checking circuitry also checks itself. The concept of self-checking complementary pairs (SCCP's) is employed throughout the ICL unit.
Problems related to the integration of fault tolerant aircraft electronic systems

NASA Technical Reports Server (NTRS)

Bannister, J. A.; Adlakha, V.; Triyedi, K.; Alspaugh, T. A., Jr.

1982-01-01

Problems related to the design of the hardware for an integrated aircraft electronic system are considered. Taxonomies of concurrent systems are reviewed and a new taxonomy is proposed. An informal methodology intended to identify feasible regions of the taxonomic design space is described. Specific tools are recommended for use in the methodology. Based on the methodology, a preliminary strawman integrated fault tolerant aircraft electronic system is proposed. Next, problems related to the programming and control of integrated aircraft electronic systems are discussed. Issues of system resource management, including the scheduling and allocation of real time periodic tasks in a multiprocessor environment, are treated in detail. The role of software design in integrated fault tolerant aircraft electronic systems is discussed. Conclusions and recommendations for further work are included.

Privacy-Assured Aggregation Protocol for Smart Metering: A Proactive Fault-Tolerant Approach [Proactive Fault-Tolerant Aggregation Protocol for Privacy-Assured Smart Metering]

DOE PAGES

Won, Jongho; Ma, Chris Y. T.; Yau, David K. Y.; ...

2016-06-01

Smart meters are integral to demand response in emerging smart grids, by reporting the electricity consumption of users to serve application needs. But reporting real-time usage information for individual households raises privacy concerns. Existing techniques to guarantee differential privacy (DP) of smart meter users either are not fault tolerant or achieve (possibly partial) fault tolerance at high communication overheads. In this paper, we propose a fault-tolerant protocol for smart metering that can handle general communication failures while ensuring DP with significantly improved efficiency and lower errors compared with the state of the art. Our protocol handles fail-stop faults proactively by using a novel design of future ciphertexts, and distributes trust among the smart meters by sharing secret keys among them. We prove the DP properties of our protocol and analyze its advantages in fault tolerance, accuracy, and communication efficiency relative to competing techniques. We illustrate our analysis by simulations driven by real-world traces of electricity consumption.
Privacy-Assured Aggregation Protocol for Smart Metering: A Proactive Fault-Tolerant Approach [Proactive Fault-Tolerant Aggregation Protocol for Privacy-Assured Smart Metering]

DOE Office of Scientific and Technical Information (OSTI.GOV)

Won, Jongho; Ma, Chris Y. T.; Yau, David K. Y.

Smart meters are integral to demand response in emerging smart grids, by reporting the electricity consumption of users to serve application needs. But reporting real-time usage information for individual households raises privacy concerns. Existing techniques to guarantee differential privacy (DP) of smart meter users either are not fault tolerant or achieve (possibly partial) fault tolerance at high communication overheads. In this paper, we propose a fault-tolerant protocol for smart metering that can handle general communication failures while ensuring DP with significantly improved efficiency and lower errors compared with the state of the art. Our protocol handles fail-stop faults proactively by using a novel design of future ciphertexts, and distributes trust among the smart meters by sharing secret keys among them. We prove the DP properties of our protocol and analyze its advantages in fault tolerance, accuracy, and communication efficiency relative to competing techniques. We illustrate our analysis by simulations driven by real-world traces of electricity consumption.
Model-based diagnosis through Structural Analysis and Causal Computation for automotive Polymer Electrolyte Membrane Fuel Cell systems

NASA Astrophysics Data System (ADS)

Polverino, Pierpaolo; Frisk, Erik; Jung, Daniel; Krysander, Mattias; Pianese, Cesare

2017-07-01

The present paper proposes an advanced approach for Polymer Electrolyte Membrane Fuel Cell (PEMFC) systems fault detection and isolation through a model-based diagnostic algorithm. The considered algorithm is developed upon a lumped parameter model simulating a whole PEMFC system oriented towards automotive applications. This model is inspired by other models available in the literature, with further attention to stack thermal dynamics and water management. The developed model is analysed by means of Structural Analysis, to identify the correlations among involved physical variables, defined equations and a set of faults which may occur in the system (related to both auxiliary components malfunctions and stack degradation phenomena). Residual generators are designed by means of Causal Computation analysis and the maximum theoretical fault isolability, achievable with a minimal number of installed sensors, is investigated. The achieved results proved the capability of the algorithm to theoretically detect and isolate almost all faults with the only use of stack voltage and temperature sensors, with significant advantages from an industrial point of view. The effective fault isolability is proved through fault simulations at a specific fault magnitude with an advanced residual evaluation technique, to consider quantitative residual deviations from normal conditions and achieve univocal fault isolation.
On the design of fault-tolerant robotic manipulator systems

NASA Technical Reports Server (NTRS)

Tesar, Delbert

1993-01-01

Robotic systems are finding increasing use in space applications. Many of these devices are going to be operational on board the Space Station Freedom. Fault tolerance has been deemed necessary because of the criticality of the tasks and the inaccessibility of the systems to maintenance and repair. Design for fault tolerance in manipulator systems is an area within robotics that is without precedence in the literature. In this paper, we will attempt to lay down the foundations for such a technology. Design for fault tolerance demands new and special approaches to design, often at considerable variance from established design practices. These design aspects, together with reliability evaluation and modeling tools, are presented. Mechanical architectures that employ protective redundancies at many levels and have a modular architecture are then studied in detail. Once a mechanical architecture for fault tolerance has been derived, the chronological stages of operational fault tolerance are investigated. Failure detection, isolation, and estimation methods are surveyed, and such methods for robot sensors and actuators are derived. Failure recovery methods are also presented for each of the protective layers of redundancy. Failure recovery tactics often span all of the layers of a control hierarchy. Thus, a unified framework for decision-making and control, which orchestrates both the nominal redundancy management tasks and the failure management tasks, has been derived. The well-developed field of fault-tolerant computers is studied next, and some design principles relevant to the design of fault-tolerant robot controllers are abstracted. Conclusions are drawn, and a road map for the design of fault-tolerant manipulator systems is laid out with recommendations for a 10 DOF arm with dual actuators at each joint.
A comparative study of sensor fault diagnosis methods based on observer for ECAS system

NASA Astrophysics Data System (ADS)

Xu, Xing; Wang, Wei; Zou, Nannan; Chen, Long; Cui, Xiaoli

2017-03-01

The performance and practicality of an electronically controlled air suspension (ECAS) system are highly dependent on the state information supplied by various sensors, but sensor faults occur frequently. Based on a non-linearized 3-DOF 1/4 vehicle model, different methods of fault detection and isolation (FDI) are used to diagnose the sensor faults of the ECAS system. The considered approaches include an extended Kalman filter (EKF) with a concise algorithm, a strong tracking filter (STF) with robust tracking ability, and the cubature Kalman filter (CKF) with numerical precision. We use the three filters (EKF, STF, and CKF) to design state observers of the ECAS system under typical sensor faults and noise. Results show that all three approaches can successfully detect and isolate faults despite the existence of environmental noise. The FDI time delay and fault sensitivity differ among the algorithms; compared with EKF and STF, the CKF method gives the best FDI performance for sensor faults of the ECAS system.
Software fault tolerance for real-time avionics systems

NASA Technical Reports Server (NTRS)

Anderson, T.; Knight, J. C.

1983-01-01

Avionics systems have very high reliability requirements and are therefore prime candidates for the inclusion of fault tolerance techniques. In order to provide tolerance to software faults, some form of state restoration is usually advocated as a means of recovery. State restoration can be very expensive for systems which utilize concurrent processes. The concurrency present in most avionics systems and the further difficulties introduced by timing constraints imply that providing tolerance for software faults may be inordinately expensive or complex. A straightforward pragmatic approach to software fault tolerance which is believed to be applicable to many real-time avionics systems is proposed. A classification system for software errors is presented together with approaches to recovery and continued service for each error type.
A novel KFCM based fault diagnosis method for unknown faults in satellite reaction wheels.

PubMed

Hu, Di; Sarosh, Ali; Dong, Yun-Feng

2012-03-01

Reaction wheels are one of the most critical components of the satellite attitude control system; therefore, correct diagnosis of their faults is essential for efficient operation of these spacecraft. Known faults in any of the subsystems are often diagnosed by supervised learning algorithms; however, this method fails to work correctly when a new or unknown fault occurs. In such cases an unsupervised learning algorithm becomes essential for obtaining the correct diagnosis. Kernel Fuzzy C-Means (KFCM) is one such unsupervised algorithm, although it has its own limitations. In this paper, a novel method is proposed for conditioning the KFCM method (C-KFCM) so that it can be effectively used for fault diagnosis of both known and unknown faults, as in satellite reaction wheels. The C-KFCM approach involves determining exact class centers from the data of known faults; in this way a discrete number of fault classes is determined at the start. Similarity parameters are derived and determined for each fault data point. Thereafter, depending on the similarity threshold, each data point is assigned a class label. The high-similarity points fall into one of the 'known-fault' classes, while the low-similarity points are labeled as 'unknown-faults'. Simulation results show that, compared with a supervised algorithm such as a neural network, the C-KFCM method can effectively cluster historical fault data (as in reaction wheels) and diagnose the faults with an accuracy of more than 91%. Copyright © 2011 ISA. Published by Elsevier Ltd. All rights reserved.
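The labeling step described above (class centers from known faults, a similarity value per data point, then a threshold that separates known from unknown faults) can be sketched as follows. This is a simplified stand-in, not the C-KFCM algorithm: it uses plain class means as centers and a Gaussian kernel as the similarity measure, and the class names, `sigma`, and `threshold` are assumed values chosen for illustration.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def class_centers(labeled: Dict[str, List[Point]]) -> Dict[str, List[float]]:
    """Center of each known-fault class, taken here as the per-dimension mean."""
    return {
        label: [sum(dim) / len(points) for dim in zip(*points)]
        for label, points in labeled.items()
    }


def gaussian_similarity(a: Point, b: Point, sigma: float = 1.0) -> float:
    """Gaussian-kernel similarity in (0, 1]; equals 1 when the points coincide."""
    squared = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-squared / (2.0 * sigma ** 2))


def label_point(
    point: Point,
    centers: Dict[str, List[float]],
    threshold: float = 0.5,
    sigma: float = 1.0,
) -> str:
    """Assign the most similar known class, or 'unknown-fault' if the best
    similarity is below the (assumed) threshold."""
    best_label, best_similarity = max(
        ((label, gaussian_similarity(point, center, sigma))
         for label, center in centers.items()),
        key=lambda item: item[1],
    )
    return best_label if best_similarity >= threshold else "unknown-fault"


# Hypothetical two-feature fault data for two known classes.
known = {
    "fault-A": [(0.0, 0.0), (0.2, 0.2)],
    "fault-B": [(3.0, 3.0), (3.2, 2.8)],
}
centers = class_centers(known)
near_a = label_point((0.3, 0.0), centers)
far_away = label_point((10.0, 10.0), centers)
```

The threshold controls the trade-off: raising it sends more borderline points to the 'unknown-fault' label, lowering it forces more points into a known class.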
Design and Analysis of Linear Fault-Tolerant Permanent-Magnet Vernier Machines

PubMed Central

Xu, Liang; Liu, Guohai; Du, Yi; Liu, Hu

2014-01-01

This paper proposes a new linear fault-tolerant permanent-magnet (PM) vernier (LFTPMV) machine, which can offer high thrust by using the magnetic gear effect. Both PMs and windings of the proposed machine are on the short mover, while the long stator is only manufactured from iron. Hence, the proposed machine is very suitable for long stroke system applications. The key of this machine is that the magnetizer splits the two movers with modular and complementary structures. Hence, the proposed machine offers improved symmetrical and sinusoidal back electromotive force waveform and reduced detent force. Furthermore, owing to the complementary structure, the proposed machine possesses favorable fault-tolerant capability, namely, independent phases. In particular, differing from the existing fault-tolerant machines, the proposed machine offers fault tolerance without sacrificing thrust density. This is because neither fault-tolerant teeth nor the flux-barriers are adopted. The electromagnetic characteristics of the proposed machine are analyzed using the time-stepping finite-element method, which verifies the effectiveness of the theoretical analysis. PMID:24982959
Design and analysis of linear fault-tolerant permanent-magnet vernier machines.

PubMed

Xu, Liang; Ji, Jinghua; Liu, Guohai; Du, Yi; Liu, Hu

2014-01-01

This paper proposes a new linear fault-tolerant permanent-magnet (PM) vernier (LFTPMV) machine, which can offer high thrust by using the magnetic gear effect. Both PMs and windings of the proposed machine are on the short mover, while the long stator is only manufactured from iron. Hence, the proposed machine is very suitable for long stroke system applications. The key of this machine is that the magnetizer splits the two movers with modular and complementary structures. Hence, the proposed machine offers improved symmetrical and sinusoidal back electromotive force waveform and reduced detent force. Furthermore, owing to the complementary structure, the proposed machine possesses favorable fault-tolerant capability, namely, independent phases. In particular, differing from the existing fault-tolerant machines, the proposed machine offers fault tolerance without sacrificing thrust density. This is because neither fault-tolerant teeth nor the flux-barriers are adopted. The electromagnetic characteristics of the proposed machine are analyzed using the time-stepping finite-element method, which verifies the effectiveness of the theoretical analysis.
A Decentralized Adaptive Approach to Fault Tolerant Flight Control

NASA Technical Reports Server (NTRS)

Wu, N. Eva; Nikulin, Vladimir; Heimes, Felix; Shormin, Victor

2000-01-01

This paper briefly reports some results of our study on the application of a decentralized adaptive control approach to a 6 DOF nonlinear aircraft model. The simulation results showed the potential of using this approach to achieve fault tolerant control. Based on this observation and some analysis, the paper proposes a multiple channel adaptive control scheme that makes use of the functionally redundant actuating and sensing capabilities in the model, and explains how to implement the scheme to tolerate actuator and sensor failures. The conditions, under which the scheme is applicable, are stated in the paper.
Generic, scalable and decentralized fault detection for robot swarms.

PubMed

Tarapore, Danesh; Christensen, Anders Lyhne; Timmis, Jon

2017-01-01

Robot swarms are large-scale multirobot systems with decentralized control, which means that each robot acts based only on local perception and on local coordination with neighboring robots. The decentralized approach to control confers a number of potential benefits. In particular, inherent scalability and robustness are often highlighted as key distinguishing features of robot swarms compared with systems that rely on traditional approaches to multirobot coordination. It has, however, been shown that swarm robotics systems are not always fault tolerant. To realize the robustness potential of robot swarms, it is thus essential to give systems the capacity to actively detect and accommodate faults. In this paper, we present a generic fault-detection system for robot swarms. We show how robots with limited and imperfect sensing capabilities are able to observe and classify the behavior of one another. In order to achieve this, the underlying classifier is an immune system-inspired algorithm that learns to distinguish between normal behavior and abnormal behavior online. Through a series of experiments, we systematically assess the performance of our approach in a detailed simulation environment. In particular, we analyze our system's capacity to correctly detect robots with faults, false positive rates, performance in a foraging task in which each robot exhibits a composite behavior, and performance under perturbations of the task environment. Results show that our generic fault-detection system is robust, that it is able to detect faults in a timely manner, and that it achieves a low false positive rate. The developed fault-detection system has the potential to enable long-term autonomy for robust multirobot systems, thus increasing the usefulness of robots for a diverse repertoire of upcoming applications in the area of distributed intelligent automation.
Generic, scalable and decentralized fault detection for robot swarms

PubMed Central

Christensen, Anders Lyhne; Timmis, Jon

2017-01-01

Robot swarms are large-scale multirobot systems with decentralized control, which means that each robot acts based only on local perception and on local coordination with neighboring robots. The decentralized approach to control confers a number of potential benefits. In particular, inherent scalability and robustness are often highlighted as key distinguishing features of robot swarms compared with systems that rely on traditional approaches to multirobot coordination. It has, however, been shown that swarm robotics systems are not always fault tolerant. To realize the robustness potential of robot swarms, it is thus essential to give systems the capacity to actively detect and accommodate faults. In this paper, we present a generic fault-detection system for robot swarms. We show how robots with limited and imperfect sensing capabilities are able to observe and classify the behavior of one another. In order to achieve this, the underlying classifier is an immune system-inspired algorithm that learns to distinguish between normal behavior and abnormal behavior online. Through a series of experiments, we systematically assess the performance of our approach in a detailed simulation environment. In particular, we analyze our system’s capacity to correctly detect robots with faults, false positive rates, performance in a foraging task in which each robot exhibits a composite behavior, and performance under perturbations of the task environment. Results show that our generic fault-detection system is robust, that it is able to detect faults in a timely manner, and that it achieves a low false positive rate. The developed fault-detection system has the potential to enable long-term autonomy for robust multirobot systems, thus increasing the usefulness of robots for a diverse repertoire of upcoming applications in the area of distributed intelligent automation. PMID:28806756
Sensor fault diagnosis of aero-engine based on divided flight status.

PubMed

Zhao, Zhen; Zhang, Jun; Sun, Yigang; Liu, Zhexu

2017-11-01

Fault diagnosis and safety analysis of aero-engines have attracted increasing attention, because engine safety directly affects the flight safety of an aircraft. In this paper, the problem of sensor fault diagnosis is investigated for an aero-engine during the whole flight process. Considering that the aero-engine works in different statuses throughout the flight process, a flight status division-based sensor fault diagnosis method is presented to improve fault diagnosis precision for the aero-engine. First, aero-engine status is partitioned according to normal sensor data during the whole flight process through the clustering algorithm. Based on that, a diagnosis model is built for each status using the principal component analysis algorithm. Finally, the sensors are monitored using the built diagnosis models by identifying the aero-engine status. The simulation result illustrates the effectiveness of the proposed method.
Sensor fault diagnosis of aero-engine based on divided flight status

NASA Astrophysics Data System (ADS)

Zhao, Zhen; Zhang, Jun; Sun, Yigang; Liu, Zhexu

2017-11-01

Fault diagnosis and safety analysis of aero-engines have attracted increasing attention, because engine safety directly affects the flight safety of an aircraft. In this paper, the problem of sensor fault diagnosis is investigated for an aero-engine during the whole flight process. Considering that the aero-engine works in different statuses throughout the flight process, a flight status division-based sensor fault diagnosis method is presented to improve fault diagnosis precision for the aero-engine. First, aero-engine status is partitioned according to normal sensor data during the whole flight process through the clustering algorithm. Based on that, a diagnosis model is built for each status using the principal component analysis algorithm. Finally, the sensors are monitored using the built diagnosis models by identifying the aero-engine status. The simulation result illustrates the effectiveness of the proposed method.
Vibration Sensor-Based Bearing Fault Diagnosis Using Ellipsoid-ARTMAP and Differential Evolution Algorithms

PubMed Central

Liu, Chang; Wang, Guofeng; Xie, Qinglu; Zhang, Yanchao

2014-01-01

Effective fault classification of rolling element bearings provides an important basis for ensuring safe operation of rotating machinery. In this paper, a novel vibration sensor-based fault diagnosis method using an Ellipsoid-ARTMAP network (EAM) and a differential evolution (DE) algorithm is proposed. The original features are firstly extracted from vibration signals based on wavelet packet decomposition. Then, a minimum-redundancy maximum-relevancy algorithm is introduced to select the most prominent features so as to decrease feature dimensions. Finally, a DE-based EAM (DE-EAM) classifier is constructed to realize the fault diagnosis. The major characteristic of EAM is that the sample distribution of each category is realized by using a hyper-ellipsoid node and smoothing operation algorithm. Therefore, it can depict the decision boundary of disperse samples accurately and effectively avoid over-fitting phenomena. To optimize EAM network parameters, the DE algorithm is presented and two objectives, including both classification accuracy and nodes number, are simultaneously introduced as the fitness functions. Meanwhile, an exponential criterion is proposed to realize final selection of the optimal parameters. To prove the effectiveness of the proposed method, the vibration signals of four types of rolling element bearings under different loads were collected. Moreover, to improve the robustness of the classifier evaluation, a two-fold cross validation scheme is adopted and the order of feature samples is randomly arranged ten times within each fold. The results show that DE-EAM classifier can recognize the fault categories of the rolling element bearings reliably and accurately. PMID:24936949
Modeling the Fault Tolerant Capability of a Flight Control System: An Exercise in SCR Specification

NASA Technical Reports Server (NTRS)

Alexander, Chris; Cortellessa, Vittorio; DelGobbo, Diego; Mili, Ali; Napolitano, Marcello

2000-01-01

In life-critical and mission-critical applications, it is important to make provisions for a wide range of contingencies, by providing means for fault tolerance. In this paper, we discuss the specification of a flight control system that is fault tolerant with respect to sensor faults. Redundancy is provided by analytical relations that hold between sensor readings; depending on the conditions, this redundancy can be used to detect, identify and accommodate sensor faults.
Design study of Software-Implemented Fault-Tolerance (SIFT) computer

NASA Technical Reports Server (NTRS)

Wensley, J. H.; Goldberg, J.; Green, M. W.; Kutz, W. H.; Levitt, K. N.; Mills, M. E.; Shostak, R. E.; Whiting-Okeefe, P. M.; Zeidler, H. M.

1982-01-01

Software-implemented fault tolerant (SIFT) computer design for commercial aviation is reported. A SIFT design concept is addressed. Alternate strategies for physical implementation are considered. Hardware and software design correctness is addressed. System modeling and effectiveness evaluation are considered from a fault-tolerant point of view.
Fault diagnosis for wind turbine planetary ring gear via a meshing resonance based filtering algorithm.

PubMed

Wang, Tianyang; Chu, Fulei; Han, Qinkai

2017-03-01

Identifying the differences between the spectra or envelope spectra of a faulty signal and a healthy baseline signal is an efficient planetary gearbox local fault detection strategy. However, causes other than local faults can also generate the characteristic frequency of a ring gear fault; this may further affect the detection of a local fault. To address this issue, a new filtering algorithm based on the meshing resonance phenomenon is proposed. In detail, the raw signal is first decomposed into different frequency bands and levels. Then, a new meshing index and an MRgram are constructed to determine which bands belong to the meshing resonance frequency band. Furthermore, an optimal filter band is selected from this MRgram. Finally, the ring gear fault can be detected according to the envelope spectrum of the band-pass filtering result. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
Application of improved wavelet total variation denoising for rolling bearing incipient fault diagnosis

NASA Astrophysics Data System (ADS)

Zhang, W.; Jia, M. P.

2018-06-01

When an incipient fault appears in a rolling bearing, the fault feature is too small and is easily submerged in strong background noise. In this paper, wavelet total variation denoising based on kurtosis (Kurt-WATV) is studied, which can extract the incipient fault feature of the rolling bearing more effectively. The proposed algorithm contains four main steps: a) establish a sparse diagnosis model, b) represent periodic impulses based on the redundant wavelet dictionary, c) solve the joint optimization problem by the alternating direction method of multipliers (ADMM), d) obtain the reconstructed signal using the kurtosis value as criterion and then select optimal wavelet subbands. This paper uses the overcomplete rational-dilation wavelet transform (ORDWT) as a dictionary, and adjusts the control parameters to achieve concentration in the time-frequency plane. An incipient fault of a rolling bearing is used as an example, and the result shows the effectiveness and superiority of the proposed Kurt-WATV bearing fault diagnosis algorithm.
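The kurtosis criterion in step d) can be illustrated with a minimal sketch. It assumes the subbands are already available as plain lists of samples, and it uses the ordinary (non-excess) sample kurtosis. The function names `kurtosis` and `most_impulsive` are hypothetical and are not taken from the paper; the wavelet decomposition and the ADMM solver are not reproduced here.

```python
def kurtosis(signal):
    """Non-excess kurtosis m4 / m2**2 of a sequence of samples.

    Raises ZeroDivisionError for a constant signal (zero variance).
    """
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((x - mean) ** 2 for x in signal) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in signal) / n  # fourth central moment
    return m4 / (m2 ** 2)


def most_impulsive(subbands):
    """Return the index of the subband with the largest kurtosis."""
    return max(range(len(subbands)), key=lambda i: kurtosis(subbands[i]))


# Illustrative data only: a square-wave-like band and a band with one impulse.
smooth = [1.0, -1.0] * 50
impulsive = [0.0] * 99 + [10.0]
best = most_impulsive([smooth, impulsive])
```

Kurtosis grows with the weight of the tails, so a band containing sparse impulses scores higher than a band of steady oscillation, which is why it is a common indicator for impulsive bearing faults.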
Test Generation Algorithm for Fault Detection of Analog Circuits Based on Extreme Learning Machine

PubMed Central

Zhou, Jingyu; Tian, Shulin; Yang, Chenglin; Ren, Xuelong

2014-01-01

This paper proposes a novel test generation algorithm based on extreme learning machine (ELM), and such algorithm is cost-effective and low-risk for analog device under test (DUT). This method uses test patterns derived from the test generation algorithm to stimulate DUT, and then samples output responses of the DUT for fault classification and detection. The novel ELM-based test generation algorithm proposed in this paper contains mainly three aspects of innovation. Firstly, this algorithm saves time efficiently by classifying response space with ELM. Secondly, this algorithm can avoid reduced test precision efficiently in case of reduction of the number of impulse-response samples. Thirdly, a new process of test signal generator and a test structure in test generation algorithm are presented, and both of them are very simple. Finally, the abovementioned improvement and functioning are confirmed in experiments. PMID:25610458

ECFS: A decentralized, distributed and fault-tolerant FUSE filesystem for the LHCb online farm

NASA Astrophysics Data System (ADS)

Rybczynski, Tomasz; Bonaccorsi, Enrico; Neufeld, Niko

2014-06-01

The LHCb experiment records millions of proton collisions every second, but only a fraction of them are useful for LHCb physics. In order to filter out the "bad events" a large farm of x86-servers (~2000 nodes) has been put in place. These servers boot from and run from NFS; however, they use their local disks to temporarily store data that cannot be processed in real time ("data-deferring"). These events are subsequently processed when no live data is coming in. The effective CPU power is thus greatly increased. This gain in CPU power depends critically on the availability of the local disks. For cost and power reasons, mirroring (RAID-1) is not used, leading to considerable operational headaches with failing disks and disk errors or server failures induced by faulty disks. To mitigate these problems and increase the reliability of the LHCb farm, while at the same time keeping cost and power consumption low, an extensive study of existing highly available and distributed file systems has been carried out. While many distributed file systems provide reliability by "file replication", none of the evaluated ones supports erasure algorithms. A decentralised, distributed and fault-tolerant "write once read many" file system has been designed and implemented as a proof of concept, with fault tolerance without expensive (in terms of disk space) file replication techniques and a unique namespace as its main goals. This paper describes the design and the implementation of the Erasure Codes File System (ECFS) and presents the specialised FUSE interface for Linux. Depending on the encoding algorithm, ECFS will use a certain number of target directories as a backend to store the segments that compose the encoded data. When target directories are mounted via nfs/autofs, ECFS will act as a file-system over network/block-level raid over multiple servers.
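To make the contrast between replication and erasure coding concrete, here is a minimal sketch of the simplest erasure code, a single XOR parity segment over k data segments. This is an assumption chosen for illustration; the abstract does not say which encoding algorithms ECFS supports, and the names `encode` and `recover` are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal segments (zero-padded) plus one parity segment."""
    seg_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(seg_len * k, b"\0")
    segments = [padded[i * seg_len:(i + 1) * seg_len] for i in range(k)]
    parity = reduce(xor_bytes, segments)
    return segments + [parity]


def recover(segments, lost):
    """Rebuild the segment at index `lost` by XOR-ing all surviving segments."""
    survivors = [s for i, s in enumerate(segments) if i != lost]
    return reduce(xor_bytes, survivors)


# Illustrative use: 3 data segments + 1 parity, one segment (one "disk") lost.
stored = encode(b"hello world!", 3)
rebuilt = recover(stored, 1)
```

With k = 3 this stores 4/3 of the original size and survives the loss of any one segment, whereas mirroring stores twice the original size for the same single-failure tolerance. Codes that tolerate several simultaneous losses (for example Reed-Solomon) follow the same store-segments-in-separate-targets pattern.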
Quantum Error Correction with Biased Noise

NASA Astrophysics Data System (ADS)

Brooks, Peter

Quantum computing offers powerful new techniques for speeding up the calculation of many classically intractable problems. Quantum algorithms can allow for the efficient simulation of physical systems, with applications to basic research, chemical modeling, and drug discovery; other algorithms have important implications for cryptography and internet security. At the same time, building a quantum computer is a daunting task, requiring the coherent manipulation of systems with many quantum degrees of freedom while preventing environmental noise from interacting too strongly with the system. Fortunately, we know that, under reasonable assumptions, we can use the techniques of quantum error correction and fault tolerance to achieve an arbitrary reduction in the noise level. In this thesis, we look at how additional information about the structure of noise, or "noise bias," can improve or alter the performance of techniques in quantum error correction and fault tolerance. In Chapter 2, we explore the possibility of designing certain quantum gates to be extremely robust with respect to errors in their operation. This naturally leads to structured noise where certain gates can be implemented in a protected manner, allowing the user to focus their protection on the noisier unprotected operations. In Chapter 3, we examine how to tailor error-correcting codes and fault-tolerant quantum circuits in the presence of dephasing biased noise, where dephasing errors are far more common than bit-flip errors. By using an appropriately asymmetric code, we demonstrate the ability to improve the amount of error reduction and decrease the physical resources required for error correction. In Chapter 4, we analyze a variety of protocols for distilling magic states, which enable universal quantum computation, in the presence of faulty Clifford operations. 
Here again there is a hierarchy of noise levels, with a fixed error rate for faulty gates, and a second rate for errors in the distilled states which decreases as the states are distilled to better quality. The interplay of these different rates sets limits on the achievable distillation and how quickly states converge to that limit.
Guest Editor's Introduction: Special section on dependable distributed systems

NASA Astrophysics Data System (ADS)

Fetzer, Christof

1999-09-01

We rely more and more on computers. For example, the Internet reshapes the way we do business. A `computer outage' can cost a company a substantial amount of money. Not only with respect to the business lost during an outage, but also with respect to the negative publicity the company receives. This is especially true for Internet companies. After recent computer outages of Internet companies, we have seen a drastic fall of the shares of the affected companies. There are multiple causes for computer outages. Although computer hardware becomes more reliable, hardware related outages remain an important issue. For example, some of the recent computer outages of companies were caused by failed memory and system boards, and even by crashed disks - a failure type which can easily be masked using disk mirroring. Transient hardware failures might also look like software failures and, hence, might be incorrectly classified as such. However, many outages are software related. Faulty system software, middleware, and application software can crash a system. Dependable computing systems are systems we can rely on. Dependable systems are, by definition, reliable, available, safe and secure [3]. This special section focuses on issues related to dependable distributed systems. Distributed systems have the potential to be more dependable than a single computer because the probability that all computers in a distributed system fail is smaller than the probability that a single computer fails. However, if a distributed system is not built well, it is potentially less dependable than a single computer since the probability that at least one computer in a distributed system fails is higher than the probability that one computer fails. For example, if the crash of any computer in a distributed system can bring the complete system to a halt, the system is less dependable than a single-computer system. Building dependable distributed systems is an extremely difficult task. 
There is no silver bullet solution. Instead one has to apply a variety of engineering techniques [2]: fault-avoidance (minimize the occurrence of faults, e.g. by using a proper design process), fault-removal (remove faults before they occur, e.g. by testing), fault-evasion (predict faults by monitoring and reconfigure the system before failures occur), and fault-tolerance (mask and/or contain failures). Building a system from scratch is an expensive and time consuming effort. To reduce the cost of building dependable distributed systems, one would choose to use commercial off-the-shelf (COTS) components whenever possible. The usage of COTS components has several potential advantages beyond minimizing costs. For example, through the widespread usage of a COTS component, design failures might be detected and fixed before the component is used in a dependable system. Custom-designed components have to mature without the widespread in-field testing of COTS components. COTS components have various potential disadvantages when used in dependable systems. For example, minimizing the time to market might lead to the release of components with inherent design faults (e.g. use of `shortcuts' that only work most of the time). In addition, the components might be more complex than needed and, hence, potentially have more design faults than simpler components. However, given economic constraints and the ability to cope with some of the problems using fault-evasion and fault-tolerance, only for a small percentage of systems can one justify not using COTS components. Distributed systems built from current COTS components are asynchronous systems in the sense that there exists no a priori known bound on the transmission delay of messages or the execution time of processes. When designing a distributed algorithm, one would like to make sure (e.g. by testing or verification) that it is correct, i.e. satisfies its specification. 
Many distributed algorithms make use of consensus (eventually all non-crashed processes have to agree on a value), leader election (a crashed leader is eventually replaced by a new leader, but at any time there is at most one leader) or a group membership detection service (a crashed process is eventually suspected to have crashed but only crashed processes are suspected). From a theoretical point of view, the service specifications given for such services are not implementable in asynchronous systems. In particular, for each implementation one can derive a counter example in which the service violates its specification. From a practical point of view, the consensus, the leader election, and the membership detection problem are solvable in asynchronous distributed systems. In this special section, Raynal and Tronel show how to bridge this difference by showing how to implement the group membership detection problem with a negligible probability [1] to fail in an asynchronous system. The group membership detection problem is specified by a liveness condition (L) and a safety property (S): (L) if a process p crashes, then eventually every non-crashed process q has to suspect that p has crashed; and (S) if a process q suspects p, then p has indeed crashed. One can show that either (L) or (S) is implementable, but one cannot implement both (L) and (S) at the same time in an asynchronous system. In practice, one only needs to implement (L) and (S) such that the probability that (L) or (S) is violated becomes negligible. Raynal and Tronel propose and analyse a protocol that implements (L) with certainty and that can be tuned such that the probability that (S) is violated becomes negligible. Designing and implementing distributed fault-tolerant protocols for asynchronous systems is a difficult but not an impossible task. A fault-tolerant protocol has to detect and mask certain failure classes, e.g. crash failures and message omission failures. 
There is a trade-off between the performance of a fault-tolerant protocol and the failure classes the protocol can tolerate. One wants to tolerate as many failure classes as needed to satisfy the stochastic requirements of the protocol [1] while still maintaining a sufficient performance. Since clients of a protocol have different requirements with respect to the performance/fault-tolerance trade-off, one would like to be able to customize protocols such that one can select an appropriate performance/fault-tolerance trade-off. In this special section Hiltunen et al describe how one can compose protocols from micro-protocols in their Cactus system. They show how a group RPC system can be tailored to the needs of a client. In particular, they show how considering additional failure classes affects the performance of a group RPC system. References [1] Cristian F 1991 Understanding fault-tolerant distributed systems Communications of ACM 34 (2) 56-78 [2] Heimerdinger W L and Weinstock C B 1992 A conceptual framework for system fault tolerance Technical Report 92-TR-33, CMU/SEI [3] Laprie J C (ed) 1992 Dependability: Basic Concepts and Terminology (Vienna: Springer)
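The tension between the liveness condition (L) and the safety property (S) described in this introduction can be seen in a generic heartbeat-and-timeout detector. The sketch below is not the Raynal and Tronel protocol, which the text does not specify; the class name `HeartbeatDetector` and its methods are hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```python
class HeartbeatDetector:
    """Suspects a process when no heartbeat has arrived within `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # process id -> time of last heartbeat

    def heartbeat(self, process, now):
        """Record that a heartbeat from `process` arrived at time `now`."""
        self.last_seen[process] = now

    def suspects(self, now):
        """Return the set of processes whose last heartbeat is too old."""
        return {p for p, t in self.last_seen.items()
                if now - t > self.timeout}


detector = HeartbeatDetector(timeout=3.0)
detector.heartbeat("p", now=0.0)
detector.heartbeat("q", now=0.0)
detector.heartbeat("q", now=4.0)   # q keeps sending; p has gone silent
suspected = detector.suspects(now=5.0)
```

A crashed process stops sending heartbeats and is eventually suspected, which gives (L). Because an asynchronous system has no known bound on message delay, a slow but live process can also exceed the timeout and be suspected wrongly, violating (S). Raising the timeout lowers the probability of such false suspicions at the cost of slower detection.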
14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

Code of Federal Regulations, 2014 CFR

2014-01-01

... 14 Aeronautics and Space 1 2014-01-01 2014-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance Evaluation...
14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

Code of Federal Regulations, 2011 CFR

2011-01-01

... 14 Aeronautics and Space 1 2011-01-01 2011-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance Evaluation...
14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

Code of Federal Regulations, 2012 CFR

2012-01-01

... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance Evaluation...
14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

Code of Federal Regulations, 2010 CFR

2010-01-01

... 14 Aeronautics and Space 1 2010-01-01 2010-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance Evaluation...
14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

Code of Federal Regulations, 2013 CFR

2013-01-01

... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance Evaluation...
A second generation experiment in fault-tolerant software

NASA Technical Reports Server (NTRS)

Knight, J. C.

1986-01-01

The primary goal was to determine whether the application of fault tolerance to software increases its reliability if the cost of production is the same as for an equivalent non-fault-tolerant version derived from the same requirements specification. Software development protocols are discussed. The feasibility of adapting the technique of N-fold Modular Redundancy with majority voting to software design fault tolerance was studied.
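N-fold modular redundancy with majority voting, applied to software, can be sketched as follows. This is a minimal illustration assuming the independently written versions are plain Python callables that return hashable results; the names `majority_vote` and `run_n_version` are hypothetical and do not come from the report.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of versions.

    Raises ValueError when no value is produced by more than half of them.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 > len(outputs):
        return value
    raise ValueError("no majority among version outputs")


def run_n_version(versions, x):
    """Run every version on the same input and vote on the results."""
    return majority_vote([version(x) for version in versions])


# Illustrative use: three versions of "square", one with a design fault.
versions = [lambda x: x * x, lambda x: x ** 2, lambda x: x * x + 1]
result = run_n_version(versions, 3)
```

The vote masks a fault only when the faulty versions are in the minority and do not fail identically, which is why the independence of version failures matters for this technique.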
A Scheduling Algorithm for Replicated Real-Time Tasks

NASA Technical Reports Server (NTRS)

Yu, Albert C.; Lin, Kwei-Jay

1991-01-01

We present an algorithm for scheduling real-time periodic tasks on a multiprocessor system under fault-tolerant requirement. Our approach incorporates both the redundancy and masking technique and the imprecise computation model. Since the tasks in hard real-time systems have stringent timing constraints, the redundancy and masking technique are more appropriate than the rollback techniques which usually require extra time for error recovery. The imprecise computation model provides flexible functionality by trading off the quality of the result produced by a task with the amount of processing time required to produce it. It therefore permits the performance of a real-time system to degrade gracefully. We evaluate the algorithm by stochastic analysis and Monte Carlo simulations. The results show that the algorithm is resilient under hardware failures.
A Sparsity-Promoted Method Based on Majorization-Minimization for Weak Fault Feature Enhancement

PubMed Central

Hao, Yansong; Song, Liuyang; Tang, Gang; Yuan, Hongfang

2018-01-01

Fault transient impulses induced by faulty components in rotating machinery usually contain substantial interference. Fault features are comparatively weak in the initial fault stage, which renders fault diagnosis more difficult. In this case, a sparse representation method based on the Majorization-Minimization (MM) algorithm is proposed to enhance weak fault features and extract the features from strong background noise. However, the traditional MM algorithm suffers from two issues: the choice of sparse basis and complicated calculations. To address these challenges, a modified MM algorithm is proposed. First, a sparse optimization objective function is designed. Inspired by the Basis Pursuit (BP) model, the optimization function integrates an impulsive feature-preserving factor and a penalty function factor. Second, a modified Majorization iterative method is applied to address the convex optimization problem of the designed function. A series of sparse coefficients can be achieved through iterating, which only contain transient components. It is noteworthy that there is no need to select the sparse basis in the proposed iterative method because it is fixed as a unit matrix. Then the reconstruction step is omitted, which can significantly increase detection efficiency. Finally, envelope analysis of the sparse coefficients is performed to extract weak fault features. Simulated and experimental signals including bearings and gearboxes are employed to validate the effectiveness of the proposed method. In addition, comparisons are made to prove that the proposed method outperforms the traditional MM algorithm in terms of detection results and efficiency. PMID:29597280
A Sparsity-Promoted Method Based on Majorization-Minimization for Weak Fault Feature Enhancement.

PubMed

Ren, Bangyue; Hao, Yansong; Wang, Huaqing; Song, Liuyang; Tang, Gang; Yuan, Hongfang

2018-03-28

Fault transient impulses induced by faulty components in rotating machinery usually contain substantial interference. Fault features are comparatively weak in the initial fault stage, which renders fault diagnosis more difficult. In this case, a sparse representation method based on the Majorization-Minimization (MM) algorithm is proposed to enhance weak fault features and extract the features from strong background noise. However, the traditional MM algorithm suffers from two issues: the choice of sparse basis and complicated calculations. To address these challenges, a modified MM algorithm is proposed. First, a sparse optimization objective function is designed. Inspired by the Basis Pursuit (BP) model, the optimization function integrates an impulsive feature-preserving factor and a penalty function factor. Second, a modified Majorization iterative method is applied to address the convex optimization problem of the designed function. A series of sparse coefficients can be achieved through iterating, which only contain transient components. It is noteworthy that there is no need to select the sparse basis in the proposed iterative method because it is fixed as a unit matrix. Then the reconstruction step is omitted, which can significantly increase detection efficiency. Finally, envelope analysis of the sparse coefficients is performed to extract weak fault features. Simulated and experimental signals including bearings and gearboxes are employed to validate the effectiveness of the proposed method. In addition, comparisons are made to prove that the proposed method outperforms the traditional MM algorithm in terms of detection results and efficiency.
Immunity-Based Aircraft Fault Detection System

NASA Technical Reports Server (NTRS)

Dasgupta, D.; KrishnaKumar, K.; Wong, D.; Berry, M.

2004-01-01

In the study reported in this paper, we have developed and applied an Artificial Immune System (AIS) algorithm for aircraft fault detection, as an extension to a previous work on intelligent flight control (IFC). Though the prior studies had established the benefits of IFC, one area of weakness that needed to be strengthened was the control dead band induced by commanding a failed surface. Since the IFC approach uses fault accommodation with no detection, the dead band, although it reduces over time due to learning, is present and causes degradation in handling qualities. If the failure can be identified, this dead band can be further reduced to ensure rapid fault accommodation and better handling qualities. The paper describes the application of an immunity-based approach that can detect a broad spectrum of known and unforeseen failures. The approach incorporates the knowledge of the normal operational behavior of the aircraft from sensory data, and probabilistically generates a set of pattern detectors that can detect any abnormalities (including faults) in the behavior pattern indicating unsafe in-flight operation. We developed a tool called MILD (Multi-level Immune Learning Detection) based on a real-valued negative selection algorithm that can generate a small number of specialized detectors (as signatures of known failure conditions) and a larger set of generalized detectors for unknown (or possible) fault conditions. Once the fault is detected and identified, an adaptive control system would use this detection information to stabilize the aircraft by utilizing available resources (control surfaces). We experimented with data sets collected under normal and various simulated failure conditions using a piloted motion-base simulation facility. The reported results are from a collection of test cases that reflect the performance of the proposed immunity-based fault detection algorithm.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Duan, Sisi; Nicely, Lucas D; Zhang, Haibin

Modern large-scale networks require the ability to withstand arbitrary failures (i.e., Byzantine failures). Byzantine reliable broadcast algorithms can be used to reliably disseminate information in the presence of Byzantine failures. We design a novel Byzantine reliable broadcast protocol for loosely connected and synchronous networks. While previous such protocols all assume correct senders, our protocol is the first to handle Byzantine senders. To achieve this goal, we have developed new techniques for fault detection and fault tolerance. Our protocol is efficient, and under normal circumstances, no expensive public-key cryptographic operations are used. We implement and evaluate our protocol, demonstrating that our protocol has high throughput and is superior to the existing protocols in uncivil executions.
Pipeline synthetic aperture radar data compression utilizing systolic binary tree-searched architecture for vector quantization

NASA Technical Reports Server (NTRS)

Chang, Chi-Yung (Inventor); Fang, Wai-Chi (Inventor); Curlander, John C. (Inventor)

1995-01-01

A system for data compression utilizing systolic array architecture for Vector Quantization (VQ) is disclosed for both full-searched and tree-searched. For a tree-searched VQ, the special case of a Binary Tree-Search VQ (BTSVQ) is disclosed with identical Processing Elements (PE) in the array for both a Raw-Codebook VQ (RCVQ) and a Difference-Codebook VQ (DCVQ) algorithm. A fault tolerant system is disclosed which allows a PE that has developed a fault to be bypassed in the array and replaced by a spare at the end of the array, with codebook memory assignment shifted one PE past the faulty PE of the array.
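The bypass-and-spare scheme described above amounts to a remapping from logical pipeline stages to physical Processing Elements. The sketch below assumes a linear array with one spare PE at the end and at most one faulty PE; the function name `remap_stages` is hypothetical and the codebook contents are not modelled.

```python
def remap_stages(num_stages, faulty_pe=None):
    """Map each logical stage to a physical PE index.

    The array has num_stages + 1 physical PEs, the last being the spare.
    Stages before the faulty PE keep their PE; stages at or after it shift
    one PE down the array, so the faulty PE is bypassed.
    """
    if faulty_pe is None:
        return list(range(num_stages))
    if not 0 <= faulty_pe < num_stages:
        raise ValueError("faulty_pe must index a working-array PE")
    return [stage if stage < faulty_pe else stage + 1
            for stage in range(num_stages)]


# Illustrative use: 4 stages, physical PE 1 has developed a fault.
assignment = remap_stages(4, faulty_pe=1)
```

In this example the stages run on physical PEs 0, 2, 3 and 4, with PE 4 being the spare, and each stage's codebook memory assignment would move together with it.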
Fault tolerant system based on IDDQ testing

NASA Astrophysics Data System (ADS)

Guibane, Badi; Hamdi, Belgacem; Mtibaa, Abdellatif; Bensalem, Brahim

2018-06-01

Offline testing is essential to ensure good manufacturing quality. However, for permanent or transient faults that occur during the use of the integrated circuit in an application, an online integrated test is needed as well. This procedure should ensure the detection and possibly the correction or the masking of these faults. This requirement of self-correction is sometimes necessary, especially in critical applications that require high security such as automotive, space or biomedical applications. We propose a fault-tolerant design for analogue and mixed-signal complementary metal-oxide-semiconductor (CMOS) circuits based on quiescent supply current (IDDQ) testing. A defect can cause an increase in current consumption. The IDDQ testing technique is based on the measurement of power supply current to distinguish between functional and failed circuits. The technique has been an effective testing method for detecting physical defects such as gate-oxide shorts, floating gates (open) and bridging defects in CMOS integrated circuits. An architecture called BICS (Built In Current Sensor) is used for monitoring the supply current (IDDQ) of the connected integrated circuit. If the measured current is not within the normal range, a defect is signalled and the system switches connection from the defective to a functional integrated circuit. The fault-tolerant technique is composed essentially of a double mirror built-in current sensor, allowing the detection of abnormal current consumption, and blocks allowing the connection to redundant circuits if a defect occurs. SPICE simulations are performed to validate the proposed design.
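The decision logic described above (compare the measured quiescent current with a normal range, then switch to a redundant circuit on a defect) can be sketched in software. This is only a behavioural illustration, not the analogue BICS design; the function name `select_circuit`, the units and the numeric range are assumptions made for the example.

```python
def select_circuit(iddq_readings, normal_range):
    """Return the index of the first circuit whose IDDQ is within range.

    iddq_readings: measured quiescent currents, one per redundant circuit,
                   in priority order (index 0 is the primary circuit).
    normal_range:  (low, high) bounds of fault-free quiescent current.
    Returns None when every circuit is outside the normal range.
    """
    low, high = normal_range
    for index, current in enumerate(iddq_readings):
        if low <= current <= high:
            return index
    return None


# Illustrative values in microamperes: the primary circuit draws too much
# current, so the connection switches to the first redundant circuit.
active = select_circuit([12.5, 0.8], normal_range=(0.1, 1.0))
```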
Network-Physics(NP) Bec DIGITAL(#)-VULNERABILITY Versus Fault-Tolerant Analog

NASA Astrophysics Data System (ADS)

Alexander, G. K.; Hathaway, M.; Schmidt, H. E.; Siegel, E.

2011-03-01

Siegel[AMS Joint Mtg.(2002)-Abs.973-60-124] digits logarithmic-(Newcomb(1881)-Weyl(1914; 1916)-Benford(1938)-"NeWBe"/"OLDbe")-law algebraic-inversion to ONLY BEQS BEC:Quanta/Bosons= digits: Synthesis reveals EMP-like SEVERE VULNERABILITY of ONLY DIGITAL-networks(VS. FAULT-TOLERANT ANALOG INvulnerability) via Barabasi "Network-Physics" relative-``statics''(VS.dynamics-[Willinger-Alderson-Doyle(Not.AMS(5/09)]-]critique); (so called)"Quantum-computing is simple-arithmetic(sans division/ factorization); algorithmic-complexities: INtractibility/ UNdecidability/ INefficiency/NONcomputability / HARDNESS(so MIScalled) "noise"-induced-phase-transitions(NITS) ACCELERATION: Cook-Levin theorem Reducibility is Renormalization-(Semi)-Group fixed-points; number-Randomness DEFINITION via WHAT? Query(VS. Goldreich[Not.AMS(02)] How? mea culpa)can ONLY be MBCS "hot-plasma" versus digit-clumping NON-random BEC; Modular-arithmetic Congruences= Signal X Noise PRODUCTS = clock-model; NON-Shor[Physica A,341,586(04)] BEC logarithmic-law inversion factorization:Watkins number-thy. U stat.-phys.); P=/=NP TRIVIAL Proof: Euclid!!! [(So Miscalled) computational-complexity J-O obviation via geometry.
Hybrid Model-Based and Data-Driven Fault Detection and Diagnostics for Commercial Buildings

DOE Office of Scientific and Technical Information (OSTI.GOV)

Frank, Stephen; Heaney, Michael; Jin, Xin

Commercial buildings often experience faults that produce undesirable behavior in building systems. Building faults waste energy, decrease occupants' comfort, and increase operating costs. Automated fault detection and diagnosis (FDD) tools for buildings help building owners discover and identify the root causes of faults in building systems, equipment, and controls. Proper implementation of FDD has the potential to simultaneously improve comfort, reduce energy use, and narrow the gap between actual and optimal building performance. However, conventional rule-based FDD requires expensive instrumentation and valuable engineering labor, which limit deployment opportunities. This paper presents a hybrid, automated FDD approach that combines building energy models and statistical learning tools to detect and diagnose faults noninvasively, using minimal sensors, with little customization. We compare and contrast the performance of several hybrid FDD algorithms for a small security building. Our results indicate that the algorithms can detect and diagnose several common faults, but more work is required to reduce false positive rates and improve diagnosis accuracy.
Hybrid Model-Based and Data-Driven Fault Detection and Diagnostics for Commercial Buildings: Preprint

DOE Office of Scientific and Technical Information (OSTI.GOV)

Frank, Stephen; Heaney, Michael; Jin, Xin

Commercial buildings often experience faults that produce undesirable behavior in building systems. Building faults waste energy, decrease occupants' comfort, and increase operating costs. Automated fault detection and diagnosis (FDD) tools for buildings help building owners discover and identify the root causes of faults in building systems, equipment, and controls. Proper implementation of FDD has the potential to simultaneously improve comfort, reduce energy use, and narrow the gap between actual and optimal building performance. However, conventional rule-based FDD requires expensive instrumentation and valuable engineering labor, which limit deployment opportunities. This paper presents a hybrid, automated FDD approach that combines building energy models and statistical learning tools to detect and diagnose faults noninvasively, using minimal sensors, with little customization. We compare and contrast the performance of several hybrid FDD algorithms for a small security building. Our results indicate that the algorithms can detect and diagnose several common faults, but more work is required to reduce false positive rates and improve diagnosis accuracy.
An algorithm to diagnose ball bearing faults in servomotors running arbitrary motion profiles

NASA Astrophysics Data System (ADS)

Cocconcelli, Marco; Bassi, Luca; Secchi, Cristian; Fantuzzi, Cesare; Rubini, Riccardo

2012-02-01

This paper describes a procedure to extend the scope of classical methods for detecting ball bearing faults (based on envelope analysis and fault frequency identification) beyond their usual area of application. The objective of this procedure is to allow condition-based monitoring of such bearings in servomotor applications, where the motor in its normal mode of operation typically has to follow a non-constant angular velocity profile that may contain motion inversions. After the algorithm is described and analyzed from a theoretical point of view, experimental results obtained on a real industrial application are presented and discussed.

Guidance, Navigation, and Control System Design in a Mass Reduction Exercise

NASA Technical Reports Server (NTRS)

Crain, Timothy; Begly, Michael; Jackson, Mark; Broome, Joel

2008-01-01

Early Orion GN&C system designs optimized for robustness, simplicity, and utilization of commercially available components. During the System Definition Review (SDR), all subsystems on Orion were asked to re-optimize with component mass and steady state power as primary design metrics. The objective was to create a mass reserve in the Orion point of departure vehicle design prior to beginning the PDR analysis cycle. The Orion GN&C subsystem team transitioned from a philosophy of absolute 2 fault tolerance for crew safety and 1 fault tolerance for mission success to an approach of 1 fault tolerance for crew safety and risk based redundancy to meet probability allocations of loss of mission and loss of crew. This paper will discuss the analyses, rationale, and end results of this activity regarding Orion navigation sensor hardware, control effectors, and trajectory design.
A method for joint routing, wavelength dimensioning and fault tolerance for any set of simultaneous failures on dynamic WDM optical networks

NASA Astrophysics Data System (ADS)

Jara, Nicolás; Vallejos, Reinaldo; Rubino, Gerardo

2017-11-01

The design of optical networks decomposes into different tasks, in which engineers must organize the way the main system's resources are used, minimizing the design and operation costs while respecting critical performance constraints. More specifically, network operators face the challenge of solving routing and wavelength dimensioning problems while aiming to simultaneously minimize the network cost and ensure that the network performance meets the level established in the Service Level Agreement (SLA). We call this the Routing and Wavelength Dimensioning (R&WD) problem. Another important problem is how to deal with link failures while the network is operating. When at least one link fails, a high rate of data loss may occur. To avoid this, the network must be designed so that, upon one or multiple failures, the affected connections can still communicate using alternative routes, a mechanism known as Fault Tolerance (FT). When the mechanism can deal with an arbitrary number of faults, we speak of Multiple Fault Tolerance (MFT). These tasks are usually solved separately, or in some cases in pairs, leading to solutions that are not necessarily close to optimal. This paper proposes a novel method to solve all of them simultaneously, that is, the Routing, the Wavelength Dimensioning, and the Multiple Fault Tolerance problems. The method yields: a) all the primary routes by which each connection normally transmits its information, b) the additional routes, called secondary routes, used to keep each user connected in cases where one or more simultaneous failures occur, and c) the number of wavelengths available at each link of the network, calculated such that the blocking probability of each connection is lower than a pre-determined threshold (which is a network design parameter), despite the occurrence of simultaneous link failures.
The solution obtained by the new algorithm is significantly more efficient than current methods, its implementation is notably simple and its on-line operation is very fast. In the paper, different examples illustrate the results provided by the proposed technique.
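The idea of a primary route backed by a secondary route can be illustrated with a minimal sketch. It is not the authors' method: it simply finds a shortest primary route by breadth-first search, bans the links that route uses, and searches again for a link-disjoint secondary route. The function names and the four-node ring topology are hypothetical, and this greedy approach can miss a disjoint pair that a dedicated algorithm would find.

```python
from collections import deque


def shortest_path(links, src, dst, banned=frozenset()):
    """Breadth-first search over undirected links, skipping banned links."""
    adj = {}
    for a, b in links:
        if frozenset((a, b)) in banned:
            continue
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no route survives the banned links


def primary_and_secondary(links, src, dst):
    """Return a primary route and a secondary route sharing no link with it."""
    primary = shortest_path(links, src, dst)
    if primary is None:
        return None, None
    used = frozenset(frozenset(pair) for pair in zip(primary, primary[1:]))
    secondary = shortest_path(links, src, dst, banned=used)
    return primary, secondary


# Hypothetical four-node ring used only for illustration.
RING = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
primary, secondary = primary_and_secondary(RING, "A", "C")
print(primary, secondary)
```

In the ring, any single link failure on the primary route leaves the secondary route intact. Wavelength dimensioning and blocking probabilities, which the abstract treats jointly with routing, are outside this sketch.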
Fault tolerance of artificial neural networks with applications in critical systems

NASA Technical Reports Server (NTRS)

Protzel, Peter W.; Palumbo, Daniel L.; Arras, Michael K.

1992-01-01

This paper investigates the fault tolerance characteristics of time continuous recurrent artificial neural networks (ANN) that can be used to solve optimization problems. The principle of operations and performance of these networks are first illustrated by using well-known model problems like the traveling salesman problem and the assignment problem. The ANNs are then subjected to 13 simultaneous 'stuck at 1' or 'stuck at 0' faults for network sizes of up to 900 'neurons'. The effects of these faults are demonstrated and the cause of the observed fault tolerance is discussed. An application is presented in which a network performs a critical task for a real-time distributed processing system by generating new task allocations during the reconfiguration of the system. The performance degradation of the ANN in the presence of faults is investigated by large-scale simulations, and the potential benefits of delegating a critical task to a fault tolerant network are discussed.
On-line experimental validation of a model-based diagnostic algorithm dedicated to a solid oxide fuel cell system

NASA Astrophysics Data System (ADS)

Polverino, Pierpaolo; Esposito, Angelo; Pianese, Cesare; Ludwig, Bastian; Iwanschitz, Boris; Mai, Andreas

2016-02-01

In the current energy scenario, Solid Oxide Fuel Cells (SOFCs) exhibit appealing features which make them suitable for environmentally friendly power production, especially for stationary applications. An example is represented by micro-combined heat and power (μ-CHP) generation units based on SOFC stacks, which are able to produce electric and thermal power with high efficiency and low pollutant and greenhouse gas emissions. However, the main limitations to their diffusion into the mass market are high maintenance and production costs and short lifetime. To improve these aspects, current research activity focuses on the development of robust and generalizable diagnostic techniques, aimed at detecting and isolating faults within the entire system (i.e. SOFC stack and balance of plant). Coupled with appropriate recovery strategies, diagnosis can prevent undesired system shutdowns during faulty conditions, with a consequent lifetime increase and maintenance cost reduction. This paper deals with the on-line experimental validation of a model-based diagnostic algorithm applied to a pre-commercial SOFC system. The proposed algorithm exploits a Fault Signature Matrix based on a Fault Tree Analysis and improved through fault simulations. The algorithm is characterized on the considered system and validated by means of experimental induction of faulty states in controlled conditions.
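A Fault Signature Matrix of the kind mentioned above can be sketched as a lookup from binary symptom vectors to candidate faults. The fault names, residual names, thresholds, and signatures below are all hypothetical placeholders, not values from the paper; the sketch only shows the general mechanism of thresholding model residuals into symptoms and matching them against stored signatures.

```python
# Hypothetical residuals monitored by the diagnostic model (illustrative names).
RESIDUALS = ["stack_voltage", "stack_temperature", "blower_power"]

# Hypothetical fault signature matrix: one binary signature per fault,
# with 1 meaning "this residual is expected to exceed its threshold".
FSM = {
    "air_leak": (1, 1, 0),
    "blower_degradation": (0, 1, 1),
    "stack_degradation": (1, 0, 0),
}


def symptoms_from_residuals(residuals, thresholds):
    """Turn measured-minus-modelled residuals into a binary symptom vector."""
    return tuple(int(abs(residuals[name]) > thresholds[name]) for name in RESIDUALS)


def isolate(symptom_vector):
    """Return the faults whose signature matches the observed symptoms exactly."""
    if not any(symptom_vector):
        return []  # no symptom active: nothing to isolate
    return [fault for fault, signature in FSM.items() if signature == symptom_vector]


thresholds = {"stack_voltage": 0.5, "stack_temperature": 0.5, "blower_power": 0.5}
observed = {"stack_voltage": 0.8, "stack_temperature": 0.1, "blower_power": 0.0}
print(isolate(symptoms_from_residuals(observed, thresholds)))
```

Exact matching is the simplest choice; a practical system would likely need tolerance for partially matching signatures, which this sketch does not attempt.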
ASCS online fault detection and isolation based on an improved MPCA

NASA Astrophysics Data System (ADS)

Peng, Jianxin; Liu, Haiou; Hu, Yuhui; Xi, Junqiang; Chen, Huiyan

2014-09-01

Multi-way principal component analysis (MPCA) has received considerable attention and been widely used in process monitoring. A traditional MPCA algorithm unfolds multiple batches of historical data into a two-dimensional matrix and cuts the matrix along the time axis to form subspaces. However, low efficiency of subspaces and difficult fault isolation are common disadvantages of the principal component model. This paper presents a new subspace construction method based on a kernel density estimation function that can effectively reduce the storage amount of the subspace information. The MPCA model and the knowledge base are built on the new subspace. Then, fault detection and isolation with the squared prediction error (SPE) statistic and the Hotelling T² statistic are realized in process monitoring. When a fault occurs, fault isolation based on the SPE statistic is achieved by residual contribution analysis of different variables. For fault isolation of a subspace based on the T² statistic, the relationship between the statistic indicator and state variables is constructed, and the constraint conditions are presented to check the validity of fault isolation. Then, to improve the robustness of fault isolation to unexpected disturbances, the statistic method is adopted to set the relation between single subspace and multiple subspaces to increase the corrective rate of fault isolation. Finally, fault detection and isolation based on the improved MPCA is used to monitor the automatic shift control system (ASCS) to prove the correctness and effectiveness of the algorithm. The research proposes a new subspace construction method to reduce the required storage capacity and to prove the robustness of the principal component model, and sets the relationship between the state variables and fault detection indicators for fault isolation.
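The two monitoring statistics named in this abstract can be shown with a minimal sketch using ordinary PCA. This is not the improved MPCA of the paper: there is no batch unfolding, subspace construction, or kernel density estimation, and the synthetic three-variable data set is an assumption made purely for illustration. The sketch computes the Hotelling T² statistic in the retained component space, the SPE in the residual space, and per-variable SPE contributions.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit a PCA monitoring model on fault-free training data X (rows = samples)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                              # loadings, shape (n_vars, k)
    eigvals = s[:n_components] ** 2 / (X.shape[0] - 1)   # variances of the scores
    return mean, std, P, eigvals


def t2_and_spe(x, mean, std, P, eigvals):
    """Return Hotelling T^2, SPE, and per-variable SPE contributions for one sample."""
    z = (x - mean) / std
    t = P.T @ z                       # scores in the principal subspace
    t2 = float(np.sum(t ** 2 / eigvals))
    residual = z - P @ t              # part of z the model cannot explain
    contributions = residual ** 2
    return t2, float(contributions.sum()), contributions


# Synthetic fault-free data: three strongly correlated variables (assumed).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.ones((1, 3)) + 0.05 * rng.normal(size=(200, 3))
model = fit_pca(X, n_components=1)
mean, std = model[0], model[1]

x_ok = mean + std * np.array([1.0, 1.0, 1.0])      # follows the usual correlation
x_fault = mean + std * np.array([0.0, 3.0, 0.0])   # variable 1 deviates alone

t2_ok, spe_ok, _ = t2_and_spe(x_ok, *model)
t2_fault, spe_fault, contrib_fault = t2_and_spe(x_fault, *model)
print(spe_ok, spe_fault, contrib_fault)
```

The faulty sample breaks the correlation structure, so its SPE is far larger than that of the consistent sample, and the largest contribution points at the deviating variable. Control limits for T² and SPE are omitted here.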
Spacecraft fault tolerance: The Magellan experience

NASA Technical Reports Server (NTRS)

Kasuda, Rick; Packard, Donna Sexton

1993-01-01

Interplanetary and earth orbiting missions are now imposing unique fault tolerant requirements upon spacecraft design. Mission success is the prime motivator for building spacecraft with fault tolerant systems. The Magellan spacecraft had many such requirements imposed upon its design. Magellan met these requirements by building redundancy into all the major subsystem components and designing the onboard hardware and software with the capability to detect a fault, isolate it to a component, and issue commands to achieve a back-up configuration. This discussion is limited to fault protection, which is the autonomous capability to respond to a fault. The Magellan fault protection design is discussed, as well as the developmental and flight experiences and a summary of the lessons learned.
Safety Verification of a Fault Tolerant Reconfigurable Autonomous Goal-Based Robotic Control System

NASA Technical Reports Server (NTRS)

Braman, Julia M. B.; Murray, Richard M; Wagner, David A.

2007-01-01

Fault tolerance and safety verification of control systems are essential for the success of autonomous robotic systems. A control architecture called Mission Data System (MDS), developed at the Jet Propulsion Laboratory, takes a goal-based control approach. In this paper, a method for converting goal network control programs into linear hybrid systems is developed. The linear hybrid system can then be verified for safety in the presence of failures using existing symbolic model checkers. An example task is simulated in MDS and successfully verified using HyTech, a symbolic model checking software for linear hybrid systems.
Design and evaluation of a fault-tolerant multiprocessor using hardware recovery blocks

NASA Technical Reports Server (NTRS)

Lee, Y. H.; Shin, K. G.

1982-01-01

A fault-tolerant multiprocessor with a rollback recovery mechanism is discussed. The rollback mechanism is based on the hardware recovery block which is a hardware equivalent to the software recovery block. The hardware recovery block is constructed by consecutive state-save operations and several state-save units in every processor and memory module. When a fault is detected, the multiprocessor reconfigures itself to replace the faulty component and then the process originally assigned to the faulty component retreats to one of the previously saved states in order to resume fault-free execution. A mathematical model is proposed to calculate both the coverage of multi-step rollback recovery and the risk of restart. A performance evaluation in terms of task execution time is also presented.
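The rollback idea can be sketched in software, although the abstract describes a hardware mechanism. The class below is a hypothetical illustration: it keeps a bounded list of saved states (standing in for the state-save units) and lets a process retreat one or several checkpoints after a fault is detected. The class name, the `max_saved` parameter, and the dictionary state are assumptions, not details from the paper.

```python
import copy


class RecoveryBlockProcess:
    """Process state with a bounded history of saved states (checkpoints)."""

    def __init__(self, state, max_saved=3):
        self.state = state
        self.max_saved = max_saved
        self.saved = [copy.deepcopy(state)]  # oldest first, newest last

    def state_save(self):
        """Record the current state, discarding the oldest save if full."""
        self.saved.append(copy.deepcopy(self.state))
        if len(self.saved) > self.max_saved:
            self.saved.pop(0)

    def rollback(self, steps=1):
        """Retreat `steps` saved states; never go past the oldest one kept."""
        index = max(len(self.saved) - steps, 0)
        self.state = copy.deepcopy(self.saved[index])
        del self.saved[index + 1:]  # later saves may be contaminated
        return self.state


proc = RecoveryBlockProcess({"acc": 0})
proc.state["acc"] = 1
proc.state_save()
proc.state["acc"] = 2
proc.state_save()
proc.state["acc"] = 99          # pretend a fault corrupted the state here
one_step = copy.deepcopy(proc.rollback(steps=1))
two_step = copy.deepcopy(proc.rollback(steps=2))
print(one_step, two_step)
```

A multi-step rollback is useful when the fault occurred before the latest save, so that the newest checkpoint is itself suspect. Running out of saved states corresponds to the restart risk that the paper's mathematical model is said to quantify.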
An optimized implementation of a fault-tolerant clock synchronization circuit

NASA Technical Reports Server (NTRS)

Torres-Pomales, Wilfredo

1995-01-01

A fault-tolerant clock synchronization circuit was designed and tested. A comparison to a previous design and the procedure followed to achieve the current optimization are included. The report also includes a description of the system and the results of tests performed to study the synchronization and fault-tolerant characteristics of the implementation.
An Integrated Fault Tolerant Robotic Controller System for High Reliability and Safety

NASA Technical Reports Server (NTRS)

Marzwell, Neville I.; Tso, Kam S.; Hecht, Myron

1994-01-01

This paper describes the concepts and features of a fault-tolerant intelligent robotic control system being developed for applications that require high dependability (reliability, availability, and safety). The system consists of two major elements: a fault-tolerant controller and an operator workstation. The fault-tolerant controller uses a strategy which allows for detection and recovery of hardware, operating system, and application software failures. The fault-tolerant controller can be used by itself in a wide variety of applications in industry, process control, and communications. The controller in combination with the operator workstation can be applied to robotic applications such as spaceborne extravehicular activities, hazardous materials handling, inspection and maintenance of high value items (e.g., space vehicles, reactor internals, or aircraft), medicine, and other tasks where a robot system failure poses a significant risk to life or property.
Reliability of Fault Tolerant Control Systems. Part 1

NASA Technical Reports Server (NTRS)

Wu, N. Eva

2001-01-01

This paper reports Part I of a two-part effort that is intended to delineate the relationship between reliability and fault tolerant control in a quantitative manner. Reliability analysis of fault-tolerant control systems is performed using Markov models. Reliability properties peculiar to fault-tolerant control systems are emphasized. As a consequence, coverage of failures through redundancy management can be severely limited. It is shown that in the early life of a system composed of highly reliable subsystems, the reliability of the overall system is affine with respect to coverage, and inadequate coverage induces dominant single point failures. The utility of some existing software tools for assessing the reliability of fault tolerant control systems is also discussed. Coverage modeling is attempted in Part II in a way that captures its dependence on the control performance and on the diagnostic resolution.
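The role of coverage can be made concrete with a textbook-style duplex example, which is an assumption here and not the model used in the paper. Two identical units each fail at rate λ; a first failure is successfully handled with probability c (the coverage) and otherwise brings the system down. Solving the three-state Markov chain gives R(t) = e^(-2λt) + 2c(e^(-λt) - e^(-2λt)), which is affine in c, and for small λt the unreliability is approximately 2(1 - c)λt, so the uncovered first failure behaves like a single point failure.

```python
import math


def duplex_reliability(lam, t, coverage):
    """Reliability of a two-unit system with imperfect failure coverage.

    States: both units up -> one unit up (rate 2*lam*coverage)
            both units up -> system failed (rate 2*lam*(1 - coverage))
            one unit up   -> system failed (rate lam)
    """
    p_both = math.exp(-2.0 * lam * t)
    p_one = 2.0 * coverage * (math.exp(-lam * t) - math.exp(-2.0 * lam * t))
    return p_both + p_one


lam, t = 1e-4, 10.0          # illustrative failure rate (per hour) and mission time
for c in (1.0, 0.99, 0.9):
    unreliability = 1.0 - duplex_reliability(lam, t, c)
    uncovered_term = 2.0 * (1.0 - c) * lam * t
    print(c, unreliability, uncovered_term)
```

With perfect coverage the unreliability is of order (λt)², but even 1% of uncovered failures makes the linear term dominate in early life.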
Advanced Development for Space Robotics With Emphasis on Fault Tolerance Technology

NASA Technical Reports Server (NTRS)

Tesar, Delbert

1997-01-01

This report describes work developing fault tolerant redundant robotic architectures and adaptive control strategies for robotic manipulator systems which can dynamically accommodate drastic robot manipulator mechanism, sensor or control failures and maintain stable end-point trajectory control with minimum disturbance. Kinematic designs of redundant, modular, reconfigurable arms for fault tolerance were pursued at a fundamental level. The approach developed robotic testbeds to evaluate disturbance responses of fault tolerant concepts in robotic mechanisms and controllers. The development was implemented in various fault tolerant mechanism testbeds including duality in the joint servo motor modules, parallel and serial structural architectures, and dual arms. All have real-time adaptive controller technologies to react to mechanism or controller disturbances (failures) to perform real-time reconfiguration to continue the task operations. The developments fall into three main areas: hardware, software, and theoretical.
Efficient Parallel Algorithms on Restartable Fail-Stop Processors

DTIC Science & Technology

1991-01-01

resource (memory), and (3) that processors, memory and their interconnection must be The model of parallel computation known as the Par- perfectly...setting), and ure and restart errors. We describe these arguments in [AAtPS 871 (in a deterministic setting). Fault-tolerance Section 3. of...granularity at the processor level --- for recent work on where Al is the number of failures during this step’s gate granularities see [All 90, Pip 85
Reconfiguration Schemes for Fault-Tolerant Processor Arrays

DTIC Science & Technology

1992-10-15

partially notion of linear schedule are easily related to similar ordered subset of a multidimensional integer lattice models and concepts used in [1]-[13]...and several other (called index set). The points of this lattice correspond works. to (i.e., are the indices of) computations, and the partial There are...These data dependencies are represented as vectors that of all computations of the algorithm is to be minimized. connect points of the lattice. If a
A high precision position sensor design and its signal processing algorithm for a maglev train.

PubMed

Xue, Song; Long, Zhiqiang; He, Ning; Chang, Wensen

2012-01-01

High precision positioning technology for a kind of high speed maglev train with an electromagnetic suspension (EMS) system is studied. First, the basic structure and functions of the position sensor are introduced and some key techniques to enhance the positioning precision are designed. Then, in order to further improve the positioning signal quality and the fault-tolerant ability of the sensor, a new kind of discrete-time tracking differentiator (TD) is proposed based on nonlinear optimal control theory. This new TD has good filtering and differentiating performances and a small calculation load. It is suitable for real-time signal processing. The stability, convergence property and frequency characteristics of the TD are studied and analyzed thoroughly. The delay constant of the TD is determined and an effective time delay compensation algorithm is proposed. Based on the TD technology, a filtering process is introduced to improve the positioning signal waveform when the sensor is under bad working conditions, and a two-sensor switching algorithm is designed to eliminate the positioning errors caused by the joint gaps of the long stator. The effectiveness and stability of the sensor and its signal processing algorithms are demonstrated by experiments on a test train during a long-term test run.
A High Precision Position Sensor Design and Its Signal Processing Algorithm for a Maglev Train

PubMed Central

Xue, Song; Long, Zhiqiang; He, Ning; Chang, Wensen

2012-01-01

High precision positioning technology for a kind of high speed maglev train with an electromagnetic suspension (EMS) system is studied. First, the basic structure and functions of the position sensor are introduced and some key techniques to enhance the positioning precision are designed. Then, in order to further improve the positioning signal quality and the fault-tolerant ability of the sensor, a new kind of discrete-time tracking differentiator (TD) is proposed based on nonlinear optimal control theory. This new TD has good filtering and differentiating performances and a small calculation load. It is suitable for real-time signal processing. The stability, convergence property and frequency characteristics of the TD are studied and analyzed thoroughly. The delay constant of the TD is determined and an effective time delay compensation algorithm is proposed. Based on the TD technology, a filtering process is introduced to improve the positioning signal waveform when the sensor is under bad working conditions, and a two-sensor switching algorithm is designed to eliminate the positioning errors caused by the joint gaps of the long stator. The effectiveness and stability of the sensor and its signal processing algorithms are demonstrated by experiments on a test train during a long-term test run. PMID:22778582
Observer-based distributed adaptive fault-tolerant containment control of multi-agent systems with general linear dynamics.

PubMed

Ye, Dan; Chen, Mengmeng; Li, Kui

2017-11-01

In this paper, we consider the distributed containment control problem of multi-agent systems with actuator bias faults based on an observer method. The objective is to drive the followers into the convex hull spanned by the dynamic leaders, where the input is unknown but bounded. By constructing an observer to estimate the states and bias faults, an effective distributed adaptive fault-tolerant controller is developed. Unlike the traditional method, an auxiliary controller gain is designed to deal with the unknown inputs and bias faults together. Moreover, the coupling gain can be adjusted online through the adaptive mechanism without using global information. Furthermore, the proposed control protocol can guarantee that all the signals of the closed-loop systems are bounded and all the followers converge to the convex hull with bounded residual errors formed by the dynamic leaders. Finally, a decoupled linearized longitudinal motion model of the F-18 aircraft is used to demonstrate the effectiveness. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Interface For Fault-Tolerant Control System

NASA Technical Reports Server (NTRS)

Shaver, Charles; Williamson, Michael

1989-01-01

Interface unit and controller emulator developed for research on electronic helicopter-flight-control systems equipped with artificial intelligence. Interface unit interrupt-driven system designed to link microprocessor-based, quadruply-redundant, asynchronous, ultra-reliable, fault-tolerant control system (controller) with electronic servocontrol unit that controls set of hydraulic actuators. Receives digital feedforward messages from, and transmits digital feedback messages to, controller through differential signal lines or fiber-optic cables (thus far only differential signal lines have been used). Analog signals transmitted to and from servocontrol unit via coaxial cables.
Implementation of a Fault Tolerant Control Unit within an FPGA for Space Applications

DTIC Science & Technology

2006-12-01

Conference 2002, September 2002. [20] M. Alderighi, A. Candelori, F. Casini, S. D’Angelo, M. Mancini, A. Paccagnella, S. Pastore , G.R. Sechi, “Heavy...Luigi Carro and Ricardo Reis , “Designing and Testing Fault-Tolerant Techniques for SRAM-based FPGAs,” in Proc. 1st Conference on Computer Frontiers, pp...susceptibility,” in IEEE Proc. 12th IEEE Intl. Symposium on On-Line Testing, pp. 89-91, 2006. [45] Fernanda Lima, Luigi Carro and Ricardo Reis
A seismic fault recognition method based on ant colony optimization

NASA Astrophysics Data System (ADS)

Chen, Lei; Xiao, Chuangbai; Li, Xueliang; Wang, Zhenli; Huo, Shoudong

2018-05-01

Fault recognition is an important part of seismic interpretation, and although many methods exist, none recognizes faults accurately enough. To address this problem, we propose a new fault recognition method based on ant colony optimization that can locate faults precisely and extract them from the seismic section. First, seismic horizons are extracted by the connected component labeling algorithm; second, the fault locations are determined from the horizontal endpoints of each horizon; third, the whole seismic section is divided into several rectangular blocks, and the top and bottom endpoints of each rectangular block are taken as the nest and food, respectively, for the ant colony optimization algorithm. In addition, the positive section is treated as an actual three-dimensional terrain by using the seismic amplitude as the height. The optimal route from nest to food calculated by the ant colony in each block is then judged to be a fault. Finally, extensive comparative tests were performed on real seismic data. The availability and advancement of the proposed method were validated by the experimental results.

Quantum Computation: Entangling with the Future

NASA Technical Reports Server (NTRS)

Jiang, Zhang

2017-01-01

Commercial applications of quantum computation have become viable due to the rapid progress of the field in the recent years. Efficient quantum algorithms are discovered to cope with the most challenging real-world problems that are too hard for classical computers. Manufactured quantum hardware has reached unprecedented precision and controllability, enabling fault-tolerant quantum computation. Here, I give a brief introduction on what principles in quantum mechanics promise its unparalleled computational power. I will discuss several important quantum algorithms that achieve exponential or polynomial speedup over any classical algorithm. Building a quantum computer is a daunting task, and I will talk about the criteria and various implementations of quantum computers. I conclude the talk with near-future commercial applications of a quantum computer.
Quantum gates with controlled adiabatic evolutions

NASA Astrophysics Data System (ADS)

Hen, Itay

2015-02-01

We introduce a class of quantum adiabatic evolutions that we claim may be interpreted as the equivalents of the unitary gates of the quantum gate model. We argue that these gates form a universal set and may therefore be used as building blocks in the construction of arbitrary "adiabatic circuits," analogously to the manner in which gates are used in the circuit model. One implication of the above construction is that arbitrary classical boolean circuits as well as gate model circuits may be directly translated to adiabatic algorithms with no additional resources or complexities. We show that while these adiabatic algorithms fail to exhibit certain aspects of the inherent fault tolerance of traditional quantum adiabatic algorithms, they may have certain other experimental advantages acting as quantum gates.
Sequoia: A fault-tolerant tightly coupled multiprocessor for transaction processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bernstein, P.A.

1988-02-01

The Sequoia computer is a tightly coupled multiprocessor, and thus attains the performance advantages of this style of architecture. It avoids most of the fault-tolerance disadvantages of tight coupling by using a new fault-tolerance design. The Sequoia architecture is similar to other multimicroprocessor architectures, such as those of Encore and Sequent, in that it gives dozens of microprocessors shared access to a large main memory. It resembles the Stratus architecture in its extensive use of hardware fault-detection techniques. It resembles Stratus and Auragen in its ability to quickly recover all processes after a single point failure, transparently to the user. However, Sequoia is unique in its combination of a large-scale tightly coupled architecture with a hardware approach to fault tolerance. This article gives an overview of how the hardware architecture and operating systems (OS) work together to provide a high degree of fault tolerance with good system performance.
Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

NASA Technical Reports Server (NTRS)

Carroll, Chester C.; Youngblood, John N.; Saha, Aindam

1987-01-01

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
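The abstract states that task allocation is based on critical path analysis but does not give the algorithm. The sketch below shows only the critical path computation itself on a small task graph, under the assumption that the graph is acyclic. The task names and durations are hypothetical, loosely echoing the guidance quantities named in the abstract.

```python
def critical_path(durations, deps):
    """Longest path through a task DAG.

    durations: task -> execution time
    deps:      task -> list of predecessor tasks (missing key means none)
    Returns (total length, list of tasks on the critical path).
    """
    finish = {}      # earliest finish time of each task
    best_pred = {}   # predecessor that determines each task's start time

    def visit(task):
        if task in finish:
            return finish[task]
        start, best_pred[task] = 0.0, None
        for pred in deps.get(task, []):
            pred_finish = visit(pred)
            if pred_finish > start:
                start, best_pred[task] = pred_finish, pred
        finish[task] = start + durations[task]
        return finish[task]

    for task in durations:
        visit(task)
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return length, path[::-1]


# Hypothetical task graph and durations, for illustration only.
durations = {"read_sensors": 2, "estimate_target": 4, "time_to_go": 1, "guidance_cmd": 3}
deps = {
    "estimate_target": ["read_sensors"],
    "time_to_go": ["read_sensors"],
    "guidance_cmd": ["estimate_target", "time_to_go"],
}
length, path = critical_path(durations, deps)
print(length, path)
```

Tasks on the critical path bound the achievable execution time regardless of how many processing elements are available, which is why an allocator would typically prioritize them. How the paper maps tasks to processing elements is not reproduced here.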
Computer architecture for efficient algorithmic executions in real-time systems: new technology for avionics systems and advanced space vehicles

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carroll, C.C.; Youngblood, J.N.; Saha, A.

1987-12-01

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
Technology transfer by means of fault tree synthesis

NASA Astrophysics Data System (ADS)

Batzias, Dimitris F.

2012-12-01

Since Fault Tree Analysis (FTA) attempts to model and analyze failure processes of engineering, it forms a common technique for good industrial practice. In contrast, fault tree synthesis (FTS) refers to the methodology of constructing complex trees either from dendritic modules built ad hoc or from fault trees already used and stored in a Knowledge Base. In both cases, technology transfer takes place in a quasi-inductive mode, from partial to holistic knowledge. In this work, an algorithmic procedure, including 9 activity steps and 3 decision nodes, is developed for performing this transfer effectively when the fault under investigation occurs within one of the latter stages of an industrial procedure with several stages in series. The main parts of the algorithmic procedure are: (i) the construction of a local fault tree within the corresponding production stage, where the fault has been detected, (ii) the formation of an interface made of input faults that might occur upstream, (iii) the fuzzy (to account for uncertainty) multicriteria ranking of these faults according to their significance, and (iv) the synthesis of an extended fault tree based on the construction of part (i) and on the local fault tree of the first-ranked fault in part (iii). An implementation is presented, referring to 'uneven sealing of Al anodic film', thus proving the functionality of the developed methodology.
Advanced information processing system - Status report. [for fault tolerant and damage tolerant data processing for aerospace vehicles]

NASA Technical Reports Server (NTRS)

Brock, L. D.; Lala, J.

1986-01-01

The Advanced Information Processing System (AIPS) is designed to provide a fault tolerant and damage tolerant data processing architecture for a broad range of aerospace vehicles. The AIPS architecture also has attributes to enhance system effectiveness such as graceful degradation, growth and change tolerance, integrability, etc. Two key building blocks being developed by the AIPS program are a fault and damage tolerant processor and communication network. A proof-of-concept system is now being built and will be tested to demonstrate the validity and performance of the AIPS concepts.
Error Mitigation of Point-to-Point Communication for Fault-Tolerant Computing

NASA Technical Reports Server (NTRS)

Akamine, Robert L.; Hodson, Robert F.; LaMeres, Brock J.; Ray, Robert E.

2011-01-01

Fault tolerant systems require the ability to detect and recover from physical damage caused by the hardware's environment, faulty connectors, and system degradation over time. This ability applies to military, space, and industrial computing applications. The integrity of Point-to-Point (P2P) communication, between two microcontrollers for example, is an essential part of fault tolerant computing systems. In this paper, different methods of fault detection and recovery are presented and analyzed.
Robust fault tolerant control based on sliding mode method for uncertain linear systems with quantization.

PubMed

Hao, Li-Ying; Yang, Guang-Hong

2013-09-01

This paper is concerned with robust fault-tolerant compensation control for uncertain linear systems subject to both state and input signal quantization. By successfully combining a novel matrix full-rank factorization technique with sliding surface design, the total failure of certain actuators can be coped with under a special actuator redundancy assumption. In order to compensate for quantization errors, an adjustment range of quantization sensitivity for a dynamic uniform quantizer is given through flexible choices of design parameters. Compared with existing results, the derived inequality condition yields stronger fault tolerance and a much wider scope of applicability. With a static adjustment policy of quantization sensitivity, an adaptive sliding mode controller is then designed to maintain the sliding mode, where the gain of the nonlinear unit vector term is updated automatically to compensate for the effects of actuator faults, quantization errors, exogenous disturbances and parameter uncertainties without the need for a fault detection and isolation (FDI) mechanism. Finally, the effectiveness of the proposed design method is illustrated via a model of a rocket fairing structural-acoustic. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.
A survey of NASA and military standards on fault tolerance and reliability applied to robotics

NASA Technical Reports Server (NTRS)

Cavallaro, Joseph R.; Walker, Ian D.

1994-01-01

There is currently increasing interest and activity in the area of reliability and fault tolerance for robotics. This paper discusses the application of Standards in robot reliability, and surveys the literature of relevant existing standards. A bibliography of relevant Military and NASA standards for reliability and fault tolerance is included.
SOM neural network fault diagnosis method of polymerization kettle equipment optimized by improved PSO algorithm.

PubMed

Wang, Jie-sheng; Li, Shu-xia; Gao, Jie

2014-01-01

For meeting the real-time fault diagnosis and the optimization monitoring requirements of the polymerization kettle in the polyvinyl chloride resin (PVC) production process, a fault diagnosis strategy based on the self-organizing map (SOM) neural network is proposed. Firstly, a mapping between the polymerization process data and the fault pattern is established by analyzing the production technology of polymerization kettle equipment. The particle swarm optimization (PSO) algorithm with a new dynamical adjustment method of inertial weights is adopted to optimize the structural parameters of SOM neural network. The fault pattern classification of the polymerization kettle equipment is to realize the nonlinear mapping from symptom set to fault set according to the given symptom set. Finally, the simulation experiments of fault diagnosis are conducted by combining with the industrial on-site historical data of the polymerization kettle and the simulation results show that the proposed PSO-SOM fault diagnosis strategy is effective.
Abstractions for Fault-Tolerant Distributed System Verification

NASA Technical Reports Server (NTRS)

Pike, Lee S.; Maddalon, Jeffrey M.; Miner, Paul S.; Geser, Alfons

2004-01-01

Four kinds of abstraction for the design and analysis of fault tolerant distributed systems are discussed. These abstractions concern system messages, faults, fault masking voting, and communication. The abstractions are formalized in higher order logic, and are intended to facilitate specifying and verifying such systems in higher order theorem provers.
Fault tolerant architectures for integrated aircraft electronics systems

NASA Technical Reports Server (NTRS)

Levitt, K. N.; Melliar-Smith, P. M.; Schwartz, R. L.

1983-01-01

Work into possible architectures for future flight control computer systems is described. Ada for Fault-Tolerant Systems, the NETS Network Error-Tolerant System architecture, and voting in asynchronous systems are covered.
Evaluation of a fault tolerant system for an integrated avionics sensor configuration with TSRV flight data

NASA Technical Reports Server (NTRS)

Caglayan, A. K.; Godiwala, P. M.

1985-01-01

The performance analysis results of a fault inferring nonlinear detection system (FINDS) using sensor flight data for the NASA ATOPS B-737 aircraft in a Microwave Landing System (MLS) environment are presented. First, a statistical analysis of the flight recorded sensor data was made in order to determine the characteristics of sensor inaccuracies. Next, modifications were made to the detection and decision functions in the FINDS algorithm in order to improve false alarm and failure detection performance under real modelling errors present in the flight data. Finally, the failure detection and false alarm performance of the FINDS algorithm were analyzed by injecting bias failures into fourteen sensor outputs over six repetitive runs of the five minute flight data. In general, the detection speed, failure level estimation, and false alarm performance showed a marked improvement over the previously reported simulation runs. In agreement with earlier results, detection speed was faster for filter measurement sensors such as MLS than for filter input sensors such as flight control accelerometers.
Intelligent Fault Diagnosis of HVCB with Feature Space Optimization-Based Random Forest

PubMed Central

Ma, Suliang; Wu, Jianwen; Wang, Yuhao; Jia, Bowen; Jiang, Yuan

2018-01-01

Mechanical faults of high-voltage circuit breakers (HVCBs) inevitably occur over long-term operation, so extracting the fault features and identifying the fault type have become a key issue for ensuring the security and reliability of power supply. Based on wavelet packet decomposition technology and the random forest algorithm, an effective identification system was developed in this paper. First, compared with the incomplete description of Shannon entropy, the wavelet packet time-frequency energy rate (WTFER) was adopted as the input vector for the classifier model in the feature selection procedure. Then, a random forest classifier was used to diagnose the HVCB fault, assess the importance of the feature variables and optimize the feature space. Finally, the approach was verified based on actual HVCB vibration signals by considering six typical fault classes. The comparative experiment results show that the classification accuracy of the proposed method reached 93.33% with the original feature space and up to 95.56% with the optimized input feature vector of the classifier. This indicates that the feature optimization procedure is successful, and the proposed diagnosis algorithm has higher efficiency and robustness than traditional methods. PMID:29659548
Formal verification of a fault tolerant clock synchronization algorithm

NASA Technical Reports Server (NTRS)

Rushby, John; Vonhenke, Frieder

1989-01-01

A formal specification and mechanically assisted verification of the interactive convergence clock synchronization algorithm of Lamport and Melliar-Smith is described. Several technical flaws in the analysis given by Lamport and Melliar-Smith were discovered, even though their presentation is unusually precise and detailed. It seems that these flaws were not detected by informal peer scrutiny. The flaws are discussed and a revised presentation of the analysis is given that not only corrects the flaws but is also more precise and easier to follow. Some of the corrections to the flaws require slight modifications to the original assumptions underlying the algorithm and to the constraints on its parameters, and thus change the external specifications of the algorithm. The formal analysis of the interactive convergence clock synchronization algorithm was performed using the Enhanced Hierarchical Development Methodology (EHDM) formal specification and verification environment. This application of EHDM provides a demonstration of some of the capabilities of the system.
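For orientation, the averaging step at the core of the interactive convergence algorithm, as it is usually summarized, can be sketched as follows. This is a minimal illustration, not the verified EHDM specification: the function name, the readings and the threshold `delta` are assumptions made for the example, and none of the revised assumptions or parameter constraints mentioned in the abstract are modeled.

```python
def interactive_convergence_correction(own_index, clock_readings, delta):
    """Return the correction one node applies to its own clock.

    clock_readings[j] is this node's reading of clock j, in the same units
    as delta. A difference larger than delta is replaced by zero, which is
    equivalent to substituting the node's own value for a suspect clock
    before taking the average (the "egocentric mean").
    """
    own = clock_readings[own_index]
    total = 0.0
    for reading in clock_readings:
        diff = reading - own
        if abs(diff) > delta:
            diff = 0.0  # discard readings that look faulty
        total += diff
    return total / len(clock_readings)


# Hypothetical readings: the last clock is far off and is ignored.
readings = [100.0, 100.4, 99.8, 250.0]
correction = interactive_convergence_correction(0, readings, delta=1.0)
print(round(correction, 6))
```

Replacing an out-of-range reading with the node's own value bounds the influence that a single faulty clock can have on the average.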
A Rolling Element Bearing Fault Diagnosis Approach Based on Multifractal Theory and Gray Relation Theory

PubMed Central

Li, Jingchao; Cao, Yunpeng; Ying, Yulong; Li, Shuying

2016-01-01

Bearing failure is one of the dominant causes of failure and breakdowns in rotating machinery, leading to huge economic loss. Aiming at the nonstationary and nonlinear characteristics of bearing vibration signals as well as the complexity of condition-indicating information distribution in the signals, a novel rolling element bearing fault diagnosis method based on multifractal theory and gray relation theory was proposed in the paper. Firstly, a generalized multifractal dimension algorithm was developed to extract the characteristic vectors of fault features from the bearing vibration signals, which can offer more meaningful and distinguishing information reflecting different bearing health status in comparison with conventional single fractal dimension. After feature extraction by multifractal dimensions, an adaptive gray relation algorithm was applied to implement an automated bearing fault pattern recognition. The experimental results show that the proposed method can identify various bearing fault types as well as severities effectively and accurately. PMID:28036329
An Ensemble Deep Convolutional Neural Network Model with Improved D-S Evidence Fusion for Bearing Fault Diagnosis.

PubMed

Li, Shaobo; Liu, Guokai; Tang, Xianghong; Lu, Jianguang; Hu, Jianjun

2017-07-28

Intelligent machine health monitoring and fault diagnosis are becoming increasingly important for modern manufacturing industries. Current fault diagnosis approaches mostly depend on expert-designed features for building prediction models. In this paper, we proposed IDSCNN, a novel bearing fault diagnosis algorithm based on ensemble deep convolutional neural networks and an improved Dempster-Shafer theory based evidence fusion. The convolutional neural networks take the root mean square (RMS) maps from the FFT (Fast Fourier Transformation) features of the vibration signals from two sensors as inputs. The improved D-S evidence theory is implemented via distance matrix from evidences and modified Gini Index. Extensive evaluations of the IDSCNN on the Case Western Reserve Dataset showed that our IDSCNN algorithm can achieve better fault diagnosis performance than existing machine learning methods by fusing complementary or conflicting evidences from different models and sensors and adapting to different load conditions.
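As background for the evidence fusion step, the classic Dempster rule of combination can be sketched as below. This is the textbook rule only, not the improved variant with a distance matrix and modified Gini index that the abstract describes; the fault class names and mass values are invented for the example.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with the classic Dempster rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Products that fall on an empty intersection count as conflict, and the
    remaining masses are renormalized by (1 - conflict).
    """
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            intersection = set_a & set_b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {hyp: mass / (1.0 - conflict) for hyp, mass in combined.items()}


# Hypothetical outputs of two classifiers over three bearing states.
frame = frozenset({"inner", "outer", "normal"})
m1 = {frozenset({"inner"}): 0.7, frozenset({"outer"}): 0.2, frame: 0.1}
m2 = {frozenset({"inner"}): 0.6, frozenset({"outer"}): 0.3, frame: 0.1}
fused = dempster_combine(m1, m2)
for hypothesis, mass in sorted(fused.items(), key=lambda item: -item[1]):
    print(sorted(hypothesis), round(mass, 4))
```

When two sources conflict strongly, the normalization step can give counterintuitive results, which is a common motivation for modified fusion rules.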
A Rolling Element Bearing Fault Diagnosis Approach Based on Multifractal Theory and Gray Relation Theory.

PubMed

Li, Jingchao; Cao, Yunpeng; Ying, Yulong; Li, Shuying

2016-01-01

Bearing failure is one of the dominant causes of failure and breakdowns in rotating machinery, leading to huge economic loss. Aiming at the nonstationary and nonlinear characteristics of bearing vibration signals as well as the complexity of condition-indicating information distribution in the signals, a novel rolling element bearing fault diagnosis method based on multifractal theory and gray relation theory was proposed in the paper. Firstly, a generalized multifractal dimension algorithm was developed to extract the characteristic vectors of fault features from the bearing vibration signals, which can offer more meaningful and distinguishing information reflecting different bearing health status in comparison with conventional single fractal dimension. After feature extraction by multifractal dimensions, an adaptive gray relation algorithm was applied to implement an automated bearing fault pattern recognition. The experimental results show that the proposed method can identify various bearing fault types as well as severities effectively and accurately.
An Ensemble Deep Convolutional Neural Network Model with Improved D-S Evidence Fusion for Bearing Fault Diagnosis

PubMed Central

Li, Shaobo; Liu, Guokai; Tang, Xianghong; Lu, Jianguang

2017-01-01

Intelligent machine health monitoring and fault diagnosis are becoming increasingly important for modern manufacturing industries. Current fault diagnosis approaches mostly depend on expert-designed features for building prediction models. In this paper, we proposed IDSCNN, a novel bearing fault diagnosis algorithm based on ensemble deep convolutional neural networks and an improved Dempster–Shafer theory based evidence fusion. The convolutional neural networks take the root mean square (RMS) maps from the FFT (Fast Fourier Transformation) features of the vibration signals from two sensors as inputs. The improved D-S evidence theory is implemented via distance matrix from evidences and modified Gini Index. Extensive evaluations of the IDSCNN on the Case Western Reserve Dataset showed that our IDSCNN algorithm can achieve better fault diagnosis performance than existing machine learning methods by fusing complementary or conflicting evidences from different models and sensors and adapting to different load conditions. PMID:28788099

Fault tolerant operation of switched reluctance machine

NASA Astrophysics Data System (ADS)

Wang, Wei

The energy crisis and environmental challenges have driven industry towards more energy efficient solutions. With nearly 60% of electricity consumed by various electric machines in the industrial sector, advancement in the efficiency of the electric drive system is of vital importance. An adjustable speed drive system (ASDS) provides excellent speed regulation and dynamic performance as well as dramatically improved system efficiency compared with conventional motors without electronic drives. Industry has witnessed tremendous growth in ASDS applications not only as a driving force but also as an electric auxiliary system for replacing bulky and low efficiency auxiliary hydraulic and mechanical systems. With the vast penetration of ASDS, its fault tolerant operation capability is more widely recognized as an important feature of drive performance, especially for aerospace and automotive applications and other industrial drive applications demanding high reliability. The Switched Reluctance Machine (SRM), a low cost, highly reliable electric machine with fault tolerant operation capability, has drawn substantial attention in the past three decades. Nevertheless, the SRM is not free of faults. Certain faults such as converter faults, sensor faults, winding shorts, eccentricity and position sensor faults are commonly shared among all ASDS. In this dissertation, a thorough understanding of various faults and their influence on transient and steady state performance of the SRM is developed via simulation and experimental study, providing necessary knowledge for fault detection and post fault management. Lumped parameter models are established for fast real time simulation and drive control. Based on the behavior of the faults, a fault detection scheme is developed for the purpose of fast and reliable fault diagnosis.
In order to improve the SRM power and torque capacity under faults, maximum torque per ampere excitation is conceptualized and validated through theoretical analysis and experiments. With the proposed optimal waveform, torque production is greatly improved under the same Root Mean Square (RMS) current constraint. Additionally, position sensorless operation methods under phase faults are investigated to account for the combination of physical position sensor and phase winding faults. A comprehensive solution for position sensorless operation under single and multiple phase faults is proposed and validated through experiments. Continuous position sensorless operation with seamless transition between various numbers of phase faults is achieved.
Bearing Fault Diagnosis Based on Statistical Locally Linear Embedding

PubMed Central

Wang, Xiang; Zheng, Yuan; Zhao, Zhenzhou; Wang, Jinping

2015-01-01

Fault diagnosis is essentially a kind of pattern recognition. The measured signal samples are usually distributed on nonlinear low-dimensional manifolds embedded in the high-dimensional signal space, so implementing feature extraction and dimensionality reduction while improving recognition performance is a crucial task. In this paper, a novel machinery fault diagnosis approach is proposed based on a statistical locally linear embedding (S-LLE) algorithm, which extends LLE by exploiting the fault class label information. The fault diagnosis approach first extracts the intrinsic manifold features from the high-dimensional feature vectors, which are obtained from vibration signals by feature extraction in the time domain, in the frequency domain and by empirical mode decomposition (EMD), and then translates the complex mode space into a salient low-dimensional feature space by the manifold learning algorithm S-LLE, which outperforms other feature reduction methods such as PCA, LDA and LLE. Finally, in the reduced feature space, pattern classification and fault diagnosis by a classifier are carried out easily and rapidly. Rolling bearing fault signals are used to validate the proposed fault diagnosis approach. The results indicate that the proposed approach clearly improves the classification performance of fault pattern recognition and outperforms the other traditional approaches. PMID:26153771
The Natural-CCD Algorithm, a Novel Method to Solve the Inverse Kinematics of Hyper-redundant and Soft Robots.

PubMed

Martín, Andrés; Barrientos, Antonio; Del Cerro, Jaime

2018-03-22

This article presents a new method to solve the inverse kinematics problem of hyper-redundant and soft manipulators. From an engineering perspective, robots of this kind are underdetermined systems. Therefore, they exhibit an infinite number of solutions for the inverse kinematics problem, and choosing the best one can be a great challenge. A new algorithm based on cyclic coordinate descent (CCD), named natural-CCD, is proposed to solve this issue. It takes its name from the very harmonious robot movements and trajectories it generates, which also appear in nature, such as the golden spiral. In addition, it has been applied to perform continuous trajectories, to develop whole-body movements, to analyze motion planning in complex environments, and to study fault tolerance, for both prismatic and rotational joints. The proposed algorithm is very simple, precise, and computationally efficient. It works for robots in either two or three spatial dimensions and handles a large number of degrees of freedom. Because of this, it is aimed at breaking down barriers between discrete hyper-redundant and continuum soft robots.
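For readers unfamiliar with the baseline that natural-CCD builds on, a minimal sketch of classic cyclic coordinate descent for a planar chain of rotational joints is given below. It is not the natural-CCD variant from the article; the link lengths, target and tolerances are assumptions chosen for illustration.

```python
import math


def forward_kinematics(angles, lengths):
    """Return the position of every joint, followed by the end effector."""
    x = y = heading = 0.0
    points = [(x, y)]
    for angle, length in zip(angles, lengths):
        heading += angle  # joint angles are relative to the previous link
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


def ccd_solve(angles, lengths, target, tol=1e-3, max_sweeps=500):
    """Classic CCD: rotate one joint at a time, from the tip to the base,
    so that the end effector lines up with the target as seen from that joint."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            points = forward_kinematics(angles, lengths)
            joint_x, joint_y = points[i]
            end_x, end_y = points[-1]
            to_end = math.atan2(end_y - joint_y, end_x - joint_x)
            to_target = math.atan2(target[1] - joint_y, target[0] - joint_x)
            angles[i] += to_target - to_end
        end_x, end_y = forward_kinematics(angles, lengths)[-1]
        if math.hypot(target[0] - end_x, target[1] - end_y) < tol:
            break
    return angles


# Hypothetical four-link arm with unit links and a reachable target.
lengths = [1.0, 1.0, 1.0, 1.0]
target = (2.0, 1.5)
solution = ccd_solve([0.0, 0.0, 0.0, 0.0], lengths, target)
tip = forward_kinematics(solution, lengths)[-1]
error = math.hypot(target[0] - tip[0], target[1] - tip[1])
print(error < 1e-2)
```

Each joint update is a closed-form rotation, so no Jacobian needs to be formed or inverted, which keeps the per-joint cost low for chains with many degrees of freedom.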
A highly reliable, high performance open avionics architecture for real time Nap-of-the-Earth operations

NASA Technical Reports Server (NTRS)

Harper, Richard E.; Elks, Carl

1995-01-01

An Army Fault Tolerant Architecture (AFTA) has been developed to meet real-time fault tolerant processing requirements of future Army applications. AFTA is the enabling technology that will allow the Army to configure existing processors and other hardware to provide high throughput and ultrahigh reliability necessary for TF/TA/NOE flight control and other advanced Army applications. A comprehensive conceptual study of AFTA has been completed that addresses a wide range of issues including requirements, architecture, hardware, software, testability, producibility, analytical models, validation and verification, common mode faults, VHDL, and a fault tolerant data bus. A Brassboard AFTA for demonstration and validation has been fabricated, and two operating systems and a flight-critical Army application have been ported to it. Detailed performance measurements have been made of fault tolerance and operating system overheads while AFTA was executing the flight application in the presence of faults.
Fault-tolerant rotary actuator

DOEpatents

Tesar, Delbert

2006-10-17

A fault-tolerant actuator module, in a single containment shell, containing two actuator subsystems that are either asymmetrically or symmetrically laid out is provided. Fault tolerance in the actuators of the present invention is achieved by the employment of dual sets of equal resources. Dual resources are integrated into single modules, with each having the external appearance and functionality of a single set of resources.
Electric machine differential for vehicle traction control and stability control

NASA Astrophysics Data System (ADS)

Kuruppu, Sandun Shivantha

Evolving requirements in energy efficiency and tightening regulations for reliable electric drivetrains drive the advancement of hybrid electric vehicle (HEV) and full electric vehicle (EV) technology. Different configurations of EV and HEV architectures are evaluated for their performance. Future technology is trending towards utilizing distinctive properties of electric machines not only to improve efficiency but also to realize advanced road adhesion controls and vehicle stability controls. The electric machine differential (EMD) is such a concept under current investigation for applications in the near future. Reliability of a power train is critical. Therefore, sophisticated fault detection schemes are essential in guaranteeing reliable operation of a complex system such as an EMD. The research presented here emphasizes the implementation of a 4kW electric machine differential, a novel single open phase fault diagnostic scheme, an implementation of a real time slip optimization algorithm and an electric machine differential based yaw stability improvement study. The proposed d-q current signature based SPO fault diagnostic algorithm detects the fault within one electrical cycle. The EMD based extremum seeking slip optimization algorithm reduces stopping distance by 30% compared to hydraulic braking based ABS.
Evaluation of reliability modeling tools for advanced fault tolerant systems

NASA Technical Reports Server (NTRS)

Baker, Robert; Scheper, Charlotte

1986-01-01

The Computer Aided Reliability Estimation (CARE III) and Automated Reliability Interactive Estimation System (ARIES 82) reliability tools for application to advanced fault tolerance aerospace systems were evaluated. To determine reliability modeling requirements, the evaluation focused on the Draper Laboratories' Advanced Information Processing System (AIPS) architecture as an example architecture for fault tolerance aerospace systems. Advantages and limitations were identified for each reliability evaluation tool. The CARE III program was designed primarily for analyzing ultrareliable flight control systems. The ARIES 82 program's primary use was to support university research and teaching. Neither CARE III nor ARIES 82 was suited for determining the reliability of complex nodal networks of the type used to interconnect processing sites in the AIPS architecture. It was concluded that ARIES was not suitable for modeling advanced fault tolerant systems. It was further concluded that subject to some limitations (the difficulty in modeling systems with unpowered spare modules, systems where equipment maintenance must be considered, systems where failure depends on the sequence in which faults occurred, and systems where multiple faults greater than double near coincident faults must be considered), CARE III is best suited for evaluating the reliability of advanced fault tolerant systems for air transport.
A Self-Stabilizing Byzantine-Fault-Tolerant Clock Synchronization Protocol

NASA Technical Reports Server (NTRS)

Malekpour, Mahyar R.

2009-01-01

This report presents a rapid Byzantine-fault-tolerant self-stabilizing clock synchronization protocol that is independent of application-specific requirements. It is focused on clock synchronization of a system in the presence of Byzantine faults after the cause of any transient faults has dissipated. A model of this protocol is mechanically verified using the Symbolic Model Verifier (SMV) [SMV] where the entire state space is examined and proven to self-stabilize in the presence of one arbitrary faulty node. Instances of the protocol are proven to tolerate bursts of transient failures and deterministically converge with a linear convergence time with respect to the synchronization period. This protocol does not rely on assumptions about the initial state of the system other than the presence of sufficient number of good nodes. All timing measures of variables are based on the node's local clock, and no central clock or externally generated pulse is used. The Byzantine faulty behavior modeled here is a node with arbitrarily malicious behavior that is allowed to influence other nodes at every clock tick. The only constraint is that the interactions are restricted to defined interfaces.
Naive Bayes Bearing Fault Diagnosis Based on Enhanced Independence of Data

PubMed Central

Zhang, Nannan; Wu, Lifeng; Yang, Jing; Guan, Yong

2018-01-01

The bearing is the key component of rotating machinery, and its performance directly determines the reliability and safety of the system. Data-based bearing fault diagnosis has become a research hotspot. Naive Bayes (NB), which is based on an independence assumption, is widely used in fault diagnosis. However, the bearing data are not completely independent, which reduces the performance of NB algorithms. In order to solve this problem, we propose an NB bearing fault diagnosis method based on enhanced independence of data. The method deals with the data vector from two aspects: the attribute feature and the sample dimension. After processing, the classification limitation that the independence hypothesis imposes on NB is reduced. First, we extract the statistical characteristics of the original signal of the bearings effectively. Then, the Decision Tree algorithm is used to select the important features of the time domain signal, and the low-correlation features are selected. Next, the Selective Support Vector Machine (SSVM) is used to prune the dimension data and remove redundant vectors. Finally, we use NB to diagnose the fault with the low-correlation data. The experimental results show that enhancing the independence of the data is effective for bearing fault diagnosis. PMID:29401730
Fault detection and isolation of high temperature proton exchange membrane fuel cell stack under the influence of degradation

NASA Astrophysics Data System (ADS)

Jeppesen, Christian; Araya, Samuel Simon; Sahlin, Simon Lennart; Thomas, Sobi; Andreasen, Søren Juhl; Kær, Søren Knudsen

2017-08-01

This study proposes a data-driven impedance-based methodology for fault detection and isolation of low and high cathode stoichiometry, high CO concentration in the anode gas, high methanol vapour concentrations in the anode gas and low anode stoichiometry, for high temperature PEM fuel cells. The fault detection and isolation algorithm is based on an artificial neural network classifier, which uses three extracted features as input. Two of the proposed features are based on angles in the impedance spectrum, and are therefore relative to specific points, and are shown to be independent of degradation, contrary to other available feature extraction methods in the literature. The experimental data is based on a 35 day experiment, where 2010 unique electrochemical impedance spectroscopy measurements were recorded. The test of the algorithm resulted in good detectability of the faults, except for the high methanol vapour concentration in the anode gas fault, which was found to be difficult to distinguish from normal operational data. The achieved accuracy for faults related to CO pollution and anode and cathode stoichiometry is a 100% success rate. Overall global accuracy on the test data is 94.6%.
Demonstration of Qubit Operations Below a Rigorous Fault Tolerance Threshold With Gate Set Tomography (Open Access, Publisher’s Version)

DTIC Science & Technology

2017-02-15

Quantum information processors promise fast algorithms for problems inaccessible to classical computers. But since qubits are noisy and error-prone...information processors have been demonstrated experimentally using superconducting circuits1–3, electrons in semiconductors4–6, trapped atoms and...qubit quantum information processor has been realized14, and single-qubit gates have demonstrated randomized benchmarking (RB) infidelities as low as 10
Online Detection of Broken Rotor Bar Fault in Induction Motors by Combining Estimation of Signal Parameters via Min-norm Algorithm and Least Square Method

NASA Astrophysics Data System (ADS)

Wang, Pan-Pan; Yu, Qiang; Hu, Yong-Jun; Miao, Chang-Xin

2017-11-01

Current research in broken rotor bar (BRB) fault detection in induction motors is primarily focused on a high-frequency resolution analysis of the stator current. Compared with a discrete Fourier transformation, the parametric spectrum estimation technique has a higher frequency accuracy and resolution. However, the existing detection methods based on parametric spectrum estimation cannot realize online detection, owing to the large computational cost. To improve the efficiency of BRB fault detection, a new detection method based on the min-norm algorithm and least square estimation is proposed in this paper. First, the stator current is filtered using a band-pass filter and divided into short overlapped data windows. The min-norm algorithm is then applied to determine the frequencies of the fundamental and fault characteristic components with each overlapped data window. Next, based on the frequency values obtained, a model of the fault current signal is constructed. Subsequently, a linear least squares problem solved through singular value decomposition is designed to estimate the amplitudes and phases of the related components. Finally, the proposed method is applied to a simulated current and an actual motor, the results of which indicate that, not only parametric spectrum estimation technique.
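The amplitude and phase estimation step can be illustrated with a small sketch: once the component frequencies are known, the signal model is linear in the cosine and sine coefficients, so they follow from a linear least squares fit. The sketch below solves the normal equations by Gaussian elimination rather than the singular value decomposition mentioned in the abstract, and the sampling rate, frequencies, amplitudes and phases are invented test values, not data from the paper.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_amplitudes_phases(samples, freqs, fs):
    """Fit x[n] = sum_k A_k cos(2*pi*f_k*n/fs + phi_k) for known f_k.

    Each component is written as a*cos(w*n) + b*sin(w*n), which is linear
    in (a, b); then A = hypot(a, b) and phi = atan2(-b, a).
    """
    cols = 2 * len(freqs)
    gram = [[0.0] * cols for _ in range(cols)]  # accumulates H^T H
    proj = [0.0] * cols                         # accumulates H^T x
    for n, value in enumerate(samples):
        row = []
        for f in freqs:
            w = 2.0 * math.pi * f * n / fs
            row += [math.cos(w), math.sin(w)]
        for i in range(cols):
            proj[i] += row[i] * value
            for j in range(cols):
                gram[i][j] += row[i] * row[j]
    coeffs = solve_linear(gram, proj)
    return [(math.hypot(coeffs[2 * k], coeffs[2 * k + 1]),
             math.atan2(-coeffs[2 * k + 1], coeffs[2 * k]))
            for k in range(len(freqs))]


# Hypothetical stator current: a 50 Hz fundamental plus two weak sidebands.
fs = 1000.0
true_params = [(50.0, 10.0, 0.3), (46.0, 0.1, -1.0), (54.0, 0.08, 2.0)]
signal = [sum(a * math.cos(2.0 * math.pi * f * n / fs + p) for f, a, p in true_params)
          for n in range(1000)]
estimates = fit_amplitudes_phases(signal, [f for f, _, _ in true_params], fs)
for (f, _, _), (amp, phase) in zip(true_params, estimates):
    print(f, round(amp, 6), round(phase, 6))
```

An SVD-based solver is generally the more robust choice when the frequencies are closely spaced or the data window is short, since the normal equations square the condition number of the problem.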
Award ER25750: Coordinated Infrastructure for Fault Tolerance Systems Indiana University Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lumsdaine, Andrew

2013-03-08

The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research with a goal of providing end-to-end fault tolerance on a systemwide basis for applications and other system software. While fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade, it has been treated mostly as a collection of isolated stovepipes. Visibility and response to faults has typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, allowing little flexibility or control on a system-wide basis, making it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. As an example, consider faults such as communication link failures that can be seen by a network library but are not directly visible to the job scheduler, or consider faults related to node failures that can be detected by system monitoring software but are not inherently visible to the resource manager. If information about such faults could be shared by the network libraries or monitoring software, then other system software, such as a resource manager or job scheduler, could ensure that failed nodes or failed network links were excluded from further job allocations and that further diagnosis could be performed. As a founding member and one of the lead developers of the Open MPI project, our efforts over the course of this project have been focused on making Open MPI more robust to failures by supporting various fault tolerance techniques, and using fault information exchange and coordination between MPI and the HPC system software stack from the application, numeric libraries, and programming language runtime to other common system components such as job schedulers, resource managers, and monitoring tools.
Combinatorial Optimization Algorithms for Dynamic Multiple Fault Diagnosis in Automotive and Aerospace Applications

NASA Astrophysics Data System (ADS)

Kodali, Anuradha

In this thesis, we develop dynamic multiple fault diagnosis (DMFD) algorithms to diagnose faults that are sporadic and coupled. Firstly, we formulate a coupled factorial hidden Markov model-based (CFHMM) framework to diagnose dependent faults occurring over time (dynamic case). Here, we implement a mixed memory Markov coupling model to determine the most likely sequence of (dependent) fault states, the one that best explains the observed test outcomes over time. An iterative Gauss-Seidel coordinate ascent optimization method is proposed for solving the problem. A soft Viterbi algorithm is also implemented within the framework for decoding dependent fault states over time. We demonstrate the algorithm on simulated and real-world systems with coupled faults; the results show that this approach improves the correct isolation rate as compared to the formulation where independent fault states are assumed. Secondly, we formulate a generalization of set-covering, termed dynamic set-covering (DSC), which involves a series of coupled set-covering problems over time. The objective of the DSC problem is to infer the most probable time sequence of a parsimonious set of failure sources that explains the observed test outcomes over time. The DSC problem is NP-hard and intractable due to the fault-test dependency matrix that couples the failed tests and faults via the constraint matrix, and the temporal dependence of failure sources over time. Here, the DSC problem is motivated from the viewpoint of a dynamic multiple fault diagnosis problem, but it has wide applications in operations research, e.g., the facility location problem. Thus, we also formulated the DSC problem in the context of a dynamically evolving facility location problem. Here, a facility can be opened, closed, or can be temporarily unavailable at any time for a given requirement of demand points. 
These activities are associated with costs or penalties, viz., phase-in or phase-out for the opening or closing of a facility, respectively. The set-covering matrix encapsulates the relationship among the rows (tests or demand points) and columns (faults or locations) of the system at each time. By relaxing the coupling constraints using Lagrange multipliers, the DSC problem can be decoupled into independent subproblems, one for each column. Each subproblem is solved using the Viterbi decoding algorithm, and a primal feasible solution is constructed by modifying the Viterbi solutions via a heuristic. The proposed Viterbi-Lagrangian relaxation algorithm (VLRA) provides a measure of suboptimality via an approximate duality gap. As a major practical extension of the above problem, we also consider the problem of diagnosing faults with delayed test outcomes, termed delay-dynamic set-covering (DDSC), and experiment with real-world problems that exhibit masking faults. Also, we present simulation results on OR-library datasets (set-covering formulations are predominantly validated on these matrices in the literature), posed as facility location problems. Finally, we implement these algorithms to solve problems in aerospace and automotive applications. Firstly, we address the diagnostic ambiguity problem in aerospace and automotive applications by developing a dynamic fusion framework that includes dynamic multiple fault diagnosis algorithms. This improves the correct fault isolation rate, while minimizing the false alarm rates, by considering multiple faults instead of the traditional data-driven techniques based on single fault (class)-single epoch (static) assumption. The dynamic fusion problem is formulated as a maximum a posteriori decision problem of inferring the fault sequence based on uncertain outcomes of multiple binary classifiers over time. 
The fusion process involves three steps: the first step transforms the multi-class problem into dichotomies using error correcting output codes (ECOC), thereby solving the concomitant binary classification problems; the second step fuses the outcomes of multiple binary classifiers over time using a sliding window or block dynamic fusion method that exploits temporal data correlations over time. We solve this NP-hard optimization problem via a Lagrangian relaxation (variational) technique. The third step optimizes the classifier parameters, viz., probabilities of detection and false alarm, using a genetic algorithm. The proposed algorithm is demonstrated by computing the diagnostic performance metrics on a twin-spool commercial jet engine, an automotive engine, and UCI datasets (problems with high classification error are specifically chosen for experimentation). We show that the primal-dual optimization framework performed consistently better than any traditional fusion technique, even when it is forced to give a single fault decision across a range of classification problems. Secondly, we implement the inference algorithms to diagnose faults in vehicle systems that are controlled by a network of electronic control units (ECUs). The faults, originating from various interactions and especially between hardware and software, are particularly challenging to address. Our basic strategy is to divide the fault universe of such cyber-physical systems in a hierarchical manner, and monitor the critical variables/signals that have impact at different levels of interactions. The proposed diagnostic strategy is validated on an electrical power generation and storage system (EPGS) controlled by two ECUs in an environment with CANoe/MATLAB co-simulation. Eleven faults are injected with the failures originating in actuator hardware, sensor, controller hardware and software components. 
A diagnostic matrix is established via simulations to represent the relationship between the faults and the test outcomes (also known as fault signatures). The results show that the proposed diagnostic strategy is effective in addressing the interaction-caused faults.
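The Viterbi decoding step that this abstract applies to each decoupled subproblem can be sketched for a single fault with two states. This is a plain hidden Markov model decoder, not the Viterbi-Lagrangian relaxation algorithm (VLRA) or the soft Viterbi variant of the thesis; the state names, test outcomes, and all probabilities below are hypothetical values chosen for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations (log domain)."""
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}
    back = []  # one dict of back-pointers per time step after the first
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor of state s at this time step.
            best = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            score[s] = (prev[best] + math.log(trans_p[best][s])
                        + math.log(emit_p[s][obs]))
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical single-fault model: the component is either "ok" or "fault",
# and one imperfect test reports "pass" or "fail" at each epoch.
states = ["ok", "fault"]
start_p = {"ok": 0.95, "fault": 0.05}
trans_p = {"ok": {"ok": 0.95, "fault": 0.05},
           "fault": {"ok": 0.10, "fault": 0.90}}
emit_p = {"ok": {"pass": 0.9, "fail": 0.1},      # 0.1 = false alarm probability
          "fault": {"pass": 0.2, "fail": 0.8}}   # 0.8 = detection probability

print(viterbi(["pass", "pass", "fail", "fail", "fail"],
              states, start_p, trans_p, emit_p))
# → ['ok', 'ok', 'fault', 'fault', 'fault']
```

With these particular numbers a single trailing "fail" is not enough to switch the decoded state to "fault", which illustrates how temporal modeling can suppress isolated false alarms.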
NETRA: A parallel architecture for integrated vision systems. 1: Architecture and organization

NASA Technical Reports Server (NTRS)

Choudhary, Alok N.; Patel, Janak H.; Ahuja, Narendra

1989-01-01

Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is considered to be a system that uses vision algorithms from all levels of processing for a high level application (such as object recognition). A model of computation is presented for parallel processing for an IVS. Using the model, desired features and capabilities of a parallel architecture suitable for IVSs are derived. Then a multiprocessor architecture (called NETRA) is presented. This architecture is highly flexible without the use of complex interconnection schemes. The topology of NETRA is recursively defined and hence is easily scalable from small to large systems. Homogeneity of NETRA permits fault tolerance and graceful degradation under faults. It is a recursively defined tree-type hierarchical architecture where each of the leaf nodes consists of a cluster of processors connected with a programmable crossbar with selective broadcast capability to provide for desired flexibility. A qualitative evaluation of NETRA is presented. Then general schemes are described to map parallel algorithms onto NETRA. Algorithms are classified according to their communication requirements for parallel processing. An extensive analysis of inter-cluster communication strategies in NETRA is presented, and parameters affecting performance of parallel algorithms when mapped on NETRA are discussed. Finally, a methodology to evaluate performance of algorithms on NETRA is described.
Tutorial: Advanced fault tree applications using HARP

NASA Technical Reports Server (NTRS)

Dugan, Joanne Bechta; Bavuso, Salvatore J.; Boyd, Mark A.

1993-01-01

Reliability analysis of fault tolerant computer systems for critical applications is complicated by several factors. These modeling difficulties are discussed and dynamic fault tree modeling techniques for handling them are described and demonstrated. Several advanced fault tolerant computer systems are described, and fault tree models for their analysis are presented. HARP (Hybrid Automated Reliability Predictor) is a software package developed at Duke University and NASA Langley Research Center that is capable of solving the fault tree models presented.
Fault tolerant filtering and fault detection for quantum systems driven by fields in single photon states

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gao, Qing, E-mail: qing.gao.chance@gmail.com; Dong, Daoyi, E-mail: daoyidong@gmail.com; Petersen, Ian R., E-mail: i.r.petersen@gmai.com

The purpose of this paper is to solve the fault tolerant filtering and fault detection problem for a class of open quantum systems driven by a continuous-mode bosonic input field in single photon states when the systems are subject to stochastic faults. Optimal estimates of both the system observables and the fault process are simultaneously calculated and characterized by a set of coupled recursive quantum stochastic differential equations.
Final Project Report. Scalable fault tolerance runtime technology for petascale computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Krishnamoorthy, Sriram; Sadayappan, P

With the massive number of components comprising the forthcoming petascale computer systems, hardware failures will be routinely encountered during execution of large-scale applications. Due to the multidisciplinary, multiresolution, and multiscale nature of scientific problems that drive the demand for high end systems, applications place increasingly differing demands on the system resources: disk, network, memory, and CPU. In addition to MPI, future applications are expected to use advanced programming models such as those developed under the DARPA HPCS program as well as existing global address space programming models such as Global Arrays, UPC, and Co-Array Fortran. While there has been a considerable amount of work in fault tolerant MPI with a number of strategies and extensions for fault tolerance proposed, virtually none of the advanced models proposed for emerging petascale systems is currently fault aware. To achieve fault tolerance, development of underlying runtime and OS technologies able to scale to petascale level is needed. This project has evaluated a range of runtime techniques for fault tolerance for advanced programming models.
Advanced information processing system: The Army Fault-Tolerant Architecture detailed design overview

NASA Technical Reports Server (NTRS)

Harper, Richard E.; Babikyan, Carol A.; Butler, Bryan P.; Clasen, Robert J.; Harris, Chris H.; Lala, Jaynarayan H.; Masotto, Thomas K.; Nagle, Gail A.; Prizant, Mark J.; Treadwell, Steven

1994-01-01

The Army Avionics Research and Development Activity (AVRADA) is pursuing programs that would enable effective and efficient management of the large amounts of situational data that occur during tactical rotorcraft missions. The Computer Aided Low Altitude Night Helicopter Flight Program has identified automated Terrain Following/Terrain Avoidance, Nap of the Earth (TF/TA, NOE) operation as a key enabling technology for advanced tactical rotorcraft to enhance mission survivability and mission effectiveness. The processing of critical information at low altitudes with short reaction times is life-critical and mission-critical, necessitating an ultra-reliable/high throughput computing platform for dependable service for flight control, fusion of sensor data, route planning, near-field/far-field navigation, and obstacle avoidance operations. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and developed. This computer system is based upon the Fault Tolerant Parallel Processor (FTPP) developed by Charles Stark Draper Labs (CSDL). AFTA is a hard real-time, Byzantine fault-tolerant parallel processor that is programmed in the Ada language. This document describes the results of the Detailed Design (Phases 2 and 3 of a 3-year project) of the AFTA development. This document contains detailed descriptions of the program objectives, the TF/TA NOE application requirements, architecture, hardware design, operating systems design, systems performance measurements and analytical models.
A fault diagnosis system for PV power station based on global partitioned gradually approximation method

NASA Astrophysics Data System (ADS)

Wang, S.; Zhang, X. N.; Gao, D. D.; Liu, H. X.; Ye, J.; Li, L. R.

2016-08-01

As solar photovoltaic (PV) power is applied extensively, more attention is being paid to the maintenance and fault diagnosis of PV power plants. Based on an analysis of the structure of a PV power station, the global partitioned gradually approximation method is proposed as a fault diagnosis algorithm to detect and locate faults in PV panels. The PV array is divided into 16x16 blocks, which are numbered. On the basis of this modular processing of the PV array, the current values of each block are analyzed. The mean current value of each block is used to calculate the fault weight factor. A fault threshold is defined to determine whether a fault exists, and shading is taken into account to reduce the probability of misjudgment. A fault diagnosis system is designed and implemented with LabVIEW; its functions include real-time data display, online checking, statistics, real-time prediction and fault diagnosis. The algorithm is verified using data from PV plants. The results show that the fault diagnosis results are accurate and that the system works well, which also confirms the validity and feasibility of the system. The developed system will benefit the maintenance and management of large-scale PV arrays.
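The block-wise screening idea in this abstract (partition the array, take each block's mean current, compare against a threshold) can be sketched as follows. The abstract does not give the fault weight factor formula, so the relative shortfall against the median block mean used here is an assumption, as are the function names, the 0.2 threshold, and the sample currents. Shading compensation, which the abstract mentions, is not modeled.

```python
from statistics import mean, median


def block_means(currents, block_rows, block_cols):
    """Split a 2-D grid of panel currents into blocks and return each block's mean.

    Assumes the grid dimensions are exact multiples of the block counts.
    """
    rows, cols = len(currents), len(currents[0])
    h, w = rows // block_rows, cols // block_cols
    means = {}
    for br in range(block_rows):
        for bc in range(block_cols):
            cells = [currents[r][c]
                     for r in range(br * h, (br + 1) * h)
                     for c in range(bc * w, (bc + 1) * w)]
            means[(br, bc)] = mean(cells)
    return means


def faulty_blocks(means, threshold=0.2):
    """Flag blocks whose mean current falls short of the median block mean.

    Assumed weight factor: (reference - block_mean) / reference, reference > 0.
    """
    reference = median(means.values())
    return sorted(block for block, m in means.items()
                  if (reference - m) / reference > threshold)


# Hypothetical 4x4 grid of panel currents (amperes), partitioned into 2x2 blocks;
# the lower-left block is underperforming.
grid = [[8.0, 8.0, 8.0, 8.0],
        [8.0, 8.0, 8.0, 8.0],
        [5.0, 5.0, 8.0, 8.0],
        [5.0, 5.0, 8.0, 8.0]]
print(faulty_blocks(block_means(grid, 2, 2)))
# → [(1, 0)]
```

A flagged block could then be partitioned again in the same way to narrow the search, which is one plausible reading of "gradually approximation", though the abstract does not spell out that step.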

Full-Authority Fault-Tolerant Electronic Engine Control System for Variable Cycle Engines.

DTIC Science & Technology

1982-04-01

single internally self-checked VLSI micro - processor . The selected configuration is an externally checked pair of com- mercially available...Electronic Engine Control FPMH Failures per Million Hours FTMP Fault Tolerant Multi- Processor FTSC Fault Tolerant Spaceborn Computer GRAMP Generalized...Removal * MTBR Mean Time Between Repair MTTF Mean Time to Failure xiii List of Abbreviations (continued) - NH High Pressure Rotor Speed O&S Operating
[Application of optimized parameters SVM based on photoacoustic spectroscopy method in fault diagnosis of power transformer].

PubMed

Zhang, Yu-xin; Cheng, Zhi-feng; Xu, Zheng-ping; Bai, Jing

2015-01-01

To address the problems of the traditional power transformer fault diagnosis approach based on dissolved gas analysis (DGA), such as complex operation, consumption of carrier gas and a long test period, this paper proposes a new method that measures the content of five characteristic gases in transformer oil (CH4, C2H2, C2H4, C2H6 and H2) by photoacoustic spectroscopy and calculates the three ratios C2H2/C2H4, CH4/H2 and C2H4/C2H6. Support vector machine (SVM) models were constructed using cross-validation under five support vector machine functions and four kernel functions, and heuristic algorithms were used to optimize the penalty factor c and the parameter g, in order to establish the SVM model with the highest fault diagnosis accuracy and fast computing speed. Two heuristic algorithms, particle swarm optimization and the genetic algorithm, were compared in terms of optimization accuracy and speed. The simulation results show that the SVM model composed of C-SVC, the RBF kernel function and the genetic algorithm obtains 97.5% accuracy on the test sample set and 98.3333% accuracy on the training sample set, and that the genetic algorithm was about two times faster than particle swarm optimization in computing speed. The method described in this paper has many advantages such as simple operation, non-contact measurement, no consumption for the carrier gas, long test period, high stability and sensitivity. The results show that the method can replace traditional transformer fault diagnosis by gas chromatography and meets practical engineering needs in transformer fault diagnosis.
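The three-ratio feature computation named in this abstract is simple enough to show directly. The sketch below only builds the ratio vector that would be fed to a classifier; it does not reproduce the SVM training or the heuristic parameter search. The function name `three_ratios`, the dictionary layout, and the sample concentrations are hypothetical.

```python
def three_ratios(gas):
    """Return [C2H2/C2H4, CH4/H2, C2H4/C2H6] from a dict of gas concentrations.

    All concentrations must share one unit (e.g. microlitres per litre).
    A zero denominator yields float('inf'); how to treat that case is left
    to the caller, since the abstract does not say.
    """
    pairs = [("C2H2", "C2H4"), ("CH4", "H2"), ("C2H4", "C2H6")]
    return [gas[num] / gas[den] if gas[den] > 0 else float("inf")
            for num, den in pairs]


# Hypothetical sample, not measured data.
sample = {"H2": 100.0, "CH4": 50.0, "C2H2": 2.0, "C2H4": 20.0, "C2H6": 10.0}
print(three_ratios(sample))
# → [0.1, 0.5, 2.0]
```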
Software reliability through fault-avoidance and fault-tolerance

NASA Technical Reports Server (NTRS)

Vouk, Mladen A.; Mcallister, David F.

1992-01-01

Accomplishments in the following research areas are summarized: structure based testing, reliability growth, and design testability with risk evaluation; reliability growth models and software risk management; and evaluation of consensus voting, consensus recovery block, and acceptance voting. Four papers generated during the reporting period are included as appendices.
Failure detection and identification for a reconfigurable flight control system

NASA Technical Reports Server (NTRS)

Dallery, Francois

1987-01-01

Failure detection and identification logic for a fault-tolerant longitudinal control system were investigated. Aircraft dynamics were based upon the cruise condition for a hypothetical transonic business jet transport configuration. The fault-tolerant control system consists of conventional control and estimation plus a new outer loop containing failure detection, identification, and reconfiguration (FDIR) logic. It is assumed that the additional logic has access to all measurements, as well as to the outputs of the control and estimation logic. The pilot may also command the FDIR logic to perform special tests.
Peer-to-peer model for the area coverage and cooperative control of mobile sensor networks

NASA Astrophysics Data System (ADS)

Tan, Jindong; Xi, Ning

2004-09-01

This paper presents a novel model and distributed algorithms for the cooperation and redeployment of mobile sensor networks. A mobile sensor network is composed of a collection of wirelessly connected mobile robots equipped with a variety of sensors. In such a sensor network, each mobile node has sensing, computation, communication, and locomotion capabilities. The locomotion ability enhances the autonomous deployment of the system. The system can be rapidly deployed to hostile environments, inaccessible terrain or disaster relief operations. The mobile sensor network is essentially a cooperative multiple robot system. This paper first presents a peer-to-peer model to define the relationship between neighboring communicating robots. Delaunay triangulation and Voronoi diagrams are used to define the geometrical relationship between sensor nodes. This distributed model allows formal analysis for the fusion of spatio-temporal sensory information of the network. Based on the distributed model, this paper discusses a fault tolerant algorithm for autonomous self-deployment of the mobile robots. The algorithm considers the environment constraints, the presence of obstacles and the nonholonomic constraints of the robots. The distributed algorithm enables the system to reconfigure itself such that the area covered by the system can be enlarged. Simulation results have shown the effectiveness of the distributed model and deployment algorithms.
Universal fault-tolerant quantum computation with only transversal gates and error correction.

PubMed

Paetznick, Adam; Reichardt, Ben W

2013-08-30

Transversal implementations of encoded unitary gates are highly desirable for fault-tolerant quantum computation. Though transversal gates alone cannot be computationally universal, they can be combined with specially distilled resource states in order to achieve universality. We show that "triorthogonal" stabilizer codes, introduced for state distillation by Bravyi and Haah [Phys. Rev. A 86, 052329 (2012)], admit transversal implementation of the controlled-controlled-Z gate. We then construct a universal set of fault-tolerant gates without state distillation by using only transversal controlled-controlled-Z, transversal Hadamard, and fault-tolerant error correction. We also adapt the distillation procedure of Bravyi and Haah to Toffoli gates, improving on existing Toffoli distillation schemes.
A Vehicle Management End-to-End Testing and Analysis Platform for Validation of Mission and Fault Management Algorithms to Reduce Risk for NASA's Space Launch System

NASA Technical Reports Server (NTRS)

Trevino, Luis; Johnson, Stephen B.; Patterson, Jonathan; Teare, David

2015-01-01

The development of the Space Launch System (SLS) launch vehicle requires cross discipline teams with extensive knowledge of launch vehicle subsystems, information theory, and autonomous algorithms dealing with all operations from pre-launch through on orbit operations. The characteristics of these systems must be matched with the autonomous algorithm monitoring and mitigation capabilities for accurate control and response to abnormal conditions throughout all vehicle mission flight phases, including precipitating safing actions and crew aborts. This presents a large complex systems engineering challenge being addressed in part by focusing on the specific subsystems handling of off-nominal mission and fault tolerance. Using traditional model based system and software engineering design principles from the Unified Modeling Language (UML), the Mission and Fault Management (M&FM) algorithms are crafted and vetted in specialized Integrated Development Teams composed of multiple development disciplines. NASA also has formed an M&FM team for addressing fault management early in the development lifecycle. This team has developed a dedicated Vehicle Management End-to-End Testbed (VMET) that integrates specific M&FM algorithms, specialized nominal and off-nominal test cases, and vendor-supplied physics-based launch vehicle subsystem models. The flexibility of VMET enables thorough testing of the M&FM algorithms by providing configurable suites of both nominal and off-nominal test cases to validate the algorithms utilizing actual subsystem models. The intent is to validate the algorithms and substantiate them with performance baselines for each of the vehicle subsystems in an independent platform exterior to flight software test processes. In any software development process there is inherent risk in the interpretation and implementation of concepts into software through requirements and test processes. 
Risk reduction is addressed by working with other organizations such as S&MA, Structures and Environments, GNC, Orion, the Crew Office, Flight Operations, and Ground Operations by assessing performance of the M&FM algorithms in terms of their ability to reduce Loss of Mission and Loss of Crew probabilities. In addition, through state machine and diagnostic modeling, analysis efforts investigate a broader suite of failure effects and detection and responses that can be tested in VMET and confirm that responses do not create additional risks or cause undesired states through interactive dynamic effects with other algorithms and systems. VMET further contributes to risk reduction by prototyping and exercising the M&FM algorithms early in their implementation and without any inherent hindrances such as meeting FSW processor scheduling constraints due to their target platform - ARINC 653 partitioned OS, resource limitations, and other factors related to integration with other subsystems not directly involved with M&FM. The plan for VMET encompasses testing the original M&FM algorithms coded in the same C++ language and state machine architectural concepts as that used by Flight Software. This enables the development of performance standards and test cases to characterize the M&FM algorithms and sets a benchmark from which to measure the effectiveness of M&FM algorithms performance in the FSW development and test processes. This paper is outlined in a systematic fashion analogous to a lifecycle process flow for engineering development of algorithms into software and testing. Section I describes the NASA SLS M&FM context, presenting the current infrastructure, leading principles, methods, and participants. Section II defines the testing philosophy of the M&FM algorithms as related to VMET followed by section III, which presents the modeling methods of the algorithms to be tested and validated in VMET. 
Its details are then further presented in section IV followed by Section V presenting integration, test status, and state analysis. Finally, section VI addresses the summary and forward directions followed by the appendices presenting relevant information on terminology and documentation.
Fault Detection for Automotive Shock Absorber

NASA Astrophysics Data System (ADS)

Hernandez-Alcantara, Diana; Morales-Menendez, Ruben; Amezquita-Brooks, Luis

2015-11-01

Fault detection for automotive semi-active shock absorbers is a challenge due to the non-linear dynamics and the strong influence of disturbances such as the road profile. The first obstacle for this task is the modeling of the fault, which has been shown to be of a multiplicative nature, whereas many of the most widespread fault detection schemes consider additive faults. Two model-based fault detection algorithms for semi-active shock absorbers are compared: an observer-based approach and a parameter identification approach. The performance of these schemes is validated and compared using a commercial vehicle model that was experimentally validated. Early results show that the parameter identification approach is more accurate, whereas the observer-based approach is less sensitive to parametric uncertainty.
Towards scalable Byzantine fault-tolerant replication

NASA Astrophysics Data System (ADS)

Zbierski, Maciej

2017-08-01

Byzantine fault-tolerant (BFT) replication is a powerful technique, enabling distributed systems to remain available and correct even in the presence of arbitrary faults. Unfortunately, existing BFT replication protocols are mostly load-unscalable, i.e. they fail to respond with adequate performance increase whenever new computational resources are introduced into the system. This article proposes a universal architecture facilitating the creation of load-scalable distributed services based on BFT replication. The suggested approach exploits parallel request processing to fully utilize the available resources, and uses a load balancer module to dynamically adapt to the properties of the observed client workload. The article additionally provides a discussion on selected deployment scenarios, and explains how the proposed architecture could be used to increase the dependability of contemporary large-scale distributed systems.
Fault tolerant and lifetime control architecture for autonomous vehicles

NASA Astrophysics Data System (ADS)

Bogdanov, Alexander; Chen, Yi-Liang; Sundareswaran, Venkataraman; Altshuler, Thomas

2008-04-01

Increased vehicle autonomy, survivability and utility can provide an unprecedented impact on mission success and are one of the most desirable improvements for modern autonomous vehicles. We propose a general architecture of intelligent resource allocation, reconfigurable control and system restructuring for autonomous vehicles. The architecture is based on fault-tolerant control and lifetime prediction principles, and it provides improved vehicle survivability, extended service intervals, greater operational autonomy through lower rate of time-critical mission failures and lesser dependence on supplies and maintenance. The architecture enables mission distribution, adaptation and execution constrained on vehicle and payload faults and desirable lifetime. The proposed architecture will allow managing missions more efficiently by weighing vehicle capabilities versus mission objectives and replacing the vehicle only when it is necessary.
Design of a fault-tolerant reversible control unit in molecular quantum-dot cellular automata

NASA Astrophysics Data System (ADS)

Bahadori, Golnaz; Houshmand, Monireh; Zomorodi-Moghadam, Mariam

Quantum-dot cellular automata (QCA) is a promising emerging nanotechnology that has been attracting considerable attention due to its small feature size, ultra-low power consumption, and high clock frequency. Therefore, there have been many efforts to design computational units based on this technology. Despite these advantages of QCA-based nanotechnologies, their implementation is susceptible to a high error rate. On the other hand, reversible computing leads to zero bit erasures and no energy dissipation. As reversible computation does not lose information, fault detection happens with a high probability. In this paper, we first propose a fault-tolerant control unit using reversible gates which improves on the previous design. The proposed design is then synthesized to the QCA technology and is simulated by the QCADesigner tool. Evaluation results indicate the performance of the proposed approach.
Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing

DTIC Science & Technology

2012-12-14

Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing Matei Zaharia Tathagata Das Haoyuan Li Timothy Hunter Scott Shenker Ion...SUBTITLE Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER...time. However, current programming models for distributed stream processing are relatively low-level often leaving the user to worry about consistency of
Simulating and Detecting Radiation-Induced Errors for Onboard Machine Learning

NASA Technical Reports Server (NTRS)

Wagstaff, Kiri L.; Bornstein, Benjamin; Granat, Robert; Tang, Benyang; Turmon, Michael

2009-01-01

Spacecraft processors and memory are subjected to high radiation doses and therefore employ radiation-hardened components. However, these components are orders of magnitude more expensive than typical desktop components, and they lag years behind in terms of speed and size. We have integrated algorithm-based fault tolerance (ABFT) methods into onboard data analysis algorithms to detect radiation-induced errors, which ultimately may permit the use of spacecraft memory that need not be fully hardened, reducing cost and increasing capability at the same time. We have also developed a lightweight software radiation simulator, BITFLIPS, that permits evaluation of error detection strategies in a controlled fashion, including the specification of the radiation rate and selective exposure of individual data structures. Using BITFLIPS, we evaluated our error detection methods when using a support vector machine to analyze data collected by the Mars Odyssey spacecraft. We found ABFT error detection for matrix multiplication is very successful, while error detection for Gaussian kernel computation still has room for improvement.
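The abstract does not say which ABFT construction was used for matrix multiplication, so the following is only a minimal sketch of one classic approach: checksum-augmented matrices, where an extra row of column sums and an extra column of row sums are carried through the product. All function names are hypothetical, and the bit flip is injected by hand rather than by BITFLIPS.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add_column_checksum_row(a):
    """Append one row holding the sum of each column of `a`."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]

def add_row_checksum_column(b):
    """Append one column holding the sum of each row of `b`."""
    return [row + [sum(row)] for row in b]

def find_mismatches(c_full, tol=1e-9):
    """Return (bad_rows, bad_cols): data rows/columns whose sums disagree
    with the checksum column/row carried through the multiplication."""
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    return bad_rows, bad_cols

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# The product of the augmented inputs is the augmented product.
c_full = matmul(add_column_checksum_row(a), add_row_checksum_column(b))
clean = find_mismatches(c_full)

c_full[1][0] ^= 1 << 2          # hand-injected single bit flip in one result element
flipped = find_mismatches(c_full)
```

In this sketch a single corrupted data element shows up as exactly one inconsistent row and one inconsistent column, which locates it; floating-point data would need a tolerance chosen for the expected rounding error.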
SABRE: a bio-inspired fault-tolerant electronic architecture.

PubMed

Bremner, P; Liu, Y; Samie, M; Dragffy, G; Pipe, A G; Tempesti, G; Timmis, J; Tyrrell, A M

2013-03-01

As electronic devices become increasingly complex, ensuring their reliable, fault-free operation is becoming correspondingly more challenging. It can be observed that, in spite of their complexity, biological systems are highly reliable and fault tolerant. Hence, we are motivated to take inspiration from biological systems in the design of electronic ones. In SABRE (self-healing cellular architectures for biologically inspired highly reliable electronic systems), we have designed a bio-inspired fault-tolerant hierarchical architecture for this purpose. As in biology, the foundation for the whole system is cellular in nature, with each cell able to detect faults in its operation and trigger intra-cellular or extra-cellular repair as required. At the next level in the hierarchy, arrays of cells are configured and controlled as function units in a transport triggered architecture (TTA), which is able to perform partial-dynamic reconfiguration to rectify problems that cannot be solved at the cellular level. Each TTA is, in turn, part of a larger multi-processor system which employs coarser grain reconfiguration to tolerate faults that cause a processor to fail. In this paper, we describe the details of operation of each layer of the SABRE hierarchy, and how these layers interact to provide a high systemic level of fault tolerance.
Methodologies for Adaptive Flight Envelope Estimation and Protection

NASA Technical Reports Server (NTRS)

Tang, Liang; Roemer, Michael; Ge, Jianhua; Crassidis, Agamemnon; Prasad, J. V. R.; Belcastro, Christine

2009-01-01

This paper reports the latest development of several techniques for an adaptive flight envelope estimation and protection system for aircraft under damage upset conditions. Through the integration of advanced fault detection algorithms, real-time system identification of the damaged/faulted aircraft and flight envelope estimation, real-time decision support can be executed autonomously for improving damage tolerance and flight recoverability. In particular, a bank of adaptive nonlinear fault detection and isolation estimators was developed for flight control actuator faults; a real-time system identification method was developed for assessing the dynamics and performance limitations of impaired aircraft; and online learning neural networks were used to approximate selected aircraft dynamics, which were then inverted to estimate command margins. As off-line training of network weights is not required, the method has the advantage of adapting to varying flight conditions and different vehicle configurations. The key benefit of the envelope estimation and protection system is that it allows the aircraft to fly close to its limit boundary by constantly updating the controller command limits during flight. The developed techniques were demonstrated on NASA's Generic Transport Model (GTM) simulation environments with simulated actuator faults. Simulation results and remarks on future work are presented.
On providing the fault-tolerant operation of information systems based on open content management systems

NASA Astrophysics Data System (ADS)

Kratov, Sergey

2018-01-01

Modern information systems designed to serve a wide range of users, regardless of their subject area, are increasingly based on Web technologies and are available to users via the Internet. The article discusses the issues of providing fault-tolerant operation of such information systems, based on free and open source content management systems. The toolkit available to administrators of such systems is shown, and the scenarios for using these tools are described. Options for organizing backups and restoring the operability of systems after failures are suggested. Application of the proposed methods and approaches enables continuous monitoring of the state of systems, timely response to emerging problems, and their prompt resolution.
Fault-tolerant Greenberger-Horne-Zeilinger paradox based on non-Abelian anyons.

PubMed

Deng, Dong-Ling; Wu, Chunfeng; Chen, Jing-Ling; Oh, C H

2010-08-06

We propose a scheme to test the Greenberger-Horne-Zeilinger paradox based on braidings of non-Abelian anyons, which are exotic quasiparticle excitations of topological states of matter. Because topological ordered states are robust against local perturbations, this scheme is in some sense "fault-tolerant" and might close the detection inefficiency loophole problem in previous experimental tests of the Greenberger-Horne-Zeilinger paradox. In turn, the construction of the Greenberger-Horne-Zeilinger paradox reveals the nonlocal property of non-Abelian anyons. Our results indicate that the non-Abelian fractional statistics is a pure quantum effect and cannot be described by local realistic theories. Finally, we present a possible experimental implementation of the scheme based on the anyonic interferometry technologies.
Monitoring and Control Interface Based on Virtual Sensors

PubMed Central

Escobar, Ricardo F.; Adam-Medina, Manuel; García-Beltrán, Carlos D.; Olivares-Peregrino, Víctor H.; Juárez-Romero, David; Guerrero-Ramírez, Gerardo V.

2014-01-01

In this article, a toolbox based on a monitoring and control interface (MCI) is presented and applied to a heat exchanger. The MCI was programmed to perform sensor fault detection and isolation and fault tolerance using virtual sensors. The virtual sensors were designed from model-based high-gain observers. To develop the control task, different kinds of control laws were included in the monitoring and control interface. These control laws are PID, MPC and a non-linear model-based control law. The MCI helps to keep the heat exchanger in operation even if an outlet temperature sensor fault occurs; in the case of outlet temperature sensor failure, the MCI will display an alarm. The monitoring and control interface is used as a practical tool to support electronic engineering students with heat transfer and control concepts to be applied in a double-pipe heat exchanger pilot plant. The method aims to teach the students through the observation and manipulation of the main variables of the process and by interaction with the MCI, developed in LabVIEW. The MCI provides the electronic engineering students with knowledge of heat exchanger behavior, since the interface is provided with a thermodynamic model that approximates the temperatures and the physical properties of the fluid (density and heat capacity). An advantage of the interface is the easy manipulation of the actuator for automatic or manual operation. Another advantage of the monitoring and control interface is that all algorithms can be manipulated and modified by the users. PMID:25365462
A Bayesian least squares support vector machines based framework for fault diagnosis and failure prognosis

NASA Astrophysics Data System (ADS)

Khawaja, Taimoor Saleem

A high-belief low-overhead Prognostics and Health Management (PHM) system is desired for online real-time monitoring of complex non-linear systems operating in a complex (possibly non-Gaussian) noise environment. This thesis presents a Bayesian Least Squares Support Vector Machine (LS-SVM) based framework for fault diagnosis and failure prognosis in nonlinear non-Gaussian systems. The methodology assumes the availability of real-time process measurements, definition of a set of fault indicators and the existence of empirical knowledge (or historical data) to characterize both nominal and abnormal operating conditions. An efficient yet powerful Least Squares Support Vector Machine (LS-SVM) algorithm, set within a Bayesian Inference framework, not only allows for the development of real-time algorithms for diagnosis and prognosis but also provides a solid theoretical framework to address key concepts related to classification for diagnosis and regression modeling for prognosis. SVM machines are founded on the principle of Structural Risk Minimization (SRM) which tends to find a good trade-off between low empirical risk and small capacity. The key features in SVM are the use of non-linear kernels, the absence of local minima, the sparseness of the solution and the capacity control obtained by optimizing the margin. The Bayesian Inference framework linked with LS-SVMs allows a probabilistic interpretation of the results for diagnosis and prognosis. Additional levels of inference provide the much coveted features of adaptability and tunability of the modeling parameters. The two main modules considered in this research are fault diagnosis and failure prognosis. With the goal of designing an efficient and reliable fault diagnosis scheme, a novel Anomaly Detector is suggested based on the LS-SVM machines. 
The proposed scheme uses only baseline data to construct a 1-class LS-SVM machine which, when presented with online data is able to distinguish between normal behavior and any abnormal or novel data during real-time operation. The results of the scheme are interpreted as a posterior probability of health (1 - probability of fault). As shown through two case studies in Chapter 3, the scheme is well suited for diagnosing imminent faults in dynamical non-linear systems. Finally, the failure prognosis scheme is based on an incremental weighted Bayesian LS-SVR machine. It is particularly suited for online deployment given the incremental nature of the algorithm and the quick optimization problem solved in the LS-SVR algorithm. By way of kernelization and a Gaussian Mixture Modeling (GMM) scheme, the algorithm can estimate "possibly" non-Gaussian posterior distributions for complex non-linear systems. An efficient regression scheme associated with the more rigorous core algorithm allows for long-term predictions, fault growth estimation with confidence bounds and remaining useful life (RUL) estimation after a fault is detected. The leading contributions of this thesis are (a) the development of a novel Bayesian Anomaly Detector for efficient and reliable Fault Detection and Identification (FDI) based on Least Squares Support Vector Machines, (b) the development of a data-driven real-time architecture for long-term Failure Prognosis using Least Squares Support Vector Machines, (c) Uncertainty representation and management using Bayesian Inference for posterior distribution estimation and hyper-parameter tuning, and finally (d) the statistical characterization of the performance of diagnosis and prognosis algorithms in order to relate the efficiency and reliability of the proposed schemes.
Making Classical Ground State Spin Computing Fault-Tolerant

DTIC Science & Technology

2010-06-24

approaches to perebor (brute-force searches) algorithms,” IEEE Annals of the History of Computing, 6, 384–400 (1984). [24] D. Bacon and S. T. Flammia ...Adiabatic gate teleportation,” Phys. Rev. Lett., 103, 120504 (2009). [25] D. Bacon and S. T. Flammia, “Adiabatic cluster state quantum computing... v1 [cond-mat.stat-mech] 22 Jun 2010

Design and Evaluation of Fault-Tolerant VLSI/WSI Processor Arrays.

DTIC Science & Technology

1987-12-31

studies reported in this paper. In Section .3, the reliability characteristics of single-level FTPA’s are discussed. Four different types of FTPA’s are...for processor arrays are proposed and studied. Studies on algorithmic and software aspects relevant to systems are reported in items 4, 5, 8, 12 and...O’Keefe M., and Fortes, J. A. B., "A Comparative Study of Two Systematic Design Methodologies for Systolic Arrays," (Long Version) International Workshop on
Fault detection and isolation in GPS receiver autonomous integrity monitoring based on chaos particle swarm optimization-particle filter algorithm

NASA Astrophysics Data System (ADS)

Wang, Ershen; Jia, Chaoying; Tong, Gang; Qu, Pingping; Lan, Xiaoyu; Pang, Tao

2018-03-01

The receiver autonomous integrity monitoring (RAIM) is one of the most important parts in an avionic navigation system. Two problems need to be addressed to improve this system, namely, the degeneracy phenomenon and lack of samples for the standard particle filter (PF). However, the number of samples cannot adequately express the real distribution of the probability density function (i.e., sample impoverishment). This study presents a GPS receiver autonomous integrity monitoring (RAIM) method based on a chaos particle swarm optimization particle filter (CPSO-PF) algorithm with a log likelihood ratio. The chaos sequence generates a set of chaotic variables, which are mapped to the interval of optimization variables to improve particle quality. This chaos perturbation overcomes the potential for the search to become trapped in a local optimum in the particle swarm optimization (PSO) algorithm. Test statistics are configured based on a likelihood ratio, and satellite fault detection is then conducted by checking the consistency between the state estimate of the main PF and those of the auxiliary PFs. Based on GPS data, the experimental results demonstrate that the proposed algorithm can effectively detect and isolate satellite faults under conditions of non-Gaussian measurement noise. Moreover, the performance of the proposed novel method is better than that of RAIM based on the PF or PSO-PF algorithm.
Proactive Fault Tolerance for HPC with Xen Virtualization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nagarajan, Arun Babu; Mueller, Frank; Engelmann, Christian

2007-01-01

with thousands of processors. At such large counts of compute nodes, faults are becoming commonplace. Current techniques to tolerate faults focus on reactive schemes to recover from faults and generally rely on a checkpoint/restart mechanism. Yet, in today's systems, node failures can often be anticipated by detecting a deteriorating health status. Instead of a reactive scheme for fault tolerance (FT), we are promoting a proactive one where processes automatically migrate from unhealthy nodes to healthy ones. Our approach relies on operating system virtualization techniques exemplified by but not limited to Xen. This paper contributes an automatic and transparent mechanism for proactive FT for arbitrary MPI applications. It leverages virtualization techniques combined with health monitoring and load-based migration. We exploit Xen's live migration mechanism for a guest operating system (OS) to migrate an MPI task from a health-deteriorating node to a healthy one without stopping the MPI task during most of the migration. Our proactive FT daemon orchestrates the tasks of health monitoring, load determination and initiation of guest OS migration. Experimental results demonstrate that live migration hides migration costs and limits the overhead to only a few seconds, making it an attractive approach to realize FT in HPC systems. Overall, our enhancements make proactive FT a valuable asset for long-running MPI applications that is complementary to reactive FT using full checkpoint/restart schemes, since checkpoint frequencies can be reduced as fewer unanticipated failures are encountered. In the context of OS virtualization, we believe that this is the first comprehensive study of proactive fault tolerance where live migration is actually triggered by health monitoring.
Support vector machines-based fault diagnosis for turbo-pump rotor

NASA Astrophysics Data System (ADS)

Yuan, Sheng-Fa; Chu, Fu-Lei

2006-05-01

Most artificial intelligence methods used in fault diagnosis are based on empirical risk minimisation principle and have poor generalisation when fault samples are few. Support vector machines (SVM) is a new general machine-learning tool based on structural risk minimisation principle that exhibits good generalisation even when fault samples are few. Fault diagnosis based on SVM is discussed. Since basic SVM is originally designed for two-class classification, while most of fault diagnosis problems are multi-class cases, a new multi-class classification of SVM named 'one to others' algorithm is presented to solve the multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority, which is simple, and has little repeated training amount, and the rate of training and recognition is expedited. The effectiveness of the method is verified by the application to the fault diagnosis for turbo pump rotor.
Fault-tolerant onboard digital information switching and routing for communications satellites

NASA Technical Reports Server (NTRS)

Shalkhauser, Mary Jo; Quintana, Jorge A.; Soni, Nitin J.; Kim, Heechul

1993-01-01

The NASA Lewis Research Center is developing an information-switching processor for future meshed very-small-aperture terminal (VSAT) communications satellites. The information-switching processor will switch and route baseband user data onboard the VSAT satellite to connect thousands of Earth terminals. Fault tolerance is a critical issue in developing information-switching processor circuitry that will provide and maintain reliable communications services. In parallel with the conceptual development of the meshed VSAT satellite network architecture, NASA designed and built a simple test bed for developing and demonstrating baseband switch architectures and fault-tolerance techniques. The meshed VSAT architecture and the switching demonstration test bed are described, and the initial switching architecture and the fault-tolerance techniques that were developed and tested are discussed.
H∞ robust fault-tolerant controller design for an autonomous underwater vehicle's navigation control system

NASA Astrophysics Data System (ADS)

Cheng, Xiang-Qin; Qu, Jing-Yuan; Yan, Zhe-Ping; Bian, Xin-Qian

2010-03-01

In order to improve the security and reliability for autonomous underwater vehicle (AUV) navigation, an H∞ robust fault-tolerant controller was designed after analyzing variations in state-feedback gain. Operating conditions and the design method were then analyzed so that the control problem could be expressed as a mathematical optimization problem. This permitted the use of linear matrix inequalities (LMI) to solve for the H∞ controller for the system. When considering different actuator failures, these conditions were then also mathematically expressed, allowing the H∞ robust controller to solve for these events and thus be fault-tolerant. Finally, simulation results showed that the H∞ robust fault-tolerant controller could provide precise AUV navigation control with strong robustness.
Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis

NASA Technical Reports Server (NTRS)

Harper, R. E.; Alger, L. S.; Babikyan, C. A.; Butler, B. P.; Friend, S. A.; Ganska, R. J.; Lala, J. H.; Masotto, T. K.; Meyer, A. J.; Morton, D. P.

1992-01-01

Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions.
Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer

NASA Technical Reports Server (NTRS)

Goldberg, J.; Kautz, W. H.; Melliar-Smith, P. M.; Green, M. W.; Levitt, K. N.; Schwartz, R. L.; Weinstock, C. B.

1984-01-01

SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor to processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness.
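The masking step described in this abstract, a majority vote over the results of identical computations followed by removal of processors that disagree, can be illustrated with a short sketch. This is not SIFT's executive software; the function name and the exact-equality comparison of results are assumptions made for illustration.

```python
from collections import Counter

def majority_vote(results):
    """Return (agreed_value, dissenting_indices) for replicated results.

    Raises ValueError when no value is reported by a strict majority,
    in which case the error cannot be masked by voting alone.
    """
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(results):
        raise ValueError("no strict majority among replicated results")
    dissenters = [i for i, r in enumerate(results) if r != value]
    return value, dissenters

# Three processors run the same computation; processor 2 is faulty.
value, suspects = majority_vote([7, 7, 9])

# Hypothetical reconfiguration step: drop suspects from the active set so
# that later computations are assigned only to processors that agreed.
active = [p for p in range(3) if p not in suspects]
```

A real system would also have to handle results that are legitimately not bit-identical (for example floating-point outputs), which exact comparison does not cover.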
An intelligent fault diagnosis method of rolling bearings based on regularized kernel Marginal Fisher analysis

NASA Astrophysics Data System (ADS)

Jiang, Li; Shi, Tielin; Xuan, Jianping

2012-05-01

Generally, the vibration signals of fault bearings are non-stationary and highly nonlinear under complicated operating conditions. Thus, it's a big challenge to extract optimal features for improving classification and simultaneously decreasing feature dimension. Kernel Marginal Fisher analysis (KMFA) is a novel supervised manifold learning algorithm for feature extraction and dimensionality reduction. In order to avoid the small sample size problem in KMFA, we propose regularized KMFA (RKMFA). A simple and efficient intelligent fault diagnosis method based on RKMFA is put forward and applied to fault recognition of rolling bearings. So as to directly excavate nonlinear features from the original high-dimensional vibration signals, RKMFA constructs two graphs describing the intra-class compactness and the inter-class separability, by combining traditional manifold learning algorithm with fisher criteria. Therefore, the optimal low-dimensional features are obtained for better classification and finally fed into the simplest K-nearest neighbor (KNN) classifier to recognize different fault categories of bearings. The experimental results demonstrate that the proposed approach improves the fault classification performance and outperforms the other conventional approaches.
Simple Random Sampling-Based Probe Station Selection for Fault Detection in Wireless Sensor Networks

PubMed Central

Huang, Rimao; Qiu, Xuesong; Rui, Lanlan

2011-01-01

Fault detection for wireless sensor networks (WSNs) has been studied intensively in recent years. Most existing works statically choose the manager nodes as probe stations and probe the network at a fixed frequency. This straightforward solution, however, leads to several deficiencies. Firstly, assigning the fault detection task only to the manager node leaves the whole network out of balance and quickly overloads the already heavily burdened manager node, which in turn ultimately shortens the lifetime of the whole network. Secondly, probing with a fixed frequency often generates too much useless network traffic, which results in a waste of the limited network energy. Thirdly, the traditional algorithm for choosing a probing node is too complicated to be used in energy-critical wireless sensor networks. In this paper, we study the distribution characteristics of the fault nodes in wireless sensor networks and validate the Pareto principle that a small number of clusters contain most of the faults. We then present a Simple Random Sampling-based algorithm to dynamically choose sensor nodes as probe stations. A dynamic adjusting rule for probing frequency is also proposed to reduce the number of useless probing packets. The simulation experiments demonstrate that the algorithm and adjusting rule we present can effectively prolong the lifetime of a wireless sensor network without decreasing the fault detection rate. PMID:22163789
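As a rough illustration of the two ideas in this abstract, the sketch below draws probe stations from a cluster by simple random sampling and adapts the probing interval between rounds. The abstract does not give the paper's actual adjusting rule, so the halve-or-double rule, its bounds, and all names here are assumptions.

```python
import random

def choose_probe_stations(cluster_nodes, k, rng):
    """Pick k probe stations uniformly at random, without replacement."""
    k = min(k, len(cluster_nodes))
    return rng.sample(cluster_nodes, k)

def next_probe_interval(interval, faults_found, shortest=1.0, longest=64.0):
    """Hypothetical adjusting rule: probe more often after a round that
    found faults, and back off when a round found none."""
    if faults_found:
        return max(shortest, interval / 2)
    return min(longest, interval * 2)

rng = random.Random(42)                    # seeded only to make the sketch repeatable
nodes = [f"node-{i}" for i in range(20)]   # hypothetical node identifiers
stations = choose_probe_stations(nodes, 3, rng)

interval = 8.0
interval = next_probe_interval(interval, faults_found=False)   # backs off to 16.0
interval = next_probe_interval(interval, faults_found=True)    # tightens to 8.0
```

Because a fresh sample is drawn each round, the probing load rotates across sensor nodes instead of resting on a fixed manager node.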
Simple random sampling-based probe station selection for fault detection in wireless sensor networks.

PubMed

Huang, Rimao; Qiu, Xuesong; Rui, Lanlan

2011-01-01

Fault detection for wireless sensor networks (WSNs) has been studied intensively in recent years. Most existing works statically choose the manager nodes as probe stations and probe the network at a fixed frequency. This straightforward solution, however, leads to several deficiencies. Firstly, assigning the fault detection task only to the manager node leaves the whole network out of balance and quickly overloads the already heavily burdened manager node, which in turn ultimately shortens the lifetime of the whole network. Secondly, probing with a fixed frequency often generates too much useless network traffic, which results in a waste of the limited network energy. Thirdly, the traditional algorithm for choosing a probing node is too complicated to be used in energy-critical wireless sensor networks. In this paper, we study the distribution characteristics of the fault nodes in wireless sensor networks and validate the Pareto principle that a small number of clusters contain most of the faults. We then present a Simple Random Sampling-based algorithm to dynamically choose sensor nodes as probe stations. A dynamic adjusting rule for probing frequency is also proposed to reduce the number of useless probing packets. The simulation experiments demonstrate that the algorithm and adjusting rule we present can effectively prolong the lifetime of a wireless sensor network without decreasing the fault detection rate.
SCADA-based Operator Support System for Power Plant Equipment Fault Forecasting

NASA Astrophysics Data System (ADS)

Mayadevi, N.; Ushakumari, S. S.; Vinodchandra, S. S.

2014-12-01

Power plant equipment must be monitored closely to prevent failures from disrupting plant availability. Online monitoring technology integrated with hybrid forecasting techniques can be used to prevent plant equipment faults. A self-learning rule-based expert system is proposed in this paper for fault forecasting in power plants controlled by a supervisory control and data acquisition (SCADA) system. Self-learning utilizes associative data mining algorithms on the SCADA history database to form new rules that can dynamically update the knowledge base of the rule-based expert system. In this study, a number of popular associative learning algorithms are considered for rule formation. Data mining results show that the Tertius algorithm is best suited for developing a learning engine for power plants. For real-time monitoring of the plant condition, graphical models are constructed by K-means clustering. To build a time-series forecasting model, a multilayer perceptron (MLP) is used. Once created, the models are updated in the model library to provide an adaptive environment for the proposed system. A graphical user interface (GUI) illustrates the variation of all sensor values affecting a particular alarm/fault, as well as the step-by-step procedure for avoiding critical situations and consequent plant shutdown. The forecasting performance is evaluated by computing the mean absolute error and root mean square error of the predictions.
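The two evaluation metrics named at the end of this abstract have standard definitions, sketched below. The sample values are made up for illustration and are not taken from the paper.

```python
import math

def mean_absolute_error(actual, predicted):
    """MAE: average of |actual - predicted| over all samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_square_error(actual, predicted):
    """RMSE: square root of the average squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Illustrative sensor readings and forecasts (invented numbers).
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 10.0]

mae = mean_absolute_error(actual, predicted)       # errors 1, 0, 1, 2 -> 1.0
rmse = root_mean_square_error(actual, predicted)   # sqrt(6 / 4)
```

RMSE weights large misses more heavily than MAE, so reporting both gives a sense of whether the forecast error is dominated by a few large deviations.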
Detection of faults and software reliability analysis

NASA Technical Reports Server (NTRS)

Knight, J. C.

1986-01-01

Multiversion or N-version programming was proposed as a method of providing fault tolerance in software. The approach requires the separate, independent preparation of multiple versions of a piece of software for some application. Specific topics addressed are: failure probabilities in N-version systems, consistent comparison in N-version systems, descriptions of the faults found in the Knight and Leveson experiment, analytic models of comparison testing, characteristics of the input regions that trigger faults, fault tolerance through data diversity, and the relationship between failures caused by automatically seeded faults.
Position, Attitude, and Fault-Tolerant Control of Tilting-Rotor Quadcopter

NASA Astrophysics Data System (ADS)

Kumar, Rumit

The aim of this thesis is to present algorithms for autonomous control of a tilt-rotor quadcopter UAV. In particular, this research work describes position, attitude and fault-tolerant control in a tilt-rotor quadcopter. Quadcopters are one of the most popular and reliable unmanned aerial systems because of their design simplicity, hovering capabilities and minimal operational cost. Numerous applications for quadcopters have been explored all over the world, but very little work has been done to explore design enhancements and address the fault-tolerant capabilities of quadcopters. The tilting-rotor quadcopter is a structural advancement of the traditional quadcopter; because the propeller motors are actuated for tilt, it provides additional actuated controls, which can be utilized to improve the efficiency of the aerial vehicle during flight. The tilting-rotor quadcopter design is accomplished by using an additional servo motor for each rotor that enables the rotor to tilt about the axis of the quadcopter arm. The tilting-rotor quadcopter is a more agile version of the conventional quadcopter and is a fully actuated system. The tilt-rotor quadcopter is capable of following complex trajectories with ease. The control strategy in this work is to use the propeller tilts for position and orientation control during autonomous flight of the quadcopter. In conventional quadcopters, two propellers rotate in the clockwise direction and the other two rotate in the counterclockwise direction to cancel out the effective yawing moment of the system. The variation in rotational speeds of these four propellers is utilized for maneuvering. In contrast, this work uses varying propeller rotational speeds along with tilting of the propellers for maneuvering during flight. The rotational motion of the propellers works in sync with the propeller tilts to control the position and orientation of the UAV during flight.
A PD flight controller is developed to achieve various modes of the flight. Further, the performance of the controller and the tilt-rotor design has been compared with respect to the conventional quadcopter in the presence of wind disturbances and sensor uncertainties. In this work, another novel feed-forward control design approach is presented for complex trajectory tracking during autonomous flight. Differential flatness based feed-forward position control is employed to enhance the performance of the UAV during complex trajectory tracking. By accounting for differential flatness based feed-forward control input parameters, a new PD controller is designed to achieve the desired performance in autonomous flight. The results for tracking complex trajectories have been presented by performing numerical simulations with and without environmental uncertainties to demonstrate robustness of the controller during flight. The conventional quadcopters are under-actuated systems and, upon failure of one propeller, the conventional quadcopter would have a tendency of spinning about the primary axis fixed to the vehicle as an outcome of the asymmetry in resultant yawing moment in the system. In this work, control of tilt-rotor quadcopter is presented upon failure of one propeller during flight. The tilt-rotor quadcopter is capable of handling a propeller failure and hence is a fault-tolerant system. The dynamic model of tilting-rotor quadcopter with one propeller failure is derived and a controller has been designed to achieve hovering and navigation capability. The simulation results of way point navigation, complex trajectory tracking and fault-tolerance are presented.
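To make the PD control law mentioned in this abstract concrete, here is a minimal sketch on a deliberately simplified plant: a single altitude axis modeled as a unit-mass double integrator with gravity already compensated. The thesis's tilt-rotor dynamics, gains, and feed-forward terms are not reproduced; the gains, step size, and function name below are assumptions.

```python
def simulate_pd_altitude(target, kp=4.0, kd=4.0, dt=0.01, steps=1000):
    """Drive altitude z toward a constant target with a PD law.

    Assumed plant: z'' = u (unit mass, gravity compensated).
    Control: u = kp * error + kd * d(error)/dt.
    """
    z, v = 0.0, 0.0
    for _ in range(steps):
        error = target - z
        error_rate = -v              # the target is constant, so d(error)/dt = -v
        u = kp * error + kd * error_rate
        v += u * dt                  # semi-implicit Euler: update velocity first,
        z += v * dt                  # then position with the new velocity
    return z

final_altitude = simulate_pd_altitude(1.0)   # 10 s of simulated flight
```

With these assumed gains the closed loop is critically damped, so the altitude settles on the target without overshoot in this idealized model; wind disturbances and sensor noise, which the thesis studies, are not modeled here.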
SFT: Scalable Fault Tolerance

DOE Office of Scientific and Technical Information (OSTI.GOV)

Petrini, Fabrizio; Nieplocha, Jarek; Tipparaju, Vinod

2006-04-15

In this paper we present a new technology that we are currently developing within the SFT: Scalable Fault Tolerance FastOS project, which seeks to implement fault tolerance at the operating system level. Major design goals include dynamic reallocation of resources to allow continuing execution in the presence of hardware failures, very high scalability, high efficiency (low overhead), and transparency, requiring no changes to user applications. Our technology is based on a global coordination mechanism that enforces transparent recovery lines in the system, and TICK, a lightweight, incremental checkpointing software architecture implemented as a Linux kernel module. TICK is completely user-transparent and does not require any changes to user code or system libraries; it is highly responsive: an interrupt, such as a timer interrupt, can trigger a checkpoint in as little as 2.5μs; and it supports incremental and full checkpoints with minimal overhead, less than 6% with full checkpointing to disk performed as frequently as once per minute.
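To illustrate the difference between full and incremental checkpoints mentioned in this abstract, here is a minimal user-space sketch in Python. It does not reproduce TICK's kernel-level mechanism; the page size, the `IncrementalCheckpointer` class, and the hash-based change detection are assumptions made purely for illustration, and the memory region is assumed to keep a fixed size.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page granularity, for illustration only


class IncrementalCheckpointer:
    """Toy checkpointer that saves only pages whose content has changed."""

    def __init__(self):
        self._digests = {}     # page index -> digest at the last checkpoint
        self.checkpoints = []  # each entry maps page index -> saved bytes

    def _pages(self, memory):
        for offset in range(0, len(memory), PAGE_SIZE):
            yield offset // PAGE_SIZE, bytes(memory[offset:offset + PAGE_SIZE])

    def checkpoint(self, memory, full=False):
        """Save changed pages (or all pages if full=True); return pages saved."""
        saved = {}
        for index, page in self._pages(memory):
            digest = hashlib.sha256(page).digest()
            if full or self._digests.get(index) != digest:
                saved[index] = page
            self._digests[index] = digest
        self.checkpoints.append(saved)
        return len(saved)

    def restore(self):
        """Rebuild the latest state by replaying checkpoints oldest to newest."""
        pages = {}
        for saved in self.checkpoints:
            pages.update(saved)
        return b"".join(pages[i] for i in sorted(pages))


memory = bytearray(4 * PAGE_SIZE)
cp = IncrementalCheckpointer()
print(cp.checkpoint(memory, full=True))  # full checkpoint saves every page
memory[PAGE_SIZE + 10] = 0xFF            # dirty exactly one page
print(cp.checkpoint(memory))             # incremental checkpoint saves one page
```

A real kernel-level implementation would typically track dirty pages through the memory-management hardware rather than by hashing, which this sketch uses only because it is easy to follow.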
The use of automatic programming techniques for fault tolerant computing systems

NASA Technical Reports Server (NTRS)

Wild, C.

1985-01-01

It is conjectured that the production of software for ultra-reliable computing systems such as required by Space Station, aircraft, nuclear power plants and the like will require a high degree of automation as well as fault tolerance. In this paper, the relationship between automatic programming techniques and fault tolerant computing systems is explored. Initial efforts in the automatic synthesis of code from assertions to be used for error detection as well as the automatic generation of assertions and test cases from abstract data type specifications is outlined. Speculation on the ability to generate truly diverse designs capable of recovery from errors by exploring alternate paths in the program synthesis tree is discussed. Some initial thoughts on the use of knowledge based systems for the global detection of abnormal behavior using expectations and the goal-directed reconfiguration of resources to meet critical mission objectives are given. One of the sources of information for these systems would be the knowledge captured during the automatic programming process.
A Fault Tolerant System for an Integrated Avionics Sensor Configuration

NASA Technical Reports Server (NTRS)

Caglayan, A. K.; Lancraft, R. E.

1984-01-01

An aircraft sensor fault tolerant system methodology for the Transport Systems Research Vehicle in a Microwave Landing System (MLS) environment is described. The fault tolerant system provides reliable estimates in the presence of possible failures both in ground-based navigation aids, and in on-board flight control and inertial sensors. Sensor failures are identified by utilizing the analytic relationships between the various sensors arising from the aircraft point mass equations of motion. The estimation and failure detection performance of the software implementation (called FINDS) of the developed system was analyzed on a nonlinear digital simulation of the research aircraft. Simulation results showing the detection performance of FINDS, using a dual redundant sensor complement, are presented for bias, hardover, null, ramp, increased noise and scale factor failures. In general, the results show that FINDS can distinguish between normal operating sensor errors and failures while providing an excellent detection speed for bias failures in the MLS, indicated airspeed, attitude and radar altimeter sensors.
Identification of transformer fault based on dissolved gas analysis using hybrid support vector machine-modified evolutionary particle swarm optimisation

PubMed Central

2018-01-01

Early detection of power transformer fault is important because it can reduce the maintenance cost of the transformer and it can ensure continuous electricity supply in power systems. Dissolved Gas Analysis (DGA) technique is commonly used to identify oil-filled power transformer fault type but utilisation of artificial intelligence method with optimisation methods has shown convincing results. In this work, a hybrid support vector machine (SVM) with modified evolutionary particle swarm optimisation (EPSO) algorithm was proposed to determine the transformer fault type. The superiority of the modified PSO technique with SVM was evaluated by comparing the results with the actual fault diagnosis, unoptimised SVM and previous reported works. Data reduction was also applied using stepwise regression prior to the training process of SVM to reduce the training time. It was found that the proposed hybrid SVM-Modified EPSO (MEPSO)-Time Varying Acceleration Coefficient (TVAC) technique results in the highest correct identification percentage of faults in a power transformer compared to other PSO algorithms. Thus, the proposed technique can be one of the potential solutions to identify the transformer fault type based on DGA data on site. PMID:29370230
Identification of transformer fault based on dissolved gas analysis using hybrid support vector machine-modified evolutionary particle swarm optimisation.

PubMed

Illias, Hazlee Azil; Zhao Liang, Wee

2018-01-01

Early detection of power transformer fault is important because it can reduce the maintenance cost of the transformer and it can ensure continuous electricity supply in power systems. Dissolved Gas Analysis (DGA) technique is commonly used to identify oil-filled power transformer fault type but utilisation of artificial intelligence method with optimisation methods has shown convincing results. In this work, a hybrid support vector machine (SVM) with modified evolutionary particle swarm optimisation (EPSO) algorithm was proposed to determine the transformer fault type. The superiority of the modified PSO technique with SVM was evaluated by comparing the results with the actual fault diagnosis, unoptimised SVM and previous reported works. Data reduction was also applied using stepwise regression prior to the training process of SVM to reduce the training time. It was found that the proposed hybrid SVM-Modified EPSO (MEPSO)-Time Varying Acceleration Coefficient (TVAC) technique results in the highest correct identification percentage of faults in a power transformer compared to other PSO algorithms. Thus, the proposed technique can be one of the potential solutions to identify the transformer fault type based on DGA data on site.
Research on criticality analysis method of CNC machine tools components under fault rate correlation

NASA Astrophysics Data System (ADS)

Gui-xiang, Shen; Xian-zhuo, Zhao; Zhang, Ying-zhi; Chen-yu, Han

2018-02-01

In order to determine the key components of CNC machine tools under fault rate correlation, a system component criticality analysis method is proposed. Based on the fault mechanism analysis, the component fault relation is determined, and the adjacency matrix is introduced to describe it. Then, the fault structure relation is arranged into a hierarchy by using the interpretive structure model (ISM). Assuming that the impact of the fault obeys the Markov process, the fault association matrix is described and transformed, and the PageRank algorithm is used to determine the relative influence values; combining these with the component fault rate under time correlation yields the comprehensive fault rate. Based on the fault mode frequency and fault influence, the criticality of the components under the fault rate correlation is determined, and the key components are identified to provide a sound basis for formulating the reliability assurance measures. Finally, taking machining centers as an example, the effectiveness of the method is verified.
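The ranking step described above can be sketched with a standard PageRank power iteration over a fault-propagation adjacency matrix. This is a generic illustration, not the paper's procedure: the three-component matrix, the damping factor of 0.85, and the convention that `adjacency[i][j] = 1` means a fault in component i can propagate to component j are all assumptions.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=1000):
    """Standard PageRank by power iteration on a 0/1 adjacency matrix."""
    n = len(adjacency)
    out_degree = [sum(row) for row in adjacency]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly
        dangling = sum(rank[i] for i in range(n) if out_degree[i] == 0)
        new_rank = []
        for j in range(n):
            incoming = sum(
                rank[i] * adjacency[i][j] / out_degree[i]
                for i in range(n)
                if out_degree[i] > 0
            )
            new_rank.append((1 - damping) / n + damping * (incoming + dangling / n))
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical fault relations: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
fault_adjacency = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
influence = pagerank(fault_adjacency)
print([round(r, 4) for r in influence])
```

In this toy example component 2 receives the largest score because faults from both other components propagate into it. Whether a given study ranks on this matrix or its transpose depends on whether it wants the most affected or the most influential components.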

Relaxed fault-tolerant hardware implementation of neural networks in the presence of multiple transient errors.

PubMed

Mahdiani, Hamid Reza; Fakhraie, Sied Mehdi; Lucas, Caro

2012-08-01

Reliability should be identified as the most important challenge in future nano-scale very large scale integration (VLSI) implementation technologies for the development of complex integrated systems. Normally, fault tolerance (FT) in a conventional system is achieved by increasing its redundancy, which also implies higher implementation costs and lower performance that sometimes makes it even infeasible. In contrast to custom approaches, a new class of applications is categorized in this paper, which is inherently capable of absorbing some degrees of vulnerability and providing FT based on their natural properties. Neural networks are good indicators of imprecision-tolerant applications. We have also proposed a new class of FT techniques called relaxed fault-tolerant (RFT) techniques which are developed for VLSI implementation of imprecision-tolerant applications. The main advantage of RFT techniques with respect to traditional FT solutions is that they exploit inherent FT of different applications to reduce their implementation costs while improving their performance. To show the applicability as well as the efficiency of the RFT method, the experimental results for implementation of a face-recognition computationally intensive neural network and its corresponding RFT realization are presented in this paper. The results demonstrate promising higher performance of artificial neural network VLSI solutions for complex applications in faulty nano-scale implementation environments.
Fault Injection Campaign for a Fault Tolerant Duplex Framework

NASA Technical Reports Server (NTRS)

Sacco, Gian Franco; Ferraro, Robert D.; von Allmen, Paul; Rennels, Dave A.

2007-01-01

Fault tolerance is an efficient approach adopted to avoid or reduce the damage of a system failure. In this work we present the results of a fault injection campaign we conducted on the Duplex Framework (DF). The DF is software developed by the UCLA group [1, 2] that uses a fault-tolerant approach and allows two replicas of the same process to run on two different nodes of a commercial off-the-shelf (COTS) computer cluster. A third process, running on a different node, constantly monitors the results computed by the two replicas and restarts the two replica processes if an inconsistency in their computation is detected. This approach is very cost efficient and can be adopted to control processes on spacecraft where the fault rate produced by cosmic rays is not very high.
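The compare-and-restart pattern described in this abstract can be sketched as follows. This is not the Duplex Framework's code: plain Python callables stand in for the replica processes on separate nodes, and the function name `run_duplex` and the `max_restarts` limit are hypothetical.

```python
def run_duplex(task, replica_a, replica_b, max_restarts=3):
    """Run two replicas and accept a result only when both agree.

    replica_a and replica_b are callables standing in for processes on two
    different nodes. Returns (result, number_of_restarts_needed).
    """
    for restarts in range(max_restarts + 1):
        result_a = replica_a(task)
        result_b = replica_b(task)
        if result_a == result_b:
            return result_a, restarts
        # A mismatch reveals that a fault occurred but not which replica
        # is wrong, so both replicas are rerun.
    raise RuntimeError("replicas still disagree after all restarts")


# Usage: a replica that suffers one transient fault on its first run
calls = {"count": 0}

def healthy(x):
    return x * x

def transient_fault(x):
    calls["count"] += 1
    return x * x + (1 if calls["count"] == 1 else 0)

print(run_duplex(7, healthy, transient_fault))
```

Because a duplex comparison cannot tell which replica is faulty, restarting both is the simplest recovery, which fits environments where faults are rare and transient.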
Reconfigurable tree architectures using subtree oriented fault tolerance

NASA Technical Reports Server (NTRS)

Lowrie, Matthew B.

1987-01-01

An approach to the design of reconfigurable tree architecture is presented in which spare processors are allocated at the leaves. The approach is unique in that spares are associated with subtrees and sharing of spares between these subtrees can occur. The Subtree Oriented Fault Tolerance (SOFT) approach is more reliable than previous approaches capable of tolerating link and switch failures for both single chip and multichip tree implementations while reducing redundancy in terms of both spare processors and links. VLSI layout is O(n) for binary trees and is directly extensible to N-ary trees and fault tolerance through performance degradation.
Airborne Advanced Reconfigurable Computer System (ARCS)

NASA Technical Reports Server (NTRS)

Bjurman, B. E.; Jenkins, G. M.; Masreliez, C. J.; Mcclellan, K. L.; Templeman, J. E.

1976-01-01

A digital computer subsystem fault-tolerant concept was defined, and the potential benefits and costs of such a subsystem were assessed when used as the central element of a new transport's flight control system. The derived advanced reconfigurable computer system (ARCS) is a triple-redundant computer subsystem that automatically reconfigures, under multiple fault conditions, from triplex to duplex to simplex operation, with redundancy recovery if the fault condition is transient. The study included criteria development covering factors at the aircraft's operation level that would influence the design of a fault-tolerant system for commercial airline use. A new reliability analysis tool was developed for evaluating redundant, fault-tolerant system availability and survivability; and a stringent digital system software design methodology was used to achieve design/implementation visibility.
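The triplex-to-duplex-to-simplex degradation described in this abstract can be illustrated with a small voter sketch. It is not the ARCS design: the `ReconfigurableVoter` class, the per-channel self-test callbacks used to isolate a fault in duplex mode, and the omission of redundancy recovery after transient faults are all simplifying assumptions.

```python
from collections import Counter


class ReconfigurableVoter:
    """Toy redundancy manager that degrades triplex -> duplex -> simplex."""

    def __init__(self, channels, self_tests):
        self.channels = dict(channels)      # name -> compute(x)
        self.self_tests = dict(self_tests)  # name -> hypothetical built-in test

    @property
    def mode(self):
        return {3: "triplex", 2: "duplex", 1: "simplex"}[len(self.channels)]

    def step(self, x):
        outputs = {name: fn(x) for name, fn in self.channels.items()}
        if len(outputs) == 3:
            value, votes = Counter(outputs.values()).most_common(1)[0]
            if votes < 2:
                raise RuntimeError("no majority among three channels")
            for name, out in outputs.items():
                if out != value:
                    del self.channels[name]  # drop the outvoted channel
            return value
        if len(outputs) == 2:
            (name_a, out_a), (name_b, out_b) = outputs.items()
            if out_a == out_b:
                return out_a
            # Comparison detects the fault but cannot isolate it, so fall
            # back on the assumed self-tests to pick the surviving channel.
            passing = [n for n in (name_a, name_b) if self.self_tests[n]()]
            if len(passing) != 1:
                raise RuntimeError("cannot isolate the faulty channel")
            survivor = passing[0]
            self.channels = {survivor: self.channels[survivor]}
            return outputs[survivor]
        return next(iter(outputs.values()))  # simplex: no cross-check left


# Usage: inject faults into channels C and then B
fault = {"B": False, "C": False}
voter = ReconfigurableVoter(
    channels={
        "A": lambda x: 2 * x,
        "B": lambda x: 2 * x + (1 if fault["B"] else 0),
        "C": lambda x: 2 * x + (5 if fault["C"] else 0),
    },
    self_tests={
        "A": lambda: True,
        "B": lambda: not fault["B"],
        "C": lambda: not fault["C"],
    },
)
print(voter.step(1), voter.mode)
fault["C"] = True
print(voter.step(2), voter.mode)
fault["B"] = True
print(voter.step(3), voter.mode)
```

Majority voting is enough to isolate a single fault among three channels, while the step from duplex to simplex needs some additional source of evidence, which is why the sketch assumes self-tests.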
Fault-tolerant communication channel structures

NASA Technical Reports Server (NTRS)

Tai, Ann T. (Inventor); Alkalai, Leon (Inventor); Chau, Savio N. (Inventor)

2006-01-01

Systems and techniques for implementing fault-tolerant communication channels and features in communication systems. Selected commercial-off-the-shelf devices can be integrated in such systems to reduce the cost.
Lambda network having 2^(m-1) nodes in each of m stages with each node coupled to four other nodes for bidirectional routing of data packets between nodes

DOEpatents

Napolitano, Jr., Leonard M.

1995-01-01

The Lambda network is a single stage, packet-switched interprocessor communication network for a distributed memory, parallel processor computer. Its design arises from the desired network characteristics of minimizing mean and maximum packet transfer time, local routing, expandability, deadlock avoidance, and fault tolerance. The network is based on fixed degree nodes and has mean and maximum packet transfer distances where n is the number of processors. The routing method is detailed, as are methods for expandability, deadlock avoidance, and fault tolerance.
Preliminary design of the redundant software experiment

NASA Technical Reports Server (NTRS)

Campbell, Roy; Deimel, Lionel; Eckhardt, Dave, Jr.; Kelly, John; Knight, John; Lauterbach, Linda; Lee, Larry; Mcallister, Dave; Mchugh, John

1985-01-01

The goal of the present experiment is to characterize the fault distributions of highly reliable software replicates, constructed using techniques and environments which are similar to those used in contemporary industrial software facilities. The fault distributions and their effect on the reliability of fault tolerant configurations of the software will be determined through extensive life testing of the replicates against carefully constructed randomly generated test data. Each detected error will be carefully analyzed to provide insight into its nature and cause. A direct objective is to develop techniques for reducing the intensity of coincident errors, thus increasing the reliability gain which can be achieved with fault tolerance. Data on the reliability gains realized, and the cost of the fault tolerant configurations can be used to design a companion experiment to determine the cost effectiveness of the fault tolerant strategy. Finally, the data and analysis produced by this experiment will be valuable to the software engineering community as a whole because it will provide a useful insight into the nature and cause of hard to find, subtle faults which escape standard software engineering validation techniques and thus persist far into the software life cycle.
Experimental Demonstration of Fault-Tolerant State Preparation with Superconducting Qubits.

PubMed

Takita, Maika; Cross, Andrew W; Córcoles, A D; Chow, Jerry M; Gambetta, Jay M

2017-11-03

Robust quantum computation requires encoding delicate quantum information into degrees of freedom that are hard for the environment to change. Quantum encodings have been demonstrated in many physical systems by observing and correcting storage errors, but applications require not just storing information; we must accurately compute even with faulty operations. The theory of fault-tolerant quantum computing illuminates a way forward by providing a foundation and collection of techniques for limiting the spread of errors. Here we implement one of the smallest quantum codes in a five-qubit superconducting transmon device and demonstrate fault-tolerant state preparation. We characterize the resulting code words through quantum process tomography and study the free evolution of the logical observables. Our results are consistent with fault-tolerant state preparation in a protected qubit subspace.
A Voyager attitude control perspective on fault tolerant systems

NASA Technical Reports Server (NTRS)

Rasmussen, R. D.; Litty, E. C.

1981-01-01

In current spacecraft design, a trend can be observed to achieve greater fault tolerance through the application of on-board software dedicated to detecting and isolating failures. Whether fault tolerance through software can meet the desired objectives depends on very careful consideration and control of the system in which the software is imbedded. The considered investigation has the objective to provide some of the insight needed for the required analysis of the system. A description is given of the techniques which have been developed in this connection during the development of the Voyager spacecraft. The Voyager Galileo Attitude and Articulation Control Subsystem (AACS) fault tolerant design is discussed to emphasize basic lessons learned from this experience. The central driver of hardware redundancy implementation on Voyager was known as the 'single point failure criterion'.
Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Levy, Scott N.

2016-05-01

High-performance computing (HPC) systems enable scientists to numerically model complex phenomena in many important physical systems. The next major milestone in the development of HPC systems is the construction of the first supercomputer capable of executing more than an exaflop, 10^18 floating point operations per second. On systems of this scale, failures will occur much more frequently than on current systems. As a result, resilience is a key obstacle to building next-generation extreme-scale systems. Coordinated checkpointing is currently the most widely-used mechanism for handling failures on HPC systems. Although coordinated checkpointing remains effective on current systems, increasing the scale of today's systems to build next-generation systems will increase the cost of fault tolerance as more and more time is taken away from the application to protect against or recover from failure. Rollback avoidance techniques seek to mitigate the cost of checkpoint/restart by allowing an application to continue its execution rather than rolling back to an earlier checkpoint when failures occur. These techniques include failure prediction and preventive migration, replicated computation, fault-tolerant algorithms, and software-based memory fault correction. In this thesis, we examine how rollback avoidance techniques can be used to address failures on extreme-scale systems. Using a combination of analytic modeling and simulation, we evaluate the potential impact of rollback avoidance on these systems. We then present a novel rollback avoidance technique that exploits similarities in application memory. Finally, we examine the feasibility of using this technique to protect against memory faults in kernel memory.
Fault tolerant programmable digital attitude control electronics study

NASA Technical Reports Server (NTRS)

Sorensen, A. A.

1974-01-01

The attitude control electronics mechanization study to develop a fault tolerant autonomous concept for a three axis system is reported. Programmable digital electronics are compared to general purpose digital computers. The requirements, constraints, and tradeoffs are discussed. It is concluded that: (1) general fault tolerance can be achieved relatively economically, (2) recovery times of less than one second can be obtained, (3) the number of faulty behavior patterns must be limited, and (4) adjoined processes are the best indicators of faulty operation.
Symposium on the Interface: Computing Science and Statistics (20th). Theme: Computationally Intensive Methods in Statistics Held in Reston, Virginia on April 20-23, 1988

DTIC Science & Technology

1988-08-20

34 William A. Link, Patuxent Wildlife Research Center "Increasing reliability of multiversion fault-tolerant software design by modulation," Junryo 3... Multiversion Fault-Tolerant Software Design by Modularization Junryo Miyashita Department of Computer Science California State University at San Bernardino Fault...They shall be referred to as "multiversion fault-tolerant software design". One problem of developing multi-versions of a program is the high cost
OBIST methodology incorporating modified sensitivity of pulses for active analogue filter components

NASA Astrophysics Data System (ADS)

Khade, R. H.; Chaudhari, D. S.

2018-03-01

In this paper, an oscillation-based built-in self-test method is used to diagnose catastrophic and parametric faults in integrated circuits. Sallen-Key low pass filter and high pass filter circuits with different gains are used to investigate defects. Variation in seven parameters of the operational amplifier (OP-AMP), namely gain, input impedance, output impedance, slew rate, input bias current, input offset current and input offset voltage, as well as catastrophic and parametric defects in components outside the OP-AMP, are introduced in the circuit and simulation results are analysed. The oscillator output signal is converted to pulses which are used to generate a signature of the circuit. The signature and pulse count change with the type of fault present in the circuit under test (CUT). The change in oscillation frequency is observed for fault detection. The designer has the flexibility to predefine the tolerance band of the cut-off frequency and the range of pulses for which the circuit should be accepted. The fault coverage depends upon the required tolerance band of the CUT. We propose a modification of the sensitivity of the parameter (pulses) to avoid test escape and enhance yield. Results show that the method provides 100% fault coverage for catastrophic faults.
Design of the Protocol Processor for the ROBUS-2 Communication System

NASA Technical Reports Server (NTRS)

Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.

2005-01-01

The ROBUS-2 Protocol Processor (RPP) is a custom-designed hardware component implementing the functionality of the ROBUS-2 fault-tolerant communication system. The Reliable Optical Bus (ROBUS) is the core communication system of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER), a general-purpose fault tolerant integrated modular architecture currently under development at NASA Langley Research Center. ROBUS is a time-division multiple access (TDMA) broadcast communication system with medium access control by means of time-indexed communication schedule. ROBUS-2 is a developmental version of the ROBUS providing guaranteed fault-tolerant services to the attached processing elements (PEs), in the presence of a bounded number of faults. These services include message broadcast (Byzantine Agreement), dynamic communication schedule update, time reference (clock synchronization), and distributed diagnosis (group membership). ROBUS also features fault-tolerant startup and restart capabilities. ROBUS-2 tolerates internal as well as PE faults, and incorporates a dynamic self-reconfiguration capability driven by the internal diagnostic system. ROBUS consists of RPPs connected to each other by a lower-level physical communication network. The RPP has a pipelined architecture and the design is parameterized in the behavioral and structural domains. The design of the RPP enables the bus to achieve a PE-message throughput that approaches the available bandwidth at the physical layer.
Power maximization of variable-speed variable-pitch wind turbines using passive adaptive neural fault tolerant control

NASA Astrophysics Data System (ADS)

Habibi, Hamed; Rahimi Nohooji, Hamed; Howard, Ian

2017-09-01

Power maximization has always been a practical consideration in wind turbines. The question of how to address optimal power capture, especially when the system dynamics are nonlinear and the actuators are subject to unknown faults, is significant. This paper studies the control methodology for variable-speed variable-pitch wind turbines including the effects of uncertain nonlinear dynamics, system fault uncertainties, and unknown external disturbances. The nonlinear model of the wind turbine is presented, and the problem of maximizing extracted energy is formulated by designing the optimal desired states. With the known system, a model-based nonlinear controller is designed; then, to handle uncertainties, the unknown nonlinearities of the wind turbine are estimated by utilizing radial basis function neural networks. The adaptive neural fault tolerant control is designed passively to be robust on model uncertainties, disturbances including wind speed and model noises, and completely unknown actuator faults including generator torque and pitch actuator torque. The Lyapunov direct method is employed to prove that the closed-loop system is uniformly bounded. Simulation studies are performed to verify the effectiveness of the proposed method.
Satellite Fault Diagnosis Using Support Vector Machines Based on a Hybrid Voting Mechanism

PubMed Central

Yang, Shuqiang; Zhu, Xiaoqian; Jin, Songchang; Wang, Xiang

2014-01-01

Satellite fault diagnosis plays an important role in enhancing the safety, reliability, and availability of the satellite system. However, the large number of parameters and the presence of multiple faults pose a challenge to satellite fault diagnosis. The interactions between parameters and misclassifications from multiple faults increase the false alarm rate and the false negative rate. In addition, for each satellite fault, there is not enough fault data for training, which degrades model performance for most classification algorithms. In this paper, we propose an improved SVM based on a hybrid voting mechanism (HVM-SVM) to deal with the problems of enormous parameters, multiple faults, and small samples. Experimental results show that the accuracy of fault diagnosis is improved by using HVM-SVM. PMID:25215324
Abnormal fault-recovery characteristics of the fault-tolerant multiprocessor uncovered using a new fault-injection methodology

NASA Technical Reports Server (NTRS)

Padilla, Peter A.

1991-01-01

An investigation was made in AIRLAB of the fault handling performance of the Fault Tolerant MultiProcessor (FTMP). Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once in every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. Byzantine faults behave such that the faulted unit points to a working unit as the source of errors. The design's problems involve: (1) the design and interface between the simplex error detection hardware and the error processing software, (2) the functional capabilities of the FTMP system bus, and (3) the communication requirements of a multiprocessor architecture. These weak areas in the FTMP's design increase the probability that, for any hardware fault, a good line replacement unit (LRU) is mistakenly disabled by the fault management software.
A Vehicle Management End-to-End Testing and Analysis Platform for Validation of Mission and Fault Management Algorithms to Reduce Risk for NASA's Space Launch System

NASA Technical Reports Server (NTRS)

Trevino, Luis; Patterson, Jonathan; Teare, David; Johnson, Stephen

2015-01-01

The engineering development of the new Space Launch System (SLS) launch vehicle requires cross-discipline teams with extensive knowledge of launch vehicle subsystems, information theory, and autonomous algorithms dealing with all operations from pre-launch through on-orbit operations. The characteristics of these spacecraft systems must be matched with the autonomous algorithm monitoring and mitigation capabilities for accurate control and response to abnormal conditions throughout all vehicle mission flight phases, including precipitating safing actions and crew aborts. This presents a large and complex system engineering challenge, which is being addressed in part by focusing on the specific subsystems involved in the handling of off-nominal mission and fault tolerance with response management. Using traditional model-based system and software engineering design principles from the Unified Modeling Language (UML) and Systems Modeling Language (SysML), the Mission and Fault Management (M&FM) algorithms for the vehicle are crafted and vetted in specialized Integrated Development Teams (IDTs) composed of multiple development disciplines such as Systems Engineering (SE), Flight Software (FSW), Safety and Mission Assurance (S&MA) and the major subsystems and vehicle elements such as Main Propulsion Systems (MPS), boosters, avionics, Guidance, Navigation, and Control (GNC), Thrust Vector Control (TVC), and liquid engines. These model-based algorithms and their development lifecycle from inception through Flight Software certification are an important focus of this development effort to further ensure reliable detection and response to off-nominal vehicle states during all phases of vehicle operation from pre-launch through end of flight. NASA formed a dedicated M&FM team for addressing fault management early in the development lifecycle for the SLS initiative.
As part of the development of the M&FM capabilities, this team has developed a dedicated testbed that integrates specific M&FM algorithms, specialized nominal and off-nominal test cases, and vendor-supplied physics-based launch vehicle subsystem models. Additionally, the team has developed processes for implementing and validating these algorithms for concept validation and risk reduction for the SLS program. The flexibility of the Vehicle Management End-to-end Testbed (VMET) enables thorough testing of the M&FM algorithms by providing configurable suites of both nominal and off-nominal test cases to validate the developed algorithms utilizing actual subsystem models such as MPS. The intent of VMET is to validate the M&FM algorithms and substantiate them with performance baselines for each of the target vehicle subsystems in an independent platform exterior to the flight software development infrastructure and its related testing entities. In any software development process there is inherent risk in the interpretation and implementation of concepts into software through requirements and test cases into flight software compounded with potential human errors throughout the development lifecycle. Risk reduction is addressed by the M&FM analysis group working with other organizations such as S&MA, Structures and Environments, GNC, Orion, the Crew Office, Flight Operations, and Ground Operations by assessing performance of the M&FM algorithms in terms of their ability to reduce Loss of Mission and Loss of Crew probabilities. In addition, through state machine and diagnostic modeling, analysis efforts investigate a broader suite of failure effects and associated detection and responses that can be tested in VMET to ensure that failures can be detected, and confirm that responses do not create additional risks or cause undesired states through interactive dynamic effects with other algorithms and systems. 
VMET further contributes to risk reduction by prototyping and exercising the M&FM algorithms early in their implementation and without any inherent hindrances such as meeting FSW processor scheduling constraints due to their target platform - ARINC 653 partitioned OS, resource limitations, and other factors related to integration with other subsystems not directly involved with M&FM such as telemetry packing and processing. The baseline plan for use of VMET encompasses testing the original M&FM algorithms coded in the same C++ language and state machine architectural concepts as that used by Flight Software. This enables the development of performance standards and test cases to characterize the M&FM algorithms and sets a benchmark from which to measure the effectiveness of M&FM algorithms performance in the FSW development and test processes.
Fault-tolerant Control of a Cyber-physical System

NASA Astrophysics Data System (ADS)

Roxana, Rusu-Both; Eva-Henrietta, Dulf

2017-10-01

Cyber-physical systems represent a new emerging field in automatic control. The fault system is a key component, because modern, large scale processes must meet high standards of performance, reliability and safety. Fault propagation in large scale chemical processes can lead to loss of production, energy, raw materials and even environmental hazard. The present paper develops a multi-agent fault-tolerant control architecture using robust fractional order controllers for a (13C) cryogenic separation column cascade. The JADE (Java Agent DEvelopment Framework) platform was used to implement the multi-agent fault tolerant control system while the operational model of the process was implemented in Matlab/SIMULINK environment. MACSimJX (Multiagent Control Using Simulink with Jade Extension) toolbox was used to link the control system and the process model. In order to verify the performance and to prove the feasibility of the proposed control architecture several fault simulation scenarios were performed.
Fault-tolerant building-block computer study

NASA Technical Reports Server (NTRS)

Rennels, D. A.

1978-01-01

Ultra-reliable core computers are required for improving the reliability of complex military systems. Such computers can provide reliable fault diagnosis, failure circumvention, and, in some cases, serve as an automated repairman for their host systems. A small set of building-block circuits is described which can be implemented as single very large scale integration devices and used with off-the-shelf microprocessors and memories to build self-checking computer modules (SCCM). Each SCCM is a microcomputer which is capable of detecting its own faults during normal operation and is designed to communicate with other identical modules over one or more Mil Standard 1553A buses. Several SCCMs can be connected into a network with backup spares to provide fault-tolerant operation, i.e. automated recovery from faults. Alternative fault-tolerant SCCM configurations are discussed along with the cost and reliability associated with their implementation.

Automatic Fault Recognition of Photovoltaic Modules Based on Statistical Analysis of UAV Thermography

NASA Astrophysics Data System (ADS)

Kim, D.; Youn, J.; Kim, C.

2017-08-01

As a malfunctioning PV (photovoltaic) cell has a higher temperature than adjacent normal cells, it can be detected easily with a thermal infrared sensor. However, inspecting large-scale PV power plants with a hand-held thermal infrared sensor is time-consuming. This paper presents an algorithm for automatically detecting defective PV panels using images captured with a thermal imaging camera from a UAV (unmanned aerial vehicle). The proposed algorithm uses statistical analysis of the thermal intensity (surface temperature) characteristics of each PV module, taking the mean intensity and standard deviation of each panel as parameters for fault diagnosis. One of the characteristics of thermal infrared imaging is that the larger the distance between sensor and target, the lower the measured temperature of the object. Consequently, a global detection rule using the mean intensity of all panels in the fault detection algorithm is not applicable. Therefore, a local detection rule based on the mean intensity and standard deviation range was developed to detect defective PV modules within each individual array automatically. The performance of the proposed algorithm was tested on three sample images; this verified a detection accuracy of defective panels of 97% or higher. In addition, as the proposed algorithm can adjust the range of threshold values for judging malfunction at the array level, the local detection rule is considered better suited for highly sensitive fault detection compared to a global detection rule.
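
The difference between a local and a global detection rule can be illustrated with a minimal sketch. It is not the authors' implementation: the function name flag_hot_panels, the multiplier k, and the intensity values are hypothetical, and the rule shown (flag a panel whose mean intensity exceeds the array mean by more than k standard deviations) is only one plausible reading of a rule "based on the mean intensity and standard deviation range".

```python
from statistics import mean, pstdev

def flag_hot_panels(panel_means, k=2.0):
    """Return indices of panels hotter than mean + k * std of the given group.

    k is a hypothetical threshold multiplier, not a value from the record.
    """
    mu = mean(panel_means)
    sigma = pstdev(panel_means)
    return [i for i, v in enumerate(panel_means) if v > mu + k * sigma]

# Made-up mean intensities per panel; the "far" array reads cooler overall
# because it was imaged from a larger distance.
arrays = {
    "near": [30.1, 30.4, 29.9, 30.2, 36.0, 30.0, 30.3, 30.1],
    "far": [24.0, 24.2, 23.9, 28.5, 24.1, 24.0, 24.3, 23.8],
}

# Local rule: statistics are computed separately for each array.
local_flags = {name: flag_hot_panels(vals) for name, vals in arrays.items()}

# Global rule: statistics are pooled over all panels of both arrays.
global_flags = flag_hot_panels(arrays["near"] + arrays["far"])
```

With these numbers the local rule flags one panel in each array, while the pooled rule misses the hot panel in the cooler "far" array, which is the effect the abstract attributes to sensor-to-target distance.
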
Eddy current loss analysis of open-slot fault-tolerant permanent-magnet machines based on conformal mapping method

NASA Astrophysics Data System (ADS)

Ji, Jinghua; Luo, Jianhua; Lei, Qian; Bian, Fangfang

2017-05-01

This paper proposes an analytical method, based on the conformal mapping (CM) method, for the accurate evaluation of the magnetic field and eddy current (EC) loss in fault-tolerant permanent-magnet (FTPM) machines. The aim of the modulation function applied in the CM method is to change the open-slot structure into a fully closed-slot structure, whose air-gap flux density is easy to calculate analytically. With the help of the Matlab Schwarz-Christoffel (SC) Toolbox, both the magnetic flux density and the EC density of the FTPM machine are then obtained accurately. Finally, the time-stepped transient finite-element method (FEM) is used to verify the theoretical analysis, showing that the proposed method is able to predict the magnetic flux density and EC loss precisely.
Fault Diagnostics for Turbo-Shaft Engine Sensors Based on a Simplified On-Board Model

PubMed Central

Lu, Feng; Huang, Jinquan; Xing, Yaodong

2012-01-01

Combining a simplified on-board turbo-shaft model with sensor fault diagnostic logic, a model-based sensor fault diagnosis method is proposed. The existing fault diagnosis method for key turbo-shaft engine sensors is mainly based on a dual-redundancy technique, which in some situations is inadequate because it cannot judge which channel is at fault. The simplified on-board model provides an analytical third channel against which the dual channel measurements are compared, whereas additional hardware redundancy would increase structural complexity and weight. The simplified turbo-shaft model contains the gas generator model and the power turbine model with loads, and is built using the dynamic parameters method. Sensor fault detection and diagnosis (FDD) logic is designed, and two types of sensor failures, step faults and drift faults, are simulated. When the discrepancy among the triplex channels exceeds a tolerance level, the fault diagnosis logic determines the cause of the difference. Through this approach, the sensor fault diagnosis system achieves the objectives of anomaly detection, sensor fault diagnosis and redundancy recovery. Finally, experiments on this method are carried out on a turbo-shaft engine, and two types of faults under different channel combinations are presented. The experimental results show that the proposed method for sensor fault diagnostics is efficient. PMID:23112645
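
The idea of using a model output as a third, analytical channel can be sketched as follows. This is an illustrative sketch only: the function diagnose, its status labels, the tolerance value, and the readings are hypothetical, and the record does not describe its FDD logic at this level of detail.

```python
def diagnose(ch_a, ch_b, model, tol):
    """Compare two hardware channels with a model estimate.

    Returns (status, value_to_use). tol is a hypothetical tolerance level.
    """
    a_b = abs(ch_a - ch_b) <= tol
    a_m = abs(ch_a - model) <= tol
    b_m = abs(ch_b - model) <= tol
    if a_b:
        # The two sensors agree, so the model is not needed as a tie-breaker.
        return "ok", (ch_a + ch_b) / 2
    if a_m and not b_m:
        # Channel A is backed by the model; channel B is the outlier.
        return "fault_b", ch_a
    if b_m and not a_m:
        return "fault_a", ch_b
    # The model cannot break the tie (it agrees with both or with neither).
    return "undetermined", model

healthy = diagnose(100.0, 100.5, 100.25, tol=1.0)
step_fault = diagnose(100.0, 120.0, 100.5, tol=1.0)
```

With only the two hardware channels, the second call could detect a disagreement but not tell which sensor failed; the model estimate supplies the missing vote.
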
The Seismicity of the Central Apennines Region Studied by Means of a Physics-Based Earthquake Simulator

NASA Astrophysics Data System (ADS)

Console, R.; Vannoli, P.; Carluccio, R.

2016-12-01

The application of a physics-based earthquake simulation algorithm to the central Apennines region, where the 24 August 2016 Amatrice earthquake occurred, allowed the compilation of a synthetic seismic catalog lasting 100 ky, and containing more than 500,000 M ≥ 4.0 events, without the limitations that real catalogs suffer in terms of completeness, homogeneity and time duration. The algorithm on which this simulator is based is constrained by several physical elements as: (a) an average slip rate for every single fault in the investigated fault systems, (b) the process of rupture growth and termination, leading to a self-organized earthquake magnitude distribution, and (c) interaction between earthquake sources, including small magnitude events. Events nucleated in one fault are allowed to expand into neighboring faults, even belonging to a different fault system, if they are separated by less than a given maximum distance. The seismogenic model upon which we applied the simulator code, was derived from the DISS 3.2.0 database (http://diss.rm.ingv.it/diss/), selecting all the fault systems that are recognized in the central Apennines region, for a total of 24 fault systems. The application of our simulation algorithm provides typical features in time, space and magnitude behavior of the seismicity, which are comparable with those of real observations. These features include long-term periodicity and clustering of strong earthquakes, and a realistic earthquake magnitude distribution departing from the linear Gutenberg-Richter distribution in the moderate and higher magnitude range. The statistical distribution of earthquakes with M ≥ 6.0 on single faults exhibits a fairly clear pseudo-periodic behavior, with a coefficient of variation Cv of the order of 0.3-0.6. We found in our synthetic catalog a clear trend of long-term acceleration of seismic activity preceding M ≥ 6.0 earthquakes and quiescence following those earthquakes. 
Lastly, as an example of a possible use of synthetic catalogs, an attenuation law was applied to all the events reported in the synthetic catalog for the production of maps showing the exceedence probability of given values of peak acceleration (PGA) on the territory under investigation.
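
The coefficient of variation quoted above is the standard deviation of the inter-event times divided by their mean: Cv = 0 for strictly periodic recurrence and Cv = 1 for a Poisson process. A minimal sketch follows, assuming made-up event times in years and a population standard deviation; the record does not say which estimator the authors used.

```python
from statistics import mean, pstdev

def coefficient_of_variation(event_times):
    """Cv of the inter-event times of one fault (population std / mean)."""
    times = sorted(event_times)
    intervals = [b - a for a, b in zip(times, times[1:])]
    return pstdev(intervals) / mean(intervals)

# Hypothetical occurrence times (years) of strong events on a single fault.
event_times = [0.0, 900.0, 2100.0, 3000.0, 4200.0]
cv = coefficient_of_variation(event_times)
```
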
Evaluation of fault-tolerant parallel-processor architectures over long space missions

NASA Technical Reports Server (NTRS)

Johnson, Sally C.

1989-01-01

The impact of a five year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with 0.99 probability and that the probability of system failure during one-half hour of full operation be less than 10^-7. The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP) is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration.
Propulsion Health Monitoring for Enhanced Safety

NASA Technical Reports Server (NTRS)

Butz, Mark G.; Rodriguez, Hector M.

2003-01-01

This report presents the results of the NASA contract Propulsion System Health Management for Enhanced Safety performed by General Electric Aircraft Engines (GE AE), General Electric Global Research (GE GR), and Pennsylvania State University Applied Research Laboratory (PSU ARL) under the NASA Aviation Safety Program. This activity supports the overall goal of enhanced civil aviation safety through a reduction in the occurrence of safety-significant propulsion system malfunctions. Specific objectives are to develop and demonstrate vibration diagnostics techniques for the on-line detection of turbine rotor disk cracks, and model-based fault tolerant control techniques for the prevention and mitigation of in-flight engine shutdown, surge/stall, and flameout events. The disk crack detection work was performed by GE GR which focused on a radial-mode vibration monitoring technique, and PSU ARL which focused on a torsional-mode vibration monitoring technique. GE AE performed the Model-Based Fault Tolerant Control work which focused on the development of analytical techniques for detecting, isolating, and accommodating gas-path faults.
Wavelet subspace decomposition of thermal infrared images for defect detection in artworks

NASA Astrophysics Data System (ADS)

Ahmad, M. Z.; Khan, A. A.; Mezghani, S.; Perrin, E.; Mouhoubi, K.; Bodnar, J. L.; Vrabie, V.

2016-07-01

The health of ancient artworks must be routinely monitored for their adequate preservation. Faults in these artworks may develop over time and must be identified as precisely as possible. The classical acoustic testing techniques, being invasive, risk causing permanent damage during periodic inspections. Infrared thermometry offers a promising solution to map faults in artworks. It involves heating the artwork and recording its thermal response using an infrared camera. A novel strategy based on the pseudo-random binary excitation principle is used in this work to suppress the risks associated with prolonged heating. The objective of this work is to develop an automatic scheme for detecting faults in the captured images. An efficient scheme based on wavelet-based subspace decomposition is developed which favors identification of the weaker, otherwise invisible faults. Two major problems addressed in this work are the selection of the optimal wavelet basis and the subspace level selection. A novel criterion based on regional mutual information is proposed for the latter. The approach is successfully tested on a laboratory-based sample as well as real artworks. A new contrast enhancement metric is developed to demonstrate the quantitative efficiency of the algorithm. The algorithm is successfully deployed for both laboratory-based and real artworks.
Fault Identification by Unsupervised Learning Algorithm

NASA Astrophysics Data System (ADS)

Nandan, S.; Mannu, U.

2012-12-01

Contemporary fault identification techniques predominantly rely on the surface expression of the fault. This biased observation is inadequate to yield detailed fault structures in areas with surface cover such as cities, deserts, and vegetation, or to capture changes in fault patterns with depth. Furthermore, it is difficult to estimate the structure of faults that do not generate any surface rupture. Many disastrous events have been attributed to these blind faults. Faults and earthquakes are very closely related, as earthquakes occur on faults and faults grow by accumulation of coseismic rupture. For a better seismic risk evaluation it is imperative to recognize and map these faults. We implement a novel approach to identify seismically active fault planes from the three-dimensional hypocenter distribution by making use of unsupervised learning algorithms. We employ the K-means clustering algorithm and the Expectation Maximization (EM) algorithm, modified to identify planar structures in the spatial distribution of hypocenters after filtering out isolated events. We examine the differences in the faults reconstructed by deterministic assignment in K-means and probabilistic assignment in the EM algorithm. The method is conceptually identical to methodologies developed by Ouillon et al. (2008, 2010) and has been extensively tested on synthetic data. We determined the sensitivity of the methodology to uncertainties in hypocenter location, density of clustering and cross-cutting fault structures. The method has been applied to datasets from two contrasting regions. While Kumaon Himalaya is a convergent plate boundary, Koyna-Warna lies in the middle of the Indian Plate but has a history of triggered seismicity. The reconstructed faults were validated by examining the fault orientation of mapped faults and the focal mechanism of these events determined through waveform inversion. 
The reconstructed faults could be used to solve the fault plane ambiguity in focal mechanism determination and constrain the fault orientations for finite source inversions. The faults produced by the method exhibited good correlation with the fault planes obtained by focal mechanism solutions and previously mapped faults.
General linear codes for fault-tolerant matrix operations on processor arrays

NASA Technical Reports Server (NTRS)

Nair, V. S. S.; Abraham, J. A.

1988-01-01

Various checksum codes have been suggested for fault-tolerant matrix computations on processor arrays. Use of these codes is limited due to potential roundoff and overflow errors. Numerical errors may also be misconstrued as errors due to physical faults in the system. In this paper, a set of linear codes is identified which can be used for fault-tolerant matrix operations such as matrix addition, multiplication, transposition, and LU-decomposition, with minimum numerical error. Encoding schemes are given for some of the example codes which fall under the general set of codes. With the help of experiments, a rule of thumb for the selection of a particular code for a given application is derived.
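
As background for the checksum codes mentioned above, the following minimal sketch shows the simple sum-checksum scheme for matrix multiplication. It is not one of the general linear codes identified in the record; the helper names and the tolerance tol are hypothetical. The tolerance stands in for the roundoff problem the abstract raises: set too tight, numerical error is reported as a fault; set too loose, small faults go unnoticed.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add_column_checksum(a):
    """Append one row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def add_row_checksum(b):
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]

def find_mismatches(c_full, tol=1e-9):
    """Return (bad_rows, bad_cols) of the data block whose checksums disagree.

    tol is a hypothetical allowance for floating-point roundoff.
    """
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    return bad_rows, bad_cols

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

# The product of the encoded operands carries both row and column checksums.
c_full = matmul(add_column_checksum(a), add_row_checksum(b))
clean = find_mismatches(c_full)

c_full[0][1] += 1.0  # inject one simulated fault into the result
faulty = find_mismatches(c_full)
```

A single corrupted element shows up as exactly one inconsistent row and one inconsistent column, so their intersection locates it and the checksum difference gives the correction.
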
Fault-tolerant, high-level quantum circuits: form, compilation and description

NASA Astrophysics Data System (ADS)

Paler, Alexandru; Polian, Ilia; Nemoto, Kae; Devitt, Simon J.

2017-06-01

Fault-tolerant quantum error correction is a necessity for any quantum architecture destined to tackle interesting, large-scale problems. Its theoretical formalism has been well founded for nearly two decades. However, we still do not have an appropriate compiler to produce a fault-tolerant, error-corrected description from a higher-level quantum circuit for state-of-the-art hardware models. There are many technical hurdles, including dynamic circuit constructions that occur when constructing fault-tolerant circuits with commonly used error correcting codes. We introduce a package that converts high-level quantum circuits consisting of commonly used gates into a form employing all decompositions and ancillary protocols needed for fault-tolerant error correction. We call this form the (I)nitialisation, (C)NOT, (M)easurement (ICM) form; it consists of an initialisation layer of qubits into one of four distinct states, a massive, deterministic array of CNOT operations and a series of time-ordered X- or Z-basis measurements. The form allows a more flexible approach towards circuit optimisation. At the same time, the package outputs a standard circuit or a canonical geometric description which is a necessity for operating current state-of-the-art hardware architectures using topological quantum codes.
The environmental control and life support system advanced automation project. Phase 1: Application evaluation

NASA Technical Reports Server (NTRS)

Dewberry, Brandon S.

1990-01-01

The Environmental Control and Life Support System (ECLSS) is a Freedom Station distributed system with inherent applicability to advanced automation primarily due to the comparatively large reaction times of its subsystem processes. This allows longer contemplation times in which to form a more intelligent control strategy and to detect or prevent faults. The objective of the ECLSS Advanced Automation Project is to reduce the flight and ground manpower needed to support the initial and evolutionary ECLS system. The approach is to search out and make apparent those processes in the baseline system which are in need of more automatic control and fault detection strategies, to influence the ECLSS design by suggesting software hooks and hardware scars which will allow easy adaptation to advanced algorithms, and to develop complex software prototypes which fit into the ECLSS software architecture and will be shown in an ECLSS hardware testbed to increase the autonomy of the system. Covered here are the preliminary investigation and evaluation process, aimed at searching the ECLSS for candidate functions for automation and providing a software hooks and hardware scars analysis. This analysis shows changes needed in the baselined system for easy accommodation of knowledge-based or other complex implementations which, when integrated in flight or ground sustaining engineering architectures, will produce a more autonomous and fault tolerant Environmental Control and Life Support System.
Analysis on Behaviour of Wavelet Coefficient during Fault Occurrence in Transformer

NASA Astrophysics Data System (ADS)

Sreewirote, Bancha; Ngaopitakkul, Atthapol

2018-03-01

The protection system of a transformer plays a significant role in avoiding severe damage to equipment when disturbances occur and in ensuring overall system reliability. One methodology widely used in protection schemes and algorithms is the discrete wavelet transform. However, the characteristics of the coefficients under fault conditions must be analyzed to ensure its effectiveness. This paper therefore studies and analyzes the characteristics of the wavelet coefficients, in both the high- and low-frequency components of the discrete wavelet transform, when a fault occurs in a transformer. The effect of internal and external faults on the wavelet coefficients of both faulted and normal phases is taken into consideration. The fault signals were simulated using a laboratory-scale experimental setup of a transmission system connected to a transformer, modelled after an actual system. The results show a clear difference between the wavelet characteristics of the high- and low-frequency components, which can be used to further design and improve detection and classification algorithms based on the discrete wavelet transform.
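
To make the high- and low-frequency components concrete, here is a minimal sketch of one decomposition level of a discrete wavelet transform. The Haar wavelet and the sample signal are assumptions chosen for brevity; the record does not state which mother wavelet or sampling setup was used.

```python
import math

def haar_level1(signal):
    """One level of the Haar DWT on an even-length signal.

    Returns (approx, detail): the low- and high-frequency coefficients.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail

# Hypothetical samples with an abrupt change between indices 4 and 5.
signal = [1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
approx, detail = haar_level1(signal)

# The detail coefficient with the largest magnitude marks the disturbance.
peak = max(range(len(detail)), key=lambda i: abs(detail[i]))
```

The detail coefficients are zero wherever the signal is flat and non-zero only for the sample pair that straddles the jump. A jump that falls exactly between two pairs would not appear at this level with the Haar wavelet, which is one reason practical schemes look at several levels or use longer wavelets.
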
Graph-based real-time fault diagnostics

NASA Technical Reports Server (NTRS)

Padalkar, S.; Karsai, G.; Sztipanovits, J.

1988-01-01

A real-time fault detection and diagnosis capability is absolutely crucial in the design of large-scale space systems. Some of the existing AI-based fault diagnostic techniques like expert systems and qualitative modelling are frequently ill-suited for this purpose. Expert systems are often inadequately structured, difficult to validate and suffer from knowledge acquisition bottlenecks. Qualitative modelling techniques sometimes generate a large number of failure source alternatives, thus hampering speedy diagnosis. In this paper we present a graph-based technique which is well suited for real-time fault diagnosis, structured knowledge representation and acquisition and testing and validation. A Hierarchical Fault Model of the system to be diagnosed is developed. At each level of hierarchy, there exist fault propagation digraphs denoting causal relations between failure modes of subsystems. The edges of such a digraph are weighted with fault propagation time intervals. Efficient and restartable graph algorithms are used for on-line speedy identification of failure source components.
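
A fault propagation digraph with time-interval weights can be sketched as below. This is a simplified illustration, not the hierarchical model or the restartable algorithms of the record: the graph, the node names, the alarm times, and the consistency test are all hypothetical. The sketch assumes that a failure reaches a node no earlier than the shortest path over minimum delays and no later than the shortest path over maximum delays, and it accepts a node as a candidate source if some onset time explains every observed alarm.

```python
import heapq

def shortest(graph, source, idx):
    """Dijkstra using bound idx of each edge (0 = min delay, 1 = max delay)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, bounds in graph.get(u, {}).items():
            nd = d + bounds[idx]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def candidate_sources(graph, alarms):
    """Nodes whose failure could explain all alarms for some onset time."""
    nodes = set(graph) | {v for edges in graph.values() for v in edges}
    result = []
    for s in sorted(nodes):
        early = shortest(graph, s, 0)
        late = shortest(graph, s, 1)
        if any(v not in early for v in alarms):
            continue  # some alarmed node is unreachable from s
        # The onset time t0 must satisfy t - late[v] <= t0 <= t - early[v].
        lo = max(t - late[v] for v, t in alarms.items())
        hi = min(t - early[v] for v, t in alarms.items())
        if lo <= hi:
            result.append(s)
    return result

# Hypothetical failure modes; edge weights are (min, max) propagation times.
graph = {
    "pump": {"pressure_low": (1, 3), "temp_high": (4, 6)},
    "valve": {"pressure_low": (0, 1)},
    "pressure_low": {"flow_low": (2, 4)},
}
alarms = {"pressure_low": 10.0, "flow_low": 13.0, "temp_high": 14.0}
sources = candidate_sources(graph, alarms)
```

With all three alarms present only "pump" remains a candidate; dropping the "temp_high" alarm leaves three candidates, which shows how the timing and reachability constraints narrow the set of failure sources.
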
Framework for a space shuttle main engine health monitoring system

NASA Technical Reports Server (NTRS)

Hawman, Michael W.; Galinaitis, William S.; Tulpule, Sharayu; Mattedi, Anita K.; Kamenetz, Jeffrey

1990-01-01

A framework developed for a health management system (HMS) which is directed at improving the safety of operation of the Space Shuttle Main Engine (SSME) is summarized. An emphasis was placed on near term technology through requirements to use existing SSME instrumentation and to demonstrate the HMS during SSME ground tests within five years. The HMS framework was developed through an analysis of SSME failure modes, fault detection algorithms, sensor technologies, and hardware architectures. A key feature of the HMS framework design is that a clear path from the ground test system to a flight HMS was maintained. Fault detection techniques based on time series, nonlinear regression, and clustering algorithms were developed and demonstrated on data from SSME ground test failures. The fault detection algorithms exhibited 100 percent detection of faults, had an extremely low false alarm rate, and were robust to sensor loss. These algorithms were incorporated into a hierarchical decision making strategy for overall assessment of SSME health. A preliminary design for a hardware architecture capable of supporting real time operation of the HMS functions was developed. Utilizing modular, commercial off-the-shelf components produced a reliable low cost design with the flexibility to incorporate advances in algorithm and sensor technology as they become available.
Fault tolerant linear actuator

DOEpatents

Tesar, Delbert

2004-09-14

In varying embodiments, the fault tolerant linear actuator of the present invention is a new and improved linear actuator with fault tolerance and positional control that may incorporate velocity summing, force summing, or a combination of the two. In one embodiment, the invention offers a velocity summing arrangement with a differential gear between two prime movers driving a cage, which then drives a linear spindle screw transmission. Other embodiments feature two prime movers driving separate linear spindle screw transmissions, one internal and one external, in a totally concentric and compact integrated module.
Sinusoidal synthesis based adaptive tracking for rotating machinery fault detection

NASA Astrophysics Data System (ADS)

Li, Gang; McDonald, Geoff L.; Zhao, Qing

2017-01-01

This paper presents a novel Sinusoidal Synthesis Based Adaptive Tracking (SSBAT) technique for vibration-based rotating machinery fault detection. The proposed SSBAT algorithm is an adaptive time series technique that makes use of both frequency and time domain information of vibration signals. Such information is incorporated in a time varying dynamic model. Signal tracking is then realized by applying adaptive sinusoidal synthesis to the vibration signal. A modified Least-Squares (LS) method is adopted to estimate the model parameters. In addition to tracking, the proposed vibration synthesis model is mainly used as a linear time-varying predictor. The health condition of the rotating machine is monitored by checking the residual between the predicted and measured signal. The SSBAT method takes advantage of the sinusoidal nature of vibration signals and transfers the nonlinear problem into a linear adaptive problem in the time domain based on a state-space realization. It has low computation burden and does not need a priori knowledge of the machine under the no-fault condition which makes the algorithm ideal for on-line fault detection. The method is validated using both numerical simulation and practical application data. Meanwhile, the fault detection results are compared with the commonly adopted autoregressive (AR) and autoregressive Minimum Entropy Deconvolution (ARMED) method to verify the feasibility and performance of the SSBAT method.
The scientific data acquisition system of the GAMMA-400 space project

NASA Astrophysics Data System (ADS)

Bobkov, S. G.; Serdin, O. V.; Gorbunov, M. S.; Arkhangelskiy, A. I.; Topchiev, N. P.

2016-02-01

A description of the scientific data acquisition system (SDAS) designed by SRISA for the GAMMA-400 space project is presented. We consider the problem of unifying electronics at different levels: the set of reliable fault-tolerant integrated circuits fabricated in Silicon-on-Insulator 0.25 µm CMOS technology, and the high-speed interfaces and reliable modules used in the space instruments. The characteristics of the reliable fault-tolerant very large scale integration (VLSI) technology designed by SRISA for developing computing systems for space applications are considered. The scalable network structure of the SDAS, based on the Serial RapidIO interface and including the real-time operating system BAGET, is also described.
Lambda network having 2^(m-1) nodes in each of m stages with each node coupled to four other nodes for bidirectional routing of data packets between nodes

DOEpatents

Napolitano, L.M. Jr.

1995-11-28

The Lambda network is a single stage, packet-switched interprocessor communication network for a distributed memory, parallel processor computer. Its design arises from the desired network characteristics of minimizing mean and maximum packet transfer time, local routing, expandability, deadlock avoidance, and fault tolerance. The network is based on fixed degree nodes and has mean and maximum packet transfer distances where n is the number of processors. The routing method is detailed, as are methods for expandability, deadlock avoidance, and fault tolerance. 14 figs.
The Development of Design Tools for Fault Tolerant Quantum Dot Cellular Automata Based Logic

NASA Technical Reports Server (NTRS)

Armstrong, Curtis D.; Humphreys, William M.

2003-01-01

We are developing software to explore the fault tolerance of quantum dot cellular automata gate architectures in the presence of manufacturing variations and device defects. The Topology Optimization Methodology using Applied Statistics (TOMAS) framework extends the capabilities of A Quantum Interconnected Network Array Simulator (AQUINAS) by adding front-end and back-end software and creating an environment that integrates all of these components. The front-end tools establish all simulation parameters, configure the simulation system, automate the Monte Carlo generation of simulation files, and execute the simulation of these files. The back-end tools perform automated data parsing, statistical analysis and report generation.
