Science.gov

Sample records for fault-tolerant non-linear analytic

  1. Fault-tolerant flight control system combining expert system and analytical redundancy concepts

    NASA Technical Reports Server (NTRS)

    Handelman, Dave

    1987-01-01

    This research involves the development of a knowledge-based fault-tolerant flight control system. A software architecture is presented that integrates quantitative analytical redundancy techniques and heuristic expert system problem solving concepts for the purpose of in-flight, real-time failure accommodation.

  2. Formal specification of requirements for analytical redundancy-based fault-tolerant flight control systems

    NASA Astrophysics Data System (ADS)

    Del Gobbo, Diego

    2000-10-01

    Flight control systems are undergoing a rapid process of automation. The use of Fly-By-Wire digital flight control systems in commercial aviation (Airbus 320 and Boeing FBW-B777) is a clear sign of this trend. The increased automation goes in parallel with an increased complexity of flight control systems, with obvious consequences for reliability and safety. Flight control systems must meet strict fault-tolerance requirements. The standard solution for achieving fault tolerance relies on multi-string architectures. On the other hand, multi-string architectures further increase the complexity of the system, inducing a reduction in overall reliability. In the past two decades a variety of techniques based on analytical redundancy have been suggested for fault diagnosis purposes. While research on analytical redundancy has obtained desirable results, a design methodology involving requirements specification and feasibility analysis of analytical redundancy based fault-tolerant flight control systems is missing. The main objective of this research work is to describe within a formal framework the implications of adopting analytical redundancy as a basis for achieving fault tolerance. The research activity involves analysis of the analytical redundancy approach, analysis of flight control system informal requirements, and re-engineering (modeling and specification) of the fault tolerance requirements. The USAF military specification MIL-F-9490D and supporting documents are adopted as the source for the flight control informal requirements. The De Havilland DHC-2 general aviation aircraft equipped with standard autopilot control functions is adopted as the pilot application. Relational algebra is adopted as the formal framework for the specification of the requirements. The detailed analysis and formalization of the requirements resulted in a better definition of the fault tolerance problem in the framework of analytical redundancy. Fault tolerance requirements and related

  3. Rapid Non-Linear Uncertainty Propagation via Analytical Techniques

    NASA Astrophysics Data System (ADS)

    Fujimoto, K.; Scheeres, D. J.

    2012-09-01

    Space situational awareness (SSA) is known to be a data-starved problem compared to traditional estimation problems, in that observation gaps per object may span days if not weeks. Therefore, consistent characterization of the uncertainty associated with these objects, including non-linear effects, is crucial in maintaining an accurate catalog of objects in Earth orbit. At the same time, the motion of satellites in Earth orbit is well modeled and is particularly amenable to having its solution and its uncertainty described through analytic or semi-analytic techniques. Even when stronger non-gravitational perturbations such as solar radiation pressure and atmospheric drag are encountered, these perturbations generally have deterministic components that are substantially larger than their time-varying stochastic components. Analytic techniques are powerful because time propagation is only a matter of changing the time parameter, allowing for rapid computational turnaround. These two ideas are combined in this paper: a method of analytically propagating non-linear orbit uncertainties is discussed. In particular, the uncertainty is expressed as an analytic probability density function (pdf) for all time. For a deterministic system model, such pdfs may be obtained if the initial pdf and the system states for all time are also given analytically. Even when closed-form solutions are not available, approximate solutions exist in the form of Edgeworth series for pdfs and Taylor series for the states. The coefficients of the latter expansion are referred to as state transition tensors (STTs), which are a generalization of state transition matrices to arbitrary order. Analytically expressed pdfs can be incorporated in many practical tasks in SSA. One can compute the mean and covariance of the uncertainty, for example, with the moments of the initial pdf as inputs. This process does not involve any sampling and its accuracy can be determined a priori. Analytical
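
    As a rough illustration of the state transition tensor idea described above (a minimal numerical sketch, not the authors' semi-analytic implementation), the following Python fragment propagates a Gaussian mean and covariance through a hypothetical non-linear map using first- and second-order tensors estimated by finite differences. The toy map and all numerical values are assumptions made only for the example.

      import numpy as np

      def stt_first_second(f, x0, h=1e-4):
          """Estimate first- and second-order state transition tensors of f at x0 by central differences."""
          n = x0.size
          phi1 = np.zeros((n, n))
          phi2 = np.zeros((n, n, n))
          for j in range(n):
              e_j = np.zeros(n); e_j[j] = h
              phi1[:, j] = (f(x0 + e_j) - f(x0 - e_j)) / (2 * h)
          for j in range(n):
              for k in range(n):
                  e_j = np.zeros(n); e_j[j] = h
                  e_k = np.zeros(n); e_k[k] = h
                  phi2[:, j, k] = (f(x0 + e_j + e_k) - f(x0 + e_j - e_k)
                                   - f(x0 - e_j + e_k) + f(x0 - e_j - e_k)) / (4 * h * h)
          return phi1, phi2

      def propagate_mean_cov(f, m0, P0):
          """Second-order mean and first-order covariance propagation through f (Gaussian input assumed)."""
          phi1, phi2 = stt_first_second(f, m0)
          m1 = f(m0) + 0.5 * np.einsum('ijk,jk->i', phi2, P0)   # second-order mean correction
          P1 = phi1 @ P0 @ phi1.T                               # linear (first-order) covariance mapping
          return m1, P1

      def f(x):
          # toy twist map: rotation by an amplitude-dependent angle (stands in for orbital dynamics)
          r = np.hypot(x[0], x[1])
          th = 0.3 * r
          c, s = np.cos(th), np.sin(th)
          return np.array([c * x[0] - s * x[1], s * x[0] + c * x[1]])

      m0 = np.array([1.0, 0.0])
      P0 = np.diag([1e-2, 4e-2])
      print(propagate_mean_cov(f, m0, P0))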

  4. Intelligent fault-tolerant controllers

    NASA Technical Reports Server (NTRS)

    Huang, Chien Y.

    1987-01-01

    A system with fault tolerant controls is one that can detect, isolate, and estimate failures and perform necessary control reconfiguration based on this new information. Artificial intelligence (AI) is concerned with semantic processing, and it has evolved to include the topics of expert systems and machine learning. This research represents an attempt to apply AI to fault tolerant controls, hence, the name intelligent fault tolerant control (IFTC). A generic solution to the problem is sought, providing a system based on logic in addition to analytical tools, and offering machine learning capabilities. The advantages are that redundant system specific algorithms are no longer needed, that reasonableness is used to quickly choose the correct control strategy, and that the system can adapt to new situations by learning about its effects on system dynamics.

  5. An aircraft sensor fault tolerant system

    NASA Technical Reports Server (NTRS)

    Caglayan, A. K.; Lancraft, R. E.

    1982-01-01

    The design of a sensor fault tolerant system which uses analytical redundancy for the Terminal Configured Vehicle (TCV) research aircraft in a Microwave Landing System (MLS) environment was studied. The fault tolerant system provides reliable estimates for aircraft position, velocity, and attitude in the presence of possible failures in navigation aid instruments and onboard sensors. The estimates, provided by the fault tolerant system, are used by the automated guidance and control system to land the aircraft along a prescribed path. Sensor failures are identified by utilizing the analytic relationship between the various sensor outputs arising from the aircraft equations of motion.
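
    To make the analytic-redundancy idea concrete, here is a minimal sketch (not the TCV/MLS implementation) of a residual test between two kinematically related channels: a velocity estimate differentiated from a position sensor is compared against a velocity sensor, and a persistent disagreement flags a failure. The signals, noise levels and threshold are illustrative assumptions.

      import numpy as np

      def detect_velocity_fault(position, velocity, dt, threshold):
          """Flag samples where the differentiated position channel disagrees with the velocity sensor."""
          v_from_pos = np.gradient(position, dt)        # analytic relation between the two sensors: v = dx/dt
          residual = v_from_pos - velocity
          return np.abs(residual) > threshold, residual

      rng = np.random.default_rng(1)
      dt = 0.1
      t = np.arange(0.0, 20.0, dt)
      true_v = 2.0 + 0.5 * np.sin(0.3 * t)
      true_x = np.cumsum(true_v) * dt
      pos_meas = true_x + rng.normal(0.0, 0.02, t.size)
      vel_meas = true_v + rng.normal(0.0, 0.05, t.size)
      vel_meas[120:] += 1.5                              # injected velocity-sensor bias failure
      flags, residual = detect_velocity_fault(pos_meas, vel_meas, dt, threshold=1.0)
      print("first flagged sample:", int(np.argmax(flags)))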

  6. Fault tolerant control of spacecraft

    NASA Astrophysics Data System (ADS)

    Godard

    Autonomous multiple-spacecraft formation flying space missions demand the development of reliable control systems to ensure rapid, accurate, and effective response to various attitude and formation reconfiguration commands. Keeping in mind the complexities involved in the technology development to enable spacecraft formation flying, this thesis presents the development and validation of a fault tolerant control algorithm that augments the AOCS on board a spacecraft to ensure that these challenging formation flying missions will fly successfully. Taking inspiration from the existing theory of nonlinear control, a fault-tolerant control system for the RyePicoSat missions is designed to cope with actuator faults whilst maintaining the desirable degree of overall stability and performance. An autonomous fault tolerant adaptive control scheme for spacecraft equipped with redundant actuators and robust control of spacecraft in an underactuated configuration represent the two central themes of this thesis. The developed algorithms are validated using a hardware-in-the-loop simulation. A reaction wheel testbed is used to validate the proposed fault tolerant attitude control scheme. A spacecraft formation flying experimental testbed is used to verify the performance of the proposed robust control scheme for underactuated spacecraft configurations. The proposed underactuated formation flying concept leads to more than 60% savings in fuel consumption when compared to a fully actuated spacecraft formation configuration. We also developed a novel attitude control methodology that requires only a single thruster to stabilize the three-axis attitude and angular velocity components of a spacecraft. Numerical simulations and hardware-in-the-loop experimental results, along with a rigorous analytical stability analysis, show that the proposed methodology will greatly enhance the reliability of the spacecraft, while allowing for potentially significant overall mission cost reduction.

  7. Fault tolerant linear actuator

    DOEpatents

    Tesar, Delbert

    2004-09-14

    In varying embodiments, the fault tolerant linear actuator of the present invention is a new and improved linear actuator with fault tolerance and positional control that may incorporate velocity summing, force summing, or a combination of the two. In one embodiment, the invention offers a velocity summing arrangement with a differential gear between two prime movers driving a cage, which then drives a linear spindle screw transmission. Other embodiments feature two prime movers driving separate linear spindle screw transmissions, one internal and one external, in a totally concentric and compact integrated module.

  8. Validated Fault Tolerant Architectures for Space Station

    NASA Technical Reports Server (NTRS)

    Lala, Jaynarayan H.

    1990-01-01

    Viewgraphs on validated fault tolerant architectures for space station are presented. Topics covered include: fault tolerance approach; advanced information processing system (AIPS); and fault tolerant parallel processor (FTPP).

  9. Design and validation of fault-tolerant flight systems

    NASA Technical Reports Server (NTRS)

    Finelli, George B.; Palumbo, Daniel L.

    1987-01-01

    NASA has undertaken the development of a methodology for the design of easily validated fault-tolerant systems which emphasizes validation processes that can be directly incorporated into the design process. Attention is presently given to the statistical issues arising in the validation of highly reliable fault-tolerant systems. Structured specification and design methodologies, mathematical proof techniques, analytical modeling, simulation/emulation, and physical testing are all discussed. Important design factors associated with fault tolerance are noted; synchronization and 'Byzantine resilience' must accompany fault tolerance.

  10. Approximate Analytical Solutions for Primary Chatter in the Non-Linear Metal Cutting Model

    NASA Astrophysics Data System (ADS)

    Warmiński, J.; Litak, G.; Cartmell, M. P.; Khanin, R.; Wiercigroch, M.

    2003-01-01

    This paper considers an accepted model of the metal cutting process dynamics in the context of an approximate analysis of the resulting non-linear differential equations of motion. The process model is based upon the established mechanics of orthogonal cutting and results in a pair of non-linear ordinary differential equations which are then restated in a form suitable for approximate analytical solution. The chosen solution technique is the perturbation method of multiple time scales and approximate closed-form solutions are generated for the most important non-resonant case. Numerical data are then substituted into the analytical solutions and key results are obtained and presented. Some comparisons between the exact numerical calculations for the forces involved and their reduced and simplified analytical counterparts are given. It is shown that there is almost no discernible difference between the two thus confirming the validity of the excitation functions adopted in the analysis for the data sets used, these being chosen to represent a real orthogonal cutting process. In an attempt to provide guidance for the selection of technological parameters for the avoidance of primary chatter, this paper determines for the first time the stability regions in terms of the depth of cut and the cutting speed co-ordinates.

  11. Fault-tolerant processing system

    NASA Technical Reports Server (NTRS)

    Palumbo, Daniel L. (Inventor)

    1996-01-01

    A fault-tolerant, fiber optic interconnect, or backplane, serves as a via for data transfer between modules. Fault tolerance algorithms are embedded in the backplane by dividing the backplane into a read bus and a write bus and placing a redundancy management unit (RMU) between the read bus and the write bus so that all data transmitted by the write bus is subjected to the fault tolerance algorithms before the data is passed for distribution to the read bus. The RMU provides both backplane control and fault tolerance.

  12. SFT: Scalable Fault Tolerance

    SciTech Connect

    Petrini, Fabrizio; Nieplocha, Jarek; Tipparaju, Vinod

    2006-04-15

    In this paper we will present a new technology that we are currently developing within the SFT: Scalable Fault Tolerance FastOS project which seeks to implement fault tolerance at the operating system level. Major design goals include dynamic reallocation of resources to allow continuing execution in the presence of hardware failures, very high scalability, high efficiency (low overhead), and transparency—requiring no changes to user applications. Our technology is based on a global coordination mechanism, that enforces transparent recovery lines in the system, and TICK, a lightweight, incremental checkpointing software architecture implemented as a Linux kernel module. TICK is completely user-transparent and does not require any changes to user code or system libraries; it is highly responsive: an interrupt, such as a timer interrupt, can trigger a checkpoint in as little as 2.5μs; and it supports incremental and full checkpoints with minimal overhead—less than 6% with full checkpointing to disk performed as frequently as once per minute.
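
    TICK itself operates at the kernel and memory-page level; as a rough user-level analogue of incremental checkpointing (a sketch, not the SFT/TICK code), the fragment below persists only the parts of a state dictionary that changed since the previous checkpoint. File names and state layout are invented for the example.

      import copy
      import pickle

      class IncrementalCheckpointer:
          """Toy user-level analogue of incremental checkpointing: persist only state that changed."""
          def __init__(self, path_prefix):
              self.path_prefix = path_prefix
              self.last_snapshot = {}
              self.seq = 0

          def checkpoint(self, state):
              delta = {k: v for k, v in state.items() if self.last_snapshot.get(k) != v}
              removed = [k for k in self.last_snapshot if k not in state]
              fname = f"{self.path_prefix}.{self.seq}.ckpt"
              with open(fname, "wb") as fh:
                  pickle.dump({"delta": delta, "removed": removed}, fh)   # only the increment hits disk
              self.last_snapshot = copy.deepcopy(state)
              self.seq += 1
              return fname, len(delta)

      ckpt = IncrementalCheckpointer("tick_demo")
      state = {"step": 0, "x": 0.0}
      for step in range(5):
          state["step"] = step
          state["x"] += 0.1
          if step == 3:
              state["note"] = "large update"
          print(ckpt.checkpoint(state))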

  13. Fault Tolerant State Machines

    NASA Technical Reports Server (NTRS)

    Burke, Gary R.; Taft, Stephanie

    2004-01-01

    State machines are commonly used to control sequential logic in FPGAs and ASICs. An errant state machine can cause considerable damage to the device it is controlling. For example, in space applications the FPGA might be controlling pyros, which when fired at the wrong time will cause a mission failure. Even a well designed state machine can be subject to random errors as a result of SEUs from the radiation environment in space. There are various ways to encode the states of a state machine, and the type of encoding makes a large difference in the susceptibility of the state machine to radiation. In this paper we compare four methods of state machine encoding and find which method gives the best fault tolerance, as well as determining the resources needed for each method.
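
    The paper's comparison is an FPGA experiment; purely as an illustration of why encoding choice matters, the sketch below computes the minimum Hamming distance between valid state codes for three common encodings of an eight-state machine. The encodings chosen here (binary, Gray, one-hot) are assumptions for the example, not necessarily the four methods compared in the paper.

      from itertools import combinations

      def binary_enc(n_states):
          width = max(1, (n_states - 1).bit_length())
          return [format(i, f"0{width}b") for i in range(n_states)]

      def gray_enc(n_states):
          width = max(1, (n_states - 1).bit_length())
          return [format(i ^ (i >> 1), f"0{width}b") for i in range(n_states)]

      def one_hot_enc(n_states):
          return ["".join("1" if i == j else "0" for j in range(n_states)) for i in range(n_states)]

      def min_hamming(codes):
          # smallest number of bit flips that turns one valid state code into another
          return min(sum(a != b for a, b in zip(c1, c2)) for c1, c2 in combinations(codes, 2))

      for name, enc in [("binary", binary_enc), ("gray", gray_enc), ("one-hot", one_hot_enc)]:
          codes = enc(8)
          print(f"{name:8s} width={len(codes[0]):2d} min Hamming distance={min_hamming(codes)}")

    With one-hot encoding every single-bit upset produces an illegal code word (zero or two hot bits), so the upset is detectable by checking the hot-bit count, at the cost of more flip-flops; dense binary or Gray codes can silently jump to another valid state.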

  14. Robot Position Sensor Fault Tolerance

    NASA Technical Reports Server (NTRS)

    Aldridge, Hal A.

    1997-01-01

    Robot systems in critical applications, such as those in space and nuclear environments, must be able to operate during component failure to complete important tasks. One failure mode that has received little attention is the failure of joint position sensors. Current fault tolerant designs require the addition of directly redundant position sensors which can affect joint design. A new method is proposed that utilizes analytical redundancy to allow for continued operation during joint position sensor failure. Joint torque sensors are used with a virtual passive torque controller to make the robot joint stable without position feedback and improve position tracking performance in the presence of unknown link dynamics and end-effector loading. Two Cartesian accelerometer based methods are proposed to determine the position of the joint. The joint specific position determination method utilizes two triaxial accelerometers attached to the link driven by the joint with the failed position sensor. The joint specific method is not computationally complex and the position error is bounded. The system wide position determination method utilizes accelerometers distributed on different robot links and the end-effector to determine the position of sets of multiple joints. The system wide method requires fewer accelerometers than the joint specific method to make all joint position sensors fault tolerant but is more computationally complex and has lower convergence properties. Experiments were conducted on a laboratory manipulator. Both position determination methods were shown to track the actual position satisfactorily. A controller using the position determination methods and the virtual passive torque controller was able to servo the joints to a desired position during position sensor failure.

  15. Fault-tolerant rotary actuator

    DOEpatents

    Tesar, Delbert

    2006-10-17

    A fault-tolerant actuator module, in a single containment shell, containing two actuator subsystems that are either asymmetrically or symmetrically laid out is provided. Fault tolerance in the actuators of the present invention is achieved by the employment of dual sets of equal resources. Dual resources are integrated into single modules, with each having the external appearance and functionality of a single set of resources.

  16. Reinitialization issues in fault tolerant systems

    NASA Technical Reports Server (NTRS)

    Caglayan, A. K.; Lancraft, R. E.

    1983-01-01

    This paper is concerned with the reinitialization of fault tolerant systems in which detection and isolation (FDI) techniques are used, on-line, to identify and compensate for system failures. Specifically, it will focus on FDI techniques which utilize analytic redundancy, arising from a knowledge of the plant dynamics, by analyzing the residuals of a no-fail filter designed on the assumption of no failures. In these types of fault tolerant systems, system failures have to propagate through the no-fail filter dynamics in order to get detected. Therefore, the no-fail filter must be reinitialized after the isolation of a failure so that the accumulated effects of the failure are removed. In this paper, various approaches to this reinitialization problem will be discussed.
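
    A minimal one-dimensional sketch of the idea (not the paper's formulation): a "no-fail" Kalman filter monitors its normalized innovation, and when a failure is detected and isolated the filter is reinitialized so the accumulated effect of the failure is discarded. The noise levels, gate, and injected bias are illustrative assumptions; a real system would also switch to the remaining healthy sensors at this point.

      import numpy as np

      def run_no_fail_filter(measurements, r=0.04, q=1e-4, gate=9.0):
          """1-D no-fail Kalman filter with residual monitoring and reinitialization after fault isolation."""
          x, p = measurements[0], 1.0
          estimates, alarms = [], []
          for k, z in enumerate(measurements):
              p = p + q                          # time update (random-walk state model)
              s = p + r                          # innovation variance
              nu = z - x                         # innovation (residual)
              if nu * nu / s > gate:             # normalized residual exceeds the gate -> declare a fault
                  alarms.append(k)
                  x, p = z, 1.0                  # reinitialize: discard the failure's accumulated effects
              else:
                  kgain = p / s
                  x = x + kgain * nu
                  p = (1.0 - kgain) * p
              estimates.append(x)
          return np.array(estimates), alarms

      rng = np.random.default_rng(0)
      z = 1.0 + rng.normal(0.0, 0.2, 200)
      z[120:] += 2.0                              # sensor bias failure injected at sample 120
      est, alarms = run_no_fail_filter(z)
      print("alarms at samples:", alarms[:5])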

  17. Genetic programming as an analytical tool for non-linear dielectric spectroscopy.

    PubMed

    Woodward, A M; Gilbert, R J; Kell, D B

    1999-05-01

    By modelling the non-linear effects of membranous enzymes on an applied oscillating electromagnetic field using supervised multivariate analysis methods, Non-Linear Dielectric Spectroscopy (NLDS) has previously been shown to produce quantitative information that is indicative of the metabolic state of various organisms. The use of Genetic Programming (GP) for the multivariate analysis of NLDS data recorded from yeast fermentations is discussed, and GPs are compared with previous results using Partial Least Squares (PLS) and Artificial Neural Nets (NN). GP considerably outperforms these methods, both in terms of the precision of the predictions and their interpretability. PMID:10379559

  18. Implementing fault-tolerant sensors

    NASA Technical Reports Server (NTRS)

    Marzullo, Keith

    1989-01-01

    One aspect of fault tolerance in process control programs is the ability to tolerate sensor failure. A methodology is presented for transforming a process control program that cannot tolerate sensor failures to one that can. Additionally, a hierarchy of failure models is identified.
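
    The abstract does not spell out the construction, but a classic way to tolerate faulty abstract sensors is interval intersection: each sensor reports an interval guaranteed to contain the true value when correct, and the fused reading is the smallest range covered by the largest number of intervals. The sketch below implements that generic idea (an assumption offered for illustration, not necessarily the paper's transformation).

      def fuse_intervals(intervals):
          """Smallest value range covered by the largest number of sensor intervals."""
          events = []
          for lo, hi in intervals:
              events.append((lo, +1))
              events.append((hi, -1))
          events.sort(key=lambda e: (e[0], -e[1]))      # interval starts sort before ends at equal values
          depth, best_depth = 0, 0
          best_lo, best_hi = None, None
          for value, kind in events:
              if kind == +1:
                  depth += 1
                  if depth > best_depth:                # entered a region covered by more intervals
                      best_depth, best_lo, best_hi = depth, value, None
              else:
                  if depth == best_depth and best_hi is None:
                      best_hi = value                   # deepest region ends here
                  depth -= 1
          return best_lo, best_hi, best_depth

      # four abstract sensors; each interval contains the true value when the sensor is correct,
      # and the last sensor is faulty
      readings = [(10.0, 12.0), (10.5, 11.5), (11.0, 13.0), (25.0, 26.0)]
      print(fuse_intervals(readings))    # -> (11.0, 11.5, 3): range agreed on by 3 of 4 sensors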

  19. Chip level simulation of fault tolerant computers

    NASA Technical Reports Server (NTRS)

    Armstrong, J. R.

    1982-01-01

    Chip-level modeling techniques in the evaluation of fault tolerant systems were researched. A fault tolerant computer was modeled. An efficient approach to functional fault simulation was developed. Simulation software was also developed.

  20. Fault-Tolerant Heat Exchanger

    NASA Technical Reports Server (NTRS)

    Izenson, Michael G.; Crowley, Christopher J.

    2005-01-01

    A compact, lightweight heat exchanger has been designed to be fault-tolerant in the sense that a single-point leak would not cause mixing of heat-transfer fluids. This particular heat exchanger is intended to be part of the temperature-regulation system for habitable modules of the International Space Station and to function with water and ammonia as the heat-transfer fluids. The basic fault-tolerant design is adaptable to other heat-transfer fluids and heat exchangers for applications in which mixing of heat-transfer fluids would pose toxic, explosive, or other hazards: Examples could include fuel/air heat exchangers for thermal management on aircraft, process heat exchangers in the cryogenic industry, and heat exchangers used in chemical processing. The reason this heat exchanger can tolerate a single-point leak is that the heat-transfer fluids are everywhere separated by a vented volume and at least two seals. The combination of fault tolerance, compactness, and light weight is implemented in a unique heat-exchanger core configuration: Each fluid passage is entirely surrounded by a vented region bridged by solid structures through which heat is conducted between the fluids. Precise, proprietary fabrication techniques make it possible to manufacture the vented regions and heat-conducting structures with very small dimensions to obtain a very large coefficient of heat transfer between the two fluids. A large heat-transfer coefficient favors compact design by making it possible to use a relatively small core for a given heat-transfer rate. Calculations and experiments have shown that in most respects, the fault-tolerant heat exchanger can be expected to equal or exceed the performance of the non-fault-tolerant heat exchanger that it is intended to supplant (see table). The only significant disadvantages are a slight weight penalty and a small decrease in the mass-specific heat transfer.

  1. A Frequency Domain Based Numeric-Analytical Method for Non-Linear Dynamical Systems

    NASA Astrophysics Data System (ADS)

    Narayanan, S.; Sekar, P.

    1998-04-01

    In this paper a multiharmonic balancing technique is used to develop certain algorithms to determine periodic orbits of non-linear dynamical systems with external, parametric and self excitations. Essentially, in this method the non-linear differential equations are transformed into a set of non-linear algebraic equations in terms of the Fourier coefficients of the periodic solutions, which are solved by using the Newton-Raphson technique. The method is developed such that both fast Fourier transform and discrete Fourier transform algorithms can be used. It is capable of treating all types of non-linearities and higher dimensional systems. The stability of periodic orbits is investigated by obtaining the monodromy matrix. A path following algorithm based on the predictor-corrector method is also presented to enable the bifurcation analysis. The prediction is done with a cubic extrapolation technique with an arc length incrementation, while the correction is done with the use of the least square minimisation technique. The underdetermined system of equations is solved by singular value decomposition. The suitability of the method is demonstrated by obtaining the bifurcational behaviour of rolling contact vibrations modelled by the Hertz contact law.
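
    As a compact illustration of harmonic balance with a Newton-type solve (a single-harmonic sketch for a forced Duffing oscillator, not the paper's multiharmonic rolling-contact model), the fragment below balances the fundamental cosine and sine components and sweeps the forcing frequency with simple continuation. Parameter values are illustrative.

      import numpy as np
      from scipy.optimize import fsolve

      # forced Duffing oscillator: x'' + c x' + k x + alpha x^3 = F cos(w t)
      c, k, alpha, F = 0.1, 1.0, 0.5, 0.3

      def residual(coeffs, w):
          """Substitute x = A cos(wt) + B sin(wt), keep the fundamental harmonic of x^3,
          and require the cos/sin coefficients of the equation of motion to balance."""
          A, B = coeffs
          amp2 = A * A + B * B
          r_cos = (k - w * w) * A + c * w * B + 0.75 * alpha * amp2 * A - F
          r_sin = (k - w * w) * B - c * w * A + 0.75 * alpha * amp2 * B
          return [r_cos, r_sin]

      guess = np.array([0.1, 0.0])
      for w in np.linspace(0.5, 2.0, 16):
          sol = fsolve(residual, guess, args=(w,))    # Newton-type solve for the Fourier coefficients
          guess = sol                                 # continuation: reuse the previous solution
          print(f"w = {w:4.2f}  amplitude = {np.hypot(*sol):6.3f}")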

  2. Fault tolerant software modules for SIFT

    NASA Technical Reports Server (NTRS)

    Hecht, M.; Hecht, H.

    1982-01-01

    The implementation of software fault tolerance is investigated for critical modules of the Software Implemented Fault Tolerance (SIFT) operating system to support the computational and reliability requirements of advanced fly-by-wire transport aircraft. Fault tolerant designs generated for the error reporter and global executive are examined. A description of the alternate routines, implementation requirements, and software validation are included.

  3. Unique signature of bivalent analyte surface plasmon resonance model: A model governed by non-linear differential equations

    NASA Astrophysics Data System (ADS)

    Tiwari, Purushottam; Wang, Xuewen; Darici, Yesim; He, Jin; Uren, Aykut

    Surface plasmon resonance (SPR) is a biophysical technique for the quantitative analysis of biomolecular interactions. Correct identification of the binding model is crucial for the interpretation of SPR data. The bivalent analyte SPR model is governed by non-linear differential equations, which, in general, have no analytical solutions. Therefore, an analytically based approach cannot be employed to identify this particular model. There exists a unique signature of the bivalent analyte model, the existence of an 'optimal analyte concentration', which can distinguish this model from other biphasic models. The unambiguous identification and related analysis of the bivalent analyte model is demonstrated by using theoretical simulations and experimentally measured SPR sensorgrams. Experimental SPR sensorgrams were measured by using a Biacore T200 instrument available in the Biacore Molecular Interaction Shared Resource facility, supported by NIH Grant P30CA51008, at Georgetown University.
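
    Since the model has no closed-form solution, sensorgrams of this kind are typically generated by numerical integration. The sketch below integrates a simplified two-step bivalent-analyte rate scheme (statistical factors omitted); the rate constants and surface capacity are invented for illustration and this is not the authors' exact formulation or data.

      import numpy as np
      from scipy.integrate import solve_ivp

      # simplified bivalent-analyte scheme:
      #   A + L  <-> AL    (ka1, kd1)   first arm of the analyte binds a surface ligand
      #   AL + L <-> AL2   (ka2, kd2)   second arm bridges a neighbouring ligand
      ka1, kd1, ka2, kd2 = 1e5, 1e-2, 5e-4, 5e-3      # illustrative constants only
      Rmax, C = 100.0, 50e-9                          # surface capacity (RU), analyte concentration (M)

      def rates(t, y):
          r1, r2 = y                                  # response from singly / doubly bound analyte
          free = Rmax - r1 - r2                       # unoccupied ligand
          dr1 = ka1 * C * free - kd1 * r1 - ka2 * r1 * free + kd2 * r2
          dr2 = ka2 * r1 * free - kd2 * r2
          return [dr1, dr2]

      t_assoc = np.linspace(0.0, 300.0, 301)
      sol = solve_ivp(rates, (0.0, 300.0), [0.0, 0.0], t_eval=t_assoc, rtol=1e-8)
      response = sol.y[0] + sol.y[1]                  # total simulated SPR response
      print(f"response after 300 s association: {response[-1]:.2f} RU")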

  4. On three-dimensional flow and heat transfer over a non-linearly stretching sheet: analytical and numerical solutions.

    PubMed

    Khan, Junaid Ahmad; Mustafa, Meraj; Hayat, Tasawar; Alsaedi, Ahmed

    2014-01-01

    This article studies the viscous flow and heat transfer over a plane horizontal surface stretched non-linearly in two lateral directions. Appropriate wall conditions characterizing the non-linear variation in the velocity and temperature of the sheet are employed for the first time. A new set of similarity variables is introduced to reduce the boundary layer equations into self-similar forms. The velocity and temperature distributions are determined by two methods, namely (i) optimal homotopy analysis method (OHAM) and (ii) fourth-fifth-order Runge-Kutta integration based shooting technique. The analytic and numerical solutions are compared and these are found in excellent agreement. Influences of embedded parameters on momentum and thermal boundary layers are sketched and discussed. PMID:25198696

  5. On Three-Dimensional Flow and Heat Transfer over a Non-Linearly Stretching Sheet: Analytical and Numerical Solutions

    PubMed Central

    Khan, Junaid Ahmad; Mustafa, Meraj; Hayat, Tasawar; Alsaedi, Ahmed

    2014-01-01

    This article studies the viscous flow and heat transfer over a plane horizontal surface stretched non-linearly in two lateral directions. Appropriate wall conditions characterizing the non-linear variation in the velocity and temperature of the sheet are employed for the first time. A new set of similarity variables is introduced to reduce the boundary layer equations into self-similar forms. The velocity and temperature distributions are determined by two methods, namely (i) optimal homotopy analysis method (OHAM) and (ii) fourth-fifth-order Runge-Kutta integration based shooting technique. The analytic and numerical solutions are compared and these are found in excellent agreement. Influences of embedded parameters on momentum and thermal boundary layers are sketched and discussed. PMID:25198696

  6. Parallel fault-tolerant robot control

    NASA Technical Reports Server (NTRS)

    Hamilton, D. L.; Bennett, J. K.; Walker, I. D.

    1992-01-01

    A shared memory multiprocessor architecture is used to develop a parallel fault-tolerant robot controller. Several versions of the robot controller are developed and compared. A robot simulation is also developed for control observation. Comparison of a serial version of the controller and a parallel version without fault tolerance showed the speedup possible with the coarse-grained parallelism currently employed. The performance degradation due to the addition of processor fault tolerance was demonstrated by comparison of these controllers with their fault-tolerant versions. Comparison of the more fault-tolerant controller with the lower-level fault-tolerant controller showed how varying the amount of redundant data affects performance. The results demonstrate the trade-off between speed performance and processor fault tolerance.

  7. Software Fault Tolerance: A Tutorial

    NASA Technical Reports Server (NTRS)

    Torres-Pomales, Wilfredo

    2000-01-01

    Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems. The root cause of software design errors is the complexity of the systems. Compounding the problems in building correct software is the difficulty in assessing the correctness of software for highly complex systems. After a brief overview of the software development processes, we note how hard-to-detect design faults are likely to be introduced during development and how software faults tend to be state-dependent and activated by particular input sequences. Although component reliability is an important quality measure for system level analysis, software reliability is hard to characterize and the use of post-verification reliability estimates remains a controversial issue. For some applications software safety is more important than reliability, and fault tolerance techniques used in those applications are aimed at preventing catastrophes. Single version software fault tolerance techniques discussed include system structuring and closure, atomic actions, inline fault detection, exception handling, and others. Multiversion techniques are based on the assumption that software built differently should fail differently and thus, if one of the redundant versions fails, it is expected that at least one of the other versions will provide an acceptable output. Recovery blocks, N-version programming, and other multiversion techniques are reviewed.
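
    To make the recovery-block idea concrete, here is a minimal sketch: a primary routine runs first, an acceptance test judges its result, and if the test fails the checkpointed state is restored and an alternate routine is tried. The square-root example and all names are hypothetical; the deliberately faulty primary stands in for a design fault.

      def recovery_block(state, primary, alternates, acceptance_test):
          """Run the primary, then alternates in turn, rolling back to the checkpointed
          state before each attempt, until one result passes the acceptance test."""
          checkpoint = dict(state)                    # establish the recovery point
          for variant in [primary, *alternates]:
              state.clear()
              state.update(checkpoint)                # roll back before every attempt
              try:
                  result = variant(state)
                  if acceptance_test(state, result):
                      return result
              except Exception:
                  pass                                # an exception counts as a failed attempt
          raise RuntimeError("all variants failed the acceptance test")

      def faulty_fast_sqrt(state):
          return state["x"] * 0.5                     # wrong algorithm (a stand-in design fault)

      def bisection_sqrt(state):
          lo, hi = 0.0, max(1.0, state["x"])
          for _ in range(80):
              mid = 0.5 * (lo + hi)
              lo, hi = (mid, hi) if mid * mid < state["x"] else (lo, mid)
          return 0.5 * (lo + hi)

      def accept(state, r):
          return abs(r * r - state["x"]) < 1e-6 * max(1.0, state["x"])

      print(recovery_block({"x": 2.0}, faulty_fast_sqrt, [bisection_sqrt], accept))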

  8. Hardware and software fault tolerance - A unified architectural approach

    NASA Technical Reports Server (NTRS)

    Lala, Jaynarayan H.; Alger, Linda S.

    1988-01-01

    The loss of hardware fault tolerance which often arises when design diversity is used to improve the fault tolerance of computer software is considered analytically, and a unified design approach is proposed to avoid the problem. The fundamental theory of fault-tolerant (FT) architectures is reviewed; the current status of design-diversity software development is surveyed; and the FT-processor/attached-processor (FTP/AP) architecture developed by Lala et al. (1986) is described in detail and illustrated with diagrams. FTP/AP is shown to permit efficient implementation of N-version FT software while still tolerating random hardware failures with very high coverage; the reliability is found to be significantly higher than that of conventional majority-vote N-version software.

  9. A Log-Scaling Fault Tolerant Agreement Algorithm for a Fault Tolerant MPI

    SciTech Connect

    Hursey, Joshua J; Naughton, III, Thomas J; Vallee, Geoffroy R; Graham, Richard L

    2011-01-01

    The lack of fault tolerance is becoming a limiting factor for application scalability in HPC systems. The MPI does not provide standardized fault tolerance interfaces and semantics. The MPI Forum's Fault Tolerance Working Group is proposing a collective fault tolerant agreement algorithm for the next MPI standard. Such algorithms play a central role in many fault tolerant applications. This paper combines a log-scaling two-phase commit agreement algorithm with a reduction operation to provide the necessary functionality for the new collective without any additional messages. Error handling mechanisms are described that preserve the fault tolerance properties while maintaining overall scalability.
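
    The following sketch only illustrates the general shape of such an algorithm (it is not the authors' MPI implementation): each rank contributes a local commit/abort flag, the logical AND is combined up a binomial tree and the decision is pushed back down, so every surviving rank learns the same outcome after a logarithmic number of message rounds. The simulation is plain Python with no MPI dependency.

      def tree_agreement(local_flags):
          """Simulate a binomial-tree reduce-then-broadcast of an agreement flag.

          Each rank contributes True (commit) or False (abort); the AND travels up the
          tree and the decision travels back down in O(log n) message rounds."""
          n = len(local_flags)
          value = list(local_flags)
          rounds = []                                   # record (child, parent) messages per round
          step = 1
          while step < n:
              msgs = []
              for parent in range(0, n, 2 * step):
                  child = parent + step
                  if child < n:
                      value[parent] = value[parent] and value[child]   # reduce: parent absorbs child
                      msgs.append((child, parent))
              rounds.append(msgs)
              step *= 2
          decision = value[0]                           # the root now holds the global AND
          for msgs in reversed(rounds):                 # broadcast: parents inform their children
              for child, parent in msgs:
                  value[child] = decision
          return decision, len(rounds)

      flags = [True] * 16
      flags[11] = False                                 # one rank votes to abort
      decision, depth = tree_agreement(flags)
      print(f"decision={decision} after {depth} rounds for {len(flags)} ranks")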

  10. Fault-tolerant parallel processor

    SciTech Connect

    Harper, R.E.; Lala, J.H.

    1991-06-01

    This paper addresses issues central to the design and operation of an ultrareliable, Byzantine resilient parallel computer. Interprocessor connectivity requirements are met by treating connectivity as a resource that is shared among many processing elements, allowing flexibility in their configuration and reducing complexity. Redundant groups are synchronized solely by message transmissions and receptions, which also provide input data consistency and output voting. Reliability analysis results are presented that demonstrate the reduced failure probability of such a system. Performance analysis results are presented that quantify the temporal overhead involved in executing such fault-tolerance-specific operations. Empirical performance measurements of prototypes of the architecture are presented. 30 refs.

  11. The fault-tolerant multiprocessor computer

    NASA Technical Reports Server (NTRS)

    Smith, T. B., III (Editor); Lala, J. H. (Editor); Goldberg, J. (Editor); Kautz, W. H. (Editor); Melliar-Smith, P. M. (Editor); Green, M. W. (Editor); Levitt, K. N. (Editor); Schwartz, R. L. (Editor); Weinstock, C. B. (Editor); Palumbo, D. L. (Editor)

    1986-01-01

    The development and evaluation of fault-tolerant computer architectures and software-implemented fault tolerance (SIFT) for use in advanced NASA vehicles and potentially in flight-control systems are described in a collection of previously published reports prepared for NASA. Topics addressed include the principles of fault-tolerant multiprocessor (FTMP) operation; processor and slave regional designs; FTMP executive, facilities, acceptance-test/diagnostic, applications, and support software; FTMP reliability and availability models; SIFT hardware design; and SIFT validation and verification.

  12. Exact analytic solution for non-linear density fluctuation in a ΛCDM universe

    NASA Astrophysics Data System (ADS)

    Yoo, Jaiyul; Gong, Jinn-Ouk

    2016-07-01

    We derive the exact third-order analytic solution of the matter density fluctuation in the proper-time hypersurface in a ΛCDM universe, accounting for the explicit time-dependence and clarifying the relation to the initial condition. Furthermore, we compare our analytic solution to the previous calculation in the comoving gauge, and to the standard Newtonian perturbation theory by providing Fourier kernels for the relativistic effects. Our results provide an essential ingredient for a complete description of galaxy bias in the relativistic context.

  13. Analytical solutions for non-linear differential equations with the help of a digital computer

    NASA Technical Reports Server (NTRS)

    Cromwell, P. C.

    1964-01-01

    A technique was developed with the help of a digital computer for analytic (algebraic) solutions of autonomous and nonautonomous equations. Two operational transform techniques have been programmed for the solution of these equations. Only relatively simple nonlinear differential equations have been considered. In the cases considered it has been possible to assimilate the secular terms into the solutions. For cases where f(t) is not a bounded function, a direct series solution is developed which can be shown to be an analytic function. All solutions have been checked against results obtained by numerical integration for given initial conditions and constants. It is evident that certain nonlinear differential equations can be solved with the help of a digital computer.

  14. Stress analysis for multilayered coating systems using semi-analytical BEM with geometric non-linearities

    NASA Astrophysics Data System (ADS)

    Zhang, Yao-Ming; Gu, Yan; Chen, Jeng-Tzong

    2011-05-01

    For a long time, most current numerical methods, including the finite element method, have not been efficient for analyzing stress fields of very thin structures, such as the problems of thin coatings and their interfacial/internal mechanics. In this paper, the boundary element method for 2-D elastostatic problems is studied for the analysis of multi-coating systems. The nearly singular integrals, which are the primary obstacle associated with the BEM formulations, are dealt with efficiently by using a semi-analytical algorithm. The proposed semi-analytical integral formulas, compared with current analytical methods in the BEM literature, are suitable for high-order geometry elements when nearly singular integrals need to be calculated. Owing to the employment of curved surface elements, only a small number of elements need to be divided along the boundary, and high accuracy can be achieved without additional computational effort. For the test problems studied, very promising results are obtained when the thickness of the coated layers is on the order of 10^-6 to 10^-9, which is sufficient for modeling most coated systems at the micro- or nano-scale.

  15. Construction of approximate analytical solutions to a new class of non-linear oscillator equation

    NASA Technical Reports Server (NTRS)

    Mickens, R. E.; Oyedeji, K.

    1985-01-01

    The principle of harmonic balance is invoked in the development of an approximate analytic model for a class of nonlinear oscillators typified by a mass attached to a stretched wire. By assuming that harmonic balance will hold, solutions are devised for a steady state limit cycle and/or limit point motion. A method of slowly varying amplitudes then allows derivation of approximate solutions by determining the form of the exact solutions and substituting into them the lowest order terms of their respective Fourier expansions. The latter technique is actually a generalization of the method proposed by Kryloff and Bogoliuboff (1943).

  16. Finite size and geometrical non-linear effects during crack pinning by heterogeneities: An analytical and experimental study

    NASA Astrophysics Data System (ADS)

    Vasoya, Manish; Unni, Aparna Beena; Leblond, Jean-Baptiste; Lazarus, Veronique; Ponson, Laurent

    2016-04-01

    Crack pinning by heterogeneities is a central toughening mechanism in the failure of brittle materials. So far, most analytical explorations of the crack front deformation arising from spatial variations of fracture properties have been restricted to weak toughness contrasts using first order approximation and to defects of small dimensions with respect to the sample size. In this work, we investigate the non-linear effects arising from larger toughness contrasts by extending the approximation to the second order, while taking into account the finite sample thickness. Our calculations predict the evolution of a planar crack lying on the mid-plane of a plate as a function of material parameters and loading conditions, especially in the case of a single infinitely elongated obstacle. Peeling experiments are presented which validate the approach and evidence that the second order term broadens its range of validity in terms of toughness contrast values. The work highlights the non-linear response of the crack front to strong defects and the central role played by the thickness of the specimen on the pinning process.

  17. Fault-tolerant software - Experiment with the SIFT operating system. [Software Implemented Fault Tolerance computer]

    NASA Technical Reports Server (NTRS)

    Brunelle, J. E.; Eckhardt, D. E., Jr.

    1985-01-01

    Results are presented of an experiment conducted in the NASA Avionics Integrated Research Laboratory (AIRLAB) to investigate the implementation of fault-tolerant software techniques on fault-tolerant computer architectures, in particular the Software Implemented Fault Tolerance (SIFT) computer. The N-version programming and recovery block techniques were implemented on a portion of the SIFT operating system. The results indicate that, to effectively implement fault-tolerant software design techniques, system requirements will be impacted and suggest that retrofitting fault-tolerant software on existing designs will be inefficient and may require system modification.

  18. A Standard Analytical Quasi Non Linear Module For Hydrologic and Water Resources Systems Simulation

    NASA Astrophysics Data System (ADS)

    Ostrowski, M.; Mehler, R.; Lohr, H.; Lempert, M.

    Hydrologic and water resources systems simulation involves an inhomogeneous set of different types of empirical and less (conceptual) or more physically defined differential equations. In hydrology these equations have traditionally been defined in a way that provides computationally efficient analytical solutions as well as practically applicable models, which are strongly simplified versions of complex reality. These simplifications have often led to the assumption of linear first order differential equations, such as the linear reservoir theory or derivatives thereof. It is evident that new approaches are necessary to close the gap between these simplifying assumptions and more realistic nonlinear differential equations using the increased computer power. Over a period of several years the authors have developed a generic module for the computationally efficient simulation of nonlinear hydrologic/water resources systems. The module is based on the piecewise linearised nonlinear inhomogeneous differential equation of multiple input/output storage modules. Starting from soil moisture simulation, the approach has been extended and applied to other processes such as reservoir systems simulation as well as urban drainage systems analysis. The module has been implemented as a standard module in several practically applied complex simulation packages such as Reservoir System Operation, GIS-based Catchment Modelling and Urban Pollution Load Modelling. The scope of the presentation is to describe the theoretical and practical background of the simulation module, outline its range of applicability, and give validated examples of application.
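
    A minimal sketch of the underlying idea (not the authors' module): within each time step the inflow and the outflow coefficient of the storage equation dS/dt = I - k*S are held constant, so the step has an exact analytical solution, and k is re-selected from a piecewise-linear rating of the nonlinear storage-outflow relation. The segments, hydrograph and units below are invented for illustration.

      import math

      # piecewise-linear rating of a non-linear storage-outflow relation Q(S):
      # each segment (S_lo, S_hi, k) approximates Q ~ k * S on that storage range
      SEGMENTS = [(0.0, 10.0, 0.05), (10.0, 30.0, 0.12), (30.0, float("inf"), 0.25)]

      def outflow_coefficient(storage):
          for s_lo, s_hi, k in SEGMENTS:
              if s_lo <= storage < s_hi:
                  return k
          return SEGMENTS[-1][2]

      def step(storage, inflow, dt):
          """Exact solution of dS/dt = I - k*S over one step with I and k held constant:
          S(t + dt) = I/k + (S - I/k) * exp(-k*dt)."""
          k = outflow_coefficient(storage)
          s_eq = inflow / k
          return s_eq + (storage - s_eq) * math.exp(-k * dt)

      storage, dt = 5.0, 1.0
      hydrograph = [2.0] * 10 + [8.0] * 10 + [1.0] * 20        # simple inflow pulse
      for inflow in hydrograph:
          storage = step(storage, inflow, dt)
      print(f"storage after {len(hydrograph)} steps: {storage:.2f}")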

  19. An analytical study of the endoreversible Curzon-Ahlborn cycle for a non-linear heat transfer law

    NASA Astrophysics Data System (ADS)

    Páez-Hernández, Ricardo T.; Portillo-Díaz, Pedro; Ladino-Luna, Delfino; Ramírez-Rojas, Alejandro; Pacheco-Paez, Juan C.

    2016-01-01

    In the present article, an endoreversible Curzon-Ahlborn engine is studied by considering a non-linear heat transfer law, particularly the Dulong-Petit heat transfer law, using the 'componendo and dividendo' rule as well as a simple differentiation to obtain the Curzon-Ahlborn efficiency, as proposed by Agrawal in 2009. This rule is actually a change of variable that simplifies a two-variable problem to a one-variable problem. From elementary calculus, we obtain an analytical expression for the efficiency and the power output. The efficiency is given only in terms of the temperatures of the reservoirs, as in both the Carnot and Curzon-Ahlborn cycles. We make a comparison between efficiencies measured in real power plants and theoretical values from the analytical expressions obtained in this article and others found in the literature from several other authors. This comparison shows that the theoretical values of efficiency are close to the real efficiency, and in some cases they are exactly the same. Therefore, we can say that the Agrawal method is a good approximate method for calculating thermal engine efficiencies.

  20. Concatenated codes for fault tolerant quantum computing

    SciTech Connect

    Knill, E.; Laflamme, R.; Zurek, W.

    1995-05-01

    The application of concatenated codes to fault tolerant quantum computing is discussed. We have previously shown that for quantum memories and quantum communication, a state can be transmitted with error ε provided each gate has error at most cε. We show how this can be used with Shor's fault tolerant operations to reduce the accuracy requirements when maintaining states not currently participating in the computation. Viewing Shor's fault tolerant operations as a method for reducing the error of operations, we give a concatenated implementation which promises to propagate the reduction hierarchically. This has the potential of reducing the accuracy requirements in long computations.
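
    The hierarchical reduction can be illustrated with a standard threshold-style recursion (an assumption for illustration, not the paper's exact analysis): if one level of encoding maps a physical error rate p to roughly c*p^2, then L levels of concatenation drive the logical error rate down doubly exponentially once p is below about 1/c.

      def concatenated_error(p_physical, c, levels):
          """Logical error rate after `levels` rounds of concatenation, assuming each
          level maps an error rate p to approximately c * p**2 (illustrative model)."""
          p = p_physical
          for _ in range(levels):
              p = c * p * p
          return p

      c = 100.0                    # illustrative combinatorial prefactor (threshold ~ 1/c = 1e-2)
      for p0 in (1e-3, 5e-3):
          print(f"physical error {p0:g}:")
          for lvl in range(5):
              print(f"  level {lvl}: logical error ~ {concatenated_error(p0, c, lvl):.3e}")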

  1. Fault Tolerant Homopolar Magnetic Bearings

    NASA Technical Reports Server (NTRS)

    Li, Ming-Hsiu; Palazzolo, Alan; Kenny, Andrew; Provenza, Andrew; Beach, Raymond; Kascak, Albert

    2003-01-01

    Magnetic suspensions (MS) satisfy the long life and low loss conditions demanded by satellite and ISS based flywheels used for Energy Storage and Attitude Control (ACESE) service. This paper summarizes the development of a novel MS that improves reliability via fault tolerant operation. Specifically, flux coupling between poles of a homopolar magnetic bearing is shown to deliver desired forces even after termination of coil currents to a subset of failed poles. Linear, coordinate decoupled force-voltage relations are also maintained before and after failure by bias linearization. Current distribution matrices (CDM), which adjust the currents and fluxes following a pole set failure, are determined for many faulted pole combinations. The CDMs and the system responses are obtained utilizing 1D magnetic circuit models with fringe and leakage factors derived from detailed 3D finite element field models. Reliability results are presented vs. detection/correction delay time and individual power amplifier reliability for 4, 6, and 7 pole configurations. Reliability is shown for two success criteria, i.e., (a) no catcher bearing contact following pole failures and (b) re-levitation off of the catcher bearings following pole failures. An advantage of the method presented over other redundant operation approaches is a significantly reduced requirement for backup hardware such as additional actuators or power amplifiers.

  2. Fault-tolerant PACS server

    NASA Astrophysics Data System (ADS)

    Cao, Fei; Liu, Brent J.; Huang, H. K.; Zhou, Michael Z.; Zhang, Jianguo; Zhang, X. C.; Mogel, Greg T.

    2002-05-01

    Failure of a PACS archive server could cripple an entire PACS operation. Last year we demonstrated that it was possible to design a fault-tolerant (FT) server with 99.999% uptime. The FT design was based on a triple modular redundancy with a simple majority vote to automatically detect and mask a faulty module. The purpose of this presentation is to report on its continuous developments in integrating with external mass storage devices, and to delineate laboratory failover experiments. An FT PACS Simulator with generic PACS software has been used in the experiment. To simulate a PACS clinical operation, image examinations are transmitted continuously from the modality simulator to the DICOM gateway and then to the FT PACS server and workstations. The hardware failures in network, FT server module, disk, RAID, and DLT are manually induced to observe the failover recovery of the FT PACS to resume its normal data flow. We then test and evaluate the FT PACS server in its reliability, functionality, and performance.
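
    The kind of logic behind "triple modular redundancy with a simple majority vote" can be sketched in a few lines (a toy voter, not the FT PACS server code): three redundant module outputs are compared, the majority value is returned, and the disagreeing module is identified so it can be masked.

      from collections import Counter

      def tmr_vote(outputs):
          """Majority vote over three redundant module outputs.

          Returns (voted_value, suspected_faulty_indices). A single faulty module is
          masked; if all three disagree there is no majority and an error is raised."""
          counts = Counter(outputs)
          value, votes = counts.most_common(1)[0]
          if votes < 2:
              raise RuntimeError("no majority: more than one module disagrees")
          faulty = [i for i, out in enumerate(outputs) if out != value]
          return value, faulty

      print(tmr_vote(["ACK:1234", "ACK:1234", "ACK:9999"]))   # -> ('ACK:1234', [2])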

  3. Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis

    NASA Technical Reports Server (NTRS)

    Harper, R. E.; Alger, L. S.; Babikyan, C. A.; Butler, B. P.; Friend, S. A.; Ganska, R. J.; Lala, J. H.; Masotto, T. K.; Meyer, A. J.; Morton, D. P.

    1992-01-01

    Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions.

  4. Fault-tolerant communication channel structures

    NASA Technical Reports Server (NTRS)

    Alkalai, Leon (Inventor); Chau, Savio N. (Inventor); Tai, Ann T. (Inventor)

    2006-01-01

    Systems and techniques for implementing fault-tolerant communication channels and features in communication systems. Selected commercial-off-the-shelf devices can be integrated in such systems to reduce the cost.

  5. Study of fault-tolerant software technology

    NASA Technical Reports Server (NTRS)

    Slivinski, T.; Broglio, C.; Wild, C.; Goldberg, J.; Levitt, K.; Hitt, E.; Webb, J.

    1984-01-01

    Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance.

  6. Magnetic levitation-based electromagnetic energy harvesting: a semi-analytical non-linear model for energy transduction.

    PubMed

    Soares Dos Santos, Marco P; Ferreira, Jorge A F; Simões, José A O; Pascoal, Ricardo; Torrão, João; Xue, Xiaozheng; Furlani, Edward P

    2016-01-01

    Magnetic levitation has been used to implement low-cost and maintenance-free electromagnetic energy harvesting. The ability of levitation-based harvesting systems to operate autonomously for long periods of time makes them well-suited for self-powering a broad range of technologies. In this paper, a combined theoretical and experimental study is presented of a harvester configuration that utilizes the motion of a levitated hard-magnetic element to generate electrical power. A semi-analytical, non-linear model is introduced that enables accurate and efficient analysis of energy transduction. The model predicts the transient and steady-state response of the harvester as a function of its motion (amplitude and frequency) and load impedance. Very good agreement is obtained between simulation and experiment with energy errors lower than 14.15% (mean absolute percentage error of 6.02%) and cross-correlations higher than 86%. The model provides unique insight into fundamental mechanisms of energy transduction and enables the geometric optimization of harvesters prior to fabrication and the rational design of intelligent energy harvesters. PMID:26725842

  7. Magnetic levitation-based electromagnetic energy harvesting: a semi-analytical non-linear model for energy transduction

    PubMed Central

    Soares dos Santos, Marco P.; Ferreira, Jorge A. F.; Simões, José A. O.; Pascoal, Ricardo; Torrão, João; Xue, Xiaozheng; Furlani, Edward P.

    2016-01-01

    Magnetic levitation has been used to implement low-cost and maintenance-free electromagnetic energy harvesting. The ability of levitation-based harvesting systems to operate autonomously for long periods of time makes them well-suited for self-powering a broad range of technologies. In this paper, a combined theoretical and experimental study is presented of a harvester configuration that utilizes the motion of a levitated hard-magnetic element to generate electrical power. A semi-analytical, non-linear model is introduced that enables accurate and efficient analysis of energy transduction. The model predicts the transient and steady-state response of the harvester as a function of its motion (amplitude and frequency) and load impedance. Very good agreement is obtained between simulation and experiment with energy errors lower than 14.15% (mean absolute percentage error of 6.02%) and cross-correlations higher than 86%. The model provides unique insight into fundamental mechanisms of energy transduction and enables the geometric optimization of harvesters prior to fabrication and the rational design of intelligent energy harvesters. PMID:26725842

  8. Magnetic levitation-based electromagnetic energy harvesting: a semi-analytical non-linear model for energy transduction

    NASA Astrophysics Data System (ADS)

    Soares Dos Santos, Marco P.; Ferreira, Jorge A. F.; Simões, José A. O.; Pascoal, Ricardo; Torrão, João; Xue, Xiaozheng; Furlani, Edward P.

    2016-01-01

    Magnetic levitation has been used to implement low-cost and maintenance-free electromagnetic energy harvesting. The ability of levitation-based harvesting systems to operate autonomously for long periods of time makes them well-suited for self-powering a broad range of technologies. In this paper, a combined theoretical and experimental study is presented of a harvester configuration that utilizes the motion of a levitated hard-magnetic element to generate electrical power. A semi-analytical, non-linear model is introduced that enables accurate and efficient analysis of energy transduction. The model predicts the transient and steady-state response of the harvester as a function of its motion (amplitude and frequency) and load impedance. Very good agreement is obtained between simulation and experiment with energy errors lower than 14.15% (mean absolute percentage error of 6.02%) and cross-correlations higher than 86%. The model provides unique insight into fundamental mechanisms of energy transduction and enables the geometric optimization of harvesters prior to fabrication and the rational design of intelligent energy harvesters.
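
    For orientation only, the three records above describe the same model; a crude lumped-parameter sketch of a levitation-based harvester (not the authors' semi-analytical formulation) is shown below: a magnet on a non-linear magnetic "spring" is base-excited, and the coil loads its motion through an electromechanical coupling. Every parameter value here is an invented placeholder.

      import numpy as np
      from scipy.integrate import solve_ivp

      m, c_mech = 0.02, 0.05            # magnet mass (kg), mechanical damping (N s/m)
      k1, k3 = 40.0, 2.0e5              # linearized + cubic stiffness of the magnetic "spring"
      theta = 5.0                       # electromechanical coupling (N/A == V s/m)
      R_coil, R_load = 100.0, 200.0     # coil and load resistance (ohm)
      A, w = 3.0, np.sqrt(k1 / m)       # base acceleration amplitude (m/s^2), drive near resonance

      def dynamics(t, y):
          z, v = y                                           # relative displacement / velocity of the magnet
          f_elec = theta**2 / (R_coil + R_load) * v          # electromagnetic damping force from the coil
          dv = (-c_mech * v - f_elec - k1 * z - k3 * z**3 - m * A * np.sin(w * t)) / m
          return [v, dv]

      t_eval = np.linspace(0.0, 10.0, 5001)
      sol = solve_ivp(dynamics, (0.0, 10.0), [0.0, 0.0], t_eval=t_eval, rtol=1e-8)
      emf = theta * sol.y[1]                                 # induced EMF ~ coupling * velocity
      p_load = (emf * R_load / (R_coil + R_load))**2 / R_load
      print(f"mean power into the load over the last 5 s: {np.mean(p_load[2500:]) * 1e3:.2f} mW")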

  9. Modeling the Fault Tolerant Capability of a Flight Control System: An Exercise in SCR Specification

    NASA Technical Reports Server (NTRS)

    Alexander, Chris; Cortellessa, Vittorio; DelGobbo, Diego; Mili, Ali; Napolitano, Marcello

    2000-01-01

    In life-critical and mission-critical applications, it is important to make provisions for a wide range of contingencies, by providing means for fault tolerance. In this paper, we discuss the specification of a flight control system that is fault tolerant with respect to sensor faults. Redundancy is provided by analytical relations that hold between sensor readings; depending on the conditions, this redundancy can be used to detect, identify and accommodate sensor faults.

  10. Reconfigurable fault tolerant avionics system

    NASA Astrophysics Data System (ADS)

    Ibrahim, M. M.; Asami, K.; Cho, Mengu

    This paper presents the design of a reconfigurable avionics system based on a modern Static Random Access Memory (SRAM)-based Field Programmable Gate Array (FPGA) to be used in future generations of nano satellites. A major concern in satellite systems, and especially nano satellites, is to build robust systems with low-power consumption profiles. The system is designed to be flexible by providing the capability of reconfiguring itself based on its orbital position. As Single Event Upsets (SEU) do not have the same severity and intensity in all orbital locations, having the maximum at the South Atlantic Anomaly (SAA) and the polar cusps, the system does not have to be fully protected all the time in its orbit. An acceptable level of protection against high-energy cosmic rays and charged particles roaming in space is provided within the majority of the orbit through software fault tolerance. Checkpointing and rollback, together with control flow assertions, are used for that level of protection. In the minority part of the orbit where severe SEUs are expected, a reconfiguration of the system FPGA is initiated in which the processor systems are triplicated and protection through Triple Modular Redundancy (TMR) with feedback is provided. This technique of reconfiguring the system according to the level of threat expected from SEU-induced faults helps in reducing the average dynamic power consumption of the system to one-third of its maximum. This technique can be viewed as smart protection through system reconfiguration. The system is built on the commercial version of the (XC5VLX50) Xilinx Virtex5 FPGA on bulk silicon with 324 IO. Simulations of orbit SEU rates were carried out using the SPENVIS web-based software package.

  11. A Fault Tolerant System for an Integrated Avionics Sensor Configuration

    NASA Technical Reports Server (NTRS)

    Caglayan, A. K.; Lancraft, R. E.

    1984-01-01

    An aircraft sensor fault tolerant system methodology for the Transport Systems Research Vehicle in a Microwave Landing System (MLS) environment is described. The fault tolerant system provides reliable estimates in the presence of possible failures both in ground-based navigation aids and in on-board flight control and inertial sensors. Sensor failures are identified by utilizing the analytic relationships between the various sensors arising from the aircraft point-mass equations of motion. The estimation and failure detection performance of the software implementation (called FINDS) of the developed system was analyzed on a nonlinear digital simulation of the research aircraft. Simulation results showing the detection performance of FINDS, using a dual redundant sensor complement, are presented for bias, hardover, null, ramp, increased noise and scale factor failures. In general, the results show that FINDS can distinguish between normal operating sensor errors and failures while providing excellent detection speed for bias failures in the MLS, indicated airspeed, attitude and radar altimeter sensors.

  12. Reliability of Fault Tolerant Control Systems. Part 2

    NASA Technical Reports Server (NTRS)

    Wu, N. Eva

    2000-01-01

    This paper reports Part II of a two-part effort intended to delineate the relationship between reliability and fault tolerant control in a quantitative manner. Reliability properties peculiar to fault-tolerant control systems are emphasized, such as the presence of analytic redundancy in high proportion, the dependence of failures on control performance, and the high risks associated with decisions in redundancy management due to multiple sources of uncertainty and sometimes large processing requirements. As a consequence, coverage of failures through redundancy management can be severely limited. The paper proposes to formulate the fault tolerant control problem as an optimization problem that maximizes coverage of failures through redundancy management. Coverage modeling is attempted in a way that captures its dependence on the control performance and on the diagnostic resolution. Under the proposed redundancy management policy, it is shown that enhanced overall system reliability can be achieved with a control law of superior robustness, with an estimator of higher resolution, and with a less stringent control performance requirement.

  13. Software fault tolerance in computer operating systems

    NASA Technical Reports Server (NTRS)

    Iyer, Ravishankar K.; Lee, Inhwan

    1994-01-01

    This chapter provides data and analysis of the dependability and fault tolerance for three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of defects in system software that result in processor failures. The loose coupling between processors which results in the backup execution (the processor state and the sequence of events occurring) being different from the original execution is a major reason for the measured software fault tolerance. The IBM/MVS system fault tolerance almost doubles when recovery routines are provided, in comparison to the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.

  14. Analysis of fault-tolerant neurocontrol architectures

    NASA Technical Reports Server (NTRS)

    Troudet, T.; Merrill, W.

    1992-01-01

    The fault-tolerance of analog parallel distributed implementations of a multivariable aircraft neurocontroller is analyzed by simulating weight and neuron failures in a simplified scheme of analog processing based on the functional architecture of the ETANN chip (Electrically Trainable Artificial Neural Network). The neural information processing is found to be only partially distributed throughout the set of weights of the neurocontroller synthesized with the backpropagation algorithm. Although the degree of distribution of the neural processing, and consequently the fault-tolerance of the neurocontroller, could be enhanced using Locally Distributed Weight and Neuron Approaches, a satisfactory level of fault-tolerance could only be obtained by retraining the degraded VLSI neurocontroller. The possibility of maintaining neurocontrol performance and stability in the presence of single weight or neuron failures was demonstrated through an automated retraining procedure of the neurocontroller based on a pre-programmed choice and sequence of the training parameters.

  15. Fault-tolerant dynamic task graph scheduling

    SciTech Connect

    Kurt, Mehmet C.; Krishnamoorthy, Sriram; Agrawal, Kunal; Agrawal, Gagan

    2014-11-16

    In this paper, we present an approach to fault tolerant execution of dynamic task graphs scheduled using work stealing. In particular, we focus on selective and localized recovery of tasks in the presence of soft faults. We elicit from the user the basic task graph structure in terms of successor and predecessor relationships. The work stealing-based algorithm to schedule such a task graph is augmented to enable recovery when the data and meta-data associated with a task get corrupted. We use this redundancy, and the knowledge of the task graph structure, to selectively recover from faults with low space and time overheads. We show that the fault tolerant design retains the essential properties of the underlying work stealing-based task scheduling algorithm, and that the fault tolerant execution is asymptotically optimal when task re-execution is taken into account. Experimental evaluation demonstrates the low cost of recovery under various fault scenarios.
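
    A toy sketch of the selective, localized recovery idea described above: when a task's output is found to be corrupted, only that task is re-executed from its predecessors' retained outputs, with no global rollback. The Task class, checksum scheme, and recovery trigger are illustrative assumptions, not the authors' work-stealing runtime.

        import hashlib
        import pickle

        class Task:
            """A node in a user-declared task graph with predecessor links."""
            def __init__(self, name, fn, preds=()):
                self.name, self.fn, self.preds = name, fn, list(preds)
                self.output, self.digest = None, None

            def run(self):
                inputs = [p.output for p in self.preds]   # predecessor outputs are retained
                self.output = self.fn(*inputs)
                self.digest = hashlib.sha256(pickle.dumps(self.output)).hexdigest()

            def is_corrupted(self):
                return hashlib.sha256(pickle.dumps(self.output)).hexdigest() != self.digest

        def recover(task):
            """Selectively re-execute only the corrupted task from its predecessors."""
            if task.is_corrupted():
                task.run()                                # localized recovery

        # Tiny graph: c = a + b, with a soft fault corrupting c's output.
        a = Task("a", lambda: 2)
        b = Task("b", lambda: 3)
        c = Task("c", lambda x, y: x + y, preds=(a, b))
        for t in (a, b, c):
            t.run()
        c.output = 999                                    # simulate data corruption
        recover(c)
        print(c.output)                                   # 5 again after selective recovery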

  16. Experiments in fault tolerant software reliability

    NASA Technical Reports Server (NTRS)

    Mcallister, David F.; Tai, K. C.; Vouk, Mladen A.

    1987-01-01

    The reliability of voting was evaluated in a fault-tolerant software system for small output spaces. The effectiveness of the back-to-back testing process was investigated. Version 3.0 of the RSDIMU-ATS, a semi-automated test bed for certification testing of RSDIMU software, was prepared and distributed. Software reliability estimation methods based on non-random sampling are being studied. The investigation of existing fault-tolerance models was continued and formulation of new models was initiated.

  17. The cost of software fault tolerance

    NASA Technical Reports Server (NTRS)

    Migneault, G. E.

    1982-01-01

    The proposed use of software fault tolerance techniques as a means of reducing software costs in avionics and as a means of addressing the issue of system unreliability due to faults in software is examined. A model is developed to provide a view of the relationships among cost, redundancy, and reliability which suggests strategies for software development and maintenance which are not conventional.

  18. Towards fault-tolerant optimal control

    NASA Technical Reports Server (NTRS)

    Chizeck, H. J.; Willsky, A. S.

    1979-01-01

    The paper considers the design of fault-tolerant controllers that may endow systems with dynamic reliability. Results for jump linear quadratic Gaussian control problems are extended to include random jump costs, trajectory discontinuities, and a simple case of non-Markovian mode transitions.

  19. A methodology for testing fault-tolerant software

    NASA Technical Reports Server (NTRS)

    Andrews, D. M.; Mahmood, A.; Mccluskey, E. J.

    1985-01-01

    A methodology for testing fault tolerant software is presented. There are problems associated with testing fault tolerant software because many errors are masked or corrected by voters, limiters, or automatic channel synchronization. This methodology illustrates how the same strategies used for testing fault tolerant hardware can be applied to testing fault tolerant software. For example, one strategy used in testing fault tolerant hardware is to disable the redundancy during testing. A similar testing strategy is proposed for software, namely, to place the major emphasis on testing earlier in the development cycle (before the redundancy is in place), thus reducing the possibility that undetected errors will be masked when limiters and voters are added.

  20. Parametric Modeling and Fault Tolerant Control

    NASA Technical Reports Server (NTRS)

    Wu, N. Eva; Ju, Jianhong

    2000-01-01

    Fault tolerant control is considered for a nonlinear aircraft model expressed as a linear parameter-varying system. By proper parameterization of foreseeable faults, the linear parameter-varying system can include fault effects as additional varying parameters. A recently developed technique in fault effect parameter estimation allows us to assume that estimates of the fault effect parameters are available on-line. Reconfigurability is calculated for this model with respect to the loss of control effectiveness to assess the potentiality of the model to tolerate such losses prior to control design. The control design is carried out by applying a polytopic method to the aircraft model. An error bound on fault effect parameter estimation is provided, within which the Lyapunov stability of the closed-loop system is robust. Our simulation results show that as long as the fault parameter estimates are sufficiently accurate, the polytopic controller can provide satisfactory fault-tolerance.

  1. A Unified Fault-Tolerance Protocol

    NASA Technical Reports Server (NTRS)

    Miner, Paul; Geser, Alfons; Pike, Lee; Maddalon, Jeffrey

    2004-01-01

    Davies and Wakerly show that Byzantine fault tolerance can be achieved by a cascade of broadcasts and middle value select functions. We present an extension of the Davies and Wakerly protocol, the unified protocol, and its proof of correctness. We prove that it satisfies validity and agreement properties for communication of exact values. We then introduce bounded communication error into the model. Inexact communication is inherent for clock synchronization protocols. We prove that validity and agreement properties hold for inexact communication, and that exact communication is a special case. As a running example, we illustrate the unified protocol using the SPIDER family of fault-tolerant architectures. In particular we demonstrate that the SPIDER interactive consistency, distributed diagnosis, and clock synchronization protocols are instances of the unified protocol.
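
    The middle-value-select primitive at the heart of the Davies and Wakerly construction can be sketched in a few lines; this toy version simply takes the middle element of the sorted received values and is an illustration, not the SPIDER implementation.

        def middle_value_select(values):
            """Return the middle element of the sorted received values.

            With at most f faulty senders and at least 2*f + 1 received values,
            the selected value lies between values sent by non-faulty senders.
            """
            ordered = sorted(values)
            return ordered[len(ordered) // 2]

        # One Byzantine sender reports an extreme value; the selection is unaffected.
        print(middle_value_select([10.0, 10.2, 9.9, 10.1, 1e9]))   # -> 10.1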

  2. The MAFT architecture for distributed fault tolerance

    SciTech Connect

    Kieckhafer, R.M.; Walter, C.J.; Finn, A.M.; Thambidurai, P.M.

    1988-04-01

    This paper describes the Multicomputer Architecture for Fault-Tolerance (MAFT), a distributed system designed to provide extremely reliable computation in real-time control systems. MAFT is based on the physical and functional partitioning of executive functions from application functions. The implementation of the executive functions in a special-purpose hardware processor allows the fault-tolerance functions to be transparent to the application programs and minimizes overhead. Byzantine Agreement and Approximate Agreement algorithms are employed for critical system parameters. MAFT supports the use of multiversion hardware and software to tolerate built-in or generic faults. Graceful degradation and restoration of the application workload is permitted in response to the exclusion and readmission of nodes, respectively.

  3. Fault tolerant GPS/Inertial System design

    NASA Astrophysics Data System (ADS)

    Brown, Alison K.; Sturza, Mark A.; Deangelis, Franco; Lukaszewski, David A.

    The use of an integrated GPS/Inertial system in future launch vehicles motivates the fault-tolerant system design described here. The robustness of the navigation system is enhanced by integrating the GPS with a fault-tolerant inertial system. Three layers of failure detection and isolation are incorporated to determine the nature of flaws in the inertial instruments, the GPS receivers, or the integrated navigation solution. The layers are based on: (1) a high-rate parity algorithm for instrument failures; (2) a similar parity algorithm for GPS satellite or receiver failures; and (3) a GPS navigation solution to monitor inertial navigation failures. Dual failures can occur in any system component without affecting the performance of launch-vehicle navigation or guidance.

  4. Performance Analysis on Fault Tolerant Control System

    NASA Technical Reports Server (NTRS)

    Shin, Jong-Yeob; Belcastro, Christine

    2005-01-01

    In a fault tolerant control (FTC) system, a parameter-varying FTC law is reconfigured based on fault parameters estimated by fault detection and isolation (FDI) modules. FDI modules require some time to detect fault occurrences in aero-vehicle dynamics. In this paper, an FTC analysis framework is provided to calculate the upper bound of the induced-L2 norm of an FTC system in the presence of false identification and detection time delay. The upper bound is written as a function of the fault detection time and exponential decay rates, and has been used to determine which FTC law produces less performance degradation (tracking error) due to false identification. The analysis framework is applied to an FTC system for a HiMAT (Highly Maneuverable Aircraft Technology) vehicle. Index terms: fault tolerant control system, linear parameter-varying system, HiMAT vehicle.

  5. Fault-tolerant electrical power system

    NASA Astrophysics Data System (ADS)

    Mehdi, Ishaque S.; Weimer, Joseph A.

    1987-10-01

    An electrical system that will meet the requirements of a 1990s two-engine fighter is being developed in the Fault-Tolerant Electrical Power System (FTEPS) program, sponsored by the AFWAL Aero Propulsion Laboratory. FTEPS will demonstrate the generation and distribution of fault-tolerant, reliable, electrical power required for future aircraft. The system incorporates MIL-STD-1750A digital processors and MIL-STD-1553B data buses for control and communications. Electrical power is distributed through electrical load management centers by means of solid-state power controllers for fault protection and individual load control. The system will provide uninterruptible power to flight-critical loads such as the flight control and mission computers with sealed lead-acid batteries. Primary power is provided by four 60 kVA variable speed constant frequency generators. Buildup and testing of the FTEPS demonstrator is expected to be complete by May 1988.

  6. A Primer on Architectural Level Fault Tolerance

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.

    2008-01-01

    This paper introduces the fundamental concepts of fault tolerant computing. Key topics covered are voting, fault detection, clock synchronization, Byzantine Agreement, diagnosis, and reliability analysis. Low level mechanisms such as Hamming codes or low level communications protocols are not covered. The paper is tutorial in nature and does not cover any topic in detail. The focus is on rationale and approach rather than detailed exposition.

  7. A dual, fault-tolerant aerospace actuator

    NASA Technical Reports Server (NTRS)

    Siebert, C. J.

    1985-01-01

    The requirements for mechanisms used in the Space Transportation System (STS) are to provide dual fault tolerance, and if the payload equipment violates the Shuttle bay door envelope, these deployment/restow mechanisms must have independent primary and backup features. The research and development of an electromechanical actuator that meets these requirements and will be used on the Transfer Orbit Stage (TOS) program is described.

  8. Fault-tolerant architectures for superconducting qubits

    NASA Astrophysics Data System (ADS)

    DiVincenzo, David P.

    2009-12-01

    In this short review, I draw attention to new developments in the theory of fault tolerance in quantum computation that may give concrete direction to future work in the development of superconducting qubit systems. The basics of quantum error-correction codes, which I will briefly review, have not significantly changed since their introduction 15 years ago. But an interesting picture has emerged of an efficient use of these codes that may put fault-tolerant operation within reach. It is now understood that two-dimensional surface codes, close relatives of the original toric code of Kitaev, can be adapted as shown by Raussendorf and Harrington to effectively perform logical gate operations in a very simple planar architecture, with error thresholds for fault-tolerant operation simulated to be 0.75%. This architecture uses topological ideas in its functioning, but it is not 'topological quantum computation'—there are no non-abelian anyons in sight. I offer some speculations on the crucial pieces of superconducting hardware that could be demonstrated in the next couple of years that would be clear stepping stones towards this surface-code architecture.

  9. Interstitial fault tolerance-a technique for making systolic arrays fault tolerant

    SciTech Connect

    Kuhn, R.H.

    1983-01-01

    Systolic arrays are a popular model for the implementation of highly parallel VLSI systems. In this paper interstitial fault tolerance (IFT), a technique for incorporating fault tolerance into systolic arrays in a natural manner, is discussed. IFT can be used for reliable computation or for yield enhancement. Previous fault tolerance techniques for reliable computation on SIMD systems have employed redundant hardware. IFT, on the other hand, employs time redundancy. Previous wafer scale integration techniques for yield enhancement have been proposed only for linear processing element arrays. IFT is effective for both linear and two-dimensional arrays. The time redundancy to achieve IFT is shown to be bounded by a factor of 3, allowing no processor redundancy. Results of Monte Carlo simulation of IFT are presented. 19 references.

  10. An accurate analytic approximation to the non-linear change in volume of solids with applied pressure

    NASA Technical Reports Server (NTRS)

    Schlosser, Herbert; Ferrante, John

    1989-01-01

    An accurate analytic expression for the nonlinear change of the volume of a solid as a function of applied pressure is of great interest in high-pressure experimentation. It is found that a two-parameter analytic expression fits the experimental volume-change data to within a few percent over the entire experimentally attainable pressure range. Results are presented for 24 different materials including metals, ceramic semiconductors, polymers, and ionic and rare-gas solids.
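
    For context, one widely used two-parameter analytic form of this kind (the Vinet/universal equation of state, shown here as a representative example and not necessarily the exact expression adopted by the authors) relates pressure to compression through the bulk modulus B_0 and its pressure derivative B_0':

        P(V) = 3 B_{0}\, x^{-2}\, (1 - x)\, \exp\!\left[\tfrac{3}{2}\left(B_{0}' - 1\right)(1 - x)\right],
        \qquad x = \left(\frac{V}{V_{0}}\right)^{1/3}.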

  11. Fault-Tolerant Coding for State Machines

    NASA Technical Reports Server (NTRS)

    Naegle, Stephanie Taft; Burke, Gary; Newell, Michael

    2008-01-01

    Two reliable fault-tolerant coding schemes have been proposed for state machines that are used in field-programmable gate arrays and application-specific integrated circuits to implement sequential logic functions. The schemes apply to strings of bits in state registers, which are typically implemented in practice as assemblies of flip-flop circuits. If a single-event upset (SEU, a radiation-induced change in the bit in one flip-flop) occurs in a state register, the state machine that contains the register could go into an erroneous state or could hang, by which is meant that the machine could remain in undefined states indefinitely. The proposed fault-tolerant coding schemes are intended to prevent the state machine from going into an erroneous or hang state when an SEU occurs. To ensure reliability of the state machine, the coding scheme for bits in the state register must satisfy the following criteria: 1. All possible states are defined. 2. An SEU brings the state machine to a known state. 3. There is no possibility of a hang state. 4. No false state is entered. 5. An SEU exerts no effect on the state machine. Fault-tolerant coding schemes that have been commonly used include binary encoding and "one-hot" encoding. Binary encoding is the simplest state machine encoding and satisfies criteria 1 through 3 if all possible states are defined. Binary encoding is a binary count of the state machine number in sequence; for example, an eight-state machine is encoded as a three-bit count. In one-hot encoding, N bits are used to represent N states: All except one of the bits in a string are 0, and the position of the 1 in the string represents the state. With proper circuit design, one-hot encoding can satisfy criteria 1 through 4. Unfortunately, the requirement to use N bits to represent N states makes one-hot coding inefficient.
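
    To make the encoding trade-off concrete, the toy sketch below builds binary and one-hot code words for an eight-state machine and checks where a single bit flip lands; it illustrates the criteria above and is not the proposed coding scheme itself.

        N_STATES = 8

        binary_codes = {format(s, "03b") for s in range(N_STATES)}    # 3 bits; all 8 patterns defined
        one_hot_codes = {"".join("1" if i == s else "0" for i in range(N_STATES))
                         for s in range(N_STATES)}                    # 8 bits; exactly one '1' per word

        def seu_lands_in_defined_state(code, valid_codes):
            """Check every single-bit upset of `code` against the set of defined code words."""
            flips = [code[:i] + ("1" if code[i] == "0" else "0") + code[i + 1:]
                     for i in range(len(code))]
            return [f in valid_codes for f in flips]

        # Binary: every upset is another defined state (criteria 1-3 hold, but a false state is entered).
        print(all(seu_lands_in_defined_state("010", binary_codes)))          # True
        # One-hot: an upset never lands in a defined code word, so decode logic can detect it.
        print(any(seu_lands_in_defined_state("00000100", one_hot_codes)))    # False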

  12. Fault tolerant massively parallel processing architecture

    SciTech Connect

    Balasubramanian, V.; Banerjee, P.

    1987-08-01

    This paper presents two massively parallel processing architectures suitable for solving a wide variety of algorithms of divide-and-conquer type for problems such as the discrete Fourier transform, production systems, design automation, and others. The first architecture, called the Chain-structured Butterfly ARchitecture (CBAR), consists of a two-dimensional array of N = L·(log2(L)+1) processing elements (PEs) organized as L levels of log2(L)+1 stages, and which has the butterfly connection between PEs in consecutive stages with straight-through feedback between PEs in the last and first stages. This connection system has the desirable property of allowing thousands of PEs to be connected with O(N) connection cost, O(log2(N/log2(N))) communication paths, and a small number (=4) of I/O ports per PE. However, this architecture is not fault tolerant. The authors, therefore, propose a second architecture, called the REconfigurable Chain-structured Butterfly ARchitecture (RECBAR), which is a modified version of the CBAR. The RECBAR possesses all the desirable features of the CBAR, with the number of I/O ports per PE increased to six, and uses O(log2(N)/N) overhead in PEs and approximately 50% overhead in links to achieve single-level fault tolerance. Reliability improvements of the RECBAR over the CBAR are studied. This paper also presents a distributed diagnostic and structuring algorithm for the RECBAR that enables the architecture to detect faults and structure itself accordingly within 2·log2(L)+1 time steps, thus making it a truly fault tolerant architecture.

  13. A fault tolerant 80960 engine controller

    NASA Technical Reports Server (NTRS)

    Reichmuth, D. M.; Gage, M. L.; Paterson, E. S.; Kramer, D. D.

    1993-01-01

    The paper describes the design of the 80960 Fault Tolerant Engine Controller for the supervision of engine operations, which was designed for the NASA Marshall Space Flight Center. Consideration is given to the major electronic components of the controller, including the engine controller, effectors, and the sensors, as well as to the controller hardware, the controller module and the communications module, and the controller software. The architecture of the controller hardware allows modifications to be made to fit the requirements of new propulsion systems. Multiple flow diagrams are presented illustrating the controller's operations.

  14. Software fault tolerance using data diversity

    NASA Technical Reports Server (NTRS)

    Knight, John C.

    1991-01-01

    Research on data diversity is discussed. Data diversity relies on a different form of redundancy from existing approaches to software fault tolerance and is substantially less expensive to implement. Data diversity can also be applied to software testing and greatly facilitates the automation of testing. Up to now it has been explored both theoretically and in a pilot study, and has been shown to be a promising technique. The effectiveness of data diversity as an error detection mechanism and the application of data diversity to differential equation solvers are discussed.

  15. Fabrication of fault-tolerant systolic array processors

    SciTech Connect

    Golovko, V.A.

    1995-05-01

    Methods for designing fault-tolerant systolic array processors are discussed. Several ways of bypassing faulty elements in configurations, which depend on an input-data flow organization, are suggested. An analysis of the additional hardware costs of providing fault tolerance by various techniques and for various levels of redundancy is presented. Hadamard fault-tolerant processor design was used to illustrate the efficiency of the techniques suggested.

  16. FTAPE: A fault injection tool to measure fault tolerance

    NASA Technical Reports Server (NTRS)

    Tsai, Timothy K.; Iyer, Ravishankar K.

    1995-01-01

    The paper introduces FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The tool combines system-wide fault injection with a controllable workload. A workload generator is used to create high stress conditions for the machine. Faults are injected based on this workload activity in order to ensure a high level of fault propagation. The errors/fault ratio and performance degradation are presented as measures of fault tolerance.
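
    The two reported measures can be computed directly from an injection campaign; the sketch below assumes simple scalar inputs (faults injected, errors observed, runtimes) and is an illustration, not part of the FTAPE tool.

        def fault_tolerance_measures(faults_injected, errors_observed,
                                     runtime_with_faults_s, baseline_runtime_s):
            """Compute the errors/fault ratio and relative performance degradation."""
            errors_per_fault = errors_observed / faults_injected
            degradation = (runtime_with_faults_s - baseline_runtime_s) / baseline_runtime_s
            return errors_per_fault, degradation

        # Hypothetical campaign: 200 injected faults, 34 manifested as errors,
        # and the workload slowed from 120 s to 131 s under injection.
        ratio, slowdown = fault_tolerance_measures(200, 34, 131.0, 120.0)
        print(f"errors/fault = {ratio:.2f}, performance degradation = {slowdown:.1%}")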

  17. FTAPE: A fault injection tool to measure fault tolerance

    NASA Technical Reports Server (NTRS)

    Tsai, Timothy K.; Iyer, Ravishankar K.

    1994-01-01

    The paper introduces FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The tool combines system-wide fault injection with a controllable workload. A workload generator is used to create high stress conditions for the machine. Faults are injected based on this workload activity in order to ensure a high level of fault propagation. The errors/fault ratio and performance degradation are presented as measures of fault tolerance.

  18. Method and system for environmentally adaptive fault tolerant computing

    NASA Technical Reports Server (NTRS)

    Copenhaver, Jason L. (Inventor); Jeremy, Ramos (Inventor); Wolfe, Jeffrey M. (Inventor); Brenner, Dean (Inventor)

    2010-01-01

    A method and system for adapting fault tolerant computing. The method includes the steps of measuring an environmental condition representative of an environment. An on-board processing system's sensitivity to the measured environmental condition is measured. It is determined whether to reconfigure a fault tolerance of the on-board processing system based in part on the measured environmental condition. The fault tolerance of the on-board processing system may be reconfigured based in part on the measured environmental condition.

  19. FTAPE: A fault injection tool to measure fault tolerance

    NASA Astrophysics Data System (ADS)

    Tsai, Timothy K.; Iyer, Ravishankar K.

    1994-07-01

    The paper introduces FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The tool combines system-wide fault injection with a controllable workload. A workload generator is used to create high stress conditions for the machine. Faults are injected based on this workload activity in order to ensure a high level of fault propagation. The errors/fault ratio and performance degradation are presented as measures of fault tolerance.

  20. Coordinated Fault Tolerance for High-Performance Computing

    SciTech Connect

    Dongarra, Jack; Bosilca, George; et al.

    2013-04-08

    Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack—from the application, math libraries, and programming language runtime to other common system software such as jobs schedulers, resource managers, and monitoring tools.

  1. Fault detection and fault tolerance in robotics

    NASA Technical Reports Server (NTRS)

    Visinsky, Monica; Walker, Ian D.; Cavallaro, Joseph R.

    1992-01-01

    Robots are used in inaccessible or hazardous environments in order to alleviate some of the time, cost and risk involved in preparing men to endure these conditions. In order to perform their expected tasks, the robots are often quite complex, thus increasing their potential for failures. If men must be sent into these environments to repair each component failure in the robot, the advantages of using the robot are quickly lost. Fault tolerant robots are needed which can effectively cope with failures and continue their tasks until repairs can be realistically scheduled. Before fault tolerant capabilities can be created, methods of detecting and pinpointing failures must be perfected. This paper develops a basic fault tree analysis of a robot in order to obtain a better understanding of where failures can occur and how they contribute to other failures in the robot. The resulting failure flow chart can also be used to analyze the resiliency of the robot in the presence of specific faults. By simulating robot failures and fault detection schemes, the problems involved in detecting failures for robots are explored in more depth.

  2. Fault Tolerant Magnetic Bearing for Turbomachinery

    NASA Technical Reports Server (NTRS)

    Choi, Benjamin; Provenza, Andrew

    2001-01-01

    NASA Glenn Research Center (GRC) has developed a Fault-Tolerant Magnetic Bearing Suspension rig to enhance bearing system safety. It successfully demonstrated that using only two active poles out of eight redundant poles from each radial bearing (that is, with 12 out of 16 poles dead) levitated the rotor and spun it without losing stability and desired position up to the maximum allowable speed of 20,000 rpm. In this paper, it is demonstrated that as long as the summation of the force vectors of the attracting poles and the rotor weight is zero, the fault-tolerant magnetic bearing system maintains the rotor at the desired position without losing stability, even at the maximum rotor speed. A proportional-integral-derivative (PID) controller generated autonomous corrective actions with no operator input for the fault situations without losing load capacity in terms of rotor position. This paper also deals with a centralized modal controller to better control the dynamic behavior over system modes.
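
    The quoted levitation condition, namely that the attracting-pole force vectors and the rotor weight sum to zero, can be checked numerically; the pole angles and force magnitudes below are arbitrary illustrative numbers, not values from the GRC rig.

        import numpy as np

        def net_force(pole_angles_deg, pole_forces_n, rotor_weight_n):
            """Sum the attracting-pole force vectors plus gravity (acting in -y)."""
            angles = np.radians(pole_angles_deg)
            fx = np.sum(pole_forces_n * np.cos(angles))
            fy = np.sum(pole_forces_n * np.sin(angles))
            return np.array([fx, fy - rotor_weight_n])

        # Two active poles at 60 and 120 degrees, each pulling with W / (2*sin(60 deg)),
        # exactly support a rotor of weight W: the net force is ~[0, 0].
        W = 50.0
        f = W / (2.0 * np.sin(np.radians(60.0)))
        print(net_force(np.array([60.0, 120.0]), np.array([f, f]), W))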

  3. Fault tolerance analysis and applications to microwave modules and MMIC's

    NASA Astrophysics Data System (ADS)

    Boggan, Garry H.

    A project whose objective was to provide an overview of built-in-test (BIT) considerations applicable to microwave systems, modules, and MMICs (monolithic microwave integrated circuits) is discussed. Available analytical techniques and software for assessing system failure characteristics were researched, and the resulting investigation provides a review of two techniques which have applicability to microwave systems design. A system-level approach to fault tolerance and redundancy management is presented in its relationship to the subsystem/element design. An overview of the microwave BIT focus from the Air Force Integrated Diagnostics program is presented. The technical reports prepared by the GIMADS team were reviewed for applicability to microwave modules and components. A review of MIMIC (millimeter and microwave integrated circuit) program activities relative to BIT/BITE is given.

  4. Architectural issues in fault-tolerant, secure computing systems

    SciTech Connect

    Joseph, M.K.

    1988-01-01

    This dissertation explores several facets of the applicability of fault-tolerance techniques to secure computer design, these being: (1) how fault-tolerance techniques can be used on unsolved problems in computer security (e.g., computer viruses, and denial-of-service); (2) how fault-tolerance techniques can be used to support classical computer-security mechanisms in the presence of accidental and deliberate faults; and (3) the problems involved in designing a fault-tolerant, secure computer system (e.g., how computer security can degrade along with both the computational and fault-tolerance capabilities of a computer system). The approach taken in this research is almost as important as its results. It is different from current computer-security research in that a design paradigm for fault-tolerant computer design is used. This led to an extensive fault and error classification of many typical security threats. Throughout this work, a fault-tolerance perspective is taken. However, the author did not ignore basic computer-security technology. For some problems he investigated how to support and extend basic-security mechanism (e.g., trusted computing base), instead of trying to achieve the same result with purely fault-tolerance techniques.

  5. Fault tree models for fault tolerant hypercube multiprocessors

    NASA Technical Reports Server (NTRS)

    Boyd, Mark A.; Tuazon, Jezus O.

    1991-01-01

    Three candidate fault tolerant hypercube architectures are modeled, their reliability analyses are compared, and the resulting implications of these methods of incorporating fault tolerance into hypercube multiprocessors are discussed. In the course of performing the reliability analyses, the use of HARP and fault trees in modeling sequence dependent system behaviors is demonstrated.

  6. Fault-tolerant software for the FTMP

    NASA Technical Reports Server (NTRS)

    Hecht, H.; Hecht, M.

    1984-01-01

    The work reported here provides protection against software failures in the task dispatcher of the FTMP, a particularly critical portion of the system software. Faults in other system modules and application programs can be handled by similar techniques but are not covered in this effort. Goals of the work reported here are: (1) to develop provisions in the software design that will detect and mitigate software failures in the dispatcher portion of the FTMP Executive and, (2) to propose the implementation of specific software reliability measures in other parts of the system. Beyond the specific support to the FTMP project, the work reported here represents a considerable advance in the practical application of the recovery block methodology for fault tolerant software design.
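
    The recovery block methodology mentioned above follows a fixed pattern: run a primary routine, apply an acceptance test, and fall back to an alternate routine if the test fails. A generic sketch of the pattern (not the FTMP dispatcher code) is:

        def recovery_block(primary, alternates, acceptance_test, *args):
            """Try the primary, then each alternate, returning the first result that passes the test."""
            for routine in (primary, *alternates):
                try:
                    result = routine(*args)
                except Exception:
                    continue                       # a raised exception counts as a failed try
                if acceptance_test(result, *args):
                    return result                  # acceptance test passed
            raise RuntimeError("all variants failed the acceptance test")

        # Example: a deliberately faulty primary square root versus a correct alternate.
        primary = lambda x: x ** 0.5 - 1.0         # wrong on purpose
        alternate = lambda x: x ** 0.5
        accept = lambda r, x: abs(r * r - x) < 1e-9
        print(recovery_block(primary, [alternate], accept, 2.0))   # 1.4142...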

  7. Fault Tolerance and Parallel Processing for NGST

    NASA Astrophysics Data System (ADS)

    Sengupta, R.; Offenberg, J. D.; Fixsen, D. J.; Nieto-Santisteban, M. A.; Hanisch, R. J.; Stockman, H. S.; Mather, J. C.

    1999-12-01

    The Next Generation Space Telescope (NGST) Image Processing Group is developing scalable cosmic ray rejection and data compression algorithms for parallel processors as part of NASA's Remote Exploration and Experimentation (REE) Project. The primary intention of the REE project is to use commercial-off-the shelf (COTS) technology to develop scalable, low-power, fault tolerant, high performance computers in space. NGST is one of the applications selected to demonstrate the benefit of having on-board supercomputing power. Real-time cosmic ray rejection would enable us to reduce the downlink data volume by as much as two orders of magnitude by combining multiple read-outs on the spacecraft rather than downlinking them separately. The combined read-outs can be further reduced in size by applying lossy and/or lossless data compression algorithms. This work is funded by NASA's REE project, managed by JPL.

  8. Method and apparatus for fault tolerance

    NASA Technical Reports Server (NTRS)

    Masson, Gerald M. (Inventor); Sullivan, Gregory F. (Inventor)

    1993-01-01

    A method and apparatus for achieving fault tolerance in a computer system having at least a first central processing unit and a second central processing unit. The method comprises the steps of first executing a first algorithm in the first central processing unit on input which produces a first output as well as a certification trail. Next, executing a second algorithm in the second central processing unit on the input and on at least a portion of the certification trail which produces a second output. The second algorithm has a faster execution time than the first algorithm for a given input. Then, comparing the first and second outputs such that an error result is produced if the first and second outputs are not the same. The step of executing a first algorithm and the step of executing a second algorithm preferably takes place over essentially the same time period.
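
    A toy instance of the certification-trail idea claimed above: the first (slower) algorithm sorts an array and emits the permutation it applied as the trail; the second (faster) algorithm merely applies and verifies that permutation; disagreement between the two outputs signals an error. The choice of sorting as the example is an assumption for illustration.

        def first_algorithm(data):
            """Slower path: full sort, plus a certification trail (the sorting permutation)."""
            trail = sorted(range(len(data)), key=lambda i: data[i])
            output = [data[i] for i in trail]
            return output, trail

        def second_algorithm(data, trail):
            """Faster path: apply the trail and verify it is a permutation yielding sorted output."""
            output = [data[i] for i in trail]
            ok = (sorted(trail) == list(range(len(data)))
                  and all(output[i] <= output[i + 1] for i in range(len(output) - 1)))
            return output if ok else None

        data = [5, 1, 4, 2]
        out1, trail = first_algorithm(data)
        out2 = second_algorithm(data, trail)
        print("error detected" if out1 != out2 else f"agreed on {out1}")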

  9. Quantum fault-tolerant thresholds for universal concatenated schemes

    NASA Astrophysics Data System (ADS)

    Chamberland, Christopher; Jochym-O'Connor, Tomas; Laflamme, Raymond

    Fault-tolerant quantum computation uses ancillary qubits in order to protect logical data qubits while allowing for the manipulation of the quantum information without severe losses in coherence. While different models for fault-tolerant quantum computation exist, determining the ancillary qubit overhead for competing schemes remains a challenging theoretical problem. In this work, we study the fault-tolerance threshold rates of different models for universal fault-tolerant quantum computation. Namely, we provide different threshold rates for the 105-qubit concatenated coding scheme for universal computation without the need for state distillation. We study two error models: adversarial noise and depolarizing noise and provide lower bounds for the threshold in each of these error regimes. Establishing the threshold rates for the concatenated coding scheme will allow for a physical quantum resource comparison between our fault-tolerant universal quantum computation model and the traditional model using magic state distillation.

  10. Fault tolerant operation of switched reluctance machine

    NASA Astrophysics Data System (ADS)

    Wang, Wei

    The energy crisis and environmental challenges have driven industry towards more energy efficient solutions. With nearly 60% of electricity consumed by various electric machines in the industry sector, advancement in the efficiency of the electric drive system is of vital importance. An adjustable speed drive system (ASDS) provides excellent speed regulation and dynamic performance as well as dramatically improved system efficiency compared with conventional motors without electronic drives. Industry has witnessed tremendous growth in ASDS applications, not only as a driving force but also as an electric auxiliary system replacing bulky and low-efficiency auxiliary hydraulic and mechanical systems. With the vast penetration of ASDS, its fault tolerant operation capability is more widely recognized as an important feature of drive performance, especially for aerospace, automotive and other industrial drive applications demanding high reliability. The Switched Reluctance Machine (SRM), a low-cost, highly reliable electric machine with fault tolerant operation capability, has drawn substantial attention in the past three decades. Nevertheless, SRM is not free of faults. Certain faults such as converter faults, sensor faults, winding shorts, eccentricity and position sensor faults are commonly shared among all ASDS. In this dissertation, a thorough understanding of various faults and their influence on transient and steady state performance of SRM is developed via simulation and experimental study, providing necessary knowledge for fault detection and post-fault management. Lumped parameter models are established for fast real-time simulation and drive control. Based on the behavior of the faults, a fault detection scheme is developed for the purpose of fast and reliable fault diagnosis. In order to improve the SRM power and torque capacity under faults, the maximum torque per ampere excitation is conceptualized and validated through theoretical analysis and

  11. Measuring fault tolerance with the FTAPE fault injection tool

    NASA Technical Reports Server (NTRS)

    Tsai, Timothy K.; Iyer, Ravishankar K.

    1995-01-01

    This paper describes FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The major parts of the tool include a system-wide fault-injector, a workload generator, and a workload activity measurement tool. The workload creates high stress conditions on the machine. Using stress-based injection, the fault injector is able to utilize knowledge of the workload activity to ensure a high level of fault propagation. The errors/fault ratio, performance degradation, and number of system crashes are presented as measures of fault tolerance.

  12. Measuring fault tolerance with the FTAPE fault injection tool

    NASA Astrophysics Data System (ADS)

    Tsai, Timothy K.; Iyer, Ravishankar K.

    1995-05-01

    This paper describes FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The major parts of the tool include a system-wide fault-injector, a workload generator, and a workload activity measurement tool. The workload creates high stress conditions on the machine. Using stress-based injection, the fault injector is able to utilize knowledge of the workload activity to ensure a high level of fault propagation. The errors/fault ratio, performance degradation, and number of system crashes are presented as measures of fault tolerance.

  13. A fault-tolerant network architecture for integrated avionics

    NASA Technical Reports Server (NTRS)

    Butler, Bryan; Adams, Stuart

    1991-01-01

    The Army Fault-Tolerant Architecture (AFTA) under construction at the Charles Stark Draper Laboratory is an example of a highly integrated critical avionics system. The AFTA system must connect to other redundant and nonredundant systems, as well as to input/output devices. A fault-tolerant data bus (FTDB) is being developed to provide highly reliable communication between the AFTA computer and other network stations. The FTDB is being designed for Byzantine resilience and is provably capable of tolerating any single arbitrary fault. The authors describe a prototype architecture for the fault-tolerant data bus.

  14. FTMP (Fault Tolerant Multiprocessor) programmer's manual

    NASA Technical Reports Server (NTRS)

    Feather, F. E.; Liceaga, C. A.; Padilla, P. A.

    1986-01-01

    The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high level language called the Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along with many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.

  15. Fault-tolerant multichannel demultiplexer subsystems

    NASA Technical Reports Server (NTRS)

    Redinbo, Robert

    1991-01-01

    Fault tolerance in future processing and switching communication satellites is addressed by showing new methods for detecting hardware failures in the first major subsystem, the multichannel demultiplexer. An efficient method for demultiplexing frequency-slotted channels uses multirate filter banks which contain fast Fourier transform processing. All numerical processing is performed at a lower rate commensurate with the small bandwidth of each baseband channel. The integrity of the demultiplexing operations is protected by using real-number convolutional codes to compute comparable parity values which detect errors at the data sample level. High-rate, systematic convolutional codes produce parity values at a much reduced rate, and protection is achieved by generating parity values in two ways and comparing them. Parity values corresponding to each output channel are generated in parallel by a subsystem, operating even slower and in parallel with the demultiplexer, that is virtually identical to the original structure. These parity calculations may be time-shared with the same processing resources because they are so similar.

  16. Model-Based Fault Tolerant Control

    NASA Technical Reports Server (NTRS)

    Kumar, Aditya; Viassolo, Daniel

    2008-01-01

    The Model Based Fault Tolerant Control (MBFTC) task was conducted under the NASA Aviation Safety and Security Program. The goal of MBFTC is to develop and demonstrate real-time strategies to diagnose and accommodate anomalous aircraft engine events such as sensor faults, actuator faults, or turbine gas-path component damage that can lead to in-flight shutdowns, aborted takeoffs, asymmetric thrust/loss of thrust control, or engine surge/stall events. A suite of model-based fault detection algorithms was developed and evaluated. Based on the performance and maturity of the developed algorithms, two approaches were selected for further analysis: (i) multiple-hypothesis testing, and (ii) neural networks; both used residuals from an Extended Kalman Filter to detect the occurrence of the selected faults. A simple fusion algorithm was implemented to combine the results from each algorithm to obtain an overall estimate of the identified fault type and magnitude. The identification of the fault type and magnitude enabled the use of an online fault accommodation strategy to correct for the adverse impact of these faults on engine operability, thereby enabling continued engine operation in their presence. The performance of the fault detection and accommodation algorithm was extensively tested in a simulation environment.

  17. Fault-tolerance for exascale systems.

    SciTech Connect

    Riesen, Rolf E.; Varela, Maria Ruiz; Ferreira, Kurt Brian

    2010-08-01

    Periodic, coordinated, checkpointing to disk is the most prevalent fault tolerance method used in modern large-scale, capability class, high-performance computing (HPC) systems. Previous work has shown that as the system grows in size, the inherent synchronization of coordinated checkpoint/restart (CR) limits application scalability; at large node counts the application spends most of its time checkpointing instead of executing useful work. Furthermore, a single component failure forces an application restart from the last correct checkpoint. Suggested alternatives to coordinated CR include uncoordinated CR with message logging, redundant computation, and RAID-inspired, in-memory distributed checkpointing schemes. Each of these alternatives have differing overheads that are dependent on both the scale and communication characteristics of the application. In this work, using the Structural Simulation Toolkit (SST) simulator, we compare the performance characteristics of each of these resilience methods for a number of HPC application patterns on a number of proposed exascale machines. The result of this work provides valuable guidance on the most efficient resilience methods for exascale systems.
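
    For background on why coordinated checkpoint/restart stops scaling (this is a textbook first-order estimate, not a result from the report), Young's approximation puts the optimal checkpoint interval at sqrt(2 * delta * MTBF), where delta is the checkpoint cost; as node count grows the system MTBF shrinks, and the fraction of time spent checkpointing grows. All figures below are assumed for illustration.

        import math

        def young_interval(checkpoint_cost_s, system_mtbf_s):
            """Young's first-order optimal checkpoint interval: sqrt(2 * delta * MTBF)."""
            return math.sqrt(2.0 * checkpoint_cost_s * system_mtbf_s)

        def checkpoint_overhead_fraction(checkpoint_cost_s, system_mtbf_s):
            """Rough fraction of wall time spent writing checkpoints."""
            tau = young_interval(checkpoint_cost_s, system_mtbf_s)
            return checkpoint_cost_s / (tau + checkpoint_cost_s)

        NODE_MTBF_S = 5.0 * 365 * 24 * 3600            # assume a 5-year MTBF per node
        CHECKPOINT_COST_S = 600.0                      # assume a 10-minute full checkpoint
        for nodes in (10_000, 100_000, 1_000_000):
            system_mtbf = NODE_MTBF_S / nodes          # system MTBF shrinks with node count
            frac = checkpoint_overhead_fraction(CHECKPOINT_COST_S, system_mtbf)
            print(f"{nodes:>9} nodes: ~{frac:.0%} of time spent checkpointing")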

  18. An analytical coupled technique for solving nonlinear large-amplitude oscillation of a conservative system with inertia and static non-linearity.

    PubMed

    Razzak, Md Abdur; Alam, Md Shamsul

    2016-01-01

    Based on a new trial function, an analytical coupled technique (a combination of the homotopy perturbation method and the variational method) is presented to obtain the approximate frequencies and the corresponding periodic solutions of the free vibration of a conservative oscillator having inertia and static non-linearities. In some previous articles, the first- and second-order approximations for such nonlinear oscillators were determined by the same method, but the trial functions did not satisfy the initial conditions, which was a significant shortcoming of those articles. The new trial function of this paper overcomes that limitation. The first-order approximation is mainly considered here. The main advantage of the present method is that its first-order approximation gives better results than existing second-order harmonic balance methods, and it remains valid for large amplitudes of oscillation. The absolute relative error of the first-order approximate frequency in this paper is 0.00 % for the large amplitude A = 1000, whereas two different second-order harmonic balance methods give relative errors of 10.33 and 3.72 %. Thus the present method is suitable for solving the above-mentioned nonlinear oscillator. PMID:27119060
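
    For reference, a commonly studied conservative oscillator with both inertia and static non-linearity (shown as an assumed canonical form, not necessarily the exact equation treated in the article) is

        \left(1 + \varepsilon u^{2}\right)\ddot{u} + \varepsilon\, u\, \dot{u}^{2} + u + \varepsilon\, u^{3} = 0,
        \qquad u(0) = A,\quad \dot{u}(0) = 0,

    where the u^2*u'' and u*(u')^2 terms represent the inertia non-linearity, the u^3 term the static non-linearity, and the goal is an analytic approximation to the frequency as a function of the amplitude A.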

  19. Programming fault-tolerant distributed systems in Ada

    NASA Technical Reports Server (NTRS)

    Voigt, Susan J.

    1985-01-01

    Viewgraphs on the topic of programming fault-tolerant distributed systems in the Ada programming language are presented. Topics covered include project goals, Ada difficulties and solutions, testbed requirements, and virtual processors.

  20. Fault-tolerant interconnection networks for multiprocessor systems

    SciTech Connect

    Nassar, H.M.

    1989-01-01

    Interconnection networks represent the backbone of multiprocessor systems. A failure in the network, therefore, could seriously degrade the system performance. For this reason, fault tolerance has been regarded as a major consideration in interconnection network design. This thesis presents two novel techniques to provide fault tolerance capabilities to three major networks: the Baseline network, the Benes network, and the Clos network. First, the Simple Fault Tolerance Technique (SFT) is presented. The SFT technique is in fact the result of merging two widely known interconnection mechanisms: a normal interconnection network and a shared bus. This technique is most suitable for networks with small switches, such as the Baseline network and the Benes network. For the Clos network, whose switches may be too large for the SFT, another technique is developed to produce the Fault-Tolerant Clos (FTC) network. In the FTC, one switch is added to each stage. The two techniques are described and thoroughly analyzed.

  1. Fault tolerant architectures for integrated aircraft electronics systems

    NASA Technical Reports Server (NTRS)

    Levitt, K. N.; Melliar-Smith, P. M.; Schwartz, R. L.

    1983-01-01

    Work on possible architectures for future flight control computer systems is described. Ada for Fault-Tolerant Systems, the NETS Network Error-Tolerant System architecture, and voting in asynchronous systems are covered.

  2. Fault-tolerance - The survival attribute of digital systems

    NASA Technical Reports Server (NTRS)

    Avizienis, A.

    1978-01-01

    Fault-tolerance is the architectural attribute of a digital system that keeps the logic machine doing its specified tasks when its host, the physical system, suffers various kinds of failures of its components. A more general concept of fault-tolerance also includes human mistakes committed during software and hardware implementation and during man/machine interaction among the causes of faults that are to be tolerated by the logic machine. This paper discusses the concept of fault-tolerance, the reasons for its inclusion in digital system architecture, and the methods of its implementation. A chronological view of the evolution of fault-tolerant systems and an outline of some goals for its further development conclude the presentation.

  3. Analysis of typical fault-tolerant architectures using HARP

    NASA Technical Reports Server (NTRS)

    Bavuso, Salvatore J.; Bechta Dugan, Joanne; Trivedi, Kishor S.; Rothmann, Elizabeth M.; Smith, W. Earl

    1987-01-01

    Difficulties encountered in the modeling of fault-tolerant systems are discussed. The Hybrid Automated Reliability Predictor (HARP) approach to modeling fault-tolerant systems is described. The HARP is written in FORTRAN, consists of nearly 30,000 lines of code and comments, and is based on behavioral decomposition. Using the behavioral decomposition, the dependability model is divided into fault-occurrence/repair and fault/error-handling models; the characteristics and combining of these two models are examined. Examples in which the HARP is applied to the modeling of some typical fault-tolerant systems, including a local-area network, two fault-tolerant computer systems, and a flight control system, are presented.

  4. Application-Specific Fault Tolerance via Data Access Characterization

    SciTech Connect

    Ali, Nawab; Krishnamoorthy, Sriram; Govind, Niranjan; Kowalski, Karol; Sadayappan, Ponnuswamy

    2011-08-30

    Recent trends in semiconductor technology and supercomputer design predict an increasing probability of faults during an application's execution. Designing an application that is resilient to system failures requires careful evaluation of the impact of various approaches on preserving key application state. In this paper, we present our experiences in an ongoing effort to make a large computational chemistry application fault tolerant. We construct the data access signatures of key application modules to evaluate alternative fault tolerance approaches. We present the instrumentation methodology, characterization of the application modules, and evaluation of fault tolerance techniques using the information collected. The application signatures developed capture application characteristics not traditionally revealed by performance tools. We believe these can be used in the design and evaluation of runtimes beyond fault tolerance.

  5. Optimal Management of Redundant Control Authority for Fault Tolerance

    NASA Technical Reports Server (NTRS)

    Wu, N. Eva; Ju, Jianhong

    2000-01-01

    This paper is intended to demonstrate the feasibility of a solution to a fault tolerant control problem. It explains, through a numerical example, the design and the operation of a novel scheme for fault tolerant control. The fundamental principle of the scheme was formalized in [5] based on the notion of normalized nonspecificity. The novelty lies with the use of a reliability criterion for redundancy management, and therefore leads to a high overall system reliability.

  6. Fault tolerance analysis of the class of rearrangeable interconnection networks

    SciTech Connect

    Pakzad, S. . Dept. of Electrical Engineering)

    1989-08-01

    This paper analyzes the fault tolerance characteristics of a range of rearrangeable β-networks based on the concepts and the framework developed by S. Pakzad and S. Lakshmivarahan. These rearrangeable β-networks include the Benes network, the Waksman network, the Joel network, and the serial network. In addition, this paper presents a comparative analysis of the aforementioned networks according to their hardware cost, performance, and degree of fault tolerance.

  7. Fault tolerant programmable digital attitude control electronics study

    NASA Technical Reports Server (NTRS)

    Sorensen, A. A.

    1974-01-01

    The attitude control electronics mechanization study to develop a fault tolerant autonomous concept for a three axis system is reported. Programmable digital electronics are compared to general purpose digital computers. The requirements, constraints, and tradeoffs are discussed. It is concluded that: (1) general fault tolerance can be achieved relatively economically, (2) recovery times of less than one second can be obtained, (3) the number of faulty behavior patterns must be limited, and (4) adjoined processes are the best indicators of faulty operation.

  8. On the design of fault-tolerant robotic manipulator systems

    NASA Astrophysics Data System (ADS)

    Tesar, Delbert

    1993-02-01

    Robotic systems are finding increasing use in space applications. Many of these devices are going to be operational on board the Space Station Freedom. Fault tolerance has been deemed necessary because of the criticality of the tasks and the inaccessibility of the systems to maintenance and repair. Design for fault tolerance in manipulator systems is an area within robotics that is without precedence in the literature. In this paper, we will attempt to lay down the foundations for such a technology. Design for fault tolerance demands new and special approaches to design, often at considerable variance from established design practices. These design aspects, together with reliability evaluation and modeling tools, are presented. Mechanical architectures that employ protective redundancies at many levels and have a modular architecture are then studied in detail. Once a mechanical architecture for fault tolerance has been derived, the chronological stages of operational fault tolerance are investigated. Failure detection, isolation, and estimation methods are surveyed, and such methods for robot sensors and actuators are derived. Failure recovery methods are also presented for each of the protective layers of redundancy. Failure recovery tactics often span all of the layers of a control hierarchy. Thus, a unified framework for decision-making and control, which orchestrates both the nominal redundancy management tasks and the failure management tasks, has been derived. The well-developed field of fault-tolerant computers is studied next, and some design principles relevant to the design of fault-tolerant robot controllers are abstracted. Conclusions are drawn, and a road map for the design of fault-tolerant manipulator systems is laid out with recommendations for a 10 DOF arm with dual actuators at each joint.

  9. Advanced development for space robotics with emphasis on fault tolerance

    NASA Technical Reports Server (NTRS)

    Tesar, D.; Chladek, J.; Hooper, R.; Sreevijayan, D.; Kapoor, C.; Geisinger, J.; Meaney, M.; Browning, G.; Rackers, K.

    1995-01-01

    This paper describes the ongoing work in fault tolerance at the University of Texas at Austin. The paper describes the technical goals the group is striving to achieve and includes a brief description of the individual projects focusing on fault tolerance. The ultimate goal is to develop and test technology applicable to all future missions of NASA (lunar base, Mars exploration, planetary surveillance, space station, etc.).

  10. A second generation experiment in fault-tolerant software

    NASA Technical Reports Server (NTRS)

    Knight, J. C.

    1986-01-01

    The primary goal was to determine whether the application of fault tolerance to software increases its reliability if the cost of production is the same as for an equivalent non-fault-tolerant version derived from the same requirements specification. Software development protocols are discussed. The feasibility of adapting the technique of N-fold Modular Redundancy with majority voting to software design fault tolerance was studied.

  11. Active resonator reset in the non-linear regime of circuit QED to improve multi-round quantum parity checks

    NASA Astrophysics Data System (ADS)

    Bultink, Cornelis Christiaan; Rol, M. A.; Fu, X.; Dikken, B. C. S.; de Sterke, J. C.; Vermeulen, R. F. L.; Schouten, R. N.; Bruno, A.; Bertels, K. L. M.; Dicarlo, L.

    Reliable quantum parity measurements are essential for fault-tolerant quantum computing. In quantum processors based on circuit QED, the fidelity and speed of multi-round quantum parity checks using an ancillary qubit can be compromised by photons remaining in the readout resonator after measurement, leading to ancilla dephasing and gate errors. The challenge of quickly depleting photons is greatest when maximizing the single-shot readout fidelity involves strong pulses that drive the resonators non-linear. We experimentally demonstrate the numerical optimization of counter pulses for fast photon depletion in this non-analytic regime. We compare two methods, one using digital feedback and another running open loop. We assess both methods by minimizing the average number of rounds to an ancilla measurement error. We acknowledge funding from the EU FP7 project SCALEQIT, FOM, and an ERC Synergy Grant.

  12. Steps toward fault-tolerant quantum chemistry.

    SciTech Connect

    Taube, Andrew Garvin

    2010-05-01

    Developing quantum chemistry programs on the coming generation of exascale computers will be a difficult task. The programs will need to be fault-tolerant and minimize the use of global operations. This work explores, in the context of quantum chemistry, the use of a task-based model that takes a data-centric approach to allocating work to different processes. After introducing the key problems that appear when trying to parallelize a complicated quantum chemistry method such as coupled-cluster theory, we discuss the implications of that model as it pertains to the computational kernel of a coupled-cluster program: matrix multiplication. Also, we discuss the extensions that would be required to build a full coupled-cluster program using the task-based model. Current programming models for high-performance computing are fault-intolerant and use global operations. Those properties are unsustainable as computers scale to millions of CPUs; instead, one must recognize that these systems will be hierarchical in structure, prone to constant faults, and that global operations will be infeasible. The FAST-OS HARE project is introducing a scale-free computing model to address these issues. This model is hierarchical and fault-tolerant by design, allows for clean overlap of computation and communication (reducing the network load), does not require checkpointing, and avoids the complexity of many HPC runtimes. Development of an algorithm within this model requires a change in focus from imperative programming to a data-centric approach. Quantum chemistry (QC) algorithms, in particular electronic structure methods, are an ideal test bed for this computing model. These methods describe the distribution of electrons in a molecule, which determines the properties of the molecule. The computational cost of these methods is high, scaling quartically or higher in the size of the molecule, which is why QC applications are major users of HPC resources. The complexity of these algorithms means that
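
    A toy sketch of the task-based, data-centric idea applied to the matrix-multiplication kernel: the work is expressed as independent output-tile tasks that idle workers pull from a pool, so a failed task could simply be resubmitted. The tiling, pool size, and use of a Python thread pool are illustrative assumptions, not the FAST-OS HARE model.

        import numpy as np
        from concurrent.futures import ThreadPoolExecutor

        def tile_task(A, B, i, j, ts):
            # Compute one output tile C[i:i+ts, j:j+ts] = A[i:i+ts, :] @ B[:, j:j+ts].
            return i, j, A[i:i + ts, :] @ B[:, j:j + ts]

        def task_based_matmul(A, B, ts=64, workers=4):
            n, m = A.shape[0], B.shape[1]
            C = np.zeros((n, m))
            tasks = [(i, j) for i in range(0, n, ts) for j in range(0, m, ts)]
            with ThreadPoolExecutor(max_workers=workers) as pool:
                futures = [pool.submit(tile_task, A, B, i, j, ts) for i, j in tasks]
                for f in futures:                 # a lost tile would only need resubmission
                    i, j, block = f.result()
                    C[i:i + ts, j:j + ts] = block
            return C

        A, B = np.random.rand(128, 128), np.random.rand(128, 128)
        print(np.allclose(task_based_matmul(A, B), A @ B))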

  13. Reliability of voting in fault-tolerant software systems for small output spaces

    NASA Technical Reports Server (NTRS)

    Mcallister, David F.; Sun, Chien-En; Vouk, Mladen A.

    1987-01-01

    Under a voting strategy in a fault-tolerant software system there is a difference between correctness and agreement. An independent N-version programming reliability model that distinguishes between correctness and agreement is proposed for treating small output spaces. System reliability is investigated using analytical relationships and simulation. A consensus majority voting strategy is proposed, and its performance is analyzed and compared with other voting strategies. The consensus majority strategy automatically adapts the voting to different component reliability and output-space cardinality characteristics. It is shown that the absolute majority voting strategy provides a lower bound on the reliability provided by consensus majority, and the 2-of-n voting strategy an upper bound. If r is the cardinality of the output space, it is proved that 1/r is a lower bound on the average reliability of fault-tolerant system components below which the system reliability begins to deteriorate as more versions are added.
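
    As a rough illustration of the difference between these voting strategies (a minimal sketch, not the paper's reliability model), the Python fragment below implements absolute-majority, 2-of-n, and consensus-majority (plurality) voting over the outputs of n versions; the tie-breaking rule and the example votes are assumptions.

        from collections import Counter

        def absolute_majority(outputs):
            # Accept a value only if strictly more than half of the versions agree on it.
            value, count = Counter(outputs).most_common(1)[0]
            return value if count > len(outputs) / 2 else None   # None = no decision

        def two_of_n(outputs):
            # Accept a value as soon as at least two versions agree on it.
            value, count = Counter(outputs).most_common(1)[0]
            return value if count >= 2 else None

        def consensus_majority(outputs):
            # Accept the most frequent value (plurality), breaking ties arbitrarily;
            # this adapts to small output spaces where exact majorities are rare.
            value, _ = Counter(outputs).most_common(1)[0]
            return value

        # Five versions voting over a small (binary) output space.
        votes = [1, 1, 0, 1, 0]
        print(absolute_majority(votes), two_of_n(votes), consensus_majority(votes))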

  14. Block QCA Fault-Tolerant Logic Gates

    NASA Technical Reports Server (NTRS)

    Firjany, Amir; Toomarian, Nikzad; Modarres, Katayoon

    2003-01-01

    Suitably patterned arrays (blocks) of quantum-dot cellular automata (QCA) have been proposed as fault-tolerant universal logic gates. These block QCA gates could be used to realize the potential of QCA for further miniaturization, reduction of power consumption, increase in switching speed, and increased degree of integration of very-large-scale integrated (VLSI) electronic circuits. The limitations of conventional VLSI circuitry, the basic principle of operation of QCA, and the potential advantages of QCA-based VLSI circuitry were described in several NASA Tech Briefs articles, namely Implementing Permutation Matrices by Use of Quantum Dots (NPO-20801), Vol. 25, No. 10 (October 2001), page 42; Compact Interconnection Networks Based on Quantum Dots (NPO-20855), Vol. 27, No. 1 (January 2003), page 32; Bit-Serial Adder Based on Quantum Dots (NPO-20869), Vol. 27, No. 1 (January 2003), page 35; and Hybrid VLSI/QCA Architecture for Computing FFTs (NPO-20923), which follows this article. To recapitulate the principle of operation (greatly oversimplified because of the limitation on space available for this article): a quantum-dot cellular automaton contains four quantum dots positioned at or between the corners of a square cell. The cell contains two extra mobile electrons that can tunnel (in the quantum-mechanical sense) between neighboring dots within the cell. The Coulomb repulsion between the two electrons tends to make them occupy antipodal dots in the cell. For an isolated cell, there are two energetically equivalent arrangements (denoted polarization states) of the extra electrons. The cell polarization is used to encode binary information. Because the polarization of a nonisolated cell depends on Coulomb-repulsion interactions with neighboring cells, universal logic gates and binary wires could be constructed, in principle, by arraying QCA of suitable design in suitable patterns. Heretofore, researchers have recognized two major obstacles to realization of QCA

  15. Fault tolerant hypercube computer system architecture

    NASA Technical Reports Server (NTRS)

    Madan, Herb S. (Inventor); Chow, Edward (Inventor)

    1989-01-01

    A fault-tolerant multiprocessor computer system of the hypercube type, comprising a hierarchy of computers of like kind which can be functionally substituted for one another as necessary, is disclosed. Communication between the working nodes is via one communications network, while communication between the working nodes and watch dog nodes and load balancing nodes higher in the structure is via another communications network separate from the first. A typical branch of the hierarchy reporting to a master node or host computer comprises a plurality of first computing nodes; a first network of message conducting paths for interconnecting the first computing nodes as a hypercube, the first network providing a path for message transfer between the first computing nodes; a first watch dog node; and a second network of message conducting paths for connecting the first computing nodes to the first watch dog node independent of the first network, the second network providing an independent path for test-message and reconfiguration-affecting transfers between the first computing nodes and the first watch dog node. There is, additionally, a plurality of second computing nodes; a third network of message conducting paths for interconnecting the second computing nodes as a hypercube, the third network providing a path for message transfer between the second computing nodes; a fourth network of message conducting paths for connecting the second computing nodes to the first watch dog node independent of the third network, the fourth network providing an independent path for test-message and reconfiguration-affecting transfers between the second computing nodes and the first watch dog node; and a first multiplexer disposed between the first watch dog node and the second and fourth networks for allowing the first watch dog node to selectively communicate with individual ones of the computing nodes through the second and fourth networks; as well as a second watch dog node

  16. Advanced information processing system: The Army Fault-Tolerant Architecture detailed design overview

    NASA Technical Reports Server (NTRS)

    Harper, Richard E.; Babikyan, Carol A.; Butler, Bryan P.; Clasen, Robert J.; Harris, Chris H.; Lala, Jaynarayan H.; Masotto, Thomas K.; Nagle, Gail A.; Prizant, Mark J.; Treadwell, Steven

    1994-01-01

    The Army Avionics Research and Development Activity (AVRADA) is pursuing programs that would enable effective and efficient management of the large amounts of situational data that occur during tactical rotorcraft missions. The Computer Aided Low Altitude Night Helicopter Flight Program has identified automated Terrain Following/Terrain Avoidance, Nap of the Earth (TF/TA, NOE) operation as a key enabling technology for advanced tactical rotorcraft to enhance mission survivability and mission effectiveness. The processing of critical information at low altitudes with short reaction times is life-critical and mission-critical, necessitating an ultra-reliable, high-throughput computing platform for dependable service for flight control, fusion of sensor data, route planning, near-field/far-field navigation, and obstacle avoidance operations. To address these needs the Army Fault Tolerant Architecture (AFTA) is being designed and developed. This computer system is based upon the Fault Tolerant Parallel Processor (FTPP) developed by the Charles Stark Draper Laboratory (CSDL). AFTA is a hard real-time, Byzantine fault-tolerant parallel processor programmed in the Ada language. This document describes the results of the Detailed Design (Phases 2 and 3 of a 3-year project) of the AFTA development. It contains detailed descriptions of the program objectives, the TF/TA NOE application requirements, architecture, hardware design, operating systems design, systems performance measurements, and analytical models.

  17. Extending quantum error correction: New continuous measurement protocols and improved fault-tolerant overhead

    NASA Astrophysics Data System (ADS)

    Ahn, Charlene Sonja

    Quantum mechanical applications range from quantum computers to quantum key distribution to teleportation. In these applications, quantum error correction is extremely important for protecting quantum states against decoherence. Here I present two main results regarding quantum error correction protocols. The first main topic I address is the development of continuous-time quantum error correction protocols via combination with techniques from quantum control. These protocols rely on weak measurement and Hamiltonian feedback instead of the projective measurements and unitary gates usually assumed by canonical quantum error correction. I show that a subclass of these protocols can be understood as a quantum feedback protocol, and analytically analyze the general case using the stabilizer formalism; I show that in this case perfect feedback can perfectly protect a stabilizer subspace. I also show through numerical simulations that another subclass of these protocols does better than canonical quantum error correction when the time between corrections is limited. The second main topic is development of improved overhead results for fault-tolerant computation. In particular, through analysis of topological quantum error correcting codes, it will be shown that the required blowup in depth of a noisy circuit performing a fault-tolerant computation can be reduced to a factor of O(log log L), an improvement over previous results. Showing this requires investigation into a local method of performing fault-tolerant correction on a topological code of arbitrary dimension.

  18. Fault-tolerant wait-free shared objects

    NASA Technical Reports Server (NTRS)

    Jayanti, Prasad; Chandra, Tushar D.; Toueg, Sam

    1992-01-01

    A concurrent system consists of processes communicating via shared objects, such as shared variables, queues, etc. The concept of wait-freedom was introduced to cope with process failures: each process that accesses a wait-free object is guaranteed to get a response even if all the other processes crash. However, if a wait-free object 'crashes,' all the processes that access that object are prevented from making progress. In this paper, we introduce the concept of fault-tolerant wait-free objects, and study the problem of implementing them. We give a universal method to construct fault-tolerant wait-free objects, for all types of 'responsive' failures (including one in which faulty objects may 'lie'). In sharp contrast, we prove that many common and interesting types (such as queues, sets, and test&set) have no fault-tolerant wait-free implementations even under the most benign of the 'non-responsive' types of failure. We also introduce several concepts and techniques that are central to the design of fault-tolerant concurrent systems: the concepts of self-implementation and graceful degradation, and techniques to automatically increase the fault-tolerance of implementations. We prove matching lower bounds on the resource complexity of most of our algorithms.

  19. A fault-tolerant intelligent robotic control system

    NASA Technical Reports Server (NTRS)

    Marzwell, Neville I.; Tso, Kam Sing

    1993-01-01

    This paper describes the concept, design, and features of a fault-tolerant intelligent robotic control system being developed for space and commercial applications that require high dependability. The comprehensive strategy integrates system level hardware/software fault tolerance with task level handling of uncertainties and unexpected events for robotic control. The underlying architecture for system level fault tolerance is the distributed recovery block which protects against application software, system software, hardware, and network failures. Task level fault tolerance provisions are implemented in a knowledge-based system which utilizes advanced automation techniques such as rule-based and model-based reasoning to monitor, diagnose, and recover from unexpected events. The two level design provides tolerance of two or more faults occurring serially at any level of command, control, sensing, or actuation. The potential benefits of such a fault tolerant robotic control system include: (1) a minimized potential for damage to humans, the work site, and the robot itself; (2) continuous operation with a minimum of uncommanded motion in the presence of failures; and (3) more reliable autonomous operation providing increased efficiency in the execution of robotic tasks and decreased demand on human operators for controlling and monitoring the robotic servicing routines.

  20. Software reliability through fault-avoidance and fault-tolerance

    NASA Technical Reports Server (NTRS)

    Vouk, Mladen A.; Mcallister, David F.

    1991-01-01

    Twenty independently developed but functionally equivalent software versions were used to investigate and compare empirically some properties of N-version programming, Recovery Block, and Consensus Recovery Block, using the majority and consensus voting algorithms. These schemes were also compared with another hybrid fault-tolerant scheme called Acceptance Voting, using dynamic versions of consensus and majority voting. Consensus voting provides adaptation of the voting strategy to varying component reliability, failure correlation, and output space characteristics. Since failure correlation among versions effectively reduces the cardinality of the space in which the voter makes decisions, consensus voting is usually preferable to simple majority voting in any fault-tolerant system. When versions have considerably different reliabilities, the version with the best reliability will perform better than any of the fault-tolerant techniques.

  1. Multiple Embedded Processors for Fault-Tolerant Computing

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

    2005-01-01

    A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.

  2. Measurement and analysis of operating system fault tolerance

    NASA Technical Reports Server (NTRS)

    Lee, I.; Tang, D.; Iyer, R. K.

    1992-01-01

    This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.

  3. Garbage collection: an exercise in distributed, fault-tolerant programming

    SciTech Connect

    Vestal, S.C.

    1987-01-01

    Two garbage-collection algorithms are presented to reclaim unused storage in object-oriented systems implemented on local area networks. The algorithms are fault-tolerant and allow parallel, incremental collection in an object address space distributed throughout the system. The two approaches allow multiple collectors, so some unused storage can be reclaimed in partitioned networks. The first method makes use of fault-tolerant reference counts together with an algorithm to collect cycles of objects that would otherwise remain unreclaimed. The second method adapts a parallel collector so that it can be used to collect subspaces of the entire network address space. Throughout this work, the concern is with a methodology for developing distributed, parallel, fault-tolerant programs, and with the suitability of object-oriented systems for such applications.

  4. Performance of fault-tolerant diagnostics in the hypercube systems

    SciTech Connect

    Ghafoor, A.; Sole, P.

    1989-08-01

    In this paper, they introduce the concept of fault-tolerant self-diagnosis for distributed systems and show that there exists a performance tradeoff between the complexity of a self-diagnostic algorithm and the level of fault tolerance inherited by the algorithm. For the study, they select hypercube systems and show that designing an optimal algorithm for such systems has an equivalent coding-theory formulation which belongs to the class of NP-hard problems. Subsequently, they propose an 'efficient' diagnostic scheme for these systems and study the performance tradeoff of the proposed algorithm, which is based on a combinatorial structure called the Hadamard matrix. The authors make essential use of its properties of symmetrical partitioning and covering in hypercube networks. Using known translate weight distributions, they evaluate the tradeoff between the fault tolerance and traffic complexity of the proposed diagnostic algorithm for hypercubes of small sizes. An interesting compromise is exhibited for the hypercube of arbitrary size.

  5. Fault tolerant architectures for integrated aircraft electronics systems, task 2

    NASA Technical Reports Server (NTRS)

    Levitt, K. N.; Melliar-Smith, P. M.; Schwartz, R. L.

    1984-01-01

    The architectural basis for an advanced fault-tolerant on-board computer to succeed the current generation of fault-tolerant computers is examined. The network error-tolerant system architecture is studied, with particular attention to intercluster configurations and communication protocols, and to refined reliability estimates. The diagnosis of faults, so that appropriate choices for reconfiguration can be made, is discussed. The analysis relates particularly to the recognition of transient faults in a system with tasks at many levels of priority. The demand-driven data-flow architecture, which appears to have possible application in fault-tolerant systems, is described, and work investigating the feasibility of automatic generation of aircraft flight control programs from abstract specifications is reported.

  6. Rule-based fault-tolerant flight control

    NASA Technical Reports Server (NTRS)

    Handelman, Dave

    1988-01-01

    Fault tolerance has always been a desirable characteristic of aircraft. The ability to withstand unexpected changes in aircraft configuration has a direct impact on the ability to complete a mission effectively and safely. The possible synergistic effects of combining techniques of modern control theory, statistical hypothesis testing, and artificial intelligence in the attempt to provide failure accommodation for aircraft are investigated. This effort has resulted in the definition of a theory for rule based control and a system for development of such a rule based controller. Although presented here in response to the goal of aircraft fault tolerance, the rule based control technique is applicable to a wide range of complex control problems.

  7. Architectural concepts and redundancy techniques in fault-tolerant computers

    NASA Technical Reports Server (NTRS)

    Rennels, D. A.

    1974-01-01

    This paper presents a description of redundancy techniques employed in the design of fault-tolerant computers, and a discussion of the effects of functional requirements, technology constraints, and cost considerations which enter into the choice of these techniques. The STAR computer, developed at the Jet Propulsion Laboratory for long-duration planetary spacecraft missions, is discussed along with several later fault-tolerant computer designs. The class of computers described in this paper employs dynamic redundancy, i.e., the machine is divided into a set of submodules, each with standby spares; a special hard core monitor unit detects and diagnoses faults, and effects automated recovery by replacing failed parts.

  8. Single-Shot Fault-Tolerant Quantum Error Correction

    NASA Astrophysics Data System (ADS)

    Bombín, Héctor

    2015-07-01

    Conventional quantum error correcting codes require multiple rounds of measurements to detect errors with enough confidence in fault-tolerant scenarios. Here, I show that for suitable topological codes, a single round of local measurements is enough. This feature is generic and is related to self-correction and confinement phenomena in the corresponding quantum Hamiltonian model. Three-dimensional gauge color codes exhibit this single-shot feature, which also applies to initialization and gauge fixing. Assuming the time for efficient classical computations to be negligible, this yields a topological fault-tolerant quantum computing scheme where all elementary logical operations can be performed in constant time.

  9. Fault tolerant kinematic control of hyper-redundant manipulators

    NASA Technical Reports Server (NTRS)

    Bedrossian, Nazareth S.

    1994-01-01

    Hyper-redundant spatial manipulators possess fault-tolerant features because of their redundant structure. The kinematic control of these manipulators is investigated with special emphasis on fault-tolerant control. The manipulator tasks are viewed in end-effector space while actuator commands are in joint space, requiring an inverse kinematic algorithm to generate joint-angle commands from the end-effector ones. The rate-inverse kinematic control algorithm presented in this paper utilizes the pseudoinverse to accommodate joint motor failures. An optimal scale factor for the robust inverse is derived.

  10. Reconfigurable tree architectures using subtree oriented fault tolerance

    NASA Technical Reports Server (NTRS)

    Lowrie, Matthew B.

    1987-01-01

    An approach to the design of reconfigurable tree architectures is presented in which spare processors are allocated at the leaves. The approach is unique in that spares are associated with subtrees and sharing of spares between these subtrees can occur. The Subtree Oriented Fault Tolerance (SOFT) approach is more reliable than previous approaches capable of tolerating link and switch failures, for both single-chip and multichip tree implementations, while reducing redundancy in terms of both spare processors and links. VLSI layout is O(n) for binary trees, and the approach is directly extensible to N-ary trees and to fault tolerance through performance degradation.

  11. Enhanced Fault-Tolerant Quantum Computing in d-Level Systems

    NASA Astrophysics Data System (ADS)

    Campbell, Earl T.

    2014-12-01

    Error-correcting codes protect quantum information and form the basis of fault-tolerant quantum computing. Leading proposals for fault-tolerant quantum computation require codes with an exceedingly rare property, a transversal non-Clifford gate. Codes with the desired property are presented for d-level qudit systems with prime d. The codes use n = d - 1 qudits and can detect up to ~d/3 errors. We quantify the performance of these codes for one approach to quantum computation known as magic-state distillation. Unlike prior work, we find performance is always enhanced by increasing d.

  12. SIFT - Design and analysis of a fault-tolerant computer for aircraft control. [Software Implemented Fault Tolerant systems

    NASA Technical Reports Server (NTRS)

    Wensley, J. H.; Lamport, L.; Goldberg, J.; Green, M. W.; Levitt, K. N.; Melliar-Smith, P. M.; Shostak, R. E.; Weinstock, C. B.

    1978-01-01

    SIFT (Software Implemented Fault Tolerance) is an ultrareliable computer for critical aircraft control applications that achieves fault tolerance by the replication of tasks among processing units. The main processing units are off-the-shelf minicomputers, with standard microcomputers serving as the interface to the I/O system. Fault isolation is achieved by using a specially designed redundant bus system to interconnect the processing units. Error detection and analysis and system reconfiguration are performed by software. Iterative tasks are redundantly executed, and the results of each iteration are voted upon before being used. Thus, any single failure in a processing unit or bus can be tolerated with triplication of tasks, and subsequent failures can be tolerated after reconfiguration. Independent execution by separate processors means that the processors need only be loosely synchronized, and a novel fault-tolerant synchronization method is described.

  13. Trends in reliability modeling technology for fault tolerant systems

    NASA Technical Reports Server (NTRS)

    Bavuso, S. J.

    1979-01-01

    Reliability modeling for fault-tolerant avionic computing systems was developed. The modeling of large systems, involving issues of state size and complexity, fault coverage, and practical computation, was discussed. A novel technique which provides a tool for studying the reliability of systems with nonconstant failure rates is presented. Fault latency, which may provide a method of obtaining vital latent-fault data, is measured.

  14. Electronic Power Switch for Fault-Tolerant Networks

    NASA Technical Reports Server (NTRS)

    Volp, J.

    1987-01-01

    Power field-effect transistors reduce energy waste and simplify interconnections. Current switch containing power field-effect transistor (PFET) placed in series with each load in fault-tolerant power-distribution system. If system includes several loads and supplies, switches placed in series with adjacent loads and supplies. System of switches protects against overloads and losses of individual power sources.

  15. Adding Fault Tolerance to NPB Benchmarks Using ULFM

    SciTech Connect

    Parchman, Zachary W; Vallee, Geoffroy R; Naughton III, Thomas J; Engelmann, Christian; Bernholdt, David E; Scott, Stephen L

    2016-01-01

    In the world of high-performance computing, fault tolerance and application resilience are becoming some of the primary concerns because of increasing hardware failures and memory corruptions. While the research community has been investigating various options, from system-level solutions to application-level solutions, standards such as the Message Passing Interface (MPI) are also starting to include such capabilities. The current proposal for MPI fault tolerance is centered around the User-Level Failure Mitigation (ULFM) concept, which provides means for fault detection and recovery of the MPI layer. This approach does not address application-level recovery, which is currently left to application developers. In this work, we present a modification of some of the benchmarks of the NAS Parallel Benchmark (NPB) suite to include support for the ULFM capabilities as well as application-level strategies and mechanisms for application-level failure recovery. As such, we present: (i) an application-level library to checkpoint and restore data, (ii) extensions of NPB benchmarks for fault tolerance based on different strategies, (iii) a fault injection tool, and (iv) some preliminary results that show the impact of such fault-tolerance strategies on the application execution.
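
    The application-level checkpoint-and-restore library itself is not reproduced here; the Python sketch below only illustrates the general pattern (periodically saving key application state and restoring it when a restarted process comes back). The file name, format, and checkpoint interval are all assumptions.

        import os
        import pickle

        CKPT_FILE = "npb_ckpt.pkl"   # hypothetical checkpoint file name

        def save_checkpoint(iteration, state):
            # Write the iteration counter and key application state atomically.
            tmp = CKPT_FILE + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump({"iteration": iteration, "state": state}, f)
            os.replace(tmp, CKPT_FILE)   # atomic rename: a crash never leaves a torn file

        def load_checkpoint():
            # Return (iteration, state) from the last checkpoint, or (0, None) if none exists.
            if not os.path.exists(CKPT_FILE):
                return 0, None
            with open(CKPT_FILE, "rb") as f:
                data = pickle.load(f)
            return data["iteration"], data["state"]

        # Usage pattern inside an iterative solver: checkpoint every few iterations,
        # and on (re)start resume from whatever the last checkpoint recorded.
        start, state = load_checkpoint()
        for it in range(start, 100):
            state = (state or 0) + it        # stand-in for one solver iteration
            if it % 10 == 0:
                save_checkpoint(it + 1, state)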

  16. Fault-free performance validation of fault-tolerant multiprocessors

    NASA Technical Reports Server (NTRS)

    Czeck, Edward W.; Feather, Frank E.; Grizzaffi, Ann Marie; Segall, Zary Z.; Siewiorek, Daniel P.

    1987-01-01

    A validation methodology for testing the performance of fault-tolerant computer systems was developed and applied to the Fault-Tolerant Multiprocessor (FTMP) at NASA-Langley's AIRLAB facility. This methodology was claimed to be general enough to apply to any ultrareliable computer system. The goal of this research was to extend the validation methodology and to demonstrate the robustness of the validation methodology by its more extensive application to NASA's Fault-Tolerant Multiprocessor System (FTMP) and to the Software Implemented Fault-Tolerance (SIFT) Computer System. Furthermore, the performance of these two multiprocessors was compared by conducting similar experiments. An analysis of the results shows high level language instruction execution times for both SIFT and FTMP were consistent and predictable, with SIFT having greater throughput. At the operating system level, FTMP consumes 60% of the throughput for its real-time dispatcher and 5% on fault-handling tasks. In contrast, SIFT consumes 16% of its throughput for the dispatcher, but consumes 66% in fault-handling software overhead.

  17. Abstractions for Fault-Tolerant Distributed System Verification

    NASA Technical Reports Server (NTRS)

    Pike, Lee S.; Maddalon, Jeffrey M.; Miner, Paul S.; Geser, Alfons

    2004-01-01

    Four kinds of abstraction for the design and analysis of fault tolerant distributed systems are discussed. These abstractions concern system messages, faults, fault masking voting, and communication. The abstractions are formalized in higher order logic, and are intended to facilitate specifying and verifying such systems in higher order theorem provers.

  18. Study of fault tolerant software technology for dynamic systems

    NASA Technical Reports Server (NTRS)

    Caglayan, A. K.; Zacharias, G. L.

    1985-01-01

    The major aim of this study is to investigate the feasibility of using systems-based failure detection isolation and compensation (FDIC) techniques in building fault-tolerant software and extending them, whenever possible, to the domain of software fault tolerance. First, it is shown that systems-based FDIC methods can be extended to develop software error detection techniques by using system models for software modules. In particular, it is demonstrated that systems-based FDIC techniques can yield consistency checks that are easier to implement than acceptance tests based on software specifications. Next, it is shown that systems-based failure compensation techniques can be generalized to the domain of software fault tolerance in developing software error recovery procedures. Finally, the feasibility of using fault-tolerant software in flight software is investigated. In particular, possible system and version instabilities, and functional performance degradation that may occur in N-Version programming applications to flight software are illustrated. Finally, a comparative analysis of N-Version and recovery block techniques in the context of generic blocks in flight software is presented.

  19. Clouds: A support architecture for fault tolerant, distributed systems

    NASA Technical Reports Server (NTRS)

    Dasgupta, P.; Leblanco, R. J., Jr.

    1986-01-01

    Clouds is a distributed operating system providing support for fault tolerance, location independence, reconfiguration, and transactions. The implementation paradigm uses objects and nested actions as building blocks. Subsystems and applications that can be supported by Clouds to further enhance the performance and utility of the system are also discussed.

  20. Cost and benefits optimization model for fault-tolerant aircraft electronic systems

    NASA Technical Reports Server (NTRS)

    1983-01-01

    The factors involved in economic assessment of fault tolerant systems (FTS) and fault tolerant flight control systems (FTFCS) are discussed. Algorithms for optimization and economic analysis of FTFCS are documented.

  1. Non Linear Conjugate Gradient

    Energy Science and Technology Software Center (ESTSC)

    2006-11-17

    Software that simulates and inverts electromagnetic field data for subsurface electrical properties (electrical conductivity) of geological media. The software treats data produced by a time-harmonic source field excitation arising from the following antenna geometries: loops and grounded bipoles, as well as point electric and magnetic dipoles. The inversion process is carried out using a non-linear conjugate gradient optimization scheme, which minimizes the misfit between field data and model data using a least-squares criterion. The software is an upgrade of the code NLCGCS_MP ver 1.0. The upgrade includes the following components: incorporation of new 1D field-sourcing routines to more accurately simulate the 3D electromagnetic field for arbitrary geological media, and treatment of generalized finite-length transmitting antenna geometry (antennas with vertical and horizontal component directions). In addition, the software has been upgraded to treat transverse anisotropy in electrical conductivity.
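
    As a toy illustration of the optimization step only (not of the electromagnetic forward modeling), the sketch below applies a non-linear conjugate gradient scheme with a Polak-Ribiere update to a generic least-squares misfit; the fixed step length and the linear test problem are assumptions.

        import numpy as np

        def nlcg(misfit_grad, m0, steps=50, alpha=1e-2):
            # Minimize a misfit via non-linear conjugate gradients (Polak-Ribiere+ update).
            # misfit_grad(m) must return the gradient of the misfit at model m.
            m = m0.copy()
            g = misfit_grad(m)
            d = -g                               # first direction: steepest descent
            for _ in range(steps):
                m = m + alpha * d                # fixed step length (no line search) for brevity
                g_new = misfit_grad(m)
                beta = max(0.0, g_new @ (g_new - g) / (g @ g + 1e-30))
                d = -g_new + beta * d
                g = g_new
            return m

        # Toy least-squares misfit ||G m - d_obs||^2 with a known linear operator G.
        G = np.array([[2.0, 0.0], [1.0, 3.0]])
        d_obs = np.array([1.0, 2.0])
        grad = lambda m: 2.0 * G.T @ (G @ m - d_obs)
        print(nlcg(grad, np.zeros(2)))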

  2. The IEEE eighteenth international symposium on fault-tolerant computing (Digest of Papers)

    SciTech Connect

    Not Available

    1988-01-01

    These proceedings collect papers on fault detection and computers. Topics include: software failure behavior, fault-tolerant distributed programs, parallel simulation of faults, concurrent built-in self-test techniques, fault-tolerant parallel processor architectures, probabilistic fault diagnosis, fault tolerance in hypercube processors, and cellular automata modeling.

  3. Fault Tolerance Middleware for a Multi-Core System

    NASA Technical Reports Server (NTRS)

    Some, Raphael R.; Springer, Paul L.; Zima, Hans P.; James, Mark; Wagner, David A.

    2012-01-01

    Fault Tolerance Middleware (FTM) provides a framework to run on a dedicated core of a multi-core system and handles detection of single-event upsets (SEUs), and the responses to those SEUs, occurring in an application running on multiple cores of the processor. This software was written expressly for a multi-core system and can support different kinds of fault strategies, such as introspection, algorithm-based fault tolerance (ABFT), and triple modular redundancy (TMR). It focuses on providing fault tolerance for the application code, and represents the first step in a plan to eventually include fault tolerance in message passing and the FTM itself. In the multi-core system, the FTM resides on a single, dedicated core, separate from the cores used by the application. This is done in order to isolate the FTM from application faults and to allow it to swap out any application core for a substitute. The structure of the FTM consists of an interface to a fault tolerant strategy module, a responder module, a fault manager module, an error factory, and an error mapper that determines the severity of the error. In the present reference implementation, the only fault tolerant strategy implemented is introspection. The introspection code waits for an application node to send an error notification to it. It then uses the error factory to create an error object, and at this time, a severity level is assigned to the error. The introspection code uses its built-in knowledge base to generate a recommended response to the error. Responses might include ignoring the error, logging it, rolling back the application to a previously saved checkpoint, swapping in a new node to replace a bad one, or restarting the application. The original error and recommended response are passed to the top-level fault manager module, which invokes the response. The responder module also notifies the introspection module of the generated response. This provides additional information to the
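
    A highly simplified sketch of the introspection flow described above (error notification, error object with an assigned severity, recommended response, responder) is shown below; the class names, severity levels, and response set are illustrative assumptions, not the actual FTM interfaces.

        from dataclasses import dataclass

        @dataclass
        class Error:
            node: int
            kind: str
            severity: str          # assumed levels: "low", "medium", "high"

        def error_factory(node, kind):
            # Create an error object and assign a severity (the error-mapper role).
            severity = {"parity": "low", "hang": "medium", "crash": "high"}.get(kind, "medium")
            return Error(node, kind, severity)

        def recommend_response(err):
            # Introspection knowledge base: map an error to a recommended response.
            return {"low": "log", "medium": "rollback_to_checkpoint", "high": "swap_node"}[err.severity]

        def fault_manager(err, response):
            # Top-level fault manager: invoke the response (stubbed as a print here).
            print(f"node {err.node}: {err.kind} ({err.severity}) -> {response}")

        # One pass of the loop: an application node reports a fault notification.
        err = error_factory(node=3, kind="hang")
        fault_manager(err, recommend_response(err))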

  4. Fault Tolerant Characteristics of Artificial Neural Network Electronic Hardware

    NASA Technical Reports Server (NTRS)

    Zee, Frank

    1995-01-01

    The fault tolerant characteristics of analog-VLSI artificial neural network (with 32 neurons and 532 synapses) chips are studied by exposing them to high energy electrons, high energy protons, and gamma ionizing radiations under biased and unbiased conditions. The biased chips became nonfunctional after receiving a cumulative dose of less than 20 krads, while the unbiased chips only started to show degradation with a cumulative dose of over 100 krads. As the total radiation dose increased, all the components demonstrated graceful degradation. The analog sigmoidal function of the neuron became steeper (increase in gain), current leakage from the synapses progressively shifted the sigmoidal curve, and the digital memory of the synapses and the memory addressing circuits began to gradually fail. From these radiation experiments, we can learn how to modify certain designs of the neural network electronic hardware without using radiation-hardening techniques to increase its reliability and fault tolerance.

  5. Fault Injection Campaign for a Fault Tolerant Duplex Framework

    NASA Technical Reports Server (NTRS)

    Sacco, Gian Franco; Ferraro, Robert D.; von llmen, Paul; Rennels, Dave A.

    2007-01-01

    Fault tolerance is an efficient approach adopted to avoid or reduce the damage of a system failure. In this work we present the results of a fault injection campaign we conducted on the Duplex Framework (DF). The DF is software developed by the UCLA group [1, 2] that takes a fault-tolerant approach and allows two replicas of the same process to run on two different nodes of a commercial off-the-shelf (COTS) computer cluster. A third process, running on a different node, constantly monitors the results computed by the two replicas and restarts the two replica processes if an inconsistency in their computations is detected. This approach is very cost-efficient and can be adopted to control processes on spacecraft, where the fault rate produced by cosmic rays is not very high.
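
    A minimal, single-machine sketch of the duplex pattern described above (assumed names; ordinary local processes stand in for cluster nodes): two replicas run the same computation, and a monitor compares their results and restarts both if they disagree.

        import multiprocessing as mp

        def compute():
            # Stand-in for the replicated application computation.
            return sum(range(1000))

        def replica(conn):
            # Each replica runs the computation and reports its result to the monitor.
            conn.send(compute())
            conn.close()

        def monitor(max_restarts=3):
            # Launch two replicas, compare their results, and restart both on mismatch.
            for attempt in range(max_restarts):
                pipes, procs = [], []
                for _ in range(2):
                    parent, child = mp.Pipe()
                    p = mp.Process(target=replica, args=(child,))
                    p.start()
                    pipes.append(parent)
                    procs.append(p)
                results = [pipe.recv() for pipe in pipes]
                for p in procs:
                    p.join()
                if results[0] == results[1]:
                    return results[0]            # consistent: accept the result
                print("mismatch on attempt", attempt, "- restarting replicas")
            raise RuntimeError("replicas never agreed")

        if __name__ == "__main__":
            print(monitor())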

  6. A benchmark for fault tolerant flight control evaluation

    NASA Astrophysics Data System (ADS)

    Smaili, H.; Breeman, J.; Lombaerts, T.; Stroosma, O.

    2013-12-01

    A large transport aircraft simulation benchmark (REconfigurable COntrol for Vehicle Emergency Return - RECOVER) has been developed within the GARTEUR (Group for Aeronautical Research and Technology in Europe) Flight Mechanics Action Group 16 (FM-AG(16)) on Fault Tolerant Control (2004-2008) for the integrated evaluation of fault detection and identification (FDI) and reconfigurable flight control strategies. The benchmark includes a suitable set of assessment criteria and failure cases, based on reconstructed accident scenarios, to assess the potential of new adaptive control strategies to improve aircraft survivability. The application of reconstruction and modeling techniques, based on accident flight data, has resulted in high-fidelity nonlinear aircraft and fault models to evaluate new Fault Tolerant Flight Control (FTFC) concepts and their real-time performance to accommodate in-flight failures.

  7. Fault-tolerant software for aircraft control systems

    NASA Technical Reports Server (NTRS)

    1978-01-01

    Concepts for software to implement real time aircraft control systems on a centralized digital computer were discussed. A fault tolerant software structure employing functionally redundant routines with concurrent error detection was proposed for critical control functions involving safety of flight and landing. A degraded recovery block concept was devised to allow collocation of critical and noncritical software modules within the same control structure. The additional computer resources required to implement the proposed software structure for a representative set of aircraft control functions were discussed. It was estimated that approximately 30 percent more memory space is required to implement the total set of control functions. A reliability model for the fault tolerant software was described and parametric estimates of failure rate were made.
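
    The recovery block structure mentioned above can be sketched as follows: a primary routine, one or more functionally redundant alternates, and an acceptance test that decides whether a result may be used. The square-root example and the test tolerance are placeholders, not the flight software.

        def recovery_block(x, routines, acceptance_test):
            # Try each functionally redundant routine in order and return the first
            # result that passes the acceptance test; otherwise signal failure.
            for routine in routines:
                try:
                    result = routine(x)
                except Exception:
                    continue                  # a crashed alternate counts as a failed attempt
                if acceptance_test(x, result):
                    return result
            raise RuntimeError("all alternates failed the acceptance test")

        def primary(x):
            return x ** 0.5                   # primary routine

        def alternate(x):
            r = x or 1.0                      # functionally redundant alternate (Newton iteration)
            for _ in range(30):
                r = 0.5 * (r + x / r)
            return r

        accept = lambda x, r: abs(r * r - x) < 1e-6   # acceptance test
        print(recovery_block(2.0, [primary, alternate], accept))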

  8. Fault-tolerant clock synchronization validation methodology. [in computer systems

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Palumbo, Daniel L.; Johnson, Sally C.

    1987-01-01

    A validation method for the synchronization subsystem of a fault-tolerant computer system is presented. The high reliability requirement of flight-crucial systems precludes the use of most traditional validation methods. The method presented utilizes formal design proof to uncover design and coding errors and experimentation to validate the assumptions of the design proof. The experimental method is described and illustrated by validating the clock synchronization system of the Software Implemented Fault Tolerance computer. The design proof of the algorithm includes a theorem that defines the maximum skew between any two nonfaulty clocks in the system in terms of specific system parameters. Most of these parameters are deterministic. One crucial parameter is the upper bound on the clock read error, which is stochastic. The probability that this upper bound is exceeded is calculated from data obtained by the measurement of system parameters. This probability is then included in a detailed reliability analysis of the system.
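
    The stochastic part of the argument, estimating the probability that the measured clock read error exceeds its assumed upper bound, can be sketched as below; the sample distribution and the bound are invented for illustration and are not the SIFT measurements.

        import random

        def exceedance_probability(samples, bound):
            # Empirical estimate of P(read error > bound) from measured error samples.
            return sum(1 for e in samples if e > bound) / len(samples)

        # Hypothetical read-error samples (microseconds) and an assumed design bound.
        random.seed(0)
        samples = [abs(random.gauss(0.0, 5.0)) for _ in range(100_000)]
        bound = 20.0
        print("estimated P(error > bound) =", exceedance_probability(samples, bound))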

  9. Combining dynamical decoupling with fault-tolerant quantum computation

    SciTech Connect

    Ng, Hui Khoon; Preskill, John; Lidar, Daniel A.

    2011-07-15

    We study how dynamical decoupling (DD) pulse sequences can improve the reliability of quantum computers. We prove upper bounds on the accuracy of DD-protected quantum gates and derive sufficient conditions for DD-protected gates to outperform unprotected gates. Under suitable conditions, fault-tolerant quantum circuits constructed from DD-protected gates can tolerate stronger noise and have a lower overhead cost than fault-tolerant circuits constructed from unprotected gates. Our accuracy estimates depend on the dynamics of the bath that couples to the quantum computer and can be expressed either in terms of the operator norm of the bath's Hamiltonian or in terms of the power spectrum of bath correlations; we explain in particular how the performance of recursively generated concatenated pulse sequences can be analyzed from either viewpoint. Our results apply to Hamiltonian noise models with limited spatial correlations.

  10. Fault Tolerance in ZigBee Wireless Sensor Networks

    NASA Technical Reports Server (NTRS)

    Alena, Richard; Gilstrap, Ray; Baldwin, Jarren; Stone, Thom; Wilson, Pete

    2011-01-01

    Wireless sensor networks (WSN) based on the IEEE 802.15.4 Personal Area Network standard are finding increasing use in the home automation and emerging smart energy markets. The network and application layers, based on the ZigBee 2007 PRO Standard, provide a convenient framework for component-based software that supports customer solutions from multiple vendors. This technology is supported by System-on-a-Chip solutions, resulting in extremely small and low-power nodes. The Wireless Connections in Space Project addresses the aerospace flight domain for both flight-critical and non-critical avionics. WSNs provide the inherent fault tolerance required for aerospace applications utilizing such technology. The team from Ames Research Center has developed techniques for assessing the fault tolerance of ZigBee WSNs challenged by radio frequency (RF) interference or WSN node failure.

  11. Fault recovery characteristics of the fault tolerant multi-processor

    NASA Technical Reports Server (NTRS)

    Padilla, Peter A.

    1990-01-01

    The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.

  12. Fault-tolerant building-block computer study

    NASA Technical Reports Server (NTRS)

    Rennels, D. A.

    1978-01-01

    Ultra-reliable core computers are required for improving the reliability of complex military systems. Such computers can provide reliable fault diagnosis, failure circumvention, and, in some cases, serve as an automated repairman for their host systems. A small set of building-block circuits which can be implemented as single very-large-scale integration devices, and which can be used with off-the-shelf microprocessors and memories to build self-checking computer modules (SCCM), is described. Each SCCM is a microcomputer which is capable of detecting its own faults during normal operation and is designed to communicate with other identical modules over one or more MIL-STD-1553A buses. Several SCCMs can be connected into a network with backup spares to provide fault-tolerant operation, i.e., automated recovery from faults. Alternative fault-tolerant SCCM configurations are discussed along with the cost and reliability associated with their implementation.

  13. Survey of fault-tolerant multistage networks and comparison to the extra stage cube

    SciTech Connect

    Adams, G.B. III; Siegel, H.J.

    1984-01-01

    A variety of fault-tolerant multistage interconnection networks for parallel processing systems that have been proposed in the literature are surveyed. A network is fault-tolerant if it can continue to meet its fault tolerance criterion in the presence of one or more failures of the type(s) allowed by its fault model. Significant differences in fault models and fault-tolerance criteria exist among various fault-tolerant networks. This makes direct comparison of these networks difficult. In analyzing the networks, this paper compares the various models and assesses the effect of choosing a common model and criterion. Network characteristics such as degree of fault tolerance, routing control method, and permutation capability are discussed. The networks surveyed and compared to the extra stage cube are the modified baseline, augmented delta, f-network, enhanced inverse augmented data manipulator, gamma, fault-tolerant Benes, and beta-networks. 21 references.

  14. Bounded-time fault-tolerant rule-based systems

    NASA Technical Reports Server (NTRS)

    Browne, James C.; Emerson, Allen; Gouda, Mohamed; Miranker, Daniel; Mok, Aloysius; Rosier, Louis

    1990-01-01

    Two systems concepts are introduced: bounded response-time and self-stabilization in the context of rule-based programs. These concepts are essential for the design of rule-based programs which must be highly fault tolerant and perform in a real time environment. The mechanical analysis of programs for these two properties is discussed. The techniques are used to analyze a NASA application.

  15. Formal Techniques for Synchronized Fault-Tolerant Systems

    NASA Technical Reports Server (NTRS)

    DiVito, Ben L.; Butler, Ricky W.

    1992-01-01

    We present the formal verification of synchronizing aspects of the Reliable Computing Platform (RCP), a fault-tolerant computing system for digital flight control applications. The RCP uses NMR-style redundancy to mask faults and internal majority voting to purge the effects of transient faults. The system design has been formally specified and verified using the EHDM verification system. Our formalization is based on an extended state machine model incorporating snapshots of local processors' clocks.

  16. Design methods for fault-tolerant finite state machines

    NASA Technical Reports Server (NTRS)

    Niranjan, Shailesh; Frenzel, James F.

    1993-01-01

    VLSI electronic circuits are increasingly being used in space-borne applications where high levels of radiation may induce faults, known as single event upsets. In this paper we review the classical methods of designing fault tolerant digital systems, with an emphasis on those methods which are particularly suitable for VLSI-implementation of finite state machines. Four methods are presented and will be compared in terms of design complexity, circuit size, and estimated circuit delay.

  17. Decomposition in reliability analysis of fault-tolerant systems

    NASA Technical Reports Server (NTRS)

    Trivedi, K. S.; Geist, R. M.

    1983-01-01

    The existing approaches to reliability modeling are briefly reviewed. An examination of the limitations of the existing approaches in modeling ultrareliable fault-tolerant systems illustrates the need to use decomposition techniques. The notion of behavioral decomposition is introduced for dealing with reliability models with a large number of states, and a series of examples is presented. The CARE (computer-aided reliability estimation) and HARP (hybrid automated reliability predictor) approaches to reliability are discussed.

  18. Using certification trails to achieve software fault tolerance

    NASA Technical Reports Server (NTRS)

    Sullivan, Gregory F.; Masson, Gerald M.

    1993-01-01

    A conceptually novel and powerful technique to achieve fault tolerance in hardware and software systems is introduced. When used for software fault tolerance, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are compared; if they agree, they are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance was formalized and illustrated by applying it to the fundamental problem of finding a minimum spanning tree. Cases in which the second phase can be run concurrently with the first and act as a monitor are discussed. The certification trail approach was compared to other approaches to fault tolerance. Because of space limitations, we have omitted examples of our technique applied to the Huffman tree and convex hull problems; these can be found in the full version of this paper.
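
    A small illustration of the certification-trail idea, using sorting instead of the paper's minimum-spanning-tree example: the first phase solves the problem and leaves the sorting permutation as a trail, and the second, simpler phase re-derives the answer from the trail and must reject any corrupted trail. The function names and data are illustrative assumptions.

        def phase1_sort(data):
            # Solve the problem and leave a certification trail (the sorting permutation).
            trail = sorted(range(len(data)), key=lambda i: data[i])
            return [data[i] for i in trail], trail

        def phase2_sort(data, trail):
            # Re-solve using the trail with a cheaper check; a bad trail must yield an error.
            if sorted(trail) != list(range(len(data))):
                return None                    # trail is not a permutation: error
            result = [data[i] for i in trail]
            if any(result[i] > result[i + 1] for i in range(len(result) - 1)):
                return None                    # trail does not produce sorted order: error
            return result

        data = [5, 2, 9, 1]
        first, trail = phase1_sort(data)
        second = phase2_sort(data, trail)
        print("accepted" if second is not None and first == second else "error indicated")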

  19. Fault tolerant sequential circuits using sequence invariant state machines

    NASA Technical Reports Server (NTRS)

    Alahmad, M.; Whitaker, S.

    1991-01-01

    The idea of introducing redundancy to improve the reliability of digital systems originates from papers published in the 1950's. Since then, redundancy has been recognized as a realistic means for constructing reliable systems. A method using redundancy to reconfigure the Sequence Invariant State Machine (SISM) to achieve fault tolerance is introduced. This new architecture is most useful in space applications, where recovery rather than replacement of faulty modules is the only means of maintenance.

  20. The art of fault-tolerant system reliability modeling

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Johnson, Sally C.

    1990-01-01

    A step-by-step tutorial of the methods and tools used for the reliability analysis of fault-tolerant systems is presented. Emphasis is on the representation of architectural features in mathematical models. Details of the mathematical solution of complex reliability models are not presented. Instead the use of several recently developed computer programs--SURE, ASSIST, STEM, PAWS--which automate the generation and solution of these models is described.

  1. Validation of a fault-tolerant clock synchronization system

    NASA Technical Reports Server (NTRS)

    Butler, R. W.; Johnson, S. C.

    1984-01-01

    A validation method for the synchronization subsystem of a fault tolerant computer system is investigated. The method combines formal design verification with experimental testing. The design proof reduces the correctness of the clock synchronization system to the correctness of a set of axioms which are experimentally validated. Since the reliability requirements are often extreme, requiring the estimation of extremely large quantiles, an asymptotic approach to estimation in the tail of a distribution is employed.

  2. A fault-tolerant one-way quantum computer

    SciTech Connect

    Raussendorf, R. . E-mail: rraussendorf@perimeterinstitute.ca; Harrington, J.; Goyal, K.

    2006-09-15

    We describe a fault-tolerant one-way quantum computer on cluster states in three dimensions. The presented scheme uses methods of topological error correction resulting from a link between cluster states and surface codes. The error threshold is 1.4% for local depolarizing error and 0.11% for each source in an error model with preparation, gate, storage, and measurement errors.

  3. Fault-tolerant Landau-Zener quantum gates

    SciTech Connect

    Hicke, C.; Santos, L. F.; Dykman, M. I.

    2006-01-15

    We present a method to perform fault-tolerant single-qubit gate operations using Landau-Zener tunneling. In a single Landau-Zener pulse, the qubit transition frequency is varied in time so that it passes through the frequency of the radiation field. We show that a simple three-pulse sequence allows eliminating errors in the gate up to the third order in errors in the qubit energies or the radiation frequency.

  4. ROBUS-2: A Fault-Tolerant Broadcast Communication System

    NASA Technical Reports Server (NTRS)

    Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.

    2005-01-01

    The Reliable Optical Bus (ROBUS) is the core communication system of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER), a general-purpose fault-tolerant integrated modular architecture currently under development at NASA Langley Research Center. The ROBUS is a time-division multiple access (TDMA) broadcast communication system with medium access control by means of a time-indexed communication schedule. ROBUS-2 is a developmental version of the ROBUS providing guaranteed fault-tolerant services to the attached processing elements (PEs), in the presence of a bounded number of faults. These services include message broadcast (Byzantine Agreement), dynamic communication schedule update, clock synchronization, and distributed diagnosis (group membership). The ROBUS also features fault-tolerant startup and restart capabilities. ROBUS-2 is tolerant to internal as well as PE faults, and incorporates a dynamic self-reconfiguration capability driven by the internal diagnostic system. This version of the ROBUS is intended for laboratory experimentation and demonstrations of the capability to reintegrate failed nodes, dynamically update the communication schedule, and tolerate and recover from correlated transient faults.

  5. Data-driven Fault Tolerance for Work Stealing Computations

    SciTech Connect

    Ma, Wenjing; Krishnamoorthy, Sriram

    2012-06-25

    Checkpoint-restart approaches to fault tolerance typically roll back all the processes to the previous checkpoint in the event of a failure. Work stealing is a promising technique to dynamically tolerate variations in the execution environment, including faults, system noise, and energy constraints. In this paper, we present fault tolerance mechanisms for task parallel computations, a popular computation idiom, employing work stealing. The computation is organized as a collection of tasks with data in a global address space. The completion of data operations, rather than the actual messages, is tracked to derive an idempotent data store. This information is used to accurately identify the tasks to be re-executed, and therefore to recompute only the lost data, even in the presence of random work stealing. We consider three recovery schemes that present distinct trade-offs -- lazy recovery with potentially increased re-execution cost, immediate collective recovery with associated synchronization overheads, and noncollective recovery enabled by additional communication. We employ distributed work stealing to dynamically rebalance the tasks on the live processes and evaluate the three schemes using candidate application benchmarks. We demonstrate that the overheads (space and time) of the fault tolerance mechanism are low, the costs incurred due to failures are small, and the overheads decrease with per-process work at scale.
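
    As a rough illustration of the data-driven recovery idea (not the actual work-stealing runtime of the paper; task names, the store, and the failure set are made up), the sketch below tracks completed data writes in an idempotent store so that after a failure only the tasks whose output blocks were lost are re-executed.

    ```python
    # Hedged sketch: recompute only lost data by tracking completed writes, not messages.
    def run_tasks(tasks, store, completed):
        for tid, fn in tasks.items():
            if tid in completed:                 # idempotent: finished work is never redone
                continue
            store[tid] = fn()                    # write the result to the global address space
            completed.add(tid)

    tasks = {f"t{i}": (lambda i=i: i * i) for i in range(6)}
    store, completed = {}, set()
    run_tasks(tasks, store, completed)

    # Simulated failure: the blocks owned by a failed process are lost.
    lost = {"t2", "t4"}
    for tid in lost:
        store.pop(tid, None)
        completed.discard(tid)

    run_tasks(tasks, store, completed)           # recomputes only t2 and t4
    assert store == {f"t{i}": i * i for i in range(6)}
    ```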

  6. Faster quantum chemistry simulation on fault-tolerant quantum computers

    NASA Astrophysics Data System (ADS)

    Cody Jones, N.; Whitfield, James D.; McMahon, Peter L.; Yung, Man-Hong; Van Meter, Rodney; Aspuru-Guzik, Alán; Yamamoto, Yoshihisa

    2012-11-01

    Quantum computers can in principle simulate quantum physics exponentially faster than their classical counterparts, but some technical hurdles remain. We propose methods which substantially improve the performance of a particular form of simulation, ab initio quantum chemistry, on fault-tolerant quantum computers; these methods generalize readily to other quantum simulation problems. Quantum teleportation plays a key role in these improvements and is used extensively as a computing resource. To improve execution time, we examine techniques for constructing arbitrary gates which perform substantially faster than circuits based on the conventional Solovay-Kitaev algorithm (Dawson and Nielsen 2006 Quantum Inform. Comput. 6 81). For a given approximation error ɛ, arbitrary single-qubit gates can be produced fault-tolerantly and using a restricted set of gates in time which is O(log(1/ɛ)) or O(log log(1/ɛ)); with sufficient parallel preparation of ancillas, constant average depth is possible using a method we call programmable ancilla rotations. Moreover, we construct and analyze efficient implementations of first- and second-quantized simulation algorithms using the fault-tolerant arbitrary gates and other techniques, such as implementing various subroutines in constant time. A specific example we analyze is the ground-state energy calculation for lithium hydride.

  7. Active Fault Tolerant Control for Ultrasonic Piezoelectric Motor

    NASA Astrophysics Data System (ADS)

    Boukhnifer, Moussa

    2012-07-01

    Ultrasonic piezoelectric motor technology is an important system component in integrated mechatronics devices working under extreme operating conditions. Due to these constraints, robustness and performance of the control interfaces should be taken into account in the motor design. In this paper, we apply a new architecture for a fault tolerant control using Youla parameterization for an ultrasonic piezoelectric motor. The distinguishing feature of the proposed controller architecture is that it shows structurally how the controller design for performance and robustness may be done separately, which has the potential to overcome the conflict between performance and robustness in the traditional feedback framework. A fault tolerant control architecture includes two parts: one part for performance and the other part for robustness. The controller design works in such a way that the feedback control system will be solely controlled by the proportional plus double-integral PI2 performance controller for a nominal model without disturbances, and the H∞ robustification controller will only be activated in the presence of uncertainties or external disturbances. The simulation results demonstrate the effectiveness of the proposed fault tolerant control architecture.
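
    The split between a performance path and a robustness path can be illustrated with a much simpler stand-in than the Youla-parameterized design of the paper. In the hedged sketch below, a PI2 controller runs on a nominal first-order plant while a robust correction is activated only when the plant/model residual exceeds a threshold; the plant, gains, and threshold are all invented for illustration.

    ```python
    # Simplified stand-in for the two-part architecture (not the paper's Youla parameterization):
    # the nominal PI2 path always runs, the robust path engages only when a residual appears.
    a, b = 0.95, 0.05                      # nominal first-order plant x+ = a*x + b*u + d
    kp, ki1, ki2, k_rob, thresh = 1.5, 0.4, 0.02, 0.5, 0.05

    x, x_model, i1, i2 = 0.0, 0.0, 0.0, 0.0
    for k in range(400):
        r = 1.0                            # step reference
        d = 0.2 if k >= 200 else 0.0       # external disturbance appears mid-run
        e = r - x
        i1 += e
        i2 += i1
        u = kp * e + ki1 * i1 + ki2 * i2   # PI2 performance controller (nominal path)
        residual = x - x_model             # mismatch with respect to the nominal model
        if abs(residual) > thresh:         # robust correction engages only under disturbance
            u -= k_rob * residual
        x_model = a * x_model + b * u      # nominal model runs in parallel
        x = a * x + b * u + d              # true plant
    print("tracking error at end:", 1.0 - x)
    ```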

  8. Evaluation of reliability modeling tools for advanced fault tolerant systems

    NASA Technical Reports Server (NTRS)

    Baker, Robert; Scheper, Charlotte

    1986-01-01

    The Computer Aided Reliability Estimation (CARE III) and Automated Reliability Interactive Estimation System (ARIES 82) reliability tools for application to advanced fault-tolerant aerospace systems were evaluated. To determine reliability modeling requirements, the evaluation focused on the Draper Laboratories' Advanced Information Processing System (AIPS) architecture as an example architecture for fault-tolerant aerospace systems. Advantages and limitations were identified for each reliability evaluation tool. The CARE III program was designed primarily for analyzing ultrareliable flight control systems. The ARIES 82 program's primary use was to support university research and teaching. Neither CARE III nor ARIES 82 was suited for determining the reliability of complex nodal networks of the type used to interconnect processing sites in the AIPS architecture. It was concluded that ARIES was not suitable for modeling advanced fault tolerant systems. It was further concluded that subject to some limitations (the difficulty in modeling systems with unpowered spare modules, systems where equipment maintenance must be considered, systems where failure depends on the sequence in which faults occurred, and systems where more than two near-coincident faults must be considered), CARE III is best suited for evaluating the reliability of advanced fault-tolerant systems for air transport.

  9. Scalable and Fault Tolerant Failure Detection and Consensus

    SciTech Connect

    Katti, Amogh; Di Fatta, Giuseppe; Naughton III, Thomas J; Engelmann, Christian

    2015-01-01

    Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and perfect synchronization in achieving global consensus.
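
    A toy push-gossip simulation (illustrative only, not the two protocols of the paper) conveys why the number of gossip cycles needed to agree on the failed-process list grows roughly logarithmically with system size.

    ```python
    # Toy push-gossip spread of a failure list; cycle counts grow roughly like log2(n).
    import random, math

    def gossip_cycles(n_alive, failed):
        views = [set() for _ in range(n_alive)]
        views[0] = set(failed)                    # one process detects the failures first
        cycles = 0
        while any(v != set(failed) for v in views):
            cycles += 1
            for i in range(n_alive):
                j = random.randrange(n_alive)     # pick a random gossip partner
                views[j] |= views[i]              # push what we know about failures
        return cycles

    random.seed(1)
    for n in (16, 128, 1024):
        print(n, gossip_cycles(n, failed={"rank-7"}), "cycles (log2 n =", round(math.log2(n), 1), ")")
    ```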

  10. Resource requirements for a fault-tolerant quantum Fourier transform

    NASA Astrophysics Data System (ADS)

    Goto, Hayato

    2014-11-01

    We investigate resource requirements for a fault-tolerant quantum Fourier transform. The quantum Fourier transform is a basic subroutine for quantum algorithms which provide an exponential speedup over known classical ones, such as Shor's algorithm for factoring. To implement single-qubit rotations required for a quantum Fourier transform in a fault-tolerant manner, we consider two types of approaches: gate synthesis and state distillation. While the gate synthesis approximates single-qubit rotations with basic quantum operations, the state distillation allows one to perform single-qubit rotations for a quantum Fourier transform exactly. It is unknown, however, which approach is better for a quantum Fourier transform. Here we develop a state-distillation method optimized for a quantum Fourier transform and compare this performance with those of state-of-the-art techniques for gate synthesis without and with ancillary states (ancillas). The performance is evaluated with the resource requirement for a quantum Fourier transform. The resource is measured by the total number of π/8 gates, denoted by T, which is called the T count. Contrary to the expectation, the T count for the state distillation is considerably larger than those for the ancilla-free and ancilla-assisted gate synthesis. Thus, we conclude that the ancilla-assisted gate synthesis is a better approach to a fault-tolerant quantum Fourier transform.

  11. Algorithm-dependent fault tolerance for distributed computing

    SciTech Connect

    Hough, P. D.; Goldsby, M. E.; Walsh, E. J.

    2000-02-01

    Large-scale distributed systems assembled from commodity parts, like CPlant, have become common tools in the distributed computing world. Because of their size and diversity of parts, these systems are prone to failures. Applications that are being run on these systems have not been equipped to efficiently deal with failures, nor is there vendor support for fault tolerance. Thus, when a failure occurs, the application crashes. While most programmers make use of checkpoints to allow for restarting of their applications, this is cumbersome and incurs substantial overhead. In many cases, there are more efficient and more elegant ways in which to address failures. The goal of this project is to develop a software architecture for the detection of and recovery from faults in a cluster computing environment. The detection phase relies on the latest techniques developed in the fault tolerance community. Recovery is being addressed in an application-dependent manner, thus allowing the programmer to take advantage of algorithmic characteristics to reduce the overhead of fault tolerance. This architecture will allow large-scale applications to be more robust in high-performance computing environments that are comprised of clusters of commodity computers such as CPlant and SMP clusters.

  12. Exploiting data representation for fault tolerance

    DOE PAGESBeta

    Hoemmen, Mark Frederick; Elliott, J.; Sandia National Lab.; Mueller, F.

    2015-01-06

    Incorrect computer hardware behavior may corrupt intermediate computations in numerical algorithms, possibly resulting in incorrect answers. Prior work models misbehaving hardware by randomly flipping bits in memory. We start by accepting this premise, and present an analytic model for the error introduced by a bit flip in an IEEE 754 floating-point number. We then relate this finding to the linear algebra concepts of normalization and matrix equilibration. In particular, we present a case study illustrating that normalizing both vector inputs of a dot product minimizes the probability of a single bit flip causing a large error in the dot product's result. Moreover, the absolute error is either less than one or very large, which allows detection of large errors. Then, we apply this to the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase of GMRES, and show that when the matrix is equilibrated, the absolute error is bounded above by one.
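
    The bit-flip error model can be reproduced in a few lines. The hedged sketch below normalizes both inputs of a small dot product to unit length, flips each bit of one operand in turn, and tallies how many flips give an absolute error below one versus a large one; the vectors are arbitrary examples, not data from the paper.

    ```python
    # Single bit flips in an IEEE 754 double inside a normalized dot product.
    import struct

    def flip_bit(x, bit):
        (bits,) = struct.unpack("<Q", struct.pack("<d", x))
        (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
        return y

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    u, v = [3.0, 4.0, 12.0], [1.0, 2.0, 2.0]
    unit = lambda w: [x / dot(w, w) ** 0.5 for x in w]
    u, v = unit(u), unit(v)
    clean = dot(u, v)

    small, large = 0, 0
    for bit in range(64):
        corrupted = [flip_bit(u[0], bit)] + u[1:]       # a single upset in one operand
        err = abs(dot(corrupted, v) - clean)
        small += err < 1.0
        large += err >= 1.0
    print("flips with |error| < 1:", small, " flips with |error| >= 1:", large)
    ```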

  13. Exploiting data representation for fault tolerance

    SciTech Connect

    Hoemmen, Mark Frederick; Elliott, J.; Mueller, F.

    2015-01-06

    Incorrect computer hardware behavior may corrupt intermediate computations in numerical algorithms, possibly resulting in incorrect answers. Prior work models misbehaving hardware by randomly flipping bits in memory. We start by accepting this premise, and present an analytic model for the error introduced by a bit flip in an IEEE 754 floating-point number. We then relate this finding to the linear algebra concepts of normalization and matrix equilibration. In particular, we present a case study illustrating that normalizing both vector inputs of a dot product minimizes the probability of a single bit flip causing a large error in the dot product's result. Moreover, the absolute error is either less than one or very large, which allows detection of large errors. Then, we apply this to the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase of GMRES, and show that when the matrix is equilibrated, the absolute error is bounded above by one.

  14. The non-linear MSW equation

    NASA Astrophysics Data System (ADS)

    Thomson, Mark J.; McKellar, Bruce H. J.

    1991-04-01

    A simple, non-linear generalization of the MSW equation is presented and its analytic solution is outlined. The orbits of the polarization vector are shown to be periodic, and to lie on a sphere. Their non-trivial flow patterns fall into two topological categories, the more complex of which can become chaotic if perturbed.

  15. Evaluation Applied to Reliability Analysis of Reconfigurable, Highly Reliable, Fault-Tolerant, Computing Systems for Avionics

    NASA Technical Reports Server (NTRS)

    Migneault, G. E.

    1979-01-01

    Emulation techniques are proposed as a solution to a difficulty arising in the analysis of the reliability of highly reliable computer systems for future commercial aircraft. The difficulty, viz., the lack of credible precision in reliability estimates obtained by analytical modeling techniques, is established. The difficulty is shown to be an unavoidable consequence of: (1) a high reliability requirement so demanding as to make system evaluation by use testing infeasible, (2) a complex system design technique, fault tolerance, (3) system reliability dominated by errors due to flaws in the system definition, and (4) elaborate analytical modeling techniques whose precision outputs are quite sensitive to errors of approximation in their input data. The technique of emulation is described, indicating how its input is a simple description of the logical structure of a system and its output is the consequent behavior. The use of emulation techniques is discussed for pseudo-testing systems to evaluate bounds on the parameter values needed for the analytical techniques.

  16. Engineering Non-Classical Light with Non-Linear Microwaveguides

    NASA Astrophysics Data System (ADS)

    Grimsmo, Arne; Clerk, Aashish; Blais, Alexandre

    The quest for ever-increasing fidelity and scalability in the measurement of superconducting qubits to be used for fault-tolerant quantum computing has recently led to the development of near quantum-limited broadband phase-preserving amplifiers in the microwave regime. These devices are, however, more than just amplifiers: They are sources of high-quality, broadband two-mode squeezed light. We show how bottom-up engineering of Josephson-junction-embedded waveguides can be used to design novel squeezing spectra. Furthermore, the entanglement in the two-mode squeezed output field can be imprinted onto quantum systems coupled to the device's output. These broadband microwave amplifiers constitute a realization of non-linear waveguide QED, a very interesting playground for non-equilibrium many-body physics.

  17. Energy dissipation and error probability in fault-tolerant binary switching

    PubMed Central

    Fashami, Mohammad Salehi; Atulasimha, Jayasimha; Bandyopadhyay, Supriyo

    2013-01-01

    The potential energy profile of an ideal binary switch is a symmetric double well. Switching between the wells without energy dissipation requires time-modulating the height of the potential barrier separating the wells and tilting the profile towards the desired well at the precise juncture when the barrier disappears. This, however, demands perfect timing synchronization and is therefore fault-intolerant even in the absence of noise. A fault-tolerant strategy that requires no time modulation of the barrier (and hence no timing synchronization) switches by tilting the profile by an amount at least equal to the barrier height and dissipates at least that amount of energy in abrupt switching. Here, we present a third strategy that requires a time-modulated barrier but no timing synchronization. It is therefore fault-tolerant, error-free in the absence of thermal noise, and yet it dissipates arbitrarily small energy in a noise-free environment since an arbitrarily small tilt is required for slow switching. This case is exemplified with stress-induced switching of a shape-anisotropic single-domain soft nanomagnet dipole-coupled to a hard magnet. When thermal noise is present, we show analytically that the minimum energy dissipated to switch in this scheme is ~2kT ln(1/p) [p = switching error probability]. PMID:24220310
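
    The quoted bound is easy to evaluate numerically. The short sketch below computes E_min ~ 2kT ln(1/p) at room temperature for a few illustrative switching error probabilities.

    ```python
    # Quick numerical check of the dissipation bound E_min ~ 2 kT ln(1/p).
    import math

    k_B, T = 1.380649e-23, 300.0                    # J/K, room temperature
    for p in (1e-6, 1e-9, 1e-15):
        e_min = 2 * k_B * T * math.log(1.0 / p)     # joules per switching event
        print(f"p = {p:g}:  E_min ~ {e_min:.2e} J  ({e_min / (k_B * T):.1f} kT)")
    ```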

  18. Non-linear oscillations

    NASA Astrophysics Data System (ADS)

    Hagedorn, P.

    The mathematical pendulum is used to provide a survey of free and forced oscillations in damped and undamped systems. This simple model is employed to present illustrations for and comparisons between the various approximation schemes. A summary of the Liapunov stability theory is provided. The first and the second method of Liapunov are explained for autonomous as well as for nonautonomous systems. Here, a basic familiarity with the theory of linear oscillations is assumed. La Salle's theorem about the stability of invariant domains is explained in terms of illustrative examples. Self-excited oscillations are examined, taking into account such oscillations in mechanical and electrical systems, analytical approximation methods for the computation of self-excited oscillations, analytical criteria for the existence of limit cycles, forced oscillations in self-excited systems, and self-excited oscillations in systems with several degrees of freedom. Attention is given to Hamiltonian systems and an introduction to the theory of optimal control is provided.

  19. Prediction of pressure drawdown in gas reservoirs using a semi-analytical solution of the non-linear gas flow equation

    SciTech Connect

    Mattar, L.; Adegbesan, L.O.

    1980-01-01

    The differential equation for flow of gases in a porous medium is nonlinear and cannot be solved by strictly analytical methods. Previous studies in the literature have obtained analytical solutions to this equation by linearization (i.e., treating viscosity and compressibility as constant). In this study, the solution of the nonlinear gas flow equation is obtained using the semi-analytical technique developed by Kale and Mattar, which solves the nonlinear equation by the method of perturbation. Results obtained, for prediction of pressure drawdown in gas reservoirs, indicate that the solution of the linearized form of the equation is valid for both low and high permeability reservoirs.

  20. Multi-fault Tolerance for Cartesian Data Distributions

    SciTech Connect

    Ali, Nawab; Krishnamoorthy, Sriram; Halappanavar, Mahantesh; Daily, Jeffrey A.

    2013-06-01

    Faults are expected to play an increasingly important role in how algorithms and applications are designed to run on future extreme-scale systems. Algorithm-based fault tolerance (ABFT) is a promising approach that involves modifications to the algorithm to recover from faults with lower overheads than replicated storage and a significant reduction in lost work compared to checkpoint-restart techniques. Fault-tolerant linear algebra (FTLA) algorithms employ additional processors that store parities along the dimensions of a matrix to tolerate multiple, simultaneous faults. Existing approaches assume regular data distributions (blocked or block-cyclic) with the failures of each data block being independent. To match the characteristics of failures on parallel computers, we extend these approaches to mapping parity blocks in several important ways. First, we handle parity computation for generalized Cartesian data distributions with each processor holding arbitrary subsets of blocks in a Cartesian-distributed array. Second, techniques to handle correlated failures, i.e., multiple processors that can be expected to fail together, are presented. Third, we handle the colocation of parity blocks with the data blocks and do not require them to be on additional processors. Several alternative approaches, based on graph matching, are presented that attempt to balance the memory overhead on processors while guaranteeing the same fault tolerance properties as existing approaches that assume independent failures on regular blocked data distributions. The evaluation of these algorithms demonstrates that the additional desirable properties are provided by the proposed approach with minimal overhead.
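
    The parity idea can be illustrated in miniature. The sketch below is a simplification to one parity block per row of a regular blocked layout (not the generalized Cartesian distributions or graph-matching placement of the paper) and rebuilds a lost data block from its row parity.

    ```python
    # ABFT-style row parity over data blocks: one lost block per row is recoverable.
    def make_parity(row_of_blocks):
        parity = [0] * len(row_of_blocks[0])
        for block in row_of_blocks:
            parity = [p + x for p, x in zip(parity, block)]      # elementwise sum parity
        return parity

    row = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]         # three data blocks owned by three processes
    parity = make_parity(row)

    lost_index = 1                                   # the process holding block 1 fails
    survivors = [b for i, b in enumerate(row) if i != lost_index]
    rebuilt = [p - sum(vals) for p, vals in zip(parity, zip(*survivors))]
    assert rebuilt == row[lost_index]                # the lost block is reconstructed from the parity
    ```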

  1. Minimizing resource overheads for fault-tolerant preparation of encoded states of the Steane code

    PubMed Central

    Goto, Hayato

    2016-01-01

    The seven-qubit quantum error-correcting code originally proposed by Steane is one of the best known quantum codes. The Steane code has a desirable property that most basic operations can be performed easily in a fault-tolerant manner. A major obstacle to fault-tolerant quantum computation with the Steane code is fault-tolerant preparation of encoded states, which requires large computational resources. Here we propose efficient state preparation methods for zero and magic states encoded with the Steane code, where the zero state is one of the computational basis states and the magic state allows us to achieve universality in fault-tolerant quantum computation. The methods minimize resource overheads for the fault-tolerant state preparation, and therefore reduce necessary resources for quantum computation with the Steane code. Thus, the present results will open a new possibility for efficient fault-tolerant quantum computation. PMID:26812959

  2. Implementing fault tolerance in a superconducting quantum circuit

    NASA Astrophysics Data System (ADS)

    Barends, Rami

    2015-03-01

    The surface code error correction scheme is appealing for superconducting circuits as the fundamental operations have been demonstrated at the fault-tolerant threshold. Here, we present experimental results on the repetition code, a one-dimensional primitive of the surface code which can detect bit-flip errors, implemented on a device consisting of nine Xmon transmon qubits. We discuss the basic mechanics of error detection, show preservation of a Greenberger-Horne-Zeilinger state, and show suppression of environmentally-induced error.
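
    A purely classical sketch of the repetition code conveys the error-detection mechanics (the device physics and stabilizer measurements of the experiment are omitted): neighbouring parity checks flag bit flips, and a majority vote recovers the logical value as long as errors stay in the minority.

    ```python
    # Classical bit-flip repetition code: parity checks for detection, majority vote for decoding.
    import random

    def encode(bit, n=9):
        return [bit] * n

    def syndromes(bits):
        return [bits[i] ^ bits[i + 1] for i in range(len(bits) - 1)]   # neighbouring parity checks

    def decode(bits):
        return int(sum(bits) > len(bits) // 2)                          # majority vote

    random.seed(0)
    code = encode(1)
    for i in random.sample(range(len(code)), 2):    # two random bit-flip errors
        code[i] ^= 1
    print("syndrome pattern:", syndromes(code))     # non-zero checks flag the error locations
    print("decoded logical bit:", decode(code))     # still 1 while errors stay in the minority
    ```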

  3. Fault tolerant issues in the BTeV trigger

    SciTech Connect

    Jeffrey A. Appel et al.

    2002-12-03

    The BTeV trigger performs sophisticated computations using large ensembles of FPGAs, DSPs, and conventional microprocessors. This system will have between 5,000 and 10,000 computing elements and many networks and data switches. While much attention has been devoted to developing efficient algorithms, the need for fault-tolerant, fault-adaptive, and flexible techniques and software to manage this huge computing platform has been identified as one of the most challenging aspects of this project. They describe the problem and offer an approach to solving it based on a distributed, hierarchical fault management system.

  4. A Test Generation Framework for Distributed Fault-Tolerant Algorithms

    NASA Technical Reports Server (NTRS)

    Goodloe, Alwyn; Bushnell, David; Miner, Paul; Pasareanu, Corina S.

    2009-01-01

    Heavyweight formal methods such as theorem proving have been successfully applied to the analysis of safety critical fault-tolerant systems. Typically, the models and proofs performed during such analysis do not inform the testing process of actual implementations. We propose a framework for generating test vectors from specifications written in the Prototype Verification System (PVS). The methodology uses a translator to produce a Java prototype from a PVS specification. Symbolic (Java) PathFinder is then employed to generate a collection of test cases. A small example is employed to illustrate how the framework can be used in practice.

  5. CMOS processor element for a fault-tolerant SVD array

    NASA Astrophysics Data System (ADS)

    Kota, Kishore; Cavallaro, Joseph R.

    1993-11-01

    This paper describes the VLSI implementation of a CORDIC based processor element for use in a fault-reconfigurable systolic array to compute the singular value decomposition (SVD) of a matrix. The chip implements a time redundant fault tolerance scheme, which allows processors adjacent to a faulty processor to act as computation backup during the systolic idle time. Also, processors around a fault collaborate to reroute data around the faulty processor. This form of time redundancy is attractive when tolerance to a few faults needs to be achieved with little hardware overhead.

  6. Computer-aided reliability estimation. [for fault-tolerant systems

    NASA Technical Reports Server (NTRS)

    Stiffler, J. J.

    1977-01-01

    Computer-aided reliability estimation (CARE) programs are developed to improve the tools available for estimating the reliability of fault-tolerant systems. A description is presented of a program, called CARE II, which was developed after the first program reported by Mathur (1971). Attention is given to the CARE II reliability model, the CARE II coverage model, and CARE II limitations which are to be rectified in CARE III. It is pointed out that the present coverage model in CARE II is extremely versatile. The major limitation is related to the burden placed on the user to determine the basic parameters from which the coverage calculations are made.

  7. Parameter Transient Behavior Analysis on Fault Tolerant Control System

    NASA Technical Reports Server (NTRS)

    Belcastro, Christine (Technical Monitor); Shin, Jong-Yeob

    2003-01-01

    In a fault tolerant control (FTC) system, a parameter varying FTC law is reconfigured based on fault parameters estimated by fault detection and isolation (FDI) modules. FDI modules require some time to detect fault occurrences in aero-vehicle dynamics. This paper illustrates analysis of a FTC system based on estimated fault parameter transient behavior which may include false fault detections during a short time interval. Using Lyapunov function analysis, the upper bound of an induced-L2 norm of the FTC system performance is calculated as a function of a fault detection time and the exponential decay rate of the Lyapunov function.

  8. Programs For Modeling Fault-Tolerant Computing Systems

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.

    1991-01-01

    Pade Approximation with Scaling (PAWS) and Scaling Taylor Exponential Matrix (STEM) computer programs are software tools for design and validation. Provide flexible, user-friendly, language-based interface for input of Markov mathematical methods describing behaviors of fault-tolerant computer systems. Markov models include both recovery from faults via reconfiguration and behaviors of such systems when faults occur. PAWS and STEM produce exact solutions of probability of system failure and provide conservative estimate of number of significant digits in solution. Written in PASCAL and FORTRAN.
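
    The kind of model these tools accept can be illustrated with a tiny continuous-time Markov chain, solved below by direct numerical integration rather than by PAWS or STEM themselves; the rates and coverage value are invented for illustration.

    ```python
    # Minimal Markov reliability model: a triplex system with per-channel failure rate lam
    # and reconfiguration coverage c. States: 0 = three good, 1 = two good, 2 = system failure.
    lam, c = 1e-4, 0.999            # failures per hour, coverage probability (illustrative)
    Q = [[-3 * lam,      3 * lam * c, 3 * lam * (1 - c)],
         [0.0,           -2 * lam,    2 * lam],
         [0.0,            0.0,        0.0]]             # generator matrix (rows sum to 0)

    p = [1.0, 0.0, 0.0]             # start with all three channels healthy
    dt, t_end = 0.01, 10.0          # hours; forward-Euler integration of dp/dt = p Q
    for _ in range(int(t_end / dt)):
        p = [p[j] + dt * sum(p[i] * Q[i][j] for i in range(3)) for j in range(3)]
    print("P(system failure) after a 10 h flight: %.3e" % p[2])
    ```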

  9. Software Implemented Fault-Tolerant (SIFT) user's guide

    NASA Technical Reports Server (NTRS)

    Green, D. F., Jr.; Palumbo, D. L.; Baltrus, D. W.

    1984-01-01

    Program development for a Software Implemented Fault Tolerant (SIFT) computer system is accomplished in the NASA LaRC AIRLAB facility using a DEC VAX-11 to interface with eight Bendix BDX 930 flight control processors. The interface software which provides this SIFT program development capability was developed by AIRLAB personnel. This technical memorandum describes the application and design of this software in detail, and is intended to assist both the user in performance of SIFT research and the systems programmer responsible for maintaining and/or upgrading the SIFT programming environment.

  10. Reliability analysis of fault-tolerant reconfigurable nano-architectures

    SciTech Connect

    Bhaduri, D.; Graham, P. S.; Shukla, S. K.

    2004-01-01

    Manufacturing defects and transient errors will be abundant in high-density reconfigurable nano-scale designs. Recently, we have automated a computational scheme based on Markov Random Field (MRF) and Belief Propagation algorithms in a tool named NANOLAB to evaluate the reliability of nano architectures. In this paper, we show how our methodology can be exploited to design defect- and fault-tolerant programmable logic architectures. The effectiveness of such automation is illustrated by analyzing reconfigurable Boolean networks formed using different industry-based configurable logic blocks (CLBs), both in the presence of thermal perturbations and signal noise.

  11. Single event upset tests of a RISC-based fault-tolerant computer

    SciTech Connect

    Kimbrough, J.R.; Butner, D.N.; Colella, N.J.; Kaschmitter, J.L.; Shaeffer, D.L.; McKnett, C.L.; Coakley, P.G.; Casteneda, C.

    1996-03-23

    The project successfully demonstrated that dual lock-step comparison of commercial RISC processors is a viable fault-tolerant approach to handling SEUs in the space environment. The on-orbit error rate of the fault-tolerant approach was 38 times lower than the single-processor error rate. The random nature of the upsets and their appearance in critical code sections show that it is essential to incorporate both hardware and software in the design and operation of fault-tolerant computers.
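
    The dual lock-step idea can be mimicked in software. The hedged sketch below runs two copies of a computation in step, compares them every cycle, and rolls both back to the last agreed state when an injected upset causes a mismatch; the computation, fault model, and recovery policy are simplified stand-ins, not the flight design.

    ```python
    # Toy dual lock-step execution with compare, rollback, and retry.
    def step(state, x):
        return state + x * x                      # the replicated computation

    inputs = [1, 2, 3, 4, 5]
    good_state = 0
    state_a = state_b = good_state
    i, injected = 0, False
    while i < len(inputs):
        state_a, state_b = step(state_a, inputs[i]), step(state_b, inputs[i])
        if i == 2 and not injected:
            state_b ^= 1 << 4                     # inject a single upset into one replica
            injected = True
        if state_a != state_b:                    # lock-step comparison detects the mismatch
            state_a = state_b = good_state        # roll back to the last agreed state and retry
            continue
        good_state = state_a                      # commit the agreed state and advance
        i += 1
    print("final state:", good_state)             # 55, matching the fault-free run
    ```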

  12. Validation Methods for Fault-Tolerant avionics and control systems, working group meeting 1

    NASA Technical Reports Server (NTRS)

    1979-01-01

    The proceedings of the first working group meeting on validation methods for fault tolerant computer design are presented. The state of the art in fault tolerant computer validation was examined in order to provide a framework for future discussions concerning research issues for the validation of fault tolerant avionics and flight control systems. The development of positions concerning critical aspects of the validation process are given.

  13. Novel neural networks-based fault tolerant control scheme with fault alarm.

    PubMed

    Shen, Qikun; Jiang, Bin; Shi, Peng; Lim, Cheng-Chew

    2014-11-01

    In this paper, the problem of adaptive active fault-tolerant control for a class of nonlinear systems with unknown actuator fault is investigated. The actuator fault is assumed to have no traditional affine appearance of the system state variables and control input. The useful property of the basis function of the radial basis function neural network (NN), which will be used in the design of the fault tolerant controller, is explored. Based on the analysis of the design of normal and passive fault tolerant controllers, by using the implicit function theorem, a novel NN-based active fault-tolerant control scheme with fault alarm is proposed. Compared with results in the literature, the fault-tolerant control scheme can minimize the time delay between fault occurrence and accommodation, which is called the time delay due to fault diagnosis, and reduce the adverse effect on system performance. In addition, the FTC scheme has the advantages of a passive fault-tolerant control scheme as well as the traditional active fault-tolerant control scheme's properties. Furthermore, the fault-tolerant control scheme requires no additional fault detection and isolation model which is necessary in the traditional active fault-tolerant control scheme. Finally, simulation results are presented to demonstrate the efficiency of the developed techniques. PMID:25014982

  14. Development and Evaluation of Fault-Tolerant Flight Control Systems

    NASA Technical Reports Server (NTRS)

    Song, Yong D.; Gupta, Kajal (Technical Monitor)

    2004-01-01

    The research is concerned with developing a new approach to enhancing fault tolerance of flight control systems. The original motivation for fault-tolerant control comes from the need for safe operation of control elements (e.g. actuators) in the event of hardware failures in high reliability systems. One such example is a modern space vehicle subjected to actuator/sensor impairments. A major task in flight control is to revise the control policy to balance impairment detectability and to achieve sufficient robustness. This involves careful selection of types and parameters of the controllers and the impairment detecting filters used. It also involves a decision, upon the identification of some failures, on whether and how a control reconfiguration should take place in order to maintain a certain system performance level. In this project a new flight dynamic model under uncertain flight conditions is considered, in which the effects of both ramp and jump faults are reflected. Stabilization algorithms based on neural network and adaptive methods are derived. The control algorithms are shown to be effective in dealing with uncertain dynamics due to external disturbances and unpredictable faults. The overall strategy is easy to set up and the computation involved is much less as compared with other strategies. Computer simulation software is developed. A series of simulation studies has been conducted with varying flight conditions.

  15. Fault-tolerant adaptive FIR filters using variable detection threshold

    NASA Astrophysics Data System (ADS)

    Lin, L. K.; Redinbo, G. R.

    1994-10-01

    Adaptive filters are widely used in many digital signal processing applications, where the tap weights of the filters are adjusted by stochastic gradient search methods. Block adaptive filtering techniques, such as the block least mean square and block conjugate gradient algorithms, were developed to speed up the convergence as well as improve the tracking capability, which are two important factors in designing real-time adaptive filter systems. Even though algorithm-based fault tolerance can be used as a low-cost, high-level fault-tolerant technique to protect the aforementioned systems from hardware failures with minimal hardware overhead, the issue of choosing a good detection threshold remains a challenging problem. First of all, the systems usually only have limited computational resources, i.e., concurrent error detection and correction is not feasible. Secondly, any prior knowledge of input data is very difficult to get in practical settings. We propose a checksum-based fault detection scheme using two-level variable detection thresholds that are dynamically dependent on the past syndromes. Simulations show that the proposed scheme reduces the possibility of false alarms and has a high degree of fault coverage in adaptive filter systems.
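
    The checksum test for a linear block operation, together with a history-based threshold, can be sketched as follows. For a block output y = Xw, the identity c·(Xw) = (c·X)·w yields a cheap syndrome; the two-level threshold rule of the paper is replaced here by a simple running-mean rule, and all sizes and data are illustrative.

    ```python
    # Checksum-based detection for block filtering with a history-adapted threshold (hedged sketch).
    import random

    def block_filter(X, w):
        return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]

    def checksum_syndrome(X, w, y):
        c = [1.0] * len(X)                                  # checksum weights
        lhs = sum(ci * yi for ci, yi in zip(c, y))          # c . y
        col_sums = [sum(ci * X[i][j] for i, ci in enumerate(c)) for j in range(len(w))]
        rhs = sum(sj * wj for sj, wj in zip(col_sums, w))   # (c . X) . w
        return abs(lhs - rhs)

    random.seed(3)
    w = [0.5, -0.2, 0.1]
    history = []
    for block in range(20):
        X = [[random.gauss(0, 1) for _ in w] for _ in range(8)]
        y = block_filter(X, w)
        if block == 12:
            y[3] += 5.0                                     # transient fault in one output
        s = checksum_syndrome(X, w, y)
        threshold = 10.0 * (sum(history) / len(history) + 1e-12) if history else 1e-6
        if s > threshold:
            print("fault detected in block", block, "syndrome", round(s, 3))
        else:
            history.append(s)                               # only fault-free syndromes adapt the threshold
    ```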

  16. Active fault tolerant control of a flexible beam

    NASA Astrophysics Data System (ADS)

    Bai, Yuanqiang; Grigoriadis, Karolos M.; Song, Gangbing

    2007-04-01

    This paper presents the development and application of an H∞ fault detection and isolation (FDI) filter and fault tolerant controller (FTC) for smart structures. A linear matrix inequality (LMI) formulation is obtained to design the full order robust H∞ filter to estimate the faulty input signals. A fault tolerant H∞ controller is designed for the combined system of plant and filter which minimizes the control objective selected in the presence of disturbances and faults. A cantilevered flexible beam bonded with piezoceramic smart materials, in particular the PZT (Lead Zirconate Titanate), in the form of a patch is used in the validation of the FDI filter and FTC controller design. These PZT patches are surface-bonded on the beam and perform as actuators and sensors. A real-time data acquisition and control system is used to record the experimental data and to implement the designed FDI filter and FTC. To assist the control system design, system identification is conducted for the first mode of the smart structural system. The state space model from system identification is used for the H∞ FDI filter design. The controller was designed based on minimization of the control effort and displacement of the beam. The residuals obtained from the filter through experiments clearly identify the fault signals. The experimental results of the proposed FTC controller show its effectiveness for vibration suppression of the beam in the faulty case, when the piezoceramic actuator has a partial failure.

  17. A validation methodology for fault-tolerant clock synchronization

    NASA Technical Reports Server (NTRS)

    Johnson, S. C.; Butler, R. W.

    1984-01-01

    A validation method for the synchronization subsystem of a fault-tolerant computer system is presented. The high reliability requirement of flight crucial systems precludes the use of most traditional validation methods. The method presented utilizes formal design proof to uncover design and coding errors and experimentation to validate the assumptions of the design proof. The experimental method is described and illustrated by validating an experimental implementation of the Software Implemented Fault Tolerance (SIFT) clock synchronization algorithm. The design proof of the algorithm defines the maximum skew between any two nonfaulty clocks in the system in terms of theoretical upper bounds on certain system parameters. The quantile to which each parameter must be estimated is determined by a combinatorial analysis of the system reliability. The parameters are measured by direct and indirect means, and upper bounds are estimated. A nonparametric method based on an asymptotic property of the tail of a distribution is used to estimate the upper bound of a critical system parameter. Although the proof process is very costly, it is extremely valuable when validating the crucial synchronization subsystem.

  18. Reliability modeling of fault-tolerant computer based systems

    NASA Technical Reports Server (NTRS)

    Bavuso, Salvatore J.

    1987-01-01

    Digital fault-tolerant computer-based systems have become commonplace in military and commercial avionics. These systems hold the promise of increased availability, reliability, and maintainability over conventional analog-based systems through the application of replicated digital computers arranged in fault-tolerant configurations. Three tightly coupled factors of paramount importance, ultimately determining the viability of these systems, are reliability, safety, and profitability. Reliability, the major driver, affects virtually every aspect of design, packaging, and field operations, and eventually produces profit for commercial applications or increased national security. However, the utilization of digital computer systems makes the task of producing credible reliability assessment a formidable one for the reliability engineer. The root of the problem lies in the digital computer's unique adaptability to changing requirements, computational power, and ability to test itself efficiently. Addressed here are the nuances of modeling the reliability of systems with large state sizes, in the Markov sense, which result from systems based on replicated redundant hardware, and the modeling of factors which can reduce reliability without concomitant depletion of hardware. Advanced fault-handling models are described and methods of acquiring and measuring parameters for these models are delineated.

  19. Software reliability through fault-avoidance and fault-tolerance

    NASA Technical Reports Server (NTRS)

    Vouk, Mladen A.; Mcallister, David F.

    1993-01-01

    Strategies and tools for the testing, risk assessment and risk control of dependable software-based systems were developed. Part of this project consists of studies to enable the transfer of technology to industry, for example the risk management techniques for safety-conscious systems. Theoretical investigations of the Boolean and Relational Operator (BRO) testing strategy were conducted for condition-based testing. The Basic Graph Generation and Analysis tool (BGG) was extended to fully incorporate several variants of the BRO metric. Single- and multi-phase risk, coverage and time-based models are being developed to provide additional theoretical and empirical basis for estimation of the reliability and availability of large, highly dependable software. A model for software process and risk management was developed. The use of cause-effect graphing for software specification and validation was investigated. Lastly, advanced software fault-tolerance models were studied to provide alternatives and improvements in situations where simple software fault-tolerance strategies break down.

  20. Performance and economy of a fault-tolerant multiprocessor

    NASA Technical Reports Server (NTRS)

    Lala, J. H.; Smith, C. J.

    1979-01-01

    The FTMP (Fault-Tolerant Multiprocessor) is one of two central aircraft fault-tolerant architectures now in the prototype phase under NASA sponsorship. The intended application of the computer includes such critical real-time tasks as 'fly-by-wire' active control and completely automatic Category III landings of commercial aircraft. The FTMP architecture is briefly described and it is shown that it is a viable solution to the multi-faceted problems of safety, speed, and cost. Three job dispatch strategies are described, and their results with respect to job-starting delay are presented. The first strategy is a simple First-Come-First-Serve (FCFS) job dispatch executive. The other two schedulers are an adaptive FCFS and an interrupt driven scheduler. Three failure modes are discussed, and the FTMP survival probability in the face of random hard failures is evaluated. It is noted that the hourly cost of operating two FTMPs in a transport aircraft can be as little as one-to-two percent of the total flight-hour cost of the aircraft.

  1. Fault-tolerance in Two-dimensional Topological Systems

    NASA Astrophysics Data System (ADS)

    Anderson, Jonas T.

    This thesis is a collection of ideas with the general goal of building, at least in the abstract, a local fault-tolerant quantum computer. The connection between quantum information and topology has proven to be an active area of research in several fields. The introduction of the toric code by Alexei Kitaev demonstrated the usefulness of topology for quantum memory and quantum computation. Many quantum codes used for quantum memory are modeled by spin systems on a lattice, with operators that extract syndrome information placed on vertices or faces of the lattice. It is natural to wonder whether the useful codes in such systems can be classified. This thesis presents work that leverages ideas from topology and graph theory to explore the space of such codes. Homological stabilizer codes are introduced and it is shown that, under a set of reasonable assumptions, any qubit homological stabilizer code is equivalent to either a toric code or a color code. Additionally, the toric code and the color code correspond to distinct classes of graphs. Many systems have been proposed as candidate quantum computers. It is very desirable to design quantum computing architectures with two-dimensional layouts and low complexity in parity-checking circuitry. Kitaev's surface codes provided the first example of codes satisfying this property. They provided a new route to fault tolerance with more modest overheads and thresholds approaching 1%. The recently discovered color codes share many properties with the surface codes, such as the ability to perform syndrome extraction locally in two dimensions. Some families of color codes admit a transversal implementation of the entire Clifford group. This work investigates color codes on the 4.8.8 lattice known as triangular codes. I develop a fault-tolerant error-correction strategy for these codes in which repeated syndrome measurements on this lattice generate a three-dimensional space-time combinatorial structure. I then develop an

  2. Analysis of non-linearity in differential wavefront sensing technique.

    PubMed

    Duan, Hui-Zong; Liang, Yu-Rong; Yeh, Hsien-Chi

    2016-03-01

    An analytical model of the differential wavefront sensing (DWS) technique based on Gaussian beam propagation has been derived. The analytical model has been verified against the interference signals at the quadrant photodiode computed by a numerical method. Both the analytical model and the numerical simulation show a milli-radian-level non-linearity effect in DWS detection. In addition, beam clipping has a strong influence on the non-linearity of DWS: the larger the beam clipping, the smaller the non-linearity. The beam-walk effect, however, has little influence on DWS and can thus be ignored in the laser interferometer. PMID:26974079

  3. Fault tolerant small satellite attitude control using adaptive non-singular terminal sliding mode

    NASA Astrophysics Data System (ADS)

    Cao, Lu; Chen, XiaoQian; Sheng, Tao

    2013-06-01

    The Attitude Control System (ACS) plays a pivotal role in the whole performance of the spacecraft on orbit; therefore, it is vitally important to design the control system for rapid response, high control precision and insensitivity to external perturbations. In the first place, this paper proposes two adaptive nonlinear control algorithms based on the sliding mode control (SMC), which are designed for a small satellite attitude control system. The nonlinear dynamics describing the attitude of a small satellite are considered in a circular reference orbit, and the stability of the closed-loop system in the presence of external perturbations is investigated. Then, in order to account for accidental or degradation faults in satellite actuators, the fault-tolerant control schemes are presented. Hence, two adaptive fault-tolerant control laws (continuous sliding mode control and non-singular terminal sliding mode control) are developed by adopting the nonlinear analytical model to describe the system, which can guarantee global asymptotic convergence of the attitude control error with the existence of unknown external perturbations. The nonlinear hyperplane-based terminal sliding mode is introduced into the control law design; therefore, the system convergence performance improves and the control error is convergent in "finite time". As a result, the study emphasizes the non-singular terminal sliding mode control, and the continuous sliding mode control is used for comparison. Meanwhile, an adaptive fuzzy algorithm has been proposed to suppress the chattering phenomenon. Moreover, several numerical examples are presented to demonstrate the efficacy of the proposed controllers by correcting for the external perturbations. Simulation results confirm that the suggested methodologies yield high control precision. In addition, actuator degradation, actuator stuck and actuator failure for a
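
    For readers unfamiliar with the terminal sliding mode machinery, the sketch below simulates a generic non-singular terminal sliding mode controller on a disturbed double integrator (a textbook-style formulation, not the adaptive satellite attitude controller of the paper); the surface is s = x1 + (1/beta) x2^(p/q) with p, q odd and 1 < p/q < 2, and all gains are illustrative.

    ```python
    # Generic non-singular terminal sliding mode (NTSM) sketch on a double integrator
    # with a bounded disturbance |d| <= L. Control: u = -beta*(q/p)*x2^(2-p/q) - (L+eta)*sign(s).
    import math

    def spow(x, a):
        # sign-preserving real power for odd/odd rational exponents
        return math.copysign(abs(x) ** a, x)

    beta, p, q = 1.0, 5, 3
    L, eta = 0.2, 0.1                       # disturbance bound and reaching gain
    x1, x2 = 1.0, 0.0                       # initial error and rate (illustrative)
    dt = 0.001
    for k in range(20000):
        t = k * dt
        d = 0.1 * math.sin(t)               # bounded external perturbation
        s = x1 + (1.0 / beta) * spow(x2, p / q)
        u = -beta * (q / p) * spow(x2, 2.0 - p / q) - (L + eta) * math.copysign(1.0, s)
        x1 += dt * x2
        x2 += dt * (u + d)
    print("final |x1|, |x2|:", round(abs(x1), 4), round(abs(x2), 4))
    ```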

  4. The X-38 Spacecraft Fault-Tolerant Avionics System

    NASA Technical Reports Server (NTRS)

    Kouba,Coy; Buscher, Deborah; Busa, Joseph

    2003-01-01

    In 1995 NASA began an experimental program to develop a reusable crew return vehicle (CRV) for the International Space Station. The purpose of the CRV was threefold: (i) to bring home an injured or ill crewmember; (ii) to bring home the entire crew if the Shuttle fleet was grounded; and (iii) to evacuate the crew in the case of an imminent Station threat (i.e., fire, decompression, etc). Built at the Johnson Space Center were two approach-and-landing prototypes and one spacecraft demonstrator (called V201). A series of increasingly complex ground subsystem tests were completed, and eight successful high-altitude drop tests were achieved to prove the design concept. In this program, an unprecedented amount of commercial-off-the-shelf technology was utilized in this first crewed spacecraft NASA has built since the Shuttle program. Unfortunately, in 2002 the program was canceled due to changing Agency priorities. The vehicle was 80% complete and the program was shut down in such a manner as to preserve design, development, test and engineering data. This paper describes the X-38 V201 fault-tolerant avionics system. Based on Draper Laboratory's Byzantine-resilient fault-tolerant parallel processing system and their "network element" hardware, each flight computer exchanges information on a strict timescale to process input data, compare results, and issue voted vehicle output commands. Major accomplishments achieved in this development include: (i) a space-qualified two-fault-tolerant design using mostly COTS (hardware and operating system); (ii) a single-event-upset-tolerant network element board; (iii) on-the-fly recovery of a failed processor; (iv) use of synched cache; (v) realignment of memory to bring back a failed channel; (vi) flight code automatically generated from the master measurement list; and (vii) built in-house by a team of civil servants and support contractors. This paper will present an overview of the avionics system and the hardware

  5. FPGA-Based, Self-Checking, Fault-Tolerant Computers

    NASA Technical Reports Server (NTRS)

    Some, Raphael; Rennels, David

    2004-01-01

    A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and high-speed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant-computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors. It would contain two CPUs executing

  6. 14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance...

  7. 14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance...

  8. 14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 14 Aeronautics and Space 1 2010-01-01 2010-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance...

  9. 14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 14 Aeronautics and Space 1 2011-01-01 2011-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance...

  10. 14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 14 Aeronautics and Space 1 2014-01-01 2014-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance...

  11. Design study of Software-Implemented Fault-Tolerance (SIFT) computer

    NASA Technical Reports Server (NTRS)

    Wensley, J. H.; Goldberg, J.; Green, M. W.; Kutz, W. H.; Levitt, K. N.; Mills, M. E.; Shostak, R. E.; Whiting-Okeefe, P. M.; Zeidler, H. M.

    1982-01-01

    Software-implemented fault tolerant (SIFT) computer design for commercial aviation is reported. A SIFT design concept is addressed. Alternate strategies for physical implementation are considered. Hardware and software design correctness is addressed. System modeling and effectiveness evaluation are considered from a fault-tolerant point of view.

  12. A complete hardening method for the generation of fault tolerant circuits

    NASA Astrophysics Data System (ADS)

    Portela-Garcia, Marta; Garcia-Valderas, Mario; Lopez-Ongil, Celia; Entrena, Luis

    2005-06-01

    Fault tolerance has become an important requirement for integrated circuits, not only in safety-critical applications such as aerospace circuits but also in applications operating at the Earth's surface. With the advent of nanometer technologies, the sensitivity of integrated circuits to radiation has increased notably, making soft errors much more frequent. Hardened circuits are therefore now required in many applications where fault tolerance was not a requirement until recently. In this paper, tools and methods for the whole hardening process of a circuit are presented: tools for the automatic insertion of fault-tolerant structures into a circuit description, and methods for evaluating the fault tolerance achieved. These methods evaluate fault tolerance by means of emulation on platform FPGAs, which is much faster than simulation-based techniques. Several circuits are used to test the proposed tool for inserting fault-tolerant structures. Fault tolerance is evaluated with the proposed fault-emulation methods before and after applying the hardening process, showing the improvement obtained. The proposed evaluation techniques have been compared, in terms of evaluation time, with previously proposed solutions and with simulation-based solutions, showing improvements of several orders of magnitude.
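
    As a rough illustration of the kind of structure such hardening tools insert, the sketch below triplicates a combinational function and votes on its outputs, then runs a crude software fault-injection campaign in the spirit of emulation-based evaluation. This is an illustrative Python stand-in, not the paper's tool flow; the example module, bit widths, and fault model are assumptions.

```python
import random

def majority_vote(a, b, c):
    """Bit-wise majority of three redundant module outputs (TMR voter)."""
    return (a & b) | (a & c) | (b & c)

def run_tmr(module, x, flip_mask=0, faulty_copy=None):
    """Run three copies of 'module'; optionally flip bits in one copy's output."""
    outs = [module(x), module(x), module(x)]
    if faulty_copy is not None:
        outs[faulty_copy] ^= flip_mask          # injected soft error
    return majority_vote(*outs)

if __name__ == "__main__":
    # Crude fault-injection campaign: single-bit flips in one randomly chosen copy.
    # With TMR and only one faulty copy, no corrupted output should escape the voter.
    module = lambda x: (x * 3 + 1) & 0xFF
    escapes = 0
    for _ in range(1000):
        x = random.randrange(256)
        golden = module(x)
        voted = run_tmr(module, x, flip_mask=1 << random.randrange(8),
                        faulty_copy=random.randrange(3))
        escapes += (voted != golden)
    print("uncorrected outputs:", escapes)      # expected: 0
```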

  13. An optimized implementation of a fault-tolerant clock synchronization circuit

    NASA Technical Reports Server (NTRS)

    Torres-Pomales, Wilfredo

    1995-01-01

    A fault-tolerant clock synchronization circuit was designed and tested. A comparison to a previous design and the procedure followed to achieve the current optimization are included. The report also includes a description of the system and the results of tests performed to study the synchronization and fault-tolerant characteristics of the implementation.

  14. SIFT - Multiprocessor architecture for Software Implemented Fault Tolerance flight control and avionics computers

    NASA Technical Reports Server (NTRS)

    Forman, P.; Moses, K.

    1979-01-01

    A brief description of a SIFT (Software Implemented Fault Tolerance) Flight Control Computer with emphasis on implementation is presented. A multiprocessor system that relies on software-implemented fault detection and reconfiguration algorithms is described. A high level of reliability and fault tolerance is achieved by replicating computing tasks among processing units.

  15. A survey of NASA and military standards on fault tolerance and reliability applied to robotics

    NASA Technical Reports Server (NTRS)

    Cavallaro, Joseph R.; Walker, Ian D.

    1994-01-01

    There is currently increasing interest and activity in the area of reliability and fault tolerance for robotics. This paper discusses the application of Standards in robot reliability, and surveys the literature of relevant existing standards. A bibliography of relevant Military and NASA standards for reliability and fault tolerance is included.

  16. Gain-Scheduled Fault Tolerance Control Under False Identification

    NASA Technical Reports Server (NTRS)

    Shin, Jong-Yeob; Belcastro, Christine (Technical Monitor)

    2006-01-01

    An active fault tolerant control (FTC) law is generally sensitive to false identification since the control gain is reconfigured upon fault occurrence. In the conventional FTC law design procedure, dynamic variations due to false identification are not considered. In this paper, an FTC synthesis method is developed that incorporates possible variations of the closed-loop dynamics under false identification into the control design procedure. The active FTC synthesis problem is formulated as an LMI optimization problem that minimizes the upper bound of the induced-L2 norm, which represents the worst-case performance degradation due to false identification. The developed synthesis method is applied to control of the longitudinal motions of FASER (Free-flying Airplane for Subscale Experimental Research). The designed FTC law of the airplane is simulated for pitch angle command tracking under a false identification case.

  17. FTMP - A highly reliable Fault-Tolerant Multiprocessor for aircraft

    NASA Technical Reports Server (NTRS)

    Hopkins, A. L., Jr.; Smith, T. B., III; Lala, J. H.

    1978-01-01

    The FTMP (Fault-Tolerant Multiprocessor) is a complex multiprocessor computer that employs a form of redundancy related to systems considered by Mathur (1971), in which each major module can substitute for any other module of the same type. Despite the conceptual simplicity of the redundancy form, the implementation has many intricacies owing partly to the low target failure rate, and partly to the difficulty of eliminating single-fault vulnerability. An extensive analysis of the computer through the use of such modeling techniques as Markov processes and combinatorial mathematics shows that for random hard faults the computer can meet its requirements. It is also shown that the maintenance scheduled at intervals of 200 hr or more can be adequate most of the time.

  18. Fault model development for fault tolerant VLSI design

    NASA Astrophysics Data System (ADS)

    Hartmann, C. R.; Lala, P. K.; Ali, A. M.; Visweswaran, G. S.; Ganguly, S.

    1988-05-01

    Fault models provide systematic and precise representations of physical defects in microcircuits in a form suitable for simulation and test generation. The current difficulty in testing VLSI circuits can be attributed to the tremendous increase in design complexity and the inappropriateness of traditional stuck-at fault models. This report develops fault models for three different types of common defects that are not accurately represented by the stuck-at fault model. The faults examined in this report are: bridging faults, transistor stuck-open faults, and transient faults caused by alpha particle radiation. A generalized fault model could not be developed for the three fault types. However, microcircuit behavior and fault detection strategies are described for the bridging, transistor stuck-open, and transient (alpha particle strike) faults. The results of this study can be applied to the simulation and analysis of faults in fault tolerant VLSI circuits.

  19. Coverage modeling for dependability analysis of fault-tolerant systems

    NASA Technical Reports Server (NTRS)

    Dugan, Joanne Bechta; Trivedi, Kishor S.

    1989-01-01

    Several different models for predicting coverage in a fault-tolerant system, including models for permanent, intermittent, and transient errors, are discussed. Markov, semi-Markov, nonhomogeneous Markov, and extended stochastic Petri net models for computing coverage are developed. Two types of events that interfere with recovery are examined, and methods for modeling such events, whether they are deterministic or random, are given. The sensitivity of system reliability/availability to the coverage parameter and the sensitivity of the coverage parameter to various error-handling strategies are investigated. It is found that a policy of attempting transient recovery upon detection of an error can actually increase the unreliability of the system. This result holds when error detection is less than nearly perfect, so that the risk of producing an undetectable error outweighs the benefit gained by not discarding the component.
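
    The sensitivity of reliability to the coverage parameter can be illustrated with a textbook duplex Markov model: two active units, a covered failure degrades the system to simplex operation, an uncovered failure crashes it. The closed-form solution below is a generic example rather than one of the paper's models, and the failure rate and mission time are made-up values.

```python
import math

def duplex_reliability(lam, c, t):
    """Reliability at mission time t of a duplex system with coverage c.

    Markov chain: state 2 (both units good, failure rate 2*lam, covered with
    probability c), state 1 (one good unit, failure rate lam), failed state.
    Closed form: R(t) = P2(t) + P1(t).
    """
    p2 = math.exp(-2.0 * lam * t)
    p1 = 2.0 * c * (math.exp(-lam * t) - math.exp(-2.0 * lam * t))
    return p2 + p1

if __name__ == "__main__":
    lam, t = 1e-4, 10.0            # failures per hour, 10-hour mission (illustrative)
    for c in (0.90, 0.99, 0.999, 1.0):
        print(f"coverage c = {c:5.3f}  R(t) = {duplex_reliability(lam, c, t):.8f}")
```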

  20. Hypothetical Scenario Generator for Fault-Tolerant Diagnosis

    NASA Technical Reports Server (NTRS)

    James, Mark

    2007-01-01

    The Hypothetical Scenario Generator for Fault-tolerant Diagnostics (HSG) is an algorithm being developed in conjunction with other components of artificial-intelligence systems for automated diagnosis and prognosis of faults in spacecraft, aircraft, and other complex engineering systems. By incorporating prognostic capabilities along with advanced diagnostic capabilities, these developments hold promise to increase the safety and affordability of the affected engineering systems by making it possible to obtain timely and accurate information on the statuses of the systems and to predict impending failures well in advance. The HSG is a specific instance of a hypothetical-scenario generator that implements an innovative approach for performing diagnostic reasoning when data are missing. The special purpose served by the HSG is to (1) look for all possible ways in which the present state of the engineering system can be mapped with respect to a given model and (2) generate a prioritized set of future possible states and the scenarios of which they are parts.

  1. An empirical comparison of software fault tolerance and fault elimination

    NASA Technical Reports Server (NTRS)

    Shimeall, Timothy J.; Leveson, Nancy G.

    1991-01-01

    Reliability is an important concern in the development of software for modern systems. Some researchers have hypothesized that particular fault-handling approaches or techniques are so effective that other approaches or techniques are superfluous. The authors have performed a study that compares two major approaches to the improvement of software, software fault elimination and software fault tolerance, by examination of the fault detection obtained by five techniques: run-time assertions, multi-version voting, functional testing augmented by structural testing, code reading by stepwise abstraction, and static data-flow analysis. This study has focused on characterizing the sets of faults detected by the techniques and on characterizing the relationships between these sets of faults. The results of the study show that none of the techniques studied is necessarily redundant to any combination of the others. Further results reveal strengths and weaknesses in the fault detection by the techniques studied and suggest directions for future research.

  2. Using Ada for a distributed, fault tolerant system

    NASA Technical Reports Server (NTRS)

    Dewolf, J. B.; Sodano, N. M.; Whittredge, R. S.

    1984-01-01

    It is pointed out that advanced avionics applications increasingly require underlying machine architectures which are damage and fault tolerant, and which provide access to distributed sensors, effectors and high-throughput computational resources. The Advanced Information Processing System (AIPS), sponsored by NASA, is to provide an architecture which can meet the considered requirements. Ada was selected for implementing the AIPS system software. Advantages of Ada are related to its provisions for real-time programming, error detection, modularity and separate compilation, and standardization and portability. Chief drawbacks of this language are currently limited availability and maturity of language implementations, and limited experience in applying the language to real-time applications. The present investigation is concerned with current plans for employing Ada in the design of the software for AIPS. Attention is given to an overview of AIPS, AIPS software services, and representative design issues in each of four major software categories.

  3. Fault-Tolerant Control Based on Hybrid Redundancy

    NASA Astrophysics Data System (ADS)

    Takagi, Taro; Takahashi, Masanori

    This paper presents a new fault-tolerant control system (FTCS) for actuator failures. The proposed FTCS is based on a hybrid of static and dynamic redundancies. The redundancy mode is selected solely by a switching logic designed from the control performance; hence, no fault detector is needed. For all switched modes, a unity high-gain feedback controller with a parallel feedforward compensator is introduced to attain stabilization and asymptotic tracking. Because the controller is highly robust to uncertainties, the FTCS can cope with the variations in dynamics caused by the failure. Several simulation results for connected vehicles are shown to confirm the effectiveness of the FTCS.

  4. Evolution of shuttle avionics redundancy management/fault tolerance

    NASA Technical Reports Server (NTRS)

    Boykin, J. C.; Thibodeau, J. R.; Schneider, H. E.

    1985-01-01

    The challenge of providing redundancy management (RM) and fault tolerance to meet the Shuttle Program requirements of fail operational/fail safe for the avionics systems was complicated by the critical program constraints of weight, cost, and schedule. The basic, and sometimes illusory, effectiveness of less-than-pure RM designs is addressed. Evolution of the multiple-input selection filter (the heart of the RM function) is discussed, with emphasis on the subtle interactions with the flight control system that were found to be potentially catastrophic. Several other general RM development problems are discussed, with particular emphasis on the inertial measurement unit RM, which is indicative of the complexity of managing that three-string system and its critical interfaces with the guidance and control systems.
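
    A multiple-input selection filter of the kind referred to above can be sketched very simply for a triplex sensor set: use the middle value for control and flag any channel that miscompares with it by more than a tolerance. The snippet below is a generic illustration, not the Shuttle RM design; the threshold and flagging rule are assumptions.

```python
def mid_value_select(readings, miscompare_limit):
    """Select the middle of three redundant sensor readings and flag outliers.

    The mid-value is passed on for control; any channel deviating from it by
    more than 'miscompare_limit' is reported as suspect (a simplified RM check).
    """
    selected = sorted(readings)[1]                      # mid-value select
    suspects = [i for i, r in enumerate(readings)
                if abs(r - selected) > miscompare_limit]
    return selected, suspects

if __name__ == "__main__":
    value, bad = mid_value_select([10.1, 10.2, 17.5], miscompare_limit=1.0)
    print(value, bad)   # -> 10.2 [2]   (channel 2 flagged as miscomparing)
```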

  5. MCNP load balancing and fault tolerance with PVM

    SciTech Connect

    McKinney, G.W.

    1995-07-01

    Version 4A of the Monte Carlo neutron, photon, and electron transport code MCNP, developed by LANL (Los Alamos National Laboratory), supports distributed-memory multiprocessing through the software package PVM (Parallel Virtual Machine, version 3.1.4). Using PVM for interprocessor communication, MCNP can simultaneously execute a single problem on a cluster of UNIX-based workstations. This capability provided system efficiencies exceeding 80% on dedicated workstation clusters; however, on heterogeneous or multiuser systems, performance was limited by the slowest processor (i.e., equal work was assigned to each processor). The next public release of MCNP will provide multiprocessing enhancements, including load balancing and fault tolerance, which are shown to dramatically increase multiuser system efficiency and reliability.
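
    The load-balancing idea, as opposed to handing every processor an equal share, can be illustrated with a generic self-scheduling master-worker loop: faster workers simply pull more chunks of work, and a chunk whose worker "fails" is requeued and retried. This is an illustrative Python sketch, not MCNP/PVM code; the chunk size, worker speeds, and failure model are assumptions.

```python
import queue, random, threading, time

def run_self_scheduling(total_histories, chunk, workers):
    """Master-worker load balancing: workers pull chunks until the queue is empty.
    A chunk hit by a simulated failure is requeued, so no work is lost."""
    work = queue.Queue()
    for start in range(0, total_histories, chunk):
        work.put((start, min(chunk, total_histories - start)))
    completed = []

    def worker(speed, fail_prob):
        while True:
            try:
                job = work.get_nowait()
            except queue.Empty:
                return
            time.sleep(0.001 / speed)          # faster workers finish chunks sooner
            if random.random() < fail_prob:
                work.put(job)                  # simulated failure: requeue the chunk
            else:
                completed.append(job)

    threads = [threading.Thread(target=worker, args=w) for w in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(n for _, n in completed)

if __name__ == "__main__":
    # Three workers of different speeds, one slightly unreliable.
    done = run_self_scheduling(10_000, chunk=500,
                               workers=[(1.0, 0.0), (4.0, 0.0), (2.0, 0.1)])
    print("histories completed:", done)        # -> 10000
```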

  6. Fault-tolerant control of heavy-haul trains

    NASA Astrophysics Data System (ADS)

    Zhuan, Xiangtao; Xia, Xiaohua

    2010-06-01

    The fault-tolerant control (FTC) of heavy-haul trains is discussed on the basis of the speed regulation proposed in previous works. The fault modes of trains are assumed and the corresponding fault detection and isolation (FDI) schemes are studied. The FDI of sensor faults is based on a geometric approach for residual generators. The FDI of the braking system is based on observation of the steady-state speed: from the difference between the steady-state speeds of the faulty system and the faultless system, fault information can be obtained. Simulation tests were conducted on the suitability of the FDIs and the redesigned speed regulators. It is shown that the proposed FTC does not noticeably worsen the performance of the speed regulator for a faultless system, while it clearly improves the performance of the speed regulator for a faulty system.

  7. Investigation of an advanced fault tolerant integrated avionics system

    NASA Technical Reports Server (NTRS)

    Dunn, W. R.; Cottrell, D.; Flanders, J.; Javornik, A.; Rusovick, M.

    1986-01-01

    Presented is an advanced, fault-tolerant multiprocessor avionics architecture of the type that could be employed in an advanced rotorcraft such as the LHX. The processor structure is designed to interface with existing digital avionics systems and concepts, including the Army Digital Avionics System (ADAS) cockpit/display system, navaid and communications suites, integrated sensing suite, and the Advanced Digital Optical Control System (ADOCS). The report defines mission, maintenance and safety-of-flight reliability goals as might be expected for an operational LHX aircraft. Based on use of a modular, compact (16-bit) microprocessor card family, the results of a preliminary study examining simplex, dual and standby-sparing architectures are presented. Given the stated constraints, it is shown that the dual architecture is best suited to meet reliability goals with minimum hardware and software overhead. The report presents hardware and software design considerations for realizing the architecture, including redundancy management requirements and techniques as well as verification and validation needs and methods.

  8. Fault Tolerant Coverage and Connectivity in Presence of Channel Randomness

    PubMed Central

    Sagar, Anil Kumar; Lobiyal, D. K.

    2014-01-01

    Some applications of wireless sensor networks require K-coverage and K-connectivity to make the system fault tolerant and more reliable, which makes coverage and connectivity important issues in wireless sensor networks. In this paper, we propose K-coverage and K-connectivity models for wireless sensor networks. In both models, nodes are distributed in the sensor field according to a Poisson distribution. To make the proposed models more realistic, we use a log-normal shadowing path loss model to capture radio irregularities and study its impact on K-coverage and K-connectivity. The value of K can differ for different types of applications. We also analyze the problem of node failure for the K-coverage model. Simulation results clearly show that the coverage and connectivity of a wireless sensor network depend on the node density and on the shadowing parameters, namely the path loss exponent and the standard deviation. PMID:24574922
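
    A Monte Carlo sketch of the modeling ingredients named above (Poisson deployment, log-normal shadowing, K-coverage of a point) is given below. All numerical parameters (transmit power, detection threshold, path loss exponent, node density) are illustrative assumptions, and the estimator is far simpler than the paper's analytical models.

```python
import math, random

def random_poisson(lam):
    """Poisson sampler (Knuth's method; adequate for moderate lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def received_power_db(p_tx_db, d, ple, sigma):
    """Log-normal shadowing path loss: Pr = Pt - 10*n*log10(d) + X_sigma (dB)."""
    return p_tx_db - 10.0 * ple * math.log10(max(d, 1e-3)) + random.gauss(0.0, sigma)

def k_coverage_probability(k, density, side, ple, sigma,
                           p_tx_db=0.0, threshold_db=-40.0, trials=2000):
    """Estimate P(the field centre is sensed by at least k nodes) when nodes are
    deployed as a 2-D Poisson process over a side x side square field."""
    cx = cy = side / 2.0
    covered = 0
    for _ in range(trials):
        n_nodes = random_poisson(density * side * side)
        hits = 0
        for _ in range(n_nodes):
            x, y = random.uniform(0.0, side), random.uniform(0.0, side)
            d = math.hypot(x - cx, y - cy)
            if received_power_db(p_tx_db, d, ple, sigma) >= threshold_db:
                hits += 1
        covered += (hits >= k)
    return covered / trials

if __name__ == "__main__":
    for sigma in (2.0, 4.0, 8.0):
        p = k_coverage_probability(k=3, density=0.01, side=100.0, ple=3.0, sigma=sigma)
        print(f"shadowing sigma = {sigma} dB: P(3-coverage) ~ {p:.2f}")
```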

  9. Validating Requirements for Fault Tolerant Systems Using Model Checking

    NASA Technical Reports Server (NTRS)

    Schneider, Francis; Easterbrook, Steve M.; Callahan, John R.; Holzmann, Gerard J.

    1997-01-01

    Model checking is shown to be an effective tool in validating the behavior of a fault tolerant embedded spacecraft controller. The case study presented here shows that by judiciously abstracting away extraneous complexity, the state space of the model could be exhaustively searched, allowing critical functional requirements to be validated down to the design level. Abstracting away detail not germane to the problem of interest leaves, by definition, a partial specification behind. The success of this procedure shows that it is feasible to effectively validate a partial specification with this technique. Three anomalies were found in the system, one of which is an error in the detailed requirements; the other two are missing or ambiguous requirements. Because the method allows validation of partial specifications, it is also an effective methodology for maintaining fidelity between a co-evolving specification and an implementation.

  10. Buffered coscheduling for parallel programming and enhanced fault tolerance

    DOEpatents

    Petrini, Fabrizio; Feng, Wu-chun

    2006-01-31

    A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval are accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors.

  11. Fault tolerant vector control of induction motor drive

    NASA Astrophysics Data System (ADS)

    Odnokopylov, G.; Bragin, A.

    2014-10-01

    For electric drives in technical objects of hazardous industries, such as nuclear, military and chemical facilities, increasing resiliency and survivability is an urgent task. A construction principle for a fault-tolerant vector control system for an asynchronous (induction) electric drive is presented. The recovery of a three-phase induction motor drive operating in an emergency mode is demonstrated using a two-phase vector control system. A simulation model of the asynchronous drive operating with an unbalanced supply in the emergency mode is developed; the modeling uses coordinate transformations that support operation under the unbalance. Simulation results for the transient caused by the loss of one stator phase are presented: after the phase loss, the induction motor cannot by itself maintain a circular rotating field in the air gap, and the two-phase vector control is used to restore its operation at rated torque and speed.

  12. Ultrafast and fault-tolerant quantum communication across long distances.

    PubMed

    Muralidharan, Sreraman; Kim, Jungsang; Lütkenhaus, Norbert; Lukin, Mikhail D; Jiang, Liang

    2014-06-27

    Quantum repeaters (QRs) provide a way of enabling long distance quantum communication by establishing entangled qubits between remote locations. In this Letter, we investigate a new approach to QRs in which quantum information can be faithfully transmitted via a noisy channel without the use of long distance teleportation, thus eliminating the need to establish remote entangled links. Our approach makes use of small encoding blocks to fault-tolerantly correct both operational and photon loss errors. We describe a way to optimize the resource requirements for these QRs with the aim of generating a secure key. Numerical calculations indicate that the number of quantum memory bits at each repeater station required for the generation of one secure key has favorable polylogarithmic scaling with the distance across which the communication is desired. PMID:25014798

  13. The Design of a Fault-Tolerant COTS-Based Bus Architecture for Space Applications

    NASA Technical Reports Server (NTRS)

    Chau, Savio N.; Alkalai, Leon; Tai, Ann T.

    2000-01-01

    The high-performance, scalability and miniaturization requirements together with the power, mass and cost constraints mandate the use of commercial-off-the-shelf (COTS) components and standards in the X2000 avionics system architecture for deep-space missions. In this paper, we report our experiences and findings on the design of an IEEE 1394 compliant fault-tolerant COTS-based bus architecture. While the COTS standard IEEE 1394 adequately supports power management, high performance and scalability, its topological criteria impose restrictions on fault tolerance realization. To circumvent the difficulties, we derive a "stack-tree" topology that not only complies with the IEEE 1394 standard but also facilitates fault tolerance realization in a spaceborne system with limited dedicated resource redundancies. Moreover, by exploiting pertinent standard features of the 1394 interface which are not purposely designed for fault tolerance, we devise a comprehensive set of fault detection mechanisms to support the fault-tolerant bus architecture.

  14. Analysis of GPS Abnormal Conditions within Fault Tolerant Control Laws

    NASA Astrophysics Data System (ADS)

    Al-Sinbol, Gahssan

    The Global Positioning System (GPS) is a critical element for the functionality of autonomous flying vehicles. GPS operation under normal and abnormal conditions directly impacts the trajectory tracking performance of autonomous unmanned aerial vehicle (UAV) controllers. The effects of GPS parameter variation must be well understood, and user-friendly computational tools must be developed to facilitate the design and evaluation of fault-tolerant control laws. This thesis presents the development of a simplified GPS error model in Matlab/Simulink and its use in a sensitivity analysis of the effects of GPS parameters, under normal and abnormal system operation, on different UAV trajectory tracking controllers. The model statistically generates position and velocity errors, simulates the effect of GPS satellite configuration on the position and velocity measurement accuracy, and implements a set of failures of the GPS readings. The model and its graphical user interface were integrated within the WVU UAV simulation environment as a masked Simulink block. The effects of the following GPS parameters on the controllers' trajectory tracking performance were investigated within and outside normal operating ranges: time delay, update rate, error standard deviation, bias, and major position and velocity failures. Several sets of control laws, with fixed and adaptive parameters and of different levels of complexity, were used in this investigation. A complex performance index formulated in terms of tracking errors and control activity was used for control law performance evaluation. The composition of the various metrics within the performance index was performed using fixed and variable weights depending on the local characteristics of the commanded trajectory. This study revealed that GPS error parameters have a significant impact on control law performance. The proposed GPS model has proved to be a valuable, flexible tool for testing and evaluation of the fault
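
    The structure of such a GPS error model (update rate, latency, bias plus Gaussian noise, and an injected failure) can be sketched in a few lines. The Python class below is an illustrative analogue of the Matlab/Simulink block described above, with made-up parameter values and a step-type position failure standing in for the abnormal conditions.

```python
import random

class SimpleGpsErrorModel:
    """Toy GPS measurement model: update rate, latency, bias + Gaussian noise,
    and an optional step failure injected at a given time (illustrative only)."""

    def __init__(self, update_rate_hz=1.0, delay_s=0.2, bias_m=1.0, sigma_m=2.5,
                 failure_time_s=None, failure_offset_m=50.0):
        self.dt = 1.0 / update_rate_hz
        self.delay = delay_s
        self.bias = bias_m
        self.sigma = sigma_m
        self.failure_time = failure_time_s
        self.failure_offset = failure_offset_m
        self._last_update = -float("inf")
        self._held = None

    def measure(self, t, true_position_fn):
        """Return the measured position at time t (held between updates)."""
        if t - self._last_update >= self.dt:
            delayed_truth = true_position_fn(t - self.delay)      # latency
            error = self.bias + random.gauss(0.0, self.sigma)     # bias + noise
            if self.failure_time is not None and t >= self.failure_time:
                error += self.failure_offset                      # major position failure
            self._held = delayed_truth + error
            self._last_update = t
        return self._held

if __name__ == "__main__":
    gps = SimpleGpsErrorModel(failure_time_s=5.0)
    truth = lambda t: 20.0 * t               # vehicle moving at 20 m/s along one axis
    for t in (0.0, 1.0, 4.0, 6.0):
        print(f"t={t:4.1f}s  true={truth(t):7.1f}  measured={gps.measure(t, truth):7.1f}")
```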

  15. Lightweight storage and overlay networks for fault tolerance.

    SciTech Connect

    Oldfield, Ron A.

    2010-01-01

    The next generation of capability-class, massively parallel processing (MPP) systems is expected to have hundreds of thousands to millions of processors. In such environments, it is critical to have fault-tolerance mechanisms, including checkpoint/restart, that scale with the size of applications and the percentage of the system on which the applications execute. For application-driven, periodic checkpoint operations, the state of the art does not provide a scalable solution. For example, on today's massive-scale systems that execute applications which consume most of the memory of the employed compute nodes, checkpoint operations generate I/O that consumes nearly 80% of the total I/O usage. Motivated by this observation, this project aims to improve I/O performance for application-directed checkpoints through the use of lightweight storage architectures and overlay networks. Lightweight storage provides direct access to underlying storage devices. Overlay networks provide caching and processing capabilities in the compute-node fabric. The combination has the potential to significantly reduce I/O overhead for large-scale applications. This report describes our combined efforts to model and understand overheads for application-directed checkpoints, as well as the implementation and performance analysis of a checkpoint service that uses available compute nodes as a network cache for checkpoint operations.

  16. Distributed Evaluation Functions for Fault Tolerant Multi-Rover Systems

    NASA Technical Reports Server (NTRS)

    Agogino, Adrian; Turner, Kagan

    2005-01-01

    The ability to evolve fault tolerant control strategies for large collections of agents is critical to the successful application of evolutionary strategies to domains where failures are common. Furthermore, while evolutionary algorithms have been highly successful in discovering single-agent control strategies, extending such algorithms to multiagent domains has proven to be difficult. In this paper we present a method for shaping evaluation functions for agents that provide control strategies that both are tolerant to different types of failures and lead to coordinated behavior in a multi-agent setting. This method relies neither on a centralized strategy (susceptible to a single point of failure) nor on a distributed strategy in which each agent uses a system-wide evaluation function (which suffers from a severe credit assignment problem). In a multi-rover problem, we show that agents using our agent-specific evaluation perform up to 500% better than agents using the system evaluation. In addition we show that agents are still able to maintain a high level of performance when up to 60% of the agents fail due to actuator, communication or controller faults.
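
    One common way to shape an agent-specific evaluation of the kind described, often called a difference evaluation, scores each rover by the change in the system evaluation when that rover's contribution is removed; this stays aligned with the global objective while sharpening credit assignment. The sketch below uses a toy global objective and is an assumption-laden illustration, not the paper's exact formulation.

```python
def system_evaluation(observations):
    """Toy global objective: each point of interest is counted once, using the
    best (closest) observation of it by any rover."""
    best = {}
    for rover_id, poi_id, value, distance in observations:
        score = value / max(distance, 1.0)
        best[poi_id] = max(best.get(poi_id, 0.0), score)
    return sum(best.values())

def difference_evaluation(observations, rover_id):
    """Agent-specific evaluation: G(z) - G(z without this rover's observations)."""
    without = [o for o in observations if o[0] != rover_id]
    return system_evaluation(observations) - system_evaluation(without)

if __name__ == "__main__":
    # (rover, poi, poi_value, observation_distance)
    obs = [("r1", "poiA", 10.0, 2.0),
           ("r2", "poiA", 10.0, 5.0),    # redundant: r1 already covers poiA better
           ("r2", "poiB", 4.0, 1.0)]
    print("G     =", system_evaluation(obs))            # -> 9.0
    print("D(r1) =", difference_evaluation(obs, "r1"))  # -> 3.0 (credit for poiA)
    print("D(r2) =", difference_evaluation(obs, "r2"))  # -> 4.0 (credit only for poiB)
```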

  17. Algorithm-Based Fault Tolerance for Numerical Subroutines

    NASA Technical Reports Server (NTRS)

    Tumon, Michael; Granat, Robert; Lou, John

    2007-01-01

    A software library implements a new methodology for detecting faults in numerical subroutines, thus enabling application programs that contain the subroutines to recover transparently from single-event upsets. The software library in question is fault-detecting middleware that is wrapped around the numerical subroutines. Conventional serial versions (based on LAPACK and FFTW) and a parallel version (based on ScaLAPACK) exist. The source code of the application program that contains the numerical subroutines is not modified, and the middleware is transparent to the user. The methodology used is a type of algorithm-based fault tolerance (ABFT). In ABFT, a checksum is computed before a computation and compared with the checksum of the computational result; an error is declared if the difference between the checksums exceeds some threshold. Novel normalization methods are used in the checksum comparison to ensure correct fault detection independent of algorithm inputs. In tests of this software reported in the peer-reviewed literature, the library was shown to enable detection of 99.9 percent of significant faults while generating no false alarms.
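
    The checksum idea behind ABFT can be shown concretely for matrix multiplication: the row and column checksums of the true product are computable cheaply from the inputs, so comparing them with the checksums of the delivered result exposes a corrupted entry. The sketch below is a generic illustration with an assumed normalized threshold, not the library's middleware.

```python
import numpy as np

def abft_check(A, B, C, rel_tol=1e-8):
    """Return True if C is consistent with C = A @ B under ABFT checksums.

    The checksums of the true product follow from e^T(AB) = (e^T A)B and
    (AB)e = A(Be); comparing them with the checksums of the delivered C detects
    faults such as SEUs. The tolerance is normalized by ||C|| so the test does
    not depend on the scale of the inputs."""
    err_cols = np.linalg.norm(C.sum(axis=0) - A.sum(axis=0) @ B)
    err_rows = np.linalg.norm(C.sum(axis=1) - A @ B.sum(axis=1))
    return max(err_cols, err_rows) <= rel_tol * max(np.linalg.norm(C), 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 40))
    B = rng.standard_normal((40, 30))
    C = A @ B
    print("clean result passes check:    ", abft_check(A, B, C))   # True
    C[3, 7] += 1e-3                       # simulate a small silent corruption
    print("corrupted result passes check:", abft_check(A, B, C))   # False
```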

  18. Fault tolerant channel-encrypting quantum dialogue against collective noise

    NASA Astrophysics Data System (ADS)

    Ye, TianYu

    2015-04-01

    In this paper, two fault tolerant channel-encrypting quantum dialogue (QD) protocols against collective noise are presented. One is against collective-dephasing noise, while the other is against collective-rotation noise. The decoherence-free states, each of which is composed of two physical qubits, act as traveling states combating collective noise. Einstein-Podolsky-Rosen pairs, which play the role of private quantum key, are securely shared between two participants over a collective-noise channel in advance. Through encryption and decryption with the private quantum key, the initial state of each traveling two-photon logical qubit is privately shared between the two participants. Because the initial state of each traveling logical qubit is shared through quantum encryption, the issue of information leakage is overcome. The private quantum key can be repeatedly used after rotation as long as the rotation angle is properly chosen, economizing on quantum resources. As a result, the information-theoretical efficiency is nearly 66.7%. The proposed QD protocols only need single-photon measurements rather than two-photon joint measurements for quantum measurements. Security analysis shows that an eavesdropper cannot obtain anything useful about secret messages during the dialogue process without being discovered. Furthermore, the proposed QD protocols can be implemented with current techniques in experiment.

  19. Making classical ground-state spin computing fault-tolerant.

    PubMed

    Crosson, I J; Bacon, D; Brown, K R

    2010-09-01

    We examine a model of classical deterministic computing in which the ground state of the classical system is a spatial history of the computation. This model is relevant to quantum dot cellular automata as well as to recent universal adiabatic quantum computing constructions. In its most primitive form, systems constructed in this model cannot compute in an error-free manner when working at nonzero temperature. However, by exploiting a mapping between the partition function for this model and probabilistic classical circuits we are able to show that it is possible to make this model effectively error-free. We achieve this by using techniques in fault-tolerant classical computing and the result is that the system can compute effectively error-free if the temperature is below a critical temperature. We further link this model to computational complexity and show that a certain problem concerning finite temperature classical spin systems is complete for the complexity class Merlin-Arthur. This provides an interesting connection between the physical behavior of certain many-body spin systems and computational complexity. PMID:21230024

  20. Study on fault-tolerant processors for advanced launch system

    NASA Technical Reports Server (NTRS)

    Shin, Kang G.; Liu, Jyh-Charn

    1990-01-01

    Issues related to the reliability of a redundant system with large main memory are addressed. The Fault-Tolerant Processor (FTP) for the Advanced Launch System (ALS) is used as a basis for the presentation. When the system is free of latent faults, the probability of system crash due to multiple channel faults is shown to be insignificant even when voting on the outputs of computing channels is infrequent. Using channel error maskers (CEMs) is shown to improve reliability more effectively than increasing redundancy or the number of channels for applications with long mission times. Even without using a voter, most memory errors can be immediately corrected by those CEMs implemented with conventional coding techniques. In addition to their ability to enhance system reliability, CEMs (with a very low hardware overhead) can be used to dramatically reduce not only the need for memory realignment, but also the time required to realign channel memories in the rare case that such a need arises. Using CEMs, two different schemes were developed to solve the memory realignment problem. In both schemes, most errors are corrected by CEMs, and the remaining errors are masked by a voter.

  1. ALLIANCE: An architecture for fault tolerant multi-robot cooperation

    SciTech Connect

    Parker, L.E.

    1995-02-01

    ALLIANCE is a software architecture that facilitates the fault tolerant cooperative control of teams of heterogeneous mobile robots performing missions composed of loosely coupled, largely independent subtasks. ALLIANCE allows teams of robots, each of which possesses a variety of high-level functions that it can perform during a mission, to individually select appropriate actions throughout the mission based on the requirements of the mission, the activities of other robots, the current environmental conditions, and the robot's own internal states. ALLIANCE is a fully distributed, behavior-based architecture that incorporates the use of mathematically modeled motivations (such as impatience and acquiescence) within each robot to achieve adaptive action selection. Since cooperative robotic teams usually work in dynamic and unpredictable environments, this software architecture allows the robot team members to respond robustly, reliably, flexibly, and coherently to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. The feasibility of this architecture is demonstrated in an implementation on a team of mobile robots performing a laboratory version of hazardous waste cleanup.
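
    The motivational mechanism can be caricatured in a few lines of code: a robot's motivation for an unattended task grows with impatience, resets while a teammate is working on it, and acquiescence makes a robot that keeps failing give the task up. The sketch below is a deliberately simplified illustration, not the ALLIANCE equations; all rates and thresholds are invented.

```python
class MotivatedRobot:
    """Simplified motivation dynamics in the spirit of behavior-based,
    fault-tolerant task allocation (not the exact ALLIANCE formalism)."""

    def __init__(self, name, impatience=1.0, activation=10.0, acquiesce_after=15):
        self.name = name
        self.impatience = impatience            # motivation growth rate
        self.activation = activation            # threshold to take on the task
        self.acquiesce_after = acquiesce_after  # give up after this many fruitless steps
        self.motivation = 0.0
        self.active_steps = 0
        self.gave_up = False                    # set after acquiescing on this task

    def step(self, other_is_working, making_progress):
        if self.active_steps:                   # currently performing the task
            self.active_steps += 1
            if not making_progress and self.active_steps > self.acquiesce_after:
                self.active_steps = 0           # acquiesce so a teammate can try
                self.motivation = 0.0
                self.gave_up = True
            return self.active_steps > 0
        if other_is_working or self.gave_up:
            self.motivation = 0.0               # stay quiet while a teammate works
        else:
            self.motivation += self.impatience  # grow impatient about the idle task
            if self.motivation >= self.activation:
                self.active_steps = 1           # take on the task
        return self.active_steps > 0

if __name__ == "__main__":
    r1 = MotivatedRobot("r1", impatience=2.0)
    r2 = MotivatedRobot("r2", impatience=1.0)
    # r1 takes the task first, makes no progress (simulated failure), acquiesces,
    # and r2 then becomes impatient enough to take the task over.
    for _ in range(40):
        r1_active = r1.step(other_is_working=r2.active_steps > 0, making_progress=False)
        r2.step(other_is_working=r1_active, making_progress=False)
    owner = "r1" if r1.active_steps else ("r2" if r2.active_steps else "none")
    print("task owner after 40 steps:", owner)   # expected: r2
```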

  2. Fault-tolerant authenticated quantum dialogue using logical Bell states

    NASA Astrophysics Data System (ADS)

    Ye, Tian-Yu

    2015-09-01

    Two fault-tolerant authenticated quantum dialogue protocols are proposed in this paper by employing logical Bell states as the quantum resource, which combat the collective-dephasing noise and the collective-rotation noise, respectively. The two proposed protocols each can accomplish the mutual identity authentication and the dialogue between two participants simultaneously and securely over one kind of collective noise channels. In each of two proposed protocols, the information transmitted through the classical channel is assumed to be eavesdroppable and modifiable. The key for choosing the measurement bases of sample logical qubits is pre-shared privately between two participants. The Bell state measurements rather than the four-qubit joint measurements are adopted for decoding. The two participants share the initial states of message logical Bell states with resort to the direct transmission of auxiliary logical Bell states so that the information leakage problem is avoided. The impersonation attack, the man-in-the-middle attack, the modification attack and the Trojan horse attacks from Eve all are detectable.

  3. Fault-tolerant self-routing computer network topology

    SciTech Connect

    Mitchell, T.L.

    1987-01-01

    This document reports on the development and analysis of a new, easily expandable, highly fault-tolerant, self-routing computer network topology. The topology applies equally to any general-purpose computer-networking environment, whether local, metropolitan, or wide area. This new connectivity scheme is named the spiral topology because the architecture is built around modules of four computer nodes each, connected by top and bottom spirals. The spiral topology features a simple internal self-routing algorithm that adapts quickly, and automatically, to failed nodes and links. The six most important direct consequences of the spiral computer-network architecture are the topology's (1) ease of expansion; (2) fast, on-the-fly self-routing; (3) extremely high tolerance to network faults; (4) increased network security; (5) potential for the total elimination of store-and-forward transmissions due to routing decision delays; and (6) rendering the maximum path length issue moot. The fast on-the-fly routing capability of the spiral topology makes it highly amenable to fiber optic communications in any networking environment.

  4. Application of fault-tolerant controls to UAVs

    NASA Astrophysics Data System (ADS)

    Vos, David W.; Motazed, Ben

    1996-05-01

    Autonomous unmanned systems require provision for fault detection and recovery. The multiply-redundant schemes typically used in aerospace applications are prohibitively expensive and an inappropriate solution for unmanned systems, where low cost and small size are critical. Aurora Flight Sciences is developing alternative low-cost fault-tolerant control (FTC) capabilities, incorporating failure detection and isolation and control reconfiguration algorithms into aircraft flight control systems. A 'monitoring observer', or failure detection filter, predicts the future aircraft state based on prior control inputs and measurements and interprets discrepancies between the outputs of the two systems. The FTC detects and isolates the onset of a sensor or actuator failure in real time and automatically reconfigures the control laws to maintain full control authority. This methodology is unique in providing a compact and elegant FTC solution for dynamic systems with nonlinear parameter dependence, such as high-altitude UAVs (unmanned air vehicles) and UUVs (unmanned undersea vehicles), where the dynamic behavior varies strongly with speed (i.e., dynamic pressure) and density. Application of the algorithm to telemetry data from an in-flight vertical gyro failure shows that it can easily detect the failure; reconfiguration of the autopilots to successfully accommodate recovery was further demonstrated in simulation.
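
    The monitoring-observer concept can be illustrated with a discrete-time Luenberger observer: it propagates a model of the vehicle with the applied inputs, and the residual between measured and predicted outputs is thresholded to declare a sensor failure. The sketch below uses an invented double-integrator model, observer gain, and threshold; it shows the residual-based detection pattern, not Aurora's algorithm.

```python
import numpy as np

def detect_sensor_failure(A, B, C, L, u_seq, y_seq, threshold):
    """Monitoring-observer sketch: propagate x_hat with the plant model and the
    applied inputs, and flag the first step where the output residual
    |y - C x_hat| exceeds the threshold (illustrative only)."""
    x_hat = np.zeros(A.shape[0])
    for k, (u, y) in enumerate(zip(u_seq, y_seq)):
        y_hat = float(C @ x_hat)
        residual = abs(y - y_hat)
        if residual > threshold:
            return k, residual                             # failure detected at step k
        x_hat = A @ x_hat + B.flatten() * u + L.flatten() * (y - y_hat)
    return None, 0.0

if __name__ == "__main__":
    # Invented discrete-time double-integrator model with observer gain L.
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.005], [0.1]])
    C = np.array([1.0, 0.0])
    L = np.array([[0.5], [1.0]])
    u_seq = [1.0] * 100
    # Simulate the true plant and inject a stuck sensor (frozen reading) at k = 60.
    x = np.zeros(2)
    y_seq = []
    for k in range(100):
        y = float(C @ x) if k < 60 else y_seq[59]
        y_seq.append(y)
        x = A @ x + B.flatten() * u_seq[k]
    print(detect_sensor_failure(A, B, C, L, u_seq, y_seq, threshold=0.05))
```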

  5. Fault-Tolerant Software-Defined Radio on Manycore

    NASA Technical Reports Server (NTRS)

    Ricketts, Scott

    2015-01-01

    Software-defined radio (SDR) platforms generally rely on field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), but such architectures require significant software development. In addition, application demands for radiation mitigation and fault tolerance exacerbate programming challenges. MaXentric Technologies, LLC, has developed a manycore-based SDR technology that provides 100 times the throughput of conventional radiation-hardened general-purpose processors. Manycore systems (30-100 cores and beyond) have the potential to provide high processing performance at error rates that are equivalent to current space-deployed uniprocessor systems. MaXentric's innovation is a highly flexible radio, providing over-the-air reconfiguration; adaptability; and uninterrupted, real-time, multimode operation. The technology is also compliant with NASA's Space Telecommunications Radio System (STRS) architecture. In addition to its many uses within NASA communications, the SDR can also serve as a highly programmable research-stage prototyping device for new waveforms and other communications technologies. It can also support noncommunication codes on its multicore processor, collocated with the communications workload, reducing the size, weight, and power of the overall system by aggregating processing jobs to a single-board computer.

  6. Integrated sensor and actuator fault-tolerant control

    NASA Astrophysics Data System (ADS)

    Seron, María M.; De Doná, José A.; Richter, Jan H.

    2013-04-01

    We propose a fault-tolerant control scheme that deals with sensor and actuator faults through the use of a virtual actuator (VA) and a bank of virtual sensors (VSs). A novel feature of the scheme is that the VSs implicitly integrate both fault detection and isolation (FDI) and - in conjunction with the VA - controller reconfiguration tasks. The VA and the bank of VSs operate in closed-loop with an observer-based tracking controller designed for a nominal (fault free) model of the plant. A switching rule that reconfigures the VA and engages the suitable VS from the bank is based on sets defined for measurable residual signals constructed directly from the VS signals. Our method handles abrupt actuator and sensor faults of arbitrary magnitude including complete outage. The overall scheme is shown to guarantee closed-loop boundedness and setpoint tracking under all considered fault situations. Enhancements of the scheme to deal with errors in the fault detection and isolation are also proposed. Applications of the scheme to a winding machine and an interconnected tank system are presented.

  7. Fault-tolerant error correction with the gauge color code

    NASA Astrophysics Data System (ADS)

    Brown, Benjamin J.; Nickerson, Naomi H.; Browne, Dan E.

    2016-07-01

    The constituent parts of a quantum computer are inherently vulnerable to errors. To this end, we have developed quantum error-correcting codes to protect quantum information from noise. However, discovering codes that are capable of a universal set of computational operations with the minimal cost in quantum resources remains an important and ongoing challenge. One proposal of significant recent interest is the gauge color code. Notably, this code may offer a reduced resource cost over other well-studied fault-tolerant architectures by using a new method, known as gauge fixing, for performing the non-Clifford operations that are essential for universal quantum computation. Here we examine the gauge color code when it is subject to noise. Specifically, we make use of single-shot error correction to develop a simple decoding algorithm for the gauge color code, and we numerically analyse its performance. Remarkably, we find threshold error rates comparable to those of other leading proposals. Our results thus provide the first steps of a comparative study between the gauge color code and other promising computational architectures.

  8. Reactive system verification case study: Fault-tolerant transputer communication

    NASA Technical Reports Server (NTRS)

    Crane, D. Francis; Hamory, Philip J.

    1993-01-01

    A reactive program is one which engages in an ongoing interaction with its environment. A system which is controlled by an embedded reactive program is called a reactive system. Examples of reactive systems are aircraft flight management systems, bank automatic teller machine (ATM) networks, airline reservation systems, and computer operating systems. Reactive systems are often naturally modeled (for logical design purposes) as a composition of autonomous processes which progress concurrently and which communicate to share information and/or to coordinate activities. Formal (i.e., mathematical) frameworks for system verification are tools used to increase the users' confidence that a system design satisfies its specification. A framework for reactive system verification includes formal languages for system modeling and for behavior specification and decision procedures and/or proof-systems for verifying that the system model satisfies the system specifications. Using the Ostroff framework for reactive system verification, an approach to achieving fault-tolerant communication between transputers was shown to be effective. The key components of the design, the decoupler processes, may be viewed as discrete-event-controllers introduced to constrain system behavior such that system specifications are satisfied. The Ostroff framework was also effective. The expressiveness of the modeling language permitted construction of a faithful model of the transputer network. The relevant specifications were readily expressed in the specification language. The set of decision procedures provided was adequate to verify the specifications of interest. The need for improved support for system behavior visualization is emphasized.

  9. Software-Implemented Fault Tolerance in Communications Systems

    NASA Technical Reports Server (NTRS)

    Gantenbein, Rex E.

    1994-01-01

    Software-implemented fault tolerance (SIFT) is used in many computer-based command, control, and communications (C(3)) systems to provide the nearly continuous availability that they require. In the communications subsystem of Space Station Alpha, SIFT algorithms are used to detect and recover from failures in the data and command link between the Station and its ground support. The paper presents a review of these algorithms and discusses how such techniques can be applied to similar systems found in applications such as manufacturing control, military communications, and programmable devices such as pacemakers. With support from the Tracking and Communication Division of NASA's Johnson Space Center, researchers at the University of Wyoming are developing a testbed for evaluating the effectiveness of these algorithms prior to their deployment. This testbed will be capable of simulating a variety of C(3) system failures and recording the response of the Space Station SIFT algorithms to these failures. The design of this testbed and the applicability of the approach in other environments is described.

  10. Fault-tolerant quantum blind signature protocols against collective noise

    NASA Astrophysics Data System (ADS)

    Zhang, Ming-Hui; Li, Hui-Fang

    2016-07-01

    This work proposes two fault-tolerant quantum blind signature protocols based on the entanglement swapping of logical Bell states, which are robust against two kinds of collective noises: the collective-dephasing noise and the collective-rotation noise, respectively. Both of the quantum blind signature protocols are constructed from four-qubit decoherence-free (DF) states, i.e., logical Bell qubits. The initial message is encoded on the logical Bell qubits with logical unitary operations, which will not destroy the anti-noise trait of the logical Bell qubits. Based on the fundamental property of quantum entanglement swapping, the receiver simply performs two Bell-state measurements (rather than four-qubit joint measurements) on the logical Bell qubits to verify the signature, which makes the protocols more convenient in a practical application. Different from the existing quantum signature protocols, our protocols can offer the high fidelity of quantum communication with the employment of logical qubits. Moreover, we hereinafter prove the security of the protocols against some individual eavesdropping attacks, and we show that our protocols have the characteristics of unforgeability, undeniability and blindness.

  11. Fault-tolerant error correction with the gauge color code.

    PubMed

    Brown, Benjamin J; Nickerson, Naomi H; Browne, Dan E

    2016-01-01

    The constituent parts of a quantum computer are inherently vulnerable to errors. To this end, we have developed quantum error-correcting codes to protect quantum information from noise. However, discovering codes that are capable of a universal set of computational operations with the minimal cost in quantum resources remains an important and ongoing challenge. One proposal of significant recent interest is the gauge color code. Notably, this code may offer a reduced resource cost over other well-studied fault-tolerant architectures by using a new method, known as gauge fixing, for performing the non-Clifford operations that are essential for universal quantum computation. Here we examine the gauge color code when it is subject to noise. Specifically, we make use of single-shot error correction to develop a simple decoding algorithm for the gauge color code, and we numerically analyse its performance. Remarkably, we find threshold error rates comparable to those of other leading proposals. Our results thus provide the first steps of a comparative study between the gauge color code and other promising computational architectures. PMID:27470619

  12. Fault-tolerant error correction with the gauge color code

    PubMed Central

    Brown, Benjamin J.; Nickerson, Naomi H.; Browne, Dan E.

    2016-01-01

    The constituent parts of a quantum computer are inherently vulnerable to errors. To this end, we have developed quantum error-correcting codes to protect quantum information from noise. However, discovering codes that are capable of a universal set of computational operations with the minimal cost in quantum resources remains an important and ongoing challenge. One proposal of significant recent interest is the gauge color code. Notably, this code may offer a reduced resource cost over other well-studied fault-tolerant architectures by using a new method, known as gauge fixing, for performing the non-Clifford operations that are essential for universal quantum computation. Here we examine the gauge color code when it is subject to noise. Specifically, we make use of single-shot error correction to develop a simple decoding algorithm for the gauge color code, and we numerically analyse its performance. Remarkably, we find threshold error rates comparable to those of other leading proposals. Our results thus provide the first steps of a comparative study between the gauge color code and other promising computational architectures. PMID:27470619

  13. A Self-Stabilizing Hybrid Fault-Tolerant Synchronization Protocol

    NASA Technical Reports Server (NTRS)

    Malekpour, Mahyar R.

    2015-01-01

    This paper presents a strategy for solving the Byzantine general problem for self-stabilizing a fully connected network from an arbitrary state and in the presence of any number of faults with various severities, including any number of arbitrary (Byzantine) faulty nodes. The strategy consists of two parts: first, converting Byzantine faults into symmetric faults, and second, using a proven symmetric-fault-tolerant algorithm to solve the general case of the problem. A protocol (algorithm) is also presented that tolerates symmetric faults, provided that there are more good nodes than faulty ones. The solution applies to realizable systems, while allowing for differences in the network elements, provided that the number of arbitrary faults is not more than a third of the network size. The only constraint on the behavior of a node is that the interactions with other nodes are restricted to defined links and interfaces. The solution does not rely on assumptions about the initial state of the system, and no central clock nor centrally generated signal, pulse, or message is used. Nodes are anonymous, i.e., they do not have unique identities. A mechanical verification of a proposed protocol is also presented. A bounded model of the protocol is verified using the Symbolic Model Verifier (SMV). The model checking effort is focused on verifying correctness of the bounded model of the protocol as well as confirming claims of determinism and linear convergence with respect to the self-stabilization period.

  14. Fault tolerance control for proton exchange membrane fuel cell systems

    NASA Astrophysics Data System (ADS)

    Wu, Xiaojuan; Zhou, Boyang

    2016-08-01

    Fault diagnosis and controller design are two important aspects of improving proton exchange membrane fuel cell (PEMFC) system durability. However, the two tasks are often performed separately. For example, many pressure and voltage controllers have been successfully built, but these controllers are designed for normal operation of the PEMFC. When the PEMFC faces problems such as flooding or membrane drying, a controller with a specific design must be used. This paper proposes a unique scheme that simultaneously performs fault diagnosis and tolerant control for the PEMFC system. The proposed control strategy consists of a fault diagnosis, a reconfiguration mechanism and adjustable controllers. Using a back-propagation neural network, a model-based fault detection method is employed to detect the current PEMFC fault type (flooding, membrane drying or normal). According to the diagnosis results, the reconfiguration mechanism determines which backup controller to select. Three nonlinear controllers based on feedback linearization approaches are built to adjust the voltage and pressure difference under normal, membrane drying and flooding conditions, respectively. The simulation results illustrate that the proposed fault tolerance control strategy can track the voltage and keep the pressure difference at desired levels in faulty conditions.
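
    The overall structure of the scheme, diagnose the operating condition and then switch to the matching backup controller, can be sketched independently of the fuel-cell details. In the illustration below, the neural-network diagnosis is replaced by a crude rule-based stand-in and the feedback-linearization controllers by placeholder gains; all thresholds and gains are assumptions.

```python
def diagnose(pressure_drop, membrane_resistance):
    """Crude rule-based stand-in for the model-based (neural-network) fault classifier."""
    if pressure_drop > 1.3:              # hypothetical normalized flooding threshold
        return "flooding"
    if membrane_resistance > 1.2:        # hypothetical membrane-drying threshold
        return "membrane_drying"
    return "normal"

def normal_controller(error):
    return 0.5 * error                   # placeholder gain

def flooding_controller(error):
    return 0.8 * error + 0.1             # placeholder: e.g. extra purge action

def drying_controller(error):
    return 0.4 * error - 0.1             # placeholder: e.g. extra humidification

CONTROLLERS = {
    "normal": normal_controller,
    "flooding": flooding_controller,
    "membrane_drying": drying_controller,
}

def control_step(measurements, voltage_error):
    """Reconfiguration mechanism: diagnose the condition, then run the matching
    backup controller on the current tracking error."""
    condition = diagnose(measurements["pressure_drop"],
                         measurements["membrane_resistance"])
    return condition, CONTROLLERS[condition](voltage_error)

if __name__ == "__main__":
    meas = {"pressure_drop": 1.5, "membrane_resistance": 1.0}
    print(control_step(meas, voltage_error=0.2))   # -> ('flooding', ~0.26)
```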

  15. RAID Unbound: Storage Fault Tolerance in a Distributed Environment

    NASA Technical Reports Server (NTRS)

    Ritchie, Brian

    1996-01-01

    Mirroring, data replication, backup, and more recently, redundant arrays of independent disks (RAID) are all technologies used to protect and ensure access to critical company data. A new set of problems has arisen as data becomes more and more geographically distributed. Each of the technologies listed above provides important benefits, but each has failed to adapt fully to the realities of distributed computing. The key to data high availability and protection is to take the technologies' strengths and 'virtualize' them across a distributed network. RAID and mirroring offer high data availability, while data replication and backup provide strong data protection. If we take these concepts at a very granular level (defining user, record, block, file, or directory types) and then liberate them from the physical subsystems with which they have traditionally been associated, we have the opportunity to create highly scalable, network-wide storage fault tolerance. The network becomes the virtual storage space in which the traditional concepts of data high availability and protection are implemented without their corresponding physical constraints.

  16. Modeling and measurement of fault-tolerant multiprocessors

    NASA Technical Reports Server (NTRS)

    Shin, K. G.; Woodbury, M. H.; Lee, Y. H.

    1985-01-01

    The workload effects on computer performance are addressed first for a highly reliable unibus multiprocessor used in real-time control. As an approach to studying these effects, a modified Stochastic Petri Net (SPN) is used to describe the synchronous operation of the multiprocessor system. From this model the vital components affecting performance can be determined. However, because of the complexity in solving the modified SPN, a simpler model, i.e., a closed priority queuing network, is constructed that represents the same critical aspects. The use of this model for a specific application requires the partitioning of the workload into job classes. It is shown that the steady state solution of the queuing model directly produces useful results. The use of this model in evaluating an existing system, the Fault Tolerant Multiprocessor (FTMP) at the NASA AIRLAB, is outlined with some experimental results. Also addressed is the technique of measuring fault latency, an important microscopic system parameter. Most related works have assumed no or negligible fault latency and then performed approximate analyses. To eliminate this deficiency, a new methodology for indirectly measuring fault latency is presented.

  17. Fault-tolerant Holonomic Quantum Computation in Surface Codes

    NASA Astrophysics Data System (ADS)

    Zheng, Yicong; Brun, Todd; USC QIP Team

    2015-03-01

    We show that universal holonomic quantum computation (HQC) can be achieved by adiabatically deforming the gapped stabilizer Hamiltonian of the surface code, where quantum information is encoded in the degenerate ground space of the system Hamiltonian. We explicitly propose procedures to perform each logical operation, including logical state initialization, logical state measurement, logical CNOT, state injection and distillation, etc. In particular, adiabatic braiding of different types of holes on the surface leads to a topologically protected, non-Abelian geometric logical CNOT. Throughout the computation, quantum information is protected from both small perturbations and low-weight thermal excitations by a constant energy gap that is independent of the system size. Also, the Hamiltonian terms have weight at most four during the whole process. The effect of thermal error propagation is considered during the adiabatic code deformation. With the help of active error correction, this scheme is fault-tolerant, in the sense that the computation time can be arbitrarily long for large enough lattice size. It is shown that the frequency of error correction and the physical resources needed can be greatly reduced by the constant energy gap.

  18. A verified design of a fault-tolerant clock synchronization circuit: Preliminary investigations

    NASA Technical Reports Server (NTRS)

    Miner, Paul S.

    1992-01-01

    Schneider demonstrates that many fault tolerant clock synchronization algorithms can be represented as refinements of a single proven-correct paradigm. Shankar provides a mechanical proof that Schneider's schema achieves Byzantine fault tolerant clock synchronization provided that 11 constraints are satisfied. Some of the constraints are assumptions about physical properties of the system and cannot be established formally. Proofs are given that the fault tolerant midpoint convergence function satisfies three of the constraints. A hardware design implementing the fault tolerant midpoint function is presented and shown to satisfy the remaining constraints. The synchronization circuit will recover completely from transient faults provided the maximum fault assumption is not violated. The initialization protocol for the circuit also provides a recovery mechanism from total system failure caused by correlated transient faults.
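
    For reference, the fault tolerant midpoint convergence function mentioned here has a simple closed form: discard the f largest and f smallest clock readings and return the midpoint of the extremes of what remains. The sketch below is a generic illustration of that function, not Miner's verified hardware design.

        def fault_tolerant_midpoint(readings, f):
            """Fault-tolerant midpoint convergence function.

            Discard the f smallest and f largest clock readings, then return the
            midpoint of the remaining extremes.  Requires len(readings) > 2*f.
            """
            if len(readings) <= 2 * f:
                raise ValueError("need more than 2*f readings")
            trimmed = sorted(readings)[f:len(readings) - f]
            return (trimmed[0] + trimmed[-1]) / 2.0

        # Four clocks, at most one arbitrarily faulty (f = 1):
        print(fault_tolerant_midpoint([10.2, 10.0, 9.9, 57.0], f=1))  # -> 10.1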

  19. An Integrated Fault Tolerant Robotic Controller System for High Reliability and Safety

    NASA Technical Reports Server (NTRS)

    Marzwell, Neville I.; Tso, Kam S.; Hecht, Myron

    1994-01-01

    This paper describes the concepts and features of a fault-tolerant intelligent robotic control system being developed for applications that require high dependability (reliability, availability, and safety). The system consists of two major elements: a fault-tolerant controller and an operator workstation. The fault-tolerant controller uses a strategy which allows for detection and recovery of hardware, operating system, and application software failures. The fault-tolerant controller can be used by itself in a wide variety of applications in industry, process control, and communications. The controller in combination with the operator workstation can be applied to robotic applications such as spaceborne extravehicular activities, hazardous materials handling, inspection and maintenance of high value items (e.g., space vehicles, reactor internals, or aircraft), medicine, and other tasks where a robot system failure poses a significant risk to life or property.

  20. A fault-tolerant multiprocessor architecture for aircraft, volume 1. [autopilot configuration

    NASA Technical Reports Server (NTRS)

    Smith, T. B.; Hopkins, A. L.; Taylor, W.; Ausrotas, R. A.; Lala, J. H.; Hanley, L. D.; Martin, J. H.

    1978-01-01

    A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed.

  1. [Advanced Development for Space Robotics With Emphasis on Fault Tolerance Technology

    NASA Technical Reports Server (NTRS)

    Tesar, Delbert

    1997-01-01

    This report describes work developing fault tolerant redundant robotic architectures and adaptive control strategies for robotic manipulator systems which can dynamically accommodate drastic robot manipulator mechanism, sensor or control failures and maintain stable end-point trajectory control with minimum disturbance. Kinematic designs of redundant, modular, reconfigurable arms for fault tolerance were pursued at a fundamental level. The approach developed robotic testbeds to evaluate disturbance responses of fault tolerant concepts in robotic mechanisms and controllers. The development was implemented in various fault tolerant mechanism testbeds including duality in the joint servo motor modules, parallel and serial structural architectures, and dual arms. All have real-time adaptive controller technologies to react to mechanism or controller disturbances (failures) to perform real-time reconfiguration to continue the task operations. The developments fall into three main areas: hardware, software, and theoretical.

  2. Sequoia: A fault-tolerant tightly coupled multiprocessor for transaction processing

    SciTech Connect

    Bernstein, P.A.

    1988-02-01

    The Sequoia computer is a tightly coupled multiprocessor, and thus attains the performance advantages of this style of architecture. It avoids most of the fault-tolerance disadvantages of tight coupling by using a new fault-tolerance design. The Sequoia architecture is similar to other multimicroprocessor architectures, such as those of Encore and Sequent, in that it gives dozens of microprocessors shared access to a large main memory. It resembles the Stratus architecture in its extensive use of hardware fault-detection techniques. It resembles Stratus and Auragen in its ability to quickly recover all processes after a single point failure, transparently to the user. However, Sequoia is unique in its combination of a large-scale tightly coupled architecture with a hardware approach to fault tolerance. This article gives an overview of how the hardware architecture and operating systems (OS) work together to provide a high degree of fault tolerance with good system performance.

  3. Design of a fault-tolerant decision-making system for biomedical applications.

    PubMed

    Faust, Oliver; Acharya, U Rajendra; Sputh, Bernhard H C; Tamura, Toshiyo

    2013-01-01

    This paper describes the design of a fault-tolerant classification system for medical applications. The design process follows the systems engineering methodology: in the agreement phase, we make the case for fault tolerance in diagnosis systems for biomedical applications. The argument extends the idea that machine diagnosis systems mimic the functionality of human decision-making, but in many cases they do not achieve the fault tolerance of the human brain. After making the case for fault tolerance, both requirements and specification for the fault-tolerant system are introduced before the implementation is discussed. The system is tested with fault and use cases to build up trust in the implemented system. This structured approach aided in the realisation of the fault-tolerant classification system. During the specification phase, we produced a formal model that enabled us to discuss what fault tolerance, reliability and safety mean for this particular classification system. Furthermore, such a formal basis for discussion is extremely useful during the initial stages of the design, because it helps to avoid big mistakes caused by a lack of overview later on in the project. During the implementation, we practiced component reuse by incorporating a reliable classification block, which was developed during a previous project, into the current design. Using a well-structured approach and practicing component reuse we follow best practice for both research and industry projects, which enabled us to realise the fault-tolerant classification system on time and within budget. This system can serve in a wide range of future health care systems. PMID:22288838

  4. Error Mitigation of Point-to-Point Communication for Fault-Tolerant Computing

    NASA Technical Reports Server (NTRS)

    Akamine, Robert L.; Hodson, Robert F.; LaMeres, Brock J.; Ray, Robert E.

    2011-01-01

    Fault tolerant systems require the ability to detect and recover from physical damage caused by the hardware's environment, faulty connectors, and system degradation over time. This ability applies to military, space, and industrial computing applications. The integrity of point-to-point (P2P) communication, between two microcontrollers for example, is an essential part of fault tolerant computing systems. In this paper, different methods of fault detection and recovery are presented and analyzed.
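
    One representative detection-and-recovery method for such a link, offered purely as a generic illustration rather than as one of the paper's analyzed schemes, is to protect every frame with a CRC and have the receiver discard (and request retransmission of) any frame that fails the check.

        import binascii

        def send_frame(payload: bytes) -> bytes:
            """Append a CRC-32 so the receiver can detect corruption in transit."""
            return payload + binascii.crc32(payload).to_bytes(4, "big")

        def receive_frame(frame: bytes):
            """Return the payload if the CRC checks out, else None (request retransmit)."""
            payload, crc = frame[:-4], frame[-4:]
            return payload if binascii.crc32(payload).to_bytes(4, "big") == crc else None

        frame = send_frame(b"sensor packet 42")
        corrupted = bytes([frame[0] ^ 0xFF]) + frame[1:]   # simulate a bit error
        print(receive_frame(frame))        # b'sensor packet 42'
        print(receive_frame(corrupted))    # None -> receiver would NAK and retransmit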

  5. Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 1: Army fault tolerant architecture overview

    NASA Technical Reports Server (NTRS)

    Harper, R. E.; Alger, L. S.; Babikyan, C. A.; Butler, B. P.; Friend, S. A.; Ganska, R. J.; Lala, J. H.; Masotto, T. K.; Meyer, A. J.; Morton, D. P.

    1992-01-01

    Digital computing systems needed for Army programs such as the Computer-Aided Low Altitude Helicopter Flight Program and the Armored Systems Modernization (ASM) vehicles may be characterized by high computational throughput and input/output bandwidth, hard real-time response, high reliability and availability, and maintainability, testability, and producibility requirements. In addition, such a system should be affordable to produce, procure, maintain, and upgrade. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and constructed under a three-year program comprised of a conceptual study, detailed design and fabrication, and demonstration and validation phases. Described here are the results of the conceptual study phase of the AFTA development. Given here is an introduction to the AFTA program, its objectives, and key elements of its technical approach. A format is designed for representing mission requirements in a manner suitable for first order AFTA sizing and analysis, followed by a discussion of the current state of mission requirements acquisition for the targeted Army missions. An overview is given of AFTA's architectural theory of operation.

  6. The Design of a Fault-Tolerant COTS-Based Bus Architecture

    NASA Technical Reports Server (NTRS)

    Chau, Savio N.; Alkalai, Leon; Burt, John B.; Tai, Ann T.

    1999-01-01

    In this paper, we report our experiences and findings on the design of a fault-tolerant bus architecture comprised of two COTS buses, the IEEE 1394 and the I2C. This fault-tolerant bus is the backbone system bus for the avionics architecture of the X2000 program at the Jet Propulsion Laboratory. COTS buses are attractive because of the availability of low-cost commercial products. However, they are not specifically designed for highly reliable applications such as long-life deep-space missions. The X2000 design team has devised a multi-level fault tolerance approach to compensate for this shortcoming of COTS buses. First, the approach enhances the fault tolerance capabilities of the IEEE 1394 and I2C buses by adding a layer of fault-handling hardware and software. Second, algorithms are developed to enable the IEEE 1394 and I2C buses to assist each other in isolating and recovering from faults. Third, the set of IEEE 1394 and I2C buses is duplicated to further enhance system reliability. The X2000 design team has paid special attention to guaranteeing that the fault tolerance provisions do not cause the bus design to deviate from the commercial standard specifications; otherwise, the economic attractiveness of using COTS would be diminished. The hardware and software design of the X2000 fault-tolerant bus is being implemented, and flight hardware will be delivered to the ST4 and Europa Orbiter missions.

  7. Direct Fault Tolerant RLV Attitude Control: A Singular Perturbation Approach

    NASA Technical Reports Server (NTRS)

    Zhu, J. J.; Lawrence, D. A.; Fisher, J.; Shtessel, Y. B.; Hodel, A. S.; Lu, P.; Jackson, Scott (Technical Monitor)

    2002-01-01

    In this paper, we present a direct fault tolerant control (DFTC) technique, where by "direct" we mean that no explicit fault identification is used. The technique will be presented for the attitude controller (autopilot) of a reusable launch vehicle (RLV), although in principle it can be applied to many other applications. Any partial or complete failure of control actuators and effectors will be inferred from saturation of one or more commanded control signals generated by the controller. The saturation causes a reduction in the effective gain, or bandwidth, of the feedback loop, which can be modeled as an increase in singular perturbation in the loop. In order to maintain stability, the bandwidth of the nominal (reduced-order) system is reduced proportionally according to singular perturbation theory. The presented DFTC technique automatically handles momentary saturations and integrator windup caused by excessive disturbances, guidance commands, or dispersions under normal vehicle conditions. For multi-input, multi-output (MIMO) systems with redundant control effectors, such as the RLV attitude control system, an algorithm is presented for determining the direction of bandwidth cutback using the method of minimum-time optimal control with constrained control, in order to maintain the best performance possible with the reduced control authority. Other bandwidth cutback logic, such as logic that preserves the commanded direction of the bandwidth or favors a preferred direction when the commanded direction cannot be achieved, is also discussed. In this extended abstract, a simple example is provided to demonstrate the idea. In the final paper, test results on the high fidelity 6-DOF X-33 model with severe dispersions will be presented.
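
    The core mechanism, cutting the loop bandwidth back in proportion to the effective gain lost to saturation, can be sketched for a single channel as follows. This is a hypothetical scalar illustration of the idea, not the MIMO cutback algorithm of the paper; the function name and numbers are invented.

        def cutback_bandwidth(nominal_bw, u_cmd, u_max):
            """Scale the loop bandwidth down by the effective-gain loss due to saturation.

            When the commanded control exceeds the actuator limit, the effective
            gain actually delivered is roughly u_max/|u_cmd|; reducing the loop
            bandwidth by the same factor counters the induced singular perturbation.
            """
            if abs(u_cmd) <= u_max:
                return nominal_bw                      # no saturation: keep nominal bandwidth
            return nominal_bw * (u_max / abs(u_cmd))   # proportional cutback

        print(cutback_bandwidth(nominal_bw=4.0, u_cmd=2.5, u_max=1.0))  # -> 1.6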

  8. Rule-based fault diagnosis of hall sensors and fault-tolerant control of PMSM

    NASA Astrophysics Data System (ADS)

    Song, Ziyou; Li, Jianqiu; Ouyang, Minggao; Gu, Jing; Feng, Xuning; Lu, Dongbin

    2013-07-01

    Hall sensors are widely used for estimating the rotor phase of permanent magnet synchronous motors (PMSM). Rotor position is an essential parameter of the PMSM control algorithm, so Hall sensor faults can be very dangerous. However, there is scarcely any research focusing on fault diagnosis and fault-tolerant control of Hall sensors used in PMSMs. From this standpoint, the Hall sensor faults which may occur during PMSM operation are theoretically analyzed. According to the analysis results, a fault diagnosis algorithm for the Hall sensors, based on three rules, is proposed to classify the fault phenomena accurately. Rotor phase estimation algorithms based on one or two Hall sensors are developed to form the basis of the fault-tolerant control algorithm. The fault diagnosis algorithm can detect 60 Hall fault phenomena in total, and all detections can be completed within 1/138 of a rotor rotation period. The fault-tolerant control algorithm achieves smooth torque production, i.e., the same control effect as the normal control mode (with three Hall sensors). Finally, a PMSM bench test verifies the accuracy and rapidity of the fault diagnosis and fault-tolerant control strategies. The fault diagnosis algorithm can detect all Hall sensor faults promptly, and the fault-tolerant control algorithm allows the PMSM to operate with one or two failed Hall sensors. In addition, the transitions between healthy-mode control and fault-tolerant control are smooth, without any additional noise and harshness. The proposed algorithms can handle Hall sensor faults of PMSMs in real applications, realizing both fault diagnosis and fault-tolerant control.
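
    The rule-based detection exploits a well-known property of three 120-degree-spaced Hall sensors: their 3-bit code must always be one of six valid states (000 and 111 never occur on a healthy machine), and consecutive codes differ in exactly one bit. The sketch below is a generic check of that kind; it is not the authors' three-rule algorithm, and the rules shown are illustrative.

        # Hypothetical sketch of rule-based Hall-sensor fault detection.
        VALID_CODES = {0b001, 0b010, 0b011, 0b100, 0b101, 0b110}

        def hall_fault(prev_code, code):
            """Return a fault label, or None if the new Hall code looks healthy."""
            if code not in VALID_CODES:
                return "invalid code (stuck sensor or open circuit)"
            if prev_code in VALID_CODES and bin(prev_code ^ code).count("1") > 1:
                return "illegal transition (missed or spurious edge)"
            return None

        print(hall_fault(0b001, 0b011))  # None: legal one-bit transition
        print(hall_fault(0b011, 0b111))  # invalid code -> sensor fault
        print(hall_fault(0b001, 0b010))  # illegal transition -> sensor fault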

  9. A fault-tolerant control architecture for unmanned aerial vehicles

    NASA Astrophysics Data System (ADS)

    Drozeski, Graham R.

    Research has presented several approaches to achieve varying degrees of fault-tolerance in unmanned aircraft. Approaches in reconfigurable flight control are generally divided into two categories: those which incorporate multiple non-adaptive controllers and switch between them based on the output of a fault detection and identification element, and those that employ a single adaptive controller capable of compensating for a variety of fault modes. Regardless of the approach for reconfigurable flight control, certain fault modes dictate system restructuring in order to prevent a catastrophic failure. System restructuring enables active control of actuation not employed by the nominal system to recover controllability of the aircraft. After system restructuring, continued operation requires the generation of flight paths that adhere to an altered flight envelope. The control architecture developed in this research employs a multi-tiered hierarchy to allow unmanned aircraft to generate and track safe flight paths despite the occurrence of potentially catastrophic faults. The hierarchical architecture increases the level of autonomy of the system by integrating five functionalities with the baseline system: fault detection and identification, active system restructuring, reconfigurable flight control, reconfigurable path planning, and mission adaptation. Fault detection and identification algorithms continually monitor aircraft performance and issue fault declarations. When the severity of a fault exceeds the capability of the baseline flight controller, active system restructuring expands the controllability of the aircraft using unconventional control strategies not exploited by the baseline controller. Each of the reconfigurable flight controllers and the baseline controller employ a proven adaptive neural network control strategy. A reconfigurable path planner employs an adaptive model of the vehicle to re-shape the desired flight path. Generation of the revised

  10. Design and analysis of linear fault-tolerant permanent-magnet vernier machines.

    PubMed

    Xu, Liang; Ji, Jinghua; Liu, Guohai; Du, Yi; Liu, Hu

    2014-01-01

    This paper proposes a new linear fault-tolerant permanent-magnet (PM) vernier (LFTPMV) machine, which can offer high thrust by using the magnetic gear effect. Both the PMs and the windings of the proposed machine are on the short mover, while the long stator is manufactured from iron only. Hence, the proposed machine is very suitable for long-stroke system applications. The key feature of this machine is that a magnetizer splits the two movers, which have modular and complementary structures. Hence, the proposed machine offers an improved symmetrical and sinusoidal back electromotive force waveform and reduced detent force. Furthermore, owing to the complementary structure, the proposed machine possesses favorable fault-tolerant capability, namely, independent phases. In particular, differing from existing fault-tolerant machines, the proposed machine offers fault tolerance without sacrificing thrust density, because neither fault-tolerant teeth nor flux barriers are adopted. The electromagnetic characteristics of the proposed machine are analyzed using the time-stepping finite-element method, which verifies the effectiveness of the theoretical analysis. PMID:24982959

  11. Fault-Tolerant Consensus of Multi-Agent System With Distributed Adaptive Protocol.

    PubMed

    Chen, Shun; Ho, Daniel W C; Li, Lulu; Liu, Ming

    2015-10-01

    In this paper, fault-tolerant consensus in a multi-agent system using a distributed adaptive protocol is investigated. First, distributed adaptive online updating strategies for some parameters are proposed based on local information about the network structure. Then, under the online updating parameters, a distributed adaptive protocol is developed to compensate for the fault effects and the uncertainty effects in the leaderless multi-agent system. Based on the local state information of neighboring agents, a distributed updating protocol gain is developed, which leads to a fully distributed, continuous, adaptive fault-tolerant consensus protocol design for the leaderless multi-agent system. Furthermore, a distributed fault-tolerant leader-follower consensus protocol for the multi-agent system is constructed by the proposed adaptive method. Finally, a simulation example is given to illustrate the effectiveness of the theoretical analysis. PMID:25415998
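
    The underlying consensus update, each agent moving toward its neighbours with a locally adapted coupling gain, can be sketched in a few lines. This is a generic illustration of adaptive consensus on an undirected graph, not the specific fault-tolerant protocol of the paper; the adaptation law, gains, and four-agent ring graph are all assumptions made for the example.

        import numpy as np

        # Each agent i moves toward its neighbours; its coupling gain c[i] grows
        # with the local disagreement, which helps absorb bounded fault effects.
        A = np.array([[0, 1, 0, 1],          # adjacency matrix of a 4-agent ring
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]], dtype=float)

        x = np.array([1.0, -2.0, 3.0, 0.5])  # initial agent states
        c = np.ones(4)                       # adaptive coupling gains
        dt, gamma = 0.05, 0.1                # step size and adaptation rate

        for _ in range(400):
            disagreement = A @ x - A.sum(axis=1) * x   # sum_j a_ij * (x_j - x_i)
            c += dt * gamma * disagreement ** 2        # simple local adaptation law
            x += dt * c * disagreement

        print(np.round(x, 3))                # all states converge to a common value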

  12. Fault-tolerant onboard digital information switching and routing for communications satellites

    NASA Technical Reports Server (NTRS)

    Shalkhauser, Mary JO; Quintana, Jorge A.; Soni, Nitin J.; Kim, Heechul

    1993-01-01

    The NASA Lewis Research Center is developing an information-switching processor for future meshed very-small-aperture terminal (VSAT) communications satellites. The information-switching processor will switch and route baseband user data onboard the VSAT satellite to connect thousands of Earth terminals. Fault tolerance is a critical issue in developing information-switching processor circuitry that will provide and maintain reliable communications services. In parallel with the conceptual development of the meshed VSAT satellite network architecture, NASA designed and built a simple test bed for developing and demonstrating baseband switch architectures and fault-tolerance techniques. The meshed VSAT architecture and the switching demonstration test bed are described, and the initial switching architecture and the fault-tolerance techniques that were developed and tested are discussed.

  13. The Impact of a Fault Tolerant MPI on Scalable Systems Services and Applications

    SciTech Connect

    Graham, Richard L; Hursey, Joshua J; Vallee, Geoffroy R; Naughton, III, Thomas J; Boehm, Swen

    2012-01-01

    Exascale targeted scientific applications must be prepared for a highly concurrent computing environment where failure will be a regular event during execution. Natural and algorithm-based fault tolerance (ABFT) techniques can often manage failures more efficiently than traditional checkpoint/restart techniques alone. Central to many petascale applications is an MPI standard that lacks support for ABFT. The Run-Through Stabilization (RTS) proposal, under consideration for MPI 3, allows an application to continue execution when processes fail. The requirements of scalable, fault tolerant MPI implementations and applications will stress the capabilities of many system services. System services must evolve to efficiently support such applications and libraries in the presence of system component failures. This paper discusses how the RTS proposal impacts system services, highlighting specific requirements. Early experimentation results from Cray systems at ORNL using prototype MPI and runtime implementations are presented. Additionally, this paper outlines fault tolerance techniques targeted at leadership class applications.

  14. Performance test results of a fault-tolerant inertial reference system

    NASA Astrophysics Data System (ADS)

    Jeerage, Mahesh K.

    This paper presents the performance test results of a fault-tolerant inertial reference system featuring skewed-axis inertial sensors, a sensor redundancy management scheme, and fault-tolerant electronics. The system, built by the Honeywell Commercial Flight Systems Group, was calibrated and tested in the laboratory by the Honeywell Systems and Research Center. The system was flight tested in 1989 by the Boeing Commercial Airplane Company, with excellent navigation and failure detection and isolation performance. A brief description of the system is presented with emphasis on its fault-tolerant aspects. The performance test results presented include nominal navigation performance and navigation performance under sensor failures. Performance of the failure detection and isolation scheme is also presented.

  15. Adaptive Fault Tolerance for Many-Core Based Space-Borne Computing

    NASA Technical Reports Server (NTRS)

    James, Mark; Springer, Paul; Zima, Hans

    2010-01-01

    This paper describes an approach to providing software fault tolerance for future deep-space robotic NASA missions, which will require a high degree of autonomy supported by an enhanced on-board computational capability. Such systems have become possible as a result of the emerging many-core technology, which is expected to offer 1024-core chips by 2015. We discuss the challenges and opportunities of this new technology, focusing on introspection-based adaptive fault tolerance that takes into account the specific requirements of applications, guided by a fault model. Introspection supports runtime monitoring of the program execution with the goal of identifying, locating, and analyzing errors. Fault tolerance assertions for the introspection system can be provided by the user, domain-specific knowledge, or via the results of static or dynamic program analysis. This work is part of an on-going project at the Jet Propulsion Laboratory in Pasadena, California.

  16. Hybrid routing technique for a fault-tolerant, integrated information network

    NASA Technical Reports Server (NTRS)

    Meredith, B. D.

    1986-01-01

    The evolutionary growth of the space station and the diverse activities onboard are expected to require a hierarchy of integrated, local area networks capable of supporting data, voice, and video communications. In addition, fault-tolerant network operation is necessary to protect communications between critical systems attached to the net and to relieve the valuable human resources onboard the space station of time-critical data system repair tasks. A key issue for the design of the fault-tolerant, integrated network is the development of a robust routing algorithm which dynamically selects the optimum communication paths through the net. A routing technique is described that adapts to topological changes in the network to support fault-tolerant operation and system evolvability.

  17. A Redundant Communication Approach to Scalable Fault Tolerance in PGAS Programming Models

    SciTech Connect

    Ali, Nawab; Krishnamoorthy, Sriram; Govind, Niranjan; Palmer, B. J.

    2011-02-09

    Recent trends in high-performance computing point towards increasingly large machines with millions of processing, storage, and networking elements. Unfortunately, the reliability of these machines is inversely proportional to their size, resulting in a system-wide mean-time-between-failures (MTBF) ranging from a few days to a few hours. As such, for long-running applications, the ability to efficiently recover from frequent failures is essential. Traditional forms of fault tolerance, such as checkpoint/restart, suffer from performance issues related to limited I/O and memory bandwidth. In this paper, we present a fault-tolerance mechanism that reduces the cost of failure recovery by maintaining shadow data structures and performing redundant remote memory accesses. We present results from a computational chemistry application running at scale to show that our techniques provide applications with a high degree of fault tolerance and low (2%--4%) overhead for 2048 processors.

  18. Fault tolerance of artificial neural networks with applications in critical systems

    NASA Technical Reports Server (NTRS)

    Protzel, Peter W.; Palumbo, Daniel L.; Arras, Michael K.

    1992-01-01

    This paper investigates the fault tolerance characteristics of time-continuous, recurrent artificial neural networks (ANN) that can be used to solve optimization problems. The principle of operation and performance of these networks are first illustrated using well-known model problems such as the traveling salesman problem and the assignment problem. The ANNs are then subjected to 13 simultaneous 'stuck at 1' or 'stuck at 0' faults for network sizes of up to 900 'neurons'. The effects of these faults are demonstrated and the cause of the observed fault tolerance is discussed. An application is presented in which a network performs a critical task for a real-time distributed processing system by generating new task allocations during the reconfiguration of the system. The performance degradation of the ANN in the presence of faults is investigated by large-scale simulations, and the potential benefits of delegating a critical task to a fault tolerant network are discussed.

  19. Focused fault injection testing of software implemented fault tolerance mechanisms of Voltan TMR nodes

    NASA Astrophysics Data System (ADS)

    Tao, S.; Ezhilchelvan, P. D.; Shrivastava, S. K.

    1995-03-01

    One way of gaining confidence in the adequacy of fault tolerance mechanisms of a system is to test the system by injecting faults and see how the system performs under faulty conditions. This paper presents an application of the focused fault injection method that has been developed for testing software implemented fault tolerance mechanisms of distributed systems. The method exploits the object oriented approach of software implementation to support the injection of specific classes of faults. With the focused fault injection method, the system tester is able to inject specific classes of faults (including malicious ones) such that the fault tolerance mechanisms of a target system can be tested adequately. The method has been applied to test the design and implementation of voting, clock synchronization, and ordering modules of the Voltan TMR (triple modular redundant) node. The tests performed uncovered three flaws in the system software.

  20. Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer

    NASA Technical Reports Server (NTRS)

    Goldberg, J.; Kautz, W. H.; Melliar-Smith, P. M.; Green, M. W.; Levitt, K. N.; Schwartz, R. L.; Weinstock, C. B.

    1984-01-01

    SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor to processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness.

  1. Self-adaptive Fault-Tolerance of HLA-Based Simulations in the Grid Environment

    NASA Astrophysics Data System (ADS)

    Huang, Jijie; Chai, Xudong; Zhang, Lin; Li, Bo Hu

    The objects of an HLA-based simulation can access model services to update their attributes. However, the grid server may become overloaded and refuse to let the model service handle object accesses. Because these objects accessed the model service during the last simulation loop and their intermediate state is stored on that server, such a refusal may terminate the simulation. A fault-tolerance mechanism must therefore be introduced into the simulations. Traditional fault-tolerance methods cannot meet this need because the transmission latency between a federate and the RTI in a grid environment varies from several hundred milliseconds to several seconds. By adding model service URLs to the OMT and extending the HLA services and model services with some interfaces, this paper proposes a self-adaptive fault-tolerance mechanism for simulations based on the characteristics of federates accessing model services. Benchmark experiments indicate that the extended HLA/RTI can make simulations run self-adaptively in the grid environment.

  2. A novel five-phase fault-tolerant modular in-wheel permanent-magnet synchronous machine for electric vehicles

    NASA Astrophysics Data System (ADS)

    Sui, Yi; Zheng, Ping; Wu, Fan; Wang, Pengfei; Cheng, Luming; Zhu, Jianguo

    2015-05-01

    This paper describes a five-phase fault-tolerant modular in-wheel permanent-magnet synchronous machine (PMSM) for electric vehicles. By adopting both the analytical and finite-element methods, the magnetic isolation abilities of some typical slot/pole combinations are analyzed, and a new fractional-slot concentrated winding topology that features hybrid single/double-layer concentrated windings and modular stator structure is developed. For the proposed hybrid single/double-layer concentrated windings, feasible slot/pole combinations are studied for three-, four-, and five-phase PMSMs. A five-phase in-wheel PMSM that adopts the proposed winding topology is designed and compared with the conventional PMSM, and the proposed machine shows advantages of large output torque, zero mutual inductances, low short-circuit current, and high magnetic isolation ability. Some of the analysis results are verified by experiments.

  3. Fault tolerant VLSI (Very Large-Scale Integration) design using error correcting codes

    NASA Astrophysics Data System (ADS)

    Hartmann, C. R.; Lala, P. K.; Ali, A. M.; Ganguly, S.; Visweswaran, G. S.

    1989-02-01

    Very Large-Scale Integration (VLSI) provides the opportunity to design fault tolerant, self-checking circuits with on-chip, concurrent error correction. This study determines the applicability of a variety of error-detecting, error-correcting codes (EDAC) in high speed digital data processors and buses. In considering both microcircuit faults and bus faults, some of the codes examined are: Berger, repetition, parity, residue, and Modified Reflected Binary codes. The report describes the improvement in fault tolerance obtained as a result of implementing these EDAC schemes and the associated penalties in circuit area.
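
    Of the codes listed, the Berger code is perhaps the simplest to illustrate: the check symbol is the binary count of zeros in the information word, which detects all unidirectional errors. The following is a minimal software sketch of the encoding and check, not a description of the circuits in the report.

        def berger_encode(bits):
            """Append a Berger check symbol: the binary count of zeros in the data."""
            zeros = bits.count(0)
            width = max(1, len(bits).bit_length())       # bits needed to hold the count
            return bits + [int(b) for b in format(zeros, f"0{width}b")]

        def berger_check(word, data_len):
            data, check = word[:data_len], word[data_len:]
            return data.count(0) == int("".join(map(str, check)), 2)

        codeword = berger_encode([1, 0, 1, 1, 0, 0, 1, 0])   # 4 zeros -> check 0100
        print(berger_check(codeword, data_len=8))            # True

        # A unidirectional 1->0 error can only raise the data's zero count or lower
        # the stored check value, so the resulting mismatch is always detected.
        codeword[0] = 0
        print(berger_check(codeword, data_len=8))            # False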

  4. Validation Methods Research for Fault-Tolerant Avionics and Control Systems: Working Group Meeting, 2

    NASA Technical Reports Server (NTRS)

    Gault, J. W. (Editor); Trivedi, K. S. (Editor); Clary, J. B. (Editor)

    1980-01-01

    The validation process comprises the activities required to ensure the agreement of the system realization with the system specification. A preliminary validation methodology for fault tolerant systems is documented. A general framework for a validation methodology is presented, along with a set of specific tasks intended for the validation of two specimen systems, SIFT and FTMP. Two major areas of research are identified: first, those activities required to support the ongoing development of the validation process itself, and second, those activities required to support the design, development, and understanding of fault tolerant systems.

  5. Run-Through Stabilization: An MPI Proposal for Process Fault Tolerance

    SciTech Connect

    Hursey, Joshua J; Graham, Richard L; Bronevetsky, Greg; Butinas, Darius; Pritchard, Howard; Solt, David G.

    2011-01-01

    The MPI standard lacks semantics and interfaces for sustained application execution in the presence of process failures. Exascale HPC systems may require scalable, fault resilient MPI applications. The mission of the MPI Forum's Fault Tolerance Working Group is to enhance the standard to enable the development of scalable, fault tolerant HPC applications. This paper presents an overview of the Run-Through Stabilization proposal. This proposal allows an application to continue execution even if MPI processes fail during execution. The discussion introduces the implications on point-to-point and collective operations over communicators, though the full proposal addresses all aspects of the MPI standard.

  6. Fault-tolerant computer study. [logic designs for building block circuits

    NASA Technical Reports Server (NTRS)

    Rennels, D. A.; Avizienis, A. A.; Ercegovac, M. D.

    1981-01-01

    A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self-checking computer module with self-contained input/output and interfaces to redundant communication buses. Fault tolerance is achieved by connecting self-checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed.

  7. A multiple fault-tolerant processor network architecture for pipeline computing

    SciTech Connect

    Tyszer, J. )

    1988-11-01

    Certain fault-tolerant multiprocessor networks that can emulate linear array interconnections are considered. The system is fault tolerant of (m - 1) node and link failures. One of the particularly attractive features of this network is that it allows for a linear array structure starting with any node even in spite of (m - 2) faults. The configuration algorithm is fully distributed, and is performed on the basis of test results obtained from nonfaulty processors only. A simple fault identification procedure is developed using the above routing algorithm.

  8. A model for the analysis of fault-tolerant signal processing architectures

    NASA Technical Reports Server (NTRS)

    Nair, V. S. S.; Abraham, J. A.

    1989-01-01

    This paper develops a new model, using matrices, for the analysis of fault-tolerant multiprocessor systems. The relationship between processors computing useful data, the output data, and the check processors is defined in terms of matrix entries. Unlike the matrix-based models proposed previously for the analysis of digital systems, this model uses only numerical computations rather than logical operations for the analysis of a system. Algorithms to evaluate the fault detection and location capability of the system are proposed which are much less complex than the existing ones. The new model is used to analyze some fault-tolerant architectures proposed for signal-processing applications.

  9. Refinement for fault-tolerance: An aircraft hand-off protocol

    NASA Technical Reports Server (NTRS)

    Marzullo, Keith; Schneider, Fred B.; Dehn, Jon

    1994-01-01

    Part of the Advanced Automation System (AAS) for air-traffic control is a protocol to permit flight hand-off from one air-traffic controller to another. The protocol must be fault-tolerant and, therefore, is subtle -- an ideal candidate for the application of formal methods. This paper describes a formal method for deriving fault-tolerant protocols that is based on refinement and proof outlines. The AAS hand-off protocol was actually derived using this method; that derivation is given.

  10. The SIFT computer and its development. [Software Implemented Fault Tolerance for aircraft control

    NASA Technical Reports Server (NTRS)

    Goldberg, J.

    1981-01-01

    Software Implemented Fault Tolerance (SIFT) is an aircraft control computer designed to allow failure probability of less than 10 to the -10th/hour. The system is based on advanced fault-tolerance computing and validation methodology. Since confirmation of reliability by observation is essentially impossible, system reliability is estimated by a Markov model. A mathematical proof is used to justify the validity of the Markov model. System design is represented by a hierarchy of abstract models, and the design proof comprises mathematical proofs that each model is, in fact, an elaboration of the next more abstract model.

  11. Non-Linear Finite Element Modeling of THUNDER Piezoelectric Actuators

    NASA Technical Reports Server (NTRS)

    Taleghani, Barmac K.; Campbell, Joel F.

    1999-01-01

    A NASTRAN non-linear finite element model has been developed for predicting the dome heights of THUNDER (THin Layer UNimorph Ferroelectric DrivER) piezoelectric actuators. To analytically validate the finite element model, a comparison was made with a non-linear plate solution using von Kármán's approximation. A 500 volt input was used to examine the actuator deformation. The NASTRAN finite element model was also compared with experimental results. Four groups of specimens were fabricated and tested. Four different input voltages, 120, 160, 200, and 240 Vp-p with a 0 volt offset, were used for this comparison.
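
    For reference, the large-deflection plate theory invoked here is usually written as the coupled von Kármán equations for the transverse deflection w and the Airy stress function φ. The form below is the standard isotropic statement (with D the bending stiffness, h the thickness, q the transverse load, E the Young's modulus, and ν the Poisson ratio), not the specific NASTRAN or layered-actuator formulation used in the paper:

        D\,\nabla^4 w = q + h\left(
              \frac{\partial^2 \varphi}{\partial y^2}\frac{\partial^2 w}{\partial x^2}
            + \frac{\partial^2 \varphi}{\partial x^2}\frac{\partial^2 w}{\partial y^2}
            - 2\,\frac{\partial^2 \varphi}{\partial x\,\partial y}\frac{\partial^2 w}{\partial x\,\partial y}\right),
        \qquad
        \nabla^4 \varphi = E\left[\left(\frac{\partial^2 w}{\partial x\,\partial y}\right)^2
            - \frac{\partial^2 w}{\partial x^2}\frac{\partial^2 w}{\partial y^2}\right],
        \qquad
        D = \frac{E h^3}{12\,(1-\nu^2)}.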

  12. An approximation formula for a class of fault-tolerant computers

    NASA Technical Reports Server (NTRS)

    White, A. L.

    1986-01-01

    An approximation formula is derived for the probability of failure for fault-tolerant process-control computers. These computers use redundancy and reconfiguration to achieve high reliability. Finite-state Markov models capture the dynamic behavior of component failure and system recovery, and the approximation formula permits an estimation of system reliability by an easy examination of the model.
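
    The finite-state Markov models referred to here can also be evaluated numerically once failure rates, coverage, and mission time are specified; the approximation formula replaces that numerical step with a closed form. Below is a minimal numerical sketch for a generic reconfigurable triplex with imperfect coverage; the rates, the coverage value, and the three-state structure are assumptions for illustration, not the model or formula of the paper.

        import numpy as np
        from scipy.linalg import expm

        # States: 0 = all three units good, 1 = one unit failed (reconfigured),
        #         2 = system failed.  Q[i, j] is the transition rate from i to j.
        lam = 1e-4     # per-unit failure rate (1/h), assumed
        c = 0.999      # coverage: probability a unit failure is handled, assumed

        Q = np.array([
            [-3 * lam,  3 * lam * c,  3 * lam * (1 - c)],
            [0.0,       -2 * lam,     2 * lam],
            [0.0,        0.0,         0.0],
        ])

        p0 = np.array([1.0, 0.0, 0.0])       # start with everything working
        t = 10.0                              # mission time in hours
        p_t = p0 @ expm(Q * t)                # transient state probabilities
        print("P(system failure by t) =", p_t[2])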

  13. Real-time optimal torque control of fault-tolerant permanent magnet brushless machines

    NASA Astrophysics Data System (ADS)

    Max, L.; Wang, J.; Atallah, K.; Howe, D.

    2005-05-01

    The paper describes issues that are pertinent to control system hardware and software design for the real-time implementation of an optimal torque control strategy for fault-tolerant permanent magnet brushless ac drives, and reports experimental results. The influence of the current control loop bandwidth and pulse width modulation on the torque ripple are investigated and quantified.

  14. Final Project Report. Scalable fault tolerance runtime technology for petascale computers

    SciTech Connect

    Krishnamoorthy, Sriram; Sadayappan, P

    2015-06-16

    With the massive number of components comprising the forthcoming petascale computer systems, hardware failures will be routinely encountered during execution of large-scale applications. Due to the multidisciplinary, multiresolution, and multiscale nature of the scientific problems that drive the demand for high-end systems, applications place increasingly differing demands on the system resources: disk, network, memory, and CPU. In addition to MPI, future applications are expected to use advanced programming models such as those developed under the DARPA HPCS program, as well as existing global address space programming models such as Global Arrays, UPC, and Co-Array Fortran. While there has been a considerable amount of work on fault tolerant MPI, with a number of strategies and extensions for fault tolerance proposed, virtually none of the advanced models proposed for emerging petascale systems is currently fault aware. To achieve fault tolerance, underlying runtime and OS technologies able to scale to the petascale level must be developed. This project evaluated a range of runtime techniques for fault tolerance for advanced programming models.

  15. Cost and benefits design optimization model for fault tolerant flight control systems

    NASA Technical Reports Server (NTRS)

    Rose, J.

    1982-01-01

    Requirements and specifications for a method of optimizing the design of fault-tolerant flight control systems are provided. Algorithms that could be used for developing new and modifying existing computer programs are also provided, with recommendations for follow-on work.

  16. Fault-tolerant interconnection network and image-processing applications for the PASM parallel processing system

    SciTech Connect

    Adams, G.B. III

    1984-01-01

    The demand for very high speed data processing coupled with falling hardware costs has made large-scale parallel and distributed computer systems both desirable and feasible. Two modes of parallel processing are single instruction stream-multiple data stream (SIMD) and multiple instruction stream-multiple data stream (MIMD). PASM, a partitionable SIMD/MIMD system, is a reconfigurable multimicroprocessor system being designed for image processing and pattern recognition. An important component of these systems is the interconnection network, the mechanism for communication among the computation nodes and memories. Assuring high reliability for such complex systems is a significant task. Thus, a crucial practical aspect of an interconnection network is fault tolerance. In answer to this need, the Extra Stage Cube (ESC), a fault-tolerant, multistage cube-type interconnection network, is defined. The fault tolerance of the ESC is explored for both single and multiple faults, routing tags are defined, and consideration is given to permuting data and partitioning the ESC in the presence of faults. The ESC is compared with other fault-tolerant multistage networks. Finally, the reliability of the ESC and an enhanced version of it are investigated.

  17. Design of a 2*2 fault-tolerant switching element

    SciTech Connect

    Woei Lin; Chuan-lin Wu

    1982-01-01

    The architecture of a 2*2 fault-tolerant switching element which can be used to modularly construct interconnection networks for multiprocessing and local computer networking is described. The switching element uses distributed control and circuit switching. Its good gate-to-pin ratio can facilitate VLSI implementation. 18 references.

  18. Fault-tolerant system considerations for a redundant strapdown inertial measurement unit

    NASA Technical Reports Server (NTRS)

    Motyka, P.; Ornedo, R.; Mangoubi, R.

    1984-01-01

    The development and evaluation of a fault-tolerant system for the Redundant Strapdown Inertial Measurement Unit (RSDIMU) being developed and evaluated by the NASA Langley Research Center was continued. The RSDIMU consists of four two-degree-of-freedom gyros and accelerometers mounted on the faces of a semi-octahedron which can be separated into two halves for damage protection. Compensated and uncompensated fault-tolerant system failure decision algorithms were compared. An algorithm to compensate for sensor noise effects in the fault-tolerant system thresholds was evaluated via simulation. The effects of sensor location and magnitude of the vehicle structural modes on system performance were assessed. A threshold generation algorithm, which incorporates noise compensation and filtered parity equation residuals for structural mode compensation, was evaluated. The effects of the fault-tolerant system on navigational accuracy were also considered. A sensor error parametric study was performed in an attempt to improve the soft failure detection capability without obtaining false alarms. Also examined was an FDI system strategy based on the pairwise comparison of sensor measurements. This strategy has the specific advantage of, in many instances, successfully detecting and isolating up to two simultaneously occurring failures.

  19. Reliability model derivation of a fault-tolerant, dual, spare-switching, digital computer system

    NASA Technical Reports Server (NTRS)

    1974-01-01

    A computer based reliability projection aid, tailored specifically for application in the design of fault-tolerant computer systems, is described. Its more pronounced characteristics include the facility for modeling systems with two distinct operational modes, measuring the effect of both permanent and transient faults, and calculating conditional system coverage factors. The underlying conceptual principles, mathematical models, and computer program implementation are presented.

  20. A hardware implementation of a provably correct design of a fault-tolerant clock synchronization circuit

    NASA Technical Reports Server (NTRS)

    Torres-Pomales, Wilfredo

    1993-01-01

    A fault-tolerant clock synchronization system was designed to a proven correct formal specification. Formal methods were used in the development of this specification. A description of the system and an analysis of the tests performed are presented. Plots of typical experimental results are included.

  1. A novel learning algorithm which improves the partial fault tolerance of multilayer neural networks.

    PubMed

    Cavalieri, Salvatore; Mirabella, Orazio

    1999-01-01

    The paper deals with the problem of fault tolerance in a multilayer perceptron network. Although such a network already possesses a reasonable fault tolerance capability, it may be insufficient in particularly critical applications. Studies carried out by the authors have shown that the traditional backpropagation learning algorithm may entail the presence of a certain number of weights with a much higher absolute value than the others. Further studies have shown that faults in these weights are the main cause of deterioration in the performance of the neural network. In other words, the main cause of incorrect network functioning on the occurrence of a fault is the non-uniform distribution of the absolute values of the weights in each layer. The paper proposes a learning algorithm which updates the weights so as to distribute their absolute values as uniformly as possible in each layer. Tests performed on benchmark test sets have shown the considerable increase in fault tolerance obtainable with the proposed approach as compared with the traditional backpropagation algorithm and with some of the most efficient fault tolerance approaches to be found in the literature. PMID:12662719
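
    One simple way to realize the idea of spreading weight magnitudes more uniformly, offered here purely as an illustration and not as the authors' algorithm, is to add a per-layer penalty that pulls each weight's magnitude toward the layer's mean magnitude during the usual gradient update.

        import numpy as np

        def uniformity_penalty_grad(W, beta):
            """Gradient of the penalty (beta/2) * sum_ij (|w_ij| - mean|w|)^2.

            Treating the layer mean as a constant, the gradient pulls every
            weight magnitude toward the common mean, discouraging the few
            dominant weights whose faults would otherwise cripple the network.
            """
            mag = np.abs(W)
            return beta * (mag - mag.mean()) * np.sign(W)

        # Usage inside an otherwise standard backpropagation step:
        rng = np.random.default_rng(0)
        W = rng.normal(scale=1.0, size=(4, 3))            # one layer's weights
        grad_loss = rng.normal(scale=0.1, size=W.shape)   # placeholder task gradient
        lr = 0.1
        W -= lr * (grad_loss + uniformity_penalty_grad(W, beta=0.5))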

  2. A survey of provably correct fault-tolerant clock synchronization techniques

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.

    1988-01-01

    Six provably correct fault-tolerant clock synchronization algorithms are examined. These algorithms are all presented in the same notation to permit easier comprehension and comparison. The advantages and disadvantages of the different techniques are examined and issues related to the implementation of these algorithms are discussed. The paper argues for the use of such algorithms in life-critical applications.

  3. Step-by-step magic state encoding for efficient fault-tolerant quantum computation

    PubMed Central

    Goto, Hayato

    2014-01-01

    Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, the fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is a current major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. For the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than previous approaches based on the distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation. PMID:25511387

  4. Step-by-step magic state encoding for efficient fault-tolerant quantum computation

    NASA Astrophysics Data System (ADS)

    Goto, Hayato

    2014-12-01

    Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, the fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is a current major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. For the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than previous approaches based on the distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation.

  5. Software reliability models for fault-tolerant avionics computers and related topics

    NASA Technical Reports Server (NTRS)

    Miller, Douglas R.

    1987-01-01

    Software reliability research is briefly described. General research topics are reliability growth models, quality of software reliability prediction, the complete monotonicity property of reliability growth, conceptual modelling of software failure behavior, assurance of ultrahigh reliability, and analysis techniques for fault-tolerant systems.

  6. Fault tolerance control of phase current in permanent magnet synchronous motor control system

    NASA Astrophysics Data System (ADS)

    Chen, Kele; Chen, Ke; Chen, Xinglong; Li, Jinying

    2014-08-01

    As photoelectric tracking systems evolve from Earth-based platforms to a variety of moving platforms (airborne, shipborne, vehicle-mounted, satellite-borne, and missile-borne), fault-tolerant control of the phase current sensors is studied so that phase current sensor failures can be detected and accommodated on a moving platform. Using a DC-link current sensor and the switching state of the corresponding SVPWM inverter, failure detection and fault-tolerant control of the three phase current sensors are achieved. Fault tolerance is maintained under one, two, or three sensor failures. The source of the error between the reconstructed and actual phase currents under this method is analyzed, and a solution to reduce the error is provided. An experiment based on a permanent magnet synchronous motor system is conducted, and the method is shown to detect phase current sensor failures effectively and precisely while simultaneously providing fault-tolerant control. With this method, even if all three phase current sensors malfunction, the moving platform can continue to operate by reconstructing the phase currents of the motor.
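
    The reconstruction principle is that, during each active SVPWM switching state, the single DC-link current sensor sees exactly one (possibly negated) phase current. The sketch below shows that standard two-level-inverter mapping and a minimal reconstruction routine; it is only an illustration of the principle, not the paper's full fault-tolerant scheme, and the sampling interface is hypothetical.

        # Mapping for a two-level inverter: during each active switching state
        # (Sa, Sb, Sc) of the upper switches, the DC-link current equals one
        # phase current (or its negative).
        DC_LINK_MAP = {
            (1, 0, 0): ("a", +1),   # i_dc = +i_a
            (0, 1, 1): ("a", -1),   # i_dc = -i_a
            (0, 1, 0): ("b", +1),
            (1, 0, 1): ("b", -1),
            (0, 0, 1): ("c", +1),
            (1, 1, 0): ("c", -1),
        }

        def reconstruct_phase_currents(samples):
            """Rebuild {i_a, i_b, i_c} from DC-link samples taken in active states.

            `samples` is a list of ((Sa, Sb, Sc), i_dc) pairs taken within one PWM
            period; the zero vectors (0,0,0) and (1,1,1) carry no information.
            The third current follows from i_a + i_b + i_c = 0.
            """
            currents = {}
            for state, i_dc in samples:
                if state in DC_LINK_MAP:
                    phase, sign = DC_LINK_MAP[state]
                    currents[phase] = sign * i_dc
            if len(currents) == 2:                         # derive the missing phase
                missing = ({"a", "b", "c"} - currents.keys()).pop()
                currents[missing] = -sum(currents.values())
            return currents

        print(reconstruct_phase_currents([((1, 0, 0), 2.0), ((1, 1, 0), 1.2)]))
        # -> {'a': 2.0, 'c': -1.2, 'b': -0.8}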

  7. Neural network-based robust actuator fault diagnosis for a non-linear multi-tank system.

    PubMed

    Mrugalski, Marcin; Luzar, Marcel; Pazera, Marcin; Witczak, Marcin; Aubrun, Christophe

    2016-03-01

    The paper is devoted to the problem of the robust actuator fault diagnosis of the dynamic non-linear systems. In the proposed method, it is assumed that the diagnosed system can be modelled by the recurrent neural network, which can be transformed into the linear parameter varying form. Such a system description allows developing the designing scheme of the robust unknown input observer within H∞ framework for a class of non-linear systems. The proposed approach is designed in such a way that a prescribed disturbance attenuation level is achieved with respect to the actuator fault estimation error, while guaranteeing the convergence of the observer. The application of the robust unknown input observer enables actuator fault estimation, which allows applying the developed approach to the fault tolerant control tasks. PMID:26838675

  8. Award ER25750: Coordinated Infrastructure for Fault Tolerance Systems Indiana University Final Report

    SciTech Connect

    Lumsdaine, Andrew

    2013-03-08

    The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research with a goal of providing end-to-end fault tolerance on a systemwide basis for applications and other system software. While fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade, it has been treated mostly as a collection of isolated stovepipes. Visibility and response to faults has typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, allowing little flexibility or control on a system-wide basis, making it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. As an example, consider faults such as communication link failures that can be seen by a network library but are not directly visible to the job scheduler, or consider faults related to node failures that can be detected by system monitoring software but are not inherently visible to the resource manager. If information about such faults could be shared by the network libraries or monitoring software, then other system software, such as a resource manager or job scheduler, could ensure that failed nodes or failed network links were excluded from further job allocations and that further diagnosis could be performed. As a founding member and one of the lead developers of the Open MPI project, our efforts over the course of this project have been focused on making Open MPI more robust to failures by supporting various fault tolerance techniques, and using fault information exchange and coordination between MPI and the HPC system software stack from the application, numeric libraries, and programming language runtime to other common system components such as jobs schedulers, resource managers, and monitoring tools.

  9. Non-linearity in clinical practice.

    PubMed

    Petros, Peter

    2003-05-01

    The whole spectrum of medicine consists of complex non-linear systems that are balanced and interact with each other. How non-linearity confers stability on a system and explains variation and uncertainty in clinical medicine is discussed. A major theme is that a small alteration in initial conditions may have a major effect on the end result. In the context of non-linearity, it is argued that 'evidence-based medicine' (EBM) as it exists today can only ever be relevant to a small fraction of the domain of medicine, that the 'art of medicine' consists of an intuitive 'tuning in' to these complex systems and as such is not so much an art as an expression of non-linear science. The main cause of iatrogenic disease is interpreted as a failure to understand the complexity of the systems being treated. Case study examples are given and analysed in non-linear terms. It is concluded that good medicine concerns individualized treatment of an individual patient whose body functions are governed by non-linear processes. EBM as it exists today paints with a broad and limited brush, but it does promise a fresh new direction. In this context, we need to expand the spectrum of scientific medicine to include non-linearity, and to look upon the 'art of medicine' as a historical (but unstated) legacy in this domain. PMID:12787180

  10. Approximate solutions for non-linear iterative fractional differential equations

    NASA Astrophysics Data System (ADS)

    Damag, Faten H.; Kiliçman, Adem; Ibrahim, Rabha W.

    2016-06-01

    This paper establishes approximate solutions for non-linear iterative fractional differential equations of the form d^γ v(s)/ds^γ = ℵ(s, v(s), v(v(s))), where γ ∈ (0, 1] and s ∈ I := [0, 1]. Our method is based on convergence tools for analytic solutions in a connected region. We show that the suggested solution is unique and convergent by means of some well-known geometric functions.
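
    The abstract does not state which definition of the fractional derivative is used; assuming the Caputo sense, which is the common choice for initial value problems with γ ∈ (0, 1), the problem can be written as:

```latex
% Iterative fractional initial value problem (Caputo sense assumed).
\[
  \frac{d^{\gamma} v(s)}{ds^{\gamma}}
    = \aleph\bigl(s,\, v(s),\, v(v(s))\bigr),
  \qquad \gamma \in (0,1], \quad s \in I := [0,1],
\]
\[
  \text{where}\quad
  \frac{d^{\gamma} v(s)}{ds^{\gamma}}
    := \frac{1}{\Gamma(1-\gamma)} \int_{0}^{s} (s-t)^{-\gamma}\, v'(t)\, dt ,
  \qquad 0 < \gamma < 1 .
\]
```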

  11. Special Issue on a Fault Tolerant Network on Chip Architecture

    NASA Astrophysics Data System (ADS)

    Janidarmian, Majid; Tinati, Melika; Khademzadeh, Ahmad; Ghavibazou, Maryam; Fekr, Atena Roshan

    2010-06-01

    In this paper a fast and efficient spare-switch selection algorithm is presented for a reliable NoC architecture, called FERNA, in which a specific application is mapped onto a mesh topology. Exploiting the ring concept used in FERNA, the algorithm achieves results equivalent to an exhaustive search with much shorter run time while improving two parameters: system response time and extra communication cost. The inputs to the FERNA algorithm for these two objectives are derived from transaction-level simulation in SystemC TLM and from a mathematical formulation, respectively. The results demonstrate that improving these parameters also increases overall system reliability, which is calculated analytically. The mapping algorithm is also investigated as a factor affecting the extra bandwidth requirement and system reliability.

  12. Fault tolerance in onboard processors - Protecting efficient FDM demultiplexers

    NASA Technical Reports Server (NTRS)

    Redinbo, Robert

    1992-01-01

    The application of convolutional codes to protect demultiplexer filter banks is demonstrated analytically for efficient implementations. An overview is given of the parameters for efficient implementations of filter banks, and real convolutional codes are discussed in terms of DSP operations. Methods for composite filtering and parity generation are outlined, and attention is given to the protection of polyphase filter demultiplexing systems. Real convolutional codes can be applied to protect demultiplexer filter banks by employing two forms of low-rate parity calculation for each filter bank. The parity values are computed either from the output with an FIR parity filter or in parallel with the normal processing by a composite filter. Hardware similarities between the filter bank and the main demultiplexer bank permit efficient redeployment of the processing resources to the main processing function in any configuration.
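
    As a rough illustration of parity protection for a linear filter bank (using a simple sum checksum rather than the real convolutional codes of the paper), the following sketch exploits the linearity of convolution: a composite parity filter, run in parallel with the bank, must agree with the sum of the branch outputs, so any single corrupted branch output is detected. All signal sizes and names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small "filter bank": k FIR branches driven by the same input.
k, taps, n = 4, 8, 256
h = rng.standard_normal((k, taps))          # branch impulse responses
x = rng.standard_normal(n)                  # shared input signal

# Normal processing: each branch output.
y = np.array([np.convolve(x, h_i) for h_i in h])

# Composite parity filter: because convolution is linear,
# sum_i (h_i * x) == (sum_i h_i) * x.  Running the composite filter in
# parallel gives an independent check value for the whole bank.
h_parity = h.sum(axis=0)
parity = np.convolve(x, h_parity)

def check(outputs, parity, tol=1e-9):
    """Compare the bank's summed output against the parity branch."""
    return np.max(np.abs(outputs.sum(axis=0) - parity)) < tol

print("fault-free bank passes:", check(y, parity))

# Inject a transient fault into one branch output and re-check.
y_faulty = y.copy()
y_faulty[2, 100] += 0.5
print("faulty bank passes:    ", check(y_faulty, parity))
```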

  13. Fault tolerance in parity-state linear optical quantum computing

    SciTech Connect

    Hayes, A. J. F.; Ralph, T. C.; Haselgrove, H. L.; Gilchrist, Alexei

    2010-08-15

    We use a combination of analytical and numerical techniques to calculate the noise threshold and resource requirements for a linear optical quantum computing scheme based on parity-state encoding. Parity-state encoding is used at the lowest level of code concatenation in order to efficiently correct errors arising from the inherent nondeterminism of two-qubit linear-optical gates. When combined with teleported error-correction (using either a Steane or Golay code) at higher levels of concatenation, the parity-state scheme is found to achieve a saving of approximately three orders of magnitude in resources when compared to the cluster state scheme, at a cost of a somewhat reduced noise threshold.

  14. SFTP: A Secure and Fault-Tolerant Paradigm against Blackhole Attack in MANET

    NASA Astrophysics Data System (ADS)

    KumarRout, Jitendra; Kumar Bhoi, Sourav; Kumar Panda, Sanjaya

    2013-02-01

    Security in MANETs is a challenging task nowadays. MANETs are vulnerable to passive and active attacks because of their limited resources and lack of centralized authority. A blackhole attack is a network-layer attack that degrades network performance by dropping packets. In this paper, we propose a Secure Fault-Tolerant Paradigm (SFTP) which checks for blackhole attacks in the network. The SFTP algorithm has three phases: coverage-area design to determine the area of coverage, a Network Connection algorithm to design a fault-tolerant model, and a Route Discovery algorithm to discover the route and deliver data from source to destination. SFTP gives better network performance by making the network fault free.

  15. Communications protocols for a fault tolerant, integrated local area network for Space Station applications

    NASA Technical Reports Server (NTRS)

    Meredith, B. D.

    1984-01-01

    The evolutionary growth of the Space Station and the diverse activities onboard are expected to require a hierarchy of integrated, local area networks capable of supporting data, voice and video communications. In addition, fault tolerant network operation is necessary to protect communications between critical systems attached to the net and to relieve the valuable human resources onboard Space Station of day-to-day data system repair tasks. An experimental local area network is being developed which will serve as a testbed for investigating candidate algorithms and technologies for a fault tolerant, integrated network. The establishment of a set of rules or protocols which govern communications on the net is essential to obtain orderly and reliable operation. A hierarchy of protocols for the experimental network is presented and procedures for data and control communications are described.

  16. Experimental Robot Position Sensor Fault Tolerance Using Accelerometers and Joint Torque Sensors

    NASA Technical Reports Server (NTRS)

    Aldridge, Hal A.; Juang, Jer-Nan

    1997-01-01

    Robot systems in critical applications, such as those in space and nuclear environments, must be able to operate during component failure to complete important tasks. One failure mode that has received little attention is the failure of joint position sensors. Current fault tolerant designs require the addition of directly redundant position sensors which can affect joint design. The proposed method uses joint torque sensors found in most existing advanced robot designs along with easily locatable, lightweight accelerometers to provide a joint position sensor fault recovery mode. This mode uses the torque sensors along with a virtual passive control law for stability and accelerometers for joint position information. Two methods for conversion from Cartesian acceleration to joint position based on robot kinematics, not integration, are presented. The fault tolerant control method was tested on several joints of a laboratory robot. The controllers performed well with noisy, biased data and a model with uncertain parameters.

  17. Machine-checked proofs of the design and implementation of a fault-tolerant circuit

    NASA Technical Reports Server (NTRS)

    Bevier, William R.; Young, William D.

    1990-01-01

    A formally verified implementation of the 'oral messages' algorithm of Pease, Shostak, and Lamport is described. An abstract implementation of the algorithm is verified to achieve interactive consistency in the presence of faults. This abstract characterization is then mapped down to a hardware level implementation which inherits the fault-tolerant characteristics of the abstract version. All steps in the proof were checked with the Boyer-Moore theorem prover. A significant result is the demonstration of a fault-tolerant device that is formally specified and whose implementation is proved correct with respect to this specification. A significant simplifying assumption is that the redundant processors behave synchronously. A mechanically checked proof that the oral messages algorithm is 'optimal' in the sense that no algorithm which achieves agreement via similar message passing can tolerate a larger proportion of faulty processors is also described.

  18. Robust fault-tolerant H∞ control of active suspension systems with finite-frequency constraint

    NASA Astrophysics Data System (ADS)

    Wang, Rongrong; Jing, Hui; Karimi, Hamid Reza; Chen, Nan

    2015-10-01

    In this paper, the robust fault-tolerant (FT) H∞ control problem of active suspension systems with finite-frequency constraint is investigated. A full-car model is employed in the controller design such that the heave, pitch and roll motions can be simultaneously controlled. Both the actuator faults and external disturbances are considered in the controller synthesis. As the human body is more sensitive to vertical vibration in the 4-8 Hz range, robust H∞ control with this finite-frequency constraint is designed. Other performances such as suspension deflection and actuator saturation are also considered. As some of the states, such as the sprung mass pitch and roll angles, are hard to measure, a robust H∞ dynamic output-feedback controller with fault tolerant ability is proposed. Simulation results show the performance of the proposed controller.

  19. Actuator usage and fault tolerance of the James Webb Space Telescope optical element mirror actuators

    NASA Astrophysics Data System (ADS)

    Barto, A.; Acton, D. S.; Finley, P.; Gallagher, B.; Hardy, B.; Knight, J. S.; Lightsey, P.

    2012-09-01

    The James Webb Space Telescope (JWST) secondary mirror and eighteen primary mirror segments are each actively controlled in rigid body position via six hexapod actuators. The mirrors are stowed to the mirror support structure to survive the launch environment and then must be deployed 12.5 mm to reach the nominally deployed position before the Wavefront Sensing & Control (WFS&C) alignment and phasing process begins. The actuation system is electrically, but not mechanically, redundant. Therefore, with the large number of hexapod actuators, the fault tolerance of the OTE architecture and WFS&C alignment process has been carefully considered. The details of the fault tolerance will be discussed, including motor life budgeting, failure signatures, and motor life.

  20. Actuator fault tolerant multi-controller scheme using set separation based diagnosis

    NASA Astrophysics Data System (ADS)

    Seron, María M.; De Doná, José A.

    2010-11-01

    We present a fault tolerant control strategy based on a new principle for actuator fault diagnosis. The scheme employs a standard bank of observers which match the different fault situations that can occur in the plant. Each of these observers has an associated estimation error with distinctive dynamics when the observer matches the current fault situation of the plant. Based on the information from each observer, a fault detection and isolation (FDI) module is able to reconfigure the control loop by selecting the appropriate control law from a bank of controllers, each of them designed to stabilise and achieve reference tracking for one of the given fault models. The main contribution of this article is to propose a new FDI principle which exploits the separation of sets that characterise healthy system operation from sets that characterise transitions from healthy to faulty behaviour. The new principle provides pre-checkable conditions for guaranteed fault tolerance of the overall multi-controller scheme.
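
    A toy version of the multi-observer architecture (not the set-separation test itself, which is replaced here by a simple residual threshold) can be sketched on a scalar plant with a loss-of-effectiveness actuator fault; the gains, rates and thresholds below are illustrative assumptions.

```python
import numpy as np

# Scalar plant x+ = a*x + b*eff*u, where eff models actuator effectiveness
# (eff = 1 when healthy, eff = 0.5 after a 50% loss-of-effectiveness fault).
a, b = 0.9, 1.0
fault_models = {"healthy": 1.0, "50% loss": 0.5}
controllers  = {"healthy": -0.7, "50% loss": -1.4}   # u = K*(x - r); gains are illustrative

def simulate(true_eff=0.5, r=1.0, steps=60, fault_at=30, threshold=0.02):
    x = 0.0
    x_hat = {m: 0.0 for m in fault_models}           # one observer per fault hypothesis
    active = "healthy"
    for t in range(steps):
        eff = 1.0 if t < fault_at else true_eff      # actuator fault occurs at t = fault_at
        u = controllers[active] * (x - r)            # regulate x towards the reference r
        x_next = a * x + b * eff * u
        residuals = {}
        for m, eff_m in fault_models.items():
            pred = a * x_hat[m] + b * eff_m * u      # each observer assumes "its" fault
            residuals[m] = abs(x_next - pred)
            x_hat[m] = pred + 0.5 * (x_next - pred)  # correct with the measured state
        best = min(residuals, key=residuals.get)
        if residuals[active] > threshold and best != active:
            active = best                            # FDI decision: reconfigure the loop
            print(f"t = {t}: reconfigured to the '{active}' controller")
        x = x_next
    print(f"final state x = {x:.3f} (reference {r})")

simulate()
```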

  1. A quantitative study of fault tolerance, noise immunity, and generalization ability of MLPs

    PubMed

    Bernier; Ortega; Ros; Rojas; Prieto

    2000-12-01

    An analysis of the influence of weight and input perturbations in a multilayer perceptron (MLP) is made in this article. Quantitative measurements of fault tolerance, noise immunity, and generalization ability are provided. From the expressions obtained, it is possible to justify some previously reported conjectures and experimentally obtained results (e.g., the influence of weight magnitudes, the relation between training with noise and the generalization ability, the relation between fault tolerance and the generalization ability). The measurements introduced here are explicitly related to the mean squared error degradation in the presence of perturbations, thus constituting a selection criterion between different alternatives of weight configurations. Moreover, they allow us to predict the degradation of the learning performance of an MLP when its weights or inputs are deviated from their nominal values and thus, the behavior of a physical implementation can be evaluated before the weights are mapped on it according to its accuracy. PMID:11112261
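
    The quantity being measured, mean squared error degradation under weight perturbation, can be estimated empirically for any fixed network; the following numpy sketch does this by Monte Carlo with Gaussian weight perturbations (an empirical stand-in for the paper's analytical expressions, using an arbitrary random MLP).

```python
import numpy as np

rng = np.random.default_rng(1)

# A small fixed MLP: 4 inputs -> 16 hidden units (tanh) -> 1 output.
W1, b1 = rng.standard_normal((16, 4)) * 0.5, np.zeros(16)
W2, b2 = rng.standard_normal((1, 16)) * 0.5, np.zeros(1)

def mlp(X, W1, b1, W2, b2):
    return np.tanh(X @ W1.T + b1) @ W2.T + b2

# Reference data and nominal outputs (the nominal MLP defines the "target").
X = rng.standard_normal((500, 4))
y_nominal = mlp(X, W1, b1, W2, b2)

def mse_degradation(sigma, trials=200):
    """Average MSE of the perturbed network w.r.t. the nominal outputs,
    for zero-mean Gaussian weight perturbations of standard deviation sigma."""
    total = 0.0
    for _ in range(trials):
        y = mlp(X, W1 + sigma * rng.standard_normal(W1.shape), b1,
                   W2 + sigma * rng.standard_normal(W2.shape), b2)
        total += np.mean((y - y_nominal) ** 2)
    return total / trials

for sigma in (0.01, 0.05, 0.1):
    print(f"weight perturbation std {sigma}: MSE degradation {mse_degradation(sigma):.5f}")
```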

  2. Using Concatenated Quantum Codes for Universal Fault-Tolerant Quantum Gates

    NASA Astrophysics Data System (ADS)

    Jochym-O'Connor, Tomas; Laflamme, Raymond

    2014-01-01

    We propose a method for universal fault-tolerant quantum computation using concatenated quantum error correcting codes. The concatenation scheme exploits the transversal properties of two different codes, combining them to provide a means to protect against low-weight arbitrary errors. We give the required properties of the error correcting codes to ensure universal fault tolerance and discuss a particular example using the 7-qubit Steane and 15-qubit Reed-Muller codes. Namely, other than computational basis state preparation as required by the DiVincenzo criteria, our scheme requires no special ancillary state preparation to achieve universality, as opposed to schemes such as magic state distillation. We believe that optimizing the codes used in such a scheme could provide a useful alternative to state distillation schemes that exhibit high overhead costs.

  3. Problems related to the integration of fault tolerant aircraft electronic systems

    NASA Technical Reports Server (NTRS)

    Bannister, J. A.; Adlakha, V.; Trivedi, K.; Alspaugh, T. A., Jr.

    1982-01-01

    Problems related to the design of the hardware for an integrated aircraft electronic system are considered. Taxonomies of concurrent systems are reviewed and a new taxonomy is proposed. An informal methodology intended to identify feasible regions of the taxonomic design space is described. Specific tools are recommended for use in the methodology. Based on the methodology, a preliminary strawman integrated fault tolerant aircraft electronic system is proposed. Next, problems related to the programming and control of integrated aircraft electronic systems are discussed. Issues of system resource management, including the scheduling and allocation of real time periodic tasks in a multiprocessor environment, are treated in detail. The role of software design in integrated fault tolerant aircraft electronic systems is discussed. Conclusions and recommendations for further work are included.

  4. Simulated fault injection - A methodology to evaluate fault tolerant microprocessor architectures

    NASA Technical Reports Server (NTRS)

    Choi, Gwan S.; Iyer, Ravishankar K.; Carreno, Victor A.

    1990-01-01

    A simulation-based fault-injection method for validating fault-tolerant microprocessor architectures is described. The approach uses mixed-mode simulation (electrical/logic analysis), and injects transient errors in run-time to assess the resulting fault impact. As an example, a fault-tolerant architecture which models the digital aspects of a dual-channel real-time jet-engine controller is used. The level of effectiveness of the dual configuration with respect to single and multiple transients is measured. The results indicate 100 percent coverage of single transients. Approximately 12 percent of the multiple transients affect both channels; none result in controller failure since two additional levels of redundancy exist.
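
    A drastically simplified software analogue of this experiment, bit-flips injected into one or both channels of a duplicated computation with a cross-channel comparison as the detector, reproduces the qualitative point that single-channel transients are always caught while a small fraction of common-mode double transients escape detection. The computation, fault model and trial counts are invented for illustration.

```python
import random

random.seed(0)

def channel(x):
    """Stand-in for one channel's control computation (16-bit result)."""
    return (3 * x + 7) & 0xFFFF

def flip_bit(value, bit):
    """Inject a transient single-bit upset into a 16-bit result."""
    return value ^ (1 << bit)

def run_trial(x, hit_both):
    a, b = channel(x), channel(x)
    a = flip_bit(a, random.randrange(16))           # transient in channel A
    if hit_both:
        b = flip_bit(b, random.randrange(16))       # multiple transient: both channels hit
    return a != b                                   # cross-channel comparison detects it

trials = 10_000
single = sum(run_trial(random.randrange(2**16), hit_both=False) for _ in range(trials))
both   = sum(run_trial(random.randrange(2**16), hit_both=True)  for _ in range(trials))
print(f"single-channel transients detected: {single / trials:.3%}")
print(f"dual-channel transients detected:   {both / trials:.3%}")
```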

  5. Fault tolerant onboard packet switch architecture for communication satellites: Shared memory per beam approach

    NASA Technical Reports Server (NTRS)

    Shalkhauser, Mary Jo; Quintana, Jorge A.; Soni, Nitin J.

    1994-01-01

    The NASA Lewis Research Center is developing a multichannel communication signal processing satellite (MCSPS) system which will provide low data rate, direct to user, commercial communications services. The focus of current space segment developments is a flexible, high-throughput, fault tolerant onboard information switching processor. This information switching processor (ISP) is a destination-directed packet switch which performs both space and time switching to route user information among numerous user ground terminals. Through both industry study contracts and in-house investigations, several packet switching architectures were examined. A contention-free approach, the shared memory per beam architecture, was selected for implementation. The shared memory per beam architecture, fault tolerance insertion, implementation, and demonstration plans are described.

  6. Real-number codes for fault-tolerant matrix operations on processor arrays

    NASA Technical Reports Server (NTRS)

    Nair, V. S. S.; Abraham, Jacob A.

    1990-01-01

    A generalization of existing real number codes is proposed. It is proven that linearity is a necessary and sufficient condition for codes used for fault-tolerant matrix operations such as matrix addition, multiplication, transposition, and LU decomposition. It is also proven that for every linear code defined over a finite field, there exists a corresponding linear real-number code with similar error detecting capabilities. Encoding schemes are given for some of the example codes which fall under the general set of real-number codes. With the help of experiments, a rule is derived for the selection of a particular code for a given application. The performance overhead of fault tolerance schemes using the generalized encoding schemes is shown to be very low, and this is substantiated through simulation experiments.
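
    The best-known linear real-number code for matrix operations is the row/column checksum scheme, in which the checksums are linear functions of the data and are preserved by matrix multiplication; a minimal numpy sketch (a generic checksum example, not the specific codes of the paper) is shown below.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Encode: append a column-checksum row to A and a row-checksum column to B.
A_c = np.vstack([A, A.sum(axis=0)])                  # (n+1) x n column-checksum matrix
B_r = np.hstack([B, B.sum(axis=1, keepdims=True)])   # n x (n+1) row-checksum matrix

# Multiplying the encoded operands yields a full checksum matrix: its last
# row/column hold the checksums of the true product C = A @ B.
C_full = A_c @ B_r

def detect(C_full, tol=1e-8):
    """Return the (row, column) indices where checksums disagree with the data."""
    data = C_full[:-1, :-1]
    bad_rows = np.where(np.abs(data.sum(axis=1) - C_full[:-1, -1]) > tol)[0]
    bad_cols = np.where(np.abs(data.sum(axis=0) - C_full[-1, :-1]) > tol)[0]
    return bad_rows, bad_cols

print("fault-free:", detect(C_full))

# Inject a single erroneous element; the row and column checks localize it.
C_faulty = C_full.copy()
C_faulty[1, 2] += 1.0
print("with fault:", detect(C_faulty))
```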

  7. Robust Gain-Scheduled Fault Tolerant Control for a Transport Aircraft

    NASA Technical Reports Server (NTRS)

    Shin, Jong-Yeob; Gregory, Irene

    2007-01-01

    This paper presents an application of robust gain-scheduled control concepts using a linear parameter-varying (LPV) control synthesis method to design fault tolerant controllers for a civil transport aircraft. To apply the robust LPV control synthesis method, the nonlinear dynamics must be represented by an LPV model, which is developed using the function substitution method over the entire flight envelope. The developed LPV model associated with the aerodynamic coefficient uncertainties represents nonlinear dynamics including those outside the equilibrium manifold. Passive and active fault tolerant controllers (FTC) are designed for the longitudinal dynamics of the Boeing 747-100/200 aircraft in the presence of elevator failure. Both FTC laws are evaluated in the full nonlinear aircraft simulation in the presence of the elevator fault and the results are compared to show pros and cons of each control law.

  8. Economic modeling of fault tolerant flight control systems in commercial applications

    NASA Technical Reports Server (NTRS)

    Finelli, G. B.

    1982-01-01

    This paper describes the current development of a comprehensive model which will supply the assessment and analysis capability to investigate the economic viability of Fault Tolerant Flight Control Systems (FTFCS) for commercial aircraft of the 1990's and beyond. An introduction to the unique attributes of fault tolerance and how they will influence aircraft operations and consequent airline costs and benefits is presented. Specific modeling issues and elements necessary for accurate assessment of all costs affected by ownership and operation of FTFCS are delineated. Trade-off factors are presented, aimed at exposing economically optimal realizations of system implementations, resource allocation, and operating policies. A trade-off example is furnished to graphically display some of the analysis capabilities of the comprehensive simulation model now being developed.

  9. Fault tolerant filtering and fault detection for quantum systems driven by fields in single photon states

    NASA Astrophysics Data System (ADS)

    Gao, Qing; Dong, Daoyi; Petersen, Ian R.; Rabitz, Herschel

    2016-06-01

    The purpose of this paper is to solve the fault tolerant filtering and fault detection problem for a class of open quantum systems driven by a continuous-mode bosonic input field in single photon states when the systems are subject to stochastic faults. Optimal estimates of both the system observables and the fault process are simultaneously calculated and characterized by a set of coupled recursive quantum stochastic differential equations.

  10. Permutation codes for the state assignment of fault tolerant sequential machines

    NASA Technical Reports Server (NTRS)

    Chen, M.; Trachtenberg, E. A.

    1991-01-01

    A new fault-tolerant state assignment method is suggested for synchronous sequential machines. It is assumed that the inputs are fault free and that for no input it is possible to reach all or most of the states, whose number may be fairly large. Error correcting codes for the state assignment are generated by permutations of a chosen linear code. A state assignment algorithm is developed and its computational complexity is estimated. Examples are given.

  11. Provable Transient Recovery for Frame-Based, Fault-Tolerant Computing Systems

    NASA Technical Reports Server (NTRS)

    DiVito, Ben L.; Butler, Ricky W.

    1992-01-01

    We present a formal verification of the transient fault recovery aspects of the Reliable Computing Platform (RCP), a fault-tolerant computing system architecture for digital flight control applications. The RCP uses NMR-style redundancy to mask faults and internal majority voting to purge the effects of transient faults. The system design has been formally specified and verified using the EHDM verification system. Our formalization accommodates a wide variety of voting schemes for purging the effects of transients.
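
    The transient-purging effect of internal majority voting can be seen in a few lines: after a vote, every replica adopts the majority value, so a transient upset on one replica disappears on the next frame. This is only a toy illustration of NMR voting, not the RCP design or its formal EHDM specification.

```python
from collections import Counter

def vote(values):
    """Majority vote over replica outputs (N assumed odd)."""
    value, count = Counter(values).most_common(1)[0]
    if count <= len(values) // 2:
        raise RuntimeError("no majority: too many simultaneous faults")
    return value

def frame(states, inputs):
    """One computation frame: each replica updates its state, then all
    replicas adopt the voted state, which purges transient corruption."""
    outputs = [s + inputs for s in states]          # replicated computation
    voted = vote(outputs)
    return [voted] * len(states)                    # vote result re-seeds every replica

states = [0, 0, 0]                                  # triplex (NMR with N = 3)
states = frame(states, 5)                           # -> [5, 5, 5]
states[1] = 999                                     # transient upset hits replica 1
states = frame(states, 5)                           # vote masks it -> [10, 10, 10]
print(states)
```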

  12. A novel mathematical setup for fault tolerant control systems with state-dependent failure process

    NASA Astrophysics Data System (ADS)

    Chitraganti, S.; Aberkane, S.; Aubrun, C.

    2014-12-01

    In this paper, we consider a fault tolerant control system (FTCS) with state-dependent failures and provide a tractable mathematical model to handle them. By assuming abrupt changes in system parameters, we use jump processes to model the failure process and the fault detection and isolation (FDI) process. In particular, we assume that the failure rates of the failure process vary according to which set the state of the system belongs to.
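
    A minimal simulation of a state-dependent failure process, in which the per-step failure probability jumps when the state leaves an assumed safe set, might look as follows; the dynamics, regions and rates are invented for illustration and are not the paper's FTCS model.

```python
import random

random.seed(5)

# Illustrative state-dependent failure process: the per-step failure
# probability depends on which region the state currently occupies.
def failure_prob(x):
    return 0.001 if abs(x) < 1.0 else 0.02     # higher rate outside the safe set

def run(steps=2000, a=0.98, noise=0.2):
    x, failed_at = 0.0, None
    for t in range(steps):
        x = a * x + random.gauss(0.0, noise)   # closed-loop state evolution
        if random.random() < failure_prob(x):  # jump: a component fails
            failed_at = t
            break
    return failed_at

failures = [run() for _ in range(200)]
observed = [t for t in failures if t is not None]
print(f"{len(observed)} of 200 runs failed; mean time to failure ~ "
      f"{sum(observed) / max(len(observed), 1):.0f} steps")
```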

  13. A general model for the study of fault tolerance and diagnosis.

    NASA Technical Reports Server (NTRS)

    Meyer, J. F.

    1973-01-01

    The concept of a 'system with faults' is introduced as a suggested point of departure for the theoretical study of fault tolerance and diagnosis in systems. The model is defined relative to a general representation scheme for systems and, depending on the choice of representation, can be used to investigate either hardware or software faults that occur during either the design or use of a system.

  14. Experimental fault tolerant universal quantum gates with solid-state spins under ambient conditions

    NASA Astrophysics Data System (ADS)

    Rong, Xing

    Quantum computation provides great speedup over its classical counterpart for certain problems, such as quantum simulation, prime factoring and database searching. One of the challenges in realizing quantum computation is to execute precise control of the quantum system in the presence of noise. Recently, high fidelity control of spin qubits has been achieved in several quantum systems. However, control of spin qubits with the accuracy required for fault-tolerant quantum computation under ambient conditions has remained elusive. Here we demonstrate a universal set of logic gates in nitrogen-vacancy centers with an average single-qubit gate fidelity of 0.99995 and a two-qubit gate fidelity of 0.992. These high control fidelities have been achieved in diamonds with a natural abundance of 13C at room temperature via composite pulses and optimal control methods. This experimental implementation of quantum gates with fault-tolerant control fidelity marks an important step towards fault-tolerant quantum computation under ambient conditions. National Key Basic Research Program of China (Grant No. 2013CB921800).

  15. Fast fault-tolerant decoder for qubit and qudit surface codes

    NASA Astrophysics Data System (ADS)

    Watson, Fern H. E.; Anwar, Hussain; Browne, Dan E.

    2015-09-01

    The surface code is one of the most promising candidates for combating errors in large scale fault-tolerant quantum computation. A fault-tolerant decoder is a vital part of the error correction process—it is the algorithm which computes the operations needed to correct or compensate for the errors according to the measured syndrome, even when the measurement itself is error prone. Previously, decoders based on minimum-weight perfect matching have been studied. However, these are not immediately generalizable from qubit to qudit codes. In this work, we develop a fault-tolerant decoder for the surface code, capable of efficient operation for qubits and qudits of any dimension, generalizing the decoder first introduced by Bravyi and Haah [Phys. Rev. Lett. 111, 200501 (2013), 10.1103/PhysRevLett.111.200501]. We study its performance when both the physical qudits and the syndrome measurements are subject to generalized uncorrelated bit-flip noise (and the higher-dimensional equivalent). We show that, with appropriate enhancements to the decoder and a high enough qudit dimension, a threshold at an error rate of more than 8% can be achieved.

  16. Fault-Tolerant Algorithms for Connectivity Restoration in Wireless Sensor Networks

    PubMed Central

    Zeng, Yali; Xu, Li; Chen, Zhide

    2015-01-01

    As wireless sensor networks (WSNs) are often deployed in hostile environments, nodes in the networks are prone to large-scale failures that prevent the network from working normally. In this case, an effective restoration scheme is needed to restore the faulty network in a timely manner. Most existing restoration schemes consider only the number of deployed nodes or fault tolerance alone, and fail to take into account the fact that network coverage and topology quality are also important to a network. To address this issue, we present two algorithms named the Full 2-Connectivity Restoration Algorithm (F2CRA) and the Partial 3-Connectivity Restoration Algorithm (P3CRA), which restore a faulty WSN in different respects. F2CRA constructs a fan-shaped topology structure to reduce the number of deployed nodes, while P3CRA constructs a dual-ring topology structure to improve the fault tolerance of the network. F2CRA is suitable when the restoration cost is given priority, and P3CRA is suitable when network quality is considered first. Compared with other algorithms, these two algorithms ensure that the network has stronger fault tolerance, a larger coverage area and better balanced load after restoration. PMID:26703616

  17. Fault-Tolerant Algorithms for Connectivity Restoration in Wireless Sensor Networks.

    PubMed

    Zeng, Yali; Xu, Li; Chen, Zhide

    2015-01-01

    As wireless sensor networks (WSNs) are often deployed in hostile environments, nodes in the networks are prone to large-scale failures that prevent the network from working normally. In this case, an effective restoration scheme is needed to restore the faulty network in a timely manner. Most existing restoration schemes consider only the number of deployed nodes or fault tolerance alone, and fail to take into account the fact that network coverage and topology quality are also important to a network. To address this issue, we present two algorithms named the Full 2-Connectivity Restoration Algorithm (F2CRA) and the Partial 3-Connectivity Restoration Algorithm (P3CRA), which restore a faulty WSN in different respects. F2CRA constructs a fan-shaped topology structure to reduce the number of deployed nodes, while P3CRA constructs a dual-ring topology structure to improve the fault tolerance of the network. F2CRA is suitable when the restoration cost is given priority, and P3CRA is suitable when network quality is considered first. Compared with other algorithms, these two algorithms ensure that the network has stronger fault tolerance, a larger coverage area and better balanced load after restoration. PMID:26703616

  18. Preserving Collective Performance Across Process Failure for a Fault Tolerant MPI

    SciTech Connect

    Hursey, Joshua J; Graham, Richard L

    2011-01-01

    Application developers are investigating Algorithm Based Fault Tolerance (ABFT) techniques to improve the efficiency of application recovery beyond what traditional techniques alone can provide. Applications will depend on libraries to sustain failure-free performance across process failure to continue to efficiently use High Performance Computing (HPC) systems even in the presence of process failure. Optimized Message Passing Interface (MPI) collective operations are a critical component of many scalable HPC applications. However, most of the collective algorithms are not able to handle process failure. Next generation MPI implementations must provide fault aware versions of such algorithms that can sustain performance across process failure. This paper discusses the design and implementation of fault aware collective algorithms for tree structured communication patterns. The three design approaches of rerouting, lookup avoiding and rebalancing are described, and analyzed for their performance impact relative to a similar fault unaware collective algorithm. The analysis shows that the rerouting approach causes up to a four times performance degradation while the rebalancing approach can bring the performance within 1% of the fault unaware performance. Additionally, this paper introduces the reader to a set of run-through stabilization semantics being developed by the MPI Forum's Fault Tolerance Working Group to support ABFT. This paper underscores the need for care to be taken when designing new fault aware collective algorithms for fault tolerant MPI implementations.
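
    The rebalancing idea can be illustrated independently of MPI: rebuild the collective's tree over only the surviving ranks so that no failed rank lies on any communication path. The sketch below builds a balanced binary broadcast tree over an assumed set of alive ranks; it is illustrative and does not reflect Open MPI's actual collective implementation.

```python
def rebalanced_children(alive_ranks, me):
    """Build a binary broadcast tree over the surviving ranks only and
    return the children of `me` (the 'rebalancing' approach: the tree is
    recomputed from scratch so no failed rank appears on any path)."""
    order = sorted(alive_ranks)                # dense re-indexing of the survivors
    pos = order.index(me)
    children = []
    for child_pos in (2 * pos + 1, 2 * pos + 2):
        if child_pos < len(order):
            children.append(order[child_pos])
    return children

alive = [0, 1, 3, 4, 6, 7]                     # ranks 2 and 5 have failed
for rank in alive:
    print(f"rank {rank} forwards to {rebalanced_children(alive, rank)}")
```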

  19. ALLIANCE: An architecture for fault tolerant, cooperative control of heterogeneous mobile robots

    SciTech Connect

    Parker, L.E.

    1995-02-01

    This research addresses the problem of achieving fault tolerant cooperation within small- to medium-sized teams of heterogeneous mobile robots. The author describes a novel behavior-based, fully distributed architecture, called ALLIANCE, that utilizes adaptive action selection to achieve fault tolerant cooperative control in robot missions involving loosely coupled, largely independent tasks. The robots in this architecture possess a variety of high-level functions that they can perform during a mission, and must at all times select an appropriate action based on the requirements of the mission, the activities of other robots, the current environmental conditions, and their own internal states. Since such cooperative teams often work in dynamic and unpredictable environments, the software architecture allows the team members to respond robustly and reliably to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. After presenting ALLIANCE, the author describes in detail experimental results of an implementation of this architecture on a team of physical mobile robots performing a cooperative box pushing demonstration. These experiments illustrate the ability of ALLIANCE to achieve adaptive, fault-tolerant cooperative control amidst dynamic changes in the capabilities of the robot team.

  20. Design and Experimental Validation for Direct-Drive Fault-Tolerant Permanent-Magnet Vernier Machines

    PubMed Central

    Liu, Guohai; Yang, Junqin; Chen, Ming; Chen, Qian

    2014-01-01

    A fault-tolerant permanent-magnet vernier (FT-PMV) machine is designed for direct-drive applications, incorporating the merits of high torque density and high reliability. Based on the so-called magnetic gearing effect, PMV machines achieve high torque density by introducing flux-modulation poles (FMPs). This paper investigates the fault-tolerant characteristics of PMV machines and provides a design method that not only meets the fault-tolerance requirements but also retains the high torque density. The operation principle of the proposed machine is analyzed. The design process and optimization are presented in detail, including the combination of slots and poles, the winding distribution, and the dimensions of the PMs and teeth. The machine's performance is evaluated using the time-stepping finite element method (TS-FEM). Finally, the FT-PMV machine is manufactured, and experimental results are presented to validate the theoretical analysis. PMID:25045729

  1. Design and experimental validation for direct-drive fault-tolerant permanent-magnet vernier machines.

    PubMed

    Liu, Guohai; Yang, Junqin; Chen, Ming; Chen, Qian

    2014-01-01

    A fault-tolerant permanent-magnet vernier (FT-PMV) machine is designed for direct-drive applications, incorporating the merits of high torque density and high reliability. Based on the so-called magnetic gearing effect, PMV machines achieve high torque density by introducing flux-modulation poles (FMPs). This paper investigates the fault-tolerant characteristics of PMV machines and provides a design method that not only meets the fault-tolerance requirements but also retains the high torque density. The operation principle of the proposed machine is analyzed. The design process and optimization are presented in detail, including the combination of slots and poles, the winding distribution, and the dimensions of the PMs and teeth. The machine's performance is evaluated using the time-stepping finite element method (TS-FEM). Finally, the FT-PMV machine is manufactured, and experimental results are presented to validate the theoretical analysis. PMID:25045729

  2. Adaptive Fault-Tolerant Control of Uncertain Nonlinear Large-Scale Systems With Unknown Dead Zone.

    PubMed

    Chen, Mou; Tao, Gang

    2016-08-01

    In this paper, an adaptive neural fault-tolerant control scheme is proposed and analyzed for a class of uncertain nonlinear large-scale systems with unknown dead zone and external disturbances. To tackle the unknown nonlinear interaction functions in the large-scale system, the radial basis function neural network (RBFNN) is employed to approximate them. To further handle the unknown approximation errors and the effects of the unknown dead zone and external disturbances, integrated as the compounded disturbances, the corresponding disturbance observers are developed for their estimations. Based on the outputs of the RBFNN and the disturbance observer, the adaptive neural fault-tolerant control scheme is designed for uncertain nonlinear large-scale systems by using a decentralized backstepping technique. The closed-loop stability of the adaptive control system is rigorously proved via Lyapunov analysis and the satisfactory tracking performance is achieved under the integrated effects of unknown dead zone, actuator fault, and unknown external disturbances. Simulation results of a mass-spring-damper system are given to illustrate the effectiveness of the proposed adaptive neural fault-tolerant control scheme for uncertain nonlinear large-scale systems. PMID:26340792
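
    The RBFNN approximation at the core of the scheme is easy to sketch in isolation: a Gaussian radial basis expansion with weights adapted from data. The simple gradient update below stands in for the Lyapunov-derived adaptation law of the paper, and the test nonlinearity, centers and learning rate are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# RBF network: phi_i(x) = exp(-|x - c_i|^2 / (2 w^2)),  f_hat(x) = theta^T phi(x)
centers = np.linspace(-2.0, 2.0, 15)
width = 0.4
theta = np.zeros_like(centers)

def phi(x):
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

def unknown_nonlinearity(x):                 # stands in for the unknown interaction term
    return 0.5 * np.sin(2 * x) + 0.1 * x ** 2

# Simple gradient (LMS-style) adaptation of the weights from sampled data,
# a stand-in for the Lyapunov-based adaptation law used in the paper.
eta = 0.1
for _ in range(5000):
    x = rng.uniform(-2.0, 2.0)
    e = unknown_nonlinearity(x) - theta @ phi(x)
    theta += eta * e * phi(x)

for x in np.linspace(-2.0, 2.0, 5):
    print(f"x={x:+.2f}  true={unknown_nonlinearity(x):+.4f}  rbf={theta @ phi(x):+.4f}")
```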

  3. Analysis of a hardware and software fault tolerant processor for critical applications

    NASA Technical Reports Server (NTRS)

    Dugan, Joanne B.

    1993-01-01

    Computer systems for critical applications must be designed to tolerate software faults as well as hardware faults. A unified approach to tolerating hardware and software faults is characterized by classifying faults in terms of duration (transient or permanent) rather than source (hardware or software). Errors arising from transient faults can be handled through masking or voting, but errors arising from permanent faults require system reconfiguration to bypass the failed component. Most errors which are caused by software faults can be considered transient, in that they are input-dependent. Software faults are triggered by a particular set of inputs. Quantitative dependability analysis of systems which exhibit a unified approach to fault tolerance can be performed by a hierarchical combination of fault tree and Markov models. A methodology for analyzing hardware and software fault tolerant systems is applied to the analysis of a hypothetical system, loosely based on the Fault Tolerant Parallel Processor. The models consider both transient and permanent faults, hardware and software faults, independent and related software faults, automatic recovery, and reconfiguration.
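
    The Markov part of such a hierarchical model can be illustrated with a small continuous-time chain for a duplex unit with imperfect reconfiguration coverage, solved with a matrix exponential; the states, rates and coverage value below are assumptions for illustration, not the FTPP-like model analyzed in the paper.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative CTMC (not the paper's model):
# state 0: both units up, state 1: one unit up after a covered failure,
# state 2: system failed (absorbing).
lam = 1e-4      # assumed permanent failure rate per unit per hour
c   = 0.95      # assumed coverage: probability a first failure is handled correctly

Q = np.array([
    [-2 * lam,  2 * lam * c,  2 * lam * (1 - c)],
    [0.0,      -lam,          lam              ],
    [0.0,       0.0,          0.0              ],
])

p0 = np.array([1.0, 0.0, 0.0])
for t in (10.0, 100.0, 1000.0):                 # mission times in hours
    p_t = p0 @ expm(Q * t)                      # state distribution at time t
    print(f"t = {t:6.0f} h   reliability = {1.0 - p_t[2]:.8f}")
```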

  4. Energy-efficient fault tolerance in multiprocessor real-time systems

    NASA Astrophysics Data System (ADS)

    Guo, Yifeng

    The recent progress in the multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications as well as industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4 -- 8 processors are common, and it is expected that many-core systems with dozens of processing cores will be available in near future. For such systems, in addition to general temporal requirement common for all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, due to the fact that tolerating faults (e.g., permanent/transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve the system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem where tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP). A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is

  5. Stability of non-linear integrable accelerator

    SciTech Connect

    Batalov, I.; Valishev, A.; /Fermilab

    2011-09-01

    The stability of the non-linear Integrable Optics Test Accelerator (IOTA) model developed in [1] was tested. The area of the stable region in transverse coordinates and the maximum attainable tune spread were found as a function of the non-linear lens strength. Particle loss as a function of turn number was analyzed to determine whether a dynamic aperture limitation is present in the system. The system was also tested with sextupoles included in the machine for chromaticity compensation. A method for evaluating the beam size in the linear part of the accelerator was proposed.

  6. Non-linear Post Processing Image Enhancement

    NASA Technical Reports Server (NTRS)

    Hunt, Shawn; Lopez, Alex; Torres, Angel

    1997-01-01

    A non-linear filter for image post processing based on the feedforward neural network topology is presented. This study was undertaken to investigate the usefulness of "smart" filters in image post processing. The filter has been shown to be useful in recovering high frequencies, such as those lost during the JPEG compression-decompression process. The filtered images have a higher signal-to-noise ratio and a higher perceived image quality. Simulation studies comparing the proposed filter with the optimum mean-square non-linear filter, examples of the high-frequency recovery, and the statistical properties of the filter are given.

  7. Non-linear cord-rubber composites

    NASA Technical Reports Server (NTRS)

    Clark, S. K.; Dodge, R. N.

    1989-01-01

    A method is presented for calculating the stress-strain relations in a multi-layer composite made up of materials whose individual stress-strain characteristics are non-linear and possibly different. The method is applied to the case of asymmetric tubes in tension, and comparisons with experimentally measured data are given.

  8. Coordinated Fault-Tolerance for High-Performance Computing Final Project Report

    SciTech Connect

    Panda, Dhabaleswar Kumar; Beckman, Pete

    2011-07-28

    With the Coordinated Infrastructure for Fault Tolerance Systems (CIFTS, as the original project came to be called) project, our aim has been to understand and tackle the following broad research questions, the answers to which will help the HEC community analyze and shape the direction of research in the field of fault tolerance and resiliency on future high-end leadership systems. Will availability of global fault information, obtained by fault information exchange between the different HEC software on a system, allow individual system software to better detect, diagnose, and adaptively respond to faults? If fault-awareness is raised throughout the system through fault information exchange, is it possible to get all system software working together to provide a more comprehensive end-to-end fault management on the system? What are the missing fault-tolerance features that widely used HEC system software lacks today that would inhibit such software from taking advantage of systemwide global fault information? What are the practical limitations of a systemwide approach for end-to-end fault management based on fault awareness and coordination? What mechanisms, tools, and technologies are needed to bring about fault awareness and coordination of responses on a leadership-class system? What standards, outreach, and community interaction are needed for adoption of the concept of fault awareness and coordination for fault management on future systems? Keeping our overall objectives in mind, the CIFTS team has taken a parallel fourfold approach. Our central goal was to design and implement a light-weight, scalable infrastructure with a simple, standardized interface to allow communication of fault-related information through the system and facilitate coordinated responses. This work led to the development of the Fault Tolerance Backplane (FTB) publish-subscribe API specification, together with a reference implementation and several experimental implementations on top of

  9. Fault-tolerance and two-level pipelining in VLSI systolic arrays

    SciTech Connect

    Kung, H.T.; Lam, M.S.

    1984-01-01

    The authors address two important issues in systolic array designs: fault-tolerance and two-level pipelining. The proposed systolic fault-tolerant scheme maintains the original data flow pattern by bypassing defective cells with a few registers. As a result, many of the desirable properties of systolic arrays (such as local and regular communication between cells) are preserved. Two-level pipelining refers to the use of pipelined functional units in the implementation of systolic cells. Their paper addresses the problem of efficiently utilizing pipelined units to increase the overall system throughput. They show that both of these problems can be reduced to the same mathematical problem of incorporating extra delays on certain data paths in originally correct systolic designs. They introduce the mathematical notion of a cut which enables them to handle this problem effectively. The results obtained by applying the techniques described are encouraging. When applied to systolic arrays without feedback cycles, the arrays can tolerate large numbers of failures (with the addition of very little hardware) while maintaining the original throughput. Furthermore, all of the pipeline stages in the cells can be kept fully utilized through the addition of a small number of delay registers. However, adding delays to systolic arrays with cycles typically induces a significant decrease in throughput. In response to this, they have derived a new class of systolic algorithms in which the data cycle around a ring of processing cells. The systolic ring architecture has the property that its performance degrades gracefully as cells fail. Using the cut theory for arrays without feedback and the ring architecture approach for those with feedback, they have effective fault-tolerant and two-level pipelining schemes for most systolic arrays. 24 references.

  10. Fault-tolerant analysis and control of SSRMS-type manipulators with single-joint failure

    NASA Astrophysics Data System (ADS)

    She, Yu; Xu, Wenfu; Su, Haijun; Liang, Bin; Shi, Hongliang

    2016-03-01

    Several space manipulators, whose configurations are similar to that of the Space Station Remote Manipulator System (SSRMS, also called Canadarm2), are playing important roles in the construction and maintenance of the International Space Station. Working in the harsh orbital environment, they are at high risk of single-joint failure. Fault-tolerant capability is critical for these manipulators to complete their on-orbit tasks. In this paper, we analysed and compared the manipulation capability of SSRMS-type manipulators with joints locked at arbitrary positions, and proposed efficient path planning via a fault-tolerant control method. First, a unified kinematic model of this type of manipulator was established. Second, the manipulation capability of the original 7-DOF (degrees of freedom) redundant manipulator was analysed and compared with its degraded 6-DOF counterparts formed by different joint locking configurations. Then, we identified those joints with large sensitivity to fault tolerance performance. The influences of different positions of all joints were also determined by numerical computation. Based on the analysis, the relatively safe and dangerous regions for each joint failure were identified. Finally, we proposed a path planning strategy, realized with an H∞ controller that keeps the failed joint locked in the safe region, and simulations were carried out on a degraded 3-DOF planar redundant manipulator to verify the planning strategy and control approach. This paper provides important analysis results and efficient methods for addressing the problems of SSRMS-type manipulators caused by single-joint failure, and these can be extended to other types of manipulators. Moreover, the proposed method is useful for designing the optimal configuration of a redundant manipulator.

  11. High Speed Operation and Testing of a Fault Tolerant Magnetic Bearing

    NASA Technical Reports Server (NTRS)

    DeWitt, Kenneth; Clark, Daniel

    2004-01-01

    Research activities undertaken to upgrade the fault-tolerant facility, continue testing high-speed fault-tolerant operation, and assist in the commissioning of the high temperature (1000 degrees F) thrust magnetic bearing are described. The fault-tolerant magnetic bearing test facility was upgraded to operate at up to 40,000 RPM. The necessary upgrades included new state-of-the-art position sensors with high frequency modulation and new power edge filtering of amplifier outputs. A comparison study of the new sensors and the previous system was done, as well as a noise assessment of the sensor-to-controller signals. A comparison study of power edge filtering for amplifier-to-actuator signals was also done; this information is valuable for all position sensing and motor actuation applications. After these facility upgrades were completed, the rig is believed to be capable of 40,000 RPM operation, though this has yet to be demonstrated. Other upgrades included verification and upgrading of safety shielding and upgrading of control algorithms. The rig will now also be used to demonstrate motoring capabilities, and the corresponding control algorithms are in the process of being created. Recently, an extreme temperature thrust magnetic bearing was designed from the ground up. The thrust bearing was designed to fit within the existing high temperature facility. The retrofit began near the end of the summer of 2004 and is currently ongoing. Contract staff authored a NASA-TM entitled "An Overview of Magnetic Bearing Technology for Gas Turbine Engines", containing a compilation of bearing data as it pertains to operation in the regime of the gas turbine engine and a presentation of how magnetic bearings can become a viable candidate for use in future engine technology.

  12. A multi-layer robust adaptive fault tolerant control system for high performance aircraft

    NASA Astrophysics Data System (ADS)

    Huo, Ying

    Modern high-performance aircraft demand advanced fault-tolerant flight control strategies. Not only control effector failures but also aerodynamic failures, such as wing-body damage, often result in substantially deteriorated performance because of the low available redundancy. The remaining control actuators may then yield substantially lower maneuvering capability, which does not permit accomplishment of the aircraft's originally specified mission. The problem is to reconfigure the control over the available control redundancy when the mission must be modified to save the aircraft. The proposed robust adaptive fault-tolerant control (RAFTC) system consists of a multi-layer reconfigurable flight controller architecture. It contains three layers accounting for different types and levels of failures, including sensor, actuator, and fuselage damage. In the case of nominal operation with possible minor failure(s), a standard adaptive controller achieves the control allocation. This is referred to as the first layer, the controller layer. Performance adjustment is accounted for in the second layer, the reference layer, whose role is to adjust the reference model in the controller design to a degraded transient performance. Mission adjustment is handled in the third layer, the mission layer, when the original mission is not feasible with the greatly restricted control capability. The modified mission is achieved through optimization of the command signal, which guarantees the boundedness of the closed-loop signals. The main distinguishing feature of this layer is the mission decision property based on the currently available resources. The contribution of the research is the multi-layer fault-tolerant architecture that can address complete failure scenarios and their accommodation in realities. Moreover, the emphasis is on the mission design capabilities which may guarantee the stability of the aircraft with restricted post

  13. Fault Tolerant Magnetic Bearing Testing and Conical Magnetic Bearing Development for Extreme Temperature Environments

    NASA Technical Reports Server (NTRS)

    Keith, Theo G., Jr.; Clark, Daniel

    2004-01-01

    During the six month tenure of the grant, activities included continued research of hydrostatic bearings as a viable backup-bearing solution for a magnetically levitated shaft system in extreme temperature environments (1000 F), developmental upgrades of the fault-tolerant magnetic bearing rig at the NASA Glenn Research Center, and assisting in the development of a conical magnetic bearing for extreme temperature environments, particularly turbomachinery. The work leveraged the ongoing Smart Efficient Components (SEC) and Turbine-Based Combined Cycle (TBCC) programs at NASA Glenn Research Center. The effort was useful in providing technology for more efficient and powerful gas turbine engines.

  14. 2009 fault tolerance for extreme-scale computing workshop, Albuquerque, NM - March 19-20, 2009.

    SciTech Connect

    Katz, D. S.; Daly, J.; DeBardeleben, N.; Elnozahy, M.; Kramer, B.; Lathrop, S.; Nystrom, N.; Milfeld, K.; Sanielevici, S.; Scott, S.; Votta, L.; Louisiana State Univ.; Center for Exceptional Computing; LANL; IBM; Univ. of Illinois; Shodor Foundation; Pittsburgh Supercomputer Center; Texas Advanced Computing Center; ORNL; Sun Microsystems

    2009-02-01

    This is a report on the third in a series of petascale workshops co-sponsored by Blue Waters and TeraGrid to address challenges and opportunities for making effective use of emerging extreme-scale computing. This workshop was held to discuss fault tolerance on large systems for running large, possibly long-running applications. The main point of the workshop was to have systems people, middleware people (including fault-tolerance experts), and applications people talk about the issues and figure out what needs to be done, mostly at the middleware and application levels, to run such applications on the emerging petascale systems, without having faults cause large numbers of application failures. The workshop found that there is considerable interest in fault tolerance, resilience, and reliability of high-performance computing (HPC) systems in general, at all levels of HPC. The only way to recover from faults is through the use of some redundancy, either in space or in time. Redundancy in time, in the form of writing checkpoints to disk and restarting at the most recent checkpoint after a fault that causes an application to crash/halt, is the most common tool used in applications today, but there are questions about how long this can continue to be a good solution as systems and memories grow faster than I/O bandwidth to disk. There is interest in both modifications to this, such as checkpoints to memory, partial checkpoints, and message logging, and alternative ideas, such as in-memory recovery using residues. We believe that systematic exploration of these ideas holds the most promise for the scientific applications community. Fault tolerance has been an issue of discussion in the HPC community for at least the past 10 years; but much like other issues, the community has managed to put off addressing it during this period. There is a growing recognition that as systems continue to grow to petascale and beyond, the field is approaching the point where we don't have
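
    As a concrete illustration of the checkpoint/restart form of redundancy in time discussed above, the following minimal Python sketch periodically snapshots application state to disk and resumes from the latest snapshot after a crash. The file name, state layout, and helper functions are illustrative assumptions, not part of any system discussed at the workshop.

      # Minimal sketch of redundancy in time via checkpoint/restart.
      # All names (checkpoint, restore, simulate_step) are illustrative.
      import os
      import pickle

      CKPT = "state.ckpt"

      def checkpoint(state, path=CKPT):
          """Write the application state to disk atomically."""
          tmp = path + ".tmp"
          with open(tmp, "wb") as f:
              pickle.dump(state, f)
          os.replace(tmp, path)          # atomic rename avoids torn checkpoints

      def restore(path=CKPT):
          """Return the most recent checkpoint, or a fresh initial state."""
          if os.path.exists(path):
              with open(path, "rb") as f:
                  return pickle.load(f)
          return {"step": 0, "value": 0.0}

      def simulate_step(state):
          state["step"] += 1
          state["value"] += 1.0          # stand-in for real computation
          return state

      if __name__ == "__main__":
          state = restore()              # after a crash, execution resumes here
          while state["step"] < 100:
              state = simulate_step(state)
              if state["step"] % 10 == 0:
                  checkpoint(state)      # redundancy in time: periodic snapshots

    The checkpoint interval is the key tuning knob: frequent checkpoints shorten recovery but compete for the limited I/O bandwidth that the workshop identifies as the looming bottleneck.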

  15. Impact of coverage on the reliability of a fault tolerant computer

    NASA Technical Reports Server (NTRS)

    Bavuso, S. J.

    1975-01-01

    A mathematical reliability model is established for a reconfigurable fault tolerant avionic computer system utilizing state-of-the-art computers. System reliability is studied in light of the coverage probabilities associated with the first and second independent hardware failures. Coverage models are presented as a function of detection, isolation, and recovery probabilities. Upper and lower bounds are established for the coverage probabilities, and the method for computing values for the coverage probabilities is investigated. Further, an architectural variation is proposed which is shown to enhance coverage.
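
    The sensitivity of system reliability to coverage can be illustrated with a hedged sketch of a textbook-style model (not the specific model established in the paper): each of the first few failures is survived only if it is detected, isolated, and recovered from, which happens with coverage probability c. The example assumes independent units and a constant per-failure coverage.

      # Hedged sketch of a simple coverage model: an n-unit system survives
      # up to `tolerated` failures, but each failure is handled successfully
      # only with coverage probability c (detection * isolation * recovery).
      from math import comb

      def system_reliability(r, n, c, tolerated=2):
          """Approximate reliability assuming independent units with
          reliability r and a constant coverage c per tolerated failure."""
          total = 0.0
          for k in range(tolerated + 1):      # k failed units, all covered
              total += comb(n, k) * (c ** k) * (r ** (n - k)) * ((1 - r) ** k)
          return total

      if __name__ == "__main__":
          for c in (1.0, 0.999, 0.99):        # coverage quickly dominates
              print(c, system_reliability(r=0.995, n=4, c=c))

    Running the loop shows that even a small drop in coverage erodes the benefit of redundancy, which is the central point of the paper's analysis.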

  16. A unified method for analyzing mission reliability for fault tolerant computer systems.

    NASA Technical Reports Server (NTRS)

    Bricker, J. L.

    1973-01-01

    For fault-tolerant computer systems consisting of multiple classes of modules, a unified method for analyzing mission reliability is proposed and evaluated. The analysis proceeds by generalizing the notions of standby and N modular redundancy into a concept called hybrid-degraded redundancy. The probabilistic evaluation of the unified redundancy concept is then developed to yield, for a given modular class, the joint distribution of success and the number of nonfailed modules from that class, at special times. With this information, a Markov chain analysis gives the reliability of an entire sequence of phases (mission profile).
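
    The phase-by-phase Markov chain idea can be sketched as follows; the three-state model, the per-phase transition probabilities, and the two-phase mission profile are illustrative assumptions rather than the paper's hybrid-degraded-redundancy distributions.

      # Illustrative sketch of chaining per-phase Markov transitions: the
      # state distribution at the end of one phase becomes the initial
      # distribution of the next phase of the mission profile.
      import numpy as np

      # States: 0 = both modules up, 1 = one module up (degraded), 2 = failed.
      phase_a = np.array([[0.98, 0.019, 0.001],
                          [0.0,  0.97,  0.03 ],
                          [0.0,  0.0,   1.0  ]])
      phase_b = np.array([[0.95, 0.045, 0.005],
                          [0.0,  0.93,  0.07 ],
                          [0.0,  0.0,   1.0  ]])

      p0 = np.array([1.0, 0.0, 0.0])        # mission starts with full redundancy
      p_end = p0 @ phase_a @ phase_b        # propagate through the mission profile
      mission_reliability = p_end[:2].sum() # any non-failed state counts as success
      print(mission_reliability)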

  17. Over-constrained rigid multibody systems: differential kinematics and fault tolerance

    NASA Astrophysics Data System (ADS)

    Yi, Yong; McInroy, John E.; Chen, Yixin

    2002-07-01

    Over-constrained parallel manipulators can be used for fault tolerance. This paper derives the differential kinematics and static force model for a general over-constrained rigid multibody system. The result shows that the redundant constraints result in constrained active joints and redundant internal force. By incorporating these constraints, general methods for overcoming stuck legs or even the complete loss of legs are derived. The Stewart platform special case is studied as an example, and the relationship between its forward Jacobian and its inverse Jacobian is also found.

  18. Fault-tolerant Remote Quantum Entanglement Establishment for Secure Quantum Communications

    NASA Astrophysics Data System (ADS)

    Tsai, Chia-Wei; Lin, Jason

    2016-02-01

    This work presents a strategy for constructing long-distance quantum communications among a number of remote users through a collective-noise channel. With the assistance of semi-honest quantum certificate authorities (QCAs), the remote users can share a secret key through fault-tolerant entanglement swapping. The proposed protocol is feasible for large-scale distributed quantum networks with numerous users. Each pair of communicating parties only needs to establish quantum channels and classical authenticated channels with his/her local QCA. Thus, any user can communicate freely without pre-establishing any point-to-point communication channels, which is efficient and feasible for practical environments.

  19. Reliability calculation using randomization for Markovian fault-tolerant computing systems

    NASA Technical Reports Server (NTRS)

    Miller, D. R.

    1982-01-01

    The randomization technique for computing transient probabilities of Markov processes is presented. The technique is applied to a Markov process model of a simplified fault tolerant computer system for illustrative purposes. It is applicable to much larger and more complex models. Transient state probabilities are computed, from which reliabilities are derived. An accelerated version of the randomization algorithm is developed which exploits 'stiffness' of the models to gain increased efficiency. A great advantage of the randomization approach is that it easily allows probabilities and reliabilities to be computed to any predetermined accuracy.
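
    A minimal sketch of the randomization (uniformization) technique is given below. The three-state generator matrix is an illustrative stand-in for the paper's fault-tolerant computer model, and no attempt is made to reproduce the accelerated, stiffness-exploiting variant.

      # Sketch of randomization (uniformization) for the transient
      # distribution of a continuous-time Markov chain p(t) = p0 * exp(Qt).
      import numpy as np
      from math import exp

      def uniformize(Q, p0, t, eps=1e-10):
          """Truncate the Poisson-weighted series once the accumulated
          Poisson mass exceeds 1 - eps, giving a controllable accuracy."""
          lam = max(-Q[i, i] for i in range(Q.shape[0]))   # uniformization rate
          P = np.eye(Q.shape[0]) + Q / lam                 # embedded DTMC
          weight = exp(-lam * t)                           # Poisson(k=0) term
          acc_weight = weight
          term = p0.copy()
          result = weight * term
          k = 0
          while acc_weight < 1.0 - eps:
              k += 1
              term = term @ P
              weight *= lam * t / k                        # next Poisson weight
              result += weight * term
              acc_weight += weight
          return result

      # Toy model: up -> degraded (failure), degraded -> up (repair) or
      # degraded -> failed (absorbing).  Rates are illustrative only.
      Q = np.array([[-2e-4,  2e-4,    0.0 ],
                    [ 1e-1, -1.01e-1, 1e-3],
                    [ 0.0,   0.0,     0.0 ]])
      p0 = np.array([1.0, 0.0, 0.0])
      print(uniformize(Q, p0, t=10.0))   # reliability = 1 - P(failed state)

    For stiff models (widely separated failure and repair rates) the plain series converges slowly, which is what motivates the accelerated version described in the abstract.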

  20. Lithium Ion Battery (LIB) Charger: Spacesuit Battery Charger Design with 2-Fault Tolerance to Catastrophic Hazards

    NASA Technical Reports Server (NTRS)

    Darcy, Eric; Davies, Frank

    2009-01-01

    A charger design that is 2-fault tolerant to catastrophic hazards has been achieved for the Spacesuit Li-ion Battery, with the following key features. A power supply control circuit and two microprocessors independently protect against overcharge. Three microprocessors protect against undercharge (false positive: Go for EVA) conditions. Two independent channels provide functional redundancy. The charger is capable of charge balancing cell banks in series. Cell manufacturing and performance uniformity is excellent with both designs. Once a few outliers are removed, LV cells are slightly more uniform than MoliJ cells. If the cell balance feature of the charger is ever invoked, it will be an indication of a significant degradation issue, not a nominal condition.

  1. Self-stabilizing byzantine-fault-tolerant clock synchronization system and method

    NASA Technical Reports Server (NTRS)

    Malekpour, Mahyar R. (Inventor)

    2012-01-01

    Systems and methods for rapid Byzantine-fault-tolerant self-stabilizing clock synchronization are provided. The systems and methods are based on a protocol comprising a state machine and a set of monitors that execute once every local oscillator tick. The protocol is independent of application-specific requirements. The faults are assumed to be arbitrary and/or malicious. All timing measures of variables are based on the node's local clock, and thus no central clock or externally generated pulse is used. Instances of the protocol are shown to tolerate bursts of transient failures and deterministically converge with a convergence time that is linear in the synchronization period, as predicted.

  2. Safety Verification of a Fault Tolerant Reconfigurable Autonomous Goal-Based Robotic Control System

    NASA Technical Reports Server (NTRS)

    Braman, Julia M. B.; Murray, Richard M; Wagner, David A.

    2007-01-01

    Fault tolerance and safety verification of control systems are essential for the success of autonomous robotic systems. A control architecture called Mission Data System (MDS), developed at the Jet Propulsion Laboratory, takes a goal-based control approach. In this paper, a method for converting goal network control programs into linear hybrid systems is developed. The linear hybrid system can then be verified for safety in the presence of failures using existing symbolic model checkers. An example task is simulated in MDS and successfully verified using HyTech, a symbolic model checking software for linear hybrid systems.

  3. Catalysis and activation of magic states in fault-tolerant architectures

    SciTech Connect

    Campbell, Earl T.

    2011-03-15

    In many architectures for fault-tolerant quantum computing universality is achieved by a combination of Clifford group unitary operators and preparation of suitable nonstabilizer states, the so-called magic states. Universality is possible even for some fairly noisy nonstabilizer states, as distillation can convert many noisy copies into fewer purer magic states. Here we propose protocols that exploit multiple species of magic states in surprising ways. These protocols provide examples of previously unobserved phenomena that are analogous to catalysis and activation well known in entanglement theory.

  4. Reliability model of fault-tolerant data processing system with primary and backup nodes

    NASA Astrophysics Data System (ADS)

    Rahman, P. A.; Bobkova, E. Yu

    2016-04-01

    This paper deals with fault-tolerant data processing systems, which are widely used in the modern world of information technologies and have acceptable hardware implementation overhead. A simplified reliability model for duplex systems is given, along with the authors' advanced model for data processing systems with primary and backup nodes, based on a three-state model of recoverable elements that takes into consideration the different failure rates of passive and active nodes and the finite node activation time. A calculation formula for the availability factor of the dual-node data processing system with primary and backup nodes is provided, together with calculation examples.

  5. Fault-tolerant control of delta operator systems with actuator saturation and effectiveness loss

    NASA Astrophysics Data System (ADS)

    Yang, Hongjiu; Zhang, Luyang; Zhao, Ling; Yuan, Yuan

    2016-07-01

    This paper studies the problem of robust fault-tolerant control against the actuator effectiveness loss for delta operator systems with actuator saturation. Ellipsoids are used to estimate the domain of attraction for the delta operator systems with actuator saturation and effectiveness loss. Some invariance set conditions used for enlarging the domain of attraction are expressed by linear matrix inequalities. Discussions on system performance optimisation are presented in this paper, including reduction on computational complexity, expansion of the domain of attraction and disturbance rejection. Two numerical examples are given to illustrate the effectiveness of the developed techniques.

  6. Trojan horse attack free fault-tolerant quantum key distribution protocols

    NASA Astrophysics Data System (ADS)

    Yang, Chun-Wei; Hwang, Tzonelih

    2013-11-01

    This work proposes two quantum key distribution (QKD) protocols—each of which is robust under one kind of collective noises—collective-dephasing noise and collective-rotation noise. Due to the use of a new coding function which produces error-robust codewords allowing one-time transmission of quanta, the proposed QKD schemes are fault-tolerant and congenitally free from Trojan horse attacks without having to use any extra hardware. Moreover, by adopting two Bell state measurements instead of a 4-GHZ state joint measurement for decoding, the proposed protocols are practical in combating collective noises.

  7. Fault tolerant control for switching discrete-time systems with delays: an improved cone complementarity approach

    NASA Astrophysics Data System (ADS)

    Benzaouia, Abdellah; Ouladsine, Mustapha; Ananou, Bouchra

    2014-10-01

    In this paper, the fault tolerant control problem for discrete-time switching systems with delay is studied. Sufficient conditions for building an observer are obtained by using multiple Lyapunov functions. These conditions are worked out in a new way, using the cone complementarity technique, to obtain new LMIs with slack variables and multiple weighted residual matrices. The obtained results are applied to a numerical example showing fault detection, fault localisation and reconfiguration of the control to maintain asymptotic stability even in the presence of a permanent sensor fault.

  8. Model prototype utilization in the analysis of fault tolerant control and data processing systems

    NASA Astrophysics Data System (ADS)

    Kovalev, I. V.; Tsarev, R. Yu; Gruzenkin, D. V.; Prokopenko, A. V.; Knyazkov, A. N.; Laptenok, V. D.

    2016-04-01

    A procedure for assessing the profit of implementing a control and data processing system is presented in the paper. The reasonability of creating and analysing a model prototype follows from implementing the approach of providing fault tolerance through the inclusion of structural and software redundancy. The developed procedure allows finding the best ratio between the cost of developing and analysing the model prototype and the earnings from utilizing its results and the information produced. The suggested approach is illustrated by a model example of profit assessment and analysis of a control and data processing system.

  9. An experimental evaluation of the effectiveness of random testing of fault-tolerant software

    NASA Technical Reports Server (NTRS)

    Vouk, Mladen A.; Mcallister, David F.; Tai, K. C.

    1986-01-01

    Results of a fault-tolerant software (FTS) experiment are used to show deficiencies of the simple random testing approach. Testing was performed using randomly generated test cases supplemented with extremal and special value (ESV) cases. Error detection efficiency of the random testing approach, with emphasis on correlated errors, was compared to the error detecting capabilities of the ESV data and found deficient. The use of carefully designed test cases as a supplement to random testing, as well as the use of structure based testing are recommended.

  10. Plan for the Characterization of HIRF Effects on a Fault-Tolerant Computer Communication System

    NASA Technical Reports Server (NTRS)

    Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.; Koppen, Sandra V.

    2008-01-01

    This report presents the plan for the characterization of the effects of high intensity radiated fields on a prototype implementation of a fault-tolerant data communication system. Various configurations of the communication system will be tested. The prototype system is implemented using off-the-shelf devices. The system will be tested in a closed-loop configuration with extensive real-time monitoring. This test is intended to generate data suitable for the design of avionics health management systems, as well as redundancy management mechanisms and policies for robust distributed processing architectures.

  11. The Design of Fault Tolerant Quantum Dot Cellular Automata Based Logic

    NASA Technical Reports Server (NTRS)

    Armstrong, C. Duane; Humphreys, William M.; Fijany, Amir

    2002-01-01

    As transistor geometries are reduced, quantum effects begin to dominate device performance. At some point, transistors cease to have the properties that make them useful computational components. New computing elements must be developed in order to keep pace with Moore's Law. Quantum dot cellular automata (QCA) represent an alternative paradigm to transistor-based logic. QCA architectures that are robust to manufacturing tolerances and defects must be developed. We are developing software that allows the exploration of fault tolerant QCA gate architectures by automating the specification, simulation, analysis and documentation processes.

  12. The Development of Design Tools for Fault Tolerant Quantum Dot Cellular Automata Based Logic

    NASA Technical Reports Server (NTRS)

    Armstrong, Curtis D.; Humphreys, William M.

    2003-01-01

    We are developing software to explore the fault tolerance of quantum dot cellular automata gate architectures in the presence of manufacturing variations and device defects. The Topology Optimization Methodology using Applied Statistics (TOMAS) framework extends the capabilities of the A Quantum Interconnected Network Array Simulator (AQUINAS) by adding front-end and back-end software and creating an environment that integrates all of these components. The front-end tools establish all simulation parameters, configure the simulation system, automate the Monte Carlo generation of simulation files, and execute the simulation of these files. The back-end tools perform automated data parsing, statistical analysis and report generation.

  13. Fault tolerant computing: A preamble for assuring viability of large computer systems

    NASA Technical Reports Server (NTRS)

    Lim, R. S.

    1977-01-01

    The need for fault-tolerant computing is addressed from the viewpoints of (1) why it is needed, (2) how to apply it in the current state of technology, and (3) what it means in the context of the Phoenix computer system and other related systems. To this end, the value of concurrent error detection and correction is described. User protection, program retry, and repair are among the factors considered. The technology of algebraic codes to protect memory systems and arithmetic codes to protect arithmetic operations is discussed.

  14. Fault tolerant topologies for fiber optic networks and computer interconnects operating in the severe avionics environment

    NASA Astrophysics Data System (ADS)

    Glista, Andrew S., Jr.

    1991-02-01

    The history of fiber optics technology development for naval aircraft is reviewed, and the current status of network and fly-by-light flight control development is examined. Fiber-optic component selection for aircraft is addressed, covering fiber and cables, optical sources, couplers, and connectors. Novel fault-tolerant network topologies for both analog and digital fiber optic transmission, which will permit both packet- and circuit-switched operation of robust fiber optic networks are discussed. The application of smart skin technology, i.e., fibers embedded in composite materials, to optical computer backplanes is briefly considered.

  15. Fly-By-Light/Power-By-Wire Fault-Tolerant Fiber-Optic Backplane

    NASA Technical Reports Server (NTRS)

    Malekpour, Mahyar R.

    2002-01-01

    The design and development of a fault-tolerant fiber-optic backplane to demonstrate the feasibility of such an architecture is presented. Simulation results of test cases run on the backplane in the presence of induced faults are presented, and the fault recovery capability of the architecture is demonstrated. The architecture was designed, developed, and implemented using the Very High Speed Integrated Circuits (VHSIC) Hardware Description Language (VHDL). The architecture was synthesized and implemented in hardware using Field Programmable Gate Arrays (FPGA) on multiple prototype boards.

  16. AVR microcontroller simulator for software implemented hardware fault tolerance algorithms research

    NASA Astrophysics Data System (ADS)

    Piotrowski, Adam; Tarnowski, Szymon; Napieralski, Andrzej

    2008-01-01

    Reliability of new, advanced electronic systems becomes a serious problem especially in places like accelerators and synchrotrons, where sophisticated digital devices operate close to radiation sources. One possible solution to harden a microprocessor-based system is a strict programming approach known as Software Implemented Hardware Fault Tolerance. Unfortunately, in real environments it is not possible to perform precise and accurate tests of new algorithms due to hardware limitations. This paper highlights the AVR-family microcontroller simulator project, equipped with appropriate monitoring and SEU injection systems.

  17. A performance evaluation of the software-implemented fault-tolerance computer

    NASA Technical Reports Server (NTRS)

    Palumbo, D. L.; Butler, R. W.

    1986-01-01

    The results of a performance evaluation of the Software-Implemented Fault-Tolerance (SIFT) computer system conducted in the NASA Avionics Integration Research Laboratory are presented. The essential system functions are described and compared to both earlier design proposals and subsequent design improvements. Using SIFT's specimen task load, the executive tasks, such as reconfiguration, clock synchronization, and interactive consistency, are found to consume significant computing resources. Together with other system overhead (e.g., voting and scheduling), the operating system overhead is in excess of 60 percent. The authors propose specific design changes that reduce this overhead burden significantly.

  18. SIFT - A preliminary evaluation. [Software Implemented Fault Tolerant computer for aircraft control

    NASA Technical Reports Server (NTRS)

    Palumbo, D. L.; Butler, R. W.

    1983-01-01

    This paper presents the results of a performance evaluation of the SIFT computer system conducted in the NASA AIRLAB facility. The essential system functions are described and compared to both earlier design proposals and subsequent design improvements. The functions supporting fault tolerance are found to consume significant computing resources. With SIFT's specimen task load, scheduled at a 30-Hz rate, the executive tasks such as reconfiguration, clock synchronization and interactive consistency, require 55 percent of the available task slots. Other system overhead (e.g., voting and scheduling) use an average of 50 percent of each remaining task slot.

  19. Fault-tolerant Remote Quantum Entanglement Establishment for Secure Quantum Communications

    NASA Astrophysics Data System (ADS)

    Tsai, Chia-Wei; Lin, Jason

    2016-07-01

    This work presents a strategy for constructing long-distance quantum communications among a number of remote users through a collective-noise channel. With the assistance of semi-honest quantum certificate authorities (QCAs), the remote users can share a secret key through fault-tolerant entanglement swapping. The proposed protocol is feasible for large-scale distributed quantum networks with numerous users. Each pair of communicating parties only needs to establish quantum channels and classical authenticated channels with his/her local QCA. Thus, any user can communicate freely without pre-establishing any point-to-point communication channels, which is efficient and feasible for practical environments.

  20. DYNAMIC NON LINEAR IMPACT ANALYSIS OF FUEL CASK CONTAINMENT VESSELS

    SciTech Connect

    Leduc, D

    2008-06-10

    Large fuel casks present challenges when evaluating their performance in the accident sequence specified in 10 CFR 71. Testing is often limited because of cost, difficulty in preparing test units and the limited availability of facilities which can carry out such tests. In the past, many casks were evaluated without testing using simplified analytical methods. This paper details the use of dynamic non-linear analysis of large fuel casks using advanced computational techniques. Results from the dynamic analysis of two casks, the T-3 Spent Fuel Cask and the Hanford Un-irradiated Fuel Package, are examined in detail. These analyses are used to fully evaluate containment vessel stresses and strains resulting from complex loads experienced by cask components during impacts. Importantly, these advanced analyses are capable of examining stresses in key regions of the cask, including the cask closure. This paper compares these advanced analytical results with the results of simplified cask analyses like those detailed in NUREG 3966.

  1. Non-linear dark energy clustering

    SciTech Connect

    Anselmi, Stefano; Ballesteros, Guillermo; Pietroni, Massimo E-mail: ballesteros@pd.infn.it

    2011-11-01

    We consider a dark energy fluid with arbitrary sound speed and equation of state and discuss the effect of its clustering on the cold dark matter distribution at the non-linear level. We write the continuity, Euler and Poisson equations for the system in the Newtonian approximation. Then, using the time renormalization group method to resum perturbative corrections at all orders, we compute the total clustering power spectrum and matter power spectrum. At the linear level, a sound speed of dark energy different from that of light modifies the power spectrum on observationally interesting scales, such as those relevant for baryonic acoustic oscillations. We show that the effect of varying the sound speed of dark energy on the non-linear corrections to the matter power spectrum is below the per cent level, and therefore these corrections can be well modelled by their counterpart in cosmological scenarios with smooth dark energy. We also show that the non-linear effects on the matter growth index can be as large as 10–15 per cent for small scales.

  2. Phototube non-linearity correction technique

    NASA Astrophysics Data System (ADS)

    Riboldi, S.; Blasi, N.; Brambilla, S.; Camera, F.; Giaz, A.; Million, B.

    2015-06-01

    Scintillation light is often detected by photo-multiplier tube (PMT) technology. PMTs are however intrinsically non-linear devices, especially when operated with high light yield scintillators and high input photon flux. Many physical effects (e.g. inter-dynode field variation, photocathode resistivity, etc.) can spoil the ideal PMT behavior in terms of gain, ending up in what are addressed as the under-linearity and over-linearity effects. Established techniques implemented in the PMT base (e.g. increasing bleeding current, active voltage divider, etc.) can mitigate these effects, but given the unavoidable spread in manufacturing and materials, it turns out that, with respect to linearity at the percent level, every PMT sample is a story of its own. The residual non-linearity is usually accounted for with a polynomial correction of the spectrum energy scale, starting from the positions of a few known energy peaks of calibration sources, but uncertainty remains in between the calibration peaks. We propose to retrieve the calibration information from the entire energy spectrum and not only the positions of the full energy peaks (FEP), by means of an automatic procedure that also takes into account the quality (signal/noise ratio) of the information about the non-linearity extracted from the various regions of the spectrum.
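
    The conventional peak-based correction that the authors seek to improve upon can be sketched as a low-order polynomial fit from measured full-energy-peak positions to known calibration energies; the channel and energy values below are made up for illustration and are not data from the paper.

      # Minimal sketch of peak-based polynomial energy-scale correction.
      import numpy as np

      channels = np.array([662.0, 1164.0, 1321.0, 2585.0])   # measured FEP positions (ADC)
      energies = np.array([661.7, 1173.2, 1332.5, 2614.5])   # known calibration energies (keV)

      coeffs = np.polyfit(channels, energies, deg=2)          # quadratic energy scale
      calibrate = np.poly1d(coeffs)

      print(calibrate(1500.0))   # corrected energy for an arbitrary channel

    Between the calibration peaks the residual non-linearity of such a fit is unconstrained, which is exactly the gap the paper's whole-spectrum procedure is meant to close.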

  3. Evaluation of Simple Causal Message Logging for Large-Scale Fault Tolerant HPC Systems

    SciTech Connect

    Bronevetsky, G; Meneses, E; Kale, L V

    2011-02-25

    The era of petascale computing brought machines with hundreds of thousands of processors. The next generation of exascale supercomputers will make available clusters with millions of processors. In those machines, the mean time between failures will range from a few minutes to a few tens of minutes, making the crash of a processor the common case instead of a rarity. Parallel applications running on those large machines will need to simultaneously survive crashes and maintain high productivity. To achieve that, fault tolerance techniques will have to go beyond checkpoint/restart, which requires all processors to roll back in case of a failure. Incorporating some form of message logging will provide a framework where only a subset of processors are rolled back after a crash. In this paper, we discuss why a simple causal message logging protocol seems a promising alternative to provide fault tolerance in large supercomputers. As opposed to pessimistic message logging, it has low latency overhead, especially in collective communication operations. Besides, it saves messages when more than one thread is running per processor. Finally, we demonstrate that a simple causal message logging protocol has a faster recovery and a low performance penalty when compared to checkpoint/restart. Running NAS Parallel Benchmarks (CG, MG and BT) on 1024 processors, simple causal message logging has a latency overhead below 5%.
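
    The following toy sketch conveys the message-logging idea in its simplest receiver-side form: deliveries are recorded so that a restarted process can replay them in order, letting only the crashed process roll back. It illustrates the general concept only, not the simple causal protocol evaluated in the paper.

      # Conceptual sketch of message logging for localized recovery.
      class LoggedReceiver:
          def __init__(self):
              self.log = []          # stable message log (in memory here)
              self.state = 0         # application state driven by messages

          def deliver(self, sender, seq, payload):
              self.log.append((sender, seq, payload))   # record before applying
              self.apply(payload)

          def apply(self, payload):
              self.state += payload                     # deterministic handler

          def recover(self):
              """Rebuild state by replaying the log; only this process rolls
              back, instead of forcing a global restart of every process."""
              self.state = 0
              for _, _, payload in self.log:
                  self.apply(payload)

      r = LoggedReceiver()
      for i, p in enumerate([3, 5, 7]):
          r.deliver("peer-0", i, p)
      saved = r.state
      r.state = -999        # simulate a crash corrupting volatile state
      r.recover()
      assert r.state == saved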

  4. A performance assessment of a byzantine resilient fault-tolerant computer

    NASA Technical Reports Server (NTRS)

    Young, Steven D.; Elks, Carl R.; Graham, R. L.

    1989-01-01

    This report presents the results of a performance analysis of a quad-redundant Fault-Tolerant Processor (FTP). The FTP is a computing system specifically designed for applications where very high reliability is required. Examples of such applications are flight control systems, nuclear power systems, and spacecraft control systems. The FTP performance was analyzed in a hierarchical manner encompassing the hardware, the operating system, and the application. At the hardware level, the hardware organization and design was assessed in relation to system throughput and response. Analysis at the operating system level revealed that the scheduler took only 3.2 percent of each 40ms frame, while the redundancy management software took 10.4 percent. The application level performance was analyzed via a synthetic workload and a representative flight control model. The estimated throughput for this application was found to be 317.6 KIPS if not exercising the voter. Exercising the voter to ensure fault tolerance will diminish this number linearly as the number of votes is increased. This performance analysis method was proven effective by uncovering undesirable behavior and anomalies in the FTP system.

  5. Design and implementation of a fault-tolerant and dynamic metadata database for clinical trials

    NASA Astrophysics Data System (ADS)

    Lee, J.; Zhou, Z.; Talini, E.; Documet, J.; Liu, B.

    2007-03-01

    In recent imaging-based clinical trials, quantitative image analysis (QIA) and computer-aided diagnosis (CAD) methods are increasing in productivity due to higher resolution imaging capabilities. A radiology core doing clinical trials has been analyzing more treatment methods, and there is a growing quantity of metadata that needs to be stored and managed. These radiology centers are also collaborating with many off-site imaging field sites and need a way to communicate metadata with one another in a secure infrastructure. Our solution is to implement a data storage grid with a fault-tolerant and dynamic metadata database design to unify metadata from different clinical trial experiments and field sites. Although metadata from images follow the DICOM standard, clinical trials also produce metadata specific to regions-of-interest and quantitative image analysis. We have implemented a data access and integration (DAI) server layer where multiple field sites can access multiple metadata databases in the data grid through a single web-based grid service. The centralization of metadata database management simplifies the task of adding new databases into the grid and also decreases the risk of configuration errors seen in peer-to-peer grids. In this paper, we address the design and implementation of a data grid metadata storage that has fault-tolerance and dynamic integration for imaging-based clinical trials.

  6. Fault-tolerant control of large space structures using the stable factorization approach

    NASA Technical Reports Server (NTRS)

    Razavi, H. C.; Mehra, R. K.; Vidyasagar, M.

    1986-01-01

    Large space structures are characterized by the following features: they are in general infinite-dimensional systems, and they have large numbers of undamped or lightly damped poles. Any attempt to apply linear control theory to large space structures must therefore take these features into account. Phase I consisted of an attempt to apply the recently developed Stable Factorization (SF) design philosophy to problems of large space structures, with particular attention to the aspects of robustness and fault tolerance. The final report on the Phase I effort consists of four sections, each devoted to one task. The first three sections report theoretical results, while the last consists of a design example. Significant results were obtained in all four tasks of the project. More specifically, an innovative approach to order reduction was obtained, stabilizing controller structures for plants with an infinite number of unstable poles were determined under some conditions, conditions for simultaneous stabilizability of an infinite number of plants were explored, and a fault-tolerant controller design was obtained that stabilizes a flexible structure model and is robust against one failure condition.

  7. Study of a unified hardware and software fault-tolerant architecture

    NASA Technical Reports Server (NTRS)

    Lala, Jaynarayan; Alger, Linda; Friend, Steven; Greeley, Gregory; Sacco, Stephen; Adams, Stuart

    1989-01-01

    A unified architectural concept, called the Fault Tolerant Processor Attached Processor (FTP-AP), that can tolerate hardware as well as software faults is proposed for applications requiring ultrareliable computation capability. An emulation of the FTP-AP architecture, consisting of a breadboard Motorola 68010-based quadruply redundant Fault Tolerant Processor, four VAX 750s as attached processors, and four versions of a transport aircraft yaw damper control law, is used as a testbed in the AIRLAB to examine a number of critical issues. Solutions of several basic problems associated with N-Version software are proposed and implemented on the testbed. This includes a confidence voter to resolve coincident errors in N-Version software. A reliability model of N-Version software that is based upon the recent understanding of software failure mechanisms is also developed. The basic FTP-AP architectural concept appears suitable for hosting N-Version application software while at the same time tolerating hardware failures. Architectural enhancements for greater efficiency, software reliability modeling, and N-Version issues that merit further research are identified.

  8. Fault Tolerance Implementation within SRAM Based FPGA Designs based upon Single Event Upset Occurrence Rates

    NASA Technical Reports Server (NTRS)

    Berg, Melanie

    2006-01-01

    Emerging technology is enabling the design community to consistently expand the amount of functionality that can be implemented within Integrated Circuits (ICs). As the number of gates placed within an FPGA increases, the complexity of the design can grow exponentially. Consequently, the ability to create reliable circuits has become an incredibly difficult task. In order to ease the complexity of design completion, the commercial design community has developed a very rigid (but effective) design methodology based on synchronous circuit techniques. In order to create faster, smaller, and lower-power circuits, transistor geometries and core voltages have decreased. In environments that contain ionizing energy, such a combination will increase the probability of Single Event Upsets (SEUs) and will consequently affect the state space of a circuit. In order to combat the effects of radiation, the aerospace community has developed several "Hardened by Design" (fault tolerant) design schemes. This paper addresses design mitigation schemes targeted at SRAM-based FPGA CMOS devices. Because some mitigation schemes may be overzealous (too much power, area, complexity, etc.), the designer should be conscious that system requirements can ease the amount of mitigation necessary for acceptable operation. Therefore, various degrees of fault tolerance are demonstrated along with an analysis of their effectiveness.
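
    One widely used hardened-by-design scheme for SRAM-based FPGAs is triple modular redundancy (TMR). The sketch below models its bit-wise 2-of-3 majority voter in Python purely to show the voting logic; in practice the voter would be synthesized in HDL alongside the triplicated registers.

      # Hedged sketch of a TMR bit-wise majority voter.
      def majority_vote(a: int, b: int, c: int) -> int:
          """Bit-wise 2-of-3 vote over three redundant register copies."""
          return (a & b) | (a & c) | (b & c)

      golden = 0b1011_0101
      copy_a = golden
      copy_b = golden ^ 0b0000_1000   # single-event upset flips one bit
      copy_c = golden
      assert majority_vote(copy_a, copy_b, copy_c) == golden

    Applying the voter only to mission-critical registers (partial TMR) is one way the mitigation can be relaxed when system requirements permit, trading robustness for area and power as discussed above.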

  9. Stacked codes: Universal fault-tolerant quantum computation in a two-dimensional layout

    NASA Astrophysics Data System (ADS)

    Jochym-O'Connor, Tomas; Bartlett, Stephen D.

    2016-02-01

    We introduce a class of three-dimensional color codes, which we call stacked codes, together with a fault-tolerant transformation that will map logical qubits encoded in two-dimensional (2D) color codes into stacked codes and back. The stacked code allows for the transversal implementation of a non-Clifford π/8 logical gate, which when combined with the logical Clifford gates that are transversal in the 2D color code gives a gate set that is both fault-tolerant and universal without requiring nonstabilizer magic states. We then show that the layers forming the stacked code can be unfolded and arranged in a 2D layout. As only Clifford gates can be implemented transversally for 2D topological stabilizer codes, a nonlocal operation must be incorporated in order to allow for this transversal application of a non-Clifford gate. Our code achieves this operation through the transformation from a 2D color code to the unfolded stacked code, induced by measuring only geometrically local stabilizers and gauge operators within the bulk of 2D color codes together with a nonlocal operator that has support on a one-dimensional boundary between such 2D codes. We believe that this proposed method to implement the nonlocal operation is a realistic one for 2D stabilizer layouts and would be beneficial in avoiding the large overheads caused by magic state distillation.

  10. Optimizing the Reliability and Performance of Service Composition Applications with Fault Tolerance in Wireless Sensor Networks

    PubMed Central

    Wu, Zhao; Xiong, Naixue; Huang, Yannong; Xu, Degang; Hu, Chunyang

    2015-01-01

    The services composition technology provides flexible methods for building service composition applications (SCAs) in wireless sensor networks (WSNs). The high reliability and high performance of SCAs help services composition technology promote the practical application of WSNs. The optimization methods for reliability and performance used for traditional software systems are mostly based on the instantiations of software components, which are inapplicable and inefficient in the ever-changing SCAs in WSNs. In this paper, we consider the SCAs with fault tolerance in WSNs. Based on a Universal Generating Function (UGF) we propose a reliability and performance model of SCAs in WSNs, which generalizes a redundancy optimization problem to a multi-state system. Based on this model, an efficient optimization algorithm for reliability and performance of SCAs in WSNs is developed based on a Genetic Algorithm (GA) to find the optimal structure of SCAs with fault-tolerance in WSNs. In order to examine the feasibility of our algorithm, we have evaluated the performance. Furthermore, the interrelationships between the reliability, performance and cost are investigated. In addition, a distinct approach to determine the most suitable parameters in the suggested algorithm is proposed. PMID:26561818
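
    The universal generating function machinery can be illustrated with a small, self-contained example; the components, performance levels, and demand below are invented for illustration and are not the SCA model of the paper. Each component's performance distribution is a set of (level, probability) pairs, and composition operators combine levels by sum for parallel elements and by min for series elements.

      # Illustrative UGF composition for a tiny multi-state system.
      from collections import defaultdict

      def compose(u1, u2, op):
          """Combine two performance distributions with the structure
          operator op (sum for parallel, min for series)."""
          out = defaultdict(float)
          for g1, p1 in u1.items():
              for g2, p2 in u2.items():
                  out[op(g1, g2)] += p1 * p2
          return dict(out)

      node_a = {0: 0.05, 10: 0.95}            # service node: failed or 10 units
      node_b = {0: 0.10, 10: 0.90}            # redundant replica of the service
      link   = {0: 0.02, 15: 0.98}            # network link capacity

      service = compose(node_a, node_b, lambda x, y: x + y)   # parallel replicas
      system  = compose(service, link, min)                   # series with the link

      demand = 10
      availability = sum(p for g, p in system.items() if g >= demand)
      print(availability)

    In the paper this kind of evaluation sits inside a genetic algorithm loop, which searches over redundancy structures for the one with the best reliability/performance/cost trade-off.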

  11. Design of a fault tolerant airborne digital computer. Volume 1: Architecture

    NASA Technical Reports Server (NTRS)

    Wensley, J. H.; Levitt, K. N.; Green, M. W.; Goldberg, J.; Neumann, P. G.

    1973-01-01

    This volume is concerned with the architecture of a fault tolerant digital computer for an advanced commercial aircraft. All of the computations of the aircraft, including those presently carried out by analogue techniques, are to be carried out in this digital computer. Among the important qualities of the computer are the following: (1) The capacity is to be matched to the aircraft environment. (2) The reliability is to be selectively matched to the criticality and deadline requirements of each of the computations. (3) The system is to be readily expandable and contractible. (4) The design is to be appropriate to post-1975 technology. Three candidate architectures are discussed and assessed in terms of the above qualities. Of the three candidates, a newly conceived architecture, Software Implemented Fault Tolerance (SIFT), provides the best match to the above qualities. In addition, SIFT is particularly simple and believable. The other candidates, the Bus Checker System (BUCS), also newly conceived in this project, and the Hopkins multiprocessor, are potentially more efficient than SIFT in the use of redundancy, but otherwise are not as attractive.

  12. Universal fault-tolerant adiabatic quantum computing with quantum dots or donors

    NASA Astrophysics Data System (ADS)

    Landahl, Andrew

    I will present a conceptual design for an adiabatic quantum computer that can achieve arbitrarily accurate universal fault-tolerant quantum computations with a constant energy gap and nearest-neighbor interactions. This machine can run any quantum algorithm known today or discovered in the future, in principle. The key theoretical idea is adiabatic deformation of degenerate ground spaces formed by topological quantum error-correcting codes. An open problem with the design is making the four-body interactions and measurements it uses more technologically accessible. I will present some partial solutions, including one in which interactions between quantum dots or donors in a two-dimensional array can emulate the desired interactions in second-order perturbation theory. I will conclude with some open problems, including the challenge of reformulating Kitaev's gadget perturbation theory technique so that it preserves fault tolerance. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  13. Multi-version software reliability through fault-avoidance and fault-tolerance

    NASA Technical Reports Server (NTRS)

    Vouk, Mladen A.; Mcallister, David F.

    1989-01-01

    A number of experimental and theoretical issues associated with the practical use of multi-version software to provide run-time tolerance to software faults were investigated. A specialized tool was developed and evaluated for measuring testing coverage for a variety of metrics. The tool was used to collect information on the relationships between software faults and the coverage provided by the testing process as measured by different metrics (including data flow metrics). Considerable correlation was found between the coverage provided by some higher metrics and the elimination of faults in the code. Back-to-back testing was continued as an efficient mechanism for the removal of uncorrelated faults and common-cause faults of variable span. Work also continued on software reliability estimation methods based on non-random sampling and on the relationship between software reliability and the code coverage provided through testing. New fault tolerance models were formulated. Simulation studies of the Acceptance Voting and Multi-stage Voting algorithms were finished, and it was found that these two schemes for software fault tolerance are superior in many respects to some commonly used schemes. Particularly encouraging are the safety properties of the Acceptance testing scheme.
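
    The voting schemes referred to above can be illustrated with a toy sketch contrasting plain majority voting over N versions with acceptance voting, in which each version's output must pass an acceptance test before it may participate in the vote; the example values and the acceptance test are illustrative, not the simulated algorithms of the study.

      # Toy sketch of majority voting versus acceptance voting for N-version software.
      from collections import Counter

      def majority_vote(outputs):
          if not outputs:
              return None
          value, count = Counter(outputs).most_common(1)[0]
          return value if count > len(outputs) // 2 else None

      def acceptance_vote(outputs, acceptance_test):
          accepted = [o for o in outputs if acceptance_test(o)]
          return majority_vote(accepted)

      # Three independently developed versions compute sqrt(2); one is faulty.
      versions = [1.41421356, 1.41421356, 2.71828183]
      print(majority_vote(versions))                               # 1.41421356
      print(acceptance_vote(versions, lambda y: abs(y*y - 2) < 1e-6))

    Filtering through an acceptance test before voting is what gives the acceptance-based schemes their attractive safety properties: a correlated wrong answer that fails the test can never win the vote.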

  14. A multiobjective scatter search algorithm for fault-tolerant NoC mapping optimisation

    NASA Astrophysics Data System (ADS)

    Le, Qianqi; Yang, Guowu; Hung, William N. N.; Zhang, Xinpeng; Fan, Fuyou

    2014-08-01

    Mapping IP cores to an on-chip network is an important step in Network-on-Chip (NoC) design and affects the performance of NoC systems. A mapping optimisation algorithm and a fault-tolerant mechanism are proposed in this article. The fault-tolerant mechanism and the corresponding routing algorithm can recover NoC communication from switch failures, while preserving high performance. The mapping optimisation algorithm is based on scatter search (SS), which is an intelligent algorithm with a powerful combinatorial search ability. To meet the requests of the NoC mapping application, the standard SS is improved for multiple objective optimisation. This method helps to obtain high-performance mapping layouts. The proposed algorithm was implemented on the Embedded Systems Synthesis Benchmarks Suite (E3S). Experimental results show that this optimisation algorithm achieves low-power consumption, little communication time, balanced link load and high reliability, compared to particle swarm optimisation and genetic algorithm.

  15. A fault-tolerant addressable spin qubit in a natural silicon quantum dot

    PubMed Central

    Takeda, Kenta; Kamioka, Jun; Otsuka, Tomohiro; Yoneda, Jun; Nakajima, Takashi; Delbecq, Matthieu R.; Amaha, Shinichi; Allison, Giles; Kodera, Tetsuo; Oda, Shunri; Tarucha, Seigo

    2016-01-01

    Fault-tolerant quantum computing requires high-fidelity qubits. This has been achieved in various solid-state systems, including isotopically purified silicon, but is yet to be accomplished in industry-standard natural (unpurified) silicon, mainly as a result of the dephasing caused by residual nuclear spins. This high fidelity can be achieved by speeding up the qubit operation and/or prolonging the dephasing time, that is, increasing the Rabi oscillation quality factor Q (the Rabi oscillation decay time divided by the π rotation time). In isotopically purified silicon quantum dots, only the second approach has been used, leaving the qubit operation slow. We apply the first approach to demonstrate an addressable fault-tolerant qubit using a natural silicon double quantum dot with a micromagnet that is optimally designed for fast spin control. This optimized design allows access to Rabi frequencies up to 35 MHz, which is two orders of magnitude greater than that achieved in previous studies. We find the optimum Q = 140 in such high-frequency range at a Rabi frequency of 10 MHz. This leads to a qubit fidelity of 99.6% measured via randomized benchmarking, which is the highest reported for natural silicon qubits and comparable to that obtained in isotopically purified silicon quantum dot–based qubits. This result can inspire contributions to quantum computing from industrial communities. PMID:27536725

  16. The use of automatic programming techniques for fault tolerant computing systems

    NASA Technical Reports Server (NTRS)

    Wild, C.

    1985-01-01

    It is conjectured that the production of software for ultra-reliable computing systems such as required by Space Station, aircraft, nuclear power plants and the like will require a high degree of automation as well as fault tolerance. In this paper, the relationship between automatic programming techniques and fault tolerant computing systems is explored. Initial efforts in the automatic synthesis of code from assertions to be used for error detection as well as the automatic generation of assertions and test cases from abstract data type specifications is outlined. Speculation on the ability to generate truly diverse designs capable of recovery from errors by exploring alternate paths in the program synthesis tree is discussed. Some initial thoughts on the use of knowledge based systems for the global detection of abnormal behavior using expectations and the goal-directed reconfiguration of resources to meet critical mission objectives are given. One of the sources of information for these systems would be the knowledge captured during the automatic programming process.

  17. Sliding mode based fault detection, reconstruction and fault tolerant control scheme for motor systems.

    PubMed

    Mekki, Hemza; Benzineb, Omar; Boukhetala, Djamel; Tadjine, Mohamed; Benbouzid, Mohamed

    2015-07-01

    The fault-tolerant control problem belongs to the domain of complex control systems, in which inter-control-disciplinary information and expertise are required. This paper proposes an improved fault detection, reconstruction and fault-tolerant control (FTC) scheme for motor systems (MS) with typical faults. For this purpose, a sliding mode controller (SMC) with an integral sliding surface is adopted. This controller can make the output of the system track the desired position reference signal in finite time and obtain a better dynamic response and anti-disturbance performance, but it cannot deal directly with total system failures. Therefore, an appropriate combination of the adopted SMC and a sliding mode observer (SMO) is designed to detect and reconstruct the faults online, and also to give a sensorless control strategy which can achieve tolerance to a wide class of total additive failures. The closed-loop stability is proved using the Lyapunov stability theory. Simulation results in healthy and faulty conditions confirm the reliability of the suggested framework. PMID:25747198
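
    A rough sketch of a sliding mode controller with an integral sliding surface is given below for a toy second-order motor model with a bounded disturbance standing in for a fault effect; the model, gains, and smoothed switching term are illustrative assumptions and do not reproduce the paper's SMC/SMO scheme.

      # Sketch: SMC with integral sliding surface on theta_ddot = -a*theta_dot + b*u + d.
      import numpy as np

      a, b = 1.0, 2.0
      lam, ki, K = 5.0, 2.0, 8.0          # surface slope, integral gain, switching gain
      dt, ref = 1e-3, 1.0                 # time step and step reference position

      theta, omega, e_int = 0.0, 0.0, 0.0
      for step in range(int(5.0 / dt)):
          e = ref - theta
          e_dot = -omega                  # ref is constant, so e_dot = -theta_dot
          e_int += e * dt
          s = e_dot + lam * e + ki * e_int        # integral sliding surface
          u = (lam * e_dot + ki * e + a * omega   # equivalent-control part
               + K * np.tanh(s / 0.05)) / b       # smoothed switching term
          d = 0.5 * np.sin(2 * np.pi * step * dt) # bounded disturbance / fault effect
          alpha = -a * omega + b * u + d
          omega += alpha * dt
          theta += omega * dt

      print(theta)   # settles near the reference despite the disturbance

    Choosing the switching gain K larger than the disturbance bound keeps the trajectory on the sliding surface, where the error dynamics reduce to a stable second-order response set by lam and ki.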

  18. Fault-tolerant system analysis: imperfect switching and maintenance. Final technical paper

    SciTech Connect

    Veatch, M.H.; Foley, R.D.

    1987-01-01

    This final report presents the results of research into two important areas of concern for fault-tolerant avionics systems: testability analysis and innovative repair policies. The algorithms developed from this research have been included in the Mission Reliability Model (MIREM) and verified by comparison with known results from several Integrated Communication, Navigation, and Identification Avionics architectures. The purpose of the testability analysis was to develop techniques for assessing the impact of imperfect switching on the overall reliability of fault-tolerant avionics. A method of quantifying the effects of undetected errors and false alarms has been developed and included in MIREM. Under the next phase of the program, three repair statistics were identified: Mean Time To Repair, Mean Time Between Maintenance Actions, and Inherent Availability. These were used to define four alternative repair policies: immediate repair, deferred repair, scheduled maintenance, and repair at degraded level. Also included in MIREM as model outputs, these four options offer greater flexibility in evaluating and developing avionics designs.

  19. Scalable Replay with Partial-Order Dependencies for Message-Logging Fault Tolerance

    SciTech Connect

    Lifflander, Jonathan; Meneses, Esteban; Menon, Harshita; Miller, Phil; Krishnamoorthy, Sriram; Kale, Laxmikant

    2014-09-22

    Deterministic replay of a parallel application is commonly used for discovering bugs or to recover from a hard fault with message-logging fault tolerance. For message passing programs, a major source of overhead during forward execution is recording the order in which messages are sent and received. During replay, this ordering must be used to deterministically reproduce the execution. Previous work in replay algorithms often makes minimal assumptions about the programming model and application in order to maintain generality. However, in many cases, only a partial order must be recorded due to determinism intrinsic in the code, ordering constraints imposed by the execution model, and events that are commutative (their relative execution order during replay does not need to be reproduced exactly). In this paper, we present a novel algebraic framework for reasoning about the minimum dependencies required to represent the partial order for different concurrent orderings and interleavings. By exploiting this theory, we improve on an existing scalable message-logging fault tolerance scheme. The improved scheme scales to 131,072 cores on an IBM BlueGene/P with up to 2x lower overhead than one that records a total order.

  20. A fault-tolerant addressable spin qubit in a natural silicon quantum dot.

    PubMed

    Takeda, Kenta; Kamioka, Jun; Otsuka, Tomohiro; Yoneda, Jun; Nakajima, Takashi; Delbecq, Matthieu R; Amaha, Shinichi; Allison, Giles; Kodera, Tetsuo; Oda, Shunri; Tarucha, Seigo

    2016-08-01

    Fault-tolerant quantum computing requires high-fidelity qubits. This has been achieved in various solid-state systems, including isotopically purified silicon, but is yet to be accomplished in industry-standard natural (unpurified) silicon, mainly as a result of the dephasing caused by residual nuclear spins. This high fidelity can be achieved by speeding up the qubit operation and/or prolonging the dephasing time, that is, increasing the Rabi oscillation quality factor Q (the Rabi oscillation decay time divided by the π rotation time). In isotopically purified silicon quantum dots, only the second approach has been used, leaving the qubit operation slow. We apply the first approach to demonstrate an addressable fault-tolerant qubit using a natural silicon double quantum dot with a micromagnet that is optimally designed for fast spin control. This optimized design allows access to Rabi frequencies up to 35 MHz, which is two orders of magnitude greater than that achieved in previous studies. We find the optimum Q = 140 in such high-frequency range at a Rabi frequency of 10 MHz. This leads to a qubit fidelity of 99.6% measured via randomized benchmarking, which is the highest reported for natural silicon qubits and comparable to that obtained in isotopically purified silicon quantum dot-based qubits. This result can inspire contributions to quantum computing from industrial communities. PMID:27536725

  1. Predeployment validation of fault-tolerant systems through software-implemented fault insertion

    NASA Technical Reports Server (NTRS)

    Czeck, Edward W.; Siewiorek, Daniel P.; Segall, Zary Z.

    1989-01-01

    The fault injection-based automated testing (FIAT) environment, which can be used to experimentally characterize and evaluate distributed real-time systems under fault-free and faulted conditions, is described. A survey of validation methodologies is presented. The need for fault insertion based on validation methodologies is demonstrated. The origins and models of faults, and the motivation for the FIAT concept, are reviewed. FIAT employs a validation methodology which builds confidence in the system by first providing a baseline of fault-free performance data and then characterizing the behavior of the system with faults present. Fault insertion is accomplished through software and allows faults, or the manifestations of faults, to be inserted by either seeding faults into memory or triggering error detection mechanisms. FIAT is capable of emulating a variety of fault-tolerant strategies and architectures, can monitor system activity, and can automatically orchestrate experiments involving the insertion of faults. There is a common system interface which allows ease of use and decreases experiment development and run time. Fault models chosen for experiments on FIAT have generated system responses which parallel those observed in real systems under faulty conditions. These capabilities are shown by two example experiments, each using a different fault-tolerance strategy.
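
    In the spirit of FIAT's software-implemented fault insertion, the sketch below seeds a single bit-flip into an application's memory image (here a plain byte array) and checks whether a stand-in error-detection mechanism notices it; the function names and the parity check are illustrative, not FIAT's actual interfaces.

      # Conceptual sketch of software-implemented fault insertion by memory seeding.
      import random

      def inject_bit_flip(memory: bytearray, rng=random.Random(0)):
          """Flip one randomly chosen bit, returning (address, bit)."""
          addr = rng.randrange(len(memory))
          bit = rng.randrange(8)
          memory[addr] ^= 1 << bit
          return addr, bit

      def parity_ok(memory: bytearray, stored_parity: int) -> bool:
          """Stand-in error-detection mechanism: a single overall parity bit."""
          parity = 0
          for byte in memory:
              parity ^= bin(byte).count("1") & 1
          return parity == stored_parity

      memory = bytearray(b"fault-free baseline state")
      parity = 0
      for byte in memory:
          parity ^= bin(byte).count("1") & 1

      addr, bit = inject_bit_flip(memory)
      print(f"injected flip at byte {addr}, bit {bit}; "
            f"detected={not parity_ok(memory, parity)}")

    Running many such injections against a fault-free baseline is the essence of the FIAT methodology: the distribution of detected versus undetected faults characterizes the system's fault-tolerance strategy.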

  2. Fault Tolerant Architecture For A Fly-By-Light Flight Control Computer

    NASA Astrophysics Data System (ADS)

    Thompson, Kevin; Stipanovich, John; Smith, Brian; Reddy, Mahesh C.

    1990-02-01

    The next generation of flight control computers will utilize fiber optic technology to produce a fly-by-light flight control system. Optical transducers and optical fibers will take the place of electrical position transducers and wires, torsion bars, bell cranks, and cables. Applications for this fly-by-light technology include space launch vehicles, upper stages, spacecraft, and commercial/military aircraft. Optical fibers are lighter than mechanical transmission media and, unlike conventional wire transmissions, are not susceptible to electromagnetic interference (EMI) and high energy emission sources. This paper will give an overview of a fault tolerant In-Line Monitored optical flight control system being developed at Boeing Aerospace & Electronics in Seattle, Washington. This system uses passive transducers with fiber optic interconnections, which holds promise to virtually eliminate EMI threats to flight control system performance and flight safety and also provide significant weight savings. The main emphasis of this paper will be the In-Line Monitored architecture of the optical transducer system required for use in a fault tolerant flight control system.

  3. Minimum sliding mode error feedback control for fault tolerant reconfigurable satellite formations with J2 perturbations

    NASA Astrophysics Data System (ADS)

    Cao, Lu; Chen, Xiaoqian; Misra, Arun K.

    2014-03-01

    Minimum Sliding Mode Error Feedback Control (MSMEFC) is proposed to improve the control precision of spacecraft formations based on the conventional sliding mode control theory. This paper proposes a new approach to estimate and offset the system model errors, which include various kinds of uncertainties and disturbances, as well as to smooth out the effect of nonlinear switching control terms. To facilitate the analysis, the concept of equivalent control error is introduced, which is the key to the utilization of MSMEFC. A cost function is formulated on the basis of the principle of minimum sliding mode error; then the equivalent control error is estimated and fed back to the conventional sliding mode control. It is shown that the sliding mode after the MSMEFC will approximate to the ideal sliding mode, resulting in improved control performance and quality. The new methodology is applied to spacecraft formation flying. It guarantees global asymptotic convergence of the relative tracking error in the presence of J2 perturbations. In addition, some fault tolerant situations such as thruster failure for a period of time, thruster degradation and so on, are also considered to verify the effectiveness of MSMEFC. Numerical simulations are performed to demonstrate the efficacy of the proposed methodology to maintain and reconfigure the satellite formation with the existence of initial offsets and J2 perturbation effects, even in the fault-tolerant cases.
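
    For orientation only, a generic first-order sliding-mode law of the kind MSMEFC builds on can be sketched as follows (the symbols are illustrative and are not the authors' notation; the paper's specific cost function and equivalent-control-error estimator are not reproduced here):

      s = \dot{e} + \lambda e, \qquad u = u_{\mathrm{eq}} - K\,\mathrm{sgn}(s) - \hat{\varepsilon}

    where e is the relative tracking error, u_eq the nominal equivalent control, K sgn(s) the switching term, and \hat{\varepsilon} an estimate of the equivalent control error (capturing unmodelled J2 and thruster effects) fed back so that the realized sliding mode stays close to the ideal one.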

  4. Relaxed fault-tolerant hardware implementation of neural networks in the presence of multiple transient errors.

    PubMed

    Mahdiani, Hamid Reza; Fakhraie, Sied Mehdi; Lucas, Caro

    2012-08-01

    Reliability should be identified as the most important challenge in future nano-scale very large scale integration (VLSI) implementation technologies for the development of complex integrated systems. Normally, fault tolerance (FT) in a conventional system is achieved by increasing its redundancy, which also implies higher implementation costs and lower performance that sometimes makes it even infeasible. In contrast to custom approaches, a new class of applications is categorized in this paper, which is inherently capable of absorbing some degrees of vulnerability and providing FT based on their natural properties. Neural networks are good indicators of imprecision-tolerant applications. We have also proposed a new class of FT techniques called relaxed fault-tolerant (RFT) techniques which are developed for VLSI implementation of imprecision-tolerant applications. The main advantage of RFT techniques with respect to traditional FT solutions is that they exploit inherent FT of different applications to reduce their implementation costs while improving their performance. To show the applicability as well as the efficiency of the RFT method, the experimental results for implementation of a face-recognition computationally intensive neural network and its corresponding RFT realization are presented in this paper. The results demonstrate promising higher performance of artificial neural network VLSI solutions for complex applications in faulty nano-scale implementation environments. PMID:24807519

  5. Data-driven output-feedback fault-tolerant L2 control of unknown dynamic systems.

    PubMed

    Wang, Jun-Sheng; Yang, Guang-Hong

    2016-07-01

    This paper studies the data-driven output-feedback fault-tolerant L2-control problem for unknown dynamic systems. In a framework of active fault-tolerant control (FTC), three issues are addressed, including fault detection, controller reconfiguration for optimal guaranteed cost control, and tracking control. According to the data-driven form of observer-based residual generators, the system state is expressed in the form of the measured input-output data. On this basis, a model-free approach to L2 control of unknown linear time-invariant (LTI) discrete-time plants is given. To achieve tracking control, a design method for a pre-filter is also presented. With the aid of the aforementioned results and the input-output data-based time-varying value function approximation structure, a data-driven FTC scheme ensuring L2-gain properties is developed. To illustrate the effectiveness of the proposed methodology, two simulation examples are employed. PMID:27178710

  6. Fault-tolerant quantum computation with a soft-decision decoder for error correction and detection by teleportation

    PubMed Central

    Goto, Hayato; Uchikawa, Hironori

    2013-01-01

    Fault-tolerant quantum computation with quantum error-correcting codes has been considerably developed over the past decade. However, there are still difficult issues, particularly on the resource requirement. For further improvement of fault-tolerant quantum computation, here we propose a soft-decision decoder for quantum error correction and detection by teleportation. This decoder can achieve almost optimal performance for the depolarizing channel. Applying this decoder to Knill's C4/C6 scheme for fault-tolerant quantum computation, which is one of the best schemes so far and relies heavily on error correction and detection by teleportation, we dramatically improve its performance. This leads to substantial reduction of resources. PMID:23784512

  7. Fault-tolerant quantum computation with a soft-decision decoder for error correction and detection by teleportation.

    PubMed

    Goto, Hayato; Uchikawa, Hironori

    2013-01-01

    Fault-tolerant quantum computation with quantum error-correcting codes has been considerably developed over the past decade. However, there are still difficult issues, particularly on the resource requirement. For further improvement of fault-tolerant quantum computation, here we propose a soft-decision decoder for quantum error correction and detection by teleportation. This decoder can achieve almost optimal performance for the depolarizing channel. Applying this decoder to Knill's C4/C6 scheme for fault-tolerant quantum computation, which is one of the best schemes so far and relies heavily on error correction and detection by teleportation, we dramatically improve its performance. This leads to substantial reduction of resources. PMID:23784512

  8. Influence of slot number and pole number in fault-tolerant brushless dc motors having unequal tooth widths

    NASA Astrophysics Data System (ADS)

    Ishak, D.; Zhu, Z. Q.; Howe, D.

    2005-05-01

    The electromagnetic performance of fault-tolerant three-phase permanent magnet brushless dc motors, in which the wound teeth are wider than the unwound teeth and their tooth tips span approximately one pole pitch and which have similar numbers of slots and poles, is investigated. It is shown that they have a more trapezoidal phase back-emf waveform, a higher torque capability, and a lower torque ripple than similar fault-tolerant machines with equal tooth widths. However, these benefits gradually diminish as the pole number is increased, due to the effect of interpole leakage flux.

  9. Disturbance observer based fault estimation and dynamic output feedback fault tolerant control for fuzzy systems with local nonlinear models.

    PubMed

    Han, Jian; Zhang, Huaguang; Wang, Yingchun; Liu, Yang

    2015-11-01

    This paper addresses the problems of fault estimation (FE) and fault tolerant control (FTC) for fuzzy systems with local nonlinear models, external disturbances, sensor and actuator faults, simultaneously. A disturbance observer (DO) and an FE observer are designed simultaneously. Compared with existing results, the proposed observer has a wider application range. Using the estimation information, a novel fuzzy dynamic output feedback fault tolerant controller (DOFFTC) is designed. The controller can be used for fuzzy systems with unmeasurable local nonlinear models, mismatched input disturbances, and measurement outputs affected by sensor faults and disturbances. Finally, simulations show the effectiveness of the proposed methods. PMID:26456728

  10. Spin waves cause non-linear friction

    NASA Astrophysics Data System (ADS)

    Magiera, M. P.; Brendel, L.; Wolf, D. E.; Nowak, U.

    2011-07-01

    Energy dissipation is studied for a hard magnetic tip that scans a soft magnetic substrate. The dynamics of the atomic moments are simulated by solving the Landau-Lifshitz-Gilbert (LLG) equation numerically. The local energy currents are analysed for the case of a Heisenberg spin chain taken as substrate. This leads to an explanation for the velocity dependence of the friction force: The non-linear contribution for high velocities can be attributed to a spin wave front pushed by the tip along the substrate.

  11. Non-Linear Dynamics of Saturn's Rings

    NASA Astrophysics Data System (ADS)

    Esposito, L. W.

    2015-10-01

    Non-linear processes can explain why Saturn's rings are so active and dynamic. Ring systems differ from simple linear systems in two significant ways: 1. They are systems of granular material: where particle-to-particle collisions dominate; thus a kinetic, not a fluid, description is needed. We find that stresses are strikingly inhomogeneous and fluctuations are large compared to equilibrium. 2. They are strongly forced by resonances: which drive a non-linear response, pushing the system across thresholds that lead to persistent states. Some of this non-linearity is captured in a simple Predator-Prey Model: Periodic forcing from the moon causes streamline crowding; This damps the relative velocity, and allows aggregates to grow. About a quarter phase later, the aggregates stir the system to higher relative velocity and the limit cycle repeats each orbit, with relative velocity ranging from nearly zero to a multiple of the orbit average: 2-10x is possible. Results of driven N-body systems by Stuart Robbins: Even unforced rings show large variations; Forcing triggers aggregation; Some limit cycles and phase lags seen, but not always as predicted by predator-prey model. Summary of Halo Results: A predator-prey model for ring dynamics produces transient structures like 'straw' that can explain the halo structure and spectroscopy: Cyclic velocity changes cause perturbed regions to reach higher collision speeds at some orbital phases, which preferentially removes small regolith particles; Surrounding particles diffuse back too slowly to erase the effect: this gives the halo morphology; This requires energetic collisions (v ≈ 10 m/sec, with throw distances about 200 km, implying objects of scale R ≈ 20 km); We propose 'straw'. Transform to Duffing Eqn.: With the coordinate transformation, z = M^(2/3), the Predator-Prey equations can be combined to form a single second-order differential equation with harmonic resonance forcing. Ring dynamics and history implications: Moon

  12. Non-Linear Dynamics of Saturn's Rings

    NASA Astrophysics Data System (ADS)

    Esposito, Larry W.

    2015-04-01

    Non-linear processes can explain why Saturn's rings are so active and dynamic. Ring systems differ from simple linear systems in two significant ways: 1. They are systems of granular material: where particle-to-particle collisions dominate; thus a kinetic, not a fluid, description is needed. We find that stresses are strikingly inhomogeneous and fluctuations are large compared to equilibrium. 2. They are strongly forced by resonances: which drive a non-linear response, pushing the system across thresholds that lead to persistent states. Some of this non-linearity is captured in a simple Predator-Prey Model: Periodic forcing from the moon causes streamline crowding; This damps the relative velocity, and allows aggregates to grow. About a quarter phase later, the aggregates stir the system to higher relative velocity and the limit cycle repeats each orbit, with relative velocity ranging from nearly zero to a multiple of the orbit average: 2-10x is possible. Results of driven N-body systems by Stuart Robbins: Even unforced rings show large variations; Forcing triggers aggregation; Some limit cycles and phase lags seen, but not always as predicted by predator-prey model. Summary of Halo Results: A predator-prey model for ring dynamics produces transient structures like 'straw' that can explain the halo structure and spectroscopy: Cyclic velocity changes cause perturbed regions to reach higher collision speeds at some orbital phases, which preferentially removes small regolith particles; Surrounding particles diffuse back too slowly to erase the effect: this gives the halo morphology; This requires energetic collisions (v ≈ 10 m/sec, with throw distances about 200 km, implying objects of scale R ≈ 20 km); We propose 'straw'. Transform to Duffing Eqn.: With the coordinate transformation, z = M^(2/3), the Predator-Prey equations can be combined to form a single second-order differential equation with harmonic resonance forcing. Ring dynamics and history implications: Moon
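
    For reference, the harmonically forced Duffing equation that the transformed predator-prey system is said to reduce to has the generic form (the coefficients below are placeholders; the abstract does not give the ring-model values):

      \ddot{z} + \delta\,\dot{z} + \alpha z + \beta z^{3} = \gamma \cos(\omega t)

    with z = M^(2/3) playing the role of the displacement-like variable and the moon's resonant forcing supplying the harmonic drive.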

  13. Non-linear Models for Longitudinal Data

    PubMed Central

    Serroyen, Jan; Molenberghs, Geert; Verbeke, Geert; Davidian, Marie

    2009-01-01

    While marginal models, random-effects models, and conditional models are routinely considered to be the three main modeling families for continuous and discrete repeated measures with linear and generalized linear mean structures, respectively, it is less common to consider non-linear models, let alone frame them within the above taxonomy. In the latter situation, indeed, when considered at all, the focus is often exclusively on random-effects models. In this paper, we consider all three families, exemplify their great flexibility and relative ease of use, and apply them to a simple but illustrative set of data on tree circumference growth of orange trees. PMID:20160890

  14. Initial Fault Tolerance and Autonomy Results for Autonomous On-board Processing of Hyperspectral Imaging

    NASA Astrophysics Data System (ADS)

    French, M.; Walters, J.; Zick, K.

    2011-12-01

    By developing Radiation Hardening by Software (RHBSW) techniques leveraged from the High Performance Computing community, our work seeks to deliver radiation tolerant, high performance System on a Chip (SoC) processors to the remote sensing community. This SoC architecture is uniquely suited to handle both high performance signal processing tasks and autonomous agent processing. This allows situational awareness to be developed in-situ, resulting in a 10-100x decrease in processing latency, which directly translates into more science experiments conducted per day and a more thorough, timely analysis of captured data. With the increase in the amount of computational throughput made possible by commodity high performance processors and low overhead fault tolerance, new applications can be considered for on-board processing. A high performance and low overhead fault tolerance strategy targeting scientific applications on the SpaceCube 1.0 platform has been enhanced with initial results showing an order of magnitude increase in Mean Time Between Data Error and a complete elimination of processor hangs. Initial study of representative Hyperspectral applications also proves promising due to high levels of data parallelism and fine grained parallelism achievable within FPGA System on a Chip architectures enabled by our RHBSW techniques. To demonstrate the kinds of capabilities these fault tolerance approaches yield, the team focused on applications representative of the Decadal Survey HyspIRI mission, which uses high throughput Thermal Infrared Scanner (132 Mbps) and Hyperspectral Visible ShortWave InfraRed (804 Mbps) instruments, while having only a 15 Mbps downlink channel. This mission provides a great many use scenarios for onboard processing, from high compression algorithms, to pre-processing and selective download of high priority images, to full on-board classification. This paper focuses on recent efforts which revolve around developing a fault emulator

  15. Fault-tolerant logical gates in quantum error-correcting codes

    NASA Astrophysics Data System (ADS)

    Pastawski, Fernando; Yoshida, Beni

    2015-01-01

    Recently, S. Bravyi and R. König [Phys. Rev. Lett. 110, 170503 (2013), 10.1103/PhysRevLett.110.170503] have shown that there is a trade-off between fault-tolerantly implementable logical gates and geometric locality of stabilizer codes. They consider locality-preserving operations which are implemented by a constant-depth geometrically local circuit and are thus fault tolerant by construction. In particular, they show that, for local stabilizer codes in D spatial dimensions, locality-preserving gates are restricted to a set of unitary gates known as the Dth level of the Clifford hierarchy. In this paper, we explore this idea further by providing several extensions and applications of their characterization to qubit stabilizer and subsystem codes. First, we present a no-go theorem for self-correcting quantum memory. Namely, we prove that a three-dimensional stabilizer Hamiltonian with a locality-preserving implementation of a non-Clifford gate cannot have a macroscopic energy barrier. This result implies that non-Clifford gates do not admit such implementations in Haah's cubic code and Michnicki's welded code. Second, we prove that the code distance of a D-dimensional local stabilizer code with a nontrivial locality-preserving mth-level Clifford logical gate is upper bounded by O(L^(D+1-m)). For codes with non-Clifford gates (m > 2), this improves the previous best bound by S. Bravyi and B. Terhal [New. J. Phys. 11, 043029 (2009), 10.1088/1367-2630/11/4/043029]. Topological color codes, introduced by H. Bombin and M. A. Martin-Delgado [Phys. Rev. Lett. 97, 180501 (2006), 10.1103/PhysRevLett.97.180501; Phys. Rev. Lett. 98, 160502 (2007), 10.1103/PhysRevLett.98.160502; Phys. Rev. B 75, 075103 (2007), 10.1103/PhysRevB.75.075103], saturate the bound for m = D. Third, we prove that the qubit erasure threshold for codes with a nontrivial transversal mth-level Clifford logical gate is upper bounded by 1/m. This implies that no family of fault-tolerant codes with
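
    Two illustrative substitutions into the stated bounds (simple arithmetic, not additional results from the paper):

      d = O(L^{D+1-m}) = O(L) \ \text{for } D = 2,\ m = 2; \qquad 1/m = 1/2 \ (m = 2),\ 1/3 \ (m = 3)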

  16. Fault-tolerance and thermal characteristics of quantum-dot cellular automata devices

    NASA Astrophysics Data System (ADS)

    Anduwan, G. A.; Padgett, B. D.; Kuntzman, M.; Hendrichsen, M. K.; Sturzu, I.; Khatun, M.; Tougaw, P. D.

    2010-06-01

    We present fault tolerant properties of various quantum-dot cellular automata (QCA) devices. Effects of temperatures and dot displacements on the operation of the fundamental devices such as a binary wire, logical gates, a crossover, and an exclusive OR (XOR) have been investigated. A Hubbard-type Hamiltonian and intercellular Hartree approximation have been used for modeling, and a uniform random distribution has been implemented for the defect simulations. The breakdown characteristics of all the devices are almost the same except the crossover. Results show that the success of any device is significantly dependent on both the fabrication defects and temperatures. We have observed unique characteristic features of the crossover. It is highly sensitive to defects of any magnitude. Results show that the presence of a crossover in a XOR design is a major factor for its failure. The effects of temperature and defects in the crossover device are pronounced and have significant impact on larger and complicated QCA devices.

  17. On the design of fault-tolerant two-dimensional systolic arrays for yield enhancement

    SciTech Connect

    Kim, J.H.; Reddy, S.M.

    1989-04-01

    The continuing growth of interest in systolic arrays poses the problem of ensuring an acceptable yield. In this paper, the authors propose a unified approach to the design of fault-tolerant systolic arrays incorporating design for testability, a testing scheme, a reconfiguration algorithm, time complexity analysis of the proposed reconfiguration algorithm, and yield analysis. A main feature of the proposed designs is that multiple PE's in a 2-D array can be tested simultaneously, thus reducing the testing time significantly. Another feature is that with introduction of delay registers, the proposed reconfiguration algorithm reconfigures a faulty 2-D systolic array into a fault-free array without reducing throughput. The overall aim of this paper is to provide a design for a 2-D systolic array that produces high yield in VLSI/WSI implementations.

  18. Analysis and design of algorithm-based fault-tolerant systems

    NASA Technical Reports Server (NTRS)

    Nair, V. S. Sukumaran

    1990-01-01

    An important consideration in the design of high performance multiprocessor systems is to ensure the correctness of the results computed in the presence of transient and intermittent failures. Concurrent error detection and correction have been applied to such systems in order to achieve reliability. Algorithm Based Fault Tolerance (ABFT) was suggested as a cost-effective concurrent error detection scheme. The research was motivated by the complexity involved in the analysis and design of ABFT systems. To that end, a matrix-based model was developed and, based on that, algorithms for both the design and analysis of ABFT systems are formulated. These algorithms are less complex than the existing ones. In order to reduce the complexity further, a hierarchical approach is developed for the analysis of large systems.
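
    As background for the ABFT idea referenced above, a minimal checksum-based sketch in the style of Huang and Abraham (an illustration of the general technique, not the matrix-based model developed in this work; the names and shapes are chosen for the example) shows how a single corrupted element of a matrix product can be detected and corrected from row/column checksums:

      import numpy as np

      def abft_matmul(A, B):
          # Append a column-checksum row to A and a row-checksum column to B;
          # the product then carries checksums of the true result.
          Ac = np.vstack([A, A.sum(axis=0, keepdims=True)])
          Br = np.hstack([B, B.sum(axis=1, keepdims=True)])
          return Ac @ Br

      def detect_and_correct(Cf):
          # Compare data rows/columns against the stored checksums.
          data = Cf[:-1, :-1]
          row_res = data.sum(axis=1) - Cf[:-1, -1]
          col_res = data.sum(axis=0) - Cf[-1, :-1]
          bad_rows = np.flatnonzero(~np.isclose(row_res, 0.0))
          bad_cols = np.flatnonzero(~np.isclose(col_res, 0.0))
          if bad_rows.size == 1 and bad_cols.size == 1:
              i, j = bad_rows[0], bad_cols[0]
              data = data.copy()
              data[i, j] -= row_res[i]      # single-error correction
              return data, (int(i), int(j))
          return data, None

      rng = np.random.default_rng(0)
      A, B = rng.normal(size=(4, 3)), rng.normal(size=(3, 5))
      Cf = abft_matmul(A, B)
      Cf[1, 2] += 0.7                       # inject a transient error into the product
      corrected, location = detect_and_correct(Cf)
      assert location == (1, 2) and np.allclose(corrected, A @ B)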

  19. An extension to Schneider's general paradigm for fault-tolerant clock synchronization

    NASA Technical Reports Server (NTRS)

    Miner, Paul S.

    1992-01-01

    In 1987, Schneider presented a general paradigm that provides a single proof of a number of fault tolerant clock synchronization algorithms. His proof was subsequently subjected to the rigor of mechanical verification by Shankar. However, both Schneider and Shankar assumed a condition Shankar refers to as a bounded delay. This condition states that the elapsed time between synchronization events (i.e., the time that the local process applies an adjustment to its logical clock) is bounded. This property is really a result of the algorithm and should not be assumed in a proof of correctness. This paper remedies this by providing a proof of this property in the context of the general paradigm proposed by Schneider. The argument given is a generalization of Welch and Lynch's proof of a related property for their algorithm.

  20. Diagnosing a Failed Proof in Fault-Tolerance: A Disproving Challenge Problem

    NASA Technical Reports Server (NTRS)

    Pike, Lee; Miner, Paul; Torres-Pomales, Wilfredo

    2006-01-01

    This paper proposes a challenge problem in disproving. We describe a fault-tolerant distributed protocol designed at NASA for use in a fly-by-wire system for next-generation commercial aircraft. An early design of the protocol contains a subtle bug that is highly unlikely to be caught in fault injection testing. We describe a failed proof of the protocol's correctness in a mechanical theorem prover (PVS) with a complex unfinished proof conjecture. We use a model checking suite (SAL) to generate a concrete counterexample to the unproven conjecture to demonstrate the existence of a bug. However, we argue that the effort required in our approach is too high and propose what conditions a better solution would satisfy. We carefully describe the protocol and bug to provide a challenging but feasible case study for disproving research.

  1. Fault tolerance techniques to assure data integrity in high-volume PACS image archives

    NASA Astrophysics Data System (ADS)

    He, Yutao; Huang, Lu J.; Valentino, Daniel J.; Wingate, W. Keith; Avizienis, Algirdas

    1995-05-01

    Picture archiving and communication systems (PACS) perform the systematic acquisition, archiving, and presentation of large quantities of radiological image and text data. In the UCLA Radiology PACS, for example, the volume of image data archived currently exceeds 2500 gigabytes. Furthermore, the distributed heterogeneous PACS is expected to have near real-time response, be continuously available, and assure the integrity and privacy of patient data. The off-the-shelf subsystems that compose the current PACS cannot meet these expectations; therefore fault tolerance techniques had to be incorporated into the system. This paper reports our first-step efforts towards this goal and is organized as follows: first, we discuss data integrity and identify fault classes under the PACS operational environment; then we describe auditing and accounting schemes developed for error detection and analyze the operational data collected. Finally, we outline plans for future research.

  2. Reliability and coverage analysis of non-repairable fault-tolerant memory systems

    NASA Technical Reports Server (NTRS)

    Cox, G. W.; Carroll, B. D.

    1976-01-01

    A method was developed for the construction of probabilistic state-space models for nonrepairable systems. Models were developed for several systems which achieved reliability improvement by means of error-coding, modularized sparing, massive replication and other fault-tolerant techniques. From the models developed, sets of reliability and coverage equations for the systems were developed. Comparative analyses of the systems were performed using these equation sets. In addition, the effects of varying subunit reliabilities on system reliability and coverage were described. The results of these analyses indicated that a significant gain in system reliability may be achieved by use of combinations of modularized sparing, error coding, and software error control. For sufficiently reliable system subunits, this gain may far exceed the reliability gain achieved by use of massive replication techniques, yet result in a considerable saving in system cost.
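
    As a textbook point of reference for the replication techniques discussed above (a standard illustration, not one of the report's specific models): a triple-modular-redundant unit with a perfect voter survives as long as at least two of its three modules do, so

      R_{\mathrm{TMR}} = 3R^{2} - 2R^{3}

    which for a module reliability of R = 0.95 gives roughly 0.993, while for R < 0.5 it falls below the single-module reliability.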

  3. Performance analysis of fault-tolerant systems in parallel execution of conversations

    NASA Technical Reports Server (NTRS)

    Kim, K. H.; Heu, Shin; Yang, Seung M.

    1989-01-01

    The execution overhead inherent in the conversation scheme, which is a scheme for realizing fault-tolerant cooperating processes free of the domino effect, is analyzed. Multiprocessor/multicomputer systems capable of parallel execution of conversation components are considered and a queuing network model of such systems is adopted. Based on the queuing model, various performance indicators, including system throughput, average number of processors idling inside a conversation due to the synchronization required, and average time spent in the conversation, have been evaluated numerically for several application environments. The numeric results are discussed and several essential performance characteristics of the conversation scheme are derived. For example, when the number of participant processes is not large, say less than six, the system performance is highly affected by the synchronization required on the processes in a conversation, and not so much by the probability of acceptance-test failure.

  4. Development of N-version software samples for an experiment in software fault tolerance

    NASA Technical Reports Server (NTRS)

    Lauterbach, L.

    1987-01-01

    The report documents the task planning and software development phases of an effort to obtain twenty versions of code independently designed and developed from a common specification. These versions were created for use in future experiments in software fault tolerance, in continuation of the experimental series underway at the Systems Validation Methods Branch (SVMB) at NASA Langley Research Center. The 20 versions were developed under controlled conditions at four U.S. universities, by 20 teams of two researchers each. The versions process raw data from a modified Redundant Strapped Down Inertial Measurement Unit (RSDIMU). The specifications, and over 200 questions submitted by the developers concerning the specifications, are included as appendices to this report. Design documents, and design and code walkthrough reports for each version, were also obtained in this task for use in future studies.

  5. A fault-tolerant voltage measurement method for series connected battery packs

    NASA Astrophysics Data System (ADS)

    Xia, Bing; Mi, Chris

    2016-03-01

    This paper proposes a fault-tolerant voltage measurement method for battery management systems. Instead of measuring the voltage of individual cells, the proposed method measures the voltage sum of multiple battery cells without additional voltage sensors. A matrix interpretation is developed to demonstrate the viability of the proposed sensor topology to distinguish between sensor faults and cell faults. A methodology is introduced to isolate sensor and cell faults by locating abnormal signals. A measurement electronic circuit is proposed to implement the design concept. Simulation and experiment results support the mathematical analysis and validate the feasibility and robustness of the proposed method. In addition, the measurement problem is generalized and the condition for a valid sensor topology is discovered. The tuning of design parameters is analyzed based on fault detection reliability and noise levels.
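
    A minimal numerical sketch of the general idea (the sensor topology below is assumed for illustration and is not the paper's circuit): if each sensor reads the sum of two adjacent cell voltages, a faulty cell perturbs two neighbouring readings while a faulty sensor perturbs only one, so the residual pattern separates the two fault classes.

      import numpy as np

      N = 6                                  # number of series-connected cells (illustrative)
      # Measurement matrix: sensor k reads v[k] + v[k+1] (wrap-around keeps the system square).
      H = np.zeros((N, N))
      for k in range(N):
          H[k, k] = 1.0
          H[k, (k + 1) % N] = 1.0

      v_nominal = np.full(N, 3.7)            # nominal cell voltages (V)

      def residual(v_cells, sensor_bias=None):
          y = H @ v_cells                    # ideal sensor readings
          if sensor_bias is not None:
              y = y + sensor_bias            # additive sensor fault
          return y - H @ v_nominal           # deviation from the expected readings

      # Cell fault: cell 2 sags by 0.5 V -> sensors 1 and 2 both deviate.
      v_fault = v_nominal.copy(); v_fault[2] -= 0.5
      print("cell fault residual:  ", np.round(residual(v_fault), 2))

      # Sensor fault: sensor 4 reads 0.3 V high -> only sensor 4 deviates.
      bias = np.zeros(N); bias[4] = 0.3
      print("sensor fault residual:", np.round(residual(v_nominal, bias), 2))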

  6. Dynamical jumping real-time fault-tolerant routing protocol for wireless sensor networks.

    PubMed

    Wu, Guowei; Lin, Chi; Xia, Feng; Yao, Lin; Zhang, He; Liu, Bing

    2010-01-01

    In time-critical wireless sensor network (WSN) applications, a high degree of reliability is commonly required. A dynamical jumping real-time fault-tolerant routing protocol (DMRF) is proposed in this paper. Each node utilizes the remaining transmission time of the data packets and the state of the forwarding candidate node set to dynamically choose the next hop. Once a node failure, network congestion or a void region occurs, the transmission mode will switch to jumping transmission mode, which can reduce the transmission time delay, guaranteeing that data packets are sent to the destination node within the specified time limit. By using a feedback mechanism, each node dynamically adjusts the jumping probabilities to increase the ratio of successful transmission. Simulation results show that DMRF can not only efficiently reduce the effects of failed nodes, congestion and void regions, but also yield a higher ratio of successful transmission, smaller transmission delay and a reduced number of control packets. PMID:22294933

  7. Fault-tolerance of a neural network solving the traveling salesman problem

    NASA Technical Reports Server (NTRS)

    Protzel, P.; Palumbo, D.; Arras, M.

    1989-01-01

    This study presents the results of a fault-injection experiment that simulates a neural network solving the Traveling Salesman Problem (TSP). The network is based on a modified version of Hopfield's and Tank's original method. We define a performance characteristic for the TSP that allows an overall assessment of the solution quality for different city-distributions and problem sizes. Five different 10-, 20-, and 30-city cases are used for the injection of up to 13 simultaneous stuck-at-0 and stuck-at-1 faults. The results of more than 4000 simulation-runs show the extreme fault-tolerance of the network, especially with respect to stuck-at-0 faults. One possible explanation for the overall surprising result is the redundancy of the problem representation.
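
    The stuck-at fault model used in such experiments can be illustrated with a few lines of code (a generic discrete Hopfield update with clamped outputs, not the authors' modified Hopfield-Tank TSP network):

      import numpy as np

      def hopfield_step(W, s, stuck_at=None):
          # One synchronous update of a binary Hopfield network.
          # stuck_at maps a neuron index to its forced output (0 or 1).
          s_new = (W @ s > 0).astype(int)
          if stuck_at:
              for i, val in stuck_at.items():
                  s_new[i] = val             # clamp faulty neuron outputs
          return s_new

      rng = np.random.default_rng(1)
      n = 16
      W = rng.normal(size=(n, n))
      W = (W + W.T) / 2                      # symmetric weights
      np.fill_diagonal(W, 0.0)
      state = rng.integers(0, 2, size=n)

      faults = {3: 0, 11: 1}                 # two simultaneous stuck-at faults
      for _ in range(10):
          state = hopfield_step(W, state, stuck_at=faults)
      print(state)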

  8. Development and evaluation of a Fault-Tolerant Multiprocessor (FTMP) computer. Volume 2: FTMP software

    NASA Technical Reports Server (NTRS)

    Lala, J. H.; Smith, T. B., III

    1983-01-01

    The software developed for the Fault-Tolerant Multiprocessor (FTMP) is described. The FTMP executive is a timer-interrupt driven dispatcher that schedules iterative tasks which run at 3.125, 12.5, and 25 Hz. Major tasks which run under the executive include system configuration control, flight control, and display. The flight control task includes autopilot and autoland functions for a jet transport aircraft. System Displays include status displays of all hardware elements (processors, memories, I/O ports, buses), failure log displays showing transient and hard faults, and an autopilot display. All software is in a higher order language (AED, an ALGOL derivative). The executive is a fully distributed general purpose executive which automatically balances the load among available processor triads. Provisions for graceful performance degradation under processing overload are an integral part of the scheduling algorithms.

  9. Fault tolerance in an inner-outer solver: A GVR-enabled case study

    SciTech Connect

    Zhang, Ziming; Chien, Andrew A.; Teranishi, Keita

    2015-04-18

    Resilience is a major challenge for large-scale systems. It is particularly important for iterative linear solvers, since they take much of the time of many scientific applications. We show that single bit flip errors in the Flexible GMRES iterative linear solver can lead to high computational overhead or even failure to converge to the right answer. Informed by these results, we design and evaluate several strategies for fault tolerance in both inner and outer solvers appropriate across a range of error rates. We implement them, extending Trilinos’ solver library with the Global View Resilience (GVR) programming model, which provides multi-stream snapshots, multi-version data structures with portable and rich error checking/recovery. Lastly, experimental results validate correct execution with low performance overhead under varied error conditions.
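
    A toy illustration of the failure mode described above (not the GVR/Trilinos machinery): flip a single bit of a double-precision value, as a silent fault might, and catch the corruption with an explicit check inside an iterative loop.

      import math
      import struct

      def flip_bit(x: float, bit: int) -> float:
          # Flip one bit of the IEEE-754 representation of x.
          (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
          return struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))[0]

      # Fixed-point iteration x_{k+1} = cos(x_k); a bit flip mid-run corrupts one iterate.
      x = 1.0
      for k in range(50):
          x_new = math.cos(x)
          if k == 20:
              x_new = flip_bit(x_new, 52)    # inject a fault in the exponent field
          # Cheap recomputation check playing the role of an algorithm-level residual test:
          if abs(x_new - math.cos(x)) > 1e-9:
              print(f"iteration {k}: corruption detected, recomputing")
              x_new = math.cos(x)            # recover by recomputation
          x = x_new
      print("converged to", x)               # ~0.739085 (the fixed point of cos)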

  10. Fault tolerant integrated inertial navigation/global positioning systems for next generation spacecraft

    NASA Astrophysics Data System (ADS)

    Miller, Hugh; Hilts, David A.

    The authors address the requirements, benefits, and mitigation of risks to adapt a commercial Hexad fault-tolerant inertial navigation/global positioning system (FT IN/GPS) for use in next-generation spacecraft. Next-generation requirements are examined to determine whether a high production base system can meet autonomous, reliable, and low-cost requirements for future spacecraft. The major benefits are the combining and replacement of functions, the reduction of unscheduled maintenance and operations costs, and a higher probability of mission success. The design, development, and production risks are mitigated by the long-term commercial production schedule for the Boeing 777 air data inertial reference unit (ADIRU) which begins in the mid-1990s. The conclusion is that a strapdown ring laser gyro (RLG) Hexad FT IN/GPS is the preferred integrated navigation and control system for next-generation vehicles.

  11. Implementing a strand of a scalable fault-tolerant quantum computing fabric.

    PubMed

    Chow, Jerry M; Gambetta, Jay M; Magesan, Easwar; Abraham, David W; Cross, Andrew W; Johnson, B R; Masluk, Nicholas A; Ryan, Colm A; Smolin, John A; Srinivasan, Srikanth J; Steffen, M

    2014-01-01

    With favourable error thresholds and requiring only nearest-neighbour interactions on a lattice, the surface code is an error-correcting code that has garnered considerable attention. At the heart of this code is the ability to perform a low-weight parity measurement of local code qubits. Here we demonstrate high-fidelity parity detection of two code qubits via measurement of a third syndrome qubit. With high-fidelity gates, we generate entanglement distributed across three superconducting qubits in a lattice where each code qubit is coupled to two bus resonators. Via high-fidelity measurement of the syndrome qubit, we deterministically entangle the code qubits in either an even or odd parity Bell state, conditioned on the syndrome qubit state. Finally, to fully characterize this parity readout, we develop a measurement tomography protocol. The lattice presented naturally extends to larger networks of qubits, outlining a path towards fault-tolerant quantum computing. PMID:24958160

  12. Low cost management of replicated data in fault-tolerant distributed systems

    NASA Technical Reports Server (NTRS)

    Joseph, Thomas A.; Birman, Kenneth P.

    1990-01-01

    Many distributed systems replicate data for fault tolerance or availability. In such systems, a logical update on a data item results in a physical update on a number of copies. The synchronization and communication required to keep the copies of replicated data consistent introduce a delay when operations are performed. A technique is described that relaxes the usual degree of synchronization, permitting replicated data items to be updated concurrently with other operations, while at the same time ensuring that correctness is not violated. The additional concurrency thus obtained results in better response time when performing operations on replicated data. How this technique performs in conjunction with a roll-back and a roll-forward failure recovery mechanism is also discussed.

  13. Fault-free validation of a fault-tolerant multiprocessor: Baseline experiments and workload implementation

    NASA Technical Reports Server (NTRS)

    Feather, F.; Siewiorek, D.; Segall, Z.

    1986-01-01

    In the future, aircraft employing active control technology must use highly reliable multiprocessors in order to achieve flight safety. Such computers must be experimentally validated before they are deployed. This project outlines a methodology for doing fault-free validation of reliable multiprocessors. The methodology begins with baseline experiments, which test a single phenomenon. As experiments progress, tools for performance testing are developed. This report presents the results of interrupt baseline experiments performed on the Fault-Tolerant Multiprocessor (FTMP) at NASA-Langley's AIRLAB. Interrupt-causing exception conditions were tested, and several were found to have unimplemented interrupt handling software while one had an unimplemented interrupt vector. A synthetic workload model for real-time multiprocessors is then developed as an application level performance analysis tool. Details of the workload implementation and calibration are presented. Both the experimental methodology and the synthetic workload model are general enough to be applicable to reliable multiprocessors besides FTMP.

  14. Design of a fault tolerant airborne digital computer. Volume 2: Computational requirements and technology

    NASA Technical Reports Server (NTRS)

    Ratner, R. S.; Shapiro, E. B.; Zeidler, H. M.; Wahlstrom, S. E.; Clark, C. B.; Goldberg, J.

    1973-01-01

    This final report summarizes the work on the design of a fault tolerant digital computer for aircraft. Volume 2 is composed of two parts. Part 1 is concerned with the computational requirements associated with an advanced commercial aircraft. Part 2 reviews the technology that will be available for the implementation of the computer in the 1975-1985 period. With regard to the computational task, 26 computations have been categorized according to computational load, memory requirements, criticality, permitted down-time, and the need to save data in order to effect a roll-back. The technology part stresses the impact of large scale integration (LSI) on the realization of logic and memory. Also considered were module interconnection possibilities so as to minimize fault propagation.

  15. Experimental fault-tolerant universal quantum gates with solid-state spins under ambient conditions.

    PubMed

    Rong, Xing; Geng, Jianpei; Shi, Fazhan; Liu, Ying; Xu, Kebiao; Ma, Wenchao; Kong, Fei; Jiang, Zhen; Wu, Yang; Du, Jiangfeng

    2015-01-01

    Quantum computation provides great speedup over its classical counterpart for certain problems. One of the key challenges for quantum computation is to realize precise control of the quantum system in the presence of noise. Control of the spin-qubits in solids with the accuracy required by fault-tolerant quantum computation under ambient conditions remains elusive. Here, we quantitatively characterize the sources of noise during quantum gate operation and demonstrate strategies to suppress their effects. A universal set of logic gates in a nitrogen-vacancy centre in diamond is reported with an average single-qubit gate fidelity of 0.999952 and two-qubit gate fidelity of 0.992. These high control fidelities have been achieved at room temperature in naturally abundant (13)C diamond via composite pulses and an optimized control method. PMID:26602456

  16. Fault tolerant attitude sensing and force feedback control for unmanned aerial vehicles

    NASA Astrophysics Data System (ADS)

    Jagadish, Chirag

    Two aspects of an unmanned aerial vehicle are studied in this work. One is fault tolerant attitude determination and the other is to provide force feedback to the joy-stick of the UAV so as to prevent faulty inputs from the pilot. Determination of attitude plays an important role in control of aerial vehicles. One way of defining the attitude is through Euler angles. These angles can be determined based on the measurements of the projections of the gravity and earth magnetic fields on the three body axes of the vehicle. Attitude determination in unmanned aerial vehicles poses additional challenges due to limitations of space, payload, power and cost. Therefore it provides for almost no room for any bulky sensors or extra sensor hardware for backup and as such leaves no room for sensor fault issues either. In the face of these limitations, this study proposes a fault tolerant computing of Euler angles by utilizing multiple different computation methods, with each method utilizing a different subset of the available sensor measurement data. Twenty-five such methods have been presented in this document. The capability of computing the Euler angles in multiple ways provides a diversified redundancy required for fault tolerance. The proposed approach can identify certain sets of sensor failures and even separate the reference fields from the disturbances. A bank-to-turn maneuver of the NASA GTM UAV is used to demonstrate the fault tolerance provided by the proposed method as well as to demonstrate the method of determining the correct Euler angles despite interferences by inertial acceleration disturbances. Attitude computation is essential for stability. But as of today most UAVs are commanded remotely by human pilots. While basic stability control is entrusted to machine or the on-board automatic controller, overall guidance is usually with humans. It is therefore the pilot who sets the command/references through a joy-stick. While this is a good compromise between

  17. Fault detection, isolation and reconfiguration in FTMP Methods and experimental results. [fault tolerant multiprocessor

    NASA Technical Reports Server (NTRS)

    Lala, J. H.

    1983-01-01

    The Fault-Tolerant Multiprocessor (FTMP) is a highly reliable computer designed to meet a goal of 10 to the -10th failures per hour and built with the objective of flying an active-control transport aircraft. Fault detection, identification, and recovery software is described, and experimental results obtained by injecting faults at the pin level in the FTMP are presented. Over 21,000 faults were injected in the CPU, memory, bus interface circuits, and error detection, masking, and error reporting circuits of one LRU of the multiprocessor. Detection, isolation, and reconfiguration times were recorded for each fault, and the results were found to agree well with earlier assumptions made in reliability modeling.

  18. A discussion on sensor recovery techniques for fault tolerant multisensor schemes

    NASA Astrophysics Data System (ADS)

    Stoican, Florin; Olaru, Sorin; De Doná, José A.; Seron, María M.

    2014-08-01

    The present paper deals with the interplay between healthy and faulty sensor functioning in a multisensor scheme based on a switching control strategy. Fault tolerance guarantees have been recently obtained in this framework based upon the characterisation of invariant sets for state estimations in healthy and faulty functioning. A source of conservativeness of this approach is related to the issue of sensor recovery. A common working hypothesis has been to assume that once a sensor switches to faulty functioning it can no longer be used by the control mechanism even if at an ulterior moment it switches back to healthy functioning. In the current paper, we present necessary and sufficient conditions for the acknowledgement of sensor recovery and we propose and compare different techniques for the reintegration of sensors in the closed-loop decision-making mechanism.

  19. Reliability analysis of a fault-tolerant gas turbine control system: Final report

    SciTech Connect

    Young, R.

    1987-09-01

    This report documents the reliability analysis of the General Electric MKIV Speedtronic control system performed by ARINC Research Corporation under EPRI Research Project 2101-7. The MKIV control is a fault-tolerant microprocessor-based system employing triply redundant control processors and critical instrumentation. The analysis consisted of a review of the actual operating history of the MKIV controls at three operating facilities and tracked each system's reliability growth from initial startup through several years of system operation. An independent reliability assessment of the MKIV system was performed by obtaining event, failure, and outage data from three different utility installations, representing 11 unit years and 69,852 fired hours of operation. Reliability growth analyses were performed using Duane and AMSAA models to estimate MTBF expected for a mature system considering both failures and forced outages. The analyses concluded that the system will exceed the design reliability goal of one forced outage off line per unit per year.

  20. A direct approach to fault-tolerance in measurement-based quantum computation via teleportation

    NASA Astrophysics Data System (ADS)

    Silva, Marcus; Danos, Vincent; Kashefi, Elham; Ollivier, Harold

    2007-06-01

    We discuss a simple variant of the one-way quantum computing model (Raussendorf R and Briegel H-J 2001 Phys. Rev. Lett. 86 5188), called the Pauli measurement model, where measurements are restricted to be along the eigenbases of the Pauli X and Y operators, while qubits can be initially prepared both in the |+_{π/4}⟩ := (1/√2)(|0⟩ + e^{iπ/4}|1⟩) state and the usual |+⟩ := (1/√2)(|0⟩ + |1⟩) state. We prove the universality of this quantum computation model, and establish a standardization procedure which permits all entanglement and state preparation to be performed at the beginning of computation. This leads us to develop a direct approach to fault-tolerance by simple transformations of the entanglement graph and preparation operations, while error correction is performed naturally via syndrome-extracting teleportations.

  1. Experimental fault-tolerant universal quantum gates with solid-state spins under ambient conditions

    PubMed Central

    Rong, Xing; Geng, Jianpei; Shi, Fazhan; Liu, Ying; Xu, Kebiao; Ma, Wenchao; Kong, Fei; Jiang, Zhen; Wu, Yang; Du, Jiangfeng

    2015-01-01

    Quantum computation provides great speedup over its classical counterpart for certain problems. One of the key challenges for quantum computation is to realize precise control of the quantum system in the presence of noise. Control of the spin-qubits in solids with the accuracy required by fault-tolerant quantum computation under ambient conditions remains elusive. Here, we quantitatively characterize the sources of noise during quantum gate operation and demonstrate strategies to suppress their effects. A universal set of logic gates in a nitrogen-vacancy centre in diamond is reported with an average single-qubit gate fidelity of 0.999952 and two-qubit gate fidelity of 0.992. These high control fidelities have been achieved at room temperature in naturally abundant 13C diamond via composite pulses and an optimized control method. PMID:26602456

  2. Observer-based fault-tolerant control for a class of nonlinear networked control systems

    NASA Astrophysics Data System (ADS)

    Mahmoud, M. S.; Memon, A. M.; Shi, Peng

    2014-08-01

    This paper presents a fault-tolerant control (FTC) scheme for nonlinear systems which are connected in a networked control system. The nonlinear system is first transformed into two subsystems such that the unobservable part is affected by a fault and the observable part is unaffected. An observer is then designed which gives state estimates using a Luenberger observer and also estimates unknown parameter of the system; this helps in fault estimation. The FTC is applied in the presence of sampling due to the presence of a network in the loop. The controller gain is obtained using linear-quadratic regulator technique. The methodology is applied on a mechatronic system and the results show satisfactory performance.

  3. High Temperature, Permanent Magnet Biased, Fault Tolerant, Homopolar Magnetic Bearing Development

    NASA Technical Reports Server (NTRS)

    Palazzolo, Alan; Tucker, Randall; Kenny, Andrew; Kang, Kyung-Dae; Ghandi, Varun; Liu, Jinfang; Choi, Heeju; Provenza, Andrew

    2008-01-01

    This paper summarizes the development of a magnetic bearing designed to operate at 1,000 F. A novel feature of this high temperature magnetic bearing is its homopolar construction which incorporates state of the art high temperature, 1,000 F, permanent magnets. A second feature is its fault tolerance capability which provides the desired control forces with over one-half of the coils failed. The construction and design methodology of the bearing is outlined and test results are shown. The force predictions from a 3D finite element, magnetic field based model are shown to be in good agreement with measurements at room and high temperature. A five-axis test rig will be completed soon to provide a means to test the magnetic bearings at high temperature and speed.

  4. ANDY: A general, fault-tolerant tool for database searching oncomputer clusters

    SciTech Connect

    Smith, Andrew; Chandonia, John-Marc; Brenner, Steven E.

    2005-12-21

    Summary: ANDY (seArch coordination aND analYsis) is a set of Perl programs and modules for distributing large biological database searches, and in general any sequence of commands, across the nodes of a Linux computer cluster. ANDY is compatible with several commonly used Distributed Resource Management (DRM) systems, and it can be easily extended to new DRMs. A distinctive feature of ANDY is the choice of either dedicated or fair-use operation: ANDY is almost as efficient as single-purpose tools that require a dedicated cluster, but it runs on a general-purpose cluster along with any other jobs scheduled by a DRM. Other features include communication through named pipes for performance, flexible customizable routines for error-checking and summarizing results, and multiple fault-tolerance mechanisms. Availability: ANDY is freely available and may be obtained from http://compbio.berkeley.edu/proj/andy; this site also contains supplemental data and figures and a more detailed overview of the software.

  5. An Adaptive Fault-Tolerance Agent Running on Situation-Aware Environment

    NASA Astrophysics Data System (ADS)

    Kim, Soongohn; Ko, Eungnam

    Interest in situation-aware ubiquitous computing has increased lately. An example of a situation-aware application is a multimedia education system. Since ubiquitous applications need situation-aware middleware services and the computing environment keeps changing as the applications change, it is challenging to detect and recover from errors in order to provide seamless services and avoid a single point of failure. This paper proposes an Adaptive Fault Tolerance Agent (AFTA) in a situation-aware middleware framework and presents a simulation model of AFT-based agents. The strength of this system is that it detects and recovers from errors automatically when a session's process terminates due to a software error.

  6. Fault-tolerant strategies for an implantable centrifugal blood pump using a radially controlled magnetic bearing.

    PubMed

    Pai, Chi Nan; Shinshi, Tadahiko

    2011-10-01

    In our laboratory, an implantable centrifugal blood pump (CBP) with a two degrees-of-freedom radially controlled magnetic bearing (MB) to support the impeller without contact has been developed to assist the pumping function of the weakened heart ventricle. In order to maintain the function of the CBP after damage to the electromagnets (EMs) of the MB, fault-tolerant strategies for the CBP are proposed in this study. Using a redundant MB design, magnetic levitation of the impeller was maintained with damage to up to two out of a total of four EMs of the MB; with damage to three EMs, contact-free support of the impeller was achieved using hydrodynamic and electromagnetic forces; and with damage to all four EMs, the pump operating point, of 5 l/min against 100 mmHg, was achieved using the motor for rotation of the impeller, with contact between the impeller and the stator. PMID:21382738

  7. An experimental investigation of fault tolerant software structures in an avionics application

    NASA Technical Reports Server (NTRS)

    Caglayan, Alper K.; Eckhardt, Dave E., Jr.

    1989-01-01

    The objective of this experimental investigation is to compare the functional performance and software reliability of competing fault tolerant software structures utilizing software diversity. In this experiment, three versions of the redundancy management software for a skewed sensor array have been developed using three diverse failure detection and isolation algorithms and incorporated into various N-version, recovery block and hybrid software structures. The empirical results show that, for maximum functional performance improvement in the selected application domain, the results of diverse algorithms should be voted before being processed by multiple versions without enforced diversity. Results also suggest that when the reliability gain with an N-version structure is modest, recovery block structures are more feasible since higher reliability can be obtained using an acceptance check with a modest reliability.
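
    For readers unfamiliar with the two structures being compared, a schematic sketch of N-version voting and a recovery block is given below (a generic illustration with made-up versions, not the experiment's redundancy-management software).

      import math
      from collections import Counter
      from typing import Callable, Sequence

      def n_version(versions: Sequence[Callable[[float], float]], x: float, tol: float = 1e-6) -> float:
          # N-version programming: run all versions and take a majority (modal) vote.
          results = [round(v(x) / tol) * tol for v in versions]   # coarse rounding before voting
          value, _ = Counter(results).most_common(1)[0]
          return value

      def recovery_block(versions: Sequence[Callable[[float], float]],
                         acceptance: Callable[[float, float], bool], x: float) -> float:
          # Recovery block: try alternates in order until one passes the acceptance test.
          for v in versions:
              y = v(x)
              if acceptance(x, y):
                  return y
          raise RuntimeError("all alternates failed the acceptance test")

      # Three 'diverse' square-root routines, one of them deliberately buggy.
      versions = [lambda x: x ** 0.5,
                  lambda x: x ** 0.5 + 1.0,                       # buggy version
                  lambda x: 2.0 ** (0.5 * math.log2(x))]
      accept = lambda x, y: abs(y * y - x) < 1e-6                 # acceptance test

      print(n_version(versions, 9.0))                             # voting masks the buggy version -> 3.0
      print(recovery_block(versions, accept, 9.0))                # first alternate passes -> 3.0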

  8. H∞ fault-tolerant control for time-varied actuator fault of nonlinear system

    NASA Astrophysics Data System (ADS)

    Liu, Chunsheng; Jiang, Bin

    2014-12-01

    This paper studies H∞ fault-tolerant control for a class of uncertain nonlinear systems subject to time-varied actuator faults. A radial basis function neural network is utilised to approximate the unknown nonlinear functions; an updating rule is designed to estimate the time-varied actuator fault on-line; and a controller with state feedback and fault estimation is applied to compensate for the effects of the fault and minimise the H∞ performance criterion in order to meet a desired H∞ disturbance rejection constraint. Sufficient conditions are derived which guarantee that the closed-loop system is robustly stable and satisfies the H∞ performance in both normal and fault cases. In order to reduce computing cost, a simplified algorithm of the matrix Riccati inequality is given. A spacecraft model is presented to demonstrate the effectiveness of the proposed methods.
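
    In generic adaptive-control notation (not the authors' exact update law or gains, which are given in the paper), the radial basis function approximation and on-line fault estimate referred to above take the form

      f(x) \approx \hat{W}^{\top}\varphi(x), \qquad \varphi_i(x) = \exp(-\|x - c_i\|^2 / \sigma_i^2), \qquad \dot{\hat{f}}_a = \Gamma\,\varphi(x)\,e^{\top} P B

    where \hat{W} collects the adjustable weights, \hat{f}_a is the actuator-fault estimate, e the tracking error, and \Gamma, P design matrices arising from the H∞/Riccati analysis.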

  9. Methods and apparatuses for self-generating fault-tolerant keys in spread-spectrum systems

    DOEpatents

    Moradi, Hussein; Farhang, Behrouz; Subramanian, Vijayarangam

    2015-12-15

    Self-generating fault-tolerant keys for use in spread-spectrum systems are disclosed. At a communication device, beacon signals are received from another communication device and impulse responses are determined from the beacon signals. The impulse responses are circularly shifted to place a largest sample at a predefined position. The impulse responses are converted to a set of frequency responses in a frequency domain. The frequency responses are shuffled with a predetermined shuffle scheme to develop a set of shuffled frequency responses. A set of phase differences is determined as a difference between an angle of the frequency response and an angle of the shuffled frequency response at each element of the corresponding sets. Each phase difference is quantized to develop a set of secret-key quantized phases and a set of spreading codes is developed wherein each spreading code includes a corresponding phase of the set of secret-key quantized phases.
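
    A simplified numerical sketch of the key-generation steps described above (assumed toy channel, shuffle and quantizer parameters; not the patented implementation):

      import numpy as np

      rng = np.random.default_rng(7)

      # Toy reciprocal channel: both ends observe (nearly) the same multipath impulse response.
      h = rng.normal(size=16) * np.exp(-0.4 * np.arange(16))

      def key_phases(impulse_response, perm, bits_per_phase=2):
          h = np.asarray(impulse_response, dtype=float)
          h = np.roll(h, -int(np.argmax(np.abs(h))))       # circular shift: largest sample to index 0
          H = np.fft.fft(h)                                # frequency responses
          H_shuf = H[perm]                                 # shuffle with a predetermined scheme
          dphi = np.mod(np.angle(H) - np.angle(H_shuf), 2 * np.pi)   # phase differences
          levels = 2 ** bits_per_phase
          return np.floor(dphi / (2 * np.pi) * levels).astype(int)   # secret-key quantized phases

      perm = rng.permutation(16)                           # shared, predetermined shuffle scheme
      alice = key_phases(h, perm)
      bob = key_phases(h + 1e-3 * rng.normal(size=16), perm)   # slightly noisy observation
      print("fraction of matching quantized phases:", np.mean(alice == bob))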

  10. Condition for fault-tolerant quantum computation with a cavity-QED scheme

    SciTech Connect

    Goto, Hayato; Ichimura, Kouichi

    2010-09-15

    A condition for fault-tolerant quantum computation (FTQC) with cavity schemes is discussed. It is shown that the condition is very hard to satisfy if the standard error threshold of FTQC is simply applied. To relax the condition, we propose to combine the cavity-quantum-electrodynamics (QED) scheme proposed by Duan et al. [Phys. Rev. A 72, 032333 (2005)] and Xiao et al. [Phys. Rev. A 70, 042314 (2004)] with the recently proposed FTQC scheme with probabilistic two-qubit gates [Goto and Ichimura, Phys. Rev. A 80, 040303(R) (2009)]. It is shown that the condition for FTQC is dramatically relaxed compared to the case of the standard threshold. The optimization of the cavity-QED scheme is also discussed.

  11. Fault tolerance in an inner-outer solver: A GVR-enabled case study

    DOE PAGESBeta

    Zhang, Ziming; Chien, Andrew A.; Teranishi, Keita

    2015-04-18

    Resilience is a major challenge for large-scale systems. It is particularly important for iterative linear solvers, since they take much of the time of many scientific applications. We show that single bit flip errors in the Flexible GMRES iterative linear solver can lead to high computational overhead or even failure to converge to the right answer. Informed by these results, we design and evaluate several strategies for fault tolerance in both inner and outer solvers appropriate across a range of error rates. We implement them, extending Trilinos' solver library with the Global View Resilience (GVR) programming model, which provides multi-stream snapshots, multi-version data structures with portable and rich error checking/recovery. Lastly, experimental results validate correct execution with low performance overhead under varied error conditions.
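
    A toy illustration of the failure mode (a single bit flip silently corrupting an iterative solve, caught by a cheap residual check) is sketched below. It uses a plain Jacobi iteration on a small tridiagonal system, not Trilinos, FGMRES or the GVR machinery; all sizes and thresholds are assumptions.

```python
# Inject one bit flip into an iterative solve and detect it from the residual.
import struct
import numpy as np

def flip_bit(x, bit):
    """Flip one bit of a float64 and return the corrupted value."""
    as_int = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))[0]

n = 50
A = np.diag(4.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
b = np.ones(n)
x = np.zeros(n)
D_inv = 1.0 / np.diag(A)

prev_res = np.inf
for k in range(200):
    x = x + D_inv * (b - A @ x)               # Jacobi update
    if k == 50:
        x[10] = flip_bit(x[10], 62)           # inject a high-order (exponent) bit flip
    res = np.linalg.norm(b - A @ x)
    if res > 10.0 * prev_res:                 # cheap sanity check on the residual
        print(f"iteration {k}: residual jumped from {prev_res:.2e} to {res:.2e}")
        break
    prev_res = res
```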

  12. Independent SCPS-TP development for fault-tolerant, end-to-end communication architectures

    NASA Astrophysics Data System (ADS)

    Edwards, E.; Lamorie, J.; Younghusband, D.; Brunet, C.; Hartman, L.

    2002-07-01

    A fully networked architecture provides for the distribution of computing elements, of all mission components, through the spacecraft. Each node is individually addressable through the network, and behaves as an independent entity. This level of communication also supports individualized Command and Data Handling (C&DH), as well as one-to-one transactions between spacecraft nodes and individual ground segment users. To be effective, fault-tolerance must be applied at the network data transport level, as well as the supporting layers below it. If the network provides fail-safe characteristics independent of the mission applications being executed, then developers need not build in their own systems to ensure network reliability. The Space Communications Protocol Standards (SCPS) were developed to provide robust communications in a space environment, while retaining compatibility with Internet data transport at the ground segment. Although SCPS is a standard of the Consultative Committee for Space Data Systems (CCSDS), the adoption of SCPS was initially delayed by US export regulations that prevented the distribution of reference code. This paper describes the development and test of a fully independent implementation of the SCPS Transport Protocol, SCPS-TP, which has been derived directly from the CCSDS specification. The performance of the protocol is described for a set of geostationary satellite tests, and these results are compared with those derived from network simulation and laboratory emulation. The work is placed in the context of a comprehensive, fault-tolerant network that potentially surpasses the failsafe performance of a traditional spacecraft control system under similar circumstances.

  13. Superconducting quantum circuits at the surface code threshold for fault tolerance.

    PubMed

    Barends, R; Kelly, J; Megrant, A; Veitia, A; Sank, D; Jeffrey, E; White, T C; Mutus, J; Fowler, A G; Campbell, B; Chen, Y; Chen, Z; Chiaro, B; Dunsworth, A; Neill, C; O'Malley, P; Roushan, P; Vainsencher, A; Wenner, J; Korotkov, A N; Cleland, A N; Martinis, John M

    2014-04-24

    A quantum computer can solve hard problems, such as prime factoring, database searching and quantum simulation, at the cost of needing to protect fragile quantum states from error. Quantum error correction provides this protection by distributing a logical state among many physical quantum bits (qubits) by means of quantum entanglement. Superconductivity is a useful phenomenon in this regard, because it allows the construction of large quantum circuits and is compatible with microfabrication. For superconducting qubits, the surface code approach to quantum computing is a natural choice for error correction, because it uses only nearest-neighbour coupling and rapidly cycled entangling gates. The gate fidelity requirements are modest: the per-step fidelity threshold is only about 99 per cent. Here we demonstrate a universal set of logic gates in a superconducting multi-qubit processor, achieving an average single-qubit gate fidelity of 99.92 per cent and a two-qubit gate fidelity of up to 99.4 per cent. This places Josephson quantum computing at the fault-tolerance threshold for surface code error correction. Our quantum processor is a first step towards the surface code, using five qubits arranged in a linear array with nearest-neighbour coupling. As a further demonstration, we construct a five-qubit Greenberger-Horne-Zeilinger state using the complete circuit and full set of gates. The results demonstrate that Josephson quantum computing is a high-fidelity technology, with a clear path to scaling up to large-scale, fault-tolerant quantum circuits. PMID:24759412

  14. ECFS: A decentralized, distributed and fault-tolerant FUSE filesystem for the LHCb online farm

    NASA Astrophysics Data System (ADS)

    Rybczynski, Tomasz; Bonaccorsi, Enrico; Neufeld, Niko

    2014-06-01

    The LHCb experiment records millions of proton collisions every second, but only a fraction of them are useful for LHCb physics. In order to filter out the "bad events", a large farm of x86 servers (~2000 nodes) has been put in place. These servers boot and run from NFS; however, they use their local disk to temporarily store data which cannot be processed in real time ("data-deferring"). These events are subsequently processed when no live data are coming in. The effective CPU power is thus greatly increased. This gain in CPU power depends critically on the availability of the local disks. For cost and power reasons, mirroring (RAID-1) is not used, leading to considerable operational headache with failing disks, disk errors and server failures induced by faulty disks. To mitigate these problems and increase the reliability of the LHCb farm, while at the same time keeping cost and power consumption low, an extensive study of existing highly available and distributed file systems has been done. While many distributed file systems provide reliability by "file replication", none of the evaluated ones supports erasure algorithms. A decentralised, distributed and fault-tolerant "write once read many" file system has been designed and implemented as a proof of concept, with the main goals of providing fault tolerance without relying on file replication techniques, which are expensive in terms of disk space, and of providing a unique namespace. This paper describes the design and the implementation of the Erasure Codes File System (ECFS) and presents the specialised FUSE interface for Linux. Depending on the encoding algorithm, ECFS uses a certain number of target directories as a backend to store the segments that compose the encoded data. When the target directories are mounted via nfs/autofs, ECFS acts as a file system over a network/block-level RAID spanning multiple servers.
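
    The space saving that motivates erasure coding over plain replication can be shown with the simplest possible code, a single XOR parity segment. ECFS itself uses richer erasure algorithms; the sketch below (plain Python, with hypothetical helper names) only conveys the principle that a lost segment can be rebuilt from the survivors.

```python
# k data segments plus one XOR parity segment survive the loss of any single segment.
import functools
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int):
    """Split data into k equal segments (zero-padded) and add one XOR parity segment."""
    seg_len = -(-len(data) // k)                                   # ceiling division
    segs = [data[i * seg_len:(i + 1) * seg_len].ljust(seg_len, b"\0") for i in range(k)]
    parity = functools.reduce(xor_bytes, segs)
    return segs, parity

def recover(segs, parity, lost_index):
    """Rebuild one missing data segment from the survivors and the parity segment."""
    survivors = [s for i, s in enumerate(segs) if i != lost_index]
    return functools.reduce(xor_bytes, survivors + [parity])

data = os.urandom(100)
segs, parity = encode(data, k=4)
assert recover(segs, parity, lost_index=2) == segs[2]
```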

  15. Reliable and Fault-Tolerant Software-Defined Network Operations Scheme for Remote 3D Printing

    NASA Astrophysics Data System (ADS)

    Kim, Dongkyun; Gil, Joon-Min

    2015-03-01

    The recent wide expansion of applicable three-dimensional (3D) printing and software-defined networking (SDN) technologies has led to a great deal of attention being focused on efficient remote control of manufacturing processes. SDN is a renowned paradigm for network softwarization, which has helped facilitate remote manufacturing in association with high network performance, since SDN is designed to control network paths and traffic flows, guaranteeing improved quality of services by obtaining network requests from end-applications on demand through the separated SDN controller or control plane. However, current SDN approaches are generally focused on the controls and automation of the networks, which indicates that there is a lack of management plane development designed for a reliable and fault-tolerant SDN environment. Therefore, in addition to the inherent advantage of SDN, this paper proposes a new software-defined network operations center (SD-NOC) architecture to strengthen the reliability and fault-tolerance of SDN in terms of network operations and management in particular. The cooperation and orchestration between SDN and SD-NOC are also introduced for the SDN failover processes based on four principal SDN breakdown scenarios derived from the failures of the controller, SDN nodes, and connected links. The abovementioned SDN troubles significantly reduce the network reachability to remote devices (e.g., 3D printers, super high-definition cameras, etc.) and the reliability of relevant control processes. Our performance consideration and analysis results show that the proposed scheme can shrink operations and management overheads of SDN, which leads to the enhancement of responsiveness and reliability of SDN for remote 3D printing and control processes.

  16. Implementation of a fault-tolerant PACS over a grid architecture

    NASA Astrophysics Data System (ADS)

    Gutierrez, Marco A.; Santos, Carlos S.; Moreno, Ramon A.; Kobayashi, Luiz O. M.; Furuie, Sergio S.; Freire, Sergio M.; Floriano, Daniel B.; Oliveira, Carlos S.; João, Mario, Jr.; Gismondi, Ronaldo C.

    2006-03-01

    The goal of this paper is to describe the experience of the Heart Institute (InCor) on the implementation of a fault-tolerant Picture Archiving and Communication System (PACS) over a data grid architecture. The system is centered on a DICOM image server with a distributed storage and failover capability. The proposed data grid architecture is deployed over a gigabit Ethernet network which integrates the two main public Hospitals in Sao Paulo and one University Hospital in Rio de Janeiro, all in Brazil. Distributed data storage in the three sites is managed by the Storage Resource Broker (SRB) developed at the University of California at San Diego. The architecture of the implemented PACS image server can be divided into two major functional modules: a) DICOM protocol handler; b) Distributed storage of image data. Fault-tolerance is achieved by injecting redundancy into the modules, which are provided with failover capability. The DICOM protocol handler comprises a series of server processes hosted by different machines and a load-balancer node which distributes the computational load among the servers. The load balancer is provided with a backup node which is triggered in case of failure, thus assuring the continuous operation of the system. Distributed storage of image data is implemented as a thin software layer over the SRB. Image data are replicated at the three sites, so the PACS server is able to retrieve image data even when only a single site is available. A prototype of the DICOM image server has been deployed in this environment and is currently under evaluation.

  17. Non-Linear Dynamics of Saturn's Rings

    NASA Astrophysics Data System (ADS)

    Esposito, L. W.

    2015-12-01

    Non-linear processes can explain why Saturn's rings are so active and dynamic. Some of this non-linearity is captured in a simple Predator-Prey Model: Periodic forcing from the moon causes streamline crowding; this damps the relative velocity and allows aggregates to grow. About a quarter phase later, the aggregates stir the system to higher relative velocity and the limit cycle repeats each orbit, with relative velocity ranging from nearly zero to a multiple of the orbit average: 2-10x is possible. Summary of Halo Results: A predator-prey model for ring dynamics produces transient structures like 'straw' that can explain the halo structure and spectroscopy: Cyclic velocity changes cause perturbed regions to reach higher collision speeds at some orbital phases, which preferentially removes small regolith particles; surrounding particles diffuse back too slowly to erase the effect: this gives the halo morphology; this requires energetic collisions (v ≈ 10 m/sec, with throw distances about 200 km, implying objects of scale R ≈ 20 km); we propose 'straw', as observed by Cassini cameras. Transform to Duffing Equation: With the coordinate transformation z = M^(2/3), the Predator-Prey equations can be combined to form a single second-order differential equation with harmonic resonance forcing. Ring dynamics and history implications: Moon-triggered clumping at perturbed regions in Saturn's rings creates both high velocity dispersion and large aggregates at these distances, explaining both small and large particles observed there. This confirms the triple architecture of ring particles: a broad size distribution of particles; these aggregate into temporary rubble piles; coated by a regolith of dust. We calculate the stationary size distribution using a cell-to-cell mapping procedure that converts the phase-plane trajectories to a Markov chain. Approximating the Markov chain as an asymmetric random walk with reflecting boundaries allows us to determine the power law index from
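
    The reduction mentioned above can be indicated schematically. Written below is only the canonical form of the forced Duffing oscillator; the specific coefficients that result from combining the predator-prey equations under z = M^(2/3) belong to the paper's model and are not derived here.

```latex
% Canonical forced Duffing oscillator (schematic form only; the coefficients
% \delta, \alpha, \beta, F and the forcing frequency \Omega follow from the
% paper's predator-prey model and are not reproduced here).
\ddot{z} + \delta\,\dot{z} + \alpha\, z + \beta\, z^{3} = F\cos(\Omega t),
\qquad z \equiv M^{2/3}.
```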

  18. Verification of fault-tolerant clock synchronization systems. M.S. Thesis - College of William and Mary, 1992

    NASA Technical Reports Server (NTRS)

    Miner, Paul S.

    1993-01-01

    A critical function in a fault-tolerant computer architecture is the synchronization of the redundant computing elements. The synchronization algorithm must include safeguards to ensure that failed components do not corrupt the behavior of good clocks. Reasoning about fault-tolerant clock synchronization is difficult because of the possibility of subtle interactions involving failed components. Therefore, mechanical proof systems are used to ensure that the verification of the synchronization system is correct. In 1987, Schneider presented a general proof of correctness for several fault-tolerant clock synchronization algorithms. Subsequently, Shankar verified Schneider's proof by using the mechanical proof system EHDM. This proof ensures that any system satisfying its underlying assumptions will provide Byzantine fault-tolerant clock synchronization. The utility of Shankar's mechanization of Schneider's theory for the verification of clock synchronization systems is explored. Some limitations of Shankar's mechanically verified theory were encountered. With minor modifications to the theory, a mechanically checked proof is provided that removes these limitations. The revised theory also allows for proven recovery from transient faults. Use of the revised theory is illustrated with the verification of an abstract design of a clock synchronization system.

  19. Sliding mode fault detection and fault-tolerant control of smart dampers in semi-active control of building structures

    NASA Astrophysics Data System (ADS)

    Yeganeh Fallah, Arash; Taghikhany, Touraj

    2015-12-01

    Recent decades have witnessed much interest in the application of active and semi-active control strategies for seismic protection of civil infrastructures. However, the reliability of these systems is still in doubt as there remains the possibility of malfunctioning of their critical components (i.e. actuators and sensors) during an earthquake. This paper focuses on the application of the sliding mode method due to the inherent robustness of its fault detection observer and fault-tolerant control. The robust sliding mode observer estimates the state of the system and reconstructs the actuators’ faults which are used for calculating a fault distribution matrix. Then the fault-tolerant sliding mode controller reconfigures itself by the fault distribution matrix and accommodates the fault effect on the system. Numerical simulation of a three-story structure with magneto-rheological dampers demonstrates the effectiveness of the proposed fault-tolerant control system. It was shown that the fault-tolerant control system maintains the performance of the structure at an acceptable level in the post-fault case.

  20. An improved fault-tolerant control scheme for PWM inverter-fed induction motor-based EVs.

    PubMed

    Tabbache, Bekheïra; Benbouzid, Mohamed; Kheloui, Abdelaziz; Bourgeot, Jean-Matthieu; Mamoune, Abdeslam

    2013-11-01

    This paper proposes an improved fault-tolerant control scheme for PWM inverter-fed induction motor-based electric vehicles. The proposed strategy deals with the mitigation of power switch (IGBT) failures within a reconfigurable induction motor control. To increase the vehicle powertrain reliability regarding IGBT open-circuit failures, 4-wire and 4-leg PWM inverter topologies are investigated and their performances discussed in a vehicle context. The proposed fault-tolerant topologies require only minimum hardware modifications to the conventional off-the-shelf six-switch three-phase drive, mitigating IGBT failures through specific inverter control. Indeed, the two topologies exploit the induction motor neutral accessibility for fault-tolerant purposes. The 4-wire topology then uses classical hysteresis controllers to accommodate the IGBT failures. The 4-leg topology, meanwhile, uses a specific 3D space vector PWM to handle vehicle requirements in terms of size (DC bus capacitors) and cost (IGBT count). Experiments on an induction motor drive and simulations on an electric vehicle are carried out using a European urban driving cycle to show that the proposed fault-tolerant control approach is effective and provides a simple configuration with high performance in terms of speed and torque responses. PMID:23916869

  1. Non-linear Flood Risk Assessment

    NASA Astrophysics Data System (ADS)

    Mazzarella, A.

    The genesis of floods is complex, depending on hydrologic, meteorological and evapo-transpirative factors that are linked in a non-linear way with numerous feedback processes. Cantor-dust analysis and rank-ordering statistics supply a proper framework for identifying a kind of non-linear order in the time succession of flood events and so provide a basis for their prediction. When a catalogue is analysed, it is necessary to test its completeness with respect to the size of the recorded events; results obtained from catalogues that do not undergo such a test are suspect and possibly wrong, or at least unreliable. Floods have no instrumentally determined magnitude scale, like that conventionally used for earthquakes, which is why they are generally described in qualitative terms. For this reason, a semi-quantitative index called the ASI (Alluvial Strength Index) has been developed here that combines attributes of the alluvial triggering mechanisms and of their effects on the territorial and hydraulic system. The historical successions of alluvial events in the upper valley of the Po river (Northern Italy), the middle valley of the Calore river (Southern Italy) and at Sarno, near Naples, have been accurately reconstructed on the basis of old documents and classified according to their ASI. The catalogues were verified to be complete only for events classified at least as moderate, probably because many of the lowest-energy events, especially in the past, escaped detection. The identification of scale invariance in the time clustering of alluvial events, on both short and long time scales, even if indicative of the complexity of their genesis, might be very helpful for the assessment and reduction of the hazard of future disasters. For example, on the basis of the results of the rank-ordering statistics, the most probable occurrence of an alluvial event at Sarno, classified at least as strong, is predicted to occur

  2. Detecting non-linearities in neuro-electrical signals: A study of synchronous local field potentials

    NASA Astrophysics Data System (ADS)

    Müller-Gerking, Johannes; Martinerie, Jacques; Neuenschwander, Sergio; Pezard, Laurent; Renault, Bernard; Varela, Francisco J.

    The presence and detectability of non-linear dynamics and possibly low-dimensional chaos in the brain is still an open question, with recent results indicating that initial claims for low dimensionality were faulted by incomplete statistical testing. To make some progress on this question, our approach was to use stringent data analysis of precisely controlled and behaviorally significant neuroelectric data. There are strong indications that functional brain activity is correlated with synchronous local field potentials. We examine here such synchronous episodes in data recorded from the visual system of behaving cats and pigeons. Our purpose was to examine under these ideal conditions whether the time series showed any evidence of non-linearity concomitantly with the onset of synchrony. To test for non-linearity we have used surrogate sets for non-linear forecasting, the false nearest strands method, and an examination of deterministic vs stochastic modeling. Our results indicate that the time series under examination do show evidence for traces of non-linear dynamics, but only weakly, since they are not robust under changes of parameters. We conclude that low-dimensional chaos is unlikely to be found in the brain, and that a robust detection and characterization of higher-dimensional non-linear dynamics is beyond the reach of current analytical tools.
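
    One surrogate-based test of the kind named above can be sketched compactly: phase-randomized surrogates preserve the linear (spectral) structure of a series while destroying non-linear structure, so a non-linear statistic computed on the data is compared against its distribution over surrogates. The statistic and test series below are simple stand-ins (a time-reversal asymmetry measure and a logistic-map series), not the forecasting and false-nearest-strands methods of the paper.

```python
# Surrogate-data test for non-linearity: statistic on data vs. phase-randomized surrogates.
import numpy as np

rng = np.random.default_rng(1)

def surrogate(x):
    """Phase-randomized surrogate: same power spectrum, linearized dynamics."""
    X = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=X.size)
    phases[0] = 0.0           # keep the mean
    phases[-1] = 0.0          # keep the Nyquist component real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=x.size)

def rev_asymmetry(x):
    """Time-reversal asymmetry E[(x_t - x_{t-1})^3]; ~0 for linear Gaussian series."""
    d = np.diff(x)
    return np.mean(d ** 3)

# A deliberately non-linear test series: the fully chaotic logistic map.
x = np.empty(4096)
x[0] = 0.3
for t in range(1, x.size):
    x[t] = 4.0 * x[t - 1] * (1.0 - x[t - 1])

stat = rev_asymmetry(x)
null = np.array([rev_asymmetry(surrogate(x)) for _ in range(99)])
lo_q, hi_q = np.percentile(null, [2.5, 97.5])
print(f"data: {stat:.4f}   surrogate 95% interval: [{lo_q:.4f}, {hi_q:.4f}]")
```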

  3. Non-linear electrohydrodynamics in microfluidic devices.

    PubMed

    Zeng, Jun

    2011-01-01

    Since the inception of microfluidics, the electric force has been exploited as one of the leading mechanisms for driving and controlling the movement of the operating fluid and the charged suspensions. Electric force has an intrinsic advantage in miniaturized devices. Because the electrodes are placed over a small distance, from sub-millimeter to a few microns, a very high electric field is easy to obtain. The electric force can be highly localized as its strength rapidly decays away from the peak. This makes the electric force an ideal candidate for precise spatial control. The geometry and placement of the electrodes can be used to design electric fields of varying distributions, which can be readily realized by Micro-Electro-Mechanical Systems (MEMS) fabrication methods. In this paper, we examine several electrically driven liquid handling operations. The emphasis is given to non-linear electrohydrodynamic effects. We discuss the theoretical treatment and related numerical methods. Modeling and simulations are used to unveil the associated electrohydrodynamic phenomena. The modeling based investigation is interwoven with examples of microfluidic devices to illustrate the applications. PMID:21673912

  4. Non-Linear Electrohydrodynamics in Microfluidic Devices

    PubMed Central

    Zeng, Jun

    2011-01-01

    Since the inception of microfluidics, the electric force has been exploited as one of the leading mechanisms for driving and controlling the movement of the operating fluid and the charged suspensions. Electric force has an intrinsic advantage in miniaturized devices. Because the electrodes are placed over a small distance, from sub-millimeter to a few microns, a very high electric field is easy to obtain. The electric force can be highly localized as its strength rapidly decays away from the peak. This makes the electric force an ideal candidate for precise spatial control. The geometry and placement of the electrodes can be used to design electric fields of varying distributions, which can be readily realized by Micro-Electro-Mechanical Systems (MEMS) fabrication methods. In this paper, we examine several electrically driven liquid handling operations. The emphasis is given to non-linear electrohydrodynamic effects. We discuss the theoretical treatment and related numerical methods. Modeling and simulations are used to unveil the associated electrohydrodynamic phenomena. The modeling based investigation is interwoven with examples of microfluidic devices to illustrate the applications. PMID:21673912

  5. Cooperation-induced topological complexity: a promising road to fault tolerance and hebbian learning.

    PubMed

    Turalska, Malgorzata; Geneston, Elvis; West, Bruce J; Allegrini, Paolo; Grigolini, Paolo

    2012-01-01

    According to an increasing number of researchers, intelligence emerges from criticality as a consequence of locality breakdown and long-range correlation, well known properties of phase transition processes. We study a model of interacting units, as an idealization of real cooperative systems such as the brain or a flock of birds, for the purpose of discussing the emergence of long-range correlation from the coupling of any unit with its nearest neighbors. We focus on the critical condition that has been recently shown to maximize information transport and we study the topological structure of the network of dynamically linked nodes. Although the topology of this network depends on the arbitrary choice of correlation threshold, namely the correlation intensity selected to establish a link between two nodes, the numerical calculations of this paper afford some important indications on the dynamically induced topology. The first important property is the emergence of a perception length as large as the flock size, thanks to some nodes with a large number of links, thus playing the leadership role. All the units are equivalent and leadership moves in time from one set of nodes to another, thereby ensuring fault tolerance. Then we focus on the correlation threshold generating a scale-free topology with power index ν ≈ 1 and we find that if this topological structure is selected to establish consensus through the linked nodes, the control parameter necessary to generate criticality is close to the critical value corresponding to the all-to-all coupling condition. We find that criticality in this case generates also a third state, corresponding to a total lack of consensus. However, we make a numerical analysis of the dynamically induced network, and we find that it consists of two almost independent structures, each of which is equivalent to a network in the all-to-all coupling condition. This observation confirms that cooperation makes the system evolve toward
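
    The dependence of the dynamically induced topology on the correlation threshold can be illustrated with a minimal stand-in: correlated time series are generated by a shared driver with heterogeneous coupling strengths, pairs are linked when their correlation exceeds the threshold, and the resulting degrees are examined. This is only a sketch of the construction, not the cooperative model studied in the paper.

```python
# Build a network from a correlation matrix by thresholding, then inspect the degrees.
import numpy as np

rng = np.random.default_rng(2)
N, T = 100, 2000
driver = rng.normal(size=T)                         # shared driving signal (stand-in)
coupling = rng.uniform(0.0, 1.0, size=N)            # heterogeneous coupling strengths
signals = coupling[:, None] * driver + rng.normal(size=(N, T))

corr = np.corrcoef(signals)                          # N x N correlation matrix
for threshold in (0.2, 0.4, 0.6):
    adj = (np.abs(corr) >= threshold) & ~np.eye(N, dtype=bool)
    degrees = adj.sum(axis=1)
    print(f"threshold {threshold}: mean degree {degrees.mean():.1f}, max degree {degrees.max()}")
```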

  6. On mapping algorithms to linear and fault-tolerant systolic arrays

    SciTech Connect

    Kumar, V.K.P.; Tsai, Y.C.

    1989-03-01

    The authors develop a simple mapping technique to design systolic arrays with limited I/O capability. Using this, improved systolic algorithms are derived for some matrix computations, on linearly connected arrays of processing elements (PEs) with constant I/O bandwidth. The important features of these designs are modularity with constant hardware in each PE, few control lines, simple data input/output format, and improved delay time. They extend the technique to design an optimal-time systolic algorithm for n x n matrix multiplication. In this model, the propagation delay is assumed to be proportional to wire length. Fault reconfiguration is achieved by using buffers to bypass faulty PEs, which does not affect the clock rate of the system. The unidirectional flow of control and data in our design assures correctness of the algorithm in the presence of faulty PEs. This design can be implemented on reconfigurable fault-tolerant VLSI arrays using the Diogenes methodology. The authors compare their designs to those in the literature and show them to be superior with respect to I/O format, control, and delay from input to output.

  7. Advanced Information Processing System (AIPS)-based fault tolerant avionics architecture for launch vehicles

    NASA Technical Reports Server (NTRS)

    Lala, Jaynarayan H.; Harper, Richard E.; Jaskowiak, Kenneth R.; Rosch, Gene; Alger, Linda S.; Schor, Andrei L.

    1990-01-01

    An avionics architecture for the advanced launch system (ALS) that uses validated hardware and software building blocks developed under the advanced information processing system program is presented. The AIPS for ALS architecture defined is preliminary, and reliability requirements can be met by the AIPS hardware and software building blocks that are built using the state-of-the-art technology available in the 1992-93 time frame. The level of detail in the architecture definition reflects the level of detail available in the ALS requirements. As the avionics requirements are refined, the architecture can also be refined and defined in greater detail with the help of analysis and simulation tools. A useful methodology is demonstrated for investigating the impact of the avionics suite on the recurring cost of the ALS. It is shown that allowing the vehicle to launch with selected detected failures can potentially reduce the recurring launch costs. A comparative analysis shows that validated fault-tolerant avionics built out of Class B parts can result in lower life-cycle cost in comparison to simplex avionics built out of Class S parts or other redundant architectures.

  8. Advanced Information Processing System (AIPS)-based fault tolerant avionics architecture for launch vehicles

    NASA Astrophysics Data System (ADS)

    Lala, Jaynarayan H.; Harper, Richard E.; Jaskowiak, Kenneth R.; Rosch, Gene; Alger, Linda S.; Schor, Andrei L.

    An avionics architecture for the advanced launch system (ALS) that uses validated hardware and software building blocks developed under the advanced information processing system program is presented. The AIPS for ALS architecture defined is preliminary, and reliability requirements can be met by the AIPS hardware and software building blocks that are built using the state-of-the-art technology available in the 1992-93 time frame. The level of detail in the architecture definition reflects the level of detail available in the ALS requirements. As the avionics requirements are refined, the architecture can also be refined and defined in greater detail with the help of analysis and simulation tools. A useful methodology is demonstrated for investigating the impact of the avionics suite on the recurring cost of the ALS. It is shown that allowing the vehicle to launch with selected detected failures can potentially reduce the recurring launch costs. A comparative analysis shows that validated fault-tolerant avionics built out of Class B parts can result in lower life-cycle cost in comparison to simplex avionics built out of Class S parts or other redundant architectures.

  9. Reverse Computation for Rollback-based Fault Tolerance in Large Parallel Systems

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J

    2013-01-01

    Reverse computation is presented here as an important future direction in addressing the challenge of fault tolerant execution on very large cluster platforms for parallel computing. As the scale of parallel jobs increases, traditional checkpointing approaches suffer scalability problems ranging from computational slowdowns to high congestion at the persistent stores for checkpoints. Reverse computation can overcome such problems and is also better suited for parallel computing on newer architectures with smaller, cheaper or energy-efficient memories and file systems. Initial evidence for the feasibility of reverse computation in large systems is presented with detailed performance data from a particle simulation scaling to 65,536 processor cores and 950 accelerators (GPUs). Reverse computation is observed to deliver very large gains relative to checkpointing schemes when nodes rely on their host processors/memory to tolerate faults at their accelerators. A comparison between reverse computation and checkpointing with measurements such as cache miss ratios, TLB misses and memory usage indicates that reverse computation is hard to ignore as a future alternative to be pursued in emerging architectures.
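
    The core idea can be shown with a trivially reversible update: instead of saving the state before a step (checkpointing), the step is algebraically inverted to roll the state back after a detected fault. The leapfrog push below is only an illustration of reversibility, not the paper's particle simulator.

```python
# Reverse computation for a reversible update: undo a leapfrog step instead of restoring a checkpoint.
import numpy as np

def leapfrog_forward(x, v, dt, force):
    v = v + dt * force(x)
    x = x + dt * v
    return x, v

def leapfrog_reverse(x, v, dt, force):
    x = x - dt * v                     # undo the position update
    v = v - dt * force(x)              # then undo the velocity update
    return x, v

force = lambda x: -x                   # harmonic force (stand-in)
x, v = np.array([1.0, 0.5]), np.array([0.0, -0.2])
x1, v1 = leapfrog_forward(x, v, dt=0.01, force=force)

# Suppose a fault is detected after the step: recover by reversing it.
x0, v0 = leapfrog_reverse(x1, v1, dt=0.01, force=force)
assert np.allclose(x0, x) and np.allclose(v0, v)
```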

  10. Model Checking a Byzantine-Fault-Tolerant Self-Stabilizing Protocol for Distributed Clock Synchronization Systems

    NASA Technical Reports Server (NTRS)

    Malekpour, Mahyar R.

    2007-01-01

    This report presents the mechanical verification of a simplified model of a rapid Byzantine-fault-tolerant self-stabilizing protocol for distributed clock synchronization systems. This protocol does not rely on any assumptions about the initial state of the system. This protocol tolerates bursts of transient failures, and deterministically converges within a time bound that is a linear function of the self-stabilization period. A simplified model of the protocol is verified using the Symbolic Model Verifier (SMV) [SMV]. The system under study consists of 4 nodes, where at most one of the nodes is assumed to be Byzantine faulty. The model checking effort is focused on verifying correctness of the simplified model of the protocol in the presence of a permanent Byzantine fault as well as confirmation of claims of determinism and linear convergence with respect to the self-stabilization period. Although model checking results of the simplified model of the protocol confirm the theoretical predictions, these results do not necessarily confirm that the protocol solves the general case of this problem. Modeling challenges of the protocol and the system are addressed. A number of abstractions are utilized in order to reduce the state space. Also, additional innovative state space reduction techniques are introduced that can be used in future verification efforts applied to this and other protocols.

  11. Fault-tolerant corrector/detector chip for high-speed data processing

    DOEpatents

    Andaleon, D.D.; Napolitano, L.M. Jr.; Redinbo, G.R.; Shreeve, W.O.

    1994-03-01

    An internally fault-tolerant data error detection and correction integrated circuit device and a method of operating same are described. The device functions as a bidirectional data buffer between a 32-bit data processor and the remainder of a data processing system and provides a 32-bit datum with a relatively short eight bits of data-protecting parity. The 32 bits of data and eight bits of parity are partitioned into eight 4-bit nibbles and two 4-bit nibbles, respectively. For data flowing towards the processor the data and parity nibbles are checked in parallel and in a single operation employing a dual orthogonal basis technique. The dual orthogonal basis increases the efficiency of the implementation. Any one of the ten (eight data, two parity) nibbles is correctable if erroneous, or two different erroneous nibbles are detectable. For data flowing away from the processor the appropriate parity nibble values are calculated and transmitted to the system along with the data. The device regenerates parity values for data flowing in either direction and compares regenerated to generated parity with a totally self-checking equality checker. As such, the device is self-validating and enabled to both detect and indicate an occurrence of an internal failure. A generalization of the device to protect 64-bit data with 16-bit parity to protect against byte-wide errors is also presented. 8 figures.
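
    The 32-bit-data / 8-bit-parity layout can be sketched in a few lines. The sketch below uses plain nibble-wise XOR parity, which can only detect a corrupted nibble; the patented device instead uses a dual-orthogonal-basis code that corrects any single erroneous nibble, and that code is not reproduced here.

```python
# Data layout sketch: eight 4-bit data nibbles protected by two 4-bit XOR parity nibbles.
def nibbles(word32):
    return [(word32 >> (4 * i)) & 0xF for i in range(8)]

def parity_nibbles(word32):
    """One parity nibble over the even-indexed nibbles, one over the odd-indexed ones."""
    p_even, p_odd = 0, 0
    for i, n in enumerate(nibbles(word32)):
        if i % 2 == 0:
            p_even ^= n
        else:
            p_odd ^= n
    return p_even, p_odd

def check(word32, p_even, p_odd):
    return parity_nibbles(word32) == (p_even, p_odd)

word = 0xDEADBEEF
p = parity_nibbles(word)
assert check(word, *p)
corrupted = word ^ (0x3 << 8)            # flip two bits inside one nibble
assert not check(corrupted, *p)          # the corruption is detected (not corrected)
```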

  12. Fault-tolerant corrector/detector chip for high-speed data processing

    DOEpatents

    Andaleon, David D.; Napolitano, Jr., Leonard M.; Redinbo, G. Robert; Shreeve, William O.

    1994-01-01

    An internally fault-tolerant data error detection and correction integrated circuit device (10) and a method of operating same are described. The device functions as a bidirectional data buffer between a 32-bit data processor and the remainder of a data processing system and provides a 32-bit datum with a relatively short eight bits of data-protecting parity. The 32 bits of data and eight bits of parity are partitioned into eight 4-bit nibbles and two 4-bit nibbles, respectively. For data flowing towards the processor the data and parity nibbles are checked in parallel and in a single operation employing a dual orthogonal basis technique. The dual orthogonal basis increases the efficiency of the implementation. Any one of the ten (eight data, two parity) nibbles is correctable if erroneous, or two different erroneous nibbles are detectable. For data flowing away from the processor the appropriate parity nibble values are calculated and transmitted to the system along with the data. The device regenerates parity values for data flowing in either direction and compares regenerated to generated parity with a totally self-checking equality checker. As such, the device is self-validating and enabled to both detect and indicate an occurrence of an internal failure. A generalization of the device to protect 64-bit data with 16-bit parity to protect against byte-wide errors is also presented.

  13. Extreme temperature robust optical sensor designs and fault-tolerant signal processing

    DOEpatents

    Riza, Nabeel Agha; Perez, Frank

    2012-01-17

    Silicon Carbide (SiC) probe designs for extreme temperature and pressure sensing uses a single crystal SiC optical chip encased in a sintered SiC material probe. The SiC chip may be protected for high temperature only use or exposed for both temperature and pressure sensing. Hybrid signal processing techniques allow fault-tolerant extreme temperature sensing. Wavelength peak-to-peak (or null-to-null) collective spectrum spread measurement to detect wavelength peak/null shift measurement forms a coarse-fine temperature measurement using broadband spectrum monitoring. The SiC probe frontend acts as a stable emissivity Black-body radiator and monitoring the shift in radiation spectrum enables a pyrometer. This application combines all-SiC pyrometry with thick SiC etalon laser interferometry within a free-spectral range to form a coarse-fine temperature measurement sensor. RF notch filtering techniques improve the sensitivity of the temperature measurement where fine spectral shift or spectrum measurements are needed to deduce temperature.

  14. The BTeV DAQ and Trigger System - Some throughput, usability and fault tolerance aspects

    SciTech Connect

    Erik Edward Gottschalk et al.

    2001-08-20

    As presented at the last CHEP conference, the BTeV triggering and data collection pose a significant challenge in construction and operation, generating 1.5 Terabytes/second of raw data from over 30 million detector channels. We report on facets of the DAQ and trigger farms. We report on the current design of the DAQ, especially its partitioning features to support commissioning of the detector. We are exploring collaborations with computer science groups experienced in fault tolerant and dynamic real-time and embedded systems to develop a system to provide the extreme flexibility and high availability required of the heterogeneous trigger farm (approximately ten thousand DSPs and commodity processors). We describe directions in the following areas: system modeling and analysis using the Model Integrated Computing approach to assist in the creation of domain-specific modeling, analysis, and program synthesis environments for building complex, large-scale computer-based systems; System Configuration Management to include compilable design specifications for configurable hardware components, schedules, and communication maps; Runtime Environment and Hierarchical Fault Detection/Management--a system-wide infrastructure for rapidly detecting, isolating, filtering, and reporting faults which will be encapsulated in intelligent active entities (agents) to run on DSPs, L2/3 processors, and other supporting processors throughout the system.

  15. Intelligent fault-tolerant control for swing-arm system in the space-borne spectrograph

    NASA Astrophysics Data System (ADS)

    Shi, Yufeng; Zhou, Chunjie; Huang, Xiongfeng; Yin, Quan

    2012-04-01

    Fault-tolerant control (FTC) for space-borne equipment is very important in engineering design. This paper presents a two-layer intelligent FTC approach to handle the speed stability problem in a swing-arm system suffering from various faults in space. This approach provides reliable FTC at the performance level, and improves the control flow error detection capability at the code level. Faults degrading the system performance are detected by a performance-based fault detection mechanism. The detected faults are categorized as anticipated or unanticipated faults by the fault bank. A neural network is used as an on-line estimator to approximate the unanticipated faults. Compensation control and intelligent integral sliding mode control are employed to accommodate the two types of faults at the performance level, respectively. To guarantee the reliability of the FTC at the code level, the key parts of the program code are instrumented with control flow checking by software signatures (CFCSS) to detect control flow errors caused by single-event upsets. Meanwhile, some of the undetected control flow errors can be detected by the FTC at the performance level. The FTC for anticipated and unanticipated faults is verified in Synopsys Saber, and the detection of control flow errors is tested in the DSP controller. Simulation results demonstrate the efficiency of the novel FTC approach.

  16. Fault tolerant cooperative control for UAV rendezvous problem subject to actuator faults

    NASA Astrophysics Data System (ADS)

    Jiang, T.; Meskin, N.; Sobhani-Tehrani, E.; Khorasani, K.; Rabbath, C. A.

    2007-04-01

    This paper investigates the problem of fault-tolerant cooperative control for UAV rendezvous, in which multiple UAVs are required to arrive at their designated target despite the presence of a fault in the thruster of any UAV. An integrated hierarchical scheme is proposed and developed that consists of a cooperative rendezvous planning algorithm at the team level and a nonlinear fault detection and isolation (FDI) subsystem at each individual UAV's actuator/sensor level. Furthermore, a rendezvous re-planning strategy is developed that interfaces the rendezvous planning algorithm with the low-level FDI. A nonlinear geometric approach is used for the FDI subsystem that can detect and isolate faults in various UAV actuators including thrusters and control surfaces. The developed scheme is implemented for a rendezvous scenario with three Aerosonde UAVs, a single target, and the presence of a priori known threats. Simulation results reveal the effectiveness of our proposed scheme in fulfilling the rendezvous mission objective, specified as a successful intercept of the Aerosondes at their designated target, despite a severe loss of effectiveness in the Aerosondes' engine thrusters.

  17. Multiversion software reliability through fault-avoidance and fault-tolerance

    NASA Technical Reports Server (NTRS)

    Vouk, Mladen A.; Mcallister, David F.

    1990-01-01

    In this project we have proposed to investigate a number of experimental and theoretical issues associated with the practical use of multi-version software in providing dependable software through fault-avoidance and fault-elimination, as well as run-time tolerance of software faults. In the period reported here we have been working on the following: We have continued collection of data on the relationships between software faults and reliability, and the coverage provided by the testing process as measured by different metrics (including data flow metrics). We continued work on software reliability estimation methods based on non-random sampling, and the relationship between software reliability and code coverage provided through testing. We have continued studying back-to-back testing as an efficient mechanism for removal of uncorrelated faults, and common-cause faults of variable span. We have also been studying back-to-back testing as a tool for improvement of the software change process, including regression testing. We continued investigating existing fault-tolerance models and worked on formulating new ones. In particular, we have partly finished evaluation of Consensus Voting in the presence of correlated failures, and are in the process of finishing evaluation of Consensus Recovery Block (CRB) under failure correlation. We find both approaches far superior to commonly employed fixed agreement number voting (usually majority voting). We have also finished a cost analysis of the CRB approach.

  18. Robust fault-tolerant tracking control design for spacecraft under control input saturation.

    PubMed

    Bustan, Danyal; Pariz, Naser; Sani, Seyyed Kamal Hosseini

    2014-07-01

    In this paper, a continuous globally stable tracking control algorithm is proposed for a spacecraft in the presence of unknown actuator failure, control input saturation, uncertainty in inertial matrix and external disturbances. The design method is based on variable structure control and has the following properties: (1) fast and accurate response in the presence of bounded disturbances; (2) robust to the partial loss of actuator effectiveness; (3) explicit consideration of control input saturation; and (4) robust to uncertainty in inertial matrix. In contrast to traditional fault-tolerant control methods, the proposed controller does not require knowledge of the actuator faults and is implemented without explicit fault detection and isolation processes. In the proposed controller a single parameter is adjusted dynamically in such a way that it is possible to prove that both attitude and angular velocity errors will tend to zero asymptotically. The stability proof is based on a Lyapunov analysis and the properties of the singularity free quaternion representation of spacecraft dynamics. Results of numerical simulations state that the proposed controller is successful in achieving high attitude performance in the presence of external disturbances, actuator failures, and control input saturation. PMID:24751476

  19. A Novel N-Input Voting Algorithm for X-by-Wire Fault-Tolerant Systems

    PubMed Central

    Karimi, Abbas; Zarafshan, Faraneh; Al-Haddad, S. A. R.; Ramli, Abdul Rahman

    2014-01-01

    Voting is an important operation in the multichannel computation paradigm and in the realization of ultrareliable real-time control systems, arbitrating among the results of N redundant variants. These systems include N-modular redundant (NMR) hardware systems and diversely designed software systems based on N-version programming (NVP). Depending on the characteristics of the application and the type of selected voter, the voting algorithms can be implemented for either hardware or software systems. In this paper, a novel voting algorithm is introduced for real-time fault-tolerant control systems, appropriate for applications in which N is large. Its behavior has then been implemented in software and evaluated under different scenarios of error injection on the system inputs. The results of these evaluations, analyzed through plots and statistical computations, demonstrate that this novel algorithm does not have the limitations of some popular voting algorithms such as median and weighted; moreover, it is able to significantly increase the reliability and availability of the system in the best case to 2489.7% and 626.74%, respectively, and in the worst case to 3.84% and 1.55%, respectively. PMID:25386613
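
    The abstract does not specify the novel voter itself, so the sketch below only implements the two baseline software voters it is compared against, median voting and an inexact weighted-average voter, for N redundant inputs; the tolerance and readings are assumed values.

```python
# Two classical software voters for N redundant module outputs.
import statistics

def median_voter(inputs):
    """Return the median of the N redundant module outputs."""
    return statistics.median(inputs)

def weighted_average_voter(inputs, eps=0.1):
    """Weight each input by how many other inputs agree with it within eps."""
    weights = [sum(abs(x - y) <= eps for y in inputs) for x in inputs]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, inputs)) / total

readings = [10.02, 10.01, 9.99, 13.7, 10.00]     # one faulty channel
print(median_voter(readings), weighted_average_voter(readings))
```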

  20. An Autonomous Self-Aware and Adaptive Fault Tolerant Routing Technique for Wireless Sensor Networks.

    PubMed

    Abba, Sani; Lee, Jeong-A

    2015-01-01

    We propose an autonomous self-aware and adaptive fault-tolerant routing technique (ASAART) for wireless sensor networks. We address the limitations of self-healing routing (SHR) and self-selective routing (SSR) techniques for routing sensor data. We also examine the integration of autonomic self-aware and adaptive fault detection and resiliency techniques for route formation and route repair to provide resilience to errors and failures. We achieved this by using a combined continuous and slotted prioritized transmission back-off delay to obtain local and global network state information, as well as multiple random functions for attaining faster routing convergence and reliable route repair despite transient and permanent node failure rates and efficient adaptation to instantaneous network topology changes. The results of simulations based on a comparison of the ASAART with the SHR and SSR protocols for five different simulated scenarios in the presence of transient and permanent node failure rates exhibit a greater resiliency to errors and failure and better routing performance in terms of the number of successfully delivered network packets, end-to-end delay, delivered MAC layer packets, packet error rate, as well as efficient energy conservation in a highly congested, faulty, and scalable sensor network. PMID:26295236

  1. Improved fault tolerance of Turbo decoding based on optimized index assignments

    NASA Astrophysics Data System (ADS)

    Geldmacher, J.; Götze, J.

    2014-11-01

    This paper investigates the impact of an error-prone buffer memory on a channel decoder as employed in modern digital communication systems. On the one hand, this work is motivated by the fact that energy-efficient decoder implementations may not only be achieved by optimizations at the algorithmic level, but also by chip-level modifications. One such modification is so-called aggressive voltage scaling of buffer memories, which, while achieving reduced power consumption, also injects errors into the likelihood values used during the decoding process. On the other hand, it has been recognized that the ongoing increase of integration density with smaller structures makes integrated circuits more sensitive to process variations during manufacturing, and to voltage and temperature variations. This may lead to a paradigm shift from 100%-reliable operation to fault-tolerant signal processing. Both reasons motivate a discussion of the required co-design of algorithms and underlying circuits. For an error-prone receive buffer of a Turbo decoder, the influence of quantizer design and index assignment on the error resilience of the decoding algorithm is discussed. It is shown that a suitable design of both enables compensation of hardware-induced bit errors at rates of up to 1% without increasing the computational complexity of the decoder.
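
    The role of the index assignment can be illustrated with a toy experiment: a soft value is quantized to a 4-bit index, one stored bit is flipped, and the error in the value read back is averaged. Natural binary is compared against an arbitrary (random) assignment; the optimized assignments and quantizer design of the paper are not reproduced here, and the grid and distributions below are assumptions.

```python
# Effect of the index assignment on the readback error caused by a single stored-bit flip.
import numpy as np

rng = np.random.default_rng(3)
levels = 16
grid = np.linspace(-8.0, 8.0, levels)                 # LLR reconstruction values (assumed)

perm = rng.permutation(levels)                        # arbitrary index assignment
inv_perm = np.argsort(perm)

def mean_flip_error(encode, decode, trials=20000):
    err = 0.0
    for _ in range(trials):
        idx = rng.integers(levels)
        code = encode(idx) ^ (1 << rng.integers(4))   # one bit flip in storage
        err += abs(grid[decode(code)] - grid[idx])
    return err / trials

natural = mean_flip_error(lambda i: int(i), lambda c: int(c))
arbitrary = mean_flip_error(lambda i: int(perm[i]), lambda c: int(inv_perm[c]))
print(f"mean readback error: natural binary {natural:.2f}, arbitrary assignment {arbitrary:.2f}")
```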

  2. A Self-Stabilizing Byzantine-Fault-Tolerant Clock Synchronization Protocol

    NASA Technical Reports Server (NTRS)

    Malekpour, Mahyar R.

    2009-01-01

    This report presents a rapid Byzantine-fault-tolerant self-stabilizing clock synchronization protocol that is independent of application-specific requirements. It is focused on clock synchronization of a system in the presence of Byzantine faults after the cause of any transient faults has dissipated. A model of this protocol is mechanically verified using the Symbolic Model Verifier (SMV) [SMV], where the entire state space is examined and proven to self-stabilize in the presence of one arbitrary faulty node. Instances of the protocol are proven to tolerate bursts of transient failures and deterministically converge with a linear convergence time with respect to the synchronization period. This protocol does not rely on assumptions about the initial state of the system other than the presence of a sufficient number of good nodes. All timing measures of variables are based on the node's local clock, and no central clock or externally generated pulse is used. The Byzantine faulty behavior modeled here is a node with arbitrarily malicious behavior that is allowed to influence other nodes at every clock tick. The only constraint is that the interactions are restricted to defined interfaces.

  3. Fault tolerant capabilities of the Cosmic Background Explorer attitude control system

    NASA Technical Reports Server (NTRS)

    Placanica, Samuel J.

    1992-01-01

    The Cosmic Background Explorer (COBE), which was launched November 18, 1989 from Vandenberg Air Force Base aboard a Delta rocket, has been classified by the scientific community as a major success with regards to the field of cosmology theory. Despite a number of anomalies which have occurred during the mission, the attitude control system (ACS) has performed remarkably well. This is due in large part to the fault tolerant capabilities that were designed into the ACS. A unique triaxial control system orientated in the spacecraft's transverse plane provides the ACS the ability to safely survive various sensor and actuator failures. Features that help to achieve this fail-operational system include component cross-strapping and autonomous control electronics switching. This design philosophy was of utmost importance because of the constraint placed upon the ACS to keep the spinning observatory and its cryogen-cooled science instruments pointing away from the sun. Even though the liquid helium was depleted within the expected twelve months from launch, it is still very much desirable to avoid any thermal disturbances upon the remaining functional instruments.

  4. Fault-tolerant quantum computation in multiqubit block codes: performance and overhead

    NASA Astrophysics Data System (ADS)

    Brun, Todd

    Fault-tolerant quantum computation requires that quantum information remain encoded in a quantum error-correcting code at all times; that a universal set of logical unitary gates and measurements is available; and that the probability of an uncorrectable error is low for the duration of the computation. Quantum computation can in principle be scaled up to unlimited size if the rate of decoherence is below a threshold. The main constructions that have been studied involve encoding each logical qubit in a separate block (either a concatenated code or a block of the surface code), which typically requires thousands of physical qubits per logical qubit, if not more. To reduce this overhead, we consider using multiqubit codes to achieve much higher storage rates. We estimate performance and overhead for certain families of codes, and ask: how large a quantum computation can be done as a function of the decoherence rate for a fixed size code block? Finally, we consider remaining open questions and limitations to this approach. This work is supported by NSF Grant No. CCF-1421078.

  5. A New On-Line Diagnosis Protocol for the SPIDER Family of Byzantine Fault Tolerant Architectures

    NASA Technical Reports Server (NTRS)

    Geser, Alfons; Miner, Paul S.

    2004-01-01

    This paper presents the formal verification of a new protocol for online distributed diagnosis for the SPIDER family of architectures. An instance of the Scalable Processor-Independent Design for Electromagnetic Resilience (SPIDER) architecture consists of a collection of processing elements communicating over a Reliable Optical Bus (ROBUS). The ROBUS is a specialized fault-tolerant device that guarantees Interactive Consistency, Distributed Diagnosis (Group Membership), and Synchronization in the presence of a bounded number of physical faults. Formal verification of the original SPIDER diagnosis protocol provided a detailed understanding that led to the discovery of a significantly more efficient protocol. The original protocol was adapted from the formally verified protocol used in the MAFT architecture. It required O(N) message exchanges per defendant to correctly diagnose failures in a system with N nodes. The new protocol achieves the same diagnostic fidelity, but only requires O(1) exchanges per defendant. This paper presents this new diagnosis protocol and a formal proof of its correctness using PVS.

  6. Trojan Horse Attack Free Fault-Tolerant Quantum Key Distribution Protocols Using GHZ States

    NASA Astrophysics Data System (ADS)

    Chang, Chih-Hung; Yang, Chun-Wei; Hwang, Tzonelih

    2016-04-01

    Recently, Yang and Hwang (Quantum Inf. Process. 13(3): 781-794, 19) proposed two fault-tolerant QKD protocols based on their proposed coding functions for resisting collective noise; their QKD protocols are free from the Trojan horse attack without employing any specific detecting devices (e.g., a photon number splitter (PNS) or wavelength filter). By using the four-particle Greenberger-Horne-Zeilinger (GHZ) state and the four-particle GHZ-like state in their coding functions, Yang and Hwang's QKD protocols can resist each kind of collective noise: collective-dephasing noise and collective-rotation noise. However, their coding functions can be improved by using the three-particle GHZ state (three-particle GHZ-like state) instead of the four-particle GHZ state (four-particle GHZ-like state), which eventually reduces the consumption of qubits. As a result, this study proposes an improved version of Yang and Hwang's coding functions that enhances the qubit efficiency of their schemes from 20% to 22%.

  7. Adaptive backstepping fault-tolerant control for flexible spacecraft with unknown bounded disturbances and actuator failures.

    PubMed

    Jiang, Ye; Hu, Qinglei; Ma, Guangfu

    2010-01-01

    In this paper, a robust adaptive fault-tolerant control approach to attitude tracking of flexible spacecraft is proposed for situations involving reaction wheel/actuator failures, persistent bounded disturbances, and unknown inertia parameter uncertainties. The controller is designed using an adaptive backstepping sliding mode control scheme, and a sufficient condition is given under which this control law renders the system semi-globally input-to-state stable, so that the closed-loop system is robust to any disturbance within a quantifiable amplitude bound and to the set of initial conditions, provided the control gains are designed appropriately. Moreover, motivated by practical spacecraft control applications, the control law requires no fault detection and isolation mechanism even when the failure time instants, patterns, and magnitudes of the actuator failures are unknown to the designer. In addition to detailed derivations of the new controller design and a rigorous sketch of the associated stability and attitude-error convergence proofs, illustrative simulation results for a flexible spacecraft show that highly precise attitude control and vibration suppression are achieved under various actuator-effectiveness failure scenarios. PMID:19747677
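    As a loose illustration of the control idea described above, the sketch below simulates a single-axis, rigid-body attitude regulator whose sliding-mode switching gain is adapted online, so that a partial loss of actuator effectiveness is absorbed without a fault detection and isolation step. The plant, gains, fault model, and the smoothed sign function are illustrative assumptions and do not reproduce the paper's flexible-spacecraft backstepping design.

        # Hedged sketch: a single-axis adaptive sliding-mode attitude regulator.
        # The switching gain k is adapted online, so no explicit fault detection and
        # isolation step is needed; the plant, gains, and fault model are illustrative.

        import numpy as np

        def simulate(t_end=30.0, dt=0.001):
            theta, omega = 0.3, 0.0          # attitude error (rad) and rate (rad/s)
            k, gamma, lam = 0.0, 2.0, 1.0    # adaptive gain, adaptation rate, surface slope
            J = 10.0                         # inertia (kg m^2)
            for i in range(int(t_end / dt)):
                t = i * dt
                s = omega + lam * theta                        # sliding variable
                k += gamma * abs(s) * dt                       # adaptive switching gain
                u = -J * lam * omega - k * np.tanh(s / 0.01)   # smoothed sign() term
                eff = 1.0 if t < 10.0 else 0.4                 # 60% effectiveness loss at t = 10 s
                d = 0.5 * np.sin(2.0 * t)                      # bounded disturbance torque
                omega += (eff * u + d) / J * dt
                theta += omega * dt
            return theta, omega, k

        if __name__ == "__main__":
            # attitude and rate errors remain small despite the effectiveness loss
            print(simulate())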

  8. The use of hybrid automata for fault-tolerant vibration control for parametric failures

    NASA Astrophysics Data System (ADS)

    Byreddy, Chakradhar; Frampton, Kenneth D.; Kim, Yongmin

    2006-03-01

    The purpose of this work is to use hybrid automata to reconfigure vibration control under system failures. Fault detection and isolation (FDI) filters are used to monitor an active vibration control system. When system failures occur (specifically parametric faults), the FDI filters detect and identify the specific failure. This work is specifically concerned with parametric faults, such as changes in the system's physical parameters; however, the approach works equally well with additive faults such as sensor or actuator failures. The FDI filter output drives a hybrid automaton, which selects the appropriate controller and FDI filter from a library. The hybrid automaton also implements switching between controllers and filters in order to maintain optimal performance under faulty operating conditions. The biggest challenge in developing this system is managing the switching and maintaining stability across the discontinuous switches. Therefore, in addition to vibration control, the stability associated with switching compensators and FDI filters is studied. Furthermore, the performance of two types of FDI filters is compared: filters based on parameter estimation methods and so-called "Beard-Jones" filters. Finally, these simulations help in understanding the use of hybrid automata for fault-tolerant control.
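    The supervisory logic described above can be pictured as a small state machine. The sketch below is a minimal, hedged example of such a hybrid supervisor: the mode names, the residual threshold, and the dwell-time guard used to avoid destabilizing rapid switching are illustrative assumptions, and the paper's FDI filters are abstracted into residual callables.

        # Hedged sketch: a minimal hybrid automaton that switches between controller /
        # FDI-filter pairs when a residual crosses a threshold.  Modes, thresholds, and
        # the dwell-time guard are illustrative; the paper's filters (parameter-estimation
        # and Beard-Jones style) are abstracted into residual callables.

        class HybridSupervisor:
            def __init__(self, modes, dwell_time=0.5):
                # modes: dict name -> {"controller": fn, "residual": fn, "next": name}
                self.modes = modes
                self.mode = "nominal"
                self.last_switch = 0.0
                self.dwell = dwell_time      # minimum time between switches (stability guard)

            def step(self, t, measurement, threshold=1.0):
                mode = self.modes[self.mode]
                r = mode["residual"](measurement)
                # Discrete transition: fault isolated and dwell time elapsed.
                if r > threshold and (t - self.last_switch) >= self.dwell:
                    self.mode = mode["next"]
                    self.last_switch = t
                return self.modes[self.mode]["controller"](measurement)

        if __name__ == "__main__":
            modes = {
                "nominal": {"controller": lambda y: -2.0 * y,
                            "residual":   lambda y: abs(y),     # grows when the plant changes
                            "next": "stiffness_fault"},
                "stiffness_fault": {"controller": lambda y: -5.0 * y,   # retuned gain
                                    "residual":   lambda y: 0.0,
                                    "next": "stiffness_fault"},
            }
            sup = HybridSupervisor(modes, dwell_time=0.5)
            print(sup.step(0.0, 0.1), sup.mode)   # small residual -> stays nominal
            print(sup.step(1.0, 2.0), sup.mode)   # large residual -> switches controller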

  9. CUMULVS: Providing fault-tolerance, visualization and steering of parallel applications

    SciTech Connect

    Geist, G.A. II; Kohl, J.A.; Papadopoulos, P.M.

    1996-09-01

    The use of visualization and computational steering can often assist scientists in analyzing large-scale scientific applications, and fault tolerance is of great importance when running on a distributed system. However, the details of implementing these features are complex and tedious, leaving many scientists with inadequate development tools. CUMULVS is a library that enables programmers to easily incorporate interactive visualization and computational steering into existing parallel programs. The library is divided into two pieces: one for the application program and one for the (possibly commercial) visualization and steering front-end. Together these two libraries encompass all the connection and data protocols needed to dynamically attach multiple independent viewer front-ends to a running parallel application. Viewer programs can also steer one or more user-defined parameters to "close the loop" for computational experiments and analyses. CUMULVS allows the programmer to specify user-directed checkpoints for saving important program state in case of failures, and also provides a mechanism to migrate tasks across heterogeneous machine architectures to achieve improved performance. Details of the CUMULVS design goals and compromises, as well as future directions, are given.

  10. Laboratory test methodology for evaluating the effects of electromagnetic disturbances on fault-tolerant control systems

    NASA Technical Reports Server (NTRS)

    Belcastro, Celeste M.

    1989-01-01

    Control systems for advanced aircraft, especially those with relaxed static stability, will be critical to flight and will therefore have very high reliability specifications that must be met under adverse as well as nominal operating conditions. Adverse conditions can result from electromagnetic disturbances caused by lightning, high-energy radio frequency transmitters, and nuclear electromagnetic pulses. Tools and techniques must be developed to verify the integrity of the control system under adverse operating conditions. The most difficult and elusive perturbations to computer-based control systems caused by an electromagnetic environment (EME) are functional error modes that involve no component damage. These error modes, collectively known as upset, can occur simultaneously in all of the channels of a redundant control system and are software dependent. A methodology is presented for performing upset tests on a multichannel control system, and considerations are discussed for the design of upset tests to be conducted in the laboratory on fault-tolerant control systems operating in a closed loop with a simulated plant.

  11. An addressable quantum dot qubit with fault-tolerant control-fidelity.

    PubMed

    Veldhorst, M; Hwang, J C C; Yang, C H; Leenstra, A W; de Ronde, B; Dehollain, J P; Muhonen, J T; Hudson, F E; Itoh, K M; Morello, A; Dzurak, A S

    2014-12-01

    Exciting progress towards spin-based quantum computing has recently been made with qubits realized using nitrogen-vacancy centres in diamond and phosphorus atoms in silicon. For example, long coherence times were made possible by the presence of spin-free isotopes of carbon and silicon. However, despite promising single-atom nanotechnologies, there remain substantial challenges in coupling such qubits and addressing them individually. Conversely, lithographically defined quantum dots have an exchange coupling that can be precisely engineered, but strong coupling to noise has severely limited their dephasing times and control fidelities. Here, we combine the best aspects of both spin qubit schemes and demonstrate a gate-addressable quantum dot qubit in isotopically engineered silicon with a control fidelity of 99.6%, obtained via Clifford-based randomized benchmarking and consistent with that required for fault-tolerant quantum computing. This qubit has dephasing time T2* = 120 μs and coherence time T2 = 28 ms, both orders of magnitude larger than in other types of semiconductor qubit. By gate-voltage-tuning the electron g*-factor we can Stark shift the electron spin resonance frequency by more than 3,000 times the 2.4 kHz electron spin resonance linewidth, providing a direct route to large-scale arrays of addressable high-fidelity qubits that are compatible with existing manufacturing technologies. PMID:25305743

  12. Fault-tolerant nonlinear adaptive flight control using sliding mode online learning.

    PubMed

    Krüger, Thomas; Schnetter, Philipp; Placzek, Robin; Vörsmann, Peter

    2012-08-01

    An expanded nonlinear model inversion flight control strategy using sliding mode online learning for neural networks is presented. The proposed control strategy is implemented for a small unmanned aircraft system (UAS). This class of aircraft is highly susceptible to disturbances and nonlinearities such as atmospheric turbulence, model uncertainties, and system failures, and therefore provides a suitable testbed for evaluating fault-tolerant, adaptive flight control strategies. In this work the concept of feedback linearization is combined with feedforward neural networks to compensate for inversion errors and other nonlinear effects. Backpropagation-based adaptation laws for the network weights are used for online training. In these adaptation laws, the standard gradient-descent backpropagation algorithm is augmented with the concept of sliding mode control (SMC). Implemented as a learning algorithm, this nonlinear control strategy treats the neural network as a controlled system and allows a stable, dynamic calculation of the learning rates. By accounting for the system's stability, this robust online learning method offers a higher speed of convergence, especially in the presence of external disturbances. The SMC-based flight controller is tested and compared with the standard gradient-descent backpropagation algorithm in the presence of system failures. PMID:22386784
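    To illustrate the flavor of gradient-descent adaptation augmented with a sliding-mode term, the sketch below adapts the weights of a linear-in-the-weights neural element online. The regressor, gains, and the finite-difference sliding variable are illustrative assumptions; the controller in the paper embeds this idea inside a nonlinear model inversion loop rather than in this stand-alone approximation problem.

        # Hedged sketch: online weight adaptation where the usual gradient-descent step
        # is augmented with a small sliding-mode correction driven by the error dynamics.
        # Gains and the finite-difference sliding variable are illustrative only.

        import numpy as np

        rng = np.random.default_rng(0)
        w = np.zeros(3)                          # network weights (learned online)
        w_true = np.array([0.8, -0.5, 0.3])      # unknown "inversion error" to compensate
        lam, eta, k_smc, dt = 2.0, 0.05, 0.02, 0.01
        e_prev = 0.0

        for step in range(2000):
            x = rng.uniform(-1.0, 1.0, size=3)   # regressor / network input
            target = w_true @ x                  # sample of the unknown nonlinearity
            e = target - w @ x                   # network approximation error
            s = (e - e_prev) / dt + lam * e      # sliding variable on the error
            e_prev = e
            # Gradient-descent step plus a robust sliding-mode correction term.
            w += eta * e * x + k_smc * np.tanh(s) * x * dt

        print(np.round(w, 2))                    # approaches w_true despite the noisy regressor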

  13. Time Delay Fault Tolerant Controller for Actuator Failures during Aircraft Autolanding

    NASA Astrophysics Data System (ADS)

    Lee, Jangho; Choi, Hyoung Sik; Lee, Sangjong; Kim, Eung Tai; Shin, Dongho

    A time delay control methodology is adopted to cope with the degraded control performance caused by control surface damage on unmanned aerial vehicles, especially during the automatic landing phase. Maintaining consistent control performance even under fault conditions, such as stuck and/or incipient actuator faults, is a crucial challenge. Flight control systems designed with conventional feedback control methods may in such cases yield unsatisfactory performance and, worse, may not guarantee closed-loop stability, which is fatal for an aircraft during auto-landing. To overcome the shortfalls of the conventional approach, a time delay control scheme is adopted; this scheme is known to be robust against disturbances, model uncertainties, and so on. Since the abrupt and/or incipient actuator faults considered in this paper can be treated as model uncertainties, the time delay controller is applied to the design of a fault-tolerant control system. To show the effectiveness of the time delay control method, a nonlinear 6-DOF simulation is performed under model uncertainties and wind disturbances, and control performance is compared with that of conventional controllers for single and multiple actuator faults.
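    As a minimal illustration of the time delay control idea, the sketch below estimates the unknown dynamics of a scalar second-order plant from the one-sample-delayed acceleration and input, cancels them, and keeps tracking a step command through an abrupt 50% loss of control effectiveness. The plant, gains, and fault model are illustrative assumptions and are far simpler than the nonlinear 6-DOF autolanding simulation in the paper.

        # Hedged sketch of a scalar time-delay controller (TDC): the unknown dynamics,
        # including an abrupt actuator fault, are estimated from the one-sample-delayed
        # acceleration and input and then cancelled.  All numbers are illustrative.

        import numpy as np

        dt = 0.001
        b_hat = 1.0                    # assumed control effectiveness
        kp, kd = 9.0, 6.0              # desired error dynamics: e'' + kd e' + kp e = 0
        x = xdot = 0.0
        u_prev = acc_prev = 0.0
        x_cmd = 1.0                    # step command (e.g., commanded attitude)

        for i in range(int(10.0 / dt)):
            t = i * dt
            e, edot = x_cmd - x, -xdot
            # TDC: estimate the unknown dynamics from delayed acceleration and input.
            f_hat = acc_prev - b_hat * u_prev
            u = (kp * e + kd * edot - f_hat) / b_hat
            # "True" plant: unknown damping/stiffness plus a 50% effectiveness loss at t = 5 s.
            eff = 1.0 if t < 5.0 else 0.5
            acc = -2.0 * xdot - 3.0 * x + 0.2 * np.sin(t) + eff * u
            x += xdot * dt + 0.5 * acc * dt * dt
            xdot += acc * dt
            u_prev, acc_prev = u, acc

        print(round(x, 3))             # stays close to the 1.0 command despite the fault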

  14. Validation of a fault-tolerant multiprocessor: Baseline experiments and workload implementation

    NASA Technical Reports Server (NTRS)

    Feather, Frank; Siewiorek, Daniel; Segall, Zary

    1985-01-01

    In the future, aircraft must employ highly reliable multiprocessors in order to achieve flight safety. Such computers must be experimentally validated before they are deployed. This project outlines a methodology for validating reliable multiprocessors. The methodology begins with baseline experiments, each of which tests a single phenomenon. As experiments progress, tools for performance testing are developed. The methodology is applied, in part, to the Fault Tolerant Multiprocessor (FTMP) at NASA Langley's AIRLAB facility. Experiments are designed to evaluate the fault-free performance of the system. Presented are the results of interrupt baseline experiments performed on FTMP. Interrupt-causing exception conditions were tested, and several were found to have unimplemented interrupt-handling software, while one had an unimplemented interrupt vector. A synthetic workload model for real-time multiprocessors is then developed as an application-level performance analysis tool. Details of the workload implementation and calibration are presented. Both the experimental methodology and the synthetic workload model are general enough to be applicable to reliable multiprocessors besides FTMP.

  15. Fault-Tolerant, Real-Time, Multi-Core Computer System

    NASA Technical Reports Server (NTRS)

    Gostelow, Kim P.

    2012-01-01

    A document discusses a fault-tolerant, self-aware, low-power, multi-core computer for space missions with thousands of simple cores, achieving speed through concurrency. The proposed machine decides how to achieve concurrency in real time, rather than depending on programmers. The driving features of the system are extremely modular, simple hardware with no shared memory, and software with significant runtime reorganizing capability. The document describes a mechanism for moving ongoing computations and data that is based on a functional model of execution. Because there is no shared memory, each processor connects to its neighbors through a high-speed data link. Messages are sent to a neighbor switch, which in turn forwards the message to its neighbor until it reaches the intended destination. Except for the neighbor connections, processors are isolated and independent of each other. The processors on the periphery also connect chip-to-chip, building up a large processor net. There is no particular topology to the larger net, as a function at each processor allows it to forward a message in the correct direction. Some chip-to-chip connections are not nearest neighbors, providing shortcuts across some of the longer physical distances. The peripheral processors also provide the connections to sensors, actuators, radios, science instruments, and other devices with which the computer system interacts.
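    The neighbor-to-neighbor forwarding described above can be pictured as greedy routing on a 2-D mesh, sketched below. The coordinates, the X-then-Y routing rule, and the function names are illustrative assumptions rather than the specific topology or forwarding function of the proposed machine.

        # Hedged sketch: greedy nearest-neighbour forwarding on a 2-D mesh of cores with
        # no shared memory.  Each switch knows only its own coordinates and forwards the
        # message one hop closer to the destination; the routing rule is illustrative.

        def next_hop(here, dest):
            """Return the neighbouring coordinate one step closer to dest (X-then-Y routing)."""
            (x, y), (dx, dy) = here, dest
            if x != dx:
                return (x + (1 if dx > x else -1), y)
            if y != dy:
                return (x, y + (1 if dy > y else -1))
            return here                     # already delivered

        def route(src, dest):
            path, here = [src], src
            while here != dest:
                here = next_hop(here, dest)
                path.append(here)
            return path

        if __name__ == "__main__":
            print(route((0, 0), (3, 2)))    # hops east three times, then north twice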

  16. A Performance Prediction Model for a Fault-Tolerant Computer During Recovery and Restoration

    NASA Technical Reports Server (NTRS)

    Obando, Rodrigo A.; Stoughton, John W.

    1995-01-01

    The modeling and design of a fault-tolerant multiprocessor system is addressed. Of interest is the behavior of the system during recovery and restoration after a fault has occurred. The multiprocessor systems are based on the Algorithm to Architecture Mapping Model (ATAMM), and the fault considered is the death of a processor. The developed model is useful for determining performance bounds of the system during recovery and restoration. The performance bounds include the time to recover from the fault, the time to restore the system, and any permanent delay in the input-to-output latency after the system has regained steady state. An ATAMM-based computer implementation was developed for a four-processor generic VHSIC spaceborne computer (GVSC) as the target system. A simulation of the GVSC was also written based on the code used in the ATAMM Multicomputer Operating System (AMOS). The simulation is used to verify the new model for tracking the propagation of delay through the system and predicting the transient behavior of recovery and restoration. The model is shown to accurately predict the transient behavior of an ATAMM-based multicomputer during recovery and restoration.

  17. An Autonomous Self-Aware and Adaptive Fault Tolerant Routing Technique for Wireless Sensor Networks

    PubMed Central

    Abba, Sani; Lee, Jeong-A

    2015-01-01

    We propose an autonomous self-aware and adaptive fault-tolerant routing technique (ASAART) for wireless sensor networks. We address the limitations of the self-healing routing (SHR) and self-selective routing (SSR) techniques for routing sensor data. We also examine the integration of autonomic self-aware and adaptive fault detection and resiliency techniques into route formation and route repair to provide resilience to errors and failures. This is achieved by using a combined continuous and slotted prioritized transmission back-off delay to obtain local and global network state information, together with multiple random functions, to attain faster routing convergence, reliable route repair despite transient and permanent node failures, and efficient adaptation to instantaneous network topology changes. Simulation results comparing ASAART with the SHR and SSR protocols across five scenarios with transient and permanent node failure rates show greater resilience to errors and failures and better routing performance in terms of the number of successfully delivered network packets, end-to-end delay, delivered MAC-layer packets, and packet error rate, as well as more efficient energy conservation in a highly congested, faulty, and scalable sensor network. PMID:26295236

  18. Toward a Fault Tolerant Architecture for Vital Medical-Based Wearable Computing.

    PubMed

    Abdali-Mohammadi, Fardin; Bajalan, Vahid; Fathi, Abdolhossein

    2015-12-01

    Advancements in computers and electronic technologies have led to the emergence of a new generation of efficient small intelligent systems. Products of these technologies include smartphones and wearable devices, which have attracted attention for medical applications. Such products are used less in critical medical applications because of their resource constraints and sensitivity to failures: without safety considerations, small integrated hardware can endanger patients' lives. Therefore, principles are needed for constructing wearable healthcare systems that address these concerns. Accordingly, this paper proposes an architecture for constructing wearable systems for critical medical applications. The proposed architecture is a three-tier one, supporting data flow from body sensors to the cloud. The tiers of this architecture are wearable computers, mobile computing, and mobile cloud computing. One feature of this architecture is the high degree of fault tolerance made possible by the nature of its components. Moreover, the protocols required to coordinate the components of this architecture are presented. Finally, the reliability of the architecture is assessed by simulating it and its components, and other aspects of the proposed architecture are discussed. PMID:26364202
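    One simple way to picture the fault tolerance available in such a tiered data path is a store-and-forward relay that buffers samples locally whenever the next tier is unreachable. The sketch below is a hedged illustration under that assumption; the tier names, the transport callable, and the buffer bound are not taken from the paper.

        # Hedged sketch: store-and-forward relay between the wearable, mobile, and cloud
        # tiers.  If the next tier is unreachable, samples are buffered locally and
        # retried, so transient link failures do not lose vital-sign data.

        from collections import deque

        class TierRelay:
            def __init__(self, send, max_buffer=10_000):
                self.send = send                       # callable: returns True on success
                self.buffer = deque(maxlen=max_buffer) # bounded local store (oldest dropped)

            def push(self, sample):
                self.buffer.append(sample)
                self.flush()

            def flush(self):
                while self.buffer:
                    if not self.send(self.buffer[0]):
                        return False                   # next tier unreachable; keep buffering
                    self.buffer.popleft()
                return True

        if __name__ == "__main__":
            delivered, link_up = [], False
            relay = TierRelay(lambda s: link_up and (delivered.append(s) or True))
            relay.push({"hr": 71})                     # buffered while the link is down
            link_up = True
            relay.push({"hr": 72})                     # both samples reach the next tier
            print(delivered)                           # [{'hr': 71}, {'hr': 72}]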

  19. A Byzantine-Fault Tolerant Self-Stabilizing Protocol for Distributed Clock Synchronization Systems

    NASA Technical Reports Server (NTRS)

    Malekpour, Mahyar R.

    2006-01-01

    Embedded distributed systems have become an integral part of safety-critical computing applications, necessitating system designs that incorporate fault-tolerant clock synchronization in order to achieve ultra-reliable assurance levels. Many efficient clock synchronization protocols do not, however, address Byzantine failures, and most protocols that do tolerate Byzantine failures do not self-stabilize. The Byzantine self-stabilizing clock synchronization algorithms that exist in the literature are based either on unjustifiably strong assumptions about the initial synchrony of the nodes or on the existence of a common pulse at the nodes. The Byzantine self-stabilizing clock synchronization protocol presented here does not rely on any assumptions about the initial state of the clocks. Furthermore, there is neither a central clock nor an externally generated pulse system. The proposed protocol converges deterministically, is scalable, and self-stabilizes in a short amount of time. The convergence time is linear with respect to the self-stabilization period. Proofs of the correctness of the protocol as well as the results of formal verification efforts are reported.

  20. Verification of a Byzantine-Fault-Tolerant Self-stabilizing Protocol for Clock Synchronization

    NASA Technical Reports Server (NTRS)

    Malekpour, Mahyar R.

    2008-01-01

    This paper presents the mechanical verification of a simplified model of a rapid Byzantine-fault-tolerant self-stabilizing protocol for distributed clock synchronization systems. This protocol does not rely on any assumptions about the initial state of the system except for the presence of sufficient good nodes, thus making the weakest possible assumptions and producing the strongest results. This protocol tolerates bursts of transient failures, and deterministically converges within a time bound that is a linear function of the self-stabilization period. A simplified model of the protocol is verified using the Symbolic Model Verifier (SMV). The system under study consists of 4 nodes, where at most one of the nodes is assumed to be Byzantine faulty. The model checking effort is focused on verifying correctness of the simplified model of the protocol in the presence of a permanent Byzantine fault as well as confirmation of claims of determinism and linear convergence with respect to the self-stabilization period. Although model checking results of the simplified model of the protocol confirm the theoretical predictions, these results do not necessarily confirm that the protocol solves the general case of this problem. Modeling challenges of the protocol and the system are addressed. A number of abstractions are utilized in order to reduce the state space.