Science.gov

Sample records for fault-tolerant non-linear analytic

  1. Fault-tolerant flight control system combining expert system and analytical redundancy concepts

    NASA Technical Reports Server (NTRS)

    Handelman, Dave

    1987-01-01

    This research involves the development of a knowledge-based fault-tolerant flight control system. A software architecture is presented that integrates quantitative analytical redundancy techniques and heuristic expert system problem solving concepts for the purpose of in-flight, real-time failure accommodation.

  2. Formal specification of requirements for analytical redundancy-based fault-tolerant flight control systems

    NASA Astrophysics Data System (ADS)

    Del Gobbo, Diego

    2000-10-01

    Flight control systems are undergoing a rapid process of automation. The use of Fly-By-Wire digital flight control systems in commercial aviation (Airbus 320 and Boeing FBW-B777) is a clear sign of this trend. The increased automation goes in parallel with an increased complexity of flight control systems with obvious consequences on reliability and safety. Flight control systems must meet strict fault-tolerance requirements. The standard solution to achieving fault tolerance capability relies on multi-string architectures. On the other hand, multi-string architectures further increase the complexity of the system inducing a reduction of overall reliability. In the past two decades a variety of techniques based on analytical redundancy have been suggested for fault diagnosis purposes. While research on analytical redundancy has obtained desirable results, a design methodology involving requirements specification and feasibility analysis of analytical redundancy based fault tolerant flight control systems is missing. The main objective of this research work is to describe within a formal framework the implications of adopting analytical redundancy as a basis to achieve fault tolerance. The research activity involves analysis of the analytical redundancy approach, analysis of flight control system informal requirements, and re-engineering (modeling and specification) of the fault tolerance requirements. The USAF military specification MIL-F-9490D and supporting documents are adopted as source for the flight control informal requirements. The De Havilland DHC-2 general aviation aircraft equipped with standard autopilot control functions is adopted as pilot application. Relational algebra is adopted as formal framework for the specification of the requirements. The detailed analysis and formalization of the requirements resulted in a better definition of the fault tolerance problem in the framework of analytical redundancy. Fault tolerance requirements and related

  3. Rapid Non-Linear Uncertainty Propagation via Analytical Techniques

    NASA Astrophysics Data System (ADS)

    Fujimoto, K.; Scheeres, D. J.

    2012-09-01

    Space situational awareness (SSA) is known to be a data starved problem compared to traditional estimation problems in that observation gaps per object may span over days if not weeks. Therefore, consistent characterization of the uncertainty associated with these objects including non-linear effects is crucial in maintaining an accurate catalog of objects in Earth orbit. Simultaneously, the motion of satellites in Earth orbit is well-modeled in that it is particularly amenable to having their solution and their uncertainty described through analytic or semi-analytic techniques. Even when stronger non-gravitational perturbations such as solar radiation pressure and atmospheric drag are encountered, these perturbations generally have deterministic components that are substantially larger than their time-varying stochastic components. Analytic techniques are powerful because time propagation is only a matter of changing the time parameter, allowing for rapid computational turnaround. These two ideas are combined in this paper: a method of analytically propagating non-linear orbit uncertainties is discussed. In particular, the uncertainty is expressed as an analytic probability density function (pdf) for all time. For a deterministic system model, such pdfs may be obtained if the initial pdf and the system states for all time are also given analytically. Even when closed-form solutions are not available, approximate solutions exist in the form of Edgeworth series for pdfs and Taylor series for the states. The coefficients of the latter expansion are referred to as state transition tensors (STTs), which are a generalization of state transition matrices to arbitrary order. Analytically expressed pdfs can be incorporated in many practical tasks in SSA. One can compute the mean and covariance of the uncertainty, for example, with the moments of the initial pdf as inputs. This process does not involve any sampling and its accuracy can be determined a priori. Analytical

  4. Intelligent fault-tolerant controllers

    NASA Technical Reports Server (NTRS)

    Huang, Chien Y.

    1987-01-01

    A system with fault tolerant controls is one that can detect, isolate, and estimate failures and perform necessary control reconfiguration based on this new information. Artificial intelligence (AI) is concerned with semantic processing, and it has evolved to include the topics of expert systems and machine learning. This research represents an attempt to apply AI to fault tolerant controls, hence, the name intelligent fault tolerant control (IFTC). A generic solution to the problem is sought, providing a system based on logic in addition to analytical tools, and offering machine learning capabilities. The advantages are that redundant system specific algorithms are no longer needed, that reasonableness is used to quickly choose the correct control strategy, and that the system can adapt to new situations by learning about its effects on system dynamics.

  5. An aircraft sensor fault tolerant system

    NASA Technical Reports Server (NTRS)

    Caglayan, A. K.; Lancraft, R. E.

    1982-01-01

    The design of a sensor fault tolerant system which uses analytical redundancy for the Terminal Configured Vehicle (TCV) research aircraft in a Microwave Landing System (MLS) environment was studied. The fault tolerant system provides reliable estimates for aircraft position, velocity, and attitude in the presence of possible failures in navigation aid instruments and onboard sensors. The estimates, provided by the fault tolerant system, are used by the automated guidance and control system to land the aircraft along a prescribed path. Sensor failures are identified by utilizing the analytic relationship between the various sensor outputs arising from the aircraft equations of motion.

  6. Fault tolerant control of spacecraft

    NASA Astrophysics Data System (ADS)

    Godard

    Autonomous multiple spacecraft formation flying space missions demand the development of reliable control systems to ensure rapid, accurate, and effective response to various attitude and formation reconfiguration commands. Keeping in mind the complexities involved in the technology development to enable spacecraft formation flying, this thesis presents the development and validation of a fault tolerant control algorithm that augments the AOCS on-board a spacecraft to ensure that these challenging formation flying missions will fly successfully. Taking inspiration from the existing theory of nonlinear control, a fault-tolerant control system for the RyePicoSat missions is designed to cope with actuator faults whilst maintaining the desirable degree of overall stability and performance. Autonomous fault tolerant adaptive control scheme for spacecraft equipped with redundant actuators and robust control of spacecraft in underactuated configuration, represent the two central themes of this thesis. The developed algorithms are validated using a hardware-in-the-loop simulation. A reaction wheel testbed is used to validate the proposed fault tolerant attitude control scheme. A spacecraft formation flying experimental testbed is used to verify the performance of the proposed robust control scheme for underactuated spacecraft configurations. The proposed underactuated formation flying concept leads to more than 60% savings in fuel consumption when compared to a fully actuated spacecraft formation configuration. We also developed a novel attitude control methodology that requires only a single thruster to stabilize three axis attitude and angular velocity components of a spacecraft. Numerical simulations and hardware-in-the-loop experimental results along with rigorous analytical stability analysis shows that the proposed methodology will greatly enhance the reliability of the spacecraft, while allowing for potentially significant overall mission cost reduction.

  7. Fault tolerant linear actuator

    DOEpatents

    Tesar, Delbert

    2004-09-14

    In varying embodiments, the fault tolerant linear actuator of the present invention is a new and improved linear actuator with fault tolerance and positional control that may incorporate velocity summing, force summing, or a combination of the two. In one embodiment, the invention offers a velocity summing arrangement with a differential gear between two prime movers driving a cage, which then drives a linear spindle screw transmission. Other embodiments feature two prime movers driving separate linear spindle screw transmissions, one internal and one external, in a totally concentric and compact integrated module.

  8. Validated Fault Tolerant Architectures for Space Station

    NASA Technical Reports Server (NTRS)

    Lala, Jaynarayan H.

    1990-01-01

    Viewgraphs on validated fault tolerant architectures for space station are presented. Topics covered include: fault tolerance approach; advanced information processing system (AIPS); and fault tolerant parallel processor (FTPP).

  9. Design and validation of fault-tolerant flight systems

    NASA Technical Reports Server (NTRS)

    Finelli, George B.; Palumbo, Daniel L.

    1987-01-01

    NASA has undertaken the development of a methodology for the design of easily validated fault-tolerant systems which emphasizes validation processes that can be directly incorporated into the design process. Attention is presently given to the statistical issues arising in the validation of highly reliable fault-tolerant systems. Structured specification and design methodologies, mathematical proof techniques, analytical modeling, simulation/emulation, and physical testing, are all discussed. Important design factors associated with fault-tolerance are noted; synchronization and 'Byzantine resilience' must accompany fault tolerance.

  10. Approximate Analytical Solutions for Primary Chatter in the Non-Linear Metal Cutting Model

    NASA Astrophysics Data System (ADS)

    Warmiński, J.; Litak, G.; Cartmell, M. P.; Khanin, R.; Wiercigroch, M.

    2003-01-01

    This paper considers an accepted model of the metal cutting process dynamics in the context of an approximate analysis of the resulting non-linear differential equations of motion. The process model is based upon the established mechanics of orthogonal cutting and results in a pair of non-linear ordinary differential equations which are then restated in a form suitable for approximate analytical solution. The chosen solution technique is the perturbation method of multiple time scales and approximate closed-form solutions are generated for the most important non-resonant case. Numerical data are then substituted into the analytical solutions and key results are obtained and presented. Some comparisons between the exact numerical calculations for the forces involved and their reduced and simplified analytical counterparts are given. It is shown that there is almost no discernible difference between the two thus confirming the validity of the excitation functions adopted in the analysis for the data sets used, these being chosen to represent a real orthogonal cutting process. In an attempt to provide guidance for the selection of technological parameters for the avoidance of primary chatter, this paper determines for the first time the stability regions in terms of the depth of cut and the cutting speed co-ordinates.

  11. Fault-tolerant processing system

    NASA Technical Reports Server (NTRS)

    Palumbo, Daniel L. (Inventor)

    1996-01-01

    A fault-tolerant, fiber optic interconnect, or backplane, which serves as a via for data transfer between modules. Fault tolerance algorithms are embedded in the backplane by dividing the backplane into a read bus and a write bus and placing a redundancy management unit (RMU) between the read bus and the write bus so that all data transmitted by the write bus is subjected to the fault tolerance algorithms before the data is passed for distribution to the read bus. The RMU provides both backplane control and fault tolerance.

  12. SFT: Scalable Fault Tolerance

    SciTech Connect

    Petrini, Fabrizio; Nieplocha, Jarek; Tipparaju, Vinod

    2006-04-15

    In this paper we will present a new technology that we are currently developing within the SFT: Scalable Fault Tolerance FastOS project which seeks to implement fault tolerance at the operating system level. Major design goals include dynamic reallocation of resources to allow continuing execution in the presence of hardware failures, very high scalability, high efficiency (low overhead), and transparency—requiring no changes to user applications. Our technology is based on a global coordination mechanism, that enforces transparent recovery lines in the system, and TICK, a lightweight, incremental checkpointing software architecture implemented as a Linux kernel module. TICK is completely user-transparent and does not require any changes to user code or system libraries; it is highly responsive: an interrupt, such as a timer interrupt, can trigger a checkpoint in as little as 2.5μs; and it supports incremental and full checkpoints with minimal overhead—less than 6% with full checkpointing to disk performed as frequently as once per minute.

  13. Fault Tolerant State Machines

    NASA Technical Reports Server (NTRS)

    Burke, Gary R.; Taft, Stephanie

    2004-01-01

    State machines are commonly used to control sequential logic in FPGAs and ASKS. An errant state machine can cause considerable damage to the device it is controlling. For example in space applications, the FPGA might be controlling Pyros, which when fired at the wrong time will cause a mission failure. Even a well designed state machine can be subject to random errors us a result of SEUs from the radiation environment in space. There are various ways to encode the states of a state machine, and the type of encoding makes a large difference in the susceptibility of the state machine to radiation. In this paper we compare 4 methods of state machine encoding and find which method gives the best fault tolerance, as well as determining the resources needed for each method.

  14. Robot Position Sensor Fault Tolerance

    NASA Technical Reports Server (NTRS)

    Aldridge, Hal A.

    1997-01-01

    Robot systems in critical applications, such as those in space and nuclear environments, must be able to operate during component failure to complete important tasks. One failure mode that has received little attention is the failure of joint position sensors. Current fault tolerant designs require the addition of directly redundant position sensors which can affect joint design. A new method is proposed that utilizes analytical redundancy to allow for continued operation during joint position sensor failure. Joint torque sensors are used with a virtual passive torque controller to make the robot joint stable without position feedback and improve position tracking performance in the presence of unknown link dynamics and end-effector loading. Two Cartesian accelerometer based methods are proposed to determine the position of the joint. The joint specific position determination method utilizes two triaxial accelerometers attached to the link driven by the joint with the failed position sensor. The joint specific method is not computationally complex and the position error is bounded. The system wide position determination method utilizes accelerometers distributed on different robot links and the end-effector to determine the position of sets of multiple joints. The system wide method requires fewer accelerometers than the joint specific method to make all joint position sensors fault tolerant but is more computationally complex and has lower convergence properties. Experiments were conducted on a laboratory manipulator. Both position determination methods were shown to track the actual position satisfactorily. A controller using the position determination methods and the virtual passive torque controller was able to servo the joints to a desired position during position sensor failure.

  15. Fault-tolerant rotary actuator

    DOEpatents

    Tesar, Delbert

    2006-10-17

    A fault-tolerant actuator module, in a single containment shell, containing two actuator subsystems that are either asymmetrically or symmetrically laid out is provided. Fault tolerance in the actuators of the present invention is achieved by the employment of dual sets of equal resources. Dual resources are integrated into single modules, with each having the external appearance and functionality of a single set of resources.

  16. Reinitialization issues in fault tolerant systems

    NASA Technical Reports Server (NTRS)

    Caglayan, A. K.; Lancraft, R. E.

    1983-01-01

    This paper is concerned with the reinitialization of fault tolerant systems in which detection and isolation (FDI) techniques are used, on-line, to identify and compensate for system failures. Specifically, it will focus on FDI techniques which utilize analytic redundancy, arising from a knowledge of the plant dynamics, by analyzing the residuals of a no-fail filter designed on the assumption of no failures. In these types of fault tolerant systems, system failures have to propagate through the no-fail filter dynamics in order to get detected. Therefore, the no-fail filter must be reinitialized after the isolation of a failure so that the accumulated effects of the failure are removed. In this paper, various approaches to this reinitialization problem will be discussed.

  17. Genetic programming as an analytical tool for non-linear dielectric spectroscopy.

    PubMed

    Woodward, A M; Gilbert, R J; Kell, D B

    1999-05-01

    By modelling the non-linear effects of membranous enzymes on an applied oscillating electromagnetic field using supervised multivariate analysis methods, Non-Linear Dielectric Spectroscopy (NLDS) has previously been shown to produce quantitative information that is indicative of the metabolic state of various organisms. The use of Genetic Programming (GP) for the multivariate analysis of NLDS data recorded from yeast fermentations is discussed, and GPs are compared with previous results using Partial Least Squares (PLS) and Artificial Neural Nets (NN). GP considerably outperforms these methods, both in terms of the precision of the predictions and their interpretability. PMID:10379559

  18. Implementing fault-tolerant sensors

    NASA Technical Reports Server (NTRS)

    Marzullo, Keith

    1989-01-01

    One aspect of fault tolerance in process control programs is the ability to tolerate sensor failure. A methodology is presented for transforming a process control program that cannot tolerate sensor failures to one that can. Additionally, a hierarchy of failure models is identified.

  19. Chip level simulation of fault tolerant computers

    NASA Technical Reports Server (NTRS)

    Armstrong, J. R.

    1982-01-01

    Chip-level modeling techniques in the evaluation of fault tolerant systems were researched. A fault tolerant computer was modeled. An efficient approach to functional fault simulation was developed. Simulation software was also developed.

  20. Fault-Tolerant Heat Exchanger

    NASA Technical Reports Server (NTRS)

    Izenson, Michael G.; Crowley, Christopher J.

    2005-01-01

    A compact, lightweight heat exchanger has been designed to be fault-tolerant in the sense that a single-point leak would not cause mixing of heat-transfer fluids. This particular heat exchanger is intended to be part of the temperature-regulation system for habitable modules of the International Space Station and to function with water and ammonia as the heat-transfer fluids. The basic fault-tolerant design is adaptable to other heat-transfer fluids and heat exchangers for applications in which mixing of heat-transfer fluids would pose toxic, explosive, or other hazards: Examples could include fuel/air heat exchangers for thermal management on aircraft, process heat exchangers in the cryogenic industry, and heat exchangers used in chemical processing. The reason this heat exchanger can tolerate a single-point leak is that the heat-transfer fluids are everywhere separated by a vented volume and at least two seals. The combination of fault tolerance, compactness, and light weight is implemented in a unique heat-exchanger core configuration: Each fluid passage is entirely surrounded by a vented region bridged by solid structures through which heat is conducted between the fluids. Precise, proprietary fabrication techniques make it possible to manufacture the vented regions and heat-conducting structures with very small dimensions to obtain a very large coefficient of heat transfer between the two fluids. A large heat-transfer coefficient favors compact design by making it possible to use a relatively small core for a given heat-transfer rate. Calculations and experiments have shown that in most respects, the fault-tolerant heat exchanger can be expected to equal or exceed the performance of the non-fault-tolerant heat exchanger that it is intended to supplant (see table). The only significant disadvantages are a slight weight penalty and a small decrease in the mass-specific heat transfer.

  1. a Frequency Domain Based NUMERIC-ANALYTICAL Method for Non-Linear Dynamical Systems

    NASA Astrophysics Data System (ADS)

    Narayanan, S.; Sekar, P.

    1998-04-01

    In this paper a multiharmonic balancing technique is used to develop certain algorithms to determine periodic orbits of non-liner dynamical systems with external, parametric and self excitations. Essentially, in this method the non-linear differential equations are transformed into a set of non-linear algebraic equations in terms of the Fourier coefficients of the periodic solutions which are solved by using the Newton-Raphson technique. The method is developed such that both fast Fourier transform and discrete Fourier transform algorithms can be used. It is capable of treating all types of non-linearities and higher dimensional systems. The stability of periodic orbits is investigated by obtaining the monodromy matrix. A path following algorithm based on the predictor-corrector method is also presented to enable the bifurcation analysis. The prediction is done with a cubic extrapolation technique with an arc length incrementation while the correction is done with the use of the least square minimisation technique. The under determined system of equations is solved by singular value decomposition. The suitability of the method is demonstrated by obtaining the bifurcational behaviour of rolling contact vibrations modelled by Hertz contact law.

  2. Fault tolerant software modules for SIFT

    NASA Technical Reports Server (NTRS)

    Hecht, M.; Hecht, H.

    1982-01-01

    The implementation of software fault tolerance is investigated for critical modules of the Software Implemented Fault Tolerance (SIFT) operating system to support the computational and reliability requirements of advanced fly by wire transport aircraft. Fault tolerant designs generated for the error reported and global executive are examined. A description of the alternate routines, implementation requirements, and software validation are included.

  3. Unique signature of bivalent analyte surface plasmon resonance model: A model governed by non-linear differential equations

    NASA Astrophysics Data System (ADS)

    Tiwari, Purushottam; Wang, Xuewen; Darici, Yesim; He, Jin; Uren, Aykut

    Surface plasmon resonance (SPR) is a biophysical technique for the quantitative analysis of bimolecular interactions. Correct identification of the binding model is crucial for the interpretation of SPR data. Bivalent SPR model is governed by non-linear differential equations, which, in general, have no analytical solutions. Therefore, an analytical based approach cannot be employed in order to identify this particular model. There exists a unique signature in the bivalent analyte model, existence of an `optimal analyte concentration', which can distinguish this model from other biphasic models. The unambiguous identification and related analysis of the bivalent analyte model is demonstrated by using theoretical simulations and experimentally measured SPR sensorgrams. Experimental SPR sensorgrams were measured by using Biacore T200 instrument available in Biacore Molecular Interaction Shared Resource facility, supported by NIH Grant P30CA51008, at Georgetown University.

  4. On Three-Dimensional Flow and Heat Transfer over a Non-Linearly Stretching Sheet: Analytical and Numerical Solutions

    PubMed Central

    Khan, Junaid Ahmad; Mustafa, Meraj; Hayat, Tasawar; Alsaedi, Ahmed

    2014-01-01

    This article studies the viscous flow and heat transfer over a plane horizontal surface stretched non-linearly in two lateral directions. Appropriate wall conditions characterizing the non-linear variation in the velocity and temperature of the sheet are employed for the first time. A new set of similarity variables is introduced to reduce the boundary layer equations into self-similar forms. The velocity and temperature distributions are determined by two methods, namely (i) optimal homotopy analysis method (OHAM) and (ii) fourth-fifth-order Runge-Kutta integration based shooting technique. The analytic and numerical solutions are compared and these are found in excellent agreement. Influences of embedded parameters on momentum and thermal boundary layers are sketched and discussed. PMID:25198696

  5. On three-dimensional flow and heat transfer over a non-linearly stretching sheet: analytical and numerical solutions.

    PubMed

    Khan, Junaid Ahmad; Mustafa, Meraj; Hayat, Tasawar; Alsaedi, Ahmed

    2014-01-01

    This article studies the viscous flow and heat transfer over a plane horizontal surface stretched non-linearly in two lateral directions. Appropriate wall conditions characterizing the non-linear variation in the velocity and temperature of the sheet are employed for the first time. A new set of similarity variables is introduced to reduce the boundary layer equations into self-similar forms. The velocity and temperature distributions are determined by two methods, namely (i) optimal homotopy analysis method (OHAM) and (ii) fourth-fifth-order Runge-Kutta integration based shooting technique. The analytic and numerical solutions are compared and these are found in excellent agreement. Influences of embedded parameters on momentum and thermal boundary layers are sketched and discussed. PMID:25198696

  6. Parallel fault-tolerant robot control

    NASA Technical Reports Server (NTRS)

    Hamilton, D. L.; Bennett, J. K.; Walker, I. D.

    1992-01-01

    A shared memory multiprocessor architecture is used to develop a parallel fault-tolerant robot controller. Several versions of the robot controller are developed and compared. A robot simulation is also developed for control observation. Comparison of a serial version of the controller and a parallel version without fault tolerance showed the speedup possible with the coarse-grained parallelism currently employed. The performance degradation due to the addition of processor fault tolerance was demonstrated by comparison of these controllers with their fault-tolerant versions. Comparison of the more fault-tolerant controller with the lower-level fault-tolerant controller showed how varying the amount of redundant data affects performance. The results demonstrate the trade-off between speed performance and processor fault tolerance.

  7. Software Fault Tolerance: A Tutorial

    NASA Technical Reports Server (NTRS)

    Torres-Pomales, Wilfredo

    2000-01-01

    Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems. The root cause of software design errors is the complexity of the systems. Compounding the problems in building correct software is the difficulty in assessing the correctness of software for highly complex systems. After a brief overview of the software development processes, we note how hard-to-detect design faults are likely to be introduced during development and how software faults tend to be state-dependent and activated by particular input sequences. Although component reliability is an important quality measure for system level analysis, software reliability is hard to characterize and the use of post-verification reliability estimates remains a controversial issue. For some applications software safety is more important than reliability, and fault tolerance techniques used in those applications are aimed at preventing catastrophes. Single version software fault tolerance techniques discussed include system structuring and closure, atomic actions, inline fault detection, exception handling, and others. Multiversion techniques are based on the assumption that software built differently should fail differently and thus, if one of the redundant versions fails, it is expected that at least one of the other versions will provide an acceptable output. Recovery blocks, N-version programming, and other multiversion techniques are reviewed.

  8. Hardware and software fault tolerance - A unified architectural approach

    NASA Technical Reports Server (NTRS)

    Lala, Jaynarayan H.; Alger, Linda S.

    1988-01-01

    The loss of hardware fault tolerance which often arises when design diversity is used to improve the fault tolerance of computer software is considered analytically, and a unified design approach is proposed to avoid the problem. The fundamental theory of fault-tolerant (FT) architectures is reviewed; the current status of design-diversity software development is surveyed; and the FT-processor/attached-processor (FTP/AP) architecture developed by Lala et al. (1986) is described in detail and illustrated with diagrams. FTP/AP is shown to permit efficient implementation of N-version FT software while still tolerating random hardware failures with very high coverage; the reliability is found to be significantly higher than that of conventional majority-vote N-version software.

  9. A Log-Scaling Fault Tolerant Agreement Algorithm for a Fault Tolerant MPI

    SciTech Connect

    Hursey, Joshua J; Naughton, III, Thomas J; Vallee, Geoffroy R; Graham, Richard L

    2011-01-01

    The lack of fault tolerance is becoming a limiting factor for application scalability in HPC systems. The MPI does not provide standardized fault tolerance interfaces and semantics. The MPI Forum's Fault Tolerance Working Group is proposing a collective fault tolerant agreement algorithm for the next MPI standard. Such algorithms play a central role in many fault tolerant applications. This paper combines a log-scaling two-phase commit agreement algorithm with a reduction operation to provide the necessary functionality for the new collective without any additional messages. Error handling mechanisms are described that preserve the fault tolerance properties while maintaining overall scalability.

  10. Fault-tolerant parallel processor

    SciTech Connect

    Harper, R.E.; Lala, J.H. )

    1991-06-01

    This paper addresses issues central to the design and operation of an ultrareliable, Byzantine resilient parallel computer. Interprocessor connectivity requirements are met by treating connectivity as a resource that is shared among many processing elements, allowing flexibility in their configuration and reducing complexity. Redundant groups are synchronized solely by message transmissions and receptions, which aslo provide input data consistency and output voting. Reliability analysis results are presented that demonstrate the reduced failure probability of such a system. Performance analysis results are presented that quantify the temporal overhead involved in executing such fault-tolerance-specific operations. Empirical performance measurements of prototypes of the architecture are presented. 30 refs.

  11. The fault-tolerant multiprocessor computer

    NASA Technical Reports Server (NTRS)

    Smith, T. B., III (Editor); Lala, J. H. (Editor); Goldberg, J. (Editor); Kautz, W. H. (Editor); Melliar-Smith, P. M. (Editor); Green, M. W. (Editor); Levitt, K. N. (Editor); Schwartz, R. L. (Editor); Weinstock, C. B. (Editor); Palumbo, D. L. (Editor)

    1986-01-01

    The development and evaluation of fault-tolerant computer architectures and software-implemented fault tolerance (SIFT) for use in advanced NASA vehicles and potentially in flight-control systems are described in a collection of previously published reports prepared for NASA. Topics addressed include the principles of fault-tolerant multiprocessor (FTMP) operation; processor and slave regional designs; FTMP executive, facilities, acceptance-test/diagnostic, applications, and support software; FTM reliability and availability models; SIFT hardware design; and SIFT validation and verification.

  12. Exact analytic solution for non-linear density fluctuation in a ΛCDM universe

    NASA Astrophysics Data System (ADS)

    Yoo, Jaiyul; Gong, Jinn-Ouk

    2016-07-01

    We derive the exact third-order analytic solution of the matter density fluctuation in the proper-time hypersurface in a ΛCDM universe, accounting for the explicit time-dependence and clarifying the relation to the initial condition. Furthermore, we compare our analytic solution to the previous calculation in the comoving gauge, and to the standard Newtonian perturbation theory by providing Fourier kernels for the relativistic effects. Our results provide an essential ingredient for a complete description of galaxy bias in the relativistic context.

  13. Analytical solutions for non-linear differential equations with the help of a digital computer

    NASA Technical Reports Server (NTRS)

    Cromwell, P. C.

    1964-01-01

    A technique was developed with the help of a digital computer for analytic (algebraic) solutions of autonomous and nonautonomous equations. Two operational transform techniques have been programmed for the solution of these equations. Only relatively simple nonlinear differential equations have been considered. In the cases considered it has been possible to assimilate the secular terms into the solutions. For cases where f(t) is not a bounded function, a direct series solution is developed which can be shown to be an analytic function. All solutions have been checked against results obtained by numerical integration for given initial conditions and constants. It is evident that certain nonlinear differential equations can be solved with the help of a digital computer.

  14. Stress analysis for multilayered coating systems using semi-analytical BEM with geometric non-linearities

    NASA Astrophysics Data System (ADS)

    Zhang, Yao-Ming; Gu, Yan; Chen, Jeng-Tzong

    2011-05-01

    For a long time, most of the current numerical methods, including the finite element method, have not been efficient to analyze stress fields of very thin structures, such as the problems of thin coatings and their interfacial/internal mechanics. In this paper, the boundary element method for 2-D elastostatic problems is studied for the analysis of multi-coating systems. The nearly singular integrals, which is the primary obstacle associated with the BEM formulations, are dealt with efficiently by using a semi-analytical algorithm. The proposed semi-analytical integral formulas, compared with current analytical methods in the BEM literature, are suitable for high-order geometry elements when nearly singular integrals need to be calculated. Owing to the employment of the curved surface elements, only a small number of elements need to be divided along the boundary, and high accuracy can be achieved without increasing more computational efforts. For the test problems studied, very promising results are obtained when the thickness of coated layers is in the orders of 10-6-10-9, which is sufficient for modeling most coated systems in the micro- or nano-scales.

  15. Construction of approximate analytical solutions to a new class of non-linear oscillator equation

    NASA Technical Reports Server (NTRS)

    Mickens, R. E.; Oyedeji, K.

    1985-01-01

    The principle of harmonic balance is invoked in the development of an approximate analytic model for a class of nonlinear oscillators typified by a mass attached to a stretched wire. By assuming that harmonic balance will hold, solutions are devised for a steady state limit cycle and/or limit point motion. A method of slowly varying amplitudes then allows derivation of approximate solutions by determining the form of the exact solutions and substituting into them the lowest order terms of their respective Fourier expansions. The latter technique is actually a generalization of the method proposed by Kryloff and Bogoliuboff (1943).

  16. Finite size and geometrical non-linear effects during crack pinning by heterogeneities: An analytical and experimental study

    NASA Astrophysics Data System (ADS)

    Vasoya, Manish; Unni, Aparna Beena; Leblond, Jean-Baptiste; Lazarus, Veronique; Ponson, Laurent

    2016-04-01

    Crack pinning by heterogeneities is a central toughening mechanism in the failure of brittle materials. So far, most analytical explorations of the crack front deformation arising from spatial variations of fracture properties have been restricted to weak toughness contrasts using first order approximation and to defects of small dimensions with respect to the sample size. In this work, we investigate the non-linear effects arising from larger toughness contrasts by extending the approximation to the second order, while taking into account the finite sample thickness. Our calculations predict the evolution of a planar crack lying on the mid-plane of a plate as a function of material parameters and loading conditions, especially in the case of a single infinitely elongated obstacle. Peeling experiments are presented which validate the approach and evidence that the second order term broadens its range of validity in terms of toughness contrast values. The work highlights the non-linear response of the crack front to strong defects and the central role played by the thickness of the specimen on the pinning process.

  17. Fault-tolerant software - Experiment with the sift operating system. [Software Implemented Fault Tolerance computer

    NASA Technical Reports Server (NTRS)

    Brunelle, J. E.; Eckhardt, D. E., Jr.

    1985-01-01

    Results are presented of an experiment conducted in the NASA Avionics Integrated Research Laboratory (AIRLAB) to investigate the implementation of fault-tolerant software techniques on fault-tolerant computer architectures, in particular the Software Implemented Fault Tolerance (SIFT) computer. The N-version programming and recovery block techniques were implemented on a portion of the SIFT operating system. The results indicate that, to effectively implement fault-tolerant software design techniques, system requirements will be impacted and suggest that retrofitting fault-tolerant software on existing designs will be inefficient and may require system modification.

  18. A Standard Analytical Quasi Non Linear Module For Hydrologic and Water Resources Systems Simulation

    NASA Astrophysics Data System (ADS)

    Ostrowski, M.; Mehler, R.; Lohr, H.; Lempert, M.

    Hydrologic and water resources systems simulation involves an inhomogeneous set of different types of empirical and less (conceptual) or more physically defined dif- ferential equations. In hydrology these equations have been traditionally defined in a way to provide computationally efficient analytical solutions as well as practically applicable models, which are strongly simplified versions of complex reality. This simplifications have often led to the assumption of linear first order differential equa- tions such as the linear reservoir theory or derivatives thereof. It is evident that new approaches are necessary to close the gap between these simplifying assumptions and more realistic of nonlinear differential equations using the increased computer power. Over a period of several years the authors have developed a generic module for the computationally efficient simulation of nonlinear hydrologic/water resources systems. The module is based on the piecewise linearised nonlinear inhomogeneous differential equation of multiple input/output storage modules. Starting from soil moisture simu- lation the approach has been extendedand applied to other processes such as reservoir systems simulations as well as urban drainage systems analysis. The module has been implemented as a standard module into several practically applied complex simulation packages such as Reservoirs System Operation, GIS-based Catchment Modelling and Urban Pollution Load Modelling The scope of the presentation is to - describe the theoretical and practical background of the simulation module - the range of applicability - give validated examples of application

  19. An analytical study of the endoreversible Curzon-Ahlborn cycle for a non-linear heat transfer law

    NASA Astrophysics Data System (ADS)

    Páez-Hernández, Ricardo T.; Portillo-Díaz, Pedro; Ladino-Luna, Delfino; Ramírez-Rojas, Alejandro; Pacheco-Paez, Juan C.

    2016-01-01

    In the present article, an endoreversible Curzon-Ahlborn engine is studied by considering a non-linear heat transfer law, particularly the Dulong-Petit heat transfer law, using the `componendo and dividendo' rule as well as a simple differentiation to obtain the Curzon-Ahlborn efficiency as proposed by Agrawal in 2009. This rule is actually a change of variable that simplifies a two-variable problem to a one-variable problem. From elemental calculus, we obtain an analytical expression of efficiency and the power output. The efficiency is given only in terms of the temperatures of the reservoirs, such as both Carnot and Curzon-Ahlborn cycles. We make a comparison between efficiencies measured in real power plants and theoretical values from analytical expressions obtained in this article and others found in literature from several other authors. This comparison shows that the theoretical values of efficiency are close to real efficiency, and in some cases, they are exactly the same. Therefore, we can say that the Agrawal method is good in calculating thermal engine efficiencies approximately.

  20. Concatenated codes for fault tolerant quantum computing

    SciTech Connect

    Knill, E.; Laflamme, R.; Zurek, W.

    1995-05-01

    The application of concatenated codes to fault tolerant quantum computing is discussed. We have previously shown that for quantum memories and quantum communication, a state can be transmitted with error {epsilon} provided each gate has error at most c{epsilon}. We show how this can be used with Shor`s fault tolerant operations to reduce the accuracy requirements when maintaining states not currently participating in the computation. Viewing Shor`s fault tolerant operations as a method for reducing the error of operations, we give a concatenated implementation which promises to propagate the reduction hierarchically. This has the potential of reducing the accuracy requirements in long computations.

  1. Fault Tolerant Homopolar Magnetic Bearings

    NASA Technical Reports Server (NTRS)

    Li, Ming-Hsiu; Palazzolo, Alan; Kenny, Andrew; Provenza, Andrew; Beach, Raymond; Kascak, Albert

    2003-01-01

    Magnetic suspensions (MS) satisfy the long life and low loss conditions demanded by satellite and ISS based flywheels used for Energy Storage and Attitude Control (ACESE) service. This paper summarizes the development of a novel MS that improves reliability via fault tolerant operation. Specifically, flux coupling between poles of a homopolar magnetic bearing is shown to deliver desired forces even after termination of coil currents to a subset of failed poles . Linear, coordinate decoupled force-voltage relations are also maintained before and after failure by bias linearization. Current distribution matrices (CDM) which adjust the currents and fluxes following a pole set failure are determined for many faulted pole combinations. The CDM s and the system responses are obtained utilizing 1D magnetic circuit models with fringe and leakage factors derived from detailed, 3D, finite element field models. Reliability results are presented vs. detection/correction delay time and individual power amplifier reliability for 4, 6, and 7 pole configurations. Reliability is shown for two success criteria, i.e. (a) no catcher bearing contact following pole failures and (b) re-levitation off of the catcher bearings following pole failures. An advantage of the method presented over other redundant operation approaches is a significantly reduced requirement for backup hardware such as additional actuators or power amplifiers.

  2. Fault-tolerant PACS server

    NASA Astrophysics Data System (ADS)

    Cao, Fei; Liu, Brent J.; Huang, H. K.; Zhou, Michael Z.; Zhang, Jianguo; Zhang, X. C.; Mogel, Greg T.

    2002-05-01

    Failure of a PACS archive server could cripple an entire PACS operation. Last year we demonstrated that it was possible to design a fault-tolerant (FT) server with 99.999% uptime. The FT design was based on a triple modular redundancy with a simple majority vote to automatically detect and mask a faulty module. The purpose of this presentation is to report on its continuous developments in integrating with external mass storage devices, and to delineate laboratory failover experiments. An FT PACS Simulator with generic PACS software has been used in the experiment. To simulate a PACS clinical operation, image examinations are transmitted continuously from the modality simulator to the DICOM gateway and then to the FT PACS server and workstations. The hardware failures in network, FT server module, disk, RAID, and DLT are manually induced to observe the failover recovery of the FT PACS to resume its normal data flow. We then test and evaluate the FT PACS server in its reliability, functionality, and performance.

  3. Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis

    NASA Technical Reports Server (NTRS)

    Harper, R. E.; Alger, L. S.; Babikyan, C. A.; Butler, B. P.; Friend, S. A.; Ganska, R. J.; Lala, J. H.; Masotto, T. K.; Meyer, A. J.; Morton, D. P.

    1992-01-01

    Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions.

  4. Study of fault-tolerant software technology

    NASA Technical Reports Server (NTRS)

    Slivinski, T.; Broglio, C.; Wild, C.; Goldberg, J.; Levitt, K.; Hitt, E.; Webb, J.

    1984-01-01

    Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance.

  5. Fault-tolerant communication channel structures

    NASA Technical Reports Server (NTRS)

    Alkalai, Leon (Inventor); Chau, Savio N. (Inventor); Tai, Ann T. (Inventor)

    2006-01-01

    Systems and techniques for implementing fault-tolerant communication channels and features in communication systems. Selected commercial-off-the-shelf devices can be integrated in such systems to reduce the cost.

  6. Magnetic levitation-based electromagnetic energy harvesting: a semi-analytical non-linear model for energy transduction

    PubMed Central

    Soares dos Santos, Marco P.; Ferreira, Jorge A. F.; Simões, José A. O.; Pascoal, Ricardo; Torrão, João; Xue, Xiaozheng; Furlani, Edward P.

    2016-01-01

    Magnetic levitation has been used to implement low-cost and maintenance-free electromagnetic energy harvesting. The ability of levitation-based harvesting systems to operate autonomously for long periods of time makes them well-suited for self-powering a broad range of technologies. In this paper, a combined theoretical and experimental study is presented of a harvester configuration that utilizes the motion of a levitated hard-magnetic element to generate electrical power. A semi-analytical, non-linear model is introduced that enables accurate and efficient analysis of energy transduction. The model predicts the transient and steady-state response of the harvester a function of its motion (amplitude and frequency) and load impedance. Very good agreement is obtained between simulation and experiment with energy errors lower than 14.15% (mean absolute percentage error of 6.02%) and cross-correlations higher than 86%. The model provides unique insight into fundamental mechanisms of energy transduction and enables the geometric optimization of harvesters prior to fabrication and the rational design of intelligent energy harvesters. PMID:26725842

  7. Magnetic levitation-based electromagnetic energy harvesting: a semi-analytical non-linear model for energy transduction

    NASA Astrophysics Data System (ADS)

    Soares Dos Santos, Marco P.; Ferreira, Jorge A. F.; Simões, José A. O.; Pascoal, Ricardo; Torrão, João; Xue, Xiaozheng; Furlani, Edward P.

    2016-01-01

    Magnetic levitation has been used to implement low-cost and maintenance-free electromagnetic energy harvesting. The ability of levitation-based harvesting systems to operate autonomously for long periods of time makes them well-suited for self-powering a broad range of technologies. In this paper, a combined theoretical and experimental study is presented of a harvester configuration that utilizes the motion of a levitated hard-magnetic element to generate electrical power. A semi-analytical, non-linear model is introduced that enables accurate and efficient analysis of energy transduction. The model predicts the transient and steady-state response of the harvester a function of its motion (amplitude and frequency) and load impedance. Very good agreement is obtained between simulation and experiment with energy errors lower than 14.15% (mean absolute percentage error of 6.02%) and cross-correlations higher than 86%. The model provides unique insight into fundamental mechanisms of energy transduction and enables the geometric optimization of harvesters prior to fabrication and the rational design of intelligent energy harvesters.

  8. Magnetic levitation-based electromagnetic energy harvesting: a semi-analytical non-linear model for energy transduction.

    PubMed

    Soares Dos Santos, Marco P; Ferreira, Jorge A F; Simões, José A O; Pascoal, Ricardo; Torrão, João; Xue, Xiaozheng; Furlani, Edward P

    2016-01-01

    Magnetic levitation has been used to implement low-cost and maintenance-free electromagnetic energy harvesting. The ability of levitation-based harvesting systems to operate autonomously for long periods of time makes them well-suited for self-powering a broad range of technologies. In this paper, a combined theoretical and experimental study is presented of a harvester configuration that utilizes the motion of a levitated hard-magnetic element to generate electrical power. A semi-analytical, non-linear model is introduced that enables accurate and efficient analysis of energy transduction. The model predicts the transient and steady-state response of the harvester a function of its motion (amplitude and frequency) and load impedance. Very good agreement is obtained between simulation and experiment with energy errors lower than 14.15% (mean absolute percentage error of 6.02%) and cross-correlations higher than 86%. The model provides unique insight into fundamental mechanisms of energy transduction and enables the geometric optimization of harvesters prior to fabrication and the rational design of intelligent energy harvesters. PMID:26725842

  9. Modeling the Fault Tolerant Capability of a Flight Control System: An Exercise in SCR Specification

    NASA Technical Reports Server (NTRS)

    Alexander, Chris; Cortellessa, Vittorio; DelGobbo, Diego; Mili, Ali; Napolitano, Marcello

    2000-01-01

    In life-critical and mission-critical applications, it is important to make provisions for a wide range of contingencies, by providing means for fault tolerance. In this paper, we discuss the specification of a flight control system that is fault tolerant with respect to sensor faults. Redundancy is provided by analytical relations that hold between sensor readings; depending on the conditions, this redundancy can be used to detect, identify and accommodate sensor faults.

  10. Reconfigurable fault tolerant avionics system

    NASA Astrophysics Data System (ADS)

    Ibrahim, M. M.; Asami, K.; Cho, Mengu

    This paper presents the design of a reconfigurable avionics system based on modern Static Random Access Memory (SRAM)-based Field Programmable Gate Array (FPGA) to be used in future generations of nano satellites. A major concern in satellite systems and especially nano satellites is to build robust systems with low-power consumption profiles. The system is designed to be flexible by providing the capability of reconfiguring itself based on its orbital position. As Single Event Upsets (SEU) do not have the same severity and intensity in all orbital locations, having the maximum at the South Atlantic Anomaly (SAA) and the polar cusps, the system does not have to be fully protected all the time in its orbit. An acceptable level of protection against high-energy cosmic rays and charged particles roaming in space is provided within the majority of the orbit through software fault tolerance. Check pointing and roll back, besides control flow assertions, is used for that level of protection. In the minority part of the orbit where severe SEUs are expected to exist, a reconfiguration for the system FPGA is initiated where the processor systems are triplicated and protection through Triple Modular Redundancy (TMR) with feedback is provided. This technique of reconfiguring the system as per the level of the threat expected from SEU-induced faults helps in reducing the average dynamic power consumption of the system to one-third of its maximum. This technique can be viewed as a smart protection through system reconfiguration. The system is built on the commercial version of the (XC5VLX50) Xilinx Virtex5 FPGA on bulk silicon with 324 IO. Simulations of orbit SEU rates were carried out using the SPENVIS web-based software package.

  11. A Fault Tolerant System for an Integrated Avionics Sensor Configuration

    NASA Technical Reports Server (NTRS)

    Caglayan, A. K.; Lancraft, R. E.

    1984-01-01

    An aircraft sensor fault tolerant system methodology for the Transport Systems Research Vehicle in a Microwave Landing System (MLS) environment is described. The fault tolerant system provides reliable estimates in the presence of possible failures both in ground-based navigation aids, and in on-board flight control and inertial sensors. Sensor failures are identified by utilizing the analytic relationships between the various sensors arising from the aircraft point mass equations of motion. The estimation and failure detection performance of the software implementation (called FINDS) of the developed system was analyzed on a nonlinear digital simulation of the research aircraft. Simulation results showing the detection performance of FINDS, using a dual redundant sensor compliment, are presented for bias, hardover, null, ramp, increased noise and scale factor failures. In general, the results show that FINDS can distinguish between normal operating sensor errors and failures while providing an excellent detection speed for bias failures in the MLS, indicated airspeed, attitude and radar altimeter sensors.

  12. Reliability of Fault Tolerant Control Systems. Part 2

    NASA Technical Reports Server (NTRS)

    Wu, N. Eva

    2000-01-01

    This paper reports Part II of a two part effort that is intended to delineate the relationship between reliability and fault tolerant control in a quantitative manner. Reliability properties peculiar to fault-tolerant control systems are emphasized, such as the presence of analytic redundancy in high proportion, the dependence of failures on control performance, and high risks associated with decisions in redundancy management due to multiple sources of uncertainties and sometimes large processing requirements. As a consequence, coverage of failures through redundancy management can be severely limited. The paper proposes to formulate the fault tolerant control problem as an optimization problem that maximizes coverage of failures through redundancy management. Coverage modeling is attempted in a way that captures its dependence on the control performance and on the diagnostic resolution. Under the proposed redundancy management policy, it is shown that an enhanced overall system reliability can be achieved with a control law of a superior robustness, with an estimator of a higher resolution, and with a control performance requirement of a lesser stringency.

  13. Software fault tolerance in computer operating systems

    NASA Technical Reports Server (NTRS)

    Iyer, Ravishankar K.; Lee, Inhwan

    1994-01-01

    This chapter provides data and analysis of the dependability and fault tolerance for three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of defects in system software that result in processor failures. The loose coupling between processors which results in the backup execution (the processor state and the sequence of events occurring) being different from the original execution is a major reason for the measured software fault tolerance. The IBM/MVS system fault tolerance almost doubles when recovery routines are provided, in comparison to the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.

  14. Analysis of fault-tolerant neurocontrol architectures

    NASA Technical Reports Server (NTRS)

    Troudet, T.; Merrill, W.

    1992-01-01

    The fault-tolerance of analog parallel distributed implementations of a multivariable aircraft neurocontroller is analyzed by simulating weight and neuron failures in a simplified scheme of analog processing based on the functional architecture of the ETANN chip (Electrically Trainable Artificial Neural Network). The neural information processing is found to be only partially distributed throughout the set of weights of the neurocontroller synthesized with the backpropagation algorithm. Although the degree of distribution of the neural processing, and consequently the fault-tolerance of the neurocontroller, could be enhanced using Locally Distributed Weight and Neuron Approaches, a satisfactory level of fault-tolerance could only be obtained by retraining the degrated VLSI neurocontroller. The possibility of maintaining neurocontrol performance and stability in the presence of single weight of neuron failures was demonstrated through an automated retraining procedure of the neurocontroller based on a pre-programmed choice and sequence of the training parameters.

  15. Fault-tolerant dynamic task graph scheduling

    SciTech Connect

    Kurt, Mehmet C.; Krishnamoorthy, Sriram; Agrawal, Kunal; Agrawal, Gagan

    2014-11-16

    In this paper, we present an approach to fault tolerant execution of dynamic task graphs scheduled using work stealing. In particular, we focus on selective and localized recovery of tasks in the presence of soft faults. We elicit from the user the basic task graph structure in terms of successor and predecessor relationships. The work stealing-based algorithm to schedule such a task graph is augmented to enable recovery when the data and meta-data associated with a task get corrupted. We use this redundancy, and the knowledge of the task graph structure, to selectively recover from faults with low space and time overheads. We show that the fault tolerant design retains the essential properties of the underlying work stealing-based task scheduling algorithm, and that the fault tolerant execution is asymptotically optimal when task re-execution is taken into account. Experimental evaluation demonstrates the low cost of recovery under various fault scenarios.

  16. Experiments in fault tolerant software reliability

    NASA Technical Reports Server (NTRS)

    Mcallister, David F.; Tai, K. C.; Vouk, Mladen A.

    1987-01-01

    The reliability of voting was evaluated in a fault-tolerant software system for small output spaces. The effectiveness of the back-to-back testing process was investigated. Version 3.0 of the RSDIMU-ATS, a semi-automated test bed for certification testing of RSDIMU software, was prepared and distributed. Software reliability estimation methods based on non-random sampling are being studied. The investigation of existing fault-tolerance models was continued and formulation of new models was initiated.

  17. The cost of software fault tolerance

    NASA Technical Reports Server (NTRS)

    Migneault, G. E.

    1982-01-01

    The proposed use of software fault tolerance techniques as a means of reducing software costs in avionics and as a means of addressing the issue of system unreliability due to faults in software is examined. A model is developed to provide a view of the relationships among cost, redundancy, and reliability which suggests strategies for software development and maintenance which are not conventional.

  18. Towards fault-tolerant optimal control

    NASA Technical Reports Server (NTRS)

    Chizeck, H. J.; Willsky, A. S.

    1979-01-01

    The paper considers the design of fault-tolerant controllers that may endow systems with dynamic reliability. Results for jump linear quadratic Gaussian control problems are extended to include random jump costs, trajectory discontinuities, and a simple case of non-Markovian mode transitions.

  19. A methodology for testing fault-tolerant software

    NASA Technical Reports Server (NTRS)

    Andrews, D. M.; Mahmood, A.; Mccluskey, E. J.

    1985-01-01

    A methodology for testing fault tolerant software is presented. There are problems associated with testing fault tolerant software because many errors are masked or corrected by voters, limiter, or automatic channel synchronization. This methodology illustrates how the same strategies used for testing fault tolerant hardware can be applied to testing fault tolerant software. For example, one strategy used in testing fault tolerant hardware is to disable the redundancy during testing. A similar testing strategy is proposed for software, namely, to move the major emphasis on testing earlier in the development cycle (before the redundancy is in place) thus reducing the possibility that undetected errors will be masked when limiters and voters are added.

  20. The MAFT architecture for distributed fault tolerance

    SciTech Connect

    Kieckhafer, R.M.; Walter, C.J.; Finn, A.M.; Thambidurai, P.M.

    1988-04-01

    This paper describes the Multicomputer Architecture for Fault-Tolerance (MAFT), a distributed system designed to provide extremely reliable computation in real-time control systems. MAFT is based on the physical and functional partitioning of executive functions from application functions. The implementation of the executive functions in a special-purpose hardware processor allows the fault-tolerance functions to be transparent to the application programs and minimizes overhead. Byzantine Agreement and Approximate Agreement algorithms are employed for critical system parameters. MAFT supports the use of multiversion hardware and software to tolerate built-in or generic faults. Graceful degradation and restoration of the application workload is permitted in response to the exclusion and readmission of nodes, respectively.

  1. Fault tolerant GPS/Inertial System design

    NASA Astrophysics Data System (ADS)

    Brown, Alison K.; Sturza, Mark A.; Deangelis, Franco; Lukaszewski, David A.

    The use of a GPS/Inertial integrated system in future launch vehicles motivates the described design of the present fault-tolerant system. The robustness of the navigation system is enhanced by integrating the GPS with an inertial fault-tolerant system. Three layers of failure detection and isolation are incorporated to determine the nature of flaws in the inertial instruments, the GPS receivers, or the integrated navigation solution. The layers are based on: (1) a high-rate parity algorithm for instrument failures; (2) a similar parity algorithm for GPS satellite or receiver failures; and (3) a GPS navigation solution to monitor inertial navigation failures. Dual failures of any system component can occur in any system component without affecting the performance of launch-vehicle navigation or guidance.

  2. Performance Analysis on Fault Tolerant Control System

    NASA Technical Reports Server (NTRS)

    Shin, Jong-Yeob; Belcastro, Christine

    2005-01-01

    In a fault tolerant control (FTC) system, a parameter varying FTC law is reconfigured based on fault parameters estimated by fault detection and isolation (FDI) modules. FDI modules require some time to detect fault occurrences in aero-vehicle dynamics. In this paper, an FTC analysis framework is provided to calculate the upper bound of an induced-L(sub 2) norm of an FTC system with existence of false identification and detection time delay. The upper bound is written as a function of a fault detection time and exponential decay rates and has been used to determine which FTC law produces less performance degradation (tracking error) due to false identification. The analysis framework is applied for an FTC system of a HiMAT (Highly Maneuverable Aircraft Technology) vehicle. Index Terms fault tolerant control system, linear parameter varying system, HiMAT vehicle.

  3. Fault-tolerant electrical power system

    NASA Astrophysics Data System (ADS)

    Mehdi, Ishaque S.; Weimer, Joseph A.

    1987-10-01

    An electrical system that will meet the requirements of a 1990s two-engine fighter is being developed in the Fault-Tolerant Electrical Power System (FTEPS) program, sponsored by the AFWAL Aero Propulsion Laboratory. FTEPS will demonstrate the generation and distribution of fault-tolerant, reliable, electrical power required for future aircraft. The system incorporates MIL-STD-1750A digital processors and MIL-STD-1553B data buses for control and communications. Electrical power is distributed through electrical load management centers by means of solid-state power controllers for fault protection and individual load control. The system will provide uninterruptible power to flight-critical loads such as the flight control and mission computers with sealed lead-acid batteries. Primary power is provided by four 60 kVA variable speed constant frequency generators. Buildup and testing of the FTEPS demonstrator is expected to be complete by May 1988.

  4. Parametric Modeling and Fault Tolerant Control

    NASA Technical Reports Server (NTRS)

    Wu, N. Eva; Ju, Jianhong

    2000-01-01

    Fault tolerant control is considered for a nonlinear aircraft model expressed as a linear parameter-varying system. By proper parameterization of foreseeable faults, the linear parameter-varying system can include fault effects as additional varying parameters. A recently developed technique in fault effect parameter estimation allows us to assume that estimates of the fault effect parameters are available on-line. Reconfigurability is calculated for this model with respect to the loss of control effectiveness to assess the potentiality of the model to tolerate such losses prior to control design. The control design is carried out by applying a polytopic method to the aircraft model. An error bound on fault effect parameter estimation is provided, within which the Lyapunov stability of the closed-loop system is robust. Our simulation results show that as long as the fault parameter estimates are sufficiently accurate, the polytopic controller can provide satisfactory fault-tolerance.

  5. A Unified Fault-Tolerance Protocol

    NASA Technical Reports Server (NTRS)

    Miner, Paul; Gedser, Alfons; Pike, Lee; Maddalon, Jeffrey

    2004-01-01

    Davies and Wakerly show that Byzantine fault tolerance can be achieved by a cascade of broadcasts and middle value select functions. We present an extension of the Davies and Wakerly protocol, the unified protocol, and its proof of correctness. We prove that it satisfies validity and agreement properties for communication of exact values. We then introduce bounded communication error into the model. Inexact communication is inherent for clock synchronization protocols. We prove that validity and agreement properties hold for inexact communication, and that exact communication is a special case. As a running example, we illustrate the unified protocol using the SPIDER family of fault-tolerant architectures. In particular we demonstrate that the SPIDER interactive consistency, distributed diagnosis, and clock synchronization protocols are instances of the unified protocol.

  6. A Primer on Architectural Level Fault Tolerance

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.

    2008-01-01

    This paper introduces the fundamental concepts of fault tolerant computing. Key topics covered are voting, fault detection, clock synchronization, Byzantine Agreement, diagnosis, and reliability analysis. Low level mechanisms such as Hamming codes or low level communications protocols are not covered. The paper is tutorial in nature and does not cover any topic in detail. The focus is on rationale and approach rather than detailed exposition.

  7. A dual, fault-tolerant aerospace actuator

    NASA Technical Reports Server (NTRS)

    Siebert, C. J.

    1985-01-01

    The requirements for mechanisms used in the Space Transportation System (STS) are to provide dual fault tolerance, and if the payload equipment violates the Shuttle bay door envelope, these deployment/restow mechanisms must have independent primary and backup features. The research and development of an electromechanical actuator that meets these requirements and will be used on the Transfer Orbit Stage (TOS) program is described.

  8. Fault-tolerant architectures for superconducting qubits

    NASA Astrophysics Data System (ADS)

    DiVincenzo, David P.

    2009-12-01

    In this short review, I draw attention to new developments in the theory of fault tolerance in quantum computation that may give concrete direction to future work in the development of superconducting qubit systems. The basics of quantum error-correction codes, which I will briefly review, have not significantly changed since their introduction 15 years ago. But an interesting picture has emerged of an efficient use of these codes that may put fault-tolerant operation within reach. It is now understood that two-dimensional surface codes, close relatives of the original toric code of Kitaev, can be adapted as shown by Raussendorf and Harrington to effectively perform logical gate operations in a very simple planar architecture, with error thresholds for fault-tolerant operation simulated to be 0.75%. This architecture uses topological ideas in its functioning, but it is not 'topological quantum computation'—there are no non-abelian anyons in sight. I offer some speculations on the crucial pieces of superconducting hardware that could be demonstrated in the next couple of years that would be clear stepping stones towards this surface-code architecture.

  9. Interstitial fault tolerance-a technique for making systolic arrays fault tolerant

    SciTech Connect

    Kuhn, R.H.

    1983-01-01

    Systolic arrays are a popular model for the implementation of highly parallel VLSI systems. In this paper interstitial fault tolerance (IFT), a technique for incorporating fault tolerance into systolic arrays in a natural manner, is discussed. IFT can be used for reliable computation or for yield enhancement. Previous fault tolerance techniques for reliable computation on SIMD systems have employed redundant hardware. IFT on the other hand employs time redundancy. Previous wafer scale integration techniques for yield enhancement have been proposed only for linear processing element arrays. Ift is effective for both linear and two dimensional arrays. The time redundancy to achieve IFT is shown to be bounded by a factor of 3, allowing no processor redundancy. Results of monte carlo simulation of ift are presented. 19 references.

  10. An accurate analytic approximation to the non-linear change in volume of solids with applied pressure

    NASA Technical Reports Server (NTRS)

    Schlosser, Herbert; Ferrante, John

    1989-01-01

    An accurate analytic expression for the nonlinear change of the volume of a solid as a function of applied pressure is of great interest in high-pressure experimentation. It is found that a two-parameter analytic expression, fits the experimental volume-change data to within a few percent over the entire experimentally attainable pressure range. Results are presented for 24 different materials including metals, ceramic semiconductors, polymers, and ionic and rare-gas solids.

  11. Fault-Tolerant Coding for State Machines

    NASA Technical Reports Server (NTRS)

    Naegle, Stephanie Taft; Burke, Gary; Newell, Michael

    2008-01-01

    Two reliable fault-tolerant coding schemes have been proposed for state machines that are used in field-programmable gate arrays and application-specific integrated circuits to implement sequential logic functions. The schemes apply to strings of bits in state registers, which are typically implemented in practice as assemblies of flip-flop circuits. If a single-event upset (SEU, a radiation-induced change in the bit in one flip-flop) occurs in a state register, the state machine that contains the register could go into an erroneous state or could hang, by which is meant that the machine could remain in undefined states indefinitely. The proposed fault-tolerant coding schemes are intended to prevent the state machine from going into an erroneous or hang state when an SEU occurs. To ensure reliability of the state machine, the coding scheme for bits in the state register must satisfy the following criteria: 1. All possible states are defined. 2. An SEU brings the state machine to a known state. 3. There is no possibility of a hang state. 4. No false state is entered. 5. An SEU exerts no effect on the state machine. Fault-tolerant coding schemes that have been commonly used include binary encoding and "one-hot" encoding. Binary encoding is the simplest state machine encoding and satisfies criteria 1 through 3 if all possible states are defined. Binary encoding is a binary count of the state machine number in sequence; the table represents an eight-state example. In one-hot encoding, N bits are used to represent N states: All except one of the bits in a string are 0, and the position of the 1 in the string represents the state. With proper circuit design, one-hot encoding can satisfy criteria 1 through 4. Unfortunately, the requirement to use N bits to represent N states makes one-hot coding inefficient.

  12. Fault tolerant massively parallel processing architecture

    SciTech Connect

    Balasubramanian, V.; Banerjee, P.

    1987-08-01

    This paper presents two massively parallel processing architectures suitable for solving a wide variety of algorithms of divide-and-conquer type for problems such as the discrete Fourier transform, production systems, design automation, and others. The first architecture, called the Chain-structured Butterfly ARchitecture (CBAR), consists of a two-dimensional array of N-L . (log/sub 2/(L)+1) processing elements (PE) organized as L levels of log/sub 2/(L)+1 stages, and which has the butterfly connection between PEs in consecutive stages with straight-through feedback between PEs in the last and first stages. This connection system has the desirable property of allowing thousands of PEs to be connected with O(N) connection cost, O(log/sub 2/(N/log/sub 2/N)) communication paths, and a small number (=4) of I/O ports per PE. However, this architecture is not fault tolerant. The authors, therefore, propose a second architecture, called the REconfigurable Chain-structured Butterfly ARchitecture (RECBAR), which is a modified version of the CBAR. The RECBAR possesses all the desirable features of the CBAR, with the number of I/O ports per PE increased to six, and uses O(log/sub 2/N)/N) overhead in PEs and approximately 50% overhead in links to achieve single-level fault tolerance. Reliability improvements of the RECBAR over the CBAR are studied. This paper also presents a distributed diagnostic and structuring algorithm for the RECBAR that enables the architecture to detect faults and structure itself accordingly within 2 . log/sub 2/(L)+1 time steps, thus making it a truly fault tolerant architecture.

  13. A fault tolerant 80960 engine controller

    NASA Technical Reports Server (NTRS)

    Reichmuth, D. M.; Gage, M. L.; Paterson, E. S.; Kramer, D. D.

    1993-01-01

    The paper describes the design of the 80960 Fault Tolerant Engine Controller for the supervision of engine operations, which was designed for the NASA Marshall Space Center. Consideration is given to the major electronic components of the controller, including the engine controller, effectors, and the sensors, as well as to the controller hardware, the controller module and the communications module, and the controller software. The architecture of the controller hardware allows modifications to be made to fit the requirements of any new propulsion systems. Multiple flow diagrams are presented illustrating the controller's operations.

  14. Software fault tolerance using data diversity

    NASA Technical Reports Server (NTRS)

    Knight, John C.

    1991-01-01

    Research on data diversity is discussed. Data diversity relies on a different form of redundancy from existing approaches to software fault tolerance and is substantially less expensive to implement. Data diversity can also be applied to software testing and greatly facilitates the automation of testing. Up to now it has been explored both theoretically and in a pilot study, and has been shown to be a promising technique. The effectiveness of data diversity as an error detection mechanism and the application of data diversity to differential equation solvers are discussed.

  15. FTAPE: A fault injection tool to measure fault tolerance

    NASA Astrophysics Data System (ADS)

    Tsai, Timothy K.; Iyer, Ravishankar K.

    1994-07-01

    The paper introduces FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The tool combines system-wide fault injection with a controllable workload. A workload generator is used to create high stress conditions for the machine. Faults are injected based on this workload activity in order to ensure a high level of fault propagation. The errors/fault ratio and performance degradation are presented as measures of fault tolerance.

  16. FTAPE: A fault injection tool to measure fault tolerance

    NASA Technical Reports Server (NTRS)

    Tsai, Timothy K.; Iyer, Ravishankar K.

    1994-01-01

    The paper introduces FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The tool combines system-wide fault injection with a controllable workload. A workload generator is used to create high stress conditions for the machine. Faults are injected based on this workload activity in order to ensure a high level of fault propagation. The errors/fault ratio and performance degradation are presented as measures of fault tolerance.

  17. Method and system for environmentally adaptive fault tolerant computing

    NASA Technical Reports Server (NTRS)

    Copenhaver, Jason L. (Inventor); Jeremy, Ramos (Inventor); Wolfe, Jeffrey M. (Inventor); Brenner, Dean (Inventor)

    2010-01-01

    A method and system for adapting fault tolerant computing. The method includes the steps of measuring an environmental condition representative of an environment. An on-board processing system's sensitivity to the measured environmental condition is measured. It is determined whether to reconfigure a fault tolerance of the on-board processing system based in part on the measured environmental condition. The fault tolerance of the on-board processing system may be reconfigured based in part on the measured environmental condition.

  18. Fabrication of fault-tolerant systolic array processors

    SciTech Connect

    Golovko, V.A.

    1995-05-01

    Methods for designing fault-tolerant systolic array processors are discussed. Several ways of bypassing faulty elements in configurations, which depend on an input-data flow organization, are suggested. An analysis of the additional hardware costs of providing fault tolerance by various techniques and for various levels of redundancy is presented. Hadamard fault-tolerant processor design was used to illustrate the efficiency of the techniques suggested.

  19. FTAPE: A fault injection tool to measure fault tolerance

    NASA Technical Reports Server (NTRS)

    Tsai, Timothy K.; Iyer, Ravishankar K.

    1995-01-01

    The paper introduces FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The tool combines system-wide fault injection with a controllable workload. A workload generator is used to create high stress conditions for the machine. Faults are injected based on this workload activity in order to ensure a high level of fault propagation. The errors/fault ratio and performance degradation are presented as measures of fault tolerance.

  20. Coordinated Fault Tolerance for High-Performance Computing

    SciTech Connect

    Dongarra, Jack; Bosilca, George; et al.

    2013-04-08

    Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack—from the application, math libraries, and programming language runtime to other common system software such as jobs schedulers, resource managers, and monitoring tools.

  1. Fault Tolerant Magnetic Bearing for Turbomachinery

    NASA Technical Reports Server (NTRS)

    Choi, Benjamin; Provenza, Andrew

    2001-01-01

    NASA Glenn Research Center (GRC) has developed a Fault-Tolerant Magnetic Bearing Suspension rig to enhance the bearing system safety. It successfully demonstrated that using only two active poles out of eight redundant poles from each radial bearing (that is, simply 12 out of 16 poles dead) levitated the rotor and spun it without losing stability and desired position up to the maximum allowable speed of 20,000 rpm. In this paper, it is demonstrated that as far as the summation of force vectors of the attracting poles and rotor weight is zero, a fault-tolerant magnetic bearing system maintained the rotor at the desired position without losing stability even at the maximum rotor speed. A proportional-integral-derivative (PID) controller generated autonomous corrective actions with no operator's input for the fault situations without losing load capacity in terms of rotor position. This paper also deals with a centralized modal controller to better control the dynamic behavior over system modes.

  2. Fault detection and fault tolerance in robotics

    NASA Technical Reports Server (NTRS)

    Visinsky, Monica; Walker, Ian D.; Cavallaro, Joseph R.

    1992-01-01

    Robots are used in inaccessible or hazardous environments in order to alleviate some of the time, cost and risk involved in preparing men to endure these conditions. In order to perform their expected tasks, the robots are often quite complex, thus increasing their potential for failures. If men must be sent into these environments to repair each component failure in the robot, the advantages of using the robot are quickly lost. Fault tolerant robots are needed which can effectively cope with failures and continue their tasks until repairs can be realistically scheduled. Before fault tolerant capabilities can be created, methods of detecting and pinpointing failures must be perfected. This paper develops a basic fault tree analysis of a robot in order to obtain a better understanding of where failures can occur and how they contribute to other failures in the robot. The resulting failure flow chart can also be used to analyze the resiliency of the robot in the presence of specific faults. By simulating robot failures and fault detection schemes, the problems involved in detecting failures for robots are explored in more depth.

  3. Fault tolerance analysis and applications to microwave modules and MMIC's

    NASA Astrophysics Data System (ADS)

    Boggan, Garry H.

    A project whose objective was to provide an overview of built-in-test (BIT) considerations applicable to microwave systems, modules, and MMICs (monolithic microwave integrated circuits) is discussed. Available analytical techniques and software for assessing system failure characteristics were researched, and the resulting investigation provides a review of two techniques which have applicability to microwave systems design. A system-level approach to fault tolerance and redundancy management is presented in its relationship to the subsystem/element design. An overview of the microwave BIT focus from the Air Force Integrated Diagnostics program is presented. The technical reports prepared by the GIMADS team were reviewed for applicability to microwave modules and components. A review of MIMIC (millimeter and microwave integrated circuit) program activities relative to BIT/BITE is given.

  4. Architectural issues in fault-tolerant, secure computing systems

    SciTech Connect

    Joseph, M.K.

    1988-01-01

    This dissertation explores several facets of the applicability of fault-tolerance techniques to secure computer design, these being: (1) how fault-tolerance techniques can be used on unsolved problems in computer security (e.g., computer viruses, and denial-of-service); (2) how fault-tolerance techniques can be used to support classical computer-security mechanisms in the presence of accidental and deliberate faults; and (3) the problems involved in designing a fault-tolerant, secure computer system (e.g., how computer security can degrade along with both the computational and fault-tolerance capabilities of a computer system). The approach taken in this research is almost as important as its results. It is different from current computer-security research in that a design paradigm for fault-tolerant computer design is used. This led to an extensive fault and error classification of many typical security threats. Throughout this work, a fault-tolerance perspective is taken. However, the author did not ignore basic computer-security technology. For some problems he investigated how to support and extend basic-security mechanism (e.g., trusted computing base), instead of trying to achieve the same result with purely fault-tolerance techniques.

  5. Fault tree models for fault tolerant hypercube multiprocessors

    NASA Technical Reports Server (NTRS)

    Boyd, Mark A.; Tuazon, Jezus O.

    1991-01-01

    Three candidate fault tolerant hypercube architectures are modeled, their reliability analyses are compared, and the resulting implications of these methods of incorporating fault tolerance into hypercube multiprocessors are discussed. In the course of performing the reliability analyses, the use of HARP and fault trees in modeling sequence dependent system behaviors is demonstrated.

  6. Method and apparatus for fault tolerance

    NASA Technical Reports Server (NTRS)

    Masson, Gerald M. (Inventor); Sullivan, Gregory F. (Inventor)

    1993-01-01

    A method and apparatus for achieving fault tolerance in a computer system having at least a first central processing unit and a second central processing unit. The method comprises the steps of first executing a first algorithm in the first central processing unit on input which produces a first output as well as a certification trail. Next, executing a second algorithm in the second central processing unit on the input and on at least a portion of the certification trail which produces a second output. The second algorithm has a faster execution time than the first algorithm for a given input. Then, comparing the first and second outputs such that an error result is produced if the first and second outputs are not the same. The step of executing a first algorithm and the step of executing a second algorithm preferably takes place over essentially the same time period.

  7. Fault-tolerant software for the FIMP

    NASA Technical Reports Server (NTRS)

    Hecht, H.; Hecht, M.

    1984-01-01

    The work reported here provides protection against software failures in the task dispatcher of the FTMP, a particularly critical portion of the system software. Faults in other system modules and application programs can be handled by similar techniques but are not covered in this effort. Goals of the work reported here are: (1) to develop provisions in the software design that will detect and mitigate software failures in the dispatcher portion of the FTMP Executive and, (2) to propose the implementation of specific software reliability measures in other parts of the system. Beyond the specific support to the FTMP project, the work reported here represents a considerable advance in the practical application of the recovery block methodology for fault tolerant software design.

  8. Fault Tolerance and Parallel Processing for NGST

    NASA Astrophysics Data System (ADS)

    Sengupta, R.; Offenberg, J. D.; Fixsen, D. J.; Nieto-Santisteban, M. A.; Hanisch, R. J.; Stockman, H. S.; Mather, J. C.

    1999-12-01

    The Next Generation Space Telescope (NGST) Image Processing Group is developing scalable cosmic ray rejection and data compression algorithms for parallel processors as part of NASA's Remote Exploration and Experimentation (REE) Project. The primary intention of the REE project is to use commercial-off-the shelf (COTS) technology to develop scalable, low-power, fault tolerant, high performance computers in space. NGST is one of the applications selected to demonstrate the benefit of having on-board supercomputing power. Real-time cosmic ray rejection would enable us to reduce the downlink data volume by as much as two orders of magnitude by combining multiple read-outs on the spacecraft rather than downlinking them separately. The combined read-outs can be further reduced in size by applying lossy and/or lossless data compression algorithms. This work is funded by NASA's REE project, managed by JPL.

  9. Quantum fault-tolerant thresholds for universal concatenated schemes

    NASA Astrophysics Data System (ADS)

    Chamberland, Christopher; Jochym-O'Connor, Tomas; Laflamme, Raymond

    Fault-tolerant quantum computation uses ancillary qubits in order to protect logical data qubits while allowing for the manipulation of the quantum information without severe losses in coherence. While different models for fault-tolerant quantum computation exist, determining the ancillary qubit overhead for competing schemes remains a challenging theoretical problem. In this work, we study the fault-tolerance threshold rates of different models for universal fault-tolerant quantum computation. Namely, we provide different threshold rates for the 105-qubit concatenated coding scheme for universal computation without the need for state distillation. We study two error models: adversarial noise and depolarizing noise and provide lower bounds for the threshold in each of these error regimes. Establishing the threshold rates for the concatenated coding scheme will allow for a physical quantum resource comparison between our fault-tolerant universal quantum computation model and the traditional model using magic state distillation.

  10. Fault tolerant operation of switched reluctance machine

    NASA Astrophysics Data System (ADS)

    Wang, Wei

    The energy crisis and environmental challenges have driven industry towards more energy efficient solutions. With nearly 60% of electricity consumed by various electric machines in industry sector, advancement in the efficiency of the electric drive system is of vital importance. Adjustable speed drive system (ASDS) provides excellent speed regulation and dynamic performance as well as dramatically improved system efficiency compared with conventional motors without electronics drives. Industry has witnessed tremendous grow in ASDS applications not only as a driving force but also as an electric auxiliary system for replacing bulky and low efficiency auxiliary hydraulic and mechanical systems. With the vast penetration of ASDS, its fault tolerant operation capability is more widely recognized as an important feature of drive performance especially for aerospace, automotive applications and other industrial drive applications demanding high reliability. The Switched Reluctance Machine (SRM), a low cost, highly reliable electric machine with fault tolerant operation capability, has drawn substantial attention in the past three decades. Nevertheless, SRM is not free of fault. Certain faults such as converter faults, sensor faults, winding shorts, eccentricity and position sensor faults are commonly shared among all ASDS. In this dissertation, a thorough understanding of various faults and their influence on transient and steady state performance of SRM is developed via simulation and experimental study, providing necessary knowledge for fault detection and post fault management. Lumped parameter models are established for fast real time simulation and drive control. Based on the behavior of the faults, a fault detection scheme is developed for the purpose of fast and reliable fault diagnosis. In order to improve the SRM power and torque capacity under faults, the maximum torque per ampere excitation are conceptualized and validated through theoretical analysis and

  11. Measuring fault tolerance with the FTAPE fault injection tool

    NASA Astrophysics Data System (ADS)

    Tsai, Timothy K.; Iyer, Ravishankar K.

    1995-05-01

    This paper describes FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The major parts of the tool include a system-wide fault-injector, a workload generator, and a workload activity measurement tool. The workload creates high stress conditions on the machine. Using stress-based injection, the fault injector is able to utilize knowledge of the workload activity to ensure a high level of fault propagation. The errors/fault ratio, performance degradation, and number of system crashes are presented as measures of fault tolerance.

  12. Measuring fault tolerance with the FTAPE fault injection tool

    NASA Technical Reports Server (NTRS)

    Tsai, Timothy K.; Iyer, Ravishankar K.

    1995-01-01

    This paper describes FTAPE (Fault Tolerance And Performance Evaluator), a tool that can be used to compare fault-tolerant computers. The major parts of the tool include a system-wide fault-injector, a workload generator, and a workload activity measurement tool. The workload creates high stress conditions on the machine. Using stress-based injection, the fault injector is able to utilize knowledge of the workload activity to ensure a high level of fault propagation. The errors/fault ratio, performance degradation, and number of system crashes are presented as measures of fault tolerance.

  13. A fault-tolerant network architecture for integrated avionics

    NASA Technical Reports Server (NTRS)

    Butler, Bryan; Adams, Stuart

    1991-01-01

    The Army Fault-Tolerant Architecture (AFTA) under construction at the Charles Stark Draper Laboratory is an example of a highly integrated critical avionics system. The AFTA system must connect to other redundant and nonredundant systems, as well as to input/output devices. A fault-tolerant data bus (FTDB) is being developed to provide highly reliable communication between the AFTA computer and other network stations. The FTDB is being designed for Byzantine resilience and is probably capable of tolerating any single arbitrary fault. The author describes a prototype architecture for the fault-tolerant data bus.

  14. Model-Based Fault Tolerant Control

    NASA Technical Reports Server (NTRS)

    Kumar, Aditya; Viassolo, Daniel

    2008-01-01

    The Model Based Fault Tolerant Control (MBFTC) task was conducted under the NASA Aviation Safety and Security Program. The goal of MBFTC is to develop and demonstrate real-time strategies to diagnose and accommodate anomalous aircraft engine events such as sensor faults, actuator faults, or turbine gas-path component damage that can lead to in-flight shutdowns, aborted take offs, asymmetric thrust/loss of thrust control, or engine surge/stall events. A suite of model-based fault detection algorithms were developed and evaluated. Based on the performance and maturity of the developed algorithms two approaches were selected for further analysis: (i) multiple-hypothesis testing, and (ii) neural networks; both used residuals from an Extended Kalman Filter to detect the occurrence of the selected faults. A simple fusion algorithm was implemented to combine the results from each algorithm to obtain an overall estimate of the identified fault type and magnitude. The identification of the fault type and magnitude enabled the use of an online fault accommodation strategy to correct for the adverse impact of these faults on engine operability thereby enabling continued engine operation in the presence of these faults. The performance of the fault detection and accommodation algorithm was extensively tested in a simulation environment.

  15. FTMP (Fault Tolerant Multiprocessor) programmer's manual

    NASA Technical Reports Server (NTRS)

    Feather, F. E.; Liceaga, C. A.; Padilla, P. A.

    1986-01-01

    The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high level language called Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.

  16. Fault-tolerant multichannel demultiplexer subsystems

    NASA Technical Reports Server (NTRS)

    Redinbo, Robert

    1991-01-01

    Fault tolerance in future processing and switching communication satellites is addressed by showing new methods for detecting hardware failures in the first major subsystem, the multichannel demultiplexer. An efficient method for demultiplexing frequency slotted channels uses multirate filter banks which contain fast Fourier transform processing. All numerical processing is performed at a lower rate commensurate with the small bandwidth of each bandbase channel. The integrity of the demultiplexing operations is protected by using real number convolutional codes to compute comparable parity values which detect errors at the data sample level. High rate, systematic convolutional codes produce parity values at a much reduced rate, and protection is achieved by generating parity values in two ways and comparing them. Parity values corresponding to each output channel are generated in parallel by a subsystem, operating even slower and in parallel with the demultiplexer that is virtually identical to the original structure. These parity calculations may be time shared with the same processing resources because they are so similar.

  17. Fault-tolerance for exascale systems.

    SciTech Connect

    Riesen, Rolf E.; Varela, Maria Ruiz; Ferreira, Kurt Brian

    2010-08-01

    Periodic, coordinated, checkpointing to disk is the most prevalent fault tolerance method used in modern large-scale, capability class, high-performance computing (HPC) systems. Previous work has shown that as the system grows in size, the inherent synchronization of coordinated checkpoint/restart (CR) limits application scalability; at large node counts the application spends most of its time checkpointing instead of executing useful work. Furthermore, a single component failure forces an application restart from the last correct checkpoint. Suggested alternatives to coordinated CR include uncoordinated CR with message logging, redundant computation, and RAID-inspired, in-memory distributed checkpointing schemes. Each of these alternatives have differing overheads that are dependent on both the scale and communication characteristics of the application. In this work, using the Structural Simulation Toolkit (SST) simulator, we compare the performance characteristics of each of these resilience methods for a number of HPC application patterns on a number of proposed exascale machines. The result of this work provides valuable guidance on the most efficient resilience methods for exascale systems.

  18. An analytical coupled technique for solving nonlinear large-amplitude oscillation of a conservative system with inertia and static non-linearity.

    PubMed

    Razzak, Md Abdur; Alam, Md Shamsul

    2016-01-01

    Based on a new trial function, an analytical coupled technique (a combination of homotopy perturbation method and variational method) is presented to obtain the approximate frequencies and the corresponding periodic solutions of the free vibration of a conservative oscillator having inertia and static non-linearities. In some of the previous articles, the first and second-order approximations have been determined by the same method of such nonlinear oscillator, but the trial functions have not been satisfied the initial conditions. It seemed to be a big shortcoming of those articles. The new trial function of this paper overcomes aforementioned limitation. The first-order approximation is mainly considered in this paper. The main advantage of this present paper is, the first-order approximation gives better result than other existing second-order harmonic balance methods. The present method is valid for large amplitudes of oscillation. The absolute relative error measures (first-order approximate frequency) in this paper is 0.00 % for large amplitude A = 1000, while the relative error gives two different second-order harmonic balance methods: 10.33 and 3.72 %. Thus the present method is suitable for solving the above-mentioned nonlinear oscillator. PMID:27119060

  19. Fault tolerant architectures for integrated aircraft electronics systems

    NASA Technical Reports Server (NTRS)

    Levitt, K. N.; Melliar-Smith, P. M.; Schwartz, R. L.

    1983-01-01

    Work into possible architectures for future flight control computer systems is described. Ada for Fault-Tolerant Systems, the NETS Network Error-Tolerant System architecture, and voting in asynchronous systems are covered.

  20. Fault-tolerance - The survival attribute of digital systems

    NASA Technical Reports Server (NTRS)

    Avizienis, A.

    1978-01-01

    Fault-tolerance is the architectural attribute of a digital system that keeps the logic machine doing its specified tasks when its host, the physical system, suffers various kinds of failures of its components. A more general concept of fault-tolerance also includes human mistakes committed during software and hardware implementation and during man/machine interaction among the causes of faults that are to be tolerated by the logic machine. This paper discusses the concept of fault-tolerance, the reasons for its inclusion in digital system architecture, and the methods of its implementation. A chronological view of the evolution of fault-tolerant systems and an outline of some goals for its further development conclude the presentation.

  1. Analysis of typical fault-tolerant architectures using HARP

    NASA Technical Reports Server (NTRS)

    Bavuso, Salvatore J.; Bechta Dugan, Joanne; Trivedi, Kishor S.; Rothmann, Elizabeth M.; Smith, W. Earl

    1987-01-01

    Difficulties encountered in the modeling of fault-tolerant systems are discussed. The Hybrid Automated Reliability Predictor (HARP) approach to modeling fault-tolerant systems is described. The HARP is written in FORTRAN, consists of nearly 30,000 lines of codes and comments, and is based on behavioral decomposition. Using the behavioral decomposition, the dependability model is divided into fault-occurrence/repair and fault/error-handling models; the characteristics and combining of these two models are examined. Examples in which the HARP is applied to the modeling of some typical fault-tolerant systems, including a local-area network, two fault-tolerant computer systems, and a flight control system, are presented.

  2. Programming fault-tolerant distributed systems in Ada

    NASA Technical Reports Server (NTRS)

    Voigt, Susan J.

    1985-01-01

    Viewgraphs on the topic of programming fault-tolerant distributed systems in the Ada programming language are presented. Topics covered include project goals, Ada difficulties and solutions, testbed requirements, and virtual processors.

  3. Fault-tolerant interconnection networks for multiprocessor systems

    SciTech Connect

    Nassar, H.M.

    1989-01-01

    Interconnection networks represent the backbone of multiprocessor systems. A failure in the network, therefore, could seriously degrade the system performance. For this reason, fault tolerance has been regarded as a major consideration in interconnection network design. This thesis presents two novel techniques to provide fault tolerance capabilities to three major networks: the Beneline network and the Clos network. First, the Simple Fault Tolerance Technique (SFT) is presented. The SFT technique is in fact the result of merging two widely known interconnection mechanisms: a normal interconnection network and a shared bus. This technique is most suitable for networks with small switches, such as the Baseline network and the Benes network. For the Clos network, whose switches may be large for the SFT, another technique is developed to produce the Fault-Tolerant Clos (FTC) network. In the FTC, one switch is added to each stage. The two techniques are described and thoroughly analyzed.

  4. Application-Specific Fault Tolerance via Data Access Characterization

    SciTech Connect

    Ali, Nawab; Krishnamoorthy, Sriram; Govind, Niranjan; Kowalski, Karol; Sadayappan, Ponnuswamy

    2011-08-30

    Recent trends in semiconductor technology and supercomputer design predict an increasing probability of faults during an application's execution. Designing an application that is resilient to system failures requires careful evaluation of the impact of various approaches on preserving key application state. In this paper, we present our experiences in an ongoing effort to make a large computational chemistry application fault tolerant. We construct the data access signatures of key application modules to evaluate alternative fault tolerance approaches. We present the instrumentation methodology, characterization of the application modules, and evaluation of fault tolerance techniques using the information collected. The application signatures developed capture application characteristics not traditionally revealed by performance tools. We believe these can be used in the design and evaluation of runtimes beyond fault tolerance.

  5. Fault tolerance analysis of the class of rearrangeable interconnection networks

    SciTech Connect

    Pakzad, S. . Dept. of Electrical Engineering)

    1989-08-01

    This paper analyzes the fault tolerance characteristics of a range or rearrangeable {beta}-networks based on the concepts and the framework developed by S. Pakzad and S. Lakshmivarahan. These rearrangeable {beta}-networks include the Benes network, the Waksman network, the Joel network, and the serial network. In addition, this paper presents a comparative analysis of the aforementioned networks according to their hardware cost, performance, and degree of fault tolerance.

  6. Fault tolerant programmable digital attitude control electronics study

    NASA Technical Reports Server (NTRS)

    Sorensen, A. A.

    1974-01-01

    The attitude control electronics mechanization study to develop a fault tolerant autonomous concept for a three axis system is reported. Programmable digital electronics are compared to general purpose digital computers. The requirements, constraints, and tradeoffs are discussed. It is concluded that: (1) general fault tolerance can be achieved relatively economically, (2) recovery times of less than one second can be obtained, (3) the number of faulty behavior patterns must be limited, and (4) adjoined processes are the best indicators of faulty operation.

  7. On the design of fault-tolerant robotic manipulator systems

    NASA Astrophysics Data System (ADS)

    Tesar, Delbert

    1993-02-01

    Robotic systems are finding increasing use in space applications. Many of these devices are going to be operational on board the Space Station Freedom. Fault tolerance has been deemed necessary because of the criticality of the tasks and the inaccessibility of the systems to maintenance and repair. Design for fault tolerance in manipulator systems is an area within robotics that is without precedence in the literature. In this paper, we will attempt to lay down the foundations for such a technology. Design for fault tolerance demands new and special approaches to design, often at considerable variance from established design practices. These design aspects, together with reliability evaluation and modeling tools, are presented. Mechanical architectures that employ protective redundancies at many levels and have a modular architecture are then studied in detail. Once a mechanical architecture for fault tolerance has been derived, the chronological stages of operational fault tolerance are investigated. Failure detection, isolation, and estimation methods are surveyed, and such methods for robot sensors and actuators are derived. Failure recovery methods are also presented for each of the protective layers of redundancy. Failure recovery tactics often span all of the layers of a control hierarchy. Thus, a unified framework for decision-making and control, which orchestrates both the nominal redundancy management tasks and the failure management tasks, has been derived. The well-developed field of fault-tolerant computers is studied next, and some design principles relevant to the design of fault-tolerant robot controllers are abstracted. Conclusions are drawn, and a road map for the design of fault-tolerant manipulator systems is laid out with recommendations for a 10 DOF arm with dual actuators at each joint.

  8. Optimal Management of Redundant Control Authority for Fault Tolerance

    NASA Technical Reports Server (NTRS)

    Wu, N. Eva; Ju, Jianhong

    2000-01-01

    This paper is intended to demonstrate the feasibility of a solution to a fault tolerant control problem. It explains, through a numerical example, the design and the operation of a novel scheme for fault tolerant control. The fundamental principle of the scheme was formalized in [5] based on the notion of normalized nonspecificity. The novelty lies with the use of a reliability criterion for redundancy management, and therefore leads to a high overall system reliability.

  9. A second generation experiment in fault-tolerant software

    NASA Technical Reports Server (NTRS)

    Knight, J. C.

    1986-01-01

    The primary goal was to determine whether the application of fault tolerance to software increases its reliability if the cost of production is the same as for an equivalent nonfault tolerance version derived from the same requirements specification. Software development protocols are discussed. The feasibility of adapting to software design fault tolerance the technique of N-fold Modular Redundancy with majority voting was studied.

  10. Advanced development for space robotics with emphasis on fault tolerance

    NASA Technical Reports Server (NTRS)

    Tesar, D.; Chladek, J.; Hooper, R.; Sreevijayan, D.; Kapoor, C.; Geisinger, J.; Meaney, M.; Browning, G.; Rackers, K.

    1995-01-01

    This paper describes the ongoing work in fault tolerance at the University of Texas at Austin. The paper describes the technical goals the group is striving to achieve and includes a brief description of the individual projects focusing on fault tolerance. The ultimate goal is to develop and test technology applicable to all future missions of NASA (lunar base, Mars exploration, planetary surveillance, space station, etc.).

  11. Active resonator reset in the non-linear regime of circuit QED to improve multi-round quantum parity checks

    NASA Astrophysics Data System (ADS)

    Bultink, Cornelis Christiaan; Rol, M. A.; Fu, X.; Dikken, B. C. S.; de Sterke, J. C.; Vermeulen, R. F. L.; Schouten, R. N.; Bruno, A.; Bertels, K. L. M.; Dicarlo, L.

    Reliable quantum parity measurements are essential for fault-tolerant quantum computing. In quantum processors based on circuit QED, the fidelity and speed of multi-round quantum parity checks using an ancillary qubit can be compromised by photons remaining in the readout resonator post measurement, leading to ancilla dephasing and gate errors. The challenge of quickly depleting photons is biggest when maximizing the single-shot readout fidelity involves strong pulses turning the resonators non-linear. We experimentally demonstrate the numerical optimization of counter pulses for fast photon depletion in this non-analytic regime. We compare two methods, one using digital feedback and another running open loop. We assess both methods by minimizing the average number of rounds to ancilla measurement error. We acknowledge funding from the EU FP7 project SCALEQIT, FOM, and an ERC Synergy Grant.

  12. Steps toward fault-tolerant quantum chemistry.

    SciTech Connect

    Taube, Andrew Garvin

    2010-05-01

    Developing quantum chemistry programs on the coming generation of exascale computers will be a difficult task. The programs will need to be fault-tolerant and minimize the use of global operations. This work explores the use a task-based model that uses a data-centric approach to allocate work to different processes as it applies to quantum chemistry. After introducing the key problems that appear when trying to parallelize a complicated quantum chemistry method such as coupled-cluster theory, we discuss the implications of that model as it pertains to the computational kernel of a coupled-cluster program - matrix multiplication. Also, we discuss the extensions that would required to build a full coupled-cluster program using the task-based model. Current programming models for high-performance computing are fault-intolerant and use global operations. Those properties are unsustainable as computers scale to millions of CPUs; instead one must recognize that these systems will be hierarchical in structure, prone to constant faults, and global operations will be infeasible. The FAST-OS HARE project is introducing a scale-free computing model to address these issues. This model is hierarchical and fault-tolerant by design, allows for the clean overlap of computation and communication, reducing the network load, does not require checkpointing, and avoids the complexity of many HPC runtimes. Development of an algorithm within this model requires a change in focus from imperative programming to a data-centric approach. Quantum chemistry (QC) algorithms, in particular electronic structure methods, are an ideal test bed for this computing model. These methods describe the distribution of electrons in a molecule, which determine the properties of the molecule. The computational cost of these methods is high, scaling quartically or higher in the size of the molecule, which is why QC applications are major users of HPC resources. The complexity of these algorithms means that

  13. Reliability of voting in fault-tolerant software systems for small output spaces

    NASA Technical Reports Server (NTRS)

    Mcallister, David F.; Sun, Chien-En; Vouk, Mladen A.

    1987-01-01

    Under a voting strategy in a fault-tolerant software system there is a difference between correctness and agreement. An independent N-version programming reliability model is proposed for treating small output spaces which distinguishes between correctness and agreement. System reliability is investigated using analytical relationships and simulation. A consensus majority voting strategy is proposed and its performance is analyzed and compared with other voting strategies. Consensus majority strategy automatically adapts the voting to different component reliability and output space cardinality characteristics. It is shown that absolute majority voting strategy provides a lower bound on the reliability provided by the consensus majority, and 2-of-n voting strategy an upper bound. If r is the cardinality of the output space it is proved the 1/r is a lower bound on the average reliability of fault-tolerant system components below which the system reliability begins to deteriorate as more versions are added.

  14. Block QCA Fault-Tolerant Logic Gates

    NASA Technical Reports Server (NTRS)

    Firjany, Amir; Toomarian, Nikzad; Modarres, Katayoon

    2003-01-01

    Suitably patterned arrays (blocks) of quantum-dot cellular automata (QCA) have been proposed as fault-tolerant universal logic gates. These block QCA gates could be used to realize the potential of QCA for further miniaturization, reduction of power consumption, increase in switching speed, and increased degree of integration of very-large-scale integrated (VLSI) electronic circuits. The limitations of conventional VLSI circuitry, the basic principle of operation of QCA, and the potential advantages of QCA-based VLSI circuitry were described in several NASA Tech Briefs articles, namely Implementing Permutation Matrices by Use of Quantum Dots (NPO-20801), Vol. 25, No. 10 (October 2001), page 42; Compact Interconnection Networks Based on Quantum Dots (NPO-20855) Vol. 27, No. 1 (January 2003), page 32; Bit-Serial Adder Based on Quantum Dots (NPO-20869), Vol. 27, No. 1 (January 2003), page 35; and Hybrid VLSI/QCA Architecture for Computing FFTs (NPO-20923), which follows this article. To recapitulate the principle of operation (greatly oversimplified because of the limitation on space available for this article): A quantum-dot cellular automata contains four quantum dots positioned at or between the corners of a square cell. The cell contains two extra mobile electrons that can tunnel (in the quantummechanical sense) between neighboring dots within the cell. The Coulomb repulsion between the two electrons tends to make them occupy antipodal dots in the cell. For an isolated cell, there are two energetically equivalent arrangements (denoted polarization states) of the extra electrons. The cell polarization is used to encode binary information. Because the polarization of a nonisolated cell depends on Coulomb-repulsion interactions with neighboring cells, universal logic gates and binary wires could be constructed, in principle, by arraying QCA of suitable design in suitable patterns. Heretofore, researchers have recognized two major obstacles to realization of QCA

  15. Fault tolerant hypercube computer system architecture

    NASA Technical Reports Server (NTRS)

    Madan, Herb S. (Inventor); Chow, Edward (Inventor)

    1989-01-01

    A fault-tolerant multiprocessor computer system of the hypercube type comprising a hierarchy of computers of like kind which can be functionally substituted for one another as necessary is disclosed. Communication between the working nodes is via one communications network while communications between the working nodes and watch dog nodes and load balancing nodes higher in the structure is via another communications network separate from the first. A typical branch of the hierarchy reporting to a master node or host computer comprises, a plurality of first computing nodes; a first network of message conducting paths for interconnecting the first computing nodes as a hypercube. The first network provides a path for message transfer between the first computing nodes; a first watch dog node; and a second network of message connecting paths for connecting the first computing nodes to the first watch dog node independent from the first network, the second network provides an independent path for test message and reconfiguration affecting transfers between the first computing nodes and the first switch watch dog node. There is additionally, a plurality of second computing nodes; a third network of message conducting paths for interconnecting the second computing nodes as a hypercube. The third network provides a path for message transfer between the second computing nodes; a fourth network of message conducting paths for connecting the second computing nodes to the first watch dog node independent from the third network. The fourth network provides an independent path for test message and reconfiguration affecting transfers between the second computing nodes and the first watch dog node; and a first multiplexer disposed between the first watch dog node and the second and fourth networks for allowing the first watch dog node to selectively communicate with individual ones of the computing nodes through the second and fourth networks; as well as, a second watch dog node

  16. Extending quantum error correction: New continuous measurement protocols and improved fault-tolerant overhead

    NASA Astrophysics Data System (ADS)

    Ahn, Charlene Sonja

    Quantum mechanical applications range from quantum computers to quantum key distribution to teleportation. In these applications, quantum error correction is extremely important for protecting quantum states against decoherence. Here I present two main results regarding quantum error correction protocols. The first main topic I address is the development of continuous-time quantum error correction protocols via combination with techniques from quantum control. These protocols rely on weak measurement and Hamiltonian feedback instead of the projective measurements and unitary gates usually assumed by canonical quantum error correction. I show that a subclass of these protocols can be understood as a quantum feedback protocol, and analytically analyze the general case using the stabilizer formalism; I show that in this case perfect feedback can perfectly protect a stabilizer subspace. I also show through numerical simulations that another subclass of these protocols does better than canonical quantum error correction when the time between corrections is limited. The second main topic is development of improved overhead results for fault-tolerant computation. In particular, through analysis of topological quantum error correcting codes, it will be shown that the required blowup in depth of a noisy circuit performing a fault-tolerant computation can be reduced to a factor of O(log log L), an improvement over previous results. Showing this requires investigation into a local method of performing fault-tolerant correction on a topological code of arbitrary dimension.

  17. Advanced information processing system: The Army Fault-Tolerant Architecture detailed design overview

    NASA Technical Reports Server (NTRS)

    Harper, Richard E.; Babikyan, Carol A.; Butler, Bryan P.; Clasen, Robert J.; Harris, Chris H.; Lala, Jaynarayan H.; Masotto, Thomas K.; Nagle, Gail A.; Prizant, Mark J.; Treadwell, Steven

    1994-01-01

    The Army Avionics Research and Development Activity (AVRADA) is pursuing programs that would enable effective and efficient management of large amounts of situational data that occurs during tactical rotorcraft missions. The Computer Aided Low Altitude Night Helicopter Flight Program has identified automated Terrain Following/Terrain Avoidance, Nap of the Earth (TF/TA, NOE) operation as key enabling technology for advanced tactical rotorcraft to enhance mission survivability and mission effectiveness. The processing of critical information at low altitudes with short reaction times is life-critical and mission-critical necessitating an ultra-reliable/high throughput computing platform for dependable service for flight control, fusion of sensor data, route planning, near-field/far-field navigation, and obstacle avoidance operations. To address these needs the Army Fault Tolerant Architecture (AFTA) is being designed and developed. This computer system is based upon the Fault Tolerant Parallel Processor (FTPP) developed by Charles Stark Draper Labs (CSDL). AFTA is hard real-time, Byzantine, fault-tolerant parallel processor which is programmed in the ADA language. This document describes the results of the Detailed Design (Phase 2 and 3 of a 3-year project) of the AFTA development. This document contains detailed descriptions of the program objectives, the TF/TA NOE application requirements, architecture, hardware design, operating systems design, systems performance measurements and analytical models.

  18. Fault-tolerant wait-free shared objects

    NASA Technical Reports Server (NTRS)

    Jayanti, Prasad; Chandra, Tushar D.; Toueg, Sam

    1992-01-01

    A concurrent system consists of processes communicating via shared objects, such as shared variables, queues, etc. The concept of wait-freedom was introduced to cope with process failures: each process that accesses a wait-free object is guaranteed to get a response even if all the other processes crash. However, if a wait-free object 'crashes,' all the processes that access that object are prevented from making progress. In this paper, we introduce the concept of fault-tolerant wait-free objects, and study the problem of implementing them. We give a universal method to construct fault-tolerant wait-free objects, for all types of 'responsive' failures (including one in which faulty objects may 'lie'). In sharp contrast, we prove that many common and interesting types (such as queues, sets, and test&set) have no fault-tolerant wait-free implementations even under the most benign of the 'non-responsive' types of failure. We also introduce several concepts and techniques that are central to the design of fault-tolerant concurrent systems: the concepts of self-implementation and graceful degradation, and techniques to automatically increase the fault-tolerance of implementations. We prove matching lower bounds on the resource complexity of most of our algorithms.

  19. A fault-tolerant intelligent robotic control system

    NASA Technical Reports Server (NTRS)

    Marzwell, Neville I.; Tso, Kam Sing

    1993-01-01

    This paper describes the concept, design, and features of a fault-tolerant intelligent robotic control system being developed for space and commercial applications that require high dependability. The comprehensive strategy integrates system level hardware/software fault tolerance with task level handling of uncertainties and unexpected events for robotic control. The underlying architecture for system level fault tolerance is the distributed recovery block which protects against application software, system software, hardware, and network failures. Task level fault tolerance provisions are implemented in a knowledge-based system which utilizes advanced automation techniques such as rule-based and model-based reasoning to monitor, diagnose, and recover from unexpected events. The two level design provides tolerance of two or more faults occurring serially at any level of command, control, sensing, or actuation. The potential benefits of such a fault tolerant robotic control system include: (1) a minimized potential for damage to humans, the work site, and the robot itself; (2) continuous operation with a minimum of uncommanded motion in the presence of failures; and (3) more reliable autonomous operation providing increased efficiency in the execution of robotic tasks and decreased demand on human operators for controlling and monitoring the robotic servicing routines.

  20. Performance of fault-tolerant diagnostics in the hypercube systems

    SciTech Connect

    Ghafoor, A.; Sole, P.

    1989-08-01

    In this paper, they introduce the concept of fault-tolerant self-diagnosis for distributed systems and show that there exists a performance tradeoff between the complexity of a self-diagnostic algorithm and the level of fault tolerance inherited by the algorithm. For the study, they select hypercube systems and show that designing an optimal algorithm for such systems has an equivalent coding theory formulation which belongs to the class of NP-hard problems. Subsequently, they propose an ''efficient'' diagnostic scheme for these systems and study the performance tradeoff of the proposed algorithm which is based on a combinatorial structure called Hadamard matrix. The authors make an essential use of its properties of symmetrical partitioning and covering in hypercube networks. Using known translate weight distributions, they evaluated the tradeoff between the fault tolerance and traffic complexity of the proposed diagnostic algorithm for hypercubes of small sizes. An interesting compromise is exhibited for the hypercube with an arbitrary size.

  1. Software reliability through fault-avoidance and fault-tolerance

    NASA Technical Reports Server (NTRS)

    Vouk, Mladen A.; Mcallister, David F.

    1991-01-01

    Twenty independently developed but functionally equivalent software versions were used to investigate and compare empirically some properties of N-version programming, Recovery Block, and Consensus Recovery Block, using the majority and consensus voting algorithms. This was also compared with another hybrid fault-tolerant scheme called Acceptance Voting, using dynamic versions of consensus and majority voting. Consensus voting provides adaptation of the voting strategy to varying component reliability, failure correlation, and output space characteristics. Since failure correlation among versions effectively reduces the cardinality of the space in which the voter make decisions, consensus voting is usually preferable to simple majority voting in any fault-tolerant system. When versions have considerably different reliabilities, the version with the best reliability will perform better than any of the fault-tolerant techniques.

  2. Multiple Embedded Processors for Fault-Tolerant Computing

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

    2005-01-01

    A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.

  3. Fault tolerant architectures for integrated aircraft electronics systems, task 2

    NASA Technical Reports Server (NTRS)

    Levitt, K. N.; Melliar-Smith, P. M.; Schwartz, R. L.

    1984-01-01

    The architectural basis for an advanced fault tolerant on-board computer to succeed the current generation of fault tolerant computers is examined. The network error tolerant system architecture is studied with particular attention to intercluster configurations and communication protocols, and to refined reliability estimates. The diagnosis of faults, so that appropriate choices for reconfiguration can be made is discussed. The analysis relates particularly to the recognition of transient faults in a system with tasks at many levels of priority. The demand driven data-flow architecture, which appears to have possible application in fault tolerant systems is described and work investigating the feasibility of automatic generation of aircraft flight control programs from abstract specifications is reported.

  4. Garbage collection: an exercise in distributed, fault-tolerant programming

    SciTech Connect

    Vestal, S.C.

    1987-01-01

    Two garbage-collection algorithms are presented to reclaim unused storage in object-oriented systems implemented on local area networks. The algorithms are fault-tolerant and allowed parallel, incremental collection in an object address space distributed throughout the system. The two approaches allow multiple collectors, so some unused storage can be reclaimed in partitioned networks. The first method makes use of fault-tolerant reference counts together with an algorithm to collect cycles of objects that would otherwise remain unclaimed. The second method adapts a parallel collector so that it can be used to collect subspaces of the entire network address space. Throughout this work concern is with a methodology for developing distributed, parallel, fault-tolerant programs. Also, there is concern with the suitability of object-oriented systems for such applications.

  5. Measurement and analysis of operating system fault tolerance

    NASA Technical Reports Server (NTRS)

    Lee, I.; Tang, D.; Iyer, R. K.

    1992-01-01

    This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.

  6. Architectural concepts and redundancy techniques in fault-tolerant computers

    NASA Technical Reports Server (NTRS)

    Rennels, D. A.

    1974-01-01

    This paper presents a description of redundancy techniques employed in the design of fault-tolerant computers, and a discussion of the effects of functional requirements, technology constraints, and cost considerations which enter into the choice of these techniques. The STAR computer, developed at the Jet Propulsion Laboratory for long-duration planetary spacecraft missions, is discussed along with several later fault-tolerant computer designs. The class of computers described in this paper employs dynamic redundancy, i.e., the machine is divided into a set of submodules, each with standby spares; a special hard core monitor unit detects and diagnoses faults, and effects automated recovery by replacing failed parts.

  7. Single-Shot Fault-Tolerant Quantum Error Correction

    NASA Astrophysics Data System (ADS)

    Bombín, Héctor

    2015-07-01

    Conventional quantum error correcting codes require multiple rounds of measurements to detect errors with enough confidence in fault-tolerant scenarios. Here, I show that for suitable topological codes, a single round of local measurements is enough. This feature is generic and is related to self-correction and confinement phenomena in the corresponding quantum Hamiltonian model. Three-dimensional gauge color codes exhibit this single-shot feature, which also applies to initialization and gauge fixing. Assuming the time for efficient classical computations to be negligible, this yields a topological fault-tolerant quantum computing scheme where all elementary logical operations can be performed in constant time.

  8. Rule-based fault-tolerant flight control

    NASA Technical Reports Server (NTRS)

    Handelman, Dave

    1988-01-01

    Fault tolerance has always been a desirable characteristic of aircraft. The ability to withstand unexpected changes in aircraft configuration has a direct impact on the ability to complete a mission effectively and safely. The possible synergistic effects of combining techniques of modern control theory, statistical hypothesis testing, and artificial intelligence in the attempt to provide failure accommodation for aircraft are investigated. This effort has resulted in the definition of a theory for rule based control and a system for development of such a rule based controller. Although presented here in response to the goal of aircraft fault tolerance, the rule based control technique is applicable to a wide range of complex control problems.

  9. Enhanced Fault-Tolerant Quantum Computing in d -Level Systems

    NASA Astrophysics Data System (ADS)

    Campbell, Earl T.

    2014-12-01

    Error-correcting codes protect quantum information and form the basis of fault-tolerant quantum computing. Leading proposals for fault-tolerant quantum computation require codes with an exceedingly rare property, a transversal non-Clifford gate. Codes with the desired property are presented for d -level qudit systems with prime d . The codes use n =d -1 qudits and can detect up to ˜d /3 errors. We quantify the performance of these codes for one approach to quantum computation known as magic-state distillation. Unlike prior work, we find performance is always enhanced by increasing d .

  10. Reconfigurable tree architectures using subtree oriented fault tolerance

    NASA Technical Reports Server (NTRS)

    Lowrie, Matthew B.

    1987-01-01

    An approach to the design of reconfigurable tree architecture is presented in which spare processors are allocated at the leaves. The approach is unique in that spares are associated with subtrees and sharing of spares between these subtrees can occur. The Subtree Oriented Fault Tolerance (SOFT) approach is more reliable than previous approaches capable of tolerating link and switch failures for both single chip and multichip tree implementations while reducing redundancy in terms of both spare processors and links. VLSI layout is 0(n) for binary trees and is directly extensible to N-ary trees and fault tolerance through performance degradation.

  11. Fault tolerant kinematic control of hyper-redundant manipulators

    NASA Technical Reports Server (NTRS)

    Bedrossian, Nazareth S.

    1994-01-01

    Hyper-redundant spatial manipulators possess fault-tolerant features because of their redundant structure. The kinematic control of these manipulators is investigated with special emphasis on fault-tolerant control. The manipulator tasks are viewed in the end-effector space while actuator commands are in joint-space, requiring an inverse kinematic algorithm to generate joint-angle commands from the end-effector ones. The rate-inverse kinematic control algorithm presented in this paper utilizes the pseudoinverse to accommodate for joint motor failures. An optimal scale factor for the robust inverse is derived.

  12. SIFT - Design and analysis of a fault-tolerant computer for aircraft control. [Software Implemented Fault Tolerant systems

    NASA Technical Reports Server (NTRS)

    Wensley, J. H.; Lamport, L.; Goldberg, J.; Green, M. W.; Levitt, K. N.; Melliar-Smith, P. M.; Shostak, R. E.; Weinstock, C. B.

    1978-01-01

    SIFT (Software Implemented Fault Tolerance) is an ultrareliable computer for critical aircraft control applications that achieves fault tolerance by the replication of tasks among processing units. The main processing units are off-the-shelf minicomputers, with standard microcomputers serving as the interface to the I/O system. Fault isolation is achieved by using a specially designed redundant bus system to interconnect the processing units. Error detection and analysis and system reconfiguration are performed by software. Iterative tasks are redundantly executed, and the results of each iteration are voted upon before being used. Thus, any single failure in a processing unit or bus can be tolerated with triplication of tasks, and subsequent failures can be tolerated after reconfiguration. Independent execution by separate processors means that the processors need only be loosely synchronized, and a novel fault-tolerant synchronization method is described.

  13. Electronic Power Switch for Fault-Tolerant Networks

    NASA Technical Reports Server (NTRS)

    Volp, J.

    1987-01-01

    Power field-effect transistors reduce energy waste and simplify interconnections. Current switch containing power field-effect transistor (PFET) placed in series with each load in fault-tolerant power-distribution system. If system includes several loads and supplies, switches placed in series with adjacent loads and supplies. System of switches protects against overloads and losses of individual power sources.

  14. Adding Fault Tolerance to NPB Benchmarks Using ULFM

    SciTech Connect

    Parchman, Zachary W; Vallee, Geoffroy R; Naughton III, Thomas J; Engelmann, Christian; Bernholdt, David E; Scott, Stephen L

    2016-01-01

    In the world of high-performance computing, fault tolerance and application resilience are becoming some of the primary concerns because of increasing hardware failures and memory corruptions. While the research community has been investigating various options, from system-level solutions to application-level solutions, standards such as the Message Passing Interface (MPI) are also starting to include such capabilities. The current proposal for MPI fault tolerant is centered around the User-Level Failure Mitigation (ULFM) concept, which provides means for fault detection and recovery of the MPI layer. This approach does not address application-level recovery, which is currently left to application developers. In this work, we present a mod- ification of some of the benchmarks of the NAS parallel benchmark (NPB) to include support of the ULFM capabilities as well as application-level strategies and mechanisms for application-level failure recovery. As such, we present: (i) an application-level library to checkpoint and restore data, (ii) extensions of NPB benchmarks for fault tolerance based on different strategies, (iii) a fault injection tool, and (iv) some preliminary results that show the impact of such fault tolerant strategies on the application execution.

  15. Fault-free performance validation of fault-tolerant multiprocessors

    NASA Technical Reports Server (NTRS)

    Czeck, Edward W.; Feather, Frank E.; Grizzaffi, Ann Marie; Segall, Zary Z.; Siewiorek, Daniel P.

    1987-01-01

    A validation methodology for testing the performance of fault-tolerant computer systems was developed and applied to the Fault-Tolerant Multiprocessor (FTMP) at NASA-Langley's AIRLAB facility. This methodology was claimed to be general enough to apply to any ultrareliable computer system. The goal of this research was to extend the validation methodology and to demonstrate the robustness of the validation methodology by its more extensive application to NASA's Fault-Tolerant Multiprocessor System (FTMP) and to the Software Implemented Fault-Tolerance (SIFT) Computer System. Furthermore, the performance of these two multiprocessors was compared by conducting similar experiments. An analysis of the results shows high level language instruction execution times for both SIFT and FTMP were consistent and predictable, with SIFT having greater throughput. At the operating system level, FTMP consumes 60% of the throughput for its real-time dispatcher and 5% on fault-handling tasks. In contrast, SIFT consumes 16% of its throughput for the dispatcher, but consumes 66% in fault-handling software overhead.

  16. Abstractions for Fault-Tolerant Distributed System Verification

    NASA Technical Reports Server (NTRS)

    Pike, Lee S.; Maddalon, Jeffrey M.; Miner, Paul S.; Geser, Alfons

    2004-01-01

    Four kinds of abstraction for the design and analysis of fault tolerant distributed systems are discussed. These abstractions concern system messages, faults, fault masking voting, and communication. The abstractions are formalized in higher order logic, and are intended to facilitate specifying and verifying such systems in higher order theorem provers.

  17. Study of fault tolerant software technology for dynamic systems

    NASA Technical Reports Server (NTRS)

    Caglayan, A. K.; Zacharias, G. L.

    1985-01-01

    The major aim of this study is to investigate the feasibility of using systems-based failure detection isolation and compensation (FDIC) techniques in building fault-tolerant software and extending them, whenever possible, to the domain of software fault tolerance. First, it is shown that systems-based FDIC methods can be extended to develop software error detection techniques by using system models for software modules. In particular, it is demonstrated that systems-based FDIC techniques can yield consistency checks that are easier to implement than acceptance tests based on software specifications. Next, it is shown that systems-based failure compensation techniques can be generalized to the domain of software fault tolerance in developing software error recovery procedures. Finally, the feasibility of using fault-tolerant software in flight software is investigated. In particular, possible system and version instabilities, and functional performance degradation that may occur in N-Version programming applications to flight software are illustrated. Finally, a comparative analysis of N-Version and recovery block techniques in the context of generic blocks in flight software is presented.

  18. Trends in reliability modeling technology for fault tolerant systems

    NASA Technical Reports Server (NTRS)

    Bavuso, S. J.

    1979-01-01

    Reliability modeling for fault tolerant avionic computing systems was developed. The modeling of large systems involving issues of state size and complexity, fault coverage, and practical computation was discussed. A novel technique which provides the tool for studying the reliability of systems with nonconstant failure rates is presented. The fault latency which may provide a method of obtaining vital latent fault data is measured.

  19. Clouds: A support architecture for fault tolerant, distributed systems

    NASA Technical Reports Server (NTRS)

    Dasgupta, P.; Leblanco, R. J., Jr.

    1986-01-01

    Clouds is a distributed operating system providing support for fault tolerance, location independence, reconfiguration, and transactions. The implementation paradigm uses objects and nested actions as building blocks. Subsystems and applications that can be supported by Clouds to further enhance the performance and utility of the system are also discussed.

  20. cost and benefits optimization model for fault-tolerant aircraft electronic systems

    NASA Technical Reports Server (NTRS)

    1983-01-01

    The factors involved in economic assessment of fault tolerant systems (FTS) and fault tolerant flight control systems (FTFCS) are discussed. Algorithms for optimization and economic analysis of FTFCS are documented.

  1. Non Linear Conjugate Gradient

    Energy Science and Technology Software Center (ESTSC)

    2006-11-17

    Software that simulates and inverts electromagnetic field data for subsurface electrical properties (electrical conductivity) of geological media. The software treats data produced by a time harmonic source field excitation arising from the following antenna geometery: loops and grounded bipoles, as well as point electric and magnetic dioples. The inversion process is carried out using a non-linear conjugate gradient optimization scheme, which minimizes the misfit between field data and model data using a least squares criteria.more » The software is an upgrade from the code NLCGCS_MP ver 1.0. The upgrade includes the following components: Incorporation of new 1 D field sourcing routines to more accurately simulate the 3D electromagnetic field for arbitrary geologic& media, treatment for generalized finite length transmitting antenna geometry (antennas with vertical and horizontal component directions). In addition, the software has been upgraded to treat transverse anisotropy in electrical conductivity.« less

  2. The IEEE eighteenth international symposium on fault-tolerant computing (Digest of Papers)

    SciTech Connect

    Not Available

    1988-01-01

    These proceedings collect papers on fault detection and computers. Topics include: software failure behavior, fault tolerant distributed programs, parallel simulation of faults, concurrent built-in self-test techniques, fault-tolerant parallel processor architectures, probabilistic fault diagnosis, fault tolerances in hypercube processors and cellular automation modeling.

  3. Fault Tolerance Middleware for a Multi-Core System

    NASA Technical Reports Server (NTRS)

    Some, Raphael R.; Springer, Paul L.; Zima, Hans P.; James, Mark; Wagner, David A.

    2012-01-01

    Fault Tolerance Middleware (FTM) provides a framework to run on a dedicated core of a multi-core system and handles detection of single-event upsets (SEUs), and the responses to those SEUs, occurring in an application running on multiple cores of the processor. This software was written expressly for a multi-core system and can support different kinds of fault strategies, such as introspection, algorithm-based fault tolerance (ABFT), and triple modular redundancy (TMR). It focuses on providing fault tolerance for the application code, and represents the first step in a plan to eventually include fault tolerance in message passing and the FTM itself. In the multi-core system, the FTM resides on a single, dedicated core, separate from the cores used by the application. This is done in order to isolate the FTM from application faults and to allow it to swap out any application core for a substitute. The structure of the FTM consists of an interface to a fault tolerant strategy module, a responder module, a fault manager module, an error factory, and an error mapper that determines the severity of the error. In the present reference implementation, the only fault tolerant strategy implemented is introspection. The introspection code waits for an application node to send an error notification to it. It then uses the error factory to create an error object, and at this time, a severity level is assigned to the error. The introspection code uses its built-in knowledge base to generate a recommended response to the error. Responses might include ignoring the error, logging it, rolling back the application to a previously saved checkpoint, swapping in a new node to replace a bad one, or restarting the application. The original error and recommended response are passed to the top-level fault manager module, which invokes the response. The responder module also notifies the introspection module of the generated response. This provides additional information to the

  4. Fault Tolerance in ZigBee Wireless Sensor Networks

    NASA Technical Reports Server (NTRS)

    Alena, Richard; Gilstrap, Ray; Baldwin, Jarren; Stone, Thom; Wilson, Pete

    2011-01-01

    Wireless sensor networks (WSN) based on the IEEE 802.15.4 Personal Area Network standard are finding increasing use in the home automation and emerging smart energy markets. The network and application layers, based on the ZigBee 2007 PRO Standard, provide a convenient framework for component-based software that supports customer solutions from multiple vendors. This technology is supported by System-on-a-Chip solutions, resulting in extremely small and low-power nodes. The Wireless Connections in Space Project addresses the aerospace flight domain for both flight-critical and non-critical avionics. WSNs provide the inherent fault tolerance required for aerospace applications utilizing such technology. The team from Ames Research Center has developed techniques for assessing the fault tolerance of ZigBee WSNs challenged by radio frequency (RF) interference or WSN node failure.

  5. Fault-tolerant clock synchronization validation methodology. [in computer systems

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Palumbo, Daniel L.; Johnson, Sally C.

    1987-01-01

    A validation method for the synchronization subsystem of a fault-tolerant computer system is presented. The high reliability requirement of flight-crucial systems precludes the use of most traditional validation methods. The method presented utilizes formal design proof to uncover design and coding errors and experimentation to validate the assumptions of the design proof. The experimental method is described and illustrated by validating the clock synchronization system of the Software Implemented Fault Tolerance computer. The design proof of the algorithm includes a theorem that defines the maximum skew between any two nonfaulty clocks in the system in terms of specific system parameters. Most of these parameters are deterministic. One crucial parameter is the upper bound on the clock read error, which is stochastic. The probability that this upper bound is exceeded is calculated from data obtained by the measurement of system parameters. This probability is then included in a detailed reliability analysis of the system.

  6. Combining dynamical decoupling with fault-tolerant quantum computation

    SciTech Connect

    Ng, Hui Khoon; Preskill, John; Lidar, Daniel A.

    2011-07-15

    We study how dynamical decoupling (DD) pulse sequences can improve the reliability of quantum computers. We prove upper bounds on the accuracy of DD-protected quantum gates and derive sufficient conditions for DD-protected gates to outperform unprotected gates. Under suitable conditions, fault-tolerant quantum circuits constructed from DD-protected gates can tolerate stronger noise and have a lower overhead cost than fault-tolerant circuits constructed from unprotected gates. Our accuracy estimates depend on the dynamics of the bath that couples to the quantum computer and can be expressed either in terms of the operator norm of the bath's Hamiltonian or in terms of the power spectrum of bath correlations; we explain in particular how the performance of recursively generated concatenated pulse sequences can be analyzed from either viewpoint. Our results apply to Hamiltonian noise models with limited spatial correlations.

  7. Fault Injection Campaign for a Fault Tolerant Duplex Framework

    NASA Technical Reports Server (NTRS)

    Sacco, Gian Franco; Ferraro, Robert D.; von llmen, Paul; Rennels, Dave A.

    2007-01-01

    Fault tolerance is an efficient approach adopted to avoid or reduce the damage of a system failure. In this work we present the results of a fault injection campaign we conducted on the Duplex Framework (DF). The DF is a software developed by the UCLA group [1, 2] that uses a fault tolerant approach and allows to run two replicas of the same process on two different nodes of a commercial off-the-shelf (COTS) computer cluster. A third process running on a different node, constantly monitors the results computed by the two replicas, and eventually restarts the two replica processes if an inconsistency in their computation is detected. This approach is very cost efficient and can be adopted to control processes on spacecrafts where the fault rate produced by cosmic rays is not very high.

  8. A benchmark for fault tolerant flight control evaluation

    NASA Astrophysics Data System (ADS)

    Smaili, H.; Breeman, J.; Lombaerts, T.; Stroosma, O.

    2013-12-01

    A large transport aircraft simulation benchmark (REconfigurable COntrol for Vehicle Emergency Return - RECOVER) has been developed within the GARTEUR (Group for Aeronautical Research and Technology in Europe) Flight Mechanics Action Group 16 (FM-AG(16)) on Fault Tolerant Control (2004 2008) for the integrated evaluation of fault detection and identification (FDI) and reconfigurable flight control strategies. The benchmark includes a suitable set of assessment criteria and failure cases, based on reconstructed accident scenarios, to assess the potential of new adaptive control strategies to improve aircraft survivability. The application of reconstruction and modeling techniques, based on accident flight data, has resulted in high-fidelity nonlinear aircraft and fault models to evaluate new Fault Tolerant Flight Control (FTFC) concepts and their real-time performance to accommodate in-flight failures.

  9. Fault Tolerant Characteristics of Artificial Neural Network Electronic Hardware

    NASA Technical Reports Server (NTRS)

    Zee, Frank

    1995-01-01

    The fault tolerant characteristics of analog-VLSI artificial neural network (with 32 neurons and 532 synapses) chips are studied by exposing them to high energy electrons, high energy protons, and gamma ionizing radiations under biased and unbiased conditions. The biased chips became nonfunctional after receiving a cumulative dose of less than 20 krads, while the unbiased chips only started to show degradation with a cumulative dose of over 100 krads. As the total radiation dose increased, all the components demonstrated graceful degradation. The analog sigmoidal function of the neuron became steeper (increase in gain), current leakage from the synapses progressively shifted the sigmoidal curve, and the digital memory of the synapses and the memory addressing circuits began to gradually fail. From these radiation experiments, we can learn how to modify certain designs of the neural network electronic hardware without using radiation-hardening techniques to increase its reliability and fault tolerance.

  10. Fault-tolerant building-block computer study

    NASA Technical Reports Server (NTRS)

    Rennels, D. A.

    1978-01-01

    Ultra-reliable core computers are required for improving the reliability of complex military systems. Such computers can provide reliable fault diagnosis, failure circumvention, and, in some cases serve as an automated repairman for their host systems. A small set of building-block circuits which can be implemented as single very large integration devices, and which can be used with off-the-shelf microprocessors and memories to build self checking computer modules (SCCM) is described. Each SCCM is a microcomputer which is capable of detecting its own faults during normal operation and is described to communicate with other identical modules over one or more Mil Standard 1553A buses. Several SCCMs can be connected into a network with backup spares to provide fault-tolerant operation, i.e. automated recovery from faults. Alternative fault-tolerant SCCM configurations are discussed along with the cost and reliability associated with their implementation.

  11. Fault recovery characteristics of the fault tolerant multi-processor

    NASA Technical Reports Server (NTRS)

    Padilla, Peter A.

    1990-01-01

    The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.

  12. Fault-tolerant software for aircraft control systems

    NASA Technical Reports Server (NTRS)

    1978-01-01

    Concepts for software to implement real time aircraft control systems on a centralized digital computer were discussed. A fault tolerant software structure employing functionally redundant routines with concurrent error detection was proposed for critical control functions involving safety of flight and landing. A degraded recovery block concept was devised to allow collocation of critical and noncritical software modules within the same control structure. The additional computer resources required to implement the proposed software structure for a representative set of aircraft control functions were discussed. It was estimated that approximately 30 percent more memory space is required to implement the total set of control functions. A reliability model for the fault tolerant software was described and parametric estimates of failure rate were made.

  13. Survey of fault-tolerant multistage networks and comparison to the extra stage cube

    SciTech Connect

    Adams, G.B. III; Siegel, H.J.

    1984-01-01

    A variety of fault-tolerant multistage interconnection networks for parallel processing systems that have been proposed in the literature are surveyed. A network is fault-tolerant if it can continue to meet its fault tolerance criterion in the presence of one or more failures of the type(s) allowed by its fault model. Significant differences in fault models and fault-tolerance criteria exist among various fault-tolerant networks. This makes direct comparison of these networks difficult. In analyzing the networks, this paper compares the various models and assesses the effect of choosing a common model and criterion. Network characteristics such as degree of fault tolerance, routing control method, and permutation capability are discussed. The networks surveyed and compared to the extra stage cube are the modified baseline, augmented delta, f-network, enhanced inverse augmented data manipulator, gamma, fault-tolerant Benes, and beta-networks. 21 references.

  14. Fault-tolerant Landau-Zener quantum gates

    SciTech Connect

    Hicke, C.; Santos, L. F.; Dykman, M. I.

    2006-01-15

    We present a method to perform fault-tolerant single-qubit gate operations using Landau-Zener tunneling. In a single Landau-Zener pulse, the qubit transition frequency is varied in time so that it passes through the frequency of the radiation field. We show that a simple three-pulse sequence allows eliminating errors in the gate up to the third order in errors in the qubit energies or the radiation frequency.

  15. Design methods for fault-tolerant finite state machines

    NASA Technical Reports Server (NTRS)

    Niranjan, Shailesh; Frenzel, James F.

    1993-01-01

    VLSI electronic circuits are increasingly being used in space-borne applications where high levels of radiation may induce faults, known as single event upsets. In this paper we review the classical methods of designing fault tolerant digital systems, with an emphasis on those methods which are particularly suitable for VLSI-implementation of finite state machines. Four methods are presented and will be compared in terms of design complexity, circuit size, and estimated circuit delay.

  16. Decomposition in reliability analysis of fault-tolerant systems

    NASA Technical Reports Server (NTRS)

    Trivedi, K. S.; Geist, R. M.

    1983-01-01

    The existing approaches to reliability modeling are briefly reviewed. An examination of the limitations of the existing approaches in modeling ultrareliable fault-tolerant systems illustrates the need to use decomposition techniques. The notion of behavioral decomposition is introduced for dealing with reliability models with a large number of states, and a series of examples is presented. The CARE (computer-aided reliability estimation) and HARP (hybrid automated reliability predictor) approaches to reliability are discussed.

  17. Using certification trails to achieve software fault tolerance

    NASA Technical Reports Server (NTRS)

    Sullivan, Gregory F.; Masson, Gerald M.

    1993-01-01

    A conceptually novel and powerful technique to achieve fault tolerance in hardware and software systems is introduced. When used for software fault tolerance, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance was formalized and it was illustrated by applying it to the fundamental problem of finding a minimum spanning tree. Cases in which the second phase can be run concorrectly with the first and act as a monitor are discussed. The certification trail approach was compared to other approaches to fault tolerance. Because of space limitations we have omitted examples of our technique applied to the Huffman tree, and convex hull problems. These can be found in the full version of this paper.

  18. Fault tolerant sequential circuits using sequence invariant state machines

    NASA Technical Reports Server (NTRS)

    Alahmad, M.; Whitaker, S.

    1991-01-01

    The idea of introducing redundancy to improve the reliability of digital systems originates from papers published in the 1950's. Since then, redundancy has been recognized as a realistic means for constructing reliable systems. A method using redundancy to reconfigure the Sequency Invariant State Machine (SISM) to achieve fault tolerance is introduced. This new architecture is most useful in space applications, where recovery rather than replacement of faulty modules is the only means of maintenance.

  19. Bounded-time fault-tolerant rule-based systems

    NASA Technical Reports Server (NTRS)

    Browne, James C.; Emerson, Allen; Gouda, Mohamed; Miranker, Daniel; Mok, Aloysius; Rosier, Louis

    1990-01-01

    Two systems concepts are introduced: bounded response-time and self-stabilization in the context of rule-based programs. These concepts are essential for the design of rule-based programs which must be highly fault tolerant and perform in a real time environment. The mechanical analysis of programs for these two properties is discussed. The techniques are used to analyze a NASA application.

  20. Formal Techniques for Synchronized Fault-Tolerant Systems

    NASA Technical Reports Server (NTRS)

    DiVito, Ben L.; Butler, Ricky W.

    1992-01-01

    We present the formal verification of synchronizing aspects of the Reliable Computing Platform (RCP), a fault-tolerant computing system for digital flight control applications. The RCP uses NMR-style redundancy to mask faults and internal majority voting to purge the effects of transient faults. The system design has been formally specified and verified using the EHDM verification system. Our formalization is based on an extended state machine model incorporating snapshots of local processors clocks.

  1. A fault-tolerant one-way quantum computer

    SciTech Connect

    Raussendorf, R. . E-mail: rraussendorf@perimeterinstitute.ca; Harrington, J.; Goyal, K.

    2006-09-15

    We describe a fault-tolerant one-way quantum computer on cluster states in three dimensions. The presented scheme uses methods of topological error correction resulting from a link between cluster states and surface codes. The error threshold is 1.4% for local depolarizing error and 0.11% for each source in an error model with preparation-, gate-, storage-, and measurement errors.

  2. The art of fault-tolerant system reliability modeling

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Johnson, Sally C.

    1990-01-01

    A step-by-step tutorial of the methods and tools used for the reliability analysis of fault-tolerant systems is presented. Emphasis is on the representation of architectural features in mathematical models. Details of the mathematical solution of complex reliability models are not presented. Instead the use of several recently developed computer programs--SURE, ASSIST, STEM, PAWS--which automate the generation and solution of these models is described.

  3. Validation of a fault-tolerant clock synchronization system

    NASA Technical Reports Server (NTRS)

    Butler, R. W.; Johnson, S. C.

    1984-01-01

    A validation method for the synchronization subsystem of a fault tolerant computer system is investigated. The method combines formal design verification with experimental testing. The design proof reduces the correctness of the clock synchronization system to the correctness of a set of axioms which are experimentally validated. Since the reliability requirements are often extreme, requiring the estimation of extremely large quantiles, an asymptotic approach to estimation in the tail of a distribution is employed.

  4. Scalable and Fault Tolerant Failure Detection and Consensus

    SciTech Connect

    Katti, Amogh; Di Fatta, Giuseppe; Naughton III, Thomas J; Engelmann, Christian

    2015-01-01

    Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus.

  5. Evaluation of reliability modeling tools for advanced fault tolerant systems

    NASA Technical Reports Server (NTRS)

    Baker, Robert; Scheper, Charlotte

    1986-01-01

    The Computer Aided Reliability Estimation (CARE III) and Automated Reliability Interactice Estimation System (ARIES 82) reliability tools for application to advanced fault tolerance aerospace systems were evaluated. To determine reliability modeling requirements, the evaluation focused on the Draper Laboratories' Advanced Information Processing System (AIPS) architecture as an example architecture for fault tolerance aerospace systems. Advantages and limitations were identified for each reliability evaluation tool. The CARE III program was designed primarily for analyzing ultrareliable flight control systems. The ARIES 82 program's primary use was to support university research and teaching. Both CARE III and ARIES 82 were not suited for determining the reliability of complex nodal networks of the type used to interconnect processing sites in the AIPS architecture. It was concluded that ARIES was not suitable for modeling advanced fault tolerant systems. It was further concluded that subject to some limitations (the difficulty in modeling systems with unpowered spare modules, systems where equipment maintenance must be considered, systems where failure depends on the sequence in which faults occurred, and systems where multiple faults greater than a double near coincident faults must be considered), CARE III is best suited for evaluating the reliability of advanced tolerant systems for air transport.

  6. ROBUS-2: A Fault-Tolerant Broadcast Communication System

    NASA Technical Reports Server (NTRS)

    Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.

    2005-01-01

    The Reliable Optical Bus (ROBUS) is the core communication system of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER), a general-purpose fault-tolerant integrated modular architecture currently under development at NASA Langley Research Center. The ROBUS is a time-division multiple access (TDMA) broadcast communication system with medium access control by means of time-indexed communication schedule. ROBUS-2 is a developmental version of the ROBUS providing guaranteed fault-tolerant services to the attached processing elements (PEs), in the presence of a bounded number of faults. These services include message broadcast (Byzantine Agreement), dynamic communication schedule update, clock synchronization, and distributed diagnosis (group membership). The ROBUS also features fault-tolerant startup and restart capabilities. ROBUS-2 is tolerant to internal as well as PE faults, and incorporates a dynamic self-reconfiguration capability driven by the internal diagnostic system. This version of the ROBUS is intended for laboratory experimentation and demonstrations of the capability to reintegrate failed nodes, dynamically update the communication schedule, and tolerate and recover from correlated transient faults.

  7. Data-driven Fault Tolerance for Work Stealing Computations

    SciTech Connect

    Ma, Wenjing; Krishnamoorthy, Sriram

    2012-06-25

    Checkpoint-restart approaches to fault tolerance typically roll back all the processes to the previous checkpoint in the event of a failure. Work stealing is a promising technique to dynamically tolerate variations in the execution environment, including faults, system noise, and energy constraints. In this paper, we present fault tolerance mechanisms for task parallel computations, a popular computation idiom, employing work stealing. The computation is organized as a collection of tasks with data in a global address space. The completion of data operations, rather than the actual messages, is tracked to derive an idempotent data store. This information is used to accurately identify the tasks to be re-executed, therefore to recompute only the lost data, in the presence of random work stealing. We consider three recovery schemes that present distinct trade-offs -- lazy recovery with potentially increased re-execution cost, immediate collective recovery with associated synchronization overheads, and noncollective recovery enabled by additional communication. We employ distributed work stealing to dynamically rebalance the tasks on the live processes and evaluate the three schemes using candidate application benchmarks. We demonstrate that the overheads (space and time) of the fault tolerance mechanism are low, the cost incurred due to failures are small, and the overheads decrease with per-process work at scale.

  8. Faster quantum chemistry simulation on fault-tolerant quantum computers

    NASA Astrophysics Data System (ADS)

    Cody Jones, N.; Whitfield, James D.; McMahon, Peter L.; Yung, Man-Hong; Van Meter, Rodney; Aspuru-Guzik, Alán; Yamamoto, Yoshihisa

    2012-11-01

    Quantum computers can in principle simulate quantum physics exponentially faster than their classical counterparts, but some technical hurdles remain. We propose methods which substantially improve the performance of a particular form of simulation, ab initio quantum chemistry, on fault-tolerant quantum computers; these methods generalize readily to other quantum simulation problems. Quantum teleportation plays a key role in these improvements and is used extensively as a computing resource. To improve execution time, we examine techniques for constructing arbitrary gates which perform substantially faster than circuits based on the conventional Solovay-Kitaev algorithm (Dawson and Nielsen 2006 Quantum Inform. Comput. 6 81). For a given approximation error ɛ, arbitrary single-qubit gates can be produced fault-tolerantly and using a restricted set of gates in time which is O(log ɛ) or O(log log ɛ) with sufficient parallel preparation of ancillas, constant average depth is possible using a method we call programmable ancilla rotations. Moreover, we construct and analyze efficient implementations of first- and second-quantized simulation algorithms using the fault-tolerant arbitrary gates and other techniques, such as implementing various subroutines in constant time. A specific example we analyze is the ground-state energy calculation for lithium hydride.

  9. Active Fault Tolerant Control for Ultrasonic Piezoelectric Motor

    NASA Astrophysics Data System (ADS)

    Boukhnifer, Moussa

    2012-07-01

    Ultrasonic piezoelectric motor technology is an important system component in integrated mechatronics devices working on extreme operating conditions. Due to these constraints, robustness and performance of the control interfaces should be taken into account in the motor design. In this paper, we apply a new architecture for a fault tolerant control using Youla parameterization for an ultrasonic piezoelectric motor. The distinguished feature of proposed controller architecture is that it shows structurally how the controller design for performance and robustness may be done separately which has the potential to overcome the conflict between performance and robustness in the traditional feedback framework. A fault tolerant control architecture includes two parts: one part for performance and the other part for robustness. The controller design works in such a way that the feedback control system will be solely controlled by the proportional plus double-integral PI2 performance controller for a nominal model without disturbances and H∞ robustification controller will only be activated in the presence of the uncertainties or an external disturbances. The simulation results demonstrate the effectiveness of the proposed fault tolerant control architecture.

  10. Algorithm-dependent fault tolerance for distributed computing

    SciTech Connect

    P. D. Hough; M. e. Goldsby; E. J. Walsh

    2000-02-01

    Large-scale distributed systems assembled from commodity parts, like CPlant, have become common tools in the distributed computing world. Because of their size and diversity of parts, these systems are prone to failures. Applications that are being run on these systems have not been equipped to efficiently deal with failures, nor is there vendor support for fault tolerance. Thus, when a failure occurs, the application crashes. While most programmers make use of checkpoints to allow for restarting of their applications, this is cumbersome and incurs substantial overhead. In many cases, there are more efficient and more elegant ways in which to address failures. The goal of this project is to develop a software architecture for the detection of and recovery from faults in a cluster computing environment. The detection phase relies on the latest techniques developed in the fault tolerance community. Recovery is being addressed in an application-dependent manner, thus allowing the programmer to take advantage of algorithmic characteristics to reduce the overhead of fault tolerance. This architecture will allow large-scale applications to be more robust in high-performance computing environments that are comprised of clusters of commodity computers such as CPlant and SMP clusters.

  11. Resource requirements for a fault-tolerant quantum Fourier transform

    NASA Astrophysics Data System (ADS)

    Goto, Hayato

    2014-11-01

    We investigate resource requirements for a fault-tolerant quantum Fourier transform. The quantum Fourier transform is a basic subroutine for quantum algorithms which provide an exponential speedup over known classical ones, such as Shor's algorithm for factoring. To implement single-qubit rotations required for a quantum Fourier transform in a fault-tolerant manner, we consider two types of approaches: gate synthesis and state distillation. While the gate synthesis approximates single-qubit rotations with basic quantum operations, the state distillation allows one to perform single-qubit rotations for a quantum Fourier transform exactly. It is unknown, however, which approach is better for a quantum Fourier transform. Here we develop a state-distillation method optimized for a quantum Fourier transform and compare this performance with those of state-of-the-art techniques for gate synthesis without and with ancillary states (ancillas). The performance is evaluated with the resource requirement for a quantum Fourier transform. The resource is measured by the total number of π /8 gates denoted by T , which is called the T count. Contrary to the expectation, the T count for the state distillation is considerably larger than those for the ancilla-free and ancilla-assisted gate synthesis. Thus, we conclude that the ancilla-assisted gate synthesis is a better approach to a fault-tolerant quantum Fourier transform.

  12. Exploiting data representation for fault tolerance

    SciTech Connect

    Hoemmen, Mark Frederick; Elliott, J.; Mueller, F.

    2015-01-06

    Incorrect computer hardware behavior may corrupt intermediate computations in numerical algorithms, possibly resulting in incorrect answers. Prior work models misbehaving hardware by randomly flipping bits in memory. We start by accepting this premise, and present an analytic model for the error introduced by a bit flip in an IEEE 754 floating-point number. We then relate this finding to the linear algebra concepts of normalization and matrix equilibration. In particular, we present a case study illustrating that normalizing both vector inputs of a dot product minimizes the probability of a single bit flip causing a large error in the dot product's result. Moreover, the absolute error is either less than one or very large, which allows detection of large errors. Then, we apply this to the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase of GMRES, and show that when the matrix is equilibrated, the absolute error is bounded above by one.

  13. Exploiting data representation for fault tolerance

    DOE PAGESBeta

    Hoemmen, Mark Frederick; Elliott, J.; Sandia National Lab.; Mueller, F.

    2015-01-06

    Incorrect computer hardware behavior may corrupt intermediate computations in numerical algorithms, possibly resulting in incorrect answers. Prior work models misbehaving hardware by randomly flipping bits in memory. We start by accepting this premise, and present an analytic model for the error introduced by a bit flip in an IEEE 754 floating-point number. We then relate this finding to the linear algebra concepts of normalization and matrix equilibration. In particular, we present a case study illustrating that normalizing both vector inputs of a dot product minimizes the probability of a single bit flip causing a large error in the dot product'smore » result. Moreover, the absolute error is either less than one or very large, which allows detection of large errors. Then, we apply this to the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase of GMRES, and show that when the matrix is equilibrated, the absolute error is bounded above by one.« less

  14. The non-linear MSW equation

    NASA Astrophysics Data System (ADS)

    Thomson, Mark J.; McKellar, Bruce H. J.

    1991-04-01

    A simple, non-linear generalization of the MSW equation is presented and its analytic solution is outlined. The orbits of the polarization vector are shown to be periodic, and to lie on a sphere. Their non-trivial flow patterns fall into two topological categories, the more complex of which can become chaotic if perturbed.

  15. Evaluation Applied to Reliability Analysis of Reconfigurable, Highly Reliable, Fault-Tolerant, Computing Systems for Avionics

    NASA Technical Reports Server (NTRS)

    Migneault, G. E.

    1979-01-01

    Emulation techniques are proposed as a solution to a difficulty arising in the analysis of the reliability of highly reliable computer systems for future commercial aircraft. The difficulty, viz., the lack of credible precision in reliability estimates obtained by analytical modeling techniques are established. The difficulty is shown to be an unavoidable consequence of: (1) a high reliability requirement so demanding as to make system evaluation by use testing infeasible, (2) a complex system design technique, fault tolerance, (3) system reliability dominated by errors due to flaws in the system definition, and (4) elaborate analytical modeling techniques whose precision outputs are quite sensitive to errors of approximation in their input data. The technique of emulation is described, indicating how its input is a simple description of the logical structure of a system and its output is the consequent behavior. The use of emulation techniques is discussed for pseudo-testing systems to evaluate bounds on the parameter values needed for the analytical techniques.

  16. Engineering Non-Classical Light with Non-Linear Microwaveguides

    NASA Astrophysics Data System (ADS)

    Grimsmo, Arne; Clerk, Aashish; Blais, Alexandre

    The quest for ever increasing fidelity and scalability in measurement of superconducting qubits to be used for fault-tolerant quantum computing has recently led to the development of near quantum-limited broadband phase preserving amplifiers in the microwave regime. These devices are, however, more than just amplifiers: They are sources of high-quality, broadband two-mode squeezed light. We show how bottom-up engineering of Josephson junction embedded waveguides can be used to design novel squeezing spectra. Furthermore, the entanglement in the two-mode squeezed output field can be imprinted onto quantum systems coupled to the device's output. These broadband microwave amplifiers constitute a realization of non-linear waveguide QED, a very interesting playground for non-equilibrium many-body physics.

  17. Energy dissipation and error probability in fault-tolerant binary switching

    PubMed Central

    Fashami, Mohammad Salehi; Atulasimha, Jayasimha; Bandyopadhyay, Supriyo

    2013-01-01

    The potential energy profile of an ideal binary switch is a symmetric double well. Switching between the wells without energy dissipation requires time-modulating the height of the potential barrier separating the wells and tilting the profile towards the desired well at the precise juncture when the barrier disappears. This, however, demands perfect timing synchronization and is therefore fault-intolerant even in the absence of noise. A fault-tolerant strategy that requires no time modulation of the barrier (and hence no timing synchronization) switches by tilting the profile by an amount at least equal to the barrier height and dissipates at least that amount of energy in abrupt switching. Here, we present a third strategy that requires a time modulated barrier but no timing synchronization. It is therefore fault-tolerant, error-free in the absence of thermal noise, and yet it dissipates arbitrarily small energy in a noise-free environment since an arbitrarily small tilt is required for slow switching. This case is exemplified with stress induced switching of a shape-anisotropic single-domain soft nanomagnet dipole-coupled to a hard magnet. When thermal noise is present, we show analytically that the minimum energy dissipated to switch in this scheme is ~2kTln(1/p) [p = switching error probability]. PMID:24220310

  18. Non-linear oscillations

    NASA Astrophysics Data System (ADS)

    Hagedorn, P.

    The mathematical pendulum is used to provide a survey of free and forced oscillations in damped and undamped systems. This simple model is employed to present illustrations for and comparisons between the various approximation schemes. A summary of the Liapunov stability theory is provided. The first and the second method of Liapunov are explained for autonomous as well as for nonautonomous systems. Here, a basic familiarity with the theory of linear oscillations is assumed. La Salle's theorem about the stability of invariant domains is explained in terms of illustrative examples. Self-excited oscillations are examined, taking into account such oscillations in mechanical and electrical systems, analytical approximation methods for the computation of self-excited oscillations, analytical criteria for the existence of limit cycles, forced oscillations in self-excited systems, and self-excited oscillations in systems with several degrees of freedom. Attention is given to Hamiltonian systems and an introduction to the theory of optimal control is provided.

  19. Prediction of pressure drawdown in gas reservoirs using a semi-analytical solution of the non-linear gas flow equation

    SciTech Connect

    Mattar, L.; Adegbesan, L.O.

    1980-01-01

    The differential equation for flow of gases in a porous medium is nonlinear and cannot be solved by strictly analytical methods. Previous studies in the literature have obtained analytical solutions to this equation by linearlization (i.e., treating viscosity and compressibilty as constant). In this study, the solution for nonlinear gas flow equation is obtained using the semianalytical technique developed by Kale and Mattar which solves the nonlinear equation by the method of perturbation. Results obtained, for prediction of pressure drawdown in gas reservoirs, indicate that the solution of the linearlized form of the equation is valid for both low and high permeability reservoirs.

  20. Multi-fault Tolerance for Cartesian Data Distributions

    SciTech Connect

    Ali, Nawab; Krishnamoorthy, Sriram; Halappanavar, Mahantesh; Daily, Jeffrey A.

    2013-06-01

    Faults are expected to play an increasingly important role in how algorithms and applications are designed to run on future extreme-scale sys- tems. Algorithm-based fault tolerance (ABFT) is a promising approach that involves modications to the algorithm to recover from faults with lower over- heads than replicated storage and a signicant reduction in lost work compared to checkpoint-restart techniques. Fault-tolerant linear algebra (FTLA) algo- rithms employ additional processors that store parities along the dimensions of a matrix to tolerate multiple, simultaneous faults. Existing approaches as- sume regular data distributions (blocked or block-cyclic) with the failures of each data block being independent. To match the characteristics of failures on parallel computers, we extend these approaches to mapping parity blocks in several important ways. First, we handle parity computation for generalized Cartesian data distributions with each processor holding arbitrary subsets of blocks in a Cartesian-distributed array. Second, techniques to handle corre- lated failures, i.e., multiple processors that can be expected to fail together, are presented. Third, we handle the colocation of parity blocks with the data blocks and do not require them to be on additional processors. Several al- ternative approaches, based on graph matching, are presented that attempt to balance the memory overhead on processors while guaranteeing the same fault tolerance properties as existing approaches that assume independent fail- ures on regular blocked data distributions. The evaluation of these algorithms demonstrates that the additional desirable properties are provided by the pro- posed approach with minimal overhead.

  1. Minimizing resource overheads for fault-tolerant preparation of encoded states of the Steane code

    PubMed Central

    Goto, Hayato

    2016-01-01

    The seven-qubit quantum error-correcting code originally proposed by Steane is one of the best known quantum codes. The Steane code has a desirable property that most basic operations can be performed easily in a fault-tolerant manner. A major obstacle to fault-tolerant quantum computation with the Steane code is fault-tolerant preparation of encoded states, which requires large computational resources. Here we propose efficient state preparation methods for zero and magic states encoded with the Steane code, where the zero state is one of the computational basis states and the magic state allows us to achieve universality in fault-tolerant quantum computation. The methods minimize resource overheads for the fault-tolerant state preparation, and therefore reduce necessary resources for quantum computation with the Steane code. Thus, the present results will open a new possibility for efficient fault-tolerant quantum computation. PMID:26812959

  2. Software Implemented Fault-Tolerant (SIFT) user's guide

    NASA Technical Reports Server (NTRS)

    Green, D. F., Jr.; Palumbo, D. L.; Baltrus, D. W.

    1984-01-01

    Program development for a Software Implemented Fault Tolerant (SIFT) computer system is accomplished in the NASA LaRC AIRLAB facility using a DEC VAX-11 to interface with eight Bendix BDX 930 flight control processors. The interface software which provides this SIFT program development capability was developed by AIRLAB personnel. This technical memorandum describes the application and design of this software in detail, and is intended to assist both the user in performance of SIFT research and the systems programmer responsible for maintaining and/or upgrading the SIFT programming environment.

  3. CMOS processor element for a fault-tolerant SVD array

    NASA Astrophysics Data System (ADS)

    Kota, Kishore; Cavallaro, Joseph R.

    1993-11-01

    This paper describes the VLSI implementation of a CORDIC based processor element for use in a fault-reconfigurable systolic array to compute the singular value decomposition (SVD) of a matrix. The chip implements a time redundant fault tolerance scheme, which allows processors adjacent to a faulty processor to act as computation backup during the systolic idle time. Also, processors around a fault collaborate to reroute data around the faulty processor. This form of time redundancy is attractive when tolerance to a few faults needs to be achieved with little hardware overhead.

  4. Computer-aided reliability estimation. [for fault-tolerant systems

    NASA Technical Reports Server (NTRS)

    Stiffler, J. J.

    1977-01-01

    Computer-aided reliability estimation (CARE) programs are developed to improve the tools available for estimating the reliability of fault-tolerant systems. A description is presented of a program, called CARE II, which was developed after the first program reported by Mathur (1971). Attention is given to the CARE II reliability model, the CARE II coverage model, and CARE II limitations which are to be rectified in CARE III. It is pointed out that the present coverage model in CARE II is extremely versatile. The major limitation is related to the burden placed on the user to determine the basic parameters from which the coverage calculations are made.

  5. Parameter Transient Behavior Analysis on Fault Tolerant Control System

    NASA Technical Reports Server (NTRS)

    Belcastro, Christine (Technical Monitor); Shin, Jong-Yeob

    2003-01-01

    In a fault tolerant control (FTC) system, a parameter varying FTC law is reconfigured based on fault parameters estimated by fault detection and isolation (FDI) modules. FDI modules require some time to detect fault occurrences in aero-vehicle dynamics. This paper illustrates analysis of a FTC system based on estimated fault parameter transient behavior which may include false fault detections during a short time interval. Using Lyapunov function analysis, the upper bound of an induced-L2 norm of the FTC system performance is calculated as a function of a fault detection time and the exponential decay rate of the Lyapunov function.

  6. Implementing fault tolerance in a superconducting quantum circuit

    NASA Astrophysics Data System (ADS)

    Barends, Rami

    2015-03-01

    The surface code error correction scheme is appealing for superconducting circuits as the fundamental operations have been demonstrated at the fault-tolerant threshold. Here, we present experimental results on the repetition code, a one-dimensional primitive of the surface code which can detect bit-flip errors, implemented on a device consisting of nine Xmon transmon qubits. We discuss the basic mechanics of error detection, show preservation of a Greenberger-Horne-Zeilinger state, and show suppression of environmentally-induced error.

  7. Fault tolerant issues in the BTeV trigger

    SciTech Connect

    Jeffrey A. Appel et al.

    2002-12-03

    The BTeV trigger performs sophisticated computations using large ensembles of FPGAs, DSPs, and conventional microprocessors. This system will have between 5,000 and 10,000 computing elements and many networks and data switches. While much attention has been devoted to developing efficient algorithms, the need for fault-tolerant, fault-adaptive, and flexible techniques and software to manage this huge computing platform has been identified as one of the most challenging aspects of this project. They describe the problem and offer an approach to solving it based on a distributed, hierarchical fault management system.

  8. A Test Generation Framework for Distributed Fault-Tolerant Algorithms

    NASA Technical Reports Server (NTRS)

    Goodloe, Alwyn; Bushnell, David; Miner, Paul; Pasareanu, Corina S.

    2009-01-01

    Heavyweight formal methods such as theorem proving have been successfully applied to the analysis of safety critical fault-tolerant systems. Typically, the models and proofs performed during such analysis do not inform the testing process of actual implementations. We propose a framework for generating test vectors from specifications written in the Prototype Verification System (PVS). The methodology uses a translator to produce a Java prototype from a PVS specification. Symbolic (Java) PathFinder is then employed to generate a collection of test cases. A small example is employed to illustrate how the framework can be used in practice.

  9. Programs For Modeling Fault-Tolerant Computing Systems

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.

    1991-01-01

    Pade Approximation with Scaling, (PAWS) and Scaling Taylor Exponential Matrix (STEM) computer programs are software tools for design and validation. Provide flexible, user-friendly, language-based interface for input of Markov mathematical methods describing behaviors of fault-tolerant computer systems. Markov models include both recovery from faults via reconfiguration and behaviors of such systems when faults occur. PAWS and STEM produce exact solutions of probability of system failure and provide conservative estimate of number of significant digits in solution. Written in PASCAL and FORTRAN.

  10. Reliability analysis of fault-tolerant reconfigurable nano-architectures

    SciTech Connect

    Bhaduri, D.; Graham, P. S.; Shukla, S. K.

    2004-01-01

    Manufacturing defects and transient errors will be abundant in high - density reconfigurable nano-scale designs. Recently, we have automated a computational scheme based on Markov Random Field (MRF) and Belief Propagation algorithms in a tool named NANOLAB to evaluate the reliability of nano architectures. In this paper, we show how our methodology can be exploited to design defect- and fault-tolerant programmable logic architectures. The effectiveness of such automation is illustrated by analyzing reconfigurable Boolean networks formed using different industry-based configurable logic blocks (CLBs), both in the presence of thermal perturbations and signal noise.

  11. Single event upset tests of a RISC-based fault-tolerant computer

    SciTech Connect

    Kimbrough, J.R.; Butner, D.N.; Colella, N.J.; Kaschmitter, J.L.; Shaeffer, D.L.; McKnett, C.L.; Coakley, P.G.; Casteneda, C.

    1996-03-23

    The project successfully demonstrated that dual lock-step comparison of commercial RISC processors is a viable fault-tolerant approach to handling SEU in space environment. The fault tolerant approach on orbit error rate was 38 times less than the single processor error rate. The random nature of the upsets and appearance in critical code section show it is essential to incorporate both hardware and software in the design and operation of fault-tolerant computers.

  12. Validation Methods for Fault-Tolerant avionics and control systems, working group meeting 1

    NASA Technical Reports Server (NTRS)

    1979-01-01

    The proceedings of the first working group meeting on validation methods for fault tolerant computer design are presented. The state of the art in fault tolerant computer validation was examined in order to provide a framework for future discussions concerning research issues for the validation of fault tolerant avionics and flight control systems. The development of positions concerning critical aspects of the validation process are given.

  13. Novel neural networks-based fault tolerant control scheme with fault alarm.

    PubMed

    Shen, Qikun; Jiang, Bin; Shi, Peng; Lim, Cheng-Chew

    2014-11-01

    In this paper, the problem of adaptive active fault-tolerant control for a class of nonlinear systems with unknown actuator fault is investigated. The actuator fault is assumed to have no traditional affine appearance of the system state variables and control input. The useful property of the basis function of the radial basis function neural network (NN), which will be used in the design of the fault tolerant controller, is explored. Based on the analysis of the design of normal and passive fault tolerant controllers, by using the implicit function theorem, a novel NN-based active fault-tolerant control scheme with fault alarm is proposed. Comparing with results in the literature, the fault-tolerant control scheme can minimize the time delay between fault occurrence and accommodation that is called the time delay due to fault diagnosis, and reduce the adverse effect on system performance. In addition, the FTC scheme has the advantages of a passive fault-tolerant control scheme as well as the traditional active fault-tolerant control scheme's properties. Furthermore, the fault-tolerant control scheme requires no additional fault detection and isolation model which is necessary in the traditional active fault-tolerant control scheme. Finally, simulation results are presented to demonstrate the efficiency of the developed techniques. PMID:25014982

  14. Active fault tolerant control of a flexible beam

    NASA Astrophysics Data System (ADS)

    Bai, Yuanqiang; Grigoriadis, Karolos M.; Song, Gangbing

    2007-04-01

    This paper presents the development and application of an H∞ fault detection and isolation (FDI) filter and fault tolerant controller (FTC) for smart structures. A linear matrix inequality (LMI) formulation is obtained to design the full order robust H∞ filter to estimate the faulty input signals. A fault tolerant H∞ controller is designed for the combined system of plant and filter which minimizes the control objective selected in the presence of disturbances and faults. A cantilevered flexible beam bonded with piezoceramic smart materials, in particular the PZT (Lead Zirconate Titanate), in the form of a patch is used in the validation of the FDI filter and FTC controller design. These PZT patches are surface-bonded on the beam and perform as actuators and sensors. A real-time data acquisition and control system is used to record the experimental data and to implement the designed FDI filter and FTC. To assist the control system design, system identification is conducted for the first mode of the smart structural system. The state space model from system identification is used for the H∞ FDI filter design. The controller was designed based on minimization of the control effort and displacement of the beam. The residuals obtained from the filter through experiments clearly identify the fault signals. The experimental results of the proposed FTC controller show its e effectiveness for the vibration suppression of the beam for the faulty system when the piezoceramic actuator has a partial failure.

  15. Performance and economy of a fault-tolerant multiprocessor

    NASA Technical Reports Server (NTRS)

    Lala, J. H.; Smith, C. J.

    1979-01-01

    The FTMP (Fault-Tolerant Multiprocessor) is one of two central aircraft fault-tolerant architectures now in the prototype phase under NASA sponsorship. The intended application of the computer includes such critical real-time tasks as 'fly-by-wire' active control and completely automatic Category III landings of commercial aircraft. The FTMP architecture is briefly described and it is shown that it is a viable solution to the multi-faceted problems of safety, speed, and cost. Three job dispatch strategies are described, and their results with respect to job-starting delay are presented. The first strategy is a simple First-Come-First-Serve (FCFS) job dispatch executive. The other two schedulers are an adaptive FCFS and an interrupt driven scheduler. Three failure modes are discussed, and the FTMP survival probability in the face of random hard failures is evaluated. It is noted that the hourly cost of operating two FTMPs in a transport aircraft can be as little as one-to-two percent of the total flight-hour cost of the aircraft.

  16. Development and Evaluation of Fault-Tolerant Flight Control Systems

    NASA Technical Reports Server (NTRS)

    Song, Yong D.; Gupta, Kajal (Technical Monitor)

    2004-01-01

    The research is concerned with developing a new approach to enhancing fault tolerance of flight control systems. The original motivation for fault-tolerant control comes from the need for safe operation of control elements (e.g. actuators) in the event of hardware failures in high reliability systems. One such example is modem space vehicle subjected to actuator/sensor impairments. A major task in flight control is to revise the control policy to balance impairment detectability and to achieve sufficient robustness. This involves careful selection of types and parameters of the controllers and the impairment detecting filters used. It also involves a decision, upon the identification of some failures, on whether and how a control reconfiguration should take place in order to maintain a certain system performance level. In this project new flight dynamic model under uncertain flight conditions is considered, in which the effects of both ramp and jump faults are reflected. Stabilization algorithms based on neural network and adaptive method are derived. The control algorithms are shown to be effective in dealing with uncertain dynamics due to external disturbances and unpredictable faults. The overall strategy is easy to set up and the computation involved is much less as compared with other strategies. Computer simulation software is developed. A serious of simulation studies have been conducted with varying flight conditions.

  17. Fault-tolerant adaptive FIR filters using variable detection threshold

    NASA Astrophysics Data System (ADS)

    Lin, L. K.; Redinbo, G. R.

    1994-10-01

    Adaptive filters are widely used in many digital signal processing applications, where tap weight of the filters are adjusted by stochastic gradient search methods. Block adaptive filtering techniques, such as block least mean square and block conjugate gradient algorithm, were developed to speed up the convergence as well as improve the tracking capability which are two important factors in designing real-time adaptive filter systems. Even though algorithm-based fault tolerance can be used as a low-cost high level fault-tolerant technique to protect the aforementioned systems from hardware failures with minimal hardware overhead, the issue of choosing a good detection threshold remains a challenging problem. First of all, the systems usually only have limited computational resources, i.e., concurrent error detection and correction is not feasible. Secondly, any prior knowledge of input data is very difficult to get in practical settings. We propose a checksum-based fault detection scheme using two-level variable detection thresholds that is dynamically dependent on the past syndromes. Simulations show that the proposed scheme reduces the possibility of false alarms and has a high degree of fault coverage in adaptive filter systems.

  18. Software reliability through fault-avoidance and fault-tolerance

    NASA Technical Reports Server (NTRS)

    Vouk, Mladen A.; Mcallister, David F.

    1993-01-01

    Strategies and tools for the testing, risk assessment and risk control of dependable software-based systems were developed. Part of this project consists of studies to enable the transfer of technology to industry, for example the risk management techniques for safety-concious systems. Theoretical investigations of Boolean and Relational Operator (BRO) testing strategy were conducted for condition-based testing. The Basic Graph Generation and Analysis tool (BGG) was extended to fully incorporate several variants of the BRO metric. Single- and multi-phase risk, coverage and time-based models are being developed to provide additional theoretical and empirical basis for estimation of the reliability and availability of large, highly dependable software. A model for software process and risk management was developed. The use of cause-effect graphing for software specification and validation was investigated. Lastly, advanced software fault-tolerance models were studied to provide alternatives and improvements in situations where simple software fault-tolerance strategies break down.

  19. Reliability modeling of fault-tolerant computer based systems

    NASA Technical Reports Server (NTRS)

    Bavuso, Salvatore J.

    1987-01-01

    Digital fault-tolerant computer-based systems have become commonplace in military and commercial avionics. These systems hold the promise of increased availability, reliability, and maintainability over conventional analog-based systems through the application of replicated digital computers arranged in fault-tolerant configurations. Three tightly coupled factors of paramount importance, ultimately determining the viability of these systems, are reliability, safety, and profitability. Reliability, the major driver affects virtually every aspect of design, packaging, and field operations, and eventually produces profit for commercial applications or increased national security. However, the utilization of digital computer systems makes the task of producing credible reliability assessment a formidable one for the reliability engineer. The root of the problem lies in the digital computer's unique adaptability to changing requirements, computational power, and ability to test itself efficiently. Addressed here are the nuances of modeling the reliability of systems with large state sizes, in the Markov sense, which result from systems based on replicated redundant hardware and to discuss the modeling of factors which can reduce reliability without concomitant depletion of hardware. Advanced fault-handling models are described and methods of acquiring and measuring parameters for these models are delineated.

  20. A validation methodology for fault-tolerant clock synchronization

    NASA Technical Reports Server (NTRS)

    Johnson, S. C.; Butler, R. W.

    1984-01-01

    A validation method for the synchronization subsystem of a fault-tolerant computer system is presented. The high reliability requirement of flight crucial systems precludes the use of most traditional validation methods. The method presented utilizes formal design proof to uncover design and coding errors and experimentation to validate the assumptions of the design proof. The experimental method is described and illustrated by validating an experimental implementation of the Software Implemented Fault Tolerance (SIFT) clock synchronization algorithm. The design proof of the algorithm defines the maximum skew between any two nonfaulty clocks in the system in terms of theoretical upper bounds on certain system parameters. The quantile to which each parameter must be estimated is determined by a combinatorial analysis of the system reliability. The parameters are measured by direct and indirect means, and upper bounds are estimated. A nonparametric method based on an asymptotic property of the tail of a distribution is used to estimate the upper bound of a critical system parameter. Although the proof process is very costly, it is extremely valuable when validating the crucial synchronization subsystem.

  1. Fault-tolerance in Two-dimensional Topological Systems

    NASA Astrophysics Data System (ADS)

    Anderson, Jonas T.

    This thesis is a collection of ideas with the general goal of building, at least in the abstract, a local fault-tolerant quantum computer. The connection between quantum information and topology has proven to be an active area of research in several fields. The introduction of the toric code by Alexei Kitaev demonstrated the usefulness of topology for quantum memory and quantum computation. Many quantum codes used for quantum memory are modeled by spin systems on a lattice, with operators that extract syndrome information placed on vertices or faces of the lattice. It is natural to wonder whether the useful codes in such systems can be classified. This thesis presents work that leverages ideas from topology and graph theory to explore the space of such codes. Homological stabilizer codes are introduced and it is shown that, under a set of reasonable assumptions, any qubit homological stabilizer code is equivalent to either a toric code or a color code. Additionally, the toric code and the color code correspond to distinct classes of graphs. Many systems have been proposed as candidate quantum computers. It is very desirable to design quantum computing architectures with two-dimensional layouts and low complexity in parity-checking circuitry. Kitaev's surface codes provided the first example of codes satisfying this property. They provided a new route to fault tolerance with more modest overheads and thresholds approaching 1%. The recently discovered color codes share many properties with the surface codes, such as the ability to perform syndrome extraction locally in two dimensions. Some families of color codes admit a transversal implementation of the entire Clifford group. This work investigates color codes on the 4.8.8 lattice known as triangular codes. I develop a fault-tolerant error-correction strategy for these codes in which repeated syndrome measurements on this lattice generate a three-dimensional space-time combinatorial structure. I then develop an

  2. Analysis of non-linearity in differential wavefront sensing technique.

    PubMed

    Duan, Hui-Zong; Liang, Yu-Rong; Yeh, Hsien-Chi

    2016-03-01

    An analytical model of a differential wavefront sensing (DWS) technique based on Gaussian Beam propagation has been derived. Compared with the result of the interference signals detected by quadrant photodiode, which is calculated by using the numerical method, the analytical model has been verified. Both the analytical model and numerical simulation show milli-radians level non-linearity effect of DWS detection. In addition, the beam clipping has strong influence on the non-linearity of DWS. The larger the beam clipping is, the smaller the non-linearity is. However, the beam walking effect hardly has influence on DWS. Thus, it can be ignored in laser interferometer. PMID:26974079

  3. Fault tolerant small satellite attitude control using adaptive non-singular terminal sliding mode

    NASA Astrophysics Data System (ADS)

    Cao, Lu; Chen, XiaoQian; Sheng, Tao

    2013-06-01

    The Attitude Control System (ACS) plays a pivotal role in the whole performance of the spacecraft on the orbit; therefore, it is vitally important to design the control system with the performance of rapid response, high control precision and insensitive to external perturbations. In the first place, this paper proposes two adaptive nonlinear control algorithms based on the sliding mode control (SMC), which are designed for small satellite attitude control system. The nonlinear dynamics describing the attitude of small satellite is considered in a circle reference orbit, and the stability of the closed-loop system in the presence of external perturbations is investigated. Then, in order to account for accidental or degradation fault in satellite actuators, the fault-tolerant control schemes are presented. Hence, two adaptive fault-tolerant control laws (continuous sliding mode control and non-singular terminal sliding mode control) are developed by adopting the nonlinear analytical model to describe the system, which can guarantee global asymptotic convergence of the attitude control error with the existence of unknown external perturbations. The nonlinear hyperplane based Terminal sliding mode is introduced into the control law design; therefore, the system convergence performance improves and the control error is convergent in "finite time". As a result, the study on the non-singular terminal sliding mode control is the emphasis and the continuous sliding mode control is used to compare with the non-singular terminal sliding mode control. Meanwhile, an adaptive fuzzy algorithm has been proposed to suppress the chattering phenomenon. Moreover, several numerical examples are presented to demonstrate the efficacy of the proposed controllers by correcting for the external perturbations. Simulation results confirm that the suggested methodologies yield high control precision in control. In addition, actuator degradation, actuator stuck and actuator failure for a

  4. The X-38 Spacecraft Fault-Tolerant Avionics System

    NASA Technical Reports Server (NTRS)

    Kouba,Coy; Buscher, Deborah; Busa, Joseph

    2003-01-01

    In 1995 NASA began an experimental program to develop a reusable crew return vehicle (CRV) for the International Space Station. The purpose of the CRV was threefold: (i) to bring home an injured or ill crewmember; (ii) to bring home the entire crew if the Shuttle fleet was grounded; and (iii) to evacuate the crew in the case of an imminent Station threat (i.e., fire, decompression, etc). Built at the Johnson Space Center, were two approach and landing prototypes and one spacecraft demonstrator (called V201). A series of increasingly complex ground subsystem tests were completed, and eight successful high-altitude drop tests were achieved to prove the design concept. In this program, an unprecedented amount of commercial-off-the-shelf technology was utilized in this first crewed spacecraft NASA has built since the Shuttle program. Unfortunately, in 2002 the program was canceled due to changing Agency priorities. The vehicle was 80% complete and the program was shut down in such a manner as to preserve design, development, test and engineering data. This paper describes the X-38 V201 fault-tolerant avionics system. Based on Draper Laboratory's Byzantine-resilient fault-tolerant parallel processing system and their "network element" hardware, each flight computer exchanges information on a strict timescale to process input data, compare results, and issue voted vehicle output commands. Major accomplishments achieved in this development include: (i) a space qualified two-fault tolerant design using mostly COTS (hardware and operating system); (ii) a single event upset tolerant network element board, (iii) on-the-fly recovery of a failed processor; (iv) use of synched cache; (v) realignment of memory to bring back a failed channel; (vi) flight code automatically generated from the master measurement list; and (vii) built in-house by a team of civil servants and support contractors. This paper will present an overview of the avionics system and the hardware

  5. FPGA-Based, Self-Checking, Fault-Tolerant Computers

    NASA Technical Reports Server (NTRS)

    Some, Raphael; Rennels, David

    2004-01-01

    A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and highspeed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant- computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors. It would contain two CPUs executing

  6. Design study of Software-Implemented Fault-Tolerance (SIFT) computer

    NASA Technical Reports Server (NTRS)

    Wensley, J. H.; Goldberg, J.; Green, M. W.; Kutz, W. H.; Levitt, K. N.; Mills, M. E.; Shostak, R. E.; Whiting-Okeefe, P. M.; Zeidler, H. M.

    1982-01-01

    Software-implemented fault tolerant (SIFT) computer design for commercial aviation is reported. A SIFT design concept is addressed. Alternate strategies for physical implementation are considered. Hardware and software design correctness is addressed. System modeling and effectiveness evaluation are considered from a fault-tolerant point of view.

  7. A complete hardening method for the generation of fault tolerant circuits

    NASA Astrophysics Data System (ADS)

    Portela-Garcia, Marta; Garcia-Valderas, Mario; Lopez-Ongil, Celia; Entrena, Luis

    2005-06-01

    Fault Tolerance has become an important requirement for integrated circuits, not only in safety critical applications like aerospace circuits, but also for applications working at the earth surface. Since the appearance of nanometer technologies, the sensitiveness of integrated circuits to radiation has increased notably, making the occurrence of soft errors much more frequent. Therefore, hardened circuits are currently required in many applications where fault tolerance was not a requirement in the very near past. In this paper, tools and methods for the whole hardening process of a circuit are presented: tools for the automatic insertion of fault tolerant structures in a circuit description and methods for the evaluation of fault tolerance achieved. These methods allow the evaluation of fault tolerance by means of emulation in platform FPGAs, which offer a much faster way to perform evaluation than simulation based techniques. Different circuits are used to test the proposed tool for inserting fault tolerant structures. Fault tolerance evaluation is performed using the proposed fault emulation methods, before and after applying hardening process, showing the fault tolerance improvement. The proposed techniques for evaluation have been compared, in terms of evaluation time, with previously proposed solutions and with simulation based solutions, showing improvements of several orders of magnitude.

  8. SIFT - Multiprocessor architecture for Software Implemented Fault Tolerance flight control and avionics computers

    NASA Technical Reports Server (NTRS)

    Forman, P.; Moses, K.

    1979-01-01

    A brief description of a SIFT (Software Implemented Fault Tolerance) Flight Control Computer with emphasis on implementation is presented. A multiprocessor system that relies on software-implemented fault detection and reconfiguration algorithms is described. A high level reliability and fault tolerance is achieved by the replication of computing tasks among processing units.

  9. 14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 14 Aeronautics and Space 1 2013-01-01 2013-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance...

  10. 14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 14 Aeronautics and Space 1 2012-01-01 2012-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance...

  11. 14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 14 Aeronautics and Space 1 2010-01-01 2010-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance...

  12. 14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 14 Aeronautics and Space 1 2011-01-01 2011-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance...

  13. 14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 14 Aeronautics and Space 1 2014-01-01 2014-01-01 false Fuel Tank System Fault Tolerance Evaluation Requirements Federal Special Federal Aviation Regulation No. 88 Aeronautics and Space FEDERAL AVIATION..., SFAR No. 88 Special Federal Aviation Regulation No. 88—Fuel Tank System Fault Tolerance...

  14. A survey of NASA and military standards on fault tolerance and reliability applied to robotics

    NASA Technical Reports Server (NTRS)

    Cavallaro, Joseph R.; Walker, Ian D.

    1994-01-01

    There is currently increasing interest and activity in the area of reliability and fault tolerance for robotics. This paper discusses the application of Standards in robot reliability, and surveys the literature of relevant existing standards. A bibliography of relevant Military and NASA standards for reliability and fault tolerance is included.

  15. An optimized implementation of a fault-tolerant clock synchronization circuit

    NASA Technical Reports Server (NTRS)

    Torres-Pomales, Wilfredo

    1995-01-01

    A fault-tolerant clock synchronization circuit was designed and tested. A comparison to a previous design and the procedure followed to achieve the current optimization are included. The report also includes a description of the system and the results of tests performed to study the synchronization and fault-tolerant characteristics of the implementation.

  16. An empirical comparison of software fault tolerance and fault elimination

    NASA Technical Reports Server (NTRS)

    Shimeall, Timothy J.; Leveson, Nancy G.

    1991-01-01

    Reliability is an important concern in the development of software for modern systems. Some researchers have hypothesized that particular fault-handling approaches or techniques are so effective that other approaches or techniques are superfluous. The authors have performed a study that compares two major approaches to the improvement of software, software fault elimination and software fault tolerance, by examination of the fault detection obtained by five techniques: run-time assertions, multi-version voting, functional testing augmented by structural testing, code reading by stepwise abstraction, and static data-flow analysis. This study has focused on characterizing the sets of faults detected by the techniques and on characterizing the relationships between these sets of faults. The results of the study show that none of the techniques studied is necessarily redundant to any combination of the others. Further results reveal strengths and weakness in the fault detection by the techniques studied and suggest directions for future research.

  17. Using Ada for a distributed, fault tolerant system

    NASA Technical Reports Server (NTRS)

    Dewolf, J. B.; Sodano, N. M.; Whittredge, R. S.

    1984-01-01

    It is pointed out that advanced avionics applications increasingly require underlying machine architectures which are damage and fault tolerant, and which provide access to distributed sensors, effectors and high-throughput computational resources. The Advanced Information Processing System (AIPS), sponsored by NASA, is to provide an architecture which can meet the considered requirements. Ada was selected for implementing the AIPS system software. Advantages of Ada are related to its provisions for real-time programming, error detection, modularity and separate compilation, and standardization and portability. Chief drawbacks of this language are currently limited availability and maturity of language implementations, and limited experience in applying the language to real-time applications. The present investigation is concerned with current plans for employing Ada in the design of the software for AIPS. Attention is given to an overview of AIPS, AIPS software services, and representative design issues in each of four major software categories.

  18. Fault-Tolerant Control Based on Hybrid Redundancy

    NASA Astrophysics Data System (ADS)

    Takagi, Taro; Takahashi, Masanori

    This paper presents a new fault-tolerant control system (FTCS) against actuator failures. The proposed FTCS is based on a hybrid of static and dynamic redundancies. The redundancy-mode is selected appropriately by only a switching logic which is designed from the control performance. Hence, no fault detector is utilized. For all switched modes, a unity high-gain feedback controller with a parallel feedforward compensator is introduced to attain the stabilization and the asymptotic tracking. Because the controller has high robustness with respect to uncertainties, the FTCS can cope with variations in dynamics that is caused by the failure. In this paper, several simulation results for the connected vehicles are shown to confirm the effectiveness of the FTCS.

  19. Evolution of shuttle avionics redundancy management/fault tolerance

    NASA Technical Reports Server (NTRS)

    Boykin, J. C.; Thibodeau, J. R.; Schneider, H. E.

    1985-01-01

    The challenge of providing redundancy management (RM) and fault tolerance to meet the Shuttle Program requirements of fail operational/fail safe for the avionics systems was complicated by the critical program constraints of weight, cost, and schedule. The basic and sometimes false effectivity of less than pure RM designs is addressed. Evolution of the multiple input selection filter (the heart of the RM function) is discussed with emphasis on the subtle interactions of the flight control system that were found to be potentially catastrophic. Several other general RM development problems are discussed, with particular emphasis on the inertial measurement unit RM, indicative of the complexity of managing that three string system and its critical interfaces with the guidance and control systems.

  20. MCNP load balancing and fault tolerance with PVM

    SciTech Connect

    McKinney, G.W.

    1995-07-01

    Version 4A of the Monte Carlo neutron, photon, and electron transport code MCNP, developed by LANL (Los Alamos National Laboratory), supports distributed-memory multiprocessing through the software package PVM (Parallel Virtual Machine, version 3.1.4). Using PVM for interprocessor communication, MCNP can simultaneously execute a single problem on a cluster of UNIX-based workstations. This capability provided system efficiencies that exceeded 80% on dedicated workstation clusters, however, on heterogeneous or multiuser systems, the performance was limited by the slowest processor (i.e., equal work was assigned to each processor). The next public release of MCNP will provide multiprocessing enhancements that include load balancing and fault tolerance which are shown to dramatically increase multiuser system efficiency and reliability.

  1. Fault-tolerant control of heavy-haul trains

    NASA Astrophysics Data System (ADS)

    Zhuan, Xiangtao; Xia, Xiaohua

    2010-06-01

    The fault-tolerant control (FTC) of heavy-haul trains is discussed on the basis of the speed regulation proposed in previous works. The fault modes of trains are assumed and the corresponding fault detection and isolation (FDI) are studied. The FDI of sensor faults is based on a geometric approach for residual generators. The FDI of a braking system is based on the observation of the steady-state speed. From the difference of the steady-state speeds between the fault system and the faultless system, one can get fault information. Simulation tests were conducted on the suitability of the FDIs and the redesigned speed regulators. It is shown that the proposed FTC does not explicitly worsen the performance of the speed regulator in the case of a faultless system, while it obviously improves the performance of the speed regulator in the case of a faulty system.

  2. Gain-Scheduled Fault Tolerance Control Under False Identification

    NASA Technical Reports Server (NTRS)

    Shin, Jong-Yeob; Belcastro, Christine (Technical Monitor)

    2006-01-01

    An active fault tolerant control (FTC) law is generally sensitive to false identification since the control gain is reconfigured for fault occurrence. In the conventional FTC law design procedure, dynamic variations due to false identification are not considered. In this paper, an FTC synthesis method is developed in order to consider possible variations of closed-loop dynamics under false identification into the control design procedure. An active FTC synthesis problem is formulated into an LMI optimization problem to minimize the upper bound of the induced-L2 norm which can represent the worst-case performance degradation due to false identification. The developed synthesis method is applied for control of the longitudinal motions of FASER (Free-flying Airplane for Subscale Experimental Research). The designed FTC law of the airplane is simulated for pitch angle command tracking under a false identification case.

  3. FTMP - A highly reliable Fault-Tolerant Multiprocessor for aircraft

    NASA Technical Reports Server (NTRS)

    Hopkins, A. L., Jr.; Smith, T. B., III; Lala, J. H.

    1978-01-01

    The FTMP (Fault-Tolerant Multiprocessor) is a complex multiprocessor computer that employs a form of redundancy related to systems considered by Mathur (1971), in which each major module can substitute for any other module of the same type. Despite the conceptual simplicity of the redundancy form, the implementation has many intricacies owing partly to the low target failure rate, and partly to the difficulty of eliminating single-fault vulnerability. An extensive analysis of the computer through the use of such modeling techniques as Markov processes and combinatorial mathematics shows that for random hard faults the computer can meet its requirements. It is also shown that the maintenance scheduled at intervals of 200 hr or more can be adequate most of the time.

  4. Fault model development for fault tolerant VLSI design

    NASA Astrophysics Data System (ADS)

    Hartmann, C. R.; Lala, P. K.; Ali, A. M.; Visweswaran, G. S.; Ganguly, S.

    1988-05-01

    Fault models provide systematic and precise representations of physical defects in microcircuits in a form suitable for simulation and test generation. The current difficulty in testing VLSI circuits can be attributed to the tremendous increase in design complexity and the inappropriateness of traditional stuck-at fault models. This report develops fault models for three different types of common defects that are not accurately represented by the stuck-at fault model. The faults examined in this report are: bridging faults, transistor stuck-open faults, and transient faults caused by alpha particle radiation. A generalized fault model could not be developed for the three fault types. However, microcircuit behavior and fault detection strategies are described for the bridging, transistor stuck-open, and transient (alpha particle strike) faults. The results of this study can be applied to the simulation and analysis of faults in fault tolerant VLSI circuits.

  5. Coverage modeling for dependability analysis of fault-tolerant systems

    NASA Technical Reports Server (NTRS)

    Dugan, Joanne Bechta; Trivedi, Kishor S.

    1989-01-01

    Several different models for predicting coverage in a fault-tolerant system, including models for permanent, intermittent, and transient errors, are discussed. Markov, semi-Markov, nonhomogeneous Markov, and extended stochastic Petri net models for computing coverage are developed. Two types of events that interfere with recovery are examined; and methods for modeling such events, whether they are deterministic or random, are given. The sensitivity of system reliability/availability to the coverage parameter and the sensitivity of the coverage parameter to various error-handling strategies are investigated. It is found that a policy of attempting transient recovery upon detection of an error can actually increase the unreliability of the system. This result is true if the error detectability is not nearly perfect, so that the risk of producing an undetectable error is greater than the benefit gained by not discarding the component.

  6. Hypothetical Scenario Generator for Fault-Tolerant Diagnosis

    NASA Technical Reports Server (NTRS)

    James, Mark

    2007-01-01

    The Hypothetical Scenario Generator for Fault-tolerant Diagnostics (HSG) is an algorithm being developed in conjunction with other components of artificial- intelligence systems for automated diagnosis and prognosis of faults in spacecraft, aircraft, and other complex engineering systems. By incorporating prognostic capabilities along with advanced diagnostic capabilities, these developments hold promise to increase the safety and affordability of the affected engineering systems by making it possible to obtain timely and accurate information on the statuses of the systems and predicting impending failures well in advance. The HSG is a specific instance of a hypothetical- scenario generator that implements an innovative approach for performing diagnostic reasoning when data are missing. The special purpose served by the HSG is to (1) look for all possible ways in which the present state of the engineering system can be mapped with respect to a given model and (2) generate a prioritized set of future possible states and the scenarios of which they are parts.

  7. Buffered coscheduling for parallel programming and enhanced fault tolerance

    DOEpatents

    Petrini, Fabrizio; Feng, Wu-chun

    2006-01-31

    A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval is accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors

  8. Fault tolerant vector control of induction motor drive

    NASA Astrophysics Data System (ADS)

    Odnokopylov, G.; Bragin, A.

    2014-10-01

    For electric composed of technical objects hazardous industries, such as nuclear, military, chemical, etc. an urgent task is to increase their resiliency and survivability. The construction principle of vector control system fault-tolerant asynchronous electric. Displaying recovery efficiency three-phase induction motor drive in emergency mode using two-phase vector control system. The process of formation of a simulation model of the asynchronous electric unbalance in emergency mode. When modeling used coordinate transformation, providing emergency operation electric unbalance work. The results of modeling transient phase loss motor stator. During a power failure phase induction motor cannot save circular rotating field in the air gap of the motor and ensure the restoration of its efficiency at rated torque and speed.

  9. Ultrafast and fault-tolerant quantum communication across long distances.

    PubMed

    Muralidharan, Sreraman; Kim, Jungsang; Lütkenhaus, Norbert; Lukin, Mikhail D; Jiang, Liang

    2014-06-27

    Quantum repeaters (QRs) provide a way of enabling long distance quantum communication by establishing entangled qubits between remote locations. In this Letter, we investigate a new approach to QRs in which quantum information can be faithfully transmitted via a noisy channel without the use of long distance teleportation, thus eliminating the need to establish remote entangled links. Our approach makes use of small encoding blocks to fault-tolerantly correct both operational and photon loss errors. We describe a way to optimize the resource requirement for these QRs with the aim of the generation of a secure key. Numerical calculations indicate that the number of quantum memory bits at each repeater station required for the generation of one secure key has favorable polylogarithmic scaling with the distance across which the communication is desired. PMID:25014798

  10. Investigation of an advanced fault tolerant integrated avionics system

    NASA Technical Reports Server (NTRS)

    Dunn, W. R.; Cottrell, D.; Flanders, J.; Javornik, A.; Rusovick, M.

    1986-01-01

    Presented is an advanced, fault-tolerant multiprocessor avionics architecture as could be employed in an advanced rotorcraft such as LHX. The processor structure is designed to interface with existing digital avionics systems and concepts including the Army Digital Avionics System (ADAS) cockpit/display system, navaid and communications suites, integrated sensing suite, and the Advanced Digital Optical Control System (ADOCS). The report defines mission, maintenance and safety-of-flight reliability goals as might be expected for an operational LHX aircraft. Based on use of a modular, compact (16-bit) microprocessor card family, results of a preliminary study examining simplex, dual and standby-sparing architectures is presented. Given the stated constraints, it is shown that the dual architecture is best suited to meet reliability goals with minimum hardware and software overhead. The report presents hardware and software design considerations for realizing the architecture including redundancy management requirements and techniques as well as verification and validation needs and methods.

  11. Fault Tolerant Coverage and Connectivity in Presence of Channel Randomness

    PubMed Central

    Sagar, Anil Kumar; Lobiyal, D. K.

    2014-01-01

    Some applications of wireless sensor network require K-coverage and K-connectivity to ensure the system to be fault tolerance and to make it more reliable. Therefore, it makes coverage and connectivity an important issue in wireless sensor networks. In this paper, we proposed K-coverage and K-connectivity models for wireless sensor networks. In both models, nodes are distributed according to Poisson distribution in the sensor field. To make the proposed model more realistic we used log-normal shadowing path loss model to capture the radio irregularities and studied its impact on K-coverage and K-connectivity. The value of K can be different for different types of applications. Further, we also analyzed the problem of node failure for K-coverage model. In the simulation section, results clearly show that coverage and connectivity of wireless sensor network depend on the node density, shadowing parameters like the path loss exponent, and standard deviation. PMID:24574922

  12. Validating Requirements for Fault Tolerant Systems Using Model Checking

    NASA Technical Reports Server (NTRS)

    Schneider, Francis; Easterbrook, Steve M.; Callahan, John R.; Holzmann, Gerard J.

    1997-01-01

    Model checking is shown to be an effective tool in validating the behavior of a fault tolerant embedded spacecraft controller. The case study presented here shows that by judiciously abstracting away extraneous complexity, the state space of the model could be exhaustively searched allowing critical functional requirements to be validated down to the design level. Abstracting away detail not germane to the problem of interest leaves by definition a partial specification behind. The success of this procedure shows that it is feasible to effectively validate a partial specification with this technique. Three anomalies were found in the system one of which is an error in the detailed requirements, and the other two are missing/ambiguous requirements. Because the method allows validation of partial specifications, it also is an effective methodology towards maintaining fidelity between a co-evolving specification and an implementation.

  13. The Design of a Fault-Tolerant COTS-Based Bus Architecture for Space Applications

    NASA Technical Reports Server (NTRS)

    Chau, Savio N.; Alkalai, Leon; Tai, Ann T.

    2000-01-01

    The high-performance, scalability and miniaturization requirements together with the power, mass and cost constraints mandate the use of commercial-off-the-shelf (COTS) components and standards in the X2000 avionics system architecture for deep-space missions. In this paper, we report our experiences and findings on the design of an IEEE 1394 compliant fault-tolerant COTS-based bus architecture. While the COTS standard IEEE 1394 adequately supports power management, high performance and scalability, its topological criteria impose restrictions on fault tolerance realization. To circumvent the difficulties, we derive a "stack-tree" topology that not only complies with the IEEE 1394 standard but also facilitates fault tolerance realization in a spaceborne system with limited dedicated resource redundancies. Moreover, by exploiting pertinent standard features of the 1394 interface which are not purposely designed for fault tolerance, we devise a comprehensive set of fault detection mechanisms to support the fault-tolerant bus architecture.

  14. Analysis of GPS Abnormal Conditions within Fault Tolerant Control Laws

    NASA Astrophysics Data System (ADS)

    Al-Sinbol, Gahssan

    The Global Position System (GPS) is a critical element for the functionality of autonomous flying vehicles. The GPS operation at normal and abnormal conditions directly impacts the trajectory tracking performance of the autonomous Unmanned Aerial Vehicles (UAVs) controllers. The effects of GPS parameter variation must be well understood and user-friendly computational tools must be developed to facilitate the design and evaluation of fault tolerant control laws. This thesis presents the development of a simplified GPS error model in Matlab/Simulink and its use performing a sensitivity analysis of GPS parameters effect under system normal and abnormal operation on different UAV trajectory tracking controllers. The model statistically generates position and velocity errors, simulates the effect of GPS satellite configuration on the position and velocity measurement accuracy, and implements a set of failures to the GPS readings. The model and its graphical user interface was integrated within the WVU UAV simulation environment as a masked Simulink block. The effects on the controllers' trajectory tracking performance of the following GPS parameters were investigated within normal operation ranges and outside: time delay, update rate, error standard deviation, bias, and major position and velocity failures. Several sets of control laws with fixed and adaptive parameters and of different levels of complexity have been used in this investigation. A complex performance index formulated in terms of tracking errors and control activity was used for control laws performance evaluation. The composition of various metrics within the performance index was performed using fixed and variable weights depending on the local characteristics of the commanded trajectory. This study has revealed that GPS error parameters have a significant impact on control laws performance. The proposed GPS model has proved to be a valuable, flexible tool for testing and evaluation of the fault

  15. Modeling and measurement of fault-tolerant multiprocessors

    NASA Technical Reports Server (NTRS)

    Shin, K. G.; Woodbury, M. H.; Lee, Y. H.

    1985-01-01

    The workload effects on computer performance are addressed first for a highly reliable unibus multiprocessor used in real-time control. As an approach to studing these effects, a modified Stochastic Petri Net (SPN) is used to describe the synchronous operation of the multiprocessor system. From this model the vital components affecting performance can be determined. However, because of the complexity in solving the modified SPN, a simpler model, i.e., a closed priority queuing network, is constructed that represents the same critical aspects. The use of this model for a specific application requires the partitioning of the workload into job classes. It is shown that the steady state solution of the queuing model directly produces useful results. The use of this model in evaluating an existing system, the Fault Tolerant Multiprocessor (FTMP) at the NASA AIRLAB, is outlined with some experimental results. Also addressed is the technique of measuring fault latency, an important microscopic system parameter. Most related works have assumed no or a negligible fault latency and then performed approximate analyses. To eliminate this deficiency, a new methodology for indirectly measuring fault latency is presented.

  16. ALLIANCE: An architecture for fault tolerant multi-robot cooperation

    SciTech Connect

    Parker, L.E.

    1995-02-01

    ALLIANCE is a software architecture that facilitates the fault tolerant cooperative control of teams of heterogeneous mobile robots performing missions composed of loosely coupled, largely independent subtasks. ALLIANCE allows teams of robots, each of which possesses a variety of high-level functions that it can perform during a mission, to individually select appropriate actions throughout the mission based on the requirements of the mission, the activities of other robots, the current environmental conditions, and the robot`s own internal states. ALLIANCE is a fully distributed, behavior-based architecture that incorporates the use of mathematically modeled motivations (such as impatience and acquiescence) within each robot to achieve adaptive action selection. Since cooperative robotic teams usually work in dynamic and unpredictable environments, this software architecture allows the robot team members to respond robustly, reliably, flexibly, and coherently to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. The feasibility of this architecture is demonstrated in an implementation on a team of mobile robots performing a laboratory version of hazardous waste cleanup.

  17. Fault-tolerant authenticated quantum dialogue using logical Bell states

    NASA Astrophysics Data System (ADS)

    Ye, Tian-Yu

    2015-09-01

    Two fault-tolerant authenticated quantum dialogue protocols are proposed in this paper by employing logical Bell states as the quantum resource, which combat the collective-dephasing noise and the collective-rotation noise, respectively. The two proposed protocols each can accomplish the mutual identity authentication and the dialogue between two participants simultaneously and securely over one kind of collective noise channels. In each of two proposed protocols, the information transmitted through the classical channel is assumed to be eavesdroppable and modifiable. The key for choosing the measurement bases of sample logical qubits is pre-shared privately between two participants. The Bell state measurements rather than the four-qubit joint measurements are adopted for decoding. The two participants share the initial states of message logical Bell states with resort to the direct transmission of auxiliary logical Bell states so that the information leakage problem is avoided. The impersonation attack, the man-in-the-middle attack, the modification attack and the Trojan horse attacks from Eve all are detectable.

  18. Fault-tolerant self-routing computer network topology

    SciTech Connect

    Mitchell, T.L.

    1987-01-01

    This document reports on the development and analysis of a new, easily expandable, highly fault tolerant self-routing computer network topology. The topology applies equally to any general-purpose computer-networking environment, whether local, metropolitan, or wide area. This new connectivity scheme is named the spiral topology because the architecture is built around modules of four computer nodes each, connected by top and bottom spirals. The spiral topology features a simple internal self-routing algorithm that adapts quickly, and automatically, to failed nodes and links. The six most important direct consequences of the spiral computer-network architecture are the topology's (1) ease of expansion; (2) fast, on-the-fly self-routing; (3) extremely high tolerance to network faults; (4) increased network security; (5) potential for the total elimination of store and forward transmissions due to routing decision delays; and (6) rendering the maximum path length issue moots. The fast on-the-fly routing capability of the spiral topology makes it highly amenable to fiber topic communications in any networking environment.

  19. Application of fault-tolerant controls to UAVs

    NASA Astrophysics Data System (ADS)

    Vos, David W.; Motazed, Ben

    1996-05-01

    Autonomous unmanned systems require provision for fault detection and recovery. Multiply-redundant schemes typically used in aerospace applications are prohibitively expensive and inappropriate solution for unmanned systems where low cost and small size are critical. Aurora Flight Sciences is developing alternative low-cost, fault-tolerant control (FTC) capabilities, incorporating failure detection and isolation, and control reconfiguring algorithms into aircraft flight control systems. A 'monitoring observer', or failure detection filter, predicts the future aircraft state based on prior control inputs and measurements, and interprets discrepancies between the output of the two systems. The FTC detects and isolates the onset of a sensor or actuator failure in real-time, and automatically reconfigures the control laws to maintain full control authority. This methodology is unique in providing a compact and elegant FTC solution to dynamic systems with nonlinear parameter dependence, such as high-altitude UAVs (unmanned air vehicles) and UUVs (unmanned undersea vehicles), where the dynamic behavior varies strongly with speed (i.e., dynamic pressure) and density. In simulation, the application of the algorithm to actual telemetry data from an in-flight vertical gyro failure, shows the algorithm can easily detect the failure and further demonstrated (in simulation) reconfiguring of the autopilots to successfully accommodate recovery.

  20. Fault-Tolerant Software-Defined Radio on Manycore

    NASA Technical Reports Server (NTRS)

    Ricketts, Scott

    2015-01-01

    Software-defined radio (SDR) platforms generally rely on field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), but such architectures require significant software development. In addition, application demands for radiation mitigation and fault tolerance exacerbate programming challenges. MaXentric Technologies, LLC, has developed a manycore-based SDR technology that provides 100 times the throughput of conventional radiationhardened general purpose processors. Manycore systems (30-100 cores and beyond) have the potential to provide high processing performance at error rates that are equivalent to current space-deployed uniprocessor systems. MaXentric's innovation is a highly flexible radio, providing over-the-air reconfiguration; adaptability; and uninterrupted, real-time, multimode operation. The technology is also compliant with NASA's Space Telecommunications Radio System (STRS) architecture. In addition to its many uses within NASA communications, the SDR can also serve as a highly programmable research-stage prototyping device for new waveforms and other communications technologies. It can also support noncommunication codes on its multicore processor, collocated with the communications workload-reducing the size, weight, and power of the overall system by aggregating processing jobs to a single board computer.

  1. Integrated sensor and actuator fault-tolerant control

    NASA Astrophysics Data System (ADS)

    Seron, María M.; De Doná, José A.; Richter, Jan H.

    2013-04-01

    We propose a fault-tolerant control scheme that deals with sensor and actuator faults through the use of a virtual actuator (VA) and a bank of virtual sensors (VSs). A novel feature of the scheme is that the VSs implicitly integrate both fault detection and isolation (FDI) and - in conjunction with the VA - controller reconfiguration tasks. The VA and the bank of VSs operate in closed-loop with an observer-based tracking controller designed for a nominal (fault free) model of the plant. A switching rule that reconfigures the VA and engages the suitable VS from the bank is based on sets defined for measurable residual signals constructed directly from the VS signals. Our method handles abrupt actuator and sensor faults of arbitrary magnitude including complete outage. The overall scheme is shown to guarantee closed-loop boundedness and setpoint tracking under all considered fault situations. Enhancements of the scheme to deal with errors in the fault detection and isolation are also proposed. Applications of the scheme to a winding machine and an interconnected tank system are presented.

  2. Lightweight storage and overlay networks for fault tolerance.

    SciTech Connect

    Oldfield, Ron A.

    2010-01-01

    The next generation of capability-class, massively parallel processing (MPP) systems is expected to have hundreds of thousands to millions of processors, In such environments, it is critical to have fault-tolerance mechanisms, including checkpoint/restart, that scale with the size of applications and the percentage of the system on which the applications execute. For application-driven, periodic checkpoint operations, the state-of-the-art does not provide a scalable solution. For example, on today's massive-scale systems that execute applications which consume most of the memory of the employed compute nodes, checkpoint operations generate I/O that consumes nearly 80% of the total I/O usage. Motivated by this observation, this project aims to improve I/O performance for application-directed checkpoints through the use of lightweight storage architectures and overlay networks. Lightweight storage provide direct access to underlying storage devices. Overlay networks provide caching and processing capabilities in the compute-node fabric. The combination has potential to signifcantly reduce I/O overhead for large-scale applications. This report describes our combined efforts to model and understand overheads for application-directed checkpoints, as well as implementation and performance analysis of a checkpoint service that uses available compute nodes as a network cache for checkpoint operations.

  3. Distributed Evaluation Functions for Fault Tolerant Multi-Rover Systems

    NASA Technical Reports Server (NTRS)

    Agogino, Adrian; Turner, Kagan

    2005-01-01

    The ability to evolve fault tolerant control strategies for large collections of agents is critical to the successful application of evolutionary strategies to domains where failures are common. Furthermore, while evolutionary algorithms have been highly successful in discovering single-agent control strategies, extending such algorithms to multiagent domains has proven to be difficult. In this paper we present a method for shaping evaluation functions for agents that provide control strategies that both are tolerant to different types of failures and lead to coordinated behavior in a multi-agent setting. This method neither relies of a centralized strategy (susceptible to single point of failures) nor a distributed strategy where each agent uses a system wide evaluation function (severe credit assignment problem). In a multi-rover problem, we show that agents using our agent-specific evaluation perform up to 500% better than agents using the system evaluation. In addition we show that agents are still able to maintain a high level of performance when up to 60% of the agents fail due to actuator, communication or controller faults.

  4. Algorithm-Based Fault Tolerance for Numerical Subroutines

    NASA Technical Reports Server (NTRS)

    Tumon, Michael; Granat, Robert; Lou, John

    2007-01-01

    A software library implements a new methodology of detecting faults in numerical subroutines, thus enabling application programs that contain the subroutines to recover transparently from single-event upsets. The software library in question is fault-detecting middleware that is wrapped around the numericalsubroutines. Conventional serial versions (based on LAPACK and FFTW) and a parallel version (based on ScaLAPACK) exist. The source code of the application program that contains the numerical subroutines is not modified, and the middleware is transparent to the user. The methodology used is a type of algorithm- based fault tolerance (ABFT). In ABFT, a checksum is computed before a computation and compared with the checksum of the computational result; an error is declared if the difference between the checksums exceeds some threshold. Novel normalization methods are used in the checksum comparison to ensure correct fault detections independent of algorithm inputs. In tests of this software reported in the peer-reviewed literature, this library was shown to enable detection of 99.9 percent of significant faults while generating no false alarms.

  5. Fault tolerant channel-encrypting quantum dialogue against collective noise

    NASA Astrophysics Data System (ADS)

    Ye, TianYu

    2015-04-01

    In this paper, two fault tolerant channel-encrypting quantum dialogue (QD) protocols against collective noise are presented. One is against collective-dephasing noise, while the other is against collective-rotation noise. The decoherent-free states, each of which is composed of two physical qubits, act as traveling states combating collective noise. Einstein-Podolsky-Rosen pairs, which play the role of private quantum key, are securely shared between two participants over a collective-noise channel in advance. Through encryption and decryption with private quantum key, the initial state of each traveling two-photon logical qubit is privately shared between two participants. Due to quantum encryption sharing of the initial state of each traveling logical qubit, the issue of information leakage is overcome. The private quantum key can be repeatedly used after rotation as long as the rotation angle is properly chosen, making quantum resource economized. As a result, their information-theoretical efficiency is nearly up to 66.7%. The proposed QD protocols only need single-photon measurements rather than two-photon joint measurements for quantum measurements. Security analysis shows that an eavesdropper cannot obtain anything useful about secret messages during the dialogue process without being discovered. Furthermore, the proposed QD protocols can be implemented with current techniques in experiment.

  6. Making classical ground-state spin computing fault-tolerant.

    PubMed

    Crosson, I J; Bacon, D; Brown, K R

    2010-09-01

    We examine a model of classical deterministic computing in which the ground state of the classical system is a spatial history of the computation. This model is relevant to quantum dot cellular automata as well as to recent universal adiabatic quantum computing constructions. In its most primitive form, systems constructed in this model cannot compute in an error-free manner when working at nonzero temperature. However, by exploiting a mapping between the partition function for this model and probabilistic classical circuits we are able to show that it is possible to make this model effectively error-free. We achieve this by using techniques in fault-tolerant classical computing and the result is that the system can compute effectively error-free if the temperature is below a critical temperature. We further link this model to computational complexity and show that a certain problem concerning finite temperature classical spin systems is complete for the complexity class Merlin-Arthur. This provides an interesting connection between the physical behavior of certain many-body spin systems and computational complexity. PMID:21230024

  7. Study on fault-tolerant processors for advanced launch system

    NASA Technical Reports Server (NTRS)

    Shin, Kang G.; Liu, Jyh-Charn

    1990-01-01

    Issues related to the reliability of a redundant system with large main memory are addressed. The Fault-Tolerant Processor (FTP) for the Advanced Launch System (ALS) is used as a basis for the presentation. When the system is free of latent faults, the probability of system crash due to multiple channel faults is shown to be insignificant even when voting on the outputs of computing channels is infrequent. Using channel error maskers (CEMs) is shown to improve reliability more effectively than increasing redundancy or the number of channels for applications with long mission times. Even without using a voter, most memory errors can be immediately corrected by those CEMs implemented with conventional coding techniques. In addition to their ability to enhance system reliability, CEMs (with a very low hardware overhead) can be used to dramatically reduce not only the need of memory realignment, but also the time required to realign channel memories in case, albeit rare, such a need arises. Using CEMs, two different schemes were developed to solve the memory realignment problem. In both schemes, most errors are corrected by CEMs, and the remaining errors are masked by a voter.

  8. Fault tolerance control for proton exchange membrane fuel cell systems

    NASA Astrophysics Data System (ADS)

    Wu, Xiaojuan; Zhou, Boyang

    2016-08-01

    Fault diagnosis and controller design are two important aspects to improve proton exchange membrane fuel cell (PEMFC) system durability. However, the two tasks are often separately performed. For example, many pressure and voltage controllers have been successfully built. However, these controllers are designed based on the normal operation of PEMFC. When PEMFC faces problems such as flooding or membrane drying, a controller with a specific design must be used. This paper proposes a unique scheme that simultaneously performs fault diagnosis and tolerance control for the PEMFC system. The proposed control strategy consists of a fault diagnosis, a reconfiguration mechanism and adjustable controllers. Using a back-propagation neural network, a model-based fault detection method is employed to detect the PEMFC current fault type (flooding, membrane drying or normal). According to the diagnosis results, the reconfiguration mechanism determines which backup controllers to be selected. Three nonlinear controllers based on feedback linearization approaches are respectively built to adjust the voltage and pressure difference in the case of normal, membrane drying and flooding conditions. The simulation results illustrate that the proposed fault tolerance control strategy can track the voltage and keep the pressure difference at desired levels in faulty conditions.

  9. RAID Unbound: Storage Fault Tolerance in a Distributed Environment

    NASA Technical Reports Server (NTRS)

    Ritchie, Brian

    1996-01-01

    Mirroring, data replication, backup, and more recently, redundant arrays of independent disks (RAID) are all technologies used to protect and ensure access to critical company data. A new set of problems has arisen as data becomes more and more geographically distributed. Each of the technologies listed above provides important benefits; but each has failed to adapt fully to the realities of distributed computing. The key to data high availability and protection is to take the technologies' strengths and 'virtualize' them across a distributed network. RAID and mirroring offer high data availability, which data replication and backup provide strong data protection. If we take these concepts at a very granular level (defining user, record, block, file, or directory types) and them liberate them from the physical subsystems with which they have traditionally been associated, we have the opportunity to create a highly scalable network wide storage fault tolerance. The network becomes the virtual storage space in which the traditional concepts of data high availability and protection are implemented without their corresponding physical constraints.

  10. Fault-tolerant error correction with the gauge color code.

    PubMed

    Brown, Benjamin J; Nickerson, Naomi H; Browne, Dan E

    2016-01-01

    The constituent parts of a quantum computer are inherently vulnerable to errors. To this end, we have developed quantum error-correcting codes to protect quantum information from noise. However, discovering codes that are capable of a universal set of computational operations with the minimal cost in quantum resources remains an important and ongoing challenge. One proposal of significant recent interest is the gauge color code. Notably, this code may offer a reduced resource cost over other well-studied fault-tolerant architectures by using a new method, known as gauge fixing, for performing the non-Clifford operations that are essential for universal quantum computation. Here we examine the gauge color code when it is subject to noise. Specifically, we make use of single-shot error correction to develop a simple decoding algorithm for the gauge color code, and we numerically analyse its performance. Remarkably, we find threshold error rates comparable to those of other leading proposals. Our results thus provide the first steps of a comparative study between the gauge color code and other promising computational architectures. PMID:27470619