The Curiosity Mars Rover's Fault Protection Engine
NASA Technical Reports Server (NTRS)
Benowitz, Ed
2014-01-01
The Curiosity Rover, currently operating on Mars, contains flight software onboard to autonomously handle aspects of system fault protection. Over 1000 monitors and 39 responses are present in the flight software. Orchestrating these behaviors is the flight software's fault protection engine. In this paper, we discuss the engine's design and responsibilities, and present lessons learned for future missions.
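To make the monitor-and-response architecture concrete, the following minimal Python sketch shows one way such an engine can poll monitors and arbitrate their mapped responses by priority. All names, thresholds, and the single monitor/response pair are invented for illustration; this is not Curiosity flight code.

    # Minimal sketch of a fault-protection engine arbitrating monitors and
    # responses. Names and thresholds are illustrative, not flight code.
    from dataclasses import dataclass
    from typing import Callable, List

    def read_temp() -> float:
        return 42.0  # stub sensor read for the example

    @dataclass
    class Monitor:
        name: str
        check: Callable[[], bool]    # True when the monitored fault is present
        response_id: str             # response this monitor maps to

    @dataclass
    class Response:
        rid: str
        priority: int                # lower number = more urgent
        execute: Callable[[], None]

    class FaultProtectionEngine:
        """Polls every monitor and runs the mapped responses, most urgent first."""
        def __init__(self, monitors: List[Monitor], responses: dict):
            self.monitors = monitors
            self.responses = responses   # rid -> Response

        def poll(self) -> None:
            tripped = {m.response_id for m in self.monitors if m.check()}
            for resp in sorted((self.responses[r] for r in tripped),
                               key=lambda r: r.priority):
                resp.execute()

    engine = FaultProtectionEngine(
        monitors=[Monitor("batt_temp_high", lambda: read_temp() > 40.0, "heater_off")],
        responses={"heater_off": Response("heater_off", 1,
                                          lambda: print("response: heater off"))},
    )
    engine.poll()   # prints "response: heater off"

In a flight implementation the monitor and response tables would be data-driven and far larger (over 1000 monitors mapping to 39 responses), but the orchestration loop has this basic shape.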
MER Surface Phase; Blurring the Line Between Fault Protection and What is Supposed to Happen
NASA Technical Reports Server (NTRS)
Reeves, Glenn E.
2008-01-01
An assessment of the limitations of communication with the MER rovers and of how those constraints drove the system design, flight software, and fault protection architecture. The constraints blurred the line between traditional fault protection and expected nominal behavior, and required the most novel autonomous and semi-autonomous elements of the vehicle software, including communication, surface mobility, attitude knowledge acquisition, fault protection, and the activity arbitration service.
Staged-Fault Testing of Distance Protection Relay Settings
NASA Astrophysics Data System (ADS)
Havelka, J.; Malarić, R.; Frlan, K.
2012-01-01
In order to analyze the operation of the protection system during induced fault testing in the Croatian power system, a simulation using the CAPE software has been performed. The CAPE software (Computer-Aided Protection Engineering) is expert software intended primarily for relay protection engineers, which calculates current and voltage values during faults in the power system, so that relay protection devices can be properly set up. Once the accuracy of the simulation model had been confirmed, a series of simulations were performed in order to obtain the optimal fault location to test the protection system. The simulation results were used to specify the test sequence definitions for the end-to-end relay testing using advanced testing equipment with GPS synchronization for secondary injection in protection schemes based on communication. The objective of the end-to-end testing was to perform field validation of the protection settings, including verification of the circuit breaker operation, telecommunication channel time and the effectiveness of the relay algorithms. Once the end-to-end secondary injection testing had been completed, the induced fault testing was performed with the three-end line loaded and in service. This paper describes and analyses the test procedure, consisting of CAPE simulations, an end-to-end test with advanced secondary equipment and a staged-fault test of a three-end power line in the Croatian transmission system.
Protecting Against Faults in JPL Spacecraft
NASA Technical Reports Server (NTRS)
Morgan, Paula
2007-01-01
A paper discusses techniques for protecting against faults in spacecraft designed and operated by NASA's Jet Propulsion Laboratory (JPL). The paper addresses, more specifically, fault-protection requirements and techniques common to most JPL spacecraft (in contradistinction to unique, mission-specific techniques), standard practices in the implementation of these techniques, and fault-protection software architectures. Common requirements include those to protect onboard command, data-processing, and control computers; protect against loss of Earth/spacecraft radio communication; maintain safe temperatures; and recover from power overloads. The paper describes fault-protection techniques as part of a fault-management strategy that also includes functional redundancy, redundant hardware, and autonomous monitoring of (1) the operational and health statuses of spacecraft components, (2) temperatures inside and outside the spacecraft, and (3) allocation of power. The strategy also provides for preprogrammed automated responses to anomalous conditions. In addition, the software running in almost every JPL spacecraft incorporates a general-purpose "Safe Mode" response algorithm that configures the spacecraft in a lower-power state that is safe and predictable, thereby facilitating diagnosis of more complex faults by a team of human experts on Earth.
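As a rough illustration of the general-purpose "Safe Mode" response described above, the sketch below drives a spacecraft configuration into a low-power, predictable state. The device names, states, and dictionary representation are invented, not JPL's implementation.

    # Sketch of a generic "Safe Mode" response: halt the running sequence and
    # force every subsystem into a safe, low-power, predictable state so ground
    # operators can diagnose the fault. Device names and states are made up.
    SAFE_CONFIG = {
        "science_instruments": "off",
        "reaction_wheels":     "sun_point",   # hold a power-positive attitude
        "transmitter":         "low_rate",    # keep an emergency downlink
        "heaters":             "survival",    # protect minimum temperatures
    }

    def enter_safe_mode(hardware: dict) -> None:
        """Apply the safe configuration and halt the running command sequence."""
        hardware["sequence_engine"] = "halted"
        for device, state in SAFE_CONFIG.items():
            hardware[device] = state

    spacecraft = {"sequence_engine": "running", "science_instruments": "on",
                  "reaction_wheels": "slewing", "transmitter": "high_rate",
                  "heaters": "nominal"}
    enter_safe_mode(spacecraft)
    print(spacecraft)   # now in the safe, low-power configuration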
Emerging technologies for V&V of ISHM software for space exploration
NASA Technical Reports Server (NTRS)
Feather, Martin S.; Markosian, Lawrence Z.
2006-01-01
Systems required to exhibit high operational reliability often rely on some form of fault protection to recognize and respond to faults, preventing faults' escalation to catastrophic failures. Integrated System Health Management (ISHM) extends the functionality of fault protection both to scale to more complex systems (and systems of systems) and to maintain capability rather than just avert catastrophe. Forms of ISHM have been utilized to good effect in the maintenance phase of systems' total lifecycles (often referred to as 'condition-based maintenance'), but less so in a 'fault protection' role during actual operations. One of the impediments to such use lies in the challenges of verification, validation and certification of ISHM systems themselves. This paper makes the case that state-of-the-practice V&V and certification techniques will not suffice for emerging forms of ISHM systems; however, a number of maturing software engineering assurance technologies show particular promise for addressing these ISHM V&V challenges.
Novel elastic protection against DDF failures in an enhanced software-defined SIEPON
NASA Astrophysics Data System (ADS)
Pakpahan, Andrew Fernando; Hwang, I.-Shyan; Yu, Yu-Ming; Hsu, Wu-Hsiao; Liem, Andrew Tanny; Nikoukar, AliAkbar
2017-07-01
Ever-increasing bandwidth demands on passive optical networks (PONs) are pushing the utilization of every fiber strand to its limit. This is mandating comprehensive protection until the end of the distribution drop fiber (DDF). Hence, it is important to provide refined protection with an advanced fault-protection architecture and recovery mechanism that is able to cope with various DDF failures. We propose a novel elastic protection against DDF failures that incorporates a software-defined networking (SDN) capability and a bus protection line to enhance the resiliency of the existing Service Interoperability in Ethernet Passive Optical Networks (SIEPON) system. We propose the addition of an integrated SDN controller and flow tables to the optical line terminal and optical network units (ONUs) in order to deliver various DDF protection scenarios. The proposed architecture enables flexible assignment of backup ONU(s) in pre/post-fault conditions depending on the PON traffic load. A transient backup ONU and multiple backup ONUs can be deployed in the pre-fault and post-fault scenarios, respectively. Our extensive simulation results show that the proposed architecture provides better overall throughput and drop probability compared to an architecture with a fixed DDF protection mechanism. It does so while still maintaining overall QoS performance in terms of packet delay, mean jitter, packet loss, and throughput under various fault conditions.
Advanced microprocessor based power protection system using artificial neural network techniques
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Z.; Kalam, A.; Zayegh, A.
This paper describes an intelligent embedded microprocessor-based system for fault classification in power system protection using advanced 32-bit microprocessor technology. The paper demonstrates the development of a protective relay that provides overcurrent protection schemes for fault detection. It also describes a method for power fault classification in a three-phase system based on the use of neural network technology. The proposed design is implemented and tested on a single-line three-phase power system in a power laboratory. Both the hardware and software development are described in detail.
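The following toy sketch shows the structure of neural-network fault classification from three-phase current features. The architecture, feature choice, and (untrained) weights are placeholders; the paper's actual relay would use a network trained on recorded or simulated fault data.

    # Toy illustration of neural-network fault classification from per-phase
    # RMS currents. The layer sizes and weights are placeholders; a real relay
    # would train them on simulated or recorded fault waveforms.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)   # 3 phase-current features
    W2, b2 = rng.normal(size=(8, 4)), np.zeros(4)   # 4 illustrative fault classes

    def classify(i_rms: np.ndarray) -> int:
        """Forward pass: phase-current features -> fault class index."""
        h = np.tanh(i_rms @ W1 + b1)
        logits = h @ W2 + b2
        return int(np.argmax(logits))

    # Phase A current far above the others suggests a phase-A-to-ground fault.
    print(classify(np.array([12.0, 1.0, 1.1])))
    # With untrained weights the class index is arbitrary; the sketch shows
    # structure only, not a usable classifier.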
Methodology for Designing Fault-Protection Software
NASA Technical Reports Server (NTRS)
Barltrop, Kevin; Levison, Jeffrey; Kan, Edwin
2006-01-01
A document describes a methodology for designing fault-protection (FP) software for autonomous spacecraft. The methodology embodies and extends established engineering practices in the technical discipline of Fault Detection, Diagnosis, Mitigation, and Recovery, and has been successfully implemented in the Deep Impact spacecraft, a NASA Discovery mission. Based on established concepts of Fault Monitors and Responses, this FP methodology extends the notions of Opinion, Symptom, Alarm (aka Fault), and Response with numerous new notions, sub-notions, software constructs, and logic and timing gates. For example, a Monitor generates a RawOpinion, which graduates into an Opinion, categorized as no-opinion, acceptable, or unacceptable. RaiseSymptom, ForceSymptom, and ClearSymptom govern the establishment of a Symptom and its mapping to an Alarm (aka Fault). Local Response is distinguished from FP System Response. A 1-to-n and n-to-1 mapping is established among Monitors, Symptoms, and Responses. Responses are categorized by device versus by function. Responses operate in tiers, where the early tiers attempt to resolve the Fault in a localized, step-by-step fashion, relegating more system-level responses to later tiers. Recovery actions are gated by epoch recovery timing, enabling strategy, urgency, a MaxRetry gate, hardware availability, hazardous versus ordinary faults, and many other priority gates. The methodology is systematic and logical, and uses multiple linked tables, parameter files, and recovery command sequences. The credibility of the FP design is proven via a "top-down" fault-tree analysis and a "bottom-up" functional failure-modes-and-effects analysis. Via this process, the mitigation and recovery strategies per Fault Containment Region scope the FP architecture in width versus depth.
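A minimal Python sketch of the graduation chain and tiered response logic described above follows; the persistence count, threshold, and tier actions are invented for illustration and simplify the document's full set of gates.

    # Sketch of the Monitor -> RawOpinion -> Opinion -> Symptom/Alarm -> tiered
    # Response chain. Thresholds, persistence, and tier actions are invented.
    from collections import deque

    class Monitor:
        def __init__(self, persistence: int = 3):
            self.history = deque(maxlen=persistence)

        def raw_opinion(self, reading: float, limit: float) -> str:
            return "unacceptable" if reading > limit else "acceptable"

        def opinion(self, reading: float, limit: float) -> str:
            """A RawOpinion graduates into an Opinion only once it persists."""
            self.history.append(self.raw_opinion(reading, limit))
            if len(self.history) < self.history.maxlen:
                return "no-opinion"
            persistent = all(o == "unacceptable" for o in self.history)
            return "unacceptable" if persistent else "acceptable"

    def respond(alarm_raised: bool, retries_used: int, max_retry: int = 2) -> str:
        """Tiered response: local retries first, system-level safing as the last tier."""
        if not alarm_raised:
            return "no-action"
        if retries_used < max_retry:      # MaxRetry-style gate
            return f"tier-{retries_used + 1}: local recovery (reset device)"
        return "final tier: system-level response (safe mode)"

    m = Monitor()
    for reading in (5.1, 5.3, 5.4):       # persistent out-of-limit readings
        op = m.opinion(reading, limit=5.0)
    print(op)                             # 'unacceptable' -> raises a Symptom/Alarm
    print(respond(alarm_raised=True, retries_used=0))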
Spacecraft fault tolerance: The Magellan experience
NASA Technical Reports Server (NTRS)
Kasuda, Rick; Packard, Donna Sexton
1993-01-01
Interplanetary and Earth-orbiting missions now impose unique fault-tolerance requirements upon spacecraft design. Mission success is the prime motivator for building spacecraft with fault-tolerant systems. The Magellan spacecraft had many such requirements imposed upon its design. Magellan met these requirements by building redundancy into all the major subsystem components and designing the onboard hardware and software with the capability to detect a fault, isolate it to a component, and issue commands to achieve a back-up configuration. This discussion is limited to fault protection, which is the autonomous capability to respond to a fault. The Magellan fault protection design is discussed, as well as the developmental and flight experiences, and a summary of the lessons learned.
Galileo spacecraft power distribution and autonomous fault recovery
NASA Technical Reports Server (NTRS)
Detwiler, R. C.
1982-01-01
There is a trend in current spacecraft design to achieve greater fault tolerance through the implementation of on-board software dedicated to detecting and isolating failures. A combination of hardware and software is utilized in the Galileo power system for autonomous fault recovery. Galileo is a dual-spun spacecraft designed to carry a number of scientific instruments into a series of orbits around the planet Jupiter. In addition to its self-contained scientific payload, it will also carry a probe system which will be separated from the spacecraft some 150 days prior to Jupiter encounter. The Galileo spacecraft is scheduled to be launched in 1985. Attention is given to the power system, the fault protection requirements, and the power fault recovery implementation.
Drive and protection circuit for converter module of cascaded H-bridge STATCOM
NASA Astrophysics Data System (ADS)
Wang, Xuan; Yuan, Hongliang; Wang, Xiaoxing; Wang, Shuai; Fu, Yongsheng
2018-04-01
The drive and protection circuit is an important part of power electronics and is directly related to the safe and stable operation of power electronic equipment. In this work, a drive and protection circuit is designed for the cascaded H-bridge STATCOM. The circuit provides flexible dead-time setting, operation-status self-detection, fault-priority protection, and detailed fault-status uploading, which helps to improve the reliability of the STATCOM's operation. Finally, the proposed circuit is tested and analyzed with the power-electronics simulation software PSPICE (Simulation Program with Integrated Circuit Emphasis) and a series of experiments. The studies show that the proposed circuit can drive and control the H-bridge circuit, process faults quickly, and offer high reliability.
Havens: Explicit Reliable Memory Regions for HPC Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hukerikar, Saurabh; Engelmann, Christian
2016-01-01
Supporting error resilience in future exascale-class supercomputing systems is a critical challenge. Due to transistor scaling trends and increasing memory density, scientific simulations are expected to experience more interruptions caused by transient errors in the system memory. Existing hardware-based detection and recovery techniques will be inadequate to manage the presence of high memory fault rates. In this paper we propose a partial memory protection scheme based on region-based memory management. We define the concept of regions called havens that provide fault protection for program objects. We provide reliability for the regions through a software-based parity protection mechanism. Our approach enables critical program objects to be placed in these havens. The fault coverage provided by our approach is application agnostic, unlike algorithm-based fault tolerance techniques.
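The sketch below illustrates the idea of a parity-protected haven: an XOR parity word is maintained per region and verified before each read. It is a minimal Python illustration over a bytearray; the actual mechanism operates on raw allocator memory inside an HPC runtime, and the paper's parity scheme may differ in detail.

    # Minimal sketch of haven-style software parity protection: keep an XOR
    # parity byte per region and verify it before each read.
    from functools import reduce

    class Haven:
        def __init__(self, size: int):
            self.data = bytearray(size)
            self.parity = 0

        def _compute(self) -> int:
            return reduce(lambda a, b: a ^ b, self.data, 0)

        def write(self, i: int, value: int) -> None:
            self.data[i] = value
            self.parity = self._compute()

        def read(self, i: int) -> int:
            if self._compute() != self.parity:
                raise RuntimeError("haven parity mismatch: memory corruption detected")
            return self.data[i]

    h = Haven(16)
    h.write(3, 0xAB)
    h.data[3] ^= 0x04          # simulate a transient bit flip in memory
    try:
        h.read(3)
    except RuntimeError as e:
        print(e)               # the flipped bit is detected before the value is used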
A fault-tolerant intelligent robotic control system
NASA Technical Reports Server (NTRS)
Marzwell, Neville I.; Tso, Kam Sing
1993-01-01
This paper describes the concept, design, and features of a fault-tolerant intelligent robotic control system being developed for space and commercial applications that require high dependability. The comprehensive strategy integrates system level hardware/software fault tolerance with task level handling of uncertainties and unexpected events for robotic control. The underlying architecture for system level fault tolerance is the distributed recovery block which protects against application software, system software, hardware, and network failures. Task level fault tolerance provisions are implemented in a knowledge-based system which utilizes advanced automation techniques such as rule-based and model-based reasoning to monitor, diagnose, and recover from unexpected events. The two level design provides tolerance of two or more faults occurring serially at any level of command, control, sensing, or actuation. The potential benefits of such a fault tolerant robotic control system include: (1) a minimized potential for damage to humans, the work site, and the robot itself; (2) continuous operation with a minimum of uncommanded motion in the presence of failures; and (3) more reliable autonomous operation providing increased efficiency in the execution of robotic tasks and decreased demand on human operators for controlling and monitoring the robotic servicing routines.
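The recovery-block pattern underlying the distributed recovery block can be sketched as follows; the primary routine, alternate routine, and acceptance test are invented examples, and the distributed variant would run primary and alternate on separate nodes rather than sequentially in one process.

    # Sketch of the recovery-block pattern: run the primary, check its output
    # with an acceptance test, and fall back to the alternate on failure.
    def acceptance_test(joint_angles):
        """Accept only physically plausible joint commands (illustrative limits)."""
        return all(-3.14 <= a <= 3.14 for a in joint_angles)

    def primary(target):
        return [a * 1000 for a in target]       # buggy version: wild output

    def alternate(target):
        return [max(-3.14, min(3.14, a)) for a in target]   # simpler, safer version

    def recovery_block(target):
        result = primary(target)
        if acceptance_test(result):
            return result
        return alternate(target)                # rollback, then alternate try

    print(recovery_block([0.5, -1.2, 2.0]))     # alternate's result passes the test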
Risk-Significant Adverse Condition Awareness Strengthens Assurance of Fault Management Systems
NASA Technical Reports Server (NTRS)
Fitz, Rhonda
2017-01-01
As spaceflight systems increase in complexity, Fault Management (FM) systems are ranked high in risk-based assessment of software criticality, emphasizing the importance of establishing highly competent domain expertise to provide assurance. Adverse conditions (ACs) and specific vulnerabilities encountered by safety- and mission-critical software systems have been identified through efforts to reduce the risk posture of software-intensive NASA missions. Acknowledgement of potential off-nominal conditions and analysis to determine software system resiliency are important aspects of hazard analysis and FM. A key component of assuring FM is an assessment of how well software addresses susceptibility to failure through consideration of ACs. Focus on significant risk predicted through experienced analysis conducted at the NASA Independent Verification & Validation (IV&V) Program enables the scoping of effective assurance strategies with regard to overall asset protection of complex spaceflight as well as ground systems. Research efforts sponsored by NASA's Office of Safety and Mission Assurance (OSMA) defined terminology, categorized data fields, and designed a baseline repository that centralizes and compiles a comprehensive listing of ACs and correlated data relevant across many NASA missions. This prototype tool helps projects improve analysis by tracking ACs and allowing queries based on project, mission type, domain/component, causal fault, and other key characteristics. Vulnerability in off-nominal situations, architectural design weaknesses, and unexpected or undesirable system behaviors in reaction to faults are curtailed with the awareness of ACs and risk-significant scenarios modeled for analysts through this database. Integration within the Enterprise Architecture at NASA IV&V enables interfacing with other tools and datasets, technical support, and accessibility across the Agency. This paper discusses the development of an improved workflow process utilizing this database for adaptive, risk-informed FM assurance that critical software systems will safely and securely protect against faults and respond to ACs in order to achieve successful missions.
Software-implemented fault insertion: An FTMP example
NASA Technical Reports Server (NTRS)
Czeck, Edward W.; Siewiorek, Daniel P.; Segall, Zary Z.
1987-01-01
This report presents a model for fault insertion through software; describes its implementation on a fault-tolerant computer, FTMP; presents a summary of fault detection, identification, and reconfiguration data collected with software-implemented fault insertion; and compares the results to hardware fault insertion data. Experimental results show detection time to be a function of time of insertion and system workload. For the fault detection time, there is no correlation between software-inserted faults and hardware-inserted faults; this is because hardware-inserted faults must manifest as errors before detection, whereas software-inserted faults immediately exercise the error detection mechanisms. In summary, the software-implemented fault insertion is able to be used as an evaluation technique for the fault-handling capabilities of a system in fault detection, identification and recovery. Although the software-inserted faults do not map directly to hardware-inserted faults, experiments show software-implemented fault insertion is capable of emulating hardware fault insertion, with greater ease and automation.
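A minimal sketch of the technique: corrupt program state directly in software and measure the detection latency of the system's own error checks. The toy "system" and its range-check detector are invented; FTMP's injector worked at the level of machine state.

    # Sketch of software-implemented fault insertion: flip one bit in program
    # state, then time how long the error-detection mechanism takes to notice.
    import time

    state = {"velocity": 100}

    def inject_bit_flip(var: str, bit: int) -> None:
        state[var] ^= (1 << bit)     # corrupt state directly, as a software
                                     # injector does

    def detector() -> bool:
        return not (0 <= state["velocity"] <= 1000)   # error-detection check

    inject_bit_flip("velocity", 14)
    t0 = time.perf_counter()
    detected = detector()
    latency = time.perf_counter() - t0
    print(detected, f"{latency * 1e6:.1f} us")

Because the injected fault lands directly in state visible to the error-detection mechanism, detection is immediate; that is exactly the asymmetry relative to hardware fault insertion that the report describes.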
Abstract for 1999 Rational Software User Conference
NASA Technical Reports Server (NTRS)
Dunphy, Julia; Rouquette, Nicolas; Feather, Martin; Tung, Yu-Wen
1999-01-01
We develop spacecraft fault-protection software at NASA/JPL. Challenges exemplified by our task: 1) high-quality systems - need for extensive validation & verification; 2) multi-disciplinary context - involves experts from diverse areas; 3) embedded systems - must adapt to external practices, notations, etc.; and 4) development pressures - NASA's mandate of "better, faster, cheaper".
Toward a Model-Based Approach to Flight System Fault Protection
NASA Technical Reports Server (NTRS)
Day, John; Murray, Alex; Meakin, Peter
2012-01-01
Fault Protection (FP) is a distinct and separate systems engineering sub-discipline that is concerned with the off-nominal behavior of a system. Flight system fault protection is an important part of the overall flight system systems engineering effort, with its own products and processes. As with other aspects of systems engineering, the FP domain is highly amenable to expression and management in models. However, while there are standards and guidelines for performing FP related analyses, there are not standards or guidelines for formally relating the FP analyses to each other or to the system hardware and software design. As a result, the material generated for these analyses are effectively creating separate models that are only loosely-related to the system being designed. Development of approaches that enable modeling of FP concerns in the same model as the system hardware and software design enables establishment of formal relationships that has great potential for improving the efficiency, correctness, and verification of the implementation of flight system FP. This paper begins with an overview of the FP domain, and then continues with a presentation of a SysML/UML model of the FP domain and the particular analyses that it contains, by way of showing a potential model-based approach to flight system fault protection, and an exposition of the use of the FP models in FSW engineering. The analyses are small examples, inspired by current real-project examples of FP analyses.
Assurance of Fault Management: Risk-Significant Adverse Condition Awareness
NASA Technical Reports Server (NTRS)
Fitz, Rhonda
2016-01-01
Fault Management (FM) systems are ranked high in risk-based assessment of criticality within flight software, emphasizing the importance of establishing highly competent domain expertise to provide assurance for NASA projects, especially as spaceflight systems continue to increase in complexity. Insight into specific characteristics of FM architectures seen embedded within safety- and mission-critical software systems analyzed by the NASA Independent Verification & Validation (IV&V) Program has been enhanced with an FM Technical Reference (TR) suite. Benefits are aimed beyond the IV&V community to those that seek ways to efficiently and effectively provide software assurance to reduce the FM risk posture of NASA and other space missions. The identification of particular FM architectures, visibility, and associated IV&V techniques provides a TR suite that enables greater assurance that critical software systems will adequately protect against faults and respond to adverse conditions. The role FM has with regard to overall asset protection of flight software systems is being addressed with the development of an adverse condition (AC) database encompassing flight software vulnerabilities. Identification of potential off-nominal conditions and analysis to determine how a system responds to these conditions are important aspects of hazard analysis and fault management. Understanding what ACs the mission may face, and ensuring they are prevented or addressed, is the responsibility of the assurance team, which necessarily should have insight into ACs beyond those defined by the project itself. Research efforts sponsored by NASA's Office of Safety and Mission Assurance defined terminology, categorized data fields, and designed a baseline repository that centralizes and compiles a comprehensive listing of ACs and correlated data relevant across many NASA missions. This prototype tool helps projects improve analysis by tracking ACs and allowing queries based on project, mission type, domain/component, causal fault, and other key characteristics. The repository has a firm structure, an initial collection of data, and an interface established for informational queries, with plans for integration within the Enterprise Architecture at NASA IV&V, enabling support and accessibility across the Agency. The development of an improved workflow process for adaptive, risk-informed FM assurance is currently underway.
Analysis of a hardware and software fault tolerant processor for critical applications
NASA Technical Reports Server (NTRS)
Dugan, Joanne B.
1993-01-01
Computer systems for critical applications must be designed to tolerate software faults as well as hardware faults. A unified approach to tolerating hardware and software faults is characterized by classifying faults in terms of duration (transient or permanent) rather than source (hardware or software). Errors arising from transient faults can be handled through masking or voting, but errors arising from permanent faults require system reconfiguration to bypass the failed component. Most errors which are caused by software faults can be considered transient, in that they are input-dependent. Software faults are triggered by a particular set of inputs. Quantitative dependability analysis of systems which exhibit a unified approach to fault tolerance can be performed by a hierarchical combination of fault tree and Markov models. A methodology for analyzing hardware and software fault tolerant systems is applied to the analysis of a hypothetical system, loosely based on the Fault Tolerant Parallel Processor. The models consider both transient and permanent faults, hardware and software faults, independent and related software faults, automatic recovery, and reconfiguration.
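As a numeric illustration of the Markov-model layer of such an analysis, the sketch below integrates a three-state chain (operational, degraded after reconfiguration, failed) with imperfect-coverage reconfiguration for permanent faults. The rates and coverage value are invented and are much simpler than the paper's hierarchical fault-tree/Markov models.

    # Tiny Markov reliability sketch: OK -> degraded (successful reconfiguration
    # after a permanent fault) or -> failed (coverage failure). Rates invented.
    import numpy as np

    lam_p, cov = 1e-4, 0.99          # permanent fault rate (/hr), recovery coverage
    Q = np.array([                    # CTMC generator: rows sum to zero
        [-lam_p, lam_p * cov, lam_p * (1 - cov)],   # OK
        [0.0,   -lam_p,       lam_p],               # degraded (no spare left)
        [0.0,    0.0,         0.0],                 # failed (absorbing)
    ])

    p = np.array([1.0, 0.0, 0.0])    # start in OK
    dt, horizon = 0.1, 10_000.0      # hours
    for _ in range(int(horizon / dt)):   # Euler integration of dp/dt = p @ Q
        p = p + dt * (p @ Q)

    print(f"reliability at {horizon:.0f} h: {p[0] + p[1]:.6f}")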
Fault Injection Validation of a Safety-Critical TMR System
NASA Astrophysics Data System (ADS)
Irrera, Ivano; Madeira, Henrique; Zentai, Andras; Hergovics, Beata
2016-08-01
Digital systems and their software are the core technology for controlling and monitoring industrial systems in practically all activity domains. Functional safety standards such as the European standard EN 50128 for railway applications define the procedures and technical requirements for the development of software for railway control and protection systems. The validation of such systems is a highly demanding task. In this paper we discuss the use of fault injection techniques, which have been used extensively in several domains, particularly the space domain, to complement the traditional procedures for validating a SIL (Safety Integrity Level) 4 system for railway signalling that implements a TMR (Triple Modular Redundancy) architecture. The fault injection tool is based on JTAG technology. The results of our injection campaign showed a high degree of tolerance to most of the injected faults, but several cases of unexpected behaviour were also observed, helping us to understand worst-case scenarios.
NASA Technical Reports Server (NTRS)
Benowitz, E. G.; Niessner, A. F.
2003-01-01
We have successfully demonstrated a portion of the spacecraft attitude control and fault protection, running on a standard Java platform, and are currently in the process of taking advantage of the features provided by the RTSJ.
NASA Technical Reports Server (NTRS)
Fitz, Rhonda; Whitman, Gerek
2016-01-01
Research into the complexities of software systems' Fault Management (FM), and into how architectural design decisions affect safety, preservation of assets, and maintenance of desired system functionality, has coalesced into a technical reference (TR) suite that advances the provision of safety and mission assurance. The NASA Independent Verification and Validation (IV&V) Program, with Software Assurance Research Program support, extracted FM architectures across the IV&V portfolio to evaluate robustness, assess visibility for validation and test, and define software assurance methods applied to the architectures and designs. This investigation spanned IV&V projects with seven different primary developers and a wide range of sizes and complexities, and encompassed Deep Space Robotic, Human Spaceflight, and Earth Orbiter mission FM architectures. The initiative continues with an expansion of the TR suite to include Launch Vehicles, adding the benefit of investigating differences intrinsic to model-based FM architectures and insight into the complexities of FM within an Agile software development environment, in order to improve awareness of how nontraditional processes affect FM architectural design and system health management. The identification of particular FM architectures, visibility, and associated IV&V techniques provides a TR suite that enables greater assurance that critical software systems will adequately protect against faults and respond to adverse conditions. Additionally, the role FM has with regard to strengthened security requirements, with potential to advance overall asset protection of flight software systems, is being addressed with the development of an adverse conditions database encompassing flight software vulnerabilities. Capitalizing on the established framework, this TR suite provides assurance capability for a variety of FM architectures and varied development approaches. Research results are being disseminated across NASA, other agencies, and the software community. This paper discusses the findings and the TR suite, informing the FM domain in best practices for FM architectural design, visibility observations, and methods employed for IV&V and mission assurance.
SSME digital control design characteristics
NASA Technical Reports Server (NTRS)
Mitchell, W. T.; Searle, R. F.
1985-01-01
To protect against a latent programming error (software fault) existing in an untried branch combination that would render the space shuttle out of control in a critical flight phase, the Backup Flight System (BFS) was chartered to provide a safety alternative. The BFS is designed to operate in critical flight phases (ascent and descent) by monitoring the activities of the space shuttle flight subsystems that are under control of the primary flight software (PFS) (e.g., navigation, crew interface, propulsion), then, upon manual command by the flightcrew, to assume control of the space shuttle and deliver it to a noncritical flight condition (safe orbit or touchdown). The problems associated with the selection of the PFS/BFS system architecture, the internal BFS architecture, the fault tolerant software mechanisms, and the long term BFS utility are discussed.
NASA Technical Reports Server (NTRS)
Brunelle, J. E.; Eckhardt, D. E., Jr.
1985-01-01
Results are presented of an experiment conducted in the NASA Avionics Integrated Research Laboratory (AIRLAB) to investigate the implementation of fault-tolerant software techniques on fault-tolerant computer architectures, in particular the Software Implemented Fault Tolerance (SIFT) computer. The N-version programming and recovery block techniques were implemented on a portion of the SIFT operating system. The results indicate that, to effectively implement fault-tolerant software design techniques, system requirements will be impacted and suggest that retrofitting fault-tolerant software on existing designs will be inefficient and may require system modification.
Factors That Affect Software Testability
NASA Technical Reports Server (NTRS)
Voas, Jeffrey M.
1991-01-01
Software faults that infrequently affect software's output are dangerous. When a software fault causes frequent software failures, testing is likely to reveal the fault before the software is released; when the fault remains undetected during testing, it can cause disaster after the software is installed. A technique for predicting whether a particular piece of software is likely to reveal faults within itself during testing is found in [Voas91b]. A piece of software that is likely to reveal faults within itself during testing is said to have high testability; a piece of software that is not likely to reveal faults within itself during testing is said to have low testability. It is preferable to design software with higher testability from the outset, i.e., to create software with as high a degree of testability as possible, in order to avoid the problems of undetected faults that are associated with low testability. Information loss is a phenomenon that occurs during program execution and that increases the likelihood that a fault will remain undetected. In this paper, I identify two broad classes of information loss, define them, and suggest ways of predicting the potential for information loss to occur, in order to decrease the likelihood that faults will remain undetected during testing.
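One way to see information loss concretely: a many-to-one computation can map a corrupted internal state to the same output as the correct state, hiding the fault from testing. The domain/range ratio computed below is a rough indicator of this effect, not the specific predictors developed in [Voas91b].

    # Illustration of information loss: a many-to-one computation maps distinct
    # internal states (some possibly corrupted) to the same output, so testing
    # may never see the fault.
    def lossy(x: int) -> int:
        return x % 10        # many inputs -> one output: high information loss

    def lossless(x: int) -> int:
        return x * 3 + 1     # distinct inputs stay distinct: low information loss

    domain = range(1000)
    for f in (lossy, lossless):
        ratio = len(domain) / len({f(x) for x in domain})
        print(f.__name__, "domain/range ratio:", ratio)
    # lossy: 100.0 (fault effects easily hidden); lossless: 1.0 (effects propagate)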
Practical Methods for Estimating Software Systems Fault Content and Location
NASA Technical Reports Server (NTRS)
Nikora, A.; Schneidewind, N.; Munson, J.
1999-01-01
Over the past several years, we have developed techniques to discriminate between fault-prone software modules and those that are not, to estimate a software system's residual fault content, to identify those portions of a software system having the highest estimated number of faults, and to estimate the effects of requirements changes on software quality.
Software dependability in the Tandem GUARDIAN system
NASA Technical Reports Server (NTRS)
Lee, Inhwan; Iyer, Ravishankar K.
1995-01-01
Based on extensive field failure data for Tandem's GUARDIAN operating system, this paper discusses the evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup processes to take over. The paper categorizes the underlying causes of software failures and evaluates the effectiveness of the process pair technique in tolerating software faults. A model to describe the impact of software faults on the reliability of an overall system is proposed. The model is used to evaluate the significance of key factors that determine software dependability and to identify areas for improvement. An analysis of the data shows that about 77% of processor failures that are initially considered due to software are confirmed as software problems. The analysis shows that the use of process pairs to provide checkpointing and restart (originally intended for tolerating hardware faults) allows the system to tolerate about 75% of reported software faults that result in processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. Over two-thirds (72%) of measured software failures are recurrences of previously reported faults. Modeling, based on the data, shows that, in addition to reducing the number of software faults, software dependability can be enhanced by reducing the recurrence rate.
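The mechanism can be sketched in a few lines: the primary checkpoints state to the backup, and on failure the backup restarts from the checkpoint under slightly different conditions, so an input-dependent fault often does not recur. The "latent bug" and its triggering condition below are invented for illustration.

    # Sketch of the process-pair idea: the primary checkpoints its state to a
    # backup; on failure the backup takes over from the checkpoint. Because the
    # backup's execution differs from the original, many input-dependent
    # software faults do not recur.
    import copy

    def step(state, request, scratch_full):
        if scratch_full and request == "compact":   # latent bug: rare condition
            raise RuntimeError("software fault")
        state["log"].append(request)
        return state

    primary_state = {"log": []}
    checkpoint = copy.deepcopy(primary_state)       # sent to the backup process

    try:
        primary_state = step(primary_state, "compact", scratch_full=True)
    except RuntimeError:
        # Backup takes over from the checkpoint; the transient condition
        # (scratch_full) no longer holds, so the same request now succeeds.
        backup_state = copy.deepcopy(checkpoint)
        backup_state = step(backup_state, "compact", scratch_full=False)
        print("backup completed request:", backup_state["log"])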
Multi-version software reliability through fault-avoidance and fault-tolerance
NASA Technical Reports Server (NTRS)
Vouk, Mladen A.; Mcallister, David F.
1989-01-01
A number of experimental and theoretical issues associated with the practical use of multi-version software to provide run-time tolerance to software faults were investigated. A specialized tool was developed and evaluated for measuring testing coverage for a variety of metrics. The tool was used to collect information on the relationships between software faults and the coverage provided by the testing process as measured by different metrics (including data flow metrics). Considerable correlation was found between the coverage provided by some higher metrics and the elimination of faults in the code. Back-to-back testing continued to serve as an efficient mechanism for the removal of uncorrelated faults and of common-cause faults of variable span. Work also continued on software reliability estimation methods based on non-random sampling and on the relationship between software reliability and the code coverage provided through testing. New fault tolerance models were formulated. Simulation studies of the Acceptance Voting and Multi-stage Voting algorithms were finished, and it was found that these two schemes for software fault tolerance are superior in many respects to some commonly used schemes. Particularly encouraging are the safety properties of the Acceptance testing scheme.
Software Fault Tolerance: A Tutorial
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo
2000-01-01
Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems. The root cause of software design errors is the complexity of the systems. Compounding the problems in building correct software is the difficulty in assessing the correctness of software for highly complex systems. After a brief overview of the software development processes, we note how hard-to-detect design faults are likely to be introduced during development and how software faults tend to be state-dependent and activated by particular input sequences. Although component reliability is an important quality measure for system level analysis, software reliability is hard to characterize and the use of post-verification reliability estimates remains a controversial issue. For some applications software safety is more important than reliability, and fault tolerance techniques used in those applications are aimed at preventing catastrophes. Single version software fault tolerance techniques discussed include system structuring and closure, atomic actions, inline fault detection, exception handling, and others. Multiversion techniques are based on the assumption that software built differently should fail differently and thus, if one of the redundant versions fails, it is expected that at least one of the other versions will provide an acceptable output. Recovery blocks, N-version programming, and other multiversion techniques are reviewed.
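A minimal majority voter for N-version programming, one of the multiversion techniques reviewed above, is sketched below; the three "independently developed" versions are trivial stand-ins, and real voters must also handle tolerances for approximate outputs.

    # Minimal majority voter for N-version programming: three independently
    # developed versions compute the same function; the voter masks one faulty
    # version. Versions and the example input are illustrative.
    from collections import Counter

    def version_a(x): return x * x
    def version_b(x): return x ** 2
    def version_c(x): return x * x + 1      # faulty version

    def nvp(x, versions=(version_a, version_b, version_c)):
        outputs = [v(x) for v in versions]
        value, votes = Counter(outputs).most_common(1)[0]
        if votes < 2:
            raise RuntimeError("no majority: versions disagree entirely")
        return value

    print(nvp(7))   # 49: the faulty version_c is outvoted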
NASA Technical Reports Server (NTRS)
Tomayko, James E.
1986-01-01
Twenty-five years of spacecraft onboard computer development have resulted in a better understanding of the requirements for effective, efficient, and fault-tolerant flight computer systems. Lessons from eight flight programs (Gemini, Apollo, Skylab, Shuttle, Mariner, Voyager, and Galileo) and three research programs (digital fly-by-wire, STAR, and the Unified Data System) are useful in projecting the computer hardware configuration of the Space Station and the ways in which the Ada programming language will enhance the development of the necessary software. The evolution of hardware technology, fault protection methods, and software architectures used in space flight is reviewed in order to provide insight into the pending development of such items for the Space Station.
Fault tolerant software modules for SIFT
NASA Technical Reports Server (NTRS)
Hecht, M.; Hecht, H.
1982-01-01
The implementation of software fault tolerance is investigated for critical modules of the Software Implemented Fault Tolerance (SIFT) operating system to support the computational and reliability requirements of advanced fly-by-wire transport aircraft. Fault-tolerant designs generated for the error reporter and global executive are examined. A description of the alternate routines, implementation requirements, and software validation is included.
Reconfigurable fault tolerant avionics system
NASA Astrophysics Data System (ADS)
Ibrahim, M. M.; Asami, K.; Cho, Mengu
This paper presents the design of a reconfigurable avionics system based on a modern Static Random Access Memory (SRAM)-based Field Programmable Gate Array (FPGA) to be used in future generations of nano satellites. A major concern in satellite systems, and especially nano satellites, is to build robust systems with low-power consumption profiles. The system is designed to be flexible by providing the capability of reconfiguring itself based on its orbital position. As Single Event Upsets (SEUs) do not have the same severity and intensity in all orbital locations, having their maximum at the South Atlantic Anomaly (SAA) and the polar cusps, the system does not have to be fully protected all the time in its orbit. An acceptable level of protection against high-energy cosmic rays and charged particles roaming in space is provided within the majority of the orbit through software fault tolerance. Checkpointing and rollback, together with control-flow assertions, are used for that level of protection. In the minority part of the orbit where severe SEUs are expected, a reconfiguration of the system FPGA is initiated: the processor systems are triplicated and protection is provided through Triple Modular Redundancy (TMR) with feedback. This technique of reconfiguring the system according to the level of threat expected from SEU-induced faults helps reduce the average dynamic power consumption of the system to one-third of its maximum, and can be viewed as smart protection through system reconfiguration. The system is built on the commercial version of the (XC5VLX50) Xilinx Virtex5 FPGA on bulk silicon with 324 IO. Simulations of orbit SEU rates were carried out using the SPENVIS web-based software package.
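The position-dependent protection policy can be sketched as a simple mode selector; the region boundaries below are rough illustrative boxes, not the mission's actual SAA and cusp models.

    # Sketch of the "smart protection" policy: select the FPGA protection mode
    # from orbital position, using TMR only where SEU rates are high (SAA,
    # polar cusps) and cheaper software fault tolerance elsewhere.
    def seu_region(lat_deg: float, lon_deg: float) -> bool:
        in_saa = -50.0 <= lat_deg <= 0.0 and -90.0 <= lon_deg <= 40.0
        in_cusp = abs(lat_deg) >= 75.0
        return in_saa or in_cusp

    def protection_mode(lat_deg: float, lon_deg: float) -> str:
        if seu_region(lat_deg, lon_deg):
            return "TMR"                    # reconfigure FPGA: triplicated processors
        return "checkpoint+assertions"      # software fault tolerance, lower power

    for lat, lon in [(-30.0, -45.0), (10.0, 100.0), (80.0, 10.0)]:
        print((lat, lon), "->", protection_mode(lat, lon))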
Li, Qiuying; Pham, Hoang
2017-01-01
In this paper, we propose a software reliability model that considers not only error generation but also fault removal efficiency combined with testing coverage information, based on a nonhomogeneous Poisson process (NHPP). During the past four decades, many software reliability growth models (SRGMs) based on NHPP have been proposed to estimate software reliability measures, most of which share the following assumptions: 1) it is a common phenomenon that the fault detection rate changes during the testing phase; 2) as a result of imperfect debugging, fault removal has been related to a fault re-introduction rate. However, few SRGMs in the literature differentiate between fault detection and fault removal, i.e., they seldom consider imperfect fault removal efficiency. In practical software development, fault removal efficiency cannot always be perfect: detected failures might not be removed completely, the original faults might remain, and new faults might be introduced meanwhile, which is referred to as the imperfect debugging phenomenon. In this study, a model aiming to incorporate the fault introduction rate, fault removal efficiency, and testing coverage into software reliability evaluation is developed, using testing coverage to express the fault detection rate and using fault removal efficiency to represent fault repair. We compare the performance of the proposed model with several existing NHPP SRGMs using three sets of real failure data based on five criteria. The results exhibit that the model can give a better fitting and predictive performance.
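The general shape of such a model can be written compactly. The equations below are a generic sketch of an NHPP SRGM with a coverage-driven detection rate, a fault-introduction rate, and imperfect removal efficiency; the symbols are generic and not necessarily the authors' exact formulation.

    % m(t): expected faults detected by time t; a(t): total fault content;
    % c(t): testing coverage; p: fault-removal efficiency; \beta: fault-
    % introduction rate. Coverage drives the fault-detection rate.
    \frac{dm(t)}{dt} = \frac{c'(t)}{1 - c(t)}\,\bigl[a(t) - p\,m(t)\bigr],
    \qquad
    \frac{da(t)}{dt} = \beta\,\frac{dm(t)}{dt}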
A methodology for testing fault-tolerant software
NASA Technical Reports Server (NTRS)
Andrews, D. M.; Mahmood, A.; Mccluskey, E. J.
1985-01-01
A methodology for testing fault-tolerant software is presented. There are problems associated with testing fault-tolerant software because many errors are masked or corrected by voters, limiters, or automatic channel synchronization. This methodology illustrates how the same strategies used for testing fault-tolerant hardware can be applied to testing fault-tolerant software. For example, one strategy used in testing fault-tolerant hardware is to disable the redundancy during testing. A similar testing strategy is proposed for software: namely, to move the major emphasis of testing earlier in the development cycle (before the redundancy is in place), thus reducing the possibility that undetected errors will be masked when limiters and voters are added.
Non-Pilot Protection of the HVDC Grid
NASA Astrophysics Data System (ADS)
Badrkhani Ajaei, Firouz
This thesis develops a non-pilot protection system for the next generation power transmission system, the High-Voltage Direct Current (HVDC) grid. The HVDC grid protection system is required to be (i) adequately fast to prevent damages and/or converter blocking and (ii) reliable to minimize the impacts of faults. This study is mainly focused on the Modular Multilevel Converter (MMC) -based HVDC grid since the MMC is considered as the building block of the future HVDC systems. The studies reported in this thesis include (i) developing an enhanced equivalent model of the MMC to enable accurate representation of its DC-side fault response, (ii) developing a realistic HVDC-AC test system that includes a five-terminal MMC-based HVDC grid embedded in a large interconnected AC network, (iii) investigating the transient response of the developed test system to AC-side and DC-side disturbances in order to determine the HVDC grid protection requirements, (iv) investigating the fault surge propagation in the HVDC grid to determine the impacts of the DC-side fault location on the measured signals at each relay location, (v) designing a protection algorithm that detects and locates DC-side faults reliably and sufficiently fast to prevent relay malfunction and unnecessary blocking of the converters, and (vi) performing hardware-in-the-loop tests on the designed relay to verify its potential to be implemented in hardware. The results of the off-line time domain transients studies in the PSCAD software platform and the real-time hardware-in-the-loop tests using an enhanced version of the RTDS platform indicate that the developed HVDC grid relay meets all technical requirements including speed, dependability, security, selectivity, and robustness. Moreover, the developed protection algorithm does not impose considerable computational burden on the hardware.
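As an illustration of a non-pilot DC-fault criterion of the kind such relays compute from local measurements only, the sketch below declares a fault when the rate of change of the pole voltage exceeds a threshold for several consecutive samples. The thresholds, sample rate, and waveform are invented; the thesis's actual algorithm also addresses fault location, selectivity, and security.

    # Sketch of a rate-of-change-of-voltage (ROCOV) style DC-fault criterion:
    # declare a fault when |dv/dt| exceeds a threshold for consecutive samples.
    def rocov_relay(v_samples, dt, threshold_kv_per_ms=0.5, persist=3):
        """Return the sample index where a DC fault is declared, or None."""
        count = 0
        for k in range(1, len(v_samples)):
            dv_dt = abs(v_samples[k] - v_samples[k - 1]) / dt   # kV per ms
            count = count + 1 if dv_dt > threshold_kv_per_ms else 0
            if count >= persist:
                return k
        return None

    # 320 kV pole voltage collapsing after a fault at sample 5 (dt = 0.1 ms).
    v = [320.0] * 5 + [300.0, 260.0, 200.0, 130.0, 60.0]
    print(rocov_relay(v, dt=0.1))   # detects within a few samples of the collapse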
Finite Element Method Applied to Fuse Protection Design
NASA Astrophysics Data System (ADS)
Li, Sen; Song, Zhiquan; Zhang, Ming; Xu, Liuwei; Li, Jinchao; Fu, Peng; Wang, Min; Dong, Lin
2014-03-01
In a poloidal field (PF) converter module, fuse protection is of great importance to ensure the safety of the thyristors. The fuse is pre-selected in a traditional way and then verified by finite element analysis. A 3D physical model is built by ANSYS software to solve the thermal-electric coupled problem of transient process in case of external fault. The result shows that this method is feasible.
Technical Basis for Evaluating Software-Related Common-Cause Failures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muhlheim, Michael David; Wood, Richard
2016-04-01
The instrumentation and control (I&C) system architecture at a nuclear power plant (NPP) incorporates protections against common-cause failures (CCFs) through the use of diversity and defense-in-depth. Even for well-established analog-based I&C system designs, the potential for CCFs of multiple systems (or redundancies within a system) constitutes a credible threat to defeating the defense-in-depth provisions within the I&C system architectures. The integration of digital technologies into the I&C systems provides many advantages compared to the aging analog systems with respect to reliability, maintenance, operability, and cost effectiveness. However, maintaining the diversity and defense-in-depth for both the hardware and software within the digital system is challenging. In fact, the introduction of digital technologies may actually increase the potential for CCF vulnerabilities because of the introduction of undetected systematic faults. These systematic faults are defined as a “design fault located in a software component” and, at a high level, are predominately the result of (1) errors in the requirement specification, (2) inadequate provisions to account for design limits (e.g., environmental stress), or (3) technical faults incorporated in the internal system (or architectural) design or implementation. Other technology-neutral CCF concerns include hardware design errors, equipment qualification deficiencies, installation or maintenance errors, and instrument loop scaling and setpoint mistakes.
Practical Issues in Implementing Software Reliability Measurement
NASA Technical Reports Server (NTRS)
Nikora, Allen P.; Schneidewind, Norman F.; Everett, William W.; Munson, John C.; Vouk, Mladen A.; Musa, John D.
1999-01-01
Many ways of estimating software systems' reliability, or reliability-related quantities, have been developed over the past several years. Of particular interest are methods that can be used to estimate a software system's fault content prior to test, or to discriminate between components that are fault-prone and those that are not. The results of these methods can be used to: 1) More accurately focus scarce fault identification resources on those portions of a software system most in need of it. 2) Estimate and forecast the risk of exposure to residual faults in a software system during operation, and develop risk and safety criteria to guide the release of a software system to fielded use. 3) Estimate the efficiency of test suites in detecting residual faults. 4) Estimate the stability of the software maintenance process.
Assessing Survivability Using Software Fault Injection
Defense Technical Information Center (DTIC)
Voas, Jeffrey
2001-04-01
Lessons Learned in the Livingstone 2 on Earth Observing One Flight Experiment
NASA Technical Reports Server (NTRS)
Hayden, Sandra C.; Sweet, Adam J.; Shulman, Seth
2005-01-01
The Livingstone 2 (L2) model-based diagnosis software is a reusable diagnostic tool for monitoring complex systems. In 2004, L2 was integrated with the JPL Autonomous Sciencecraft Experiment (ASE) and deployed on-board Goddard's Earth Observing One (EO-1) remote sensing satellite, to monitor and diagnose the EO-1 space science instruments and imaging sequence. This paper reports on lessons learned from this flight experiment. The goals for this experiment, including validation of minimum success criteria and of a series of diagnostic scenarios, have all been successfully met. Long-term operations in space are on-going, as a test of the maturity of the system, with L2 performance remaining flawless. L2 has demonstrated the ability to track the state of the system during nominal operations, detect simulated abnormalities in operations and isolate failures to their root cause fault. Specific advances demonstrated include diagnosis of ambiguity groups rather than a single fault candidate; hypothesis revision given new sensor evidence about the state of the system; and the capability to check for faults in a dynamic system without having to wait until the system is quiescent. The major benefits of this advanced health management technology are to increase mission duration and reliability through intelligent fault protection, and robust autonomous operations with reduced dependency on supervisory operations from Earth. The workload for operators will be reduced by telemetry of processed state-of-health information rather than raw data. The long-term vision is that of making diagnosis available to the onboard planner or executive, allowing autonomy software to re-plan in order to work around known component failures. For a system that is expected to evolve substantially over its lifetime, as for the International Space Station, the model-based approach has definite advantages over rule-based expert systems and limit-checking fault protection systems, as these do not scale well. The model-based approach facilitates reuse of the L2 diagnostic software; only the model of the system to be diagnosed and the telemetry monitoring software have to be rebuilt for a new system or expanded for a growing system. The hierarchical L2 model supports modularity and expandability, and as such is a suitable solution for integrated system health management as envisioned for systems-of-systems.
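The consistency-based core of model-based diagnosis can be sketched as candidate filtering: keep every fault hypothesis consistent with all observations so far, and shrink the ambiguity group as evidence arrives. The components, fault modes, and observations below are invented and are far simpler than an L2 model.

    # Sketch of consistency-based diagnosis with hypothesis revision: the
    # candidate set (ambiguity group) shrinks as new observations arrive.
    candidates = [
        {"camera": "ok", "bus": "ok"},
        {"camera": "stuck_off", "bus": "ok"},
        {"camera": "ok", "bus": "dropout"},
    ]

    def consistent(candidate, observation):
        # "no image" is explained by a stuck-off camera or a bus dropout
        if observation == "no_image":
            return candidate["camera"] == "stuck_off" or candidate["bus"] == "dropout"
        if observation == "bus_heartbeat_ok":
            return candidate["bus"] == "ok"
        return True

    for obs in ("no_image", "bus_heartbeat_ok"):   # evidence arrives over time
        candidates = [c for c in candidates if consistent(c, obs)]
        print(obs, "->", candidates)               # the ambiguity group shrinks
    # After both observations only {"camera": "stuck_off", "bus": "ok"} remains.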
Fault Tree Analysis Application for Safety and Reliability
NASA Technical Reports Server (NTRS)
Wallace, Dolores R.
2003-01-01
Many commercial software tools exist for fault tree analysis (FTA), an accepted method for mitigating risk in systems. The method embedded in the tools identifies root causes in system components, but when software is identified as a root cause, the tools do not build trees into the software component. No commercial software tools have been built specifically for the development and analysis of software fault trees. Research indicates that the methods of FTA could be applied to software, but the method is not practical without automated tool support. With appropriate automated tool support, software fault tree analysis (SFTA) may be a practical technique for identifying the underlying cause of software faults that may lead to critical system failures. We strive to demonstrate that existing commercial tools for FTA can be adapted for use with SFTA, and that, applied to a safety-critical system, SFTA can be used to identify serious potential problems long before integration and system testing.
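The basic computation that FTA tool support automates is the evaluation of AND/OR gates over basic-event probabilities, as in the sketch below; the tree structure and probabilities are invented, and independence of basic events is assumed.

    # Toy software fault tree: AND/OR gates over basic-event probabilities,
    # assuming independent events. Structure and numbers are invented.
    def gate_or(*p):          # P(at least one) under independence
        q = 1.0
        for pi in p:
            q *= (1.0 - pi)
        return 1.0 - q

    def gate_and(*p):         # P(all) under independence
        q = 1.0
        for pi in p:
            q *= pi
        return q

    p_bad_input = 1e-3        # basic events (per demand)
    p_missing_check = 1e-2
    p_stale_data = 5e-4

    # Top event: "wrong command issued" =
    #   (bad input AND missing validation) OR stale data
    p_top = gate_or(gate_and(p_bad_input, p_missing_check), p_stale_data)
    print(f"P(top event) = {p_top:.2e}")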
The cost of software fault tolerance
NASA Technical Reports Server (NTRS)
Migneault, G. E.
1982-01-01
The proposed use of software fault tolerance techniques as a means of reducing software costs in avionics and as a means of addressing the issue of system unreliability due to faults in software is examined. A model is developed to provide a view of the relationships among cost, redundancy, and reliability which suggests strategies for software development and maintenance which are not conventional.
Study of fault tolerant software technology for dynamic systems
NASA Technical Reports Server (NTRS)
Caglayan, A. K.; Zacharias, G. L.
1985-01-01
The major aim of this study is to investigate the feasibility of using systems-based failure detection, isolation, and compensation (FDIC) techniques in building fault-tolerant software and extending them, whenever possible, to the domain of software fault tolerance. First, it is shown that systems-based FDIC methods can be extended to develop software error detection techniques by using system models for software modules. In particular, it is demonstrated that systems-based FDIC techniques can yield consistency checks that are easier to implement than acceptance tests based on software specifications. Next, it is shown that systems-based failure compensation techniques can be generalized to the domain of software fault tolerance in developing software error recovery procedures. The feasibility of using fault-tolerant software in flight software is then investigated; in particular, possible system and version instabilities, and functional performance degradation that may occur in N-Version programming applications to flight software, are illustrated. Finally, a comparative analysis of N-Version and recovery block techniques in the context of generic blocks in flight software is presented.
Study of fault-tolerant software technology
NASA Technical Reports Server (NTRS)
Slivinski, T.; Broglio, C.; Wild, C.; Goldberg, J.; Levitt, K.; Hitt, E.; Webb, J.
1984-01-01
Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance.
Contingency Software in Autonomous Systems
NASA Technical Reports Server (NTRS)
Lutz, Robyn; Patterson-Hine, Ann
2006-01-01
This viewgraph presentation reviews the development of contingency software for autonomous systems. Autonomous vehicles currently have a limited capacity to diagnose and mitigate failures. There is a need to be able to handle a broader range of contingencies. The goals of the project are to: (1) speed up diagnosis and mitigation of anomalous situations; (2) automatically handle contingencies, not just failures; (3) enable projects to select a degree of autonomy consistent with their needs and to incrementally introduce more autonomy; and (4) augment on-board fault protection with verified contingency scripts.
NASA Technical Reports Server (NTRS)
Padilla, Peter A.
1991-01-01
An investigation was made in AIRLAB of the fault handling performance of the Fault Tolerant MultiProcessor (FTMP). Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once in every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. Byzantine faults behave such that the faulted unit points to a working unit as the source of errors. The design's problems involve: (1) the design and interface between the simplex error detection hardware and the error processing software, (2) the functional capabilities of the FTMP system bus, and (3) the communication requirements of a multiprocessor architecture. These weak areas in the FTMP's design increase the probability that, for any hardware fault, a good line replacement unit (LRU) is mistakenly disabled by the fault management software.
Detection of faults and software reliability analysis
NASA Technical Reports Server (NTRS)
Knight, J. C.
1986-01-01
Multiversion or N-version programming was proposed as a method of providing fault tolerance in software. The approach requires the separate, independent preparation of multiple versions of a piece of software for some application. Specific topics addressed are: failure probabilities in N-version systems, consistent comparison in N-version systems, descriptions of the faults found in the Knight and Leveson experiment, analytic models of comparison testing, characteristics of the input regions that trigger faults, fault tolerance through data diversity, and the relationship between failures caused by automatically seeded faults.
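As a concrete illustration of the voting and consistent-comparison issues studied here, the following minimal Python sketch (with hypothetical tolerance and output values, not data from the experiment) shows how tolerance-based majority voting masks a faulty version and reports failure when no majority exists.

    # N-version majority voting with a tolerance-based comparison. Versions
    # that are all "correct" in floating point can still disagree slightly,
    # which is the consistent-comparison pitfall the abstract mentions.
    def vote(outputs, tol=1e-6):
        """Return a majority value among version outputs, comparing with a
        tolerance; return None when no majority exists (detected failure)."""
        n = len(outputs)
        for candidate in outputs:
            agreeing = sum(1 for o in outputs if abs(o - candidate) <= tol)
            if agreeing > n // 2:
                return candidate
        return None

    # Three hypothetical versions of the same computation:
    v1, v2, v3 = 0.3333333, 0.3333334, 0.4   # v3 harbors a fault
    print(vote([v1, v2, v3]))     # majority of v1/v2 masks the faulty v3
    print(vote([0.1, 0.2, 0.3]))  # None: versions disagree, no safe output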
Coordinated Fault Tolerance for High-Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dongarra, Jack; Bosilca, George; et al.
2013-04-08
Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack—from the application, math libraries, and programming language runtime to other common system software such as jobs schedulers, resource managers, and monitoring tools.
Model Transformation for a System of Systems Dependability Safety Case
NASA Technical Reports Server (NTRS)
Murphy, Judy; Driskell, Stephen B.
2010-01-01
Software plays an increasingly larger role in all aspects of NASA's science missions. This has been extended to the identification, management and control of faults which affect safety-critical functions and, by default, the overall success of the mission. Traditionally, the analysis of fault identification, management and control has been hardware based. Due to the increasing complexity of systems, there has been a corresponding increase in the complexity of fault management software. The NASA Independent Verification & Validation (IV&V) program is creating processes and procedures to identify and incorporate safety-critical software requirements along with corresponding software faults so that potential hazards may be mitigated. This paper, "Specific to Generic ... A Case for Reuse," describes the phases of a dependability and safety study which identifies a new process to create a foundation for reusable assets. These assets support the identification and management of specific software faults and their transformation from specific to generic software faults. This approach also has applications to other systems outside of the NASA environment. This paper addresses how a mission-specific dependability and safety case is being transformed to a generic dependability and safety case which can be reused for any type of space mission, with an emphasis on software fault conditions.
Preliminary design of the redundant software experiment
NASA Technical Reports Server (NTRS)
Campbell, Roy; Deimel, Lionel; Eckhardt, Dave, Jr.; Kelly, John; Knight, John; Lauterbach, Linda; Lee, Larry; Mcallister, Dave; Mchugh, John
1985-01-01
The goal of the present experiment is to characterize the fault distributions of highly reliable software replicates, constructed using techniques and environments which are similar to those used in contemporary industrial software facilities. The fault distributions and their effect on the reliability of fault tolerant configurations of the software will be determined through extensive life testing of the replicates against carefully constructed randomly generated test data. Each detected error will be carefully analyzed to provide insight into its nature and cause. A direct objective is to develop techniques for reducing the intensity of coincident errors, thus increasing the reliability gain which can be achieved with fault tolerance. Data on the reliability gains realized, and the cost of the fault tolerant configurations, can be used to design a companion experiment to determine the cost effectiveness of the fault tolerant strategy. Finally, the data and analysis produced by this experiment will be valuable to the software engineering community as a whole because they will provide useful insight into the nature and cause of hard-to-find, subtle faults which escape standard software engineering validation techniques and thus persist far into the software life cycle.
NASA Astrophysics Data System (ADS)
Arvind, Pratul
2012-11-01
The ability to identify and classify all ten types of faults in a distribution system is an important task for protection engineers. Unlike transmission systems, distribution systems have a complex configuration and are subjected to frequent faults. In the present work, an algorithm has been developed for identifying all ten types of faults in a distribution system by collecting current samples at the substation end. The samples are subjected to the wavelet packet transform and an artificial neural network in order to yield better classification results. A comparison of results between the wavelet transform and the wavelet packet transform is also presented, justifying that features extracted using the wavelet packet transform yield promising results. It should also be noted that the current samples are collected after simulating a 25 kV distribution system in the PSCAD software.
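The pipeline described above can be sketched as follows; this is an illustrative reconstruction, not the author's code. It assumes the PyWavelets and scikit-learn libraries, and the signals and labels are synthetic stand-ins for the PSCAD current samples.

    # Wavelet-packet energy features from current samples feeding a small
    # neural-network classifier (synthetic data for illustration only).
    import numpy as np
    import pywt
    from sklearn.neural_network import MLPClassifier

    def wp_energy_features(signal, wavelet="db4", level=3):
        """Energy of each terminal wavelet-packet node as the feature vector."""
        wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
        nodes = wp.get_level(level, order="natural")
        return np.array([np.sum(np.square(node.data)) for node in nodes])

    rng = np.random.default_rng(0)
    # Hypothetical training set: rows of current samples, one of 10 fault labels.
    signals = rng.normal(size=(200, 256))
    labels = rng.integers(0, 10, size=200)

    X = np.vstack([wp_energy_features(s) for s in signals])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    clf.fit(X, labels)
    print("training accuracy:", clf.score(X, labels))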
Software fault tolerance for real-time avionics systems
NASA Technical Reports Server (NTRS)
Anderson, T.; Knight, J. C.
1983-01-01
Avionics systems have very high reliability requirements and are therefore prime candidates for the inclusion of fault tolerance techniques. In order to provide tolerance to software faults, some form of state restoration is usually advocated as a means of recovery. State restoration can be very expensive for systems which utilize concurrent processes. The concurrency present in most avionics systems and the further difficulties introduced by timing constraints imply that providing tolerance for software faults may be inordinately expensive or complex. A straightforward pragmatic approach to software fault tolerance which is believed to be applicable to many real-time avionics systems is proposed. A classification system for software errors is presented together with approaches to recovery and continued service for each error type.
Advanced Protection & Service Restoration for FREEDM Systems
NASA Astrophysics Data System (ADS)
Singh, Urvir
A smart electric power distribution system (FREEDM system) that incorporates DERs (Distributed Energy Resources), SSTs (Solid State Transformers, which can limit the fault current to two times the rated current) and RSC (Reliable & Secure Communication) capabilities has been studied in this work in order to develop appropriate protection and service restoration techniques for it. First, a solution is proposed that can enable conventional protective devices to provide effective protection for FREEDM systems. Results show that although this scheme can provide the required protection, it can be quite slow. Using the FREEDM system's communication capabilities, a communication-assisted overcurrent (O/C) protection scheme is proposed, and results show that by using communication (blocking signals) very fast operating times are achieved, thereby mitigating the problem of the conventional O/C scheme. Using the FREEDM system's DGI (Distributed Grid Intelligence) capability, an automated FLISR (Fault Location, Isolation & Service Restoration) scheme is proposed that is based on the concept of 'software agents' and uses less data than conventional centralized approaches. Test results illustrated that this scheme is able to provide a globally optimal system reconfiguration for service restoration.
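The blocking-signal idea behind the communication-assisted O/C scheme can be sketched as simple relay logic; the topology, relay names, and pickup flags below are hypothetical, not the FREEDM test system.

    # Communication-assisted overcurrent blocking: a downstream relay that
    # sees the fault blocks the relays upstream of it, so only the relay
    # nearest the fault trips, without long coordination time delays.
    def relays_to_trip(pickup_flags, downstream_of):
        """pickup_flags: relay -> True if it sees overcurrent.
        downstream_of: relay -> downstream relays that can block it.
        A relay trips if it picks up and no downstream relay also picked up."""
        trip = []
        for relay, picked_up in pickup_flags.items():
            if not picked_up:
                continue
            blocked = any(pickup_flags.get(d, False)
                          for d in downstream_of.get(relay, []))
            if not blocked:
                trip.append(relay)
        return trip

    # Fault below R3: R1 and R2 also see fault current but receive blocks.
    pickups = {"R1": True, "R2": True, "R3": True}
    topology = {"R1": ["R2"], "R2": ["R3"], "R3": []}
    print(relays_to_trip(pickups, topology))  # ['R3']: fastest selective trip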
Design study of Software-Implemented Fault-Tolerance (SIFT) computer
NASA Technical Reports Server (NTRS)
Wensley, J. H.; Goldberg, J.; Green, M. W.; Kutz, W. H.; Levitt, K. N.; Mills, M. E.; Shostak, R. E.; Whiting-Okeefe, P. M.; Zeidler, H. M.
1982-01-01
Software-implemented fault tolerant (SIFT) computer design for commercial aviation is reported. A SIFT design concept is addressed. Alternate strategies for physical implementation are considered. Hardware and software design correctness is addressed. System modeling and effectiveness evaluation are considered from a fault-tolerant point of view.
A second generation experiment in fault-tolerant software
NASA Technical Reports Server (NTRS)
Knight, J. C.
1986-01-01
The primary goal was to determine whether the application of fault tolerance to software increases its reliability if the cost of production is the same as for an equivalent non-fault-tolerant version derived from the same requirements specification. Software development protocols are discussed. The feasibility of adapting the technique of N-fold Modular Redundancy with majority voting to software design fault tolerance was studied.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lumsdaine, Andrew
2013-03-08
The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research with a goal of providing end-to-end fault tolerance on a systemwide basis for applications and other system software. While fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade, it has been treated mostly as a collection of isolated stovepipes. Visibility and response to faults has typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, allowing little flexibility or control on a system-wide basis, making it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. As an example, consider faults such as communication link failures that can be seen by a network library but are not directly visible to the job scheduler, or consider faults related to node failures that can be detected by system monitoring software but are not inherently visible to the resource manager. If information about such faults could be shared by the network libraries or monitoring software, then other system software, such as a resource manager or job scheduler, could ensure that failed nodes or failed network links were excluded from further job allocations and that further diagnosis could be performed. As a founding member and one of the lead developers of the Open MPI project, our efforts over the course of this project have been focused on making Open MPI more robust to failures by supporting various fault tolerance techniques, and using fault information exchange and coordination between MPI and the HPC system software stack from the application, numeric libraries, and programming language runtime to other common system components such as job schedulers, resource managers, and monitoring tools.
Coordinated Fault-Tolerance for High-Performance Computing Final Project Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Panda, Dhabaleswar Kumar; Beckman, Pete
2011-07-28
With the Coordinated Infrastructure for Fault Tolerance Systems (CIFTS, as the original project came to be called) project, our aim has been to understand and tackle the following broad research questions, the answers to which will help the HEC community analyze and shape the direction of research in the field of fault tolerance and resiliency on future high-end leadership systems. Will availability of global fault information, obtained by fault information exchange between the different HEC software on a system, allow individual system software to better detect, diagnose, and adaptively respond to faults? If fault-awareness is raised throughout the system through fault information exchange, is it possible to get all system software working together to provide a more comprehensive end-to-end fault management on the system? What are the missing fault-tolerance features that widely used HEC system software lacks today that would inhibit such software from taking advantage of systemwide global fault information? What are the practical limitations of a systemwide approach for end-to-end fault management based on fault awareness and coordination? What mechanisms, tools, and technologies are needed to bring about fault awareness and coordination of responses on a leadership-class system? What standards, outreach, and community interaction are needed for adoption of the concept of fault awareness and coordination for fault management on future systems? Keeping our overall objectives in mind, the CIFTS team has taken a parallel fourfold approach. Our central goal was to design and implement a light-weight, scalable infrastructure with a simple, standardized interface to allow communication of fault-related information through the system and facilitate coordinated responses. This work led to the development of the Fault Tolerance Backplane (FTB) publish-subscribe API specification, together with a reference implementation and several experimental implementations on top of existing publish-subscribe tools. We enhanced the intrinsic fault tolerance capabilities of representative implementations of a variety of key HPC software subsystems and integrated them with the FTB. Targeted software subsystems included: MPI communication libraries, checkpoint/restart libraries, resource managers and job schedulers, and system monitoring tools. Leveraging the aforementioned infrastructure, as well as developing and utilizing additional tools, we have examined issues associated with expanded, end-to-end fault response from both system and application viewpoints. From the standpoint of system operations, we have investigated log and root cause analysis, anomaly detection and fault prediction, and generalized notification mechanisms. Our applications work has included libraries for fault-tolerant linear algebra, application frameworks for coupled multiphysics applications, and external frameworks to support the monitoring and response for general applications. Our final goal was to engage the high-end computing community to increase awareness of tools and issues around coordinated end-to-end fault management.
Experiments in fault tolerant software reliability
NASA Technical Reports Server (NTRS)
Mcallister, David F.; Tai, K. C.; Vouk, Mladen A.
1987-01-01
The reliability of voting was evaluated in a fault-tolerant software system for small output spaces. The effectiveness of the back-to-back testing process was investigated. Version 3.0 of the RSDIMU-ATS, a semi-automated test bed for certification testing of RSDIMU software, was prepared and distributed. Software reliability estimation methods based on non-random sampling are being studied. The investigation of existing fault-tolerance models was continued and formulation of new models was initiated.
Hierarchical Simulation to Assess Hardware and Software Dependability
NASA Technical Reports Server (NTRS)
Ries, Gregory Lawrence
1997-01-01
This thesis presents a method for conducting hierarchical simulations to assess system hardware and software dependability. The method is intended to model embedded microprocessor systems. A key contribution of the thesis is the idea of using fault dictionaries to propagate fault effects upward from the level of abstraction where a fault model is assumed to the system level where the ultimate impact of the fault is observed. A second important contribution is the analysis of the software behavior under faults as well as the hardware behavior. The simulation method is demonstrated and validated in four case studies analyzing Myrinet, a commercial, high-speed networking system. One key result from the case studies shows that the simulation method predicts the same fault impact 87.5% of the time as is obtained by similar fault injections into a real Myrinet system. Reasons for the remaining discrepancy are examined in the thesis. A second key result shows the reduction in the number of simulations needed due to the fault dictionary method. In one case study, 500 faults were injected at the chip level, but only 255 propagated to the system level. Of these 255 faults, 110 shared identical fault dictionary entries at the system level and so did not need to be resimulated. The necessary number of system-level simulations was therefore reduced from 500 to 145. Finally, the case studies show how the simulation method can be used to improve the dependability of the target system. The simulation analysis was used to add recovery to the target software for the most common fault propagation mechanisms that would cause the software to hang. After the modification, the number of hangs was reduced by 60% for fault injections into the real system.
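The fault-dictionary bookkeeping behind the 500-to-145 reduction described above can be sketched in a few lines; the fault identifiers and signatures below are invented for illustration.

    # Low-level faults whose effects reach the system boundary are keyed by
    # their observable signature, so faults sharing a signature need only
    # one system-level simulation.
    def build_fault_dictionary(chip_level_results):
        """chip_level_results: iterable of (fault_id, signature), where
        signature is None if the fault never propagated upward."""
        dictionary = {}
        for fault_id, signature in chip_level_results:
            if signature is None:
                continue                       # fault masked; no further work
            dictionary.setdefault(signature, []).append(fault_id)
        return dictionary

    results = [("f1", "crc_err"), ("f2", None), ("f3", "crc_err"), ("f4", "hang")]
    fd = build_fault_dictionary(results)
    # Only one system-level simulation per distinct signature is needed:
    print(len(fd), "system-level simulations cover",
          sum(map(len, fd.values())), "propagated faults")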
Software reliability through fault-avoidance and fault-tolerance
NASA Technical Reports Server (NTRS)
Vouk, Mladen A.; Mcallister, David F.
1993-01-01
Strategies and tools for the testing, risk assessment and risk control of dependable software-based systems were developed. Part of this project consists of studies to enable the transfer of technology to industry, for example the risk management techniques for safety-conscious systems. Theoretical investigations of the Boolean and Relational Operator (BRO) testing strategy were conducted for condition-based testing. The Basic Graph Generation and Analysis tool (BGG) was extended to fully incorporate several variants of the BRO metric. Single- and multi-phase risk, coverage and time-based models are being developed to provide additional theoretical and empirical basis for estimation of the reliability and availability of large, highly dependable software. A model for software process and risk management was developed. The use of cause-effect graphing for software specification and validation was investigated. Lastly, advanced software fault-tolerance models were studied to provide alternatives and improvements in situations where simple software fault-tolerance strategies break down.
Model-Based Verification and Validation of Spacecraft Avionics
NASA Technical Reports Server (NTRS)
Khan, M. Omair; Sievers, Michael; Standley, Shaun
2012-01-01
Verification and Validation (V&V) at JPL is traditionally performed on flight or flight-like hardware running flight software. For some time, the complexity of avionics has increased exponentially while the time allocated for system integration and associated V&V testing has remained fixed. There is an increasing need to perform comprehensive system-level V&V using modeling and simulation, and to use scarce hardware testing time to validate models, as has been the norm for thermal and structural V&V for some time. Our approach extends model-based V&V to electronics and software through functional and structural models implemented in SysML. We develop component models of electronics and software that are validated by comparison with test results from actual equipment. The models are then simulated, enabling a more complete set of test cases than possible on flight hardware. SysML simulations provide access and control of internal nodes that may not be available in physical systems. This is particularly helpful in testing fault protection behaviors when injecting faults is either not possible or potentially damaging to the hardware. We can also model both hardware and software behaviors in SysML, which allows us to simulate hardware and software interactions. With an integrated model and simulation capability we can evaluate the hardware and software interactions and identify problems sooner. The primary missing piece is validating SysML model correctness against hardware; this experiment demonstrated such an approach is possible.
Software reliability models for fault-tolerant avionics computers and related topics
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1987-01-01
Software reliability research is briefly described. General research topics are reliability growth models, quality of software reliability prediction, the complete monotonicity property of reliability growth, conceptual modelling of software failure behavior, assurance of ultrahigh reliability, and analysis techniques for fault-tolerant systems.
Study of a unified hardware and software fault-tolerant architecture
NASA Technical Reports Server (NTRS)
Lala, Jaynarayan; Alger, Linda; Friend, Steven; Greeley, Gregory; Sacco, Stephen; Adams, Stuart
1989-01-01
A unified architectural concept, called the Fault Tolerant Processor Attached Processor (FTP-AP), that can tolerate hardware as well as software faults is proposed for applications requiring ultrareliable computation capability. An emulation of the FTP-AP architecture, consisting of a breadboard Motorola 68010-based quadruply redundant Fault Tolerant Processor, four VAX 750s as attached processors, and four versions of a transport aircraft yaw damper control law, is used as a testbed in the AIRLAB to examine a number of critical issues. Solutions of several basic problems associated with N-Version software are proposed and implemented on the testbed. This includes a confidence voter to resolve coincident errors in N-Version software. A reliability model of N-Version software that is based upon the recent understanding of software failure mechanisms is also developed. The basic FTP-AP architectural concept appears suitable for hosting N-Version application software while at the same time tolerating hardware failures. Architectural enhancements for greater efficiency, software reliability modeling, and N-Version issues that merit further research are identified.
Analyzing Software Errors in Safety-Critical Embedded Systems
NASA Technical Reports Server (NTRS)
Lutz, Robyn R.
1994-01-01
This paper analyzes the root causes of safety-related software faults. Software faults identified as potentially hazardous to the system are distributed somewhat differently over the set of possible error causes than non-safety-related software faults.
Development and validation of techniques for improving software dependability
NASA Technical Reports Server (NTRS)
Knight, John C.
1992-01-01
A collection of document abstracts are presented on the topic of improving software dependability through NASA grant NAG-1-1123. Specific topics include: modeling of error detection; software inspection; test cases; Magnetic Stereotaxis System safety specifications and fault trees; and injection of synthetic faults into software.
An experiment in software reliability
NASA Technical Reports Server (NTRS)
Dunham, J. R.; Pierce, J. L.
1986-01-01
The results of a software reliability experiment conducted in a controlled laboratory setting are reported. The experiment was undertaken to gather data on software failures and is one in a series of experiments being pursued by the Fault Tolerant Systems Branch of NASA Langley Research Center to find a means of credibly performing reliability evaluations of flight control software. The experiment tests a small sample of implementations of radar tracking software having ultra-reliability requirements and uses n-version programming for error detection, and repetitive run modeling for failure and fault rate estimation. The experiment results agree with those of Nagel and Skrivan in that the program error rates suggest an approximate log-linear pattern and the individual faults occurred with significantly different error rates. Additional analysis of the experimental data raises new questions concerning the phenomenon of interacting faults. This phenomenon may provide one explanation for software reliability decay.
Use of Field Programmable Gate Array Technology in Future Space Avionics
NASA Technical Reports Server (NTRS)
Ferguson, Roscoe C.; Tate, Robert
2005-01-01
Fulfilling NASA's new vision for space exploration requires the development of sustainable, flexible and fault tolerant spacecraft control systems. The traditional development paradigm consists of the purchase or fabrication of hardware boards with fixed processor and/or Digital Signal Processing (DSP) components interconnected via a standardized bus system. This is followed by the purchase and/or development of software. This paradigm has several disadvantages for the development of systems to support NASA's new vision. Building a system to be fault tolerant increases the complexity and decreases the performance of included software. Standard bus design and conventional implementation produces natural bottlenecks. Configuring hardware components in systems containing common processors and DSPs is difficult initially and expensive or impossible to change later. The existence of Hardware Description Languages (HDLs), the recent increase in performance, density and radiation tolerance of Field Programmable Gate Arrays (FPGAs), and Intellectual Property (IP) Cores provides the technology for reprogrammable Systems on a Chip (SOC). This technology supports a paradigm better suited for NASA's vision. Hardware and software production are melded for more effective development; they can both evolve together over time. Designers incorporating this technology into future avionics can benefit from its flexibility. Systems can be designed with improved fault isolation and tolerance using hardware instead of software. Also, these designs can be protected from obsolescence problems where maintenance is compromised via component and vendor availability. To investigate the flexibility of this technology, the core of the Central Processing Unit and Input/Output Processor of the Space Shuttle AP101S Computer were prototyped in Verilog HDL and synthesized into an Altera Stratix FPGA.
A coverage and slicing dependencies analysis for seeking software security defects.
He, Hui; Zhang, Dongyan; Liu, Min; Zhang, Weizhe; Gao, Dongmin
2014-01-01
Software security defects have a serious impact on software quality and reliability. It is a major hidden danger for the operation of a system that the software contains security flaws. As the scale of software increases, its vulnerabilities become much more difficult to find. Once these vulnerabilities are exploited, they may lead to great loss. In this situation, the concept of Software Assurance has been put forward by some experts, and automated fault localization techniques are one part of Software Assurance research. Currently, automated fault localization methods include coverage-based fault localization (CBFL) and program slicing. Both methods have their own localization advantages and defects. In this paper, we put forward a new method, named the Reverse Data Dependence Analysis Model, which integrates the two methods by analyzing the program structure. On this basis, we finally propose a new automated fault localization method. This method not only remains fully automated but also narrows the basic localization unit to a single statement, which makes localization more accurate. Through several experiments, we show that our method is more effective. Furthermore, we analyze the effectiveness of the existing methods on different faults.
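For contrast with the model proposed above, a minimal sketch of plain coverage-based fault localization using a Tarantula-style suspiciousness score is shown below; the coverage sets and test verdicts are hypothetical, and this is a generic baseline rather than the paper's method.

    # Rank statements by how strongly their coverage correlates with
    # failing tests (Tarantula-style CBFL).
    def tarantula(coverage, verdicts):
        """coverage: test -> set of covered statement ids.
        verdicts: test -> True if the test passed.
        Returns statements ranked most-suspicious first."""
        passed = [t for t, ok in verdicts.items() if ok]
        failed = [t for t, ok in verdicts.items() if not ok]
        stmts = set().union(*coverage.values())
        scores = {}
        for s in stmts:
            pf = sum(1 for t in failed if s in coverage[t]) / max(len(failed), 1)
            pp = sum(1 for t in passed if s in coverage[t]) / max(len(passed), 1)
            scores[s] = pf / (pf + pp) if (pf + pp) > 0 else 0.0
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    cov = {"t1": {1, 2, 3}, "t2": {1, 3, 4}, "t3": {1, 4}}
    ok = {"t1": False, "t2": True, "t3": True}  # t1 fails; stmt 2 unique to it
    print(tarantula(cov, ok))  # statement 2 ranks most suspicious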
Fault recovery characteristics of the fault tolerant multi-processor
NASA Technical Reports Server (NTRS)
Padilla, Peter A.
1990-01-01
The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.
Onboard Science Data Analysis: Opportunities, Benefits, and Effects on Mission Design
NASA Technical Reports Server (NTRS)
Stolorz, P.; Cheeseman, P.
1998-01-01
Much of the initial focus for spacecraft autonomy has been on developing new software and systems concepts to automate engineering functions of the spacecraft: guidance, navigation and control, fault protection, and resources management. However, the ultimate objectives of NASA missions are science objectives, which implies that we need a new framework for performing science data evaluation and observation planning autonomously onboard spacecraft.
NASA Technical Reports Server (NTRS)
1974-01-01
Studies were conducted to develop appropriate space shuttle electrical power distribution and control (EPDC) subsystem simulation models and to apply the computer simulations to systems analysis of the EPDC. A previously developed software program (SYSTID) was adapted for this purpose. The following objectives were attained: (1) significant enhancement of the SYSTID time domain simulation software, (2) generation of functionally useful shuttle EPDC element models, and (3) illustrative simulation results in the analysis of EPDC performance, under the conditions of fault, current pulse injection due to lightning, and circuit protection sizing and reaction times.
NHPP-Based Software Reliability Models Using Equilibrium Distribution
NASA Astrophysics Data System (ADS)
Xiao, Xiao; Okamura, Hiroyuki; Dohi, Tadashi
Non-homogeneous Poisson processes (NHPPs) have gained much popularity in actual software testing phases to estimate the software reliability, the number of remaining faults in software and the software release timing. In this paper, we propose a new modeling approach for the NHPP-based software reliability models (SRMs) to describe the stochastic behavior of software fault-detection processes. The fundamental idea is to apply the equilibrium distribution to the fault-detection time distribution in NHPP-based modeling. We also develop efficient parameter estimation procedures for the proposed NHPP-based SRMs. Through numerical experiments, it can be concluded that the proposed NHPP-based SRMs outperform the existing ones in many data sets from the perspective of goodness-of-fit and prediction performance.
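As a worked example of NHPP-based SRM fitting, the sketch below fits the classic Goel-Okumoto mean value function m(t) = a(1 - e^(-bt)) to made-up cumulative fault counts using SciPy. The equilibrium-distribution variants proposed in the paper would replace the fault-detection time distribution but follow the same fitting pattern; nothing here reproduces the paper's models or data.

    # Fit an NHPP mean value function to cumulative fault-count data.
    import numpy as np
    from scipy.optimize import curve_fit

    def mean_value(t, a, b):
        """Expected cumulative number of faults detected by time t."""
        return a * (1.0 - np.exp(-b * t))

    weeks = np.arange(1, 11, dtype=float)
    cum_faults = np.array([5, 9, 13, 15, 18, 19, 21, 21, 22, 23], dtype=float)

    (a, b), _ = curve_fit(mean_value, weeks, cum_faults, p0=(30.0, 0.1))
    remaining = a - cum_faults[-1]
    print(f"estimated total faults a={a:.1f}, detection rate b={b:.3f}")
    print(f"estimated faults remaining: {remaining:.1f}")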
A study of fault prediction and reliability assessment in the SEL environment
NASA Technical Reports Server (NTRS)
Basili, Victor R.; Patnaik, Debabrata
1986-01-01
An empirical study on estimation and prediction of faults, prediction of fault detection and correction effort, and reliability assessment in the Software Engineering Laboratory environment (SEL) is presented. Fault estimation using empirical relationships and fault prediction using curve fitting method are investigated. Relationships between debugging efforts (fault detection and correction effort) in different test phases are provided, in order to make an early estimate of future debugging effort. This study concludes with the fault analysis, application of a reliability model, and analysis of a normalized metric for reliability assessment and reliability monitoring during development of software.
Measurement and analysis of operating system fault tolerance
NASA Technical Reports Server (NTRS)
Lee, I.; Tang, D.; Iyer, R. K.
1992-01-01
This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.
Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer
NASA Technical Reports Server (NTRS)
Goldberg, J.; Kautz, W. H.; Melliar-Smith, P. M.; Green, M. W.; Levitt, K. N.; Schwartz, R. L.; Weinstock, C. B.
1984-01-01
SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor to processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness.
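The reliability prediction step can be illustrated with a toy pure-death Markov model: each processor fails at a constant rate, failed processors are removed by reconfiguration, and the system survives while enough processors remain to vote. All parameters are illustrative, not SIFT's actual figures.

    # Euler integration of a pure-death Markov chain over the number of
    # good processors; the system fails when fewer than 3 remain to vote.
    import numpy as np

    def reliability(n_proc=5, lam=1e-4, hours=10.0, steps=10000):
        """p[k] = P(exactly k processors still good) after `hours` hours."""
        p = np.zeros(n_proc + 1)
        p[n_proc] = 1.0
        dt = hours / steps
        for _ in range(steps):
            dp = np.zeros_like(p)
            for k in range(1, n_proc + 1):
                rate = k * lam * p[k]
                dp[k] -= rate        # a failure leaves state k...
                dp[k - 1] += rate    # ...and enters state k-1
            p += dt * dp
        return p[3:].sum()           # at least 3 processors remain

    print(f"P(system survives 10 h) ~ {reliability():.6f}")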
Analyzing and Predicting Effort Associated with Finding and Fixing Software Faults
NASA Technical Reports Server (NTRS)
Hamill, Maggie; Goseva-Popstojanova, Katerina
2016-01-01
Context: Software developers spend a significant amount of time fixing faults. However, not many papers have addressed the actual effort needed to fix software faults. Objective: The objective of this paper is twofold: (1) analysis of the effort needed to fix software faults and how it was affected by several factors and (2) prediction of the level of fix implementation effort based on the information provided in software change requests. Method: The work is based on data related to 1200 failures, extracted from the change tracking system of a large NASA mission. The analysis includes descriptive and inferential statistics. Predictions are made using three supervised machine learning algorithms and three sampling techniques aimed at addressing the imbalanced data problem. Results: Our results show that (1) 83% of the total fix implementation effort was associated with only 20% of failures. (2) Both safety critical failures and post-release failures required three times more effort to fix compared to non-critical and pre-release counterparts, respectively. (3) Failures with fixes spread across multiple components or across multiple types of software artifacts required more effort. The spread across artifacts was more costly than spread across components. (4) Surprisingly, some types of faults associated with later life-cycle activities did not require significant effort. (5) The level of fix implementation effort was predicted with 73% overall accuracy using the original, imbalanced data. Using oversampling techniques improved the overall accuracy up to 77%. More importantly, oversampling significantly improved the prediction of the high level effort, from 31% to around 85%. Conclusions: This paper shows the importance of tying software failures to changes made to fix all associated faults, in one or more software components and/or in one or more software artifacts, and the benefit of studying how the spread of faults and other factors affect the fix implementation effort.
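A minimal sketch of the prediction setup, with random oversampling to address the imbalanced effort classes, might look like the following; the features, labels, and classifier choice are placeholders rather than the paper's actual pipeline.

    # Classify fix-effort level from change-request features, oversampling
    # the rare high-effort class (synthetic data for illustration).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.utils import resample

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 6))   # e.g., components touched, artifact types
    y = np.where(rng.random(300) < 0.15, "high", "low")  # imbalanced labels

    # Oversample the minority class to balance the training set.
    hi, lo = X[y == "high"], X[y == "low"]
    hi_up = resample(hi, n_samples=len(lo), random_state=0)
    X_bal = np.vstack([lo, hi_up])
    y_bal = np.array(["low"] * len(lo) + ["high"] * len(hi_up))

    clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
    print("balanced training accuracy:", clf.score(X_bal, y_bal))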
Transient Faults in Computer Systems
NASA Technical Reports Server (NTRS)
Masson, Gerald M.
1993-01-01
A powerful technique particularly appropriate for the detection of errors caused by transient faults in computer systems was developed. The technique can be implemented in either software or hardware; the research conducted thus far primarily considered software implementations. The error detection technique developed has the distinct advantage of having provably complete coverage of all errors caused by transient faults that affect the output produced by the execution of a program. In other words, the technique does not have to be tuned to a particular error model to enhance error coverage. Also, the correctness of the technique can be formally verified. The technique uses time and software redundancy. The foundation for an effective, low-overhead, software-based certification trail approach to real-time error detection resulting from transient fault phenomena was developed.
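The certification-trail idea the paper builds on can be sketched with the textbook sorting example: the primary computation emits a trail (here, a permutation) that lets an independent checker validate the output cheaply. This is a generic illustration, not the developed approach itself.

    # Certification trail for sorting: the checker validates the result via
    # the trail rather than redoing the full computation.
    def sort_with_trail(xs):
        """Primary task: sort, and record the permutation as the trail."""
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        return [xs[i] for i in order], order

    def check(xs, result, trail):
        """Checker: validate the result cheaply using the trail."""
        if sorted(set(trail)) != list(range(len(xs))):  # trail is a permutation
            return False
        if [xs[i] for i in trail] != result:            # trail reproduces result
            return False
        return all(result[i] <= result[i + 1] for i in range(len(result) - 1))

    data = [3, 1, 2]
    out, trail = sort_with_trail(data)
    print(check(data, out, trail))        # True
    print(check(data, [1, 3, 2], trail))  # False: corrupted output is caught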
Copilot: Monitoring Embedded Systems
NASA Technical Reports Server (NTRS)
Pike, Lee; Wegmann, Nis; Niller, Sebastian; Goodloe, Alwyn
2012-01-01
Runtime verification (RV) is a natural fit for ultra-critical systems, where correctness is imperative. In ultra-critical systems, even if the software is fault-free, because of the inherent unreliability of commodity hardware and the adversity of operational environments, processing units (and their hosted software) are replicated, and fault-tolerant algorithms are used to compare the outputs. We investigate both software monitoring in distributed fault-tolerant systems, as well as implementing fault-tolerance mechanisms using RV techniques. We describe the Copilot language and compiler, specifically designed for generating monitors for distributed, hard real-time systems. We also describe two case-studies in which we generated Copilot monitors in avionics systems.
A Generalised Fault Protection Structure Proposed for Uni-grounded Low-Voltage AC Microgrids
NASA Astrophysics Data System (ADS)
Bui, Duong Minh; Chen, Shi-Lin; Lien, Keng-Yu; Jiang, Jheng-Lun
2016-04-01
This paper presents three main configurations of uni-grounded low-voltage AC microgrids. Transient situations of a uni-grounded low-voltage (LV) AC microgrid (MG) are simulated through various fault tests and operation transition tests between grid-connected and islanded modes. Based on transient simulation results, available fault protection methods are proposed for main and back-up protection of a uni-grounded AC microgrid. In addition, concept of a generalised fault protection structure of uni-grounded LVAC MGs is mentioned in the paper. As a result, main contributions of the paper are: (i) definition of different uni-grounded LVAC MG configurations; (ii) analysing transient responses of a uni-grounded LVAC microgrid through line-to-line faults, line-to-ground faults, three-phase faults and a microgrid operation transition test, (iii) proposing available fault protection methods for uni-grounded microgrids, such as: non-directional or directional overcurrent protection, under/over voltage protection, differential current protection, voltage-restrained overcurrent protection, and other fault protection principles not based on phase currents and voltages (e.g. total harmonic distortion detection of currents and voltages, using sequence components of current and voltage, 3I0 or 3V0 components), and (iv) developing a generalised fault protection structure with six individual protection zones to be suitable for different uni-grounded AC MG configurations.
Increasing Reliability of Multiversion Fault-Tolerant Software Design by Modularization
Miyashita, Junryo; Department of Computer Science, California State University at San Bernardino
1988-08-20
... They shall be referred to as "multiversion fault-tolerant software design". One problem of developing multiple versions of a program is the high cost
User's guide to programming fault injection and data acquisition in the SIFT environment
NASA Technical Reports Server (NTRS)
Elks, Carl R.; Green, David F.; Palumbo, Daniel L.
1987-01-01
Described are the features, command language, and functional design of the SIFT (Software Implemented Fault Tolerance) fault injection and data acquisition interface software. The document is also intended to assist and guide the SIFT user in defining, developing, and executing SIFT fault injection experiments and the subsequent collection and reduction of that fault injection data. It is also intended to be used in conjunction with the SIFT User's Guide (NASA Technical Memorandum 86289) for reference to SIFT system commands, procedures and functions, and overall guidance in SIFT system programming.
Protection of Renewable-dominated Microgrids: Challenges and Potential Solutions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elkhatib, Mohamed; Ellis, Abraham; Milan Biswal
Keywords: Microgrid Protection, Impedance Relay, Signal Processing-based Fault Detection, Networked Microgrids, Communication-Assisted Protection
In this report we address the challenge of designing an efficient protection system for inverter-dominated microgrids. These microgrids are characterised by limited fault current capacity as a result of the current-limiting protection functions of inverters. Typically, inverters limit their fault contribution in a sub-cycle time frame to as low as 1.1 per unit. As a result, overcurrent protection could fail completely to detect faults in inverter-dominated microgrids. As part of this project a detailed literature survey of existing and proposed microgrid protection schemes was conducted. The survey concluded that there is a gap in the available microgrid protection methods. The only credible protection solution available in the literature for low-fault inverter-dominated microgrids is the differential protection scheme, which represents a robust transmission-grade protection solution but at a very high cost. Two non-overcurrent protection schemes were investigated as part of this project: impedance-based protection and transient-based protection. Impedance-based protection depends on monitoring impedance trajectories at feeder relays to detect faults. Two communication-based impedance-based protection schemes were developed. The first scheme utilizes directional elements and pilot signals to locate the fault. The second scheme depends on a Central Protection Unit that communicates with all feeder relays to locate the fault based on directional flags received from feeder relays. The latter approach could potentially be adapted to protect networked microgrids and dynamic-topology microgrids. Transient-based protection relies on analyzing high-frequency transients to detect and locate faults. This approach is very promising but its implementation in the field faces several challenges. For example, high-frequency transients due to faults can be confused with transients due to other events such as capacitor switching. Additionally, while detecting faults by analyzing transients is feasible, locating faults based on analyzing transients is still an open question.
The Infeasibility of Experimental Quantification of Life-Critical Software Reliability
NASA Technical Reports Server (NTRS)
Butler, Ricky W.; Finelli, George B.
1991-01-01
This paper affirms that quantification of life-critical software reliability is infeasible using statistical methods, whether applied to standard software or fault-tolerant software. The key assumption of software fault tolerance, that separately programmed versions fail independently, is shown to be problematic. This assumption cannot be justified by experimentation in the ultra-reliability region, and subjective arguments in its favor are not sufficiently strong to justify it as an axiom. Also, the implications of the recent multi-version software experiments support this affirmation.
Experimental analysis of computer system dependability
NASA Technical Reports Server (NTRS)
Iyer, Ravishankar K.; Tang, Dong
1993-01-01
This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERRARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion involves several important issues studied in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
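A toy software-implemented fault injection experiment in the spirit of the surveyed tools (not a reconstruction of FIAT or FINE) can be sketched as follows: flip one bit in the input state of a duplicated computation and classify each outcome as masked or detected.

    # Single-bit-flip injection with duplicate-and-compare detection.
    import random

    def compute(values):
        """Stand-in workload whose output can mask some bit flips."""
        return max(values)

    def trial(values, rng):
        golden = compute(values)
        corrupted = list(values)
        corrupted[rng.randrange(len(corrupted))] ^= 1 << rng.randrange(16)
        run_faulty = compute(corrupted)  # execution hit by the injected fault
        run_clean = compute(values)      # redundant (time-redundant) execution
        if run_faulty == golden:
            return "masked"              # flip never reached the output
        if run_faulty != run_clean:
            return "detected"            # duplicate-and-compare flags the error
        return "silent"                  # unreachable in this toy example

    rng = random.Random(1)
    counts = {"masked": 0, "detected": 0, "silent": 0}
    for _ in range(1000):
        counts[trial([10, 20, 30, 40], rng)] += 1
    print(counts)  # a mix of masked and detected outcomes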
Fault Management Techniques in Human Spaceflight Operations
NASA Technical Reports Server (NTRS)
O'Hagan, Brian; Crocker, Alan
2006-01-01
This paper discusses human spaceflight fault management operations. Fault detection and response capabilities available in current US human spaceflight programs Space Shuttle and International Space Station are described while emphasizing system design impacts on operational techniques and constraints. Preflight and inflight processes along with products used to anticipate, mitigate and respond to failures are introduced. Examples of operational products used to support failure responses are presented. Possible improvements in the state of the art, as well as prioritization and success criteria for their implementation are proposed. This paper describes how the architecture of a command and control system impacts operations in areas such as the required fault response times, automated vs. manual fault responses, use of workarounds, etc. The architecture includes the use of redundancy at the system and software function level, software capabilities, use of intelligent or autonomous systems, number and severity of software defects, etc. This in turn drives which Caution and Warning (C&W) events should be annunciated, C&W event classification, operator display designs, crew training, flight control team training, and procedure development. Other factors impacting operations are the complexity of a system, skills needed to understand and operate a system, and the use of commonality vs. optimized solutions for software and responses. Fault detection, annunciation, safing responses, and recovery capabilities are explored using real examples to uncover underlying philosophies and constraints. These factors directly impact operations in that the crew and flight control team need to understand what happened, why it happened, what the system is doing, and what, if any, corrective actions they need to perform. If a fault results in multiple C&W events, or if several faults occur simultaneously, the root cause(s) of the fault(s), as well as their vehicle-wide impacts, must be determined in order to maintain situational awareness. This allows both automated and manual recovery operations to focus on the real cause of the fault(s). An appropriate balance must be struck between correcting the root cause failure and addressing the impacts of that fault on other vehicle components. Lastly, this paper presents a strategy for using lessons learned to improve the software, displays, and procedures in addition to determining what is a candidate for automation. Enabling technologies and techniques are identified to promote system evolution from one that requires manual fault responses to one that uses automation and autonomy where they are most effective. These considerations include the value in correcting software defects in a timely manner, automation of repetitive tasks, making time critical responses autonomous, etc. The paper recommends the appropriate use of intelligent systems to determine the root causes of faults and correctly identify separate unrelated faults.
Software fault tolerance in computer operating systems
NASA Technical Reports Server (NTRS)
Iyer, Ravishankar K.; Lee, Inhwan
1994-01-01
This chapter provides data and analysis of the dependability and fault tolerance for three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of defects in system software that result in processor failures. The loose coupling between processors which results in the backup execution (the processor state and the sequence of events occurring) being different from the original execution is a major reason for the measured software fault tolerance. The IBM/MVS system fault tolerance almost doubles when recovery routines are provided, in comparison to the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.
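The process-pair mechanism measured in the Tandem study can be caricatured in a few lines: a primary serves requests and checkpoints state to a backup, which takes over on failure rather than re-executing the exact failing state. The class, names, and fault trigger below are invented for illustration.

    # Process-pair sketch: backup takeover from the last checkpoint. The
    # backup's divergent state, not lock-step replication, is what lets it
    # survive many software (not just hardware) faults.
    class ProcessPair:
        def __init__(self, work):
            self.work = work
            self.checkpoint = None

        def request(self, x):
            try:
                result = self.work(x)
                self.checkpoint = result      # checkpoint sent to the backup
                return result
            except Exception:
                # Backup takes over from the last checkpoint instead of
                # re-executing the exact failing state.
                return self.checkpoint

    def flaky(x):
        if x == 13:                           # a latent software fault
            raise RuntimeError("defect triggered")
        return x * 2

    pair = ProcessPair(flaky)
    print(pair.request(5))    # 10, checkpointed
    print(pair.request(13))   # falls back to the last checkpoint: 10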
Diagnostics Tools Identify Faults Prior to Failure
NASA Technical Reports Server (NTRS)
2013-01-01
Through the SBIR program, Rochester, New York-based Impact Technologies LLC collaborated with Ames Research Center to commercialize the Center's Hybrid Diagnostic Engine, or HyDE, software. The fault detecting program is now incorporated into a software suite that identifies potential faults early in the design phase of systems ranging from printers to vehicles and robots, saving time and money.
Hardware Fault Simulator for Microprocessors
NASA Technical Reports Server (NTRS)
Hess, L. M.; Timoc, C. C.
1983-01-01
Breadboarded circuit is faster and more thorough than software simulator. Elementary fault simulator for AND gate uses three gates and a shift register to simulate stuck-at-one or stuck-at-zero conditions at inputs and output. Experimental results showed the hardware fault simulator for a microprocessor gave faster results than a software simulator, by two orders of magnitude, with one test being applied every 4 microseconds.
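In software form, the elementary stuck-at simulation described above amounts to comparing a fault-free gate against each faulty variant over all input vectors; the sketch below does this for a single AND gate.

    # Stuck-at fault simulation for one AND gate: list which input vectors
    # detect each fault (output differs from the fault-free gate).
    from itertools import product

    def and_gate(a, b, fault=None):
        """fault: None, or (node, stuck_value) with node in {'a','b','out'}."""
        if fault:
            node, v = fault
            if node == "a": a = v
            if node == "b": b = v
        out = a & b
        if fault and fault[0] == "out":
            out = fault[1]
        return out

    faults = [(n, v) for n in ("a", "b", "out") for v in (0, 1)]
    for fault in faults:
        detected_by = [(a, b) for a, b in product((0, 1), repeat=2)
                       if and_gate(a, b) != and_gate(a, b, fault)]
        print(f"stuck-at {fault}: detected by inputs {detected_by}")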
Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Engelmann, Christian; Hukerikar, Saurabh
2017-09-01
Reliability is a serious concern for future extreme-scale high-performance computing (HPC) systems. Projections based on the current generation of HPC systems and technology roadmaps suggest the prevalence of very high fault rates in future systems. While the HPC community has developed various resilience solutions, application-level techniques as well as system-based solutions, the solution space remains fragmented. There are no formal methods and metrics to integrate the various HPC resilience techniques into composite solutions, nor are there methods to holistically evaluate the adequacy and efficacy of such solutions in terms of their protection coverage, and their performance and power efficiency characteristics. Additionally, few of the current approaches are portable to newer architectures and software environments that will be deployed on future systems. In this paper, we develop a structured approach to the design, evaluation and optimization of HPC resilience using the concept of design patterns. A design pattern is a general repeatable solution to a commonly occurring problem. We identify the problems caused by various types of faults, errors and failures in HPC systems and the techniques used to deal with these events. Each well-known solution that addresses a specific HPC resilience challenge is described in the form of a pattern. We develop a complete catalog of such resilience design patterns, which may be used by system architects, system software and tools developers, application programmers, as well as users and operators as essential building blocks when designing and deploying resilience solutions. We also develop a design framework that enhances a designer's understanding of the opportunities for integrating multiple patterns across layers of the system stack and the important constraints during implementation of the individual patterns. It is also useful for defining mechanisms and interfaces to coordinate flexible fault management across hardware and software components. The resilience patterns and the design framework also enable exploration and evaluation of design alternatives and support optimization of the cost-benefit trade-offs among performance, protection coverage, and power consumption of resilience solutions. The overall goal of this work is to establish a systematic methodology for the design and evaluation of resilience technologies in extreme-scale HPC systems that keep scientific applications running to a correct solution in a timely and cost-efficient manner despite frequent faults, errors, and failures of various types.
Protection of Renewable-dominated Microgrids: Challenges and Potential Solutions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elkhatib, Mohamed; Ellis, Abraham; Biswal, Milan
In this report we address the challenge of designing an efficient protection system for inverter-dominated microgrids. These microgrids are characterized by limited fault current capacity as a result of the current-limiting protection functions of inverters. Typically, inverters limit their fault contribution within a sub-cycle time frame to as low as 1.1 per unit. As a result, overcurrent protection can fail completely to detect faults in inverter-dominated microgrids. As part of this project, a detailed literature survey of existing and proposed microgrid protection schemes was conducted. The survey concluded that there is a gap in the available microgrid protection methods: the only credible protection solution available in the literature for low-fault inverter-dominated microgrids is the differential protection scheme, which represents a robust transmission-grade solution but at a very high cost. Two non-overcurrent protection schemes were investigated as part of this project: impedance-based protection and transient-based protection. Impedance-based protection depends on monitoring impedance trajectories at feeder relays to detect faults. Two communication-based impedance-based protection schemes were developed. The first scheme utilizes directional elements and pilot signals to locate the fault. The second scheme depends on a central protection unit that communicates with all feeder relays to locate the fault based on directional flags received from the feeder relays. The latter approach could potentially be adapted to protect networked microgrids and dynamic-topology microgrids. Transient-based protection relies on analyzing high-frequency transients to detect and locate faults. This approach is very promising, but its implementation in the field faces several challenges. For example, high-frequency transients due to faults can be confused with transients due to other events such as capacitor switching. Additionally, while detecting faults by analyzing transients may be feasible, locating faults based on analyzing transients is still an open question.
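As a rough illustration of the impedance-trajectory idea described above, the sketch below flags a fault when the apparent impedance seen by a feeder relay collapses and stays low. The phasor inputs, trip threshold, and persistence window are all hypothetical placeholders; a real scheme would add the directional elements and pilot signalling the report describes.

```python
import numpy as np

def impedance_relay(v_phasors, i_phasors, z_trip=0.5, window=3):
    """Flag a fault when the measured impedance magnitude |V/I| stays
    below a trip threshold for `window` consecutive samples."""
    z = np.abs(np.asarray(v_phasors) / np.asarray(i_phasors))
    below = z < z_trip
    for k in range(len(below) - window + 1):
        if below[k:k + window].all():
            return k  # index where the trip decision is made
    return None

# Load impedance ~2 pu, collapsing toward the fault point after sample 4.
v = [1.0, 1.0, 1.0, 1.0, 0.3, 0.25, 0.24, 0.24]
i = [0.5, 0.5, 0.5, 0.5, 1.0, 1.05, 1.1, 1.1]
print(impedance_relay(v, i))  # -> 4
```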
NASA Technical Reports Server (NTRS)
Dunham, J. R. (Editor); Knight, J. C. (Editor)
1982-01-01
The state of the art in the production of crucial software for flight control applications was addressed. The association between reliability metrics and software is considered. Thirteen software development projects are discussed. A short-term need for research in the areas of tool development and software fault tolerance was indicated. For the long term, research in formal verification or proof methods was recommended. Formal specification and software reliability modeling were recommended as topics for both short- and long-term research.
Algorithm-Based Fault Tolerance for Numerical Subroutines
NASA Technical Reports Server (NTRS)
Tumon, Michael; Granat, Robert; Lou, John
2007-01-01
A software library implements a new methodology of detecting faults in numerical subroutines, thus enabling application programs that contain the subroutines to recover transparently from single-event upsets. The software library in question is fault-detecting middleware that is wrapped around the numerical subroutines. Conventional serial versions (based on LAPACK and FFTW) and a parallel version (based on ScaLAPACK) exist. The source code of the application program that contains the numerical subroutines is not modified, and the middleware is transparent to the user. The methodology used is a type of algorithm-based fault tolerance (ABFT). In ABFT, a checksum is computed before a computation and compared with the checksum of the computational result; an error is declared if the difference between the checksums exceeds some threshold. Novel normalization methods are used in the checksum comparison to ensure correct fault detections independent of algorithm inputs. In tests of this software reported in the peer-reviewed literature, this library was shown to enable detection of 99.9 percent of significant faults while generating no false alarms.
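A minimal sketch of the ABFT checksum test for a matrix multiply, with a simplified normalization standing in for the library's (unspecified) normalization methods: verifying c = a @ b through checksums costs O(n^2) instead of an O(n^3) recomputation.

```python
import numpy as np

def abft_verify(a, b, c, tol=1e-8):
    """Check c == a @ b via checksums: the column sums of the product must
    equal (column-sum vector of a) @ b."""
    expected = a.sum(axis=0) @ b
    err = np.abs(expected - c.sum(axis=0))
    scale = np.maximum(np.abs(expected), 1.0)  # normalize: input-independent test
    return bool((err / scale < tol).all())

a, b = np.random.rand(8, 8), np.random.rand(8, 8)
c = a @ b
print(abft_verify(a, b, c))   # True: no fault
c[3, 5] += 1e-3               # simulate a single-event upset in the result
print(abft_verify(a, b, c))   # False: fault detected
```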
Applications of Logic Coverage Criteria and Logic Mutation to Software Testing
ERIC Educational Resources Information Center
Kaminski, Garrett K.
2011-01-01
Logic is an important component of software. Thus, software logic testing has enjoyed significant research over a period of decades, with renewed interest in the last several years. One approach to detecting logic faults is to create and execute tests that satisfy logic coverage criteria. Another approach to detecting faults is to perform mutation…
Attitude control fault protection - The Voyager experience
NASA Technical Reports Server (NTRS)
Litty, E. C.
1980-01-01
The length of the Voyager mission and the communication delay caused by the distances involved made fault protection a necessary part of the Voyager Attitude and Articulation Control Subsystem (AACS) design. An overview of the Voyager attitude control fault protection is given and flight experiences relating to fault protection are provided.
Runtime Verification in Context: Can Optimizing Error Detection Improve Fault Diagnosis?
NASA Technical Reports Server (NTRS)
Dwyer, Matthew B.; Purandare, Rahul; Person, Suzette
2010-01-01
Runtime verification has primarily been developed and evaluated as a means of enriching the software testing process. While many researchers have pointed to its potential applicability in online approaches to software fault tolerance, there has been a dearth of work exploring the details of how that might be accomplished. In this paper, we describe how a component-oriented approach to software health management exposes the connections between program execution, error detection, fault diagnosis, and recovery. We identify both research challenges and opportunities in exploiting those connections. Specifically, we describe how recent approaches to reducing the overhead of runtime monitoring aimed at error detection might be adapted to reduce the overhead and improve the effectiveness of fault diagnosis.
Predeployment validation of fault-tolerant systems through software-implemented fault insertion
NASA Technical Reports Server (NTRS)
Czeck, Edward W.; Siewiorek, Daniel P.; Segall, Zary Z.
1989-01-01
Fault injection-based automated testing (FIAT) environment, which can be used to experimentally characterize and evaluate distributed realtime systems under fault-free and faulted conditions is described. A survey is presented of validation methodologies. The need for fault insertion based on validation methodologies is demonstrated. The origins and models of faults, and motivation for the FIAT concept are reviewed. FIAT employs a validation methodology which builds confidence in the system through first providing a baseline of fault-free performance data and then characterizing the behavior of the system with faults present. Fault insertion is accomplished through software and allows faults or the manifestation of faults to be inserted by either seeding faults into memory or triggering error detection mechanisms. FIAT is capable of emulating a variety of fault-tolerant strategies and architectures, can monitor system activity, and can automatically orchestrate experiments involving insertion of faults. There is a common system interface which allows ease of use to decrease experiment development and run time. Fault models chosen for experiments on FIAT have generated system responses which parallel those observed in real systems under faulty conditions. These capabilities are shown by two example experiments each using a different fault-tolerance strategy.
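A toy illustration of software-implemented fault insertion in the FIAT spirit: seed a fault by flipping one bit of a value "in memory" and let an acceptance test act as the error detection mechanism. The bit position and the detection test are arbitrary choices for the example.

```python
import struct

def flip_bit(value, bit):
    """Seed a fault by flipping one bit in a float's IEEE-754 image,
    mimicking fault insertion into memory."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (out,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return out

def checksummed_sum(xs, expected):
    """A toy detection mechanism: an acceptance test on the result."""
    total = sum(xs)
    return total, abs(total - expected) < 1e-6

data = [1.0, 2.0, 3.0, 4.0]
baseline, ok = checksummed_sum(data, expected=10.0)   # fault-free baseline run
data[2] = flip_bit(data[2], 62)                       # inject into "memory"
faulty, ok_after = checksummed_sum(data, expected=10.0)
print(ok, ok_after)   # True False -> fault manifested and was detected
```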
Advanced information processing system: Fault injection study and results
NASA Technical Reports Server (NTRS)
Burkhardt, Laura F.; Masotto, Thomas K.; Lala, Jaynarayan H.
1992-01-01
The objective of the AIPS program is to achieve a validated fault tolerant distributed computer system. The goals of the AIPS fault injection study were: (1) to present the fault injection study components addressing the AIPS validation objective; (2) to obtain feedback for fault removal from the design implementation; (3) to obtain statistical data regarding fault detection, isolation, and reconfiguration responses; and (4) to obtain data regarding the effects of faults on system performance. The parameters are described that must be varied to create a comprehensive set of fault injection tests, the subset of test cases selected, the test case measurements, and the test case execution. Both pin level hardware faults using a hardware fault injector and software injected memory mutations were used to test the system. An overview is provided of the hardware fault injector and the associated software used to carry out the experiments. Detailed specifications are given of fault and test results for the I/O Network and the AIPS Fault Tolerant Processor, respectively. The results are summarized and conclusions are given.
NASA Astrophysics Data System (ADS)
Litinski, Daniel; Kesselring, Markus S.; Eisert, Jens; von Oppen, Felix
2017-07-01
We present a scalable architecture for fault-tolerant topological quantum computation using networks of voltage-controlled Majorana Cooper pair boxes and topological color codes for error correction. Color codes have a set of transversal gates which coincides with the set of topologically protected gates in Majorana-based systems, namely, the Clifford gates. In this way, we establish color codes as providing a natural setting in which advantages offered by topological hardware can be combined with those arising from topological error-correcting software for full-fledged fault-tolerant quantum computing. We provide a complete description of our architecture, including the underlying physical ingredients. We start by showing that in topological superconductor networks, hexagonal cells can be employed to serve as physical qubits for universal quantum computation, and we present protocols for realizing topologically protected Clifford gates. These hexagonal-cell qubits allow for a direct implementation of open-boundary color codes with ancilla-free syndrome read-out and logical T gates via magic-state distillation. For concreteness, we describe how the necessary operations can be implemented using networks of Majorana Cooper pair boxes, and we give a feasibility estimate for error correction in this architecture. Our approach is motivated by nanowire-based networks of topological superconductors, but it could also be realized in alternative settings such as quantum-Hall-superconductor hybrids.
Experience report: Using formal methods for requirements analysis of critical spacecraft software
NASA Technical Reports Server (NTRS)
Lutz, Robyn R.; Ampo, Yoko
1994-01-01
Formal specification and analysis of requirements continues to gain support as a method for producing more reliable software. However, the introduction of formal methods to a large software project is difficult, due in part to the unfamiliarity of the specification languages and the lack of graphics. This paper reports results of an investigation into the effectiveness of formal methods as an aid to the requirements analysis of critical, system-level fault-protection software on a spacecraft currently under development. Our experience indicates that formal specification and analysis can enhance the accuracy of the requirements and add assurance prior to design development in this domain. The work described here is part of a larger, NASA-funded research project whose purpose is to use formal-methods techniques to improve the quality of software in space applications. The demonstration project described here is part of the effort to evaluate experimentally the effectiveness of supplementing traditional engineering approaches to requirements specification with the more rigorous specification and analysis available with formal methods.
Health management and controls for Earth-to-orbit propulsion systems
NASA Astrophysics Data System (ADS)
Bickford, R. L.
1995-03-01
Avionics and health management technologies increase the safety and reliability while decreasing the overall cost for Earth-to-orbit (ETO) propulsion systems. New ETO propulsion systems will depend on highly reliable fault tolerant flight avionics, advanced sensing systems and artificial intelligence aided software to ensure critical control, safety and maintenance requirements are met in a cost effective manner. Propulsion avionics consist of the engine controller, actuators, sensors, software and ground support elements. In addition to control and safety functions, these elements perform system monitoring for health management. Health management is enhanced by advanced sensing systems and algorithms which provide automated fault detection and enable adaptive control and/or maintenance approaches. Aerojet is developing advanced fault tolerant rocket engine controllers which provide very high levels of reliability. Smart sensors and software systems which significantly enhance fault coverage and enable automated operations are also under development. Smart sensing systems, such as flight capable plume spectrometers, have reached maturity in ground-based applications and are suitable for bridging to flight. Software to detect failed sensors has reached similar maturity. This paper will discuss fault detection and isolation for advanced rocket engine controllers as well as examples of advanced sensing systems and software which significantly improve component failure detection for engine system safety and health management.
Verification of an IGBT Fusing Switch for Over-current Protection of the SNS HVCM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Benwell, Andrew; Kemp, Mark; Burkhart, Craig
2010-06-11
An IGBT-based over-current protection system has been developed to detect faults and limit the damage caused by faults in high voltage converter modulators. During normal operation, an IGBT enables energy to be transferred from storage capacitors to an H-bridge. When a fault occurs, the over-current protection system detects the fault, limits the fault current, and opens the IGBT to isolate the remaining stored energy from the fault. This paper presents an experimental verification of the over-current protection system under applicable conditions.
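A schematic of the detect-and-open logic, for illustration only; the per-unit threshold and persistence count here are invented, not the SNS HVCM settings.

```python
def overcurrent_guard(samples, limit=1.2, persist=3):
    """Open the IGBT when current exceeds `limit` (per-unit) for `persist`
    consecutive samples."""
    over = 0
    for k, i in enumerate(samples):
        over = over + 1 if i > limit else 0
        if over >= persist:
            return k          # sample index at which the switch opens
    return None

normal = [1.0, 1.05, 0.98, 1.1, 1.0]
fault  = [1.0, 1.05, 2.4, 3.1, 3.3, 3.2]
print(overcurrent_guard(normal))  # None -> stays closed
print(overcurrent_guard(fault))   # 4 -> open, isolate the stored energy
```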
Diagnostic Analyzer for Gearboxes (DAG): User's Guide. Version 3.1 for Microsoft Windows 3.1
NASA Technical Reports Server (NTRS)
Jammu, Vinay B.; Kourosh, Danai
1997-01-01
This documentation describes the Diagnostic Analyzer for Gearboxes (DAG) software for performing fault diagnosis of gearboxes. First, the user would construct a graphical representation of the gearbox using the gear, bearing, shaft, and sensor tools contained in the DAG software. Next, a set of vibration features obtained by processing the vibration signals recorded from the gearbox using a signal analyzer is required. Given this information, the DAG software uses an unsupervised neural network referred to as the Fault Detection Network (FDN) to identify the occurrence of faults, and a pattern classifier called Single Category-Based Classifier (SCBC) for abnormality scaling of individual vibration features. The abnormality-scaled vibration features are then used as inputs to a Structure-Based Connectionist Network (SBCN) for identifying faults in gearbox subsystems and components. The weights of the SBCN represent its diagnostic knowledge and are derived from the structure of the gearbox graphically presented in DAG. The outputs of SBCN are fault possibility values between 0 and 1 for individual subsystems and components in the gearbox with a 1 representing a definite fault and a 0 representing normality. This manual describes the steps involved in creating the diagnostic gearbox model, along with the options and analysis tools of the DAG software.
A Voyager attitude control perspective on fault tolerant systems
NASA Technical Reports Server (NTRS)
Rasmussen, R. D.; Litty, E. C.
1981-01-01
In current spacecraft design, a trend can be observed toward achieving greater fault tolerance through the application of on-board software dedicated to detecting and isolating failures. Whether fault tolerance through software can meet the desired objectives depends on very careful consideration and control of the system in which the software is embedded. The investigation described here aims to provide some of the insight needed for the required analysis of the system. A description is given of the techniques developed in this connection during the development of the Voyager spacecraft. The Voyager Galileo Attitude and Articulation Control Subsystem (AACS) fault tolerant design is discussed to emphasize basic lessons learned from this experience. The central driver of hardware redundancy implementation on Voyager was known as the 'single point failure criterion'.
Wavelet Based Protection Scheme for Multi Terminal Transmission System with PV and Wind Generation
NASA Astrophysics Data System (ADS)
Manju Sree, Y.; Goli, Ravi kumar; Ramaiah, V.
2017-08-01
A hybrid generation system is part of a larger power system in which a number of sources, usually attached to power electronic converters, and clustered loads can operate independently of the main power system. Protection against faults is a crucial challenge, since the fault currents in this mode of operation pose significant problems for traditional overcurrent protection. This paper adopts a new approach to the detection and discrimination of faults for multi-terminal transmission line protection in the presence of hybrid generation. A transient-current-based protection scheme is developed with the discrete wavelet transform. Fault indices of all phase currents at all terminals are obtained by analyzing the detail coefficients of the current signals using the bior1.5 mother wavelet. The scheme is tested for different types of faults and is found effective for the detection and discrimination of faults over a range of fault inception angles and fault impedances.
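A small sketch of the wavelet fault-index computation, assuming the PyWavelets package is available; the test signal, decomposition level, and the way the index is formed are simplified stand-ins for the paper's scheme.

```python
import numpy as np
import pywt  # PyWavelets

def fault_index(current, wavelet="bior1.5", level=1):
    """Sum of squared detail coefficients of a phase current; a fault's
    high-frequency transient inflates this index (threshold is ad hoc)."""
    coeffs = pywt.wavedec(current, wavelet, level=level)
    return float(sum(np.sum(d ** 2) for d in coeffs[1:]))

t = np.linspace(0, 0.1, 1000)
healthy = np.sin(2 * np.pi * 50 * t)
faulted = healthy.copy()
faulted[500:] *= 6.0            # crude step change at fault inception
print(fault_index(healthy), fault_index(faulted))
# The faulted index is markedly larger; comparing per-phase indices at each
# terminal is what lets the scheme discriminate the faulted phase and section.
```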
Ground Software Maintenance Facility (GSMF) system manual
NASA Technical Reports Server (NTRS)
Derrig, D.; Griffith, G.
1986-01-01
The Ground Software Maintenance Facility (GSMF) is designed to support development and maintenance of Spacelab ground support software. The GSMF consists of a Perkin Elmer 3250 (host computer) and a MITRA 125s (ATE computer), with appropriate interface devices and software to simulate the Electrical Ground Support Equipment (EGSE). This document is presented in three sections: (1) GSMF Overview; (2) Software Structure; and (3) Fault Isolation Capability. The overview contains information on hardware and software organization along with their corresponding block diagrams. The Software Structure section describes the modes of software structure, including source files, link information, and database files. The Fault Isolation section describes the capabilities of the Ground Computer Interface Device, the Perkin Elmer host, and the MITRA ATE.
Detection of faults and software reliability analysis
NASA Technical Reports Server (NTRS)
Knight, John C.
1987-01-01
Multi-version or N-version programming is proposed as a method of providing fault tolerance in software. The approach requires the separate, independent preparation of multiple versions of a piece of software for some application. These versions are executed in parallel in the application environment; each receives identical inputs and each produces its version of the required outputs. The outputs are collected by a voter and, in principle, they should all be the same. In practice there may be some disagreement. If this occurs, the results of the majority are taken to be the correct output, and that is the output used by the system. A total of 27 programs were produced. Each of these programs was then subjected to one million randomly-generated test cases. The experiment yielded a number of programs containing faults that are useful for general studies of software reliability as well as studies of N-version programming. Fault tolerance through data diversity and analytic models of comparison testing are discussed.
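The voter at the heart of N-version programming can be sketched in a few lines. This toy version votes on exact equality; as the abstract's "disagreement" caveat suggests, practical voters for floating-point outputs compare within a tolerance instead.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the majority output of N independently developed versions;
    raise if no majority exists."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority: versions disagree")
    return value

print(majority_vote([42, 42, 41]))   # -> 42, the faulty version is outvoted
```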
Automated Fault Interpretation and Extraction using Improved Supplementary Seismic Datasets
NASA Astrophysics Data System (ADS)
Bollmann, T. A.; Shank, R.
2017-12-01
During the interpretation of seismic volumes, it is necessary to interpret faults along with horizons of interest. With the improvement of technology, the interpretation of faults can be expedited with the aid of different algorithms that create supplementary seismic attributes, such as semblance and coherency. These products highlight discontinuities, but still need a large amount of human interaction to interpret faults and are plagued by noise and stratigraphic discontinuities. Hale (2013) presents a method to improve on these datasets by creating what is referred to as a Fault Likelihood volume. In general, these volumes contain less noise and do not emphasize stratigraphic features. Instead, planar features within a specified strike and dip range are highlighted. Once a satisfactory Fault Likelihood Volume is created, extraction of fault surfaces is much easier. The extracted fault surfaces are then exported to interpretation software for QC. Numerous software packages have implemented this methodology with varying results. After investigating these platforms, we developed a preferred Automated Fault Interpretation workflow.
Measurement of fault latency in a digital avionic miniprocessor
NASA Technical Reports Server (NTRS)
Mcgough, J. G.; Swern, F. L.
1981-01-01
The results of fault injection experiments utilizing a gate-level emulation of the central processor unit of the Bendix BDX-930 digital computer are presented. The failure detection coverage of comparison-monitoring and a typical avionics CPU self-test program was determined. The specific tasks and experiments included: (1) inject randomly selected gate-level and pin-level faults and emulate six software programs using comparison-monitoring to detect the faults; (2) based upon the derived empirical data develop and validate a model of fault latency that will forecast a software program's detecting ability; (3) given a typical avionics self-test program, inject randomly selected faults at both the gate-level and pin-level and determine the proportion of faults detected; (4) determine why faults were undetected; (5) recommend how the emulation can be extended to multiprocessor systems such as SIFT; and (6) determine the proportion of faults detected by a uniprocessor BIT (built-in-test) irrespective of self-test.
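Task (2), forecasting a program's detecting ability, rests on measuring fault latency. A minimal model of comparison-monitoring latency measurement, with made-up traces, looks like this:

```python
def fault_latency(golden_trace, faulty_trace, inject_step):
    """Comparison monitoring: run the same program on a good and a faulted
    unit and report how many steps after injection the outputs first
    diverge (None = fault never detected, i.e., latent)."""
    for step, (g, f) in enumerate(zip(golden_trace, faulty_trace)):
        if step >= inject_step and g != f:
            return step - inject_step
    return None

golden = [0, 1, 2, 3, 4, 5, 6]
faulty = [0, 1, 2, 3, 4, 9, 8]   # fault injected at step 3, visible at 5
print(fault_latency(golden, faulty, inject_step=3))  # -> 2
```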
Criteria for software modularization
NASA Technical Reports Server (NTRS)
Card, David N.; Page, Gerald T.; Mcgarry, Frank E.
1985-01-01
A central issue in programming practice involves determining the appropriate size and information content of a software module. This study attempted to determine the effectiveness of two widely used criteria for software modularization, strength and size, in reducing fault rate and development cost. Data from 453 FORTRAN modules developed by professional programmers were analyzed. The results indicated that module strength is a good criterion with respect to fault rate, whereas arbitrary module size limitations inhibit programmer productivity. This analysis is a first step toward defining empirically based standards for software modularization.
NASA Technical Reports Server (NTRS)
Smith, T. B., III; Lala, J. H.
1984-01-01
The FTMP architecture is a high reliability computer concept modeled after a homogeneous multiprocessor architecture. Elements of the FTMP are operated in tight synchronism with one another and hardware fault-detection and fault-masking is provided which is transparent to the software. Operating system design and user software design is thus greatly simplified. Performance of the FTMP is also comparable to that of a simplex equivalent due to the efficiency of fault handling hardware. The FTMP project constructed an engineering module of the FTMP, programmed the machine and extensively tested the architecture through fault injection and other stress testing. This testing confirmed the soundness of the FTMP concepts.
Fault Analysis in Solar Photovoltaic Arrays
NASA Astrophysics Data System (ADS)
Zhao, Ye
Fault analysis in solar photovoltaic (PV) arrays is a fundamental task to increase reliability, efficiency and safety in PV systems. Conventional fault protection methods usually add fuses or circuit breakers in series with PV components. But these protection devices are only able to clear faults and isolate faulty circuits if they carry a large fault current. However, this research shows that faults in PV arrays may not be cleared by fuses under some fault scenarios, due to the current-limiting nature and non-linear output characteristics of PV arrays. First, this thesis introduces new simulation and analytic models that are suitable for fault analysis in PV arrays. Based on the simulation environment, this thesis studies a variety of typical faults in PV arrays, such as ground faults, line-line faults, and mismatch faults. The effect of a maximum power point tracker on fault current is discussed and shown to, at times, prevent the fault current protection devices from tripping. A small-scale experimental PV benchmark system has been developed at Northeastern University to further validate the simulation conclusions. Additionally, this thesis examines two types of unique faults found in a PV array that have not been studied in the literature. One is a fault that occurs under low irradiance conditions. The other is a fault evolution in a PV array during night-to-day transition. Our simulation and experimental results show that overcurrent protection devices are unable to clear the fault under "low irradiance" and "night-to-day transition" conditions. However, the overcurrent protection devices may work properly when the same PV fault occurs in daylight. As a result, a fault under "low irradiance" or "night-to-day transition" might be hidden in the PV array and become a potential hazard for system efficiency and reliability.
Path Searching Based Fault Automated Recovery Scheme for Distribution Grid with DG
NASA Astrophysics Data System (ADS)
Xia, Lin; Qun, Wang; Hui, Xue; Simeng, Zhu
2016-12-01
Applying a path-searching method based on distribution network topology has proved effective in protection-setting software, and a path-searching method that accounts for DG power sources is also applicable to the automatic generation and division of planned islands after a fault. This paper applies a path-searching algorithm to the automatic division of planned islands after faults: starting from the fault-isolation switch and ending at each power source, and according to the line loads that the search path traverses and the important loads integrated along the optimized search path, an optimized division scheme of planned islands is formed that uses each DG as a power source and is balanced against the local important loads. Finally, the COBASE software and the distribution network automation software in use are applied to illustrate the effectiveness of the automatic restoration program.
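A sketch of the island-division search under stated assumptions: the feeder is modeled as an adjacency list with lumped node loads and a single DG capacity limit, and the paper's prioritization of important loads is not reproduced.

```python
from collections import deque

def plan_island(graph, loads, dg_node, dg_capacity, blocked):
    """Grow a planned island outward from a DG by breadth-first search,
    absorbing node loads until the DG capacity is reached; `blocked` holds
    the opened fault-isolation switches (edges) that bound the search."""
    island, served = {dg_node}, loads.get(dg_node, 0.0)
    queue = deque([dg_node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            edge = frozenset((u, v))
            if v in island or edge in blocked:
                continue
            if served + loads.get(v, 0.0) > dg_capacity:
                continue  # skip loads the DG cannot carry
            island.add(v)
            served += loads.get(v, 0.0)
            queue.append(v)
    return island, served

feeder = {"DG": ["n1"], "n1": ["DG", "n2", "n3"], "n2": ["n1"], "n3": ["n1"]}
loads = {"n1": 0.2, "n2": 0.5, "n3": 0.6}
blocked = {frozenset(("n1", "n3"))}   # opened switch isolating the fault
print(plan_island(feeder, loads, "DG", dg_capacity=0.8, blocked=blocked))
# -> ({'DG', 'n1', 'n2'}, 0.7): a balanced island behind the fault
```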
Software reliability through fault-avoidance and fault-tolerance
NASA Technical Reports Server (NTRS)
Vouk, Mladen A.; Mcallister, David F.
1992-01-01
Accomplishments in the following research areas are summarized: structure based testing, reliability growth, and design testability with risk evaluation; reliability growth models and software risk management; and evaluation of consensus voting, consensus recovery block, and acceptance voting. Four papers generated during the reporting period are included as appendices.
Formal Validation of Fault Management Design Solutions
NASA Technical Reports Server (NTRS)
Gibson, Corrina; Karban, Robert; Andolfato, Luigi; Day, John
2013-01-01
The work presented in this paper describes an approach used to develop SysML modeling patterns to express the behavior of fault protection, test the model's logic by performing fault injection simulations, and verify the fault protection system's logical design via model checking. A representative example, using a subset of the fault protection design for the Soil Moisture Active-Passive (SMAP) system, was modeled with SysML State Machines and JavaScript as Action Language. The SysML model captures interactions between relevant system components and system behavior abstractions (mode managers, error monitors, fault protection engine, and devices/switches). Development of a method to implement verifiable and lightweight executable fault protection models enables future missions to have access to larger fault test domains and verifiable design patterns. A tool-chain to transform the SysML model to jpf-Statechart compliant Java code and then verify the generated code via model checking was established. Conclusions and lessons learned from this work are also described, as well as potential avenues for further research and development.
Detection of faults and software reliability analysis
NASA Technical Reports Server (NTRS)
Knight, J. C.
1987-01-01
Specific topics briefly addressed include: the consistent comparison problem in N-version system; analytic models of comparison testing; fault tolerance through data diversity; and the relationship between failures caused by automatically seeded faults.
A Fault Oblivious Extreme-Scale Execution Environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKie, Jim
The FOX project, funded under the ASCR X-stack I program, developed systems software and runtime libraries for a new approach to the data and work distribution for massively parallel, fault oblivious application execution. Our work was motivated by the premise that exascale computing systems will provide a thousand-fold increase in parallelism and a proportional increase in failure rate relative to today's machines. To deliver the capability of exascale hardware, the systems software must provide the infrastructure to support existing applications while simultaneously enabling efficient execution of new programming models that naturally express dynamic, adaptive, irregular computation; coupled simulations; and massive data analysis in a highly unreliable hardware environment with billions of threads of execution. Our OS research has prototyped new methods to provide efficient resource sharing, synchronization, and protection in a many-core compute node. We have experimented with alternative task/dataflow programming models and shown scalability in some cases to hundreds of thousands of cores. Much of our software is in active development through open source projects. Concepts from FOX are being pursued in next generation exascale operating systems. Our OS work focused on adaptive, application-tailored OS services optimized for multi- to many-core processors. We developed a new operating system, NIX, that supports role-based allocation of cores to processes, which was released to open source. We contributed to the IBM FusedOS project, which promoted the concept of latency-optimized and throughput-optimized cores. We built a task queue library based on a distributed, fault tolerant key-value store and identified scaling issues. A second fault tolerant task parallel library was developed, based on the Linda tuple space model, that used low level interconnect primitives for optimized communication. We designed fault tolerance mechanisms for task parallel computations employing work stealing for load balancing that scaled to the largest existing supercomputers. Finally, we implemented the Elastic Building Blocks runtime, a library to manage object-oriented distributed software components. To support the research, we won two INCITE awards for time on Intrepid (BG/P) and Mira (BG/Q). Much of our work has had impact in the OS and runtime community through the ASCR Exascale OS/R workshop and report, leading to the research agenda of the Exascale OS/R program. Our project was, however, also affected by attrition of multiple PIs. While the PIs continued to participate and offer guidance as time permitted, losing these key individuals was unfortunate both for the project and for the DOE HPC community.
Design for dependability: A simulation-based approach. Ph.D. Thesis, 1993
NASA Technical Reports Server (NTRS)
Goswami, Kumar K.
1994-01-01
This research addresses issues in simulation-based system-level dependability analysis of fault-tolerant computer systems. The issues and difficulties of providing a general simulation-based approach for system-level analysis are discussed, and a methodology that addresses them is presented. The proposed methodology is designed to permit the study of a wide variety of architectures under various fault conditions. It permits detailed functional modeling of architectural features such as sparing policies, repair schemes, routing algorithms, and other fault-tolerant mechanisms, and it allows the execution of actual application software. One key benefit of this approach is that the behavior of a system under faults does not have to be pre-defined, as it normally is. Instead, a system can be simulated in detail and injected with faults to determine its failure modes. The thesis describes how object-oriented design is used to incorporate this methodology into a general purpose design and fault injection package called DEPEND. A software model is presented that uses abstractions of application programs to study the behavior and effect of software on hardware faults in the early design stage when actual code is not available. Finally, an acceleration technique that combines hierarchical simulation, time acceleration algorithms and hybrid simulation to reduce simulation time is introduced.
Debugging and Logging Services for Defence Service Oriented Architectures
2012-02-01
Service: A software component and callable end point that provides a logically related set of operations, each of which performs a logical step in a ... It is important to note that in some cases, when the fault is identified to lie in uneditable code such as program libraries or outsourced software services ... debugging is limited to characterisation of the fault, reporting it to the software or service provider, and development of work-arounds and management
Software Implemented Fault-Tolerant (SIFT) user's guide
NASA Technical Reports Server (NTRS)
Green, D. F., Jr.; Palumbo, D. L.; Baltrus, D. W.
1984-01-01
Program development for a Software Implemented Fault Tolerant (SIFT) computer system is accomplished in the NASA LaRC AIRLAB facility using a DEC VAX-11 to interface with eight Bendix BDX 930 flight control processors. The interface software which provides this SIFT program development capability was developed by AIRLAB personnel. This technical memorandum describes the application and design of this software in detail, and is intended to assist both the user in performance of SIFT research and the systems programmer responsible for maintaining and/or upgrading the SIFT programming environment.
The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software
NASA Technical Reports Server (NTRS)
Butler, Ricky W.; Finelli, George B.
1991-01-01
This paper affirms that the quantification of life-critical software reliability is infeasible using statistical methods, whether applied to standard software or fault-tolerant software. The classical methods of estimating reliability are shown to lead to exorbitant amounts of testing when applied to life-critical software. Reliability growth models are examined and also shown to be incapable of overcoming the need for excessive amounts of testing. The key assumption of software fault tolerance, that separately programmed versions fail independently, is shown to be problematic. This assumption cannot be justified by experimentation in the ultrareliability region, and subjective arguments in its favor are not sufficiently strong to justify it as an axiom. Also, the implications of the recent multiversion software experiments support this affirmation.
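The "exorbitant amounts of testing" claim follows from simple arithmetic under an exponential failure model; a quick worked computation:

```python
import math

def required_failure_free_hours(lam, confidence):
    """Hours of failure-free life testing needed to claim failure rate
    <= lam with the given confidence, under an exponential failure model:
    solve 1 - exp(-lam * t) >= confidence for t."""
    return -math.log(1.0 - confidence) / lam

# Ultrareliability target of 1e-9 failures/hour at 90% confidence:
print(f"{required_failure_free_hours(1e-9, 0.90):.3e} hours")
# ~2.3e9 hours (roughly 260,000 years of failure-free testing),
# illustrating the paper's infeasibility claim.
```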
ROSE::FTTransform - A Source-to-Source Translation Framework for Exascale Fault-Tolerance Research
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lidman, J; Quinlan, D; Liao, C
2012-03-26
Exascale computing systems will require sufficient resilience to tolerate numerous types of hardware faults while still assuring correct program execution. Such extreme-scale machines are expected to be dominated by processors driven at lower voltages (near the minimum 0.5 volts for current transistors). At these voltage levels, the rate of transient errors increases dramatically due to the sensitivity to transient and geographically localized voltage drops on parts of the processor chip. To achieve power efficiency, these processors are likely to be streamlined and minimal, and thus they cannot be expected to handle transient errors entirely in hardware. Here we present an open, compiler-based framework to automate the armoring of High Performance Computing (HPC) software to protect it from these types of transient processor errors. We develop an open infrastructure to support research work in this area, and we define tools that, in the future, may provide more complete automated and/or semi-automated solutions to support software resiliency on future exascale architectures. Results demonstrate that our approach is feasible, pragmatic in how it can be separated from the software development process, and reasonably efficient (0% to 30% overhead for the Jacobi iteration on common hardware; and 20%, 40%, 26%, and 2% overhead for a randomly selected subset of benchmarks from the Livermore Loops [1]).
Effectiveness of back-to-back testing
NASA Technical Reports Server (NTRS)
Vouk, Mladen A.; Mcallister, David F.; Eckhardt, David E.; Caglayan, Alper; Kelly, John P. J.
1987-01-01
Three models of back-to-back testing processes are described. Two models treat the case where there is no intercomponent failure dependence. The third model describes the more realistic case where there is correlation among the failure probabilities of the functionally equivalent components. The theory indicates that back-to-back testing can, under the right conditions, provide a considerable gain in software reliability. The models are used to analyze the data obtained in a fault-tolerant software experiment. It is shown that the expected gain is indeed achieved, and exceeded, provided the intercomponent failure dependence is sufficiently small. However, even with the relatively high correlation the use of several functionally equivalent components coupled with back-to-back testing may provide a considerable reliability gain. Implications of this finding are that the multiversion software development is a feasible and cost effective approach to providing highly reliable software components intended for fault-tolerant software systems, on condition that special attention is directed at early detection and elimination of correlated faults.
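A minimal back-to-back testing harness over two functionally equivalent components, with a fault deliberately seeded in one version for illustration; each logged disagreement localizes a fault in at least one of the versions.

```python
import random

def version_a(x):
    return x * x

def version_b(x):
    # Functionally equivalent implementation with a seeded fault region.
    return x * x if x < 0.9 else x * x + 1e-3

def back_to_back(n_tests=10_000, tol=1e-9, seed=0):
    """Drive both versions with the same random inputs and count every
    disagreement beyond the comparison tolerance."""
    rng = random.Random(seed)
    mismatches = [x for x in (rng.random() for _ in range(n_tests))
                  if abs(version_a(x) - version_b(x)) > tol]
    return len(mismatches)

print(back_to_back())  # ~10% of inputs expose the seeded disagreement
```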
NASA Technical Reports Server (NTRS)
Briand, Lionel C.; Basili, Victor R.; Hetmanski, Christopher J.
1992-01-01
Applying equal testing and verification effort to all parts of a software system is not very efficient, especially when resources are limited and scheduling is tight. Therefore, one needs to be able to differentiate low/high fault density components so that the testing/verification effort can be concentrated where needed. Such a strategy is expected to detect more faults and thus improve the resulting reliability of the overall system. This paper presents an alternative approach for constructing such models that is intended to fulfill specific software engineering needs (i.e. dealing with partial/incomplete information and creating models that are easy to interpret). Our approach to classification is as follows: (1) to measure the software system to be considered; and (2) to build multivariate stochastic models for prediction. We present experimental results obtained by classifying FORTRAN components developed at the NASA/GSFC into two fault density classes: low and high. Also we evaluate the accuracy of the model and the insights it provides into the software process.
Fault Mitigation Schemes for Future Spaceflight Multicore Processors
NASA Technical Reports Server (NTRS)
Alexander, James W.; Clement, Bradley J.; Gostelow, Kim P.; Lai, John Y.
2012-01-01
Future planetary exploration missions demand significant advances in on-board computing capabilities over current avionics architectures based on a single-core processing element. The state-of-the-art multi-core processor provides much promise in meeting such challenges while introducing new fault tolerance problems when applied to space missions. Software-based schemes are being presented in this paper that can achieve system-level fault mitigation beyond that provided by radiation-hard-by-design (RHBD). For mission and time critical applications such as the Terrain Relative Navigation (TRN) for planetary or small body navigation, and landing, a range of fault tolerance methods can be adapted by the application. The software methods being investigated include Error Correction Code (ECC) for data packet routing between cores, virtual network routing, Triple Modular Redundancy (TMR), and Algorithm-Based Fault Tolerance (ABFT). A robust fault tolerance framework that provides fail-operational behavior under hard real-time constraints and graceful degradation will be demonstrated using TRN executing on a commercial Tilera(R) processor with simulated fault injections.
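Among the methods listed, TMR is the simplest to sketch in software. This toy example runs a task on three simulated cores and outvotes a single simulated upset; the task and the fault model are invented for illustration.

```python
from collections import Counter

def tmr(task, inputs, cores=3):
    """Triple modular redundancy in software: run the same task on three
    (here simulated) cores and vote; a transient upset on one core is masked."""
    results = [task(inputs, core_id) for core_id in range(cores)]
    value, count = Counter(results).most_common(1)[0]
    assert count >= 2, "double fault: TMR cannot mask"
    return value

def navigation_step(state, core_id):
    result = state + 1
    if core_id == 1:
        result ^= 0x40          # simulated single-event upset on core 1
    return result

print(tmr(navigation_step, 7))  # -> 8: the upset core is outvoted
```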
Simplex GPS and InSAR Inversion Software
NASA Technical Reports Server (NTRS)
Donnellan, Andrea; Parker, Jay W.; Lyzenga, Gregory A.; Pierce, Marlon E.
2012-01-01
Changes in the shape of the Earth's surface can be routinely measured with precisions better than centimeters. Processes below the surface often drive these changes and as a result, investigators require models with inversion methods to characterize the sources. Simplex inverts any combination of GPS (global positioning system), UAVSAR (uninhabited aerial vehicle synthetic aperture radar), and InSAR (interferometric synthetic aperture radar) data simultaneously for elastic response from fault and fluid motions. It can be used to solve for multiple faults and parameters, all of which can be specified or allowed to vary. The software can be used to study long-term tectonic motions and the faults responsible for those motions, or can be used to invert for co-seismic slip from earthquakes. Solutions involving estimation of fault motion and changes in fluid reservoirs such as magma or water are possible. Any arbitrary number of faults or parameters can be considered. Simplex specifically solves for any of location, geometry, fault slip, and expansion/contraction of a single or multiple faults. It inverts GPS and InSAR data for elastic dislocations in a half-space. Slip parameters include strike slip, dip slip, and tensile dislocations. It includes a map interface for both setting up the models and viewing the results. Results, including faults, and observed, computed, and residual displacements, are output in text format, a map interface, and can be exported to KML. The software interfaces with the QuakeTables database allowing a user to select existing fault parameters or data. Simplex can be accessed through the QuakeSim portal graphical user interface or run from a UNIX command line.
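Simplex itself inverts for elastic dislocation sources in a half-space; as a heavily simplified stand-in, the sketch below shows only a linearized slip inversion, with entirely hypothetical Green's functions relating patch slip to station displacements.

```python
import numpy as np

# Hypothetical Green's functions: displacement at each of five GPS stations
# per unit strike slip and dip slip on two fault patches (4 model parameters).
G = np.array([[0.8, 0.1, 0.3, 0.0],
              [0.2, 0.7, 0.1, 0.2],
              [0.1, 0.2, 0.6, 0.1],
              [0.0, 0.3, 0.2, 0.7],
              [0.4, 0.1, 0.1, 0.3]])
true_slip = np.array([1.5, 0.0, 0.8, 0.2])        # meters, per patch/component
d = G @ true_slip + 0.01 * np.random.randn(5)     # noisy surface observations

slip, *_ = np.linalg.lstsq(G, d, rcond=None)      # invert for fault motion
print(np.round(slip, 2))                          # recovers ~[1.5, 0, 0.8, 0.2]
```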
Modeling and Performance Considerations for Automated Fault Isolation in Complex Systems
NASA Technical Reports Server (NTRS)
Ferrell, Bob; Oostdyk, Rebecca
2010-01-01
The purpose of this paper is to document the modeling considerations and performance metrics that were examined in the development of a large-scale Fault Detection, Isolation and Recovery (FDIR) system. The FDIR system is envisioned to perform health management functions for both a launch vehicle and the ground systems that support the vehicle during checkout and launch countdown by using a suite of complementary software tools that alert operators to anomalies and failures in real time. The FDIR team members developed a set of operational requirements for the models that would be used for fault isolation and worked closely with the vendor of the software tools selected for fault isolation to ensure that the software was able to meet the requirements. Once the requirements were established, example models of sufficient complexity were used to test the performance of the software. The results of the performance testing demonstrated the need for enhancements to the software in order to meet the demands of the full-scale ground and vehicle FDIR system. The paper highlights the importance of the development of operational requirements and preliminary performance testing as a strategy for identifying deficiencies in highly scalable systems and rectifying those deficiencies before they imperil the success of the project.
30 CFR 75.814 - Electrical protection.
Code of Federal Regulations, 2013 CFR
2013-07-01
... protection must not be dependent upon control power and may consist of a current transformer and overcurrent... restarting of the equipment. (b) Current transformers used for the ground-fault protection specified in... series with ground-fault current transformers. (c) Each ground-fault current device specified in...
30 CFR 75.814 - Electrical protection.
Code of Federal Regulations, 2011 CFR
2011-07-01
... protection must not be dependent upon control power and may consist of a current transformer and overcurrent... restarting of the equipment. (b) Current transformers used for the ground-fault protection specified in... series with ground-fault current transformers. (c) Each ground-fault current device specified in...
30 CFR 75.814 - Electrical protection.
Code of Federal Regulations, 2014 CFR
2014-07-01
... protection must not be dependent upon control power and may consist of a current transformer and overcurrent... restarting of the equipment. (b) Current transformers used for the ground-fault protection specified in... series with ground-fault current transformers. (c) Each ground-fault current device specified in...
30 CFR 75.814 - Electrical protection.
Code of Federal Regulations, 2012 CFR
2012-07-01
... protection must not be dependent upon control power and may consist of a current transformer and overcurrent... restarting of the equipment. (b) Current transformers used for the ground-fault protection specified in... series with ground-fault current transformers. (c) Each ground-fault current device specified in...
30 CFR 75.814 - Electrical protection.
Code of Federal Regulations, 2010 CFR
2010-07-01
... protection must not be dependent upon control power and may consist of a current transformer and overcurrent... restarting of the equipment. (b) Current transformers used for the ground-fault protection specified in... series with ground-fault current transformers. (c) Each ground-fault current device specified in...
NASA Technical Reports Server (NTRS)
Briand, Lionel C.; Basili, Victor R.; Hetmanski, Christopher J.
1993-01-01
Applying equal testing and verification effort to all parts of a software system is not very efficient, especially when resources are limited and scheduling is tight. Therefore, one needs to be able to differentiate low/high fault frequency components so that testing/verification effort can be concentrated where needed. Such a strategy is expected to detect more faults and thus improve the resulting reliability of the overall system. This paper presents the Optimized Set Reduction approach for constructing such models, intended to fulfill specific software engineering needs. Our approach to classification is to measure the software system and build multivariate stochastic models for predicting high risk system components. We present experimental results obtained by classifying Ada components into two classes: is or is not likely to generate faults during system and acceptance test. Also, we evaluate the accuracy of the model and the insights it provides into the error making process.
PDSS/IMC requirements and functional specifications
NASA Technical Reports Server (NTRS)
1983-01-01
The system (software and hardware) requirements for the Payload Development Support System (PDSS)/Image Motion Compensator (IMC) are provided. The PDSS/IMC system provides the capability for performing Image Motion Compensator Electronics (IMCE) flight software test, checkout, and verification and provides the capability for monitoring the IMC flight computer system during qualification testing for fault detection and fault isolation.
NASA Astrophysics Data System (ADS)
Utegulov, B. B.; Utegulov, A. B.; Meiramova, S.
2018-02-01
The paper proposes the development of a self-learning machine for creating models of microprocessor-based single-phase ground fault protection devices in networks with an isolated neutral at voltages above 1000 V. This development makes it possible to implement effectively mathematical models of automatic adjustment of the settings of single-phase earth fault protection devices.
SIRU development. Volume 3: Software description and program documentation
NASA Technical Reports Server (NTRS)
Oehrle, J.
1973-01-01
The development and initial evaluation of a strapdown inertial reference unit (SIRU) system are discussed. The SIRU configuration is a modular inertial subsystem with hardware and software features that achieve fault tolerant operational capabilities. The SIRU redundant hardware design is formulated about a six gyro and six accelerometer instrument module package. The six axes array provides redundant independent sensing and the symmetry enables the formulation of an optimal software redundant data processing structure with self-contained fault detection and isolation (FDI) capabilities. The basic SIRU software coding system used in the DDP-516 computer is documented.
Towards Certification of a Space System Application of Fault Detection and Isolation
NASA Technical Reports Server (NTRS)
Feather, Martin S.; Markosian, Lawrence Z.
2008-01-01
Advanced fault detection, isolation and recovery (FDIR) software is being investigated at NASA as a means to improve the reliability and availability of its space systems. Certification is a critical step in the acceptance of such software. Its attainment hinges on performing the necessary verification and validation to show that the software will fulfill its requirements in the intended setting. Presented herein is our ongoing work to plan for the certification of a pilot application of advanced FDIR software in a NASA setting. We describe the application and the key challenges and opportunities it offers for certification.
NASA Technical Reports Server (NTRS)
Lala, J. H.; Smith, T. B., III
1983-01-01
The experimental test and evaluation of the Fault-Tolerant Multiprocessor (FTMP) is described. Major objectives of this exercise include expanding validation envelope, building confidence in the system, revealing any weaknesses in the architectural concepts and in their execution in hardware and software, and in general, stressing the hardware and software. To this end, pin-level faults were injected into one LRU of the FTMP and the FTMP response was measured in terms of fault detection, isolation, and recovery times. A total of 21,055 stuck-at-0, stuck-at-1 and invert-signal faults were injected in the CPU, memory, bus interface circuits, Bus Guardian Units, and voters and error latches. Of these, 17,418 were detected. At least 80 percent of undetected faults are estimated to be on unused pins. The multiprocessor identified all detected faults correctly and recovered successfully in each case. Total recovery time for all faults averaged a little over one second. This can be reduced to half a second by including appropriate self-tests.
Fault Tolerant Software Technology for Distributed Computer Systems
1989-03-01
Final Technical Report. This report describes "Fault Tolerant Software Technology for Distributed Computing Systems," a two year effort performed at Georgia Institute of Technology as part of the Clouds Project. The Clouds
Distributed asynchronous microprocessor architectures in fault tolerant integrated flight systems
NASA Technical Reports Server (NTRS)
Dunn, W. R.
1983-01-01
The paper discusses the implementation of fault tolerant digital flight control and navigation systems for rotorcraft application. It is shown that in implementing fault tolerance at the systems level using advanced LSI/VLSI technology, aircraft physical layout and flight systems requirements tend to define a system architecture of distributed, asynchronous microprocessors in which fault tolerance can be achieved locally through hardware redundancy and/or globally through application of analytical redundancy. The effects of asynchronism on the execution of dynamic flight software are discussed. It is shown that if the asynchronous microprocessors have knowledge of time, these errors can be significantly reduced through appropriate modifications of the flight software. Finally, the paper extends previous work to show that through the combined use of time referencing and stable flight algorithms, individual microprocessors can be configured to autonomously tolerate intermittent faults.
Mars Science Laboratory CHIMRA/IC/DRT Flight Software for Sample Acquisition and Processing
NASA Technical Reports Server (NTRS)
Kim, Won S.; Leger, Chris; Carsten, Joseph; Helmick, Daniel; Kuhn, Stephen; Redick, Richard; Trujillo, Diana
2013-01-01
The design methodologies of using sequence diagrams, multi-process functional flow diagrams, and hierarchical state machines were successfully applied in designing three MSL (Mars Science Laboratory) flight software modules responsible for handling actuator motions of the CHIMRA (Collection and Handling for In Situ Martian Rock Analysis), IC (Inlet Covers), and DRT (Dust Removal Tool) mechanisms. The methodologies were essential to specify complex interactions with other modules, support concurrent foreground and background motions, and handle various fault protections. Studying task scenarios with multi-process functional flow diagrams yielded great insight to overall design perspectives. Since the three modules require three different levels of background motion support, the methodologies presented in this paper provide an excellent comparison. All three modules are fully operational in flight.
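As a rough illustration of the hierarchical-state-machine style the abstract describes, the sketch below nests fault handling above the nominal motion substates so that fault protection preempts them; all state and event names here are hypothetical, not the MSL modules' actual design.

```python
# Minimal hierarchical state machine sketch (hypothetical states and events,
# not MSL flight code). The fault check acts as a superstate handler that
# runs before the active substate's handler, so fault protection preempts
# nominal motion handling.

class Hsm:
    def __init__(self):
        self.state = "idle"

    def dispatch(self, event):
        # Superstate logic: any fault event preempts the nominal substates.
        if event == "fault_detected":
            self.state = "safing"
            return "stop actuators, latch fault"
        return self.handlers[self.state](self, event)

    def idle(self, event):
        if event == "start_motion":
            self.state = "moving"
            return "begin foreground motion"
        return "ignored"

    def moving(self, event):
        if event == "motion_done":
            self.state = "idle"
            return "motion complete"
        return "ignored"

    def safing(self, event):
        if event == "operator_reset":
            self.state = "idle"
            return "fault cleared"
        return "ignored"

    handlers = {"idle": idle, "moving": moving, "safing": safing}

hsm = Hsm()
print(hsm.dispatch("start_motion"))     # begin foreground motion
print(hsm.dispatch("fault_detected"))   # stop actuators, latch fault
```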
Tutorial: Advanced fault tree applications using HARP
NASA Technical Reports Server (NTRS)
Dugan, Joanne Bechta; Bavuso, Salvatore J.; Boyd, Mark A.
1993-01-01
Reliability analysis of fault tolerant computer systems for critical applications is complicated by several factors. These modeling difficulties are discussed and dynamic fault tree modeling techniques for handling them are described and demonstrated. Several advanced fault tolerant computer systems are described, and fault tree models for their analysis are presented. HARP (Hybrid Automated Reliability Predictor) is a software package developed at Duke University and NASA Langley Research Center that is capable of solving the fault tree models presented.
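The sketch below illustrates only the textbook static case of fault-tree evaluation (independent basic events, AND/OR gates); HARP's actual contribution is handling dynamic gates via Markov models, which this does not attempt.

```python
# Static fault-tree evaluation sketch: independent basic events, AND/OR gates.
# (HARP itself solves *dynamic* fault trees via Markov models; this is only
# the simple static case, for illustration.)

def top_event_probability(gate):
    kind, children = gate[0], gate[1:]
    if kind == "basic":
        return children[0]                       # probability of a basic event
    probs = [top_event_probability(c) for c in children]
    if kind == "AND":                            # all children must fail
        p = 1.0
        for q in probs:
            p *= q
        return p
    if kind == "OR":                             # any child failing suffices
        p = 1.0
        for q in probs:
            p *= (1.0 - q)
        return 1.0 - p
    raise ValueError(f"unknown gate type: {kind}")

# Example: top event = (A AND B) OR C, with small failure probabilities.
tree = ("OR", ("AND", ("basic", 1e-3), ("basic", 2e-3)), ("basic", 1e-5))
print(top_event_probability(tree))               # ~1.2e-5
```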
Experiments in fault tolerant software reliability
NASA Technical Reports Server (NTRS)
Mcallister, David F.; Vouk, Mladen A.
1989-01-01
Twenty functionally equivalent programs were built and tested in a multiversion software experiment. Following unit testing, all programs were subjected to an extensive system test. In the process sixty-one distinct faults were identified among the versions. Less than 12 percent of the faults exhibited varying degrees of positive correlation. The common-cause (or similar) faults spanned as many as 14 components. However, a majority of these faults were trivial, and easily detected by proper unit and/or system testing. Only two of the seven similar faults were difficult faults, and both were caused by specification ambiguities. One of these faults exhibited variable identical-and-wrong response span, i.e. response span which varied with the testing conditions and input data. Techniques that could have been used to avoid the faults are discussed. For example, it was determined that back-to-back testing of 2-tuples could have been used to eliminate about 90 percent of the faults. In addition, four of the seven similar faults could have been detected by using back-to-back testing of 5-tuples. It is believed that most, if not all, similar faults could have been avoided had the specifications been written using a more formal notation, had the unit testing phase been subject to more stringent standards and controls, and had better tools for measuring the quality and adequacy of the test data (e.g. coverage) been used.
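Back-to-back testing of 2-tuples, as mentioned above, amounts to running pairs of versions on the same inputs and flagging any disagreement; a minimal sketch with a deliberately seeded fault:

```python
# Back-to-back testing sketch: run every pair (2-tuple) of functionally
# equivalent versions on the same inputs and flag disagreements as
# potential faults.
from itertools import combinations

def back_to_back(versions, test_inputs):
    """versions: list of functionally equivalent callables."""
    disagreements = []
    for x in test_inputs:
        for (i, f), (j, g) in combinations(list(enumerate(versions)), 2):
            if f(x) != g(x):
                disagreements.append((i, j, x))
    return disagreements

# Example with a seeded fault in version b for negative inputs.
a = lambda x: x * x
b = lambda x: x * x if x >= 0 else -(x * x)   # faulty branch
print(back_to_back([a, b], range(-3, 4)))     # the pair disagrees on -3, -2, -1
```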
Solar Photovoltaic (PV) Distributed Generation Systems - Control and Protection
NASA Astrophysics Data System (ADS)
Yi, Zhehan
This dissertation proposes a comprehensive control, power management, and fault detection strategy for solar photovoltaic (PV) distributed generation. Battery storage is typically employed in PV systems to mitigate the power fluctuation caused by unstable solar irradiance. With AC and DC loads, a PV-battery system can be treated as a hybrid microgrid which contains both DC and AC power resources and buses. In this thesis, a control and power management system (CAPMS) for PV-battery hybrid microgrids is proposed, which provides 1) a DC and AC bus voltage and AC frequency regulating scheme with controllers designed to track set points; 2) a power flow management strategy in the hybrid microgrid to achieve balance between system generation and demand in both grid-connected and islanded modes; and 3) smooth transition control during grid reconnection through frequency and phase synchronization between the main grid and the microgrid. Due to the increasing demand for PV power, PV systems are getting larger and fault detection in PV arrays becomes challenging. High-impedance faults, low-mismatch faults, and faults occurring in low irradiance conditions tend to be hidden due to low fault currents, particularly when a PV maximum power point tracking (MPPT) algorithm is in service. If they remain undetected, these faults can considerably lower the output energy of solar systems, damage the panels, and potentially cause fire hazards. In this dissertation, fault detection challenges in PV arrays are analyzed in depth, considering the interplay among the characteristics of PV, interactions with MPPT algorithms, and the nature of solar irradiance. Two fault detection schemes are then designed to address these technical issues, which detect faults inside PV arrays accurately even under challenging circumstances, e.g., faults in low irradiance conditions or high-impedance faults. Taking advantage of multi-resolution signal decomposition (MSD), a powerful signal processing technique based on the discrete wavelet transform (DWT), the first scheme extracts the features of both line-to-line (L-L) and line-to-ground (L-G) faults and employs a fuzzy inference system (FIS) for the decision-making stage of fault detection. This scheme is then improved in a second scheme by further studying the system's behavior during L-L faults, extracting more efficient fault features, and devising a more advanced decision-making stage: a two-stage support vector machine (SVM). For the first time, the two-stage SVM method is proposed in this dissertation to detect L-L faults in PV systems with satisfactory accuracy. Numerous simulation and experimental case studies are carried out to verify the proposed control and protection strategies. The simulation environment is set up using the PSCAD/EMTDC and Matlab/Simulink software packages. Experimental case studies are conducted in a PV-battery hybrid microgrid using the dSPACE real-time controller to demonstrate the ease of hardware implementation and the controller performance. Another small-scale grid-connected PV system is set up to verify both fault detection algorithms, which demonstrate promising performance and fault detection accuracy.
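The abstract does not give the dissertation's actual features or classifier configuration; the sketch below only shows the general shape of such a pipeline, wavelet sub-band energies feeding an SVM, with the data, window length, wavelet choice, and labels all hypothetical.

```python
# Sketch of a DWT-feature + SVM fault classifier for PV array current signals.
# All signals and parameters here are hypothetical placeholders; the
# dissertation's actual features and two-stage SVM design are more elaborate.
import numpy as np
import pywt
from sklearn.svm import SVC

def dwt_features(current_window, wavelet="db4", level=3):
    """Energy of each multi-resolution sub-band of a current waveform window."""
    coeffs = pywt.wavedec(current_window, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs])

# Hypothetical training data: rows are windows of array current samples;
# labels are 0 = normal, 1 = L-L fault, 2 = L-G fault.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(60, 256))
y = rng.integers(0, 3, size=60)

X = np.vstack([dwt_features(w) for w in X_raw])
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:5]))
```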
An experimental investigation of fault tolerant software structures in an avionics application
NASA Technical Reports Server (NTRS)
Caglayan, Alper K.; Eckhardt, Dave E., Jr.
1989-01-01
The objective of this experimental investigation is to compare the functional performance and software reliability of competing fault tolerant software structures utilizing software diversity. In this experiment, three versions of the redundancy management software for a skewed sensor array were developed using three diverse failure detection and isolation algorithms and incorporated into various N-version, recovery block, and hybrid software structures. The empirical results show that, for maximum functional performance improvement in the selected application domain, the results of diverse algorithms should be voted before being processed by multiple versions without enforced diversity. Results also suggest that when the reliability gain with an N-version structure is modest, recovery block structures are more feasible, since higher reliability can be obtained using an acceptance check with modest reliability.
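A recovery block, one of the structures compared above, runs a primary algorithm, applies an acceptance test, and falls back to alternates on failure; a minimal sketch:

```python
# Recovery-block sketch: primary algorithm, acceptance check, alternates.
import math

def recovery_block(x, primary, alternates, acceptable):
    for algorithm in [primary] + alternates:
        try:
            result = algorithm(x)
        except Exception:
            continue                    # treat a crash like a failed check
        if acceptable(x, result):
            return result               # first acceptable result wins
    raise RuntimeError("all versions failed the acceptance test")

# Example: estimate sqrt; accept a result only if it squares back to x.
fast_but_flaky = lambda x: x / 2 + 1          # crude, often unacceptable
reliable = lambda x: math.sqrt(x)
ok = lambda x, r: abs(r * r - x) < 1e-6 * max(1.0, x)
print(recovery_block(10.0, fast_but_flaky, [reliable], ok))   # falls back
```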
Determining the Impact of Steady-State PV Fault Current Injections on Distribution Protection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seuss, John; Reno, Matthew J.; Broderick, Robert Joseph
This report investigates the fault current contribution from a single large PV system and the impact it has on existing distribution overcurrent protection devices. Assumptions are made about the modeling of the PV system under fault to perform exhaustive steady-state fault analyses throughout distribution feeder models. Each PV interconnection location is tested to determine how the size of the PV system affects the fault current measured by each protection device. This data is then searched for logical conditions that indicate whether a protection device has operated in a manner that will cause more customer outages due to the addition of the PV system. This is referred to as a protection issue, and there are four unique types of issues that have been identified in the study. The PV system size at which any issue occurs is recorded to determine the feeder's PV hosting capacity limitations due to interference with protection settings. The analysis is carried out on six feeder models. The report concludes with a discussion of the prevalence and cause of each protection issue caused by PV system fault current.
Using recurrence plot analysis for software execution interpretation and fault detection
NASA Astrophysics Data System (ADS)
Mosdorf, M.
2015-09-01
This paper shows a method targeted at software execution interpretation and fault detection using recurrence plot analysis. In the proposed approach, recurrence plot analysis is applied to a software execution trace that contains the executed assembly instructions. Results of this analysis are subject to further processing with the PCA (Principal Component Analysis) method, which reduces the number of coefficients used for software execution classification. This method was used for the analysis of five algorithms: Bubble Sort, Quick Sort, Median Filter, FIR, SHA-1. Results show that some of the collected traces could be easily assigned to particular algorithms (logs from the Bubble Sort and FIR algorithms) while others are more difficult to distinguish.
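The paper's exact trace encoding and coefficient set are not given here; the sketch below shows one plausible reading under stated assumptions: a recurrence matrix over a numeric encoding of the instruction trace, with recurrence-derived features reduced by PCA via SVD.

```python
# Recurrence-plot + PCA sketch for execution traces (encoding hypothetical).
import numpy as np

def recurrence_matrix(trace, eps):
    """R[i, j] = 1 where trace points i and j are within eps of each other."""
    t = np.asarray(trace, dtype=float)
    return (np.abs(t[:, None] - t[None, :]) <= eps).astype(float)

def pca(features, k=2):
    """Project feature rows onto their first k principal components via SVD."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

# Hypothetical traces: each is a sequence of numeric opcode IDs.
rng = np.random.default_rng(1)
traces = [rng.integers(0, 32, size=100) for _ in range(8)]
feats = np.array([recurrence_matrix(t, eps=2).mean(axis=1)[:20] for t in traces])
print(pca(feats, k=2))    # 2-D coordinates used to cluster/classify traces
```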
Slicken 1.0: Program for calculating the orientation of shear on reactivated faults
NASA Astrophysics Data System (ADS)
Xu, Hong; Xu, Shunshan; Nieto-Samaniego, Ángel F.; Alaniz-Álvarez, Susana A.
2017-07-01
The slip vector on a fault is an important parameter in the study of the movement history of a fault and its faulting mechanism. Although there exist many graphical programs to represent the shear stress (or slickenline) orientations on faults, programs to quantitatively calculate the orientation of fault slip based on a given stress field are scarce. In consequence, we developed Slicken 1.0, software to rapidly calculate the orientation of maximum shear stress on any fault plane. For this direct method of calculating the resolved shear stress on a planar surface, the input data are the unit vector normal to the involved plane, the unit vectors of the three principal stress axes, and the stress ratio. The advantage of this program is that the vertical or horizontal principal stresses are not necessarily required. Due to its nimble design using Java SE 8.0, it runs on most operating systems with the corresponding Java VM. The software will be practical for geoscience students, geologists and engineers, and will help resolve a deficiency in field, structural, and engineering geology.
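The direct method described, traction on the plane minus its normal component, can be sketched as follows; normalizing the principal magnitudes to sigma1 = 1, sigma2 = Phi, sigma3 = 0 is one conventional reduced-stress choice, and the axis vectors in the example are placeholders, not Slicken's internals.

```python
# Resolved shear-stress direction on a plane (reduced stress tensor sketch).
import numpy as np

def max_shear_direction(n, s1_axis, s2_axis, s3_axis, phi):
    """n: unit normal to the fault plane; s*_axis: unit principal stress axes;
    phi: stress ratio (s2 - s3) / (s1 - s3). Returns the unit shear direction."""
    # Reduced principal magnitudes: s1 = 1, s2 = phi, s3 = 0.
    axes = np.array([s1_axis, s2_axis, s3_axis], dtype=float)
    mags = np.array([1.0, phi, 0.0])
    sigma = sum(m * np.outer(a, a) for m, a in zip(mags, axes))
    t = sigma @ n                      # traction vector on the plane
    shear = t - (t @ n) * n            # subtract the normal component
    return shear / np.linalg.norm(shear)

# Example: plane dipping 60 degrees, principal axes along the coordinate axes.
n = np.array([0.0, np.sin(np.radians(60)), np.cos(np.radians(60))])
print(max_shear_direction(n, [1, 0, 0], [0, 1, 0], [0, 0, 1], phi=0.5))
```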
A theoretical basis for the analysis of redundant software subject to coincident errors
NASA Technical Reports Server (NTRS)
Eckhardt, D. E., Jr.; Lee, L. D.
1985-01-01
Fundamental to the development of redundant software techniques (fault-tolerant software) is an understanding of the impact of multiple or joint occurrences of coincident errors. A theoretical basis for the study of redundant software is developed which provides a probabilistic framework for empirically evaluating the effectiveness of the general (N-Version) strategy when component versions are subject to coincident errors, and permits an analytical study of the effects of these errors. The basic assumptions of the model are: (1) independently designed software components are chosen in a random sample; and (2) in the user environment, the system is required to execute on a stationary input series. The intensity of coincident errors has a central role in the model. This function describes the propensity to introduce design faults in such a way that software components fail together when executing in the user environment. The model is used to give conditions under which an N-Version system is a better strategy for reducing system failure probability than relying on a single version of software. A condition which limits the effectiveness of a fault-tolerant strategy is studied, and it is posed whether system failure probability varies monotonically with increasing N or whether an optimal choice of N exists.
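In the model, the intensity function gives, for each input, the probability that a randomly chosen version fails there; majority voting fails when more than N/2 versions fail on the same input. A Monte Carlo sketch under an assumed, deliberately spiky intensity:

```python
# Monte Carlo sketch of the N-version / coincident-error idea: theta(x) is
# the probability that a randomly chosen version fails on input x; majority
# voting fails when more than N/2 versions fail together. (theta is assumed.)
import random

def system_failure_prob(theta, n_versions, trials=100_000):
    failures = 0
    for _ in range(trials):
        x = random.random()                      # input drawn from usage profile
        p = theta(x)
        fails = sum(random.random() < p for _ in range(n_versions))
        failures += fails > n_versions // 2      # majority wrong -> system fails
    return failures / trials

# A spiky intensity: most inputs are easy, a few are hard for *every* version.
theta = lambda x: 0.9 if x < 0.01 else 0.001
for n in (1, 3, 5):
    print(n, system_failure_prob(theta, n))
```

With this intensity, most of the failure probability comes from the "hard" inputs on which versions tend to fail together, so moving from one to three or five versions buys little; this illustrates the kind of limiting condition the abstract mentions.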
Expert systems applied to fault isolation and energy storage management, phase 2
NASA Technical Reports Server (NTRS)
1987-01-01
A user's guide for the Fault Isolation and Energy Storage (FIES) II system is provided. Included are a brief discussion of the background and scope of this project, a discussion of basic and advanced operating installation and problem determination procedures for the FIES II system and information on hardware and software design and implementation. A number of appendices are provided including a detailed specification for the microprocessor software, a detailed description of the expert system rule base and a description and listings of the LISP interface software.
Software fault tolerance using data diversity
NASA Technical Reports Server (NTRS)
Knight, John C.
1991-01-01
Research on data diversity is discussed. Data diversity relies on a different form of redundancy from existing approaches to software fault tolerance and is substantially less expensive to implement. Data diversity can also be applied to software testing and greatly facilitates the automation of testing. Up to now it has been explored both theoretically and in a pilot study, and has been shown to be a promising technique. The effectiveness of data diversity as an error detection mechanism and the application of data diversity to differential equation solvers are discussed.
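Data diversity re-expresses an input into a logically equivalent (or nearly equivalent) form and reruns the same program, rather than running diverse versions; a minimal "retry block" sketch, with the re-expression rule and the seeded failure purely illustrative:

```python
# Data-diversity "retry block" sketch: on failure, re-express the input into
# an equivalent form and run the *same* program again. Re-expression is
# application-specific; the tiny perturbation here is only illustrative.
def retry_block(program, x, reexpress, acceptable, max_retries=3):
    candidate = x
    for _ in range(max_retries + 1):
        try:
            y = program(candidate)
            if acceptable(candidate, y):
                return y
        except Exception:
            pass
        candidate = reexpress(candidate)   # try a re-expressed input
    raise RuntimeError("no acceptable result")

# Example: a routine with an (artificial) failure region on exact integers.
def fragile_inverse(x):
    if x == int(x):
        raise ZeroDivisionError("pathological input")  # seeded failure region
    return 1.0 / x

print(retry_block(fragile_inverse, 4.0,
                  reexpress=lambda x: x + 1e-9,        # tiny equivalent shift
                  acceptable=lambda x, y: abs(x * y - 1.0) < 1e-6))
```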
NASA Technical Reports Server (NTRS)
Aucoin, B. M.; Heller, R. P.
1990-01-01
An intelligent remote power controller (RPC) based on microcomputer technology can implement advanced functions for the accurate and secure detection of all types of faults on a spaceborne electrical distribution system. The intelligent RPC will implement conventional protection functions such as overcurrent, under-voltage, and ground fault protection. Advanced functions for the detection of soft faults, which cannot presently be detected, can also be implemented. Adaptive overcurrent protection changes overcurrent settings based on connected load. Incipient and high-impedance fault detection provides early detection of arcing conditions to prevent fires, and to clear and reconfigure circuits before soft faults progress to a hard-fault condition. Power electronics techniques can be used to implement fault current limiting to prevent voltage dips during hard faults. It is concluded that these techniques will enhance the overall safety and reliability of the distribution system.
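Adaptive overcurrent protection, as described above, lets the trip threshold follow the connected load rather than a fixed worst-case setting; a toy sketch with hypothetical margins and load values:

```python
# Adaptive overcurrent sketch: the pickup threshold tracks the connected
# load instead of a fixed worst-case setting (margin and values hypothetical).
def trip_threshold(connected_load_amps, margin=1.25):
    """Pickup current scales with the currently connected load."""
    return margin * connected_load_amps

def overcurrent_trip(measured_amps, connected_load_amps):
    return measured_amps > trip_threshold(connected_load_amps)

# With a light load connected, a modest fault current already trips;
# a single worst-case setting would have masked it.
print(overcurrent_trip(measured_amps=12.0, connected_load_amps=8.0))   # True
print(overcurrent_trip(measured_amps=12.0, connected_load_amps=40.0))  # False
```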
An empirical study of flight control software reliability
NASA Technical Reports Server (NTRS)
Dunham, J. R.; Pierce, J. L.
1986-01-01
The results of a laboratory experiment in flight control software reliability are reported. The experiment tests a small sample of implementations of a pitch axis control law for a PA28 aircraft with over 14 million pitch commands with varying levels of additive input and feedback noise. The testing, which used the method of N-version programming for error detection, surfaced four software faults in one implementation of the control law. The small number of detected faults precluded the conduct of the error burst analyses. The pitch axis problem provides data for use in constructing a model for the prediction of the reliability of software in systems with feedback. The study was undertaken to find means to perform reliability evaluations of flight control software.
NASA Technical Reports Server (NTRS)
Quinn, Todd M.; Walters, Jerry L.
1991-01-01
Future space explorations will require long term human presence in space. Space environments that provide working and living quarters for manned missions are becoming increasingly larger and more sophisticated. Monitor and control of the space environment subsystems by expert system software, which emulate human reasoning processes, could maintain the health of the subsystems and help reduce the human workload. The autonomous power expert (APEX) system was developed to emulate a human expert's reasoning processes used to diagnose fault conditions in the domain of space power distribution. APEX is a fault detection, isolation, and recovery (FDIR) system, capable of autonomous monitoring and control of the power distribution system. APEX consists of a knowledge base, a data base, an inference engine, and various support and interface software. APEX provides the user with an easy-to-use interactive interface. When a fault is detected, APEX will inform the user of the detection. The user can direct APEX to isolate the probable cause of the fault. Once a fault has been isolated, the user can ask APEX to justify its fault isolation and to recommend actions to correct the fault. APEX implementation and capabilities are discussed.
Fault Management Architectures and the Challenges of Providing Software Assurance
NASA Technical Reports Server (NTRS)
Savarino, Shirley; Fitz, Rhonda; Fesq, Lorraine; Whitman, Gerek
2015-01-01
Fault Management (FM) is focused on safety, the preservation of assets, and maintaining the desired functionality of the system. How FM is implemented varies among missions. Common to most missions is system complexity due to a need to establish a multi-dimensional structure across hardware, software and spacecraft operations. FM is necessary to identify and respond to system faults, mitigate technical risks and ensure operational continuity. Generally, FM architecture, implementation, and software assurance efforts increase with mission complexity. Because FM is a systems engineering discipline with a distributed implementation, providing efficient and effective verification and validation (V&V) is challenging. A breakout session at the 2012 NASA Independent Verification & Validation (IV&V) Annual Workshop titled "V&V of Fault Management: Challenges and Successes" exposed this issue in terms of V&V for a representative set of architectures. NASA's Software Assurance Research Program (SARP) has provided funds to NASA IV&V to extend the work performed at the Workshop session in partnership with NASA's Jet Propulsion Laboratory (JPL). NASA IV&V will extract FM architectures across the IV&V portfolio and evaluate the data set, assess visibility for validation and test, and define software assurance methods that could be applied to the various architectures and designs. This SARP initiative focuses efforts on FM architectures from critical and complex projects within NASA. The identification of particular FM architectures and associated V&V/IV&V techniques provides a data set that can enable improved assurance that a system will adequately detect and respond to adverse conditions. Ultimately, results from this activity will be incorporated into the NASA Fault Management Handbook providing dissemination across NASA, other agencies and the space community. This paper discusses the approach taken to perform the evaluations and preliminary findings from the research.
Saturating time-delay transformer for overcurrent protection. [Patent application
Praeg, W.F.
1975-12-18
Electrical loads connected to dc supplies are protected from damage by overcurrent in the case of a load fault by connecting in series with the load a saturating transformer that detects a load fault and limits the fault current to a safe level for a period long enough to correct the fault or else disconnect the power supply.
Saturating time-delay transformer for overcurrent protection
Praeg, Walter F.
1977-01-01
Electrical loads connected to d-c supplies are protected from damage by overcurrent in the case of a load fault by connecting in series with the load a saturating transformer that detects a load fault and limits the fault current to a safe level for a period long enough to correct the fault or else disconnect the power supply.
NASA Technical Reports Server (NTRS)
Lala, J. H.; Smith, T. B., III
1983-01-01
The software developed for the Fault-Tolerant Multiprocessor (FTMP) is described. The FTMP executive is a timer-interrupt driven dispatcher that schedules iterative tasks which run at 3.125, 12.5, and 25 Hz. Major tasks which run under the executive include system configuration control, flight control, and display. The flight control task includes autopilot and autoland functions for a jet transport aircraft. System Displays include status displays of all hardware elements (processors, memories, I/O ports, buses), failure log displays showing transient and hard faults, and an autopilot display. All software is in a higher order language (AED, an ALGOL derivative). The executive is a fully distributed general purpose executive which automatically balances the load among available processor triads. Provisions for graceful performance degradation under processing overload are an integral part of the scheduling algorithms.
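The rate groups quoted above are successive divisions of a 25 Hz base rate (25, 12.5, and 3.125 Hz), so a timer-driven dispatcher can select groups by tick count; a toy sketch of that scheduling idea, with the task bodies hypothetical:

```python
# Rate-group dispatcher sketch: one timer tick at 25 Hz drives iterative
# tasks at 25, 12.5, and 3.125 Hz, mimicking the FTMP executive's scheduling
# idea (task bodies and tick source are hypothetical).
def make_dispatcher():
    # Run the 25 Hz group every tick, 12.5 Hz every 2nd, 3.125 Hz every 8th.
    groups = {1: [], 2: [], 8: []}

    def register(divisor, task):
        groups[divisor].append(task)

    def tick(n):
        for divisor, tasks in groups.items():
            if n % divisor == 0:
                for task in tasks:
                    task()

    return register, tick

register, tick = make_dispatcher()
register(1, lambda: print("flight control (25 Hz)"))
register(2, lambda: print("display update (12.5 Hz)"))
register(8, lambda: print("configuration control (3.125 Hz)"))
for n in range(8):      # simulate 8 timer ticks
    tick(n)
```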
Software for determining the true displacement of faults
NASA Astrophysics Data System (ADS)
Nieto-Fuentes, R.; Nieto-Samaniego, Á. F.; Xu, S.-S.; Alaniz-Álvarez, S. A.
2014-03-01
One of the most important parameters of faults is the true (or net) displacement, which is measured by restoring two originally adjacent points, called "piercing points", to their original positions. This measurement is typically not possible because it is rare to observe piercing points in natural outcrops. Much more common is the measurement of the apparent displacement of a marker. Methods to calculate the true displacement of faults using descriptive geometry, trigonometry or vector algebra are common in the literature, and most of them solve a specific situation from a large number of possible combinations of the fault parameters. True displacements are not routinely calculated, despite their importance and the relatively simple methodology, because doing so by hand is tedious. We believe that the solution is to develop software capable of performing this work. In a previous publication, our research group proposed a method to calculate the true displacement of faults by solving most combinations of fault parameters using simple trigonometric equations. The purpose of this contribution is to present a computer program for calculating the true displacement of faults. The input data are the dip of the fault; the pitch angles of the markers, slickenlines and observation lines; and the marker separation. To avoid the common difficulties involved in switching between operating systems, the software is developed using the Java programming language. The computer program can be used as a tool in education and will also be useful for the calculation of the true fault displacement in geological and engineering works. The application resolves the cases with known direction of net slip, which commonly is assumed parallel to the slickenlines. This assumption is not always valid and must be used with caution, because the slickenlines are formed during a step of the incremental displacement on the fault surface, whereas the net slip is related to the finite slip.
The control system of a 2kW@20K helium refrigerator
NASA Astrophysics Data System (ADS)
Pan, W.; Wu, J. H.; Li, Qing; Liu, L. Q.; Li, Qiang
2017-12-01
The automatic control of a helium refrigerator includes three aspects: one-button start and stop control, safety protection control, and cooling capacity control. The 2kW@20K helium refrigerator's control system uses the SIEMENS PLC S7-300, its associated programming and configuration software Step7, and the industrial monitoring software WinCC to realize dynamic process control, real-time data monitoring, safety interlock control, and optimal control of the cooling capacity. This paper first describes the control architecture of the whole system in detail, including the communication configuration and the equipment involved; it then introduces the sequence control strategy for the dynamic processes, including the start and stop control modes and the safety interlock strategy; finally, it presents the precise control strategy for the machine's cooling capacity. The complete system achieves one-button starting and stopping, automatic fault protection, and stable running at the target cooling capacity, and helped complete the cold helium pressurization test of aerospace products.
NASA Technical Reports Server (NTRS)
Russell, B. Don
1989-01-01
This research concentrated on the application of advanced signal processing, expert system, and digital technologies for the detection and control of low grade, incipient faults on spaceborne power systems. The researchers have considerable experience in the application of advanced digital technologies and the protection of terrestrial power systems. This experience was used in the current contracts to develop new approaches for protecting the electrical distribution system in spaceborne applications. The project was divided into three distinct areas: (1) investigate the applicability of fault detection algorithms developed for terrestrial power systems to the detection of faults in spaceborne systems; (2) investigate the digital hardware and architectures required to monitor and control spaceborne power systems with full capability to implement new detection and diagnostic algorithms; and (3) develop a real-time expert operating system for implementing diagnostic and protection algorithms. Significant progress has been made in each of the above areas. Several terrestrial fault detection algorithms were modified to better adapt to spaceborne power system environments. Several digital architectures were developed and evaluated in light of the fault detection algorithms.
Modeling Security Aspects of Network
NASA Astrophysics Data System (ADS)
Schoch, Elmar
With more and more widespread usage of computer systems and networks, dependability becomes a paramount requirement. Dependability typically denotes tolerance or protection against all kinds of failures, errors and faults. Sources of failures can basically be accidental, e.g., in case of hardware errors or software bugs, or intentional due to some kind of malicious behavior. These intentional, malicious actions are subject of security. A more complete overview on the relations between dependability and security can be found in [31]. In parallel to the increased use of technology, misuse also has grown significantly, requiring measures to deal with it.
The Dangers of Failure Masking in Fault-Tolerant Software: Aspects of a Recent In-Flight Upset Event
NASA Technical Reports Server (NTRS)
Johnson, C. W.; Holloway, C. M.
2007-01-01
On 1 August 2005, a Boeing Company 777-200 aircraft, operating on an international passenger flight from Australia to Malaysia, was involved in a significant upset event while flying on autopilot. The Australian Transport Safety Bureau's investigation into the event discovered that an anomaly existed in the component software hierarchy that allowed inputs from a known faulty accelerometer to be processed by the air data inertial reference unit (ADIRU) and used by the primary flight computer, autopilot and other aircraft systems. This anomaly had existed in the original ADIRU software, and had not been detected in the testing and certification process for the unit. This paper describes the software aspects of the incident in detail, and suggests possible implications concerning complex, safety-critical, fault-tolerant software.
NASA Technical Reports Server (NTRS)
Butler, Ricky W.; Boerschlein, David P.
1993-01-01
Fault-Tree Compiler (FTC) program is software tool used to calculate probability of top event in fault tree. Gates of five different types allowed in fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. High-level input language easy to understand and use. In addition, program supports hierarchical fault-tree definition feature, which simplifies tree-description process and reduces execution time. Set of programs created forming basis for reliability-analysis workstation: SURE, ASSIST, PAWS/STEM, and FTC fault-tree tool (LAR-14586). Written in PASCAL, ANSI-compliant C language, and FORTRAN 77. Other versions available upon request.
Synchrophasor Sensor Networks for Grid Communication and Protection.
Gharavi, Hamid; Hu, Bin
2017-07-01
This paper focuses primarily on leveraging synchronized current/voltage amplitudes and phase angle measurements to foster new categories of applications, such as improving the effectiveness of grid protection and minimizing outage duration for distributed grid systems. The motivation for such an application arises from the fact that with the support of communication, synchronized measurements from multiple sites in a grid network can greatly enhance the accuracy and timeliness of identifying the source of instabilities. The paper first provides an overview of synchrophasor networks and then presents techniques for power quality assessment, including fault detection and protection. To achieve this we present a new synchrophasor data partitioning scheme that is based on the formation of a joint space and time observation vector. Since communication is an integral part of synchrophasor networks, the newly adopted wireless standard for machine-to-machine (M2M) communication, known as IEEE 802.11ah, has been investigated. The paper also presents a novel implementation of a hardware in the loop testbed for real-time performance evaluation. The purpose is to illustrate the use of both hardware and software tools to verify the performance of synchrophasor networks under more realistic environments. The testbed is a combination of grid network modeling, and an Emulab-based communication network. The combined grid and communication network is then used to assess power quality for fault detection and location using the IEEE 39-bus and 390-bus systems.
76 FR 58424 - Transmission Relay Loadability Reliability Standard
Federal Register 2010, 2011, 2012, 2013, 2014
2011-09-21
... Protection Systems 2. Protective relays are devices that detect and initiate the removal of faults... protective relay detects a fault on an element of the system under its protection, it sends a signal to an... distribution providers to set load-responsive phase protection relays according to specific criteria to ensure...
MER surface fault protection system
NASA Technical Reports Server (NTRS)
Neilson, Tracy
2005-01-01
The Mars Exploration Rovers surface fault protection design was influenced by the fact that the solar-powered rovers must recharge their batteries during the day to survive the night. The rovers needed to autonomously maintain thermal stability and energy balance while initiating safe and reliable communication with orbiting assets or directly with Earth. This paper describes the system fault protection design for the surface phase of the mission.
NASA Technical Reports Server (NTRS)
Butler, Ricky W.; Martensen, Anna L.
1992-01-01
FTC, Fault-Tree Compiler program, is reliability-analysis software tool used to calculate probability of top event of fault tree. Five different types of gates allowed in fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. High-level input language of FTC easy to understand and use. Program supports hierarchical fault-tree-definition feature, simplifying process of describing tree and reducing execution time. Solution technique implemented in FORTRAN, and user interface in Pascal. Written to run on DEC VAX computer operating under VMS operating system.
Runtime Speculative Software-Only Fault Tolerance
2012-06-01
reliability of RSFT, an in-depth analysis of its window of vulnerability is also discussed and measured via simulated fault injection. The performance... propagation of faults through the entire program. For optimal performance, these techniques have to use heroic alias analysis to find the minimum set of... affect program output. No program source code or alias analysis is needed to analyze the fault propagation ahead of time.
An implementation and performance measurement of the progressive retry technique
NASA Technical Reports Server (NTRS)
Suri, Gaurav; Huang, Yennun; Wang, Yi-Min; Fuchs, W. Kent; Kintala, Chandra
1995-01-01
This paper describes a recovery technique called progressive retry for bypassing software faults in message-passing applications. The technique is implemented as reusable modules to provide application-level software fault tolerance. The paper describes the implementation of the technique and presents results from the application of progressive retry to two telecommunications systems. The results presented show that the technique is helpful in reducing the total recovery time for message-passing applications.
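Progressive retry escalates the recovery scope step by step, replaying messages locally before involving more of the system; the sketch below abstracts this as a ladder of increasingly expensive recovery actions (step names and the failing computation are hypothetical, not the paper's modules):

```python
# Progressive-retry sketch: escalate through increasingly expensive recovery
# steps until one succeeds. Step names follow the general idea (local replay
# first, wider rollback later); they are not the paper's actual API.
def progressive_retry(steps, computation):
    for scope, recover in steps:
        recover()                        # widen the recovery scope
        try:
            return scope, computation()  # retry the failed computation
        except Exception:
            continue                     # escalate to the next step
    raise RuntimeError("recovery exhausted; full restart required")

attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise IOError("message handler failed")
    return "ok"

steps = [("replay local messages", lambda: None),
         ("reorder in-transit messages", lambda: None),
         ("roll back to last checkpoint", lambda: None)]
print(progressive_retry(steps, flaky))   # succeeds at the third, widest scope
```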
An Empirical Approach to Logical Clustering of Software Failure Regions
1994-03-01
this is a coincidence or normal behavior of failure regions. Software faults were numbered in order as they were discovered by the various testing... locations of the associated faults. The goal of this research will be an improved testing technique that incorporates failure region behavior. To do this... clustering behavior. This, however, does not correlate with the structural clustering of failure regions observed by Ginn (1991) on the same set of data.
Static and Dynamic Verification of Critical Software for Space Applications
NASA Astrophysics Data System (ADS)
Moreira, F.; Maia, R.; Costa, D.; Duro, N.; Rodríguez-Dapena, P.; Hjortnaes, K.
Space technology is no longer used only for highly specialised research activities or for sophisticated manned space missions. Modern society relies more and more on space technology and applications for everyday activities. Worldwide telecommunications, Earth observation, navigation and remote sensing are only a few examples of space applications on which we rely daily. The European driven global navigation system Galileo and its associated applications, e.g. air traffic management, vessel and car navigation, will significantly expand the already stringent safety requirements for space based applications. Apart from their usefulness and practical applications, every single piece of onboard software deployed into space represents an enormous investment. With a long operational lifetime, and being extremely difficult to maintain and upgrade, at least compared with "mainstream" software development, the importance of ensuring their correctness before deployment is immense. Verification & Validation techniques and technologies have a key role in ensuring that the onboard software is correct and error free, or at least free from errors that can potentially lead to catastrophic failures. Many RAMS techniques, including both static criticality analysis and dynamic verification techniques, have been used as a means to verify and validate critical software and to ensure its correctness. But, traditionally, these have been applied in isolation. One of the main reasons is the immaturity of this field with regard to its application to the growing software content within space systems. This paper presents an innovative way of combining both static and dynamic techniques, exploiting their synergy and complementarity for software fault removal. The proposed methodology is based on the combination of Software FMEA and FTA with fault-injection techniques. The case study described herein is implemented with support from two tools: the SoftCare tool for the SFMEA and SFTA, and the Xception tool for fault injection. Keywords: Verification & Validation, RAMS, Onboard software, SFMEA, SFTA, Fault-injection. This work is being performed under the project STADY (Applied Static And Dynamic Verification Of Critical Software), ESA/ESTEC Contract Nr. 15751/02/NL/LvH.
Maintaining the Health of Software Monitors
NASA Technical Reports Server (NTRS)
Person, Suzette; Rungta, Neha
2013-01-01
Software health management (SWHM) techniques complement the rigorous verification and validation processes that are applied to safety-critical systems prior to their deployment. These techniques are used to monitor deployed software in its execution environment, serving as the last line of defense against the effects of a critical fault. SWHM monitors use information from the specification and implementation of the monitored software to detect violations, predict possible failures, and help the system recover from faults. Changes to the monitored software, such as adding new functionality or fixing defects, therefore, have the potential to impact the correctness of both the monitored software and the SWHM monitor. In this work, we describe how the results of a software change impact analysis technique, Directed Incremental Symbolic Execution (DiSE), can be applied to monitored software to identify the potential impact of the changes on the SWHM monitor software. The results of DiSE can then be used by other analysis techniques, e.g., testing, debugging, to help preserve and improve the integrity of the SWHM monitor as the monitored software evolves.
Using certification trails to achieve software fault tolerance
NASA Technical Reports Server (NTRS)
Sullivan, Gregory F.; Masson, Gerald M.
1993-01-01
A conceptually novel and powerful technique to achieve fault tolerance in hardware and software systems is introduced. When used for software fault tolerance, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are compared; if they agree, they are accepted as correct, otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance was formalized and illustrated by applying it to the fundamental problem of finding a minimum spanning tree. Cases in which the second phase can be run concurrently with the first and act as a monitor are discussed. The certification trail approach was compared to other approaches to fault tolerance. Because of space limitations we have omitted examples of our technique applied to the Huffman tree and convex hull problems. These can be found in the full version of this paper.
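Although the paper's worked example is the minimum spanning tree, the certification-trail idea is easiest to see with sorting (one of the problems treated in the companion paper below): phase 1 sorts and emits the permutation as the trail, and phase 2 re-derives the answer cheaply from the trail while rejecting any inconsistent trail. A minimal sketch:

```python
# Certification-trail sketch using sorting: phase 1 sorts and emits the
# permutation as the trail; phase 2 re-derives the answer cheaply from the
# trail and *must* reject any inconsistent trail rather than accept it.
def phase1_sort(xs):
    perm = sorted(range(len(xs)), key=lambda i: xs[i])
    return [xs[i] for i in perm], perm             # (result, certification trail)

def phase2_check(xs, perm):
    n = len(xs)
    if len(perm) != n or set(perm) != set(range(n)):  # trail must be a permutation
        return None                                   # error indication
    ys = [xs[i] for i in perm]
    if any(a > b for a, b in zip(ys, ys[1:])):        # trail must yield sorted order
        return None                                   # error indication
    return ys

xs = [5, 1, 4, 2]
result, trail = phase1_sort(xs)
assert phase2_check(xs, trail) == result          # results compared, accepted
assert phase2_check(xs, [0, 0, 1, 2]) is None     # corrupt trail -> error flagged
```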
NASA Technical Reports Server (NTRS)
Basili, V. R.
1981-01-01
Work on metrics is discussed. Factors that affect software quality are reviewed. Metrics are discussed in terms of criteria achievement, reliability, and fault tolerance. Subjective and objective metrics are distinguished. Product/process and cost/quality metrics are characterized and discussed.
NASA Astrophysics Data System (ADS)
Yim, Keun Soo
This dissertation summarizes experimental validation and co-design studies conducted to optimize the fault detection capabilities and overheads in hybrid computer systems (e.g., using CPUs and Graphics Processing Units, or GPUs), and consequently to improve the scalability of parallel computer systems using computational accelerators. The experimental validation studies were conducted to help us understand the failure characteristics of CPU-GPU hybrid computer systems under various types of hardware faults. The main characterization targets were faults that are difficult to detect and/or recover from, e.g., faults that cause long latency failures (Ch. 3), faults in dynamically allocated resources (Ch. 4), faults in GPUs (Ch. 5), faults in MPI programs (Ch. 6), and microarchitecture-level faults with specific timing features (Ch. 7). The co-design studies were based on the characterization results. One of the co-designed systems has a set of source-to-source translators that customize and strategically place error detectors in the source code of target GPU programs (Ch. 5). Another co-designed system uses an extension card to learn the normal behavioral and semantic execution patterns of message-passing processes executing on CPUs, and to detect abnormal behaviors of those parallel processes (Ch. 6). The third co-designed system is a co-processor that has a set of new instructions in order to support software-implemented fault detection techniques (Ch. 7). The work described in this dissertation gains more importance because heterogeneous processors have become an essential component of state-of-the-art supercomputers. GPUs were used in three of the five fastest supercomputers that were operating in 2011. Our work included comprehensive fault characterization studies in CPU-GPU hybrid computers. In CPUs, we monitored the target systems for a long period of time after injecting faults (a temporally comprehensive experiment), and injected faults into various types of program states that included dynamically allocated memory (to be spatially comprehensive). In GPUs, we used fault injection studies to demonstrate the importance of detecting silent data corruption (SDC) errors that are mainly due to the lack of fine-grained protections and the massive use of fault-insensitive data. This dissertation also presents transparent fault tolerance frameworks and techniques that are directly applicable to hybrid computers built using only commercial off-the-shelf hardware components. This dissertation shows that by developing understanding of the failure characteristics and error propagation paths of target programs, we were able to create fault tolerance frameworks and techniques that can quickly detect and recover from hardware faults with low performance and hardware overheads.
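A minimal software-implemented version of the kind of fault injection the dissertation describes: flip one bit of an intermediate value mid-computation, then use duplicated execution ("compute twice, compare") to catch silent data corruption. The injection point and workload are illustrative only:

```python
# Bit-flip injection sketch: flip one bit of an intermediate float, then use
# duplicated execution to detect silent data corruption (SDC).
import random, struct

def flip_one_bit(x):
    """Flip a random bit in the IEEE-754 representation of a float."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits ^= 1 << random.randrange(64)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def workload(data, inject=False):
    acc = 0.0
    for i, v in enumerate(data):
        if inject and i == len(data) // 2:
            v = flip_one_bit(v)          # fault injected mid-computation
        acc += v * v
    return acc

data = [random.random() for _ in range(1000)]
golden = workload(data)
faulty = workload(data, inject=True)
detected = faulty != golden              # duplication-based SDC detector
print(f"SDC detected: {detected} (golden={golden:.6f}, faulty={faulty:.6f})")
```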
Control and protection system for paralleled modular static inverter-converter systems
NASA Technical Reports Server (NTRS)
Birchenough, A. G.; Gourash, F.
1973-01-01
A control and protection system was developed for use with a paralleled 2.5-kWe-per-module static inverter-converter system. The control and protection system senses internal and external fault parameters such as voltage, frequency, current, and paralleling current unbalance. A logic system controls contactors to isolate defective power conditioners or loads. The system sequences contactor operation to automatically control parallel operation, startup, and fault isolation. Transient overload protection and fault checking sequences are included. The operation and performance of a control and protection system, with detailed circuit descriptions, are presented.
NASA Astrophysics Data System (ADS)
Swain, Snehaprava; Ray, Pravat Kumar
2016-12-01
In this paper a three-phase fault analysis is done on a DFIG-based grid-integrated wind energy system. A Novel Active Crowbar Protection (NACB_P) system is proposed to enhance the fault ride-through (FRT) capability of the DFIG for both symmetrical and unsymmetrical grid faults, and hence improve the power quality of the system. The protection scheme proposed here is designed with a capacitor in series with the resistor, unlike the conventional crowbar (CB), which has only resistors. The major function of the capacitor in the protection circuit is to eliminate the ripples generated in the rotor current and to protect the converter as well as the DC-link capacitor. It also compensates for the reactive power required by the DFIG during the fault. Due to these advantages the proposed scheme enhances the FRT capability of the DFIG and also improves the power quality of the whole system. Experimentally the fault analysis is done on a 3 hp slip-ring induction generator, and simulation results are carried out on a 1.7 MVA DFIG-based WECS under different types of grid faults in MATLAB/Simulink; the functionality of the proposed scheme is verified.
2016-08-16
Final report AFRL-RV-PS-TR-2016-0112, "Specification, Synthesis, and Verification of Software-Based Control Protocols for Fault-Tolerant..." Air Force Research Laboratory, Space Vehicles Directorate (AFRL/RVSV), 3550 Aberdeen Ave SE, Kirtland AFB, NM 87117-5776.
NASA Technical Reports Server (NTRS)
Hoppa, Mary Ann; Wilson, Larry W.
1994-01-01
There are many software reliability models which try to predict future performance of software based on data generated by the debugging process. Our research has shown that by improving the quality of the data one can greatly improve the predictions. We are working on methodologies which control some of the randomness inherent in the standard data generation processes in order to improve the accuracy of predictions. Our contribution is twofold in that we describe an experimental methodology using a data structure called the debugging graph and apply this methodology to assess the robustness of existing models. The debugging graph is used to analyze the effects of various fault recovery orders on the predictive accuracy of several well-known software reliability algorithms. We found that, along a particular debugging path in the graph, the predictive performance of different models can vary greatly. Similarly, just because a model 'fits' a given path's data well does not guarantee that the model would perform well on a different path. Further we observed bug interactions and noted their potential effects on the predictive process. We saw that not only do different faults fail at different rates, but that those rates can be affected by the particular debugging stage at which the rates are evaluated. Based on our experiment, we conjecture that the accuracy of a reliability prediction is affected by the fault recovery order as well as by fault interaction.
NASA Astrophysics Data System (ADS)
Abramovich, B. N.; Sychev, Yu A.; Pelenev, D. N.
2018-03-01
This article presents development results for an invariant protection of high-voltage motors against incomplete single-phase ground faults. It is established that existing current protections have low selectivity of action because of an inadmissible decrease in input signals when the short circuit occurs through a transient resistance. A structural-functional scheme and an algorithm of protective actions are developed in which the zero-sequence current signals of the protected feeders are automatically corrected according to the degree of incompleteness of the ground fault. It is shown that automatic correction of the zero-sequence currents keeps the protection's sensitivity factor invariant as the transient resistance at the fault location varies. Application of the invariant protection makes it possible to minimize damage in 6-10 kV electrical installations of industrial enterprises caused by interruption of consumers' power supply and system breakdown, owing to timely localization of ground-fault emergency modes.
Zhang, Zhe; Kong, Xiangping; Yin, Xianggen; Yang, Zengli; Wang, Lijun
2014-01-01
In order to solve the problems of existing wide-area backup protection (WABP) algorithms, this paper proposes a novel WABP algorithm based on the distribution characteristics of fault component currents and improved Dempster/Shafer (D-S) evidence theory. When a fault occurs, slave substations transmit to the master substation the amplitudes of the fault component currents of the transmission lines closest to the faulted element. The master substation then identifies suspicious faulty lines according to the distribution characteristics of the fault component currents. After that, the master substation identifies the actual faulty line with improved D-S evidence theory, based on the action states of traditional protections and the direction components of these suspicious faulty lines. Simulation examples based on the IEEE 10-generator-39-bus system show that the proposed WABP algorithm has excellent performance: low requirements on sampling synchronization, small wide-area communication flow, and high fault tolerance.
Diagnosis diagrams for passing signals on an automatic block signaling railway section
NASA Astrophysics Data System (ADS)
Spunei, E.; Piroi, I.; Chioncel, C. P.; Piroi, F.
2018-01-01
This work presents a diagnosis method for railway traffic security installations. More specifically, the authors present a series of diagnosis charts for passing signals on a railway block equipped with an automatic block signaling installation. These charts are based on the electric schemes used in operation and are subsequently used to develop a diagnosis software package. The software package thus developed contributes substantially to reducing failure detection and repair times for these types of installation faults. Its use also eliminates wrong decisions in the fault detection process, decisions that may result in longer repair times and, sometimes, in railway traffic events.
Certification of computational results
NASA Technical Reports Server (NTRS)
Sullivan, Gregory F.; Wilson, Dwight S.; Masson, Gerald M.
1993-01-01
A conceptually novel and powerful technique to achieve fault detection and fault tolerance in hardware and software systems is described. When used for software fault detection, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are compared and if they agree the results are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance is formalized and realizations of it are illustrated by considering algorithms for the following problems: convex hull, sorting, and shortest path. Cases in which the second phase can be run concurrently with the first and act as a monitor are discussed. The certification trail approach is compared to other approaches to fault tolerance.
Quantitative method of medication system interface evaluation.
Pingenot, Alleene Anne; Shanteau, James; Pingenot, James D F
2007-01-01
The objective of this study was to develop a quantitative method of evaluating the user interface for medication system software. A detailed task analysis provided a description of user goals and essential activity. A structural fault analysis was used to develop a detailed description of the system interface. Nurses experienced with use of the system under evaluation provided estimates of failure rates for each point in this simplified fault tree. Means of estimated failure rates provided quantitative data for fault analysis. The authors note that, although failures of steps in the program were frequent, participants reported numerous methods of working around these failures, so that overall system failure was rare. However, frequent process failure can affect the time required for processing medications, making a system inefficient. This method of interface analysis, called the Software Efficiency Evaluation and Fault Identification Method, provides quantitative information with which prototypes can be compared and problems within an interface identified.
NASA Astrophysics Data System (ADS)
Da Silva, A.; Sánchez Prieto, S.; Polo, O.; Parra Espada, P.
2013-05-01
Because of the tough robustness requirements in space software development, it is imperative to carry out verification tasks at a very early development stage to ensure that the implemented exception mechanisms work properly. All this should be done long before the real hardware is available. But even when real hardware is available, the verification of software fault tolerance mechanisms can be difficult, since real faulty situations must be systematically and artificially brought about, which can be impossible on real hardware. To solve this problem the Alcala Space Research Group (SRG) has developed a LEON2 virtual platform (Leon2ViP) with fault injection capabilities. This way it is possible to run the exact same target binary software as runs on the physical system in a more controlled and deterministic environment, allowing stricter requirements verification. Leon2ViP enables unmanned and tightly focused fault injection campaigns, not possible otherwise, in order to expose and diagnose flaws in the software implementation early. Furthermore, the use of a virtual hardware-in-the-loop approach makes it possible to carry out preliminary integration tests with the spacecraft emulator or the sensors. The use of Leon2ViP has meant a significant improvement, in both time and cost, in the development and verification processes of the Instrument Control Unit boot software on board Solar Orbiter's Energetic Particle Detector.
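As an illustration of the kind of campaign such a platform enables, here is a minimal sketch (not the Leon2ViP API) of deterministic bit-flip injection into emulated memory; `handler_detects` is an assumed callback standing in for executing the target software's detection path.

```python
# An illustrative sketch of deterministic fault injection on a virtual
# platform: flip one memory bit at a time, run the software's detection
# path, and record any fault that goes undetected.
class TinyEmulatedMemory:
    def __init__(self, size):
        self.data = bytearray(size)

    def inject_bit_flip(self, addr, bit):
        self.data[addr] ^= (1 << bit)        # repeatable, targeted fault

def fault_injection_campaign(mem, addresses, handler_detects):
    """handler_detects(mem, addr) stands in for executing the target
    binary's EDAC/exception path against the corrupted memory."""
    undetected = []
    for addr in addresses:
        for bit in range(8):
            mem.inject_bit_flip(addr, bit)
            if not handler_detects(mem, addr):
                undetected.append((addr, bit))
            mem.inject_bit_flip(addr, bit)   # restore before next trial
    return undetected
```

The same campaign run on real hardware would require physically provoking each fault; on the virtual platform every trial is exact and repeatable, which is the point the abstract makes.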
Development and Testing of Protection Scheme for Renewable-Rich Distribution System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brahma, Sukumar; Ranade, Satish; Elkhatib, Mohamed E.
As the penetration of renewables increases in distribution systems, and microgrids are conceived with high penetration of such generation connecting through inverters, fault location and protection of microgrids need consideration. This report proposes averaged models that help simulate fault scenarios in renewable-rich microgrids, models for locating faults in such microgrids, and comments on the protection models that may be considered for microgrids. Simulation studies are reported to justify the models.
Fault Detection and Correction for the Solar Dynamics Observatory Attitude Control System
NASA Technical Reports Server (NTRS)
Starin, Scott R.; Vess, Melissa F.; Kenney, Thomas M.; Maldonado, Manuel D.; Morgenstern, Wendy M.
2007-01-01
The Solar Dynamics Observatory is an Explorer-class mission that will launch in early 2009. The spacecraft will operate in a geosynchronous orbit, sending data 24 hours a day to a dedicated ground station in White Sands, New Mexico. It will carry a suite of instruments designed to observe the Sun in multiple wavelengths at unprecedented resolution. The Atmospheric Imaging Assembly includes four telescopes with focal plane CCDs that can image the full solar disk in four different visible wavelengths. The Extreme-ultraviolet Variability Experiment will collect time-correlated data on the activity of the Sun's corona. The Helioseismic and Magnetic Imager will enable study of pressure waves moving through the body of the Sun. The attitude control system on Solar Dynamics Observatory is responsible for four main phases of activity. The physical safety of the spacecraft after separation must be guaranteed. Fine attitude determination and control must be sufficient for instrument calibration maneuvers. The mission science mode requires 2-arcsecond control according to error signals provided by guide telescopes on the Atmospheric Imaging Assembly, one of the three instruments to be carried. Lastly, accurate execution of linear and angular momentum changes to the spacecraft must be provided for momentum management and orbit maintenance. In this paper, single-fault tolerant fault detection and correction of the Solar Dynamics Observatory attitude control system is described. The attitude control hardware suite for the mission is catalogued, with special attention to redundancy at the hardware level. Four reaction wheels are used where any three are satisfactory. Four pairs of redundant thrusters are employed for orbit change maneuvers and momentum management. Three two-axis gyroscopes provide full redundancy for rate sensing. A digital Sun sensor and two autonomous star trackers provide two-out-of-three redundancy for fine attitude determination. The use of software to maximize chances of recovery from any hardware or software fault is detailed. A generic fault detection and correction software structure is used, allowing additions, deletions, and adjustments to fault detection and correction rules. This software structure is fed by in-line fault tests that are also able to take appropriate actions to avoid corruption of the data stream.
Fault-tolerant clock synchronization in distributed systems
NASA Technical Reports Server (NTRS)
Ramanathan, Parameswaran; Shin, Kang G.; Butler, Ricky W.
1990-01-01
Existing fault-tolerant clock synchronization algorithms are compared and contrasted. These include the following: software synchronization algorithms, such as convergence-averaging, convergence-nonaveraging, and consistency algorithms, as well as probabilistic synchronization; hardware synchronization algorithms; and hybrid synchronization. The worst-case clock skews guaranteed by representative algorithms are compared, along with other important aspects such as time, message, and cost overhead imposed by the algorithms. More recent developments such as hardware-assisted software synchronization and algorithms for synchronizing large, partially connected distributed systems are especially emphasized.
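For flavor, a minimal sketch of one convergence-averaging round, in the spirit of the interactive convergence algorithms the survey covers: each node replaces clock readings that differ from its own by more than a skew bound with its own value, then averages. The bound DELTA and the clock values below are illustrative assumptions.

```python
# One convergence-averaging round: readings farther than DELTA from a
# node's own clock are treated as faulty and replaced by the node's own
# value before averaging. Tolerating f faulty clocks requires enough
# correct ones (classically n >= 3f + 1 for this family of algorithms).
DELTA = 10.0   # assumed bound on skew between correct clocks

def converge(own, readings):
    adjusted = [r if abs(r - own) <= DELTA else own for r in readings]
    return sum(adjusted) / len(adjusted)

clocks = [100.2, 99.8, 100.1, 240.0]   # the last clock is faulty
print([round(converge(c, clocks), 2) for c in clocks[:3]])
```

The faulty reading is masked rather than identified: correct nodes end the round closer together, which is the convergence property that the survey's worst-case skew bounds quantify.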
The development of an interim generalized gate logic software simulator
NASA Technical Reports Server (NTRS)
Mcgough, J. G.; Nemeroff, S.
1985-01-01
A proof-of-concept computer program called IGGLOSS (Interim Generalized Gate Logic Software Simulator) was developed and is discussed. The simulator engine was designed to perform stochastic estimation of the self-test coverage (fault-detection latency times) of digital computers or systems. A major attribute of IGGLOSS is its high simulation speed: 9.5 x 10^6 gates/CPU second for nonfaulted circuits and 4.4 x 10^6 gates/CPU second for faulted circuits on a VAX 11/780 host computer.
The Design of a Fault-Tolerant COTS-Based Bus Architecture
NASA Technical Reports Server (NTRS)
Chau, Savio N.; Alkalai, Leon; Burt, John B.; Tai, Ann T.
1999-01-01
In this paper, we report our experiences and findings on the design of a fault-tolerant bus architecture comprised of two COTS buses, the IEEE 1394 and the I2C. This fault-tolerant bus is the backbone system bus for the avionics architecture of the X2000 program at the Jet Propulsion Laboratory. COTS buses are attractive because of the availability of low-cost commercial products. However, they are not specifically designed for highly reliable applications such as long-life deep-space missions. The X2000 design team has devised a multi-level fault tolerance approach to compensate for this shortcoming of COTS buses. First, the approach enhances the fault tolerance capabilities of the IEEE 1394 and I2C buses by adding a layer of fault handling hardware and software. Second, algorithms are developed to enable the IEEE 1394 and I2C buses to assist each other in isolating and recovering from faults. Third, the set of IEEE 1394 and I2C buses is duplicated to further enhance system reliability. The X2000 design team has paid special attention to guaranteeing that the fault tolerance provisions do not cause the bus design to deviate from the commercial standard specifications. Otherwise, the economic attractiveness of using COTS would be diminished. The hardware and software design of the X2000 fault-tolerant bus is being implemented, and flight hardware will be delivered to the ST4 and Europa Orbiter missions.
AADL and Model-based Engineering
2014-10-20
Presentation slides (Feiler, Oct 20, 2014, Carnegie Mellon University) on AADL and model-based engineering. Recoverable topics include our reliance on embedded software systems for safe aircraft operation; the relationships among application software, runtime architecture, and the compute platform; data stream characteristics such as latency; and the question of why system-level failures still occur despite fault tolerance techniques being deployed in systems.
A survey of fault diagnosis technology
NASA Technical Reports Server (NTRS)
Riedesel, Joel
1989-01-01
Existing techniques and methodologies for fault diagnosis are surveyed. The techniques run the gamut from theoretical artificial intelligence work to conventional software engineering applications. They are shown to define a spectrum of implementation alternatives where tradeoffs determine their position on the spectrum. Various tradeoffs include execution time limitations and memory requirements of the algorithms as well as their effectiveness in addressing the fault diagnosis problem.
An experiment in software reliability: Additional analyses using data from automated replications
NASA Technical Reports Server (NTRS)
Dunham, Janet R.; Lauterbach, Linda A.
1988-01-01
A study undertaken to collect software error data of laboratory quality for use in the development of credible methods for predicting the reliability of software used in life-critical applications is summarized. The software error data reported were acquired through automated repetitive run testing of three independent implementations of a launch interceptor condition module of a radar tracking problem. The results are based on 100 test applications to accumulate a sufficient sample size for error rate estimation. The data collected are used to confirm the results of two Boeing studies, reported in NASA-CR-165836, Software Reliability: Repetitive Run Experimentation and Modeling, and NASA-CR-172378, Software Reliability: Additional Investigations into Modeling With Replicated Experiments. That is, the results confirm the log-linear pattern of software error rates and reject the hypothesis of equal error rates per individual fault. This rejection casts doubt on the assumption that a program's failure rate is a constant multiple of the number of residual bugs, an assumption which underlies some of the current models of software reliability. The data also raise new questions concerning the phenomenon of interacting faults.
Software Health Management with Bayesian Networks
NASA Technical Reports Server (NTRS)
Mengshoel, Ole; Schumann, Johann
2011-01-01
Most modern aircraft, as well as other complex machinery, are equipped with diagnostic systems for their major subsystems. During operation, sensors provide important information about a subsystem (e.g., the engine), and that information is used to detect and diagnose faults. Most of these systems focus on the monitoring of a mechanical, hydraulic, or electromechanical subsystem of the vehicle or machinery. Only recently have health management systems that monitor software been developed. In this paper, we discuss our approach of using Bayesian networks for Software Health Management (SWHM). We discuss SWHM requirements, which make advanced reasoning capabilities for detection and diagnosis important. We then present our approach to using Bayesian networks for the construction of health models that dynamically monitor a software system and are capable of detecting and diagnosing faults.
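As a toy illustration of the modeling idea (not the paper's networks), consider a hidden software-fault node that influences two observable monitors; the posterior probability of the fault given monitor evidence follows by direct enumeration. The monitor names and all conditional probability values below are invented.

```python
# A toy Bayesian-network health model: a hidden software-fault node
# influences two observable monitors. Posterior of the fault given
# evidence is computed by enumerating the two fault states.
P_FAULT = 0.01                     # prior on the hidden fault node
P_FIRE = {                         # P(monitor fires | fault state)
    "heartbeat_missed": {True: 0.90, False: 0.02},
    "queue_overflow":   {True: 0.70, False: 0.05},
}

def posterior_fault(evidence):
    """evidence: dict monitor name -> bool (fired or not)."""
    def joint(fault):
        p = P_FAULT if fault else 1.0 - P_FAULT
        for monitor, fired in evidence.items():
            p_fire = P_FIRE[monitor][fault]
            p *= p_fire if fired else 1.0 - p_fire
        return p
    jt, jf = joint(True), joint(False)
    return jt / (jt + jf)

print(posterior_fault({"heartbeat_missed": True, "queue_overflow": True}))
```

A real SWHM model would add nodes for software modes and sensor health, but the inference pattern, evidence in, fault posterior out, is the same.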
NASA Astrophysics Data System (ADS)
Kettle, L. M.; Mora, P.; Weatherley, D.; Gross, L.; Xing, H.
2006-12-01
Simulations using the Finite Element method are widely used in many engineering applications and for the solution of partial differential equations (PDEs). Computational models based on the solution of PDEs play a key role in earth systems simulations. We present numerical modelling of crustal fault systems where the dynamic elastic wave equation is solved using the Finite Element method. This is achieved using a high-level computational modelling language, escript, available as open source software from ACcESS (Australian Computational Earth Systems Simulator), the University of Queensland. Escript is an advanced geophysical simulation software package developed at ACcESS which includes parallel equation solvers, data visualisation and data analysis software. The escript library was implemented to develop a flexible Finite Element model which reliably simulates the mechanism of faulting and the physics of earthquakes. Both 2D and 3D elastodynamic models are being developed to study the dynamics of crustal fault systems. Our final goal is to build a flexible model which can be applied to any fault system with user-defined geometry and input parameters. To study the physics of earthquake processes, two different time scales must be modelled, firstly the quasi-static loading phase which gradually increases stress in the system (~100 years), and secondly the dynamic rupture process which rapidly redistributes stress in the system (~100 seconds). We will discuss the solution of the time-dependent elastic wave equation for an arbitrary fault system using escript. This involves prescribing the correct initial stress distribution in the system to simulate the quasi-static loading of faults to failure; determining a suitable frictional constitutive law which accurately reproduces the dynamics of the stick/slip instability at the faults; and using a robust time integration scheme. These dynamic models generate data and information that can be used for earthquake forecasting.
SIRU utilization. Volume 2: Software description and program documentation
NASA Technical Reports Server (NTRS)
Oehrle, J.; Whittredge, R.
1973-01-01
A complete description of the additional analysis, development, and evaluation provided for the SIRU system, as identified in the requirements for the SIRU utilization program, is presented. The SIRU configuration is a modular inertial subsystem with hardware and software features that achieve fault-tolerant operational capabilities. The SIRU redundant hardware design is formulated about a six-gyro and six-accelerometer instrument module package. The modules are mounted in this package so that their measurement input axes form a unique symmetrical pattern that corresponds to the array of perpendiculars to the faces of a regular dodecahedron. This six-axis array provides redundant independent sensing, and the symmetry enables the formulation of an optimal software redundant data processing structure with self-contained fault detection and isolation (FDI) capabilities. Documentation of the additional software and software modifications required to implement the utilization capabilities includes assembly listings and flow charts.
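The symmetric geometry lends itself to parity-space FDI. Below is a numeric sketch, our illustration rather than the SIRU flight algorithms: with measurement model m = H w, any matrix V spanning the left null space of H yields residuals p = V m that vanish for fault-free data, and correlating p against the columns of V isolates a single faulty sensor.

```python
# Parity-space FDI for a six-axis redundant sensor array whose axes
# follow the dodecahedral face-normal arrangement.
import numpy as np

a, b = 0.52573, 0.85065   # components of the dodecahedral axis directions
H = np.array([[ a,  b, 0], [ a, -b, 0], [ b, 0,  a],
              [-b, 0,  a], [0,  a,  b], [0,  a, -b]])

U, _, _ = np.linalg.svd(H)
V = U[:, 3:].T                        # 3 x 6 parity matrix, V @ H ~ 0

w_true = np.array([0.1, -0.2, 0.05])  # true 3-axis rate
m = H @ w_true
m[2] += 0.5                           # inject a bias fault on sensor 2
p = V @ m                             # nonzero residual flags a fault
scores = np.abs(V.T @ p) / np.linalg.norm(V, axis=0)
print("suspected sensor:", int(np.argmax(scores)))   # -> 2
```

Because a single fault on sensor i produces a residual proportional to column i of V, the largest normalized correlation isolates the faulty instrument, which can then be excluded from the navigation solution.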
A theoretical basis for the analysis of multiversion software subject to coincident errors
NASA Technical Reports Server (NTRS)
Eckhardt, D. E., Jr.; Lee, L. D.
1985-01-01
Fundamental to the development of redundant software techniques (known as fault-tolerant software) is an understanding of the impact of multiple joint occurrences of errors, referred to here as coincident errors. A theoretical basis for the study of redundant software is developed which: (1) provides a probabilistic framework for empirically evaluating the effectiveness of a general multiversion strategy when component versions are subject to coincident errors, and (2) permits an analytical study of the effects of these errors. An intensity function, called the intensity of coincident errors, has a central role in this analysis. This function describes the propensity of programmers to introduce design faults in such a way that software components fail together when executing in the application environment. A condition under which a multiversion system is a better strategy than relying on a single version is given.
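A small numeric sketch of the framework's central point, with an invented two-point difficulty distribution standing in for the intensity function: versions fail independently only conditional on the input, and averaging over inputs induces coincident failures that make majority voting worse than an independence calculation predicts.

```python
# Conditional on an input's difficulty theta, versions fail independently;
# averaging over inputs induces coincident errors. The two-point
# distribution below is invented for illustration.
from math import comb

def maj_fail(theta, n=3):
    """P(majority of n versions fail | per-version failure prob theta)."""
    k = n // 2 + 1
    return sum(comb(n, j) * theta**j * (1 - theta)**(n - j)
               for j in range(k, n + 1))

dist = [(0.95, 0.001), (0.05, 0.20)]      # (P(input class), theta)
p_single = sum(p * t for p, t in dist)     # average 1-version failure prob

naive = maj_fail(p_single)                 # pretends failures independent
actual = sum(p * maj_fail(t) for p, t in dist)
print(f"single version:            {p_single:.4g}")
print(f"3 versions (independence): {naive:.3g}")
print(f"3 versions (coincident):   {actual:.3g}")
```

With these invented numbers the three-version system still beats a single version, but by far less than the independence assumption predicts, which is exactly the condition the paper analyzes.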
Learning from examples - Generation and evaluation of decision trees for software resource analysis
NASA Technical Reports Server (NTRS)
Selby, Richard W.; Porter, Adam A.
1988-01-01
A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.
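The mechanism is easy to reproduce. Below is an illustrative sketch on synthetic data (not the NASA SEL dataset), using scikit-learn to generate a decision tree that flags modules likely to need high development effort; the metric names and the invented ground-truth rule are assumptions of this sketch.

```python
# Learn a decision tree over simple module metrics and print its rules.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.integers(50, 2000, n),   # source lines
    rng.integers(0, 30, n),      # number of I/O parameters
    rng.integers(1, 15, n),      # number of versions (revisions)
])
y = ((X[:, 0] > 800) & (X[:, 2] > 6)).astype(int)  # invented "high effort"

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["sloc", "io_params", "versions"]))
```

The printed rules are the appeal of the approach in the paper's setting: the tree is an inspectable classifier that managers can sanity-check against engineering intuition, unlike an opaque regression.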
Mars Reconnaissance Orbiter In-flight Anomalies and Lessons Learned: An Update
NASA Technical Reports Server (NTRS)
Bayer, Todd J.
2008-01-01
The Mars Reconnaissance Orbiter mission has as its primary objectives: advance our understanding of the current Mars climate, the processes that have formed and modified the surface of the planet and the extent to which water has played a role in surface processes; identify sites of possible aqueous activity indicating environments that may have been or are conducive to biological activity; and thus identify and characterize sites for future landed missions; and provide forward and return relay services for current and future Mars landed assets. MRO's crucial role in the long term strategy for Mars exploration requires a high level of reliability during its 5.4 year mission. This requires an architecture which incorporates extensive redundancy and cross-strapping. Because of the distances and hence light-times involved, the spacecraft itself must be able to utilize this redundancy in responding to time-critical failures. For cases where fault protection is unable to recognize a potentially threatening condition, either due to known limitations or software flaws, intervention by ground operations is required. These aspects of MRO's design were discussed in a previous paper [Ref. 1]. This paper provides an update to the original paper, describing MRO's significant in-flight anomalies over the past year, with lessons learned for redundancy and fault protection architectures and for ground operations.
Method and system for diagnostics of apparatus
NASA Technical Reports Server (NTRS)
Gorinevsky, Dimitry (Inventor)
2012-01-01
Proposed is a method, implemented in software, for estimating fault state of an apparatus outfitted with sensors. At each execution period the method processes sensor data from the apparatus to obtain a set of parity parameters, which are further used for estimating fault state. The estimation method formulates a convex optimization problem for each fault hypothesis and employs a convex solver to compute fault parameter estimates and fault likelihoods for each fault hypothesis. The highest likelihoods and corresponding parameter estimates are transmitted to a display device or an automated decision and control system. The obtained accurate estimate of fault state can be used to improve safety, performance, or maintenance processes for the apparatus.
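A minimal sketch of the per-hypothesis estimation pattern described, with invented fault-signature matrices: each hypothesis' fault parameters are fit to the parity vector by least squares (a convex fit standing in for the general convex program), and hypotheses are ranked by the Gaussian log-likelihood of the remaining residual.

```python
# Score competing fault hypotheses against an observed parity vector.
import numpy as np

def score_hypotheses(parity, signatures, sigma=0.1):
    """signatures: dict name -> matrix mapping that hypothesis' fault
    parameters into parity space; returns hypotheses sorted by
    log-likelihood of the post-fit residual."""
    scored = {}
    for name, F in signatures.items():
        x, *_ = np.linalg.lstsq(F, parity, rcond=None)   # convex fit
        resid = parity - F @ x
        scored[name] = (-0.5 * np.sum((resid / sigma) ** 2), x)
    return sorted(scored.items(), key=lambda kv: -kv[1][0])

parity = np.array([0.8, -0.4, 0.1])
signatures = {
    "sensor_1_bias": np.array([[1.0], [-0.5], [0.0]]),
    "sensor_2_bias": np.array([[0.0], [1.0], [0.3]]),
}
for name, (loglik, x) in score_hypotheses(parity, signatures):
    print(f"{name}: loglik={loglik:.1f}, fault estimate={x.ravel()}")
```

The top-ranked hypotheses and their parameter estimates are what would be forwarded to a display or to an automated decision and control system, as the abstract describes.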
Fault Analysis and Detection in Microgrids with High PV Penetration
DOE Office of Scientific and Technical Information (OSTI.GOV)
El Khatib, Mohamed; Hernandez Alvidrez, Javier; Ellis, Abraham
In this report we focus on analyzing the behaviour of current-controlled PV inverters under faults in order to develop fault detection schemes for microgrids with high PV penetration. An inverter model suitable for steady-state fault studies is presented, and the impact of PV inverters on two protection elements is analyzed. The studied protection elements are the superimposed-quantities-based directional element and the negative-sequence directional element. Additionally, several non-overcurrent fault detection schemes are discussed in this report for microgrids with high PV penetration. A detailed time-domain simulation study is presented to assess the performance of the presented fault detection schemes under different microgrid modes of operation.
Multi-Agent Diagnosis and Control of an Air Revitalization System for Life Support in Space
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Kowing, Jeffrey; Nieten, Joseph; Graham, Jeffrey S.; Schreckenghost, Debra; Bonasso, Pete; Fleming, Land D.; MacMahon, Matt; Thronesbery, Carroll
2000-01-01
An architecture of interoperating agents has been developed to provide control and fault management for advanced life support systems in space. In this adjustable autonomy architecture, software agents coordinate with human agents and provide support in novel fault management situations. This architecture combines the Livingstone model-based mode identification and reconfiguration (MIR) system with the 3T architecture for autonomous flexible command and control. The MIR software agent performs model-based state identification and diagnosis. MIR identifies novel recovery configurations and the set of commands required for the recovery. The 3T procedural executive and the human operator use the diagnoses and recovery recommendations, and provide command sequencing. User interface extensions have been developed to support human monitoring of both 3T and MIR data and activities. This architecture has been demonstrated performing control and fault management for an oxygen production system for air revitalization in space. The software operates in a dynamic simulation testbed.
Transparent Ada rendezvous in a fault tolerant distributed system
NASA Technical Reports Server (NTRS)
Racine, Roger
1986-01-01
There are many problems associated with distributing an Ada program over a loosely coupled communication network. Some of these problems involve the various aspects of the distributed rendezvous. The problems addressed involve supporting the delay statement in a selective call and supporting the else clause in a selective call. Most of these difficulties are compounded by the need for an efficient communication system. The difficulties are compounded even more by considering the possibility of hardware faults occurring while the program is running. With a hardware fault tolerant computer system, it is possible to design a distribution scheme and communication software which is efficient and allows Ada semantics to be preserved. An Ada design for the communications software of one such system will be presented, including a description of the services provided in the seven layers of an International Standards Organization (ISO) Open System Interconnect (OSI) model communications system. The system capabilities (hardware and software) that allow this communication system will also be described.
Characterization of the faulted behavior of digital computers and fault tolerant systems
NASA Technical Reports Server (NTRS)
Bavuso, Salvatore J.; Miner, Paul S.
1989-01-01
A development status evaluation is presented for efforts conducted at NASA-Langley since 1977, toward the characterization of the latent fault in digital fault-tolerant systems. Attention is given to the practical, high speed, generalized gate-level logic system simulator developed, as well as to the validation methodology used for the simulator, on the basis of faultable software and hardware simulations employing a prototype MIL-STD-1750A processor. After validation, latency tests will be performed.
Computer Sciences and Data Systems, volume 1
NASA Technical Reports Server (NTRS)
1987-01-01
Topics addressed include: software engineering; university grants; institutes; concurrent processing; sparse distributed memory; distributed operating systems; intelligent data management processes; expert system for image analysis; fault tolerant software; and architecture research.
Survey of Verification and Validation Techniques for Small Satellite Software Development
NASA Technical Reports Server (NTRS)
Jacklin, Stephen A.
2015-01-01
The purpose of this paper is to provide an overview of the current trends and practices in small-satellite software verification and validation. This document is not intended to promote a specific software assurance method. Rather, it seeks to present an unbiased survey of software assurance methods used to verify and validate small satellite software and to make mention of the benefits and value of each approach. These methods include simulation and testing, verification and validation with model-based design, formal methods, and fault-tolerant software design with run-time monitoring. Although the literature reveals that simulation and testing has by far the longest legacy, model-based design methods are proving to be useful for software verification and validation. Some work in formal methods, though not widely used for any satellites, may offer new ways to improve small satellite software verification and validation. These methods need to be further advanced to deal with the state explosion problem and to make them more usable by small-satellite software engineers to be regularly applied to software verification. Last, it is explained how run-time monitoring, combined with fault-tolerant software design methods, provides an important means to detect and correct software errors that escape the verification process or those errors that are produced after launch through the effects of ionizing radiation.
Using Remote Sensing Data to Constrain Models of Fault Interactions and Plate Boundary Deformation
NASA Astrophysics Data System (ADS)
Glasscoe, M. T.; Donnellan, A.; Lyzenga, G. A.; Parker, J. W.; Milliner, C. W. D.
2016-12-01
Determining the distribution of slip and behavior of fault interactions at plate boundaries is a complex problem. Field and remotely sensed data often lack the necessary coverage to fully resolve fault behavior. However, realistic physical models may be used to more accurately characterize the complex behavior of faults constrained with observed data, such as GPS, InSAR, and SfM. These results will improve the utility of using combined models and data to estimate earthquake potential and characterize plate boundary behavior. Plate boundary faults exhibit complex behavior, with partitioned slip and distributed deformation. To investigate what fraction of slip becomes distributed deformation off major faults, we examine a model fault embedded within a damage zone of reduced elastic rigidity that narrows with depth and forward model the slip and resulting surface deformation. The fault segments and slip distributions are modeled using the JPL GeoFEST software. GeoFEST (Geophysical Finite Element Simulation Tool) is a two- and three-dimensional finite element software package for modeling solid stress and strain in geophysical and other continuum domain applications [Lyzenga, et al., 2000; Glasscoe, et al., 2004; Parker, et al., 2008, 2010]. New methods to advance geohazards research using computer simulations and remotely sensed observations for model validation are required to understand fault slip, the complex nature of fault interaction and plate boundary deformation. These models help enhance our understanding of the underlying processes, such as transient deformation and fault creep, and can aid in developing observation strategies for sUAV, airborne, and upcoming satellite missions seeking to determine how faults behave and interact and assess their associated hazard. Models will also help to characterize this behavior, which will enable improvements in hazard estimation. Validating the model results against remotely sensed observations will allow us to better constrain fault zone rheology and physical properties, having implications for the overall understanding of earthquake physics, fault interactions, plate boundary deformation and earthquake hazard, preparedness and risk reduction.
Fault Tolerant Real-Time Systems
1993-09-30
The ART (Advanced Real-Time Technology) Project of Carnegie Mellon University is engaged in wide-ranging research on hard real-time systems, including hardware and software fault tolerance using temporal redundancy and analytic redundancy to permit the construction of fault-tolerant real-time systems.
Calculation and use of an environment's characteristic software metric set
NASA Technical Reports Server (NTRS)
Basili, Victor R.; Selby, Richard W., Jr.
1985-01-01
Since both cost/quality goals and production environments differ, this study presents an approach for customizing a characteristic set of software metrics to an environment. The approach is applied in the Software Engineering Laboratory (SEL), a NASA Goddard production environment, to 49 candidate process and product metrics of 652 modules from six projects (51,000 to 112,000 lines each). For this particular environment, the method yielded the characteristic metric set (source lines, fault correction effort per executable statement, design effort, code effort, number of I/O parameters, number of versions). The uses examined for a characteristic metric set include forecasting the effort for development, modification, and fault correction of modules based on historical data.
2009-09-01
This report discusses how information supports the decision-making process as applied to the management of risk, with operational risk treated as a threat category. It notes that to make a software system fault tolerant, the system needs to recognize and fix a faulty system state condition, and that doing so first requires detecting the fault. Related topics include risk tracking and the decision-making process.
Secure Embedded System Design Methodologies for Military Cryptographic Systems
2016-03-31
Keywords: Fault-Tree Analysis (FTA); Built-In Self-Test (BIST). Secure access-control systems restrict operations to authorized users. While failures in the individual software/processor elements are presumed unlikely, the question of exactly how unlikely is difficult to answer, and Fault-Tree Analysis (FTA) offers a way to address it. The authors acknowledge Collins of Sandia National Laboratories for years of sharing his extensive knowledge of Fail-Safe Design Assurance and Fault-Tree Analysis.
Intelligent fault management for the Space Station active thermal control system
NASA Technical Reports Server (NTRS)
Hill, Tim; Faltisco, Robert M.
1992-01-01
The Thermal Advanced Automation Project (TAAP) approach and architecture are described for automating the Space Station Freedom (SSF) Active Thermal Control System (ATCS). The baseline functionality and advanced automation techniques for Fault Detection, Isolation, and Recovery (FDIR) will be compared and contrasted. Advanced automation techniques such as rule-based systems and model-based reasoning should be utilized to efficiently control, monitor, and diagnose this extremely complex physical system. TAAP is developing advanced FDIR software for use on the SSF thermal control system. The goal of TAAP is to join Knowledge-Based System (KBS) technology, using a combination of rules and model-based reasoning, with conventional monitoring and control software in order to maximize autonomy of the ATCS. TAAP's predecessor was NASA's Thermal Expert System (TEXSYS) project, which was the first large real-time expert system to use both extensive rules and model-based reasoning to control and perform FDIR on a large, complex physical system. TEXSYS showed that a method is needed for safely and inexpensively testing all possible faults of the ATCS, particularly those potentially damaging to the hardware, in order to develop a fully capable FDIR system. TAAP therefore includes the development of a high-fidelity simulation of the thermal control system. The simulation provides realistic, dynamic ATCS behavior and fault insertion capability for software testing without hardware-related risks or expense. In addition, thermal engineers will gain greater confidence in the KBS FDIR software than was possible prior to this kind of simulation testing. The TAAP KBS will initially be a ground-based extension of the baseline ATCS monitoring and control software and could be migrated on-board as additional computation resources are made available.
Development of an Environment for Software Reliability Model Selection
1992-09-01
Attention is now directed to other related problems, such as tools for model selection, multiversion programming, and software fault tolerance modeling. Among the hardware/software reliability differences noted: hardware can be repaired by spare modules, which is not the case for software, and preventive maintenance is very important for hardware.
Power plant fault detection using artificial neural network
NASA Astrophysics Data System (ADS)
Thanakodi, Suresh; Nazar, Nazatul Shiema Moh; Joini, Nur Fazriana; Hidzir, Hidzrin Dayana Mohd; Awira, Mohammad Zulfikar Khairul
2018-02-01
Faults commonly occur in power plants due to various factors and lead to system outages. There are many types of faults in power plants, such as single line-to-ground faults, double line-to-ground faults, and line-to-line faults. The primary aim of this paper is to diagnose faults in a 14-bus power plant by using an Artificial Neural Network (ANN). A Multilayer Perceptron (MLP) network was trained offline using Gradient Descent Backpropagation (GDBP), Levenberg-Marquardt (LM), and Bayesian Regularization (BR); the best-performing method was then used to build the Graphical User Interface (GUI). The modelling of the 14-bus power plant, the network training, and the GUI were implemented in MATLAB software.
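To illustrate the classification setup (though not the paper's MATLAB implementation; scikit-learn's stock solver stands in for its GDBP/LM/BR training), here is a sketch that trains an MLP on synthetic per-unit voltage/current features to distinguish the fault types from the no-fault case. All feature values are invented.

```python
# Train a small MLP to classify fault types from bus measurements.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
FAULTS = ["none", "line-to-ground", "double line-to-ground", "line-to-line"]
X, y = [], []
for label, (v_drop, i_rise) in enumerate([(0.0, 0.0), (0.4, 2.0),
                                          (0.6, 3.5), (0.5, 4.0)]):
    for _ in range(200):
        X.append([1.0 - v_drop + rng.normal(0, 0.05),   # p.u. voltage
                  1.0 + i_rise + rng.normal(0, 0.20)])  # p.u. current
        y.append(label)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X, y)
print(FAULTS[int(clf.predict([[0.55, 4.9]])[0])])   # deep sag, high current
```

A real deployment would use many more features per bus (phase voltages, currents, and angles for all 14 buses), but the train-offline, classify-online pattern is the same one the paper wraps in its GUI.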
Development of N-version software samples for an experiment in software fault tolerance
NASA Technical Reports Server (NTRS)
Lauterbach, L.
1987-01-01
The report documents the task planning and software development phases of an effort to obtain twenty versions of code independently designed and developed from a common specification. These versions were created for use in future experiments in software fault tolerance, in continuation of the experimental series underway at the Systems Validation Methods Branch (SVMB) at NASA Langley Research Center. The 20 versions were developed under controlled conditions at four U.S. universities, by 20 teams of two researchers each. The versions process raw data from a modified Redundant Strapped Down Inertial Measurement Unit (RSDIMU). The specifications, and over 200 questions submitted by the developers concerning the specifications, are included as appendices to this report. Design documents, and design and code walkthrough reports for each version, were also obtained in this task for use in future studies.
Logic flowgraph methodology - A tool for modeling embedded systems
NASA Technical Reports Server (NTRS)
Muthukumar, C. T.; Guarro, S. B.; Apostolakis, G. E.
1991-01-01
The logic flowgraph methodology (LFM), a method for modeling hardware in terms of its process parameters, has been extended to form an analytical tool for the analysis of integrated (hardware/software) embedded systems. In the software part of a given embedded system model, timing and the control flow among different software components are modeled by augmenting LFM with modified Petri net structures. The objective of using such an augmented LFM model is to uncover possible errors and the potential for unanticipated software/hardware interactions. This is done by backtracking through the augmented LFM model according to established procedures which allow the semiautomated construction of fault trees for any chosen state of the embedded system (top event). These fault trees, in turn, produce the possible combinations of lower-level states (events) that may lead to the top event.
Surrogate oracles, generalized dependency and simpler models
NASA Technical Reports Server (NTRS)
Wilson, Larry
1990-01-01
Software reliability models require the sequence of interfailure times from the debugging process as input. It was previously illustrated that using data from replicated debugging could greatly improve reliability predictions. However, inexpensive replication of the debugging process requires the existence of a cheap, fast error detector. Laboratory experiments can be designed around a gold version which is used as an oracle, or around an n-version error detector. Unfortunately, software developers cannot be expected to have an oracle or to bear the expense of n versions. A generic technique is being investigated for approximating replicated data by using the partially debugged software as a difference detector. It is believed that the failure rate of each fault depends significantly on the presence or absence of other faults. Thus, in order to discuss a failure rate for a known fault, the presence or absence of each of the other known faults needs to be specified. Simpler models which use shorter input sequences without sacrificing accuracy are also of interest; in fact, a possible gain in performance is conjectured. To investigate these propositions, NASA computers running LIC (RTI) versions are used to generate data. This data will be used to label the debugging graph associated with each version. These labeled graphs will be used to test the utility of a surrogate oracle, to analyze the dependent nature of fault failure rates, and to explore the feasibility of reliability models which use the data of only the most recent failures.
Testing Scientific Software: A Systematic Literature Review.
Kanewala, Upulee; Bieman, James M
2014-10-01
Scientific software plays an important role in critical decision making, for example making weather predictions based on climate models, and computation of evidence for research publications. Recently, scientists have had to retract publications due to errors caused by software faults. Systematic testing can identify such faults in code. This study aims to identify specific challenges, proposed solutions, and unsolved problems faced when testing scientific software. We conducted a systematic literature survey to identify and analyze relevant literature. We identified 62 studies that provided relevant information about testing scientific software. We found that challenges faced when testing scientific software fall into two main categories: (1) testing challenges that occur due to characteristics of scientific software such as oracle problems and (2) testing challenges that occur due to cultural differences between scientists and the software engineering community such as viewing the code and the model that it implements as inseparable entities. In addition, we identified methods to potentially overcome these challenges and their limitations. Finally we describe unsolved challenges and how software engineering researchers and practitioners can help to overcome them. Scientific software presents special challenges for testing. Specifically, cultural differences between scientist developers and software engineers, along with the characteristics of the scientific software make testing more difficult. Existing techniques such as code clone detection can help to improve the testing process. Software engineers should consider special challenges posed by scientific software such as oracle problems when developing testing techniques.
Implementation of an experimental fault-tolerant memory system
NASA Technical Reports Server (NTRS)
Carter, W. C.; Mccarthy, C. E.
1976-01-01
The experimental fault-tolerant memory system described in this paper has been designed to enable the modular addition of spares, to validate the theoretical fault-secure and self-testing properties of the translator/corrector, to provide a basis for experiments using the new testing and correction processes for recovery, and to determine the practicality of such systems. The hardware design and implementation are described, together with methods of fault insertion. The hardware/software interface, including a restricted single error correction/double error detection (SEC/DED) code, is specified. Procedures are carefully described which (1) test for specified physical faults, (2) ensure that single error corrections are not miscorrections due to triple faults, and (3) enable recovery from double errors.
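For readers unfamiliar with SEC/DED codes, the sketch below shows the standard mechanism at textbook scale, a Hamming(7,4) code plus an overall parity bit (our illustration, not the paper's restricted code): a single-bit error is located by the syndrome and corrected, while a double-bit error is detected without miscorrection.

```python
# Hamming(7,4) + overall parity: single error correction, double error
# detection. A single error gives nonzero syndrome and odd overall parity;
# a double error gives nonzero syndrome with even overall parity.
def encode(nibble):
    d = [(nibble >> i) & 1 for i in range(4)]
    p1, p2, p3 = d[0] ^ d[1] ^ d[3], d[0] ^ d[2] ^ d[3], d[1] ^ d[2] ^ d[3]
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]   # codeword positions 1..7
    overall = 0
    for b in bits:
        overall ^= b
    return bits + [overall]

def decode(word):
    bits, overall = list(word[:7]), word[7]
    syndrome = 0
    for check in (1, 2, 4):           # each check XORs positions i with i & check
        s = 0
        for i in range(1, 8):
            if i & check:
                s ^= bits[i - 1]
        if s:
            syndrome += check
    total = overall
    for b in bits:
        total ^= b                     # parity over all 8 bits
    if syndrome and total:             # single error: correct in place
        bits[syndrome - 1] ^= 1
    elif syndrome:                     # nonzero syndrome, even parity
        raise ValueError("double error detected")
    d = [bits[2], bits[4], bits[5], bits[6]]
    return d[0] | d[1] << 1 | d[2] << 2 | d[3] << 3

word = encode(0b1011)
word[5] ^= 1                           # inject a single-bit fault
assert decode(word) == 0b1011          # corrected transparently
```

The triple-fault concern raised in the abstract is visible even here: three bit errors can alias to a valid-looking single-error syndrome, which is why the paper's procedures explicitly guard against miscorrection.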
NASA Technical Reports Server (NTRS)
Freeman, Kenneth A.; Walsh, Rick; Weeks, David J.
1988-01-01
Space Station issues in fault management are discussed. The system background is described with attention given to design guidelines and power hardware. A contractually developed fault management system, FRAMES, is integrated with the energy management functions, the control switchgear, and the scheduling and operations management functions. The constraints that shaped the FRAMES system and its implementation are considered.
Integrated Environment for Development and Assurance
2015-01-26
Presentation slides (Jan 26, 2015, Carnegie Mellon University) on an integrated environment for development and assurance. Recoverable topics include our reliance on software for safe aircraft operation, the new class of problems introduced by embedded software systems, data stream characteristics such as latency jitter, and the question of why system-level failures still occur despite fault tolerance techniques being deployed, with the embedded software system identified as a major source.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Habib, Hany F; Lashway, Christopher R; Mohammed, Osama A
One main challenge in the practical implementation of a microgrid is the design of an adequate protection scheme in both grid-connected and islanded modes. Conventional overcurrent protection schemes face selectivity and sensitivity issues during grid and microgrid faults, since the fault current level differs between the two cases for the same relay. Various approaches have been implemented in the past to deal with this problem, yet the most promising ones are implementations of adaptive protection techniques abiding by the IEC 61850 communication standard. This paper presents a critical review of existing adaptive protection schemes, the technical challenges for the use of classical protection techniques, and the need for an adaptive, smart protection system. However, the risk of communication link failures and cyber security threats remains a challenge in implementing a reliable adaptive protection scheme. A contingency is needed where a communication issue prevents the relay from adjusting to a lower current level during islanded mode. An adaptive protection scheme is proposed that utilizes energy storage (ES) and hybrid ES (HESS) already available in the network as a mechanism to source the higher fault current. Four common grid ES and HESS are reviewed for their suitability in feeding the fault, and some solutions are proposed.
Software For Fault-Tree Diagnosis Of A System
NASA Technical Reports Server (NTRS)
Iverson, Dave; Patterson-Hine, Ann; Liao, Jack
1993-01-01
Fault Tree Diagnosis System (FTDS) computer program is automated-diagnostic-system program identifying likely causes of specified failure on basis of information represented in system-reliability mathematical models known as fault trees. Is modified implementation of failure-cause-identification phase of Narayanan's and Viswanadham's methodology for acquisition of knowledge and reasoning in analyzing failures of systems. Knowledge base of if/then rules replaced with object-oriented fault-tree representation. Enhancement yields more-efficient identification of causes of failures and enables dynamic updating of knowledge base. Written in C language, C++, and Common LISP.
Flight elements: Fault detection and fault management
NASA Technical Reports Server (NTRS)
Lum, H.; Patterson-Hine, A.; Edge, J. T.; Lawler, D.
1990-01-01
Fault management for an intelligent computational system must be developed using a top-down integrated engineering approach. The proposed approach includes integrating the overall environment involving sensors and their associated data; design knowledge capture; operations; fault detection, identification, and reconfiguration; testability; causal models including digraph matrix analysis; and overall performance impacts on the hardware and software architecture. Implementation of the concept to achieve a real-time intelligent fault detection and management system will be accomplished via several objectives: develop fault-tolerant/FDIR requirements and specifications at a systems level which carry through from conceptual design to implementation and mission operations; implement monitoring, diagnosis, and reconfiguration at all system levels, providing fault isolation and system integration; optimize system operations to manage degraded system performance through system integration; and lower development and operations costs through the implementation of an intelligent real-time fault detection and fault management system and an information management system.
NASA Astrophysics Data System (ADS)
Lin, S.; Luo, D.; Yanlin, F.; Li, Y.
2016-12-01
Shallow Seismic Reflection (SSR) is a major geophysical exploration method for urban active-fault exploration, owing to its exploration depth range and high resolution. In this paper, we carried out SSR and High-Resolution Refraction (HRR) tests in the Liangyun Basin to explore a buried fault. We used an NZ distributed 64-channel seismic instrument, 60 Hz high-sensitivity detectors, a Geode multi-channel portable acquisition system, and a hammer source. We selected single-side hammer shots with multiple stacking, 48 receiving channels, and 12-fold coverage. Since some SSR and HRR measuring lines coincide, we chose a multi-chase and encounter observation system. Based on satellite positioning, we arranged 11 survey lines in the study area with a total length of 8132 meters. GEOGIGA seismic reflection data processing software was used to process the SSR data. After repeated tests covering single-shot record compilation, interference wave suppression, static correction, velocity parameter extraction, and dynamic correction, we eventually obtained the shallow seismic reflection profile images. Meanwhile, we processed the HRR seismic data with refraction and tomographic imaging software from a Canadian technology company, based on nonlinear first-arrival travel-time tomography. Combined with drilling geological profiles, we interpreted the 11 measured seismic profiles. The results show 18 obvious fault-feature breakpoints: 4 southwest-dipping normal faults, 7 southwest-dipping reverse faults, 1 northeast-dipping normal fault, and 6 northeast-dipping reverse faults. Breakpoint burial depth is 15-18 meters, and the inferred fault displacement is 3-12 meters. Comprehensive analysis shows that the fault is a reverse fault with a northeast-dipping section, together with fewer branch normal faults with southwest-dipping sections. Given the good correspondence among the seismic interpretation results, drilling data, and SEM results regarding the property, occurrence, and broken length of the fault, we consider the Liangyun fault to be an active fault with strong activity during the Neogene Pliocene, early Pleistocene, and Middle Pleistocene periods. The combined application of SSR and HRR provides more parameters for interpreting the seismic results and improves the accuracy of the interpretation.
Tools Ensure Reliability of Critical Software
NASA Technical Reports Server (NTRS)
2012-01-01
In November 2006, after attempting to make a routine maneuver, NASA's Mars Global Surveyor (MGS) reported unexpected errors. The onboard software switched to backup resources, and a 2-day lapse in communication took place between the spacecraft and Earth. When a signal was finally received, it indicated that MGS had entered safe mode, a state of restricted activity in which the computer awaits instructions from Earth. After more than 9 years of successful operation, gathering data and snapping pictures of Mars to characterize the planet's land and weather, communication between MGS and Earth suddenly stopped. Months later, a report from NASA's internal review board found the spacecraft's battery had failed due to an unfortunate sequence of events: updates to the spacecraft's software, which had taken place months earlier, were written to the wrong memory address in the spacecraft's computer. In short, the mission ended because of a software defect. Over the last decade, spacecraft have become increasingly reliant on software to carry out mission operations. In fact, the next mission to Mars, the Mars Science Laboratory, will rely on more software than all earlier missions to Mars combined. According to Gerard Holzmann, manager of the Laboratory for Reliable Software (LaRS) at NASA's Jet Propulsion Laboratory (JPL), even the fault protection systems on a spacecraft are mostly software-based. For reasons like these, well-functioning software is critical for NASA. In the same year as the failure of MGS, Holzmann presented a new approach to critical software development to help reduce risk and provide consistency. He proposed The Power of 10: Rules for Developing Safety-Critical Code, a small set of rules that can easily be remembered, clearly relate to risk, and allow compliance to be verified. The reaction at JPL was positive, and developers in the private sector embraced Holzmann's ideas.
TES: A modular systems approach to expert system development for real-time space applications
NASA Technical Reports Server (NTRS)
Cacace, Ralph; England, Brenda
1988-01-01
A major goal of the Space Station era is to reduce reliance on support from ground based experts. The development of software programs using expert systems technology is one means of reaching this goal without requiring crew members to become intimately familiar with the many complex spacecraft subsystems. Development of an expert systems program requires a validation of the software with actual flight hardware. By combining accurate hardware and software modelling techniques with a modular systems approach to expert systems development, the validation of these software programs can be successfully completed with minimum risk and effort. The TIMES Expert System (TES) is an application that monitors and evaluates real time data to perform fault detection and fault isolation tasks as they would otherwise be carried out by a knowledgeable designer. The development process and primary features of TES, a modular systems approach, and the lessons learned are discussed.
Fault Detection, Isolation and Recovery (FDIR) Portable Liquid Oxygen Hardware Demonstrator
NASA Technical Reports Server (NTRS)
Oostdyk, Rebecca L.; Perotti, Jose M.
2011-01-01
The Fault Detection, Isolation and Recovery (FDIR) hardware demonstration will highlight the effort being conducted by Constellation's Ground Operations (GO) to provide the Launch Control System (LCS) with system-level health management during vehicle processing and countdown activities. A proof-of-concept demonstration of the FDIR prototype established the capability of the software to provide real-time fault detection and isolation using generated Liquid Hydrogen data. The FDIR portable testbed unit (presented here) aims to enhance FDIR by providing a dynamic simulation of Constellation subsystems that feeds the FDIR software live data based on Liquid Oxygen system properties. The LO2 cryogenic ground system has key properties that are analogous to the properties of an electronic circuit. The LO2 system is modeled using electrical components, and an equivalent circuit is designed on a printed circuit board to simulate the live data. The portable testbed is also equipped with data acquisition and communication hardware to relay the measurements to the FDIR application running on a PC. This portable testbed is an ideal capability for FDIR software testing, troubleshooting, and training, among other tasks.
Onboard Nonlinear Engine Sensor and Component Fault Diagnosis and Isolation Scheme
NASA Technical Reports Server (NTRS)
Tang, Liang; DeCastro, Jonathan A.; Zhang, Xiaodong
2011-01-01
A method detects and isolates in-flight sensor, actuator, and component faults for advanced propulsion systems. In sharp contrast to many conventional methods, which deal with either sensor faults or component faults but not both, this method considers sensor faults, actuator faults, and component faults under one systematic and unified framework. The proposed solution consists of two main components: a bank of real-time, nonlinear adaptive fault diagnostic estimators for residual generation, and a residual evaluation module that includes adaptive thresholds and a Transferable Belief Model (TBM)-based residual evaluation scheme. By employing a nonlinear adaptive learning architecture, the developed approach is capable of directly dealing with nonlinear engine models and nonlinear faults without the need for linearization. Software modules have been developed and evaluated with the NASA C-MAPSS engine model. Several typical engine-fault modes, including a subset of sensor/actuator/component faults, were tested with a mild transient operation scenario. The simulation results demonstrated that the algorithm was able to successfully detect and isolate all simulated faults as long as the fault magnitudes were larger than the minimum detectable/isolable sizes, and no misdiagnosis occurred.
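A minimal sketch of the residual-evaluation idea (the constants are illustrative, not from the C-MAPSS evaluation): compare each sensor with its model prediction and declare a fault only when the residual exceeds an adaptive threshold that widens during fast transients, so model mismatch during transients does not trip false alarms.

```python
# Residual generation against a model prediction, with a threshold that
# adapts to how quickly the predicted signal is changing.
def adaptive_threshold(base, predicted_rate, k=2.0):
    return base + k * abs(predicted_rate)   # widen during transients

def evaluate(measured, predicted, prev_predicted, base=0.05):
    residual = measured - predicted
    threshold = adaptive_threshold(base, predicted - prev_predicted)
    return abs(residual) > threshold, residual, threshold

fault, r, t = evaluate(measured=1.32, predicted=1.10, prev_predicted=1.08)
print(f"fault={fault} residual={r:+.2f} threshold={t:.2f}")
```

In the paper's framework a bank of such estimators runs in parallel, one tuned to each fault class, and the TBM-based evaluation fuses their residuals into an isolation decision rather than thresholding a single signal as this sketch does.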
Object-Oriented Algorithm For Evaluation Of Fault Trees
NASA Technical Reports Server (NTRS)
Patterson-Hine, F. A.; Koen, B. V.
1992-01-01
Algorithm for direct evaluation of fault trees incorporates techniques of object-oriented programming. Reduces number of calls needed to solve trees with repeated events. Provides significantly improved software environment for such computations as quantitative analyses of safety and reliability of complicated systems of equipment (e.g., spacecraft or factories).
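A minimal object-oriented sketch of the idea (our illustration, not the NTRS software): gates and basic events are objects, and a shared cache ensures each repeated basic event is evaluated only once per solve, which is the call-count reduction the abstract describes.

```python
# Object-oriented fault tree with memoized repeated events. Probability
# propagation below assumes independent gate inputs, which repeated events
# violate in general; the point here is the call-reduction mechanism, not
# an exact (BDD-style) solver.
class BasicEvent:
    def __init__(self, name, p):
        self.name, self.p = name, p
    def prob(self, cache):
        if self.name not in cache:
            cache[self.name] = self.p    # evaluated once, then reused
        return cache[self.name]

class Gate:
    def __init__(self, kind, children):
        self.kind, self.children = kind, children
    def prob(self, cache):
        ps = [c.prob(cache) for c in self.children]
        out = 1.0
        if self.kind == "AND":
            for p in ps:
                out *= p
            return out
        for p in ps:                     # OR: 1 - prod(1 - p)
            out *= 1.0 - p
        return 1.0 - out

pump = BasicEvent("pump_fails", 0.01)    # appears twice in the tree
top = Gate("OR", [Gate("AND", [pump, BasicEvent("valve_fails", 0.02)]), pump])
print(f"top-event probability ~ {top.prob({}):.4f}")
```

Treating the tree as a graph of objects rather than a rule base is also what enables the dynamic updating the abstract mentions: changing one event's probability and re-solving requires no recompilation of rules.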
NASA Astrophysics Data System (ADS)
Nolan, S.; Jones, C. E.; Munro, R.; Norman, P.; Galloway, S.; Venturumilli, S.; Sheng, J.; Yuan, W.
2017-12-01
Hybrid electric propulsion aircraft are proposed to improve overall aircraft efficiency, enabling future rising demands for air travel to be met. The development of appropriate electrical power systems to provide thrust for the aircraft is a significant challenge due to the much higher required power generation capacity levels and the complexity of the aero-electrical power systems (AEPS). The efficiency and weight of the AEPS are critical to ensure that the benefits of hybrid propulsion are not mitigated by the electrical power train. Hence it is proposed that for larger aircraft (~200 passengers) superconducting power systems are used to meet target power densities. Central to the design of the hybrid propulsion AEPS is a robust and reliable electrical protection and fault management system. It is known from previous studies that the choice of protection system may have a significant impact on the overall efficiency of the AEPS. Hence an informed design process which considers the key trades between choice of cable and protection requirements is needed. To date, the response of a voltage-source-converter-interfaced DC link to a rail-to-rail fault in a superconducting power system has only been investigated using simulation models validated by theoretical values from the literature. This paper presents the experimentally obtained fault response of a variety of different types of superconducting tape for a rail-to-rail DC fault. The paper then uses these results as a platform to identify key trades between protection requirements and cable design, providing guidelines to enable future informed decisions to optimise hybrid propulsion electrical power system and protection design.
Algorithm-Based Fault Tolerance Integrated with Replication
NASA Technical Reports Server (NTRS)
Some, Raphael; Rennels, David
2008-01-01
In a proposed approach to programming and utilization of commercial off-the-shelf computing equipment, a combination of algorithm-based fault tolerance (ABFT) and replication would be utilized to obtain high degrees of fault tolerance without incurring excessive costs. The basic idea of the proposed approach is to integrate ABFT with replication such that the algorithmic portions of computations would be protected by ABFT, and the logical portions by replication. ABFT is an extremely efficient, inexpensive, high-coverage technique for detecting and mitigating faults in computer systems used for algorithmic computations, but does not protect against errors in logical operations surrounding algorithms.
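A common concrete instance of ABFT is checksum-protected matrix multiplication in the Huang-Abraham style; the sketch below, with invented data, shows how checksums appended before the operation are re-verified afterward to detect a fault:

```python
import numpy as np

def abft_matmul(A, B):
    """Checksum-protected multiply: compute on checksum-augmented matrices,
    then re-verify the row and column sums of the result."""
    Ac = np.vstack([A, A.sum(axis=0)])                   # column-checksum A
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])    # row-checksum B
    Cf = Ac @ Br                                         # full-checksum product
    C = Cf[:-1, :-1]
    row_ok = np.allclose(Cf[:-1, -1], C.sum(axis=1))     # verify row checksums
    col_ok = np.allclose(Cf[-1, :-1], C.sum(axis=0))     # verify column checksums
    return C, row_ok and col_ok                          # a mismatch flags a fault

A, B = np.random.rand(3, 3), np.random.rand(3, 3)
C, ok = abft_matmul(A, B)
print(ok)   # True unless a fault corrupted the computation
```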
An experimental evaluation of software redundancy as a strategy for improving reliability
NASA Technical Reports Server (NTRS)
Eckhardt, Dave E., Jr.; Caglayan, Alper K.; Knight, John C.; Lee, Larry D.; Mcallister, David F.; Vouk, Mladen A.; Kelly, John P. J.
1990-01-01
The strategy of using multiple versions of independently developed software as a means to tolerate residual software design faults is suggested by the success of hardware redundancy for tolerating hardware failures. Although it is generally accepted that the independence of hardware failures resulting from physical wearout can lead to substantial increases in reliability for redundant hardware structures, a similar conclusion is not immediate for software. The degree to which design faults are manifested as independent failures determines the effectiveness of redundancy as a method for improving software reliability. Interest in multi-version software centers on whether it provides an adequate measure of increased reliability to warrant its use in critical applications. The effectiveness of multi-version software is studied by comparing estimates of the failure probabilities of these systems with the failure probabilities of single versions. The estimates are obtained under a model of dependent failures and compared with estimates obtained when failures are assumed to be independent. The experimental results are based on twenty versions of an aerospace application developed and certified by sixty programmers from four universities. Descriptions of the application, development and certification processes, and operational evaluation are given, together with an analysis of the twenty versions.
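For context, a minimal majority voter of the kind multi-version systems rely on; the tolerance, inputs, and error handling below are illustrative only, not taken from the experiment:

```python
def vote(outputs, tol=1e-9):
    """Majority vote across independently developed versions.

    Outputs agreeing within tol form an equivalence class; the largest
    class wins. Correlated version failures defeat this scheme, which is
    exactly the effect the experiment above set out to measure.
    """
    classes = []
    for out in outputs:
        for cls in classes:
            if abs(cls[0] - out) <= tol:
                cls.append(out)
                break
        else:
            classes.append([out])
    best = max(classes, key=len)
    if len(best) * 2 <= len(outputs):
        raise RuntimeError("no majority: possible correlated failure")
    return best[0]

print(vote([0.731, 0.731, 0.845]))   # two of three agree -> 0.731
```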
NASA Astrophysics Data System (ADS)
Parra, Pablo; da Silva, Antonio; Polo, Óscar R.; Sánchez, Sebastián
2018-02-01
In this day and age, successful embedded critical software needs agile and continuous development and testing procedures. This paper presents the overall testing and code coverage metrics obtained during the unit testing procedure carried out to verify the correctness of the boot software that will run in the Instrument Control Unit (ICU) of the Energetic Particle Detector (EPD) on-board Solar Orbiter. The ICU boot software is a critical part of the project, so its verification must be addressed at an early development stage; any test case missed in this process may affect the quality of the overall on-board software. According to the European Cooperation for Space Standardization (ECSS) standards, testing of this kind of critical software must cover 100% of the source code statements and decision paths. This leads to complete testing of the fault tolerance and recovery mechanisms that have to resolve every possible memory corruption or communication error brought about by the space environment. The introduced procedure enables fault injection from the beginning of the development process and makes it possible to fulfill the exacting code coverage demands on the boot software.
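A toy illustration of unit-level fault injection for a boot recovery path, using Python's unittest.mock as a stand-in for the paper's (unspecified) tooling; load_image and its redundant-bank fallback are hypothetical:

```python
import unittest
from unittest import mock

def load_image(read_word):
    """Hypothetical boot step: read a word from the primary memory bank and
    fall back to the redundant bank if the read raises an I/O error."""
    try:
        return read_word(bank=0)
    except IOError:
        return read_word(bank=1)   # recovery path the coverage goal forces us to test

class BootFaultInjection(unittest.TestCase):
    def test_primary_bank_corruption(self):
        # Inject the fault: primary read fails, redundant read succeeds
        reader = mock.Mock(side_effect=[IOError("EDAC error"), 0xCAFE])
        self.assertEqual(load_image(reader), 0xCAFE)

if __name__ == "__main__":
    unittest.main()
```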
Suppression of Adverse Effects of GIC Using Controlled Variable Grounding Resistor
NASA Astrophysics Data System (ADS)
Abuhussein, A.; Ali, M. H.
2016-12-01
Geomagnetically induced current (GIC) has a harmful impact on power systems, with a large footprint. Mitigation strategies for GIC are required to protect the integrity of the power system. To date, the adverse effects of GIC have been mitigated by either operational procedures or grounding fixed capacitors (GFCs). The operational procedures are uncertain, reduce system reliability, and increase energy losses. GFCs, on the other hand, incur voltage spikes, increase the transformer cost substantially, and require protection circuitry. This study investigates new possible approaches to cope with GIC by using a controlled variable grounding resistor (CVGR), without interfering with the system's normal operation. In addition, the new techniques help suppress unsymmetrical faults in the power network. The controllability of the grounding resistor is applied using three different techniques: (1) a parallel switch controlled by a PI-regulated duty cycle, (2) a parallel switch triggered by preset values in a look-up table (LUT), and (3) a mechanical resistor varied by a fuzzy logic controller (FLC). The results were obtained and validated using the MATLAB/SIMULINK software. A hypothetical power system is considered, consisting of a generator and a 765 kV, 500 km transmission line connecting a step-up Δ-Yn transformer and a step-down Yn-Δ transformer. The performance of the CVGR is compared with that of the GFC under a GIC event and under unsymmetrical faults. From the simulation results, the following points are concluded: The CVGR effectively suppresses the GIC flowing in the system; consequently, it protects the transformers from saturation and the rest of the system from collapsing. The CVGR also reduces the voltage and power swings associated with unsymmetrical faults and blocks the zero-sequence current flowing through the neutral of the transformer. The performance of the CVGR surpasses that of the GFC in terms of GIC/fault current magnitude and decay time reduction. The GFC violates IEEE standards C57.32/C57.12 for transformer insulation levels, while the CVGR maintains the neutral-to-ground voltage within acceptable limits. The CVGR, as opposed to the GFC, does not require a discharge circuit (spark gap), thus reducing cost and complexity.
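A sketch of technique (1), a PI regulator adjusting the duty cycle of the switch in parallel with the grounding resistor; the gains, signals, and control structure are illustrative assumptions, not values from the paper:

```python
def pi_duty_cycle(neutral_current, setpoint, kp, ki, state):
    """One step of a PI regulator: the duty cycle of a switch in parallel
    with the grounding resistor sets its effective resistance, driving the
    quasi-DC neutral current toward the setpoint."""
    error = neutral_current - setpoint
    state['integral'] += error
    duty = kp * error + ki * state['integral']
    return min(max(duty, 0.0), 1.0), state   # clamp to a physical duty cycle

state = {'integral': 0.0}
duty, state = pi_duty_cycle(neutral_current=12.0, setpoint=0.0,
                            kp=0.02, ki=0.001, state=state)
print(duty)   # fraction of the switching period the parallel switch conducts
```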
Hardware fault insertion and instrumentation system: Mechanization and validation
NASA Technical Reports Server (NTRS)
Benson, J. W.
1987-01-01
Automated test capability for extensive low-level hardware fault insertion testing is developed. The test capability is used to calibrate fault detection coverage and associated latency times as relevant to projecting overall system reliability. Described are modifications made to the NASA Ames Reconfigurable Digital Flight Control System (RDFCS) Facility to fully automate the total test loop involving the Draper Laboratories' Fault Injector Unit. The automated capability included the application of sequences of simulated low-level hardware faults, the precise measurement of fault latency times, the identification of fault symptoms, and bulk storage of test case results. A PDP-11/60 served as test coordinator, and a PDP-11/04 as an instrumentation device. The fault injector was controlled by applications test software in the PDP-11/60, rather than by manual commands from a terminal keyboard. The time base was especially developed for this application to use a variety of signal sources in the system simulator.
NASA Astrophysics Data System (ADS)
Meskin, Matin
The rate of integration of distributed generation (DG) units at the distribution level is increasing as a reasonable alternative to costly network expansion for meeting growth in demand. This integration brings many advantages to consumers and power grids, as well as giving rise to more challenges in relation to protection and control. Recent research has brought to light the negative effects of DG units on short circuit currents and overcurrent (OC) protection systems in distribution networks. Change in the direction of fault current flow, increment or decrement of fault current magnitude, protection blinding, feeder sympathy trips, nuisance trips of interrupting devices, and the disruption of coordination between protective devices are some potential impacts of DG unit integration. Among other types of DG units, the integration of renewable energy resources into the electric grid has seen vast improvement in recent years. In particular, the interconnection of photovoltaic (PV) sources to medium voltage (MV) distribution networks has experienced a rapid increase in the last decade. In this work, the effect of a PV source on conventional OC relays in MV distribution networks is shown. It is indicated that PV output fluctuation, due to changes in solar radiation, causes the magnitude and direction of the current to change haphazardly. These variations may result in the poor operation of OC relays as the main protective devices in MV distribution networks. In other words, due to the bi-directional power flow and the fluctuation of current magnitude occurring in the presence of PV sources, a specific setting of OC relays is difficult to realize, and OC relays may trip even under normal conditions. To improve OC relay operation, a voltage-dependent overcurrent protection is proposed. Although this new method prevents the OC relay from maloperating, its ability to detect earth faults and high impedance faults is poor. Thus, a comprehensive protective system is suggested at the end of the dissertation. The proposed method is based on the application of phasor measurement units (PMUs) and the differential protection method. All of the current magnitudes and angles are collected by PMUs and sent to a phasor data concentrator (PDC), where a differential protection algorithm is applied to the data. If any fault is detected, a trip signal is sent back to the corresponding circuit breakers across the network. The differential scheme offers higher selectivity and sensitivity and faster operation than other protection schemes. Differential protection operates as unit protection, meaning it operates only when there is a fault within the protection zone and does not function for faults occurring out of zone; therefore, no coordination is required between differential protections across the power system. Moreover, misoperation of this protective scheme is less likely than with other protection methods.
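A minimal sketch of the PMU-fed percentage differential check described above; the restraint factor, pickup, and phasor values are invented, and a real scheme would time-align the PMU frames at the PDC first:

```python
import cmath

def differential_trip(phasors, k_restraint=0.2, pickup=50.0):
    """Percentage differential check over one protection zone.

    phasors: complex currents (A) measured by PMUs at every zone boundary,
    with positive flow defined as into the zone. Trips when the operating
    (differential) current exceeds a restrained threshold.
    """
    i_op = abs(sum(phasors))                 # operating current
    i_res = sum(abs(i) for i in phasors)     # restraint current
    return i_op > max(pickup, k_restraint * i_res)

# In-zone fault: current flows in from both ends and does not cancel
infeed = [cmath.rect(800, 0.1), cmath.rect(650, 0.2)]
print(differential_trip(infeed))    # True -> trip sent to the breakers

# Through-load or out-of-zone fault: in equals out, differential ~ 0
through = [cmath.rect(400, 0.0), cmath.rect(400, cmath.pi)]
print(differential_trip(through))   # False -> scheme stays secure
```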
RAMP: A fault tolerant distributed microcomputer structure for aircraft navigation and control
NASA Technical Reports Server (NTRS)
Dunn, W. R.
1980-01-01
RAMP consists of distributed sets of parallel computers partitioned on the basis of software and packaging constraints. To minimize hardware and software complexity, the processors operate asynchronously. It was shown that, through the design of asymptotically stable control laws, data errors due to the asynchronism were minimized. It was further shown that by designing control laws with this property and making minor hardware modifications to the RAMP modules, the system became inherently tolerant to intermittent faults. A laboratory version of RAMP was constructed and is described in the paper along with the experimental results.
Finite Element Simulations of Kaikoura, NZ Earthquake using DInSAR and High-Resolution DSMs
NASA Astrophysics Data System (ADS)
Barba, M.; Willis, M. J.; Tiampo, K. F.; Glasscoe, M. T.; Clark, M. K.; Zekkos, D.; Stahl, T. A.; Massey, C. I.
2017-12-01
Three-dimensional displacements from the Kaikoura, NZ, earthquake in November 2016 are imaged here using Differential Interferometric Synthetic Aperture Radar (DInSAR), high-resolution Digital Surface Model (DSM) differencing, and optical pixel tracking. Full-resolution co- and post-seismic interferograms of Sentinel-1A/B images are constructed using the JPL ISCE software. The OSU SETSM software is used to produce repeat 0.5 m posting DSMs from commercial satellite imagery, which are supplemented with UAV-derived DSMs over the Kaikoura fault rupture on the eastern South Island, NZ. DInSAR provides long-wavelength motions, while DSM differencing and optical pixel tracking provide both horizontal and vertical near-fault motions, improving the modeling of shallow rupture dynamics. JPL GeoFEST software is used to perform finite element modeling of the fault segments and slip distributions and, in turn, the associated asperity distribution. The asperity profile is then used to simulate event rupture, the spatial distribution of stress drop, and the associated stress changes. Finite element modeling of slope stability is accomplished using the ultra-high-resolution UAV-derived DSMs to examine the evolution of post-earthquake topography, landslide dynamics, and volumes. Results include new insights into the shallow dynamics of fault slip and partitioning, estimates of stress change, and improved understanding of its relationship with the associated seismicity, deformation, and triggered cascading hazards.
Fuzzy-Wavelet Based Double Line Transmission System Protection Scheme in the Presence of SVC
NASA Astrophysics Data System (ADS)
Goli, Ravikumar; Shaik, Abdul Gafoor; Tulasi Ram, Sankara S.
2015-06-01
The needs to increase power transfer capability, utilize available transmission lines efficiently, improve power system controllability and stability, damp power oscillations, and provide voltage compensation have driven the development of Flexible AC Transmission System (FACTS) devices in recent decades. Shunt FACTS devices can have adverse effects on distance protection in both steady-state and transient periods. Severe under-reaching, caused by current injection at the point of connection to the system, is the most important relay problem; current absorption by the compensator leads to relay overreach. This work presents an efficient wavelet-transform-based method for fault detection, classification, and location using a fuzzy logic technique that is almost independent of fault impedance, fault distance, and fault inception angle. The proposed protection scheme is found to be fast, reliable, and accurate for various types of faults on transmission lines with and without a Static Var Compensator at different locations and with various inception angles.
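To make the wavelet/fuzzy idea concrete, a sketch using PyWavelets (assumed available): the standard deviation of detail coefficients serves as a transient-energy feature, and a single invented membership function stands in for the paper's fuzzy rule base:

```python
import numpy as np
import pywt   # PyWavelets, assumed available

def wavelet_features(signal, wavelet='db4', level=3):
    """Standard deviation of detail coefficients per level: fault-transient
    energy concentrates in the finer scales."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return [np.std(c) for c in coeffs[1:]]   # skip the approximation band

def fault_membership(features, lo=0.05, hi=0.5):
    """Toy fuzzy-style rule: map the dominant detail-band deviation to a
    fault membership in [0, 1]. Thresholds are illustrative placeholders."""
    x = max(features)
    return float(np.clip((x - lo) / (hi - lo), 0.0, 1.0))

t = np.linspace(0, 0.2, 2000)
current = np.sin(2 * np.pi * 50 * t)
current[1000:] *= 6.0                               # crude fault inception at t = 0.1 s
print(fault_membership(wavelet_features(current)))  # higher membership flags a fault
```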
NASA Technical Reports Server (NTRS)
Brenner, Richard; Lala, Jaynarayan H.; Nagle, Gail A.; Schor, Andrei; Turkovich, John
1994-01-01
This program demonstrated the integration of a number of technologies that can increase the availability and reliability of launch vehicles while lowering costs. Availability is increased with an advanced guidance algorithm that adapts trajectories in real-time. Reliability is increased with fault-tolerant computers and communication protocols. Costs are reduced by automatically generating code and documentation. This program was realized through the cooperative efforts of academia, industry, and government. The NASA-LaRC coordinated the effort, while Draper performed the integration. Georgia Institute of Technology supplied a weak Hamiltonian finite element method for optimal control problems. Martin Marietta used MATLAB to apply this method to a launch vehicle (FENOC). Draper supplied the fault-tolerant computing and software automation technology. The fault-tolerant technology includes sequential and parallel fault-tolerant processors (FTP & FTPP) and authentication protocols (AP) for communication. Fault-tolerant technology was incrementally incorporated. Development culminated with a heterogeneous network of workstations and fault-tolerant computers using AP. Draper's software automation system, ASTER, was used to specify a static guidance system based on FENOC, navigation, flight control (GN&C), models, and the interface to a user interface for mission control. ASTER generated Ada code for GN&C and C code for models. An algebraic transform engine (ATE) was developed to automatically translate MATLAB scripts into ASTER.
The embedded software life cycle - An expanded view
NASA Technical Reports Server (NTRS)
Larman, Brian T.; Loesh, Robert E.
1989-01-01
Six common issues that are encountered in the development of software for embedded computer systems are discussed from the perspective of their interrelationships with the development process and/or the system itself. Particular attention is given to concurrent hardware/software development, prototyping, the inaccessibility of the operational system, fault tolerance, the long life cycle, and inheritance. It is noted that the life cycle for embedded software must include elements beyond simply the specification and implementation of the target software.
NASA Technical Reports Server (NTRS)
Leveson, Nancy
1987-01-01
Software safety and its relationship to other qualities are discussed. It is shown that standard reliability and fault tolerance techniques will not solve the safety problem for the present. A new attitude is required: look at what you do NOT want software to do along with what you want it to do, and assume things will go wrong. New procedures and changes to the entire software development process are necessary: special software safety analysis techniques are needed, and design techniques, especially eliminating complexity, can be very helpful.
A Genetic Representation for Evolutionary Fault Recovery in Virtex FPGAs
NASA Technical Reports Server (NTRS)
Lohn, Jason; Larchev, Greg; DeMara, Ronald; Korsmeyer, David (Technical Monitor)
2003-01-01
Most evolutionary approaches to fault recovery in FPGAs focus on evolving alternative logic configurations as opposed to evolving the intra-cell routing. Since the majority of transistors in a typical FPGA are dedicated to interconnect, nearly 80% according to one estimate, evolutionary fault-recovery systems should benefit by accommodating routing. In this paper, we propose an evolutionary fault-recovery system employing a genetic representation that takes into account both logic and routing configurations. Experiments were run using a software model of the Xilinx Virtex FPGA. We report that using four Virtex combinational logic blocks, we were able to evolve a 100% accurate quadrature decoder finite state machine in the presence of a stuck-at-zero fault.
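A tiny genetic algorithm over a genome that spans both logic and routing bits, mirroring the representation argued for above; the bit counts, GA parameters, and stand-in fitness function are all hypothetical:

```python
import random

def evolve(fitness, logic_bits=64, routing_bits=192, pop=20, gens=50):
    """Minimal GA whose genome covers LUT contents and routing bits.
    A real system would score genomes with a simulated, faulted FPGA."""
    genome_len = logic_bits + routing_bits   # routing dominates, as in real parts
    popn = [[random.randint(0, 1) for _ in range(genome_len)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=fitness, reverse=True)
        parents = popn[:pop // 2]            # truncation selection
        children = []
        for _ in range(pop - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(genome_len)
            child = a[:cut] + b[cut:]        # one-point crossover
            i = random.randrange(genome_len)
            child[i] ^= 1                    # point mutation
            children.append(child)
        popn = parents + children
    return max(popn, key=fitness)

# Placeholder fitness (ones density); a real one would be the fraction of
# correct decoder outputs produced by the faulted FPGA model.
best = evolve(lambda g: sum(g) / len(g))
print(sum(best) / len(best))   # approaches 1.0 as evolution proceeds
```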
Reinforcement and Drill by Microcomputer.
ERIC Educational Resources Information Center
Balajthy, Ernest
1984-01-01
Points out why drill work has a role in the language arts classroom, explores the possibilities of using a microcomputer to give children drill work, and discusses the characteristics of a good software program, along with faults found in many software programs. (FL)
Koley, Ebha; Verma, Khushaboo; Ghosh, Subhojit
2015-01-01
Restrictions on right of way and increasing power demand have boosted the development of six-phase transmission. It offers a viable alternative for transmitting more power without major modification of the existing structure of three-phase double-circuit transmission systems. In spite of these advantages, the low acceptance of six-phase systems is attributed to the unavailability of a proper protection scheme. The complexity arising from the large number of possible faults in six-phase lines makes protection quite challenging. The proposed work presents a hybrid wavelet transform and modular artificial neural network based fault detector, classifier, and locator for six-phase lines using single-end data only. The standard deviation of the approximate coefficients of voltage and current signals obtained using the discrete wavelet transform is applied as input to the modular artificial neural network for fault classification and location. The proposed scheme has been tested for all 120 types of shunt faults with variation in location, fault resistance, and fault inception angle. The variation in power system parameters, viz. short circuit capacity of the source and its X/R ratio, voltage, frequency, and CT saturation, has also been investigated. The results confirm the effectiveness and reliability of the proposed protection scheme, which makes it suitable for real-time implementation.
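A sketch of one module of such a modular scheme using scikit-learn (assumed available); the feature vectors stand in for the standard deviations of wavelet approximate coefficients of the six phase currents plus a residual channel, and are synthetic:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier   # scikit-learn, assumed available

# One detector module trained on synthetic stand-in feature vectors;
# a full scheme would add parallel classifier/locator modules per fault type.
rng = np.random.default_rng(0)
healthy = rng.normal(0.1, 0.02, size=(200, 7))   # low wavelet-band deviations
faulted = rng.normal(0.6, 0.15, size=(200, 7))   # elevated deviations after a fault
X = np.vstack([healthy, faulted])
y = np.array([0] * 200 + [1] * 200)

detector = MLPClassifier(hidden_layer_sizes=(12,), max_iter=2000, random_state=0)
detector.fit(X, y)
print(detector.score(X, y))   # near 1.0 on this separable toy data
```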
Foundations for Protecting Renewable-Rich Distribution Systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ellis, Abraham; Brahma, Sukumar; Ranade, Satish
High proliferation of Inverter Interfaced Distributed Energy Resources (IIDERs) into the electric distribution grid introduces new challenges to protection of such systems. This is because existing protection systems are designed with two assumptions: (1) the system is single-sourced, resulting in unidirectional fault current, and (2) fault currents are easily detectable due to their much higher magnitudes compared to load currents. Because most renewables interface with the grid through inverters, and inverters restrict their current output to levels close to the full load current, both assumptions are no longer valid: the system becomes multi-sourced, and overcurrent-based protection does not work. The primary scope of this study is to analyze the response of a grid-tied inverter to different faults in the grid, leading to new guidelines on protecting renewable-rich distribution systems.
Novel Directional Protection Scheme for the FREEDM Smart Grid System
NASA Astrophysics Data System (ADS)
Sharma, Nitish
This research primarily deals with the design and validation of the protection system for a large scale meshed distribution system. The large scale system simulation (LSSS) is a system-level PSCAD model which is used to validate component models for different time-scale platforms, to provide a virtual testing platform for the Future Renewable Electric Energy Delivery and Management (FREEDM) system. It is also used to validate cases of power system protection, renewable energy integration and storage, and load profiles. The protection of the FREEDM system against any abnormal condition is one of the important tasks. The addition of distributed generation and the power electronic based solid state transformer adds to the complexity of the protection. The FREEDM loop system has a fault current limiter and, in addition, the Solid State Transformer (SST) limits the fault current to 2.0 per unit. Former students at ASU developed a protection scheme using fiber-optic cable; however, during the NSF-FREEDM site visit, the National Science Foundation (NSF) team regarded that scheme as unsuitable for long distances. Hence, a new protection scheme based on wireless communication is presented in this thesis. The use of wireless communication is extended to protect the large scale meshed distributed generation from any fault. The trip signal generated by the pilot protection system is used to trigger the FID (fault isolation device), an electronic circuit breaker, to open. The trip signal must also be received and accepted by the SST, which must block its operation immediately. A comprehensive protection system for the large scale meshed distribution system has been developed in PSCAD with the ability to quickly detect faults. The validation of the protection system is performed by building a hardware model using commercial relays at the ASU power laboratory.
Microprocessor-controlled wide-range streak camera
NASA Astrophysics Data System (ADS)
Lewis, Amy E.; Hollabaugh, Craig
2006-08-01
Bechtel Nevada/NSTec recently announced deployment of their fifth generation streak camera. This camera incorporates many advanced features beyond those currently available for streak cameras. The arc-resistant driver includes a trigger lockout mechanism, actively monitors input trigger levels, and incorporates a high-voltage fault interrupter for user safety and tube protection. The camera is completely modular and may deflect over a variable full-sweep time of 15 nanoseconds to 500 microseconds. The camera design is compatible with both large- and small-format commercial tubes from several vendors. The embedded microprocessor offers Ethernet connectivity, and XML [extensible markup language]-based configuration management with non-volatile parameter storage using flash-based storage media. The camera's user interface is platform-independent (Microsoft Windows, Unix, Linux, Macintosh OSX) and is accessible using an AJAX [asynchronous Javascript and XML]-equipped modern browser, such as Internet Explorer 6, Firefox, or Safari. User interface operation requires no installation of client software or browser plug-in technology. Automation software can also access the camera configuration and control using HTTP [hypertext transfer protocol]. The software architecture supports multiple-simultaneous clients, multiple cameras, and multiple module access with a standard browser. The entire user interface can be customized.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Habib, Hany F; El Hariri, Mohamad; Elsayed, Ahmed
Microgrids' adaptive protection techniques rely on communication signals from the point of common coupling to adjust the corresponding relays' settings for either grid-connected or islanded modes of operation. However, during communication outages or in the event of a cyberattack, relay settings are not changed, and adaptive protection schemes are rendered unsuccessful. Due to their fast response, supercapacitors, which are present in the microgrid to feed pulse loads, could also be utilized to enhance the resiliency of adaptive protection schemes to communication outages. Proper sizing of the supercapacitor is therefore important in order to maintain stable system operation and also regulate the protection scheme's cost. This paper presents a two-level optimization scheme for minimizing the supercapacitor size along with optimizing its controller parameters. The latter leads to a reduction of the supercapacitor's fault current contribution and an increase in that of other AC resources in the microgrid in the extreme case of a fault occurring simultaneously with a pulse load. It was also shown that the size of the supercapacitor can be reduced if the pulse load is temporarily disconnected during the transient fault period. Simulations showed that the supercapacitor size and optimized controller parameters resulting from the proposed two-level optimization scheme fed sufficient fault current for different types of faults while minimizing the cost of the protection scheme.
NASA Technical Reports Server (NTRS)
Landano, M. R.; Easter, R. W.
1984-01-01
Aspects of Space Station automated systems testing and verification are discussed, taking into account several program requirements. It is found that these requirements lead to a number of uncertainties that require study and resolution during the Space Station definition phase. Most, if not all, of the considered uncertainties have implications for the overall testing and verification strategy adopted by the Space Station Program. A description is given of the Galileo Orbiter fault protection design/verification approach. Attention is given to a mission description, an Orbiter description, the design approach and process, the fault protection design verification approach/process, and problems of 'stress' testing.
Graphical workstation capability for reliability modeling
NASA Technical Reports Server (NTRS)
Bavuso, Salvatore J.; Koppen, Sandra V.; Haley, Pamela J.
1992-01-01
In addition to computational capabilities, software tools for estimating the reliability of fault-tolerant digital computer systems must also provide a means of interfacing with the user. Described here is the new graphical interface capability of the hybrid automated reliability predictor (HARP), a software package that implements advanced reliability modeling techniques. The graphics oriented (GO) module provides the user with a graphical language for modeling system failure modes through the selection of various fault-tree gates, including sequence-dependency gates, or by a Markov chain. By using this graphical input language, a fault tree becomes a convenient notation for describing a system. In accounting for any sequence dependencies, HARP converts the fault-tree notation to a complex stochastic process that is reduced to a Markov chain, which it can then solve for system reliability. The graphics capability is available for use on an IBM-compatible PC, a Sun, and a VAX workstation. The GO module is written in the C programming language and uses the Graphical Kernel System (GKS) standard for graphics implementation. The PC, VAX, and Sun versions of the HARP GO module are currently in beta-testing stages.
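As an illustration of the Markov-chain solution step that follows the fault-tree conversion, a three-state standby system solved with a matrix exponential; the rates and mission time are invented, and SciPy is assumed available:

```python
import numpy as np
from scipy.linalg import expm   # SciPy, assumed available

# Two-unit standby system as a three-state Markov chain:
# state 0 = both good, 1 = one failed, 2 = system failed (absorbing).
lam, mu = 1e-3, 1e-2            # failure and repair rates per hour (illustrative)
Q = np.array([[-2 * lam, 2 * lam,      0.0],
              [mu,      -(mu + lam),   lam],
              [0.0,      0.0,          0.0]])   # generator matrix, rows sum to zero

p0 = np.array([1.0, 0.0, 0.0])  # start with both units good
t = 1000.0                      # mission time in hours
p_t = p0 @ expm(Q * t)          # state probabilities at time t: p(t) = p0 exp(Qt)
print(1.0 - p_t[2])             # reliability = P(not absorbed in the failed state)
```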
Graymer, R.W.; Ponce, D.A.; Jachens, R.C.; Simpson, R.W.; Phelps, G.A.; Wentworth, C.M.
2005-01-01
In order to better understand mechanisms of active faults, we studied relationships between fault behavior and rock units along the Hayward fault using a three-dimensional geologic map. The three-dimensional map, constructed from hypocenters, potential field data, and surface map data, provided a geologic map of each fault surface, showing rock units on either side of the fault truncated by the fault. The two fault-surface maps were superimposed to create a rock-rock juxtaposition map. The three maps were compared with seismicity, including aseismic patches, surface creep, and fault dip along the fault, by using visualization software to explore three-dimensional relationships. Fault behavior appears to be correlated to the fault-surface maps, but not to the rock-rock juxtaposition map, suggesting that properties of individual wall-rock units, including rock strength, play an important role in fault behavior. Although preliminary, these results suggest that any attempt to understand the detailed distribution of earthquakes or creep along a fault should include consideration of the rock types that abut the fault surface, including the incorporation of observations of physical properties of the rock bodies that intersect the fault at depth. © 2005 Geological Society of America.
Protection Relaying Scheme Based on Fault Reactance Operation Type
NASA Astrophysics Data System (ADS)
Tsuji, Kouichi
The theories of operation of existing relays fall roughly into two types: current differential types based on Kirchhoff's first law, and impedance types based on the second law. Kirchhoff's laws can be applied to formulate fault phenomena strictly, so the circuit equations are represented as nonlinear simultaneous equations in the variables fault point k and fault resistance Rf. This approach has two defects: 1) a heavy computational burden for the iterative calculation of the Newton-Raphson method, and 2) relay operators cannot easily understand the principle behind the numerical matrix operations. The new protection relay principle proposed in this paper focuses on the fact that the reactance component at the fault point is almost zero. Two reactances, Xf(S) and Xf(R), at the two ends of a branch are calculated by solving linear equations. If the signs of Xf(S) and Xf(R) are not the same, it can be judged that the fault point lies within the branch. The reactance Xf corresponds to the difference in branch reactance between the actual fault point and an imaginary fault point, so a relay engineer can understand fault location through the familiar concept of "distance". Simulation results using this new method indicate highly precise estimation of fault locations compared with inspected fault locations on operating transmission lines.
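A simplified sketch of the sign test: referenced to the near terminal the fault reactance is positive, referenced to the far terminal it is negative, so opposite signs bracket an in-branch fault. This ignores fault resistance and remote infeed and replaces the paper's linear-equation solution with plain apparent-reactance estimates, so it is an illustration only:

```python
def internal_fault(v_s, i_s, v_r, i_r, x_line):
    """Two-end reactance check. v_*, i_* are complex phasors (V, A) at the
    S and R terminals; x_line is the total branch reactance (ohms).
    For each end, distance-to-fault reactance xf satisfies 0 < xf < x_line
    only for an in-branch fault, i.e. (xf - 0) and (xf - x_line) have
    opposite signs relative to the two imaginary fault points."""
    xf_s = (v_s / i_s).imag   # apparent reactance to the fault, seen from S
    xf_r = (v_r / i_r).imag   # apparent reactance to the fault, seen from R
    return xf_s * (xf_s - x_line) < 0 and xf_r * (xf_r - x_line) < 0

# Invented bolted fault 4 ohms from S on a 10 ohm line (6 ohms from R)
i_s, i_r = 1000 + 0j, 800 + 0j
v_s, v_r = i_s * 4j, i_r * 6j
print(internal_fault(v_s, i_s, v_r, i_r, x_line=10.0))   # True -> in-branch fault
```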
Research on Fault Characteristics and Line Protections Within a Large-scale Photovoltaic Power Plant
NASA Astrophysics Data System (ADS)
Zhang, Chi; Zeng, Jie; Zhao, Wei; Zhong, Guobin; Xu, Qi; Luo, Pandian; Gu, Chenjie; Liu, Bohan
2017-05-01
Centralized photovoltaic (PV) systems have different fault characteristics from distributed PV systems due to their different structures and controls. This makes the fault analysis and protection methods used in distribution networks with distributed PV unsuitable for a centralized PV power plant. Therefore, a consolidated expression for the fault current within a PV power plant under different controls was derived, considering the fault response of the PV array. Then, supported by the fault current analysis and on-site testing data, the overcurrent relay (OCR) performance was evaluated in the collection system of an 850 MW PV power plant. The evaluation reveals that OCRs at the downstream side of overhead lines may malfunction. In this case, a new relay scheme was proposed using directional distance elements. In PSCAD/EMTDC, a detailed PV system model was built and verified using the on-site testing data. Simulation results indicate that the proposed relay scheme could effectively solve the problems under various fault scenarios and PV plant output levels.
Automatically generated acceptance test: A software reliability experiment
NASA Technical Reports Server (NTRS)
Protzel, Peter W.
1988-01-01
This study presents results of a software reliability experiment investigating the feasibility of a new error detection method. The method can be used as an acceptance test and is solely based on empirical data about the behavior of internal states of a program. The experimental design uses the existing environment of a multi-version experiment previously conducted at the NASA Langley Research Center, in which the launch interceptor problem is used as a model. This allows the controlled experimental investigation of versions with well-known single and multiple faults, and the availability of an oracle permits the determination of the error detection performance of the test. Fault interaction phenomena are observed that have an amplifying effect on the number of error occurrences. Preliminary results indicate that all faults examined so far are detected by the acceptance test. This shows promise for further investigations, and for the employment of this test method on other applications.
NASA Astrophysics Data System (ADS)
Alshammari, A.; Brantley, D.; Knapp, C. C.; Lakshmi, V.
2017-12-01
In this study, multiple chemical components (H2O, H2S) will be injected with supercritical carbon dioxide into the onshore part of the South Georgia Rift (SGR) Basin model. Chemical reactions are expected between these components, producing stable carbonate minerals over time. The 3D geological model was extracted using the Petrel software, and the Computer Modelling Group (CMG) software package was used to build a simulation model of the effect of mineralization on fault permeability, which critically controls plume migration in the range 0-0.05 mD. The expected results will be correlated with the single-component case (CO2 only) to evaluate the importance of mineralization for CO2 plume migration in structural and stratigraphic traps and to detect the variation of fault leakage at critical (low-permeability) values. The results will also show the ratio of each trapped phase in the SGR basin reservoir model.
An SSME High Pressure Oxidizer Turbopump diagnostic system using G2 real-time expert system
NASA Technical Reports Server (NTRS)
Guo, Ten-Huei
1991-01-01
An expert system which diagnoses various seal leakage faults in the High Pressure Oxidizer Turbopump of the SSME was developed using G2 real-time expert system. Three major functions of the software were implemented: model-based data generation, real-time expert system reasoning, and real-time input/output communication. This system is proposed as one module of a complete diagnostic system for the SSME. Diagnosis of a fault is defined as the determination of its type, severity, and likelihood. Since fault diagnosis is often accomplished through the use of heuristic human knowledge, an expert system based approach has been adopted as a paradigm to develop this diagnostic system. To implement this approach, a software shell which can be easily programmed to emulate the human decision process, the G2 Real-Time Expert System, was selected. Lessons learned from this implementation are discussed.
Progressive retry for software error recovery in distributed systems
NASA Technical Reports Server (NTRS)
Wang, Yi-Min; Huang, Yennun; Fuchs, W. K.
1993-01-01
In this paper, we describe a method of execution retry for bypassing software errors based on checkpointing, rollback, message reordering and replaying. We demonstrate how rollback techniques, previously developed for transient hardware failure recovery, can also be used to recover from software faults by exploiting message reordering to bypass software errors. Our approach intentionally increases the degree of nondeterminism and the scope of rollback when a previous retry fails. Examples from our experience with telecommunications software systems illustrate the benefits of the scheme.
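A schematic of the progressive retry loop, with hypothetical hooks (run, checkpoints, reorder) standing in for a checkpoint-and-replay runtime; each failed attempt widens the rollback scope and injects more nondeterminism before replaying:

```python
def progressive_retry(run, checkpoints, reorder):
    """Try the nearest checkpoint first; on failure, escalate to an older
    checkpoint with more message reordering allowed during replay."""
    for level, ckpt in enumerate(checkpoints):
        try:
            return run(restore=ckpt, shuffle_messages=reorder(level))
        except Exception:
            continue   # same software error reproduced: widen the scope
    raise RuntimeError("all retry levels exhausted; escalate to operator")

# Toy runtime: the error only disappears once we roll back far enough
attempts = []
def run(restore, shuffle_messages):
    attempts.append(restore)
    if restore == "ckpt-latest":
        raise IOError("software error reproduced on replay")
    return "completed"

print(progressive_retry(run, ["ckpt-latest", "ckpt-older"],
                        reorder=lambda lvl: lvl > 0))   # 'completed'
print(attempts)   # ['ckpt-latest', 'ckpt-older']
```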
NASA Astrophysics Data System (ADS)
Indah, F. P.; Syafriani, S.; Andiyansyah, Z. S.
2018-04-01
Sumatra lies in an active subduction zone between the Indo-Australian plate and the Eurasian plate and is cut by the Sumatran fault, so the island is vulnerable to earthquakes. One way to find the cause of an earthquake is to identify the type of earthquake-causing fault from the earthquake's focal mechanism. The data used to identify the fault types are moment tensor data sourced from the Global CMT catalog for the period 1976-2016, restricted to magnitudes M ≥ 6. This research uses the GMT (Generic Mapping Tools) software to depict the faults. The results show the characteristics of the fault planes formed in every region of Sumatra based on the processed earthquake history of the 1976-2016 period: the fault type on the Sumatran fault is strike-slip, the fault type on the Mentawai fault is reverse (thrust) and dip-slip, while the fault type in the subduction zone is dip-slip.
A research program in empirical computer science
NASA Technical Reports Server (NTRS)
Knight, J. C.
1991-01-01
During the grant reporting period our primary activities have been to begin preparation for the establishment of a research program in experimental computer science. The focus of research in this program will be safety-critical systems. Many questions that arise in the effort to improve software dependability can only be addressed empirically. For example, there is no way to predict the performance of the various proposed approaches to building fault-tolerant software. Performance models, though valuable, are parameterized and cannot be used to make quantitative predictions without experimental determination of underlying distributions. In the past, experimentation has been able to shed some light on the practical benefits and limitations of software fault tolerance. It is common, also, for experimentation to reveal new questions or new aspects of problems that were previously unknown. A good example is the Consistent Comparison Problem that was revealed by experimentation and subsequently studied in depth. The result was a clear understanding of a previously unknown problem with software fault tolerance. The purpose of a research program in empirical computer science is to perform controlled experiments in the area of real-time, embedded control systems. The goal of the various experiments will be to determine better approaches to the construction of the software for computing systems that have to be relied upon. As such it will validate research concepts from other sources, provide new research results, and facilitate the transition of research results from concepts to practical procedures that can be applied with low risk to NASA flight projects. The target of experimentation will be the production software development activities undertaken by any organization prepared to contribute to the research program. Experimental goals, procedures, data analysis and result reporting will be performed for the most part by the University of Virginia.
Khoshgoftaar, T M; Allen, E B; Hudepohl, J P; Aud, S J
1997-01-01
Society relies on telecommunications to such an extent that telecommunications software must have high reliability. Enhanced measurement for early risk assessment of latent defects (EMERALD) is a joint project of Nortel and Bell Canada for improving the reliability of telecommunications software products. This paper reports a case study of neural-network modeling techniques developed for the EMERALD system. The resulting neural network is currently in the prototype testing phase at Nortel. Neural-network models can be used to identify fault-prone modules for extra attention early in development, and thus reduce the risk of operational problems with those modules. We modeled a subset of modules representing over seven million lines of code from a very large telecommunications software system. The set consisted of those modules reused with changes from the previous release. The dependent variable was membership in the class of fault-prone modules. The independent variables were principal components of nine measures of software design attributes. We compared the neural-network model with a nonparametric discriminant model and found the neural-network model had better predictive accuracy.
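A sketch of the modeling pipeline described above (principal components of design measures feeding a neural network), built with scikit-learn on synthetic stand-in data since the EMERALD measurements are proprietary:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Nine design measures per module; label = fault-prone or not.
# Synthetic stand-ins: fault-proneness driven by the first three measures.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 9))
y = (X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=500) > 1.0).astype(int)

model = make_pipeline(
    PCA(n_components=4),                      # principal components, as in the paper
    MLPClassifier(hidden_layer_sizes=(8,), max_iter=3000, random_state=1))
model.fit(X[:400], y[:400])
print(model.score(X[400:], y[400:]))          # holdout accuracy on the toy data
```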
Extended Testability Analysis Tool
NASA Technical Reports Server (NTRS)
Melcher, Kevin; Maul, William A.; Fulton, Christopher
2012-01-01
The Extended Testability Analysis (ETA) Tool is a software application that supports fault management (FM) by performing testability analyses on the fault propagation model of a given system. Fault management includes the prevention of faults through robust design margins and quality assurance methods, or the mitigation of system failures. Fault management requires an understanding of the system design and operation, potential failure mechanisms within the system, and the propagation of those potential failures through the system. The purpose of the ETA Tool software is to process the testability analysis results from a commercial software program called TEAMS Designer in order to provide a detailed set of diagnostic assessment reports. The ETA Tool is a command-line process with several user-selectable report output options. The ETA Tool extends the COTS testability analysis and enables variation studies with sensor sensitivity impacts on system diagnostics and component isolation using a single testability output. The ETA Tool can also provide extended analyses from a single set of testability output files. The following analysis reports are available to the user: (1) the Detectability Report provides a breakdown of how each tested failure mode was detected, (2) the Test Utilization Report identifies all the failure modes that each test detects, (3) the Failure Mode Isolation Report demonstrates the system's ability to discriminate between failure modes, (4) the Component Isolation Report demonstrates the system's ability to discriminate between failure modes relative to the components containing the failure modes, (5) the Sensor Sensitivity Analysis Report shows the diagnostic impact due to loss of sensor information, and (6) the Effect Mapping Report identifies failure modes that result in specified system-level effects.
Modeling of a latent fault detector in a digital system
NASA Technical Reports Server (NTRS)
Nagel, P. M.
1978-01-01
Methods of modeling the detection time or latency period of a hardware fault in a digital system are proposed that explain how a computer detects faults in a computational mode. The objectives were to study how software reacts to a fault, to account for as many variables as possible affecting detection, and to forecast a given program's detecting ability prior to computation. A series of experiments was conducted on a small emulated microprocessor with fault injection capability. Results indicate that the detecting capability of a program depends largely on the instruction subset used during computation and the frequency of its use, and has little direct dependence on such variables as fault mode, number set, degree of branching, and program length. A model is discussed which employs an analogy with balls in an urn to explain the rate at which subsequent repetitions of an instruction or instruction set detect a given fault.
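The balls-in-an-urn analogy lends itself to a quick simulation: each executed instruction is a draw, some fraction of draws expose the fault, and latency is the number of draws until the first detection (the fraction and limits below are invented):

```python
import random

def detection_latency(detecting_fraction, trials=10000, max_steps=2000):
    """Mean number of instruction executions until a fault is detected,
    when each execution is an independent draw from the instruction mix
    and a fixed fraction of 'balls' are fault-exposing instructions."""
    total = 0
    for _ in range(trials):
        for step in range(1, max_steps + 1):
            if random.random() < detecting_fraction:
                total += step          # fault detected on this draw
                break
        else:
            total += max_steps         # undetected within the run (latent)
    return total / trials

print(detection_latency(0.05))   # ~20 executions when 5% of draws detect
```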
Comparative analysis of techniques for evaluating the effectiveness of aircraft computing systems
NASA Technical Reports Server (NTRS)
Hitt, E. F.; Bridgman, M. S.; Robinson, A. C.
1981-01-01
Performability analysis is a technique developed for evaluating the effectiveness of fault-tolerant computing systems in multiphase missions. Performability was evaluated for its accuracy, practical usefulness, and relative cost. The evaluation was performed by applying performability and the fault tree method to a set of sample problems ranging from simple to moderately complex. The problems involved as many as five outcomes, two to five mission phases, permanent faults, and some functional dependencies. Transient faults and software errors were not considered. A different analyst was responsible for each technique. Significantly more time and effort were required to learn performability analysis than the fault tree method. Performability is inherently as accurate as fault tree analysis. For the sample problems, fault trees were more practical and less time consuming to apply, while performability required less ingenuity and was more checkable. Performability offers some advantages for evaluating very complex problems.
Proceedings of the Twenty-Third Annual Software Engineering Workshop
NASA Technical Reports Server (NTRS)
1999-01-01
The Twenty-third Annual Software Engineering Workshop (SEW) provided 20 presentations designed to further the goals of the Software Engineering Laboratory (SEL) of NASA-GSFC. The presentations were selected for their creativity. The sessions, held on 2-3 December 1998, centered on the SEL, Experimentation, Inspections, Fault Prediction, Verification and Validation, and Embedded Systems and Safety-Critical Systems.
Operational Suitability Guide. Volume 2. Templates
1990-05-01
Intended mission, and the required technical and operational characteristics. The mission must be adequately defined and key hardware and software ... operational availability. With the use of fault-tolerant computer hardware and software, the system R&M will significantly improve end-to-end ... should include both hardware and software elements, as appropriate. Unique characteristics or unique support concepts should be identified if they result ...
An Incremental Life-cycle Assurance Strategy for Critical System Certification
2014-11-04
Embedded software systems introduce a new class of problems not addressed by traditional system modeling & analysis ... Latency jitter affects control behavior ... Why do system-level failures still occur despite fault tolerance techniques being deployed in systems? Embedded software systems are a major source of ...
Resilience Design Patterns - A Structured Approach to Resilience at Extreme Scale (version 1.0)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hukerikar, Saurabh; Engelmann, Christian
Reliability is a serious concern for future extreme-scale high-performance computing (HPC) systems. Projections based on the current generation of HPC systems and technology roadmaps suggest very high fault rates in future systems. The errors resulting from these faults will propagate and generate various kinds of failures, which may result in outcomes ranging from result corruptions to catastrophic application crashes. Practical limits on power consumption in HPC systems will require future systems to embrace innovative architectures, increasing the levels of hardware and software complexity. The resilience challenge for extreme-scale HPC systems requires management of various hardware and software technologies that are capable of handling a broad set of fault models at accelerated fault rates. These techniques must seek to improve resilience at reasonable overheads to power consumption and performance. While the HPC community has developed various solutions, application-level as well as system-based, the solution space of HPC resilience techniques remains fragmented. There are no formal methods and metrics to investigate and evaluate resilience holistically in HPC systems that consider impact scope, handling coverage, and performance and power efficiency across the system stack. Additionally, few of the current approaches are portable to newer architectures and software ecosystems, which are expected to be deployed on future systems. In this document, we develop a structured approach to the management of HPC resilience based on the concept of resilience design patterns. A design pattern is a general repeatable solution to a commonly occurring problem. We identify the commonly occurring problems and solutions used to deal with faults, errors, and failures in HPC systems. The catalog of resilience design patterns provides designers with reusable design elements. We define a design framework that enhances our understanding of the important constraints and opportunities for solutions deployed at various layers of the system stack. The framework may be used to establish mechanisms and interfaces to coordinate flexible fault management across hardware and software components. The framework also enables optimization of the cost-benefit trade-offs among performance, resilience, and power consumption. The overall goal of this work is to enable a systematic methodology for the design and evaluation of resilience technologies in extreme-scale HPC systems that keep scientific applications running to a correct solution in a timely and cost-efficient manner in spite of frequent faults, errors, and failures of various types.
NASA Technical Reports Server (NTRS)
Dewberry, Brandon S.
1990-01-01
The Environmental Control and Life Support System (ECLSS) is a Freedom Station distributed system with inherent applicability to advanced automation primarily due to the comparatively large reaction times of its subsystem processes. This allows longer contemplation times in which to form a more intelligent control strategy and to detect or prevent faults. The objective of the ECLSS Advanced Automation Project is to reduce the flight and ground manpower needed to support the initial and evolutionary ECLS system. The approach is to search out and make apparent those processes in the baseline system which are in need of more automatic control and fault detection strategies, to influence the ECLSS design by suggesting software hooks and hardware scars which will allow easy adaptation to advanced algorithms, and to develop complex software prototypes which fit into the ECLSS software architecture and will be shown in an ECLSS hardware testbed to increase the autonomy of the system. Covered here are the preliminary investigation and evaluation process, aimed at searching the ECLSS for candidate functions for automation and providing a software hooks and hardware scars analysis. This analysis shows changes needed in the baselined system for easy accommodation of knowledge-based or other complex implementations which, when integrated in flight or ground sustaining engineering architectures, will produce a more autonomous and fault tolerant Environmental Control and Life Support System.
NASA Technical Reports Server (NTRS)
Obrien, Edward M.
1991-01-01
An investigation was undertaken to make the electrocardiography (ECG) and electromyography (EMG) signal conditioning circuits two-fault tolerant and to update the circuitry. The present signal conditioning circuits provide at least one level of subject protection against electrical shock hazard, but only at a level of 100 micro-A (for voltages of up to 200 V). However, it is necessary to provide catastrophic fault tolerance protection for the astronauts and to provide protection at a current level of less than 100 micro-A. For this study, protection at the 10 micro-A level was sought; this is the generally accepted value below which no possibility of microshock exists. Only the possibility of macroshock exists in the case of the signal conditioners, but this extra amount of protection is desirable. The initial part deals with current limiter circuits, followed by an investigation into the signal conditioner specifications and circuit design.
Software Requirements Analysis as Fault Predictor
NASA Technical Reports Server (NTRS)
Wallace, Dolores
2003-01-01
Waiting until the integration and system test phase to discover errors leads to more costly rework than resolving those same errors earlier in the lifecycle. Costs increase even more significantly once a software system has become operational. We can assess the quality of system requirements, but do little to correlate this information either to system assurance activities or to long-term reliability projections, both of which remain unclear and anecdotal. Extending earlier work on requirements accomplished by the ARM tool, measuring requirements quality information against code complexity and test data for the same system may be used to predict the specific software modules containing high-impact or deeply embedded faults now escaping into operational systems. Such knowledge would lead to more effective and efficient test programs. It may also enable insight into whether a program should be maintained or started over.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhatti, A.A.
1990-04-01
This paper examines the effects of primary and secondary fault quantities, as well as of mutual couplings of neighboring circuits, on the sensitivity of operation and threshold settings of a microcomputer-based differential protection of UHV lines under selective phase switching. Microcomputer-based selective phase switching allows the disconnection of the minimum number of phases involved in a fault and requires the autoreclosing of these phases immediately after the extinction of the secondary arc. During a primary fault, a heavy current contribution to the healthy phases tends to cause an unwanted tripping. Faulty phases physically disconnected constitute an isolated fault which, being coupled to the system, affects the current and voltage levels of the healthy phases still retained in the system and may cause an unwanted tripping. The microcomputer-based differential protection appears to have poor performance when applied to uncompensated lines employing selective pole switching.
NASA Technical Reports Server (NTRS)
Fitz, Rhonda; Whitman, Gerek
2016-01-01
Research into complexities of software systems Fault Management (FM) and how architectural design decisions affect safety, preservation of assets, and maintenance of desired system functionality has coalesced into a technical reference (TR) suite that advances the provision of safety and mission assurance. The NASA Independent Verification and Validation (IVV) Program, with Software Assurance Research Program support, extracted FM architectures across the IVV portfolio to evaluate robustness, assess visibility for validation and test, and define software assurance methods applied to the architectures and designs. This investigation spanned IVV projects with seven different primary developers, a wide range of sizes and complexities, and encompassed Deep Space Robotic, Human Spaceflight, and Earth Orbiter mission FM architectures. The initiative continues with an expansion of the TR suite to include Launch Vehicles, adding the benefit of investigating differences intrinsic to model-based FM architectures and insight into complexities of FM within an Agile software development environment, in order to improve awareness of how nontraditional processes affect FM architectural design and system health management.
A framework for software fault tolerance in real-time systems
NASA Technical Reports Server (NTRS)
Anderson, T.; Knight, J. C.
1983-01-01
A classification scheme for errors and a technique for the provision of software fault tolerance in cyclic real-time systems are presented. The technique requires that the process structure of a system be represented by a synchronization graph, which is used by an executive as a specification of the relative times at which the processes will communicate during execution. Communication between concurrent processes is severely limited and may only take place between processes engaged in an exchange. A history of error occurrences is maintained by an error handler. When an error is detected, the error handler classifies it using the error history information and then initiates appropriate recovery action.
The implementation and use of Ada on distributed systems with high reliability requirements
NASA Technical Reports Server (NTRS)
Knight, J. C.
1984-01-01
The use and implementation of Ada in distributed environments in which reliability is the primary concern is investigated. Emphasis is placed on the possibility that a distributed system may be programmed entirely in Ada, so that the individual tasks of the system are unconcerned with which processors they are executing on, and that failures may occur in the software or underlying hardware. The primary activities are: (1) continued development and testing of our fault-tolerant Ada testbed; (2) consideration of desirable language changes to allow Ada to provide useful semantics for failure; (3) analysis of the inadequacies of existing software fault tolerance strategies.
NASA Technical Reports Server (NTRS)
Bodechtel, J. (Principal Investigator)
1975-01-01
The author has identified the following significant results. Geological interpretation of data covering the Italian peninsula led to the recognition of tectonic features which are explained by a clockwise rotation of various blocks along left-handed transform faults. These faults can be interpreted as resulting from shear due to a main stress directed north-eastwards. A land use map of the mountainous regions of Italy was produced at a scale of 1:250,000. For the digital treatment of MSS CCTs, image-processing software was written in FORTRAN IV. The software package includes descriptive statistics as well as classification algorithms.
The Development of Design Tools for Fault Tolerant Quantum Dot Cellular Automata Based Logic
NASA Technical Reports Server (NTRS)
Armstrong, Curtis D.; Humphreys, William M.
2003-01-01
We are developing software to explore the fault tolerance of quantum dot cellular automata gate architectures in the presence of manufacturing variations and device defects. The Topology Optimization Methodology using Applied Statistics (TOMAS) framework extends the capabilities of A Quantum Interconnected Network Array Simulator (AQUINAS) by adding front-end and back-end software and creating an environment that integrates all of these components. The front-end tools establish all simulation parameters, configure the simulation system, automate the Monte Carlo generation of simulation files, and execute the simulation of these files. The back-end tools perform automated data parsing, statistical analysis, and report generation.
Monitoring microearthquakes with the San Andreas fault observatory at depth
Oye, V.; Ellsworth, W.L.
2007-01-01
In 2005, the San Andreas Fault Observatory at Depth (SAFOD) was drilled through the San Andreas Fault zone at a depth of about 3.1 km. The borehole has subsequently been instrumented with high-frequency geophones in order to better constrain locations and source processes of nearby microearthquakes that will be targeted in the upcoming phase of SAFOD. The microseismic monitoring software MIMO, developed by NORSAR, has been installed at SAFOD to provide near-real-time locations and magnitude estimates using the high-sampling-rate (4000 Hz) waveform data. To improve the detection and location accuracy, we incorporate data from the nearby, shallow (~250 m) borehole seismometers of the High Resolution Seismic Network (HRSN). The event association algorithm of the MIMO software incorporates HRSN detections provided by the USGS real-time Earthworm software. The concept of the new event association is based on generalized beamforming, primarily used in array seismology. The method requires the pre-computation of theoretical travel times in a 3D grid of potential microearthquake locations to the seismometers of the current station network. By minimizing the differences between theoretical and observed detection times, an event is associated and the location accuracy is significantly improved.
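As a rough illustration of the event-association step (precomputed travel times on a 3D grid of candidate locations, matched against observed detection times), the following Python sketch grid-searches for the best-fitting node. The array shapes and the demeaning trick for eliminating the unknown origin time are assumptions for illustration, not details of the MIMO software.

```python
import numpy as np

def associate_event(obs_times, travel_times):
    """obs_times: (n_stations,) observed detection times in seconds.
    travel_times: (n_nodes, n_stations) precomputed theoretical times.
    Returns the best-fitting grid node and its RMS residual."""
    # Demeaning both sides removes the unknown event origin time.
    resid = (obs_times - obs_times.mean()) \
        - (travel_times - travel_times.mean(axis=1, keepdims=True))
    rms = np.sqrt((resid ** 2).mean(axis=1))
    best = int(np.argmin(rms))
    return best, float(rms[best])
```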
Fuzzy logic based on-line fault detection and classification in transmission line.
Adhikari, Shuma; Sinha, Nidul; Dorendrajit, Thingam
2016-01-01
This study presents fuzzy logic based online fault detection and classification of a transmission line using Programmable Automation and Control technology based on National Instruments Compact Reconfigurable I/O (CRIO) devices. The LabVIEW software combined with CRIO can perform real-time data acquisition on the transmission line. When a fault occurs in the system, current waveforms are distorted due to transients, and their pattern changes according to the type of fault. The three-phase alternating current, zero-sequence, and positive-sequence current data generated by LabVIEW through the CRIO-9067 are processed directly for relaying. The results show that the proposed technique is capable of correct tripping action and classification of the fault type at high speed, and can therefore be employed in practical applications.
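The abstract does not reproduce the fuzzy rules themselves, so the sketch below substitutes a crisp threshold classifier over per-unit phase and zero-sequence currents to show the shape of the decision: which phases show overcurrent and whether ground is involved. Thresholds, names, and labels are illustrative only.

```python
def classify_fault(ia, ib, ic, i0, over=2.0, ground=0.2):
    """Crisp stand-in for the paper's fuzzy classifier.
    ia..ic: per-unit RMS phase currents; i0: zero-sequence current.
    Thresholds `over` and `ground` are invented for illustration."""
    faulted = [p for p, i in zip("ABC", (ia, ib, ic)) if i > over]
    if not faulted:
        return "no fault"
    label = "".join(faulted)               # e.g. "A", "BC", "ABC"
    return label + ("-G" if i0 > ground else "")

# classify_fault(3.1, 0.9, 1.0, 0.6) -> "A-G" (single line-to-ground)
```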
Airborne Advanced Reconfigurable Computer System (ARCS)
NASA Technical Reports Server (NTRS)
Bjurman, B. E.; Jenkins, G. M.; Masreliez, C. J.; Mcclellan, K. L.; Templeman, J. E.
1976-01-01
A digital computer subsystem fault-tolerant concept was defined, and the potential benefits and costs of such a subsystem were assessed when used as the central element of a new transport's flight control system. The derived advanced reconfigurable computer system (ARCS) is a triple-redundant computer subsystem that automatically reconfigures, under multiple fault conditions, from triplex to duplex to simplex operation, with redundancy recovery if the fault condition is transient. The study included criteria development covering factors at the aircraft's operation level that would influence the design of a fault-tolerant system for commercial airline use. A new reliability analysis tool was developed for evaluating redundant, fault-tolerant system availability and survivability; and a stringent digital system software design methodology was used to achieve design/implementation visibility.
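The reconfiguration idea at the heart of such a subsystem can be suggested with a toy majority voter; this is the generic triple-modular-redundancy pattern, not the ARCS design itself. Channels that agree are kept, disagreeing ones are dropped, so operation degrades from triplex to duplex to simplex.

```python
from collections import Counter

def vote(outputs):
    """Majority vote over redundant channel outputs. Channels that
    disagree with the majority are retired, degrading triplex to
    duplex to simplex across cycles. Exact equality is used for
    simplicity; real designs compare within tolerance windows, and
    a two-channel disagreement triggers self-test instead."""
    value, _ = Counter(outputs).most_common(1)[0]
    healthy = [i for i, v in enumerate(outputs) if v == value]
    return value, healthy

# vote([5.0, 5.0, 7.2]) -> (5.0, [0, 1]): channel 2 is retired
```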
Various Indices for Diagnosis of Air-gap Eccentricity Fault in Induction Motor-A Review
NASA Astrophysics Data System (ADS)
Nikhil; Mathew, Lini, Dr.; Sharma, Amandeep
2018-03-01
Over the past few years, research has gained pace in the field of fault detection and diagnosis in induction motors. In the current scenario, software is being introduced with diagnostic features to improve the stability and reliability of fault diagnostic techniques. Human involvement in decision making for fault detection is slowly being replaced by Artificial Intelligence techniques. In this paper, a brief introduction to eccentricity faults is presented, along with their causes and effects on the health of induction motors. Various indices used to detect eccentricity are introduced along with their boundary conditions and the future scope of research. Finally, the merits and demerits of all indices are discussed and compared.
NASA Astrophysics Data System (ADS)
Bonanno, Emanuele; Bonini, Lorenzo; Basili, Roberto; Toscani, Giovanni; Seno, Silvio
2016-04-01
Fault-related folding kinematic models are widely used to explain how crustal shortening is accommodated. These models, however, include simplifications, such as the assumption of a constant fault growth rate. This rate is sometimes not constant even in isotropic materials, and it is even more variable in naturally anisotropic geological systems, which means that these simplifications can lead to incorrect interpretations of reality. In this study, we use analogue models to evaluate how thin mechanical discontinuities, such as bedding or thin weak layers, influence the propagation of reverse faults and related folds. The experiments are performed with two different settings to simulate initially blind master faults dipping at 30° and 45°: the 30° dip represents one of the Andersonian conjugate faults, and the 45° dip is very frequent in positive reactivation of normal faults. The experimental apparatus consists of a clay layer placed above two plates: one plate, the footwall, is fixed; the other, the hanging wall, is mobile. Motor-controlled sliding of the hanging-wall plate along an inclined plane reproduces the reverse fault movement. We ran thirty-six experiments: eighteen with a dip of 30° and eighteen with a dip of 45°. For each dip-angle setting, we initially ran isotropic experiments to serve as a reference; we then ran the other experiments with one or two discontinuities (horizontal precuts made in the clay layer). We monitored the experiments by collecting side photographs every 1.0 mm of displacement of the master fault. These images were analyzed with the PIVlab software, a tool based on the Digital Image Correlation method. With the displacement field analysis (one of the PIVlab tools), we evaluated the variation of the trishear-zone shape and how the master-fault tip and newly formed faults propagate into the clay medium. With the strain distribution analysis, we observed the amount of on-fault and off-fault deformation with respect to the faulting pattern and its evolution. Secondly, using the MOVE software, we extracted the positions of fault tips and folds every 5 mm of displacement on the master fault. Analyzing these positions in all of the experiments, we found that the growth rate of the faults and the related fold shape vary depending on the number of discontinuities in the clay medium. Other results can be summarized as follows: 1) the fault growth rate is not constant, but varies especially while the new faults interact with the precuts; 2) the new faults tend to crosscut the discontinuities when the angle between them is approximately 90°; 3) the trishear zone changes its shape during the experiments, especially when the main fault interacts with the discontinuities.
Model-Based Fault Diagnosis: Performing Root Cause and Impact Analyses in Real Time
NASA Technical Reports Server (NTRS)
Figueroa, Jorge F.; Walker, Mark G.; Kapadia, Ravi; Morris, Jonathan
2012-01-01
Generic, object-oriented fault models, built according to causal-directed graph theory, have been integrated into an overall software architecture dedicated to monitoring and predicting the health of mission-critical systems. Processing over the generic fault models is triggered by event detection logic that is defined according to the specific functional requirements of the system and its components. Once triggered, the fault models provide an automated way of performing both upstream root cause analysis (RCA) and prediction of downstream effects (impact analysis). The methodology has been applied to integrated system health management (ISHM) implementations at NASA SSC's Rocket Engine Test Stands (RETS).
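A minimal sketch of the causal-directed-graph idea, not the NASA implementation: edges record cause-to-effect links, root cause analysis walks the graph upstream from a triggered event, and impact analysis walks it downstream. All names are illustrative.

```python
from collections import defaultdict

class FaultGraph:
    """Causal directed graph: edges run cause -> effect. RCA walks
    upstream from a triggered event; impact analysis walks downstream."""
    def __init__(self):
        self.up = defaultdict(set)     # effect -> direct causes
        self.down = defaultdict(set)   # cause -> direct effects

    def add_causal_link(self, cause, effect):
        self.up[effect].add(cause)
        self.down[cause].add(effect)

    def _reachable(self, start, edges):
        seen, stack = set(), [start]
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def root_causes(self, event):      # upstream RCA
        return self._reachable(event, self.up)

    def impacts(self, event):          # downstream impact analysis
        return self._reachable(event, self.down)
```

For example, after `add_causal_link("valve stuck", "low pressure")` and `add_causal_link("low pressure", "turbine underspeed")`, `root_causes("turbine underspeed")` returns both upstream candidates, while `impacts("valve stuck")` lists everything the fault can propagate to.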
Dataflow models for fault-tolerant control systems
NASA Technical Reports Server (NTRS)
Papadopoulos, G. M.
1984-01-01
Dataflow concepts are used to generate a unified hardware/software model of redundant physical systems which are prone to faults. Basic results in input congruence and synchronization are shown to reduce to a simple model of data exchanges between processing sites. Procedures are given for the construction of congruence schemata, the distinguishing features of any correctly designed redundant system.
Fault Injection Campaign for a Fault Tolerant Duplex Framework
NASA Technical Reports Server (NTRS)
Sacco, Gian Franco; Ferraro, Robert D.; von llmen, Paul; Rennels, Dave A.
2007-01-01
Fault tolerance is an efficient approach adopted to avoid or reduce the damage of a system failure. In this work we present the results of a fault injection campaign we conducted on the Duplex Framework (DF). The DF is software developed by the UCLA group [1, 2] that uses a fault-tolerant approach and allows two replicas of the same process to run on two different nodes of a commercial off-the-shelf (COTS) computer cluster. A third process, running on a different node, constantly monitors the results computed by the two replicas and restarts them if an inconsistency in their computation is detected. This approach is very cost efficient and can be adopted to control processes on spacecraft, where the fault rate produced by cosmic rays is not very high.
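A toy version of the duplex pattern, with local processes standing in for the two cluster nodes and the monitor node; the real Duplex Framework distributes replicas across a COTS cluster, which this sketch does not attempt.

```python
import multiprocessing as mp

def run_duplex(task, args, restart_limit=3):
    """Run two replicas of task(args) and compare their results;
    on disagreement both replicas are restarted, up to a limit."""
    for _ in range(restart_limit):
        with mp.Pool(processes=2) as pool:
            r1, r2 = pool.map(task, [args, args])
        if r1 == r2:          # consistent: accept the result
            return r1
        # inconsistency detected: fall through and restart replicas
    raise RuntimeError("replicas kept disagreeing after restarts")
```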
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Schrenkenghost, Debra K.
2001-01-01
The Adjustable Autonomy Testbed (AAT) is a simulation-based testbed located in the Intelligent Systems Laboratory in the Automation, Robotics and Simulation Division at NASA Johnson Space Center. The purpose of the testbed is to support evaluation and validation of prototypes of adjustable autonomous agent software for control and fault management for complex systems. The AAT project has developed prototype adjustable autonomous agent software and human interfaces for cooperative fault management. This software builds on current autonomous agent technology by altering the architecture, components, and interfaces for effective teamwork between autonomous systems and human experts. Autonomous agents include a planner, a flexible executive, low-level control, and deductive model-based fault isolation. Adjustable autonomy is intended to increase the flexibility and effectiveness of fault management with an autonomous system. The test domain for this work is control of advanced life support systems for habitats for planetary exploration. The CONFIG hybrid discrete event simulation environment provides flexible and dynamically reconfigurable models of the behavior of components and fluids in the life support systems. Both discrete event and continuous (discrete time) simulation are supported, and flows and pressures are computed globally. This provides fast dynamic simulations of interacting hardware systems in closed loops that can be reconfigured during operations scenarios, producing complex cascading effects of operations and failures. Current object-oriented model libraries support modeling of fluid systems, and models have been developed of physico-chemical and biological subsystems for processing advanced life support gases. In FY01, water recovery system models will be developed.
Use of Fuzzy Logic Systems for Assessment of Primary Faults
NASA Astrophysics Data System (ADS)
Petrović, Ivica; Jozsa, Lajos; Baus, Zoran
2015-09-01
In electric power systems, grid elements are often subjected to very complex and demanding disturbances or dangerous operating conditions. Determining the initial fault or the cause of those states is a difficult task. When a fault occurs, it is often imperative to disconnect the affected element from the grid. This paper contains an overview of the possibilities for using fuzzy logic in the assessment of primary faults in the transmission grid. The tool for this task is the SCADA system, which is based on information about currents, voltages, protection-device events, and circuit-breaker status in the grid. The functional model, described with membership functions and fuzzy logic systems, is presented in the paper. As input data, the diagnostic system uses information on protection-device tripping, circuit-breaker states, and measurements of currents and voltages before and after faults.
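The paper's actual membership functions are not reproduced in the abstract, so the sketch below assumes a simple ramp membership for "deep voltage dip" and fuses it with crisp relay/breaker evidence via a fuzzy AND (min). Shapes, thresholds, and function names are illustrative only.

```python
def low_voltage(v_pu, full=0.4, zero=0.9):
    """Ramp membership for 'deep voltage dip': 1.0 at or below
    `full` p.u., 0.0 at or above `zero` p.u. (illustrative shape)."""
    if v_pu <= full:
        return 1.0
    if v_pu >= zero:
        return 0.0
    return (zero - v_pu) / (zero - full)

def primary_fault_degree(v_pu, relay_tripped, breaker_open):
    """Fuse the dip membership with crisp SCADA evidence (0/1 flags)
    using min as the fuzzy AND; the result is a degree in [0, 1]."""
    evidence = max(relay_tripped, breaker_open)
    return min(low_voltage(v_pu), evidence)

# primary_fault_degree(0.55, relay_tripped=1, breaker_open=0) -> 0.7
```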
Reliable High Performance Peta- and Exa-Scale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bronevetsky, G
2012-04-02
As supercomputers become larger and more powerful, they are growing increasingly complex. This is reflected both in the exponentially increasing numbers of components in HPC systems (LLNL is currently installing the 1.6 million core Sequoia system) and in the wide variety of software and hardware components that a typical system includes. At this scale it becomes infeasible to make each component sufficiently reliable to prevent regular faults somewhere in the system, or to account for all possible cross-component interactions. The resulting faults and instability cause HPC applications to crash, perform sub-optimally, or even produce erroneous results. As supercomputers continue to approach Exascale performance and full system reliability becomes prohibitively expensive, we will require novel techniques to bridge the gap between the lower reliability provided by hardware systems and users' unchanging need for consistent performance and reliable results. Previous research on HPC system reliability has developed various techniques for tolerating and detecting various types of faults. However, these techniques have seen very limited real applicability because of our poor understanding of how real systems are affected by complex faults such as soft fault-induced bit flips or performance degradations. Prior work on such techniques has had very limited practical utility because it has generally focused on analyzing the behavior of entire software/hardware systems, both during normal operation and in the face of faults. Because such behaviors are extremely complex, such studies have only produced coarse behavioral models of limited sets of software/hardware system stacks. Since this provides little insight into the many different system stacks and applications used in practice, this work has had little real-world impact. My project addresses this problem by developing a modular methodology to analyze the behavior of applications and systems during both normal and faulty operation. By synthesizing models of individual components into whole-system behavior models, my work is making it possible to automatically understand the behavior of arbitrary real-world systems so that they can tolerate a wide range of system faults. My project follows a multi-pronged research strategy. Section II discusses my work on modeling the behavior of existing applications and systems: Section II.A discusses resilience in the face of soft faults, and Section II.B looks at techniques to tolerate performance faults. Finally, Section III presents an alternative approach that studies how a system should be designed from the ground up to make resilience natural and easy.
NASA Technical Reports Server (NTRS)
Ferrell, Bob A.; Lewis, Mark E.; Perotti, Jose M.; Brown, Barbara L.; Oostdyk, Rebecca L.; Goetz, Jesse W.
2010-01-01
This paper's main purpose is to detail issues and lessons learned regarding designing, integrating, and implementing Fault Detection, Isolation and Recovery (FDIR) for Constellation Exploration Program (CxP) Ground Operations at Kennedy Space Center (KSC). As part of the overall implementation of the National Aeronautics and Space Administration's (NASA's) CxP, FDIR is being implemented in three main components of the program (Ares, Orion, and Ground Operations/Processing). While not initially part of the design baseline for CxP Ground Operations, NASA felt that FDIR was important enough to develop that NASA's Exploration Systems Mission Directorate's (ESMD's) Exploration Technology Development Program (ETDP) initiated a task for it under their Integrated System Health Management (ISHM) research area. This task, referred to as the FDIR project, is a multi-year, multi-center effort. The primary purpose of the FDIR project is to develop a prototype and a pathway by which Fault Detection and Isolation (FDI) may be transitioned into the Ground Operations baseline. Currently, Qualtech Systems Inc. (QSI) Commercial Off The Shelf (COTS) software products, Testability Engineering and Maintenance System (TEAMS) Designer and TEAMS RDS/RT, are being utilized in the implementation of FDI within the FDIR project. The TEAMS Designer COTS software product is being utilized to model the system with Functional Fault Models (FFMs). A limited set of systems in Ground Operations is being modeled by the FDIR project, and the entire Ares launch vehicle is being modeled under the Functional Fault Analysis (FFA) project at Marshall Space Flight Center (MSFC). Integration of the Ares FFMs and the Ground Processing FFMs is also being done under the FDIR project, utilizing the TEAMS Designer COTS software product. One of the most significant challenges related to integration is to ensure that FFMs developed by different organizations can be integrated easily and without errors. Software Interface Control Documents (ICDs) for the FFMs and their usage will be addressed as the solution to this issue. In particular, the advantages and disadvantages of these ICDs across physically separate development groups will be delineated.
Software Health Management: A Short Review of Challenges and Existing Techniques
NASA Technical Reports Server (NTRS)
Pipatsrisawat, Knot; Darwiche, Adnan; Mengshoel, Ole J.; Schumann, Johann
2009-01-01
Modern spacecraft (as well as most other complex mechanisms like aircraft, automobiles, and chemical plants) rely more and more on software, to a point where software failures have caused severe accidents and loss of missions. Software failures during a manned mission can cause loss of life, so there are severe requirements to make the software as safe and reliable as possible. Typically, verification and validation (V&V) has the task of making sure that all software errors are found before the software is deployed and that it always conforms to the requirements. Experience, however, shows that this gold standard of error-free software cannot be reached in practice. Even if the software alone is free of glitches, its interoperation with the hardware (e.g., with sensors or actuators) can cause problems. Unexpected operational conditions or changes in the environment may ultimately cause a software system to fail. Is there a way to surmount this problem? In most modern aircraft and many automobiles, hardware such as central electrical, mechanical, and hydraulic components are monitored by IVHM (Integrated Vehicle Health Management) systems. These systems can recognize, isolate, and identify faults and failures, both those that already occurred as well as imminent ones. With the help of diagnostics and prognostics, appropriate mitigation strategies can be selected (replacement or repair, switch to redundant systems, etc.). In this short paper, we discuss some challenges and promising techniques for software health management (SWHM). In particular, we identify unique challenges for preventing software failure in systems which involve both software and hardware components. We then present our classifications of techniques related to SWHM. These classifications are performed based on dimensions of interest to both developers and users of the techniques, and hopefully provide a map for dealing with software faults and failures.
NASA Astrophysics Data System (ADS)
Iigaya, Kiyohito
A robust, fast, and accurate protection system based on the pilot protection concept was developed previously; a few alterations were made to that algorithm to make it faster and more reliable, and it was then applied to smart distribution grids to verify the results. The new 10-sample window method was adapted into the pilot protection program, and its performance for the test-bed system operation was tabulated. The hardware results for the same algorithm were then compared with the simulation results. The development of the dual-slope percentage differential method, its comparison with the 10-sample average window pilot protection system, and the effects of CT saturation on the pilot protection system are also shown in this thesis. The 10-sample average window pilot protection system was implemented on multiple distribution grids, such as Green Hub v4.3, IEEE 34, the LSSS loop, and the modified LSSS loop. Case studies of these multi-terminal models are presented, and the results are shown in this thesis. The results show that the new algorithm for the previously proposed protection system successfully identifies faults on the test bed, that the hardware and software simulation results match, and that the response time is less than a quarter of a cycle, which is fast compared with present commercial protection systems and satisfies the FREEDM system requirement.
Gajare, Swaroop; Rao, J Ganeswara; Naidu, O D; Pradhan, Ashok Kumar
2017-08-13
Cascade tripping of power lines triggered by maloperation of zone-3 relays during stressed system conditions, such as load encroachment, power swing, and voltage instability, has led to many catastrophic power failures worldwide, including the Indian blackouts in 2012. With the introduction of wide-area measurement systems (WAMS) into the grids, real-time monitoring of transmission network conditions is possible. A phasor measurement unit (PMU) sends time-synchronized data to a phasor data concentrator, which can provide a control signal to substation devices. The latency associated with the communication system makes WAMS suitable for a slower form of protection. In this work, a method to identify the faulted line using synchronized data from strategic PMU locations is proposed. Subsequently, a supervisory signal is generated for specific relays in the system for any disturbance or stressed condition. For a given system, an approach to deciding the strategic locations for PMU placement is developed, which can be used to determine the minimum number of PMUs required for application of the method. The accuracy of the scheme is tested for faults during normal and stressed conditions in a New England 39-bus system simulated using EMTDC/PSCAD software. With such a strategy, maloperation of relays can be averted in many situations, and thereby blackouts and large-scale disturbances can be prevented.
Software quality: Process or people
NASA Technical Reports Server (NTRS)
Palmer, Regina; Labaugh, Modenna
1993-01-01
This paper will present data related to software development processes and personnel involvement from the perspective of software quality assurance. We examine eight years of data collected from six projects. Data collected varied by project but usually included defect and fault density, with limited use of code metrics, schedule adherence, and budget growth information. The data are a blend of AFSCP 800-14 and the productivity measures suggested in Software Metrics: A Practitioner's Guide to Improved Product Development. A software quality assurance database tool, SQUID, was used to store and tabulate the data.
A software upgrade method for micro-electronics medical implants.
Cao, Yang; Hao, Hongwei; Xue, Lin; Li, Luming; Ma, Bozhi
2006-01-01
A software upgrade method for microelectronic medical implants is designed to enhance the devices' function, or to renew the software if bugs are found, the software needs updating, or some memory units become disabled. The implants need not be replaced surgically if the faults can be corrected through reprogramming, which reduces the patients' pain and effectively improves safety. This paper introduces the software upgrade method using in-application programming (IAP) and emphasizes how to ensure the reliability and stability of the system, especially of the implanted part, while upgrading.
Analysis on Behaviour of Wavelet Coefficient during Fault Occurrence in Transformer
NASA Astrophysics Data System (ADS)
Sreewirote, Bancha; Ngaopitakkul, Atthapol
2018-03-01
The protection system for transformers plays a significant role in avoiding severe damage to equipment when disturbances occur and in ensuring overall system reliability. One of the methodologies widely used in protection schemes and algorithms is the discrete wavelet transform. However, the characteristics of the coefficients under fault conditions must be analyzed to ensure its effectiveness. This paper therefore presents a study and analysis of wavelet coefficient characteristics when a fault occurs in a transformer, in both the high- and low-frequency components of the discrete wavelet transform. The effects of internal and external faults on the wavelet coefficients of both the faulted and normal phases have been taken into consideration. The fault signals have been simulated using a transmission line connected to a transformer in a laboratory-scale experimental setup modelled after an actual system. The results, in terms of wavelet coefficients, show a clear differentiation between the wavelet characteristics in the high- and low-frequency components, which can be used to further design and improve detection and classification algorithms based on the discrete wavelet transform methodology in the future.
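As a hedged illustration of the kind of analysis described (examining high- and low-frequency DWT coefficients around a fault transient), the following uses PyWavelets on a synthetic 50 Hz current with an injected transient. The sampling rate, wavelet ('db4'), and decomposition level are assumptions for the example, not the paper's settings.

```python
import numpy as np
import pywt  # PyWavelets

fs = 10_000                                       # assumed sampling rate, Hz
t = np.arange(0, 0.2, 1 / fs)
current = np.sin(2 * np.pi * 50 * t)              # 50 Hz current waveform
current[1000:1040] += 0.8 * np.random.randn(40)   # injected fault transient

# Three-level DWT: cD1 is the highest-frequency detail band, where
# fault-induced transients stand out; cA3 carries the low-frequency content.
cA3, cD3, cD2, cD1 = pywt.wavedec(current, "db4", level=3)
print("max |cD1| =", np.abs(cD1).max())           # spikes under fault
```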
(Quickly) Testing the Tester via Path Coverage
NASA Technical Reports Server (NTRS)
Groce, Alex
2009-01-01
The configuration complexity and code size of an automated testing framework may grow to a point that the tester itself becomes a significant software artifact, prone to poor configuration and implementation errors. Unfortunately, testing the tester by using old versions of the software under test (SUT) may be impractical or impossible: test framework changes may have been motivated by interface changes in the tested system, or fault detection may become too expensive in terms of computing time to justify running until errors are detected on older versions of the software. We propose the use of path coverage measures as a "quick and dirty" method for detecting many faults in complex test frameworks. We also note the possibility of using techniques developed to diversify state-space searches in model checking to diversify test focus, and an associated classification of tester changes into focus-changing and non-focus-changing modifications.
Advanced information processing system: Local system services
NASA Technical Reports Server (NTRS)
Burkhardt, Laura; Alger, Linda; Whittredge, Roy; Stasiowski, Peter
1989-01-01
The Advanced Information Processing System (AIPS) is a multi-computer architecture composed of hardware and software building blocks that can be configured to meet a broad range of application requirements. The hardware building blocks are fault-tolerant, general-purpose computers, fault-and damage-tolerant networks (both computer and input/output), and interfaces between the networks and the computers. The software building blocks are the major software functions: local system services, input/output, system services, inter-computer system services, and the system manager. The foundation of the local system services is an operating system with the functions required for a traditional real-time multi-tasking computer, such as task scheduling, inter-task communication, memory management, interrupt handling, and time maintenance. Resting on this foundation are the redundancy management functions necessary in a redundant computer and the status reporting functions required for an operator interface. The functional requirements, functional design and detailed specifications for all the local system services are documented.
A second generation experiment in fault-tolerant software
NASA Technical Reports Server (NTRS)
Knight, J. C.
1986-01-01
Information was collected on the efficacy of fault-tolerant software by conducting two large-scale controlled experiments. In the first, an empirical study of multi-version software (MVS) was conducted. The second experiment is an empirical evaluation of self-testing as a method of error detection (STED). The purpose of the MVS experiment was to obtain empirical measurements of the performance of multi-version systems. Twenty versions of a program were prepared at four different sites under reasonably realistic development conditions from the same specifications. The purpose of the STED experiment was to obtain empirical measurements of the performance of assertions in error detection. Eight versions of a program were modified to include assertions at two different sites under controlled conditions. The overall structure of the testing environment for the MVS experiment and its status are described. Work to date in the STED experiment is also presented.
An Integrated Crustal Dynamics Simulator
NASA Astrophysics Data System (ADS)
Xing, H. L.; Mora, P.
2007-12-01
Numerical modelling offers an outstanding opportunity to gain an understanding of crustal dynamics and complex crustal system behaviour. This presentation describes our long-term and ongoing effort in finite-element-based computational model and software development to simulate interacting fault systems for earthquake forecasting. An R-minimum-strategy-based finite-element computational model and software tool, PANDAS, for modelling 3-dimensional nonlinear frictional contact behaviour between multiple deformable bodies with an arbitrarily-shaped contact element strategy has been developed by the authors. It builds up a virtual laboratory to simulate interacting fault systems, including crustal boundary conditions and various nonlinearities (e.g., from frictional contact, materials, geometry, and thermal coupling). It has been successfully applied to large-scale computing of complex nonlinear phenomena in non-continuum media involving nonlinear frictional instability, multiple material properties, and complex geometries on supercomputers, such as the South Australia (SA) interacting fault system, the Southern California fault model, and the Sumatra subduction model. It has also been extended to simulate the hot fractured rock (HFR) geothermal reservoir system, in collaboration with Geodynamics Ltd, which is constructing the first geothermal reservoir system in Australia, and to model tsunami generation induced by earthquakes. Both are supported by the Australian Research Council.
Three-Dimensional Geologic Map of the Hayward Fault Zone, San Francisco Bay Region, California
Phelps, G.A.; Graymer, R.W.; Jachens, R.C.; Ponce, D.A.; Simpson, R.W.; Wentworth, C.M.
2008-01-01
A three-dimensional (3D) geologic map of the Hayward Fault zone was created by integrating the results from geologic mapping, potential field geophysics, and seismology investigations. The map volume is 100 km long, 20 km wide, and extends to a depth of 12 km below sea level. The map volume is oriented northwest and is approximately bisected by the Hayward Fault. The complex geologic structure of the region makes it difficult to trace many geologic units into the subsurface. Therefore, the map units are generalized from 1:24,000-scale geologic maps. Descriptions of geologic units and structures are offered, along with a discussion of the methods used to map them and incorporate them into the 3D geologic map. The map spatial database and associated viewing software are provided. Elements of the map, such as individual fault surfaces, are also provided in a non-proprietary format so that the user can access the map via open-source software. The sheet accompanying this manuscript shows views taken from the 3D geologic map for the user to access. The 3D geologic map is designed as a multi-purpose resource for further geologic investigations and process modeling.
NASA Technical Reports Server (NTRS)
Harper, Richard E.; Elks, Carl
1995-01-01
An Army Fault Tolerant Architecture (AFTA) has been developed to meet real-time fault tolerant processing requirements of future Army applications. AFTA is the enabling technology that will allow the Army to configure existing processors and other hardware to provide high throughput and ultrahigh reliability necessary for TF/TA/NOE flight control and other advanced Army applications. A comprehensive conceptual study of AFTA has been completed that addresses a wide range of issues including requirements, architecture, hardware, software, testability, producibility, analytical models, validation and verification, common mode faults, VHDL, and a fault tolerant data bus. A Brassboard AFTA for demonstration and validation has been fabricated, and two operating systems and a flight-critical Army application have been ported to it. Detailed performance measurements have been made of fault tolerance and operating system overheads while AFTA was executing the flight application in the presence of faults.
FINDS: A fault inferring nonlinear detection system programmers manual, version 3.0
NASA Technical Reports Server (NTRS)
Lancraft, R. E.
1985-01-01
Detailed software documentation of the digital computer program FINDS (Fault Inferring Nonlinear Detection System) Version 3.0 is provided. FINDS is a highly modular and extensible computer program designed to monitor and detect sensor failures, while at the same time providing reliable state estimates. In this version of the program the FINDS methodology is used to detect, isolate, and compensate for failures in simulated avionics sensors used by the Advanced Transport Operating Systems (ATOPS) Transport System Research Vehicle (TSRV) in a Microwave Landing System (MLS) environment. It is intended that this report serve as a programmers guide to aid in the maintenance, modification, and revision of the FINDS software.
Adopting software quality measures for healthcare processes.
Yildiz, Ozkan; Demirörs, Onur
2009-01-01
In this study, we investigated the adoptability of software quality measures for healthcare process measurement. Quality measures of ISO/IEC 9126 are redefined from a process perspective to build a generic healthcare process quality measurement model. The case study research method is used, and the model is applied to a public hospital's Entry to Care process. After the application, weak and strong aspects of the process can be easily observed. Access auditability, fault removal, completeness of documentation, and machine utilization are weak aspects, and these aspects are candidates for process improvement. On the other hand, functional completeness, fault ratio, input validity checking, response time, and throughput time are the strong aspects of the process.
Implementation of a research prototype onboard fault monitoring and diagnosis system
NASA Technical Reports Server (NTRS)
Palmer, Michael T.; Abbott, Kathy H.; Schutte, Paul C.; Ricks, Wendell R.
1987-01-01
Due to the dynamic and complex nature of in-flight fault monitoring and diagnosis, a research effort was undertaken at NASA Langley Research Center to investigate the application of artificial intelligence techniques for improved situational awareness. Under this research effort, concepts were developed and a software architecture was designed to address the complexities of onboard monitoring and diagnosis. This paper describes the implementation of these concepts in a computer program called FaultFinder. The implementation of the monitoring, diagnosis, and interface functions as separate modules is discussed, as well as the blackboard designed for the communication of these modules. Some related issues concerning the future installation of FaultFinder in an aircraft are also discussed.
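FaultFinder's monitoring, diagnosis, and interface modules communicate through a blackboard; the sketch below shows only the generic blackboard pattern (post findings, notify subscribers), not FaultFinder's actual design, and all names are illustrative.

```python
class Blackboard:
    """Shared store through which separate modules (e.g. monitoring,
    diagnosis, interface) exchange findings without direct coupling."""
    def __init__(self):
        self.entries = {}          # topic -> list of posted findings
        self.subscribers = []      # callbacks invoked on every post

    def post(self, topic, finding):
        self.entries.setdefault(topic, []).append(finding)
        for callback in self.subscribers:
            callback(topic, finding)

    def subscribe(self, callback):
        self.subscribers.append(callback)

# board = Blackboard()
# board.subscribe(lambda t, f: print("diagnosis sees:", t, f))
# board.post("engine-1", {"symptom": "EGT high"})
```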
Designing application software in wide area network settings
NASA Technical Reports Server (NTRS)
Makpangou, Mesaac; Birman, Ken
1990-01-01
Progress in methodologies for developing robust local area network software has not been matched by similar results for wide area settings. The design of application software spanning multiple local area environments is examined. For important classes of applications, simple design techniques are presented that yield fault tolerant wide area programs. An implementation of these techniques as a set of tools for use within the ISIS system is described.
Digital Database of Recently Active Traces of the Hayward Fault, California
Lienkaemper, James J.
2006-01-01
The purpose of this map is to show the location of and evidence for recent movement on active fault traces within the Hayward Fault Zone, California. The mapped traces represent the integration of the following three different types of data: (1) geomorphic expression, (2) creep (aseismic fault slip), and (3) trench exposures. This publication is a major revision of an earlier map (Lienkaemper, 1992), which both brings the evidence for faulting up to date and makes it available both as a digital database for use within a geographic information system (GIS) and for broader interactive public access using widely available viewing software. The pamphlet describes in detail the types of scientific observations used to make the map, gives references pertaining to the fault and the evidence of faulting, and provides guidance for the use and limitations of the map. [Last revised Nov. 2008, a minor update for 2007 LiDAR and recent trench investigations; see version history below.]
A new method of converter transformer protection without commutation failure
NASA Astrophysics Data System (ADS)
Zhang, Jiayu; Kong, Bo; Liu, Mingchang; Zhang, Jun; Guo, Jianhong; Jing, Xu
2018-01-01
With the development of AC/DC hybrid transmission technology, the converter transformer, as the node of AC/DC conversion in HVDC transmission, must operate reliably, safely, and stably for DC transmission to succeed. Commutation failure, a common problem in DC transmission, poses a serious threat to the safe and stable operation of the power grid. According to the commutation relation between the AC bus voltage of the converter station and the output DC voltage of the converter, a generalized transformation ratio is defined, and a new converter transformer protection method based on this generalized ratio is put forward. The method uses the generalized ratio to realize on-line monitoring of faulty or abnormal commutation components and uses the current characteristics of the valve-side bushing CT to identify converter transformer faults accurately, without being influenced by the presence of commutation failure. Fault analysis and EMTDC/PSCAD simulation show that the protection operates correctly under various converter fault conditions.
NASA Technical Reports Server (NTRS)
King, Ellis; Hart, Jeremy; Odegard, Ryan
2010-01-01
The Orion Crew Exploration Vehicle (CEV) is being designed to include significantly more automation capability than either the Space Shuttle or the International Space Station (ISS). In particular, the vehicle flight software has requirements to accommodate increasingly automated missions throughout all phases of flight. A data-driven flight software architecture will provide an evolvable automation capability to sequence through Guidance, Navigation & Control (GN&C) flight software modes and configurations while maintaining the required flexibility and human control over the automation. This flexibility is a key aspect needed to address the maturation of operational concepts, to permit ground and crew operators to gain trust in the system, and to mitigate unpredictability in human spaceflight. To allow for mission flexibility and reconfigurability, a data-driven approach is being taken to load the mission event plan as well as the flight software artifacts associated with the GN&C subsystem. A database of GN&C-level sequencing data is presented which manages and tracks the mission-specific and algorithm parameters to provide a capability to schedule GN&C events within mission segments. The flight software data schema for performing automated mission sequencing is presented, with a concept of operations for interactions with ground and onboard crew members. A prototype architecture for fault identification, isolation, and recovery interactions with the automation software is presented and discussed as a forward work item.
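A minimal sketch of what data-driven sequencing can look like: the mission event plan is a table loaded at runtime rather than logic compiled into the flight software. The segment names, triggers, and modes below are invented for illustration and are not Orion's actual schema.

```python
# Hypothetical event plan loaded from a database at runtime; changing
# the mission means changing this data, not recompiling flight code.
EVENT_PLAN = [
    {"segment": "ascent", "trigger": "MECO",       "gnc_mode": "coast"},
    {"segment": "orbit",  "trigger": "burn_start", "gnc_mode": "powered"},
    {"segment": "entry",  "trigger": "ei_reached", "gnc_mode": "entry_guidance"},
]

def next_mode(segment, trigger, plan=EVENT_PLAN):
    """Look up the scheduled GN&C mode transition for an event."""
    for row in plan:
        if row["segment"] == segment and row["trigger"] == trigger:
            return row["gnc_mode"]
    return None   # no transition scheduled; crew/ground may override
```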
Thermal Expert System (TEXSYS): Systems autonomy demonstration project, volume 2. Results
NASA Technical Reports Server (NTRS)
Glass, B. J. (Editor)
1992-01-01
The Systems Autonomy Demonstration Project (SADP) produced a knowledge-based real-time control system for control and fault detection, isolation, and recovery (FDIR) of a prototype two-phase Space Station Freedom external active thermal control system (EATCS). The Thermal Expert System (TEXSYS) was demonstrated in recent tests to be capable of reliable fault anticipation and detection, as well as ordinary control of the thermal bus. Performance requirements were addressed by adopting a hierarchical symbolic control approach-layering model-based expert system software on a conventional, numerical data acquisition and control system. The model-based reasoning capabilities of TEXSYS were shown to be advantageous over typical rule-based expert systems, particularly for detection of unforeseen faults and sensor failures. Volume 1 gives a project overview and testing highlights. Volume 2 provides detail on the EATCS testbed, test operations, and online test results. Appendix A is a test archive, while Appendix B is a compendium of design and user manuals for the TEXSYS software.
Thermal Expert System (TEXSYS): Systems autonomy demonstration project, volume 1. Overview
NASA Technical Reports Server (NTRS)
Glass, B. J. (Editor)
1992-01-01
The Systems Autonomy Demonstration Project (SADP) produced a knowledge-based real-time control system for control and fault detection, isolation, and recovery (FDIR) of a prototype two-phase Space Station Freedom external active thermal control system (EATCS). The Thermal Expert System (TEXSYS) was demonstrated in recent tests to be capable of reliable fault anticipation and detection, as well as ordinary control of the thermal bus. Performance requirements were addressed by adopting a hierarchical symbolic control approach-layering model-based expert system software on a conventional, numerical data acquisition and control system. The model-based reasoning capabilities of TEXSYS were shown to be advantageous over typical rule-based expert systems, particularly for detection of unforeseen faults and sensor failures. Volume 1 gives a project overview and testing highlights. Volume 2 provides detail on the EATCS test bed, test operations, and online test results. Appendix A is a test archive, while Appendix B is a compendium of design and user manuals for the TEXSYS software.
Development and realization of the open fault diagnosis system based on XPE
NASA Astrophysics Data System (ADS)
Deng, Hui; Wang, TaiYong; He, HuiLong; Xu, YongGang; Zeng, JuXiang
2005-12-01
To keep complex mechanical equipment in good service, the technology for realizing an embedded open system is introduced systematically, including open hardware configuration, a customized embedded operating system, and an open software structure. ETX technology is adopted in this system, integrating the CPU main-board functions and achieving quick, real-time signal acquisition and intelligent data analysis by applying a DSP- and CPLD-based data acquisition card. Under the open configuration, the signal bus mode (such as PCI, ISA, or PC/104) can be selected, and the signal types can be chosen as well. In addition, by customizing the XPE system and adopting the Enhanced Write Filter (EWF), an authentically open system is realized and the stability of the system is enhanced. Multi-thread and multi-task programming techniques are adopted in the software. By interconnecting with the remote fault diagnosis center via the network interface, cooperative diagnosis is conducted and the intelligence of the fault diagnosis is improved.
Methodologies for Adaptive Flight Envelope Estimation and Protection
NASA Technical Reports Server (NTRS)
Tang, Liang; Roemer, Michael; Ge, Jianhua; Crassidis, Agamemnon; Prasad, J. V. R.; Belcastro, Christine
2009-01-01
This paper reports the latest development of several techniques for an adaptive flight envelope estimation and protection system for aircraft under damage upset conditions. Through the integration of advanced fault detection algorithms, real-time system identification of the damaged/faulted aircraft, and flight envelope estimation, real-time decision support can be executed autonomously to improve damage tolerance and flight recoverability. In particular, a bank of adaptive nonlinear fault detection and isolation estimators was developed for flight control actuator faults; a real-time system identification method was developed for assessing the dynamics and performance limitations of the impaired aircraft; and online learning neural networks were used to approximate selected aircraft dynamics, which were then inverted to estimate command margins. As off-line training of network weights is not required, the method has the advantage of adapting to varying flight conditions and different vehicle configurations. The key benefit of the envelope estimation and protection system is that it allows the aircraft to fly close to its limit boundary by constantly updating the controller command limits during flight. The developed techniques were demonstrated in NASA's Generic Transport Model (GTM) simulation environment with simulated actuator faults. Simulation results and remarks on future work are presented.
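The essence of envelope protection is clamping commands to limits that the estimator updates in flight. The minimal sketch below shows only that clamping step, with angle-of-attack limits as an assumed example; the paper's estimators and neural-network inversion are not reproduced here.

```python
def protect(command, envelope):
    """Clamp a pilot/autopilot command to the currently estimated
    envelope; `envelope` is updated online by the estimator."""
    lo, hi = envelope
    return min(max(command, lo), hi)

envelope = (-5.0, 12.0)              # nominal angle-of-attack limits, deg
envelope = (-3.0, 8.0)               # tightened after damage is identified
alpha_cmd = protect(10.0, envelope)  # -> 8.0, kept inside the new limits
```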
High-throughput state-machine replication using software transactional memory.
Zhao, Wenbing; Yang, William; Zhang, Honglei; Yang, Jack; Luo, Xiong; Zhu, Yueqin; Yang, Mary; Luo, Chaomin
2016-11-01
State-machine replication is a common way of constructing general purpose fault tolerance systems. To ensure replica consistency, requests must be executed sequentially according to some total order at all non-faulty replicas. Unfortunately, this could severely limit the system throughput. This issue has been partially addressed by identifying non-conflicting requests based on application semantics and executing these requests concurrently. However, identifying and tracking non-conflicting requests require intimate knowledge of application design and implementation, and a custom fault tolerance solution developed for one application cannot be easily adopted by other applications. Software transactional memory offers a new way of constructing concurrent programs. In this article, we present the mechanisms needed to retrofit existing concurrency control algorithms designed for software transactional memory for state-machine replication. The main benefit of using software transactional memory in state-machine replication is that general purpose concurrency control mechanisms can be designed without deep knowledge of application semantics. As such, new fault tolerance systems based on state-machine replication with excellent throughput can be easily designed and maintained. We introduce three different concurrency control mechanisms for state-machine replication using software transactional memory: ordered strong strict two-phase locking, conventional timestamp-based multiversion concurrency control, and speculative timestamp-based multiversion concurrency control. Our experiments show that the speculative timestamp-based multiversion concurrency control mechanism has the best performance in all types of workload, while the conventional timestamp-based multiversion concurrency control offers the worst performance, due to a high abort rate in the presence of even moderate contention between transactions. The ordered strong strict two-phase locking mechanism offers the simplest solution, with excellent performance in low-contention workloads and fairly good performance in high-contention workloads.
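A compact sketch of the first mechanism, ordered strong strict two-phase locking: each shared object gets a lock whose grants follow the replica-wide total order (sequence numbers), and locks are held until the request finishes. The sketch assumes, as a real implementation would enforce via declared lock sets, that an earlier request begins acquiring before any later conflicting one; the class and names are illustrative, not the authors' code.

```python
import heapq
import threading

class OrderedLock:
    """Per-object lock whose grants follow the total order: among
    waiting requests, the smallest sequence number is granted first,
    and the lock is held until release (the 'strict' part)."""
    def __init__(self):
        self.cv = threading.Condition()
        self.waiting = []          # min-heap of sequence numbers
        self.held = False

    def acquire(self, seq):
        with self.cv:
            heapq.heappush(self.waiting, seq)
            self.cv.wait_for(
                lambda: not self.held and self.waiting[0] == seq)
            heapq.heappop(self.waiting)
            self.held = True

    def release(self):
        with self.cv:
            self.held = False
            self.cv.notify_all()
```

Requests touching disjoint objects proceed concurrently, while conflicting requests serialize in sequence-number order, which is what keeps all replicas consistent.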
Interlock system for machine protection of the KOMAC 100-MeV proton linac
NASA Astrophysics Data System (ADS)
Song, Young-Gi
2015-02-01
The 100-MeV proton linear accelerator of the Korea Multi-purpose Accelerator Complex (KOMAC) has been developed, and beam service started this year after beam commissioning was completed. Protecting the machine's most sensitive and essential equipment during operation requires a machine interlock system, and such a system has been implemented. The purpose of the interlock system is to shut off the beam when the radio-frequency (RF) systems or the ion source become unstable or a beam loss occurs. The interlock signals of the KOMAC linac come from a variety of sources, such as beam loss, RF and high-voltage converter modulator faults, and the fast closing valves of the vacuum windows at the beam lines. The system consists of a hardware-based interlock subsystem using analog circuits and a software-based interlock subsystem using an industrial programmable logic controller (PLC). The hardware-based interlock subsystem has been fabricated, and its response-time requirement has been satisfied, with results within 10 µs. The software-based PLC interlock logic has been integrated with the Experimental Physics and Industrial Control System (EPICS) framework to consolidate the various interlock signals and to control the machine components when an interlock occurs. This paper describes the design and construction of the machine interlock system for the KOMAC 100-MeV linac.
NASA Technical Reports Server (NTRS)
2004-01-01
Topics covered include: Data Relay Board with Protocol for High-Speed, Free-Space Optical Communications; Software and Algorithms for Biomedical Image Data Processing and Visualization; Rapid Chemometric Filtering of Spectral Data; Prioritizing Scientific Data for Transmission; Determining Sizes of Particles in a Flow from DPIV Data; Faster Processing for Inverting GPS Occultation Data; FPGA-Based, Self-Checking, Fault-Tolerant Computers; Ultralow-Power Digital Correlator for Microwave Polarimetry; Grounding Headphones for Protection Against ESD; Lightweight Stacks of Direct Methanol Fuel Cells; Highly Efficient Vector-Inversion Pulse Generators; Estimating Basic Preliminary Design Performances of Aerospace Vehicles; Framework for Development of Object-Oriented Software; Analyzing Spacecraft Telecommunication Systems; Collaborative Planning of Robotic Exploration; Tools for Administration of a UNIX-Based Network; Preparing and Analyzing Iced Airfoils; Evaluating Performance of Components; Fuels Containing Methane of Natural Gas in Solution; Direct Electrolytic Deposition of Mats of MnxOy Nanowires; Bubble Eliminator Based on Centrifugal Flow; Inflatable Emergency Atmospheric-Entry Vehicles; Lightweight Deployable Mirrors with Tensegrity Supports; Centrifugal Adsorption Cartridge System; Ultrasonic Apparatus for Pulverizing Brittle Material; Transplanting Retinal Cells using Bucky Paper for Support; Using an Ultrasonic Instrument to Size Extravascular Bubbles; Coronagraphic Notch Filter for Raman Spectroscopy; On-the-Fly Mapping for Calibrating Directional Antennas; Working Fluids for Increasing Capacities of Heat Pipes; Computationally-Efficient Minimum-Time Aircraft Routes in the Presence of Winds; Liquid-Metal-Fed Pulsed Plasma Thrusters; Personal Radiation Protection System; and Attitude Control for a Solar-Sail Spacecraft.
Quantitative Measures for Software Independent Verification and Validation
NASA Technical Reports Server (NTRS)
Lee, Alice
1996-01-01
As software is maintained or reused, it undergoes an evolution that tends to increase the overall complexity of the code. To understand the effects of this, we brought in statistics experts and leading researchers in software complexity, reliability, and their interrelationships. Their project has given us the ability to statistically correlate specific code complexity attributes, in orthogonal domains, to errors found over time in the HAL/S flight software that flies on the Space Shuttle. Although only a prototype-tools experiment, the results of this research appear to be extendable to all other NASA software, given data similar to that logged for the Shuttle onboard software. Our research has demonstrated that more complete domain coverage can be mathematically demonstrated with the approach we have applied, thereby ensuring full insight into the cause-and-effect relationship between the complexity of a software system and the fault density of that system. By applying the operational profile, we can characterize the dynamic effects of software path complexity under this same approach. We now have the ability to measure specific attributes which have been statistically demonstrated to correlate with increased error probability, and to know which actions to take for each complexity domain. Shuttle software verifiers can now monitor changes in software complexity, assess the added or decreased risk of software faults in modified code, and determine necessary corrections. The reports, tool documentation, user's guides, and new approach that have resulted from this research effort represent advances in the state of the art of software quality and reliability assurance. Details describing how to apply this technique to other NASA code are contained in this document.
Software reliability models for critical applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pham, H.; Pham, M.
This report presents the results of the first phase of the ongoing EG&G Idaho, Inc. Software Reliability Research Program. The program is studying existing software reliability models and proposes a state-of-the-art software reliability model relevant to the nuclear reactor control environment. The report consists of three parts: (1) summaries of the literature review of existing software reliability and fault-tolerant software reliability models and their related issues, (2) a proposed technique for software reliability enhancement, and (3) general discussion and future research. The development of the proposed state-of-the-art software reliability model will be performed in the second phase. 407 refs., 4 figs., 2 tabs.
Resilience Design Patterns - A Structured Approach to Resilience at Extreme Scale (version 1.1)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hukerikar, Saurabh; Engelmann, Christian
Reliability is a serious concern for future extreme-scale high-performance computing (HPC) systems. Projections based on the current generation of HPC systems and technology roadmaps suggest the prevalence of very high fault rates in future systems. The errors resulting from these faults will propagate and generate various kinds of failures, which may result in outcomes ranging from result corruptions to catastrophic application crashes. Therefore the resilience challenge for extreme-scale HPC systems requires management of various hardware and software technologies that are capable of handling a broad set of fault models at accelerated fault rates. Also, due to practical limits on power consumption in HPC systems future systems are likely to embrace innovative architectures, increasing the levels of hardware and software complexities. As a result the techniques that seek to improve resilience must navigate the complex trade-off space between resilience and the overheads to power consumption and performance. While the HPC community has developed various resilience solutions, application-level techniques as well as system-based solutions, the solution space of HPC resilience techniques remains fragmented. There are no formal methods and metrics to investigate and evaluate resilience holistically in HPC systems that consider impact scope, handling coverage, and performance & power efficiency across the system stack. Additionally, few of the current approaches are portable to newer architectures and software environments that will be deployed on future systems. In this document, we develop a structured approach to the management of HPC resilience using the concept of resilience-based design patterns. A design pattern is a general repeatable solution to a commonly occurring problem. We identify the commonly occurring problems and solutions used to deal with faults, errors and failures in HPC systems. Each established solution is described in the form of a pattern that addresses concrete problems in the design of resilient systems. The complete catalog of resilience design patterns provides designers with reusable design elements. We also define a framework that enhances a designer's understanding of the important constraints and opportunities for the design patterns to be implemented and deployed at various layers of the system stack. This design framework may be used to establish mechanisms and interfaces to coordinate flexible fault management across hardware and software components. The framework also supports optimization of the cost-benefit trade-offs among performance, resilience, and power consumption. The overall goal of this work is to enable a systematic methodology for the design and evaluation of resilience technologies in extreme-scale HPC systems that keep scientific applications running to a correct solution in a timely and cost-efficient manner in spite of frequent faults, errors, and failures of various types.
NASA Technical Reports Server (NTRS)
1985-01-01
The primary objective of the Test Active Control Technology (ACT) System laboratory tests was to verify and validate the system concept, hardware, and software. The initial lab tests were open loop hardware tests of the Test ACT System as designed and built. During the course of the testing, minor problems were uncovered and corrected. Major software tests were run. The initial software testing was also open loop. These tests examined pitch control laws, wing load alleviation, signal selection/fault detection (SSFD), and output management. The Test ACT System was modified to interface with the direct drive valve (DDV) modules. The initial testing identified problem areas with DDV nonlinearities, valve friction induced limit cycling, DDV control loop instability, and channel command mismatch. The other DDV issue investigated was the ability to detect and isolate failures. Some simple schemes for failure detection were tested but were not completely satisfactory. The Test ACT System architecture continues to appear promising for ACT/FBW applications in systems that must be immune to worst case generic digital faults, and be able to tolerate two sequential nongeneric faults with no reduction in performance. The challenge in such an implementation would be to keep the analog element sufficiently simple to achieve the necessary reliability.
FTMP (Fault Tolerant Multiprocessor) programmer's manual
NASA Technical Reports Server (NTRS)
Feather, F. E.; Liceaga, C. A.; Padilla, P. A.
1986-01-01
The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high level language called Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along with many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.
NASA Astrophysics Data System (ADS)
Boron, Sergiusz
2017-06-01
Operational safety of electrical machines and equipment depends, inter alia, on the hazards resulting from their use and on the scope of applied protective measures. The use of insufficient protection against existing hazards leads to reduced operational safety, particularly under fault conditions. On the other hand, excessive (in relation to existing hazards) level of protection may compromise the reliability of power supply. This paper analyses the explosion hazard created by earth faults in longwall power supply systems and evaluates existing protection equipment from the viewpoint of its protective performance, particularly in the context of explosion hazards, and also assesses its effect on the reliability of power supply.
Reconfigurable Autonomy for Future Planetary Rovers
NASA Astrophysics Data System (ADS)
Burroughes, Guy
Extra-terrestrial planetary rover systems are uniquely remote, facing constraints on communication, environmental uncertainty, and limited physical resources, and requiring a high level of fault tolerance and resistance to hardware degradation. This thesis presents a novel self-reconfiguring autonomous software architecture designed to meet the needs of extraterrestrial planetary environments. At runtime it can safely reconfigure low-level control systems, high-level decisional autonomy systems, and managed software architecture. The architecture can perform automatic verification and validation of self-reconfiguration at run-time, and enables a system to be self-optimising, self-protecting, and self-healing. A novel self-monitoring system, which is non-invasive, efficient, tunable, and autonomously deploying, is also presented. The architecture was validated through the use-case of a highly autonomous extra-terrestrial planetary exploration rover. Three major forms of reconfiguration were demonstrated and tested: first, high-level adjustment of system internal architecture and goal; second, software module modification; and third, low-level alteration of hardware control in response to degradation of hardware and environmental change. The architecture was demonstrated to be robust and effective in a Mars sample return mission use-case testing the operational aspects of a novel, reconfigurable guidance, navigation, and control system for a planetary rover, all operating in concert through a scenario that required reconfiguration of all elements of the system.
Numerical simulation of the stress distribution in a coal mine caused by a normal fault
NASA Astrophysics Data System (ADS)
Zhang, Hongmei; Wu, Jiwen; Zhai, Xiaorong
2017-06-01
The Luling coal mine was studied using FLAC3D software to analyze the stress distribution characteristics on the two sides of a normal fault zone, using two different working-face models. The working faces were located, respectively, on the hanging wall and on the foot wall, with both directions of mining advancing toward the fault. The stress distributions differed across the fault. While the working face was on the hanging wall, the stress concentrated and the range influenced by stress gradually grew; the fault zone hindered stress transmission, so stress concentrated in the fault zone and the hanging wall. In the second model, the stress on the two sides decreased at first but then increased, continuing to be transmitted to the hanging wall; the concentrated stress in the fault zone decreased and stress transmission was evident. These results can be used to minimize roadway damage and lengthen the time available for coal mining through careful design of the roadway and working face.
Mission Services Evolution Center Message Bus
NASA Technical Reports Server (NTRS)
Mayorga, Arturo; Bristow, John O.; Butschky, Mike
2011-01-01
The Goddard Mission Services Evolution Center (GMSEC) Message Bus is a robust, lightweight, fault-tolerant middleware implementation that supports all messaging capabilities of the GMSEC API. This architecture is a distributed software system that routes messages based on message subject names and knowledge of the locations in the network of the interested software components.
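As a toy illustration of the subject-based routing the description mentions (this is not the GMSEC API; class and method names are invented), a bus can match published subjects against subscribers' patterns:

```python
import fnmatch
from collections import defaultdict

class SubjectBus:
    """Routes messages to subscribers whose subject patterns match."""
    def __init__(self):
        self._subs = defaultdict(list)        # pattern -> list of callbacks

    def subscribe(self, pattern, callback):
        self._subs[pattern].append(callback)

    def publish(self, subject, message):
        for pattern, callbacks in self._subs.items():
            if fnmatch.fnmatch(subject, pattern):
                for cb in callbacks:
                    cb(subject, message)

bus = SubjectBus()
bus.subscribe("GMSEC.*.TLM", lambda s, m: print("telemetry:", s, m))
bus.publish("GMSEC.SAT1.TLM", {"volts": 28.1})   # routed by subject name
```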
NASA Technical Reports Server (NTRS)
Lindsey, Tony; Pecheur, Charles
2004-01-01
Livingstone PathFinder (LPF) is a simulation-based computer program for verifying autonomous diagnostic software. LPF is designed especially to be applied to NASA's Livingstone computer program, which implements a qualitative-model-based algorithm that diagnoses faults in a complex automated system (e.g., an exploratory robot, spacecraft, or aircraft). LPF forms a software test bed containing a Livingstone diagnosis engine, embedded in a simulated operating environment consisting of a simulator of the system to be diagnosed by Livingstone and a driver program that issues commands and faults according to a nondeterministic scenario provided by the user. LPF runs the test bed through all executions allowed by the scenario, checking for various selectable error conditions after each step. All components of the test bed are instrumented, so that execution can be single-stepped both backward and forward. The architecture of LPF is modular and includes generic interfaces to facilitate substitution of alternative versions of its different parts. Altogether, LPF provides a flexible, extensible framework for simulation-based analysis of diagnostic software; these characteristics also render it amenable to application to diagnostic programs other than Livingstone.
Windows .NET Network Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST)
Dowd, Scot E; Zaragoza, Joaquin; Rodriguez, Javier R; Oliver, Melvin J; Payton, Paxton R
2005-01-01
Background BLAST is one of the most common and useful tools for genetic research. This paper describes a software application we have termed Windows .NET Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST), which enhances the BLAST utility by improving usability, fault recovery, and scalability in a Windows desktop environment. Our goal was to develop an easy-to-use, fault-tolerant, high-throughput BLAST solution that incorporates a comprehensive BLAST result viewer with curation and annotation functionality. Results W.ND-BLAST is a comprehensive Windows-based software toolkit that targets researchers, including those with minimal computer skills, and provides the ability to increase the performance of BLAST by distributing BLAST queries to any number of Windows-based machines across local area networks (LAN). W.ND-BLAST provides intuitive Graphic User Interfaces (GUI) for BLAST database creation, BLAST execution, BLAST output evaluation and BLAST result exportation. This software also provides several layers of fault tolerance and fault recovery to prevent loss of data if nodes or master machines fail. This paper lays out the functionality of W.ND-BLAST. W.ND-BLAST displays close to 100% performance efficiency when distributing tasks to 12 remote computers of the same performance class. A high-throughput BLAST job which took 662.68 minutes (11 hours) on one average machine was completed in 44.97 minutes when distributed to 17 nodes, which included lower performance class machines. Finally, there are comprehensive high-throughput BLAST Output Viewer (BOV) and Annotation Engine components, which provide comprehensive exportation of BLAST hits to text files, annotated fasta files, tables, or association files. Conclusion W.ND-BLAST provides an interactive tool that allows scientists to easily utilize their available computing resources for high-throughput and comprehensive sequence analyses. The install package for W.ND-BLAST is freely downloadable from . With registration the software is free; installation, networking, and usage instructions are provided, as well as a support forum. PMID:15819992
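For context, the reported numbers imply a speedup of 662.68/44.97 ≈ 14.7 on 17 nodes, i.e., roughly 87% parallel efficiency even with lower-performance machines included in the pool.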
An Integrated Fault Tolerant Robotic Controller System for High Reliability and Safety
NASA Technical Reports Server (NTRS)
Marzwell, Neville I.; Tso, Kam S.; Hecht, Myron
1994-01-01
This paper describes the concepts and features of a fault-tolerant intelligent robotic control system being developed for applications that require high dependability (reliability, availability, and safety). The system consists of two major elements: a fault-tolerant controller and an operator workstation. The fault-tolerant controller uses a strategy which allows for detection and recovery of hardware, operating system, and application software failures. The fault-tolerant controller can be used by itself in a wide variety of applications in industry, process control, and communications. The controller in combination with the operator workstation can be applied to robotic applications such as spaceborne extravehicular activities, hazardous materials handling, inspection and maintenance of high value items (e.g., space vehicles, reactor internals, or aircraft), medicine, and other tasks where a robot system failure poses a significant risk to life or property.
Reliability of Fault Tolerant Control Systems. Part 1
NASA Technical Reports Server (NTRS)
Wu, N. Eva
2001-01-01
This paper reports Part I of a two-part effort intended to delineate the relationship between reliability and fault-tolerant control in a quantitative manner. Reliability analysis of fault-tolerant control systems is performed using Markov models. Reliability properties peculiar to fault-tolerant control systems are emphasized; in particular, coverage of failures through redundancy management can be severely limited. It is shown that in the early life of a system composed of highly reliable subsystems, the reliability of the overall system is affine with respect to coverage, and inadequate coverage induces dominant single-point failures. The utility of some existing software tools for assessing the reliability of fault-tolerant control systems is also discussed. Coverage modeling is attempted in Part II in a way that captures its dependence on the control performance and on the diagnostic resolution.
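The affine dependence on coverage is easy to see numerically. Below is a hedged sketch (not the paper's model; rates and times are made up) of a duplex system whose Markov states are {both units good, one good, failed}; a unit failure is "covered" with probability c, and an uncovered failure is a single-point system failure:

```python
import numpy as np

def unreliability(c, lam=1e-4, t=100.0, steps=10_000):
    """Euler-integrate the duplex Markov model; return P(failed) at time t."""
    p = np.array([1.0, 0.0, 0.0])      # start with both units good
    dt = t / steps
    for _ in range(steps):
        d2 = 2 * lam * p[0]            # departure rate from the 2-good state
        d1 = lam * p[1]                # departure rate from the 1-good state
        p = p + dt * np.array([-d2, c * d2 - d1, (1 - c) * d2 + d1])
    return p[2]

for c in (0.90, 0.95, 0.99):
    # Early in life, unreliability ~ (1 - c) * 2 * lam * t: affine in coverage.
    print(c, unreliability(c))
```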
Advanced Development for Space Robotics With Emphasis on Fault Tolerance Technology
NASA Technical Reports Server (NTRS)
Tesar, Delbert
1997-01-01
This report describes work developing fault tolerant redundant robotic architectures and adaptive control strategies for robotic manipulator systems which can dynamically accommodate drastic robot manipulator mechanism, sensor or control failures and maintain stable end-point trajectory control with minimum disturbance. Kinematic designs of redundant, modular, reconfigurable arms for fault tolerance were pursued at a fundamental level. The approach developed robotic testbeds to evaluate disturbance responses of fault tolerant concepts in robotic mechanisms and controllers. The development was implemented in various fault tolerant mechanism testbeds including duality in the joint servo motor modules, parallel and serial structural architectures, and dual arms. All have real-time adaptive controller technologies to react to mechanism or controller disturbances (failures) to perform real-time reconfiguration to continue the task operations. The developments fall into three main areas: hardware, software, and theoretical.
Can Yilmaz, Ali; Aci, Cigdem; Aydin, Kadir
2016-08-17
Currently, in Turkey, fault rates in traffic accidents are determined according to the initiative of accident experts (with no speed analysis of the vehicles, just consideration of the accident type), and the No. 2918 Turkish Highway Traffic Act (THTA 1983) gives no specific quantitative instructions on fault rates related to the progression of accidents; it merely represents the type of collision (side impact, head to head, rear end, etc.). The aim of this study is to introduce a scientific and systematic approach for the determination of fault rates in the most frequent property damage-only (PDO) traffic accidents in Turkey. In this study, data (police reports, skid marks, deformation, crush depth, etc.) collected from the most frequent and controversial accident types (4 sample vehicle-vehicle scenarios) that involve PDO were inserted into reconstruction software called vCrash. Sample real-world scenarios were simulated in the software to generate different vehicle deformations that also correspond to energy-equivalent speed data just before the crash. These values were used to train a multilayer feedforward artificial neural network (MFANN), a function fitting neural network (FITNET, a specialized version of MFANN), and a generalized regression neural network (GRNN) model within 10-fold cross-validation to predict fault rates without using the software. The performance of the artificial neural network (ANN) prediction models was evaluated using the mean square error (MSE) and the multiple correlation coefficient (R). The MFANN model performed better for predicting fault rates (i.e., lower MSE and higher R) than the FITNET and GRNN models for accident scenarios 1, 2, and 3, whereas FITNET performed best for scenario 4. The FITNET model showed the second-best prediction results for the first 3 scenarios. Because there is no training phase in GRNN, the GRNN model produced results much faster than the MFANN and FITNET models; however, it had the worst prediction results. The R values for prediction of fault rates were close to 1 for all folds and scenarios. This study exhibits new aspects and scientific approaches for determining fault rates of involvement in the most frequent PDO accidents occurring in Turkey, discussing some deficiencies in the THTA and proceeding without regard to the initiative and/or experience of experts. The approach supports judicious decisions, especially in forensic investigations and events involving insurance companies. Building on it, injury/fatal and/or pedestrian-related accidents may be analyzed in future work by developing new scientific models.
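A rough sketch of this kind of pipeline, with synthetic data standing in for the vCrash-derived features (the features, network size, and data are all assumptions, not the study's):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))    # e.g., crush depth, energy-equivalent speed, masses
y = X @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(scale=0.05, size=200)

mses, rs = [], []
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
    model.fit(X[train], y[train])
    pred = model.predict(X[test])
    mses.append(mean_squared_error(y[test], pred))    # MSE per fold
    rs.append(np.corrcoef(y[test], pred)[0, 1])       # correlation per fold
print(f"mean MSE={np.mean(mses):.4f}, mean R={np.mean(rs):.3f}")
```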
Fault tolerant computing: A preamble for assuring viability of large computer systems
NASA Technical Reports Server (NTRS)
Lim, R. S.
1977-01-01
The need for fault-tolerant computing is addressed from the viewpoints of (1) why it is needed, (2) how to apply it in the current state of technology, and (3) what it means in the context of the Phoenix computer system and other related systems. To this end, the value of concurrent error detection and correction is described. User protection, program retry, and repair are among the factors considered. The technology of algebraic codes to protect memory systems and of arithmetic codes to protect arithmetic operations is discussed.
NASA Astrophysics Data System (ADS)
Yim, S.-W.; Park, B.-C.; Jeong, Y.-T.; Kim, Y.-J.; Yang, S.-E.; Kim, W.-S.; Kim, H.-R.; Du, H.-I.
2013-01-01
A 22.9 kV class hybrid fault current limiter (FCL) developed by Korea Electric Power Corporation and LS Industrial Systems in 2006 operates using the line commutation mechanism and begins to limit the fault current after the first half-cycle. The first peak of the fault current is available for protective coordination in the power system. However, it also produces a large electromagnetic force and imposes a huge stress on power facilities such as the main transformer and gas-insulated switchgear. In this study, we improved the operational characteristics of the hybrid FCL in order to reduce the first peak of the fault current. While maintaining the structure of the hybrid FCL system, we developed a superconducting module that detects and limits the fault current during the first half-cycle. To maintain the protective coordination capacity, the hybrid FCL was designed to reduce the first peak value of the fault current by up to approximately 30%. The superconducting module was also designed to produce a minimum AC loss, generating a small, uniform magnetic field distribution during normal operation. Performance tests confirmed that when applied to the hybrid FCL, the superconducting module showed successful current limiting operation without any damage.
The Assistant for Specifying the Quality Software (ASQS) Operational Concept Document. Volume 1
1990-09-01
Assistant in which the manager supplies system-specific characteristics and needs and the Assistant fills in the software quality concepts and methods. The ... member(s) of the Computer Resources Working Group (CRWG) to aid in performing a software quality engineering study. Figure 3.4-1 outlines the ... need to recover from faults more likely than need to provide alternative functions or interfaces), and more on Autonomy
An Open Avionics and Software Architecture to Support Future NASA Exploration Missions
NASA Technical Reports Server (NTRS)
Schlesinger, Adam
2017-01-01
The presentation describes an avionics and software architecture that has been developed through NASA's Advanced Exploration Systems (AES) division. The architecture is open-source, highly reliable with fault tolerance, and utilizes standard capabilities and interfaces, which are scalable and customizable to support future exploration missions. Specific focus areas of discussion will include command and data handling, software, human interfaces, communication and wireless systems, and systems engineering and integration.
Managing Space System Faults: Coalescing NASA's Views
NASA Technical Reports Server (NTRS)
Muirhead, Brian; Fesq, Lorraine
2012-01-01
Managing faults and their resultant failures is a fundamental and critical part of developing and operating aerospace systems. Yet, recent studies have shown that the engineering "discipline" required to manage faults is neither widely recognized nor evenly practiced within the NASA community. Attempts to simply name this discipline in recent years have been fraught with controversy among members of the Integrated Systems Health Management (ISHM), Fault Management (FM), Fault Protection (FP), Hazard Analysis (HA), and Aborts communities. Approaches to managing space system faults typically are unique to each organization, with little commonality in the architectures, processes, and practices across the industry.
On the engineering of crucial software
NASA Technical Reports Server (NTRS)
Pratt, T. W.; Knight, J. C.; Gregory, S. T.
1983-01-01
The various aspects of the conventional software development cycle are examined. This cycle was the basis of the augmented approach contained in the original grant proposal, but it was found inadequate for crucial software development; the justification for this opinion is presented. Several possible enhancements to the conventional software cycle are discussed. Software fault tolerance, a possible enhancement of major importance, is discussed separately. Formal verification using mathematical proof is considered, as is automatic programming, a radical alternative to the conventional cycle. Recommendations for a comprehensive approach are presented, and various experiments which could be conducted in AIRLAB are described.
Information technologies in optimization process of monitoring of software and hardware status
NASA Astrophysics Data System (ADS)
Nikitin, P. V.; Savinov, A. N.; Bazhenov, R. I.; Ryabov, I. V.
2018-05-01
The article describes a model of a hardware and software monitoring system for a large company that provides customers with software as a service (SaaS solution) using information technology. The main functions of the monitoring system are the provision of up-to-date data for analyzing the state of the IT infrastructure, rapid detection of faults, and their effective elimination. The main risks associated with the provision of these services are described; comparative characteristics of the software are given; and the authors' methods for monitoring the status of software and hardware are proposed.
Software architecture of the Magdalena Ridge Observatory Interferometer
NASA Astrophysics Data System (ADS)
Farris, Allen; Klinglesmith, Dan; Seamons, John; Torres, Nicolas; Buscher, David; Young, John
2010-07-01
Merging software from 36 independent work packages into a coherent, unified software system with a lifespan of twenty years is the challenge faced by the Magdalena Ridge Observatory Interferometer (MROI). We solve this problem by using standardized interface software automatically generated from simple high-level descriptions of these systems, relying only on Linux, GNU, and POSIX without complex software such as CORBA. This approach, based on gigabit Ethernet with a TCP/IP protocol, provides the flexibility to integrate and manage diverse, independent systems using a centralized supervisory system that provides a database manager, data collectors, fault handling, and an operator interface.
Operations management system advanced automation: Fault detection isolation and recovery prototyping
NASA Technical Reports Server (NTRS)
Hanson, Matt
1990-01-01
The purpose of this project is to address the global fault detection, isolation and recovery (FDIR) requirements for Operations Management System (OMS) automation within the Space Station Freedom program. This shall be accomplished by developing a selected FDIR prototype for the Space Station Freedom distributed processing systems. The prototype shall be based on advanced automation methodologies in addition to traditional software methods to meet the requirements for automation. A secondary objective is to expand the scope of the prototyping to encompass multiple aspects of station-wide fault management (SWFM) as discussed in OMS requirements documentation.
2015-08-01
faults are incorporated into the system in order to better understand the EMA reliability, and to aid in designing fault detection software for real ... to a fixed angle repeatedly and accurately [16]. The motor in the EHA is used to drive a reversible pump tied to a hydraulic cylinder which moves ... [24] [25] [26]. These test stands are used for the prognostic testing of EMAs that have had mechanical or electrical faults injected into them. ...
An empirical study of software design practices
NASA Technical Reports Server (NTRS)
Card, David N.; Church, Victor E.; Agresti, William W.
1986-01-01
Software engineers have developed a large body of software design theory and folklore, much of which was never validated. The results of an empirical study of software design practices in one specific environment are presented. The practices examined affect module size, module strength, data coupling, descendant span, unreferenced variables, and software reuse. Measures characteristic of these practices were extracted from 887 FORTRAN modules developed for five flight dynamics software projects monitored by the Software Engineering Laboratory (SEL). The relationship of these measures to cost and fault rate was analyzed using a contingency table procedure. The results show that some recommended design practices, despite their intuitive appeal, are ineffective in this environment, whereas others are very effective.
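The contingency-table procedure mentioned here amounts to a chi-squared test of independence between a design practice and the fault-rate outcome. A minimal sketch with invented counts (not the SEL's data):

```python
from scipy.stats import chi2_contingency

# Rows: modules following / not following a practice (e.g., small module size).
# Columns: low-fault-rate vs. high-fault-rate modules. Counts are hypothetical.
table = [[40, 12],
         [18, 30]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}")   # a small p suggests the practice matters
```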
Modelling earthquake ruptures with dynamic off-fault damage
NASA Astrophysics Data System (ADS)
Okubo, Kurama; Bhat, Harsha S.; Klinger, Yann; Rougier, Esteban
2017-04-01
Earthquake rupture modelling has been developed for producing scenario earthquakes. This includes understanding the source mechanisms and estimating far-field ground motion given a priori constraints like fault geometry, the constitutive law of the medium, and the friction law operating on the fault. It is necessary to consider all of these complexities of a fault system to conduct realistic earthquake rupture modelling. In addition to the complexity of fault geometry in nature, coseismic off-fault damage, which is observed by a variety of geological and seismological methods, plays a considerable role in the resultant ground motion and its spectrum compared to a model with a simple planar fault surrounded by purely elastic media. Ideally all of these complexities should be considered in earthquake modelling. State-of-the-art techniques developed so far, however, cannot treat all of them simultaneously due to a variety of computational restrictions. Therefore, we adopt the combined finite-discrete element method (FDEM), which can effectively deal with pre-existing complex fault geometry such as fault branches and kinks and can describe coseismic off-fault damage generated during the dynamic rupture. The advantage of FDEM is that it can handle a wide range of length scales, from metric to kilometric, corresponding to the off-fault damage and the complex fault geometry respectively. We used the FDEM-based software tool HOSSedu (Hybrid Optimization Software Suite - Educational Version), developed by Los Alamos National Laboratory, for the earthquake rupture modelling. We first conducted a cross-validation of this new methodology against other conventional numerical schemes such as the finite difference method (FDM), the spectral element method (SEM) and the boundary integral equation method (BIEM), to evaluate its accuracy with various element sizes and artificial viscous damping values, and we demonstrate the capability of the FDEM tool for modelling earthquake ruptures. We then modelled earthquake ruptures allowing for coseismic off-fault damage with appropriate fracture nucleation and growth criteria. We studied the effect of different conditions such as rupture speed (sub-Rayleigh or supershear), the orientation of the initial maximum principal stress with respect to the fault, and the magnitude of the initial stress (to mimic depth). The comparison between the sub-Rayleigh and supershear cases shows that coseismic off-fault damage is enhanced in the supershear case. The orientation of the maximum principal stress also makes a significant difference: dynamic off-fault cracking is more likely to occur on the extensional side of the fault for high principal-stress orientations. It is found that the coseismic off-fault damage reduces the rupture speed due to the dissipation of energy by dynamic off-fault cracking generated in the vicinity of the rupture front. In terms of the ground-motion amplitude spectra, the high-frequency radiation is enhanced by the coseismic off-fault damage, though it is quickly attenuated. This is caused by the intricate superposition of the radiation generated by the off-fault damage and the perturbation of the rupture speed on the main fault.
14 CFR 29.863 - Flammable fluid fire protection.
Code of Federal Regulations, 2010 CFR
2010-01-01
... AIRCRAFT AIRWORTHINESS STANDARDS: TRANSPORT CATEGORY ROTORCRAFT Design and Construction Fire Protection... sources, including electrical faults, overheating of equipment, and malfunctioning of protective devices...
Reclosing operation characteristics of the flux-coupling type SFCL in a single-line-to ground fault
NASA Astrophysics Data System (ADS)
Jung, B. I.; Cho, Y. S.; Choi, H. S.; Ha, K. H.; Choi, S. G.; Chul, D. C.; Sung, T. H.
2011-11-01
The recloser used in distribution systems is a relay system that operates sequentially to protect power systems from transient and continuous faults. This reclosing operation can improve the reliability and stability of the power supply. For coordination with the recloser, the superconducting fault current limiter (SFCL) must properly perform the reclosing operation. This paper analyzed the reclosing operation characteristics of the three-phase flux-coupling type SFCL in the event of a ground fault. The fault current limiting characteristics according to the number of turns of the primary and secondary coils were examined. As the number of turns of the primary coil increased, the first maximum fault current decreased, and the voltage across the quenched superconducting element also decreased. This means that the power burden of the superconducting element decreases with an increasing number of turns of the primary coil. Regarding the reclosing time, the SFCL limited the fault current within 0.5 cycle (8 ms), which is shorter than the closing time of the recloser. In other words, the superconducting element returned to the superconducting state before the second fault and normally performed the fault current limiting operation. If the SFCL did not recover before the recloser's reclosing time, the normal current flowing in the transmission line after the SFCL's recovery from the fault would be limited, causing losses. Therefore, a fast recovery time is critical to the SFCL's coordination with the protection system.
Compact, Reliable EEPROM Controller
NASA Technical Reports Server (NTRS)
Katz, Richard; Kleyner, Igor
2010-01-01
A compact, reliable controller for an electrically erasable, programmable read-only memory (EEPROM) has been developed specifically for a space-flight application. The design may be adaptable to other applications in which there are requirements for reliability in general and, in particular, for prevention of inadvertent writing of data in EEPROM cells. Inadvertent writes pose risks of loss of reliability in the original space-flight application and could pose such risks in other applications. Prior EEPROM controllers are large and complex and do not provide all reasonable protections (in many cases, few or no protections) against inadvertent writes. In contrast, the present controller provides several layers of protection against inadvertent writes. The controller also incorporates a write-time monitor, enabling determination of trends in the performance of an EEPROM through all phases of testing. The controller has been designed as an integral subsystem of a system that includes not only the controller and the controlled EEPROM aboard a spacecraft but also computers in a ground control station, relatively simple onboard support circuitry, and an onboard communication subsystem that utilizes the MIL-STD-1553B protocol. (MIL-STD-1553B is a military standard that encompasses a method of communication and electrical-interface requirements for digital electronic subsystems connected to a data bus. MIL-STD-1553B is commonly used in defense and space applications.) The intent was to maximize reliability while minimizing the size and complexity of onboard circuitry. In operation, control of the EEPROM is effected via the ground computers, the MIL-STD-1553B communication subsystem, and the onboard support circuitry, all of which, in combination, provide the multiple layers of protection against inadvertent writes. There is no controller software, unlike in many prior EEPROM controllers; software can be a major contributor to unreliability, particularly in fault situations such as the loss of power or brownouts. Protection is also provided by a power-monitoring circuit.
Analytical sensor redundancy assessment
NASA Technical Reports Server (NTRS)
Mulcare, D. B.; Downing, L. E.; Smith, M. K.
1988-01-01
The rationale and mechanization of sensor fault tolerance based on analytical redundancy principles are described. The concept involves substituting software procedures, such as an observer algorithm, for additional hardware components. The observer synthesizes values of sensor states in lieu of their direct measurement. Such information can then be used, for example, to determine which of two disagreeing sensors is more correct, thus enhancing sensor fault survivability. Here a stability augmentation system is used as an example application, with the required modifications being made to a quadruplex digital flight control system. The impact on software structure and the resultant revalidation effort are illustrated as well. Also, the use of an observer algorithm for wind-gust filtering of the angle-of-attack sensor signal is presented.
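A minimal sketch of the observer idea, with an assumed one-state plant model and made-up gains (not the paper's aircraft model): the observer's synthesized estimate arbitrates between two disagreeing sensors.

```python
import numpy as np

A, B = np.array([[0.98]]), np.array([[0.02]])   # assumed discrete-time plant
C, L = np.array([[1.0]]), np.array([[0.3]])     # output map and observer gain

def observer_step(x_hat, u, y_used):
    """Standard Luenberger update: predict, then correct toward the measurement."""
    return A @ x_hat + B @ u + L @ (y_used - C @ x_hat)

def select_sensor(y1, y2, x_hat, threshold=0.5):
    """Pick the reading more consistent with the synthesized (observer) value."""
    y_est = (C @ x_hat).item()
    if abs(y1 - y2) < threshold:
        return 0.5 * (y1 + y2)                  # sensors agree: average them
    return y1 if abs(y1 - y_est) <= abs(y2 - y_est) else y2
```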
The implementation and use of Ada on distributed systems with high reliability requirements
NASA Technical Reports Server (NTRS)
Knight, J. C.; Gregory, S. T.; Urquhart, J. I. A.
1985-01-01
The use and implementation of Ada in distributed environments in which reliability is the primary concern were investigated. In particular, we examined the concept that a distributed system may be programmed entirely in Ada, so that the individual tasks of the system are unconcerned with which processors they are executing on, while failures may occur in the software or underlying hardware. Progress is discussed for the following areas: continued development and testing of the fault-tolerant Ada testbed; development of suggested changes to Ada so that it might more easily cope with the failures of interest; and design of new approaches to fault-tolerant software in real-time systems, and integration of these ideas into Ada.
Study of a phase-to-ground fault on a 400 kV overhead transmission line
NASA Astrophysics Data System (ADS)
Iagăr, A.; Popa, G. N.; Diniş, C. M.
2018-01-01
Power utilities need to supply their consumers with a high level of power quality. Because faults that occur on High-Voltage and Extra-High-Voltage transmission lines can cause serious damage in the underlying transmission and distribution systems, it is important to examine each fault in detail. In this work we studied a phase-to-ground fault (on phase 1) of the 400 kV overhead transmission line Mintia-Arad. The Indactic® 650 fault-analyzing system was used to record the history of the fault. The signals (analog and digital) recorded by the Indactic® 650 were visualized and analyzed with the Focus program. The summary fault report allowed evaluation of the behavior of the control and protection equipment and determination of the cause and location of the fault.
Analysis of Multilayered Printed Circuit Boards using Computed Tomography
2014-05-01
complex PCBs that present a challenge for any testing or fault analysis. Set-to-work testing and fault analysis of any electronic circuit require ... Electronic Warfare and Radar Division in December 2010. He is currently in the Electro-Optic Countermeasures Group. Samuel works on embedded system design ... and software optimisation of complex electro-optical systems, including the set-to-work and characterisation of these systems. ...
NASA Technical Reports Server (NTRS)
Hruby, R. J.; Bjorkman, W. S.
1977-01-01
Flight test results of the strapdown inertial reference unit (SIRU) navigation system are presented. The fault-tolerant SIRU navigation system features a redundant inertial sensor unit and dual computers. System software provides for detection and isolation of inertial sensor failures and continued operation in the event of failures. Flight test results include assessments of the system's navigational performance and fault tolerance.
Pratt and Whitney Overview and Advanced Health Management Program
NASA Technical Reports Server (NTRS)
Inabinett, Calvin
2008-01-01
Hardware Development Activity: Design and Test Custom Multi-layer Circuit Boards for use in the Fault Emulation Unit; Logic design performed using VHDL; Layout power system for lab hardware; Work lab issues with software developers and software testers; Interface with Engine Systems personnel with performance of Engine hardware components; Perform off nominal testing with new engine hardware.
Human factors in software development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curtis, B.
1986-01-01
This book presents an overview of ergonomics/human factors in software development, recent research, and classic papers. Articles are drawn from the following areas of psychological research on programming: cognitive ergonomics, cognitive psychology, and psycholinguistics. Topics examined include: theoretical models of how programmers solve technical problems, the characteristics of programming languages, specification formats in behavioral research and psychological aspects of fault diagnosis.
Developing Software For Monitoring And Diagnosis
NASA Technical Reports Server (NTRS)
Edwards, S. J.; Caglayan, A. K.
1993-01-01
Expert-system software shell produces executable code. Report discusses beginning phase of research directed toward development of artificial intelligence for real-time monitoring of, and diagnosis of faults in, complicated systems of equipment. Motivated by need for onboard monitoring and diagnosis of electronic sensing and controlling systems of advanced aircraft. Also applicable to such equipment systems as refineries, factories, and powerplants.
Reliable and Fault-Tolerant Software-Defined Network Operations Scheme for Remote 3D Printing
NASA Astrophysics Data System (ADS)
Kim, Dongkyun; Gil, Joon-Min
2015-03-01
The recent wide expansion of applicable three-dimensional (3D) printing and software-defined networking (SDN) technologies has led to a great deal of attention being focused on efficient remote control of manufacturing processes. SDN is a renowned paradigm for network softwarization, which has helped facilitate remote manufacturing in association with high network performance, since SDN is designed to control network paths and traffic flows, guaranteeing improved quality of services by obtaining network requests from end-applications on demand through the separated SDN controller or control plane. However, current SDN approaches are generally focused on the controls and automation of the networks, which indicates that there is a lack of management plane development designed for a reliable and fault-tolerant SDN environment. Therefore, in addition to the inherent advantage of SDN, this paper proposes a new software-defined network operations center (SD-NOC) architecture to strengthen the reliability and fault-tolerance of SDN in terms of network operations and management in particular. The cooperation and orchestration between SDN and SD-NOC are also introduced for the SDN failover processes based on four principal SDN breakdown scenarios derived from the failures of the controller, SDN nodes, and connected links. The abovementioned SDN troubles significantly reduce the network reachability to remote devices (e.g., 3D printers, super high-definition cameras, etc.) and the reliability of relevant control processes. Our performance consideration and analysis results show that the proposed scheme can shrink operations and management overheads of SDN, which leads to the enhancement of responsiveness and reliability of SDN for remote 3D printing and control processes.
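One of the controller-failure scenarios could be handled with heartbeat monitoring along these lines (a bare sketch under assumed names; the paper's SD-NOC logic is not shown):

```python
import time

class FailoverMonitor:
    """Promote a standby controller after max_missed heartbeat intervals."""
    def __init__(self, activate_standby, max_missed=3, interval=1.0):
        self.activate_standby = activate_standby
        self.max_missed, self.interval = max_missed, interval
        self.last_beat = time.monotonic()

    def heartbeat(self):                 # called on each controller heartbeat
        self.last_beat = time.monotonic()

    def check(self):                     # called periodically by the NOC
        missed = (time.monotonic() - self.last_beat) / self.interval
        if missed >= self.max_missed:
            self.activate_standby()      # controller-failure scenario
```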
The Importance of Architecture in DoD Software
1991-07-01
Horowitz, Barry M.
... resource utilization: architecture determines how the system sustains operations when parts of the system fail. The architecture also determines ... software maintainers to ensure that we deliver to them whatever is necessary for them to sustain and use the architecture ...
Software-Controlled Caches in the VMP Multiprocessor
1986-03-01
... programming system level that is tuned for the VMP design. In this vein, we are interested in exploring how far the software support can go to ... handled in software, analogously to the handling of virtual memory page faults; management of the shared program state is familiar ... Hardware support for ... ensure good behavior ... Each cache miss results in bus traffic. Table 2 provides the bus cost for the "average" cache miss ...
Transmission line relay mis-operation detection based on time-synchronized field data
Esmaeilian, Ahad; Popovic, Tomo; Kezunovic, Mladen
2015-05-04
In this paper, a real-time tool to detect transmission line relay mis-operation is implemented. The tool uses time-synchronized measurements obtained from both ends of the line during disturbances. The proposed fault analysis tool comes into the picture only after the protective device has operated and tripped the line. The proposed methodology is able not only to detect, classify, and locate transmission line faults, but also to accurately confirm whether the line was tripped due to a mis-operation of protective relays. The analysis report includes either a detailed description of the fault type and location or detection of a relay mis-operation. As such, it can be a source of very useful information to support system restoration. The focus of the paper is on the implementation requirements that allow practical application of the methodology, which is illustrated using field data obtained from a real power system. Testing and validation are done using field data recorded by digital fault recorders and protective relays. The test data included several hundred event records corresponding to both relay mis-operations and actual faults. The discussion of results addresses various challenges encountered during the implementation and validation of the presented methodology.
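The paper's exact algorithm is not reproduced here, but the classic two-ended, synchronized-phasor location calculation illustrates why measurements from both line ends suffice: both ends must see the same voltage at the fault point.

```python
def fault_distance(v_s, i_s, v_r, i_r, z_line):
    """Per-unit distance m from the sending end, from synchronized phasors.
    Derivation: V_s - m*Z*I_s = V_r - (1 - m)*Z*I_r at the fault point."""
    m = (v_s - v_r + z_line * i_r) / (z_line * (i_s + i_r))
    return m.real

# Illustrative (made-up) phasors for a fault roughly mid-line:
print(fault_distance(1.00 + 0.0j, 4.0 - 2.0j,
                     0.95 + 0.02j, 3.5 - 1.8j, 0.02 + 0.2j))
```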
Functional Fault Modeling Conventions and Practices for Real-Time Fault Isolation
NASA Technical Reports Server (NTRS)
Ferrell, Bob; Lewis, Mark; Perotti, Jose; Oostdyk, Rebecca; Brown, Barbara
2010-01-01
The purpose of this paper is to present the conventions, best practices, and processes that were established based on the prototype development of a Functional Fault Model (FFM) for a Cryogenic System that would be used for real-time Fault Isolation in a Fault Detection, Isolation, and Recovery (FDIR) system. The FDIR system is envisioned to perform health management functions for both a launch vehicle and the ground systems that support the vehicle during checkout and launch countdown by using a suite of complementary software tools that alert operators to anomalies and failures in real-time. The FFMs were created offline but would eventually be used by a real-time reasoner to isolate faults in a Cryogenic System. Through their development and review, a set of modeling conventions and best practices was established. The prototype FFM development also provided a pathfinder for future FFM development processes. This paper documents the rationale and considerations for robust FFMs that can easily be transitioned to a real-time operating environment.
Fault tolerance in computational grids: perspectives, challenges, and issues.
Haider, Sajjad; Nazir, Babar
2016-01-01
Computational grids are established with the intention of providing shared access to hardware- and software-based resources, with special reference to increased computational capabilities. Fault tolerance is one of the most important issues faced by computational grids. The main contribution of this survey is the creation of an extended classification of the problems that occur in computational grid environments. The proposed classification will help researchers, developers, and maintainers of grids understand the types of issues to be anticipated. Moreover, different types of problems, such as omission, interaction, and timing-related problems, have been identified that need to be handled at various layers of the computational grid. This survey also analyzes and examines fault-tolerance and fault-detection mechanisms. Our conclusion is that a dependable and reliable grid can only be established when more emphasis is placed on fault identification. Moreover, our survey reveals that adaptive and intelligent fault identification and tolerance techniques can improve the dependability of grid working environments.
Investigating an API for resilient exascale computing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stearley, Jon R.; Tomkins, James; VanDyke, John P.
2013-05-01
Increased HPC capability comes with increased complexity, part counts, and fault occurrences. Increasing the resilience of systems and applications to faults is a critical requirement facing the viability of exascale systems, as the overhead of traditional checkpoint/restart is projected to outweigh its benefits due to fault rates outpacing I/O bandwidths. As faults occur and propagate throughout hardware and software layers, pervasive notification and handling mechanisms are necessary. This report describes an initial investigation of fault types and programming interfaces to mitigate them. Proof-of-concept APIs are presented for the frequent and important cases of memory errors and node failures, and a strategy is proposed for filesystem failures. These involve changes to the operating system, runtime, I/O library, and application layers. While a single API for fault handling among hardware and OS and application system-wide remains elusive, the effort increased our understanding of both the mountainous challenges and the promising trailheads.
A study of the relationship between the performance and dependability of a fault-tolerant computer
NASA Technical Reports Server (NTRS)
Goswami, Kumar K.
1994-01-01
This thesis studies the relationship by creating a tool (FTAPE) that integrates a high-stress workload generator with fault injection, and by using the tool to evaluate system performance under error conditions. The workloads are comprised of processes which are formed from atomic components that represent CPU, memory, and I/O activity. The fault injector is software-implemented and is capable of injecting faults into any memory-addressable location, including special registers and caches. This tool has been used to study a Tandem Integrity S2 computer. Workloads with varying numbers of processes and varying compositions of CPU, memory, and I/O activity are first characterized in terms of performance. Then faults are injected into these workloads. The results show that as the number of concurrent processes increases, the mean fault latency initially increases due to increased contention for the CPU. However, for even higher numbers of processes (more than 3 processes), the mean latency decreases because long-latency faults are paged out before they can be activated.
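To make the fault-injection idea above concrete, here is a minimal Python sketch of an FTAPE-style experiment: a toy memory workload, a software-implemented single-bit flip, and a measurement of fault latency as the number of workload steps until the corrupted location is first read. The workload, sizes, and seeds are hypothetical stand-ins, not FTAPE's actual design.

```python
import random

def workload_step(memory, i, acc):
    """One atomic step of a toy memory-read workload."""
    return (acc + memory[i % len(memory)]) & 0xFF

def first_activation(size=4096, steps=50_000, seed=1):
    """Inject a single bit flip into a copy of the workload's memory and
    return the step at which the fault is first activated (first read
    that changes the result), or None if it stays latent."""
    rng = random.Random(seed)
    golden = bytearray(rng.randrange(256) for _ in range(size))
    faulty = bytearray(golden)
    addr = rng.randrange(size)
    faulty[addr] ^= 1 << rng.randrange(8)   # software-implemented fault injection
    acc_g = acc_f = 0
    for step in range(steps):
        acc_g = workload_step(golden, step, acc_g)
        acc_f = workload_step(faulty, step, acc_f)
        if acc_g != acc_f:
            return step                     # fault latency, in workload steps
    return None

latencies = [first_activation(seed=s) for s in range(100)]
activated = [l for l in latencies if l is not None]
print(f"activated {len(activated)}/100 faults, "
      f"mean latency {sum(activated) / len(activated):.0f} steps")
```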
NASA Astrophysics Data System (ADS)
Mbaya, Timmy
Embedded aerospace systems have to perform safety- and mission-critical operations in a real-time environment where timing and functional correctness are extremely important. Guidance, Navigation, and Control (GN&C) systems substantially rely on complex software interfacing with hardware in real time; any faults in software or hardware, or their interaction, could result in fatal consequences. Integrated Software Health Management (ISWHM) provides an approach for detection and diagnosis of software failures while the software is in operation. The ISWHM approach is based on probabilistic modeling of software and hardware sensors using a Bayesian network. To meet memory and timing constraints of real-time embedded execution, the Bayesian network is compiled into an Arithmetic Circuit, which is used for on-line monitoring. This type of system monitoring, using an ISWHM, provides automated reasoning capabilities that compute diagnoses in a timely manner when failures occur. This reasoning capability enables time-critical mitigating decisions and relieves the human agent from the time-consuming and arduous task of foraging through a multitude of isolated, and often contradictory, diagnosis data. For the purpose of demonstrating the relevance of ISWHM, modeling and reasoning is performed on a simple simulated aerospace system running on a real-time operating system emulator, the OSEK/Trampoline platform. Models for a small satellite and an F-16 fighter jet GN&C system have been implemented. Analysis of the ISWHM is then performed by injecting faults and analyzing the ISWHM's diagnoses.
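As a rough illustration of the probabilistic reasoning the ISWHM abstract describes, the sketch below computes a fault posterior for a single monitor by direct enumeration; all probabilities are invented for the example. The real system compiles the full Bayesian network into an arithmetic circuit so that inference runs in bounded time on the embedded target.

```python
# Illustrative two-node health model: a latent software fault and one
# software-sensor "monitor" observing it. All probabilities are made up.
p_fault = 0.01             # prior P(fault)
p_alarm_given_fault = 0.95
p_alarm_given_ok = 0.02    # monitor false-alarm rate

def posterior_fault(alarm: bool) -> float:
    """P(fault | monitor reading) by direct enumeration (Bayes' rule).
    An ISWHM-style system would instead evaluate a compiled arithmetic
    circuit for constant-time, allocation-free inference on board."""
    like_fault = p_alarm_given_fault if alarm else 1 - p_alarm_given_fault
    like_ok = p_alarm_given_ok if alarm else 1 - p_alarm_given_ok
    evidence = like_fault * p_fault + like_ok * (1 - p_fault)
    return like_fault * p_fault / evidence

print(f"P(fault | alarm)    = {posterior_fault(True):.3f}")   # ~0.32
print(f"P(fault | no alarm) = {posterior_fault(False):.5f}")  # ~0.0005
```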
NASA Astrophysics Data System (ADS)
Anggraini, Ade; Sobiesiak, Monika; Walter, Thomas R.
2010-05-01
The Mw 6.3 May 26, 2006 Yogyakarta earthquake caused severe damage and claimed thousands of lives in the Yogyakarta Special Province and Klaten District of Central Java Province. The nearby Opak River fault was thought to be the source of this earthquake disaster. However, no significant surface movement was observed along the fault that could confirm it was really the source of the earthquake. To investigate the earthquake source and to understand the earthquake mechanism, a rapid response team of the German Task Force for Earthquakes, together with the Seismological Division of Badan Meteorologi Klimatologi dan Geofisika and Gadjah Mada University in Yogyakarta, installed a temporary seismic network of 12 short-period seismometers. More than 3000 aftershocks were recorded during the 3-month campaign. Here we present the results of several hundred processed aftershocks. We used the integrated software package GIANTPitsa to pick P and S phases manually and HYPO71 to determine the hypocenters. The HypoDD software was used for hypocenter relocation to obtain high-precision aftershock locations. Our aftershock distribution shows a system of lineaments in a southwest-northeast direction, about 10 km east of the Opak River fault, at 5-18 km depth. The b-value map from the aftershocks shows that the main lineaments have a relatively low b-value in the middle part, which suggests this part is still under stress. We also observe several aftershock clusters cutting these lineaments in a nearly perpendicular direction. To verify the interpretation of our aftershock analysis, we will overlay it on surface features delineated from satellite data. We expect these results to contribute significantly to understanding the near-surface fault systems around the Yogyakarta area, in order to mitigate similar earthquake hazards in the future.
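For readers unfamiliar with the b-value mapping mentioned above, the standard point estimator is the Aki (1965) maximum-likelihood formula, b = log10(e) / (mean(M) - Mc); a b-value map applies it cell by cell to the events in each spatial bin. A minimal Python sketch, with an invented catalog rather than the Yogyakarta data:

```python
import math

def b_value_ml(mags, mc):
    """Aki (1965) maximum-likelihood b-value for a catalog, using only
    events at or above the completeness magnitude mc."""
    m = [x for x in mags if x >= mc]
    return math.log10(math.e) / (sum(m) / len(m) - mc)

# Illustrative catalog (not the Yogyakarta aftershock data):
catalog = [2.1, 2.4, 2.2, 3.0, 2.6, 2.3, 2.8, 2.2, 2.5, 3.4]
print(f"b = {b_value_ml(catalog, mc=2.0):.2f}")
```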
Incipient fault detection and power system protection for spaceborne systems
NASA Technical Reports Server (NTRS)
Russell, B. Don; Hackler, Irene M.
1987-01-01
A program was initiated to study the feasibility of using advanced terrestrial power system protection techniques for spacecraft power systems. It was designed to enhance and automate spacecraft power distribution systems in the areas of safety, reliability and maintenance. The proposed power management/distribution system is described as well as security assessment and control, incipient and low current fault detection, and the proposed spaceborne protection system. It is noted that the intelligent remote power controller permits the implementation of digital relaying algorithms with both adaptive and programmable characteristics.
A circuit-based photovoltaic module simulator with shadow and fault settings
NASA Astrophysics Data System (ADS)
Chao, Kuei-Hsiang; Chao, Yuan-Wei; Chen, Jyun-Ping
2016-03-01
The main purpose of this study was to develop a photovoltaic (PV) module simulator. The proposed simulator, using electrical parameters from solar cells, can simulate output characteristics not only under normal operating conditions but also under partial-shadow and fault conditions. Such a simulator should possess the advantages of low cost, small size, and being easily realizable. Experiments have shown that results from the proposed PV simulator are very close to those from simulation software during partial-shadow conditions, with negligible differences during fault occurrence. Meanwhile, the PV module simulator, as developed, can be used with various types of series-parallel connections to form PV arrays, to conduct experiments on partial-shadow and fault events occurring in some of the modules. Such experiments are designed to explore the impact of shadow and fault conditions on the output characteristics of the system as a whole.
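As a sketch of what such a simulator computes, the Python below evaluates a simplified single-diode PV model (series and shunt resistances neglected) and approximates partial shading of a series string by taking the minimum module current, ignoring bypass-diode conduction. All module parameters are illustrative, not those of the paper's hardware:

```python
import math

def pv_current(v, g=1000.0, t_c=25.0, isc=8.0, voc=37.0, n=1.3, ns=60):
    """Approximate single-diode PV module current at module voltage v (V),
    irradiance g (W/m^2), and cell temperature t_c (C)."""
    vt = 0.0257 * (273.15 + t_c) / 298.15       # thermal voltage per cell
    iph = isc * g / 1000.0                      # photocurrent scales with irradiance
    i0 = isc / (math.exp(voc / (ns * n * vt)) - 1.0)
    return max(iph - i0 * (math.exp(v / (ns * n * vt)) - 1.0), 0.0)

def string_current(v_per_module, irradiances):
    """Crude series-string model: current limited by the weakest module
    (bypass diodes ignored for brevity)."""
    return min(pv_current(v_per_module, g=g) for g in irradiances)

print(f"I at 30 V, full sun:         {pv_current(30.0):.2f} A")
print(f"I at 30 V, one shaded module: {string_current(30.0, [1000, 1000, 400]):.2f} A")
```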
Narrowing the scope of failure prediction using targeted fault load injection
NASA Astrophysics Data System (ADS)
Jordan, Paul L.; Peterson, Gilbert L.; Lin, Alan C.; Mendenhall, Michael J.; Sellers, Andrew J.
2018-05-01
As society becomes more dependent upon computer systems to perform increasingly critical tasks, ensuring that those systems do not fail becomes increasingly important. Many organizations depend heavily on desktop computers for day-to-day operations. Unfortunately, the software that runs on these computers is written by humans and, as such, is still subject to human error and consequent failure. A natural solution is to use statistical machine learning to predict failure. However, since failure is still a relatively rare event, obtaining labelled training data to train these models is not a trivial task. This work presents new simulated fault-inducing loads that extend the focus of traditional fault injection techniques to predict failure in the Microsoft enterprise authentication service and Apache web server. These new fault loads were successful in creating failure conditions that were identifiable using statistical learning methods, with fewer irrelevant faults being created.
On-line early fault detection and diagnosis of municipal solid waste incinerators
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao Jinsong; Huang Jianchao; Sun Wei
A fault detection and diagnosis framework is proposed in this paper for early fault detection and diagnosis (FDD) of municipal solid waste incinerators (MSWIs), in order to improve the safety and continuity of production. In this framework, principal component analysis (PCA), one of the multivariate statistical technologies, is used for detecting abnormal events, while rule-based reasoning performs the fault diagnosis and consequence prediction and generates recommendations for fault mitigation once an abnormal event is detected. A software package, SWIFT, is developed based on the proposed framework and has been applied in an actual industrial MSWI. The application shows that automated real-time abnormal situation management (ASM) of the MSWI can be achieved by using SWIFT, with an industrially acceptable low rate of wrong diagnoses, which has resulted in improved process continuity and environmental performance of the MSWI.
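A minimal sketch of the PCA monitoring step described above: fit loadings on normal-operation data, then score new samples with the Hotelling T^2 and SPE (Q) statistics, whose excursions flag abnormal events. Control-limit computation and the rule-based diagnosis layer are omitted; the data here are synthetic:

```python
import numpy as np

def fit_pca_monitor(X, n_comp=2):
    """Fit a PCA fault-detection model on normal-operation data X
    (rows = samples, columns = sensors)."""
    mu, sigma = X.mean(0), X.std(0)
    Z = (X - mu) / sigma
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_comp].T                        # retained loadings
    lam = (S[:n_comp] ** 2) / (len(X) - 1)   # retained component variances
    return mu, sigma, P, lam

def t2_spe(x, mu, sigma, P, lam):
    """Hotelling T^2 and SPE (Q) statistics for one new sample."""
    z = (x - mu) / sigma
    t = P.T @ z                              # scores in the PCA subspace
    resid = z - P @ t                        # residual outside the subspace
    return float(np.sum(t ** 2 / lam)), float(resid @ resid)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                # stand-in for normal plant data
model = fit_pca_monitor(X)
print(t2_spe(X[0], *model))                                   # in-control sample
print(t2_spe(X[0] + np.array([0, 0, 6, 0, 0, 0.0]), *model))  # sensor-3 fault
```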
Partitioning in Avionics Architectures: Requirements, Mechanisms, and Assurance
NASA Technical Reports Server (NTRS)
Rushby, John
1999-01-01
Automated aircraft control has traditionally been divided into distinct "functions" that are implemented separately (e.g., autopilot, autothrottle, flight management); each function has its own fault-tolerant computer system, and dependencies among different functions are generally limited to the exchange of sensor and control data. A by-product of this "federated" architecture is that faults are strongly contained within the computer system of the function where they occur and cannot readily propagate to affect the operation of other functions. More modern avionics architectures contemplate supporting multiple functions on a single, shared, fault-tolerant computer system where natural fault containment boundaries are less sharply defined. Partitioning uses appropriate hardware and software mechanisms to restore strong fault containment to such integrated architectures. This report examines the requirements for partitioning, mechanisms for their realization, and issues in providing assurance for partitioning. Because partitioning shares some concerns with computer security, security models are reviewed and compared with the concerns of partitioning.
DOE Office of Scientific and Technical Information (OSTI.GOV)
X. Zhao, S. Ramakrishnan, J. Lawson, C.Neumeyer, R. Marsala, H. Schneider, Engineering Operations
NSTX at Princeton Plasma Physics Laboratory (PPPL) requires a sophisticated plasma positioning control system for stable plasma operation. TF magnetic coils and PF magnetic coils provide electromagnetic fields to position and shape the plasma vertically and horizontally, respectively. NSTX utilizes twenty-six coil power supplies to establish and initiate electromagnetic fields through the coil system for plasma control. A power protection and interlock system is utilized to detect power system faults and protect the TF coils and PF coils against excessive electromechanical forces, overheating, and overcurrent. Upon detecting any fault condition the power system is restricted: it is either prevented from initializing or suppressed to de-energize coil power during pulsing. Power fault status is immediately reported to the computer system. This paper describes the design and operation of NSTX's protection and interlock system and possible future expansion.
A Testbed for Evaluating Lunar Habitat Autonomy Architectures
NASA Technical Reports Server (NTRS)
Lawler, Dennis G.
2008-01-01
A lunar outpost will involve a habitat with an integrated set of hardware and software that will maintain a safe environment for human activities. There is a desire for a paradigm shift whereby crew will be the primary mission operators, not ground controllers. There will also be significant periods when the outpost is uncrewed. This will require that significant automation software be resident in the habitat to maintain all system functions and respond to faults. JSC is developing a testbed to allow for early testing and evaluation of different autonomy architectures. This will allow evaluation of different software configurations in order to: 1) understand different operational concepts; 2) assess the impact of failures and perturbations on the system; and 3) mitigate software and hardware integration risks. The testbed will provide an environment in which habitat hardware simulations can interact with autonomous control software. Faults can be injected into the simulations and different mission scenarios can be scripted. The testbed allows for logging, replaying and re-initializing mission scenarios. An initial testbed configuration has been developed by combining an existing life support simulation and an existing simulation of the space station power distribution system. Results from this initial configuration will be presented along with suggested requirements and designs for the incremental development of a more sophisticated lunar habitat testbed.
14 CFR 25.863 - Flammable fluid fire protection.
Code of Federal Regulations, 2010 CFR
2010-01-01
... AIRCRAFT AIRWORTHINESS STANDARDS: TRANSPORT CATEGORY AIRPLANES Design and Construction Fire Protection § 25..., including electrical faults, overheating of equipment, and malfunctioning of protective devices. (4) Means...
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2012-01-09
GENI Project: General Atomics is developing a direct current (DC) circuit breaker that could protect the grid from faults 100 times faster than its alternating current (AC) counterparts. Circuit breakers are critical elements in any electrical system. At the grid level, their main function is to isolate parts of the grid where a fault has occurred, such as a downed power line or a transformer explosion, from the rest of the system. DC circuit breakers must interrupt the system during a fault much faster than AC circuit breakers to prevent possible damage to cables, converters, and other grid-level components. General Atomics' high-voltage DC circuit breaker would react in less than 1/1,000th of a second to interrupt current during a fault, preventing potential hazards to people and equipment.
Discrete Wavelet Transform for Fault Locations in Underground Distribution System
NASA Astrophysics Data System (ADS)
Apisit, C.; Ngaopitakkul, A.
2010-10-01
In this paper, a technique for detecting faults in an underground distribution system is presented. The Discrete Wavelet Transform (DWT), based on traveling waves, is employed to detect the high-frequency components and to identify fault locations in the underground distribution system. The first peak time obtained from the faulty bus is used to calculate the distance of the fault from the sending end. The validity of the proposed technique is tested with various fault inception angles, fault locations, and faulty phases. The results show that the proposed technique performs satisfactorily and will be very useful in the development of power system protection schemes.
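The sketch below illustrates the traveling-wave principle with the common two-ended variant (the paper itself uses the single-ended first-peak time); a first-difference high-pass stands in for the DWT detail coefficients, which pywt.dwt would provide more faithfully. Sampling rate, wave speed, and line length are invented:

```python
import numpy as np

def first_surge_index(signal, threshold):
    """Sample index of the first high-frequency surge. A first-difference
    high-pass stands in for the level-1 DWT detail coefficients."""
    detail = np.abs(np.diff(signal))
    hits = np.flatnonzero(detail > threshold)
    return int(hits[0]) if hits.size else None

def two_ended_distance(t_send, t_recv, line_len_km, v_km_s):
    """Two-ended traveling-wave locator: distance from the sending end,
    given first-arrival times (s) at both ends of a line of known length."""
    return 0.5 * (line_len_km + v_km_s * (t_send - t_recv))

# Synthetic demo: 10 km cable, fault 3 km from the sending end.
fs, v, L, d_true = 1e6, 2.9e5, 10.0, 3.0       # Hz, km/s, km, km
sig_s, sig_r = np.zeros(2000), np.zeros(2000)
sig_s[int(d_true / v * fs):] = 1.0             # surge reaches sending end
sig_r[int((L - d_true) / v * fs):] = 1.0       # surge reaches receiving end
t_s = first_surge_index(sig_s, 0.5) / fs
t_r = first_surge_index(sig_r, 0.5) / fs
# ~3 km, quantized by the sampling interval:
print(f"estimated fault distance: {two_ended_distance(t_s, t_r, L, v):.2f} km")
```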
NASA Technical Reports Server (NTRS)
Glass, B. J.; Hack, E. C.
1990-01-01
A knowledge-based control system for real-time control and fault detection, isolation and recovery (FDIR) of a prototype two-phase Space Station Freedom external thermal control system (TCS) is discussed in this paper. The Thermal Expert System (TEXSYS) has been demonstrated in recent tests to be capable of both fault anticipation and detection and real-time control of the thermal bus. Performance requirements were achieved by using a symbolic control approach, layering model-based expert system software on a conventional numerical data acquisition and control system. The model-based capabilities of TEXSYS were shown to be advantageous during software development and testing. One representative example is given from on-line TCS tests of TEXSYS. The integration and testing of TEXSYS with a live TCS testbed provides some insight on the use of formal software design, development and documentation methodologies to qualify knowledge-based systems for on-line or flight applications.
Model-based reasoning for power system management using KATE and the SSM/PMAD
NASA Technical Reports Server (NTRS)
Morris, Robert A.; Gonzalez, Avelino J.; Carreira, Daniel J.; Mckenzie, F. D.; Gann, Brian
1993-01-01
The overall goal of this research effort has been the development of a software system which automates tasks related to monitoring and controlling electrical power distribution in spacecraft electrical power systems. The resulting software system is called the Intelligent Power Controller (IPC). The specific tasks performed by the IPC include continuous monitoring of the flow of power from a source to a set of loads, fast detection of anomalous behavior indicating a fault to one of the components of the distribution systems, generation of diagnosis (explanation) of anomalous behavior, isolation of faulty object from remainder of system, and maintenance of flow of power to critical loads and systems (e.g. life-support) despite fault conditions being present (recovery). The IPC system has evolved out of KATE (Knowledge-based Autonomous Test Engineer), developed at NASA-KSC. KATE consists of a set of software tools for developing and applying structure and behavior models to monitoring, diagnostic, and control applications.
Autonomous Cryogenics Loading Operations Simulation Software: Knowledgebase Autonomous Test Engineer
NASA Technical Reports Server (NTRS)
Wehner, Walter S.
2012-01-01
The simulation software KATE (Knowledgebase Autonomous Test Engineer) is used to demonstrate the automatic identification of faults in a system. The ACLO (Autonomous Cryogenics Loading Operation) project uses KATE to monitor and find faults in the loading of cryogenics into a vehicle fuel tank. The KATE software interfaces with the IHM (Integrated Health Management) systems bus to communicate with other systems that are part of ACLO. One system that KATE uses the IHM bus to communicate with is AIS (Advanced Inspection System). KATE sends messages to AIS when an anomaly is detected. These messages include requests for visual inspection of specific valves and pressure gauges, and control messages to have AIS open or close manual valves. My goals include implementing the connection to the IHM bus within KATE and for the AIS project. I will also be working on changes to KATE's UI and on implementing the physics objects in KATE that will model portions of the cryogenics loading operation.
NASA Technical Reports Server (NTRS)
Rushby, John; Crow, Judith
1990-01-01
The authors explore issues in the specification, verification, and validation of artificial intelligence (AI) based software, using a prototype fault detection, isolation and recovery (FDIR) system for the Manned Maneuvering Unit (MMU). They use this system as a vehicle for exploring issues in the semantics of C-Language Integrated Production System (CLIPS)-style rule-based languages, the verification of properties relating to safety and reliability, and the static and dynamic analysis of knowledge based systems. This analysis reveals errors and shortcomings in the MMU FDIR system and raises a number of issues concerning software engineering in CLIPS. The authors came to realize that the MMU FDIR system does not conform to conventional definitions of AI software, despite the fact that it was intended and indeed presented as an AI system. The authors discuss this apparent disparity and related questions such as the role of AI techniques in space and aircraft operations and the suitability of CLIPS for critical applications.
MAX - An advanced parallel computer for space applications
NASA Technical Reports Server (NTRS)
Lewis, Blair F.; Bunker, Robert L.
1991-01-01
MAX is a fault-tolerant multicomputer hardware and software architecture designed to meet the needs of NASA spacecraft systems. It consists of conventional computing modules (computers) connected via a dual network topology. One network is used to transfer data among the computers and between computers and I/O devices. This network's topology is arbitrary. The second network operates as a broadcast medium for operating system synchronization messages and supports the operating system's Byzantine resilience. A fully distributed operating system supports multitasking in an asynchronous event- and data-driven environment. A large-grain dataflow paradigm is used to coordinate the multitasking and provide easy control of concurrency. It is the basis of the system's fault tolerance and allows both static and dynamic allocation of tasks. Redundant execution of tasks with software voting of results may be specified for critical tasks. The dataflow paradigm also supports simplified software design, test and maintenance. A unique feature is a method for reliably patching code in an executing dataflow application.
NASA Astrophysics Data System (ADS)
Chen, Jian; Randall, Robert Bond; Peeters, Bart
2016-06-01
Artificial Neural Networks (ANNs) have the potential to solve the problem of automated diagnostics of piston slap faults, but the critical issue for the successful application of ANN is the training of the network by a large amount of data in various engine conditions (different speed/load conditions in normal condition, and with different locations/levels of faults). On the other hand, the latest simulation technology provides a useful alternative in that the effect of clearance changes may readily be explored without recourse to cutting metal, in order to create enough training data for the ANNs. In this paper, based on some existing simplified models of piston slap, an advanced multi-body dynamic simulation software was used to simulate piston slap faults with different speeds/loads and clearance conditions. Meanwhile, the simulation models were validated and updated by a series of experiments. Three-stage network systems are proposed to diagnose piston faults: fault detection, fault localisation and fault severity identification. Multi Layer Perceptron (MLP) networks were used in the detection stage and severity/prognosis stage and a Probabilistic Neural Network (PNN) was used to identify which cylinder has faults. Finally, it was demonstrated that the networks trained purely on simulated data can efficiently detect piston slap faults in real tests and identify the location and severity of the faults as well.
NASA Astrophysics Data System (ADS)
Hududillah, Teuku Hafid; Simanjuntak, Andrean V. H.; Husni, Muhammad
2017-07-01
Gravity is a non-destructive geophysical technique that has numerous applications in engineering and environmental fields, such as locating fault zones. The purpose of this study is to locate the Seulimeum fault system in Iejue, Aceh Besar (Indonesia) using a gravity technique, to correlate the result with the geologic map, and to understand the trend pattern of the fault system. An estimation of the subsurface geological structure of the Seulimeum fault has been made using gravity field anomaly data. The gravity anomaly data used in this study are from TOPEX and are processed up to the free-air correction. The next processing steps apply the Bouguer and terrain corrections to obtain the complete Bouguer anomaly, which is topographically dependent. Subsurface modeling was done using the Gav2DC for Windows software. The results show lower residual gravity values in the northern half of the study area than in the southern part, indicating a fault zone pattern. The gravity residual was successfully correlated with the geologic map, which shows the existence of the Seulimeum fault in the study area. Earthquake records can be used to differentiate active from inactive fault elements; here they indicate that the delineated fault elements are active.
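For reference, the reduction chain the abstract describes (free-air correction, then Bouguer slab, then terrain correction) condenses to one line of arithmetic with the standard constants; the station values below are hypothetical:

```python
def complete_bouguer_anomaly(g_obs, g_normal, h, rho=2.67, terrain_corr=0.0):
    """Complete Bouguer anomaly in mGal. g_obs and g_normal are observed
    and normal (latitude-corrected) gravity in mGal; h is station height
    in metres; rho is slab density in g/cm^3. Standard reduction
    constants: free-air gradient 0.3086 mGal/m, Bouguer slab
    2*pi*G*rho = 0.04193*rho mGal/m."""
    free_air = 0.3086 * h                 # add back the elevation effect
    bouguer_slab = 0.04193 * rho * h      # remove the rock slab above the datum
    return g_obs - g_normal + free_air - bouguer_slab + terrain_corr

# Hypothetical station 250 m above the datum:
print(f"{complete_bouguer_anomaly(978041.2, 978060.0, 250.0, terrain_corr=1.8):.1f} mGal")
```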
Fleet-Wide Prognostic and Health Management Suite: Asset Fault Signature Database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vivek Agarwal; Nancy J. Lybeck; Randall Bickford
Proactive online monitoring in the nuclear industry is being explored using the Electric Power Research Institute’s Fleet-Wide Prognostic and Health Management (FW-PHM) Suite software. The FW-PHM Suite is a set of web-based diagnostic and prognostic tools and databases that serves as an integrated health monitoring architecture. The FW-PHM Suite has four main modules: (1) Diagnostic Advisor, (2) Asset Fault Signature (AFS) Database, (3) Remaining Useful Life Advisor, and (4) Remaining Useful Life Database. The paper focuses on the AFS Database of the FW-PHM Suite, which is used to catalog asset fault signatures. A fault signature is a structured representation ofmore » the information that an expert would use to first detect and then verify the occurrence of a specific type of fault. The fault signatures developed to assess the health status of generator step-up transformers are described in the paper. The developed fault signatures capture this knowledge and implement it in a standardized approach, thereby streamlining the diagnostic and prognostic process. This will support the automation of proactive online monitoring techniques in nuclear power plants to diagnose incipient faults, perform proactive maintenance, and estimate the remaining useful life of assets.« less
Effect of off-fault low-velocity elastic inclusions on supershear rupture dynamics
NASA Astrophysics Data System (ADS)
Ma, Xiao; Elbanna, A. E.
2015-10-01
Heterogeneous velocity structures are expected to affect fault rupture dynamics. To quantitatively evaluate some of these effects, we examine a model of dynamic rupture on a frictional fault embedded in an elastic full space, governed by plane strain elasticity, with a pair of off-fault inclusions that have a lower rigidity than the background medium. We solve the elastodynamic problem using the Finite Element software Pylith. The fault operates under linear slip-weakening friction law. We initiate the rupture by artificially overstressing a localized region near the left edge of the fault. We primarily consider embedded soft inclusions with 20 per cent reduction in both the pressure wave and shear wave speeds. The embedded inclusions are placed at different distances from the fault surface and have different sizes. We show that the existence of a soft inclusion may significantly shorten the transition length to supershear propagation through the Burridge-Andrews mechanism. We also observe that supershear rupture is generated at pre-stress values that are lower than what is theoretically predicted for a homogeneous medium. We discuss the implications of our results for dynamic rupture propagation in complex velocity structures as well as supershear propagation on understressed faults.
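The linear slip-weakening law used in the rupture model above can be stated in a few lines; the parameter values below follow widely used SCEC benchmark-style numbers rather than anything specific to this paper:

```python
def slip_weakening_strength(slip, sigma_n, mu_s=0.677, mu_d=0.525, d_c=0.4):
    """Linear slip-weakening fault strength: friction falls from static
    mu_s to dynamic mu_d over the critical slip distance d_c (m);
    sigma_n is the effective normal stress in Pa."""
    mu = mu_s - (mu_s - mu_d) * min(slip, d_c) / d_c
    return mu * sigma_n

# Strength drop as slip accumulates, at 120 MPa normal stress:
for s in (0.0, 0.1, 0.2, 0.4, 1.0):
    print(f"slip = {s:4.1f} m -> strength = "
          f"{slip_weakening_strength(s, 120e6) / 1e6:6.1f} MPa")
```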
NASA Astrophysics Data System (ADS)
Kamaruddin, Nur Aminuda; Saad, Rosli; Nordiana, M. M.; Azwin, I. N.
2015-04-01
The Great Sumatra fault system splits into two sub-parallel segments in northern Sumatra. This splitting is one of the impacts of the powerful earthquakes that have hit Sumatra Island, especially the one that occurred in 2004. The two sub-parallel segments are known as the Aceh and Seulimeum faults. This study focuses on the Seulimeum fault, and two geophysical methods were chosen so that their results could be compared and mutually verified. The 2-D resistivity method is a common geophysical method used in the determination of near-surface structures such as faults, cavities, voids, and sinkholes. The magnetic method, meanwhile, is often chosen to delineate subsurface structures, determine the depth of magnetic source bodies, and possibly estimate sediment thickness. Three resistivity survey lines and randomly distributed magnetic stations were carried out covering the Krueng district. The resistivity data were processed using Res2Dinv and the results presented using the Surfer software. The fault was identified by the contrast between low and high resistivity values. The magnetic data were presented as a magnetic residual contour map, and the extension of the fault system is suspected to be represented by the contrast in the magnetic anomalies. Within the suspected fault zone, the resistivity results tally with the magnetic results.
Using Combined SFTA and SFMECA Techniques for Space Critical Software
NASA Astrophysics Data System (ADS)
Nicodemos, F. G.; Lahoz, C. H. N.; Abdala, M. A. D.; Saotome, O.
2012-01-01
This work addresses the combined Software Fault Tree Analysis (SFTA) and Software Failure Modes, Effects and Criticality Analysis (SFMECA) techniques applied to space-critical software of satellite launch vehicles. The combined approach is under research as part of the Verification and Validation (V&V) efforts to increase software dependability, and for future application in other projects under development at Instituto de Aeronáutica e Espaço (IAE). The applicability of the approach was evaluated on system software specifications and applied to a case study based on the Brazilian Satellite Launcher (VLS). The main goal is to identify possible failure causes and obtain compensating provisions that lead to the inclusion of new functional and non-functional system software requirements.
Continuous Fine-Fault Estimation with Real-Time GNSS
NASA Astrophysics Data System (ADS)
Norford, B. B.; Melbourne, T. I.; Szeliga, W. M.; Santillan, V. M.; Scrivner, C.; Senko, J.; Larsen, D.
2017-12-01
Thousands of real-time telemetered GNSS stations operate throughout the circum-Pacific that may be used for rapid earthquake characterization and estimation of local tsunami excitation. We report on the development of a GNSS-based finite-fault inversion system that continuously estimates slip using real-time GNSS position streams from the Cascadia subduction zone and which is being expanded throughout the circum-Pacific. The system uses 1 Hz precise point position streams computed in the ITRF14 reference frame using clock and satellite orbit corrections from the IGS. The software is implemented as seven independent modules that filter time series using Kalman filters, trigger and estimate coseismic offsets, invert for slip using a non-negative least squares method developed by Lawson and Hanson (1974) and elastic half-space Green's Functions developed by Okada (1985), smooth the results temporally and spatially, and write the resulting streams of time-dependent slip to a RabbitMQ messaging server for use by downstream modules such as tsunami excitation modules. Additional fault models can be easily added to the system for other circum-Pacific subduction zones as additional real-time GNSS data become available. The system is currently being tested using data from well-recorded earthquakes including the 2011 Tohoku earthquake, the 2010 Maule earthquake, the 2015 Illapel earthquake, the 2003 Tokachi-oki earthquake, the 2014 Iquique earthquake, the 2010 Mentawai earthquake, the 2016 Kaikoura earthquake, the 2016 Ecuador earthquake, the 2015 Gorkha earthquake, and others. Test data will be fed to the system and the resultant earthquake characterizations will be compared with published earthquake parameters. Seismic events will be assumed to occur on major faults, so, for example, only the San Andreas fault will be considered in Southern California, while the hundreds of other faults in the region will be ignored. Rake will be constrained along each subfault to be consistent with NUVEL-1 plate convergence directions. This software provides a basis for a GNSS-based rapid earthquake finite fault estimation system with global scope.
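A toy version of the inversion core, using SciPy's implementation of the Lawson and Hanson (1974) non-negative least squares routine; the Green's-function matrix here is random for illustration, whereas the real system builds it from Okada (1985) elastic half-space solutions:

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical problem: 3 subfaults, 8 GNSS displacement components.
rng = np.random.default_rng(42)
G = np.abs(rng.normal(size=(8, 3)))                   # stand-in Green's functions
slip_true = np.array([0.0, 1.5, 0.3])                 # metres of slip per subfault
d = G @ slip_true + rng.normal(scale=0.01, size=8)    # noisy coseismic offsets

# Non-negativity keeps every subfault's slip consistent with the assumed
# rake direction, as described in the abstract.
slip_est, resid = nnls(G, d)
print(np.round(slip_est, 2), f"residual norm = {resid:.3f}")
```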
Design and evaluation of a fault-tolerant multiprocessor using hardware recovery blocks
NASA Technical Reports Server (NTRS)
Lee, Y. H.; Shin, K. G.
1982-01-01
A fault-tolerant multiprocessor with a rollback recovery mechanism is discussed. The rollback mechanism is based on the hardware recovery block which is a hardware equivalent to the software recovery block. The hardware recovery block is constructed by consecutive state-save operations and several state-save units in every processor and memory module. When a fault is detected, the multiprocessor reconfigures itself to replace the faulty component and then the process originally assigned to the faulty component retreats to one of the previously saved states in order to resume fault-free execution. A mathematical model is proposed to calculate both the coverage of multi-step rollback recovery and the risk of restart. A performance evaluation in terms of task execution time is also presented.
2015-04-01
troubleshooting avionics system faults while the aircraft is on the ground. The core component of the PATS-30, the ruggedized laptop, is no longer sustainable. The PATS-70 utilizes up-to-date, sustainable technology for Operational Flight Program (OFP) software loading and diagnostic avionics system testing, and includes additional TPSs to enhance its capability.
Flight test results of the strapdown hexad inertial reference unit (SIRU). Volume 2: Test report
NASA Technical Reports Server (NTRS)
Hruby, R. J.; Bjorkman, W. S.
1977-01-01
Results of flight tests of the Strapdown Inertial Reference Unit (SIRU) navigation system are presented. The fault tolerant SIRU navigation system features a redundant inertial sensor unit and dual computers. System software provides for detection and isolation of inertial sensor failures and continued operation in the event of failures. Flight test results include assessments of the system's navigational performance and fault tolerance. Performance shortcomings are analyzed.
Health management and controls for earth to orbit propulsion systems
NASA Technical Reports Server (NTRS)
Bickford, R. L.
1992-01-01
Fault detection and isolation for advanced rocket engine controllers are discussed, focusing on advanced sensing systems and software which significantly improve component failure detection for engine safety and health management. Aerojet's Space Transportation Main Engine controller for the National Launch System is the state of the art in fault-tolerant engine avionics. Health management systems provide high levels of automated fault coverage and significantly improve vehicle delivered reliability and lower preflight operations costs. Key technologies, including the sensor data validation algorithms and flight-capable spectrometers, have been demonstrated in ground applications and are found to be suitable for bridging programs into flight applications.
NASA Astrophysics Data System (ADS)
Haziza, M.
1990-10-01
The DIAMS satellite fault isolation expert system shell concept is described. The project, initiated in 1985, has led to the development of a prototype Expert System (ES) dedicated to the Telecom 1 attitude and orbit control system. The prototype ES has been installed in the Telecom 1 satellite control center and evaluated by Telecom 1 operations. The development of a fault isolation ES covering a whole spacecraft (the French telecommunication satellite Telecom 2) is currently being undertaken. Full scale industrial applications raise stringent requirements in terms of knowledge management and software development methodology. The approach used by MATRA ESPACE to face this challenge is outlined.
Multicore Considerations for Legacy Flight Software Migration
NASA Technical Reports Server (NTRS)
Vines, Kenneth; Day, Len
2013-01-01
In this paper we will discuss potential benefits and pitfalls when considering a migration from an existing single core code base to a multicore processor implementation. The results of this study present options that should be considered before migrating fault managers, device handlers and tasks with time-constrained requirements to a multicore flight software environment. Possible future multicore test bed demonstrations are also discussed.
Surface Rupture Map of the 2002 M7.9 Denali Fault Earthquake, Alaska: Digital Data
Haeussler, Peter J.
2009-01-01
The November 3, 2002, Mw7.9 Denali Fault earthquake produced about 340 km of surface rupture along the Susitna Glacier Thrust Fault and the right-lateral, strike-slip Denali and Totschunda Faults. Digital photogrammetric methods were primarily used to create a 1:500-scale, three-dimensional surface rupture map, and 1:6,000-scale aerial photographs were used for three-dimensional digitization in ESRI's ArcMap GIS software, using Leica's StereoAnalyst plug-in. Points were digitized 4.3 m apart, on average, for the entire surface rupture. Earthquake-induced landslides, sackungen, and unruptured Holocene fault scarps on the eastern Denali Fault were also digitized where they lay within the limits of air photo coverage. This digital three-dimensional fault-trace map is superior to traditional maps in terms of relative and absolute accuracy, completeness, and detail, and is used as a basis for three-dimensional visualization. Field work complements the air photo observations in locations of dense vegetation, on bedrock, or in areas where the surface trace is weakly developed. Seventeen km of the fault trace, which broke through glacier ice, were not digitized in detail due to time constraints, and air photos missed another 10 km of fault rupture through the upper Black Rapids Glacier, so that section was not mapped in detail either.
Development of Asset Fault Signatures for Prognostic and Health Management in the Nuclear Industry
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vivek Agarwal; Nancy J. Lybeck; Randall Bickford
2014-06-01
Proactive online monitoring in the nuclear industry is being explored using the Electric Power Research Institute’s Fleet-Wide Prognostic and Health Management (FW-PHM) Suite software. The FW-PHM Suite is a set of web-based diagnostic and prognostic tools and databases that serves as an integrated health monitoring architecture. The FW-PHM Suite has four main modules: Diagnostic Advisor, Asset Fault Signature (AFS) Database, Remaining Useful Life Advisor, and Remaining Useful Life Database. This paper focuses on the development of asset fault signatures to assess the health status of generator step-up transformers and emergency diesel generators in nuclear power plants. Asset fault signatures describe the distinctive features, based on technical examinations, that can be used to detect a specific fault type. At the most basic level, fault signatures are comprised of an asset type, a fault type, and a set of one or more fault features (symptoms) that are indicative of the specified fault. The AFS Database is populated with asset fault signatures via a content development exercise that is based on the results of intensive technical research and on the knowledge and experience of technical experts. The developed fault signatures capture this knowledge and implement it in a standardized approach, thereby streamlining the diagnostic and prognostic process. This will support the automation of proactive online monitoring techniques in nuclear power plants to diagnose incipient faults, perform proactive maintenance, and estimate the remaining useful life of assets.
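A fault signature, as defined above, maps naturally onto a small data structure; the sketch below scores signatures by the fraction of their features present in observed plant data. Asset names, fault types, and features are invented examples, not entries from the actual AFS Database:

```python
from dataclasses import dataclass

@dataclass
class FaultSignature:
    """Asset type, fault type, and the features (symptoms) an expert
    would use to detect and verify that fault."""
    asset_type: str
    fault_type: str
    features: frozenset

    def match_score(self, observed: set) -> float:
        """Fraction of this signature's features seen in the plant data."""
        return len(self.features & observed) / len(self.features)

catalog = [
    FaultSignature("GSU transformer", "winding insulation degradation",
                   frozenset({"dissolved-gas H2 rise", "power factor increase"})),
    FaultSignature("GSU transformer", "cooling system fault",
                   frozenset({"top-oil temperature rise", "fan current anomaly"})),
]
observed = {"top-oil temperature rise", "fan current anomaly"}
for sig in sorted(catalog, key=lambda s: -s.match_score(observed)):
    print(f"{sig.fault_type}: {sig.match_score(observed):.0%} of features matched")
```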
Experiments with Test Case Generation and Runtime Analysis
NASA Technical Reports Server (NTRS)
Artho, Cyrille; Drusinsky, Doron; Goldberg, Allen; Havelund, Klaus; Lowry, Mike; Pasareanu, Corina; Rosu, Grigore; Visser, Willem; Koga, Dennis (Technical Monitor)
2003-01-01
Software testing is typically an ad hoc process where human testers manually write many test inputs and expected test results, perhaps automating their execution in a regression suite. This process is cumbersome and costly. This paper reports preliminary results on an approach to further automate this process. The approach consists of combining automated test case generation, based on systematically exploring the program's input domain, with runtime analysis, where execution traces are monitored and verified against temporal logic specifications, or analyzed using advanced algorithms for detecting concurrency errors such as data races and deadlocks. The approach suggests generating specifications dynamically per input instance rather than statically once and for all. The paper describes experiments with variants of this approach in the context of two examples: a planetary rover controller and a spacecraft fault protection system.
Collaborative Protection and Control Schemes for Shipboard Electrical Systems
2007-03-26
voltage source converters (VSCs) for fault current limiting and interruption. Revisions needed on the VSCs to perform these functions have been identified, and the feasibility of this...disturbances very fast, in less than 3-4 ms [3]. The next section summarizes the details of the agent-based protection scheme that uses the VSC as the...fault currents. In our previous work [2, 3], it has been demonstrated that this new functionality for VSCs can be achieved by proper selection of
Combining dynamical decoupling with fault-tolerant quantum computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ng, Hui Khoon; Preskill, John; Lidar, Daniel A.
2011-07-15
We study how dynamical decoupling (DD) pulse sequences can improve the reliability of quantum computers. We prove upper bounds on the accuracy of DD-protected quantum gates and derive sufficient conditions for DD-protected gates to outperform unprotected gates. Under suitable conditions, fault-tolerant quantum circuits constructed from DD-protected gates can tolerate stronger noise and have a lower overhead cost than fault-tolerant circuits constructed from unprotected gates. Our accuracy estimates depend on the dynamics of the bath that couples to the quantum computer and can be expressed either in terms of the operator norm of the bath's Hamiltonian or in terms of the power spectrum of bath correlations; we explain in particular how the performance of recursively generated concatenated pulse sequences can be analyzed from either viewpoint. Our results apply to Hamiltonian noise models with limited spatial correlations.
On the nature of bias and defects in the software specification process
NASA Technical Reports Server (NTRS)
Straub, Pablo A.; Zelkowitz, Marvin V.
1992-01-01
Implementation bias in a specification is an arbitrary constraint in the solution space. This paper describes the problem of bias. Additionally, this paper presents a model of the specification and design processes describing individual subprocesses in terms of precision/detail diagrams and a model of bias in multi-attribute software specifications. While studying how bias is introduced into a specification we realized that software defects and bias are dual problems of a single phenomenon. This was used to explain the large proportion of faults found during the coding phase at the Software Engineering Laboratory at NASA/GSFC.
Aircraft Fault Detection and Classification Using Multi-Level Immune Learning Detection
NASA Technical Reports Server (NTRS)
Wong, Derek; Poll, Scott; KrishnaKumar, Kalmanje
2005-01-01
This work is an extension of a recently developed software tool called MILD (Multi-level Immune Learning Detection), which implements a negative selection algorithm for anomaly and fault detection that is inspired by the human immune system. The immunity-based approach can detect a broad spectrum of known and unforeseen faults. We extend MILD by applying a neural network classifier to identify the pattern of fault detectors that are activated during fault detection. Consequently, MILD now performs fault detection and identification of the system under investigation. This paper describes the application of MILD to detect and classify faults of a generic transport aircraft augmented with an intelligent flight controller. The intelligent control architecture is designed to accommodate faults without the need to explicitly identify them. Adding knowledge about the existence and type of a fault will improve the handling qualities of a degraded aircraft and impact tactical and strategic maneuvering decisions. In addition, providing fault information to the pilot is important for maintaining situational awareness so that he can avoid performing an action that might lead to unexpected behavior - e.g., an action that exceeds the remaining control authority of the damaged aircraft. We discuss the detection and classification results of simulated failures of the aircraft's control system and show that MILD is effective at determining the problem with low false alarm and misclassification rates.
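The negative selection algorithm at the core of MILD can be sketched in a few lines: generate random detectors, discard any that match "self" (normal-behavior) patterns, and flag samples matched by a surviving detector. Binary strings and Hamming distance are used here for brevity; MILD's actual detector representation is richer:

```python
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def generate_detectors(self_patterns, n_detectors, r, dim=8, seed=0):
    """Negative selection: keep random candidate detectors that do NOT
    match any 'self' (normal) pattern within Hamming radius r."""
    rng = random.Random(seed)
    detectors = []
    while len(detectors) < n_detectors:
        cand = tuple(rng.randint(0, 1) for _ in range(dim))
        if all(hamming(cand, s) > r for s in self_patterns):
            detectors.append(cand)
    return detectors

def is_anomalous(sample, detectors, r):
    """Flag a sample when any detector matches it within radius r."""
    return any(hamming(sample, d) <= r for d in detectors)

normal = [(0, 0, 0, 0, 1, 1, 1, 1), (0, 0, 0, 1, 1, 1, 1, 0)]  # toy 'self' set
detectors = generate_detectors(normal, n_detectors=20, r=2)
print(is_anomalous((1, 1, 1, 1, 0, 0, 0, 0), detectors, r=2))  # likely True; coverage is probabilistic
print(is_anomalous(normal[0], detectors, r=2))                 # False by construction
```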
Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake
NASA Astrophysics Data System (ADS)
Ulrich, T.; Gabriel, A. A.; Madden, E. H.; Wollherr, S.; Uphoff, C.; Rettenberger, S.; Bader, M.
2017-12-01
SeisSol (www.seissol.org) is an open-source software package based on an arbitrary high-order derivative Discontinuous Galerkin method (ADER-DG). It solves spontaneous dynamic rupture propagation on pre-existing fault interfaces according to non-linear friction laws, coupled to seismic wave propagation with high-order accuracy in space and time (minimal dispersion errors). SeisSol exploits unstructured meshes to account for complex geometries, e.g. high-resolution topography and bathymetry, 3D subsurface structure, and fault networks. We present the largest (1500 km of faults) and longest (500 s) dynamic rupture simulation to date, modeling the 2004 Sumatra-Andaman earthquake. We demonstrate the need for end-to-end optimization and petascale performance of scientific software to realize realistic simulations on the extreme scales of subduction zone earthquakes: considering the full complexity of subduction zone geometries leads inevitably to huge differences in element sizes. The main code improvements include a cache-aware wave propagation scheme and optimizations of the dynamic rupture kernels using code generation. In addition, a novel clustered local-time-stepping scheme for dynamic rupture has been established. Finally, asynchronous output has been implemented to overlap I/O and compute time. We resolve the frictional sliding process on the curved mega-thrust and a system of splay faults, as well as the seismic wave field and seafloor displacement with frequency content up to 2.2 Hz. We validate the scenario against geodetic, seismological and tsunami observations. The resulting rupture dynamics shed new light on the activation and importance of splay faults.
Extended Bright Bodies - Flight and Ground Software Challenges on the Cassini Mission at Saturn
NASA Technical Reports Server (NTRS)
Sung, Tina S.; Burk, Thomas A.
2016-01-01
Extended bright bodies in the Saturn environment such as Saturn's rings, the planet itself, and Saturn's satellites near the Cassini spacecraft may interfere with the star tracker's ability to find stars. These interferences can create faulty spacecraft attitude knowledge, which would decrease the pointing accuracy or even trip a fault protection response on board the spacecraft. The effects of the extended bright body interference were observed in December of 2000 when Cassini flew by Jupiter. Based on this flight experience and expected star tracker behavior at Saturn, the Cassini AACS operations team defined flight rules to suspend the star tracker during predicted interference windows. The flight rules are also implemented in the existing ground software called Kinematic Predictor Tool to create star identification suspend commands to be uplinked to the spacecraft for future predicted interferences. This paper discusses the details of how extended bright bodies impact Cassini's acquisition of attitude knowledge, how the observed data helped the ground engineers in developing flight rules, and how automated methods are used in the flight and ground software to ensure the spacecraft is continuously operated within these flight rules. This paper also discusses how these established procedures will continue to be used to overcome new bright body challenges that Cassini will encounter during its dips inside the rings of Saturn for its final orbits of a remarkable 20-year mission at Saturn.
NASA Astrophysics Data System (ADS)
Shi, J. T.; Han, X. T.; Xie, J. F.; Yao, L.; Huang, L. T.; Li, L.
2013-03-01
A Pulsed High Magnetic Field Facility (PHMFF) has been established at the Wuhan National High Magnetic Field Center (WHMFC), and various protection measures are applied in its control system. In order to improve the reliability and robustness of the control system, a safety analysis of the PHMFF was carried out based on the Fault Tree Analysis (FTA) technique. The function and realization of five protection systems are given: the sequenced experiment operation system, the safety assistant system, the emergency stop system, the fault detection and processing system, and the accident isolation protection system. Tests and operation indicate that these measures improve the safety of the facility and ensure the safety of personnel.
SCEC-VDO: A New 3-Dimensional Visualization and Movie Making Software for Earth Science Data
NASA Astrophysics Data System (ADS)
Milner, K. R.; Sanskriti, F.; Yu, J.; Callaghan, S.; Maechling, P. J.; Jordan, T. H.
2016-12-01
Researchers and undergraduate interns at the Southern California Earthquake Center (SCEC) have created a new 3-dimensional (3D) visualization software tool called SCEC Virtual Display of Objects (SCEC-VDO). SCEC-VDO is written in Java and uses the Visualization Toolkit (VTK) backend to render 3D content. SCEC-VDO offers advantages over existing 3D visualization software for viewing georeferenced data beneath the Earth's surface. Many popular visualization packages, such as Google Earth, restrict the user to views of the Earth from above, obstructing views of geological features such as faults and earthquake hypocenters at depth. SCEC-VDO allows the user to view data both above and below the Earth's surface at any angle. It includes tools for viewing global earthquakes from the U.S. Geological Survey, faults from the SCEC Community Fault Model, and results from the latest SCEC models of earthquake hazards in California, including UCERF3 and RSQSim. Its object-oriented plugin architecture allows for the easy integration of new regional and global datasets, regardless of the science domain. SCEC-VDO also features rich animation capabilities, allowing users to build a timeline with keyframes of camera position and displayed data. The software is built with the concept of statefulness, allowing for reproducibility and collaboration using an XML file. A prior version of SCEC-VDO, which began development in 2005 under the SCEC Undergraduate Studies in Earthquake Information Technology internship, used the now-unsupported Java3D library. Replacing Java3D with the widely supported and actively developed VTK libraries not only ensures that SCEC-VDO can continue to function for years to come, but also allows for the export of 3D scenes to web viewers and popular software such as ParaView. SCEC-VDO runs on all recent 64-bit Windows, Mac OS X, and Linux systems with Java 8 or later. More information, including downloads, tutorials, and example movies created fully within SCEC-VDO, is available at http://scecvdo.usc.edu
Copyright Protection for Computer Software: Is There a Need for More Protection?
ERIC Educational Resources Information Center
Ku, Linlin
Because the computer industry's expansion has been much faster than the development of laws protecting computer software, and since the practice of software piracy seems to be alive and well, the issue of whether existing laws can provide effective protection for software needs further discussion. Three bodies of law have been used to protect…
Onboard Sensor Data Qualification in Human-Rated Launch Vehicles
NASA Technical Reports Server (NTRS)
Wong, Edmond; Melcher, Kevin J.; Maul, William A.; Chicatelli, Amy K.; Sowers, Thomas S.; Fulton, Christopher; Bickford, Randall
2012-01-01
The avionics system software for human-rated launch vehicles requires an implementation approach that is robust to failures, especially the failure of sensors used to monitor vehicle conditions that might result in an abort determination. Sensor measurements provide the basis for operational decisions on human-rated launch vehicles. This data is often used to assess the health of system or subsystem components, to identify failures, and to take corrective action. An incorrect conclusion and/or response may result if the sensor itself provides faulty data, or if the data provided by the sensor has been corrupted. Operational decisions based on faulty sensor data have the potential to be catastrophic, resulting in loss of mission or loss of crew. To prevent these later situations from occurring, a Modular Architecture and Generalized Methodology for Sensor Data Qualification in Human-rated Launch Vehicles has been developed. Sensor Data Qualification (SDQ) is a set of algorithms that can be implemented in onboard flight software, and can be used to qualify data obtained from flight-critical sensors prior to the data being used by other flight software algorithms. Qualified data has been analyzed by SDQ and is determined to be a true representation of the sensed system state; that is, the sensor data is determined not to be corrupted by sensor faults or signal transmission faults. Sensor data can become corrupted by faults at any point in the signal path between the sensor and the flight computer. Qualifying the sensor data has the benefit of ensuring that erroneous data is identified and flagged before otherwise being used for operational decisions, thus increasing confidence in the response of the other flight software processes using the qualified data, and decreasing the probability of false alarms or missed detections.
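A minimal sketch of the kind of qualification logic described above, assuming triple-redundant channels: range-check each channel, test the survivors for mutual agreement, and mid-value-select the qualified value. The thresholds and the specific test sequence are invented, not the documented SDQ algorithms:

```python
def qualify_redundant(readings, limits, max_spread):
    """Qualify a redundant sensor set before flight software uses it:
    range-check each channel, require the survivors to agree within
    max_spread, then mid-value-select. Returns (value, status)."""
    lo, hi = limits
    valid = sorted(r for r in readings if lo <= r <= hi)   # range test
    if len(valid) < 2:
        return None, "FAIL: insufficient valid channels"
    if valid[-1] - valid[0] > max_spread:                  # agreement test
        return None, "FAIL: channel disagreement"
    return valid[len(valid) // 2], "QUALIFIED"             # mid-value select

# A stuck-high channel is masked by the range test:
print(qualify_redundant([101.2, 100.9, 250.0], limits=(0.0, 200.0), max_spread=2.0))
```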
Electrical short circuit and current overload tests on aircraft wiring
NASA Technical Reports Server (NTRS)
Cahill, Patricia
1995-01-01
The findings of electrical short circuit and current overload tests performed on commercial aircraft wiring are presented. A series of bench-scale tests was conducted to evaluate circuit breaker response to overcurrent and to determine whether the wire showed any visible signs of thermal degradation due to overcurrent. Three types of wire used in commercial aircraft were evaluated: MIL-W-22759/34 (150 C rated), MIL-W-81381/12 (200 C rated), and BMS 1360 (260 C rated). A second series of tests evaluated circuit breaker response to short circuits and ticking faults. These tests were also meant to determine whether the three test wires behaved differently under these conditions and whether a short circuit or ticking fault could start a fire. It is concluded that circuit breakers provide reliable overcurrent protection. Circuit breakers may not protect wire from ticking faults but can protect wire from direct shorts. These tests indicated that a wire subjected to a current that totally degrades the insulation looks identical to a wire subjected to a fire; however, the fire-exposed conductor was more brittle than the conductor degraded by overcurrent. Preliminary testing indicated that direct short circuits are not likely to start a fire and that they do not erode insulation and conductor to the extent that ticking faults do. Circuit breakers may not safeguard against the ignition of flammable materials by ticking faults. The flammability of materials near ticking faults is far more important than the rating of the wire insulation material.
Barall, Michael
2009-01-01
We present a new finite-element technique for calculating dynamic 3-D spontaneous rupture on an earthquake fault, which can reduce the required computational resources by a factor of six or more, without loss of accuracy. The grid-doubling technique employs small cells in a thin layer surrounding the fault. The remainder of the modelling volume is filled with larger cells, typically two or four times as large as the small cells. In the resulting non-conforming mesh, an interpolation method is used to join the thin layer of smaller cells to the volume of larger cells. Grid-doubling is effective because spontaneous rupture calculations typically require higher spatial resolution on and near the fault than elsewhere in the model volume. The technique can be applied to non-planar faults by morphing, or smoothly distorting, the entire mesh to produce the desired 3-D fault geometry. Using our FaultMod finite-element software, we have tested grid-doubling with both slip-weakening and rate-and-state friction laws, by running the SCEC/USGS 3-D dynamic rupture benchmark problems. We have also applied it to a model of the Hayward fault, Northern California, which uses realistic fault geometry and rock properties. FaultMod implements fault slip using common nodes, which represent motion common to both sides of the fault, and differential nodes, which represent motion of one side of the fault relative to the other side. We describe how to modify the traction-at-split-nodes method to work with common and differential nodes, using an implicit time stepping algorithm.
Insulation detection of electric vehicle batteries
NASA Astrophysics Data System (ADS)
Dai, Qiqi; Zhu, Zhongwen; Huang, Denggao; Du, Mingxing; Wei, Kexin
2018-06-01
In this paper, a single-side-switched fixed-resistance insulation detection method for electric vehicles is designed, and the hardware and software design of the system are given. Experiments prove that the insulation detection system can detect the insulation resistance over a wide range of resistance values and accurately report the fault level. The system can effectively monitor insulation faults between the car body and the high-voltage line and prevent passengers from being injured.
2013-03-01
amounts of time and effort to implement. Future testing with commercial, fault-tolerant synthesis software, under a radiation environment, will yield ...initial viewpoint of the author is to take the flash-based FPGA route. This will yield a simple, reconfigurable circuit while providing the added...structure seen in Figure 30. Each of these full adder blocks were replaced in subsequent iterations to yield proper comparison with this baseline
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nancy J. Lybeck; Vivek Agarwal; Binh T. Pham
The Light Water Reactor Sustainability program at Idaho National Laboratory (INL) is actively conducting research to develop and demonstrate online monitoring (OLM) capabilities for active components in existing Nuclear Power Plants. A pilot project is currently underway to apply OLM to Generator Step-Up Transformers (GSUs) and Emergency Diesel Generators (EDGs). INL and the Electric Power Research Institute (EPRI) are working jointly to implement the pilot project. The EPRI Fleet-Wide Prognostic and Health Management (FW-PHM) Software Suite will be used to implement monitoring in conjunction with utility partners: the Shearon Harris Nuclear Generating Station (owned by Duke Energy) for GSUs, and Braidwood Generating Station (owned by Exelon Corporation) for EDGs. This report presents monitoring techniques, fault signatures, and diagnostic and prognostic models for GSUs. GSUs are main transformers that are directly connected to generators, stepping up the voltage from the generator output voltage to the highest transmission voltages for supplying electricity to the transmission grid. Technical experts from Shearon Harris are assisting INL and EPRI in identifying critical faults and defining fault signatures associated with each fault. The resulting diagnostic models will be implemented in the FW-PHM Software Suite and tested using data from Shearon Harris. Parallel research on EDGs is being conducted and will be reported in an interim report during the first quarter of fiscal year 2013.
Rover Attitude and Pointing System Simulation Testbed
NASA Technical Reports Server (NTRS)
Vanelli, Charles A.; Grinblat, Jonathan F.; Sirlin, Samuel W.; Pfister, Sam
2009-01-01
The MER (Mars Exploration Rover) Attitude and Pointing System Simulation Testbed Environment (RAPSSTER) provides a simulation platform used for the development and test of GNC (guidance, navigation, and control) flight algorithm designs for the Mars rovers. It was specifically tailored to the MERs, but has since been used in the development of rover algorithms for the Mars Science Laboratory (MSL) as well. The software provides an integrated simulation and software testbed environment for the development of Mars rover attitude and pointing flight software. It provides an environment that is able to run the MER GNC flight software directly (as opposed to running an algorithmic model of the MER GNC flight code). This improves simulation fidelity and confidence in the results. Furthermore, the simulation environment allows the user to single-step through its execution, pausing and restarting at will. The system also provides for the introduction of simulated faults specific to Mars rover environments that cannot be replicated in other testbed platforms, to stress test the GNC flight algorithms under examination. The software provides facilities to do these stress tests in ways that cannot be done in the real-time flight system testbeds, such as time-jumping (both forwards and backwards), and introduction of simulated actuator faults that would be difficult, expensive, and/or destructive to implement in the real-time testbeds. Actual flight-quality codes can be incorporated back into the development-test suite of GNC developers, closing the loop between the GNC developers and the flight software developers. The software provides fully automated scripting, allowing multiple tests to be run with varying parameters, without human supervision.
The SCEC/UseIT Intern Program: Creating Open-Source Visualization Software Using Diverse Resources
NASA Astrophysics Data System (ADS)
Francoeur, H.; Callaghan, S.; Perry, S.; Jordan, T.
2004-12-01
The Southern California Earthquake Center undergraduate IT intern program (SCEC UseIT) conducts IT research to benefit collaborative earth science research. Through this program, interns have developed real-time, interactive, 3D visualization software using open-source tools. Dubbed LA3D, a distribution of this software is now in use by the seismic community. LA3D enables the user to interactively view Southern California datasets and models of importance to earthquake scientists, such as faults, earthquakes, fault blocks, digital elevation models, and seismic hazard maps. LA3D is now being extended to support visualizations anywhere on the planet. The new software, called SCEC-VIDEO (Virtual Interactive Display of Earth Objects), makes use of a modular, plugin-based software architecture which supports easy development and integration of new data sets. Currently SCEC-VIDEO is in beta testing, with a full open-source release slated for the future. Both LA3D and SCEC-VIDEO were developed using a wide variety of software technologies. These, which included relational databases, web services, software management technologies, and 3-D graphics in Java, were necessary to integrate the heterogeneous array of data sources that comprise our software. Currently the interns are working to integrate new technologies and larger data sets to increase software functionality and value. In addition, both LA3D and SCEC-VIDEO allow the user to script and create movies. Thus, program interns with computer science backgrounds have been writing software while interns with other interests, such as cinema, geology, and education, have been making movies that have proved of great use in scientific talks, media interviews, and education. In this way, SCEC UseIT incorporates a wide variety of scientific and human resources to create products of value to the scientific and outreach communities. The program plans to continue with its interdisciplinary approach, increasing the relevance of the software and expanding its use in the scientific community.
DEVELOPMENT AND TESTING OF FAULT-DIAGNOSIS ALGORITHMS FOR REACTOR PLANT SYSTEMS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grelle, Austin L.; Park, Young S.; Vilim, Richard B.
Argonne National Laboratory is further developing fault diagnosis algorithms for use by the operator of a nuclear plant to aid in improved monitoring of overall plant condition and performance. The objective is better management of plant upsets through more timely, informed decisions on control actions, with the ultimate goal of improved plant safety, production, and cost management. Integration of these algorithms with visual aids for operators is taking place through a collaboration under the concept of an operator advisory system. This is a software entity whose purpose is to manage and distill the enormous amount of information an operator must process to understand the plant state, particularly in off-normal situations, and how the state trajectory will unfold in time. The fault diagnosis algorithms were exhaustively tested using computer simulations of twenty different faults introduced into the chemical and volume control system (CVCS) of a pressurized water reactor (PWR). The algorithms are unique in that each new application to a facility requires providing only the piping and instrumentation diagram (PID) and no other plant-specific information; a subject-matter expert is not needed to install and maintain each instance of an application. The testing approach followed accepted procedures for verifying and validating software. It was shown that the code satisfies its functional requirement, which is to accept sensor information, identify process variable trends based on this sensor information, and then return an accurate diagnosis based on chains of rules related to these trends. The validation and verification exercise made use of GPASS, a one-dimensional systems code, for simulating CVCS operation. Plant components were failed and the code generated the resulting plant response. Parametric studies with respect to the severity of the fault, the richness of the plant sensor set, and the accuracy of sensors were performed as part of the validation exercise. The background and an overview of the software will be presented to explain the approach, followed by the verification and validation effort using the GPASS code for simulation of plant transients, including a sensitivity study on important parameters.
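A toy illustration of the trend-then-rules idea described above: classify each process variable's trend, then match the trend pattern against fault rules. The sensors, rules, and the diagnoses named here are invented examples, not Argonne's actual rule base, which is derived from the PID.

```python
# Toy illustration of trend-based, rule-chain diagnosis: classify each process
# variable's trend, then match the trend pattern against fault rules. Sensors,
# rules, and diagnoses are invented examples, not the actual CVCS rule base.
def trend(series, eps=0.01):
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return "rising" if slope > eps else "falling" if slope < -eps else "steady"

RULES = [  # (required trend pattern) -> diagnosis
    ({"charging_flow": "falling", "vct_level": "steady"}, "charging pump degradation"),
    ({"charging_flow": "falling", "vct_level": "falling"}, "letdown line leak"),
]

def diagnose(histories):
    trends = {name: trend(h) for name, h in histories.items()}
    return [diag for pattern, diag in RULES
            if all(trends.get(k) == v for k, v in pattern.items())] or ["no match"]

print(diagnose({"charging_flow": [30.0, 29.2, 28.1, 27.0],
                "vct_level":     [55.0, 54.1, 53.0, 52.2]}))
# -> ['letdown line leak']
```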
Integrated Software Health Management for Aircraft GN and C
NASA Technical Reports Server (NTRS)
Schumann, Johann; Mengshoel, Ole
2011-01-01
Modern aircraft rely heavily on dependable operation of many safety-critical software components. Despite careful design, verification and validation (V&V), on-board software can fail with disastrous consequences if it encounters problematic software/hardware interaction or must operate in an unexpected environment. We are using a Bayesian approach to monitor the software and its behavior during operation and provide up-to-date information about the health of the software and its components. The powerful reasoning mechanism provided by our model-based Bayesian approach makes reliable diagnosis of the root causes possible and minimizes the number of false alarms. Compilation of the Bayesian model into compact arithmetic circuits makes SWHM feasible even on platforms with limited CPU power. We show initial results of SWHM on a small simulator of an embedded aircraft software system, where software and sensor faults can be injected.
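The core of the approach is a posterior belief over health states that is updated as runtime observations arrive. The sketch below shows that Bayesian update in its simplest discrete form; the states, priors, and likelihoods are illustrative assumptions, not the paper's model, which compiles full Bayesian networks into arithmetic circuits.

```python
# Minimal sketch of the Bayesian-monitor idea: maintain a posterior over a
# discrete software-health state given noisy runtime observations. States,
# likelihoods, and priors are illustrative assumptions only.
PRIOR = {"healthy": 0.95, "sensor_fault": 0.03, "software_fault": 0.02}
# P(observation | state): how likely each state is to emit each symptom.
LIKELIHOOD = {
    "range_violation": {"healthy": 0.01, "sensor_fault": 0.60, "software_fault": 0.20},
    "assertion_fail":  {"healthy": 0.001, "sensor_fault": 0.05, "software_fault": 0.70},
}

def update(posterior, observation):
    unnorm = {s: posterior[s] * LIKELIHOOD[observation][s] for s in posterior}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

belief = dict(PRIOR)
for obs in ["range_violation", "range_violation"]:   # two suspicious readings
    belief = update(belief, obs)
print(max(belief, key=belief.get), round(belief["sensor_fault"], 3))
# -> sensor_fault 0.923: repeated symptoms overcome the healthy prior
```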
Mechatronics technology in predictive maintenance method
NASA Astrophysics Data System (ADS)
Majid, Nurul Afiqah A.; Muthalif, Asan G. A.
2017-11-01
This paper presents recent mechatronics technology that can help to implement predictive maintenance by combining intelligent instruments with predictive maintenance methods. The Vibration Fault Simulation System (VFSS) is an example of such a mechatronics system. The focus of this study is the predictive monitoring of critical machines through vibration detection. Vibration measurement is often used as the key indicator of the state of a machine. This paper shows the choice of the appropriate strategy in the vibration diagnostic process of mechanical systems, especially rotating machines, for recognizing failures during operation. In this paper, vibration signature analysis is implemented to detect faults in rotating machinery, including imbalance, mechanical looseness, bent shaft, misalignment, missing blade, bearing fault, balancing mass, and critical speed. In order to perform vibration signature analysis for rotating machinery faults, studies have been made on how mechatronics technology can be used as a predictive maintenance method. A Vibration Faults Simulation Rig (VFSR) is designed to simulate and understand fault signatures. These techniques are based on the processing of vibration data in the frequency domain. LabVIEW-based spectrum analyzer software is developed to acquire and extract the frequency contents of fault signals. The system is successfully tested against the characteristic vibration fault signatures that occur in rotating machinery.
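The frequency-domain step can be made concrete with a few lines of spectrum analysis: take the FFT of a vibration signal and look for energy at known fault frequencies. The 1x-imbalance and 2x-misalignment signatures are textbook conventions; the synthetic signal and thresholds below are illustrative, not data from the VFSR rig.

```python
# Sketch of frequency-domain fault-signature detection: FFT a vibration signal
# and check amplitude at known fault frequencies. Signal and thresholds are synthetic.
import numpy as np

fs, f_shaft = 5000.0, 30.0                 # sample rate (Hz), shaft speed (Hz)
t = np.arange(0, 2.0, 1.0 / fs)
# Synthetic signal: strong 1x component suggesting rotor imbalance, plus noise.
x = 1.0 * np.sin(2 * np.pi * f_shaft * t) + 0.1 * np.random.randn(t.size)

spec = np.abs(np.fft.rfft(x)) / t.size     # single-sided amplitude spectrum
freqs = np.fft.rfftfreq(t.size, 1.0 / fs)

def band_amplitude(f0, half_width=1.0):
    band = (freqs > f0 - half_width) & (freqs < f0 + half_width)
    return spec[band].max()

# Classic signatures: 1x ~ imbalance, 2x ~ misalignment/bent shaft.
for mult, label in [(1, "imbalance (1x)"), (2, "misalignment (2x)")]:
    amp = band_amplitude(mult * f_shaft)
    print(f"{label}: amplitude {amp:.3f}", "FLAG" if amp > 0.2 else "ok")
```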
NASA Astrophysics Data System (ADS)
Weigand, R.
Two new processor devices have been developed for the use on board of spacecrafts. An 8-bit 8032-microcontroller targets typical controlling applications in instruments and sub-systems, or could be used as a main processor on small satellites, whereas the LEON 32-bit SPARC processor can be used for high performance controlling and data processing tasks. The ADV80S32 is fully compliant to the Intel 80x1 architecture and instruction set, extended by additional peripherals, 512 bytes on-chip RAM and a bootstrap PROM, which allows downloading the application software using the CCSDS PacketWire pro- tocol. The memory controller provides a de-multiplexed address/data bus, and allows to access up to 16 MB data and 8 MB program RAM. The peripherals have been de- signed for the specific needs of a spacecraft, such as serial interfaces compatible to RS232, PacketWire and TTC-B-01, counters/timers for extended duration and a CRC calculation unit accelerating the CCSDS TM/TC protocol. The 0.5 um Atmel manu- facturing technology (MG2RT) provides latch-up and total dose immunity; SEU fault immunity is implemented by using SEU hardened Flip-Flops and EDAC protection of internal and external memories. The maximum clock frequency of 20 MHz allows a processing power of 3 MIPS. Engineering samples are available. For SW develop- ment, various SW packages for the 8051 architecture are on the market. The LEON processor implements a 32-bit SPARC V8 architecture, including all the multiply and divide instructions, complemented by a floating-point unit (FPU). It includes several standard peripherals, such as timers/watchdog, interrupt controller, UARTs, parallel I/Os and a memory controller, allowing to use 8, 16 and 32 bit PROM, SRAM or memory mapped I/O. With on-chip separate instruction and data caches, almost one instruction per clock cycle can be reached in some applications. A 33-MHz 32-bit PCI master/target interface and a PCI arbiter allow operating the device in a plug-in card (for SW development on PC etc.), or to consider using it as a PCI master controller in an on-board system. Advanced SEU fault tolerance is in- troduced by design, using triple modular redundancy (TMR) flip-flops for all registers and EDAC protection for all memories. The device will be manufactured in a radia- tion hard Atmel 0.25 um technology, targeting 100 MHz processor clock frequency. The non fault-tolerant LEON processor VHDL model is available as free source code, and the SPARC architecture is a well-known industry standard. Therefore, know-how, software tools and operating systems are widely available.
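The TMR protection of the LEON registers can be pictured as a bitwise majority vote over three copies of each register. The sketch below is a software illustration of that hardware mechanism, shown only to make the masking behavior concrete; the real protection is triplicated flip-flops in silicon.

```python
# Conceptual sketch of triple modular redundancy (TMR): each register is stored
# in triplicate and read through a bitwise majority vote, so one upset copy
# cannot corrupt the value. Software illustration of a hardware mechanism.
def tmr_vote(a: int, b: int, c: int) -> int:
    # Bitwise majority: a bit is 1 iff it is 1 in at least two of the copies.
    return (a & b) | (a & c) | (b & c)

reg = [0b1011_0010] * 3          # three copies of one 8-bit register
reg[1] ^= 0b0001_0000            # simulate a single-event upset in copy 1
assert tmr_vote(*reg) == 0b1011_0010   # the vote masks the flipped bit
print(bin(tmr_vote(*reg)))
```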
Failure Diagnosis for the Holdup Tank System via ISFA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Huijuan; Bragg-Sitton, Shannon; Smidts, Carol
This paper discusses the use of the integrated system failure analysis (ISFA) technique for fault diagnosis for the holdup tank system. ISFA is a simulation-based, qualitative and integrated approach used to study fault propagation in systems containing both hardware and software subsystems. The holdup tank system consists of a tank containing a fluid whose level is controlled by an inlet valve and an outlet valve. We introduce the component and functional models of the system, quantify the main parameters and simulate possible failure-propagation paths based on the fault propagation approach, ISFA. The results show that most component failures in the holdup tank system can be identified clearly and that ISFA is viable as a technique for fault diagnosis. Since ISFA is a qualitative technique that can be used in the very early stages of system design, this case study provides indications that it can be used early to study design aspects that relate to robustness and fault tolerance.
Advanced information processing system
NASA Technical Reports Server (NTRS)
Lala, J. H.
1984-01-01
Design and performance details of the advanced information processing system (AIPS) for fault and damage tolerant data processing on aircraft and spacecraft are presented. AIPS comprises several computers distributed throughout the vehicle and linked by a damage tolerant data bus. Most I/O functions are available to all the computers, which run in a TDMA mode. Each computer performs separate specific tasks in normal operation and assumes other tasks in degraded modes. Redundant software assures that all fault monitoring, logging and reporting are automated, together with control functions. Redundant duplex links and damage-spread limitation provide the fault tolerance. Details of an advanced design of a laboratory-scale proof-of-concept system are described, including functional operations.
Method and system for controlling a permanent magnet machine during fault conditions
Krefta, Ronald John; Walters, James E.; Gunawan, Fani S.
2004-05-25
Method and system for controlling a permanent magnet machine driven by an inverter is provided. The method allows for monitoring a signal indicative of a fault condition. The method further allows for generating during the fault condition a respective signal configured to maintain a field weakening current even though electrical power from an energy source is absent during said fault condition. The level of the maintained field-weakening current enables the machine to operate in a safe mode so that the inverter is protected from excess voltage.
Study of Stand-Alone Microgrid under Condition of Faults on Distribution Line
NASA Astrophysics Data System (ADS)
Malla, S. G.; Bhende, C. N.
2014-10-01
The behavior of a stand-alone microgrid is analyzed under the condition of faults on distribution feeders. Since the battery is not able to maintain the dc-link voltage within limits during a fault, a resistive dump-load control is presented to do so. An inverter control is proposed to maintain balanced voltages at the PCC under unbalanced load conditions and to reduce the voltage unbalance factor (VUF) at load points. The proposed inverter control also has the facility to protect itself from high fault currents. The existing maximum power point tracker (MPPT) algorithm is modified to limit the speed of the generator during a fault. Extensive simulation results using MATLAB/SIMULINK establish that the performance of the controllers is quite satisfactory under different fault conditions as well as unbalanced load conditions.
NASA Astrophysics Data System (ADS)
Álvarez del Castillo, Alejandra; Alaniz-Álvarez, Susana Alicia; Nieto-Samaniego, Angel Francisco; Xu, Shunshan; Ochoa-González, Gil Humberto; Velasquillo-Martínez, Luis Germán
2017-07-01
In the oil, gas and geothermal industry, the extraction or the input of fluids induces changes in the stress field of the reservoir; if the in-situ stress state of a fault plane is sufficiently disturbed, the fault may slip and trigger fluid leakage, or the reservoir might fracture and become damaged. The goal of the SSLIPO 1.0 software is to obtain data that can reduce the risk of affecting the stability of wellbores. The input data are the magnitudes of the three principal stresses and their orientation in geographic coordinates. The output data are the slip direction of a fracture in geographic coordinates, and its normal (σn) and shear (τ) stresses resolved on a single or multiple fracture planes. With this information, it is possible to calculate the slip tendency (τ/σn) and the propensity to open a fracture, which is inversely proportional to σn. The software can analyze any compressional stress system, even non-Andersonian. An example is given from an oilfield in southern Mexico, in a region that contains fractures formed in three events of deformation. In the example, SSLIPO 1.0 was used to determine in which deformation event the oil migrated. SSLIPO 1.0 is an open-code application developed in MATLAB. The URLs to obtain the source code and to download SSLIPO 1.0 are: http://www.geociencias.unam.mx/ alaniz/main_code.txt, http://www.geociencias.unam.mx/ alaniz/ SSLIPO_pkg.exe.
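The core computation is standard stress resolution: form the traction vector on the plane, split it into normal and shear components, and take the ratio τ/σn. The sketch below assumes, for brevity, that the principal axes are parallel to the coordinate axes; SSLIPO additionally rotates into geographic coordinates and reports the slip direction.

```python
# Sketch of the core computation: resolve normal and shear stress on a
# fracture plane from the three principal stresses, then form the slip
# tendency tau/sigma_n. Principal axes assumed parallel to coordinate axes.
import numpy as np

def plane_stresses(sigma1, sigma2, sigma3, normal):
    S = np.diag([sigma1, sigma2, sigma3])         # stress tensor, principal frame
    n = np.asarray(normal, float)
    n /= np.linalg.norm(n)
    t = S @ n                                     # traction vector on the plane
    sigma_n = float(n @ t)                        # normal stress
    tau = float(np.linalg.norm(t - sigma_n * n))  # shear stress
    return sigma_n, tau

# Example: 60/40/25 MPa principal stresses, plane at 45 degrees to the
# sigma1 and sigma3 axes (a near-optimal orientation for slip).
sn, tau = plane_stresses(60.0, 40.0, 25.0, normal=[1, 0, 1])
print(f"sigma_n={sn:.1f} MPa  tau={tau:.1f} MPa  slip tendency={tau/sn:.2f}")
```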
Data collection and analysis software development for rotor dynamics testing in spin laboratory
NASA Astrophysics Data System (ADS)
Abdul-Aziz, Ali; Arble, Daniel; Woike, Mark
2017-04-01
Gas turbine engine components undergo high rotational loading and other complex environmental conditions. Such an operating environment leads these components to experience damage and cracks that can cause catastrophic failure during flight. The traditional crack-detection and health-monitoring methodologies currently in use rely on periodic routine maintenance and nondestructive inspections that often involve engine and component disassembly. These methods also do not offer adequate information about the faults, especially if the faults are subsurface or not clearly evident. At the NASA Glenn Research Center, the rotor dynamics laboratory is presently developing newer techniques that rely heavily on sensor technology to enable health monitoring and prediction of damage and cracks in rotor disks. These approaches are noninvasive and relatively economical. Spin tests are performed using a subscale test article mimicking a turbine rotor disk undergoing rotational load. Non-contact instruments such as capacitive and microwave sensors are used to measure the blade tip gap displacement and blade vibration characteristics in an attempt to develop a physics-based model to assess and predict the faults in the rotor disk. Data collection is a major component of this experimental-analytical procedure, and as a result, an upgrade to an older version of the LabVIEW-based data acquisition software has been implemented to support efficient test running and analysis of the results. Outcomes obtained from the test data and the related experimental and analytical rotor dynamics modeling, including key features of the updated software, are presented and discussed.
Automatic Fault Characterization via Abnormality-Enhanced Classification
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bronevetsky, G; Laguna, I; de Supinski, B R
Enterprise and high-performance computing systems are growing extremely large and complex, employing hundreds to hundreds of thousands of processors and software/hardware stacks built by many people across many organizations. As the growing scale of these machines increases the frequency of faults, system complexity makes these faults difficult to detect and to diagnose. Current system management techniques, which focus primarily on efficient data access and query mechanisms, require system administrators to examine the behavior of various system services manually. Growing system complexity is making this manual process unmanageable: administrators require more effective management tools that can detect faults and help to identify their root causes. System administrators need timely notification when a fault is manifested that includes the type of fault, the time period in which it occurred and the processor on which it originated. Statistical modeling approaches can accurately characterize system behavior. However, the complex effects of system faults make these tools difficult to apply effectively. This paper investigates the application of classification and clustering algorithms to fault detection and characterization. We show experimentally that naively applying these methods achieves poor accuracy. Further, we design novel techniques that combine classification algorithms with information on the abnormality of application behavior to improve detection and characterization accuracy. Our experiments demonstrate that these techniques can detect and characterize faults with 65% accuracy, compared to just 5% accuracy for naive approaches.
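The central idea, classifying abnormality-enhanced features rather than raw metrics, can be sketched in a few lines. The nearest-centroid classifier, metric names, and synthetic data below are illustrative stand-ins; the paper evaluates real classification and clustering algorithms on measured system behavior.

```python
# Sketch of abnormality-enhanced classification: augment raw metrics with
# deviation-from-normal scores before classifying, so the classifier sees how
# unusual a sample is, not just its raw values. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
normal_train = rng.normal(loc=[50, 5], scale=[5, 1], size=(200, 2))  # cpu%, io wait
mu, sd = normal_train.mean(axis=0), normal_train.std(axis=0)

def enhance(x):
    # Append per-metric z-scores: the "abnormality" of each raw measurement.
    return np.hstack([x, (np.asarray(x, float) - mu) / sd])

# Labeled examples of two hypothetical fault types.
cpu_hog = enhance(rng.normal([95, 5], [2, 1], size=(50, 2)))
io_stall = enhance(rng.normal([50, 30], [5, 3], size=(50, 2)))
centroids = {"cpu_hog": cpu_hog.mean(axis=0), "io_stall": io_stall.mean(axis=0),
             "normal": enhance(normal_train).mean(axis=0)}

def classify(sample):
    z = enhance(sample)
    return min(centroids, key=lambda k: np.linalg.norm(z - centroids[k]))

print(classify([97.0, 5.2]))   # -> cpu_hog
print(classify([51.0, 29.0]))  # -> io_stall
```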
Development of a Software Safety Process and a Case Study of Its Use
NASA Technical Reports Server (NTRS)
Knight, J. C.
1996-01-01
Research in the year covered by this reporting period has been primarily directed toward: continued development of mock-ups of computer screens for operator of a digital reactor control system; development of a reactor simulation to permit testing of various elements of the control system; formal specification of user interfaces; fault-tree analysis including software; evaluation of formal verification techniques; and continued development of a software documentation system. Technical results relating to this grant and the remainder of the principal investigator's research program are contained in various reports and papers.
NASA Technical Reports Server (NTRS)
1990-01-01
The present conference on digital avionics discusses vehicle-management systems, spacecraft avionics, special vehicle avionics, communication/navigation/identification systems, software qualification and quality assurance, launch-vehicle avionics, Ada applications, sensor and signal processing, general aviation avionics, automated software development, design-for-testability techniques, and avionics-software engineering. Also discussed are optical technology and systems, modular avionics, fault-tolerant avionics, commercial avionics, space systems, data buses, crew-station technology, embedded processors and operating systems, AI and expert systems, data links, and pilot/vehicle interfaces.
A novel N-input voting algorithm for X-by-wire fault-tolerant systems.
Karimi, Abbas; Zarafshan, Faraneh; Al-Haddad, S A R; Ramli, Abdul Rahman
2014-01-01
Voting is an important operation in the multichannel computation paradigm and in the realization of ultrareliable, real-time control systems that arbitrate among the results of N redundant variants. These systems include N-modular redundant (NMR) hardware systems and diversely designed software systems based on N-version programming (NVP). Depending on the characteristics of the application and the type of selected voter, voting algorithms can be implemented for either hardware or software systems. In this paper, a novel voting algorithm is introduced for real-time fault-tolerant control systems, appropriate for applications in which N is large. Its behavior has then been implemented in software and evaluated under different error-injection scenarios on the system inputs. The results, evaluated through plots and statistical computations, demonstrate that this novel algorithm does not have the limitations of some popular voting algorithms such as median and weighted voting; moreover, it is able to significantly increase the reliability and availability of the system, in the best case by 2489.7% and 626.74%, respectively, and in the worst case by 3.84% and 1.55%, respectively.
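To make the voting problem concrete, here is a simple epsilon-agreement voter of the generic textbook kind that such papers compare against: from N redundant channel values it produces one output plus a "no agreement" signal. It is shown only for illustration; it is not the novel algorithm the paper introduces.

```python
# Illustration of what an N-input voter does: from N redundant channel values,
# produce one output and an agreement flag. Generic epsilon-agreement scheme,
# not the paper's algorithm.
def inexact_majority_vote(values, eps):
    """Return (output, agreed): the mean of the largest set of values lying
    within eps of one of the inputs, or (None, False) if no strict majority."""
    best = []
    for v in values:
        cluster = [u for u in values if abs(u - v) <= eps]
        if len(cluster) > len(best):
            best = cluster
    if len(best) * 2 > len(values):               # strict majority required
        return sum(best) / len(best), True
    return None, False

print(inexact_majority_vote([10.01, 9.99, 10.02, 14.7, 10.00], eps=0.05))
# -> (10.005, True): the 14.7 outlier is outvoted
print(inexact_majority_vote([1.0, 5.0, 9.0], eps=0.05))
# -> (None, False): no two channels agree
```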
Distributed controller clustering in software defined networks.
Abdelaziz, Ahmed; Fong, Ang Tan; Gani, Abdullah; Garba, Usman; Khan, Suleman; Akhunzada, Adnan; Talebian, Hamid; Choo, Kim-Kwang Raymond
2017-01-01
Software Defined Networking (SDN) is an emerging and promising paradigm for network management because of its centralized network intelligence. However, the centralized control architecture of software-defined networks (SDNs) brings novel challenges of reliability, scalability, fault tolerance and interoperability. In this paper, we propose a novel clustered distributed controller architecture in a real SDN setting. The distributed cluster implementation comprises multiple popular SDN controllers. The proposed mechanism is evaluated using a real-world network topology running on top of an emulated SDN environment. The results show that the proposed distributed controller clustering mechanism is able to significantly reduce the average latency from 8.1% to 1.6% and the packet loss from 5.22% to 4.15%, compared to a distributed controller without clustering running on HP Virtual Application Network (VAN) SDN and Open Network Operating System (ONOS) controllers respectively. Moreover, the proposed method also shows reasonable CPU utilization results. Furthermore, the proposed mechanism makes it possible to handle unexpected load fluctuations while maintaining continuous network operation, even when there is a controller failure. The paper is a step towards addressing the issues of reliability, scalability, fault tolerance, and interoperability.
Fault-Tolerant Software-Defined Radio on Manycore
NASA Technical Reports Server (NTRS)
Ricketts, Scott
2015-01-01
Software-defined radio (SDR) platforms generally rely on field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), but such architectures require significant software development. In addition, application demands for radiation mitigation and fault tolerance exacerbate programming challenges. MaXentric Technologies, LLC, has developed a manycore-based SDR technology that provides 100 times the throughput of conventional radiation-hardened general purpose processors. Manycore systems (30-100 cores and beyond) have the potential to provide high processing performance at error rates that are equivalent to current space-deployed uniprocessor systems. MaXentric's innovation is a highly flexible radio, providing over-the-air reconfiguration; adaptability; and uninterrupted, real-time, multimode operation. The technology is also compliant with NASA's Space Telecommunications Radio System (STRS) architecture. In addition to its many uses within NASA communications, the SDR can also serve as a highly programmable research-stage prototyping device for new waveforms and other communications technologies. It can also support noncommunication codes on its multicore processor, collocated with the communications workload, reducing the size, weight, and power of the overall system by aggregating processing jobs to a single board computer.
Autonomous power system brassboard
NASA Technical Reports Server (NTRS)
Merolla, Anthony
1992-01-01
The Autonomous Power System (APS) brassboard is a 20 kHz power distribution system which has been developed at NASA Lewis Research Center, Cleveland, Ohio. The brassboard exists to provide a realistic hardware platform capable of testing artificially intelligent (AI) software. The brassboard's power circuit topology is based upon a Power Distribution Control Unit (PDCU), which is a subset of an advanced development 20 kHz electrical power system (EPS) testbed, originally designed for Space Station Freedom (SSF). The APS program is designed to demonstrate the application of intelligent software as a fault detection, isolation, and recovery methodology for space power systems. This report discusses both the hardware and software elements used to construct the present configuration of the brassboard. The brassboard power components are described. These include the solid-state switches (herein referred to as switchgear), transformers, sources, and loads. Closely linked to this power portion of the brassboard is the first level of embedded control. Hardware used to implement this control and its associated software is discussed. An Ada software program, developed by Lewis Research Center's Space Station Freedom Directorate for their 20 kHz testbed, is used to control the brassboard's switchgear, as well as to monitor key brassboard parameters through sensors located within these switches. The Ada code is downloaded from a PC/AT and is resident within the 8086 microprocessor-based embedded controllers. The PC/AT is also used for smart terminal emulation, capable of controlling the switchgear as well as displaying data from the switches. Intelligent control is provided through use of a TI Explorer and the Autonomous Power Expert (APEX) LISP software. Real-time load scheduling is implemented through use of a 'C' program-based scheduling engine. The methods of communication between these computers and the brassboard are explored. In order to evaluate the features of both the brassboard hardware and the intelligent controlling software, fault circuits have been developed and integrated as part of the brassboard. A description of these fault circuits and their function is included. The brassboard has become an extremely useful test facility, promoting artificial intelligence applications for power distribution systems. However, there are elements of the brassboard which could be enhanced, thus improving system performance. Modifications and enhancements to improve the brassboard's operation are discussed.
1990-04-01
DECISION AIDS HAVE CREATED A VAST NEW POTENTIAL FOR SUPPORT OF STRATEGIC AND TACTICAL OPERATIONS. THE NON-MONOTONIC PROBABILIST (NMP), DEVELOPED BY... QUALITY OF THE NEW DESIGN WILL BE EVALUATED BY CREATING A VIDEO TAPE USING A VIDEO ANIMATION SYSTEM, AND A SOFTWARE SIMULATION OF THE NEW DESIGN. THE... FAULT TOLERANT, SECURE SHIPBOARD COMMUNICATIONS. THE LAN WILL UTILIZE PHOENIX DIGITAL'S FAULT TOLERANT, "SELF-HEALING" SMALL BUSINESS INNOVATION RESEARCH
Specification of Fenix MPI Fault Tolerance library version 1.0.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gamble, Marc; Van Der Wijngaart, Rob; Teranishi, Keita
This document provides a specification of Fenix, a software library compatible with the Message Passing Interface (MPI) to support fault recovery without application shutdown. The library consists of two modules. The first, termed process recovery, restores an application to a consistent state after it has suffered a loss of one or more MPI processes (ranks). The second specifies functions the user can invoke to store application data in Fenix-managed redundant storage, and to retrieve it from that storage after process recovery.
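To make the two modules concrete, here is a heavily simplified conceptual analogue in Python. It is emphatically not the Fenix API (Fenix is a C library layered on MPI); the class, method names, and buddy-copy policy are invented purely to illustrate data surviving the loss of one rank and being retrieved after recovery.

```python
# Conceptual analogue (NOT the Fenix API) of the two modules: ranks store
# named data into redundant storage; after process recovery rebuilds the
# communicator, the restored rank retrieves the last committed copy instead
# of restarting the application from scratch.
class RedundantStore:
    def __init__(self, n_ranks):
        self.copies = [{} for _ in range(n_ranks)]

    def store(self, rank, name, data):
        # Keep the owner's copy plus one buddy copy on a neighboring rank,
        # so the data survives the loss of either process.
        for r in (rank, (rank + 1) % len(self.copies)):
            self.copies[r][(rank, name)] = data

    def retrieve(self, rank, name):
        for copy in self.copies:            # any surviving replica will do
            if (rank, name) in copy:
                return copy[(rank, name)]
        raise KeyError(f"data of rank {rank} lost beyond redundancy")

store = RedundantStore(n_ranks=4)
store.store(rank=2, name="grid", data=[1.0, 2.0, 3.0])
store.copies[2].clear()                     # simulate losing rank 2
print(store.retrieve(2, "grid"))            # buddy copy on rank 3 survives
```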
1983-04-01
...with diagnostic software based on "fault tree" representation of the M65 ThS) to bridge the gap in diagnostics capability was demonstrated in 1980 and... identification friend or foe) which has much lower reliability than TSQ-73 peculiar hardware). Thus, as in other examples, reported readiness does not reflect
Design of penicillin fermentation process simulation system
NASA Astrophysics Data System (ADS)
Qi, Xiaoyu; Yuan, Zhonghu; Qi, Xiaoxuan; Zhang, Wenqi
2011-10-01
Real-time monitoring of batch processes attracts increasing attention. It can ensure safety and provide products with consistent quality. The design of a simulation system for batch-process fault diagnosis is therefore of great significance. In this paper, penicillin fermentation, a typical non-linear, dynamic, multi-stage batch production process, is taken as the research object. A visual, human-machine interactive simulation software system based on the Windows operating system is developed. The simulation system provides an effective platform for research on batch-process fault diagnosis.
On the design of fault-tolerant robotic manipulator systems
NASA Technical Reports Server (NTRS)
Tesar, Delbert
1993-01-01
Robotic systems are finding increasing use in space applications. Many of these devices are going to be operational on board the Space Station Freedom. Fault tolerance has been deemed necessary because of the criticality of the tasks and the inaccessibility of the systems to maintenance and repair. Design for fault tolerance in manipulator systems is an area within robotics that is without precedence in the literature. In this paper, we will attempt to lay down the foundations for such a technology. Design for fault tolerance demands new and special approaches to design, often at considerable variance from established design practices. These design aspects, together with reliability evaluation and modeling tools, are presented. Mechanical architectures that employ protective redundancies at many levels and have a modular architecture are then studied in detail. Once a mechanical architecture for fault tolerance has been derived, the chronological stages of operational fault tolerance are investigated. Failure detection, isolation, and estimation methods are surveyed, and such methods for robot sensors and actuators are derived. Failure recovery methods are also presented for each of the protective layers of redundancy. Failure recovery tactics often span all of the layers of a control hierarchy. Thus, a unified framework for decision-making and control, which orchestrates both the nominal redundancy management tasks and the failure management tasks, has been derived. The well-developed field of fault-tolerant computers is studied next, and some design principles relevant to the design of fault-tolerant robot controllers are abstracted. Conclusions are drawn, and a road map for the design of fault-tolerant manipulator systems is laid out with recommendations for a 10 DOF arm with dual actuators at each joint.
Reliability Through Life of Internal Protection Devices in Small-Cell ABSL Batteries
NASA Technical Reports Server (NTRS)
Neubauer, Jeremy; Ng, Ka Lok; Bennetti, Andrea; Pearson, Chris; Rao, Gopal
2007-01-01
This viewgraph presentation reviews a reliability analysis of small cell protection batteries. The contents include: 1) The s-p Topology; 2) Cell Level Protection Devices; 3) Battery Level Fault Protection; 4) Large Cell Comparison; and 5) Battery Level Testing and Results.
Fault Tolerant Considerations and Methods for Guidance and Control Systems
1987-07-01
multifunction devices such as microprocessors with software. In striving toward the economic goal, however, a cost is incurred in a different coin, i.e...therefore been developed which reduces the software risk to acceptable proportions. Several of the techniques thus developed incur no significant cost ...complex that their design and implementation need computerized tools in order to be cost -effective (in a broad sense, including the capability of
Wu, Zhao; Xiong, Naixue; Huang, Yannong; Xu, Degang; Hu, Chunyang
2015-01-01
The services composition technology provides flexible methods for building service composition applications (SCAs) in wireless sensor networks (WSNs). The high reliability and high performance of SCAs help services composition technology promote the practical application of WSNs. The optimization methods for reliability and performance used for traditional software systems are mostly based on the instantiations of software components, which are inapplicable and inefficient for the ever-changing SCAs in WSNs. In this paper, we consider SCAs with fault tolerance in WSNs. Based on a Universal Generating Function (UGF), we propose a reliability and performance model of SCAs in WSNs, which generalizes a redundancy optimization problem to a multi-state system. Based on this model, an efficient optimization algorithm for the reliability and performance of SCAs in WSNs is developed using a Genetic Algorithm (GA) to find the optimal structure of SCAs with fault tolerance in WSNs. In order to examine the feasibility of our algorithm, we have evaluated its performance. Furthermore, the interrelationships between reliability, performance and cost are investigated. In addition, a distinct approach to determine the most suitable parameters in the suggested algorithm is proposed. PMID:26561818
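The GA-over-redundancy idea can be sketched compactly: each service in the composition may run one to three replicas, more replicas raise reliability but cost energy, and the GA searches for the redundancy vector with the best trade-off. The fitness model below is a toy series system with invented numbers, not the paper's UGF multi-state model.

```python
# Toy GA for redundancy allocation: search for per-service replica counts that
# trade reliability against cost. Series-system fitness, invented constants.
import random
random.seed(1)

P_FAIL = [0.10, 0.05, 0.20]          # per-replica failure prob. of 3 services
COST = [1.0, 1.5, 2.0]               # per-replica cost (e.g., energy)

def fitness(plan):                   # plan[i] = replica count for service i
    reliability = 1.0
    for p, k in zip(P_FAIL, plan):
        reliability *= 1.0 - p ** k  # service works if any of its replicas works
    total_cost = sum(c * k for c, k in zip(COST, plan))
    return reliability - 0.02 * total_cost       # weighted trade-off

def evolve(pop_size=20, gens=40):
    pop = [[random.randint(1, 3) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = [random.choice(g) for g in zip(a, b)]      # uniform crossover
            if random.random() < 0.2:                          # mutation
                child[random.randrange(3)] = random.randint(1, 3)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best, round(fitness(best), 4))
```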
Huang, Jiahua; Zhou, Hai; Zhang, Binbin; Ding, Biao
2015-09-01
This article presents new Web-based failure-database software for orthopaedic implants. The software is based on the B/S (browser/server) mode; ASP dynamic web technology is used as the main development language to achieve data interactivity, and Microsoft Access is used to create the database. These mature technologies make the software easy to extend and upgrade. The design and development approach, the working process, and the functions of the software, as well as its technical features, are presented. With this software, many different types of fault events of orthopaedic implants can be stored and the failure data statistically analyzed; at the macroscopic level, it can be used to evaluate the reliability of orthopaedic implants and operations, ultimately guiding doctors in improving the level of clinical treatment.
Software reliability experiments data analysis and investigation
NASA Technical Reports Server (NTRS)
Walker, J. Leslie; Caglayan, Alper K.
1991-01-01
The objectives are to investigate the fundamental reasons which cause independently developed software programs to fail dependently, and to examine fault tolerant software structures which maximize reliability gain in the presence of such dependent failure behavior. The authors used 20 redundant programs from a software reliability experiment to analyze the software errors causing coincident failures, to compare the reliability of N-version and recovery block structures composed of these programs, and to examine the impact of diversity on software reliability using subpopulations of these programs. The results indicate that both conceptually related and unrelated errors can cause coincident failures and that recovery block structures offer more reliability gain than N-version structures if acceptance checks that fail independently from the software components are available. The authors present a theory of general program checkers that have potential application for acceptance tests.
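The two structures compared in the study can be made concrete in a few lines. In a recovery block, alternates run in sequence until an acceptance test passes; in an N-version structure, all versions run and the results are voted. The square-root variants and the acceptance test below are illustrative stand-ins for independently developed program versions.

```python
# Minimal sketch of the two fault-tolerant software structures compared above.
import math

def recovery_block(x, alternates, acceptable):
    for version in alternates:
        result = version(x)
        if acceptable(x, result):        # acceptance test gates each alternate
            return result
    raise RuntimeError("all alternates rejected by the acceptance test")

def n_version(x, versions):
    results = sorted(v(x) for v in versions)
    return results[len(results) // 2]    # median vote across all versions

def buggy_sqrt(x):                       # faulty "primary" version
    return x / 2

def library_sqrt(x):
    return math.sqrt(x)

def newton_sqrt(x):                      # independently coded alternate
    g = x or 1.0
    for _ in range(30):
        g = (g + x / g) / 2
    return g

def accept(x, r):                        # check that fails independently
    return abs(r * r - x) < 1e-6 * max(x, 1.0)

print(recovery_block(9.0, [buggy_sqrt, library_sqrt], accept))   # -> 3.0
print(n_version(9.0, [buggy_sqrt, library_sqrt, newton_sqrt]))   # -> 3.0 (median)
```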
NASA Technical Reports Server (NTRS)
Redinbo, Robert
1994-01-01
Fault tolerance features in the first three major subsystems appearing in the next generation of communications satellites are described. These satellites will contain extensive but efficient high-speed processing and switching capabilities to support the low signal strengths associated with very small aperture terminals. The terminals' numerous data channels are combined through frequency division multiplexing (FDM) on the up-links and are protected individually by forward error-correcting (FEC) binary convolutional codes. The front-end processing resources, demultiplexer, demodulators, and FEC decoders extract all data channels which are then switched individually, multiplexed, and remodulated before retransmission to earth terminals through narrow beam spot antennas. Algorithm based fault tolerance (ABFT) techniques, which relate real number parity values with data flows and operations, are used to protect the data processing operations. The additional checking features utilize resources that can be substituted for normal processing elements when resource reconfiguration is required to replace a failed unit.
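Algorithm-based fault tolerance with real-number parity, as named above, can be illustrated with the classic checksum construction: extend a matrix with a checksum row, let the checksum ride through the operation, and verify it afterward to detect a corrupted element. The matrices and the injected fault below are toy examples.

```python
# Sketch of algorithm-based fault tolerance (ABFT) with real-number parity:
# append a column-checksum row, compute, then verify the checksum to detect
# a corrupted result element. Classic Huang/Abraham-style construction.
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

A_ext = np.vstack([A, A.sum(axis=0)])   # append column-checksum row to A
C_ext = A_ext @ B                       # checksum row rides through the product

C_ext[0, 1] += 0.5                      # inject a fault into one result element

# Verify: each column's checksum row must equal the sum of its data rows.
residual = np.abs(C_ext[:-1].sum(axis=0) - C_ext[-1])
bad_cols = np.where(residual > 1e-9)[0]
print("corrupted columns:", bad_cols)   # -> [1]
```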
Providing the full DDF link protection for bus-connected SIEPON based system architecture
NASA Astrophysics Data System (ADS)
Hwang, I.-Shyan; Pakpahan, Andrew Fernando; Liem, Andrew Tanny; Nikoukar, AliAkbar
2016-09-01
Currently a massive amount of traffic per second is delivered through EPON systems, one of the prominent access network technologies for delivering the next-generation network. It is therefore vital to keep the EPON optical distribution network (ODN) working by providing the necessary protection mechanisms in the deployed devices; otherwise, failures will cause great losses for both network operators and business customers. In this paper, we propose a bus-connected architecture to protect and recover distribution drop fiber (DDF) link faults or transceiver failures at ONU(s) in a SIEPON system. The proposed architecture is cost-effective, delivers high fault tolerance in handling multiple DDF faults, and provides flexibility in choosing the backup ONU assignments. Simulation results show that the proposed architecture provides reliability and maintains quality of service (QoS) performance in terms of mean packet delay, system throughput, packet loss and EF jitter when DDF link failures occur.
Li, Dan; Hu, Xiaoguang
2017-03-01
Because of the high availability requirements of weapon equipment, an in-depth study has been conducted on the real-time fault tolerance of the widely applied Compact PCI (CPCI) bus measurement and control system. A redundancy design method that uses heartbeat detection to connect the primary and alternate devices has been developed. To address the low successful execution rate and the relatively large waste of time slices in the primary version of the task software, an improved algorithm for real-time fault-tolerant scheduling is proposed based on the Basic Checking available time Elimination idle time (BCE) algorithm, applying a single-neuron self-adaptive proportion sum differential (PSD) controller. The experimental validation results indicate that this system has excellent redundancy and fault tolerance, and that the newly developed method can effectively improve system availability.
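The heartbeat-based primary/alternate scheme can be sketched as follows: the alternate promotes itself when the primary's heartbeats stop arriving within a timeout. The in-process queue stands in for the real CPCI-bus link, and all timings are illustrative assumptions.

```python
# Sketch of heartbeat-based primary/alternate redundancy: the alternate takes
# over when heartbeats stop within a timeout. Queue and timings are stand-ins.
import time, threading, queue

heartbeats = queue.Queue()
TIMEOUT = 0.25          # declare the primary failed after this much silence (s)

def primary(fail_after):
    start = time.time()
    while time.time() - start < fail_after:
        heartbeats.put(time.time())      # periodic "I am alive" message
        time.sleep(0.05)
    # Primary stops sending: simulates a crash.

def alternate():
    while True:
        try:
            heartbeats.get(timeout=TIMEOUT)
        except queue.Empty:              # silence exceeded the timeout
            print("primary silent -> alternate taking over control tasks")
            return

threading.Thread(target=primary, args=(0.5,), daemon=True).start()
alternate()
```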
NASA Astrophysics Data System (ADS)
Huang, Yuehua; Li, Xiaomin; Cheng, Jiangzhou; Nie, Deyu; Wang, Zhuoyuan
2018-02-01
This paper presents a novel fault location method based on injecting a travelling-wave current. The methodology is based on Time Difference Of Arrival (TDOA) measurements available at the injection point and the end node of the main radial. In other words, the TDOA is the lag of maximum correlation, at which the reflected wave crests of the injected and fault signals appear simultaneously. The fault distance is then calculated as the wave velocity multiplied by the TDOA. Furthermore, when transformers are connected at the end of the feeder, it is necessary to also compare the amplitudes of the transient voltages. Finally, in order to verify the effectiveness of this method, several simulations have been undertaken using the MATLAB/SIMULINK software packages. The proposed fault location method shortens the positioning time while ensuring accuracy; the errors are 5.1% and 13.7%.
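The TDOA step can be sketched with a cross-correlation: the lag of the correlation peak between the injected signal and the returned waveform gives the travel time. The sketch below assumes a single-ended reflection measurement (so distance = v·t/2, a common variant; the paper's exact TDOA definition may differ), and the sample rate, wave speed, and waveforms are synthetic stand-ins for the MATLAB/SIMULINK study.

```python
# Sketch of TDOA fault location: cross-correlate the injected travelling-wave
# pulse with the returned waveform; the lag of the correlation peak is the
# two-way travel time for a reflection. All signal parameters are synthetic.
import numpy as np

fs = 10e6                      # sample rate (Hz)
v = 1.8e8                      # travelling-wave speed on the line (m/s)
n = 4096
injected = np.zeros(n)
injected[100:110] = 1.0        # injected pulse

true_delay = 300               # samples until the fault reflection returns
returned = np.roll(injected, true_delay) * 0.4 + 0.01 * np.random.randn(n)

corr = np.correlate(returned, injected, mode="full")
lag = corr.argmax() - (n - 1)            # lag (samples) of maximum correlation
t = lag / fs                             # two-way travel time (s)
print(f"fault at ~{v * t / 2 / 1000:.2f} km")   # 300/1e7 * 1.8e8 / 2 = 2.7 km
```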
Cho, Ming-Yuan; Hoang, Thi Thom
2017-01-01
Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier is proposed. The proposed PSO-based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults, considering 12 input features generated using Simulink software and the MATLAB toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.
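The PSO-SVM coupling can be sketched compactly: a particle swarm searches the (C, gamma) space of an RBF support vector machine, scoring each particle by cross-validated accuracy. The sketch assumes scikit-learn is available and uses two synthetic features and two fault classes in place of the 12 TDR-derived features.

```python
# Compact sketch of PSO-tuned SVM classification on synthetic data.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(2.5, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)          # two fault classes (synthetic)

def score(pos):                            # pos = (log10 C, log10 gamma)
    clf = SVC(C=10 ** pos[0], gamma=10 ** pos[1])
    return cross_val_score(clf, X, y, cv=3).mean()

# Minimal PSO: positions, velocities, personal and global bests.
P = rng.uniform(-2, 2, (8, 2)); V = np.zeros_like(P)
pbest, pscore = P.copy(), np.array([score(p) for p in P])
for _ in range(15):
    g = pbest[pscore.argmax()]             # global best particle
    V = (0.6 * V
         + 1.5 * rng.random((8, 1)) * (pbest - P)
         + 1.5 * rng.random((8, 1)) * (g - P))
    P = np.clip(P + V, -3, 3)
    s = np.array([score(p) for p in P])
    improved = s > pscore
    pbest[improved], pscore[improved] = P[improved], s[improved]
print("best (log10 C, log10 gamma):", pbest[pscore.argmax()], pscore.max())
```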
NASA Astrophysics Data System (ADS)
Wollherr, Stephanie; Gabriel, Alice-Agnes; Igel, Heiner
2015-04-01
In dynamic rupture models, high stress concentrations at rupture fronts have to be accommodated by off-fault inelastic processes such as plastic deformation. As presented by Roten et al. (2014), incorporating plastic yielding can significantly reduce earlier predictions of ground motions in the Los Angeles Basin. Further, an inelastic response of materials surrounding a fault potentially has a strong impact on surface displacement and is therefore a key aspect in understanding the triggering of tsunamis through floor uplifting. We present an implementation of off-fault plasticity and its verification for the software package SeisSol, an arbitrary high-order derivative discontinuous Galerkin (ADER-DG) method. The software recently reached multi-petaflop/s performance on some of the largest supercomputers worldwide and was a Gordon Bell prize finalist application in 2014 (Heinecke et al., 2014). For the nonelastic calculations we impose a Drucker-Prager yield criterion in shear stress with a viscous regularization following Andrews (2005). It permits the smooth relaxation of high stress concentrations induced in the dynamic rupture process. We verify the implementation by comparison to the SCEC/USGS Spontaneous Rupture Code Verification Benchmarks. The results of test problem TPV13, with a 60-degree dipping normal fault, show that SeisSol is in good accordance with other codes. Additionally, we aim to explore the numerical characteristics of the off-fault plasticity implementation by performing convergence tests for the 2D code. The ADER-DG method is especially suited for complex geometries by using unstructured tetrahedral meshes. Local adaptation of the mesh resolution enables a fine sampling of the cohesive zone on the fault while simultaneously satisfying the dispersion requirements of wave propagation away from the fault. In this context we will investigate the influence of off-fault plasticity on geometrically complex fault zone structures like subduction zones or branched faults. Studying the interplay of stress conditions and the angle dependence of neighbouring branches, including inelastic material behaviour and its effects on rupture jumps and seismic activation, helps advance our understanding of earthquake source processes. An application is the simulation of a real large-scale subduction zone scenario including plasticity to validate the coupling of our dynamic rupture calculations to a tsunami model in the framework of the ASCETE project (http://www.ascete.de/). Andrews, D. J. (2005): Rupture dynamics with energy loss outside the slip zone, J. Geophys. Res., 110, B01307. Heinecke, A., A. Breuer, S. Rettenberger, M. Bader, A.-A. Gabriel, C. Pelties, A. Bode, W. Barth, K. Vaidyanathan, M. Smelyanskiy and P. Dubey (2014): Petascale High Order Dynamic Rupture Earthquake Simulations on Heterogeneous Supercomputers. In Supercomputing 2014, The International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, New Orleans, LA, USA, November 2014. Roten, D., K. B. Olsen, S. M. Day, Y. Cui, and D. Fäh (2014): Expected seismic shaking in Los Angeles reduced by San Andreas fault zone plasticity, Geophys. Res. Lett., 41, 2769-2777.
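A minimal numerical sketch of the yield-and-relax step described above, assuming a simple exponential viscous relaxation of the over-yield deviatoric stress (one common reading of the Andrews regularization); the material constants and stress state are illustrative, not values from the SeisSol benchmarks.

```python
# Sketch of a Drucker-Prager yield check with viscous relaxation: when the
# shear measure sqrt(J2) exceeds the pressure-dependent yield stress, the
# deviatoric stress is relaxed back toward the yield surface over time Tv.
import numpy as np

def dp_relax(stress, cohesion=5e6, friction=0.6, Tv=0.05, dt=0.01):
    """One viscoplastic update of a 3x3 stress tensor (Pa); compression negative."""
    mean = np.trace(stress) / 3.0
    dev = stress - mean * np.eye(3)                 # deviatoric stress
    tau = np.sqrt(0.5 * np.sum(dev * dev))          # sqrt(J2), shear measure
    yield_stress = max(0.0, cohesion - friction * mean)  # grows with compression
    if tau <= yield_stress:
        return stress                               # elastic: no change
    # Relax the over-yield part of the deviatoric stress on time scale Tv,
    # smoothing the stress concentration instead of truncating it abruptly.
    r = yield_stress / tau
    factor = r + (1.0 - r) * np.exp(-dt / Tv)
    return mean * np.eye(3) + factor * dev

sigma = np.diag([-80e6, -40e6, -10e6])              # principal stresses (Pa)
print(dp_relax(sigma).diagonal() / 1e6)             # MPa, pulled toward yield
```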
NASA Astrophysics Data System (ADS)
Yim, S.-W.; Yu, S.-D.; Kim, H.-R.; Kim, M.-J.; Park, C.-R.; Yang, S.-E.; Kim, W.-S.; Hyun, O.-B.; Sim, J.; Park, K.-B.; Oh, I.-S.
2010-11-01
We have constructed and completed the preparation for a long-term operation test of a superconducting fault current limiter (SFCL) in a Korea Electric Power Corporation (KEPCO) test grid. The SFCL, rated at 22.9 kV/630 A, 3-phase, has been connected to the 22.9 kV test grid equipped with reclosers and other protection devices in the Gochang Power Testing Center of KEPCO. The main goals of the test are the verification of SFCL performance and protection coordination studies. A line-commutation type SFCL was fabricated and installed for this project, and the superconducting components were cooled by a cryo-cooler to 77 K in sub-cooled liquid nitrogen pressurized to 3 bar with helium gas. The verification test includes unmanned long-term operation with and without loads, and fault tests. Since the test site is 170 km away from the laboratory, we will adopt unmanned operation with real-time remote monitoring and control over high-speed internet. For the fault tests, we will apply fault currents up to around 8 kA rms to the SFCL using an artificial fault generator. The fault tests may allow us not only to confirm the current limiting capability of the SFCL, but also to adjust the SFCL-recloser coordination, such as resetting over-current relay parameters. This paper describes the construction of the testing facilities and discusses the plans for the verification tests.
HyspIRI Intelligent Payload Module(IPM) and Benchmarking Algorithms for Upload
NASA Technical Reports Server (NTRS)
Mandl, Daniel
2010-01-01
Features: Hardware: a) Xilinx Virtex-5 (GSFC Space Cube 2); b) 2 x 400MHz PPC; c) 100MHz Bus; d) 2 x 512MB SDRAM; e) Dual Gigabit Ethernet. Supports Linux kernel 2.6.31 (gcc version 4.2.2). Supports software running in stand-alone mode for better performance. Can stream raw data up to 800 Mbps. Ready for operations. Software Application Examples: Band-stripping algorithms: cloud, sulfur, flood, thermal, SWIL, NDVI, NDWI, SIWI, oil spills, algae blooms, etc. Corrections: geometric, radiometric, atmospheric. Core Flight System/dynamic software bus. CCSDS File Delivery Protocol. Delay Tolerant Network. CASPER/onboard planning. Fault monitoring/recovery software. S/C command and telemetry software. Data compression. Sensor Web for Autonomous Mission Operations.
Series and parallel arc-fault circuit interrupter tests.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Jay Dean; Fresquez, Armando J.; Gudgel, Bob
2013-07-01
While the 2011 National Electrical Code® (NEC) only requires series arc-fault protection, some arc-fault circuit interrupter (AFCI) manufacturers are designing products to detect and mitigate both series and parallel arc-faults. Sandia National Laboratories (SNL) has extensively investigated the electrical differences of series and parallel arc-faults and has offered possible classification and mitigation solutions. As part of this effort, Sandia National Laboratories has collaborated with MidNite Solar to create and test a 24-string combiner box with an AFCI which detects, differentiates, and de-energizes series and parallel arc-faults. In the case of the MidNite AFCI prototype, series arc-faults are mitigated by opening the PV strings, whereas parallel arc-faults are mitigated by shorting the array. A range of different experimental series and parallel arc-fault tests with the MidNite combiner box were performed at the Distributed Energy Technologies Laboratory (DETL) at SNL in Albuquerque, NM. In all the tests, the prototype de-energized the arc-faults in the time period required by the arc-fault circuit interrupt testing standard, UL 1699B. The experimental tests confirm that series and parallel arc-faults can be successfully mitigated with a combiner box-integrated solution.
Finite-Fault and Other New Capabilities of CISN ShakeAlert
NASA Astrophysics Data System (ADS)
Boese, M.; Felizardo, C.; Heaton, T. H.; Hudnut, K. W.; Hauksson, E.
2013-12-01
Over the past 6 years, scientists at Caltech, UC Berkeley, the Univ. of Southern California, the Univ. of Washington, the US Geological Survey, and ETH Zurich (Switzerland) have developed the 'ShakeAlert' earthquake early warning demonstration system for California and the Pacific Northwest. We have now started to transform this system into a stable end-to-end production system that will be integrated into the daily routine operations of the CISN and PNSN networks. To quickly determine the earthquake magnitude and location, ShakeAlert currently processes and interprets real-time data streams from several hundred seismic stations within the California Integrated Seismic Network (CISN) and the Pacific Northwest Seismic Network (PNSN). Based on these parameters, the 'UserDisplay' software predicts and displays the arrival and intensity of shaking at a given user site. Real-time ShakeAlert feeds are currently being shared with around 160 individuals, companies, and emergency response organizations to gather feedback about the system performance, to educate potential users about EEW, and to identify needs and applications of EEW in a future operational warning system. To improve the performance during large earthquakes (M>6.5), we have started to develop, implement, and test a number of new algorithms for the ShakeAlert system: the 'FinDer' (Finite Fault Rupture Detector) algorithm provides real-time estimates of locations and extents of finite-fault ruptures from high-frequency seismic data. The 'GPSlip' algorithm estimates the fault slip along these ruptures using high-rate real-time GPS data. And, third, a new type of ground-motion prediction model, derived from over 415,000 rupture simulations along active faults in southern California, improves MMI intensity predictions for large earthquakes with consideration of finite-fault, rupture directivity, and basin response effects. FinDer and GPSlip are currently being tested in real time and offline in a separate internal ShakeAlert installation at Caltech. Real-time position and displacement time series from around 100 GPS sensors are obtained in JSON format from RTK/PPP(AR) solutions using the RTNet software at USGS Pasadena. However, we have also started to investigate the usage of onsite (in-receiver) processing using NetR9 with RTX and tracebuf2 output format. A number of changes to the ShakeAlert processing, the XML message format, and the usage of this information in the UserDisplay software were necessary to handle the new finite-fault and slip information from the FinDer and GPSlip algorithms. In addition, we have developed a framework for end-to-end offline testing with archived and simulated waveform data using the Earthworm tankplayer. Detailed background information about the algorithms, processing, and results from these test runs will be presented.
NASA Astrophysics Data System (ADS)
Da Silva, Antonio; Sánchez Prieto, Sebastián; Rodriguez Polo, Oscar; Parra Espada, Pablo
Computer memories are not supposed to forget, but they do. Because of the proximity of the Sun, it is mandatory from the Solar Orbiter boot software perspective to look out for permanent memory errors resulting from single-event latch-up (SEL) failures in application binaries stored in EEPROM and in their SDRAM deployment areas. In this situation, the last line of defense established by FDIR mechanisms is the capability of the boot software to provide an accurate report of the memories' damage and to perform an application software update that avoids the harmed locations by flashing the EEPROM with a new binary. This paper describes the OTA EEPROM firmware update procedure verification of the boot software that will run in the Instrument Control Unit (ICU) of the Energetic Particle Detector (EPD) on board Solar Orbiter. Since the maximum number of rewrites on real EEPROM is limited and permanent memory faults cannot be easily emulated in real hardware, the verification has been accomplished by the use of a LEON2 Virtual Platform (Leon2ViP) with fault injection capabilities and real SpaceWire interfaces, developed by the Space Research Group (SRG) of the University of Alcalá. This way it is possible to run the exact same target binary as if it were running on the real ICU platform. Furthermore, this virtual hardware-in-the-loop (VHIL) approach makes it possible to communicate with Electrical Ground Support Equipment (EGSE) through real SpaceWire interfaces in an agile, controlled and deterministic environment.
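A toy sketch of the two boot-software duties described above: mapping permanently damaged memory pages and placing a new image only on healthy ones. The page size, the write/read-back test, and all names are assumptions, not the EPD ICU implementation:

```python
# Minimal sketch: scan memory, record permanently failed page offsets, and
# refuse to place a new image over them. Illustrative only.
PAGE_SIZE = 256

def scan_for_damage(read, write, base, size, pattern=0xA5):
    """Return the set of damaged page offsets found by write/read-back."""
    damaged = set()
    for off in range(0, size, PAGE_SIZE):
        write(base + off, pattern)
        if read(base + off) != pattern:
            damaged.add(off)
    return damaged

def choose_flash_region(image_len, size, damaged):
    """Find a contiguous run of healthy pages large enough for the image."""
    pages_needed = -(-image_len // PAGE_SIZE)   # ceiling division
    run = 0
    for page in range(size // PAGE_SIZE):
        run = 0 if page * PAGE_SIZE in damaged else run + 1
        if run == pages_needed:
            return (page - pages_needed + 1) * PAGE_SIZE
    raise RuntimeError("no healthy region large enough for the image")

mem, stuck = {}, {512}     # pretend the page at offset 512 is stuck
write = lambda addr, v: mem.__setitem__(addr, 0 if addr in stuck else v)
read = mem.get
bad = scan_for_damage(read, write, base=0, size=4096)
print(bad, choose_flash_region(image_len=700, size=4096, damaged=bad))  # {512} 768
```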
Alteration of fault rocks by CO2-bearing fluids with implications for sequestration
NASA Astrophysics Data System (ADS)
Luetkemeyer, P. B.; Kirschner, D. L.; Solum, J. G.; Naruk, S.
2011-12-01
Carbonates and sulfates commonly occur as primary (diagenetic) pore cements and secondary fluid-mobilized veins within fault zones. Stable isotope analyses of calcite, formation fluid, and fault zone fluids can help elucidate the carbon sources and the extent of fluid-rock interaction within a particular reservoir. Introduction of CO2-bearing fluids into a reservoir/fault system can profoundly affect the overall fluid chemistry of the system and may lead to the enhancement or degradation of porosity within the fault zone. The extent of precipitation and/or dissolution of minerals within a fault zone can ultimately influence the sealing properties of a fault. The Colorado Plateau contains a number of large carbon dioxide reservoirs, some of which leak and some of which do not. Several normal faults within the Paradox Basin (SE Utah) dissect the Green River anticline, giving rise to a series of footwall reservoirs with fault-dependent columns. Numerous CO2-charged springs and geysers are associated with these faults. This study seeks to identify regional sources and subsurface migration of CO2 to these reservoirs and the effect(s) faults have on trap performance. Data provided in this study include mineralogical, elemental, and stable isotope data for fault rocks, host rocks, and carbonate veins that come from two localities along one fault that locally sealed CO2. This fault is just tens of meters away from another normal fault that has leaked CO2-charged waters to the land surface for thousands of years. These analyses have been used to discriminate between sedimentary-derived carbon and deeply sourced CO2. XRF and XRD data taken from several transects across the normal faults are consistent with mechanical mixing and fluid-assisted mass transfer processes within the fault zone. δ13C values range from -6‰ to +10‰ (PDB); δ18O values range from +15‰ to +24‰ (VSMOW). Geochemical modeling software is used to model the alteration products of fault rocks exposed to fluids of various chemistries coming from several different reservoirs within an active CO2-charged fault system. These results are compared to data obtained in the field.
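For readers outside isotope geochemistry, the δ values quoted above follow the standard delta notation relative to a reference standard (the conventional definition, not something introduced by this study):

```latex
\delta^{13}\mathrm{C} =
\left(
\frac{\left({}^{13}\mathrm{C}/{}^{12}\mathrm{C}\right)_{\mathrm{sample}}}
     {\left({}^{13}\mathrm{C}/{}^{12}\mathrm{C}\right)_{\mathrm{standard}}} - 1
\right) \times 1000 \quad \text{(per mil, ‰)}
```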
DOE Office of Scientific and Technical Information (OSTI.GOV)
Binh T. Pham; Nancy J. Lybeck; Vivek Agarwal
The Light Water Reactor Sustainability program at Idaho National Laboratory is actively conducting research to develop and demonstrate online monitoring capabilities for active components in existing nuclear power plants. Idaho National Laboratory and the Electric Power Research Institute are working jointly to implement a pilot project to apply these capabilities to emergency diesel generators and generator step-up transformers. The Electric Power Research Institute Fleet-Wide Prognostic and Health Management Software Suite will be used to implement monitoring in conjunction with utility partners: Braidwood Generating Station (owned by Exelon Corporation) for emergency diesel generators, and Shearon Harris Nuclear Generating Station (owned by Duke Energy Progress) for generator step-up transformers. This report presents monitoring techniques, fault signatures, and diagnostic and prognostic models for emergency diesel generators. Emergency diesel generators provide backup power to the nuclear power plant, allowing operation of essential equipment such as pumps in the emergency core coolant system during catastrophic events, including loss of offsite power. Technical experts from Braidwood are assisting Idaho National Laboratory and the Electric Power Research Institute in identifying critical faults and defining fault signatures associated with each fault. The resulting diagnostic models will be implemented in the Fleet-Wide Prognostic and Health Management Software Suite and tested using data from Braidwood. Parallel research on generator step-up transformers was summarized in an interim report during the fourth quarter of fiscal year 2012.
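A toy illustration of diagnosis by fault signature, the general technique the report describes: match observed symptoms against a table of per-fault signatures. The faults, symptoms, and scoring are invented, not Braidwood data or the EPRI suite's schema:

```python
# Toy signature table: each candidate fault maps to the symptom set it produces.
signatures = {
    "worn injector":      {"high exhaust temp", "low cylinder pressure"},
    "clogged air filter": {"high exhaust temp", "low boost pressure"},
    "bearing wear":       {"high vibration", "metal in lube oil"},
}

def diagnose(observed):
    """Rank candidate faults by the fraction of their signature observed."""
    scores = {fault: len(sig & observed) / len(sig)
              for fault, sig in signatures.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(diagnose({"high exhaust temp", "low boost pressure"})[0])
# -> ('clogged air filter', 1.0)
```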
NASA ground terminal communication equipment automated fault isolation expert systems
NASA Technical Reports Server (NTRS)
Tang, Y. K.; Wetzel, C. R.
1990-01-01
The prototype expert systems are described that diagnose the Distribution and Switching Systems I and II (DSS1 and DSS2), Statistical Multiplexers (SM), and Multiplexer and Demultiplexer (MDM) systems at the NASA Ground Terminal (NGT). A system-level fault isolation expert system monitors the activities of a selected data stream, verifies that the fault exists in the NGT, and identifies the faulty equipment. Equipment-level fault isolation expert systems are then invoked to isolate the fault to the Line Replaceable Unit (LRU) level. Input and sometimes output data stream activities for the equipment are available. The system-level fault isolation expert system compares the equipment input and output status for a data stream and performs loopback tests (if necessary) to isolate the faulty equipment. The equipment-level fault isolation system utilizes the process of elimination and/or the maintenance personnel's fault isolation experience stored in its knowledge base. The DSS1, DSS2, and SM fault isolation systems, using knowledge of the current equipment configuration and the equipment circuitry, issue a set of test connections according to predefined rules. The faulty component or board can then be identified by the expert system by analyzing the test results. The MDM fault isolation system correlates failure symptoms with the faulty component based on maintenance personnel experience, so the faulty component can be determined from the failure symptoms. The DSS1, DSS2, SM, and MDM equipment simulators are implemented in PASCAL. The DSS1 fault isolation expert system was converted from VP-Expert to C and integrated into the NGT automation software for offline switch diagnoses. Potentially, the NGT fault isolation algorithms can be used for the DSS1, SM, and MDM located at Goddard Space Flight Center (GSFC).
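A hedged sketch of the system-level isolation strategy just described: compare per-equipment input/output status along a data stream, then confirm the first suspect with a loopback test. The equipment names and status feed are stand-ins:

```python
# Walk the equipment chain in stream order; a unit whose input was good but
# whose output failed is the suspect, confirmed by a loopback test.
def isolate_faulty_equipment(chain, status, loopback_ok):
    """chain: ordered equipment names; status[name] = (input_ok, output_ok);
    loopback_ok(name) -> bool runs a loopback test on one unit."""
    for name in chain:
        input_ok, output_ok = status[name]
        if input_ok and not output_ok:
            return name if not loopback_ok(name) else None
    return None

chain = ["DSS1", "SM", "MDM"]
status = {"DSS1": (True, True), "SM": (True, False), "MDM": (False, False)}
print(isolate_faulty_equipment(chain, status, lambda name: False))  # -> SM
```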
JET ICRH plant statistics from 2008-2012
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wooldridge, E.; Monakhov, I.; Blackman, T.
2014-02-12
JET ICRH plant faults from 2008-2012 have been catalogued and a new assessment of the reliability of the plant by sub-system is given. Data from pulses where ICRH was used, excluding the ITER-Like Antenna (ILA) and its generators, have been collated. This is compared to fault data in order to investigate any correlation between faults and operations. The number of faults is shown to have decreased between 2011-2012 in comparison to 2008-2009, as the time between faults is shown to have increased. Future electronic fault logging requirements to enable easier analysis are discussed. Due to the changing configuration of the ICRH plant (the introduction of ELM-tolerant systems, a generator upgrade, changes to the settings of the VSWR protection, et cetera), a method to expand the fault database to include more historical data [1] in a consistent way is discussed.
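A minimal example of the reliability measure underlying this kind of assessment, mean time between faults computed from a fault-event log; the timestamps are invented:

```python
from datetime import datetime

def mtbf_hours(fault_times):
    """Mean time between consecutive faults, in hours."""
    times = sorted(fault_times)
    gaps = [(b - a).total_seconds() / 3600.0 for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

faults_2008 = [datetime(2008, 3, d) for d in (1, 2, 4, 5, 7)]   # dense faults
faults_2012 = [datetime(2012, 3, d) for d in (1, 8, 20)]        # sparser faults
print(mtbf_hours(faults_2008), "<", mtbf_hours(faults_2012))    # 36.0 < 228.0
```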
30 CFR 75.824 - Electrical protection.
Code of Federal Regulations, 2010 CFR
2010-07-01
... transformer and over-current relay in the neutral grounding resistor circuit. (vi) A single window-type current transformer that encircles all three-phase conductors must be used to activate the ground-fault... current transformer. (vii) A test circuit for the ground-fault device must be provided. The test circuit...
NASA Astrophysics Data System (ADS)
Martin-Rojas, Ivan; Alfaro, Pedro; Estévez, Antonio
2014-05-01
We present a study that combines several software tools (iGIS©, ArcGIS©, Autocad©, etc.) and data sources (geological mapping, high resolution digital topographic data, high resolution aerial photographs, etc.) to create a detailed 3D geometric model of an active fault propagation growth fold. This 3D model clearly shows structural features of the analysed fold, as well as growth relationships and sedimentary patterns. The results obtained permit us to discuss the kinematics and structural evolution of the fold and the fault in time and space. The studied fault propagation fold is the Crevillente syncline. This fold represents the northern limit of the Bajo Segura Basin, an intermontane basin in the Eastern Betic Cordillera (SE Spain) developed from the upper Miocene onward. 3D features of the Crevillente syncline, including its growth pattern, indicate that limb rotation, and consequently fault activity, was higher during the Messinian than during the Tortonian. From the Pliocene onward, our data indicate that limb rotation and fault activity steadied or probably decreased. This temporal evolution of the Crevillente syncline is not the same all along the structure; the 3D geometric model indicates that the observed lateral heterogeneity is related to along-strike variation of fault displacement.
Automated Generation of Fault Management Artifacts from a Simple System Model
NASA Technical Reports Server (NTRS)
Kennedy, Andrew K.; Day, John C.
2013-01-01
Our understanding of off-nominal behavior - failure modes and fault propagation - in complex systems is often based purely on engineering intuition; specific cases are assessed in an ad hoc fashion as a (fallible) fault management engineer sees fit. This work is an attempt to provide a more rigorous approach to this understanding and assessment by automating the creation of a fault management artifact, the Failure Modes and Effects Analysis (FMEA), by querying a representation of the system in a SysML model. This work builds on the previous development of an off-nominal behavior model for the upcoming Soil Moisture Active-Passive (SMAP) mission at the Jet Propulsion Laboratory. We further developed the previous system model to more fully incorporate the ideas of State Analysis, and restructured it into an organizational hierarchy that models the system as layers of control systems while also incorporating the concept of "design authority". We present software that was developed to traverse the elements and relationships in this model to automatically construct an FMEA spreadsheet. We further discuss extending this model to automatically generate other typical fault management artifacts, such as fault trees, to efficiently portray system behavior and to depend less on the intuition of fault management engineers to ensure complete examination of off-nominal behavior.
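A greatly simplified sketch of the artifact-generation idea: traverse a machine-readable system model and emit one FMEA row per failure mode, following "feeds" relationships to list downstream effects. The dictionary schema below stands in for the SysML model and is not JPL's tooling:

```python
import csv, sys

# Hypothetical miniature system model: components, their failure modes, and
# which components they feed (the fault-propagation relation).
model = {
    "battery":   {"modes": ["short", "open"], "feeds": ["power_bus"]},
    "power_bus": {"modes": ["undervoltage"],  "feeds": ["radar"]},
    "radar":     {"modes": ["no_echo"],       "feeds": []},
}

def downstream(component):
    """Transitive closure of the 'feeds' relation (fault propagation path)."""
    seen, stack = [], list(model[component]["feeds"])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.append(c)
            stack.extend(model[c]["feeds"])
    return seen

writer = csv.writer(sys.stdout)
writer.writerow(["component", "failure mode", "affected downstream"])
for name, info in model.items():
    for mode in info["modes"]:
        writer.writerow([name, mode, "; ".join(downstream(name)) or "none"])
```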
Synchronization and fault-masking in redundant real-time systems
NASA Technical Reports Server (NTRS)
Krishna, C. M.; Shin, K. G.; Butler, R. W.
1983-01-01
A real-time computer may fail because of massive component failures or because it does not respond quickly enough to satisfy real-time requirements. An increase in redundancy - a conventional means of improving reliability - can improve the former but can, in some cases, degrade the latter considerably due to the overhead associated with redundancy management, namely the time delay resulting from synchronization and voting/interactive consistency techniques. The implications for reliability of synchronization and voting/interactive consistency algorithms in N-modular clusters are considered. All these studies were carried out in the context of real-time applications. As a demonstrative example, we have analyzed results from experiments conducted at the NASA Airlab on the Software Implemented Fault Tolerance (SIFT) computer. This analysis indicates that in most real-time applications it is better to employ hardware synchronization instead of software synchronization, and not to allow reconfiguration.
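A toy illustration of the trade-off analyzed above: majority voting masks a faulty replica, but each added module brings synchronization/voting overhead that can break a real-time deadline. All numbers are arbitrary:

```python
from collections import Counter

def vote(values):
    """Majority voter over replica outputs (masks a minority of faults)."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) // 2 else None

def meets_deadline(task_ms, vote_overhead_ms, n_modules, deadline_ms):
    # More modules -> better fault masking, but more voting/sync overhead.
    return task_ms + vote_overhead_ms * n_modules <= deadline_ms

print(vote([42, 42, 7]))                  # -> 42, the faulty replica is masked
print(meets_deadline(8.0, 0.5, 3, 10.0))  # True
print(meets_deadline(8.0, 0.5, 5, 10.0))  # False: redundancy hurt timeliness
```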
Investigation of an advanced fault tolerant integrated avionics system
NASA Technical Reports Server (NTRS)
Dunn, W. R.; Cottrell, D.; Flanders, J.; Javornik, A.; Rusovick, M.
1986-01-01
Presented is an advanced, fault-tolerant multiprocessor avionics architecture as could be employed in an advanced rotorcraft such as LHX. The processor structure is designed to interface with existing digital avionics systems and concepts, including the Army Digital Avionics System (ADAS) cockpit/display system, navaid and communications suites, integrated sensing suite, and the Advanced Digital Optical Control System (ADOCS). The report defines mission, maintenance, and safety-of-flight reliability goals as might be expected for an operational LHX aircraft. Based on the use of a modular, compact (16-bit) microprocessor card family, the results of a preliminary study examining simplex, dual, and standby-sparing architectures are presented. Given the stated constraints, it is shown that the dual architecture is best suited to meet reliability goals with minimum hardware and software overhead. The report presents hardware and software design considerations for realizing the architecture, including redundancy management requirements and techniques as well as verification and validation needs and methods.
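A back-of-envelope comparison consistent with the study's framing: mission reliability of simplex versus dual (one-of-two) arrangements under an exponential failure model with imperfect failure-detection coverage. The failure rate, mission time, and coverage are assumed values, not LHX figures:

```python
import math

def r_simplex(lam, t):
    return math.exp(-lam * t)

def r_dual(lam, t, coverage=1.0):
    # One-of-two survives unless both units fail; imperfect detection
    # (coverage < 1) lets some single failures take the pair down.
    r = r_simplex(lam, t)
    return r * r + 2.0 * coverage * r * (1.0 - r)

lam, t = 1e-4, 1000.0   # failures/hour, mission hours (assumed)
print(f"simplex: {r_simplex(lam, t):.4f}  dual: {r_dual(lam, t, 0.95):.4f}")
# simplex: 0.9048  dual: 0.9823
```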
Software fault-tolerance by design diversity DEDIX: A tool for experiments
NASA Technical Reports Server (NTRS)
Avizienis, A.; Gunningberg, P.; Kelly, J. P. J.; Lyu, R. T.; Strigini, L.; Traverse, P. J.; Tso, K. S.; Voges, U.
1986-01-01
The use of multiple versions of a computer program, independently designed from a common specification, to reduce the effects of an error is discussed. If these versions are designed by independent programming teams, it is expected that a fault in one version will not have the same behavior as any fault in the other versions. Since the errors in the output of the versions are different and uncorrelated, it is possible to run the versions concurrently, cross-check their results at prespecified points, and mask errors. A DEsign DIversity eXperiments (DEDIX) testbed was implemented to study the influence of common mode errors which can result in a failure of the entire system. The layered design of DEDIX and its decision algorithm are described.
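A minimal sketch of the N-version pattern DEDIX supports: run independently designed versions, cross-check at a prespecified point, and mask the disagreeing one. The three "versions" here are trivial stand-ins:

```python
import math
from collections import Counter

def cross_check(results):
    """Majority result at a comparison point; None means no majority
    (a possible correlated, common-mode error across versions)."""
    rounded = [round(r, 6) for r in results]
    value, count = Counter(rounded).most_common(1)[0]
    return value if count > len(rounded) // 2 else None

# Three independently designed "versions" of one specification: the mean.
v1 = lambda xs: sum(xs) / len(xs)
v2 = lambda xs: math.fsum(xs) / len(xs)
v3 = lambda xs: sorted(xs)[len(xs) // 2]   # design fault: median, not mean

data = [1.0, 2.0, 3.0, 10.0]
print(cross_check([v(data) for v in (v1, v2, v3)]))  # -> 4.0, v3's error masked
```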
Formal specification and mechanical verification of SIFT - A fault-tolerant flight control system
NASA Technical Reports Server (NTRS)
Melliar-Smith, P. M.; Schwartz, R. L.
1982-01-01
The paper describes the methodology being employed to demonstrate rigorously that the SIFT (software-implemented fault-tolerant) computer meets its requirements. The methodology uses a hierarchy of design specifications, expressed in the mathematical domain of multisorted first-order predicate calculus. The most abstract of these, from which almost all details of mechanization have been removed, represents the requirements on the system for reliability and intended functionality. Successive specifications in the hierarchy add design and implementation detail until the PASCAL programs implementing the SIFT executive are reached. A formal proof that a SIFT system in a 'safe' state operates correctly despite the presence of arbitrary faults has been completed all the way from the most abstract specifications to the PASCAL program.
Modelling of 3D fractured geological systems - technique and application
NASA Astrophysics Data System (ADS)
Cacace, M.; Scheck-Wenderoth, M.; Cherubini, Y.; Kaiser, B. O.; Bloecher, G.
2011-12-01
All rocks in the earth's crust are fractured to some extent. Faults and fractures are important in different scientific and industrial fields comprising engineering, geotechnical and hydrogeological applications. Many petroleum, gas, geothermal, and water supply reservoirs form in faulted and fractured geological systems. Additionally, faults and fractures may control the transport of chemical contaminants into and through the subsurface. Depending on their origin and orientation with respect to the recent and palaeo stress field, as well as on the overall kinematics of chemical processes occurring within them, faults and fractures can act either as hydraulic conductors, providing preferential pathways for fluid flow, or as barriers preventing flow across them. The main challenge in modelling processes occurring in fractured rocks is related to the way of describing the heterogeneities of such geological systems. Flow paths are controlled by the geometry of faults and their open void space. To correctly simulate these processes an adequate 3D mesh is a basic requirement. Unfortunately, the representation of realistic 3D geological environments is limited by the complexity of embedded fracture networks, often resulting in oversimplified models of the natural system. A technical description of an improved method to integrate generic dipping structures (representing faults and fractures) into a 3D porous medium is put forward. The automated mesh generation algorithm is composed of various existing routines from computational geometry (e.g. 2D-3D projection, interpolation, intersection, convex hull calculation) and meshing (e.g. triangulation in 2D and tetrahedralization in 3D). All routines have been combined in an automated software framework and the robustness of the approach has been tested and verified. These techniques and methods can be applied to fractured porous media including fault systems and therefore find wide application in different geo-energy related topics, including CO2 storage in deep saline aquifers, shale gas extraction, and geothermal heat recovery. The main advantage is that dipping structures can be integrated into a 3D body representing the porous medium, and the interaction between the discrete flow paths through and across faults and fractures and within the rock matrix can be correctly simulated. In addition, the complete workflow is captured by open-source software.
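A small sketch of one step in such a pipeline: triangulate a dipping structure in its 2D surface coordinates, then project the nodes into 3D. It uses SciPy's Delaunay routine; the plane geometry is invented and this is not the authors' framework:

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(3)
uv = rng.random((30, 2))                 # nodes in fault-plane coordinates
tri = Delaunay(uv)                       # 2D triangulation of the surface

strike = np.array([1.0, 0.0, 0.0])      # unit vector along strike (assumed)
dip = np.array([0.0, 0.5, -0.866])      # 60-degree dip direction (assumed)
origin = np.array([0.0, 0.0, -1.0])
xyz = origin + uv[:, :1] * strike + uv[:, 1:] * dip   # embed the 2D mesh in 3D

print(len(tri.simplices), "triangles on the embedded fault surface")
```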
Dynamic rupture simulations of the 2016 Mw7.8 Kaikōura earthquake: a cascading multi-fault event
NASA Astrophysics Data System (ADS)
Ulrich, T.; Gabriel, A. A.; Ampuero, J. P.; Xu, W.; Feng, G.
2017-12-01
The Mw7.8 Kaikōura earthquake struck the northern part of New Zealand's South Island roughly one year ago. It ruptured multiple segments of the contractional North Canterbury fault zone and of the Marlborough fault system. Field observations combined with satellite data suggest a rupture path involving partly unmapped faults separated by stepover distances larger than 5 km, the maximum distance usually considered by the latest seismic hazard assessment methods. This might imply distant rupture transfer mechanisms generally not considered in seismic hazard assessment. We present high-resolution 3D dynamic rupture simulations of the Kaikōura earthquake under physically self-consistent initial stress and strength conditions. Our simulations are based on recent finite-fault slip inversions that constrain fault system geometry and final slip distribution from remote sensing, surface rupture, and geodetic data (Xu et al., 2017). We assume a uniform background stress field, without lateral fault stress or strength heterogeneity. We use the open-source software SeisSol (www.seissol.org), which is based on an arbitrary high-order accurate DERivative Discontinuous Galerkin method (ADER-DG). Our method can account for complex fault geometries, high-resolution topography and bathymetry, 3D subsurface structure, off-fault plasticity, and modern friction laws. It enables the simulation of seismic wave propagation with high-order accuracy in space and time in complex media. We show that a cascading rupture driven by dynamic triggering can break all fault segments that were involved in this earthquake without mechanically requiring an underlying thrust fault. Our preferred fault geometry connects most fault segments: it does not feature stepovers larger than 2 km. The best scenario matches the main macroscopic characteristics of the earthquake, including its apparently slow rupture propagation caused by zigzag cascading, the moment magnitude, and the overall inferred slip distribution. We observe a high sensitivity of the cascading dynamics to fault stepover distance and off-fault energy dissipation.
NASA Technical Reports Server (NTRS)
Haakensen, Erik Edward
1998-01-01
The desire for low-cost reliable computing is increasing. Most current fault tolerant computing solutions are not very flexible, i.e., they cannot adapt to the reliability requirements of newly emerging applications in business, commerce, and manufacturing. It is important that users have a flexible, reliable platform to support both critical and noncritical applications. Chameleon, under development at the Center for Reliable and High-Performance Computing at the University of Illinois, is a software framework for supporting cost-effective adaptable networked fault tolerant service. This thesis details a simulation of fault injection, detection, and recovery in Chameleon. The simulation was written in C++ using the DEPEND simulation library. The results obtained from the simulation included the amount of overhead incurred by the fault detection and recovery mechanisms supported by Chameleon. In addition, information about fault scenarios from which Chameleon cannot recover was gained. The results of the simulation showed that both critical and noncritical applications can be executed in the Chameleon environment with a fairly small amount of overhead. No single point of failure from which Chameleon could not recover was found. Chameleon was also found to be capable of recovering from several multiple-failure scenarios.
Dynamic assertion testing of flight control software
NASA Technical Reports Server (NTRS)
Andrews, D. M.; Mahmood, A.; Mccluskey, E. J.
1985-01-01
An experiment in using assertions to dynamically test fault tolerant flight software is described. The experiment showed that 87% of typical errors introduced into the program would be detected by assertions. Detailed analysis of the test data showed that the number of assertions needed to detect those errors could be reduced to a minimal set. The analysis also revealed that the most effective assertions tested program parameters that provided greater indirect (collateral) testing of other parameters.
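A minimal sketch of executable assertions in the spirit described: range checks on flight-control inputs plus a bound on the computed output, which indirectly (collaterally) tests the inputs as well. The control law and ranges are invented:

```python
def check(condition, message, log):
    if not condition:
        log.append(message)    # record the violation instead of halting

def pitch_command(altitude_m, airspeed_ms, log):
    check(0.0 <= altitude_m <= 20000.0, "altitude out of range", log)
    check(50.0 <= airspeed_ms <= 300.0, "airspeed out of range", log)
    cmd = 0.01 * altitude_m / max(airspeed_ms, 1.0)
    # Collateral assertion: bounding the *output* also indirectly tests both
    # inputs, which is why a reduced assertion set can stay effective.
    check(-1.0 <= cmd <= 1.0, "pitch command out of range", log)
    return cmd

log = []
pitch_command(5000.0, -10.0, log)   # injected sensor error
print(log)   # both the input check and the collateral output check fire
```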
Fault Tolerant Hardware/Software Architecture for Flight Critical Function
1985-09-01
Applications Studies Programme. The results of AGARD work are reported to the member nations and the NATO Authorities through the AGARD series of ... systems, and is being advocated as a defense against design deficiencies which can plague software. ... A critical application area for ... day of the lecture series concludes with part I of a paper on the use of the Ada programming language in flight critical applications. Ada has been
NASA Astrophysics Data System (ADS)
Yashvantrai Vyas, Bhargav; Maheshwari, Rudra Prakash; Das, Biswarup
2016-06-01
The application of series compensation in extra high voltage (EHV) transmission lines makes the protection job difficult for engineers, due to the alteration in system parameters and measurements. The problem is amplified by the inclusion of electronically controlled compensation such as thyristor controlled series compensation (TCSC), as it produces harmonics and rapid changes in system parameters during faults associated with TCSC control. This paper presents a pattern recognition based fault type identification approach using a support vector machine. The scheme uses only half a cycle of post-fault data from the three phase currents to accomplish the task. The change in current signal features during the fault is used as the discriminatory measure. The developed scheme is tested over a large set of fault data with variation in system and fault parameters. These fault cases have been generated with PSCAD/EMTDC on a 400 kV, 300 km transmission line model. With its improved accuracy and speed, the developed algorithm is well suited for implementation on TCSC-compensated lines.
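A hedged sketch of the classification stage: per-phase features extracted from a half-cycle post-fault current window feeding a support vector machine. The RMS feature and synthetic data below are placeholders for the paper's feature set and PSCAD/EMTDC cases:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def features(ia, ib, ic):
    # Per-phase RMS of the half-cycle window as a crude change measure.
    return [np.sqrt(np.mean(p ** 2)) for p in (ia, ib, ic)]

def synth_case(faulted, n=64):
    # Faulted phases carry inflated current in this toy data generator.
    phases = [rng.normal(0, 1, n) * (5.0 if k in faulted else 1.0) for k in "abc"]
    return features(*phases)

X = [synth_case({"a"}) for _ in range(50)] + [synth_case({"a", "b"}) for _ in range(50)]
y = ["a-g"] * 50 + ["ab"] * 50
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(clf.predict([synth_case({"a", "b"})]))   # -> ['ab']
```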
NASA Tech Briefs, February 2004
NASA Technical Reports Server (NTRS)
2004-01-01
Topics include: Simulation Testing of Embedded Flight Software; Improved Indentation Test for Measuring Nonlinear Elasticity; Ultraviolet-Absorption Spectroscopic Biofilm Monitor; Electronic Tongue for Quantitation of Contaminants in Water; Radar for Measuring Soil Moisture Under Vegetation; Modular Wireless Data-Acquisition and Control System; Microwave System for Detecting Ice on Aircraft; Routing Algorithm Exploits Spatial Relations; Two-Finger EKG Method of Detecting Evasive Responses; Updated System-Availability and Resource-Allocation Program; Routines for Computing Pressure Drops in Venturis; Software for Fault-Tolerant Matrix Multiplication; Reproducible Growth of High-Quality Cubic-SiC Layers; Nonlinear Thermoelastic Model for SMAs and SMA Hybrid Composites; Liquid-Crystal Thermosets, a New Generation of High-Performance Liquid-Crystal Polymers; Formulations for Stronger Solid Oxide Fuel-Cell Electrolytes; Simulation of Hazards and Poses for a Rocker-Bogie Rover; Autonomous Formation Flight; Expandable Purge Chambers Would Protect Cryogenic Fittings; Wavy-Planform Helicopter Blades Make Less Noise; Miniature Robotic Spacecraft for Inspecting Other Spacecraft; Miniature Ring-Shaped Peristaltic Pump; Compact Plasma Accelerator; Improved Electrohydraulic Linear Actuators; A Software Architecture for Semiautonomous Robot Control; Fabrication of Channels for Nanobiotechnological Devices; Improved Thin, Flexible Heat Pipes; Miniature Radioisotope Thermoelectric Power Cubes; Permanent Sequestration of Emitted Gases in the Form of Clathrate Hydrates; Electrochemical, H2O2-Boosted Catalytic Oxidation System; Electrokinetic In Situ Treatment of Metal-Contaminated Soil; Pumping Liquid Oxygen by Use of Pulsed Magnetic Fields; Magnetocaloric Pumping of Liquid Oxygen; Tailoring Ion-Thruster Grid Apertures for Greater Efficiency; and Lidar for Guidance of a Spacecraft or Exploratory Robot.
NASA Astrophysics Data System (ADS)
Weatherill, Graeme; Garcia, Julio; Poggi, Valerio; Chen, Yen-Shin; Pagani, Marco
2016-04-01
The Global Earthquake Model (GEM) has, since its inception in 2009, made many contributions to the practice of seismic hazard modeling in different regions of the globe. The OpenQuake-engine (hereafter referred to simply as OpenQuake), GEM's open-source software for calculation of earthquake hazard and risk, has found application in many countries, spanning a diversity of tectonic environments. GEM itself has produced a database of national and regional seismic hazard models, harmonizing into OpenQuake's own definition the varied seismogenic sources found therein. The characterization of active faults in probabilistic seismic hazard analysis (PSHA) is at the centre of this process, motivating many of the developments in OpenQuake and presenting hazard modellers with the challenge of reconciling seismological, geological and geodetic information for the different regions of the world. Faced with these challenges, and from the experience gained in the process of harmonizing existing models of seismic hazard, four critical issues are addressed. The first challenge GEM has faced in the development of software is how to define a representation of an active fault (both in terms of geometry and earthquake behaviour) that is sufficiently flexible to adapt to different tectonic conditions and levels of data completeness. By exploring the different fault typologies supported by OpenQuake we illustrate how seismic hazard calculations can, and do, take into account complexities such as geometrical irregularity of faults in the prediction of ground motion, highlighting some of the potential pitfalls and inconsistencies that can arise. This exploration leads to the second main challenge in active fault modeling: which elements of the fault source model impact most upon the hazard at a site, and when does this matter? Through a series of sensitivity studies we show how different configurations of fault geometry, and the corresponding characterisation of near-fault phenomena (including hanging wall and directivity effects) within modern ground motion prediction equations, can have an influence on the seismic hazard at a site. Yet we also illustrate the conditions under which these effects may be partially tempered when considering the full uncertainty in rupture behaviour within the fault system. The third challenge is the development of efficient means for representing both aleatory and epistemic uncertainties from active fault models in PSHA. In implementing state-of-the-art seismic hazard models into OpenQuake, such as those recently undertaken in California and Japan, new modeling techniques are needed that redefine how we treat interdependence of ruptures within the model (such as mutual exclusivity), and the propagation of uncertainties emerging from geology. Finally, we illustrate how OpenQuake, and GEM's additional toolkits for model preparation, can be applied to address long-standing issues in active fault modeling in PSHA. These include constraining the seismogenic coupling of a fault and the partitioning of seismic moment between the active fault surfaces and the surrounding seismogenic crust. We illustrate some of the possible roles that geodesy can play in the process, but highlight where this may introduce new uncertainties and potential biases into the seismic hazard process, and how these can be addressed.
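An illustrative calculation related to the final point above: balancing the moment a fault accumulates (rigidity times area times slip rate, scaled by a coupling coefficient) against the moment of a maximum-magnitude event. This is generic textbook arithmetic with assumed values, not the OpenQuake implementation:

```python
mu = 3.0e10          # crustal rigidity, Pa (typical assumed value)
area = 40e3 * 15e3   # fault area, m^2 (40 km x 15 km, assumed)
slip_rate = 0.005    # m/yr (assumed)
coupling = 0.8       # seismogenic coupling coefficient (assumed)

moment_rate = coupling * mu * area * slip_rate   # N*m accumulated per year
m_max = 7.0
m0_max = 10 ** (1.5 * m_max + 9.05)              # Hanks-Kanamori relation, N*m
# Recurrence of the maximum event if all moment were released characteristically:
print(f"moment rate: {moment_rate:.2e} N*m/yr; "
      f"max-magnitude recurrence ~{m0_max / moment_rate:.0f} yr")
```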
Logic design for dynamic and interactive recovery.
NASA Technical Reports Server (NTRS)
Carter, W. C.; Jessep, D. C.; Wadia, A. B.; Schneider, P. R.; Bouricius, W. G.
1971-01-01
Recovery in a fault-tolerant computer means the continuation of system operation with data integrity after an error occurs. This paper delineates two parallel sets of concepts embodied in the hardware and software functions required for recovery: detection, diagnosis, and reconfiguration for the hardware; data integrity, checkpointing, and restart for the software. The hardware relies on the recovery variable set, checking circuits, and diagnostics, and the software relies on the recovery information set, audit, and reconstruct routines, to characterize the system state and assist in recovery when required. Of particular utility is a hardware unit, the recovery control unit, which serves as an interface between error detection and software recovery programs in the supervisor and provides dynamic interactive recovery.
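A toy sketch of the software half of this scheme: checkpoint the recovery information set, detect an error with an audit routine, and restart from the last consistent state. The "balance" invariant is an invented example:

```python
import copy

def audit(state):
    return state["balance"] >= 0           # invariant the audit routine checks

state = {"balance": 100, "step": 0}
checkpoint = copy.deepcopy(state)

for delta in (+20, -50, -200):             # the last update is erroneous
    state["balance"] += delta
    state["step"] += 1
    if not audit(state):
        state = copy.deepcopy(checkpoint)  # restart from the checkpoint
        break
    checkpoint = copy.deepcopy(state)      # commit a new checkpoint

print(state)   # rolled back to the last audited state: {'balance': 70, 'step': 2}
```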
Software reliability perspectives
NASA Technical Reports Server (NTRS)
Wilson, Larry; Shen, Wenhui
1987-01-01
Software which is used in life-critical functions must be known to be highly reliable before installation. This requires a strong testing program to estimate the reliability, since neither formal methods, software engineering, nor fault tolerant methods can guarantee perfection. Prior to final testing, software goes through a debugging period, and many models have been developed to try to estimate reliability from the debugging data. However, the existing models are poorly validated and often give poor performance. This paper emphasizes the fact that part of their failure can be attributed to the random nature of the debugging data given to these models as input, and it poses the problem of correcting this defect as an area for future research.
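As a concrete example of the class of models being critiqued, one can fit a Goel-Okumoto NHPP mean-value function m(t) = a(1 - e^(-bt)) to cumulative failure counts from debugging; the data below are synthetic, and the injected noise is exactly the randomness the paper argues limits such fits:

```python
import numpy as np
from scipy.optimize import curve_fit

def goel_okumoto(t, a, b):
    # a: expected total faults; b: per-fault detection rate.
    return a * (1.0 - np.exp(-b * t))

t = np.arange(1.0, 21.0)                 # weeks of debugging
observed = goel_okumoto(t, 100.0, 0.15) + np.random.default_rng(1).normal(0, 3, t.size)
(a_hat, b_hat), _ = curve_fit(goel_okumoto, t, observed, p0=(50.0, 0.1))
print(f"estimated total faults: {a_hat:.0f}, found so far: {observed[-1]:.0f}")
```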
NASA Astrophysics Data System (ADS)
Moskalenko, A. N.; Khudoley, A. K.; Khusnitdinov, R. R.
2017-05-01
In this work, we consider the application of an original method for determining the indicators of tectonic stress fields in the northern Baikit anteclise based on 3D seismic data, for further reconstruction of the stress state parameters when analyzing structural maps of seismic horizons and the corresponding faults. The stress state parameters are determined by the orientations of the main stress axes and the shape of the stress ellipsoid. To calculate the stress state parameters from data on the spatial orientations of faults and slip vectors, we used the algorithms from quasiprimary stress computation methods and cataclastic analysis, implemented in the software products FaultKinWin and StressGeol, respectively. The results of this work show that the kinematic characteristics of faults change regularly toward the top of the succession and that the stress state parameters are characterized by different values of the Lode-Nadai coefficient. The faults are strike-slip faults with a normal or reverse component of displacement. Three stages of fault formation are revealed: (1) partial inversion of ancient normal faults; (2) the most intense stage, with a predominance of thrust and strike-slip faults under a north-northeast orientation of the main compression axis; and (3) strike-slip faulting under a west-northwest orientation of the main compression axis. The second and third stages are pre-Vendian in age and correlate with tectonic events that took place during the evolution of the active southwestern margin of the Siberian Craton.
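For reference, a common definition of the Lode-Nadai coefficient used to characterize the shape of the stress ellipsoid is given below (this is the standard convention with ordered principal stresses; the paper's exact convention is not stated):

```latex
\mu_\sigma = \frac{2\sigma_2 - \sigma_1 - \sigma_3}{\sigma_1 - \sigma_3},
\qquad \sigma_1 \ge \sigma_2 \ge \sigma_3,
\qquad -1 \le \mu_\sigma \le +1
```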
NASA Astrophysics Data System (ADS)
Ilik, Semih C.; Arsoy, Aysen B.
2017-07-01
Integration of distributed generation (DG) such as renewable energy sources into the electrical network has become more prevalent in recent years. Grid connection of DG affects load flow directions, the voltage profile, short circuit power, and especially protection selectivity. Applying a traditional overcurrent protection scheme is inconvenient when system reliability and sustainability are considered. If a fault occurs in a DG-connected network, the short circuit contribution of the DG creates an additional branch element feeding the fault current, which compels the use of a directional overcurrent (OC) protection scheme. Protection coordination may be lost under the changing working conditions that arise when DG sources are connected. Directional overcurrent relay parameters are determined for downstream and upstream relays for different combinations of DG, connected singly or in groups, on a radial test system. With the help of the proposed flow chart, relay parameters are updated and coordination between relays is sustained for different working conditions in the DigSILENT PowerFactory program.
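A minimal sketch of the coordination rule such a flow chart enforces: after a topology change, verify that each backup relay trips at least a coordination time interval (CTI) after its primary, raising the backup's time-dial setting until the margin holds. The IEC standard-inverse curve constants are standard; the network values and CTI are assumed:

```python
def iec_si_time(tds, i_fault, i_pickup):
    """IEC standard-inverse overcurrent curve: t = tds * 0.14 / (M^0.02 - 1)."""
    return tds * 0.14 / ((i_fault / i_pickup) ** 0.02 - 1.0)

def coordinate(primary, backup, i_fault, cti=0.3, step=0.05):
    t_primary = iec_si_time(primary["tds"], i_fault, primary["pickup"])
    while iec_si_time(backup["tds"], i_fault, backup["pickup"]) < t_primary + cti:
        backup["tds"] += step    # slow the backup until the margin holds
    return backup["tds"]

primary = {"tds": 0.1, "pickup": 400.0}
backup = {"tds": 0.1, "pickup": 600.0}
print(coordinate(primary, backup, i_fault=3000.0))  # updated backup TDS (~0.15)
```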
Component Prioritization Schema for Achieving Maximum Time and Cost Benefits from Software Testing
NASA Astrophysics Data System (ADS)
Srivastava, Praveen Ranjan; Pareek, Deepak
Software testing is any activity aimed at evaluating an attribute or capability of a program or system and determining that it meets its required results. Defining when software testing ends is a crucial feature of any software development project. A premature release will involve risks like undetected bugs, the cost of fixing faults later, and discontented customers. Any software organization would want to achieve the maximum possible benefit from software testing with minimum resources. Testing time and cost need to be optimized for achieving a competitive edge in the market. In this paper, we propose a schema, called the Component Prioritization Schema (CPS), to achieve an effective and uniform prioritization of the software components. This schema serves as an extension to the Non-Homogeneous Poisson Process based Cumulative Priority Model. We also introduce an approach for handling time-intensive versus cost-intensive projects.
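A generic sketch of component prioritization for test effort (the CPS itself is not reproduced here): score components on weighted risk factors and test in score order. The factors and weights are invented:

```python
# Hypothetical risk factors and weights; a real schema would calibrate these.
weights = {"complexity": 0.4, "usage": 0.35, "past_faults": 0.25}

components = {
    "payment": {"complexity": 0.9, "usage": 0.8, "past_faults": 0.7},
    "reports": {"complexity": 0.4, "usage": 0.3, "past_faults": 0.2},
    "login":   {"complexity": 0.3, "usage": 0.9, "past_faults": 0.4},
}

def priority(attrs):
    return sum(weights[f] * attrs[f] for f in weights)

for name, attrs in sorted(components.items(), key=lambda kv: -priority(kv[1])):
    print(f"{name:8s} priority = {priority(attrs):.2f}")   # payment first
```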
A software engineering approach to expert system design and verification
NASA Technical Reports Server (NTRS)
Bochsler, Daniel C.; Goodwin, Mary Ann
1988-01-01
Software engineering design and verification methods for developing expert systems are not yet well defined. Integration of expert system technology into software production environments will require effective software engineering methodologies to support the entire life cycle of expert systems. The software engineering methods used to design and verify an expert system, RENEX, are discussed. RENEX demonstrates autonomous rendezvous and proximity operations, including replanning trajectory events and subsystem fault detection, onboard a space vehicle during flight. The RENEX designers utilized a number of software engineering methodologies to deal with the complex problems inherent in this system. An overview is presented of the methods utilized. Details of the verification process receive special emphasis. The benefits and weaknesses of the methods for supporting the development life cycle of expert systems are evaluated, and recommendations are made based on the overall experiences with the methods.
NASA Technical Reports Server (NTRS)
Benard, Doug; Dorais, Gregory A.; Gamble, Ed; Kanefsky, Bob; Kurien, James; Millar, William; Muscettola, Nicola; Nayak, Pandu; Rouquette, Nicolas; Rajan, Kanna;
2000-01-01
Remote Agent (RA) is a model-based, reusable artificial intelligence (AI) software system that enables goal-based spacecraft commanding and robust fault recovery. RA was flight validated during an experiment on board DS1 between May 17th and May 21st, 1999.
NASA Astrophysics Data System (ADS)
Ruzhich, Valery V.; Psakhie, Sergey G.; Levina, Elena A.; Shilko, Evgeny V.; Grigoriev, Alexandr S.
2017-12-01
In the paper we briefly outline the experience in forecasting catastrophic earthquakes and the general problems in ensuring seismic safety. The purpose of our long-term research is the development and improvement of methods of man-caused impacts on large-scale fault segments to safely reduce the negative effect of seismodynamic failure. Various laboratory and large-scale field experiments were carried out in segments of tectonic faults in the Baikal rift zone and in main cracks in the block-structured ice cover of Lake Baikal, using the developed measuring systems and special software for identification and treatment of the deformation response of faulty segments to man-caused impacts. The results of the study allow us to substantiate the need to develop servo-controlled technologies able to change the shear resistance and deformation regime of fault zone segments by applying vibrational and pulse triggering impacts. We suppose that the use of triggering impacts in highly stressed segments of active faults will promote transferring the geodynamic state of these segments from a metastable to a more stable and safe state.
NASA Astrophysics Data System (ADS)
Alegre, D. M.; Koroishi, E. H.; Melo, G. P.
2015-07-01
This paper presents a methodology for the detection and localization of faults using state observers. State observers can reconstruct states that are not measured, or values from points of difficult access in the system. Faults can thus be detected at these points without direct measurements and can be tracked through the reconstruction of their states. In this paper the methodology is applied to a system representing a simplified model of a vehicle. In this model the chassis of the car is represented by a flat plate, which is divided into plate finite elements (Kirchhoff plate); in addition, the car suspension (springs and dampers) is considered. A test rig was built and the developed methodology was used to detect and locate faults in this system. In the analyses, the idea is to introduce a specific fault into the system and then use the state observers to locate it, by checking for a quantitative variation in the system parameter that caused the fault. The software MATLAB was used for the computational simulations.
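A minimal numerical sketch of the residual idea, assuming a 1-DOF mass-spring-damper in place of the plate/suspension model and an invented observer gain: a Luenberger observer built on the nominal model tracks the plant until a parameter fault makes the output residual grow:

```python
import numpy as np

dt, m, c = 0.001, 1.0, 2.0
def plant_matrix(k):
    # Forward-Euler discretization of x'' = -(k/m) x - (c/m) x'
    return np.array([[1.0, dt], [-k / m * dt, 1.0 - c / m * dt]])

A_nom = plant_matrix(100.0)          # model used by the observer
C = np.array([[1.0, 0.0]])           # only position is measured
L = np.array([[0.4], [4.0]])         # observer gain (assumed, stable)

x, xh, A = np.array([[0.1], [0.0]]), np.zeros((2, 1)), plant_matrix(100.0)
for step in range(2000):
    if step == 1000:
        A = plant_matrix(60.0)       # fault: spring stiffness drops 40%
    y = C @ x                        # measurement from the (possibly faulty) plant
    x = A @ x
    xh = A_nom @ xh + L @ (y - C @ xh)   # Luenberger correction
    if step in (999, 1999):
        print(f"step {step}: |residual| = {abs((y - C @ xh)[0, 0]):.2e}")
```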
A Legal Guide for the Software Developer.
ERIC Educational Resources Information Center
Minnesota Small Business Assistance Office, St. Paul.
This booklet has been prepared to familiarize the inventor, creator, or developer of a new computer software product or software invention with the basic legal issues involved in developing, protecting, and distributing the software in the United States. Basic types of software protection and related legal matters are discussed in detail,…
Computer Software & Intellectual Property. Background Paper.
ERIC Educational Resources Information Center
Congress of the U.S., Washington, DC. Office of Technology Assessment.
This background paper reviews copyright, patent, and trade secret protections as these issues are related to computer software. Topics discussed include current issues regarding legal protection for computer software including the necessity for defining intellectual property, determining what should or should not be protected, commercial piracy,…
The Design of Fault Tolerant Quantum Dot Cellular Automata Based Logic
NASA Technical Reports Server (NTRS)
Armstrong, C. Duane; Humphreys, William M.; Fijany, Amir
2002-01-01
As transistor geometries are reduced, quantum effects begin to dominate device performance. At some point, transistors cease to have the properties that make them useful computational components. New computing elements must be developed in order to keep pace with Moore's Law. Quantum dot cellular automata (QCA) represent an alternative paradigm to transistor-based logic. QCA architectures that are robust to manufacturing tolerances and defects must be developed. We are developing software that allows the exploration of fault tolerant QCA gate architectures by automating the specification, simulation, analysis, and documentation processes.
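A small sketch of the logic primitive such QCA architectures are built from, the three-input majority gate, which reduces to AND or OR by pinning one input; fault-tolerance studies then ask whether the vote survives cell displacement or omission:

```python
def majority(a, b, c):
    # Three-input majority: M(a,b,c) = ab + bc + ca (bitwise on 0/1 values).
    return (a & b) | (b & c) | (a & c)

AND = lambda a, b: majority(a, b, 0)   # fix the third input to logic 0
OR = lambda a, b: majority(a, b, 1)    # fix the third input to logic 1

for bits in [(0, 0, 1), (0, 1, 1), (1, 1, 0)]:
    print(bits, "->", majority(*bits))
print(AND(1, 0), OR(1, 0))             # -> 0 1
```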
NASA Technical Reports Server (NTRS)
Stiffler, J. J.; Bryant, L. A.; Guccione, L.
1979-01-01
A computer program to aid in assessing the reliability of fault tolerant avionics systems was developed. A simple mathematical expression was used to evaluate the reliability of any redundant configuration over any interval during which the failure rates and coverage parameters remained unaffected by configuration changes. Provision was made for convolving such expressions in order to evaluate the reliability of a dual mode system. A coverage model was also developed to determine the various relevant coverage coefficients as a function of the available hardware and software fault detector characteristics and the subsequent isolation and recovery delay statistics.
A fault tolerant 80960 engine controller
NASA Technical Reports Server (NTRS)
Reichmuth, D. M.; Gage, M. L.; Paterson, E. S.; Kramer, D. D.
1993-01-01
The paper describes the design of the 80960 Fault Tolerant Engine Controller for the supervision of engine operations, developed for the NASA Marshall Space Flight Center. Consideration is given to the major electronic components of the controller, including the engine controller, effectors, and sensors, as well as to the controller hardware, the controller module and the communications module, and the controller software. The architecture of the controller hardware allows modifications to be made to fit the requirements of any new propulsion system. Multiple flow diagrams are presented illustrating the controller's operations.
Development of a space-systems network testbed
NASA Technical Reports Server (NTRS)
Lala, Jaynarayan; Alger, Linda; Adams, Stuart; Burkhardt, Laura; Nagle, Gail; Murray, Nicholas
1988-01-01
This paper describes a communications network testbed which has been designed to allow the development of architectures and algorithms that meet the functional requirements of future NASA communication systems. The central hardware components of the Network Testbed are programmable circuit-switching communication nodes which can be adapted by software or firmware changes to customize the testbed to particular architectures and algorithms. Fault detection, isolation, and reconfiguration have been implemented in the Network with a hybrid approach which utilizes features of both centralized and distributed techniques to provide efficient handling of faults within the Network.