Fault Tree Analysis Application for Safety and Reliability
NASA Technical Reports Server (NTRS)
Wallace, Dolores R.
2003-01-01
Many commercial software tools exist for fault tree analysis (FTA), an accepted method for mitigating risk in systems. The method embedded in the tools identifies a root as use in system components, but when software is identified as a root cause, it does not build trees into the software component. No commercial software tools have been built specifically for development and analysis of software fault trees. Research indicates that the methods of FTA could be applied to software, but the method is not practical without automated tool support. With appropriate automated tool support, software fault tree analysis (SFTA) may be a practical technique for identifying the underlying cause of software faults that may lead to critical system failures. We strive to demonstrate that existing commercial tools for FTA can be adapted for use with SFTA, and that applied to a safety-critical system, SFTA can be used to identify serious potential problems long before integrator and system testing.
NASA Technical Reports Server (NTRS)
Butler, Ricky W.; Boerschlein, David P.
1993-01-01
Fault-Tree Compiler (FTC) program, is software tool used to calculate probability of top event in fault tree. Gates of five different types allowed in fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. High-level input language easy to understand and use. In addition, program supports hierarchical fault-tree definition feature, which simplifies tree-description process and reduces execution time. Set of programs created forming basis for reliability-analysis workstation: SURE, ASSIST, PAWS/STEM, and FTC fault-tree tool (LAR-14586). Written in PASCAL, ANSI-compliant C language, and FORTRAN 77. Other versions available upon request.
NASA Technical Reports Server (NTRS)
Butler, Ricky W.; Martensen, Anna L.
1992-01-01
FTC, Fault-Tree Compiler program, is reliability-analysis software tool used to calculate probability of top event of fault tree. Five different types of gates allowed in fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. High-level input language of FTC easy to understand and use. Program supports hierarchical fault-tree-definition feature simplifying process of description of tree and reduces execution time. Solution technique implemented in FORTRAN, and user interface in Pascal. Written to run on DEC VAX computer operating under VMS operating system.
Tutorial: Advanced fault tree applications using HARP
NASA Technical Reports Server (NTRS)
Dugan, Joanne Bechta; Bavuso, Salvatore J.; Boyd, Mark A.
1993-01-01
Reliability analysis of fault tolerant computer systems for critical applications is complicated by several factors. These modeling difficulties are discussed and dynamic fault tree modeling techniques for handling them are described and demonstrated. Several advanced fault tolerant computer systems are described, and fault tree models for their analysis are presented. HARP (Hybrid Automated Reliability Predictor) is a software package developed at Duke University and NASA Langley Research Center that is capable of solving the fault tree models presented.
Learning from examples - Generation and evaluation of decision trees for software resource analysis
NASA Technical Reports Server (NTRS)
Selby, Richard W.; Porter, Adam A.
1988-01-01
A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.
Development and validation of techniques for improving software dependability
NASA Technical Reports Server (NTRS)
Knight, John C.
1992-01-01
A collection of document abstracts are presented on the topic of improving software dependability through NASA grant NAG-1-1123. Specific topics include: modeling of error detection; software inspection; test cases; Magnetic Stereotaxis System safety specifications and fault trees; and injection of synthetic faults into software.
Analysis of a hardware and software fault tolerant processor for critical applications
NASA Technical Reports Server (NTRS)
Dugan, Joanne B.
1993-01-01
Computer systems for critical applications must be designed to tolerate software faults as well as hardware faults. A unified approach to tolerating hardware and software faults is characterized by classifying faults in terms of duration (transient or permanent) rather than source (hardware or software). Errors arising from transient faults can be handled through masking or voting, but errors arising from permanent faults require system reconfiguration to bypass the failed component. Most errors which are caused by software faults can be considered transient, in that they are input-dependent. Software faults are triggered by a particular set of inputs. Quantitative dependability analysis of systems which exhibit a unified approach to fault tolerance can be performed by a hierarchical combination of fault tree and Markov models. A methodology for analyzing hardware and software fault tolerant systems is applied to the analysis of a hypothetical system, loosely based on the Fault Tolerant Parallel Processor. The models consider both transient and permanent faults, hardware and software faults, independent and related software faults, automatic recovery, and reconfiguration.
Software For Fault-Tree Diagnosis Of A System
NASA Technical Reports Server (NTRS)
Iverson, Dave; Patterson-Hine, Ann; Liao, Jack
1993-01-01
Fault Tree Diagnosis System (FTDS) computer program is automated-diagnostic-system program identifying likely causes of specified failure on basis of information represented in system-reliability mathematical models known as fault trees. Is modified implementation of failure-cause-identification phase of Narayanan's and Viswanadham's methodology for acquisition of knowledge and reasoning in analyzing failures of systems. Knowledge base of if/then rules replaced with object-oriented fault-tree representation. Enhancement yields more-efficient identification of causes of failures and enables dynamic updating of knowledge base. Written in C language, C++, and Common LISP.
Secure Embedded System Design Methodologies for Military Cryptographic Systems
2016-03-31
Fault- Tree Analysis (FTA); Built-In Self-Test (BIST) Introduction Secure access-control systems restrict operations to authorized users via methods...failures in the individual software/processor elements, the question of exactly how unlikely is difficult to answer. Fault- Tree Analysis (FTA) has a...Collins of Sandia National Laboratories for years of sharing his extensive knowledge of Fail-Safe Design Assurance and Fault- Tree Analysis
Object-Oriented Algorithm For Evaluation Of Fault Trees
NASA Technical Reports Server (NTRS)
Patterson-Hine, F. A.; Koen, B. V.
1992-01-01
Algorithm for direct evaluation of fault trees incorporates techniques of object-oriented programming. Reduces number of calls needed to solve trees with repeated events. Provides significantly improved software environment for such computations as quantitative analyses of safety and reliability of complicated systems of equipment (e.g., spacecraft or factories).
Comparative analysis of techniques for evaluating the effectiveness of aircraft computing systems
NASA Technical Reports Server (NTRS)
Hitt, E. F.; Bridgman, M. S.; Robinson, A. C.
1981-01-01
Performability analysis is a technique developed for evaluating the effectiveness of fault-tolerant computing systems in multiphase missions. Performability was evaluated for its accuracy, practical usefulness, and relative cost. The evaluation was performed by applying performability and the fault tree method to a set of sample problems ranging from simple to moderately complex. The problems involved as many as five outcomes, two to five mission phases, permanent faults, and some functional dependencies. Transient faults and software errors were not considered. A different analyst was responsible for each technique. Significantly more time and effort were required to learn performability analysis than the fault tree method. Performability is inherently as accurate as fault tree analysis. For the sample problems, fault trees were more practical and less time consuming to apply, while performability required less ingenuity and was more checkable. Performability offers some advantages for evaluating very complex problems.
Graphical workstation capability for reliability modeling
NASA Technical Reports Server (NTRS)
Bavuso, Salvatore J.; Koppen, Sandra V.; Haley, Pamela J.
1992-01-01
In addition to computational capabilities, software tools for estimating the reliability of fault-tolerant digital computer systems must also provide a means of interfacing with the user. Described here is the new graphical interface capability of the hybrid automated reliability predictor (HARP), a software package that implements advanced reliability modeling techniques. The graphics oriented (GO) module provides the user with a graphical language for modeling system failure modes through the selection of various fault-tree gates, including sequence-dependency gates, or by a Markov chain. By using this graphical input language, a fault tree becomes a convenient notation for describing a system. In accounting for any sequence dependencies, HARP converts the fault-tree notation to a complex stochastic process that is reduced to a Markov chain, which it can then solve for system reliability. The graphics capability is available for use on an IBM-compatible PC, a Sun, and a VAX workstation. The GO module is written in the C programming language and uses the graphical kernal system (GKS) standard for graphics implementation. The PC, VAX, and Sun versions of the HARP GO module are currently in beta-testing stages.
Logic flowgraph methodology - A tool for modeling embedded systems
NASA Technical Reports Server (NTRS)
Muthukumar, C. T.; Guarro, S. B.; Apostolakis, G. E.
1991-01-01
The logic flowgraph methodology (LFM), a method for modeling hardware in terms of its process parameters, has been extended to form an analytical tool for the analysis of integrated (hardware/software) embedded systems. In the software part of a given embedded system model, timing and the control flow among different software components are modeled by augmenting LFM with modified Petrinet structures. The objective of the use of such an augmented LFM model is to uncover possible errors and the potential for unanticipated software/hardware interactions. This is done by backtracking through the augmented LFM mode according to established procedures which allow the semiautomated construction of fault trees for any chosen state of the embedded system (top event). These fault trees, in turn, produce the possible combinations of lower-level states (events) that may lead to the top event.
Using certification trails to achieve software fault tolerance
NASA Technical Reports Server (NTRS)
Sullivan, Gregory F.; Masson, Gerald M.
1993-01-01
A conceptually novel and powerful technique to achieve fault tolerance in hardware and software systems is introduced. When used for software fault tolerance, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance was formalized and it was illustrated by applying it to the fundamental problem of finding a minimum spanning tree. Cases in which the second phase can be run concorrectly with the first and act as a monitor are discussed. The certification trail approach was compared to other approaches to fault tolerance. Because of space limitations we have omitted examples of our technique applied to the Huffman tree, and convex hull problems. These can be found in the full version of this paper.
Quantitative method of medication system interface evaluation.
Pingenot, Alleene Anne; Shanteau, James; Pingenot, James D F
2007-01-01
The objective of this study was to develop a quantitative method of evaluating the user interface for medication system software. A detailed task analysis provided a description of user goals and essential activity. A structural fault analysis was used to develop a detailed description of the system interface. Nurses experienced with use of the system under evaluation provided estimates of failure rates for each point in this simplified fault tree. Means of estimated failure rates provided quantitative data for fault analysis. Authors note that, although failures of steps in the program were frequent, participants reported numerous methods of working around these failures so that overall system failure was rare. However, frequent process failure can affect the time required for processing medications, making a system inefficient. This method of interface analysis, called Software Efficiency Evaluation and Fault Identification Method, provides quantitative information with which prototypes can be compared and problems within an interface identified.
Development of a methodology for assessing the safety of embedded software systems
NASA Technical Reports Server (NTRS)
Garrett, C. J.; Guarro, S. B.; Apostolakis, G. E.
1993-01-01
A Dynamic Flowgraph Methodology (DFM) based on an integrated approach to modeling and analyzing the behavior of software-driven embedded systems for assessing and verifying reliability and safety is discussed. DFM is based on an extension of the Logic Flowgraph Methodology to incorporate state transition models. System models which express the logic of the system in terms of causal relationships between physical variables and temporal characteristics of software modules are analyzed to determine how a certain state can be reached. This is done by developing timed fault trees which take the form of logical combinations of static trees relating the system parameters at different point in time. The resulting information concerning the hardware and software states can be used to eliminate unsafe execution paths and identify testing criteria for safety critical software functions.
Methodology for Designing Fault-Protection Software
NASA Technical Reports Server (NTRS)
Barltrop, Kevin; Levison, Jeffrey; Kan, Edwin
2006-01-01
A document describes a methodology for designing fault-protection (FP) software for autonomous spacecraft. The methodology embodies and extends established engineering practices in the technical discipline of Fault Detection, Diagnosis, Mitigation, and Recovery; and has been successfully implemented in the Deep Impact Spacecraft, a NASA Discovery mission. Based on established concepts of Fault Monitors and Responses, this FP methodology extends the notion of Opinion, Symptom, Alarm (aka Fault), and Response with numerous new notions, sub-notions, software constructs, and logic and timing gates. For example, Monitor generates a RawOpinion, which graduates into Opinion, categorized into no-opinion, acceptable, or unacceptable opinion. RaiseSymptom, ForceSymptom, and ClearSymptom govern the establishment and then mapping to an Alarm (aka Fault). Local Response is distinguished from FP System Response. A 1-to-n and n-to- 1 mapping is established among Monitors, Symptoms, and Responses. Responses are categorized by device versus by function. Responses operate in tiers, where the early tiers attempt to resolve the Fault in a localized step-by-step fashion, relegating more system-level response to later tier(s). Recovery actions are gated by epoch recovery timing, enabling strategy, urgency, MaxRetry gate, hardware availability, hazardous versus ordinary fault, and many other priority gates. This methodology is systematic, logical, and uses multiple linked tables, parameter files, and recovery command sequences. The credibility of the FP design is proven via a fault-tree analysis "top-down" approach, and a functional fault-mode-effects-and-analysis via "bottoms-up" approach. Via this process, the mitigation and recovery strategy(s) per Fault Containment Region scope (width versus depth) the FP architecture.
Integrated Approach To Design And Analysis Of Systems
NASA Technical Reports Server (NTRS)
Patterson-Hine, F. A.; Iverson, David L.
1993-01-01
Object-oriented fault-tree representation unifies evaluation of reliability and diagnosis of faults. Programming/fault tree described more fully in "Object-Oriented Algorithm For Evaluation Of Fault Trees" (ARC-12731). Augmented fault tree object contains more information than fault tree object used in quantitative analysis of reliability. Additional information needed to diagnose faults in system represented by fault tree.
Model authoring system for fail safe analysis
NASA Technical Reports Server (NTRS)
Sikora, Scott E.
1990-01-01
The Model Authoring System is a prototype software application for generating fault tree analyses and failure mode and effects analyses for circuit designs. Utilizing established artificial intelligence and expert system techniques, the circuits are modeled as a frame-based knowledge base in an expert system shell, which allows the use of object oriented programming and an inference engine. The behavior of the circuit is then captured through IF-THEN rules, which then are searched to generate either a graphical fault tree analysis or failure modes and effects analysis. Sophisticated authoring techniques allow the circuit to be easily modeled, permit its behavior to be quickly defined, and provide abstraction features to deal with complexity.
[The Application of the Fault Tree Analysis Method in Medical Equipment Maintenance].
Liu, Hongbin
2015-11-01
In this paper, the traditional fault tree analysis method is presented, detailed instructions for its application characteristics in medical instrument maintenance is made. It is made significant changes when the traditional fault tree analysis method is introduced into the medical instrument maintenance: gave up the logic symbolic, logic analysis and calculation, gave up its complicated programs, and only keep its image and practical fault tree diagram, and the fault tree diagram there are also differences: the fault tree is no longer a logical tree but the thinking tree in troubleshooting, the definition of the fault tree's nodes is different, the composition of the fault tree's branches is also different.
NASA Technical Reports Server (NTRS)
Lee, Charles; Alena, Richard L.; Robinson, Peter
2004-01-01
We started from ISS fault trees example to migrate to decision trees, presented a method to convert fault trees to decision trees. The method shows that the visualizations of root cause of fault are easier and the tree manipulating becomes more programmatic via available decision tree programs. The visualization of decision trees for the diagnostic shows a format of straight forward and easy understands. For ISS real time fault diagnostic, the status of the systems could be shown by mining the signals through the trees and see where it stops at. The other advantage to use decision trees is that the trees can learn the fault patterns and predict the future fault from the historic data. The learning is not only on the static data sets but also can be online, through accumulating the real time data sets, the decision trees can gain and store faults patterns in the trees and recognize them when they come.
49 CFR Appendix D to Part 236 - Independent Review of Verification and Validation
Code of Federal Regulations, 2010 CFR
2010-10-01
... standards. (f) The reviewer shall analyze all Fault Tree Analyses (FTA), Failure Mode and Effects... for each product vulnerability cited by the reviewer; (4) Identification of any documentation or... not properly followed; (6) Identification of the software verification and validation procedures, as...
1983-04-01
tolerances or spaci - able assets diagnostic/fault ness float fications isolation devices Operation of cannibalL- zation point Why Sustain materiel...with diagnostic software based on "fault tree " representation of the M65 ThS) to bridge the gap in diagnostics capability was demonstrated in 1980 and... identification friend or foe) which has much lower reliability than TSQ-73 peculiar hardware). Thus, as in other examples, reported readiness does not reflect
Development of a Software Safety Process and a Case Study of Its Use
NASA Technical Reports Server (NTRS)
Knight, J. C.
1996-01-01
Research in the year covered by this reporting period has been primarily directed toward: continued development of mock-ups of computer screens for operator of a digital reactor control system; development of a reactor simulation to permit testing of various elements of the control system; formal specification of user interfaces; fault-tree analysis including software; evaluation of formal verification techniques; and continued development of a software documentation system. Technical results relating to this grant and the remainder of the principal investigator's research program are contained in various reports and papers.
Automatic translation of digraph to fault-tree models
NASA Technical Reports Server (NTRS)
Iverson, David L.
1992-01-01
The author presents a technique for converting digraph models, including those models containing cycles, to a fault-tree format. A computer program which automatically performs this translation using an object-oriented representation of the models has been developed. The fault-trees resulting from translations can be used for fault-tree analysis and diagnosis. Programs to calculate fault-tree and digraph cut sets and perform diagnosis with fault-tree models have also been developed. The digraph to fault-tree translation system has been successfully tested on several digraphs of varying size and complexity. Details of some representative translation problems are presented. Most of the computation performed by the program is dedicated to finding minimal cut sets for digraph nodes in order to break cycles in the digraph. Fault-trees produced by the translator have been successfully used with NASA's Fault-Tree Diagnosis System (FTDS) to produce automated diagnostic systems.
Application Research of Fault Tree Analysis in Grid Communication System Corrective Maintenance
NASA Astrophysics Data System (ADS)
Wang, Jian; Yang, Zhenwei; Kang, Mei
2018-01-01
This paper attempts to apply the fault tree analysis method to the corrective maintenance field of grid communication system. Through the establishment of the fault tree model of typical system and the engineering experience, the fault tree analysis theory is used to analyze the fault tree model, which contains the field of structural function, probability importance and so on. The results show that the fault tree analysis can realize fast positioning and well repairing of the system. Meanwhile, it finds that the analysis method of fault tree has some guiding significance to the reliability researching and upgrading f the system.
Using Combined SFTA and SFMECA Techniques for Space Critical Software
NASA Astrophysics Data System (ADS)
Nicodemos, F. G.; Lahoz, C. H. N.; Abdala, M. A. D.; Saotome, O.
2012-01-01
This work addresses the combined Software Fault Tree Analysis (SFTA) and Software Failure Modes, Effects and Criticality Analysis (SFMECA) techniques applied to space critical software of satellite launch vehicles. The combined approach is under research as part of the Verification and Validation (V&V) efforts to increase software dependability and as future application in other projects under development at Instituto de Aeronáutica e Espaço (IAE). The applicability of such approach was conducted on system software specification and applied to a case study based on the Brazilian Satellite Launcher (VLS). The main goal is to identify possible failure causes and obtain compensating provisions that lead to inclusion of new functional and non-functional system software requirements.
Fault Tree in the Trenches, A Success Story
NASA Technical Reports Server (NTRS)
Long, R. Allen; Goodson, Amanda (Technical Monitor)
2000-01-01
Getting caught up in the explanation of Fault Tree Analysis (FTA) minutiae is easy. In fact, most FTA literature tends to address FTA concepts and methodology. Yet there seems to be few articles addressing actual design changes resulting from the successful application of fault tree analysis. This paper demonstrates how fault tree analysis was used to identify and solve a potentially catastrophic mechanical problem at a rocket motor manufacturer. While developing the fault tree given in this example, the analyst was told by several organizations that the piece of equipment in question had been evaluated by several committees and organizations, and that the analyst was wasting his time. The fault tree/cutset analysis resulted in a joint-redesign of the control system by the tool engineering group and the fault tree analyst, as well as bragging rights for the analyst. (That the fault tree found problems where other engineering reviews had failed was not lost on the other engineering groups.) Even more interesting was that this was the analyst's first fault tree which further demonstrates how effective fault tree analysis can be in guiding (i.e., forcing) the analyst to take a methodical approach in evaluating complex systems.
Code of Federal Regulations, 2010 CFR
2010-10-01
..., national, or international standards. (f) The reviewer shall analyze all Fault Tree Analyses (FTA), Failure... cited by the reviewer; (4) Identification of any documentation or information sought by the reviewer...) Identification of the hardware and software verification and validation procedures for the PTC system's safety...
Technology transfer by means of fault tree synthesis
NASA Astrophysics Data System (ADS)
Batzias, Dimitris F.
2012-12-01
Since Fault Tree Analysis (FTA) attempts to model and analyze failure processes of engineering, it forms a common technique for good industrial practice. On the contrary, fault tree synthesis (FTS) refers to the methodology of constructing complex trees either from dentritic modules built ad hoc or from fault tress already used and stored in a Knowledge Base. In both cases, technology transfer takes place in a quasi-inductive mode, from partial to holistic knowledge. In this work, an algorithmic procedure, including 9 activity steps and 3 decision nodes is developed for performing effectively this transfer when the fault under investigation occurs within one of the latter stages of an industrial procedure with several stages in series. The main parts of the algorithmic procedure are: (i) the construction of a local fault tree within the corresponding production stage, where the fault has been detected, (ii) the formation of an interface made of input faults that might occur upstream, (iii) the fuzzy (to count for uncertainty) multicriteria ranking of these faults according to their significance, and (iv) the synthesis of an extended fault tree based on the construction of part (i) and on the local fault tree of the first-ranked fault in part (iii). An implementation is presented, referring to 'uneven sealing of Al anodic film', thus proving the functionality of the developed methodology.
Faults Discovery By Using Mined Data
NASA Technical Reports Server (NTRS)
Lee, Charles
2005-01-01
Fault discovery in the complex systems consist of model based reasoning, fault tree analysis, rule based inference methods, and other approaches. Model based reasoning builds models for the systems either by mathematic formulations or by experiment model. Fault Tree Analysis shows the possible causes of a system malfunction by enumerating the suspect components and their respective failure modes that may have induced the problem. The rule based inference build the model based on the expert knowledge. Those models and methods have one thing in common; they have presumed some prior-conditions. Complex systems often use fault trees to analyze the faults. Fault diagnosis, when error occurs, is performed by engineers and analysts performing extensive examination of all data gathered during the mission. International Space Station (ISS) control center operates on the data feedback from the system and decisions are made based on threshold values by using fault trees. Since those decision-making tasks are safety critical and must be done promptly, the engineers who manually analyze the data are facing time challenge. To automate this process, this paper present an approach that uses decision trees to discover fault from data in real-time and capture the contents of fault trees as the initial state of the trees.
Software-implemented fault insertion: An FTMP example
NASA Technical Reports Server (NTRS)
Czeck, Edward W.; Siewiorek, Daniel P.; Segall, Zary Z.
1987-01-01
This report presents a model for fault insertion through software; describes its implementation on a fault-tolerant computer, FTMP; presents a summary of fault detection, identification, and reconfiguration data collected with software-implemented fault insertion; and compares the results to hardware fault insertion data. Experimental results show detection time to be a function of time of insertion and system workload. For the fault detection time, there is no correlation between software-inserted faults and hardware-inserted faults; this is because hardware-inserted faults must manifest as errors before detection, whereas software-inserted faults immediately exercise the error detection mechanisms. In summary, the software-implemented fault insertion is able to be used as an evaluation technique for the fault-handling capabilities of a system in fault detection, identification and recovery. Although the software-inserted faults do not map directly to hardware-inserted faults, experiments show software-implemented fault insertion is capable of emulating hardware fault insertion, with greater ease and automation.
Trade Studies of Space Launch Architectures using Modular Probabilistic Risk Analysis
NASA Technical Reports Server (NTRS)
Mathias, Donovan L.; Go, Susie
2006-01-01
A top-down risk assessment in the early phases of space exploration architecture development can provide understanding and intuition of the potential risks associated with new designs and technologies. In this approach, risk analysts draw from their past experience and the heritage of similar existing systems as a source for reliability data. This top-down approach captures the complex interactions of the risk driving parts of the integrated system without requiring detailed knowledge of the parts themselves, which is often unavailable in the early design stages. Traditional probabilistic risk analysis (PRA) technologies, however, suffer several drawbacks that limit their timely application to complex technology development programs. The most restrictive of these is a dependence on static planning scenarios, expressed through fault and event trees. Fault trees incorporating comprehensive mission scenarios are routinely constructed for complex space systems, and several commercial software products are available for evaluating fault statistics. These static representations cannot capture the dynamic behavior of system failures without substantial modification of the initial tree. Consequently, the development of dynamic models using fault tree analysis has been an active area of research in recent years. This paper discusses the implementation and demonstration of dynamic, modular scenario modeling for integration of subsystem fault evaluation modules using the Space Architecture Failure Evaluation (SAFE) tool. SAFE is a C++ code that was originally developed to support NASA s Space Launch Initiative. It provides a flexible framework for system architecture definition and trade studies. SAFE supports extensible modeling of dynamic, time-dependent risk drivers of the system and functions at the level of fidelity for which design and failure data exists. The approach is scalable, allowing inclusion of additional information as detailed data becomes available. The tool performs a Monte Carlo analysis to provide statistical estimates. Example results of an architecture system reliability study are summarized for an exploration system concept using heritage data from liquid-fueled expendable Saturn V/Apollo launch vehicles.
Fault trees and sequence dependencies
NASA Technical Reports Server (NTRS)
Dugan, Joanne Bechta; Boyd, Mark A.; Bavuso, Salvatore J.
1990-01-01
One of the frequently cited shortcomings of fault-tree models, their inability to model so-called sequence dependencies, is discussed. Several sources of such sequence dependencies are discussed, and new fault-tree gates to capture this behavior are defined. These complex behaviors can be included in present fault-tree models because they utilize a Markov solution. The utility of the new gates is demonstrated by presenting several models of the fault-tolerant parallel processor, which include both hot and cold spares.
McElroy, Lisa M; Khorzad, Rebeca; Rowe, Theresa A; Abecassis, Zachary A; Apley, Daniel W; Barnard, Cynthia; Holl, Jane L
The purpose of this study was to use fault tree analysis to evaluate the adequacy of quality reporting programs in identifying root causes of postoperative bloodstream infection (BSI). A systematic review of the literature was used to construct a fault tree to evaluate 3 postoperative BSI reporting programs: National Surgical Quality Improvement Program (NSQIP), Centers for Medicare and Medicaid Services (CMS), and The Joint Commission (JC). The literature review revealed 699 eligible publications, 90 of which were used to create the fault tree containing 105 faults. A total of 14 identified faults are currently mandated for reporting to NSQIP, 5 to CMS, and 3 to JC; 2 or more programs require 4 identified faults. The fault tree identifies numerous contributing faults to postoperative BSI and reveals substantial variation in the requirements and ability of national quality data reporting programs to capture these potential faults. Efforts to prevent postoperative BSI require more comprehensive data collection to identify the root causes and develop high-reliability improvement strategies.
A dynamic fault tree model of a propulsion system
NASA Technical Reports Server (NTRS)
Xu, Hong; Dugan, Joanne Bechta; Meshkat, Leila
2006-01-01
We present a dynamic fault tree model of the benchmark propulsion system, and solve it using Galileo. Dynamic fault trees (DFT) extend traditional static fault trees with special gates to model spares and other sequence dependencies. Galileo solves DFT models using a judicious combination of automatically generated Markov and Binary Decision Diagram models. Galileo easily handles the complexities exhibited by the benchmark problem. In particular, Galileo is designed to model phased mission systems.
Chen, Yingyi; Zhen, Zhumi; Yu, Huihui; Xu, Jing
2017-01-14
In the Internet of Things (IoT) equipment used for aquaculture is often deployed in outdoor ponds located in remote areas. Faults occur frequently in these tough environments and the staff generally lack professional knowledge and pay a low degree of attention in these areas. Once faults happen, expert personnel must carry out maintenance outdoors. Therefore, this study presents an intelligent method for fault diagnosis based on fault tree analysis and a fuzzy neural network. In the proposed method, first, the fault tree presents a logic structure of fault symptoms and faults. Second, rules extracted from the fault trees avoid duplicate and redundancy. Third, the fuzzy neural network is applied to train the relationship mapping between fault symptoms and faults. In the aquaculture IoT, one fault can cause various fault symptoms, and one symptom can be caused by a variety of faults. Four fault relationships are obtained. Results show that one symptom-to-one fault, two symptoms-to-two faults, and two symptoms-to-one fault relationships can be rapidly diagnosed with high precision, while one symptom-to-two faults patterns perform not so well, but are still worth researching. This model implements diagnosis for most kinds of faults in the aquaculture IoT.
Chen, Yingyi; Zhen, Zhumi; Yu, Huihui; Xu, Jing
2017-01-01
In the Internet of Things (IoT) equipment used for aquaculture is often deployed in outdoor ponds located in remote areas. Faults occur frequently in these tough environments and the staff generally lack professional knowledge and pay a low degree of attention in these areas. Once faults happen, expert personnel must carry out maintenance outdoors. Therefore, this study presents an intelligent method for fault diagnosis based on fault tree analysis and a fuzzy neural network. In the proposed method, first, the fault tree presents a logic structure of fault symptoms and faults. Second, rules extracted from the fault trees avoid duplicate and redundancy. Third, the fuzzy neural network is applied to train the relationship mapping between fault symptoms and faults. In the aquaculture IoT, one fault can cause various fault symptoms, and one symptom can be caused by a variety of faults. Four fault relationships are obtained. Results show that one symptom-to-one fault, two symptoms-to-two faults, and two symptoms-to-one fault relationships can be rapidly diagnosed with high precision, while one symptom-to-two faults patterns perform not so well, but are still worth researching. This model implements diagnosis for most kinds of faults in the aquaculture IoT. PMID:28098822
Reliability computation using fault tree analysis
NASA Technical Reports Server (NTRS)
Chelson, P. O.
1971-01-01
A method is presented for calculating event probabilities from an arbitrary fault tree. The method includes an analytical derivation of the system equation and is not a simulation program. The method can handle systems that incorporate standby redundancy and it uses conditional probabilities for computing fault trees where the same basic failure appears in more than one fault path.
Object-oriented fault tree evaluation program for quantitative analyses
NASA Technical Reports Server (NTRS)
Patterson-Hine, F. A.; Koen, B. V.
1988-01-01
Object-oriented programming can be combined with fault free techniques to give a significantly improved environment for evaluating the safety and reliability of large complex systems for space missions. Deep knowledge about system components and interactions, available from reliability studies and other sources, can be described using objects that make up a knowledge base. This knowledge base can be interrogated throughout the design process, during system testing, and during operation, and can be easily modified to reflect design changes in order to maintain a consistent information source. An object-oriented environment for reliability assessment has been developed on a Texas Instrument (TI) Explorer LISP workstation. The program, which directly evaluates system fault trees, utilizes the object-oriented extension to LISP called Flavors that is available on the Explorer. The object representation of a fault tree facilitates the storage and retrieval of information associated with each event in the tree, including tree structural information and intermediate results obtained during the tree reduction process. Reliability data associated with each basic event are stored in the fault tree objects. The object-oriented environment on the Explorer also includes a graphical tree editor which was modified to display and edit the fault trees.
NASA Technical Reports Server (NTRS)
Martensen, Anna L.; Butler, Ricky W.
1987-01-01
The Fault Tree Compiler Program is a new reliability tool used to predict the top event probability for a fault tree. Five different gate types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N gates. The high level input language is easy to understand and use when describing the system tree. In addition, the use of the hierarchical fault tree capability can simplify the tree description and decrease program execution time. The current solution technique provides an answer precise (within the limits of double precision floating point arithmetic) to the five digits in the answer. The user may vary one failure rate or failure probability over a range of values and plot the results for sensitivity analyses. The solution technique is implemented in FORTRAN; the remaining program code is implemented in Pascal. The program is written to run on a Digital Corporation VAX with the VMS operation system.
The Fault Tree Compiler (FTC): Program and mathematics
NASA Technical Reports Server (NTRS)
Butler, Ricky W.; Martensen, Anna L.
1989-01-01
The Fault Tree Compiler Program is a new reliability tool used to predict the top-event probability for a fault tree. Five different gate types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, AND m OF n gates. The high-level input language is easy to understand and use when describing the system tree. In addition, the use of the hierarchical fault tree capability can simplify the tree description and decrease program execution time. The current solution technique provides an answer precisely (within the limits of double precision floating point arithmetic) within a user specified number of digits accuracy. The user may vary one failure rate or failure probability over a range of values and plot the results for sensitivity analyses. The solution technique is implemented in FORTRAN; the remaining program code is implemented in Pascal. The program is written to run on a Digital Equipment Corporation (DEC) VAX computer with the VMS operation system.
Systems Theoretic Process Analysis Applied to an Offshore Supply Vessel Dynamic Positioning System
2016-06-01
additional safety issues that were either not identified or inadequately mitigated through the use of Fault Tree Analysis and Failure Modes and...Techniques ...................................................................................................... 15 1.3.1. Fault Tree Analysis...49 3.2. Fault Tree Analysis Comparison
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sattison, M.B.; Schroeder, J.A.; Russell, K.D.
The Idaho National Engineering Laboratory (INEL) over the past year has created 75 plant-specific Accident Sequence Precursor (ASP) models using the SAPHIRE suite of PRA codes. Along with the new models, the INEL has also developed a new module for SAPHIRE which is tailored specifically to the unique needs of ASP evaluations. These models and software will be the next generation of risk tools for the evaluation of accident precursors by both NRR and AEOD. This paper presents an overview of the models and software. Key characteristics include: (1) classification of the plant models according to plant response with amore » unique set of event trees for each plant class, (2) plant-specific fault trees using supercomponents, (3) generation and retention of all system and sequence cutsets, (4) full flexibility in modifying logic, regenerating cutsets, and requantifying results, and (5) user interface for streamlined evaluation of ASP events.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sattison, M.B.; Schroeder, J.A.; Russell, K.D.
The Idaho National Engineering Laboratory (INEL) over the past year has created 75 plant-specific Accident Sequence Precursor (ASP) models using the SAPHIRE suite of PRA codes. Along with the new models, the INEL has also developed a new module for SAPHIRE which is tailored specifically to the unique needs of conditional core damage probability (CCDP) evaluations. These models and software will be the next generation of risk tools for the evaluation of accident precursors by both NRR and AEOD. This paper presents an overview of the models and software. Key characteristics include: (1) classification of the plant models according tomore » plant response with a unique set of event trees for each plant class, (2) plant-specific fault trees using supercomponents, (3) generation and retention of all system and sequence cutsets, (4) full flexibility in modifying logic, regenerating cutsets, and requantifying results, and (5) user interface for streamlined evaluation of ASP events.« less
An overview of the phase-modular fault tree approach to phased mission system analysis
NASA Technical Reports Server (NTRS)
Meshkat, L.; Xing, L.; Donohue, S. K.; Ou, Y.
2003-01-01
We look at how fault tree analysis (FTA), a primary means of performing reliability analysis of PMS, can meet this challenge in this paper by presenting an overview of the modular approach to solving fault trees that represent PMS.
Try Fault Tree Analysis, a Step-by-Step Way to Improve Organization Development.
ERIC Educational Resources Information Center
Spitzer, Dean
1980-01-01
Fault Tree Analysis, a systems safety engineering technology used to analyze organizational systems, is described. Explains the use of logic gates to represent the relationship between failure events, qualitative analysis, quantitative analysis, and effective use of Fault Tree Analysis. (CT)
Fault Tree Analysis: A Research Tool for Educational Planning. Technical Report No. 1.
ERIC Educational Resources Information Center
Alameda County School Dept., Hayward, CA. PACE Center.
This ESEA Title III report describes fault tree analysis and assesses its applicability to education. Fault tree analysis is an operations research tool which is designed to increase the probability of success in any system by analyzing the most likely modes of failure that could occur. A graphic portrayal, which has the form of a tree, is…
Review: Evaluation of Foot-and-Mouth Disease Control Using Fault Tree Analysis.
Isoda, N; Kadohira, M; Sekiguchi, S; Schuppers, M; Stärk, K D C
2015-06-01
An outbreak of foot-and-mouth disease (FMD) causes huge economic losses and animal welfare problems. Although much can be learnt from past FMD outbreaks, several countries are not satisfied with their degree of contingency planning and aiming at more assurance that their control measures will be effective. The purpose of the present article was to develop a generic fault tree framework for the control of an FMD outbreak as a basis for systematic improvement and refinement of control activities and general preparedness. Fault trees are typically used in engineering to document pathways that can lead to an undesired event, that is, ineffective FMD control. The fault tree method allows risk managers to identify immature parts of the control system and to analyse the events or steps that will most probably delay rapid and effective disease control during a real outbreak. The present developed fault tree is generic and can be tailored to fit the specific needs of countries. For instance, the specific fault tree for the 2001 FMD outbreak in the UK was refined based on control weaknesses discussed in peer-reviewed articles. Furthermore, the specific fault tree based on the 2001 outbreak was applied to the subsequent FMD outbreak in 2007 to assess the refinement of control measures following the earlier, major outbreak. The FMD fault tree can assist risk managers to develop more refined and adequate control activities against FMD outbreaks and to find optimum strategies for rapid control. Further application using the current tree will be one of the basic measures for FMD control worldwide. © 2013 Blackwell Verlag GmbH.
The weakest t-norm based intuitionistic fuzzy fault-tree analysis to evaluate system reliability.
Kumar, Mohit; Yadav, Shiv Prasad
2012-07-01
In this paper, a new approach of intuitionistic fuzzy fault-tree analysis is proposed to evaluate system reliability and to find the most critical system component that affects the system reliability. Here weakest t-norm based intuitionistic fuzzy fault tree analysis is presented to calculate fault interval of system components from integrating expert's knowledge and experience in terms of providing the possibility of failure of bottom events. It applies fault-tree analysis, α-cut of intuitionistic fuzzy set and T(ω) (the weakest t-norm) based arithmetic operations on triangular intuitionistic fuzzy sets to obtain fault interval and reliability interval of the system. This paper also modifies Tanaka et al.'s fuzzy fault-tree definition. In numerical verification, a malfunction of weapon system "automatic gun" is presented as a numerical example. The result of the proposed method is compared with the listing approaches of reliability analysis methods. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Brunelle, J. E.; Eckhardt, D. E., Jr.
1985-01-01
Results are presented of an experiment conducted in the NASA Avionics Integrated Research Laboratory (AIRLAB) to investigate the implementation of fault-tolerant software techniques on fault-tolerant computer architectures, in particular the Software Implemented Fault Tolerance (SIFT) computer. The N-version programming and recovery block techniques were implemented on a portion of the SIFT operating system. The results indicate that, to effectively implement fault-tolerant software design techniques, system requirements will be impacted and suggest that retrofitting fault-tolerant software on existing designs will be inefficient and may require system modification.
Fault Tree Based Diagnosis with Optimal Test Sequencing for Field Service Engineers
NASA Technical Reports Server (NTRS)
Iverson, David L.; George, Laurence L.; Patterson-Hine, F. A.; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
When field service engineers go to customer sites to service equipment, they want to diagnose and repair failures quickly and cost effectively. Symptoms exhibited by failed equipment frequently suggest several possible causes which require different approaches to diagnosis. This can lead the engineer to follow several fruitless paths in the diagnostic process before they find the actual failure. To assist in this situation, we have developed the Fault Tree Diagnosis and Optimal Test Sequence (FTDOTS) software system that performs automated diagnosis and ranks diagnostic hypotheses based on failure probability and the time or cost required to isolate and repair each failure. FTDOTS first finds a set of possible failures that explain exhibited symptoms by using a fault tree reliability model as a diagnostic knowledge to rank the hypothesized failures based on how likely they are and how long it would take or how much it would cost to isolate and repair them. This ordering suggests an optimal sequence for the field service engineer to investigate the hypothesized failures in order to minimize the time or cost required to accomplish the repair task. Previously, field service personnel would arrive at the customer site and choose which components to investigate based on past experience and service manuals. Using FTDOTS running on a portable computer, they can now enter a set of symptoms and get a list of possible failures ordered in an optimal test sequence to help them in their decisions. If facilities are available, the field engineer can connect the portable computer to the malfunctioning device for automated data gathering. FTDOTS is currently being applied to field service of medical test equipment. The techniques are flexible enough to use for many different types of devices. If a fault tree model of the equipment and information about component failure probabilities and isolation times or costs are available, a diagnostic knowledge base for that device can be developed easily.
Factors That Affect Software Testability
NASA Technical Reports Server (NTRS)
Voas, Jeffrey M.
1991-01-01
Software faults that infrequently affect software's output are dangerous. When a software fault causes frequent software failures, testing is likely to reveal the fault before the software is releases; when the fault remains undetected during testing, it can cause disaster after the software is installed. A technique for predicting whether a particular piece of software is likely to reveal faults within itself during testing is found in [Voas91b]. A piece of software that is likely to reveal faults within itself during testing is said to have high testability. A piece of software that is not likely to reveal faults within itself during testing is said to have low testability. It is preferable to design software with higher testabilities from the outset, i.e., create software with as high of a degree of testability as possible to avoid the problems of having undetected faults that are associated with low testability. Information loss is a phenomenon that occurs during program execution that increases the likelihood that a fault will remain undetected. In this paper, I identify two brad classes of information loss, define them, and suggest ways of predicting the potential for information loss to occur. We do this in order to decrease the likelihood that faults will remain undetected during testing.
NASA Technical Reports Server (NTRS)
Guarro, Sergio B.
2010-01-01
This report validates and documents the detailed features and practical application of the framework for software intensive digital systems risk assessment and risk-informed safety assurance presented in the NASA PRA Procedures Guide for Managers and Practitioner. This framework, called herein the "Context-based Software Risk Model" (CSRM), enables the assessment of the contribution of software and software-intensive digital systems to overall system risk, in a manner which is entirely compatible and integrated with the format of a "standard" Probabilistic Risk Assessment (PRA), as currently documented and applied for NASA missions and applications. The CSRM also provides a risk-informed path and criteria for conducting organized and systematic digital system and software testing so that, within this risk-informed paradigm, the achievement of a quantitatively defined level of safety and mission success assurance may be targeted and demonstrated. The framework is based on the concept of context-dependent software risk scenarios and on the modeling of such scenarios via the use of traditional PRA techniques - i.e., event trees and fault trees - in combination with more advanced modeling devices such as the Dynamic Flowgraph Methodology (DFM) or other dynamic logic-modeling representations. The scenarios can be synthesized and quantified in a conditional logic and probabilistic formulation. The application of the CSRM method documented in this report refers to the MiniAERCam system designed and developed by the NASA Johnson Space Center.
Fault tree models for fault tolerant hypercube multiprocessors
NASA Technical Reports Server (NTRS)
Boyd, Mark A.; Tuazon, Jezus O.
1991-01-01
Three candidate fault tolerant hypercube architectures are modeled, their reliability analyses are compared, and the resulting implications of these methods of incorporating fault tolerance into hypercube multiprocessors are discussed. In the course of performing the reliability analyses, the use of HARP and fault trees in modeling sequence dependent system behaviors is demonstrated.
Product Support Manager Guidebook
2011-04-01
package is being developed using supportability analysis concepts such as Failure Mode, Effects and Criticality Analysis (FMECA), Fault Tree Analysis ( FTA ...Analysis (LORA) Condition Based Maintenance + (CBM+) Fault Tree Analysis ( FTA ) Failure Mode, Effects, and Criticality Analysis (FMECA) Maintenance Task...Reporting and Corrective Action System (FRACAS), Fault Tree Analysis ( FTA ), Level of Repair Analysis (LORA), Maintenance Task Analysis (MTA
MIRAP, microcomputer reliability analysis program
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jehee, J.N.T.
1989-01-01
A program for a microcomputer is outlined that can determine minimal cut sets from a specified fault tree logic. The speed and memory limitations of the microcomputers on which the program is implemented (Atari ST and IBM) are addressed by reducing the fault tree's size and by storing the cut set data on disk. Extensive well proven fault tree restructuring techniques, such as the identification of sibling events and of independent gate events, reduces the fault tree's size but does not alter its logic. New methods are used for the Boolean reduction of the fault tree logic. Special criteria formore » combining events in the 'AND' and 'OR' logic avoid the creation of many subsuming cut sets which all would cancel out due to existing cut sets. Figures and tables illustrates these methods. 4 refs., 5 tabs.« less
The FTA Method And A Possibility Of Its Application In The Area Of Road Freight Transport
NASA Astrophysics Data System (ADS)
Poliaková, Adela
2015-06-01
The Fault Tree process utilizes logic diagrams to portray and analyse potentially hazardous events. Three basic symbols (logic gates) are adequate for diagramming any fault tree. However, additional recently developed symbols can be used to reduce the time and effort required for analysis. A fault tree is a graphical representation of the relationship between certain specific events and the ultimate undesired event (2). This paper deals to method of Fault Tree Analysis basic description and provides a practical view on possibility of application by quality improvement in road freight transport company.
Certification trails for data structures
NASA Technical Reports Server (NTRS)
Sullivan, Gregory F.; Masson, Gerald M.
1993-01-01
Certification trails are a recently introduced and promising approach to fault detection and fault tolerance. The applicability of the certification trail technique is significantly generalized. Previously, certification trails had to be customized to each algorithm application; trails appropriate to wide classes of algorithms were developed. These certification trails are based on common data-structure operations such as those carried out using these sets of operations such as those carried out using balanced binary trees and heaps. Any algorithms using these sets of operations can therefore employ the certification trail method to achieve software fault tolerance. To exemplify the scope of the generalization of the certification trail technique provided, constructions of trails for abstract data types such as priority queues and union-find structures are given. These trails are applicable to any data-structure implementation of the abstract data type. It is also shown that these ideals lead naturally to monitors for data-structure operations.
Fault Tree Analysis: Its Implications for Use in Education.
ERIC Educational Resources Information Center
Barker, Bruce O.
This study introduces the concept of Fault Tree Analysis as a systems tool and examines the implications of Fault Tree Analysis (FTA) as a technique for isolating failure modes in educational systems. A definition of FTA and discussion of its history, as it relates to education, are provided. The step by step process for implementation and use of…
Preventing medical errors by designing benign failures.
Grout, John R
2003-07-01
One way to successfully reduce medical errors is to design health care systems that are more resistant to the tendencies of human beings to err. One interdisciplinary approach entails creating design changes, mitigating human errors, and making human error irrelevant to outcomes. This approach is intended to facilitate the creation of benign failures, which have been called mistake-proofing devices and forcing functions elsewhere. USING FAULT TREES TO DESIGN FORCING FUNCTIONS: A fault tree is a graphical tool used to understand the relationships that either directly cause or contribute to the cause of a particular failure. A careful analysis of a fault tree enables the analyst to anticipate how the process will behave after the change. EXAMPLE OF AN APPLICATION: A scenario in which a patient is scalded while bathing can serve as an example of how multiple fault trees can be used to design forcing functions. The first fault tree shows the undesirable event--patient scalded while bathing. The second fault tree has a benign event--no water. Adding a scald valve changes the outcome from the undesirable event ("patient scalded while bathing") to the benign event ("no water") Analysis of fault trees does not ensure or guarantee that changes necessary to eliminate error actually occur. Most mistake-proofing is used to prevent simple errors and to create well-defended processes, but complex errors can also result. The utilization of mistake-proofing or forcing functions can be thought of as changing the logic of a process. Errors that formerly caused undesirable failures can be converted into the causes of benign failures. The use of fault trees can provide a variety of insights into the design of forcing functions that will improve patient safety.
Practical Methods for Estimating Software Systems Fault Content and Location
NASA Technical Reports Server (NTRS)
Nikora, A.; Schneidewind, N.; Munson, J.
1999-01-01
Over the past several years, we have developed techniques to discriminate between fault-prone software modules and those that are not, to estimate a software system's residual fault content, to identify those portions of a software system having the highest estimated number of faults, and to estimate the effects of requirements changes on software quality.
ERIC Educational Resources Information Center
Barker, Bruce O.; Petersen, Paul D.
This paper explores the fault-tree analysis approach to isolating failure modes within a system. Fault tree investigates potentially undesirable events and then looks for failures in sequence that would lead to their occurring. Relationships among these events are symbolized by AND or OR logic gates, AND used when single events must coexist to…
Evidential Networks for Fault Tree Analysis with Imprecise Knowledge
NASA Astrophysics Data System (ADS)
Yang, Jianping; Huang, Hong-Zhong; Liu, Yu; Li, Yan-Feng
2012-06-01
Fault tree analysis (FTA), as one of the powerful tools in reliability engineering, has been widely used to enhance system quality attributes. In most fault tree analyses, precise values are adopted to represent the probabilities of occurrence of those events. Due to the lack of sufficient data or imprecision of existing data at the early stage of product design, it is often difficult to accurately estimate the failure rates of individual events or the probabilities of occurrence of the events. Therefore, such imprecision and uncertainty need to be taken into account in reliability analysis. In this paper, the evidential networks (EN) are employed to quantify and propagate the aforementioned uncertainty and imprecision in fault tree analysis. The detailed conversion processes of some logic gates to EN are described in fault tree (FT). The figures of the logic gates and the converted equivalent EN, together with the associated truth tables and the conditional belief mass tables, are also presented in this work. The new epistemic importance is proposed to describe the effect of ignorance degree of event. The fault tree of an aircraft engine damaged by oil filter plugs is presented to demonstrate the proposed method.
Object-oriented fault tree models applied to system diagnosis
NASA Technical Reports Server (NTRS)
Iverson, David L.; Patterson-Hine, F. A.
1990-01-01
When a diagnosis system is used in a dynamic environment, such as the distributed computer system planned for use on Space Station Freedom, it must execute quickly and its knowledge base must be easily updated. Representing system knowledge as object-oriented augmented fault trees provides both features. The diagnosis system described here is based on the failure cause identification process of the diagnostic system described by Narayanan and Viswanadham. Their system has been enhanced in this implementation by replacing the knowledge base of if-then rules with an object-oriented fault tree representation. This allows the system to perform its task much faster and facilitates dynamic updating of the knowledge base in a changing diagnosis environment. Accessing the information contained in the objects is more efficient than performing a lookup operation on an indexed rule base. Additionally, the object-oriented fault trees can be easily updated to represent current system status. This paper describes the fault tree representation, the diagnosis algorithm extensions, and an example application of this system. Comparisons are made between the object-oriented fault tree knowledge structure solution and one implementation of a rule-based solution. Plans for future work on this system are also discussed.
Software dependability in the Tandem GUARDIAN system
NASA Technical Reports Server (NTRS)
Lee, Inhwan; Iyer, Ravishankar K.
1995-01-01
Based on extensive field failure data for Tandem's GUARDIAN operating system this paper discusses evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup processes to take over. The paper categorizes the underlying causes of software failures and evaluates the effectiveness of the process pair technique in tolerating software faults. A model to describe the impact of software faults on the reliability of an overall system is proposed. The model is used to evaluate the significance of key factors that determine software dependability and to identify areas for improvement. An analysis of the data shows that about 77% of processor failures that are initially considered due to software are confirmed as software problems. The analysis shows that the use of process pairs to provide checkpointing and restart (originally intended for tolerating hardware faults) allows the system to tolerate about 75% of reported software faults that result in processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. Over two-thirds (72%) of measured software failures are recurrences of previously reported faults. Modeling, based on the data, shows that, in addition to reducing the number of software faults, software dependability can be enhanced by reducing the recurrence rate.
Automated Generation of Fault Management Artifacts from a Simple System Model
NASA Technical Reports Server (NTRS)
Kennedy, Andrew K.; Day, John C.
2013-01-01
Our understanding of off-nominal behavior - failure modes and fault propagation - in complex systems is often based purely on engineering intuition; specific cases are assessed in an ad hoc fashion as a (fallible) fault management engineer sees fit. This work is an attempt to provide a more rigorous approach to this understanding and assessment by automating the creation of a fault management artifact, the Failure Modes and Effects Analysis (FMEA) through querying a representation of the system in a SysML model. This work builds off the previous development of an off-nominal behavior model for the upcoming Soil Moisture Active-Passive (SMAP) mission at the Jet Propulsion Laboratory. We further developed the previous system model to more fully incorporate the ideas of State Analysis, and it was restructured in an organizational hierarchy that models the system as layers of control systems while also incorporating the concept of "design authority". We present software that was developed to traverse the elements and relationships in this model to automatically construct an FMEA spreadsheet. We further discuss extending this model to automatically generate other typical fault management artifacts, such as Fault Trees, to efficiently portray system behavior, and depend less on the intuition of fault management engineers to ensure complete examination of off-nominal behavior.
Probabilistic fault tree analysis of a radiation treatment system.
Ekaette, Edidiong; Lee, Robert C; Cooke, David L; Iftody, Sandra; Craighead, Peter
2007-12-01
Inappropriate administration of radiation for cancer treatment can result in severe consequences such as premature death or appreciably impaired quality of life. There has been little study of vulnerable treatment process components and their contribution to the risk of radiation treatment (RT). In this article, we describe the application of probabilistic fault tree methods to assess the probability of radiation misadministration to patients at a large cancer treatment center. We conducted a systematic analysis of the RT process that identified four process domains: Assessment, Preparation, Treatment, and Follow-up. For the Preparation domain, we analyzed possible incident scenarios via fault trees. For each task, we also identified existing quality control measures. To populate the fault trees we used subjective probabilities from experts and compared results with incident report data. Both the fault tree and the incident report analysis revealed simulation tasks to be most prone to incidents, and the treatment prescription task to be least prone to incidents. The probability of a Preparation domain incident was estimated to be in the range of 0.1-0.7% based on incident reports, which is comparable to the mean value of 0.4% from the fault tree analysis using probabilities from the expert elicitation exercise. In conclusion, an analysis of part of the RT system using a fault tree populated with subjective probabilities from experts was useful in identifying vulnerable components of the system, and provided quantitative data for risk management.
Reconfigurable tree architectures using subtree oriented fault tolerance
NASA Technical Reports Server (NTRS)
Lowrie, Matthew B.
1987-01-01
An approach to the design of reconfigurable tree architecture is presented in which spare processors are allocated at the leaves. The approach is unique in that spares are associated with subtrees and sharing of spares between these subtrees can occur. The Subtree Oriented Fault Tolerance (SOFT) approach is more reliable than previous approaches capable of tolerating link and switch failures for both single chip and multichip tree implementations while reducing redundancy in terms of both spare processors and links. VLSI layout is 0(n) for binary trees and is directly extensible to N-ary trees and fault tolerance through performance degradation.
Multi-version software reliability through fault-avoidance and fault-tolerance
NASA Technical Reports Server (NTRS)
Vouk, Mladen A.; Mcallister, David F.
1989-01-01
A number of experimental and theoretical issues associated with the practical use of multi-version software to provide run-time tolerance to software faults were investigated. A specialized tool was developed and evaluated for measuring testing coverage for a variety of metrics. The tool was used to collect information on the relationships between software faults and coverage provided by the testing process as measured by different metrics (including data flow metrics). Considerable correlation was found between coverage provided by some higher metrics and the elimination of faults in the code. Back-to-back testing was continued as an efficient mechanism for removal of un-correlated faults, and common-cause faults of variable span. Software reliability estimation methods was also continued based on non-random sampling, and the relationship between software reliability and code coverage provided through testing. New fault tolerance models were formulated. Simulation studies of the Acceptance Voting and Multi-stage Voting algorithms were finished and it was found that these two schemes for software fault tolerance are superior in many respects to some commonly used schemes. Particularly encouraging are the safety properties of the Acceptance testing scheme.
Software Fault Tolerance: A Tutorial
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo
2000-01-01
Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems. The root cause of software design errors is the complexity of the systems. Compounding the problems in building correct software is the difficulty in assessing the correctness of software for highly complex systems. After a brief overview of the software development processes, we note how hard-to-detect design faults are likely to be introduced during development and how software faults tend to be state-dependent and activated by particular input sequences. Although component reliability is an important quality measure for system level analysis, software reliability is hard to characterize and the use of post-verification reliability estimates remains a controversial issue. For some applications software safety is more important than reliability, and fault tolerance techniques used in those applications are aimed at preventing catastrophes. Single version software fault tolerance techniques discussed include system structuring and closure, atomic actions, inline fault detection, exception handling, and others. Multiversion techniques are based on the assumption that software built differently should fail differently and thus, if one of the redundant versions fails, it is expected that at least one of the other versions will provide an acceptable output. Recovery blocks, N-version programming, and other multiversion techniques are reviewed.
Fault tolerant software modules for SIFT
NASA Technical Reports Server (NTRS)
Hecht, M.; Hecht, H.
1982-01-01
The implementation of software fault tolerance is investigated for critical modules of the Software Implemented Fault Tolerance (SIFT) operating system to support the computational and reliability requirements of advanced fly by wire transport aircraft. Fault tolerant designs generated for the error reported and global executive are examined. A description of the alternate routines, implementation requirements, and software validation are included.
Rymer, M.J.
2000-01-01
The Coachella Valley area was strongly shaken by the 1992 Joshua Tree (23 April) and Landers (28 June) earthquakes, and both events caused triggered slip on active faults within the area. Triggered slip associated with the Joshua Tree earthquake was on a newly recognized fault, the East Wide Canyon fault, near the southwestern edge of the Little San Bernardino Mountains. Slip associated with the Landers earthquake formed along the San Andreas fault in the southeastern Coachella Valley. Surface fractures formed along the East Wide Canyon fault in association with the Joshua Tree earthquake. The fractures extended discontinuously over a 1.5-km stretch of the fault, near its southern end. Sense of slip was consistently right-oblique, west side down, similar to the long-term style of faulting. Measured offset values were small, with right-lateral and vertical components of slip ranging from 1 to 6 mm and 1 to 4 mm, respectively. This is the first documented historic slip on the East Wide Canyon fault, which was first mapped only months before the Joshua Tree earthquake. Surface slip associated with the Joshua Tree earthquake most likely developed as triggered slip given its 5 km distance from the Joshua Tree epicenter and aftershocks. As revealed in a trench investigation, slip formed in an area with only a thin (<3 m thick) veneer of alluvium in contrast to earlier documented triggered slip events in this region, all in the deep basins of the Salton Trough. A paleoseismic trench study in an area of 1992 surface slip revealed evidence of two and possibly three surface faulting events on the East Wide Canyon fault during the late Quaternary, probably latest Pleistocene (first event) and mid- to late Holocene (second two events). About two months after the Joshua Tree earthquake, the Landers earthquake then triggered slip on many faults, including the San Andreas fault in the southeastern Coachella Valley. Surface fractures associated with this event formed discontinuous breaks over a 54-km-long stretch of the fault, from the Indio Hills southeastward to Durmid Hill. Sense of slip was right-lateral; only locally was there a minor (~1 mm) vertical component of slip. Measured dextral displacement values ranged from 1 to 20 mm, with the largest amounts found in the Mecca Hills where large slip values have been measured following past triggered-slip events.
NASA Astrophysics Data System (ADS)
de Barros, Felipe P. J.; Bolster, Diogo; Sanchez-Vila, Xavier; Nowak, Wolfgang
2011-05-01
Assessing health risk in hydrological systems is an interdisciplinary field. It relies on the expertise in the fields of hydrology and public health and needs powerful translation concepts to provide decision support and policy making. Reliable health risk estimates need to account for the uncertainties and variabilities present in hydrological, physiological, and human behavioral parameters. Despite significant theoretical advancements in stochastic hydrology, there is still a dire need to further propagate these concepts to practical problems and to society in general. Following a recent line of work, we use fault trees to address the task of probabilistic risk analysis and to support related decision and management problems. Fault trees allow us to decompose the assessment of health risk into individual manageable modules, thus tackling a complex system by a structural divide and conquer approach. The complexity within each module can be chosen individually according to data availability, parsimony, relative importance, and stage of analysis. Three differences are highlighted in this paper when compared to previous works: (1) The fault tree proposed here accounts for the uncertainty in both hydrological and health components, (2) system failure within the fault tree is defined in terms of risk being above a threshold value, whereas previous studies that used fault trees used auxiliary events such as exceedance of critical concentration levels, and (3) we introduce a new form of stochastic fault tree that allows us to weaken the assumption of independent subsystems that is required by a classical fault tree approach. We illustrate our concept in a simple groundwater-related setting.
Li, Qiuying; Pham, Hoang
2017-01-01
In this paper, we propose a software reliability model that considers not only error generation but also fault removal efficiency combined with testing coverage information based on a nonhomogeneous Poisson process (NHPP). During the past four decades, many software reliability growth models (SRGMs) based on NHPP have been proposed to estimate the software reliability measures, most of which have the same following agreements: 1) it is a common phenomenon that during the testing phase, the fault detection rate always changes; 2) as a result of imperfect debugging, fault removal has been related to a fault re-introduction rate. But there are few SRGMs in the literature that differentiate between fault detection and fault removal, i.e. they seldom consider the imperfect fault removal efficiency. But in practical software developing process, fault removal efficiency cannot always be perfect, i.e. the failures detected might not be removed completely and the original faults might still exist and new faults might be introduced meanwhile, which is referred to as imperfect debugging phenomenon. In this study, a model aiming to incorporate fault introduction rate, fault removal efficiency and testing coverage into software reliability evaluation is developed, using testing coverage to express the fault detection rate and using fault removal efficiency to consider the fault repair. We compare the performance of the proposed model with several existing NHPP SRGMs using three sets of real failure data based on five criteria. The results exhibit that the model can give a better fitting and predictive performance.
Experimental evaluation of the certification-trail method
NASA Technical Reports Server (NTRS)
Sullivan, Gregory F.; Wilson, Dwight S.; Masson, Gerald M.; Itoh, Mamoru; Smith, Warren W.; Kay, Jonathan S.
1993-01-01
Certification trails are a recently introduced and promising approach to fault-detection and fault-tolerance. A comprehensive attempt to assess experimentally the performance and overall value of the method is reported. The method is applied to algorithms for the following problems: huffman tree, shortest path, minimum spanning tree, sorting, and convex hull. Our results reveal many cases in which an approach using certification-trails allows for significantly faster overall program execution time than a basic time redundancy-approach. Algorithms for the answer-validation problem for abstract data types were also examined. This kind of problem provides a basis for applying the certification-trail method to wide classes of algorithms. Answer-validation solutions for two types of priority queues were implemented and analyzed. In both cases, the algorithm which performs answer-validation is substantially faster than the original algorithm for computing the answer. Next, a probabilistic model and analysis which enables comparison between the certification-trail method and the time-redundancy approach were presented. The analysis reveals some substantial and sometimes surprising advantages for ther certification-trail method. Finally, the work our group performed on the design and implementation of fault injection testbeds for experimental analysis of the certification trail technique is discussed. This work employs two distinct methodologies, software fault injection (modification of instruction, data, and stack segments of programs on a Sun Sparcstation ELC and on an IBM 386 PC) and hardware fault injection (control, address, and data lines of a Motorola MC68000-based target system pulsed at logical zero/one values). Our results indicate the viability of the certification trail technique. It is also believed that the tools developed provide a solid base for additional exploration.
A methodology for testing fault-tolerant software
NASA Technical Reports Server (NTRS)
Andrews, D. M.; Mahmood, A.; Mccluskey, E. J.
1985-01-01
A methodology for testing fault tolerant software is presented. There are problems associated with testing fault tolerant software because many errors are masked or corrected by voters, limiter, or automatic channel synchronization. This methodology illustrates how the same strategies used for testing fault tolerant hardware can be applied to testing fault tolerant software. For example, one strategy used in testing fault tolerant hardware is to disable the redundancy during testing. A similar testing strategy is proposed for software, namely, to move the major emphasis on testing earlier in the development cycle (before the redundancy is in place) thus reducing the possibility that undetected errors will be masked when limiters and voters are added.
Planning effectiveness may grow on fault trees.
Chow, C W; Haddad, K; Mannino, B
1991-10-01
The first step of a strategic planning process--identifying and analyzing threats and opportunities--requires subjective judgments. By using an analytical tool known as a fault tree, healthcare administrators can reduce the unreliability of subjective decision making by creating a logical structure for problem solving and decision making. A case study of 11 healthcare administrators showed that an analysis technique called prospective hindsight can add to a fault tree's ability to improve a strategic planning process.
NASA Astrophysics Data System (ADS)
Batzias, Dimitris F.
2012-12-01
Fault Tree Analysis (FTA) can be used for technology transfer when the relevant problem (called 'top even' in FTA) is solved in a technology centre and the results are diffused to interested parties (usually Small Medium Enterprises - SMEs) that have not the proper equipment and the required know-how to solve the problem by their own. Nevertheless, there is a significant drawback in this procedure: the information usually provided by the SMEs to the technology centre, about production conditions and corresponding quality characteristics of the product, and (sometimes) the relevant expertise in the Knowledge Base of this centre may be inadequate to form a complete fault tree. Since such cases are quite frequent in practice, we have developed a methodology for transforming incomplete fault tree to Ishikawa diagram, which is more flexible and less strict in establishing causal chains, because it uses a surface phenomenological level with a limited number of categories of faults. On the other hand, such an Ishikawa diagram can be extended to simulate a fault tree as relevant knowledge increases. An implementation of this transformation, referring to anodization of aluminium, is presented.
Li, Qiuying; Pham, Hoang
2017-01-01
In this paper, we propose a software reliability model that considers not only error generation but also fault removal efficiency combined with testing coverage information based on a nonhomogeneous Poisson process (NHPP). During the past four decades, many software reliability growth models (SRGMs) based on NHPP have been proposed to estimate the software reliability measures, most of which have the same following agreements: 1) it is a common phenomenon that during the testing phase, the fault detection rate always changes; 2) as a result of imperfect debugging, fault removal has been related to a fault re-introduction rate. But there are few SRGMs in the literature that differentiate between fault detection and fault removal, i.e. they seldom consider the imperfect fault removal efficiency. But in practical software developing process, fault removal efficiency cannot always be perfect, i.e. the failures detected might not be removed completely and the original faults might still exist and new faults might be introduced meanwhile, which is referred to as imperfect debugging phenomenon. In this study, a model aiming to incorporate fault introduction rate, fault removal efficiency and testing coverage into software reliability evaluation is developed, using testing coverage to express the fault detection rate and using fault removal efficiency to consider the fault repair. We compare the performance of the proposed model with several existing NHPP SRGMs using three sets of real failure data based on five criteria. The results exhibit that the model can give a better fitting and predictive performance. PMID:28750091
A systematic risk management approach employed on the CloudSat project
NASA Technical Reports Server (NTRS)
Basilio, R. R.; Plourde, K. S.; Lam, T.
2000-01-01
The CloudSat Project has developed a simplified approach for fault tree analysis and probabilistic risk assessment. A system-level fault tree has been constructed to identify credible fault scenarios and failure modes leading up to a potential failure to meet the nominal mission success criteria.
Fault Tree Analysis: A Bibliography
NASA Technical Reports Server (NTRS)
2000-01-01
Fault tree analysis is a top-down approach to the identification of process hazards. It is as one of the best methods for systematically identifying an graphically displaying the many ways some things can go wrong. This bibliography references 266 documents in the NASA STI Database that contain the major concepts. fault tree analysis, risk an probability theory, in the basic index or major subject terms. An abstract is included with most citations, followed by the applicable subject terms.
Practical Issues in Implementing Software Reliability Measurement
NASA Technical Reports Server (NTRS)
Nikora, Allen P.; Schneidewind, Norman F.; Everett, William W.; Munson, John C.; Vouk, Mladen A.; Musa, John D.
1999-01-01
Many ways of estimating software systems' reliability, or reliability-related quantities, have been developed over the past several years. Of particular interest are methods that can be used to estimate a software system's fault content prior to test, or to discriminate between components that are fault-prone and those that are not. The results of these methods can be used to: 1) More accurately focus scarce fault identification resources on those portions of a software system most in need of it. 2) Estimate and forecast the risk of exposure to residual faults in a software system during operation, and develop risk and safety criteria to guide the release of a software system to fielded use. 3) Estimate the efficiency of test suites in detecting residual faults. 4) Estimate the stability of the software maintenance process.
Assessing Survivability Using Software Fault Injection
2001-04-01
UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADPO10875 TITLE: Assessing Survivability Using Software Fault Injection...Esc to exit .......................................................................... = 11-1 Assessing Survivability Using Software Fault Injection...Jeffrey Voas Reliable Software Technologies 21351 Ridgetop Circle, #400 Dulles, VA 20166 jmvoas@rstcorp.crom Abstract approved sources have the
The use of automatic programming techniques for fault tolerant computing systems
NASA Technical Reports Server (NTRS)
Wild, C.
1985-01-01
It is conjectured that the production of software for ultra-reliable computing systems such as required by Space Station, aircraft, nuclear power plants and the like will require a high degree of automation as well as fault tolerance. In this paper, the relationship between automatic programming techniques and fault tolerant computing systems is explored. Initial efforts in the automatic synthesis of code from assertions to be used for error detection as well as the automatic generation of assertions and test cases from abstract data type specifications is outlined. Speculation on the ability to generate truly diverse designs capable of recovery from errors by exploring alternate paths in the program synthesis tree is discussed. Some initial thoughts on the use of knowledge based systems for the global detection of abnormal behavior using expectations and the goal-directed reconfiguration of resources to meet critical mission objectives are given. One of the sources of information for these systems would be the knowledge captured during the automatic programming process.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarrack, A.G.
The purpose of this report is to document fault tree analyses which have been completed for the Defense Waste Processing Facility (DWPF) safety analysis. Logic models for equipment failures and human error combinations that could lead to flammable gas explosions in various process tanks, or failure of critical support systems were developed for internal initiating events and for earthquakes. These fault trees provide frequency estimates for support systems failures and accidents that could lead to radioactive and hazardous chemical releases both on-site and off-site. Top event frequency results from these fault trees will be used in further APET analyses tomore » calculate accident risk associated with DWPF facility operations. This report lists and explains important underlying assumptions, provides references for failure data sources, and briefly describes the fault tree method used. Specific commitments from DWPF to provide new procedural/administrative controls or system design changes are listed in the ''Facility Commitments'' section. The purpose of the ''Assumptions'' section is to clarify the basis for fault tree modeling, and is not necessarily a list of items required to be protected by Technical Safety Requirements (TSRs).« less
Graphical fault tree analysis for fatal falls in the construction industry.
Chi, Chia-Fen; Lin, Syuan-Zih; Dewi, Ratna Sari
2014-11-01
The current study applied a fault tree analysis to represent the causal relationships among events and causes that contributed to fatal falls in the construction industry. Four hundred and eleven work-related fatalities in the Taiwanese construction industry were analyzed in terms of age, gender, experience, falling site, falling height, company size, and the causes for each fatality. Given that most fatal accidents involve multiple events, the current study coded up to a maximum of three causes for each fall fatality. After the Boolean algebra and minimal cut set analyses, accident causes associated with each falling site can be presented as a fault tree to provide an overview of the basic causes, which could trigger fall fatalities in the construction industry. Graphical icons were designed for each falling site along with the associated accident causes to illustrate the fault tree in a graphical manner. A graphical fault tree can improve inter-disciplinary discussion of risk management and the communication of accident causation to first line supervisors. Copyright © 2014 Elsevier Ltd. All rights reserved.
Fault Tree Analysis for an Inspection Robot in a Nuclear Power Plant
NASA Astrophysics Data System (ADS)
Ferguson, Thomas A.; Lu, Lixuan
2017-09-01
The life extension of current nuclear reactors has led to an increasing demand on inspection and maintenance of critical reactor components that are too expensive to replace. To reduce the exposure dosage to workers, robotics have become an attractive alternative as a preventative safety tool in nuclear power plants. It is crucial to understand the reliability of these robots in order to increase the veracity and confidence of their results. This study presents the Fault Tree (FT) analysis to a coolant outlet piper snake-arm inspection robot in a nuclear power plant. Fault trees were constructed for a qualitative analysis to determine the reliability of the robot. Insight on the applicability of fault tree methods for inspection robotics in the nuclear industry is gained through this investigation.
Interim reliability evaluation program, Browns Ferry fault trees
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stewart, M.E.
1981-01-01
An abbreviated fault tree method is used to evaluate and model Browns Ferry systems in the Interim Reliability Evaluation programs, simplifying the recording and displaying of events, yet maintaining the system of identifying faults. The level of investigation is not changed. The analytical thought process inherent in the conventional method is not compromised. But the abbreviated method takes less time, and the fault modes are much more visible.
NASA Technical Reports Server (NTRS)
English, Thomas
2005-01-01
A standard tool of reliability analysis used at NASA-JSC is the event tree. An event tree is simply a probability tree, with the probabilities determining the next step through the tree specified at each node. The nodal probabilities are determined by a reliability study of the physical system at work for a particular node. The reliability study performed at a node is typically referred to as a fault tree analysis, with the potential of a fault tree existing.for each node on the event tree. When examining an event tree it is obvious why the event tree/fault tree approach has been adopted. Typical event trees are quite complex in nature, and the event tree/fault tree approach provides a systematic and organized approach to reliability analysis. The purpose of this study was two fold. Firstly, we wanted to explore the possibility that a semi-Markov process can create dependencies between sojourn times (the times it takes to transition from one state to the next) that can decrease the uncertainty when estimating time to failures. Using a generalized semi-Markov model, we studied a four element reliability model and were able to demonstrate such sojourn time dependencies. Secondly, we wanted to study the use of semi-Markov processes to introduce a time variable into the event tree diagrams that are commonly developed in PRA (Probabilistic Risk Assessment) analyses. Event tree end states which change with time are more representative of failure scenarios than are the usual static probability-derived end states.
The cost of software fault tolerance
NASA Technical Reports Server (NTRS)
Migneault, G. E.
1982-01-01
The proposed use of software fault tolerance techniques as a means of reducing software costs in avionics and as a means of addressing the issue of system unreliability due to faults in software is examined. A model is developed to provide a view of the relationships among cost, redundancy, and reliability which suggests strategies for software development and maintenance which are not conventional.
Structural system reliability calculation using a probabilistic fault tree analysis method
NASA Technical Reports Server (NTRS)
Torng, T. Y.; Wu, Y.-T.; Millwater, H. R.
1992-01-01
The development of a new probabilistic fault tree analysis (PFTA) method for calculating structural system reliability is summarized. The proposed PFTA procedure includes: developing a fault tree to represent the complex structural system, constructing an approximation function for each bottom event, determining a dominant sampling sequence for all bottom events, and calculating the system reliability using an adaptive importance sampling method. PFTA is suitable for complicated structural problems that require computer-intensive computer calculations. A computer program has been developed to implement the PFTA.
Study of fault tolerant software technology for dynamic systems
NASA Technical Reports Server (NTRS)
Caglayan, A. K.; Zacharias, G. L.
1985-01-01
The major aim of this study is to investigate the feasibility of using systems-based failure detection isolation and compensation (FDIC) techniques in building fault-tolerant software and extending them, whenever possible, to the domain of software fault tolerance. First, it is shown that systems-based FDIC methods can be extended to develop software error detection techniques by using system models for software modules. In particular, it is demonstrated that systems-based FDIC techniques can yield consistency checks that are easier to implement than acceptance tests based on software specifications. Next, it is shown that systems-based failure compensation techniques can be generalized to the domain of software fault tolerance in developing software error recovery procedures. Finally, the feasibility of using fault-tolerant software in flight software is investigated. In particular, possible system and version instabilities, and functional performance degradation that may occur in N-Version programming applications to flight software are illustrated. Finally, a comparative analysis of N-Version and recovery block techniques in the context of generic blocks in flight software is presented.
Using Fault Trees to Advance Understanding of Diagnostic Errors.
Rogith, Deevakar; Iyengar, M Sriram; Singh, Hardeep
2017-11-01
Diagnostic errors annually affect at least 5% of adults in the outpatient setting in the United States. Formal analytic techniques are only infrequently used to understand them, in part because of the complexity of diagnostic processes and clinical work flows involved. In this article, diagnostic errors were modeled using fault tree analysis (FTA), a form of root cause analysis that has been successfully used in other high-complexity, high-risk contexts. How factors contributing to diagnostic errors can be systematically modeled by FTA to inform error understanding and error prevention is demonstrated. A team of three experts reviewed 10 published cases of diagnostic error and constructed fault trees. The fault trees were modeled according to currently available conceptual frameworks characterizing diagnostic error. The 10 trees were then synthesized into a single fault tree to identify common contributing factors and pathways leading to diagnostic error. FTA is a visual, structured, deductive approach that depicts the temporal sequence of events and their interactions in a formal logical hierarchy. The visual FTA enables easier understanding of causative processes and cognitive and system factors, as well as rapid identification of common pathways and interactions in a unified fashion. In addition, it enables calculation of empirical estimates for causative pathways. Thus, fault trees might provide a useful framework for both quantitative and qualitative analysis of diagnostic errors. Future directions include establishing validity and reliability by modeling a wider range of error cases, conducting quantitative evaluations, and undertaking deeper exploration of other FTA capabilities. Copyright © 2017 The Joint Commission. Published by Elsevier Inc. All rights reserved.
Locating hardware faults in a data communications network of a parallel computer
Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.
2010-01-12
Hardware faults location in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute node as a tree. Locating hardware faults includes identifying a next compute node as a parent node and a root of a parent test tree, identifying for each child compute node of the parent node a child test tree having the child compute node as root, running a same test suite on the parent test tree and each child test tree, and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.
Study of fault-tolerant software technology
NASA Technical Reports Server (NTRS)
Slivinski, T.; Broglio, C.; Wild, C.; Goldberg, J.; Levitt, K.; Hitt, E.; Webb, J.
1984-01-01
Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance.
Khan, F I; Abbasi, S A
2000-07-10
Fault tree analysis (FTA) is based on constructing a hypothetical tree of base events (initiating events) branching into numerous other sub-events, propagating the fault and eventually leading to the top event (accident). It has been a powerful technique used traditionally in identifying hazards in nuclear installations and power industries. As the systematic articulation of the fault tree is associated with assigning probabilities to each fault, the exercise is also sometimes called probabilistic risk assessment. But powerful as this technique is, it is also very cumbersome and costly, limiting its area of application. We have developed a new algorithm based on analytical simulation (named as AS-II), which makes the application of FTA simpler, quicker, and cheaper; thus opening up the possibility of its wider use in risk assessment in chemical process industries. Based on the methodology we have developed a computer-automated tool. The details are presented in this paper.
DG TO FT - AUTOMATIC TRANSLATION OF DIGRAPH TO FAULT TREE MODELS
NASA Technical Reports Server (NTRS)
Iverson, D. L.
1994-01-01
Fault tree and digraph models are frequently used for system failure analysis. Both types of models represent a failure space view of the system using AND and OR nodes in a directed graph structure. Each model has its advantages. While digraphs can be derived in a fairly straightforward manner from system schematics and knowledge about component failure modes and system design, fault tree structure allows for fast processing using efficient techniques developed for tree data structures. The similarities between digraphs and fault trees permits the information encoded in the digraph to be translated into a logically equivalent fault tree. The DG TO FT translation tool will automatically translate digraph models, including those with loops or cycles, into fault tree models that have the same minimum cut set solutions as the input digraph. This tool could be useful, for example, if some parts of a system have been modeled using digraphs and others using fault trees. The digraphs could be translated and incorporated into the fault trees, allowing them to be analyzed using a number of powerful fault tree processing codes, such as cut set and quantitative solution codes. A cut set for a given node is a group of failure events that will cause the failure of the node. A minimum cut set for a node is any cut set that, if any of the failures in the set were to be removed, the occurrence of the other failures in the set will not cause the failure of the event represented by the node. Cut sets calculations can be used to find dependencies, weak links, and vital system components whose failures would cause serious systems failure. The DG TO FT translation system reads in a digraph with each node listed as a separate object in the input file. The user specifies a terminal node for the digraph that will be used as the top node of the resulting fault tree. A fault tree basic event node representing the failure of that digraph node is created and becomes a child of the terminal root node. A subtree is created for each of the inputs to the digraph terminal node and the root of those subtrees are added as children of the top node of the fault tree. Every node in the digraph upstream of the terminal node will be visited and converted. During the conversion process, the algorithm keeps track of the path from the digraph terminal node to the current digraph node. If a node is visited twice, then the program has found a cycle in the digraph. This cycle is broken by finding the minimal cut sets of the twice visited digraph node and forming those cut sets into subtrees. Another implementation of the algorithm resolves loops by building a subtree based on the digraph minimal cut sets calculation. It does not reduce the subtree to minimal cut set form. This second implementation produces larger fault trees, but runs much faster than the version using minimal cut sets since it does not spend time reducing the subtrees to minimal cut sets. The fault trees produced by DG TO FT will contain OR gates, AND gates, Basic Event nodes, and NOP gates. The results of a translation can be output as a text object description of the fault tree similar to the text digraph input format. The translator can also output a LISP language formatted file and an augmented LISP file which can be used by the FTDS (ARC-13019) diagnosis system, available from COSMIC, which performs diagnostic reasoning using the fault tree as a knowledge base. DG TO FT is written in C-language to be machine independent. It has been successfully implemented on a Sun running SunOS, a DECstation running ULTRIX, a Macintosh running System 7, and a DEC VAX running VMS. The RAM requirement varies with the size of the models. DG TO FT is available in UNIX tar format on a .25 inch streaming magnetic tape cartridge (standard distribution) or on a 3.5 inch diskette. It is also available on a 3.5 inch Macintosh format diskette or on a 9-track 1600 BPI magnetic tape in DEC VAX FILES-11 format. Sample input and sample output are provided on the distribution medium. An electronic copy of the documentation in Macintosh Microsoft Word format is provided on the distribution medium. DG TO FT was developed in 1992. Sun, and SunOS are trademarks of Sun Microsystems, Inc. DECstation, ULTRIX, VAX, and VMS are trademarks of Digital Equipment Corporation. UNIX is a registered trademark of AT&T Bell Laboratories. Macintosh is a registered trademark of Apple Computer, Inc. System 7 is a trademark of Apple Computers Inc. Microsoft Word is a trademark of Microsoft Corporation.
NASA Technical Reports Server (NTRS)
Padilla, Peter A.
1991-01-01
An investigation was made in AIRLAB of the fault handling performance of the Fault Tolerant MultiProcessor (FTMP). Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once in every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. Byzantine faults behave such that the faulted unit points to a working unit as the source of errors. The design's problems involve: (1) the design and interface between the simplex error detection hardware and the error processing software, (2) the functional capabilities of the FTMP system bus, and (3) the communication requirements of a multiprocessor architecture. These weak areas in the FTMP's design increase the probability that, for any hardware fault, a good line replacement unit (LRU) is mistakenly disabled by the fault management software.
Detection of faults and software reliability analysis
NASA Technical Reports Server (NTRS)
Knight, J. C.
1986-01-01
Multiversion or N-version programming was proposed as a method of providing fault tolerance in software. The approach requires the separate, independent preparation of multiple versions of a piece of software for some application. Specific topics addressed are: failure probabilities in N-version systems, consistent comparison in N-version systems, descriptions of the faults found in the Knight and Leveson experiment, analytic models of comparison testing, characteristics of the input regions that trigger faults, fault tolerance through data diversity, and the relationship between failures caused by automatically seeded faults.
Coordinated Fault Tolerance for High-Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dongarra, Jack; Bosilca, George; et al.
2013-04-08
Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack—from the application, math libraries, and programming language runtime to other common system software such as jobs schedulers, resource managers, and monitoring tools.
Model Transformation for a System of Systems Dependability Safety Case
NASA Technical Reports Server (NTRS)
Murphy, Judy; Driskell, Stephen B.
2010-01-01
Software plays an increasingly larger role in all aspects of NASA's science missions. This has been extended to the identification, management and control of faults which affect safety-critical functions and by default, the overall success of the mission. Traditionally, the analysis of fault identification, management and control are hardware based. Due to the increasing complexity of system, there has been a corresponding increase in the complexity in fault management software. The NASA Independent Validation & Verification (IV&V) program is creating processes and procedures to identify, and incorporate safety-critical software requirements along with corresponding software faults so that potential hazards may be mitigated. This Specific to Generic ... A Case for Reuse paper describes the phases of a dependability and safety study which identifies a new, process to create a foundation for reusable assets. These assets support the identification and management of specific software faults and, their transformation from specific to generic software faults. This approach also has applications to other systems outside of the NASA environment. This paper addresses how a mission specific dependability and safety case is being transformed to a generic dependability and safety case which can be reused for any type of space mission with an emphasis on software fault conditions.
Preliminary design of the redundant software experiment
NASA Technical Reports Server (NTRS)
Campbell, Roy; Deimel, Lionel; Eckhardt, Dave, Jr.; Kelly, John; Knight, John; Lauterbach, Linda; Lee, Larry; Mcallister, Dave; Mchugh, John
1985-01-01
The goal of the present experiment is to characterize the fault distributions of highly reliable software replicates, constructed using techniques and environments which are similar to those used in comtemporary industrial software facilities. The fault distributions and their effect on the reliability of fault tolerant configurations of the software will be determined through extensive life testing of the replicates against carefully constructed randomly generated test data. Each detected error will be carefully analyzed to provide insight in to their nature and cause. A direct objective is to develop techniques for reducing the intensity of coincident errors, thus increasing the reliability gain which can be achieved with fault tolerance. Data on the reliability gains realized, and the cost of the fault tolerant configurations can be used to design a companion experiment to determine the cost effectiveness of the fault tolerant strategy. Finally, the data and analysis produced by this experiment will be valuable to the software engineering community as a whole because it will provide a useful insight into the nature and cause of hard to find, subtle faults which escape standard software engineering validation techniques and thus persist far into the software life cycle.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sattison, M.B.
The Idaho National Engineering Laboratory (INEL) over the three years has created 75 plant-specific Accident Sequence Precursor (ASP) models using the SAPHIRE suite of PRA codes. Along with the new models, the INEL has also developed a new module for SAPHIRE which is tailored specifically to the unique needs of ASP evaluations. These models and software will be the next generation of risk tools for the evaluation of accident precursors by both the U.S. Nuclear Regulatory Commission`s (NRC`s) Office of Nuclear Reactor Regulation (NRR) and the Office for Analysis and Evaluation of Operational Data (AEOD). This paper presents an overviewmore » of the models and software. Key characteristics include: (1) classification of the plant models according to plant response with a unique set of event trees for each plant class, (2) plant-specific fault trees using supercomponents, (3) generation and retention of all system and sequence cutsets, (4) full flexibility in modifying logic, regenerating cutsets, and requantifying results, and (5) user interface for streamlined evaluation of ASP events. Future plans for the ASP models is also presented.« less
Software fault tolerance for real-time avionics systems
NASA Technical Reports Server (NTRS)
Anderson, T.; Knight, J. C.
1983-01-01
Avionics systems have very high reliability requirements and are therefore prime candidates for the inclusion of fault tolerance techniques. In order to provide tolerance to software faults, some form of state restoration is usually advocated as a means of recovery. State restoration can be very expensive for systems which utilize concurrent processes. The concurrency present in most avionics systems and the further difficulties introduced by timing constraints imply that providing tolerance for software faults may be inordinately expensive or complex. A straightforward pragmatic approach to software fault tolerance which is believed to be applicable to many real-time avionics systems is proposed. A classification system for software errors is presented together with approaches to recovery and continued service for each error type.
Reliability database development for use with an object-oriented fault tree evaluation program
NASA Technical Reports Server (NTRS)
Heger, A. Sharif; Harringtton, Robert J.; Koen, Billy V.; Patterson-Hine, F. Ann
1989-01-01
A description is given of the development of a fault-tree analysis method using object-oriented programming. In addition, the authors discuss the programs that have been developed or are under development to connect a fault-tree analysis routine to a reliability database. To assess the performance of the routines, a relational database simulating one of the nuclear power industry databases has been constructed. For a realistic assessment of the results of this project, the use of one of existing nuclear power reliability databases is planned.
Fault diagnosis of power transformer based on fault-tree analysis (FTA)
NASA Astrophysics Data System (ADS)
Wang, Yongliang; Li, Xiaoqiang; Ma, Jianwei; Li, SuoYu
2017-05-01
Power transformers is an important equipment in power plants and substations, power distribution transmission link is made an important hub of power systems. Its performance directly affects the quality and health of the power system reliability and stability. This paper summarizes the five parts according to the fault type power transformers, then from the time dimension divided into three stages of power transformer fault, use DGA routine analysis and infrared diagnostics criterion set power transformer running state, finally, according to the needs of power transformer fault diagnosis, by the general to the section by stepwise refinement of dendritic tree constructed power transformer fault
Design study of Software-Implemented Fault-Tolerance (SIFT) computer
NASA Technical Reports Server (NTRS)
Wensley, J. H.; Goldberg, J.; Green, M. W.; Kutz, W. H.; Levitt, K. N.; Mills, M. E.; Shostak, R. E.; Whiting-Okeefe, P. M.; Zeidler, H. M.
1982-01-01
Software-implemented fault tolerant (SIFT) computer design for commercial aviation is reported. A SIFT design concept is addressed. Alternate strategies for physical implementation are considered. Hardware and software design correctness is addressed. System modeling and effectiveness evaluation are considered from a fault-tolerant point of view.
CUTSETS - MINIMAL CUT SET CALCULATION FOR DIGRAPH AND FAULT TREE RELIABILITY MODELS
NASA Technical Reports Server (NTRS)
Iverson, D. L.
1994-01-01
Fault tree and digraph models are frequently used for system failure analysis. Both type of models represent a failure space view of the system using AND and OR nodes in a directed graph structure. Fault trees must have a tree structure and do not allow cycles or loops in the graph. Digraphs allow any pattern of interconnection between loops in the graphs. A common operation performed on digraph and fault tree models is the calculation of minimal cut sets. A cut set is a set of basic failures that could cause a given target failure event to occur. A minimal cut set for a target event node in a fault tree or digraph is any cut set for the node with the property that if any one of the failures in the set is removed, the occurrence of the other failures in the set will not cause the target failure event. CUTSETS will identify all the minimal cut sets for a given node. The CUTSETS package contains programs that solve for minimal cut sets of fault trees and digraphs using object-oriented programming techniques. These cut set codes can be used to solve graph models for reliability analysis and identify potential single point failures in a modeled system. The fault tree minimal cut set code reads in a fault tree model input file with each node listed in a text format. In the input file the user specifies a top node of the fault tree and a maximum cut set size to be calculated. CUTSETS will find minimal sets of basic events which would cause the failure at the output of a given fault tree gate. The program can find all the minimal cut sets of a node, or minimal cut sets up to a specified size. The algorithm performs a recursive top down parse of the fault tree, starting at the specified top node, and combines the cut sets of each child node into sets of basic event failures that would cause the failure event at the output of that gate. Minimal cut set solutions can be found for all nodes in the fault tree or just for the top node. The digraph cut set code uses the same techniques as the fault tree cut set code, except it includes all upstream digraph nodes in the cut sets for a given node and checks for cycles in the digraph during the solution process. CUTSETS solves for specified nodes and will not automatically solve for all upstream digraph nodes. The cut sets will be output as a text file. CUTSETS includes a utility program that will convert the popular COD format digraph model description files into text input files suitable for use with the CUTSETS programs. FEAT (MSC-21873) and FIRM (MSC-21860) available from COSMIC are examples of programs that produce COD format digraph model description files that may be converted for use with the CUTSETS programs. CUTSETS is written in C-language to be machine independent. It has been successfully implemented on a Sun running SunOS, a DECstation running ULTRIX, a Macintosh running System 7, and a DEC VAX running VMS. The RAM requirement varies with the size of the models. CUTSETS is available in UNIX tar format on a .25 inch streaming magnetic tape cartridge (standard distribution) or on a 3.5 inch diskette. It is also available on a 3.5 inch Macintosh format diskette or on a 9-track 1600 BPI magnetic tape in DEC VAX FILES-11 format. Sample input and sample output are provided on the distribution medium. An electronic copy of the documentation in Macintosh Microsoft Word format is included on the distribution medium. Sun and SunOS are trademarks of Sun Microsystems, Inc. DEC, DeCstation, ULTRIX, VAX, and VMS are trademarks of Digital Equipment Corporation. UNIX is a registered trademark of AT&T Bell Laboratories. Macintosh is a registered trademark of Apple Computer, Inc.
Fault trees for decision making in systems analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lambert, Howard E.
1975-10-09
The application of fault tree analysis (FTA) to system safety and reliability is presented within the framework of system safety analysis. The concepts and techniques involved in manual and automated fault tree construction are described and their differences noted. The theory of mathematical reliability pertinent to FTA is presented with emphasis on engineering applications. An outline of the quantitative reliability techniques of the Reactor Safety Study is given. Concepts of probabilistic importance are presented within the fault tree framework and applied to the areas of system design, diagnosis and simulation. The computer code IMPORTANCE ranks basic events and cut setsmore » according to a sensitivity analysis. A useful feature of the IMPORTANCE code is that it can accept relative failure data as input. The output of the IMPORTANCE code can assist an analyst in finding weaknesses in system design and operation, suggest the most optimal course of system upgrade, and determine the optimal location of sensors within a system. A general simulation model of system failure in terms of fault tree logic is described. The model is intended for efficient diagnosis of the causes of system failure in the event of a system breakdown. It can also be used to assist an operator in making decisions under a time constraint regarding the future course of operations. The model is well suited for computer implementation. New results incorporated in the simulation model include an algorithm to generate repair checklists on the basis of fault tree logic and a one-step-ahead optimization procedure that minimizes the expected time to diagnose system failure.« less
Integration of Advanced Probabilistic Analysis Techniques with Multi-Physics Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cetiner, Mustafa Sacit; none,; Flanagan, George F.
2014-07-30
An integrated simulation platform that couples probabilistic analysis-based tools with model-based simulation tools can provide valuable insights for reactive and proactive responses to plant operating conditions. The objective of this work is to demonstrate the benefits of a partial implementation of the Small Modular Reactor (SMR) Probabilistic Risk Assessment (PRA) Detailed Framework Specification through the coupling of advanced PRA capabilities and accurate multi-physics plant models. Coupling a probabilistic model with a multi-physics model will aid in design, operations, and safety by providing a more accurate understanding of plant behavior. This represents the first attempt at actually integrating these two typesmore » of analyses for a control system used for operations, on a faster than real-time basis. This report documents the development of the basic communication capability to exchange data with the probabilistic model using Reliability Workbench (RWB) and the multi-physics model using Dymola. The communication pathways from injecting a fault (i.e., failing a component) to the probabilistic and multi-physics models were successfully completed. This first version was tested with prototypic models represented in both RWB and Modelica. First, a simple event tree/fault tree (ET/FT) model was created to develop the software code to implement the communication capabilities between the dynamic-link library (dll) and RWB. A program, written in C#, successfully communicates faults to the probabilistic model through the dll. A systems model of the Advanced Liquid-Metal Reactor–Power Reactor Inherently Safe Module (ALMR-PRISM) design developed under another DOE project was upgraded using Dymola to include proper interfaces to allow data exchange with the control application (ConApp). A program, written in C+, successfully communicates faults to the multi-physics model. The results of the example simulation were successfully plotted.« less
Fire safety in transit systems fault tree analysis
DOT National Transportation Integrated Search
1981-09-01
Fire safety countermeasures applicable to transit vehicles are identified and evaluated. This document contains fault trees which illustrate the sequences of events which may lead to a transit-fire related casualty. A description of the basis for the...
A second generation experiment in fault-tolerant software
NASA Technical Reports Server (NTRS)
Knight, J. C.
1986-01-01
The primary goal was to determine whether the application of fault tolerance to software increases its reliability if the cost of production is the same as for an equivalent nonfault tolerance version derived from the same requirements specification. Software development protocols are discussed. The feasibility of adapting to software design fault tolerance the technique of N-fold Modular Redundancy with majority voting was studied.
System Analysis by Mapping a Fault-tree into a Bayesian-network
NASA Astrophysics Data System (ADS)
Sheng, B.; Deng, C.; Wang, Y. H.; Tang, L. H.
2018-05-01
In view of the limitations of fault tree analysis in reliability assessment, Bayesian Network (BN) has been studied as an alternative technology. After a brief introduction to the method for mapping a Fault Tree (FT) into an equivalent BN, equations used to calculate the structure importance degree, the probability importance degree and the critical importance degree are presented. Furthermore, the correctness of these equations is proved mathematically. Combining with an aircraft landing gear’s FT, an equivalent BN is developed and analysed. The results show that richer and more accurate information have been achieved through the BN method than the FT, which demonstrates that the BN is a superior technique in both reliability assessment and fault diagnosis.
A diagnosis system using object-oriented fault tree models
NASA Technical Reports Server (NTRS)
Iverson, David L.; Patterson-Hine, F. A.
1990-01-01
Spaceborne computing systems must provide reliable, continuous operation for extended periods. Due to weight, power, and volume constraints, these systems must manage resources very effectively. A fault diagnosis algorithm is described which enables fast and flexible diagnoses in the dynamic distributed computing environments planned for future space missions. The algorithm uses a knowledge base that is easily changed and updated to reflect current system status. Augmented fault trees represented in an object-oriented form provide deep system knowledge that is easy to access and revise as a system changes. Given such a fault tree, a set of failure events that have occurred, and a set of failure events that have not occurred, this diagnosis system uses forward and backward chaining to propagate causal and temporal information about other failure events in the system being diagnosed. Once the system has established temporal and causal constraints, it reasons backward from heuristically selected failure events to find a set of basic failure events which are a likely cause of the occurrence of the top failure event in the fault tree. The diagnosis system has been implemented in common LISP using Flavors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lumsdaine, Andrew
2013-03-08
The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research with a goal of providing end-to-end fault tolerance on a systemwide basis for applications and other system software. While fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade, it has been treated mostly as a collection of isolated stovepipes. Visibility and response to faults has typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, allowing little flexibility ormore » control on a system-wide basis, making it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. As an example, consider faults such as communication link failures that can be seen by a network library but are not directly visible to the job scheduler, or consider faults related to node failures that can be detected by system monitoring software but are not inherently visible to the resource manager. If information about such faults could be shared by the network libraries or monitoring software, then other system software, such as a resource manager or job scheduler, could ensure that failed nodes or failed network links were excluded from further job allocations and that further diagnosis could be performed. As a founding member and one of the lead developers of the Open MPI project, our efforts over the course of this project have been focused on making Open MPI more robust to failures by supporting various fault tolerance techniques, and using fault information exchange and coordination between MPI and the HPC system software stack from the application, numeric libraries, and programming language runtime to other common system components such as jobs schedulers, resource managers, and monitoring tools.« less
Reset Tree-Based Optical Fault Detection
Lee, Dong-Geon; Choi, Dooho; Seo, Jungtaek; Kim, Howon
2013-01-01
In this paper, we present a new reset tree-based scheme to protect cryptographic hardware against optical fault injection attacks. As one of the most powerful invasive attacks on cryptographic hardware, optical fault attacks cause semiconductors to misbehave by injecting high-energy light into a decapped integrated circuit. The contaminated result from the affected chip is then used to reveal secret information, such as a key, from the cryptographic hardware. Since the advent of such attacks, various countermeasures have been proposed. Although most of these countermeasures are strong, there is still the possibility of attack. In this paper, we present a novel optical fault detection scheme that utilizes the buffers on a circuit's reset signal tree as a fault detection sensor. To evaluate our proposal, we model radiation-induced currents into circuit components and perform a SPICE simulation. The proposed scheme is expected to be used as a supplemental security tool. PMID:23698267
Fault tree applications within the safety program of Idaho Nuclear Corporation
NASA Technical Reports Server (NTRS)
Vesely, W. E.
1971-01-01
Computerized fault tree analyses are used to obtain both qualitative and quantitative information about the safety and reliability of an electrical control system that shuts the reactor down when certain safety criteria are exceeded, in the design of a nuclear plant protection system, and in an investigation of a backup emergency system for reactor shutdown. The fault tree yields the modes by which the system failure or accident will occur, the most critical failure or accident causing areas, detailed failure probabilities, and the response of safety or reliability to design modifications and maintenance schemes.
Coordinated Fault-Tolerance for High-Performance Computing Final Project Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Panda, Dhabaleswar Kumar; Beckman, Pete
2011-07-28
With the Coordinated Infrastructure for Fault Tolerance Systems (CIFTS, as the original project came to be called) project, our aim has been to understand and tackle the following broad research questions, the answers to which will help the HEC community analyze and shape the direction of research in the field of fault tolerance and resiliency on future high-end leadership systems. Will availability of global fault information, obtained by fault information exchange between the different HEC software on a system, allow individual system software to better detect, diagnose, and adaptively respond to faults? If fault-awareness is raised throughout the system throughmore » fault information exchange, is it possible to get all system software working together to provide a more comprehensive end-to-end fault management on the system? What are the missing fault-tolerance features that widely used HEC system software lacks today that would inhibit such software from taking advantage of systemwide global fault information? What are the practical limitations of a systemwide approach for end-to-end fault management based on fault awareness and coordination? What mechanisms, tools, and technologies are needed to bring about fault awareness and coordination of responses on a leadership-class system? What standards, outreach, and community interaction are needed for adoption of the concept of fault awareness and coordination for fault management on future systems? Keeping our overall objectives in mind, the CIFTS team has taken a parallel fourfold approach. Our central goal was to design and implement a light-weight, scalable infrastructure with a simple, standardized interface to allow communication of fault-related information through the system and facilitate coordinated responses. This work led to the development of the Fault Tolerance Backplane (FTB) publish-subscribe API specification, together with a reference implementation and several experimental implementations on top of existing publish-subscribe tools. We enhanced the intrinsic fault tolerance capabilities representative implementations of a variety of key HPC software subsystems and integrated them with the FTB. Targeting software subsystems included: MPI communication libraries, checkpoint/restart libraries, resource managers and job schedulers, and system monitoring tools. Leveraging the aforementioned infrastructure, as well as developing and utilizing additional tools, we have examined issues associated with expanded, end-to-end fault response from both system and application viewpoints. From the standpoint of system operations, we have investigated log and root cause analysis, anomaly detection and fault prediction, and generalized notification mechanisms. Our applications work has included libraries for fault-tolerance linear algebra, application frameworks for coupled multiphysics applications, and external frameworks to support the monitoring and response for general applications. Our final goal was to engage the high-end computing community to increase awareness of tools and issues around coordinated end-to-end fault management.« less
Lognormal Approximations of Fault Tree Uncertainty Distributions.
El-Shanawany, Ashraf Ben; Ardron, Keith H; Walker, Simon P
2018-01-26
Fault trees are used in reliability modeling to create logical models of fault combinations that can lead to undesirable events. The output of a fault tree analysis (the top event probability) is expressed in terms of the failure probabilities of basic events that are input to the model. Typically, the basic event probabilities are not known exactly, but are modeled as probability distributions: therefore, the top event probability is also represented as an uncertainty distribution. Monte Carlo methods are generally used for evaluating the uncertainty distribution, but such calculations are computationally intensive and do not readily reveal the dominant contributors to the uncertainty. In this article, a closed-form approximation for the fault tree top event uncertainty distribution is developed, which is applicable when the uncertainties in the basic events of the model are lognormally distributed. The results of the approximate method are compared with results from two sampling-based methods: namely, the Monte Carlo method and the Wilks method based on order statistics. It is shown that the closed-form expression can provide a reasonable approximation to results obtained by Monte Carlo sampling, without incurring the computational expense. The Wilks method is found to be a useful means of providing an upper bound for the percentiles of the uncertainty distribution while being computationally inexpensive compared with full Monte Carlo sampling. The lognormal approximation method and Wilks's method appear attractive, practical alternatives for the evaluation of uncertainty in the output of fault trees and similar multilinear models. © 2018 Society for Risk Analysis.
Fault Tree Analysis as a Planning and Management Tool: A Case Study
ERIC Educational Resources Information Center
Witkin, Belle Ruth
1977-01-01
Fault Tree Analysis is an operations research technique used to analyse the most probable modes of failure in a system, in order to redesign or monitor the system more closely in order to increase its likelihood of success. (Author)
Modeling and Hazard Analysis Using STPA
NASA Astrophysics Data System (ADS)
Ishimatsu, Takuto; Leveson, Nancy; Thomas, John; Katahira, Masa; Miyamoto, Yuko; Nakao, Haruka
2010-09-01
A joint research project between MIT and JAXA/JAMSS is investigating the application of a new hazard analysis to the system and software in the HTV. Traditional hazard analysis focuses on component failures but software does not fail in this way. Software most often contributes to accidents by commanding the spacecraft into an unsafe state(e.g., turning off the descent engines prematurely) or by not issuing required commands. That makes the standard hazard analysis techniques of limited usefulness on software-intensive systems, which describes most spacecraft built today. STPA is a new hazard analysis technique based on systems theory rather than reliability theory. It treats safety as a control problem rather than a failure problem. The goal of STPA, which is to create a set of scenarios that can lead to a hazard, is the same as FTA but STPA includes a broader set of potential scenarios including those in which no failures occur but the problems arise due to unsafe and unintended interactions among the system components. STPA also provides more guidance to the analysts that traditional fault tree analysis. Functional control diagrams are used to guide the analysis. In addition, JAXA uses a model-based system engineering development environment(created originally by Leveson and called SpecTRM) which also assists in the hazard analysis. One of the advantages of STPA is that it can be applied early in the system engineering and development process in a safety-driven design process where hazard analysis drives the design decisions rather than waiting until reviews identify problems that are then costly or difficult to fix. It can also be applied in an after-the-fact analysis and hazard assessment, which is what we did in this case study. This paper describes the experimental application of STPA to the JAXA HTV in order to determine the feasibility and usefulness of the new hazard analysis technique. Because the HTV was originally developed using fault tree analysis and following the NASA standards for safety-critical systems, the results of our experimental application of STPA can be compared with these more traditional safety engineering approaches in terms of the problems identified and the resources required to use it.
NASA Astrophysics Data System (ADS)
Rodak, C. M.; McHugh, R.; Wei, X.
2016-12-01
The development and combination of horizontal drilling and hydraulic fracturing has unlocked unconventional hydrocarbon reserves around the globe. These advances have triggered a number of concerns regarding aquifer contamination and over-exploitation, leading to scientific studies investigating potential risks posed by directional hydraulic fracturing activities. These studies, balanced with potential economic benefits of energy production, are a crucial source of information for communities considering the development of unconventional reservoirs. However, probabilistic quantification of the overall risk posed by hydraulic fracturing at the system level are rare. Here we present the concept of fault tree analysis to determine the overall probability of groundwater contamination or over-exploitation, broadly referred to as the probability of failure. The potential utility of fault tree analysis for the quantification and communication of risks is approached with a general application. However, the fault tree design is robust and can handle various combinations of regional-specific data pertaining to relevant spatial scales, geological conditions, and industry practices where available. All available data are grouped into quantity and quality-based impacts and sub-divided based on the stage of the hydraulic fracturing process in which the data is relevant as described by the USEPA. Each stage is broken down into the unique basic events required for failure; for example, to quantify the risk of an on-site spill we must consider the likelihood, magnitude, composition, and subsurface transport of the spill. The structure of the fault tree described above can be used to render a highly complex system of variables into a straightforward equation for risk calculation based on Boolean logic. This project shows the utility of fault tree analysis for the visual communication of the potential risks of hydraulic fracturing activities on groundwater resources.
Experiments in fault tolerant software reliability
NASA Technical Reports Server (NTRS)
Mcallister, David F.; Tai, K. C.; Vouk, Mladen A.
1987-01-01
The reliability of voting was evaluated in a fault-tolerant software system for small output spaces. The effectiveness of the back-to-back testing process was investigated. Version 3.0 of the RSDIMU-ATS, a semi-automated test bed for certification testing of RSDIMU software, was prepared and distributed. Software reliability estimation methods based on non-random sampling are being studied. The investigation of existing fault-tolerance models was continued and formulation of new models was initiated.
The Curiosity Mars Rover's Fault Protection Engine
NASA Technical Reports Server (NTRS)
Benowitz, Ed
2014-01-01
The Curiosity Rover, currently operating on Mars, contains flight software onboard to autonomously handle aspects of system fault protection. Over 1000 monitors and 39 responses are present in the flight software. Orchestrating these behaviors is the flight software's fault protection engine. In this paper, we discuss the engine's design, responsibilities, and present some lessons learned for future missions.
Fault Tree Analysis: An Emerging Methodology for Instructional Science.
ERIC Educational Resources Information Center
Wood, R. Kent; And Others
1979-01-01
Describes Fault Tree Analysis, a tool for systems analysis which attempts to identify possible modes of failure in systems to increase the probability of success. The article defines the technique and presents the steps of FTA construction, focusing on its application to education. (RAO)
Hierarchical Simulation to Assess Hardware and Software Dependability
NASA Technical Reports Server (NTRS)
Ries, Gregory Lawrence
1997-01-01
This thesis presents a method for conducting hierarchical simulations to assess system hardware and software dependability. The method is intended to model embedded microprocessor systems. A key contribution of the thesis is the idea of using fault dictionaries to propagate fault effects upward from the level of abstraction where a fault model is assumed to the system level where the ultimate impact of the fault is observed. A second important contribution is the analysis of the software behavior under faults as well as the hardware behavior. The simulation method is demonstrated and validated in four case studies analyzing Myrinet, a commercial, high-speed networking system. One key result from the case studies shows that the simulation method predicts the same fault impact 87.5% of the time as is obtained by similar fault injections into a real Myrinet system. Reasons for the remaining discrepancy are examined in the thesis. A second key result shows the reduction in the number of simulations needed due to the fault dictionary method. In one case study, 500 faults were injected at the chip level, but only 255 propagated to the system level. Of these 255 faults, 110 shared identical fault dictionary entries at the system level and so did not need to be resimulated. The necessary number of system-level simulations was therefore reduced from 500 to 145. Finally, the case studies show how the simulation method can be used to improve the dependability of the target system. The simulation analysis was used to add recovery to the target software for the most common fault propagation mechanisms that would cause the software to hang. After the modification, the number of hangs was reduced by 60% for fault injections into the real system.
Software reliability through fault-avoidance and fault-tolerance
NASA Technical Reports Server (NTRS)
Vouk, Mladen A.; Mcallister, David F.
1993-01-01
Strategies and tools for the testing, risk assessment and risk control of dependable software-based systems were developed. Part of this project consists of studies to enable the transfer of technology to industry, for example the risk management techniques for safety-concious systems. Theoretical investigations of Boolean and Relational Operator (BRO) testing strategy were conducted for condition-based testing. The Basic Graph Generation and Analysis tool (BGG) was extended to fully incorporate several variants of the BRO metric. Single- and multi-phase risk, coverage and time-based models are being developed to provide additional theoretical and empirical basis for estimation of the reliability and availability of large, highly dependable software. A model for software process and risk management was developed. The use of cause-effect graphing for software specification and validation was investigated. Lastly, advanced software fault-tolerance models were studied to provide alternatives and improvements in situations where simple software fault-tolerance strategies break down.
Program listing for fault tree analysis of JPL technical report 32-1542
NASA Technical Reports Server (NTRS)
Chelson, P. O.
1971-01-01
The computer program listing for the MAIN program and those subroutines unique to the fault tree analysis are described. Some subroutines are used for analyzing the reliability block diagram. The program is written in FORTRAN 5 and is running on a UNIVAC 1108.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Powell, Danny H; Elwood Jr, Robert H
2011-01-01
Analysis of the material protection, control, and accountability (MPC&A) system is necessary to understand the limits and vulnerabilities of the system to internal threats. A self-appraisal helps the facility be prepared to respond to internal threats and reduce the risk of theft or diversion of nuclear material. The material control and accountability (MC&A) system effectiveness tool (MSET) fault tree was developed to depict the failure of the MPC&A system as a result of poor practices and random failures in the MC&A system. It can also be employed as a basis for assessing deliberate threats against a facility. MSET uses faultmore » tree analysis, which is a top-down approach to examining system failure. The analysis starts with identifying a potential undesirable event called a 'top event' and then determining the ways it can occur (e.g., 'Fail To Maintain Nuclear Materials Under The Purview Of The MC&A System'). The analysis proceeds by determining how the top event can be caused by individual or combined lower level faults or failures. These faults, which are the causes of the top event, are 'connected' through logic gates. The MSET model uses AND-gates and OR-gates and propagates the effect of event failure using Boolean algebra. To enable the fault tree analysis calculations, the basic events in the fault tree are populated with probability risk values derived by conversion of questionnaire data to numeric values. The basic events are treated as independent variables. This assumption affects the Boolean algebraic calculations used to calculate results. All the necessary calculations are built into the fault tree codes, but it is often useful to estimate the probabilities manually as a check on code functioning. The probability of failure of a given basic event is the probability that the basic event primary question fails to meet the performance metric for that question. The failure probability is related to how well the facility performs the task identified in that basic event over time (not just one performance or exercise). Fault tree calculations provide a failure probability for the top event in the fault tree. The basic fault tree calculations establish a baseline relative risk value for the system. This probability depicts relative risk, not absolute risk. Subsequent calculations are made to evaluate the change in relative risk that would occur if system performance is improved or degraded. During the development effort of MSET, the fault tree analysis program used was SAPHIRE. SAPHIRE is an acronym for 'Systems Analysis Programs for Hands-on Integrated Reliability Evaluations.' Version 1 of the SAPHIRE code was sponsored by the Nuclear Regulatory Commission in 1987 as an innovative way to draw, edit, and analyze graphical fault trees primarily for safe operation of nuclear power reactors. When the fault tree calculations are performed, the fault tree analysis program will produce several reports that can be used to analyze the MPC&A system. SAPHIRE produces reports showing risk importance factors for all basic events in the operational MC&A system. The risk importance information is used to examine the potential impacts when performance of certain basic events increases or decreases. The initial results produced by the SAPHIRE program are considered relative risk values. None of the results can be interpreted as absolute risk values since the basic event probability values represent estimates of risk associated with the performance of MPC&A tasks throughout the material balance area (MBA). The RRR for a basic event represents the decrease in total system risk that would result from improvement of that one event to a perfect performance level. Improvement of the basic event with the greatest RRR value produces a greater decrease in total system risk than improvement of any other basic event. Basic events with the greatest potential for system risk reduction are assigned performance improvement values, and new fault tree calculations show the improvement in total system risk. The operational impact or cost-effectiveness from implementing the performance improvements can then be evaluated. The improvements being evaluated can be system performance improvements, or they can be potential, or actual, upgrades to the system. The RIR for a basic event represents the increase in total system risk that would result from failure of that one event. Failure of the basic event with the greatest RIR value produces a greater increase in total system risk than failure of any other basic event. Basic events with the greatest potential for system risk increase are assigned failure performance values, and new fault tree calculations show the increase in total system risk. This evaluation shows the importance of preventing performance degradation of the basic events. SAPHIRE identifies combinations of basic events where concurrent failure of the events results in failure of the top event.« less
Software reliability models for fault-tolerant avionics computers and related topics
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1987-01-01
Software reliability research is briefly described. General research topics are reliability growth models, quality of software reliability prediction, the complete monotonicity property of reliability growth, conceptual modelling of software failure behavior, assurance of ultrahigh reliability, and analysis techniques for fault-tolerant systems.
Study of a unified hardware and software fault-tolerant architecture
NASA Technical Reports Server (NTRS)
Lala, Jaynarayan; Alger, Linda; Friend, Steven; Greeley, Gregory; Sacco, Stephen; Adams, Stuart
1989-01-01
A unified architectural concept, called the Fault Tolerant Processor Attached Processor (FTP-AP), that can tolerate hardware as well as software faults is proposed for applications requiring ultrareliable computation capability. An emulation of the FTP-AP architecture, consisting of a breadboard Motorola 68010-based quadruply redundant Fault Tolerant Processor, four VAX 750s as attached processors, and four versions of a transport aircraft yaw damper control law, is used as a testbed in the AIRLAB to examine a number of critical issues. Solutions of several basic problems associated with N-Version software are proposed and implemented on the testbed. This includes a confidence voter to resolve coincident errors in N-Version software. A reliability model of N-Version software that is based upon the recent understanding of software failure mechanisms is also developed. The basic FTP-AP architectural concept appears suitable for hosting N-Version application software while at the same time tolerating hardware failures. Architectural enhancements for greater efficiency, software reliability modeling, and N-Version issues that merit further research are identified.
Direct evaluation of fault trees using object-oriented programming techniques
NASA Technical Reports Server (NTRS)
Patterson-Hine, F. A.; Koen, B. V.
1989-01-01
Object-oriented programming techniques are used in an algorithm for the direct evaluation of fault trees. The algorithm combines a simple bottom-up procedure for trees without repeated events with a top-down recursive procedure for trees with repeated events. The object-oriented approach results in a dynamic modularization of the tree at each step in the reduction process. The algorithm reduces the number of recursive calls required to solve trees with repeated events and calculates intermediate results as well as the solution of the top event. The intermediate results can be reused if part of the tree is modified. An example is presented in which the results of the algorithm implemented with conventional techniques are compared to those of the object-oriented approach.
An integrated approach to system design, reliability, and diagnosis
NASA Technical Reports Server (NTRS)
Patterson-Hine, F. A.; Iverson, David L.
1990-01-01
The requirement for ultradependability of computer systems in future avionics and space applications necessitates a top-down, integrated systems engineering approach for design, implementation, testing, and operation. The functional analyses of hardware and software systems must be combined by models that are flexible enough to represent their interactions and behavior. The information contained in these models must be accessible throughout all phases of the system life cycle in order to maintain consistency and accuracy in design and operational decisions. One approach being taken by researchers at Ames Research Center is the creation of an object-oriented environment that integrates information about system components required in the reliability evaluation with behavioral information useful for diagnostic algorithms. Procedures have been developed at Ames that perform reliability evaluations during design and failure diagnoses during system operation. These procedures utilize information from a central source, structured as object-oriented fault trees. Fault trees were selected because they are a flexible model widely used in aerospace applications and because they give a concise, structured representation of system behavior. The utility of this integrated environment for aerospace applications in light of our experiences during its development and use is described. The techniques for reliability evaluation and failure diagnosis are discussed, and current extensions of the environment and areas requiring further development are summarized.
An integrated approach to system design, reliability, and diagnosis
NASA Astrophysics Data System (ADS)
Patterson-Hine, F. A.; Iverson, David L.
1990-12-01
The requirement for ultradependability of computer systems in future avionics and space applications necessitates a top-down, integrated systems engineering approach for design, implementation, testing, and operation. The functional analyses of hardware and software systems must be combined by models that are flexible enough to represent their interactions and behavior. The information contained in these models must be accessible throughout all phases of the system life cycle in order to maintain consistency and accuracy in design and operational decisions. One approach being taken by researchers at Ames Research Center is the creation of an object-oriented environment that integrates information about system components required in the reliability evaluation with behavioral information useful for diagnostic algorithms. Procedures have been developed at Ames that perform reliability evaluations during design and failure diagnoses during system operation. These procedures utilize information from a central source, structured as object-oriented fault trees. Fault trees were selected because they are a flexible model widely used in aerospace applications and because they give a concise, structured representation of system behavior. The utility of this integrated environment for aerospace applications in light of our experiences during its development and use is described. The techniques for reliability evaluation and failure diagnosis are discussed, and current extensions of the environment and areas requiring further development are summarized.
NASA Astrophysics Data System (ADS)
Guns, K. A.; Bennett, R. A.; Blisniuk, K.
2017-12-01
To better evaluate the distribution and transfer of strain and slip along the Southern San Andreas Fault (SSAF) zone in the northern Coachella valley in southern California, we integrate geological and geodetic observations to test whether strain is being transferred away from the SSAF system towards the Eastern California Shear Zone through microblock rotation of the Eastern Transverse Ranges (ETR). The faults of the ETR consist of five east-west trending left lateral strike slip faults that have measured cumulative offsets of up to 20 km and as low as 1 km. Present kinematic and block models present a variety of slip rate estimates, from as low as zero to as high as 7 mm/yr, suggesting a gap in our understanding of what role these faults play in the larger system. To determine whether present-day block rotation along these faults is contributing to strain transfer in the region, we are applying 10Be surface exposure dating methods to observed offset channel and alluvial fan deposits in order to estimate fault slip rates along two faults in the ETR. We present observations of offset geomorphic landforms using field mapping and LiDAR data at three sites along the Blue Cut Fault and one site along the Smoke Tree Wash Fault in Joshua Tree National Park which indicate recent Quaternary fault activity. Initial results of site mapping and clast count analyses reveal at least three stages of offset, including potential Holocene offsets, for one site along the Blue Cut Fault, while preliminary 10Be geochronology is in progress. This geologic slip rate data, combined with our new geodetic surface velocity field derived from updated campaign-based GPS measurements within Joshua Tree National Park will allow us to construct a suite of elastic fault block models to elucidate rates of strain transfer away from the SSAF and how that strain transfer may be affecting the length of the interseismic period along the SSAF.
FAULT TREE ANALYSIS FOR EXPOSURE TO REFRIGERANTS USED FOR AUTOMOTIVE AIR CONDITIONING IN THE U.S.
A fault tree analysis was used to estimate the number of refrigerant exposures of automotive service technicians and vehicle occupants in the United States. Exposures of service technicians can occur when service equipment or automotive air-conditioning systems leak during servic...
A Fault Tree Approach to Analysis of Organizational Communication Systems.
ERIC Educational Resources Information Center
Witkin, Belle Ruth; Stephens, Kent G.
Fault Tree Analysis (FTA) is a method of examing communication in an organization by focusing on: (1) the complex interrelationships in human systems, particularly in communication systems; (2) interactions across subsystems and system boundaries; and (3) the need to select and "prioritize" channels which will eliminate noise in the…
Analyzing Software Errors in Safety-Critical Embedded Systems
NASA Technical Reports Server (NTRS)
Lutz, Robyn R.
1994-01-01
This paper analyzes the root causes of safty-related software faults identified as potentially hazardous to the system are distributed somewhat differently over the set of possible error causes than non-safety-related software faults.
Applying fault tree analysis to the prevention of wrong-site surgery.
Abecassis, Zachary A; McElroy, Lisa M; Patel, Ronak M; Khorzad, Rebeca; Carroll, Charles; Mehrotra, Sanjay
2015-01-01
Wrong-site surgery (WSS) is a rare event that occurs to hundreds of patients each year. Despite national implementation of the Universal Protocol over the past decade, development of effective interventions remains a challenge. We performed a systematic review of the literature reporting root causes of WSS and used the results to perform a fault tree analysis to assess the reliability of the system in preventing WSS and identifying high-priority targets for interventions aimed at reducing WSS. Process components where a single error could result in WSS were labeled with OR gates; process aspects reinforced by verification were labeled with AND gates. The overall redundancy of the system was evaluated based on prevalence of AND gates and OR gates. In total, 37 studies described risk factors for WSS. The fault tree contains 35 faults, most of which fall into five main categories. Despite the Universal Protocol mandating patient verification, surgical site signing, and a brief time-out, a large proportion of the process relies on human transcription and verification. Fault tree analysis provides a standardized perspective of errors or faults within the system of surgical scheduling and site confirmation. It can be adapted by institutions or specialties to lead to more targeted interventions to increase redundancy and reliability within the preoperative process. Copyright © 2015 Elsevier Inc. All rights reserved.
An experiment in software reliability
NASA Technical Reports Server (NTRS)
Dunham, J. R.; Pierce, J. L.
1986-01-01
The results of a software reliability experiment conducted in a controlled laboratory setting are reported. The experiment was undertaken to gather data on software failures and is one in a series of experiments being pursued by the Fault Tolerant Systems Branch of NASA Langley Research Center to find a means of credibly performing reliability evaluations of flight control software. The experiment tests a small sample of implementations of radar tracking software having ultra-reliability requirements and uses n-version programming for error detection, and repetitive run modeling for failure and fault rate estimation. The experiment results agree with those of Nagel and Skrivan in that the program error rates suggest an approximate log-linear pattern and the individual faults occurred with significantly different error rates. Additional analysis of the experimental data raises new questions concerning the phenomenon of interacting faults. This phenomenon may provide one explanation for software reliability decay.
A coverage and slicing dependencies analysis for seeking software security defects.
He, Hui; Zhang, Dongyan; Liu, Min; Zhang, Weizhe; Gao, Dongmin
2014-01-01
Software security defects have a serious impact on the software quality and reliability. It is a major hidden danger for the operation of a system that a software system has some security flaws. When the scale of the software increases, its vulnerability has becoming much more difficult to find out. Once these vulnerabilities are exploited, it may lead to great loss. In this situation, the concept of Software Assurance is carried out by some experts. And the automated fault localization technique is a part of the research of Software Assurance. Currently, automated fault localization method includes coverage based fault localization (CBFL) and program slicing. Both of the methods have their own location advantages and defects. In this paper, we have put forward a new method, named Reverse Data Dependence Analysis Model, which integrates the two methods by analyzing the program structure. On this basis, we finally proposed a new automated fault localization method. This method not only is automation lossless but also changes the basic location unit into single sentence, which makes the location effect more accurate. Through several experiments, we proved that our method is more effective. Furthermore, we analyzed the effectiveness among these existing methods and different faults.
Fault recovery characteristics of the fault tolerant multi-processor
NASA Technical Reports Server (NTRS)
Padilla, Peter A.
1990-01-01
The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.
Langenheim, Victoria E.; Rymer, Michael J.; Catchings, Rufus D.; Goldman, Mark R.; Watt, Janet T.; Powell, Robert E.; Matti, Jonathan C.
2016-03-02
We describe high-resolution gravity and seismic refraction surveys acquired to determine the thickness of valley-fill deposits and to delineate geologic structures that might influence groundwater flow beneath the Smoke Tree Wash area in Joshua Tree National Park. These surveys identified a sedimentary basin that is fault-controlled. A profile across the Smoke Tree Wash fault zone reveals low gravity values and seismic velocities that coincide with a mapped strand of the Smoke Tree Wash fault. Modeling of the gravity data reveals a basin about 2–2.5 km long and 1 km wide that is roughly centered on this mapped strand, and bounded by inferred faults. According to the gravity model the deepest part of the basin is about 270 m, but this area coincides with low velocities that are not characteristic of typical basement complex rocks. Most likely, the density contrast assumed in the inversion is too high or the uncharacteristically low velocities represent highly fractured or weathered basement rocks, or both. A longer seismic profile extending onto basement outcrops would help differentiate which scenario is more accurate. The seismic velocities also determine the depth to water table along the profile to be about 40–60 m, consistent with water levels measured in water wells near the northern end of the profile.
A Fault Tree Approach to Needs Assessment -- An Overview.
ERIC Educational Resources Information Center
Stephens, Kent G.
A "failsafe" technology is presented based on a new unified theory of needs assessment. Basically the paper discusses fault tree analysis as a technique for enhancing the probability of success in any system by analyzing the most likely modes of failure that could occur and then suggesting high priority avoidance strategies for those…
NHPP-Based Software Reliability Models Using Equilibrium Distribution
NASA Astrophysics Data System (ADS)
Xiao, Xiao; Okamura, Hiroyuki; Dohi, Tadashi
Non-homogeneous Poisson processes (NHPPs) have gained much popularity in actual software testing phases to estimate the software reliability, the number of remaining faults in software and the software release timing. In this paper, we propose a new modeling approach for the NHPP-based software reliability models (SRMs) to describe the stochastic behavior of software fault-detection processes. The fundamental idea is to apply the equilibrium distribution to the fault-detection time distribution in NHPP-based modeling. We also develop efficient parameter estimation procedures for the proposed NHPP-based SRMs. Through numerical experiments, it can be concluded that the proposed NHPP-based SRMs outperform the existing ones in many data sets from the perspective of goodness-of-fit and prediction performance.
Huang, Weiqing; Fan, Hongbo; Qiu, Yongfu; Cheng, Zhiyu; Xu, Pingru; Qian, Yu
2016-05-01
Recently, China has frequently experienced large-scale, severe and persistent haze pollution due to surging urbanization and industrialization and a rapid growth in the number of motor vehicles and energy consumption. The vehicle emission due to the consumption of a large number of fossil fuels is no doubt a critical factor of the haze pollution. This work is focused on the causation mechanism of haze pollution related to the vehicle emission for Guangzhou city by employing the Fault Tree Analysis (FTA) method for the first time. With the establishment of the fault tree system of "Haze weather-Vehicle exhausts explosive emission", all of the important risk factors are discussed and identified by using this deductive FTA method. The qualitative and quantitative assessments of the fault tree system are carried out based on the structure, probability and critical importance degree analysis of the risk factors. The study may provide a new simple and effective tool/strategy for the causation mechanism analysis and risk management of haze pollution in China. Copyright © 2016 Elsevier Ltd. All rights reserved.
A study of fault prediction and reliability assessment in the SEL environment
NASA Technical Reports Server (NTRS)
Basili, Victor R.; Patnaik, Debabrata
1986-01-01
An empirical study on estimation and prediction of faults, prediction of fault detection and correction effort, and reliability assessment in the Software Engineering Laboratory environment (SEL) is presented. Fault estimation using empirical relationships and fault prediction using curve fitting method are investigated. Relationships between debugging efforts (fault detection and correction effort) in different test phases are provided, in order to make an early estimate of future debugging effort. This study concludes with the fault analysis, application of a reliability model, and analysis of a normalized metric for reliability assessment and reliability monitoring during development of software.
NASA Astrophysics Data System (ADS)
Sanchez-Vila, X.; de Barros, F.; Bolster, D.; Nowak, W.
2010-12-01
Assessing the potential risk of hydro(geo)logical supply systems to human population is an interdisciplinary field. It relies on the expertise in fields as distant as hydrogeology, medicine, or anthropology, and needs powerful translation concepts to provide decision support and policy making. Reliable health risk estimates need to account for the uncertainties in hydrological, physiological and human behavioral parameters. We propose the use of fault trees to address the task of probabilistic risk analysis (PRA) and to support related management decisions. Fault trees allow decomposing the assessment of health risk into individual manageable modules, thus tackling a complex system by a structural “Divide and Conquer” approach. The complexity within each module can be chosen individually according to data availability, parsimony, relative importance and stage of analysis. The separation in modules allows for a true inter- and multi-disciplinary approach. This presentation highlights the three novel features of our work: (1) we define failure in terms of risk being above a threshold value, whereas previous studies used auxiliary events such as exceedance of critical concentration levels, (2) we plot an integrated fault tree that handles uncertainty in both hydrological and health components in a unified way, and (3) we introduce a new form of stochastic fault tree that allows to weaken the assumption of independent subsystems that is required by a classical fault tree approach. We illustrate our concept in a simple groundwater-related setting.
Measurement and analysis of operating system fault tolerance
NASA Technical Reports Server (NTRS)
Lee, I.; Tang, D.; Iyer, R. K.
1992-01-01
This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.
A fuzzy decision tree for fault classification.
Zio, Enrico; Baraldi, Piero; Popescu, Irina C
2008-02-01
In plant accident management, the control room operators are required to identify the causes of the accident, based on the different patterns of evolution of the monitored process variables thereby developing. This task is often quite challenging, given the large number of process parameters monitored and the intense emotional states under which it is performed. To aid the operators, various techniques of fault classification have been engineered. An important requirement for their practical application is the physical interpretability of the relationships among the process variables underpinning the fault classification. In this view, the present work propounds a fuzzy approach to fault classification, which relies on fuzzy if-then rules inferred from the clustering of available preclassified signal data, which are then organized in a logical and transparent decision tree structure. The advantages offered by the proposed approach are precisely that a transparent fault classification model is mined out of the signal data and that the underlying physical relationships among the process variables are easily interpretable as linguistic if-then rules that can be explicitly visualized in the decision tree structure. The approach is applied to a case study regarding the classification of simulated faults in the feedwater system of a boiling water reactor.
Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer
NASA Technical Reports Server (NTRS)
Goldberg, J.; Kautz, W. H.; Melliar-Smith, P. M.; Green, M. W.; Levitt, K. N.; Schwartz, R. L.; Weinstock, C. B.
1984-01-01
SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor to processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness.
FTC - THE FAULT-TREE COMPILER (SUN VERSION)
NASA Technical Reports Server (NTRS)
Butler, R. W.
1994-01-01
FTC, the Fault-Tree Compiler program, is a tool used to calculate the top-event probability for a fault-tree. Five different gate types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. The high-level input language is easy to understand and use. In addition, the program supports a hierarchical fault tree definition feature which simplifies the tree-description process and reduces execution time. A rigorous error bound is derived for the solution technique. This bound enables the program to supply an answer precisely (within the limits of double precision floating point arithmetic) at a user-specified number of digits accuracy. The program also facilitates sensitivity analysis with respect to any specified parameter of the fault tree such as a component failure rate or a specific event probability by allowing the user to vary one failure rate or the failure probability over a range of values and plot the results. The mathematical approach chosen to solve a reliability problem may vary with the size and nature of the problem. Although different solution techniques are utilized on different programs, it is possible to have a common input language. The Systems Validation Methods group at NASA Langley Research Center has created a set of programs that form the basis for a reliability analysis workstation. The set of programs are: SURE reliability analysis program (COSMIC program LAR-13789, LAR-14921); the ASSIST specification interface program (LAR-14193, LAR-14923), PAWS/STEM reliability analysis programs (LAR-14165, LAR-14920); and the FTC fault tree tool (LAR-14586, LAR-14922). FTC is used to calculate the top-event probability for a fault tree. PAWS/STEM and SURE are programs which interpret the same SURE language, but utilize different solution methods. ASSIST is a preprocessor that generates SURE language from a more abstract definition. SURE, ASSIST, and PAWS/STEM are also offered as a bundle. Please see the abstract for COS-10039/COS-10041, SARA - SURE/ASSIST Reliability Analysis Workstation, for pricing details. FTC was originally developed for DEC VAX series computers running VMS and was later ported for use on Sun computers running SunOS. The program is written in PASCAL, ANSI compliant C-language, and FORTRAN 77. The TEMPLATE graphics library is required to obtain graphical output. The standard distribution medium for the VMS version of FTC (LAR-14586) is a 9-track 1600 BPI magnetic tape in VMSINSTAL format. It is also available on a TK50 tape cartridge in VMSINSTAL format. Executables are included. The standard distribution medium for the Sun version of FTC (LAR-14922) is a .25 inch streaming magnetic tape cartridge in UNIX tar format. Both Sun3 and Sun4 executables are included. FTC was developed in 1989 and last updated in 1992. DEC, VAX, VMS, and TK50 are trademarks of Digital Equipment Corporation. UNIX is a registered trademark of AT&T Bell Laboratories. SunOS is a trademark of Sun Microsystems, Inc.
FTC - THE FAULT-TREE COMPILER (VAX VMS VERSION)
NASA Technical Reports Server (NTRS)
Butler, R. W.
1994-01-01
FTC, the Fault-Tree Compiler program, is a tool used to calculate the top-event probability for a fault-tree. Five different gate types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. The high-level input language is easy to understand and use. In addition, the program supports a hierarchical fault tree definition feature which simplifies the tree-description process and reduces execution time. A rigorous error bound is derived for the solution technique. This bound enables the program to supply an answer precisely (within the limits of double precision floating point arithmetic) at a user-specified number of digits accuracy. The program also facilitates sensitivity analysis with respect to any specified parameter of the fault tree such as a component failure rate or a specific event probability by allowing the user to vary one failure rate or the failure probability over a range of values and plot the results. The mathematical approach chosen to solve a reliability problem may vary with the size and nature of the problem. Although different solution techniques are utilized on different programs, it is possible to have a common input language. The Systems Validation Methods group at NASA Langley Research Center has created a set of programs that form the basis for a reliability analysis workstation. The set of programs are: SURE reliability analysis program (COSMIC program LAR-13789, LAR-14921); the ASSIST specification interface program (LAR-14193, LAR-14923), PAWS/STEM reliability analysis programs (LAR-14165, LAR-14920); and the FTC fault tree tool (LAR-14586, LAR-14922). FTC is used to calculate the top-event probability for a fault tree. PAWS/STEM and SURE are programs which interpret the same SURE language, but utilize different solution methods. ASSIST is a preprocessor that generates SURE language from a more abstract definition. SURE, ASSIST, and PAWS/STEM are also offered as a bundle. Please see the abstract for COS-10039/COS-10041, SARA - SURE/ASSIST Reliability Analysis Workstation, for pricing details. FTC was originally developed for DEC VAX series computers running VMS and was later ported for use on Sun computers running SunOS. The program is written in PASCAL, ANSI compliant C-language, and FORTRAN 77. The TEMPLATE graphics library is required to obtain graphical output. The standard distribution medium for the VMS version of FTC (LAR-14586) is a 9-track 1600 BPI magnetic tape in VMSINSTAL format. It is also available on a TK50 tape cartridge in VMSINSTAL format. Executables are included. The standard distribution medium for the Sun version of FTC (LAR-14922) is a .25 inch streaming magnetic tape cartridge in UNIX tar format. Both Sun3 and Sun4 executables are included. FTC was developed in 1989 and last updated in 1992. DEC, VAX, VMS, and TK50 are trademarks of Digital Equipment Corporation. UNIX is a registered trademark of AT&T Bell Laboratories. SunOS is a trademark of Sun Microsystems, Inc.
SPACE PROPULSION SYSTEM PHASED-MISSION PROBABILITY ANALYSIS USING CONVENTIONAL PRA METHODS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curtis Smith; James Knudsen
As part of a series of papers on the topic of advance probabilistic methods, a benchmark phased-mission problem has been suggested. This problem consists of modeling a space mission using an ion propulsion system, where the mission consists of seven mission phases. The mission requires that the propulsion operate for several phases, where the configuration changes as a function of phase. The ion propulsion system itself consists of five thruster assemblies and a single propellant supply, where each thruster assembly has one propulsion power unit and two ion engines. In this paper, we evaluate the probability of mission failure usingmore » the conventional methodology of event tree/fault tree analysis. The event tree and fault trees are developed and analyzed using Systems Analysis Programs for Hands-on Integrated Reliability Evaluations (SAPHIRE). While the benchmark problem is nominally a "dynamic" problem, in our analysis the mission phases are modeled in a single event tree to show the progression from one phase to the next. The propulsion system is modeled in fault trees to account for the operation; or in this case, the failure of the system. Specifically, the propulsion system is decomposed into each of the five thruster assemblies and fed into the appropriate N-out-of-M gate to evaluate mission failure. A separate fault tree for the propulsion system is developed to account for the different success criteria of each mission phase. Common-cause failure modeling is treated using traditional (i.e., parametrically) methods. As part of this paper, we discuss the overall results in addition to the positive and negative aspects of modeling dynamic situations with non-dynamic modeling techniques. One insight from the use of this conventional method for analyzing the benchmark problem is that it requires significant manual manipulation to the fault trees and how they are linked into the event tree. The conventional method also requires editing the resultant cut sets to obtain the correct results. While conventional methods may be used to evaluate a dynamic system like that in the benchmark, the level of effort required may preclude its use on real-world problems.« less
Analyzing and Predicting Effort Associated with Finding and Fixing Software Faults
NASA Technical Reports Server (NTRS)
Hamill, Maggie; Goseva-Popstojanova, Katerina
2016-01-01
Context: Software developers spend a significant amount of time fixing faults. However, not many papers have addressed the actual effort needed to fix software faults. Objective: The objective of this paper is twofold: (1) analysis of the effort needed to fix software faults and how it was affected by several factors and (2) prediction of the level of fix implementation effort based on the information provided in software change requests. Method: The work is based on data related to 1200 failures, extracted from the change tracking system of a large NASA mission. The analysis includes descriptive and inferential statistics. Predictions are made using three supervised machine learning algorithms and three sampling techniques aimed at addressing the imbalanced data problem. Results: Our results show that (1) 83% of the total fix implementation effort was associated with only 20% of failures. (2) Both safety critical failures and post-release failures required three times more effort to fix compared to non-critical and pre-release counterparts, respectively. (3) Failures with fixes spread across multiple components or across multiple types of software artifacts required more effort. The spread across artifacts was more costly than spread across components. (4) Surprisingly, some types of faults associated with later life-cycle activities did not require significant effort. (5) The level of fix implementation effort was predicted with 73% overall accuracy using the original, imbalanced data. Using oversampling techniques improved the overall accuracy up to 77%. More importantly, oversampling significantly improved the prediction of the high level effort, from 31% to around 85%. Conclusions: This paper shows the importance of tying software failures to changes made to fix all associated faults, in one or more software components and/or in one or more software artifacts, and the benefit of studying how the spread of faults and other factors affect the fix implementation effort.
A Fault Tree Approach to Analysis of Behavioral Systems: An Overview.
ERIC Educational Resources Information Center
Stephens, Kent G.
Developed at Brigham Young University, Fault Tree Analysis (FTA) is a technique for enhancing the probability of success in any system by analyzing the most likely modes of failure that could occur. It provides a logical, step-by-step description of possible failure events within a system and their interaction--the combinations of potential…
MER Surface Phase; Blurring the Line Between Fault Protection and What is Supposed to Happen
NASA Technical Reports Server (NTRS)
Reeves, Glenn E.
2008-01-01
An assessment on the limitations of communication with MER rovers and how such constraints drove the system design, flight software and fault protection architecture, blurring the line between traditional fault protection and expected nominal behavior, and requiring the most novel autonomous and semi-autonomous elements of the vehicle software including communication, surface mobility, attitude knowledge acquisition, fault protection, and the activity arbitration service.
The engine fuel system fault analysis
NASA Astrophysics Data System (ADS)
Zhang, Yong; Song, Hanqiang; Yang, Changsheng; Zhao, Wei
2017-05-01
For improving the reliability of the engine fuel system, the typical fault factor of the engine fuel system was analyzed from the point view of structure and functional. The fault character was gotten by building the fuel system fault tree. According the utilizing of fault mode effect analysis method (FMEA), several factors of key component fuel regulator was obtained, which include the fault mode, the fault cause, and the fault influences. All of this made foundation for next development of fault diagnosis system.
Transient Faults in Computer Systems
NASA Technical Reports Server (NTRS)
Masson, Gerald M.
1993-01-01
A powerful technique particularly appropriate for the detection of errors caused by transient faults in computer systems was developed. The technique can be implemented in either software or hardware; the research conducted thus far primarily considered software implementations. The error detection technique developed has the distinct advantage of having provably complete coverage of all errors caused by transient faults that affect the output produced by the execution of a program. In other words, the technique does not have to be tuned to a particular error model to enhance error coverage. Also, the correctness of the technique can be formally verified. The technique uses time and software redundancy. The foundation for an effective, low-overhead, software-based certification trail approach to real-time error detection resulting from transient fault phenomena was developed.
Copilot: Monitoring Embedded Systems
NASA Technical Reports Server (NTRS)
Pike, Lee; Wegmann, Nis; Niller, Sebastian; Goodloe, Alwyn
2012-01-01
Runtime verification (RV) is a natural fit for ultra-critical systems, where correctness is imperative. In ultra-critical systems, even if the software is fault-free, because of the inherent unreliability of commodity hardware and the adversity of operational environments, processing units (and their hosted software) are replicated, and fault-tolerant algorithms are used to compare the outputs. We investigate both software monitoring in distributed fault-tolerant systems, as well as implementing fault-tolerance mechanisms using RV techniques. We describe the Copilot language and compiler, specifically designed for generating monitors for distributed, hard real-time systems. We also describe two case-studies in which we generated Copilot monitors in avionics systems.
Fault tree analysis: NiH2 aerospace cells for LEO mission
NASA Technical Reports Server (NTRS)
Klein, Glenn C.; Rash, Donald E., Jr.
1992-01-01
The Fault Tree Analysis (FTA) is one of several reliability analyses or assessments applied to battery cells to be utilized in typical Electric Power Subsystems for spacecraft in low Earth orbit missions. FTA is generally the process of reviewing and analytically examining a system or equipment in such a way as to emphasize the lower level fault occurrences which directly or indirectly contribute to the major fault or top level event. This qualitative FTA addresses the potential of occurrence for five specific top level events: hydrogen leakage through either discrete leakage paths or through pressure vessel rupture; and four distinct modes of performance degradation - high charge voltage, suppressed discharge voltage, loss of capacity, and high pressure.
Modular techniques for dynamic fault-tree analysis
NASA Technical Reports Server (NTRS)
Patterson-Hine, F. A.; Dugan, Joanne B.
1992-01-01
It is noted that current approaches used to assess the dependability of complex systems such as Space Station Freedom and the Air Traffic Control System are incapable of handling the size and complexity of these highly integrated designs. A novel technique for modeling such systems which is built upon current techniques in Markov theory and combinatorial analysis is described. It enables the development of a hierarchical representation of system behavior which is more flexible than either technique alone. A solution strategy which is based on an object-oriented approach to model representation and evaluation is discussed. The technique is virtually transparent to the user since the fault tree models can be built graphically and the objects defined automatically. The tree modularization procedure allows the two model types, Markov and combinatoric, to coexist and does not require that the entire fault tree be translated to a Markov chain for evaluation. This effectively reduces the size of the Markov chain required and enables solutions with less truncation, making analysis of longer mission times possible. Using the fault-tolerant parallel processor as an example, a model is built and solved for a specific mission scenario and the solution approach is illustrated in detail.
1988-08-20
34 William A. Link, Patuxent Wildlife Research Center "Increasing reliability of multiversion fault-tolerant software design by modulation," Junryo 3... Multiversion lault-Tolerant Software Design by Modularization Junryo Miyashita Department of Computer Science California state University at san Bernardino Fault...They shall beE refered to as " multiversion fault-tolerant software design". Onel problem of developing multi-versions of a program is the high cost
User's guide to programming fault injection and data acquisition in the SIFT environment
NASA Technical Reports Server (NTRS)
Elks, Carl R.; Green, David F.; Palumbo, Daniel L.
1987-01-01
Described are the features, command language, and functional design of the SIFT (Software Implemented Fault Tolerance) fault injection and data acquisition interface software. The document is also intended to assist and guide the SIFT user in defining, developing, and executing SIFT fault injection experiments and the subsequent collection and reduction of that fault injection data. It is also intended to be used in conjunction with the SIFT User's Guide (NASA Technical Memorandum 86289) for reference to SIFT system commands, procedures and functions, and overall guidance in SIFT system programming.
Soft error evaluation and vulnerability analysis in Xilinx Zynq-7010 system-on chip
NASA Astrophysics Data System (ADS)
Du, Xuecheng; He, Chaohui; Liu, Shuhuan; Zhang, Yao; Li, Yonghong; Xiong, Ceng; Tan, Pengkang
2016-09-01
Radiation-induced soft errors are an increasingly important threat to the reliability of modern electronic systems. In order to evaluate system-on chip's reliability and soft error, the fault tree analysis method was used in this work. The system fault tree was constructed based on Xilinx Zynq-7010 All Programmable SoC. Moreover, the soft error rates of different components in Zynq-7010 SoC were tested by americium-241 alpha radiation source. Furthermore, some parameters that used to evaluate the system's reliability and safety were calculated using Isograph Reliability Workbench 11.0, such as failure rate, unavailability and mean time to failure (MTTF). According to fault tree analysis for system-on chip, the critical blocks and system reliability were evaluated through the qualitative and quantitative analysis.
The Infeasibility of Experimental Quantification of Life-Critical Software Reliability
NASA Technical Reports Server (NTRS)
Butler, Ricky W.; Finelli, George B.
1991-01-01
This paper affirms that quantification of life-critical software reliability is infeasible using statistical methods whether applied to standard software or fault-tolerant software. The key assumption of software fault tolerance|separately programmed versions fail independently|is shown to be problematic. This assumption cannot be justified by experimentation in the ultra-reliability region and subjective arguments in its favor are not sufficiently strong to justify it as an axiom. Also, the implications of the recent multi-version software experiments support this affirmation.
Experimental analysis of computer system dependability
NASA Technical Reports Server (NTRS)
Iyer, Ravishankar, K.; Tang, Dong
1993-01-01
This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
Fault Management Techniques in Human Spaceflight Operations
NASA Technical Reports Server (NTRS)
O'Hagan, Brian; Crocker, Alan
2006-01-01
This paper discusses human spaceflight fault management operations. Fault detection and response capabilities available in current US human spaceflight programs Space Shuttle and International Space Station are described while emphasizing system design impacts on operational techniques and constraints. Preflight and inflight processes along with products used to anticipate, mitigate and respond to failures are introduced. Examples of operational products used to support failure responses are presented. Possible improvements in the state of the art, as well as prioritization and success criteria for their implementation are proposed. This paper describes how the architecture of a command and control system impacts operations in areas such as the required fault response times, automated vs. manual fault responses, use of workarounds, etc. The architecture includes the use of redundancy at the system and software function level, software capabilities, use of intelligent or autonomous systems, number and severity of software defects, etc. This in turn drives which Caution and Warning (C&W) events should be annunciated, C&W event classification, operator display designs, crew training, flight control team training, and procedure development. Other factors impacting operations are the complexity of a system, skills needed to understand and operate a system, and the use of commonality vs. optimized solutions for software and responses. Fault detection, annunciation, safing responses, and recovery capabilities are explored using real examples to uncover underlying philosophies and constraints. These factors directly impact operations in that the crew and flight control team need to understand what happened, why it happened, what the system is doing, and what, if any, corrective actions they need to perform. If a fault results in multiple C&W events, or if several faults occur simultaneously, the root cause(s) of the fault(s), as well as their vehicle-wide impacts, must be determined in order to maintain situational awareness. This allows both automated and manual recovery operations to focus on the real cause of the fault(s). An appropriate balance must be struck between correcting the root cause failure and addressing the impacts of that fault on other vehicle components. Lastly, this paper presents a strategy for using lessons learned to improve the software, displays, and procedures in addition to determining what is a candidate for automation. Enabling technologies and techniques are identified to promote system evolution from one that requires manual fault responses to one that uses automation and autonomy where they are most effective. These considerations include the value in correcting software defects in a timely manner, automation of repetitive tasks, making time critical responses autonomous, etc. The paper recommends the appropriate use of intelligent systems to determine the root causes of faults and correctly identify separate unrelated faults.
Software fault tolerance in computer operating systems
NASA Technical Reports Server (NTRS)
Iyer, Ravishankar K.; Lee, Inhwan
1994-01-01
This chapter provides data and analysis of the dependability and fault tolerance for three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of defects in system software that result in processor failures. The loose coupling between processors which results in the backup execution (the processor state and the sequence of events occurring) being different from the original execution is a major reason for the measured software fault tolerance. The IBM/MVS system fault tolerance almost doubles when recovery routines are provided, in comparison to the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.
Decision tree and PCA-based fault diagnosis of rotating machinery
NASA Astrophysics Data System (ADS)
Sun, Weixiang; Chen, Jin; Li, Jiaqing
2007-04-01
After analysing the flaws of conventional fault diagnosis methods, data mining technology is introduced to fault diagnosis field, and a new method based on C4.5 decision tree and principal component analysis (PCA) is proposed. In this method, PCA is used to reduce features after data collection, preprocessing and feature extraction. Then, C4.5 is trained by using the samples to generate a decision tree model with diagnosis knowledge. At last the tree model is used to make diagnosis analysis. To validate the method proposed, six kinds of running states (normal or without any defect, unbalance, rotor radial rub, oil whirl, shaft crack and a simultaneous state of unbalance and radial rub), are simulated on Bently Rotor Kit RK4 to test C4.5 and PCA-based method and back-propagation neural network (BPNN). The result shows that C4.5 and PCA-based diagnosis method has higher accuracy and needs less training time than BPNN.
Diagnostics Tools Identify Faults Prior to Failure
NASA Technical Reports Server (NTRS)
2013-01-01
Through the SBIR program, Rochester, New York-based Impact Technologies LLC collaborated with Ames Research Center to commercialize the Center s Hybrid Diagnostic Engine, or HyDE, software. The fault detecting program is now incorporated into a software suite that identifies potential faults early in the design phase of systems ranging from printers to vehicles and robots, saving time and money.
Hardware Fault Simulator for Microprocessors
NASA Technical Reports Server (NTRS)
Hess, L. M.; Timoc, C. C.
1983-01-01
Breadboarded circuit is faster and more thorough than software simulator. Elementary fault simulator for AND gate uses three gates and shaft register to simulate stuck-at-one or stuck-at-zero conditions at inputs and output. Experimental results showed hardware fault simulator for microprocessor gave faster results than software simulator, by two orders of magnitude, with one test being applied every 4 microseconds.
NASA Technical Reports Server (NTRS)
Chang, Chi-Yung (Inventor); Fang, Wai-Chi (Inventor); Curlander, John C. (Inventor)
1995-01-01
A system for data compression utilizing systolic array architecture for Vector Quantization (VQ) is disclosed for both full-searched and tree-searched. For a tree-searched VQ, the special case of a Binary Tree-Search VQ (BTSVQ) is disclosed with identical Processing Elements (PE) in the array for both a Raw-Codebook VQ (RCVQ) and a Difference-Codebook VQ (DCVQ) algorithm. A fault tolerant system is disclosed which allows a PE that has developed a fault to be bypassed in the array and replaced by a spare at the end of the array, with codebook memory assignment shifted one PE past the faulty PE of the array.
Fault tree analysis for system modeling in case of intentional EMI
NASA Astrophysics Data System (ADS)
Genender, E.; Mleczko, M.; Döring, O.; Garbe, H.; Potthast, S.
2011-08-01
The complexity of modern systems on the one hand and the rising threat of intentional electromagnetic interference (IEMI) on the other hand increase the necessity for systematical risk analysis. Most of the problems can not be treated deterministically since slight changes in the configuration (source, position, polarization, ...) can dramatically change the outcome of an event. For that purpose, methods known from probabilistic risk analysis can be applied. One of the most common approaches is the fault tree analysis (FTA). The FTA is used to determine the system failure probability and also the main contributors to its failure. In this paper the fault tree analysis is introduced and a possible application of that method is shown using a small computer network as an example. The constraints of this methods are explained and conclusions for further research are drawn.
NASA Astrophysics Data System (ADS)
Akinci, A.; Pace, B.
2017-12-01
In this study, we discuss the seismic hazard variability of peak ground acceleration (PGA) at 475 years return period in the Southern Apennines of Italy. The uncertainty and parametric sensitivity are presented to quantify the impact of the several fault parameters on ground motion predictions for 10% exceedance in 50-year hazard. A time-independent PSHA model is constructed based on the long-term recurrence behavior of seismogenic faults adopting the characteristic earthquake model for those sources capable of rupturing the entire fault segment with a single maximum magnitude. The fault-based source model uses the dimensions and slip rates of mapped fault to develop magnitude-frequency estimates for characteristic earthquakes. Variability of the selected fault parameter is given with a truncated normal random variable distribution presented by standard deviation about a mean value. A Monte Carlo approach, based on the random balanced sampling by logic tree, is used in order to capture the uncertainty in seismic hazard calculations. For generating both uncertainty and sensitivity maps, we perform 200 simulations for each of the fault parameters. The results are synthesized both in frequency-magnitude distribution of modeled faults as well as the different maps: the overall uncertainty maps provide a confidence interval for the PGA values and the parameter uncertainty maps determine the sensitivity of hazard assessment to variability of every logic tree branch. These branches of logic tree, analyzed through the Monte Carlo approach, are maximum magnitudes, fault length, fault width, fault dip and slip rates. The overall variability of these parameters is determined by varying them simultaneously in the hazard calculations while the sensitivity of each parameter to overall variability is determined varying each of the fault parameters while fixing others. However, in this study we do not investigate the sensitivity of mean hazard results to the consideration of different GMPEs. Distribution of possible seismic hazard results is illustrated by 95% confidence factor map, which indicates the dispersion about mean value, and coefficient of variation map, which shows percent variability. The results of our study clearly illustrate the influence of active fault parameters to probabilistic seismic hazard maps.
NASA Technical Reports Server (NTRS)
Dunham, J. R. (Editor); Knight, J. C. (Editor)
1982-01-01
The state of the art in the production of crucial software for flight control applications was addressed. The association between reliability metrics and software is considered. Thirteen software development projects are discussed. A short term need for research in the areas of tool development and software fault tolerance was indicated. For the long term, research in format verification or proof methods was recommended. Formal specification and software reliability modeling, were recommended as topics for both short and long term research.
Algorithm-Based Fault Tolerance for Numerical Subroutines
NASA Technical Reports Server (NTRS)
Tumon, Michael; Granat, Robert; Lou, John
2007-01-01
A software library implements a new methodology of detecting faults in numerical subroutines, thus enabling application programs that contain the subroutines to recover transparently from single-event upsets. The software library in question is fault-detecting middleware that is wrapped around the numericalsubroutines. Conventional serial versions (based on LAPACK and FFTW) and a parallel version (based on ScaLAPACK) exist. The source code of the application program that contains the numerical subroutines is not modified, and the middleware is transparent to the user. The methodology used is a type of algorithm- based fault tolerance (ABFT). In ABFT, a checksum is computed before a computation and compared with the checksum of the computational result; an error is declared if the difference between the checksums exceeds some threshold. Novel normalization methods are used in the checksum comparison to ensure correct fault detections independent of algorithm inputs. In tests of this software reported in the peer-reviewed literature, this library was shown to enable detection of 99.9 percent of significant faults while generating no false alarms.
Applications of Logic Coverage Criteria and Logic Mutation to Software Testing
ERIC Educational Resources Information Center
Kaminski, Garrett K.
2011-01-01
Logic is an important component of software. Thus, software logic testing has enjoyed significant research over a period of decades, with renewed interest in the last several years. One approach to detecting logic faults is to create and execute tests that satisfy logic coverage criteria. Another approach to detecting faults is to perform mutation…
Li, Jia; Wang, Deming; Huang, Zonghou
2017-01-01
Coal dust explosions (CDE) are one of the main threats to the occupational safety of coal miners. Aiming to identify and assess the risk of CDE, this paper proposes a novel method of fuzzy fault tree analysis combined with the Visual Basic (VB) program. In this methodology, various potential causes of the CDE are identified and a CDE fault tree is constructed. To overcome drawbacks from the lack of exact probability data for the basic events, fuzzy set theory is employed and the probability data of each basic event is treated as intuitionistic trapezoidal fuzzy numbers. In addition, a new approach for calculating the weighting of each expert is also introduced in this paper to reduce the error during the expert elicitation process. Specifically, an in-depth quantitative analysis of the fuzzy fault tree, such as the importance measure of the basic events and the cut sets, and the CDE occurrence probability is given to assess the explosion risk and acquire more details of the CDE. The VB program is applied to simplify the analysis process. A case study and analysis is provided to illustrate the effectiveness of this proposed method, and some suggestions are given to take preventive measures in advance and avoid CDE accidents. PMID:28793348
Shi, Lei; Shuai, Jian; Xu, Kui
2014-08-15
Fire and explosion accidents of steel oil storage tanks (FEASOST) occur occasionally during the petroleum and chemical industry production and storage processes and often have devastating impact on lives, the environment and property. To contribute towards the development of a quantitative approach for assessing the occurrence probability of FEASOST, a fault tree of FEASOST is constructed that identifies various potential causes. Traditional fault tree analysis (FTA) can achieve quantitative evaluation if the failure data of all of the basic events (BEs) are available, which is almost impossible due to the lack of detailed data, as well as other uncertainties. This paper makes an attempt to perform FTA of FEASOST by a hybrid application between an expert elicitation based improved analysis hierarchy process (AHP) and fuzzy set theory, and the occurrence possibility of FEASOST is estimated for an oil depot in China. A comparison between statistical data and calculated data using fuzzy fault tree analysis (FFTA) based on traditional and improved AHP is also made. Sensitivity and importance analysis has been performed to identify the most crucial BEs leading to FEASOST that will provide insights into how managers should focus effective mitigation. Copyright © 2014 Elsevier B.V. All rights reserved.
Wang, Hetang; Li, Jia; Wang, Deming; Huang, Zonghou
2017-01-01
Coal dust explosions (CDE) are one of the main threats to the occupational safety of coal miners. Aiming to identify and assess the risk of CDE, this paper proposes a novel method of fuzzy fault tree analysis combined with the Visual Basic (VB) program. In this methodology, various potential causes of the CDE are identified and a CDE fault tree is constructed. To overcome drawbacks from the lack of exact probability data for the basic events, fuzzy set theory is employed and the probability data of each basic event is treated as intuitionistic trapezoidal fuzzy numbers. In addition, a new approach for calculating the weighting of each expert is also introduced in this paper to reduce the error during the expert elicitation process. Specifically, an in-depth quantitative analysis of the fuzzy fault tree, such as the importance measure of the basic events and the cut sets, and the CDE occurrence probability is given to assess the explosion risk and acquire more details of the CDE. The VB program is applied to simplify the analysis process. A case study and analysis is provided to illustrate the effectiveness of this proposed method, and some suggestions are given to take preventive measures in advance and avoid CDE accidents.
Runtime Verification in Context : Can Optimizing Error Detection Improve Fault Diagnosis
NASA Technical Reports Server (NTRS)
Dwyer, Matthew B.; Purandare, Rahul; Person, Suzette
2010-01-01
Runtime verification has primarily been developed and evaluated as a means of enriching the software testing process. While many researchers have pointed to its potential applicability in online approaches to software fault tolerance, there has been a dearth of work exploring the details of how that might be accomplished. In this paper, we describe how a component-oriented approach to software health management exposes the connections between program execution, error detection, fault diagnosis, and recovery. We identify both research challenges and opportunities in exploiting those connections. Specifically, we describe how recent approaches to reducing the overhead of runtime monitoring aimed at error detection might be adapted to reduce the overhead and improve the effectiveness of fault diagnosis.
Predeployment validation of fault-tolerant systems through software-implemented fault insertion
NASA Technical Reports Server (NTRS)
Czeck, Edward W.; Siewiorek, Daniel P.; Segall, Zary Z.
1989-01-01
Fault injection-based automated testing (FIAT) environment, which can be used to experimentally characterize and evaluate distributed realtime systems under fault-free and faulted conditions is described. A survey is presented of validation methodologies. The need for fault insertion based on validation methodologies is demonstrated. The origins and models of faults, and motivation for the FIAT concept are reviewed. FIAT employs a validation methodology which builds confidence in the system through first providing a baseline of fault-free performance data and then characterizing the behavior of the system with faults present. Fault insertion is accomplished through software and allows faults or the manifestation of faults to be inserted by either seeding faults into memory or triggering error detection mechanisms. FIAT is capable of emulating a variety of fault-tolerant strategies and architectures, can monitor system activity, and can automatically orchestrate experiments involving insertion of faults. There is a common system interface which allows ease of use to decrease experiment development and run time. Fault models chosen for experiments on FIAT have generated system responses which parallel those observed in real systems under faulty conditions. These capabilities are shown by two example experiments each using a different fault-tolerance strategy.
Advanced information processing system: Fault injection study and results
NASA Technical Reports Server (NTRS)
Burkhardt, Laura F.; Masotto, Thomas K.; Lala, Jaynarayan H.
1992-01-01
The objective of the AIPS program is to achieve a validated fault tolerant distributed computer system. The goals of the AIPS fault injection study were: (1) to present the fault injection study components addressing the AIPS validation objective; (2) to obtain feedback for fault removal from the design implementation; (3) to obtain statistical data regarding fault detection, isolation, and reconfiguration responses; and (4) to obtain data regarding the effects of faults on system performance. The parameters are described that must be varied to create a comprehensive set of fault injection tests, the subset of test cases selected, the test case measurements, and the test case execution. Both pin level hardware faults using a hardware fault injector and software injected memory mutations were used to test the system. An overview is provided of the hardware fault injector and the associated software used to carry out the experiments. Detailed specifications are given of fault and test results for the I/O Network and the AIPS Fault Tolerant Processor, respectively. The results are summarized and conclusions are given.
Fault tree analysis for urban flooding.
ten Veldhuis, J A E; Clemens, F H L R; van Gelder, P H A J M
2009-01-01
Traditional methods to evaluate flood risk generally focus on heavy storm events as the principal cause of flooding. Conversely, fault tree analysis is a technique that aims at modelling all potential causes of flooding. It quantifies both overall flood probability and relative contributions of individual causes of flooding. This paper presents a fault model for urban flooding and an application to the case of Haarlem, a city of 147,000 inhabitants. Data from a complaint register, rainfall gauges and hydrodynamic model calculations are used to quantify probabilities of basic events in the fault tree. This results in a flood probability of 0.78/week for Haarlem. It is shown that gully pot blockages contribute to 79% of flood incidents, whereas storm events contribute only 5%. This implies that for this case more efficient gully pot cleaning is a more effective strategy to reduce flood probability than enlarging drainage system capacity. Whether this is also the most cost-effective strategy can only be decided after risk assessment has been complemented with a quantification of consequences of both types of events. To do this will be the next step in this study.
NASA Astrophysics Data System (ADS)
Koji, Yusuke; Kitamura, Yoshinobu; Kato, Yoshikiyo; Tsutsui, Yoshio; Mizoguchi, Riichiro
In conceptual design, it is important to develop functional structures which reflect the rich experience in the knowledge from previous design failures. Especially, if a designer learns possible abnormal behaviors from a previous design failure, he or she can add an additional function which prevents such abnormal behaviors and faults. To do this, it is a crucial issue to share such knowledge about possible faulty phenomena and how to cope with them. In fact, a part of such knowledge is described in FMEA (Failure Mode and Effect Analysis) sheets, function structure models for systematic design and fault trees for FTA (Fault Tree Analysis).
Failure analysis of energy storage spring in automobile composite brake chamber
NASA Astrophysics Data System (ADS)
Luo, Zai; Wei, Qing; Hu, Xiaofeng
2015-02-01
This paper set energy storage spring of parking brake cavity, part of automobile composite brake chamber, as the research object. And constructed the fault tree model of energy storage spring which caused parking brake failure based on the fault tree analysis method. Next, the parking brake failure model of energy storage spring was established by analyzing the working principle of composite brake chamber. Finally, the data of working load and the push rod stroke measured by comprehensive test-bed valve was used to validate the failure model above. The experimental result shows that the failure model can distinguish whether the energy storage spring is faulted.
A fast bottom-up algorithm for computing the cut sets of noncoherent fault trees
DOE Office of Scientific and Technical Information (OSTI.GOV)
Corynen, G.C.
1987-11-01
An efficient procedure for finding the cut sets of large fault trees has been developed. Designed to address coherent or noncoherent systems, dependent events, shared or common-cause events, the method - called SHORTCUT - is based on a fast algorithm for transforming a noncoherent tree into a quasi-coherent tree (COHERE), and on a new algorithm for reducing cut sets (SUBSET). To assure sufficient clarity and precision, the procedure is discussed in the language of simple sets, which is also developed in this report. Although the new method has not yet been fully implemented on the computer, we report theoretical worst-casemore » estimates of its computational complexity. 12 refs., 10 figs.« less
Health management and controls for Earth-to-orbit propulsion systems
NASA Astrophysics Data System (ADS)
Bickford, R. L.
1995-03-01
Avionics and health management technologies increase the safety and reliability while decreasing the overall cost for Earth-to-orbit (ETO) propulsion systems. New ETO propulsion systems will depend on highly reliable fault tolerant flight avionics, advanced sensing systems and artificial intelligence aided software to ensure critical control, safety and maintenance requirements are met in a cost effective manner. Propulsion avionics consist of the engine controller, actuators, sensors, software and ground support elements. In addition to control and safety functions, these elements perform system monitoring for health management. Health management is enhanced by advanced sensing systems and algorithms which provide automated fault detection and enable adaptive control and/or maintenance approaches. Aerojet is developing advanced fault tolerant rocket engine controllers which provide very high levels of reliability. Smart sensors and software systems which significantly enhance fault coverage and enable automated operations are also under development. Smart sensing systems, such as flight capable plume spectrometers, have reached maturity in ground-based applications and are suitable for bridging to flight. Software to detect failed sensors has reached similar maturity. This paper will discuss fault detection and isolation for advanced rocket engine controllers as well as examples of advanced sensing systems and software which significantly improve component failure detection for engine system safety and health management.
Electromagnetic Compatibility (EMC) in Microelectronics.
1983-02-01
Fault Tree Analysis", System Saftey Symposium, June 8-9, 1965, Seattle: The Boeing Company . 12. Fussell, J.B., "Fault Tree Analysis-Concepts and...procedure for assessing EMC in microelectronics and for applying DD, 1473 EOiTO OP I, NOV6 IS OESOL.ETE UNCLASSIFIED SECURITY CLASSIFICATION OF THIS...CRITERIA 2.1 Background 2 2.2 The Probabilistic Nature of EMC 2 2.3 The Probabilistic Approach 5 2.4 The Compatibility Factor 6 3 APPLYING PROBABILISTIC
A graphical language for reliability model generation
NASA Technical Reports Server (NTRS)
Howell, Sandra V.; Bavuso, Salvatore J.; Haley, Pamela J.
1990-01-01
A graphical interface capability of the hybrid automated reliability predictor (HARP) is described. The graphics-oriented (GO) module provides the user with a graphical language for modeling system failure modes through the selection of various fault tree gates, including sequence dependency gates, or by a Markov chain. With this graphical input language, a fault tree becomes a convenient notation for describing a system. In accounting for any sequence dependencies, HARP converts the fault-tree notation to a complex stochastic process that is reduced to a Markov chain which it can then solve for system reliability. The graphics capability is available for use on an IBM-compatible PC, a Sun, and a VAX workstation. The GO module is written in the C programming language and uses the Graphical Kernel System (GKS) standard for graphics implementation. The PC, VAX, and Sun versions of the HARP GO module are currently in beta-testing.
Diagnostic Analyzer for Gearboxes (DAG): User's Guide. Version 3.1 for Microsoft Windows 3.1
NASA Technical Reports Server (NTRS)
Jammu, Vinay B.; Kourosh, Danai
1997-01-01
This documentation describes the Diagnostic Analyzer for Gearboxes (DAG) software for performing fault diagnosis of gearboxes. First, the user would construct a graphical representation of the gearbox using the gear, bearing, shaft, and sensor tools contained in the DAG software. Next, a set of vibration features obtained by processing the vibration signals recorded from the gearbox using a signal analyzer is required. Given this information, the DAG software uses an unsupervised neural network referred to as the Fault Detection Network (FDN) to identify the occurrence of faults, and a pattern classifier called Single Category-Based Classifier (SCBC) for abnormality scaling of individual vibration features. The abnormality-scaled vibration features are then used as inputs to a Structure-Based Connectionist Network (SBCN) for identifying faults in gearbox subsystems and components. The weights of the SBCN represent its diagnostic knowledge and are derived from the structure of the gearbox graphically presented in DAG. The outputs of SBCN are fault possibility values between 0 and 1 for individual subsystems and components in the gearbox with a 1 representing a definite fault and a 0 representing normality. This manual describes the steps involved in creating the diagnostic gearbox model, along with the options and analysis tools of the DAG software.
A Voyager attitude control perspective on fault tolerant systems
NASA Technical Reports Server (NTRS)
Rasmussen, R. D.; Litty, E. C.
1981-01-01
In current spacecraft design, a trend can be observed to achieve greater fault tolerance through the application of on-board software dedicated to detecting and isolating failures. Whether fault tolerance through software can meet the desired objectives depends on very careful consideration and control of the system in which the software is imbedded. The considered investigation has the objective to provide some of the insight needed for the required analysis of the system. A description is given of the techniques which have been developed in this connection during the development of the Voyager spacecraft. The Voyager Galileo Attitude and Articulation Control Subsystem (AACS) fault tolerant design is discussed to emphasize basic lessons learned from this experience. The central driver of hardware redundancy implementation on Voyager was known as the 'single point failure criterion'.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vinnikov, B.; NRC Kurchatov Inst.
According to Scientific and Technical Cooperation between the USA and Russia in the field of nuclear engineering the Idaho National Laboratory has transferred to the possession of the National Research Center ' Kurchatov Inst. ' the SAPHIRE software without any fee. With the help of the software Kurchatov Inst. developed a Pilot Living PSA- Model of Leningrad NPP Unit 1. Computations of core damage frequencies were carried out for additional Initiating Events. In the submitted paper such additional Initiating Events are fires in various compartments of the NPP. During the computations of each fire, structure of the PSA - Modelmore » was not changed, but Fault Trees for the appropriate systems, which are removed from service during the fire, were changed. It follows from the computations, that for ten fires Core Damaged Frequencies (CDF) are not changed. Other six fires will cause additional core damage. On the basis of the calculated results it is possible to determine a degree of importance of these fires and to establish sequence of performance of fire-prevention measures in various places of the NPP. (authors)« less
Development of a Software Safety Process and a Case Study of Its Use
NASA Technical Reports Server (NTRS)
Knight, J. C.
1997-01-01
Research in the year covered by this reporting period has been primarily directed toward the following areas: (1) Formal specification of user interfaces; (2) Fault-tree analysis including software; (3) Evaluation of formal specification notations; (4) Evaluation of formal verification techniques; (5) Expanded analysis of the shell architecture concept; (6) Development of techniques to address the problem of information survivability; and (7) Development of a sophisticated tool for the manipulation of formal specifications written in Z. This report summarizes activities under the grant. The technical results relating to this grant and the remainder of the principal investigator's research program are contained in various reports and papers. The remainder of this report is organized as follows. In the next section, an overview of the project is given. This is followed by a summary of accomplishments during the reporting period and details of students funded. Seminars presented describing work under this grant are listed in the following section, and the final section lists publications resulting from this grant.
A fault-tolerant intelligent robotic control system
NASA Technical Reports Server (NTRS)
Marzwell, Neville I.; Tso, Kam Sing
1993-01-01
This paper describes the concept, design, and features of a fault-tolerant intelligent robotic control system being developed for space and commercial applications that require high dependability. The comprehensive strategy integrates system level hardware/software fault tolerance with task level handling of uncertainties and unexpected events for robotic control. The underlying architecture for system level fault tolerance is the distributed recovery block which protects against application software, system software, hardware, and network failures. Task level fault tolerance provisions are implemented in a knowledge-based system which utilizes advanced automation techniques such as rule-based and model-based reasoning to monitor, diagnose, and recover from unexpected events. The two level design provides tolerance of two or more faults occurring serially at any level of command, control, sensing, or actuation. The potential benefits of such a fault tolerant robotic control system include: (1) a minimized potential for damage to humans, the work site, and the robot itself; (2) continuous operation with a minimum of uncommanded motion in the presence of failures; and (3) more reliable autonomous operation providing increased efficiency in the execution of robotic tasks and decreased demand on human operators for controlling and monitoring the robotic servicing routines.
Ground Software Maintenance Facility (GSMF) system manual
NASA Technical Reports Server (NTRS)
Derrig, D.; Griffith, G.
1986-01-01
The Ground Software Maintenance Facility (GSMF) is designed to support development and maintenance of spacelab ground support software. THE GSMF consists of a Perkin Elmer 3250 (Host computer) and a MITRA 125s (ATE computer), with appropriate interface devices and software to simulate the Electrical Ground Support Equipment (EGSE). This document is presented in three sections: (1) GSMF Overview; (2) Software Structure; and (3) Fault Isolation Capability. The overview contains information on hardware and software organization along with their corresponding block diagrams. The Software Structure section describes the modes of software structure including source files, link information, and database files. The Fault Isolation section describes the capabilities of the Ground Computer Interface Device, Perkin Elmer host, and MITRA ATE.
Detection of faults and software reliability analysis
NASA Technical Reports Server (NTRS)
Knight, John C.
1987-01-01
Multi-version or N-version programming is proposed as a method of providing fault tolerance in software. The approach requires the separate, independent preparation of multiple versions of a piece of software for some application. These versions are executed in parallel in the application environment; each receives identical inputs and each produces its version of the required outputs. The outputs are collected by a voter and, in principle, they should all be the same. In practice there may be some disagreement. If this occurs, the results of the majority are taken to be the correct output, and that is the output used by the system. A total of 27 programs were produced. Each of these programs was then subjected to one million randomly-generated test cases. The experiment yielded a number of programs containing faults that are useful for general studies of software reliability as well as studies of N-version programming. Fault tolerance through data diversity and analytic models of comparison testing are discussed.
Automated Fault Interpretation and Extraction using Improved Supplementary Seismic Datasets
NASA Astrophysics Data System (ADS)
Bollmann, T. A.; Shank, R.
2017-12-01
During the interpretation of seismic volumes, it is necessary to interpret faults along with horizons of interest. With the improvement of technology, the interpretation of faults can be expedited with the aid of different algorithms that create supplementary seismic attributes, such as semblance and coherency. These products highlight discontinuities, but still need a large amount of human interaction to interpret faults and are plagued by noise and stratigraphic discontinuities. Hale (2013) presents a method to improve on these datasets by creating what is referred to as a Fault Likelihood volume. In general, these volumes contain less noise and do not emphasize stratigraphic features. Instead, planar features within a specified strike and dip range are highlighted. Once a satisfactory Fault Likelihood Volume is created, extraction of fault surfaces is much easier. The extracted fault surfaces are then exported to interpretation software for QC. Numerous software packages have implemented this methodology with varying results. After investigating these platforms, we developed a preferred Automated Fault Interpretation workflow.
Measurement of fault latency in a digital avionic miniprocessor
NASA Technical Reports Server (NTRS)
Mcgough, J. G.; Swern, F. L.
1981-01-01
The results of fault injection experiments utilizing a gate-level emulation of the central processor unit of the Bendix BDX-930 digital computer are presented. The failure detection coverage of comparison-monitoring and a typical avionics CPU self-test program was determined. The specific tasks and experiments included: (1) inject randomly selected gate-level and pin-level faults and emulate six software programs using comparison-monitoring to detect the faults; (2) based upon the derived empirical data develop and validate a model of fault latency that will forecast a software program's detecting ability; (3) given a typical avionics self-test program, inject randomly selected faults at both the gate-level and pin-level and determine the proportion of faults detected; (4) determine why faults were undetected; (5) recommend how the emulation can be extended to multiprocessor systems such as SIFT; and (6) determine the proportion of faults detected by a uniprocessor BIT (built-in-test) irrespective of self-test.
Criteria for software modularization
NASA Technical Reports Server (NTRS)
Card, David N.; Page, Gerald T.; Mcgarry, Frank E.
1985-01-01
A central issue in programming practice involves determining the appropriate size and information content of a software module. This study attempted to determine the effectiveness of two widely used criteria for software modularization, strength and size, in reducing fault rate and development cost. Data from 453 FORTRAN modules developed by professional programmers were analyzed. The results indicated that module strength is a good criterion with respect to fault rate, whereas arbitrary module size limitations inhibit programmer productivity. This analysis is a first step toward defining empirically based standards for software modularization.
NASA Technical Reports Server (NTRS)
Smith, T. B., III; Lala, J. H.
1984-01-01
The FTMP architecture is a high reliability computer concept modeled after a homogeneous multiprocessor architecture. Elements of the FTMP are operated in tight synchronism with one another and hardware fault-detection and fault-masking is provided which is transparent to the software. Operating system design and user software design is thus greatly simplified. Performance of the FTMP is also comparable to that of a simplex equivalent due to the efficiency of fault handling hardware. The FTMP project constructed an engineering module of the FTMP, programmed the machine and extensively tested the architecture through fault injection and other stress testing. This testing confirmed the soundness of the FTMP concepts.
NASA Astrophysics Data System (ADS)
Wu, Jianing; Yan, Shaoze; Xie, Liyang
2011-12-01
To address the impact of solar array anomalies, it is important to perform analysis of the solar array reliability. This paper establishes the fault tree analysis (FTA) and fuzzy reasoning Petri net (FRPN) models of a solar array mechanical system and analyzes reliability to find mechanisms of the solar array fault. The index final truth degree (FTD) and cosine matching function (CMF) are employed to resolve the issue of how to evaluate the importance and influence of different faults. So an improvement reliability analysis method is developed by means of the sorting of FTD and CMF. An example is analyzed using the proposed method. The analysis results show that harsh thermal environment and impact caused by particles in space are the most vital causes of the solar array fault. Furthermore, other fault modes and the corresponding improvement methods are discussed. The results reported in this paper could be useful for the spacecraft designers, particularly, in the process of redesigning the solar array and scheduling its reliability growth plan.
Seera, Manjeevan; Lim, Chee Peng; Ishak, Dahaman; Singh, Harapajan
2012-01-01
In this paper, a novel approach to detect and classify comprehensive fault conditions of induction motors using a hybrid fuzzy min-max (FMM) neural network and classification and regression tree (CART) is proposed. The hybrid model, known as FMM-CART, exploits the advantages of both FMM and CART for undertaking data classification and rule extraction problems. A series of real experiments is conducted, whereby the motor current signature analysis method is applied to form a database comprising stator current signatures under different motor conditions. The signal harmonics from the power spectral density are extracted as discriminative input features for fault detection and classification with FMM-CART. A comprehensive list of induction motor fault conditions, viz., broken rotor bars, unbalanced voltages, stator winding faults, and eccentricity problems, has been successfully classified using FMM-CART with good accuracy rates. The results are comparable, if not better, than those reported in the literature. Useful explanatory rules in the form of a decision tree are also elicited from FMM-CART to analyze and understand different fault conditions of induction motors.
Probabilistic Seismic Hazard Assessment of the Chiapas State (SE Mexico)
NASA Astrophysics Data System (ADS)
Rodríguez-Lomelí, Anabel Georgina; García-Mayordomo, Julián
2015-04-01
The Chiapas State, in southeastern Mexico, is a very active seismic region due to the interaction of three tectonic plates: Northamerica, Cocos and Caribe. We present a probabilistic seismic hazard assessment (PSHA) specifically performed to evaluate seismic hazard in the Chiapas state. The PSHA was based on a composited seismic catalogue homogenized to Mw and was used a logic tree procedure for the consideration of different seismogenic source models and ground motion prediction equations (GMPEs). The results were obtained in terms of peak ground acceleration as well as spectral accelerations. The earthquake catalogue was compiled from the International Seismological Center and the Servicio Sismológico Nacional de México sources. Two different seismogenic source zones (SSZ) models were devised based on a revision of the tectonics of the region and the available geomorphological and geological maps. The SSZ were finally defined by the analysis of geophysical data, resulting two main different SSZ models. The Gutenberg-Richter parameters for each SSZ were calculated from the declustered and homogenized catalogue, while the maximum expected earthquake was assessed from both the catalogue and geological criteria. Several worldwide and regional GMPEs for subduction and crustal zones were revised. For each SSZ model we considered four possible combinations of GMPEs. Finally, hazard was calculated in terms of PGA and SA for 500-, 1000-, and 2500-years return periods for each branch of the logic tree using the CRISIS2007 software. The final hazard maps represent the mean values obtained from the two seismogenic and four attenuation models considered in the logic tree. For the three return periods analyzed, the maps locate the most hazardous areas in the Chiapas Central Pacific Zone, the Pacific Coastal Plain and in the Motagua and Polochic Fault Zone; intermediate hazard values in the Chiapas Batholith Zone and in the Strike-Slip Faults Province. The hazard decreases towards the northeast across the Reverse Faults Province and up to Yucatan Platform, where the lowest values are reached. We also produced uniform hazard spectra (UHS) for the three main cities of Chiapas. Tapachula city presents the highest spectral accelerations, while Tuxtla Gutierrez and San Cristobal de las Casas cities show similar values. We conclude that seismic hazard in Chiapas is chiefly controlled by the subduction of the Cocos beneath Northamerica and Caribe tectonic plates, that makes the coastal areas the most hazardous. Additionally, the Motagua and Polochic Fault Zones are also important, increasing the hazard particularly in southeastern Chiapas.
Path Searching Based Fault Automated Recovery Scheme for Distribution Grid with DG
NASA Astrophysics Data System (ADS)
Xia, Lin; Qun, Wang; Hui, Xue; Simeng, Zhu
2016-12-01
Applying the method of path searching based on distribution network topology in setting software has a good effect, and the path searching method containing DG power source is also applicable to the automatic generation and division of planned islands after the fault. This paper applies path searching algorithm in the automatic division of planned islands after faults: starting from the switch of fault isolation, ending in each power source, and according to the line load that the searching path traverses and the load integrated by important optimized searching path, forming optimized division scheme of planned islands that uses each DG as power source and is balanced to local important load. Finally, COBASE software and distribution network automation software applied are used to illustrate the effectiveness of the realization of such automatic restoration program.
Development of a software safety process and a case study of its use
NASA Technical Reports Server (NTRS)
Knight, John C.
1993-01-01
The goal of this research is to continue the development of a comprehensive approach to software safety and to evaluate the approach with a case study. The case study is a major part of the project, and it involves the analysis of a specific safety-critical system from the medical equipment domain. The particular application being used was selected because of the availability of a suitable candidate system. We consider the results to be generally applicable and in no way particularly limited by the domain. The research is concentrating on issues raised by the specification and verification phases of the software lifecycle since they are central to our previously-developed rigorous definitions of software safety. The theoretical research is based on our framework of definitions for software safety. In the area of specification, the main topics being investigated are the development of techniques for building system fault trees that correctly incorporate software issues and the development of rigorous techniques for the preparation of software safety specifications. The research results are documented. Another area of theoretical investigation is the development of verification methods tailored to the characteristics of safety requirements. Verification of the correct implementation of the safety specification is central to the goal of establishing safe software. The empirical component of this research is focusing on a case study in order to provide detailed characterizations of the issues as they appear in practice, and to provide a testbed for the evaluation of various existing and new theoretical results, tools, and techniques. The Magnetic Stereotaxis System is summarized.
Software reliability through fault-avoidance and fault-tolerance
NASA Technical Reports Server (NTRS)
Vouk, Mladen A.; Mcallister, David F.
1992-01-01
Accomplishments in the following research areas are summarized: structure based testing, reliability growth, and design testability with risk evaluation; reliability growth models and software risk management; and evaluation of consensus voting, consensus recovery block, and acceptance voting. Four papers generated during the reporting period are included as appendices.
Galileo spacecraft power distribution and autonomous fault recovery
NASA Technical Reports Server (NTRS)
Detwiler, R. C.
1982-01-01
There is a trend in current spacecraft design to achieve greater fault tolerance through the implemenation of on-board software dedicated to detecting and isolating failures. A combination of hardware and software is utilized in the Galileo power system for autonomous fault recovery. Galileo is a dual-spun spacecraft designed to carry a number of scientific instruments into a series of orbits around the planet Jupiter. In addition to its self-contained scientific payload, it will also carry a probe system which will be separated from the spacecraft some 150 days prior to Jupiter encounter. The Galileo spacecraft is scheduled to be launched in 1985. Attention is given to the power system, the fault protection requirements, and the power fault recovery implementation.
Detection of faults and software reliability analysis
NASA Technical Reports Server (NTRS)
Knight, J. C.
1987-01-01
Specific topics briefly addressed include: the consistent comparison problem in N-version system; analytic models of comparison testing; fault tolerance through data diversity; and the relationship between failures caused by automatically seeded faults.
Aydin, Ilhan; Karakose, Mehmet; Akin, Erhan
2014-03-01
Although reconstructed phase space is one of the most powerful methods for analyzing a time series, it can fail in fault diagnosis of an induction motor when the appropriate pre-processing is not performed. Therefore, boundary analysis based a new feature extraction method in phase space is proposed for diagnosis of induction motor faults. The proposed approach requires the measurement of one phase current signal to construct the phase space representation. Each phase space is converted into an image, and the boundary of each image is extracted by a boundary detection algorithm. A fuzzy decision tree has been designed to detect broken rotor bars and broken connector faults. The results indicate that the proposed approach has a higher recognition rate than other methods on the same dataset. © 2013 ISA Published by ISA All rights reserved.
Design for dependability: A simulation-based approach. Ph.D. Thesis, 1993
NASA Technical Reports Server (NTRS)
Goswami, Kumar K.
1994-01-01
This research addresses issues in simulation-based system level dependability analysis of fault-tolerant computer systems. The issues and difficulties of providing a general simulation-based approach for system level analysis are discussed and a methodology that address and tackle these issues is presented. The proposed methodology is designed to permit the study of a wide variety of architectures under various fault conditions. It permits detailed functional modeling of architectural features such as sparing policies, repair schemes, routing algorithms as well as other fault-tolerant mechanisms, and it allows the execution of actual application software. One key benefit of this approach is that the behavior of a system under faults does not have to be pre-defined as it is normally done. Instead, a system can be simulated in detail and injected with faults to determine its failure modes. The thesis describes how object-oriented design is used to incorporate this methodology into a general purpose design and fault injection package called DEPEND. A software model is presented that uses abstractions of application programs to study the behavior and effect of software on hardware faults in the early design stage when actual code is not available. Finally, an acceleration technique that combines hierarchical simulation, time acceleration algorithms and hybrid simulation to reduce simulation time is introduced.
A Guide to Street Tree Inventory Software
Gene A. Olig; Robert W. Miller
1997-01-01
The purpose of this publication is to serve as a reference and guide for urban forestry professionals in the selection of a street tree inventory software program. The programs described include only those that are commercially available. The increasing demand for street tree inventory software follows a trend towards a more computerized society and the increasing...
Debugging and Logging Services for Defence Service Oriented Architectures
2012-02-01
Service A software component and callable end point that provides a logically related set of operations, each of which perform a logical step in a...important to note that in some cases when the fault is identified to lie in uneditable code such as program libraries, or outsourced software services ...debugging is limited to characterisation of the fault, reporting it to the software or service provider and development of work-arounds and management
Software Implemented Fault-Tolerant (SIFT) user's guide
NASA Technical Reports Server (NTRS)
Green, D. F., Jr.; Palumbo, D. L.; Baltrus, D. W.
1984-01-01
Program development for a Software Implemented Fault Tolerant (SIFT) computer system is accomplished in the NASA LaRC AIRLAB facility using a DEC VAX-11 to interface with eight Bendix BDX 930 flight control processors. The interface software which provides this SIFT program development capability was developed by AIRLAB personnel. This technical memorandum describes the application and design of this software in detail, and is intended to assist both the user in performance of SIFT research and the systems programmer responsible for maintaining and/or upgrading the SIFT programming environment.
The P-Mesh: A Commodity-based Scalable Network Architecture for Clusters
NASA Technical Reports Server (NTRS)
Nitzberg, Bill; Kuszmaul, Chris; Stockdale, Ian; Becker, Jeff; Jiang, John; Wong, Parkson; Tweten, David (Technical Monitor)
1998-01-01
We designed a new network architecture, the P-Mesh which combines the scalability and fault resilience of a torus with the performance of a switch. We compare the scalability, performance, and cost of the hub, switch, torus, tree, and P-Mesh architectures. The latter three are capable of scaling to thousands of nodes, however, the torus has severe performance limitations with that many processors. The tree and P-Mesh have similar latency, bandwidth, and bisection bandwidth, but the P-Mesh outperforms the switch architecture (a lower bound for tree performance) on 16-node NAB Parallel Benchmark tests by up to 23%, and costs 40% less. Further, the P-Mesh has better fault resilience characteristics. The P-Mesh architecture trades increased management overhead for lower cost, and is a good bridging technology while the price of tree uplinks is expensive.
The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software
NASA Technical Reports Server (NTRS)
Butler, Ricky W.; Finelli, George B.
1991-01-01
This paper affirms that the quantification of life-critical software reliability is infeasible using statistical methods whether applied to standard software or fault-tolerant software. The classical methods of estimating reliability are shown to lead to exhorbitant amounts of testing when applied to life-critical software. Reliability growth models are examined and also shown to be incapable of overcoming the need for excessive amounts of testing. The key assumption of software fault tolerance separately programmed versions fail independently is shown to be problematic. This assumption cannot be justified by experimentation in the ultrareliability region and subjective arguments in its favor are not sufficiently strong to justify it as an axiom. Also, the implications of the recent multiversion software experiments support this affirmation.
The 1992 Landers earthquake sequence; seismological observations
Egill Hauksson,; Jones, Lucile M.; Hutton, Kate; Eberhart-Phillips, Donna
1993-01-01
The (MW6.1, 7.3, 6.2) 1992 Landers earthquakes began on April 23 with the MW6.1 1992 Joshua Tree preshock and form the most substantial earthquake sequence to occur in California in the last 40 years. This sequence ruptured almost 100 km of both surficial and concealed faults and caused aftershocks over an area 100 km wide by 180 km long. The faulting was predominantly strike slip and three main events in the sequence had unilateral rupture to the north away from the San Andreas fault. The MW6.1 Joshua Tree preshock at 33°N58′ and 116°W19′ on 0451 UT April 23 was preceded by a tightly clustered foreshock sequence (M≤4.6) beginning 2 hours before the mainshock and followed by a large aftershock sequence with more than 6000 aftershocks. The aftershocks extended along a northerly trend from about 10 km north of the San Andreas fault, northwest of Indio, to the east-striking Pinto Mountain fault. The Mw7.3 Landers mainshock occurred at 34°N13′ and 116°W26′ at 1158 UT, June 28, 1992, and was preceded for 12 hours by 25 small M≤3 earthquakes at the mainshock epicenter. The distribution of more than 20,000 aftershocks, analyzed in this study, and short-period focal mechanisms illuminate a complex sequence of faulting. The aftershocks extend 60 km to the north of the mainshock epicenter along a system of at least five different surficial faults, and 40 km to the south, crossing the Pinto Mountain fault through the Joshua Tree aftershock zone towards the San Andreas fault near Indio. The rupture initiated in the depth range of 3–6 km, similar to previous M∼5 earthquakes in the region, although the maximum depth of aftershocks is about 15 km. The mainshock focal mechanism showed right-lateral strike-slip faulting with a strike of N10°W on an almost vertical fault. The rupture formed an arclike zone well defined by both surficial faulting and aftershocks, with more westerly faulting to the north. This change in strike is accomplished by jumping across dilational jogs connecting surficial faults with strikes rotated progressively to the west. A 20-km-long linear cluster of aftershocks occurred 10–20 km north of Barstow, or 30–40 km north of the end of the mainshock rupture. The most prominent off-fault aftershock cluster occurred 30 km to the west of the Landers mainshock. The largest aftershock was within this cluster, the Mw6.2 Big Bear aftershock occurring at 34°N10′ and 116°W49′ at 1505 UT June 28. It exhibited left-lateral strike-slip faulting on a northeast striking and steeply dipping plane. The Big Bear aftershocks form a linear trend extending 20 km to the northeast with a scattered distribution to the north. The Landers mainshock occurred near the southernmost extent of the Eastern California Shear Zone, an 80-km-wide, more than 400-km-long zone of deformation. This zone extends into the Death Valley region and accommodates about 10 to 20% of the plate motion between the Pacific and North American plates. The Joshua Tree preshock, its aftershocks, and Landers aftershocks form a previously missing link that connects the Eastern California Shear Zone to the southern San Andreas fault.
TreeRipper web application: towards a fully automated optical tree recognition software.
Hughes, Joseph
2011-05-20
Relationships between species, genes and genomes have been printed as trees for over a century. Whilst this may have been the best format for exchanging and sharing phylogenetic hypotheses during the 20th century, the worldwide web now provides faster and automated ways of transferring and sharing phylogenetic knowledge. However, novel software is needed to defrost these published phylogenies for the 21st century. TreeRipper is a simple website for the fully-automated recognition of multifurcating phylogenetic trees (http://linnaeus.zoology.gla.ac.uk/~jhughes/treeripper/). The program accepts a range of input image formats (PNG, JPG/JPEG or GIF). The underlying command line c++ program follows a number of cleaning steps to detect lines, remove node labels, patch-up broken lines and corners and detect line edges. The edge contour is then determined to detect the branch length, tip label positions and the topology of the tree. Optical Character Recognition (OCR) is used to convert the tip labels into text with the freely available tesseract-ocr software. 32% of images meeting the prerequisites for TreeRipper were successfully recognised, the largest tree had 115 leaves. Despite the diversity of ways phylogenies have been illustrated making the design of a fully automated tree recognition software difficult, TreeRipper is a step towards automating the digitization of past phylogenies. We also provide a dataset of 100 tree images and associated tree files for training and/or benchmarking future software. TreeRipper is an open source project licensed under the GNU General Public Licence v3.
Effectiveness of back-to-back testing
NASA Technical Reports Server (NTRS)
Vouk, Mladen A.; Mcallister, David F.; Eckhardt, David E.; Caglayan, Alper; Kelly, John P. J.
1987-01-01
Three models of back-to-back testing processes are described. Two models treat the case where there is no intercomponent failure dependence. The third model describes the more realistic case where there is correlation among the failure probabilities of the functionally equivalent components. The theory indicates that back-to-back testing can, under the right conditions, provide a considerable gain in software reliability. The models are used to analyze the data obtained in a fault-tolerant software experiment. It is shown that the expected gain is indeed achieved, and exceeded, provided the intercomponent failure dependence is sufficiently small. However, even with the relatively high correlation the use of several functionally equivalent components coupled with back-to-back testing may provide a considerable reliability gain. Implications of this finding are that the multiversion software development is a feasible and cost effective approach to providing highly reliable software components intended for fault-tolerant software systems, on condition that special attention is directed at early detection and elimination of correlated faults.
NASA Technical Reports Server (NTRS)
Briand, Lionel C.; Basili, Victor R.; Hetmanski, Christopher J.
1992-01-01
Applying equal testing and verification effort to all parts of a software system is not very efficient, especially when resources are limited and scheduling is tight. Therefore, one needs to be able to differentiate low/high fault density components so that the testing/verification effort can be concentrated where needed. Such a strategy is expected to detect more faults and thus improve the resulting reliability of the overall system. This paper presents an alternative approach for constructing such models that is intended to fulfill specific software engineering needs (i.e. dealing with partial/incomplete information and creating models that are easy to interpret). Our approach to classification is as follows: (1) to measure the software system to be considered; and (2) to build multivariate stochastic models for prediction. We present experimental results obtained by classifying FORTRAN components developed at the NASA/GSFC into two fault density classes: low and high. Also we evaluate the accuracy of the model and the insights it provides into the software process.
Fault Mitigation Schemes for Future Spaceflight Multicore Processors
NASA Technical Reports Server (NTRS)
Alexander, James W.; Clement, Bradley J.; Gostelow, Kim P.; Lai, John Y.
2012-01-01
Future planetary exploration missions demand significant advances in on-board computing capabilities over current avionics architectures based on a single-core processing element. The state-of-the-art multi-core processor provides much promise in meeting such challenges while introducing new fault tolerance problems when applied to space missions. Software-based schemes are being presented in this paper that can achieve system-level fault mitigation beyond that provided by radiation-hard-by-design (RHBD). For mission and time critical applications such as the Terrain Relative Navigation (TRN) for planetary or small body navigation, and landing, a range of fault tolerance methods can be adapted by the application. The software methods being investigated include Error Correction Code (ECC) for data packet routing between cores, virtual network routing, Triple Modular Redundancy (TMR), and Algorithm-Based Fault Tolerance (ABFT). A robust fault tolerance framework that provides fail-operational behavior under hard real-time constraints and graceful degradation will be demonstrated using TRN executing on a commercial Tilera(R) processor with simulated fault injections.
NASA Technical Reports Server (NTRS)
Al Hassan, Mohammad; Britton, Paul; Hatfield, Glen Spencer; Novack, Steven D.
2017-01-01
Today's launch vehicles complex electronic and avionics systems heavily utilize Field Programmable Gate Array (FPGA) integrated circuits (IC) for their superb speed and reconfiguration capabilities. Consequently, FPGAs are prevalent ICs in communication protocols such as MILSTD- 1553B and in control signal commands such as in solenoid valve actuations. This paper will identify reliability concerns and high level guidelines to estimate FPGA total failure rates in a launch vehicle application. The paper will discuss hardware, hardware description language, and radiation induced failures. The hardware contribution of the approach accounts for physical failures of the IC. The hardware description language portion will discuss the high level FPGA programming languages and software/code reliability growth. The radiation portion will discuss FPGA susceptibility to space environment radiation.
Reliability analysis of the solar array based on Fault Tree Analysis
NASA Astrophysics Data System (ADS)
Jianing, Wu; Shaoze, Yan
2011-07-01
The solar array is an important device used in the spacecraft, which influences the quality of in-orbit operation of the spacecraft and even the launches. This paper analyzes the reliability of the mechanical system and certifies the most vital subsystem of the solar array. The fault tree analysis (FTA) model is established according to the operating process of the mechanical system based on DFH-3 satellite; the logical expression of the top event is obtained by Boolean algebra and the reliability of the solar array is calculated. The conclusion shows that the hinges are the most vital links between the solar arrays. By analyzing the structure importance(SI) of the hinge's FTA model, some fatal causes, including faults of the seal, insufficient torque of the locking spring, temperature in space, and friction force, can be identified. Damage is the initial stage of the fault, so limiting damage is significant to prevent faults. Furthermore, recommendations for improving reliability associated with damage limitation are discussed, which can be used for the redesigning of the solar array and the reliability growth planning.
Simplex GPS and InSAR Inversion Software
NASA Technical Reports Server (NTRS)
Donnellan, Andrea; Parker, Jay W.; Lyzenga, Gregory A.; Pierce, Marlon E.
2012-01-01
Changes in the shape of the Earth's surface can be routinely measured with precisions better than centimeters. Processes below the surface often drive these changes and as a result, investigators require models with inversion methods to characterize the sources. Simplex inverts any combination of GPS (global positioning system), UAVSAR (uninhabited aerial vehicle synthetic aperture radar), and InSAR (interferometric synthetic aperture radar) data simultaneously for elastic response from fault and fluid motions. It can be used to solve for multiple faults and parameters, all of which can be specified or allowed to vary. The software can be used to study long-term tectonic motions and the faults responsible for those motions, or can be used to invert for co-seismic slip from earthquakes. Solutions involving estimation of fault motion and changes in fluid reservoirs such as magma or water are possible. Any arbitrary number of faults or parameters can be considered. Simplex specifically solves for any of location, geometry, fault slip, and expansion/contraction of a single or multiple faults. It inverts GPS and InSAR data for elastic dislocations in a half-space. Slip parameters include strike slip, dip slip, and tensile dislocations. It includes a map interface for both setting up the models and viewing the results. Results, including faults, and observed, computed, and residual displacements, are output in text format, a map interface, and can be exported to KML. The software interfaces with the QuakeTables database allowing a user to select existing fault parameters or data. Simplex can be accessed through the QuakeSim portal graphical user interface or run from a UNIX command line.
Modeling and Performance Considerations for Automated Fault Isolation in Complex Systems
NASA Technical Reports Server (NTRS)
Ferrell, Bob; Oostdyk, Rebecca
2010-01-01
The purpose of this paper is to document the modeling considerations and performance metrics that were examined in the development of a large-scale Fault Detection, Isolation and Recovery (FDIR) system. The FDIR system is envisioned to perform health management functions for both a launch vehicle and the ground systems that support the vehicle during checkout and launch countdown by using suite of complimentary software tools that alert operators to anomalies and failures in real-time. The FDIR team members developed a set of operational requirements for the models that would be used for fault isolation and worked closely with the vendor of the software tools selected for fault isolation to ensure that the software was able to meet the requirements. Once the requirements were established, example models of sufficient complexity were used to test the performance of the software. The results of the performance testing demonstrated the need for enhancements to the software in order to meet the demands of the full-scale ground and vehicle FDIR system. The paper highlights the importance of the development of operational requirements and preliminary performance testing as a strategy for identifying deficiencies in highly scalable systems and rectifying those deficiencies before they imperil the success of the project
Fault tree safety analysis of a large Li/SOCl(sub)2 spacecraft battery
NASA Technical Reports Server (NTRS)
Uy, O. Manuel; Maurer, R. H.
1987-01-01
The results of the safety fault tree analysis on the eight module, 576 F cell Li/SOCl2 battery on the spacecraft and in the integration and test environment prior to launch on the ground are presented. The analysis showed that with the right combination of blocking diodes, electrical fuses, thermal fuses, thermal switches, cell balance, cell vents, and battery module vents the probability of a single cell or a 72 cell module exploding can be reduced to .000001, essentially the probability due to explosion for unexplained reasons.
NASA Technical Reports Server (NTRS)
Briand, Lionel C.; Basili, Victor R.; Hetmanski, Christopher J.
1993-01-01
Applying equal testing and verification effort to all parts of a software system is not very efficient, especially when resources are limited and scheduling is tight. Therefore, one needs to be able to differentiate low/high fault frequency components so that testing/verification effort can be concentrated where needed. Such a strategy is expected to detect more faults and thus improve the resulting reliability of the overall system. This paper presents the Optimized Set Reduction approach for constructing such models, intended to fulfill specific software engineering needs. Our approach to classification is to measure the software system and build multivariate stochastic models for predicting high risk system components. We present experimental results obtained by classifying Ada components into two classes: is or is not likely to generate faults during system and acceptance test. Also, we evaluate the accuracy of the model and the insights it provides into the error making process.
PDSS/IMC requirements and functional specifications
NASA Technical Reports Server (NTRS)
1983-01-01
The system (software and hardware) requirements for the Payload Development Support System (PDSS)/Image Motion Compensator (IMC) are provided. The PDSS/IMC system provides the capability for performing Image Motion Compensator Electronics (IMCE) flight software test, checkout, and verification and provides the capability for monitoring the IMC flight computer system during qualification testing for fault detection and fault isolation.
SIRU development. Volume 3: Software description and program documentation
NASA Technical Reports Server (NTRS)
Oehrle, J.
1973-01-01
The development and initial evaluation of a strapdown inertial reference unit (SIRU) system are discussed. The SIRU configuration is a modular inertial subsystem with hardware and software features that achieve fault tolerant operational capabilities. The SIRU redundant hardware design is formulated about a six gyro and six accelerometer instrument module package. The six axes array provides redundant independent sensing and the symmetry enables the formulation of an optimal software redundant data processing structure with self-contained fault detection and isolation (FDI) capabilities. The basic SIRU software coding system used in the DDP-516 computer is documented.
Towards Certification of a Space System Application of Fault Detection and Isolation
NASA Technical Reports Server (NTRS)
Feather, Martin S.; Markosian, Lawrence Z.
2008-01-01
Advanced fault detection, isolation and recovery (FDIR) software is being investigated at NASA as a means to the improve reliability and availability of its space systems. Certification is a critical step in the acceptance of such software. Its attainment hinges on performing the necessary verification and validation to show that the software will fulfill its requirements in the intended setting. Presented herein is our ongoing work to plan for the certification of a pilot application of advanced FDIR software in a NASA setting. We describe the application, and the key challenges and opportunities it offers for certification.
NASA Technical Reports Server (NTRS)
Lala, J. H.; Smith, T. B., III
1983-01-01
The experimental test and evaluation of the Fault-Tolerant Multiprocessor (FTMP) is described. Major objectives of this exercise include expanding validation envelope, building confidence in the system, revealing any weaknesses in the architectural concepts and in their execution in hardware and software, and in general, stressing the hardware and software. To this end, pin-level faults were injected into one LRU of the FTMP and the FTMP response was measured in terms of fault detection, isolation, and recovery times. A total of 21,055 stuck-at-0, stuck-at-1 and invert-signal faults were injected in the CPU, memory, bus interface circuits, Bus Guardian Units, and voters and error latches. Of these, 17,418 were detected. At least 80 percent of undetected faults are estimated to be on unused pins. The multiprocessor identified all detected faults correctly and recovered successfully in each case. Total recovery time for all faults averaged a little over one second. This can be reduced to half a second by including appropriate self-tests.
Fault Tolerant Software Technology for Distributed Computer Systems
1989-03-01
RAY.) &-TR-88-296 I Fin;.’ Technical Report ,r 19,39 i A28 3329 F’ULT TOLERANT SOFTWARE TECHNOLOGY FOR DISTRIBUTED COMPUTER SYSTEMS Georgia Institute...GrfisABN 34-70IiWftlI NO0. IN?3. NO IACCESSION NO. 158 21 7 11. TITLE (Incld security Cassification) FAULT TOLERANT SOFTWARE FOR DISTRIBUTED COMPUTER ...Technology for Distributed Computing Systems," a two year effort performed at Georgia Institute of Technology as part of the Clouds Project. The Clouds
Staged-Fault Testing of Distance Protection Relay Settings
NASA Astrophysics Data System (ADS)
Havelka, J.; Malarić, R.; Frlan, K.
2012-01-01
In order to analyze the operation of the protection system during induced fault testing in the Croatian power system, a simulation using the CAPE software has been performed. The CAPE software (Computer-Aided Protection Engineering) is expert software intended primarily for relay protection engineers, which calculates current and voltage values during faults in the power system, so that relay protection devices can be properly set up. Once the accuracy of the simulation model had been confirmed, a series of simulations were performed in order to obtain the optimal fault location to test the protection system. The simulation results were used to specify the test sequence definitions for the end-to-end relay testing using advanced testing equipment with GPS synchronization for secondary injection in protection schemes based on communication. The objective of the end-to-end testing was to perform field validation of the protection settings, including verification of the circuit breaker operation, telecommunication channel time and the effectiveness of the relay algorithms. Once the end-to-end secondary injection testing had been completed, the induced fault testing was performed with three-end lines loaded and in service. This paper describes and analyses the test procedure, consisting of CAPE simulations, end-to-end test with advanced secondary equipment and staged-fault test of a three-end power line in the Croatian transmission system.
Distributed asynchronous microprocessor architectures in fault tolerant integrated flight systems
NASA Technical Reports Server (NTRS)
Dunn, W. R.
1983-01-01
The paper discusses the implementation of fault tolerant digital flight control and navigation systems for rotorcraft application. It is shown that in implementing fault tolerance at the systems level using advanced LSI/VLSI technology, aircraft physical layout and flight systems requirements tend to define a system architecture of distributed, asynchronous microprocessors in which fault tolerance can be achieved locally through hardware redundancy and/or globally through application of analytical redundancy. The effects of asynchronism on the execution of dynamic flight software is discussed. It is shown that if the asynchronous microprocessors have knowledge of time, these errors can be significantly reduced through appropiate modifications of the flight software. Finally, the papear extends previous work to show that through the combined use of time referencing and stable flight algorithms, individual microprocessors can be configured to autonomously tolerate intermittent faults.
Revealing the ISO/IEC 9126-1 Clique Tree for COTS Software Evaluation
NASA Technical Reports Server (NTRS)
Morris, A. Terry
2007-01-01
Previous research has shown that acyclic dependency models, if they exist, can be extracted from software quality standards and that these models can be used to assess software safety and product quality. In the case of commercial off-the-shelf (COTS) software, the extracted dependency model can be used in a probabilistic Bayesian network context for COTS software evaluation. Furthermore, while experts typically employ Bayesian networks to encode domain knowledge, secondary structures (clique trees) from Bayesian network graphs can be used to determine the probabilistic distribution of any software variable (attribute) using any clique that contains that variable. Secondary structures, therefore, provide insight into the fundamental nature of graphical networks. This paper will apply secondary structure calculations to reveal the clique tree of the acyclic dependency model extracted from the ISO/IEC 9126-1 software quality standard. Suggestions will be provided to describe how the clique tree may be exploited to aid efficient transformation of an evaluation model.
NASA Astrophysics Data System (ADS)
LI, Y.; Yang, S. H.
2017-05-01
The Antarctica astronomical telescopes work chronically on the top of the unattended South Pole, and they have only one chance to maintain every year. Due to the complexity of the optical, mechanical, and electrical systems, the telescopes are hard to be maintained and need multi-tasker expedition teams, which means an excessive awareness is essential for the reliability of the Antarctica telescopes. Based on the fault mechanism and fault mode of the main-axis control system for the equatorial Antarctica astronomical telescope AST3-3 (Antarctic Schmidt Telescopes 3-3), the method of fault tree analysis is introduced in this article, and we obtains the importance degree of the top event from the importance degree of the bottom event structure. From the above results, the hidden problems and weak links can be effectively found out, which will indicate the direction for promoting the stability of the system and optimizing the design of the system.
Experiments in fault tolerant software reliability
NASA Technical Reports Server (NTRS)
Mcallister, David F.; Vouk, Mladen A.
1989-01-01
Twenty functionally equivalent programs were built and tested in a multiversion software experiment. Following unit testing, all programs were subjected to an extensive system test. In the process sixty-one distinct faults were identified among the versions. Less than 12 percent of the faults exhibited varying degrees of positive correlation. The common-cause (or similar) faults spanned as many as 14 components. However, a majority of these faults were trivial, and easily detected by proper unit and/or system testing. Only two of the seven similar faults were difficult faults, and both were caused by specification ambiguities. One of these faults exhibited variable identical-and-wrong response span, i.e. response span which varied with the testing conditions and input data. Techniques that could have been used to avoid the faults are discussed. For example, it was determined that back-to-back testing of 2-tuples could have been used to eliminate about 90 percent of the faults. In addition, four of the seven similar faults could have been detected by using back-to-back testing of 5-tuples. It is believed that most, if not all, similar faults could have been avoided had the specifications been written using more formal notation, the unit testing phase was subject to more stringent standards and controls, and better tools for measuring the quality and adequacy of the test data (e.g. coverage) were used.
Fault tree analysis of most common rolling bearing tribological failures
NASA Astrophysics Data System (ADS)
Vencl, Aleksandar; Gašić, Vlada; Stojanović, Blaža
2017-02-01
Wear as a tribological process has a major influence on the reliability and life of rolling bearings. Field examinations of bearing failures due to wear indicate possible causes and point to the necessary measurements for wear reduction or elimination. Wear itself is a very complex process initiated by the action of different mechanisms, and can be manifested by different wear types which are often related. However, the dominant type of wear can be approximately determined. The paper presents the classification of most common bearing damages according to the dominant wear type, i.e. abrasive wear, adhesive wear, surface fatigue wear, erosive wear, fretting wear and corrosive wear. The wear types are correlated with the terms used in ISO 15243 standard. Each wear type is illustrated with an appropriate photograph, and for each wear type, appropriate description of causes and manifestations is presented. Possible causes of rolling bearing failure are used for the fault tree analysis (FTA). It was performed to determine the root causes for bearing failures. The constructed fault tree diagram for rolling bearing failure can be useful tool for maintenance engineers.
Renjith, V R; Madhu, G; Nayagam, V Lakshmana Gomathi; Bhasi, A B
2010-11-15
The hazards associated with major accident hazard (MAH) industries are fire, explosion and toxic gas releases. Of these, toxic gas release is the worst as it has the potential to cause extensive fatalities. Qualitative and quantitative hazard analyses are essential for the identification and quantification of these hazards related to chemical industries. Fault tree analysis (FTA) is an established technique in hazard identification. This technique has the advantage of being both qualitative and quantitative, if the probabilities and frequencies of the basic events are known. This paper outlines the estimation of the probability of release of chlorine from storage and filling facility of chlor-alkali industry using FTA. An attempt has also been made to arrive at the probability of chlorine release using expert elicitation and proven fuzzy logic technique for Indian conditions. Sensitivity analysis has been done to evaluate the percentage contribution of each basic event that could lead to chlorine release. Two-dimensional fuzzy fault tree analysis (TDFFTA) has been proposed for balancing the hesitation factor involved in expert elicitation. Copyright © 2010 Elsevier B.V. All rights reserved.
An experimental investigation of fault tolerant software structures in an avionics application
NASA Technical Reports Server (NTRS)
Caglayan, Alper K.; Eckhardt, Dave E., Jr.
1989-01-01
The objective of this experimental investigation is to compare the functional performance and software reliability of competing fault tolerant software structures utilizing software diversity. In this experiment, three versions of the redundancy management software for a skewed sensor array have been developed using three diverse failure detection and isolation algorithms and incorporated into various N-version, recovery block and hybrid software structures. The empirical results show that, for maximum functional performance improvement in the selected application domain, the results of diverse algorithms should be voted before being processed by multiple versions without enforced diversity. Results also suggest that when the reliability gain with an N-version structure is modest, recovery block structures are more feasible since higher reliability can be obtained using an acceptance check with a modest reliability.
Using recurrence plot analysis for software execution interpretation and fault detection
NASA Astrophysics Data System (ADS)
Mosdorf, M.
2015-09-01
This paper shows a method targeted at software execution interpretation and fault detection using recurrence plot analysis. In in the proposed approach recurrence plot analysis is applied to software execution trace that contains executed assembly instructions. Results of this analysis are subject to further processing with PCA (Principal Component Analysis) method that simplifies number coefficients used for software execution classification. This method was used for the analysis of five algorithms: Bubble Sort, Quick Sort, Median Filter, FIR, SHA-1. Results show that some of the collected traces could be easily assigned to particular algorithms (logs from Bubble Sort and FIR algorithms) while others are more difficult to distinguish.
Slicken 1.0: Program for calculating the orientation of shear on reactivated faults
NASA Astrophysics Data System (ADS)
Xu, Hong; Xu, Shunshan; Nieto-Samaniego, Ángel F.; Alaniz-Álvarez, Susana A.
2017-07-01
The slip vector on a fault is an important parameter in the study of the movement history of a fault and its faulting mechanism. Although there exist many graphical programs to represent the shear stress (or slickenline) orientations on faults, programs to quantitatively calculate the orientation of fault slip based on a given stress field are scarce. In consequence, we develop Slicken 1.0, a software to rapidly calculate the orientation of maximum shear stress on any fault plane. For this direct method of calculating the resolved shear stress on a planar surface, the input data are the unit vector normal to the involved plane, the unit vectors of the three principal stress axes, and the stress ratio. The advantage of this program is that the vertical or horizontal principal stresses are not necessarily required. Due to its nimble design using Java SE 8.0, it runs on most operating systems with the corresponding Java VM. The software program will be practical for geoscience students, geologists and engineers and will help resolve a deficiency in field geology, and structural and engineering geology.
Protecting Against Faults in JPL Spacecraft
NASA Technical Reports Server (NTRS)
Morgan, Paula
2007-01-01
A paper discusses techniques for protecting against faults in spacecraft designed and operated by NASA s Jet Propulsion Laboratory (JPL). The paper addresses, more specifically, fault-protection requirements and techniques common to most JPL spacecraft (in contradistinction to unique, mission specific techniques), standard practices in the implementation of these techniques, and fault-protection software architectures. Common requirements include those to protect onboard command, data-processing, and control computers; protect against loss of Earth/spacecraft radio communication; maintain safe temperatures; and recover from power overloads. The paper describes fault-protection techniques as part of a fault-management strategy that also includes functional redundancy, redundant hardware, and autonomous monitoring of (1) the operational and health statuses of spacecraft components, (2) temperatures inside and outside the spacecraft, and (3) allocation of power. The strategy also provides for preprogrammed automated responses to anomalous conditions. In addition, the software running in almost every JPL spacecraft incorporates a general-purpose "Safe Mode" response algorithm that configures the spacecraft in a lower-power state that is safe and predictable, thereby facilitating diagnosis of more complex faults by a team of human experts on Earth.
Survey of critical failure events in on-chip interconnect by fault tree analysis
NASA Astrophysics Data System (ADS)
Yokogawa, Shinji; Kunii, Kyousuke
2018-07-01
In this paper, a framework based on reliability physics is proposed for adopting fault tree analysis (FTA) to the on-chip interconnect system of a semiconductor. By integrating expert knowledge and experience regarding the possibilities of failure on basic events, critical issues of on-chip interconnect reliability will be evaluated by FTA. In particular, FTA is used to identify the minimal cut sets with high risk priority. Critical events affecting the on-chip interconnect reliability are identified and discussed from the viewpoint of long-term reliability assessment. The moisture impact is evaluated as an external event.
A theoretical basis for the analysis of redundant software subject to coincident errors
NASA Technical Reports Server (NTRS)
Eckhardt, D. E., Jr.; Lee, L. D.
1985-01-01
Fundamental to the development of redundant software techniques fault-tolerant software, is an understanding of the impact of multiple-joint occurrences of coincident errors. A theoretical basis for the study of redundant software is developed which provides a probabilistic framework for empirically evaluating the effectiveness of the general (N-Version) strategy when component versions are subject to coincident errors, and permits an analytical study of the effects of these errors. The basic assumptions of the model are: (1) independently designed software components are chosen in a random sample; and (2) in the user environment, the system is required to execute on a stationary input series. The intensity of coincident errors, has a central role in the model. This function describes the propensity to introduce design faults in such a way that software components fail together when executing in the user environment. The model is used to give conditions under which an N-Version system is a better strategy for reducing system failure probability than relying on a single version of software. A condition which limits the effectiveness of a fault-tolerant strategy is studied, and it is posted whether system failure probability varies monotonically with increasing N or whether an optimal choice of N exists.
Expert systems applied to fault isolation and energy storage management, phase 2
NASA Technical Reports Server (NTRS)
1987-01-01
A user's guide for the Fault Isolation and Energy Storage (FIES) II system is provided. Included are a brief discussion of the background and scope of this project, a discussion of basic and advanced operating installation and problem determination procedures for the FIES II system and information on hardware and software design and implementation. A number of appendices are provided including a detailed specification for the microprocessor software, a detailed description of the expert system rule base and a description and listings of the LISP interface software.
Software fault tolerance using data diversity
NASA Technical Reports Server (NTRS)
Knight, John C.
1991-01-01
Research on data diversity is discussed. Data diversity relies on a different form of redundancy from existing approaches to software fault tolerance and is substantially less expensive to implement. Data diversity can also be applied to software testing and greatly facilitates the automation of testing. Up to now it has been explored both theoretically and in a pilot study, and has been shown to be a promising technique. The effectiveness of data diversity as an error detection mechanism and the application of data diversity to differential equation solvers are discussed.
An empirical study of flight control software reliability
NASA Technical Reports Server (NTRS)
Dunham, J. R.; Pierce, J. L.
1986-01-01
The results of a laboratory experiment in flight control software reliability are reported. The experiment tests a small sample of implementations of a pitch axis control law for a PA28 aircraft with over 14 million pitch commands with varying levels of additive input and feedback noise. The testing which uses the method of n-version programming for error detection surfaced four software faults in one implementation of the control law. The small number of detected faults precluded the conduct of the error burst analyses. The pitch axis problem provides data for use in constructing a model in the prediction of the reliability of software in systems with feedback. The study is undertaken to find means to perform reliability evaluations of flight control software.
Sun, Weifang; Yao, Bin; Zeng, Nianyin; Chen, Binqiang; He, Yuchao; Cao, Xincheng; He, Wangpeng
2017-07-12
As a typical example of large and complex mechanical systems, rotating machinery is prone to diversified sorts of mechanical faults. Among these faults, one of the prominent causes of malfunction is generated in gear transmission chains. Although they can be collected via vibration signals, the fault signatures are always submerged in overwhelming interfering contents. Therefore, identifying the critical fault's characteristic signal is far from an easy task. In order to improve the recognition accuracy of a fault's characteristic signal, a novel intelligent fault diagnosis method is presented. In this method, a dual-tree complex wavelet transform (DTCWT) is employed to acquire the multiscale signal's features. In addition, a convolutional neural network (CNN) approach is utilized to automatically recognise a fault feature from the multiscale signal features. The experiment results of the recognition for gear faults show the feasibility and effectiveness of the proposed method, especially in the gear's weak fault features.
NASA Technical Reports Server (NTRS)
Quinn, Todd M.; Walters, Jerry L.
1991-01-01
Future space explorations will require long term human presence in space. Space environments that provide working and living quarters for manned missions are becoming increasingly larger and more sophisticated. Monitor and control of the space environment subsystems by expert system software, which emulate human reasoning processes, could maintain the health of the subsystems and help reduce the human workload. The autonomous power expert (APEX) system was developed to emulate a human expert's reasoning processes used to diagnose fault conditions in the domain of space power distribution. APEX is a fault detection, isolation, and recovery (FDIR) system, capable of autonomous monitoring and control of the power distribution system. APEX consists of a knowledge base, a data base, an inference engine, and various support and interface software. APEX provides the user with an easy-to-use interactive interface. When a fault is detected, APEX will inform the user of the detection. The user can direct APEX to isolate the probable cause of the fault. Once a fault has been isolated, the user can ask APEX to justify its fault isolation and to recommend actions to correct the fault. APEX implementation and capabilities are discussed.
Fault Management Architectures and the Challenges of Providing Software Assurance
NASA Technical Reports Server (NTRS)
Savarino, Shirley; Fitz, Rhonda; Fesq, Lorraine; Whitman, Gerek
2015-01-01
Fault Management (FM) is focused on safety, the preservation of assets, and maintaining the desired functionality of the system. How FM is implemented varies among missions. Common to most missions is system complexity due to a need to establish a multi-dimensional structure across hardware, software and spacecraft operations. FM is necessary to identify and respond to system faults, mitigate technical risks and ensure operational continuity. Generally, FM architecture, implementation, and software assurance efforts increase with mission complexity. Because FM is a systems engineering discipline with a distributed implementation, providing efficient and effective verification and validation (V&V) is challenging. A breakout session at the 2012 NASA Independent Verification & Validation (IV&V) Annual Workshop titled "V&V of Fault Management: Challenges and Successes" exposed this issue in terms of V&V for a representative set of architectures. NASA's Software Assurance Research Program (SARP) has provided funds to NASA IV&V to extend the work performed at the Workshop session in partnership with NASA's Jet Propulsion Laboratory (JPL). NASA IV&V will extract FM architectures across the IV&V portfolio and evaluate the data set, assess visibility for validation and test, and define software assurance methods that could be applied to the various architectures and designs. This SARP initiative focuses efforts on FM architectures from critical and complex projects within NASA. The identification of particular FM architectures and associated V&V/IV&V techniques provides a data set that can enable improved assurance that a system will adequately detect and respond to adverse conditions. Ultimately, results from this activity will be incorporated into the NASA Fault Management Handbook providing dissemination across NASA, other agencies and the space community. This paper discusses the approach taken to perform the evaluations and preliminary findings from the research.
Fault Management Architectures and the Challenges of Providing Software Assurance
NASA Technical Reports Server (NTRS)
Savarino, Shirley; Fitz, Rhonda; Fesq, Lorraine; Whitman, Gerek
2015-01-01
The satellite systems Fault Management (FM) is focused on safety, the preservation of assets, and maintaining the desired functionality of the system. How FM is implemented varies among missions. Common to most is system complexity due to a need to establish a multi-dimensional structure across hardware, software and operations. This structure is necessary to identify and respond to system faults, mitigate technical risks and ensure operational continuity. These architecture, implementation and software assurance efforts increase with mission complexity. Because FM is a systems engineering discipline with a distributed implementation, providing efficient and effective verification and validation (VV) is challenging. A breakout session at the 2012 NASA Independent Verification Validation (IVV) Annual Workshop titled VV of Fault Management: Challenges and Successes exposed these issues in terms of VV for a representative set of architectures. NASA's IVV is funded by NASA's Software Assurance Research Program (SARP) in partnership with NASA's Jet Propulsion Laboratory (JPL) to extend the work performed at the Workshop session. NASA IVV will extract FM architectures across the IVV portfolio and evaluate the data set for robustness, assess visibility for validation and test, and define software assurance methods that could be applied to the various architectures and designs. This work focuses efforts on FM architectures from critical and complex projects within NASA. The identification of particular FM architectures, visibility, and associated VVIVV techniques provides a data set that can enable higher assurance that a satellite system will adequately detect and respond to adverse conditions. Ultimately, results from this activity will be incorporated into the NASA Fault Management Handbook providing dissemination across NASA, other agencies and the satellite community. This paper discusses the approach taken to perform the evaluations and preliminary findings from the research including identification of FM architectures, visibility observations, and methods utilized for VVIVV.
NASA Technical Reports Server (NTRS)
Al Hassan, Mohammad; Novack, Steven D.; Hatfield, Glen S.; Britton, Paul
2017-01-01
Today's launch vehicles complex electronic and avionic systems heavily utilize the Field Programmable Gate Array (FPGA) integrated circuit (IC). FPGAs are prevalent ICs in communication protocols such as MIL-STD-1553B, and in control signal commands such as in solenoid/servo valves actuations. This paper will demonstrate guidelines to estimate FPGA failure rates for a launch vehicle, the guidelines will account for hardware, firmware, and radiation induced failures. The hardware contribution of the approach accounts for physical failures of the IC, FPGA memory and clock. The firmware portion will provide guidelines on the high level FPGA programming language and ways to account for software/code reliability growth. The radiation portion will provide guidelines on environment susceptibility as well as guidelines on tailoring other launch vehicle programs historical data to a specific launch vehicle.
NASA Technical Reports Server (NTRS)
Lala, J. H.; Smith, T. B., III
1983-01-01
The software developed for the Fault-Tolerant Multiprocessor (FTMP) is described. The FTMP executive is a timer-interrupt driven dispatcher that schedules iterative tasks which run at 3.125, 12.5, and 25 Hz. Major tasks which run under the executive include system configuration control, flight control, and display. The flight control task includes autopilot and autoland functions for a jet transport aircraft. System Displays include status displays of all hardware elements (processors, memories, I/O ports, buses), failure log displays showing transient and hard faults, and an autopilot display. All software is in a higher order language (AED, an ALGOL derivative). The executive is a fully distributed general purpose executive which automatically balances the load among available processor triads. Provisions for graceful performance degradation under processing overload are an integral part of the scheduling algorithms.
Software for determining the true displacement of faults
NASA Astrophysics Data System (ADS)
Nieto-Fuentes, R.; Nieto-Samaniego, Á. F.; Xu, S.-S.; Alaniz-Álvarez, S. A.
2014-03-01
One of the most important parameters of faults is the true (or net) displacement, which is measured by restoring two originally adjacent points, called “piercing points”, to their original positions. This measurement is not typically applicable because it is rare to observe piercing points in natural outcrops. Much more common is the measurement of the apparent displacement of a marker. Methods to calculate the true displacement of faults using descriptive geometry, trigonometry or vector algebra are common in the literature, and most of them solve a specific situation from a large amount of possible combinations of the fault parameters. True displacements are not routinely calculated because it is a tedious and tiring task, despite their importance and the relatively simple methodology. We believe that the solution is to develop software capable of performing this work. In a previous publication, our research group proposed a method to calculate the true displacement of faults by solving most combinations of fault parameters using simple trigonometric equations. The purpose of this contribution is to present a computer program for calculating the true displacement of faults. The input data are the dip of the fault; the pitch angles of the markers, slickenlines and observation lines; and the marker separation. To prevent the common difficulties involved in switching between operative systems, the software is developed using the Java programing language. The computer program could be used as a tool in education and will also be useful for the calculation of the true fault displacement in geological and engineering works. The application resolves the cases with known direction of net slip, which commonly is assumed parallel to the slickenlines. This assumption is not always valid and must be used with caution, because the slickenlines are formed during a step of the incremental displacement on the fault surface, whereas the net slip is related to the finite slip.
Determining preventability of pediatric readmissions using fault tree analysis.
Jonas, Jennifer A; Devon, Erin Pete; Ronan, Jeanine C; Ng, Sonia C; Owusu-McKenzie, Jacqueline Y; Strausbaugh, Janet T; Fieldston, Evan S; Hart, Jessica K
2016-05-01
Previous studies attempting to distinguish preventable from nonpreventable readmissions reported challenges in completing reviews efficiently and consistently. (1) Examine the efficiency and reliability of a Web-based fault tree tool designed to guide physicians through chart reviews to a determination about preventability. (2) Investigate root causes of general pediatrics readmissions and identify the percent that are preventable. General pediatricians from The Children's Hospital of Philadelphia used a Web-based fault tree tool to classify root causes of all general pediatrics 15-day readmissions in 2014. The tool guided reviewers through a logical progression of questions, which resulted in 1 of 18 root causes of readmission, 8 of which were considered potentially preventable. Twenty percent of cases were cross-checked to measure inter-rater reliability. Of the 7252 discharges, 248 were readmitted, for an all-cause general pediatrics 15-day readmission rate of 3.4%. Of those readmissions, 15 (6.0%) were deemed potentially preventable, corresponding to 0.2% of total discharges. The most common cause of potentially preventable readmissions was premature discharge. For the 50 cross-checked cases, both reviews resulted in the same root cause for 44 (86%) of files (κ = 0.79; 95% confidence interval: 0.60-0.98). Completing 1 review using the tool took approximately 20 minutes. The Web-based fault tree tool helped physicians to identify root causes of hospital readmissions and classify them as either preventable or not preventable in an efficient and consistent way. It also confirmed that only a small percentage of general pediatrics 15-day readmissions are potentially preventable. Journal of Hospital Medicine 2016;11:329-335. © 2016 Society of Hospital Medicine. © 2016 Society of Hospital Medicine.
Katzman, Braden; Tang, Doris; Santella, Anthony; Bao, Zhirong
2018-04-04
AceTree, a software application first released in 2006, facilitates exploration, curation and editing of tracked C. elegans nuclei in 4-dimensional (4D) fluorescence microscopy datasets. Since its initial release, AceTree has been continuously used to interact with, edit and interpret C. elegans lineage data. In its 11 year lifetime, AceTree has been periodically updated to meet the technical and research demands of its community of users. This paper presents the newest iteration of AceTree which contains extensive updates, demonstrates the new applicability of AceTree in other developmental contexts, and presents its evolutionary software development paradigm as a viable model for maintaining scientific software. Large scale updates have been made to the user interface for an improved user experience. Tools have been grouped according to functionality and obsolete methods have been removed. Internal requirements have been changed that enable greater flexibility of use both in C. elegans contexts and in other model organisms. Additionally, the original 3-dimensional (3D) viewing window has been completely reimplemented. The new window provides a new suite of tools for data exploration. By responding to technical advancements and research demands, AceTree has remained a useful tool for scientific research for over a decade. The updates made to the codebase have extended AceTree's applicability beyond its initial use in C. elegans and enabled its usage with other model organisms. The evolution of AceTree demonstrates a viable model for maintaining scientific software over long periods of time.
Risk-Significant Adverse Condition Awareness Strengthens Assurance of Fault Management Systems
NASA Technical Reports Server (NTRS)
Fitz, Rhonda
2017-01-01
As spaceflight systems increase in complexity, Fault Management (FM) systems are ranked high in risk-based assessment of software criticality, emphasizing the importance of establishing highly competent domain expertise to provide assurance. Adverse conditions (ACs) and specific vulnerabilities encountered by safety- and mission-critical software systems have been identified through efforts to reduce the risk posture of software-intensive NASA missions. Acknowledgement of potential off-nominal conditions and analysis to determine software system resiliency are important aspects of hazard analysis and FM. A key component of assuring FM is an assessment of how well software addresses susceptibility to failure through consideration of ACs. Focus on significant risk predicted through experienced analysis conducted at the NASA Independent Verification & Validation (IV&V) Program enables the scoping of effective assurance strategies with regard to overall asset protection of complex spaceflight as well as ground systems. Research efforts sponsored by NASAs Office of Safety and Mission Assurance (OSMA) defined terminology, categorized data fields, and designed a baseline repository that centralizes and compiles a comprehensive listing of ACs and correlated data relevant across many NASA missions. This prototype tool helps projects improve analysis by tracking ACs and allowing queries based on project, mission type, domain/component, causal fault, and other key characteristics. Vulnerability in off-nominal situations, architectural design weaknesses, and unexpected or undesirable system behaviors in reaction to faults are curtailed with the awareness of ACs and risk-significant scenarios modeled for analysts through this database. Integration within the Enterprise Architecture at NASA IV&V enables interfacing with other tools and datasets, technical support, and accessibility across the Agency. This paper discusses the development of an improved workflow process utilizing this database for adaptive, risk-informed FM assurance that critical software systems will safely and securely protect against faults and respond to ACs in order to achieve successful missions.
Risk-Significant Adverse Condition Awareness Strengthens Assurance of Fault Management Systems
NASA Technical Reports Server (NTRS)
Fitz, Rhonda
2017-01-01
As spaceflight systems increase in complexity, Fault Management (FM) systems are ranked high in risk-based assessment of software criticality, emphasizing the importance of establishing highly competent domain expertise to provide assurance. Adverse conditions (ACs) and specific vulnerabilities encountered by safety- and mission-critical software systems have been identified through efforts to reduce the risk posture of software-intensive NASA missions. Acknowledgement of potential off-nominal conditions and analysis to determine software system resiliency are important aspects of hazard analysis and FM. A key component of assuring FM is an assessment of how well software addresses susceptibility to failure through consideration of ACs. Focus on significant risk predicted through experienced analysis conducted at the NASA Independent Verification Validation (IVV) Program enables the scoping of effective assurance strategies with regard to overall asset protection of complex spaceflight as well as ground systems. Research efforts sponsored by NASA's Office of Safety and Mission Assurance defined terminology, categorized data fields, and designed a baseline repository that centralizes and compiles a comprehensive listing of ACs and correlated data relevant across many NASA missions. This prototype tool helps projects improve analysis by tracking ACs and allowing queries based on project, mission type, domaincomponent, causal fault, and other key characteristics. Vulnerability in off-nominal situations, architectural design weaknesses, and unexpected or undesirable system behaviors in reaction to faults are curtailed with the awareness of ACs and risk-significant scenarios modeled for analysts through this database. Integration within the Enterprise Architecture at NASA IVV enables interfacing with other tools and datasets, technical support, and accessibility across the Agency. This paper discusses the development of an improved workflow process utilizing this database for adaptive, risk-informed FM assurance that critical software systems will safely and securely protect against faults and respond to ACs in order to achieve successful missions.
Risk Analysis of Return Support Material on Gas Compressor Platform Project
NASA Astrophysics Data System (ADS)
Silvianita; Aulia, B. U.; Khakim, M. L. N.; Rosyid, Daniel M.
2017-07-01
On a fixed platforms project are not only carried out by a contractor, but two or more contractors. Cooperation in the construction of fixed platforms is often not according to plan, it is caused by several factors. It takes a good synergy between the contractor to avoid miss communication may cause problems on the project. For the example is about support material (sea fastening, skid shoe and shipping support) used in the process of sending a jacket structure to operation place often does not return to the contractor. It needs a systematic method to overcome the problem of support material. This paper analyses the causes and effects of GAS Compressor Platform that support material is not return, using Fault Tree Analysis (FTA) and Event Tree Analysis (ETA). From fault tree analysis, the probability of top event is 0.7783. From event tree analysis diagram, the contractors lose Rp.350.000.000, - to Rp.10.000.000.000, -.
Mines Systems Safety Improvement Using an Integrated Event Tree and Fault Tree Analysis
NASA Astrophysics Data System (ADS)
Kumar, Ranjan; Ghosh, Achyuta Krishna
2017-04-01
Mines systems such as ventilation system, strata support system, flame proof safety equipment, are exposed to dynamic operational conditions such as stress, humidity, dust, temperature, etc., and safety improvement of such systems can be done preferably during planning and design stage. However, the existing safety analysis methods do not handle the accident initiation and progression of mine systems explicitly. To bridge this gap, this paper presents an integrated Event Tree (ET) and Fault Tree (FT) approach for safety analysis and improvement of mine systems design. This approach includes ET and FT modeling coupled with redundancy allocation technique. In this method, a concept of top hazard probability is introduced for identifying system failure probability and redundancy is allocated to the system either at component or system level. A case study on mine methane explosion safety with two initiating events is performed. The results demonstrate that the presented method can reveal the accident scenarios and improve the safety of complex mine systems simultaneously.
NASA Astrophysics Data System (ADS)
Li, Yongbo; Li, Guoyan; Yang, Yuantao; Liang, Xihui; Xu, Minqiang
2018-05-01
The fault diagnosis of planetary gearboxes is crucial to reduce the maintenance costs and economic losses. This paper proposes a novel fault diagnosis method based on adaptive multi-scale morphological filter (AMMF) and modified hierarchical permutation entropy (MHPE) to identify the different health conditions of planetary gearboxes. In this method, AMMF is firstly adopted to remove the fault-unrelated components and enhance the fault characteristics. Second, MHPE is utilized to extract the fault features from the denoised vibration signals. Third, Laplacian score (LS) approach is employed to refine the fault features. In the end, the obtained features are fed into the binary tree support vector machine (BT-SVM) to accomplish the fault pattern identification. The proposed method is numerically and experimentally demonstrated to be able to recognize the different fault categories of planetary gearboxes.
The Local Wind Pump for Marginal Societies in Indonesia: A Perspective of Fault Tree Analysis
NASA Astrophysics Data System (ADS)
Gunawan, Insan; Taufik, Ahmad
2007-10-01
There are many efforts to reduce a cost of investment of well established hybrid wind pump applied to rural areas. A recent study on a local wind pump (LWP) for marginal societies in Indonesia (traditional farmers, peasant and tribes) was one of the efforts reporting a new application area. The objectives of the study were defined to measure reliability value of the LWP due to fluctuated wind intensity, low wind speed, economic point of view regarding a prolong economic crisis occurring and an available local component of the LWP and to sustain economics productivity (agriculture product) of the society. In the study, a fault tree analysis (FTA) was deployed as one of three methods used for assessing the LWP. In this article, the FTA has been thoroughly discussed in order to improve a better performance of the LWP applied in dry land watering system of Mesuji district of Lampung province-Indonesia. In the early stage, all of local component of the LWP was classified in term of its function. There were four groups of the components. Moreover, all of the sub components of each group were subjected to failure modes of the FTA, namely (1) primary failure modes; (2) secondary failure modes and (3) common failure modes. In the data processing stage, an available software package, ITEM was deployed. It was observed that the component indicated obtaining relative a long life duration of operational life cycle in 1,666 hours. Moreover, to enhance high performance the LWP, maintenance schedule, critical sub component suffering from failure and an overhaul priority have been identified in term of quantity values. Throughout a year pilot project, it can be concluded that the LWP is a reliable product to the societies enhancing their economics productivities.
2013-05-01
specifics of the correlation will be explored followed by discussion of new paradigms— the ordered event list (OEL) and the decision tree — that result from...4.2.1 Brief Overview of the Decision Tree Paradigm ................................................15 4.2.2 OEL Explained...6 Figure 3. A depiction of a notional fault/activation tree . ................................................................7
The Dangers of Failure Masking in Fault-Tolerant Software: Aspects of a Recent In-Flight Upset Event
NASA Technical Reports Server (NTRS)
Johnson, C. W.; Holloway, C. M.
2007-01-01
On 1 August 2005, a Boeing Company 777-200 aircraft, operating on an international passenger flight from Australia to Malaysia, was involved in a significant upset event while flying on autopilot. The Australian Transport Safety Bureau's investigation into the event discovered that an anomaly existed in the component software hierarchy that allowed inputs from a known faulty accelerometer to be processed by the air data inertial reference unit (ADIRU) and used by the primary flight computer, autopilot and other aircraft systems. This anomaly had existed in original ADIRU software, and had not been detected in the testing and certification process for the unit. This paper describes the software aspects of the incident in detail, and suggests possible implications concerning complex, safety-critical, fault-tolerant software.
Monitoring of Microseismicity with ArrayTechniques in the Peach Tree Valley Region
NASA Astrophysics Data System (ADS)
Garcia-Reyes, J. L.; Clayton, R. W.
2016-12-01
This study is focused on the analysis of microseismicity along the San Andreas Fault in the PeachTree Valley region. This zone is part of the transition zone between the locked portion to the south (Parkfield, CA) and the creeping section to the north (Jovilet, et al., JGR, 2014). The data for the study comes from a 2-week deployment of 116 Zland nodes in a cross-shaped configuration along (8.2 km) and across (9 km) the Fault. We analyze the distribution of microseismicity using a 3D backprojection technique, and we explore the use of Hidden Markov Models to identify different patterns of microseismicity (Hammer et al., GJI, 2013). The goal of the study is to relate the style of seismicity to the mechanical state of the Fault. The results show the evolution of seismic activity as well as at least two different patterns of seismic signals.
[Impact of water pollution risk in water transfer project based on fault tree analysis].
Liu, Jian-Chang; Zhang, Wei; Wang, Li-Min; Li, Dai-Qing; Fan, Xiu-Ying; Deng, Hong-Bing
2009-09-15
The methods to assess water pollution risk for medium water transfer are gradually being explored. The event-nature-proportion method was developed to evaluate the probability of the single event. Fault tree analysis on the basis of calculation on single event was employed to evaluate the extent of whole water pollution risk for the channel water body. The result indicates, that the risk of pollutants from towns and villages along the line of water transfer project to the channel water body is at high level with the probability of 0.373, which will increase pollution to the channel water body at the rate of 64.53 mg/L COD, 4.57 mg/L NH4(+) -N and 0.066 mg/L volatilization hydroxybenzene, respectively. The measurement of fault probability on the basis of proportion method is proved to be useful in assessing water pollution risk under much uncertainty.
Viewpoint on ISA TR84.0.02--simplified methods and fault tree analysis.
Summers, A E
2000-01-01
ANSI/ISA-S84.01-1996 and IEC 61508 require the establishment of a safety integrity level for any safety instrumented system or safety related system used to mitigate risk. Each stage of design, operation, maintenance, and testing is judged against this safety integrity level. Quantitative techniques can be used to verify whether the safety integrity level is met. ISA-dTR84.0.02 is a technical report under development by ISA, which discusses how to apply quantitative analysis techniques to safety instrumented systems. This paper discusses two of those techniques: (1) Simplified equations and (2) Fault tree analysis.
TH-EF-BRC-03: Fault Tree Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thomadsen, B.
2016-06-15
This Hands-on Workshop will be focused on providing participants with experience with the principal tools of TG 100 and hence start to build both competence and confidence in the use of risk-based quality management techniques. The three principal tools forming the basis of TG 100’s risk analysis: Process mapping, Failure-Modes and Effects Analysis and fault-tree analysis will be introduced with a 5 minute refresher presentation and each presentation will be followed by a 30 minute small group exercise. An exercise on developing QM from the risk analysis follows. During the exercise periods, participants will apply the principles in 2 differentmore » clinical scenarios. At the conclusion of each exercise there will be ample time for participants to discuss with each other and the faculty their experience and any challenges encountered. Learning Objectives: To review the principles of Process Mapping, Failure Modes and Effects Analysis and Fault Tree Analysis. To gain familiarity with these three techniques in a small group setting. To share and discuss experiences with the three techniques with faculty and participants. Director, TreatSafely, LLC. Director, Center for the Assessment of Radiological Sciences. Occasional Consultant to the IAEA and Varian.« less
Estimating earthquake-induced failure probability and downtime of critical facilities.
Porter, Keith; Ramer, Kyle
2012-01-01
Fault trees have long been used to estimate failure risk in earthquakes, especially for nuclear power plants (NPPs). One interesting application is that one can assess and manage the probability that two facilities - a primary and backup - would be simultaneously rendered inoperative in a single earthquake. Another is that one can calculate the probabilistic time required to restore a facility to functionality, and the probability that, during any given planning period, the facility would be rendered inoperative for any specified duration. A large new peer-reviewed library of component damageability and repair-time data for the first time enables fault trees to be used to calculate the seismic risk of operational failure and downtime for a wide variety of buildings other than NPPs. With the new library, seismic risk of both the failure probability and probabilistic downtime can be assessed and managed, considering the facility's unique combination of structural and non-structural components, their seismic installation conditions, and the other systems on which the facility relies. An example is offered of real computer data centres operated by a California utility. The fault trees were created and tested in collaboration with utility operators, and the failure probability and downtime results validated in several ways.
Huang, Weiqing; Fan, Hongbo; Qiu, Yongfu; Cheng, Zhiyu; Qian, Yu
2016-02-15
Haze weather has become a serious environmental pollution problem which occurs in many Chinese cities. One of the most critical factors for the formation of haze weather is the exhausts of coal combustion, thus it is meaningful to figure out the causation mechanism between urban haze and the exhausts of coal combustion. Based on above considerations, the fault tree analysis (FAT) approach was employed for the causation mechanism of urban haze in Beijing by considering the risk events related with the exhausts of coal combustion for the first time. Using this approach, firstly the fault tree of the urban haze causation system connecting with coal combustion exhausts was established; consequently the risk events were discussed and identified; then, the minimal cut sets were successfully determined using Boolean algebra; finally, the structure, probability and critical importance degree analysis of the risk events were completed for the qualitative and quantitative assessment. The study results proved that the FTA was an effective and simple tool for the causation mechanism analysis and risk management of urban haze in China. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Mulyana, Cukup; Muhammad, Fajar; Saad, Aswad H.; Mariah, Riveli, Nowo
2017-03-01
Storage tank component is the most critical component in LNG regasification terminal. It has the risk of failure and accident which impacts to human health and environment. Risk assessment is conducted to detect and reduce the risk of failure in storage tank. The aim of this research is determining and calculating the probability of failure in regasification unit of LNG. In this case, the failure is caused by Boiling Liquid Expanding Vapor Explosion (BLEVE) and jet fire in LNG storage tank component. The failure probability can be determined by using Fault Tree Analysis (FTA). Besides that, the impact of heat radiation which is generated is calculated. Fault tree for BLEVE and jet fire on storage tank component has been determined and obtained with the value of failure probability for BLEVE of 5.63 × 10-19 and for jet fire of 9.57 × 10-3. The value of failure probability for jet fire is high enough and need to be reduced by customizing PID scheme of regasification LNG unit in pipeline number 1312 and unit 1. The value of failure probability after customization has been obtained of 4.22 × 10-6.
Runtime Speculative Software-Only Fault Tolerance
2012-06-01
reliability of RSFT, a in-depth analysis on its window of vulnerability is also discussed and measured via simulated fault injection. The performance...propagation of faults through the entire program. For optimal performance, these techniques have to use herotic alias analysis to find the minimum set of...affect program output. No program source code or alias analysis is needed to analyze the fault propagation ahead of time. 2.3 Limitations of Existing
An implementation and performance measurement of the progressive retry technique
NASA Technical Reports Server (NTRS)
Suri, Gaurav; Huang, Yennun; Wang, Yi-Min; Fuchs, W. Kent; Kintala, Chandra
1995-01-01
This paper describes a recovery technique called progressive retry for bypassing software faults in message-passing applications. The technique is implemented as reusable modules to provide application-level software fault tolerance. The paper describes the implementation of the technique and presents results from the application of progressive retry to two telecommunications systems. the results presented show that the technique is helpful in reducing the total recovery time for message-passing applications.
An Empirical Approach to Logical Clustering of Software Failure Regions
1994-03-01
this is a coincidence or normal behavior of failure regions. " Software faults were numbered in order as they were discovered, by the various testing...locations of the associated faults. The goal of this research will be an improved testing technique that incorporates failure region behavior . To do this...clustering behavior . This, however, does not correlate with the structural clustering of failure regions observed by Ginn (1991) on the same set of data
Static and Dynamic Verification of Critical Software for Space Applications
NASA Astrophysics Data System (ADS)
Moreira, F.; Maia, R.; Costa, D.; Duro, N.; Rodríguez-Dapena, P.; Hjortnaes, K.
Space technology is no longer used only for much specialised research activities or for sophisticated manned space missions. Modern society relies more and more on space technology and applications for every day activities. Worldwide telecommunications, Earth observation, navigation and remote sensing are only a few examples of space applications on which we rely daily. The European driven global navigation system Galileo and its associated applications, e.g. air traffic management, vessel and car navigation, will significantly expand the already stringent safety requirements for space based applications Apart from their usefulness and practical applications, every single piece of onboard software deployed into the space represents an enormous investment. With a long lifetime operation and being extremely difficult to maintain and upgrade, at least when comparing with "mainstream" software development, the importance of ensuring their correctness before deployment is immense. Verification &Validation techniques and technologies have a key role in ensuring that the onboard software is correct and error free, or at least free from errors that can potentially lead to catastrophic failures. Many RAMS techniques including both static criticality analysis and dynamic verification techniques have been used as a means to verify and validate critical software and to ensure its correctness. But, traditionally, these have been isolated applied. One of the main reasons is the immaturity of this field in what concerns to its application to the increasing software product(s) within space systems. This paper presents an innovative way of combining both static and dynamic techniques exploiting their synergy and complementarity for software fault removal. The methodology proposed is based on the combination of Software FMEA and FTA with Fault-injection techniques. The case study herein described is implemented with support from two tools: The SoftCare tool for the SFMEA and SFTA, and the Xception tool for fault-injection. Keywords: Verification &Validation, RAMS, Onboard software, SFMEA, STA, Fault-injection 1 This work is being performed under the project STADY Applied Static And Dynamic Verification Of Critical Software, ESA/ESTEC Contract Nr. 15751/02/NL/LvH.
Maintaining the Health of Software Monitors
NASA Technical Reports Server (NTRS)
Person, Suzette; Rungta, Neha
2013-01-01
Software health management (SWHM) techniques complement the rigorous verification and validation processes that are applied to safety-critical systems prior to their deployment. These techniques are used to monitor deployed software in its execution environment, serving as the last line of defense against the effects of a critical fault. SWHM monitors use information from the specification and implementation of the monitored software to detect violations, predict possible failures, and help the system recover from faults. Changes to the monitored software, such as adding new functionality or fixing defects, therefore, have the potential to impact the correctness of both the monitored software and the SWHM monitor. In this work, we describe how the results of a software change impact analysis technique, Directed Incremental Symbolic Execution (DiSE), can be applied to monitored software to identify the potential impact of the changes on the SWHM monitor software. The results of DiSE can then be used by other analysis techniques, e.g., testing, debugging, to help preserve and improve the integrity of the SWHM monitor as the monitored software evolves.
NASA Astrophysics Data System (ADS)
Li, Shuanghong; Cao, Hongliang; Yang, Yupu
2018-02-01
Fault diagnosis is a key process for the reliability and safety of solid oxide fuel cell (SOFC) systems. However, it is difficult to rapidly and accurately identify faults for complicated SOFC systems, especially when simultaneous faults appear. In this research, a data-driven Multi-Label (ML) pattern identification approach is proposed to address the simultaneous fault diagnosis of SOFC systems. The framework of the simultaneous-fault diagnosis primarily includes two components: feature extraction and ML-SVM classifier. The simultaneous-fault diagnosis approach can be trained to diagnose simultaneous SOFC faults, such as fuel leakage, air leakage in different positions in the SOFC system, by just using simple training data sets consisting only single fault and not demanding simultaneous faults data. The experimental result shows the proposed framework can diagnose the simultaneous SOFC system faults with high accuracy requiring small number training data and low computational burden. In addition, Fault Inference Tree Analysis (FITA) is employed to identify the correlations among possible faults and their corresponding symptoms at the system component level.
NASA Astrophysics Data System (ADS)
Schwartz, D. P.; Haeussler, P. J.; Seitz, G. G.; Dawson, T. E.; Stenner, H. D.; Matmon, A.; Crone, A. J.; Personius, S.; Burns, P. B.; Cadena, A.; Thoms, E.
2005-12-01
Developing accurate rupture histories of long, high-slip-rate strike-slip faults is is especially challenging where recurrence is relatively short (hundreds of years), adjacent segments may fail within decades of each other, and uncertainties in dating can be as large as, or larger than, the time between events. The Denali Fault system (DFS) is the major active structure of interior Alaska, but received little study since pioneering fault investigations in the early 1970s. Until the summer of 2003 essentially no data existed on the timing or spatial distribution of past ruptures on the DFS. This changed with the occurrence of the M7.9 2002 Denali fault earthquake, which has been a catalyst for present paleoseismic investigations. It provided a well-constrained rupture length and slip distribution. Strike-slip faulting occurred along 290 km of the Denali and Totschunda faults, leaving unruptured ?140km of the eastern Denali fault, ?180 km of the western Denali fault, and ?70 km of the eastern Totschunda fault. The DFS presents us with a blank canvas on which to fill a chronology of past earthquakes using modern paleoseismic techniques. Aware of correlation issues with potentially closely-timed earthquakes we have a) investigated 11 paleoseismic sites that allow a variety of dating techniques, b) measured paleo offsets, which provide insight into magnitude and rupture length of past events, at 18 locations, and c) developed late Pleistocene and Holocene slip rates using exposure age dating to constrain long-term fault behavior models. We are in the process of: 1) radiocarbon-dating peats involved in faulting and liquefaction, and especially short-lived forest floor vegetation that includes outer rings of trees, spruce needles, and blueberry leaves killed and buried during paleoearthquakes; 2) supporting development of a 700-900 year tree-ring time-series for precise dating of trees used in event timing; 3) employing Pb 210 for constraining the youngest ruptures in sag ponds on the eastern and western Denali fault; and 4) using volcanic ashes in trenches for dating and correlation. Initial results are: 1) Large earthquakes occurred along the 2002 rupture section 350-700 yrb02 (2-sigma, calendar-corrected, years before 2002) with offsets about the same as 2002. The Denali penultimate rupture appears younger (350-570 yrb02) than the Totschunda (580-700 yrb02); 2) The western Denali fault is geomorphically fresh, its MRE likely occurred within the past 250 years, the penultimate event occurred 570-680 yrb02, and slip in each event was 4m; 3) The eastern Denali MRE post-dates peat dated at 550-680 yrb02, is younger than the penultimate Totschunda event, and could be part of the penultimate Denali fault rupture or a separate earthquake; 4) A 120-km section of the Denali fault between tNenana glacier and the Delta River may be a zone of overlap for large events and/or capable of producing smaller earthquakes; its western part has fresh scarps with small (1m) offsets. 2004/2005 field observations show there are longer datable records, with 4-5 events recorded in trenches on the eastern Denali fault and the west end of the 2002 rupture, 2-3 events on the western part of the fault in Denali National Park, and 3-4 events on the Totschunda fault. These and extensive datable material provide the basis to define the paleoseismic history of DFS earthquake ruptures through multiple and complete earthquake cycles.
Support vector machines-based fault diagnosis for turbo-pump rotor
NASA Astrophysics Data System (ADS)
Yuan, Sheng-Fa; Chu, Fu-Lei
2006-05-01
Most artificial intelligence methods used in fault diagnosis are based on empirical risk minimisation principle and have poor generalisation when fault samples are few. Support vector machines (SVM) is a new general machine-learning tool based on structural risk minimisation principle that exhibits good generalisation even when fault samples are few. Fault diagnosis based on SVM is discussed. Since basic SVM is originally designed for two-class classification, while most of fault diagnosis problems are multi-class cases, a new multi-class classification of SVM named 'one to others' algorithm is presented to solve the multi-class recognition problems. It is a binary tree classifier composed of several two-class classifiers organised by fault priority, which is simple, and has little repeated training amount, and the rate of training and recognition is expedited. The effectiveness of the method is verified by the application to the fault diagnosis for turbo pump rotor.
A Reference Model for Software and System Inspections. White Paper
NASA Technical Reports Server (NTRS)
He, Lulu; Shull, Forrest
2009-01-01
Software Quality Assurance (SQA) is an important component of the software development process. SQA processes provide assurance that the software products and processes in the project life cycle conform to their specified requirements by planning, enacting, and performing a set of activities to provide adequate confidence that quality is being built into the software. Typical techniques include: (1) Testing (2) Simulation (3) Model checking (4) Symbolic execution (5) Management reviews (6) Technical reviews (7) Inspections (8) Walk-throughs (9) Audits (10) Analysis (complexity analysis, control flow analysis, algorithmic analysis) (11) Formal method Our work over the last few years has resulted in substantial knowledge about SQA techniques, especially the areas of technical reviews and inspections. But can we apply the same QA techniques to the system development process? If yes, what kind of tailoring do we need before applying them in the system engineering context? If not, what types of QA techniques are actually used at system level? And, is there any room for improvement.) After a brief examination of the system engineering literature (especially focused on NASA and DoD guidance) we found that: (1) System and software development process interact with each other at different phases through development life cycle (2) Reviews are emphasized in both system and software development. (Figl.3). For some reviews (e.g. SRR, PDR, CDR), there are both system versions and software versions. (3) Analysis techniques are emphasized (e.g. Fault Tree Analysis, Preliminary Hazard Analysis) and some details are given about how to apply them. (4) Reviews are expected to use the outputs of the analysis techniques. In other words, these particular analyses are usually conducted in preparation for (before) reviews. The goal of our work is to explore the interaction between the Quality Assurance (QA) techniques at the system level and the software level.
NASA Technical Reports Server (NTRS)
Basili, V. R.
1981-01-01
Work on metrics is discussed. Factors that affect software quality are reviewed. Metrics is discussed in terms of criteria achievements, reliability, and fault tolerance. Subjective and objective metrics are distinguished. Product/process and cost/quality metrics are characterized and discussed.
EDNA: Expert fault digraph analysis using CLIPS
NASA Technical Reports Server (NTRS)
Dixit, Vishweshwar V.
1990-01-01
Traditionally fault models are represented by trees. Recently, digraph models have been proposed (Sack). Digraph models closely imitate the real system dependencies and hence are easy to develop, validate and maintain. However, they can also contain directed cycles and analysis algorithms are hard to find. Available algorithms tend to be complicated and slow. On the other hand, the tree analysis (VGRH, Tayl) is well understood and rooted in vast research effort and analytical techniques. The tree analysis algorithms are sophisticated and orders of magnitude faster. Transformation of a digraph (cyclic) into trees (CLP, LP) is a viable approach to blend the advantages of the representations. Neither the digraphs nor the trees provide the ability to handle heuristic knowledge. An expert system, to capture the engineering knowledge, is essential. We propose an approach here, namely, expert network analysis. We combine the digraph representation and tree algorithms. The models are augmented by probabilistic and heuristic knowledge. CLIPS, an expert system shell from NASA-JSC will be used to develop a tool. The technique provides the ability to handle probabilities and heuristic knowledge. Mixed analysis, some nodes with probabilities, is possible. The tool provides graphics interface for input, query, and update. With the combined approach it is expected to be a valuable tool in the design process as well in the capture of final design knowledge.
Emerging technologies for V&V of ISHM software for space exploration
NASA Technical Reports Server (NTRS)
Feather, Martin S.; Markosian, Lawrence Z.
2006-01-01
Systems1,2 required to exhibit high operational reliability often rely on some form of fault protection to recognize and respond to faults, preventing faults' escalation to catastrophic failures. Integrated System Health Management (ISHM) extends the functionality of fault protection to both scale to more complex systems (and systems of systems), and to maintain capability rather than just avert catastrophe. Forms of ISHM have been utilized to good effect in the maintenance phase of systems' total lifecycles (often referred to as 'condition-based mainte-nance'), but less so in a 'fault protection' role during actual operations. One of the impediments to such use lies in the challenges of verification, validation and certification of ISHM systems themselves. This paper makes the case that state-of-the-practice V&V and certification techniques will not suffice for emerging forms of ISHM systems; however, a number of maturing software engineering assurance technologies show particular promise for addressing these ISHM V&V challenges.
NASA Astrophysics Data System (ADS)
Hu, Bingbing; Li, Bing
2016-02-01
It is very difficult to detect weak fault signatures due to the large amount of noise in a wind turbine system. Multiscale noise tuning stochastic resonance (MSTSR) has proved to be an effective way to extract weak signals buried in strong noise. However, the MSTSR method originally based on discrete wavelet transform (DWT) has disadvantages such as shift variance and the aliasing effects in engineering application. In this paper, the dual-tree complex wavelet transform (DTCWT) is introduced into the MSTSR method, which makes it possible to further improve the system output signal-to-noise ratio and the accuracy of fault diagnosis by the merits of DTCWT (nearly shift invariant and reduced aliasing effects). Moreover, this method utilizes the relationship between the two dual-tree wavelet basis functions, instead of matching the single wavelet basis function to the signal being analyzed, which may speed up the signal processing and be employed in on-line engineering monitoring. The proposed method is applied to the analysis of bearing outer ring and shaft coupling vibration signals carrying fault information. The results confirm that the method performs better in extracting the fault features than the original DWT-based MSTSR, the wavelet transform with post spectral analysis, and EMD-based spectral analysis methods.
Locating hardware faults in a parallel computer
Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.
2010-04-13
Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.
2016-08-16
Force Research Laboratory Space Vehicles Directorate AFRL /RVSV 3550 Aberdeen Ave, SE 11. SPONSOR/MONITOR’S REPORT Kirtland AFB, NM 87117-5776 NUMBER...Ft Belvoir, VA 22060-6218 1 cy AFRL /RVIL Kirtland AFB, NM 87117-5776 2 cys Official Record Copy AFRL /RVSV/Richard S. Erwin 1 cy... AFRL -RV-PS- AFRL -RV-PS- TR-2016-0112 TR-2016-0112 SPECIFICATION, SYNTHESIS, AND VERIFICATION OF SOFTWARE-BASED CONTROL PROTOCOLS FOR FAULT-TOLERANT
NASA Technical Reports Server (NTRS)
Hoppa, Mary Ann; Wilson, Larry W.
1994-01-01
There are many software reliability models which try to predict future performance of software based on data generated by the debugging process. Our research has shown that by improving the quality of the data one can greatly improve the predictions. We are working on methodologies which control some of the randomness inherent in the standard data generation processes in order to improve the accuracy of predictions. Our contribution is twofold in that we describe an experimental methodology using a data structure called the debugging graph and apply this methodology to assess the robustness of existing models. The debugging graph is used to analyze the effects of various fault recovery orders on the predictive accuracy of several well-known software reliability algorithms. We found that, along a particular debugging path in the graph, the predictive performance of different models can vary greatly. Similarly, just because a model 'fits' a given path's data well does not guarantee that the model would perform well on a different path. Further we observed bug interactions and noted their potential effects on the predictive process. We saw that not only do different faults fail at different rates, but that those rates can be affected by the particular debugging stage at which the rates are evaluated. Based on our experiment, we conjecture that the accuracy of a reliability prediction is affected by the fault recovery order as well as by fault interaction.
Diagnosis diagrams for passing signals on an automatic block signaling railway section
NASA Astrophysics Data System (ADS)
Spunei, E.; Piroi, I.; Chioncel, C. P.; Piroi, F.
2018-01-01
This work presents a diagnosis method for railway traffic security installations. More specifically, the authors present a series of diagnosis charts for passing signals on a railway block equipped with an automatic block signaling installation. These charts are based on the exploitation electric schemes, and are subsequently used to develop a diagnosis software package. The thus developed software package contributes substantially to a reduction of failure detection and remedy for these types of installation faults. The use of the software package eliminates making wrong decisions in the fault detection process, decisions that may result in longer remedy times and, sometimes, to railway traffic events.
Certification of computational results
NASA Technical Reports Server (NTRS)
Sullivan, Gregory F.; Wilson, Dwight S.; Masson, Gerald M.
1993-01-01
A conceptually novel and powerful technique to achieve fault detection and fault tolerance in hardware and software systems is described. When used for software fault detection, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are compared and if they agree the results are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance is formalized and realizations of it are illustrated by considering algorithms for the following problems: convex hull, sorting, and shortest path. Cases in which the second phase can be run concurrently with the first and act as a monitor are discussed. The certification trail approach are compared to other approaches to fault tolerance.
Fault Injection Validation of a Safety-Critical TMR Sysem
NASA Astrophysics Data System (ADS)
Irrera, Ivano; Madeira, Henrique; Zentai, Andras; Hergovics, Beata
2016-08-01
Digital systems and their software are the core technology for controlling and monitoring industrial systems in practically all activity domains. Functional safety standards such as the European standard EN 50128 for railway applications define the procedures and technical requirements for the development of software for railway control and protection systems. The validation of such systems is a highly demanding task. In this paper we discuss the use of fault injection techniques, which have been used extensively in several domains, particularly in the space domain, to complement the traditional procedures to validate a SIL (Safety Integrity Level) 4 system for railway signalling, implementing a TMR (Triple Modular Redundancy) architecture. The fault injection tool is based on JTAG technology. The results of our injection campaign showed a high degree of tolerance to most of the injected faults, but several cases of unexpected behaviour have also been observed, helping understanding worst-case scenarios.
NASA Astrophysics Data System (ADS)
Da Silva, A.; Sánchez Prieto, S.; Polo, O.; Parra Espada, P.
2013-05-01
Because of the tough robustness requirements in space software development, it is imperative to carry out verification tasks at a very early development stage to ensure that the implemented exception mechanisms work properly. All this should be done long time before the real hardware is available. But even if real hardware is available the verification of software fault tolerance mechanisms can be difficult since real faulty situations must be systematically and artificially brought about which can be imposible on real hardware. To solve this problem the Alcala Space Research Group (SRG) has developed a LEON2 virtual platform (Leon2ViP) with fault injection capabilities. This way it is posible to run the exact same target binary software as runs on the physical system in a more controlled and deterministic environment, allowing a more strict requirements verification. Leon2ViP enables unmanned and tightly focused fault injection campaigns, not possible otherwise, in order to expose and diagnose flaws in the software implementation early. Furthermore, the use of a virtual hardware-in-the-loop approach makes it possible to carry out preliminary integration tests with the spacecraft emulator or the sensors. The use of Leon2ViP has meant a signicant improvement, in both time and cost, in the development and verification processes of the Instrument Control Unit boot software on board Solar Orbiter's Energetic Particle Detector.
A quantitative analysis of the F18 flight control system
NASA Technical Reports Server (NTRS)
Doyle, Stacy A.; Dugan, Joanne B.; Patterson-Hine, Ann
1993-01-01
This paper presents an informal quantitative analysis of the F18 flight control system (FCS). The analysis technique combines a coverage model with a fault tree model. To demonstrate the method's extensive capabilities, we replace the fault tree with a digraph model of the F18 FCS, the only model available to us. The substitution shows that while digraphs have primarily been used for qualitative analysis, they can also be used for quantitative analysis. Based on our assumptions and the particular failure rates assigned to the F18 FCS components, we show that coverage does have a significant effect on the system's reliability and thus it is important to include coverage in the reliability analysis.
Chen, Gang; Song, Yongduan; Lewis, Frank L
2016-05-03
This paper investigates the distributed fault-tolerant control problem of networked Euler-Lagrange systems with actuator and communication link faults. An adaptive fault-tolerant cooperative control scheme is proposed to achieve the coordinated tracking control of networked uncertain Lagrange systems on a general directed communication topology, which contains a spanning tree with the root node being the active target system. The proposed algorithm is capable of compensating for the actuator bias fault, the partial loss of effectiveness actuation fault, the communication link fault, the model uncertainty, and the external disturbance simultaneously. The control scheme does not use any fault detection and isolation mechanism to detect, separate, and identify the actuator faults online, which largely reduces the online computation and expedites the responsiveness of the controller. To validate the effectiveness of the proposed method, a test-bed of multiple robot-arm cooperative control system is developed for real-time verification. Experiments on the networked robot-arms are conduced and the results confirm the benefits and the effectiveness of the proposed distributed fault-tolerant control algorithms.
Friction Laws Derived From the Acoustic Emissions of a Laboratory Fault by Machine Learning
NASA Astrophysics Data System (ADS)
Rouet-Leduc, B.; Hulbert, C.; Ren, C. X.; Bolton, D. C.; Marone, C.; Johnson, P. A.
2017-12-01
Fault friction controls nearly all aspects of fault rupture, yet it is only possible to measure in the laboratory. Here we describe laboratory experiments where acoustic emissions are recorded from the fault. We find that by applying a machine learning approach known as "extreme gradient boosting trees" to the continuous acoustical signal, the fault friction can be directly inferred, showing that instantaneous characteristics of the acoustic signal are a fingerprint of the frictional state. This machine learning-based inference leads to a simple law that links the acoustic signal to the friction state, and holds for every stress cycle the laboratory fault goes through. The approach does not use any other measured parameter than instantaneous statistics of the acoustic signal. This finding may have importance for inferring frictional characteristics from seismic waves in Earth where fault friction cannot be measured.
The Design of a Fault-Tolerant COTS-Based Bus Architecture for Space Applications
NASA Technical Reports Server (NTRS)
Chau, Savio N.; Alkalai, Leon; Tai, Ann T.
2000-01-01
The high-performance, scalability and miniaturization requirements together with the power, mass and cost constraints mandate the use of commercial-off-the-shelf (COTS) components and standards in the X2000 avionics system architecture for deep-space missions. In this paper, we report our experiences and findings on the design of an IEEE 1394 compliant fault-tolerant COTS-based bus architecture. While the COTS standard IEEE 1394 adequately supports power management, high performance and scalability, its topological criteria impose restrictions on fault tolerance realization. To circumvent the difficulties, we derive a "stack-tree" topology that not only complies with the IEEE 1394 standard but also facilitates fault tolerance realization in a spaceborne system with limited dedicated resource redundancies. Moreover, by exploiting pertinent standard features of the 1394 interface which are not purposely designed for fault tolerance, we devise a comprehensive set of fault detection mechanisms to support the fault-tolerant bus architecture.
Assurance of Fault Management: Risk-Significant Adverse Condition Awareness
NASA Technical Reports Server (NTRS)
Fitz, Rhonda
2016-01-01
Fault Management (FM) systems are ranked high in risk-based assessment of criticality within flight software, emphasizing the importance of establishing highly competent domain expertise to provide assurance for NASA projects, especially as spaceflight systems continue to increase in complexity. Insight into specific characteristics of FM architectures seen embedded within safety- and mission-critical software systems analyzed by the NASA Independent Verification Validation (IVV) Program has been enhanced with an FM Technical Reference (TR) suite. Benefits are aimed beyond the IVV community to those that seek ways to efficiently and effectively provide software assurance to reduce the FM risk posture of NASA and other space missions. The identification of particular FM architectures, visibility, and associated IVV techniques provides a TR suite that enables greater assurance that critical software systems will adequately protect against faults and respond to adverse conditions. The role FM has with regard to overall asset protection of flight software systems is being addressed with the development of an adverse condition (AC) database encompassing flight software vulnerabilities.Identification of potential off-nominal conditions and analysis to determine how a system responds to these conditions are important aspects of hazard analysis and fault management. Understanding what ACs the mission may face, and ensuring they are prevented or addressed is the responsibility of the assurance team, which necessarily should have insight into ACs beyond those defined by the project itself. Research efforts sponsored by NASAs Office of Safety and Mission Assurance defined terminology, categorized data fields, and designed a baseline repository that centralizes and compiles a comprehensive listing of ACs and correlated data relevant across many NASA missions. This prototype tool helps projects improve analysis by tracking ACs, and allowing queries based on project, mission type, domain component, causal fault, and other key characteristics. The repository has a firm structure, initial collection of data, and an interface established for informational queries, with plans for integration within the Enterprise Architecture at NASA IVV, enabling support and accessibility across the Agency. The development of an improved workflow process for adaptive, risk-informed FM assurance is currently underway.
Fault Detection and Correction for the Solar Dynamics Observatory Attitude Control System
NASA Technical Reports Server (NTRS)
Starin, Scott R.; Vess, Melissa F.; Kenney, Thomas M.; Maldonado, Manuel D.; Morgenstern, Wendy M.
2007-01-01
The Solar Dynamics Observatory is an Explorer-class mission that will launch in early 2009. The spacecraft will operate in a geosynchronous orbit, sending data 24 hours a day to a devoted ground station in White Sands, New Mexico. It will carry a suite of instruments designed to observe the Sun in multiple wavelengths at unprecedented resolution. The Atmospheric Imaging Assembly includes four telescopes with focal plane CCDs that can image the full solar disk in four different visible wavelengths. The Extreme-ultraviolet Variability Experiment will collect time-correlated data on the activity of the Sun's corona. The Helioseismic and Magnetic Imager will enable study of pressure waves moving through the body of the Sun. The attitude control system on Solar Dynamics Observatory is responsible for four main phases of activity. The physical safety of the spacecraft after separation must be guaranteed. Fine attitude determination and control must be sufficient for instrument calibration maneuvers. The mission science mode requires 2-arcsecond control according to error signals provided by guide telescopes on the Atmospheric Imaging Assembly, one of the three instruments to be carried. Lastly, accurate execution of linear and angular momentum changes to the spacecraft must be provided for momentum management and orbit maintenance. In thsp aper, single-fault tolerant fault detection and correction of the Solar Dynamics Observatory attitude control system is described. The attitude control hardware suite for the mission is catalogued, with special attention to redundancy at the hardware level. Four reaction wheels are used where any three are satisfactory. Four pairs of redundant thrusters are employed for orbit change maneuvers and momentum management. Three two-axis gyroscopes provide full redundancy for rate sensing. A digital Sun sensor and two autonomous star trackers provide two-out-of-three redundancy for fine attitude determination. The use of software to maximize chances of recovery from any hardware or software fault is detailed. A generic fault detection and correction software structure is used, allowing additions, deletions, and adjustments to fault detection and correction rules. This software structure is fed by in-line fault tests that are also able to take appropriate actions to avoid corruption of the data stream.
MrEnt: an editor for publication-quality phylogenetic tree illustrations.
Zuccon, Alessandro; Zuccon, Dario
2014-09-01
We developed MrEnt, a Windows-based, user-friendly software that allows the production of complex, high-resolution, publication-quality phylogenetic trees in few steps, directly from the analysis output. The program recognizes the standard Nexus tree format and the annotated tree files produced by BEAST and MrBayes. MrEnt combines in a single software a large suite of tree manipulation functions (e.g. handling of multiple trees, tree rotation, character mapping, node collapsing, compression of large clades, handling of time scale and error bars for chronograms) with drawing tools typical of standard graphic editors, including handling of graphic elements and images. The tree illustration can be printed or exported in several standard formats suitable for journal publication, PowerPoint presentation or Web publication. © 2014 John Wiley & Sons Ltd.
Mason F. Patterson; P. Eric Wiseman; Matthew F. Winn; Sang-mook Lee; Philip A. Araman
2011-01-01
UrbanCrowns is a software program developed by the USDA Forest Service that computes crown attributes using a side-view digital photograph and a few basic field measurements. From an operational standpoint, it is not known how well the software performs under varying photographic conditions for trees of diverse size, which could impact measurement reproducibility and...
Fault-tolerant clock synchronization in distributed systems
NASA Technical Reports Server (NTRS)
Ramanathan, Parameswaran; Shin, Kang G.; Butler, Ricky W.
1990-01-01
Existing fault-tolerant clock synchronization algorithms are compared and contrasted. These include the following: software synchronization algorithms, such as convergence-averaging, convergence-nonaveraging, and consistency algorithms, as well as probabilistic synchronization; hardware synchronization algorithms; and hybrid synchronization. The worst-case clock skews guaranteed by representative algorithms are compared, along with other important aspects such as time, message, and cost overhead imposed by the algorithms. More recent developments such as hardware-assisted software synchronization and algorithms for synchronizing large, partially connected distributed systems are especially emphasized.
The development of an interim generalized gate logic software simulator
NASA Technical Reports Server (NTRS)
Mcgough, J. G.; Nemeroff, S.
1985-01-01
A proof-of-concept computer program called IGGLOSS (Interim Generalized Gate Logic Software Simulator) was developed and is discussed. The simulator engine was designed to perform stochastic estimation of self test coverage (fault-detection latency times) of digital computers or systems. A major attribute of the IGGLOSS is its high-speed simulation: 9.5 x 1,000,000 gates/cpu sec for nonfaulted circuits and 4.4 x 1,000,000 gates/cpu sec for faulted circuits on a VAX 11/780 host computer.
The Design of a Fault-Tolerant COTS-Based Bus Architecture
NASA Technical Reports Server (NTRS)
Chau, Savio N.; Alkalai, Leon; Burt, John B.; Tai, Ann T.
1999-01-01
In this paper, we report our experiences and findings on the design of a fault-tolerant bus architecture comprised of two COTS buses, the IEEE 1394 and the 12C. This fault-tolerant bus is the backbone system bus for the avionics architecture of the X2000 program at the Jet Propulsion Laboratory. COTS buses are attractive because of the availability of low cost commercial products. However, they are not specifically designed for highly reliable applications such as long-life deep-space missions. The X2000 design team has devised a multi-level fault tolerance approach to compensate for this shortcoming of COTS buses. First, the approach enhances the fault tolerance capabilities of the IEEE 1394 and 12 C buses by adding a layer of fault handling hardware and software. Second, algorithms are developed to enable the IEEE 1394 and the 12 C buses assist each other to isolate and recovery from faults. Third, the set of IEEE 1394 and 12 C buses is duplicated to further enhance system reliability. The X2000 design team has paid special attention to guarantee that all fault tolerance provisions will not cause the bus design to deviate from the commercial standard specifications. Otherwise, the economic attractiveness of using COTS will be diminished. The hardware and software design of the X2000 fault-tolerant bus are being implemented and flight hardware will be delivered to the ST4 and Europa Orbiter missions.
AADL and Model-based Engineering
2014-10-20
and MBE Feiler, Oct 20, 2014 © 2014 Carnegie Mellon University We Rely on Software for Safe Aircraft Operation Embedded software systems ...D eveloper Compute Platform Runtime Architecture Application Software Embedded SW System Engineer Data Stream Characteristics Latency...confusion Hardware Engineer Why do system level failures still occur despite fault tolerance techniques being deployed in systems ? Embedded software
A survey of fault diagnosis technology
NASA Technical Reports Server (NTRS)
Riedesel, Joel
1989-01-01
Existing techniques and methodologies for fault diagnosis are surveyed. The techniques run the gamut from theoretical artificial intelligence work to conventional software engineering applications. They are shown to define a spectrum of implementation alternatives where tradeoffs determine their position on the spectrum. Various tradeoffs include execution time limitations and memory requirements of the algorithms as well as their effectiveness in addressing the fault diagnosis problem.
An experiment in software reliability: Additional analyses using data from automated replications
NASA Technical Reports Server (NTRS)
Dunham, Janet R.; Lauterbach, Linda A.
1988-01-01
A study undertaken to collect software error data of laboratory quality for use in the development of credible methods for predicting the reliability of software used in life-critical applications is summarized. The software error data reported were acquired through automated repetitive run testing of three independent implementations of a launch interceptor condition module of a radar tracking problem. The results are based on 100 test applications to accumulate a sufficient sample size for error rate estimation. The data collected is used to confirm the results of two Boeing studies reported in NASA-CR-165836 Software Reliability: Repetitive Run Experimentation and Modeling, and NASA-CR-172378 Software Reliability: Additional Investigations into Modeling With Replicated Experiments, respectively. That is, the results confirm the log-linear pattern of software error rates and reject the hypothesis of equal error rates per individual fault. This rejection casts doubt on the assumption that the program's failure rate is a constant multiple of the number of residual bugs; an assumption which underlies some of the current models of software reliability. data raises new questions concerning the phenomenon of interacting faults.
Software Health Management with Bayesian Networks
NASA Technical Reports Server (NTRS)
Mengshoel, Ole; Schumann, JOhann
2011-01-01
Most modern aircraft as well as other complex machinery is equipped with diagnostics systems for its major subsystems. During operation, sensors provide important information about the subsystem (e.g., the engine) and that information is used to detect and diagnose faults. Most of these systems focus on the monitoring of a mechanical, hydraulic, or electromechanical subsystem of the vehicle or machinery. Only recently, health management systems that monitor software have been developed. In this paper, we will discuss our approach of using Bayesian networks for Software Health Management (SWHM). We will discuss SWHM requirements, which make advanced reasoning capabilities for the detection and diagnosis important. Then we will present our approach to using Bayesian networks for the construction of health models that dynamically monitor a software system and is capable of detecting and diagnosing faults.
NASA Astrophysics Data System (ADS)
Kettle, L. M.; Mora, P.; Weatherley, D.; Gross, L.; Xing, H.
2006-12-01
Simulations using the Finite Element method are widely used in many engineering applications and for the solution of partial differential equations (PDEs). Computational models based on the solution of PDEs play a key role in earth systems simulations. We present numerical modelling of crustal fault systems where the dynamic elastic wave equation is solved using the Finite Element method. This is achieved using a high level computational modelling language, escript, available as open source software from ACcESS (Australian Computational Earth Systems Simulator), the University of Queensland. Escript is an advanced geophysical simulation software package developed at ACcESS which includes parallel equation solvers, data visualisation and data analysis software. The escript library was implemented to develop a flexible Finite Element model which reliably simulates the mechanism of faulting and the physics of earthquakes. Both 2D and 3D elastodynamic models are being developed to study the dynamics of crustal fault systems. Our final goal is to build a flexible model which can be applied to any fault system with user-defined geometry and input parameters. To study the physics of earthquake processes, two different time scales must be modelled, firstly the quasi-static loading phase which gradually increases stress in the system (~100years), and secondly the dynamic rupture process which rapidly redistributes stress in the system (~100secs). We will discuss the solution of the time-dependent elastic wave equation for an arbitrary fault system using escript. This involves prescribing the correct initial stress distribution in the system to simulate the quasi-static loading of faults to failure; determining a suitable frictional constitutive law which accurately reproduces the dynamics of the stick/slip instability at the faults; and using a robust time integration scheme. These dynamic models generate data and information that can be used for earthquake forecasting.
SIRU utilization. Volume 2: Software description and program documentation
NASA Technical Reports Server (NTRS)
Oehrle, J.; Whittredge, R.
1973-01-01
A complete description of the additional analysis, development and evaluation provided for the SIRU system as identified in the requirements for the SIRU utilization program is presented. The SIRU configuration is a modular inertial subsystem with hardware and software features that achieve fault tolerant operational capabilities. The SIRU redundant hardware design is formulated about a six gyro and six accelerometer instrument module package. The modules are mounted in this package so that their measurement input axes form a unique symmetrical pattern that corresponds to the array of perpendiculars to the faces of a regular dodecahedron. This six axes array provides redundant independent sensing and the symmetry enables the formulation of an optimal software redundant data processing structure with self-contained fault detection and isolation (FDI) capabilities. Documentation of the additional software and software modifications required to implement the utilization capabilities includes assembly listings and flow charts
A theoretical basis for the analysis of multiversion software subject to coincident errors
NASA Technical Reports Server (NTRS)
Eckhardt, D. E., Jr.; Lee, L. D.
1985-01-01
Fundamental to the development of redundant software techniques (known as fault-tolerant software) is an understanding of the impact of multiple joint occurrences of errors, referred to here as coincident errors. A theoretical basis for the study of redundant software is developed which: (1) provides a probabilistic framework for empirically evaluating the effectiveness of a general multiversion strategy when component versions are subject to coincident errors, and (2) permits an analytical study of the effects of these errors. An intensity function, called the intensity of coincident errors, has a central role in this analysis. This function describes the propensity of programmers to introduce design faults in such a way that software components fail together when executing in the application environment. A condition under which a multiversion system is a better strategy than relying on a single version is given.
Spacecraft fault tolerance: The Magellan experience
NASA Technical Reports Server (NTRS)
Kasuda, Rick; Packard, Donna Sexton
1993-01-01
Interplanetary and earth orbiting missions are now imposing unique fault tolerant requirements upon spacecraft design. Mission success is the prime motivator for building spacecraft with fault tolerant systems. The Magellan spacecraft had many such requirements imposed upon its design. Magellan met these requirements by building redundancy into all the major subsystem components and designing the onboard hardware and software with the capability to detect a fault, isolate it to a component, and issue commands to achieve a back-up configuration. This discussion is limited to fault protection, which is the autonomous capability to respond to a fault. The Magellan fault protection design is discussed, as well as the developmental and flight experiences and a summary of the lessons learned.
Method and system for diagnostics of apparatus
NASA Technical Reports Server (NTRS)
Gorinevsky, Dimitry (Inventor)
2012-01-01
Proposed is a method, implemented in software, for estimating fault state of an apparatus outfitted with sensors. At each execution period the method processes sensor data from the apparatus to obtain a set of parity parameters, which are further used for estimating fault state. The estimation method formulates a convex optimization problem for each fault hypothesis and employs a convex solver to compute fault parameter estimates and fault likelihoods for each fault hypothesis. The highest likelihoods and corresponding parameter estimates are transmitted to a display device or an automated decision and control system. The obtained accurate estimate of fault state can be used to improve safety, performance, or maintenance processes for the apparatus.
Fault-zone waves observed at the southern Joshua Tree earthquake rupture zone
Hough, S.E.; Ben-Zion, Y.; Leary, P.
1994-01-01
Waveform and spectral characteristics of several aftershocks of the M 6.1 22 April 1992 Joshua Tree earthquake recorded at stations just north of the Indio Hills in the Coachella Valley can be interpreted in terms of waves propagating within narrow, low-velocity, high-attenuation, vertical zones. Evidence for our interpretation consists of: (1) emergent P arrivals prior to and opposite in polarity to the impulsive direct phase; these arrivals can be modeled as headwaves indicative of a transfault velocity contrast; (2) spectral peaks in the S wave train that can be interpreted as internally reflected, low-velocity fault-zone wave energy; and (3) spatial selectivity of event-station pairs at which these data are observed, suggesting a long, narrow geologic structure. The observed waveforms are modeled using the analytical solution of Ben-Zion and Aki (1990) for a plane-parallel layered fault-zone structure. Synthetic waveform fits to the observed data indicate the presence of NS-trending vertical fault-zone layers characterized by a thickness of 50 to 100 m, a velocity decrease of 10 to 15% relative to the surrounding rock, and a P-wave quality factor in the range 25 to 50.
Probability and possibility-based representations of uncertainty in fault tree analysis.
Flage, Roger; Baraldi, Piero; Zio, Enrico; Aven, Terje
2013-01-01
Expert knowledge is an important source of input to risk analysis. In practice, experts might be reluctant to characterize their knowledge and the related (epistemic) uncertainty using precise probabilities. The theory of possibility allows for imprecision in probability assignments. The associated possibilistic representation of epistemic uncertainty can be combined with, and transformed into, a probabilistic representation; in this article, we show this with reference to a simple fault tree analysis. We apply an integrated (hybrid) probabilistic-possibilistic computational framework for the joint propagation of the epistemic uncertainty on the values of the (limiting relative frequency) probabilities of the basic events of the fault tree, and we use possibility-probability (probability-possibility) transformations for propagating the epistemic uncertainty within purely probabilistic and possibilistic settings. The results of the different approaches (hybrid, probabilistic, and possibilistic) are compared with respect to the representation of uncertainty about the top event (limiting relative frequency) probability. Both the rationale underpinning the approaches and the computational efforts they require are critically examined. We conclude that the approaches relevant in a given setting depend on the purpose of the risk analysis, and that further research is required to make the possibilistic approaches operational in a risk analysis context. © 2012 Society for Risk Analysis.
Sequential Test Strategies for Multiple Fault Isolation
NASA Technical Reports Server (NTRS)
Shakeri, M.; Pattipati, Krishna R.; Raghavan, V.; Patterson-Hine, Ann; Kell, T.
1997-01-01
In this paper, we consider the problem of constructing near optimal test sequencing algorithms for diagnosing multiple faults in redundant (fault-tolerant) systems. The computational complexity of solving the optimal multiple-fault isolation problem is super-exponential, that is, it is much more difficult than the single-fault isolation problem, which, by itself, is NP-hard. By employing concepts from information theory and Lagrangian relaxation, we present several static and dynamic (on-line or interactive) test sequencing algorithms for the multiple fault isolation problem that provide a trade-off between the degree of suboptimality and computational complexity. Furthermore, we present novel diagnostic strategies that generate a static diagnostic directed graph (digraph), instead of a static diagnostic tree, for multiple fault diagnosis. Using this approach, the storage complexity of the overall diagnostic strategy reduces substantially. Computational results based on real-world systems indicate that the size of a static multiple fault strategy is strictly related to the structure of the system, and that the use of an on-line multiple fault strategy can diagnose faults in systems with as many as 10,000 failure sources.
MacDonald Iii, Angus W; Zick, Jennifer L; Chafee, Matthew V; Netoff, Theoden I
2015-01-01
The grand challenges of schizophrenia research are linking the causes of the disorder to its symptoms and finding ways to overcome those symptoms. We argue that the field will be unable to address these challenges within psychiatry's standard neo-Kraepelinian (DSM) perspective. At the same time the current corrective, based in molecular genetics and cognitive neuroscience, is also likely to flounder due to its neglect for psychiatry's syndromal structure. We suggest adopting a new approach long used in reliability engineering, which also serves as a synthesis of these approaches. This approach, known as fault tree analysis, can be combined with extant neuroscientific data collection and computational modeling efforts to uncover the causal structures underlying the cognitive and affective failures in people with schizophrenia as well as other complex psychiatric phenomena. By making explicit how causes combine from basic faults to downstream failures, this approach makes affordances for: (1) causes that are neither necessary nor sufficient in and of themselves; (2) within-diagnosis heterogeneity; and (3) between diagnosis co-morbidity.
A novel design for sap flux data acquisition in large research plots using open source components
NASA Astrophysics Data System (ADS)
Hawthorne, D. A.; Oishi, A. C.
2017-12-01
Sap flux sensors are a widely-used tool for estimating in-situ, tree-level transpiration rates. These probes are installed in the stems of multiple trees within a study area and are typically left in place throughout the year. Sensors vary in their design and theory of operation, but all require electrical power for a heating element and produce at least one analog signal that must be digitized for storage. There are two topologies traditionally adopted to energize these sensors and gather the data from them. In one, a single data logger and power source are used. Dedicated cables radiate out from the logger to supply power to each of the probes and retrieve analog signals. In the other layout, a standalone data logger is located at each monitored tree. Batteries must then be distributed throughout the plot to service these loggers. We present a hybrid solution based on industrial control systems that employs a central data logger and battery, but co-locates digitizing hardware with the sensors at each tree. Each hardware node is able to communicate and share power over wire links with neighboring nodes. The resulting network provides a fault-tolerant path between the logger and each sensor. The approach is optimized to limit disturbance of the study plot, protect signal integrity and to enhance system reliability. This open-source implementation is built on the Arduino micro-controller system and employs RS485 and Modbus communications protocols. It is supported by laptop based management software coded in Python. The system is designed to be readily fabricated and programmed by non-experts. It works with a variety of sap-flux measurement techniques and it is able to interface to additional environmental sensors.
Optical fiber-fault surveillance for passive optical networks in S-band operation window
NASA Astrophysics Data System (ADS)
Yeh, Chien-Hung; Chi, Sien
2005-07-01
An S-band (1470 to 1520 nm) fiber laser scheme, which uses multiple fiber Bragg grating (FBG) elements as feedback elements on each passive branch, is proposed and described for in-service fault identification in passive optical networks (PONs). By tuning a wavelength selective filter located within the laser cavity over a gain bandwidth, the fiber-fault of each branch can be monitored without affecting the in-service channels. In our experiment, an S-band four-branch monitoring tree-structured PON system is demonstrated and investigated experimentally.
Optical fiber-fault surveillance for passive optical networks in S-band operation window.
Yeh, Chien-Hung; Chi, Sien
2005-07-11
An S-band (1470 to 1520 nm) fiber laser scheme, which uses multiple fiber Bragg grating (FBG) elements as feedback elements on each passive branch, is proposed and described for in-service fault identification in passive optical networks (PONs). By tuning a wavelength selective filter located within the laser cavity over a gain bandwidth, the fiber-fault of each branch can be monitored without affecting the in-service channels. In our experiment, an S-band four-branch monitoring tree-structured PON system is demonstrated and investigated experimentally.
Multi-Agent Diagnosis and Control of an Air Revitalization System for Life Support in Space
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Kowing, Jeffrey; Nieten, Joseph; Graham, Jeffrey s.; Schreckenghost, Debra; Bonasso, Pete; Fleming, Land D.; MacMahon, Matt; Thronesbery, Carroll
2000-01-01
An architecture of interoperating agents has been developed to provide control and fault management for advanced life support systems in space. In this adjustable autonomy architecture, software agents coordinate with human agents and provide support in novel fault management situations. This architecture combines the Livingstone model-based mode identification and reconfiguration (MIR) system with the 3T architecture for autonomous flexible command and control. The MIR software agent performs model-based state identification and diagnosis. MIR identifies novel recovery configurations and the set of commands required for the recovery. The AZT procedural executive and the human operator use the diagnoses and recovery recommendations, and provide command sequencing. User interface extensions have been developed to support human monitoring of both AZT and MIR data and activities. This architecture has been demonstrated performing control and fault management for an oxygen production system for air revitalization in space. The software operates in a dynamic simulation testbed.
Transparent Ada rendezvous in a fault tolerant distributed system
NASA Technical Reports Server (NTRS)
Racine, Roger
1986-01-01
There are many problems associated with distributing an Ada program over a loosely coupled communication network. Some of these problems involve the various aspects of the distributed rendezvous. The problems addressed involve supporting the delay statement in a selective call and supporting the else clause in a selective call. Most of these difficulties are compounded by the need for an efficient communication system. The difficulties are compounded even more by considering the possibility of hardware faults occurring while the program is running. With a hardware fault tolerant computer system, it is possible to design a distribution scheme and communication software which is efficient and allows Ada semantics to be preserved. An Ada design for the communications software of one such system will be presented, including a description of the services provided in the seven layers of an International Standards Organization (ISO) Open System Interconnect (OSI) model communications system. The system capabilities (hardware and software) that allow this communication system will also be described.
Sun, Weifang; Yao, Bin; Zeng, Nianyin; He, Yuchao; Cao, Xincheng; He, Wangpeng
2017-01-01
As a typical example of large and complex mechanical systems, rotating machinery is prone to diversified sorts of mechanical faults. Among these faults, one of the prominent causes of malfunction is generated in gear transmission chains. Although they can be collected via vibration signals, the fault signatures are always submerged in overwhelming interfering contents. Therefore, identifying the critical fault’s characteristic signal is far from an easy task. In order to improve the recognition accuracy of a fault’s characteristic signal, a novel intelligent fault diagnosis method is presented. In this method, a dual-tree complex wavelet transform (DTCWT) is employed to acquire the multiscale signal’s features. In addition, a convolutional neural network (CNN) approach is utilized to automatically recognise a fault feature from the multiscale signal features. The experiment results of the recognition for gear faults show the feasibility and effectiveness of the proposed method, especially in the gear’s weak fault features. PMID:28773148
Characterization of the faulted behavior of digital computers and fault tolerant systems
NASA Technical Reports Server (NTRS)
Bavuso, Salvatore J.; Miner, Paul S.
1989-01-01
A development status evaluation is presented for efforts conducted at NASA-Langley since 1977, toward the characterization of the latent fault in digital fault-tolerant systems. Attention is given to the practical, high speed, generalized gate-level logic system simulator developed, as well as to the validation methodology used for the simulator, on the basis of faultable software and hardware simulations employing a prototype MIL-STD-1750A processor. After validation, latency tests will be performed.
Computer Sciences and Data Systems, volume 1
NASA Technical Reports Server (NTRS)
1987-01-01
Topics addressed include: software engineering; university grants; institutes; concurrent processing; sparse distributed memory; distributed operating systems; intelligent data management processes; expert system for image analysis; fault tolerant software; and architecture research.
Survey of Verification and Validation Techniques for Small Satellite Software Development
NASA Technical Reports Server (NTRS)
Jacklin, Stephen A.
2015-01-01
The purpose of this paper is to provide an overview of the current trends and practices in small-satellite software verification and validation. This document is not intended to promote a specific software assurance method. Rather, it seeks to present an unbiased survey of software assurance methods used to verify and validate small satellite software and to make mention of the benefits and value of each approach. These methods include simulation and testing, verification and validation with model-based design, formal methods, and fault-tolerant software design with run-time monitoring. Although the literature reveals that simulation and testing has by far the longest legacy, model-based design methods are proving to be useful for software verification and validation. Some work in formal methods, though not widely used for any satellites, may offer new ways to improve small satellite software verification and validation. These methods need to be further advanced to deal with the state explosion problem and to make them more usable by small-satellite software engineers to be regularly applied to software verification. Last, it is explained how run-time monitoring, combined with fault-tolerant software design methods, provides an important means to detect and correct software errors that escape the verification process or those errors that are produced after launch through the effects of ionizing radiation.
Using Remote Sensing Data to Constrain Models of Fault Interactions and Plate Boundary Deformation
NASA Astrophysics Data System (ADS)
Glasscoe, M. T.; Donnellan, A.; Lyzenga, G. A.; Parker, J. W.; Milliner, C. W. D.
2016-12-01
Determining the distribution of slip and behavior of fault interactions at plate boundaries is a complex problem. Field and remotely sensed data often lack the necessary coverage to fully resolve fault behavior. However, realistic physical models may be used to more accurately characterize the complex behavior of faults constrained with observed data, such as GPS, InSAR, and SfM. These results will improve the utility of using combined models and data to estimate earthquake potential and characterize plate boundary behavior. Plate boundary faults exhibit complex behavior, with partitioned slip and distributed deformation. To investigate what fraction of slip becomes distributed deformation off major faults, we examine a model fault embedded within a damage zone of reduced elastic rigidity that narrows with depth and forward model the slip and resulting surface deformation. The fault segments and slip distributions are modeled using the JPL GeoFEST software. GeoFEST (Geophysical Finite Element Simulation Tool) is a two- and three-dimensional finite element software package for modeling solid stress and strain in geophysical and other continuum domain applications [Lyzenga, et al., 2000; Glasscoe, et al., 2004; Parker, et al., 2008, 2010]. New methods to advance geohazards research using computer simulations and remotely sensed observations for model validation are required to understand fault slip, the complex nature of fault interaction and plate boundary deformation. These models help enhance our understanding of the underlying processes, such as transient deformation and fault creep, and can aid in developing observation strategies for sUAV, airborne, and upcoming satellite missions seeking to determine how faults behave and interact and assess their associated hazard. Models will also help to characterize this behavior, which will enable improvements in hazard estimation. Validating the model results against remotely sensed observations will allow us to better constrain fault zone rheology and physical properties, having implications for the overall understanding of earthquake physics, fault interactions, plate boundary deformation and earthquake hazard, preparedness and risk reduction.
Fault Tolerant Real-Time Systems
1993-09-30
The ART (Advanced Real-Time Technology) Project of Carnegie Mellon University is engaged in wide ranging research on hard real - time systems . The...including hardware and software fault tolerance using temporal redundancy and analytic redundancy to permit the construction of real - time systems whose
Calculation and use of an environment's characteristic software metric set
NASA Technical Reports Server (NTRS)
Basili, Victor R.; Selby, Richard W., Jr.
1985-01-01
Since both cost/quality and production environments differ, this study presents an approach for customizing a characteristic set of software metrics to an environment. The approach is applied in the Software Engineering Laboratory (SEL), a NASA Goddard production environment, to 49 candidate process and product metrics of 652 modules from six (51,000 to 112,000 lines) projects. For this particular environment, the method yielded the characteristic metric set (source lines, fault correction effort per executable statement, design effort, code effort, number of I/O parameters, number of versions). The uses examined for a characteristic metric set include forecasting the effort for development, modification, and fault correction of modules based on historical data.
2009-09-01
this information supports the decison - making process as it is applied to the management of risk. 2. Operational Risk Operational risk is the threat... reasonability . However, to make a software system fault tolerant, the system needs to recognize and fix a system state condition. To detect a fault, a fault...Tracking ..........................................51 C. DECISION- MAKING PROCESS................................................................51 1. Risk
Intelligent fault management for the Space Station active thermal control system
NASA Technical Reports Server (NTRS)
Hill, Tim; Faltisco, Robert M.
1992-01-01
The Thermal Advanced Automation Project (TAAP) approach and architecture is described for automating the Space Station Freedom (SSF) Active Thermal Control System (ATCS). The baseline functionally and advanced automation techniques for Fault Detection, Isolation, and Recovery (FDIR) will be compared and contrasted. Advanced automation techniques such as rule-based systems and model-based reasoning should be utilized to efficiently control, monitor, and diagnose this extremely complex physical system. TAAP is developing advanced FDIR software for use on the SSF thermal control system. The goal of TAAP is to join Knowledge-Based System (KBS) technology, using a combination of rules and model-based reasoning, with conventional monitoring and control software in order to maximize autonomy of the ATCS. TAAP's predecessor was NASA's Thermal Expert System (TEXSYS) project which was the first large real-time expert system to use both extensive rules and model-based reasoning to control and perform FDIR on a large, complex physical system. TEXSYS showed that a method is needed for safely and inexpensively testing all possible faults of the ATCS, particularly those potentially damaging to the hardware, in order to develop a fully capable FDIR system. TAAP therefore includes the development of a high-fidelity simulation of the thermal control system. The simulation provides realistic, dynamic ATCS behavior and fault insertion capability for software testing without hardware related risks or expense. In addition, thermal engineers will gain greater confidence in the KBS FDIR software than was possible prior to this kind of simulation testing. The TAAP KBS will initially be a ground-based extension of the baseline ATCS monitoring and control software and could be migrated on-board as additional computation resources are made available.
Development of an Environment for Software Reliability Model Selection
1992-09-01
now is directed to other related problems such as tools for model selection, multiversion programming, and software fault tolerance modeling... multiversion programming, 7. Hlardware can be repaired by spare modules, which is not. the case for software, 2-6 N. Preventive maintenance is very important
NASA Astrophysics Data System (ADS)
Chartier, Thomas; Scotti, Oona; Boiselet, Aurelien; Lyon-Caen, Hélène
2016-04-01
Including faults in probabilistic seismic hazard assessment tends to increase the degree of uncertainty in the results due to the intrinsically uncertain nature of the fault data. This is especially the case in the low to moderate seismicity regions of Europe, where slow slipping faults are difficult to characterize. In order to better understand the key parameters that control the uncertainty in the fault-related hazard computations, we propose to build an analytic tool that provides a clear link between the different components of the fault-related hazard computations and their impact on the results. This will allow identifying the important parameters that need to be better constrained in order to reduce the resulting uncertainty in hazard and also provide a more hazard-oriented strategy for collecting relevant fault parameters in the field. The tool will be illustrated through the example of the West Corinth rifts fault-models. Recent work performed in the gulf has shown the complexity of the normal faulting system that is accommodating the extensional deformation of the rift. A logic-tree approach is proposed to account for this complexity and the multiplicity of scientifically defendable interpretations. At the nodes of the logic tree, different options that could be considered at each step of the fault-related seismic hazard will be considered. The first nodes represent the uncertainty in the geometries of the faults and their slip rates, which can derive from different data and methodologies. The subsequent node explores, for a given geometry/slip rate of faults, different earthquake rupture scenarios that may occur in the complex network of faults. The idea is to allow the possibility of several faults segments to break together in a single rupture scenario. To build these multiple-fault-segment scenarios, two approaches are considered: one based on simple rules (i.e. minimum distance between faults) and a second one that relies on physically-based simulations. The following nodes represents for each rupture scenario different rupture forecast models (i.e; characteristic or Gutenberg-Richter) and for a given rupture forecast, two probability models commonly used in seismic hazard assessment: poissonian or time-dependent. The final node represents an exhaustive set of ground motion prediction equations chosen in order to be compatible with the region. Finally, the expected probability of exceeding a given ground motion level is computed at each sites. Results will be discussed for a few specific localities of the West Corinth Gulf.
Field, Edward; Biasi, Glenn P.; Bird, Peter; Dawson, Timothy E.; Felzer, Karen R.; Jackson, David A.; Johnson, Kaj M.; Jordan, Thomas H.; Madden, Christopher; Michael, Andrew J.; Milner, Kevin; Page, Morgan T.; Parsons, Thomas E.; Powers, Peter; Shaw, Bruce E.; Thatcher, Wayne R.; Weldon, Ray J.; Zeng, Yuehua
2015-01-01
The 2014 Working Group on California Earthquake Probabilities (WGCEP 2014) presents time-dependent earthquake probabilities for the third Uniform California Earthquake Rupture Forecast (UCERF3). Building on the UCERF3 time-independent model, published previously, renewal models are utilized to represent elastic-rebound-implied probabilities. A new methodology has been developed that solves applicability issues in the previous approach for un-segmented models. The new methodology also supports magnitude-dependent aperiodicity and accounts for the historic open interval on faults that lack a date-of-last-event constraint. Epistemic uncertainties are represented with a logic tree, producing 5,760 different forecasts. Results for a variety of evaluation metrics are presented, including logic-tree sensitivity analyses and comparisons to the previous model (UCERF2). For 30-year M≥6.7 probabilities, the most significant changes from UCERF2 are a threefold increase on the Calaveras fault and a threefold decrease on the San Jacinto fault. Such changes are due mostly to differences in the time-independent models (e.g., fault slip rates), with relaxation of segmentation and inclusion of multi-fault ruptures being particularly influential. In fact, some UCERF2 faults were simply too long to produce M 6.7 sized events given the segmentation assumptions in that study. Probability model differences are also influential, with the implied gains (relative to a Poisson model) being generally higher in UCERF3. Accounting for the historic open interval is one reason. Another is an effective 27% increase in the total elastic-rebound-model weight. The exact factors influencing differences between UCERF2 and UCERF3, as well as the relative importance of logic-tree branches, vary throughout the region, and depend on the evaluation metric of interest. For example, M≥6.7 probabilities may not be a good proxy for other hazard or loss measures. This sensitivity, coupled with the approximate nature of the model and known limitations, means the applicability of UCERF3 should be evaluated on a case-by-case basis.
Power plant fault detection using artificial neural network
NASA Astrophysics Data System (ADS)
Thanakodi, Suresh; Nazar, Nazatul Shiema Moh; Joini, Nur Fazriana; Hidzir, Hidzrin Dayana Mohd; Awira, Mohammad Zulfikar Khairul
2018-02-01
The fault that commonly occurs in power plants is due to various factors that affect the system outage. There are many types of faults in power plants such as single line to ground fault, double line to ground fault, and line to line fault. The primary aim of this paper is to diagnose the fault in 14 buses power plants by using an Artificial Neural Network (ANN). The Multilayered Perceptron Network (MLP) that detection trained utilized the offline training methods such as Gradient Descent Backpropagation (GDBP), Levenberg-Marquardt (LM), and Bayesian Regularization (BR). The best method is used to build the Graphical User Interface (GUI). The modelling of 14 buses power plant, network training, and GUI used the MATLAB software.
Development of N-version software samples for an experiment in software fault tolerance
NASA Technical Reports Server (NTRS)
Lauterbach, L.
1987-01-01
The report documents the task planning and software development phases of an effort to obtain twenty versions of code independently designed and developed from a common specification. These versions were created for use in future experiments in software fault tolerance, in continuation of the experimental series underway at the Systems Validation Methods Branch (SVMB) at NASA Langley Research Center. The 20 versions were developed under controlled conditions at four U.S. universities, by 20 teams of two researchers each. The versions process raw data from a modified Redundant Strapped Down Inertial Measurement Unit (RSDIMU). The specifications, and over 200 questions submitted by the developers concerning the specifications, are included as appendices to this report. Design documents, and design and code walkthrough reports for each version, were also obtained in this task for use in future studies.
Surrogate oracles, generalized dependency and simpler models
NASA Technical Reports Server (NTRS)
Wilson, Larry
1990-01-01
Software reliability models require the sequence of interfailure times from the debugging process as input. It was previously illustrated that using data from replicated debugging could greatly improve reliability predictions. However, inexpensive replication of the debugging process requires the existence of a cheap, fast error detector. Laboratory experiments can be designed around a gold version which is used as an oracle or around an n-version error detector. Unfortunately, software developers can not be expected to have an oracle or to bear the expense of n-versions. A generic technique is being investigated for approximating replicated data by using the partially debugged software as a difference detector. It is believed that the failure rate of each fault has significant dependence on the presence or absence of other faults. Thus, in order to discuss a failure rate for a known fault, the presence or absence of each of the other known faults needs to be specified. Also, in simpler models which use shorter input sequences without sacrificing accuracy are of interest. In fact, a possible gain in performance is conjectured. To investigate these propositions, NASA computers running LIC (RTI) versions are used to generate data. This data will be used to label the debugging graph associated with each version. These labeled graphs will be used to test the utility of a surrogate oracle, to analyze the dependent nature of fault failure rates and to explore the feasibility of reliability models which use the data of only the most recent failures.
Testing Scientific Software: A Systematic Literature Review.
Kanewala, Upulee; Bieman, James M
2014-10-01
Scientific software plays an important role in critical decision making, for example making weather predictions based on climate models, and computation of evidence for research publications. Recently, scientists have had to retract publications due to errors caused by software faults. Systematic testing can identify such faults in code. This study aims to identify specific challenges, proposed solutions, and unsolved problems faced when testing scientific software. We conducted a systematic literature survey to identify and analyze relevant literature. We identified 62 studies that provided relevant information about testing scientific software. We found that challenges faced when testing scientific software fall into two main categories: (1) testing challenges that occur due to characteristics of scientific software such as oracle problems and (2) testing challenges that occur due to cultural differences between scientists and the software engineering community such as viewing the code and the model that it implements as inseparable entities. In addition, we identified methods to potentially overcome these challenges and their limitations. Finally we describe unsolved challenges and how software engineering researchers and practitioners can help to overcome them. Scientific software presents special challenges for testing. Specifically, cultural differences between scientist developers and software engineers, along with the characteristics of the scientific software make testing more difficult. Existing techniques such as code clone detection can help to improve the testing process. Software engineers should consider special challenges posed by scientific software such as oracle problems when developing testing techniques.
Implementation of an experimental fault-tolerant memory system
NASA Technical Reports Server (NTRS)
Carter, W. C.; Mccarthy, C. E.
1976-01-01
The experimental fault-tolerant memory system described in this paper has been designed to enable the modular addition of spares, to validate the theoretical fault-secure and self-testing properties of the translator/corrector, to provide a basis for experiments using the new testing and correction processes for recovery, and to determine the practicality of such systems. The hardware design and implementation are described, together with methods of fault insertion. The hardware/software interface, including a restricted single error correction/double error detection (SEC/DED) code, is specified. Procedures are carefully described which, (1) test for specified physical faults, (2) ensure that single error corrections are not miscorrections due to triple faults, and (3) enable recovery from double errors.
NASA Technical Reports Server (NTRS)
Freeman, Kenneth A.; Walsh, Rick; Weeks, David J.
1988-01-01
Space Station issues in fault management are discussed. The system background is described with attention given to design guidelines and power hardware. A contractually developed fault management system, FRAMES, is integrated with the energy management functions, the control switchgear, and the scheduling and operations management functions. The constraints that shaped the FRAMES system and its implementation are considered.
Integrated Environment for Development and Assurance
2015-01-26
Jan 26, 2015 © 2015 Carnegie Mellon University We Rely on Software for Safe Aircraft Operation Embedded software systems introduce a new class of...eveloper Compute Platform Runtime Architecture Application Software Embedded SW System Engineer Data Stream Characteristics Latency jitter affects...Why do system level failures still occur despite fault tolerance techniques being deployed in systems ? Embedded software system as major source of
Flight elements: Fault detection and fault management
NASA Technical Reports Server (NTRS)
Lum, H.; Patterson-Hine, A.; Edge, J. T.; Lawler, D.
1990-01-01
Fault management for an intelligent computational system must be developed using a top down integrated engineering approach. An approach proposed includes integrating the overall environment involving sensors and their associated data; design knowledge capture; operations; fault detection, identification, and reconfiguration; testability; causal models including digraph matrix analysis; and overall performance impacts on the hardware and software architecture. Implementation of the concept to achieve a real time intelligent fault detection and management system will be accomplished via the implementation of several objectives, which are: Development of fault tolerant/FDIR requirement and specification from a systems level which will carry through from conceptual design through implementation and mission operations; Implementation of monitoring, diagnosis, and reconfiguration at all system levels providing fault isolation and system integration; Optimize system operations to manage degraded system performance through system integration; and Lower development and operations costs through the implementation of an intelligent real time fault detection and fault management system and an information management system.
NASA Astrophysics Data System (ADS)
Lin, S.; Luo, D.; Yanlin, F.; Li, Y.
2016-12-01
Shallow Seismic Reflection (SSR) is a major geophysical exploration method with its exploration depth range, high-resolution in urban active fault exploration. In this paper, we carried out (SSR) and High-resolution refraction (HRR) test in the Liangyun Basin to explore a buried fault. We used NZ distributed 64 channel seismic instrument, 60HZ high sensitivity detector, Geode multi-channel portable acquisition system and hammer source. We selected single side hammer hit multiple overlay, 48 channels received and 12 times of coverage. As there are some coincidence measuring lines of SSR and HRR, we chose multi chase and encounter observation system. Based on the satellite positioning, we arranged 11 survey lines in our study area with total length for 8132 meters. GEOGIGA seismic reflection data processing software was used to deal with the SSR data. After repeated tests from the aspects of single shot record compilation, interference wave pressing, static correction, velocity parameter extraction, dynamic correction, eventually got the shallow seismic reflection profile images. Meanwhile, we used Canadian technology company good refraction and tomographic imaging software to deal with HRR seismic data, which is based on nonlinear first arrival wave travel time tomography. Combined with drilling geological profiles, we explained 11 measured seismic profiles. Results show 18 obvious fault feature breakpoints, including 4 normal faults of south-west, 7 reverse faults of south-west, one normal fault of north-east and 6 reverse faults of north-east. Breakpoints buried depth is 15-18 meters, and the inferred fault distance is 3-12 meters. Comprehensive analysis shows that the fault property is reverse fault with northeast incline section, and fewer branch normal faults presenting southwest incline section. Since good corresponding relationship between the seismic interpretation results, drilling data and SEM results on the property, occurrence, broken length of the fault, we considered the Liangyun fault to be an active fault which has strong activity during the Neogene Pliocene and early Pleistocene, Middle Pleistocene period. The combined application of SSR and HRR can provide more parameters to explain the seismic results, and improve the accuracy of the interpretation.
TES: A modular systems approach to expert system development for real-time space applications
NASA Technical Reports Server (NTRS)
Cacace, Ralph; England, Brenda
1988-01-01
A major goal of the Space Station era is to reduce reliance on support from ground based experts. The development of software programs using expert systems technology is one means of reaching this goal without requiring crew members to become intimately familiar with the many complex spacecraft subsystems. Development of an expert systems program requires a validation of the software with actual flight hardware. By combining accurate hardware and software modelling techniques with a modular systems approach to expert systems development, the validation of these software programs can be successfully completed with minimum risk and effort. The TIMES Expert System (TES) is an application that monitors and evaluates real time data to perform fault detection and fault isolation tasks as they would otherwise be carried out by a knowledgeable designer. The development process and primary features of TES, a modular systems approach, and the lessons learned are discussed.
Redundancy management for efficient fault recovery in NASA's distributed computing system
NASA Technical Reports Server (NTRS)
Malek, Miroslaw; Pandya, Mihir; Yau, Kitty
1991-01-01
The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance.
Fault Detection, Isolation and Recovery (FDIR) Portable Liquid Oxygen Hardware Demonstrator
NASA Technical Reports Server (NTRS)
Oostdyk, Rebecca L.; Perotti, Jose M.
2011-01-01
The Fault Detection, Isolation and Recovery (FDIR) hardware demonstration will highlight the effort being conducted by Constellation's Ground Operations (GO) to provide the Launch Control System (LCS) with system-level health management during vehicle processing and countdown activities. A proof-of-concept demonstration of the FDIR prototype established the capability of the software to provide real-time fault detection and isolation using generated Liquid Hydrogen data. The FDIR portable testbed unit (presented here) aims to enhance FDIR by providing a dynamic simulation of Constellation subsystems that feed the FDIR software live data based on Liquid Oxygen system properties. The LO2 cryogenic ground system has key properties that are analogous to the properties of an electronic circuit. The LO2 system is modeled using electrical components and an equivalent circuit is designed on a printed circuit board to simulate the live data. The portable testbed is also be equipped with data acquisition and communication hardware to relay the measurements to the FDIR application running on a PC. This portable testbed is an ideal capability to perform FDIR software testing, troubleshooting, training among others.
Failure mode effect analysis and fault tree analysis as a combined methodology in risk management
NASA Astrophysics Data System (ADS)
Wessiani, N. A.; Yoshio, F.
2018-04-01
There have been many studies reported the implementation of Failure Mode Effect Analysis (FMEA) and Fault Tree Analysis (FTA) as a method in risk management. However, most of the studies usually only choose one of these two methods in their risk management methodology. On the other side, combining these two methods will reduce the drawbacks of each methods when implemented separately. This paper aims to combine the methodology of FMEA and FTA in assessing risk. A case study in the metal company will illustrate how this methodology can be implemented. In the case study, this combined methodology will assess the internal risks that occur in the production process. Further, those internal risks should be mitigated based on their level of risks.
Testing Scientific Software: A Systematic Literature Review
Kanewala, Upulee; Bieman, James M.
2014-01-01
Context Scientific software plays an important role in critical decision making, for example making weather predictions based on climate models, and computation of evidence for research publications. Recently, scientists have had to retract publications due to errors caused by software faults. Systematic testing can identify such faults in code. Objective This study aims to identify specific challenges, proposed solutions, and unsolved problems faced when testing scientific software. Method We conducted a systematic literature survey to identify and analyze relevant literature. We identified 62 studies that provided relevant information about testing scientific software. Results We found that challenges faced when testing scientific software fall into two main categories: (1) testing challenges that occur due to characteristics of scientific software such as oracle problems and (2) testing challenges that occur due to cultural differences between scientists and the software engineering community such as viewing the code and the model that it implements as inseparable entities. In addition, we identified methods to potentially overcome these challenges and their limitations. Finally we describe unsolved challenges and how software engineering researchers and practitioners can help to overcome them. Conclusions Scientific software presents special challenges for testing. Specifically, cultural differences between scientist developers and software engineers, along with the characteristics of the scientific software make testing more difficult. Existing techniques such as code clone detection can help to improve the testing process. Software engineers should consider special challenges posed by scientific software such as oracle problems when developing testing techniques. PMID:25125798
RecPhyloXML - a format for reconciled gene trees.
Duchemin, Wandrille; Gence, Guillaume; Arigon Chifolleau, Anne-Muriel; Arvestad, Lars; Bansal, Mukul S; Berry, Vincent; Boussau, Bastien; Chevenet, François; Comte, Nicolas; Davín, Adrián A; Dessimoz, Christophe; Dylus, David; Hasic, Damir; Mallo, Diego; Planel, Rémi; Posada, David; Scornavacca, Celine; Szöllosi, Gergely; Zhang, Louxin; Tannier, Éric; Daubin, Vincent
2018-05-14
A reconciliation is an annotation of the nodes of a gene tree with evolutionary events-for example, speciation, gene duplication, transfer, loss, etc-along with a mapping onto a species tree. Many algorithms and software produce or use reconciliations but often using different reconciliation formats, regarding the type of events considered or whether the species tree is dated or not. This complicates the comparison and communication between different programs. Here, we gather a consortium of software developers in gene tree species tree reconciliation to propose and endorse a format that aims to promote an integrative-albeit flexible-specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. http://phylariane.univ-lyon1.fr/recphyloxml/. wandrille.duchemin@univ-lyon1.fr. There is no supplementary data associated with this publication.
Onboard Nonlinear Engine Sensor and Component Fault Diagnosis and Isolation Scheme
NASA Technical Reports Server (NTRS)
Tang, Liang; DeCastro, Jonathan A.; Zhang, Xiaodong
2011-01-01
A method detects and isolates in-flight sensor, actuator, and component faults for advanced propulsion systems. In sharp contrast to many conventional methods, which deal with either sensor fault or component fault, but not both, this method considers sensor fault, actuator fault, and component fault under one systemic and unified framework. The proposed solution consists of two main components: a bank of real-time, nonlinear adaptive fault diagnostic estimators for residual generation, and a residual evaluation module that includes adaptive thresholds and a Transferable Belief Model (TBM)-based residual evaluation scheme. By employing a nonlinear adaptive learning architecture, the developed approach is capable of directly dealing with nonlinear engine models and nonlinear faults without the need of linearization. Software modules have been developed and evaluated with the NASA C-MAPSS engine model. Several typical engine-fault modes, including a subset of sensor/actuator/components faults, were tested with a mild transient operation scenario. The simulation results demonstrated that the algorithm was able to successfully detect and isolate all simulated faults as long as the fault magnitudes were larger than the minimum detectable/isolable sizes, and no misdiagnosis occurred
Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine
NASA Technical Reports Server (NTRS)
Schwabacher, Mark A.; Aguilar, Robert; Figueroa, Fernando F.
2009-01-01
The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise, and included a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
MacDonald III, Angus W.; Zick, Jennifer L.; Chafee, Matthew V.; Netoff, Theoden I.
2016-01-01
The grand challenges of schizophrenia research are linking the causes of the disorder to its symptoms and finding ways to overcome those symptoms. We argue that the field will be unable to address these challenges within psychiatry’s standard neo-Kraepelinian (DSM) perspective. At the same time the current corrective, based in molecular genetics and cognitive neuroscience, is also likely to flounder due to its neglect for psychiatry’s syndromal structure. We suggest adopting a new approach long used in reliability engineering, which also serves as a synthesis of these approaches. This approach, known as fault tree analysis, can be combined with extant neuroscientific data collection and computational modeling efforts to uncover the causal structures underlying the cognitive and affective failures in people with schizophrenia as well as other complex psychiatric phenomena. By making explicit how causes combine from basic faults to downstream failures, this approach makes affordances for: (1) causes that are neither necessary nor sufficient in and of themselves; (2) within-diagnosis heterogeneity; and (3) between diagnosis co-morbidity. PMID:26779007
Geology of Joshua Tree National Park geodatabase
Powell, Robert E.; Matti, Jonathan C.; Cossette, Pamela M.
2015-09-16
The database in this Open-File Report describes the geology of Joshua Tree National Park and was completed in support of the National Cooperative Geologic Mapping Program of the U.S. Geological Survey (USGS) and in cooperation with the National Park Service (NPS). The geologic observations and interpretations represented in the database are relevant to both the ongoing scientific interests of the USGS in southern California and the management requirements of NPS, specifically of Joshua Tree National Park (JOTR).Joshua Tree National Park is situated within the eastern part of California’s Transverse Ranges province and straddles the transition between the Mojave and Sonoran deserts. The geologically diverse terrain that underlies JOTR reveals a rich and varied geologic evolution, one that spans nearly two billion years of Earth history. The Park’s landscape is the current expression of this evolution, its varied landforms reflecting the differing origins of underlying rock types and their differing responses to subsequent geologic events. Crystalline basement in the Park consists of Proterozoic plutonic and metamorphic rocks intruded by a composite Mesozoic batholith of Triassic through Late Cretaceous plutons arrayed in northwest-trending lithodemic belts. The basement was exhumed during the Cenozoic and underwent differential deep weathering beneath a low-relief erosion surface, with the deepest weathering profiles forming on quartz-rich, biotite-bearing granitoid rocks. Disruption of the basement terrain by faults of the San Andreas system began ca. 20 Ma and the JOTR sinistral domain, preceded by basalt eruptions, began perhaps as early as ca. 7 Ma, but no later than 5 Ma. Uplift of the mountain blocks during this interval led to erosional stripping of the thick zones of weathered quartz-rich granitoid rocks to form etchplains dotted by bouldery tors—the iconic landscape of the Park. The stripped debris filled basins along the fault zones.Mountain ranges and basins in the Park exhibit an east-west physiographic grain controlled by left-lateral fault zones that form a sinistral domain within the broad zone of dextral shear along the transform boundary between the North American and Pacific plates. Geologic and geophysical evidence reveal that movement on the sinistral faults zones has resulted in left steps along the zones, resulting in the development of sub-basins beneath Pinto Basin and Shavers and Chuckwalla Valleys. The sinistral fault zones connect the Mojave Desert dextral faults of the Eastern California Shear Zone to the north and east with the Coachella Valley strands of the southern San Andreas Fault Zone to the west.Quaternary surficial deposits accumulated in alluvial washes and playas and lakes along the valley floors; in alluvial fans, washes, and sheet wash aprons along piedmonts flanking the mountain ranges; and in eolian dunes and sand sheets that span the transition from valley floor to piedmont slope. Sequences of Quaternary pediments are planed into piedmonts flanking valley-floor and upland basins, each pediment in turn overlain by successively younger residual and alluvial surficial deposits.
An experimental evaluation of software redundancy as a strategy for improving reliability
NASA Technical Reports Server (NTRS)
Eckhardt, Dave E., Jr.; Caglayan, Alper K.; Knight, John C.; Lee, Larry D.; Mcallister, David F.; Vouk, Mladen A.; Kelly, John P. J.
1990-01-01
The strategy of using multiple versions of independently developed software as a means to tolerate residual software design faults is suggested by the success of hardware redundancy for tolerating hardware failures. Although, as generally accepted, the independence of hardware failures resulting from physical wearout can lead to substantial increases in reliability for redundant hardware structures, a similar conclusion is not immediate for software. The degree to which design faults are manifested as independent failures determines the effectiveness of redundancy as a method for improving software reliability. Interest in multi-version software centers on whether it provides an adequate measure of increased reliability to warrant its use in critical applications. The effectiveness of multi-version software is studied by comparing estimates of the failure probabilities of these systems with the failure probabilities of single versions. The estimates are obtained under a model of dependent failures and compared with estimates obtained when failures are assumed to be independent. The experimental results are based on twenty versions of an aerospace application developed and certified by sixty programmers from four universities. Descriptions of the application, development and certification processes, and operational evaluation are given together with an analysis of the twenty versions.
NASA Astrophysics Data System (ADS)
Parra, Pablo; da Silva, Antonio; Polo, Óscar R.; Sánchez, Sebastián
2018-02-01
In this day and age, successful embedded critical software needs agile and continuous development and testing procedures. This paper presents the overall testing and code coverage metrics obtained during the unit testing procedure carried out to verify the correctness of the boot software that will run in the Instrument Control Unit (ICU) of the Energetic Particle Detector (EPD) on-board Solar Orbiter. The ICU boot software is a critical part of the project so its verification should be addressed at an early development stage, so any test case missed in this process may affect the quality of the overall on-board software. According to the European Cooperation for Space Standardization ESA standards, testing this kind of critical software must cover 100% of the source code statement and decision paths. This leads to the complete testing of fault tolerance and recovery mechanisms that have to resolve every possible memory corruption or communication error brought about by the space environment. The introduced procedure enables fault injection from the beginning of the development process and enables to fulfill the exigent code coverage demands on the boot software.
Improved FTA methodology and application to subsea pipeline reliability design.
Lin, Jing; Yuan, Yongbo; Zhang, Mingyuan
2014-01-01
An innovative logic tree, Failure Expansion Tree (FET), is proposed in this paper, which improves on traditional Fault Tree Analysis (FTA). It describes a different thinking approach for risk factor identification and reliability risk assessment. By providing a more comprehensive and objective methodology, the rather subjective nature of FTA node discovery is significantly reduced and the resulting mathematical calculations for quantitative analysis are greatly simplified. Applied to the Useful Life phase of a subsea pipeline engineering project, the approach provides a more structured analysis by constructing a tree following the laws of physics and geometry. Resulting improvements are summarized in comparison table form.
Improved FTA Methodology and Application to Subsea Pipeline Reliability Design
Lin, Jing; Yuan, Yongbo; Zhang, Mingyuan
2014-01-01
An innovative logic tree, Failure Expansion Tree (FET), is proposed in this paper, which improves on traditional Fault Tree Analysis (FTA). It describes a different thinking approach for risk factor identification and reliability risk assessment. By providing a more comprehensive and objective methodology, the rather subjective nature of FTA node discovery is significantly reduced and the resulting mathematical calculations for quantitative analysis are greatly simplified. Applied to the Useful Life phase of a subsea pipeline engineering project, the approach provides a more structured analysis by constructing a tree following the laws of physics and geometry. Resulting improvements are summarized in comparison table form. PMID:24667681
i-Tree: Tools to assess and manage structure, function, and value of community forests
NASA Astrophysics Data System (ADS)
Hirabayashi, S.; Nowak, D.; Endreny, T. A.; Kroll, C.; Maco, S.
2011-12-01
Trees in urban communities can mitigate many adverse effects associated with anthropogenic activities and climate change (e.g. urban heat island, greenhouse gas, air pollution, and floods). To protect environmental and human health, managers need to make informed decisions regarding urban forest management practices. Here we present the i-Tree suite of software tools (www.itreetools.org) developed by the USDA Forest Service and their cooperators. This software suite can help urban forest managers assess and manage the structure, function, and value of urban tree populations regardless of community size or technical capacity. i-Tree is a state-of-the-art, peer-reviewed Windows GUI- or Web-based software that is freely available, supported, and continuously refined by the USDA Forest Service and their cooperators. Two major features of i-Tree are 1) to analyze current canopy structures and identify potential planting spots, and 2) to estimate the environmental benefits provided by the trees, such as carbon storage and sequestration, energy conservation, air pollution removal, and storm water reduction. To cover diverse forest topologies, various tools were developed within the i-Tree suite: i-Tree Design for points (individual trees), i-Tree Streets for lines (street trees), and i-Tree Eco, Vue, and Canopy (in the order of complexity) for areas (community trees). Once the forest structure is identified with these tools, ecosystem services provided by trees can be estimated with common models and protocols, and reports in the form of texts, charts, and figures are then created for users. Since i-Tree was developed with a client/server architecture, nationwide data in the US such as location-related parameters, weather, streamflow, and air pollution data are stored in the server and retrieved to a user's computer at run-time. Freely available remote-sensed images (e.g. NLCD and Google maps) are also employed to estimate tree canopy characteristics. As the demand for i-Tree grows internationally, environmental databases from more countries will be coupled with the software suite. Two more i-Tree applications, i-Tree Forecast and i-Tree Landscape are now under development. i-Tree Forecast simulates canopy structures for up to 100 years based on planting and mortality rates and adds capabilities for other i-Tree applications to estimate the benefits of future canopy scenarios. While most i-Tree applications employ a spatially lumped approach, i-Tree landscape employs a spatially distributed approach that allows users to map changes in canopy cover and ecosystem services through time and space. These new i-Tree tools provide an advanced platform for urban managers to assess the impact of current and future urban forests. i-Tree allows managers to promote effective urban forest management and sound arboricultural practices by providing information for advocacy and planning, baseline data for making informed decisions, and standardization for comparisons with other communities.
Hardware fault insertion and instrumentation system: Mechanization and validation
NASA Technical Reports Server (NTRS)
Benson, J. W.
1987-01-01
Automated test capability for extensive low-level hardware fault insertion testing is developed. The test capability is used to calibrate fault detection coverage and associated latency times as relevant to projecting overall system reliability. Described are modifications made to the NASA Ames Reconfigurable Flight Control System (RDFCS) Facility to fully automate the total test loop involving the Draper Laboratories' Fault Injector Unit. The automated capability provided included the application of sequences of simulated low-level hardware faults, the precise measurement of fault latency times, the identification of fault symptoms, and bulk storage of test case results. A PDP-11/60 served as a test coordinator, and a PDP-11/04 as an instrumentation device. The fault injector was controlled by applications test software in the PDP-11/60, rather than by manual commands from a terminal keyboard. The time base was especially developed for this application to use a variety of signal sources in the system simulator.
RAMP: A fault tolerant distributed microcomputer structure for aircraft navigation and control
NASA Technical Reports Server (NTRS)
Dunn, W. R.
1980-01-01
RAMP consists of distributed sets of parallel computers partioned on the basis of software and packaging constraints. To minimize hardware and software complexity, the processors operate asynchronously. It was shown that through the design of asymptotically stable control laws, data errors due to the asynchronism were minimized. It was further shown that by designing control laws with this property and making minor hardware modifications to the RAMP modules, the system became inherently tolerant to intermittent faults. A laboratory version of RAMP was constructed and is described in the paper along with the experimental results.
A field-to-desktop toolchain for X-ray CT densitometry enables tree ring analysis
De Mil, Tom; Vannoppen, Astrid; Beeckman, Hans; Van Acker, Joris; Van den Bulcke, Jan
2016-01-01
Background and Aims Disentangling tree growth requires more than ring width data only. Densitometry is considered a valuable proxy, yet laborious wood sample preparation and lack of dedicated software limit the widespread use of density profiling for tree ring analysis. An X-ray computed tomography-based toolchain of tree increment cores is presented, which results in profile data sets suitable for visual exploration as well as density-based pattern matching. Methods Two temperate (Quercus petraea, Fagus sylvatica) and one tropical species (Terminalia superba) were used for density profiling using an X-ray computed tomography facility with custom-made sample holders and dedicated processing software. Key Results Density-based pattern matching is developed and able to detect anomalies in ring series that can be corrected via interactive software. Conclusions A digital workflow allows generation of structure-corrected profiles of large sets of cores in a short time span that provide sufficient intra-annual density information for tree ring analysis. Furthermore, visual exploration of such data sets is of high value. The dated profiles can be used for high-resolution chronologies and also offer opportunities for fast screening of lesser studied tropical tree species. PMID:27107414
Technical Basis for Evaluating Software-Related Common-Cause Failures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muhlheim, Michael David; Wood, Richard
2016-04-01
The instrumentation and control (I&C) system architecture at a nuclear power plant (NPP) incorporates protections against common-cause failures (CCFs) through the use of diversity and defense-in-depth. Even for well-established analog-based I&C system designs, the potential for CCFs of multiple systems (or redundancies within a system) constitutes a credible threat to defeating the defense-in-depth provisions within the I&C system architectures. The integration of digital technologies into the I&C systems provides many advantages compared to the aging analog systems with respect to reliability, maintenance, operability, and cost effectiveness. However, maintaining the diversity and defense-in-depth for both the hardware and software within themore » digital system is challenging. In fact, the introduction of digital technologies may actually increase the potential for CCF vulnerabilities because of the introduction of undetected systematic faults. These systematic faults are defined as a “design fault located in a software component” and at a high level, are predominately the result of (1) errors in the requirement specification, (2) inadequate provisions to account for design limits (e.g., environmental stress), or (3) technical faults incorporated in the internal system (or architectural) design or implementation. Other technology-neutral CCF concerns include hardware design errors, equipment qualification deficiencies, installation or maintenance errors, instrument loop scaling and setpoint mistakes.« less
Finite Element Simulations of Kaikoura, NZ Earthquake using DInSAR and High-Resolution DSMs
NASA Astrophysics Data System (ADS)
Barba, M.; Willis, M. J.; Tiampo, K. F.; Glasscoe, M. T.; Clark, M. K.; Zekkos, D.; Stahl, T. A.; Massey, C. I.
2017-12-01
Three-dimensional displacements from the Kaikoura, NZ, earthquake in November 2016 are imaged here using Differential Interferometric Synthetic Aperture Radar (DInSAR) and high-resolution Digital Surface Model (DSM) differencing and optical pixel tracking. Full-resolution co- and post-seismic interferograms of Sentinel-1A/B images are constructed using the JPL ISCE software. The OSU SETSM software is used to produce repeat 0.5 m posting DSMs from commercial satellite imagery, which are supplemented with UAV derived DSMs over the Kaikoura fault rupture on the eastern South Island, NZ. DInSAR provides long-wavelength motions while DSM differencing and optical pixel tracking provides both horizontal and vertical near fault motions, improving the modeling of shallow rupture dynamics. JPL GeoFEST software is used to perform finite element modeling of the fault segments and slip distributions and, in turn, the associated asperity distribution. The asperity profile is then used to simulate event rupture, the spatial distribution of stress drop, and the associated stress changes. Finite element modeling of slope stability is accomplished using the ultra high-resolution UAV derived DSMs to examine the evolution of post-earthquake topography, landslide dynamics and volumes. Results include new insights into shallow dynamics of fault slip and partitioning, estimates of stress change, and improved understanding of its relationship with the associated seismicity, deformation, and triggered cascading hazards.
SSME digital control design characteristics
NASA Technical Reports Server (NTRS)
Mitchell, W. T.; Searle, R. F.
1985-01-01
To protect against a latent programming error (software fault) existing in an untried branch combination that would render the space shuttle out of control in a critical flight phase, the Backup Flight System (BFS) was chartered to provide a safety alternative. The BFS is designed to operate in critical flight phases (ascent and descent) by monitoring the activities of the space shuttle flight subsystems that are under control of the primary flight software (PFS) (e.g., navigation, crew interface, propulsion), then, upon manual command by the flightcrew, to assume control of the space shuttle and deliver it to a noncritical flight condition (safe orbit or touchdown). The problems associated with the selection of the PFS/BFS system architecture, the internal BFS architecture, the fault tolerant software mechanisms, and the long term BFS utility are discussed.
NASA Technical Reports Server (NTRS)
Brenner, Richard; Lala, Jaynarayan H.; Nagle, Gail A.; Schor, Andrei; Turkovich, John
1994-01-01
This program demonstrated the integration of a number of technologies that can increase the availability and reliability of launch vehicles while lowering costs. Availability is increased with an advanced guidance algorithm that adapts trajectories in real-time. Reliability is increased with fault-tolerant computers and communication protocols. Costs are reduced by automatically generating code and documentation. This program was realized through the cooperative efforts of academia, industry, and government. The NASA-LaRC coordinated the effort, while Draper performed the integration. Georgia Institute of Technology supplied a weak Hamiltonian finite element method for optimal control problems. Martin Marietta used MATLAB to apply this method to a launch vehicle (FENOC). Draper supplied the fault-tolerant computing and software automation technology. The fault-tolerant technology includes sequential and parallel fault-tolerant processors (FTP & FTPP) and authentication protocols (AP) for communication. Fault-tolerant technology was incrementally incorporated. Development culminated with a heterogeneous network of workstations and fault-tolerant computers using AP. Draper's software automation system, ASTER, was used to specify a static guidance system based on FENOC, navigation, flight control (GN&C), models, and the interface to a user interface for mission control. ASTER generated Ada code for GN&C and C code for models. An algebraic transform engine (ATE) was developed to automatically translate MATLAB scripts into ASTER.
The embedded software life cycle - An expanded view
NASA Technical Reports Server (NTRS)
Larman, Brian T.; Loesh, Robert E.
1989-01-01
Six common issues that are encountered in the development of software for embedded computer systems are discussed from the perspective of their interrelationships with the development process and/or the system itself. Particular attention is given to concurrent hardware/software development, prototyping, the inaccessibility of the operational system, fault tolerance, the long life cycle, and inheritance. It is noted that the life cycle for embedded software must include elements beyond simply the specification and implementation of the target software.
NASA Technical Reports Server (NTRS)
Leveson, Nancy
1987-01-01
Software safety and its relationship to other qualities are discussed. It is shown that standard reliability and fault tolerance techniques will not solve the safety problem for the present. A new attitude requires: looking at what you do NOT want software to do along with what you want it to do; and assuming things will go wrong. New procedures and changes to entire software development process are necessary: special software safety analysis techniques are needed; and design techniques, especially eliminating complexity, can be very helpful.
A Genetic Representation for Evolutionary Fault Recovery in Virtex FPGAs
NASA Technical Reports Server (NTRS)
Lohn, Jason; Larchev, Greg; DeMara, Ronald; Korsmeyer, David (Technical Monitor)
2003-01-01
Most evolutionary approaches to fault recovery in FPGAs focus on evolving alternative logic configurations as opposed to evolving the intra-cell routing. Since the majority of transistors in a typical FPGA are dedicated to interconnect, nearly 80% according to one estimate, evolutionary fault-recovery systems should benefit hy accommodating routing. In this paper, we propose an evolutionary fault-recovery system employing a genetic representation that takes into account both logic and routing configurations. Experiments were run using a software model of the Xilinx Virtex FPGA. We report that using four Virtex combinational logic blocks, we were able to evolve a 100% accurate quadrature decoder finite state machine in the presence of a stuck-at-zero fault.
Reinforcement and Drill by Microcomputer.
ERIC Educational Resources Information Center
Balajthy, Ernest
1984-01-01
Points out why drill work has a role in the language arts classroom, explores the possibilities of using a microcomputer to give children drill work, and discusses the characteristics of a good software program, along with faults found in many software programs. (FL)
Users guide for FRCS: fuel reduction cost simulator software.
Roger D. Fight; Bruce R. Hartsough; Peter Noordijk
2006-01-01
The Fuel Reduction Cost Simulator (FRCS) spreadsheet application is public domain software used to estimate costs for fuel reduction treatments involving removal of trees of mixed sizes in the form of whole trees, logs, or chips from a forest. Equipment production rates were developed from existing studies. Equipment operating cost rates are from December 2002 prices...
AceTree: a tool for visual analysis of Caenorhabditis elegans embryogenesis
Boyle, Thomas J; Bao, Zhirong; Murray, John I; Araya, Carlos L; Waterston, Robert H
2006-01-01
Background The invariant lineage of the nematode Caenorhabditis elegans has potential as a powerful tool for the description of mutant phenotypes and gene expression patterns. We previously described procedures for the imaging and automatic extraction of the cell lineage from C. elegans embryos. That method uses time-lapse confocal imaging of a strain expressing histone-GFP fusions and a software package, StarryNite, processes the thousands of images and produces output files that describe the location and lineage relationship of each nucleus at each time point. Results We have developed a companion software package, AceTree, which links the images and the annotations using tree representations of the lineage. This facilitates curation and editing of the lineage. AceTree also contains powerful visualization and interpretive tools, such as space filling models and tree-based expression patterning, that can be used to extract biological significance from the data. Conclusion By pairing a fast lineaging program written in C with a user interface program written in Java we have produced a powerful software suite for exploring embryonic development. PMID:16740163
AceTree: a tool for visual analysis of Caenorhabditis elegans embryogenesis.
Boyle, Thomas J; Bao, Zhirong; Murray, John I; Araya, Carlos L; Waterston, Robert H
2006-06-01
The invariant lineage of the nematode Caenorhabditis elegans has potential as a powerful tool for the description of mutant phenotypes and gene expression patterns. We previously described procedures for the imaging and automatic extraction of the cell lineage from C. elegans embryos. That method uses time-lapse confocal imaging of a strain expressing histone-GFP fusions and a software package, StarryNite, processes the thousands of images and produces output files that describe the location and lineage relationship of each nucleus at each time point. We have developed a companion software package, AceTree, which links the images and the annotations using tree representations of the lineage. This facilitates curation and editing of the lineage. AceTree also contains powerful visualization and interpretive tools, such as space filling models and tree-based expression patterning, that can be used to extract biological significance from the data. By pairing a fast lineaging program written in C with a user interface program written in Java we have produced a powerful software suite for exploring embryonic development.
Time-dependent seismic hazard analysis for the Greater Tehran and surrounding areas
NASA Astrophysics Data System (ADS)
Jalalalhosseini, Seyed Mostafa; Zafarani, Hamid; Zare, Mehdi
2018-01-01
This study presents a time-dependent approach for seismic hazard in Tehran and surrounding areas. Hazard is evaluated by combining background seismic activity, and larger earthquakes may emanate from fault segments. Using available historical and paleoseismological data or empirical relation, the recurrence time and maximum magnitude of characteristic earthquakes for the major faults have been explored. The Brownian passage time (BPT) distribution has been used to calculate equivalent fictitious seismicity rate for major faults in the region. To include ground motion uncertainty, a logic tree and five ground motion prediction equations have been selected based on their applicability in the region. Finally, hazard maps have been presented.
LIDAR Helps Identify Source of 1872 Earthquake Near Chelan, Washington
NASA Astrophysics Data System (ADS)
Sherrod, B. L.; Blakely, R. J.; Weaver, C. S.
2015-12-01
One of the largest historic earthquakes in the Pacific Northwest occurred on 15 December 1872 (M6.5-7) near the south end of Lake Chelan in north-central Washington State. Lack of recognized surface deformation suggested that the earthquake occurred on a blind, perhaps deep, fault. New LiDAR data show landslides and a ~6 km long, NW-side-up scarp in Spencer Canyon, ~30 km south of Lake Chelan. Two landslides in Spencer Canyon impounded small ponds. An historical account indicated that dead trees were visible in one pond in AD1884. Wood from a snag in the pond yielded a calibrated age of AD1670-1940. Tree ring counts show that the oldest living trees on each landslide are 130 and 128 years old. The larger of the two landslides obliterated the scarp and thus, post-dates the last scarp-forming event. Two trenches across the scarp exposed a NW-dipping thrust fault. One trench exposed alluvial fan deposits, Mazama ash, and scarp colluvium cut by a single thrust fault. Three charcoal samples from a colluvium buried during the last fault displacement had calibrated ages between AD1680 and AD1940. The second trench exposed gneiss thrust over colluvium during at least two, and possibly three fault displacements. The younger of two charcoal samples collected from a colluvium below gneiss had a calibrated age of AD1665- AD1905. For an historical constraint, we assume that the lack of felt reports for large earthquakes in the period between 1872 and today indicates that no large earthquakes capable of rupturing the ground surface occurred in the region after the 1872 earthquake; thus the last displacement on the Spencer Canyon scarp cannot post-date the 1872 earthquake. Modeling of the age data suggests that the last displacement occurred between AD1840 and AD1890. These data, combined with the historical record, indicate that this fault is the source of the 1872 earthquake. Analyses of aeromagnetic data reveal lithologic contacts beneath the scarp that form an ENE-striking, curvilinear zone ~2.5 km wide and ~55 km long. This zone coincides with monoclines mapped in Mesozoic bedrock and Miocene flood basalts. This study ends uncertainty regarding the source of the 1872 earthquake and provides important information for seismic hazard analyses of major infrastructure projects in Washington and British Columbia.
Fault detection and fault tolerance in robotics
NASA Technical Reports Server (NTRS)
Visinsky, Monica; Walker, Ian D.; Cavallaro, Joseph R.
1992-01-01
Robots are used in inaccessible or hazardous environments in order to alleviate some of the time, cost and risk involved in preparing men to endure these conditions. In order to perform their expected tasks, the robots are often quite complex, thus increasing their potential for failures. If men must be sent into these environments to repair each component failure in the robot, the advantages of using the robot are quickly lost. Fault tolerant robots are needed which can effectively cope with failures and continue their tasks until repairs can be realistically scheduled. Before fault tolerant capabilities can be created, methods of detecting and pinpointing failures must be perfected. This paper develops a basic fault tree analysis of a robot in order to obtain a better understanding of where failures can occur and how they contribute to other failures in the robot. The resulting failure flow chart can also be used to analyze the resiliency of the robot in the presence of specific faults. By simulating robot failures and fault detection schemes, the problems involved in detecting failures for robots are explored in more depth.
NASA Astrophysics Data System (ADS)
Lai, Wenqing; Wang, Yuandong; Li, Wenpeng; Sun, Guang; Qu, Guomin; Cui, Shigang; Li, Mengke; Wang, Yongqiang
2017-10-01
Based on long term vibration monitoring of the No.2 oil-immersed fat wave reactor in the ±500kV converter station in East Mongolia, the vibration signals in normal state and in core loose fault state were saved. Through the time-frequency analysis of the signals, the vibration characteristics of the core loose fault were obtained, and a fault diagnosis method based on the dual tree complex wavelet (DT-CWT) and support vector machine (SVM) was proposed. The vibration signals were analyzed by DT-CWT, and the energy entropy of the vibration signals were taken as the feature vector; the support vector machine was used to train and test the feature vector, and the accurate identification of the core loose fault of the flat wave reactor was realized. Through the identification of many groups of normal and core loose fault state vibration signals, the diagnostic accuracy of the result reached 97.36%. The effectiveness and accuracy of the method in the fault diagnosis of the flat wave reactor core is verified.
Graymer, R.W.; Ponce, D.A.; Jachens, R.C.; Simpson, R.W.; Phelps, G.A.; Wentworth, C.M.
2005-01-01
In order to better understand mechanisms of active faults, we studied relationships between fault behavior and rock units along the Hayward fault using a three-dimensional geologic map. The three-dimensional map-constructed from hypocenters, potential field data, and surface map data-provided a geologic map of each fault surface, showing rock units on either side of the fault truncated by the fault. The two fault-surface maps were superimposed to create a rock-rock juxtaposition map. The three maps were compared with seismicity, including aseismic patches, surface creep, and fault dip along the fault, by using visuallization software to explore three-dimensional relationships. Fault behavior appears to be correlated to the fault-surface maps, but not to the rock-rock juxtaposition map, suggesting that properties of individual wall-rock units, including rock strength, play an important role in fault behavior. Although preliminary, these results suggest that any attempt to understand the detailed distribution of earthquakes or creep along a fault should include consideration of the rock types that abut the fault surface, including the incorporation of observations of physical properties of the rock bodies that intersect the fault at depth. ?? 2005 Geological Society of America.
Method and system for dynamic probabilistic risk assessment
NASA Technical Reports Server (NTRS)
Dugan, Joanne Bechta (Inventor); Xu, Hong (Inventor)
2013-01-01
The DEFT methodology, system and computer readable medium extends the applicability of the PRA (Probabilistic Risk Assessment) methodology to computer-based systems, by allowing DFT (Dynamic Fault Tree) nodes as pivot nodes in the Event Tree (ET) model. DEFT includes a mathematical model and solution algorithm, supports all common PRA analysis functions and cutsets. Additional capabilities enabled by the DFT include modularization, phased mission analysis, sequence dependencies, and imperfect coverage.
Development of a photogrammetric method of measuring tree taper outside bark
David R. Larsen
2006-01-01
A photogrammetric method is presented for measuring tree diameters outside bark using calibrated control ground-based digital photographs. The method was designed to rapidly collect tree taper information from subject trees for the development of tree taper equations. Software that is commercially available, but designed for a different purpose, can be readily adapted...
Fault diagnosis of helical gearbox using acoustic signal and wavelets
NASA Astrophysics Data System (ADS)
Pranesh, SK; Abraham, Siju; Sugumaran, V.; Amarnath, M.
2017-05-01
The efficient transmission of power in machines is needed and gears are an appropriate choice. Faults in gears result in loss of energy and money. The monitoring and fault diagnosis are done by analysis of the acoustic and vibrational signals which are generally considered to be unwanted by products. This study proposes the usage of machine learning algorithm for condition monitoring of a helical gearbox by using the sound signals produced by the gearbox. Artificial faults were created and subsequently signals were captured by a microphone. An extensive study using different wavelet transformations for feature extraction from the acoustic signals was done, followed by waveletselection and feature selection using J48 decision tree and feature classification was performed using K star algorithm. Classification accuracy of 100% was obtained in the study
NASA Technical Reports Server (NTRS)
Fitz, Rhonda; Whitman, Gerek
2016-01-01
Research into complexities of software systems Fault Management (FM) and how architectural design decisions affect safety, preservation of assets, and maintenance of desired system functionality has coalesced into a technical reference (TR) suite that advances the provision of safety and mission assurance. The NASA Independent Verification and Validation (IV&V) Program, with Software Assurance Research Program support, extracted FM architectures across the IV&V portfolio to evaluate robustness, assess visibility for validation and test, and define software assurance methods applied to the architectures and designs. This investigation spanned IV&V projects with seven different primary developers, a wide range of sizes and complexities, and encompassed Deep Space Robotic, Human Spaceflight, and Earth Orbiter mission FM architectures. The initiative continues with an expansion of the TR suite to include Launch Vehicles, adding the benefit of investigating differences intrinsic to model-based FM architectures and insight into complexities of FM within an Agile software development environment, in order to improve awareness of how nontraditional processes affect FM architectural design and system health management. The identification of particular FM architectures, visibility, and associated IV&V techniques provides a TR suite that enables greater assurance that critical software systems will adequately protect against faults and respond to adverse conditions. Additionally, the role FM has with regard to strengthened security requirements, with potential to advance overall asset protection of flight software systems, is being addressed with the development of an adverse conditions database encompassing flight software vulnerabilities. Capitalizing on the established framework, this TR suite provides assurance capability for a variety of FM architectures and varied development approaches. Research results are being disseminated across NASA, other agencies, and the software community. This paper discusses the findings and TR suite informing the FM domain in best practices for FM architectural design, visibility observations, and methods employed for IV&V and mission assurance.
MulRF: a software package for phylogenetic analysis using multi-copy gene trees.
Chaudhary, Ruchi; Fernández-Baca, David; Burleigh, John Gordon
2015-02-01
MulRF is a platform-independent software package for phylogenetic analysis using multi-copy gene trees. It seeks the species tree that minimizes the Robinson-Foulds (RF) distance to the input trees using a generalization of the RF distance to multi-labeled trees. The underlying generic tree distance measure and fast running time make MulRF useful for inferring phylogenies from large collections of gene trees, in which multiple evolutionary processes as well as phylogenetic error may contribute to gene tree discord. MulRF implements several features for customizing the species tree search and assessing the results, and it provides a user-friendly graphical user interface (GUI) with tree visualization. The species tree search is implemented in C++ and the GUI in Java Swing. MulRF's executable as well as sample datasets and manual are available at http://genome.cs.iastate.edu/CBL/MulRF/, and the source code is available at https://github.com/ruchiherself/MulRFRepo. ruchic@ufl.edu Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Inferring patterns in mitochondrial DNA sequences through hypercube independent spanning trees.
Silva, Eduardo Sant Ana da; Pedrini, Helio
2016-03-01
Given a graph G, a set of spanning trees rooted at a vertex r of G is said vertex/edge independent if, for each vertex v of G, v≠r, the paths of r to v in any pair of trees are vertex/edge disjoint. Independent spanning trees (ISTs) provide a number of advantages in data broadcasting due to their fault tolerant properties. For this reason, some studies have addressed the issue by providing mechanisms for constructing independent spanning trees efficiently. In this work, we investigate how to construct independent spanning trees on hypercubes, which are generated based upon spanning binomial trees, and how to use them to predict mitochondrial DNA sequence parts through paths on the hypercube. The prediction works both for inferring mitochondrial DNA sequences comprised of six bases as well as infer anomalies that probably should not belong to the mitochondrial DNA standard. Copyright © 2016 Elsevier Ltd. All rights reserved.
Nouri.Gharahasanlou, Ali; Mokhtarei, Ashkan; Khodayarei, Aliasqar; Ataei, Mohammad
2014-01-01
Evaluating and analyzing the risk in the mining industry is a new approach for improving the machinery performance. Reliability, safety, and maintenance management based on the risk analysis can enhance the overall availability and utilization of the mining technological systems. This study investigates the failure occurrence probability of the crushing and mixing bed hall department at Azarabadegan Khoy cement plant by using fault tree analysis (FTA) method. The results of the analysis in 200 h operating interval show that the probability of failure occurrence for crushing, conveyor systems, crushing and mixing bed hall department is 73, 64, and 95 percent respectively and the conveyor belt subsystem found as the most probable system for failure. Finally, maintenance as a method of control and prevent the occurrence of failure is proposed. PMID:26779433
Towards generating ECSS-compliant fault tree analysis results via ConcertoFLA
NASA Astrophysics Data System (ADS)
Gallina, B.; Haider, Z.; Carlsson, A.
2018-05-01
Attitude Control Systems (ACSs) maintain the orientation of the satellite in three-dimensional space. ACSs need to be engineered in compliance with ECSS standards and need to ensure a certain degree of dependability. Thus, dependability analysis is conducted at various levels and by using ECSS-compliant techniques. Fault Tree Analysis (FTA) is one of these techniques. FTA is being automated within various Model Driven Engineering (MDE)-based methodologies. The tool-supported CHESS-methodology is one of them. This methodology incorporates ConcertoFLA, a dependability analysis technique enabling failure behavior analysis and thus FTA-results generation. ConcertoFLA, however, similarly to other techniques, still belongs to the academic research niche. To promote this technique within the space industry, we apply it on an ACS and discuss about its multi-faceted potentialities in the context of ECSS-compliant engineering.
NASA Astrophysics Data System (ADS)
Zeng, Yajun; Skibniewski, Miroslaw J.
2013-08-01
Enterprise resource planning (ERP) system implementations are often characterised with large capital outlay, long implementation duration, and high risk of failure. In order to avoid ERP implementation failure and realise the benefits of the system, sound risk management is the key. This paper proposes a probabilistic risk assessment approach for ERP system implementation projects based on fault tree analysis, which models the relationship between ERP system components and specific risk factors. Unlike traditional risk management approaches that have been mostly focused on meeting project budget and schedule objectives, the proposed approach intends to address the risks that may cause ERP system usage failure. The approach can be used to identify the root causes of ERP system implementation usage failure and quantify the impact of critical component failures or critical risk events in the implementation process.
Accelerated Monte Carlo Simulation for Safety Analysis of the Advanced Airspace Concept
NASA Technical Reports Server (NTRS)
Thipphavong, David
2010-01-01
Safe separation of aircraft is a primary objective of any air traffic control system. An accelerated Monte Carlo approach was developed to assess the level of safety provided by a proposed next-generation air traffic control system. It combines features of fault tree and standard Monte Carlo methods. It runs more than one order of magnitude faster than the standard Monte Carlo method while providing risk estimates that only differ by about 10%. It also preserves component-level model fidelity that is difficult to maintain using the standard fault tree method. This balance of speed and fidelity allows sensitivity analysis to be completed in days instead of weeks or months with the standard Monte Carlo method. Results indicate that risk estimates are sensitive to transponder, pilot visual avoidance, and conflict detection failure probabilities.
Nouri Gharahasanlou, Ali; Mokhtarei, Ashkan; Khodayarei, Aliasqar; Ataei, Mohammad
2014-04-01
Evaluating and analyzing the risk in the mining industry is a new approach for improving the machinery performance. Reliability, safety, and maintenance management based on the risk analysis can enhance the overall availability and utilization of the mining technological systems. This study investigates the failure occurrence probability of the crushing and mixing bed hall department at Azarabadegan Khoy cement plant by using fault tree analysis (FTA) method. The results of the analysis in 200 h operating interval show that the probability of failure occurrence for crushing, conveyor systems, crushing and mixing bed hall department is 73, 64, and 95 percent respectively and the conveyor belt subsystem found as the most probable system for failure. Finally, maintenance as a method of control and prevent the occurrence of failure is proposed.
Risk assessment techniques with applicability in marine engineering
NASA Astrophysics Data System (ADS)
Rudenko, E.; Panaitescu, F. V.; Panaitescu, M.
2015-11-01
Nowadays risk management is a carefully planned process. The task of risk management is organically woven into the general problem of increasing the efficiency of business. Passive attitude to risk and awareness of its existence are replaced by active management techniques. Risk assessment is one of the most important stages of risk management, since for risk management it is necessary first to analyze and evaluate risk. There are many definitions of this notion but in general case risk assessment refers to the systematic process of identifying the factors and types of risk and their quantitative assessment, i.e. risk analysis methodology combines mutually complementary quantitative and qualitative approaches. Purpose of the work: In this paper we will consider as risk assessment technique Fault Tree analysis (FTA). The objectives are: understand purpose of FTA, understand and apply rules of Boolean algebra, analyse a simple system using FTA, FTA advantages and disadvantages. Research and methodology: The main purpose is to help identify potential causes of system failures before the failures actually occur. We can evaluate the probability of the Top event.The steps of this analize are: the system's examination from Top to Down, the use of symbols to represent events, the use of mathematical tools for critical areas, the use of Fault tree logic diagrams to identify the cause of the Top event. Results: In the finally of study it will be obtained: critical areas, Fault tree logical diagrams and the probability of the Top event. These results can be used for the risk assessment analyses.
Automatically generated acceptance test: A software reliability experiment
NASA Technical Reports Server (NTRS)
Protzel, Peter W.
1988-01-01
This study presents results of a software reliability experiment investigating the feasibility of a new error detection method. The method can be used as an acceptance test and is solely based on empirical data about the behavior of internal states of a program. The experimental design uses the existing environment of a multi-version experiment previously conducted at the NASA Langley Research Center, in which the launch interceptor problem is used as a model. This allows the controlled experimental investigation of versions with well-known single and multiple faults, and the availability of an oracle permits the determination of the error detection performance of the test. Fault interaction phenomena are observed that have an amplifying effect on the number of error occurrences. Preliminary results indicate that all faults examined so far are detected by the acceptance test. This shows promise for further investigations, and for the employment of this test method on other applications.
NASA Astrophysics Data System (ADS)
Alshammari, A.; Brantley, D.; Knapp, C. C.; Lakshmi, V.
2017-12-01
In this study, multi chemical components ((H2O, H2S) will be injected with supercritical carbon dioxide in onshore part of South Georgia Rift (SGR) Basin model. Chemical reaction expected issue between these components to produce stable mineral of carbonite rocks by the time. The 3D geological model has been extracted from petrel software and computer modelling group (CMG) package software has been used to build simulation model explain the effect of mineralization on fault permeability that control on plume migration critically between (0-0.05 m Darcy). The expected results will be correlated with single component case (CO2 only) to evaluate the importance the mineralization on CO2 plume migration in structure and stratigraphic traps and detect the variation of fault leakage in case of critical values (low permeability). The results will also, show us the ratio of every trapped phase in (SGR) basin reservoir model.
An SSME High Pressure Oxidizer Turbopump diagnostic system using G2 real-time expert system
NASA Technical Reports Server (NTRS)
Guo, Ten-Huei
1991-01-01
An expert system which diagnoses various seal leakage faults in the High Pressure Oxidizer Turbopump of the SSME was developed using G2 real-time expert system. Three major functions of the software were implemented: model-based data generation, real-time expert system reasoning, and real-time input/output communication. This system is proposed as one module of a complete diagnostic system for the SSME. Diagnosis of a fault is defined as the determination of its type, severity, and likelihood. Since fault diagnosis is often accomplished through the use of heuristic human knowledge, an expert system based approach has been adopted as a paradigm to develop this diagnostic system. To implement this approach, a software shell which can be easily programmed to emulate the human decision process, the G2 Real-Time Expert System, was selected. Lessons learned from this implementation are discussed.
An SSME high pressure oxidizer turbopump diagnostic system using G2(TM) real-time expert system
NASA Technical Reports Server (NTRS)
Guo, Ten-Huei
1991-01-01
An expert system which diagnoses various seal leakage faults in the High Pressure Oxidizer Turbopump of the SSME was developed using G2(TM) real-time expert system. Three major functions of the software were implemented: model-based data generation, real-time expert system reasoning, and real-time input/output communication. This system is proposed as one module of a complete diagnostic system for Space Shuttle Main Engine. Diagnosis of a fault is defined as the determination of its type, severity, and likelihood. Since fault diagnosis is often accomplished through the use of heuristic human knowledge, an expert system based approach was adopted as a paradigm to develop this diagnostic system. To implement this approach, a software shell which can be easily programmed to emulate the human decision process, the G2 Real-Time Expert System, was selected. Lessons learned from this implementation are discussed.
Progressive retry for software error recovery in distributed systems
NASA Technical Reports Server (NTRS)
Wang, Yi-Min; Huang, Yennun; Fuchs, W. K.
1993-01-01
In this paper, we describe a method of execution retry for bypassing software errors based on checkpointing, rollback, message reordering and replaying. We demonstrate how rollback techniques, previously developed for transient hardware failure recovery, can also be used to recover from software faults by exploiting message reordering to bypass software errors. Our approach intentionally increases the degree of nondeterminism and the scope of rollback when a previous retry fails. Examples from our experience with telecommunications software systems illustrate the benefits of the scheme.
NASA Astrophysics Data System (ADS)
Indah, F. P.; Syafriani, S.; Andiyansyah, Z. S.
2018-04-01
Sumatra is in an active subduction zone between the indo-australian plate and the eurasian plate and is located at a fault along the sumatra fault so that sumatra is vulnerable to earthquakes. One of the ways to find out the cause of earthquake can be done by identifying the type of earthquake-causing faults based on earthquake of focal mechanism. The data used to identify the type of fault cause of earthquake is the earth tensor moment data which is sourced from global cmt period 1976-2016. The data used in this research using magnitude m ≥ 6 sr. This research uses gmt software (generic mapping tolls) to describe the form of fault. From the research result, it is found that the characteristics of fault field that formed in every region in sumatera island based on data processing and data of earthquake history of 1976-2016 period that the type of fault in sumatera fault is strike slip, fault type in mentawai fault is reverse fault (rising faults) and dip-slip, while the fault type in the subduction zone is dip-slip.
A research program in empirical computer science
NASA Technical Reports Server (NTRS)
Knight, J. C.
1991-01-01
During the grant reporting period our primary activities have been to begin preparation for the establishment of a research program in experimental computer science. The focus of research in this program will be safety-critical systems. Many questions that arise in the effort to improve software dependability can only be addressed empirically. For example, there is no way to predict the performance of the various proposed approaches to building fault-tolerant software. Performance models, though valuable, are parameterized and cannot be used to make quantitative predictions without experimental determination of underlying distributions. In the past, experimentation has been able to shed some light on the practical benefits and limitations of software fault tolerance. It is common, also, for experimentation to reveal new questions or new aspects of problems that were previously unknown. A good example is the Consistent Comparison Problem that was revealed by experimentation and subsequently studied in depth. The result was a clear understanding of a previously unknown problem with software fault tolerance. The purpose of a research program in empirical computer science is to perform controlled experiments in the area of real-time, embedded control systems. The goal of the various experiments will be to determine better approaches to the construction of the software for computing systems that have to be relied upon. As such it will validate research concepts from other sources, provide new research results, and facilitate the transition of research results from concepts to practical procedures that can be applied with low risk to NASA flight projects. The target of experimentation will be the production software development activities undertaken by any organization prepared to contribute to the research program. Experimental goals, procedures, data analysis and result reporting will be performed for the most part by the University of Virginia.
Khoshgoftaar, T M; Allen, E B; Hudepohl, J P; Aud, S J
1997-01-01
Society relies on telecommunications to such an extent that telecommunications software must have high reliability. Enhanced measurement for early risk assessment of latent defects (EMERALD) is a joint project of Nortel and Bell Canada for improving the reliability of telecommunications software products. This paper reports a case study of neural-network modeling techniques developed for the EMERALD system. The resulting neural network is currently in the prototype testing phase at Nortel. Neural-network models can be used to identify fault-prone modules for extra attention early in development, and thus reduce the risk of operational problems with those modules. We modeled a subset of modules representing over seven million lines of code from a very large telecommunications software system. The set consisted of those modules reused with changes from the previous release. The dependent variable was membership in the class of fault-prone modules. The independent variables were principal components of nine measures of software design attributes. We compared the neural-network model with a nonparametric discriminant model and found the neural-network model had better predictive accuracy.
Extended Testability Analysis Tool
NASA Technical Reports Server (NTRS)
Melcher, Kevin; Maul, William A.; Fulton, Christopher
2012-01-01
The Extended Testability Analysis (ETA) Tool is a software application that supports fault management (FM) by performing testability analyses on the fault propagation model of a given system. Fault management includes the prevention of faults through robust design margins and quality assurance methods, or the mitigation of system failures. Fault management requires an understanding of the system design and operation, potential failure mechanisms within the system, and the propagation of those potential failures through the system. The purpose of the ETA Tool software is to process the testability analysis results from a commercial software program called TEAMS Designer in order to provide a detailed set of diagnostic assessment reports. The ETA Tool is a command-line process with several user-selectable report output options. The ETA Tool also extends the COTS testability analysis and enables variation studies with sensor sensitivity impacts on system diagnostics and component isolation using a single testability output. The ETA Tool can also provide extended analyses from a single set of testability output files. The following analysis reports are available to the user: (1) the Detectability Report provides a breakdown of how each tested failure mode was detected, (2) the Test Utilization Report identifies all the failure modes that each test detects, (3) the Failure Mode Isolation Report demonstrates the system s ability to discriminate between failure modes, (4) the Component Isolation Report demonstrates the system s ability to discriminate between failure modes relative to the components containing the failure modes, (5) the Sensor Sensor Sensitivity Analysis Report shows the diagnostic impact due to loss of sensor information, and (6) the Effect Mapping Report identifies failure modes that result in specified system-level effects.
Bodin, Paul; Bilham, Roger; Behr, Jeff; Gomberg, Joan; Hudnut, Kenneth W.
1994-01-01
Five out of six functioning creepmeters on southern California faults recorded slip triggered at the time of some or all of the three largest events of the 1992 Landers earthquake sequence. Digital creep data indicate that dextral slip was triggered within 1 min of each mainshock and that maximum slip velocities occurred 2 to 3 min later. The duration of triggered slip events ranged from a few hours to several weeks. We note that triggered slip occurs commonly on faults that exhibit fault creep. To account for the observation that slip can be triggered repeatedly on a fault, we propose that the amplitude of triggered slip may be proportional to the depth of slip in the creep event and to the available near-surface tectonic strain that would otherwise eventually be released as fault creep. We advance the notion that seismic surface waves, perhaps amplified by sediments, generate transient local conditions that favor the release of tectonic strain to varying depths. Synthetic strain seismograms are presented that suggest increased pore pressure during periods of fault-normal contraction may be responsible for triggered slip, since maximum dextral shear strain transients correspond to times of maximum fault-normal contraction.
Modeling of a latent fault detector in a digital system
NASA Technical Reports Server (NTRS)
Nagel, P. M.
1978-01-01
Methods of modeling the detection time or latency period of a hardware fault in a digital system are proposed that explain how a computer detects faults in a computational mode. The objectives were to study how software reacts to a fault, to account for as many variables as possible affecting detection and to forecast a given program's detecting ability prior to computation. A series of experiments were conducted on a small emulated microprocessor with fault injection capability. Results indicate that the detecting capability of a program largely depends on the instruction subset used during computation and the frequency of its use and has little direct dependence on such variables as fault mode, number set, degree of branching and program length. A model is discussed which employs an analog with balls in an urn to explain the rate of which subsequent repetitions of an instruction or instruction set detect a given fault.
Proceedings of the Twenty-Third Annual Software Engineering Workshop
NASA Technical Reports Server (NTRS)
1999-01-01
The Twenty-third Annual Software Engineering Workshop (SEW) provided 20 presentations designed to further the goals of the Software Engineering Laboratory (SEL) of the NASA-GSFC. The presentations were selected on their creativity. The sessions which were held on 2-3 of December 1998, centered on the SEL, Experimentation, Inspections, Fault Prediction, Verification and Validation, and Embedded Systems and Safety-Critical Systems.
Operational Suitability Guide. Volume 2. Templates
1990-05-01
Intended mission, and the required technical and operational characteristics. The mission must be adequately defined and key hardware and software ...operational availability. With the use of fault-tolerant computer hardware and software , the system R&M will significantly improve end-to-end...should Include both hardware and software elements, as appropriate. Unique characteristics or unique support concepts should be Identified if they result
An Incremental Life-cycle Assurance Strategy for Critical System Certification
2014-11-04
for Safe Aircraft Operation Embedded software systems introduce a new class of problems not addressed by traditional system modeling & analysis...Platform Runtime Architecture Application Software Embedded SW System Engineer Data Stream Characteristics Latency jitter affects control behavior...do system level failures still occur despite fault tolerance techniques being deployed in systems ? Embedded software system as major source of
Resilience Design Patterns - A Structured Approach to Resilience at Extreme Scale (version 1.0)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hukerikar, Saurabh; Engelmann, Christian
Reliability is a serious concern for future extreme-scale high-performance computing (HPC) systems. Projections based on the current generation of HPC systems and technology roadmaps suggest that very high fault rates in future systems. The errors resulting from these faults will propagate and generate various kinds of failures, which may result in outcomes ranging from result corruptions to catastrophic application crashes. Practical limits on power consumption in HPC systems will require future systems to embrace innovative architectures, increasing the levels of hardware and software complexities. The resilience challenge for extreme-scale HPC systems requires management of various hardware and software technologies thatmore » are capable of handling a broad set of fault models at accelerated fault rates. These techniques must seek to improve resilience at reasonable overheads to power consumption and performance. While the HPC community has developed various solutions, application-level as well as system-based solutions, the solution space of HPC resilience techniques remains fragmented. There are no formal methods and metrics to investigate and evaluate resilience holistically in HPC systems that consider impact scope, handling coverage, and performance & power eciency across the system stack. Additionally, few of the current approaches are portable to newer architectures and software ecosystems, which are expected to be deployed on future systems. In this document, we develop a structured approach to the management of HPC resilience based on the concept of resilience-based design patterns. A design pattern is a general repeatable solution to a commonly occurring problem. We identify the commonly occurring problems and solutions used to deal with faults, errors and failures in HPC systems. The catalog of resilience design patterns provides designers with reusable design elements. We define a design framework that enhances our understanding of the important constraints and opportunities for solutions deployed at various layers of the system stack. The framework may be used to establish mechanisms and interfaces to coordinate flexible fault management across hardware and software components. The framework also enables optimization of the cost-benefit trade-os among performance, resilience, and power consumption. The overall goal of this work is to enable a systematic methodology for the design and evaluation of resilience technologies in extreme-scale HPC systems that keep scientific applications running to a correct solution in a timely and cost-ecient manner in spite of frequent faults, errors, and failures of various types.« less
NASA Technical Reports Server (NTRS)
Dewberry, Brandon S.
1990-01-01
The Environmental Control and Life Support System (ECLSS) is a Freedom Station distributed system with inherent applicability to advanced automation primarily due to the comparatively large reaction times of its subsystem processes. This allows longer contemplation times in which to form a more intelligent control strategy and to detect or prevent faults. The objective of the ECLSS Advanced Automation Project is to reduce the flight and ground manpower needed to support the initial and evolutionary ECLS system. The approach is to search out and make apparent those processes in the baseline system which are in need of more automatic control and fault detection strategies, to influence the ECLSS design by suggesting software hooks and hardware scars which will allow easy adaptation to advanced algorithms, and to develop complex software prototypes which fit into the ECLSS software architecture and will be shown in an ECLSS hardware testbed to increase the autonomy of the system. Covered here are the preliminary investigation and evaluation process, aimed at searching the ECLSS for candidate functions for automation and providing a software hooks and hardware scars analysis. This analysis shows changes needed in the baselined system for easy accommodation of knowledge-based or other complex implementations which, when integrated in flight or ground sustaining engineering architectures, will produce a more autonomous and fault tolerant Environmental Control and Life Support System.
Urban Crowns: crown analysis software to assist in quantifying urban tree benefits
Matthew F. Winn; Sang-Mook Lee Bradley; Philip A. Araman
2010-01-01
UrbanCrowns is a Microsoft® Windows®-based computer program developed by the U.S. Forest Service Southern Research Station. The software assists urban forestry professionals, arborists, and community volunteers in assessing and monitoring the crown characteristics of urban trees (both deciduous and coniferous) using a single side-view digital photograph. Program output...
Machine Learning of Fault Friction
NASA Astrophysics Data System (ADS)
Johnson, P. A.; Rouet-Leduc, B.; Hulbert, C.; Marone, C.; Guyer, R. A.
2017-12-01
We are applying machine learning (ML) techniques to continuous acoustic emission (AE) data from laboratory earthquake experiments. Our goal is to apply explicit ML methods to this acoustic datathe AE in order to infer frictional properties of a laboratory fault. The experiment is a double direct shear apparatus comprised of fault blocks surrounding fault gouge comprised of glass beads or quartz powder. Fault characteristics are recorded, including shear stress, applied load (bulk friction = shear stress/normal load) and shear velocity. The raw acoustic signal is continuously recorded. We rely on explicit decision tree approaches (Random Forest and Gradient Boosted Trees) that allow us to identify important features linked to the fault friction. A training procedure that employs both the AE and the recorded shear stress from the experiment is first conducted. Then, testing takes place on data the algorithm has never seen before, using only the continuous AE signal. We find that these methods provide rich information regarding frictional processes during slip (Rouet-Leduc et al., 2017a; Hulbert et al., 2017). In addition, similar machine learning approaches predict failure times, as well as slip magnitudes in some cases. We find that these methods work for both stick slip and slow slip experiments, for periodic slip and for aperiodic slip. We also derive a fundamental relationship between the AE and the friction describing the frictional behavior of any earthquake slip cycle in a given experiment (Rouet-Leduc et al., 2017b). Our goal is to ultimately scale these approaches to Earth geophysical data to probe fault friction. References Rouet-Leduc, B., C. Hulbert, N. Lubbers, K. Barros, C. Humphreys and P. A. Johnson, Machine learning predicts laboratory earthquakes, in review (2017). https://arxiv.org/abs/1702.05774Rouet-LeDuc, B. et al., Friction Laws Derived From the Acoustic Emissions of a Laboratory Fault by Machine Learning (2017), AGU Fall Meeting Session S025: Earthquake source: from the laboratory to the fieldHulbert, C., Characterizing slow slip applying machine learning (2017), AGU Fall Meeting Session S019: Slow slip, Tectonic Tremor, and the Brittle-to-Ductile Transition Zone: What mechanisms control the diversity of slow and fast earthquakes?
Software Requirements Analysis as Fault Predictor
NASA Technical Reports Server (NTRS)
Wallace, Dolores
2003-01-01
Waiting until the integration and system test phase to discover errors leads to more costly rework than resolving those same errors earlier in the lifecycle. Costs increase even more significantly once a software system has become operational. WE can assess the quality of system requirements, but do little to correlate this information either to system assurance activities or long-term reliability projections - both of which remain unclear and anecdotal. Extending earlier work on requirements accomplished by the ARM tool, measuring requirements quality information against code complexity and test data for the same system may be used to predict specific software modules containing high impact or deeply embedded faults now escaping in operational systems. Such knowledge would lead to more effective and efficient test programs. It may enable insight into whether a program should be maintained or started over.
NASA Technical Reports Server (NTRS)
Fitz, Rhonda; Whitman, Gerek
2016-01-01
Research into complexities of software systems Fault Management (FM) and how architectural design decisions affect safety, preservation of assets, and maintenance of desired system functionality has coalesced into a technical reference (TR) suite that advances the provision of safety and mission assurance. The NASA Independent Verification and Validation (IVV) Program, with Software Assurance Research Program support, extracted FM architectures across the IVV portfolio to evaluate robustness, assess visibility for validation and test, and define software assurance methods applied to the architectures and designs. This investigation spanned IVV projects with seven different primary developers, a wide range of sizes and complexities, and encompassed Deep Space Robotic, Human Spaceflight, and Earth Orbiter mission FM architectures. The initiative continues with an expansion of the TR suite to include Launch Vehicles, adding the benefit of investigating differences intrinsic to model-based FM architectures and insight into complexities of FM within an Agile software development environment, in order to improve awareness of how nontraditional processes affect FM architectural design and system health management.
A framework for software fault tolerance in real-time systems
NASA Technical Reports Server (NTRS)
Anderson, T.; Knight, J. C.
1983-01-01
A classification scheme for errors and a technique for the provision of software fault tolerance in cyclic real-time systems is presented. The technique requires that the process structure of a system be represented by a synchronization graph which is used by an executive as a specification of the relative times at which they will communicate during execution. Communication between concurrent processes is severely limited and may only take place between processes engaged in an exchange. A history of error occurrences is maintained by an error handler. When an error is detected, the error handler classifies it using the error history information and then initiates appropriate recovery action.
The implementation and use of Ada on distributed systems with high reliability requirements
NASA Technical Reports Server (NTRS)
Knight, J. C.
1984-01-01
The use and implementation of Ada in distributed environments in which reliability is the primary concern is investigated. Emphasis is placed on the possibility that a distributed system may be programmed entirely in ADA so that the individual tasks of the system are unconcerned with which processors they are executing on, and that failures may occur in the software or underlying hardware. The primary activities are: (1) Continued development and testing of our fault-tolerant Ada testbed; (2) consideration of desirable language changes to allow Ada to provide useful semantics for failure; (3) analysis of the inadequacies of existing software fault tolerance strategies.
NASA Technical Reports Server (NTRS)
Bodechtel, J. (Principal Investigator)
1975-01-01
The author has identified the following significant results. The geological interpretation on data exhibiting the Italian peninsula led to the recognition of tectonic features which are explained by a clockwise rotation of various blocks along left-handed transform faults. These faults can be interpreted as resulting from shear due to main stress directed north-eastwards. A land use map of the mountainous regions of Italy was produced on a scale of 1:250,000. For the digital treatment of MSS-CCTs an image processing software was written in FORTRAN 4. The software package includes descriptive statistics and also classification algorithms.
The Development of Design Tools for Fault Tolerant Quantum Dot Cellular Automata Based Logic
NASA Technical Reports Server (NTRS)
Armstrong, Curtis D.; Humphreys, William M.
2003-01-01
We are developing software to explore the fault tolerance of quantum dot cellular automata gate architectures in the presence of manufacturing variations and device defects. The Topology Optimization Methodology using Applied Statistics (TOMAS) framework extends the capabilities of the A Quantum Interconnected Network Array Simulator (AQUINAS) by adding front-end and back-end software and creating an environment that integrates all of these components. The front-end tools establish all simulation parameters, configure the simulation system, automate the Monte Carlo generation of simulation files, and execute the simulation of these files. The back-end tools perform automated data parsing, statistical analysis and report generation.
Monitoring microearthquakes with the San Andreas fault observatory at depth
Oye, V.; Ellsworth, W.L.
2007-01-01
In 2005, the San Andreas Fault Observatory at Depth (SAFOD) was drilled through the San Andreas Fault zone at a depth of about 3.1 km. The borehole has subsequently been instrumented with high-frequency geophones in order to better constrain locations and source processes of nearby microearthquakes that will be targeted in the upcoming phase of SAFOD. The microseismic monitoring software MIMO, developed by NORSAR, has been installed at SAFOD to provide near-real time locations and magnitude estimates using the high sampling rate (4000 Hz) waveform data. To improve the detection and location accuracy, we incorporate data from the nearby, shallow borehole (???250 m) seismometers of the High Resolution Seismic Network (HRSN). The event association algorithm of the MIMO software incorporates HRSN detections provided by the USGS real time earthworm software. The concept of the new event association is based on the generalized beam forming, primarily used in array seismology. The method requires the pre-computation of theoretical travel times in a 3D grid of potential microearthquake locations to the seismometers of the current station network. By minimizing the differences between theoretical and observed detection times an event is associated and the location accuracy is significantly improved.
Fuzzy logic based on-line fault detection and classification in transmission line.
Adhikari, Shuma; Sinha, Nidul; Dorendrajit, Thingam
2016-01-01
This study presents fuzzy logic based online fault detection and classification of transmission line using Programmable Automation and Control technology based National Instrument Compact Reconfigurable i/o (CRIO) devices. The LabVIEW software combined with CRIO can perform real time data acquisition of transmission line. When fault occurs in the system current waveforms are distorted due to transients and their pattern changes according to the type of fault in the system. The three phase alternating current, zero sequence and positive sequence current data generated by LabVIEW through CRIO-9067 are processed directly for relaying. The result shows that proposed technique is capable of right tripping action and classification of type of fault at high speed therefore can be employed in practical application.
Airborne Advanced Reconfigurable Computer System (ARCS)
NASA Technical Reports Server (NTRS)
Bjurman, B. E.; Jenkins, G. M.; Masreliez, C. J.; Mcclellan, K. L.; Templeman, J. E.
1976-01-01
A digital computer subsystem fault-tolerant concept was defined, and the potential benefits and costs of such a subsystem were assessed when used as the central element of a new transport's flight control system. The derived advanced reconfigurable computer system (ARCS) is a triple-redundant computer subsystem that automatically reconfigures, under multiple fault conditions, from triplex to duplex to simplex operation, with redundancy recovery if the fault condition is transient. The study included criteria development covering factors at the aircraft's operation level that would influence the design of a fault-tolerant system for commercial airline use. A new reliability analysis tool was developed for evaluating redundant, fault-tolerant system availability and survivability; and a stringent digital system software design methodology was used to achieve design/implementation visibility.
Various Indices for Diagnosis of Air-gap Eccentricity Fault in Induction Motor-A Review
NASA Astrophysics Data System (ADS)
Nikhil; Mathew, Lini, Dr.; Sharma, Amandeep
2018-03-01
From the past few years, research has gained an ardent pace in the field of fault detection and diagnosis in induction motors. In the current scenario, software is being introduced with diagnostic features to improve stability and reliability in fault diagnostic techniques. Human involvement in decision making for fault detection is slowly being replaced by Artificial Intelligence techniques. In this paper, a brief introduction of eccentricity fault is presented along with their causes and effects on the health of induction motors. Various indices used to detect eccentricity are being introduced along with their boundary conditions and their future scope of research. At last, merits and demerits of all indices are discussed and a comparison is made between them.
Tree-space statistics and approximations for large-scale analysis of anatomical trees.
Feragen, Aasa; Owen, Megan; Petersen, Jens; Wille, Mathilde M W; Thomsen, Laura H; Dirksen, Asger; de Bruijne, Marleen
2013-01-01
Statistical analysis of anatomical trees is hard to perform due to differences in the topological structure of the trees. In this paper we define statistical properties of leaf-labeled anatomical trees with geometric edge attributes by considering the anatomical trees as points in the geometric space of leaf-labeled trees. This tree-space is a geodesic metric space where any two trees are connected by a unique shortest path, which corresponds to a tree deformation. However, tree-space is not a manifold, and the usual strategy of performing statistical analysis in a tangent space and projecting onto tree-space is not available. Using tree-space and its shortest paths, a variety of statistical properties, such as mean, principal component, hypothesis testing and linear discriminant analysis can be defined. For some of these properties it is still an open problem how to compute them; others (like the mean) can be computed, but efficient alternatives are helpful in speeding up algorithms that use means iteratively, like hypothesis testing. In this paper, we take advantage of a very large dataset (N = 8016) to obtain computable approximations, under the assumption that the data trees parametrize the relevant parts of tree-space well. Using the developed approximate statistics, we illustrate how the structure and geometry of airway trees vary across a population and show that airway trees with Chronic Obstructive Pulmonary Disease come from a different distribution in tree-space than healthy ones. Software is available from http://image.diku.dk/aasa/software.php.
A field-to-desktop toolchain for X-ray CT densitometry enables tree ring analysis.
De Mil, Tom; Vannoppen, Astrid; Beeckman, Hans; Van Acker, Joris; Van den Bulcke, Jan
2016-06-01
Disentangling tree growth requires more than ring width data only. Densitometry is considered a valuable proxy, yet laborious wood sample preparation and lack of dedicated software limit the widespread use of density profiling for tree ring analysis. An X-ray computed tomography-based toolchain of tree increment cores is presented, which results in profile data sets suitable for visual exploration as well as density-based pattern matching. Two temperate (Quercus petraea, Fagus sylvatica) and one tropical species (Terminalia superba) were used for density profiling using an X-ray computed tomography facility with custom-made sample holders and dedicated processing software. Density-based pattern matching is developed and able to detect anomalies in ring series that can be corrected via interactive software. A digital workflow allows generation of structure-corrected profiles of large sets of cores in a short time span that provide sufficient intra-annual density information for tree ring analysis. Furthermore, visual exploration of such data sets is of high value. The dated profiles can be used for high-resolution chronologies and also offer opportunities for fast screening of lesser studied tropical tree species. © The Author 2016. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Bonanno, Emanuele; Bonini, Lorenzo; Basili, Roberto; Toscani, Giovanni; Seno, Silvio
2016-04-01
Fault-related folding kinematic models are widely used to explain accommodation of crustal shortening. These models, however, include simplifications, such as the assumption of constant growth rate of faults. This value sometimes is not constant in isotropic materials, and even more variable if one considers naturally anisotropic geological systems. , This means that these simplifications could lead to incorrect interpretations of the reality. In this study, we use analogue models to evaluate how thin, mechanical discontinuities, such as beddings or thin weak layers, influence the propagation of reverse faults and related folds. The experiments are performed with two different settings to simulate initially-blind master faults dipping at 30° and 45°. The 30° dip represents one of the Andersonian conjugate fault, and 45° dip is very frequent in positive reactivation of normal faults. The experimental apparatus consists of a clay layer placed above two plates: one plate, the footwall, is fixed; the other one, the hanging wall, is mobile. Motor-controlled sliding of the hanging wall plate along an inclined plane reproduces the reverse fault movement. We run thirty-six experiments: eighteen with dip of 30° and eighteen with dip of 45°. For each dip-angle setting, we initially run isotropic experiments that serve as a reference. Then, we run the other experiments with one or two discontinuities (horizontal precuts performed into the clay layer). We monitored the experiments collecting side photographs every 1.0 mm of displacement of the master fault. These images have been analyzed through PIVlab software, a tool based on the Digital Image Correlation method. With the "displacement field analysis" (one of the PIVlab tools) we evaluated, the variation of the trishear zone shape and how the master-fault tip and newly-formed faults propagate into the clay medium. With the "strain distribution analysis", we observed the amount of the on-fault and off-fault deformation with respect to the faulting pattern and evolution. Secondly, using MOVE software, we extracted the positions of fault tips and folds every 5 mm of displacement on the master fault. Analyzing these positions in all of the experiments, we found that the growth rate of the faults and the related fold shape vary depending on the number of discontinuities in the clay medium. Other results can be summarized as follows: 1) the fault growth rate is not constant, but varies especially while the new faults interacts with precuts; 2) the new faults tend to crosscut the discontinuities when the angle between them is approximately 90°; 3) the trishear zone change its shape during the experiments especially when the main fault interacts with the discontinuities.
Wang, Xiupin; Peng, Qingzhi; Li, Peiwu; Zhang, Qi; Ding, Xiaoxia; Zhang, Wen; Zhang, Liangxiao
2016-10-12
High complexity of identification for non-target triacylglycerols (TAGs) is a major challenge in lipidomics analysis. To identify non-target TAGs, a powerful tool named accurate MS(n) spectrometry generating so-called ion trees is used. In this paper, we presented a technique for efficient structural elucidation of TAGs on MS(n) spectral trees produced by LTQ Orbitrap MS(n), which was implemented as an open source software package, or TIT. The TIT software was used to support automatic annotation of non-target TAGs on MS(n) ion trees from a self-built fragment ion database. This database includes 19108 simulate TAG molecules from a random combination of fatty acids and corresponding 500582 self-built multistage fragment ions (MS ≤ 3). Our software can identify TAGs using a "stage-by-stage elimination" strategy. By utilizing the MS(1) accurate mass and referenced RKMD, the TIT software can discriminate unique elemental composition candidates. The regiospecific isomers of fatty acyl chains will be distinguished using MS(2) and MS(3) fragment spectra. We applied the algorithm to the selection of 45 TAG standards and demonstrated that the molecular ions could be 100% correctly assigned. Therefore, the TIT software could be applied to TAG identification in complex biological samples such as mouse plasma extracts. Copyright © 2016 Elsevier B.V. All rights reserved.
Model-Based Fault Diagnosis: Performing Root Cause and Impact Analyses in Real Time
NASA Technical Reports Server (NTRS)
Figueroa, Jorge F.; Walker, Mark G.; Kapadia, Ravi; Morris, Jonathan
2012-01-01
Generic, object-oriented fault models, built according to causal-directed graph theory, have been integrated into an overall software architecture dedicated to monitoring and predicting the health of mission- critical systems. Processing over the generic fault models is triggered by event detection logic that is defined according to the specific functional requirements of the system and its components. Once triggered, the fault models provide an automated way for performing both upstream root cause analysis (RCA), and for predicting downstream effects or impact analysis. The methodology has been applied to integrated system health management (ISHM) implementations at NASA SSC's Rocket Engine Test Stands (RETS).
Dataflow models for fault-tolerant control systems
NASA Technical Reports Server (NTRS)
Papadopoulos, G. M.
1984-01-01
Dataflow concepts are used to generate a unified hardware/software model of redundant physical systems which are prone to faults. Basic results in input congruence and synchronization are shown to reduce to a simple model of data exchanges between processing sites. Procedures are given for the construction of congruence schemata, the distinguishing features of any correctly designed redundant system.
Fault Injection Campaign for a Fault Tolerant Duplex Framework
NASA Technical Reports Server (NTRS)
Sacco, Gian Franco; Ferraro, Robert D.; von llmen, Paul; Rennels, Dave A.
2007-01-01
Fault tolerance is an efficient approach adopted to avoid or reduce the damage of a system failure. In this work we present the results of a fault injection campaign we conducted on the Duplex Framework (DF). The DF is a software developed by the UCLA group [1, 2] that uses a fault tolerant approach and allows to run two replicas of the same process on two different nodes of a commercial off-the-shelf (COTS) computer cluster. A third process running on a different node, constantly monitors the results computed by the two replicas, and eventually restarts the two replica processes if an inconsistency in their computation is detected. This approach is very cost efficient and can be adopted to control processes on spacecrafts where the fault rate produced by cosmic rays is not very high.
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Schrenkenghost, Debra K.
2001-01-01
The Adjustable Autonomy Testbed (AAT) is a simulation-based testbed located in the Intelligent Systems Laboratory in the Automation, Robotics and Simulation Division at NASA Johnson Space Center. The purpose of the testbed is to support evaluation and validation of prototypes of adjustable autonomous agent software for control and fault management for complex systems. The AA T project has developed prototype adjustable autonomous agent software and human interfaces for cooperative fault management. This software builds on current autonomous agent technology by altering the architecture, components and interfaces for effective teamwork between autonomous systems and human experts. Autonomous agents include a planner, flexible executive, low level control and deductive model-based fault isolation. Adjustable autonomy is intended to increase the flexibility and effectiveness of fault management with an autonomous system. The test domain for this work is control of advanced life support systems for habitats for planetary exploration. The CONFIG hybrid discrete event simulation environment provides flexible and dynamically reconfigurable models of the behavior of components and fluids in the life support systems. Both discrete event and continuous (discrete time) simulation are supported, and flows and pressures are computed globally. This provides fast dynamic simulations of interacting hardware systems in closed loops that can be reconfigured during operations scenarios, producing complex cascading effects of operations and failures. Current object-oriented model libraries support modeling of fluid systems, and models have been developed of physico-chemical and biological subsystems for processing advanced life support gases. In FY01, water recovery system models will be developed.
AADL Fault Modeling and Analysis Within an ARP4761 Safety Assessment
2014-10-01
Analysis Generator 27 3.2.3 Mapping to OpenFTA Format File 27 3.2.4 Mapping to Generic XML Format 28 3.2.5 AADL and FTA Mapping Rules 28 3.2.6 Issues...PSSA), System Safety Assessment (SSA), Common Cause Analysis (CCA), Fault Tree Analysis ( FTA ), Failure Modes and Effects Analysis (FMEA), Failure...Modes and Effects Summary, Mar - kov Analysis (MA), and Dependence Diagrams (DDs), also referred to as Reliability Block Dia- grams (RBDs). The
Unsupervised Learning —A Novel Clustering Method for Rolling Bearing Faults Identification
NASA Astrophysics Data System (ADS)
Kai, Li; Bo, Luo; Tao, Ma; Xuefeng, Yang; Guangming, Wang
2017-12-01
To promptly process the massive fault data and automatically provide accurate diagnosis results, numerous studies have been conducted on intelligent fault diagnosis of rolling bearing. Among these studies, such as artificial neural networks, support vector machines, decision trees and other supervised learning methods are used commonly. These methods can detect the failure of rolling bearing effectively, but to achieve better detection results, it often requires a lot of training samples. Based on above, a novel clustering method is proposed in this paper. This novel method is able to find the correct number of clusters automatically the effectiveness of the proposed method is validated using datasets from rolling element bearings. The diagnosis results show that the proposed method can accurately detect the fault types of small samples. Meanwhile, the diagnosis results are also relative high accuracy even for massive samples.
Reliable High Performance Peta- and Exa-Scale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bronevetsky, G
2012-04-02
As supercomputers become larger and more powerful, they are growing increasingly complex. This is reflected both in the exponentially increasing numbers of components in HPC systems (LLNL is currently installing the 1.6 million core Sequoia system) as well as the wide variety of software and hardware components that a typical system includes. At this scale it becomes infeasible to make each component sufficiently reliable to prevent regular faults somewhere in the system or to account for all possible cross-component interactions. The resulting faults and instability cause HPC applications to crash, perform sub-optimally or even produce erroneous results. As supercomputers continuemore » to approach Exascale performance and full system reliability becomes prohibitively expensive, we will require novel techniques to bridge the gap between the lower reliability provided by hardware systems and users unchanging need for consistent performance and reliable results. Previous research on HPC system reliability has developed various techniques for tolerating and detecting various types of faults. However, these techniques have seen very limited real applicability because of our poor understanding of how real systems are affected by complex faults such as soft fault-induced bit flips or performance degradations. Prior work on such techniques has had very limited practical utility because it has generally focused on analyzing the behavior of entire software/hardware systems both during normal operation and in the face of faults. Because such behaviors are extremely complex, such studies have only produced coarse behavioral models of limited sets of software/hardware system stacks. Since this provides little insight into the many different system stacks and applications used in practice, this work has had little real-world impact. My project addresses this problem by developing a modular methodology to analyze the behavior of applications and systems during both normal and faulty operation. By synthesizing models of individual components into a whole-system behavior models my work is making it possible to automatically understand the behavior of arbitrary real-world systems to enable them to tolerate a wide range of system faults. My project is following a multi-pronged research strategy. Section II discusses my work on modeling the behavior of existing applications and systems. Section II.A discusses resilience in the face of soft faults and Section II.B looks at techniques to tolerate performance faults. Finally Section III presents an alternative approach that studies how a system should be designed from the ground up to make resilience natural and easy.« less
Fault Analysis on Bevel Gear Teeth Surface Damage of Aeroengine
NASA Astrophysics Data System (ADS)
Cheng, Li; Chen, Lishun; Li, Silu; Liang, Tao
2017-12-01
Aiming at the trouble phenomenon for bevel gear teeth surface damage of Aero-engine, Fault Tree of bevel gear teeth surface damage was drawing by logical relations, the possible cause of trouble was analyzed, scanning electron-microscope, energy spectrum analysis, Metallographic examination, hardness measurement and other analysis means were adopted to investigate the spall gear tooth. The results showed that Material composition, Metallographic structure, Micro-hardness, Carburization depth of the fault bevel gear accord with technical requirements. Contact fatigue spall defect caused bevel gear teeth surface damage. The small magnitude of Interference of accessory gearbox install hole and driving bevel gear bearing seat was mainly caused. Improved measures were proposed, after proof, Thermoelement measures are effective.
NASA Technical Reports Server (NTRS)
Ferrell, Bob A.; Lewis, Mark E.; Perotti, Jose M.; Brown, Barbara L.; Oostdyk, Rebecca L.; Goetz, Jesse W.
2010-01-01
This paper's main purpose is to detail issues and lessons learned regarding designing, integrating, and implementing Fault Detection Isolation and Recovery (FDIR) for Constellation Exploration Program (CxP) Ground Operations at Kennedy Space Center (KSC). Part of the0 overall implementation of National Aeronautics and Space Administration's (NASA's) CxP, FDIR is being implemented in three main components of the program (Ares, Orion, and Ground Operations/Processing). While not initially part of the design baseline for the CxP Ground Operations, NASA felt that FDIR is important enough to develop, that NASA's Exploration Systems Mission Directorate's (ESMD's) Exploration Technology Development Program (ETDP) initiated a task for it under their Integrated System Health Management (ISHM) research area. This task, referred to as the FDIIR project, is a multi-year multi-center effort. The primary purpose of the FDIR project is to develop a prototype and pathway upon which Fault Detection and Isolation (FDI) may be transitioned into the Ground Operations baseline. Currently, Qualtech Systems Inc (QSI) Commercial Off The Shelf (COTS) software products Testability Engineering and Maintenance System (TEAMS) Designer and TEAMS RDS/RT are being utilized in the implementation of FDI within the FDIR project. The TEAMS Designer COTS software product is being utilized to model the system with Functional Fault Models (FFMs). A limited set of systems in Ground Operations are being modeled by the FDIR project, and the entire Ares Launch Vehicle is being modeled under the Functional Fault Analysis (FFA) project at Marshall Space Flight Center (MSFC). Integration of the Ares FFMs and the Ground Processing FFMs is being done under the FDIR project also utilizing the TEAMS Designer COTS software product. One of the most significant challenges related to integration is to ensure that FFMs developed by different organizations can be integrated easily and without errors. Software Interface Control Documents (ICDs) for the FFMs and their usage will be addressed as the solution to this issue. In particular, the advantages and disadvantages of these ICDs across physically separate development groups will be delineated.
Software Health Management: A Short Review of Challenges and Existing Techniques
NASA Technical Reports Server (NTRS)
Pipatsrisawat, Knot; Darwiche, Adnan; Mengshoel, Ole J.; Schumann, Johann
2009-01-01
Modern spacecraft (as well as most other complex mechanisms like aircraft, automobiles, and chemical plants) rely more and more on software, to a point where software failures have caused severe accidents and loss of missions. Software failures during a manned mission can cause loss of life, so there are severe requirements to make the software as safe and reliable as possible. Typically, verification and validation (V&V) has the task of making sure that all software errors are found before the software is deployed and that it always conforms to the requirements. Experience, however, shows that this gold standard of error-free software cannot be reached in practice. Even if the software alone is free of glitches, its interoperation with the hardware (e.g., with sensors or actuators) can cause problems. Unexpected operational conditions or changes in the environment may ultimately cause a software system to fail. Is there a way to surmount this problem? In most modern aircraft and many automobiles, hardware such as central electrical, mechanical, and hydraulic components are monitored by IVHM (Integrated Vehicle Health Management) systems. These systems can recognize, isolate, and identify faults and failures, both those that already occurred as well as imminent ones. With the help of diagnostics and prognostics, appropriate mitigation strategies can be selected (replacement or repair, switch to redundant systems, etc.). In this short paper, we discuss some challenges and promising techniques for software health management (SWHM). In particular, we identify unique challenges for preventing software failure in systems which involve both software and hardware components. We then present our classifications of techniques related to SWHM. These classifications are performed based on dimensions of interest to both developers and users of the techniques, and hopefully provide a map for dealing with software faults and failures.
Goal-Function Tree Modeling for Systems Engineering and Fault Management
NASA Technical Reports Server (NTRS)
Johnson, Stephen B.; Breckenridge, Jonathan T.
2013-01-01
This paper describes a new representation that enables rigorous definition and decomposition of both nominal and off-nominal system goals and functions: the Goal-Function Tree (GFT). GFTs extend the concept and process of functional decomposition, utilizing state variables as a key mechanism to ensure physical and logical consistency and completeness of the decomposition of goals (requirements) and functions, and enabling full and complete traceabilitiy to the design. The GFT also provides for means to define and represent off-nominal goals and functions that are activated when the system's nominal goals are not met. The physical accuracy of the GFT, and its ability to represent both nominal and off-nominal goals enable the GFT to be used for various analyses of the system, including assessments of the completeness and traceability of system goals and functions, the coverage of fault management failure detections, and definition of system failure scenarios.
Risk management of PPP project in the preparation stage based on Fault Tree Analysis
NASA Astrophysics Data System (ADS)
Xing, Yuanzhi; Guan, Qiuling
2017-03-01
The risk management of PPP(Public Private Partnership) project can improve the level of risk control between government departments and private investors, so as to make more beneficial decisions, reduce investment losses and achieve mutual benefit as well. Therefore, this paper takes the PPP project preparation stage venture as the research object to identify and confirm four types of risks. At the same time, fault tree analysis(FTA) is used to evaluate the risk factors that belong to different parts, and quantify the influencing degree of risk impact on the basis of risk identification. In addition, it determines the importance order of risk factors by calculating unit structure importance on PPP project preparation stage. The result shows that accuracy of government decision-making, rationality of private investors funds allocation and instability of market returns are the main factors to generate the shared risk on the project.
Rocket engine system reliability analyses using probabilistic and fuzzy logic techniques
NASA Technical Reports Server (NTRS)
Hardy, Terry L.; Rapp, Douglas C.
1994-01-01
The reliability of rocket engine systems was analyzed by using probabilistic and fuzzy logic techniques. Fault trees were developed for integrated modular engine (IME) and discrete engine systems, and then were used with the two techniques to quantify reliability. The IRRAS (Integrated Reliability and Risk Analysis System) computer code, developed for the U.S. Nuclear Regulatory Commission, was used for the probabilistic analyses, and FUZZYFTA (Fuzzy Fault Tree Analysis), a code developed at NASA Lewis Research Center, was used for the fuzzy logic analyses. Although both techniques provided estimates of the reliability of the IME and discrete systems, probabilistic techniques emphasized uncertainty resulting from randomness in the system whereas fuzzy logic techniques emphasized uncertainty resulting from vagueness in the system. Because uncertainty can have both random and vague components, both techniques were found to be useful tools in the analysis of rocket engine system reliability.
Enterprise architecture availability analysis using fault trees and stakeholder interviews
NASA Astrophysics Data System (ADS)
Närman, Per; Franke, Ulrik; König, Johan; Buschle, Markus; Ekstedt, Mathias
2014-01-01
The availability of enterprise information systems is a key concern for many organisations. This article describes a method for availability analysis based on Fault Tree Analysis and constructs from the ArchiMate enterprise architecture (EA) language. To test the quality of the method, several case-studies within the banking and electrical utility industries were performed. Input data were collected through stakeholder interviews. The results from the case studies were compared with availability of log data to determine the accuracy of the method's predictions. In the five cases where accurate log data were available, the yearly downtime estimates were within eight hours from the actual downtimes. The cost of performing the analysis was low; no case study required more than 20 man-hours of work, making the method ideal for practitioners with an interest in obtaining rapid availability estimates of their enterprise information systems.
Khan, F I; Iqbal, A; Ramesh, N; Abbasi, S A
2001-10-12
As it is conventionally done, strategies for incorporating accident--prevention measures in any hazardous chemical process industry are developed on the basis of input from risk assessment. However, the two steps-- risk assessment and hazard reduction (or safety) measures--are not linked interactively in the existing methodologies. This prevents a quantitative assessment of the impacts of safety measures on risk control. We have made an attempt to develop a methodology in which risk assessment steps are interactively linked with implementation of safety measures. The resultant system tells us the extent of reduction of risk by each successive safety measure. It also tells based on sophisticated maximum credible accident analysis (MCAA) and probabilistic fault tree analysis (PFTA) whether a given unit can ever be made 'safe'. The application of the methodology has been illustrated with a case study.
Uncertainty analysis in fault tree models with dependent basic events.
Pedroni, Nicola; Zio, Enrico
2013-06-01
In general, two types of dependence need to be considered when estimating the probability of the top event (TE) of a fault tree (FT): "objective" dependence between the (random) occurrences of different basic events (BEs) in the FT and "state-of-knowledge" (epistemic) dependence between estimates of the epistemically uncertain probabilities of some BEs of the FT model. In this article, we study the effects on the TE probability of objective and epistemic dependences. The well-known Frèchet bounds and the distribution envelope determination (DEnv) method are used to model all kinds of (possibly unknown) objective and epistemic dependences, respectively. For exemplification, the analyses are carried out on a FT with six BEs. Results show that both types of dependence significantly affect the TE probability; however, the effects of epistemic dependence are likely to be overwhelmed by those of objective dependence (if present). © 2012 Society for Risk Analysis.
A fault tree model to assess probability of contaminant discharge from shipwrecks.
Landquist, H; Rosén, L; Lindhe, A; Norberg, T; Hassellöv, I-M; Lindgren, J F; Dahllöf, I
2014-11-15
Shipwrecks on the sea floor around the world may contain hazardous substances that can cause harm to the marine environment. Today there are no comprehensive methods for environmental risk assessment of shipwrecks, and thus there is poor support for decision-making on prioritization of mitigation measures. The purpose of this study was to develop a tool for quantitative risk estimation of potentially polluting shipwrecks, and in particular an estimation of the annual probability of hazardous substance discharge. The assessment of the probability of discharge is performed using fault tree analysis, facilitating quantification of the probability with respect to a set of identified hazardous events. This approach enables a structured assessment providing transparent uncertainty and sensitivity analyses. The model facilitates quantification of risk, quantification of the uncertainties in the risk calculation and identification of parameters to be investigated further in order to obtain a more reliable risk calculation. Copyright © 2014 Elsevier Ltd. All rights reserved.
Qualitative Importance Measures of Systems Components - A New Approach and Its Applications
NASA Astrophysics Data System (ADS)
Chybowski, Leszek; Gawdzińska, Katarzyna; Wiśnicki, Bogusz
2016-12-01
The paper presents an improved methodology of analysing the qualitative importance of components in the functional and reliability structures of the system. We present basic importance measures, i.e. the Birnbaum's structural measure, the order of the smallest minimal cut-set, the repetition count of an i-th event in the Fault Tree and the streams measure. A subsystem of circulation pumps and fuel heaters in the main engine fuel supply system of a container vessel illustrates the qualitative importance analysis. We constructed a functional model and a Fault Tree which we analysed using qualitative measures. Additionally, we compared the calculated measures and introduced corrected measures as a tool for improving the analysis. We proposed scaled measures and a common measure taking into account the location of the component in the reliability and functional structures. Finally, we proposed an area where the measures could be applied.
Schwartz, D.P.; Pantosti, D.; Okumura, K.; Powers, T.J.; Hamilton, J.C.
1998-01-01
Trenching, microgeomorphic mapping, and tree ring analysis provide information on timing of paleoearthquakes and behavior of the San Andreas fault in the Santa Cruz mountains. At the Grizzly Flat site alluvial units dated at 1640-1659 A.D., 1679-1894 A.D., 1668-1893 A.D., and the present ground surface are displaced by a single event. This was the 1906 surface rupture. Combined trench dates and tree ring analysis suggest that the penultimate event occurred in the mid-1600s, possibly in an interval as narrow as 1632-1659 A.D. There is no direct evidence in the trenches for the 1838 or 1865 earthquakes, which have been proposed as occurring on this part of the fault zone. In a minimum time of about 340 years only one large surface faulting event (1906) occurred at Grizzly Flat, in contrast to previous recurrence estimates of 95-110 years for the Santa Cruz mountains segment. Comparison with dates of the penultimate San Andreas earthquake at sites north of San Francisco suggests that the San Andreas fault between Point Arena and the Santa Cruz mountains may have failed either as a sequence of closely timed earthquakes on adjacent segments or as a single long rupture similar in length to the 1906 rupture around the mid-1600s. The 1906 coseismic geodetic slip and the late Holocene geologic slip rate on the San Francisco peninsula and southward are about 50-70% and 70% of their values north of San Francisco, respectively. The slip gradient along the 1906 rupture section of the San Andreas reflects partitioning of plate boundary slip onto the San Gregorio, Sargent, and other faults south of the Golden Gate. If a mid-1600s event ruptured the same section of the fault that failed in 1906, it supports the concept that long strike-slip faults can contain master rupture segments that repeat in both length and slip distribution. Recognition of a persistent slip rate gradient along the northern San Andreas fault and the concept of a master segment remove the requirement that lower slip sections of large events such as 1906 must fill in on a periodic basis with smaller and more frequent earthquakes.
Proceedings of the 14th Annual Software Engineering Workshop
NASA Technical Reports Server (NTRS)
1989-01-01
Several software related topics are presented. Topics covered include studies and experiment at the Software Engineering Laboratory at the Goddard Space Flight Center, predicting project success from the Software Project Management Process, software environments, testing in a reuse environment, domain directed reuse, and classification tree analysis using the Amadeus measurement and empirical analysis.
Moran, Michael J.; Wilson, Jon W.; Beard, L. Sue
2015-11-03
Several major faults, including the Salt Cedar Fault and the Palm Tree Fault, play an important role in the movement of groundwater. Groundwater may move along these faults and discharge where faults intersect volcanic breccias or fractured rock. Vertical movement of groundwater along faults is suggested as a mechanism for the introduction of heat energy present in groundwater from many of the springs. Groundwater altitudes in the study area indicate a potential for flow from Eldorado Valley to Black Canyon although current interpretations of the geology of this area do not favor such flow. If groundwater from Eldorado Valley discharges at springs in Black Canyon then the development of groundwater resources in Eldorado Valley could result in a decrease in discharge from the springs. Geology and structure indicate that it is not likely that groundwater can move between Detrital Valley and Black Canyon. Thus, the development of groundwater resources in Detrital Valley may not result in a decrease in discharge from springs in Black Canyon.
Abstract for 1999 Rational Software User Conference
NASA Technical Reports Server (NTRS)
Dunphy, Julia; Rouquette, Nicolas; Feather, Martin; Tung, Yu-Wen
1999-01-01
We develop spacecraft fault-protection software at NASA/JPL. Challenges exemplified by our task: 1) high-quality systems - need for extensive validation & verification; 2) multi-disciplinary context - involves experts from diverse areas; 3) embedded systems - must adapt to external practices, notations, etc.; and 4) development pressures - NASA's mandate of "better, faster, cheaper".
Software quality: Process or people
NASA Technical Reports Server (NTRS)
Palmer, Regina; Labaugh, Modenna
1993-01-01
This paper will present data related to software development processes and personnel involvement from the perspective of software quality assurance. We examine eight years of data collected from six projects. Data collected varied by project but usually included defect and fault density with limited use of code metrics, schedule adherence, and budget growth information. The data are a blend of AFSCP 800-14 and suggested productivity measures in Software Metrics: A Practioner's Guide to Improved Product Development. A software quality assurance database tool, SQUID, was used to store and tabulate the data.
A software upgrade method for micro-electronics medical implants.
Cao, Yang; Hao, Hongwei; Xue, Lin; Li, Luming; Ma, Bozhi
2006-01-01
A software upgrade method for micro-electronics medical implants is designed to enhance the devices' function or renew the software if there are some bugs found, the software updating or some memory units disabled. The implants needn't be replaced by operations if the faults can be corrected through reprogramming, which reduces the patients' pain and improves the safety effectively. This paper introduces the software upgrade method using in-application programming (IAP) and emphasizes how to insure the system, especially the implanted part's reliability and stability while upgrading.
Evaluation of acoustic tomography for tree decay detection
Shanquing Liang; Xiping Wang; Janice Wiedenbeck; Zhiyong Cai; Feng Fu
2008-01-01
In this study, the acoustic tomography technique was used to detect internal decay in high value black cherry (Prunus seratina) trees. Two-dimensional images of the cross sections of the tree samples were constructed using PiCUS Q70 software. The trees were felled following the field test, and a disc from each testing elevation was subsequently cut...
NASA Astrophysics Data System (ADS)
Abdelrhman, Ahmed M.; Sei Kien, Yong; Salman Leong, M.; Meng Hee, Lim; Al-Obaidi, Salah M. Ali
2017-07-01
The vibration signals produced by rotating machinery contain useful information for condition monitoring and fault diagnosis. Fault severities assessment is a challenging task. Wavelet Transform (WT) as a multivariate analysis tool is able to compromise between the time and frequency information in the signals and served as a de-noising method. The CWT scaling function gives different resolutions to the discretely signals such as very fine resolution at lower scale but coarser resolution at a higher scale. However, the computational cost increased as it needs to produce different signal resolutions. DWT has better low computation cost as the dilation function allowed the signals to be decomposed through a tree of low and high pass filters and no further analysing the high-frequency components. In this paper, a method for bearing faults identification is presented by combing Continuous Wavelet Transform (CWT) and Discrete Wavelet Transform (DWT) with envelope analysis for bearing fault diagnosis. The experimental data was sampled by Case Western Reserve University. The analysis result showed that the proposed method is effective in bearing faults detection, identify the exact fault’s location and severity assessment especially for the inner race and outer race faults.
Advanced microprocessor based power protection system using artificial neural network techniques
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Z.; Kalam, A.; Zayegh, A.
This paper describes an intelligent embedded microprocessor based system for fault classification in power system protection system using advanced 32-bit microprocessor technology. The paper demonstrates the development of protective relay to provide overcurrent protection schemes for fault detection. It also describes a method for power fault classification in three-phase system based on the use of neural network technology. The proposed design is implemented and tested on a single line three phase power system in power laboratory. Both the hardware and software development are described in detail.
(Quickly) Testing the Tester via Path Coverage
NASA Technical Reports Server (NTRS)
Groce, Alex
2009-01-01
The configuration complexity and code size of an automated testing framework may grow to a point that the tester itself becomes a significant software artifact, prone to poor configuration and implementation errors. Unfortunately, testing the tester by using old versions of the software under test (SUT) may be impractical or impossible: test framework changes may have been motivated by interface changes in the tested system, or fault detection may become too expensive in terms of computing time to justify running until errors are detected on older versions of the software. We propose the use of path coverage measures as a "quick and dirty" method for detecting many faults in complex test frameworks. We also note the possibility of using techniques developed to diversify state-space searches in model checking to diversify test focus, and an associated classification of tester changes into focus-changing and non-focus-changing modifications.
Advanced information processing system: Local system services
NASA Technical Reports Server (NTRS)
Burkhardt, Laura; Alger, Linda; Whittredge, Roy; Stasiowski, Peter
1989-01-01
The Advanced Information Processing System (AIPS) is a multi-computer architecture composed of hardware and software building blocks that can be configured to meet a broad range of application requirements. The hardware building blocks are fault-tolerant, general-purpose computers, fault-and damage-tolerant networks (both computer and input/output), and interfaces between the networks and the computers. The software building blocks are the major software functions: local system services, input/output, system services, inter-computer system services, and the system manager. The foundation of the local system services is an operating system with the functions required for a traditional real-time multi-tasking computer, such as task scheduling, inter-task communication, memory management, interrupt handling, and time maintenance. Resting on this foundation are the redundancy management functions necessary in a redundant computer and the status reporting functions required for an operator interface. The functional requirements, functional design and detailed specifications for all the local system services are documented.
A second generation experiment in fault-tolerant software
NASA Technical Reports Server (NTRS)
Knight, J. C.
1986-01-01
Information was collected on the efficacy of fault-tolerant software by conducting two large-scale controlled experiments. In the first, an empirical study of multi-version software (MVS) was conducted. The second experiment is an empirical evaluation of self testing as a method of error detection (STED). The purpose ot the MVS experiment was to obtain empirical measurement of the performance of multi-version systems. Twenty versions of a program were prepared at four different sites under reasonably realistic development conditions from the same specifications. The purpose of the STED experiment was to obtain empirical measurements of the performance of assertions in error detection. Eight versions of a program were modified to include assertions at two different sites under controlled conditions. The overall structure of the testing environment for the MVS experiment and its status are described. Work to date in the STED experiment is also presented.
An Integrated Crustal Dynamics Simulator
NASA Astrophysics Data System (ADS)
Xing, H. L.; Mora, P.
2007-12-01
Numerical modelling offers an outstanding opportunity to gain an understanding of the crustal dynamics and complex crustal system behaviour. This presentation provides our long-term and ongoing effort on finite element based computational model and software development to simulate the interacting fault system for earthquake forecasting. A R-minimum strategy based finite-element computational model and software tool, PANDAS, for modelling 3-dimensional nonlinear frictional contact behaviour between multiple deformable bodies with the arbitrarily-shaped contact element strategy has been developed by the authors, which builds up a virtual laboratory to simulate interacting fault systems including crustal boundary conditions and various nonlinearities (e.g. from frictional contact, materials, geometry and thermal coupling). It has been successfully applied to large scale computing of the complex nonlinear phenomena in the non-continuum media involving the nonlinear frictional instability, multiple material properties and complex geometries on supercomputers, such as the South Australia (SA) interacting fault system, South California fault model and Sumatra subduction model. It has been also extended and to simulate the hot fractured rock (HFR) geothermal reservoir system in collaboration of Geodynamics Ltd which is constructing the first geothermal reservoir system in Australia and to model the tsunami generation induced by earthquakes. Both are supported by Australian Research Council.
Three-Dimensional Geologic Map of the Hayward Fault Zone, San Francisco Bay Region, California
Phelps, G.A.; Graymer, R.W.; Jachens, R.C.; Ponce, D.A.; Simpson, R.W.; Wentworth, C.M.
2008-01-01
A three-dimensional (3D) geologic map of the Hayward Fault zone was created by integrating the results from geologic mapping, potential field geophysics, and seismology investigations. The map volume is 100 km long, 20 km wide, and extends to a depth of 12 km below sea level. The map volume is oriented northwest and is approximately bisected by the Hayward Fault. The complex geologic structure of the region makes it difficult to trace many geologic units into the subsurface. Therefore, the map units are generalized from 1:24,000-scale geologic maps. Descriptions of geologic units and structures are offered, along with a discussion of the methods used to map them and incorporate them into the 3D geologic map. The map spatial database and associated viewing software are provided. Elements of the map, such as individual fault surfaces, are also provided in a non-proprietary format so that the user can access the map via open-source software. The sheet accompanying this manuscript shows views taken from the 3D geologic map for the user to access. The 3D geologic map is designed as a multi-purpose resource for further geologic investigations and process modeling.
Investigation of Fuel Oil/Lube Oil Spray Fires On Board Vessels. Volume 3.
1998-11-01
U.S. Coast Guard Research and Development Center 1082 Shennecossett Road, Groton, CT 06340-6096 Report No. CG-D-01-99, III Investigation of Fuel ...refinery). Developed the technical and mathematical specifications for BRAVO™2.0, a state-of-the-art Windows program for performing event tree and fault...tree analyses. Also managed the development of and prepared the technical specifications for QRA ROOTS™, a Windows program for storing, searching K-4
1992-01-01
boost plenum which houses the camshaft . The compressed mixture is metered by a throttle to intake valves of the engine. The engine is constructed from...difficulties associated with a time-tagged fault tree . In particular, recent work indicates that the multi-layer perception architecture can give good fdi...Abstract: In the past decade, wastepaper recycling has gained a wider acceptance. Depletion of tree stocks, waste water treatment demands and
Interim reliability evaluation program, Browns Ferry 1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mays, S.E.; Poloski, J.P.; Sullivan, W.H.
1981-01-01
Probabilistic risk analysis techniques, i.e., event tree and fault tree analysis, were utilized to provide a risk assessment of the Browns Ferry Nuclear Plant Unit 1. Browns Ferry 1 is a General Electric boiling water reactor of the BWR 4 product line with a Mark 1 (drywell and torus) containment. Within the guidelines of the IREP Procedure and Schedule Guide, dominant accident sequences that contribute to public health and safety risks were identified and grouped according to release categories.
Cost-effectiveness analysis of risk-reduction measures to reach water safety targets.
Lindhe, Andreas; Rosén, Lars; Norberg, Tommy; Bergstedt, Olof; Pettersson, Thomas J R
2011-01-01
Identifying the most suitable risk-reduction measures in drinking water systems requires a thorough analysis of possible alternatives. In addition to the effects on the risk level, also the economic aspects of the risk-reduction alternatives are commonly considered important. Drinking water supplies are complex systems and to avoid sub-optimisation of risk-reduction measures, the entire system from source to tap needs to be considered. There is a lack of methods for quantification of water supply risk reduction in an economic context for entire drinking water systems. The aim of this paper is to present a novel approach for risk assessment in combination with economic analysis to evaluate risk-reduction measures based on a source-to-tap approach. The approach combines a probabilistic and dynamic fault tree method with cost-effectiveness analysis (CEA). The developed approach comprises the following main parts: (1) quantification of risk reduction of alternatives using a probabilistic fault tree model of the entire system; (2) combination of the modelling results with CEA; and (3) evaluation of the alternatives with respect to the risk reduction, the probability of not reaching water safety targets and the cost-effectiveness. The fault tree method and CEA enable comparison of risk-reduction measures in the same quantitative unit and consider costs and uncertainties. The approach provides a structured and thorough analysis of risk-reduction measures that facilitates transparency and long-term planning of drinking water systems in order to avoid sub-optimisation of available resources for risk reduction. Copyright © 2010 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Harper, Richard E.; Elks, Carl
1995-01-01
An Army Fault Tolerant Architecture (AFTA) has been developed to meet real-time fault tolerant processing requirements of future Army applications. AFTA is the enabling technology that will allow the Army to configure existing processors and other hardware to provide high throughput and ultrahigh reliability necessary for TF/TA/NOE flight control and other advanced Army applications. A comprehensive conceptual study of AFTA has been completed that addresses a wide range of issues including requirements, architecture, hardware, software, testability, producibility, analytical models, validation and verification, common mode faults, VHDL, and a fault tolerant data bus. A Brassboard AFTA for demonstration and validation has been fabricated, and two operating systems and a flight-critical Army application have been ported to it. Detailed performance measurements have been made of fault tolerance and operating system overheads while AFTA was executing the flight application in the presence of faults.
Novel elastic protection against DDF failures in an enhanced software-defined SIEPON
NASA Astrophysics Data System (ADS)
Pakpahan, Andrew Fernando; Hwang, I.-Shyan; Yu, Yu-Ming; Hsu, Wu-Hsiao; Liem, Andrew Tanny; Nikoukar, AliAkbar
2017-07-01
Ever-increasing bandwidth demands on passive optical networks (PONs) are pushing the utilization of every fiber strand to its limit. This is mandating comprehensive protection until the end of the distribution drop fiber (DDF). Hence, it is important to provide refined protection with an advanced fault-protection architecture and recovery mechanism that is able to cope with various DDF failures. We propose a novel elastic protection against DDF failures that incorporates a software-defined networking (SDN) capability and a bus protection line to enhance the resiliency of the existing Service Interoperability in Ethernet Passive Optical Networks (SIEPON) system. We propose the addition of an integrated SDN controller and flow tables to the optical line terminal and optical network units (ONUs) in order to deliver various DDF protection scenarios. The proposed architecture enables flexible assignment of backup ONU(s) in pre/post-fault conditions depending on the PON traffic load. A transient backup ONU and multiple backup ONUs can be deployed in the pre-fault and post-fault scenarios, respectively. Our extensively discussed simulation results show that our proposed architecture provides better overall throughput and drop probability compared to the architecture with a fixed DDF protection mechanism. It does so while still maintaining overall QoS performance in terms of packet delay, mean jitter, packet loss, and throughput under various fault conditions.
CARE3MENU- A CARE III USER FRIENDLY INTERFACE
NASA Technical Reports Server (NTRS)
Pierce, J. L.
1994-01-01
CARE3MENU generates an input file for the CARE III program. CARE III is used for reliability prediction of complex, redundant, fault-tolerant systems including digital computers, aircraft, nuclear and chemical control systems. The CARE III input file often becomes complicated and is not easily formatted with a text editor. CARE3MENU provides an easy, interactive method of creating an input file by automatically formatting a set of user-supplied inputs for the CARE III system. CARE3MENU provides detailed on-line help for most of its screen formats. The reliability model input process is divided into sections using menu-driven screen displays. Each stage, or set of identical modules comprising the model, must be identified and described in terms of number of modules, minimum number of modules for stage operation, and critical fault threshold. The fault handling and fault occurence models are detailed in several screens by parameters such as transition rates, propagation and detection densities, Weibull or exponential characteristics, and model accuracy. The system fault tree and critical pairs fault tree screens are used to define the governing logic and to identify modules affected by component failures. Additional CARE3MENU screens prompt the user for output options and run time control values such as mission time and truncation values. There are fourteen major screens, many with default values and HELP options. The documentation includes: 1) a users guide with several examples of CARE III models, the dialog required to input them to CARE3MENU, and the output files created; and 2) a maintenance manual for assistance in changing the HELP files and modifying any of the menu formats or contents. CARE3MENU is written in FORTRAN 77 for interactive execution and has been implemented on a DEC VAX series computer operating under VMS. This program was developed in 1985.
NASA Astrophysics Data System (ADS)
Polverino, Pierpaolo; Pianese, Cesare; Sorrentino, Marco; Marra, Dario
2015-04-01
The paper focuses on the design of a procedure for the development of an on-field diagnostic algorithm for solid oxide fuel cell (SOFC) systems. The diagnosis design phase relies on an in-deep analysis of the mutual interactions among all system components by exploiting the physical knowledge of the SOFC system as a whole. This phase consists of the Fault Tree Analysis (FTA), which identifies the correlations among possible faults and their corresponding symptoms at system components level. The main outcome of the FTA is an inferential isolation tool (Fault Signature Matrix - FSM), which univocally links the faults to the symptoms detected during the system monitoring. In this work the FTA is considered as a starting point to develop an improved FSM. Making use of a model-based investigation, a fault-to-symptoms dependency study is performed. To this purpose a dynamic model, previously developed by the authors, is exploited to simulate the system under faulty conditions. Five faults are simulated, one for the stack and four occurring at BOP level. Moreover, the robustness of the FSM design is increased by exploiting symptom thresholds defined for the investigation of the quantitative effects of the simulated faults on the affected variables.
FINDS: A fault inferring nonlinear detection system programmers manual, version 3.0
NASA Technical Reports Server (NTRS)
Lancraft, R. E.
1985-01-01
Detailed software documentation of the digital computer program FINDS (Fault Inferring Nonlinear Detection System) Version 3.0 is provided. FINDS is a highly modular and extensible computer program designed to monitor and detect sensor failures, while at the same time providing reliable state estimates. In this version of the program the FINDS methodology is used to detect, isolate, and compensate for failures in simulated avionics sensors used by the Advanced Transport Operating Systems (ATOPS) Transport System Research Vehicle (TSRV) in a Microwave Landing System (MLS) environment. It is intended that this report serve as a programmers guide to aid in the maintenance, modification, and revision of the FINDS software.
Adopting software quality measures for healthcare processes.
Yildiz, Ozkan; Demirörs, Onur
2009-01-01
In this study, we investigated the adoptability of software quality measures for healthcare process measurement. Quality measures of ISO/IEC 9126 are redefined from a process perspective to build a generic healthcare process quality measurement model. Case study research method is used, and the model is applied to a public hospital's Entry to Care process. After the application, weak and strong aspects of the process can be easily observed. Access audibility, fault removal, completeness of documentation, and machine utilization are weak aspects and these aspects are the candidates for process improvement. On the other hand, functional completeness, fault ratio, input validity checking, response time, and throughput time are the strong aspects of the process.
Implementation of a research prototype onboard fault monitoring and diagnosis system
NASA Technical Reports Server (NTRS)
Palmer, Michael T.; Abbott, Kathy H.; Schutte, Paul C.; Ricks, Wendell R.
1987-01-01
Due to the dynamic and complex nature of in-flight fault monitoring and diagnosis, a research effort was undertaken at NASA Langley Research Center to investigate the application of artificial intelligence techniques for improved situational awareness. Under this research effort, concepts were developed and a software architecture was designed to address the complexities of onboard monitoring and diagnosis. This paper describes the implementation of these concepts in a computer program called FaultFinder. The implementation of the monitoring, diagnosis, and interface functions as separate modules is discussed, as well as the blackboard designed for the communication of these modules. Some related issues concerning the future installation of FaultFinder in an aircraft are also discussed.
Using faults for PSHA in a volcanic context: the Etna case (Southern Italy)
NASA Astrophysics Data System (ADS)
Azzaro, Raffaele; D'Amico, Salvatore; Gee, Robin; Pace, Bruno; Peruzza, Laura
2016-04-01
At Mt. Etna volcano (Southern Italy), recurrent volcano-tectonic earthquakes affect the urbanised areas, with an overall population of about 400,000 and with important infrastructures and lifelines. For this reason, seismic hazard analyses have been undertaken in the last decade focusing on the capability of local faults to generate damaging earthquakes especially in the short-term (30-5 yrs); these results have to be intended as complementary to the regulatory seismic hazard maps, and devoted to establish priority in the seismic retrofitting of the exposed municipalities. Starting from past experience, in the framework of the V3 Project funded by the Italian Department of Civil Defense we performed a fully probabilistic seismic hazard assessment by using an original definition of seismic sources and ground-motion prediction equations specifically derived for this volcanic area; calculations are referred to a new brand topographic surface (Mt. Etna reaches more than 3,000 m in elevation, in less than 20 km from the coast), and to both Poissonian and time-dependent occurrence models. We present at first the process of defining seismic sources that includes individual faults, seismic zones and gridded seismicity; they are obtained by integrating geological field data with long-term (the historical macroseismic catalogue) and short-term earthquake data (the instrumental catalogue). The analysis of the Frequency Magnitude Distribution identifies areas in the volcanic complex, with a- and b-values of the Gutenberg-Richter relationship representative of different dynamic processes. Then, we discuss the variability of the mean occurrence times of major earthquakes along the main Etnean faults estimated by using a purely geologic approach. This analysis has been carried out through the software code FISH, a Matlab® tool developed to turn fault data representative of the seismogenic process into hazard models. The utilization of a magnitude-size scaling relationship specific for volcanic areas is a key element: the FiSH code may thus calculate the most probable values of characteristic expected magnitude (Mchar) with the associated standard deviation σ, the corresponding mean recurrence times (Tmean) and the aperiodicity factor for each fault. Finally, we show some results obtained by the OpenQuake-engine by considering a conceptual logic tree model organised in several branches (zone and zoneless, historical and geological rates, Poisson and time-dependent assumptions). Maps are referred to various exposure periods (10% exceeding probability in 30-5 years) and different spectral accelerations. The volcanic region of Mt. Etna represents a perfect lab for fault-based PSHA; the large dataset of input parameters used in the calculations allows testing different methodological approaches and validating some conceptual procedures.
TU-AB-BRD-03: Fault Tree Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dunscombe, P.
2015-06-15
Current quality assurance and quality management guidelines provided by various professional organizations are prescriptive in nature, focusing principally on performance characteristics of planning and delivery devices. However, published analyses of events in radiation therapy show that most events are often caused by flaws in clinical processes rather than by device failures. This suggests the need for the development of a quality management program that is based on integrated approaches to process and equipment quality assurance. Industrial engineers have developed various risk assessment tools that are used to identify and eliminate potential failures from a system or a process before amore » failure impacts a customer. These tools include, but are not limited to, process mapping, failure modes and effects analysis, fault tree analysis. Task Group 100 of the American Association of Physicists in Medicine has developed these tools and used them to formulate an example risk-based quality management program for intensity-modulated radiotherapy. This is a prospective risk assessment approach that analyzes potential error pathways inherent in a clinical process and then ranks them according to relative risk, typically before implementation, followed by the design of a new process or modification of the existing process. Appropriate controls are then put in place to ensure that failures are less likely to occur and, if they do, they will more likely be detected before they propagate through the process, compromising treatment outcome and causing harm to the patient. Such a prospective approach forms the basis of the work of Task Group 100 that has recently been approved by the AAPM. This session will be devoted to a discussion of these tools and practical examples of how these tools can be used in a given radiotherapy clinic to develop a risk based quality management program. Learning Objectives: Learn how to design a process map for a radiotherapy process Learn how to perform failure modes and effects analysis analysis for a given process Learn what fault trees are all about Learn how to design a quality management program based upon the information obtained from process mapping, failure modes and effects analysis and fault tree analysis. Dunscombe: Director, TreatSafely, LLC and Center for the Assessment of Radiological Sciences; Consultant to IAEA and Varian Thomadsen: President, Center for the Assessment of Radiological Sciences Palta: Vice President of the Center for the Assessment of Radiological Sciences.« less
Designing application software in wide area network settings
NASA Technical Reports Server (NTRS)
Makpangou, Mesaac; Birman, Ken
1990-01-01
Progress in methodologies for developing robust local area network software has not been matched by similar results for wide area settings. The design of application software spanning multiple local area environments is examined. For important classes of applications, simple design techniques are presented that yield fault tolerant wide area programs. An implementation of these techniques as a set of tools for use within the ISIS system is described.
Digital Database of Recently Active Traces of the Hayward Fault, California
Lienkaemper, James J.
2006-01-01
The purpose of this map is to show the location of and evidence for recent movement on active fault traces within the Hayward Fault Zone, California. The mapped traces represent the integration of the following three different types of data: (1) geomorphic expression, (2) creep (aseismic fault slip),and (3) trench exposures. This publication is a major revision of an earlier map (Lienkaemper, 1992), which both brings up to date the evidence for faulting and makes it available formatted both as a digital database for use within a geographic information system (GIS) and for broader public access interactively using widely available viewing software. The pamphlet describes in detail the types of scientific observations used to make the map, gives references pertaining to the fault and the evidence of faulting, and provides guidance for use of and limitations of the map. [Last revised Nov. 2008, a minor update for 2007 LiDAR and recent trench investigations; see version history below.
Probabilistic seismic hazard study based on active fault and finite element geodynamic models
NASA Astrophysics Data System (ADS)
Kastelic, Vanja; Carafa, Michele M. C.; Visini, Francesco
2016-04-01
We present a probabilistic seismic hazard analysis (PSHA) that is exclusively based on active faults and geodynamic finite element input models whereas seismic catalogues were used only in a posterior comparison. We applied the developed model in the External Dinarides, a slow deforming thrust-and-fold belt at the contact between Adria and Eurasia.. is the Our method consists of establishing s two earthquake rupture forecast models: (i) a geological active fault input (GEO) model and, (ii) a finite element (FEM) model. The GEO model is based on active fault database that provides information on fault location and its geometric and kinematic parameters together with estimations on its slip rate. By default in this model all deformation is set to be released along the active faults. The FEM model is based on a numerical geodynamic model developed for the region of study. In this model the deformation is, besides along the active faults, released also in the volumetric continuum elements. From both models we calculated their corresponding activity rates, its earthquake rates and their final expected peak ground accelerations. We investigated both the source model and the earthquake model uncertainties by varying the main active fault and earthquake rate calculation parameters through constructing corresponding branches of the seismic hazard logic tree. Hazard maps and UHS curves have been produced for horizontal ground motion on bedrock conditions VS 30 ≥ 800 m/s), thereby not considering local site amplification effects. The hazard was computed over a 0.2° spaced grid considering 648 branches of the logic tree and the mean value of 10% probability of exceedance in 50 years hazard level, while the 5th and 95th percentiles were also computed to investigate the model limits. We conducted a sensitivity analysis to control which of the input parameters influence the final hazard results in which measure. The results of such comparison evidence the deformation model and with their internal variability together with the choice of the ground motion prediction equations (GMPEs) are the most influencing parameter. Both of these parameters have significan affect on the hazard results. Thus having good knowledge of the existence of active faults and their geometric and activity characteristics is of key importance. We also show that PSHA models based exclusively on active faults and geodynamic inputs, which are thus not dependent on past earthquake occurrences, provide a valid method for seismic hazard calculation.
NASA Technical Reports Server (NTRS)
King, Ellis; Hart, Jeremy; Odegard, Ryan
2010-01-01
The Orion Crew Exploration Vehicle (CET) is being designed to include significantly more automation capability than either the Space Shuttle or the International Space Station (ISS). In particular, the vehicle flight software has requirements to accommodate increasingly automated missions throughout all phases of flight. A data-driven flight software architecture will provide an evolvable automation capability to sequence through Guidance, Navigation & Control (GN&C) flight software modes and configurations while maintaining the required flexibility and human control over the automation. This flexibility is a key aspect needed to address the maturation of operational concepts, to permit ground and crew operators to gain trust in the system and mitigate unpredictability in human spaceflight. To allow for mission flexibility and reconfrgurability, a data driven approach is being taken to load the mission event plan as well cis the flight software artifacts associated with the GN&C subsystem. A database of GN&C level sequencing data is presented which manages and tracks the mission specific and algorithm parameters to provide a capability to schedule GN&C events within mission segments. The flight software data schema for performing automated mission sequencing is presented with a concept of operations for interactions with ground and onboard crew members. A prototype architecture for fault identification, isolation and recovery interactions with the automation software is presented and discussed as a forward work item.
Thermal Expert System (TEXSYS): Systems autonomy demonstration project, volume 2. Results
NASA Technical Reports Server (NTRS)
Glass, B. J. (Editor)
1992-01-01
The Systems Autonomy Demonstration Project (SADP) produced a knowledge-based real-time control system for control and fault detection, isolation, and recovery (FDIR) of a prototype two-phase Space Station Freedom external active thermal control system (EATCS). The Thermal Expert System (TEXSYS) was demonstrated in recent tests to be capable of reliable fault anticipation and detection, as well as ordinary control of the thermal bus. Performance requirements were addressed by adopting a hierarchical symbolic control approach-layering model-based expert system software on a conventional, numerical data acquisition and control system. The model-based reasoning capabilities of TEXSYS were shown to be advantageous over typical rule-based expert systems, particularly for detection of unforeseen faults and sensor failures. Volume 1 gives a project overview and testing highlights. Volume 2 provides detail on the EATCS testbed, test operations, and online test results. Appendix A is a test archive, while Appendix B is a compendium of design and user manuals for the TEXSYS software.
Thermal Expert System (TEXSYS): Systems automony demonstration project, volume 1. Overview
NASA Technical Reports Server (NTRS)
Glass, B. J. (Editor)
1992-01-01
The Systems Autonomy Demonstration Project (SADP) produced a knowledge-based real-time control system for control and fault detection, isolation, and recovery (FDIR) of a prototype two-phase Space Station Freedom external active thermal control system (EATCS). The Thermal Expert System (TEXSYS) was demonstrated in recent tests to be capable of reliable fault anticipation and detection, as well as ordinary control of the thermal bus. Performance requirements were addressed by adopting a hierarchical symbolic control approach-layering model-based expert system software on a conventional, numerical data acquisition and control system. The model-based reasoning capabilities of TEXSYS were shown to be advantageous over typical rule-based expert systems, particularly for detection of unforeseen faults and sensor failures. Volume 1 gives a project overview and testing highlights. Volume 2 provides detail on the EATCS test bed, test operations, and online test results. Appendix A is a test archive, while Appendix B is a compendium of design and user manuals for the TEXSYS software.
Development and realization of the open fault diagnosis system based on XPE
NASA Astrophysics Data System (ADS)
Deng, Hui; Wang, TaiYong; He, HuiLong; Xu, YongGang; Zeng, JuXiang
2005-12-01
To make the complex mechanical equipment work in good service, the technology for realizing an embedded open system is introduced systematically, including open hardware configuration, customized embedded operation system and open software structure. The ETX technology is adopted in this system, integrating the CPU main-board functions, and achieving the quick, real-time signal acquisition and intelligent data analysis with applying DSP and CPLD data acquisition card. Under the open configuration, the signal bus mode such as PCI, ISA and PC/104 can be selected and the styles of the signals can be chosen too. In addition, through customizing XPE system, adopting the EWF (Enhanced Write Filter), and realizing the open system authentically, the stability of the system is enhanced. Multi-thread and multi-task programming techniques are adopted in the software programming process. Interconnecting with the remote fault diagnosis center via the net interface, cooperative diagnosis is conducted and the intelligent degree of the fault diagnosis is improved.
Thermal Expert System (TEXSYS): Systems autonomy demonstration project, volume 2. Results
NASA Astrophysics Data System (ADS)
Glass, B. J.
1992-10-01
The Systems Autonomy Demonstration Project (SADP) produced a knowledge-based real-time control system for control and fault detection, isolation, and recovery (FDIR) of a prototype two-phase Space Station Freedom external active thermal control system (EATCS). The Thermal Expert System (TEXSYS) was demonstrated in recent tests to be capable of reliable fault anticipation and detection, as well as ordinary control of the thermal bus. Performance requirements were addressed by adopting a hierarchical symbolic control approach-layering model-based expert system software on a conventional, numerical data acquisition and control system. The model-based reasoning capabilities of TEXSYS were shown to be advantageous over typical rule-based expert systems, particularly for detection of unforeseen faults and sensor failures. Volume 1 gives a project overview and testing highlights. Volume 2 provides detail on the EATCS testbed, test operations, and online test results. Appendix A is a test archive, while Appendix B is a compendium of design and user manuals for the TEXSYS software.
High-throughput state-machine replication using software transactional memory.
Zhao, Wenbing; Yang, William; Zhang, Honglei; Yang, Jack; Luo, Xiong; Zhu, Yueqin; Yang, Mary; Luo, Chaomin
2016-11-01
State-machine replication is a common way of constructing general purpose fault tolerance systems. To ensure replica consistency, requests must be executed sequentially according to some total order at all non-faulty replicas. Unfortunately, this could severely limit the system throughput. This issue has been partially addressed by identifying non-conflicting requests based on application semantics and executing these requests concurrently. However, identifying and tracking non-conflicting requests require intimate knowledge of application design and implementation, and a custom fault tolerance solution developed for one application cannot be easily adopted by other applications. Software transactional memory offers a new way of constructing concurrent programs. In this article, we present the mechanisms needed to retrofit existing concurrency control algorithms designed for software transactional memory for state-machine replication. The main benefit for using software transactional memory in state-machine replication is that general purpose concurrency control mechanisms can be designed without deep knowledge of application semantics. As such, new fault tolerance systems based on state-machine replications with excellent throughput can be easily designed and maintained. In this article, we introduce three different concurrency control mechanisms for state-machine replication using software transactional memory, namely, ordered strong strict two-phase locking, conventional timestamp-based multiversion concurrency control, and speculative timestamp-based multiversion concurrency control. Our experiments show that speculative timestamp-based multiversion concurrency control mechanism has the best performance in all types of workload, the conventional timestamp-based multiversion concurrency control offers the worst performance due to high abort rate in the presence of even moderate contention between transactions. The ordered strong strict two-phase locking mechanism offers the simplest solution with excellent performance in low contention workload, and fairly good performance in high contention workload.
High-throughput state-machine replication using software transactional memory
Yang, William; Zhang, Honglei; Yang, Jack; Luo, Xiong; Zhu, Yueqin; Yang, Mary; Luo, Chaomin
2017-01-01
State-machine replication is a common way of constructing general purpose fault tolerance systems. To ensure replica consistency, requests must be executed sequentially according to some total order at all non-faulty replicas. Unfortunately, this could severely limit the system throughput. This issue has been partially addressed by identifying non-conflicting requests based on application semantics and executing these requests concurrently. However, identifying and tracking non-conflicting requests require intimate knowledge of application design and implementation, and a custom fault tolerance solution developed for one application cannot be easily adopted by other applications. Software transactional memory offers a new way of constructing concurrent programs. In this article, we present the mechanisms needed to retrofit existing concurrency control algorithms designed for software transactional memory for state-machine replication. The main benefit for using software transactional memory in state-machine replication is that general purpose concurrency control mechanisms can be designed without deep knowledge of application semantics. As such, new fault tolerance systems based on state-machine replications with excellent throughput can be easily designed and maintained. In this article, we introduce three different concurrency control mechanisms for state-machine replication using software transactional memory, namely, ordered strong strict two-phase locking, conventional timestamp-based multiversion concurrency control, and speculative timestamp-based multiversion concurrency control. Our experiments show that speculative timestamp-based multiversion concurrency control mechanism has the best performance in all types of workload, the conventional timestamp-based multiversion concurrency control offers the worst performance due to high abort rate in the presence of even moderate contention between transactions. The ordered strong strict two-phase locking mechanism offers the simplest solution with excellent performance in low contention workload, and fairly good performance in high contention workload. PMID:29075049
Quantitative Measures for Software Independent Verification and Validation
NASA Technical Reports Server (NTRS)
Lee, Alice
1996-01-01
As software is maintained or reused, it undergoes an evolution which tends to increase the overall complexity of the code. To understand the effects of this, we brought in statistics experts and leading researchers in software complexity, reliability, and their interrelationships. These experts' project has resulted in our ability to statistically correlate specific code complexity attributes, in orthogonal domains, to errors found over time in the HAL/S flight software which flies in the Space Shuttle. Although only a prototype-tools experiment, the result of this research appears to be extendable to all other NASA software, given appropriate data similar to that logged for the Shuttle onboard software. Our research has demonstrated that a more complete domain coverage can be mathematically demonstrated with the approach we have applied, thereby ensuring full insight into the cause-and-effects relationship between the complexity of a software system and the fault density of that system. By applying the operational profile we can characterize the dynamic effects of software path complexity under this same approach We now have the ability to measure specific attributes which have been statistically demonstrated to correlate to increased error probability, and to know which actions to take, for each complexity domain. Shuttle software verifiers can now monitor the changes in the software complexity, assess the added or decreased risk of software faults in modified code, and determine necessary corrections. The reports, tool documentation, user's guides, and new approach that have resulted from this research effort represent advances in the state of the art of software quality and reliability assurance. Details describing how to apply this technique to other NASA code are contained in this document.
A fault is born: The Landers-Mojave earthquake line
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nur, A.; Ron, H.
1993-04-01
The epicenter and the southern portion of the 1992 Landers earthquake fell on an approximately N-S earthquake line, defined by both epicentral locations and by the rupture directions of four previous M>5 earthquakes in the Mojave: The 1947 Manix; 1975 Galway Lake; 1979 Homestead Valley: and 1992 Joshua Tree events. Another M 5.2 earthquake epicenter in 1965 fell on this line where it intersects the Calico fault. In contrast, the northern part of the Landers rupture followed the NW-SE trending Camp Rock and parallel faults, exhibiting an apparently unusual rupture kink. The block tectonic model (Ron et al., 1984) combiningmore » fault kinematic and mechanics, explains both the alignment of the events, and their ruptures (Nur et al., 1986, 1989), as well as the Landers kink (Nur et al., 1992). Accordingly, the now NW oriented faults have rotated into their present direction away from the direction of maximum shortening, close to becoming locked, whereas a new fault set, optimally oriented relative to the direction of shortening, is developing to accommodate current crustal deformation. The Mojave-Landers line may thus be a new fault in formation. During the transition of faulting from the old, well developed and wak but poorly oriented faults to the strong, but favorably oriented new ones, both can slip simultaneously, giving rise to kinks such as Landers.« less
Don C. Bragg
2002-01-01
This article is an introduction to the computer software used by the Potential Relative Increment (PRI) approach to optimal tree diameter growth modeling. These DOS programs extract qualified tree and plot data from the Eastwide Forest Inventory Data Base (EFIDB), calculate relative tree increment, sort for the highest relative increments by diameter class, and...
Software reliability models for critical applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pham, H.; Pham, M.
This report presents the results of the first phase of the ongoing EG G Idaho, Inc. Software Reliability Research Program. The program is studying the existing software reliability models and proposes a state-of-the-art software reliability model that is relevant to the nuclear reactor control environment. This report consists of three parts: (1) summaries of the literature review of existing software reliability and fault tolerant software reliability models and their related issues, (2) proposed technique for software reliability enhancement, and (3) general discussion and future research. The development of this proposed state-of-the-art software reliability model will be performed in the secondmore » place. 407 refs., 4 figs., 2 tabs.« less
Software reliability models for critical applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pham, H.; Pham, M.
This report presents the results of the first phase of the ongoing EG&G Idaho, Inc. Software Reliability Research Program. The program is studying the existing software reliability models and proposes a state-of-the-art software reliability model that is relevant to the nuclear reactor control environment. This report consists of three parts: (1) summaries of the literature review of existing software reliability and fault tolerant software reliability models and their related issues, (2) proposed technique for software reliability enhancement, and (3) general discussion and future research. The development of this proposed state-of-the-art software reliability model will be performed in the second place.more » 407 refs., 4 figs., 2 tabs.« less
Tokutomi, Tomoharu; Fukushima, Akimune; Yamamoto, Kayono; Bansho, Yasushi; Hachiya, Tsuyoshi; Shimizu, Atsushi
2017-07-14
The Tohoku Medical Megabank project aims to create a next-generation personalized healthcare system by conducting large-scale genome-cohort studies involving three generations of local residents in the areas affected by the Great East Japan Earthquake. We collected medical and genomic information for developing a biobank to be used for this healthcare system. We designed a questionnaire-based pedigree-creation software program named "f-treeGC," which enables even less experienced medical practitioners to accurately and rapidly collect family health history and create pedigree charts. f-treeGC may be run on Adobe AIR. Pedigree charts are created in the following manner: 1) At system startup, the client is prompted to provide required information on the presence or absence of children; f-treeGC is capable of creating a pedigree up to three generations. 2) An interviewer fills out a multiple-choice questionnaire on genealogical information. 3) The information requested includes name, age, gender, general status, infertility status, pregnancy status, fetal status, and physical features or health conditions of individuals over three generations. In addition, information regarding the client and the proband, and birth order information, including multiple gestation, custody, multiple individuals, donor or surrogate, adoption, and consanguinity may be included. 4) f-treeGC shows only marriages between first cousins via the overlay function. 5) f-treeGC automatically creates a pedigree chart, and the chart-creation process is visible for inspection on the screen in real time. 6) The genealogical data may be saved as a file in the original format. The created/modified date and time may be changed as required, and the file may be password-protected and/or saved in read-only format. To enable sorting or searching from the database, the file name automatically contains the terms typed into the entry fields, including physical features or health conditions, by default. 7) Alternatively, family histories are collected using a completed foldable interview paper sheet named "f-sheet", which is identical to the questionnaire in f-treeGC. We developed a questionnaire-based family tree-creation software, named f-treeGC, which is fully compliant with international recommendations for standardized human pedigree nomenclature. The present software simplifies the process of collecting family histories and pedigrees, and has a variety of uses, from genome cohort studies or primary care to genetic counseling.
An Application of the Geo-Semantic Micro-services in Seamless Data-Model Integration
NASA Astrophysics Data System (ADS)
Jiang, P.; Elag, M.; Kumar, P.; Liu, R.; Hu, Y.; Marini, L.; Peckham, S. D.; Hsu, L.
2016-12-01
We are applying machine learning (ML) techniques to continuous acoustic emission (AE) data from laboratory earthquake experiments. Our goal is to apply explicit ML methods to this acoustic datathe AE in order to infer frictional properties of a laboratory fault. The experiment is a double direct shear apparatus comprised of fault blocks surrounding fault gouge comprised of glass beads or quartz powder. Fault characteristics are recorded, including shear stress, applied load (bulk friction = shear stress/normal load) and shear velocity. The raw acoustic signal is continuously recorded. We rely on explicit decision tree approaches (Random Forest and Gradient Boosted Trees) that allow us to identify important features linked to the fault friction. A training procedure that employs both the AE and the recorded shear stress from the experiment is first conducted. Then, testing takes place on data the algorithm has never seen before, using only the continuous AE signal. We find that these methods provide rich information regarding frictional processes during slip (Rouet-Leduc et al., 2017a; Hulbert et al., 2017). In addition, similar machine learning approaches predict failure times, as well as slip magnitudes in some cases. We find that these methods work for both stick slip and slow slip experiments, for periodic slip and for aperiodic slip. We also derive a fundamental relationship between the AE and the friction describing the frictional behavior of any earthquake slip cycle in a given experiment (Rouet-Leduc et al., 2017b). Our goal is to ultimately scale these approaches to Earth geophysical data to probe fault friction. References Rouet-Leduc, B., C. Hulbert, N. Lubbers, K. Barros, C. Humphreys and P. A. Johnson, Machine learning predicts laboratory earthquakes, in review (2017). https://arxiv.org/abs/1702.05774Rouet-LeDuc, B. et al., Friction Laws Derived From the Acoustic Emissions of a Laboratory Fault by Machine Learning (2017), AGU Fall Meeting Session S025: Earthquake source: from the laboratory to the fieldHulbert, C., Characterizing slow slip applying machine learning (2017), AGU Fall Meeting Session S019: Slow slip, Tectonic Tremor, and the Brittle-to-Ductile Transition Zone: What mechanisms control the diversity of slow and fast earthquakes?
Resilience Design Patterns - A Structured Approach to Resilience at Extreme Scale (version 1.1)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hukerikar, Saurabh; Engelmann, Christian
Reliability is a serious concern for future extreme-scale high-performance computing (HPC) systems. Projections based on the current generation of HPC systems and technology roadmaps suggest the prevalence of very high fault rates in future systems. The errors resulting from these faults will propagate and generate various kinds of failures, which may result in outcomes ranging from result corruptions to catastrophic application crashes. Therefore the resilience challenge for extreme-scale HPC systems requires management of various hardware and software technologies that are capable of handling a broad set of fault models at accelerated fault rates. Also, due to practical limits on powermore » consumption in HPC systems future systems are likely to embrace innovative architectures, increasing the levels of hardware and software complexities. As a result the techniques that seek to improve resilience must navigate the complex trade-off space between resilience and the overheads to power consumption and performance. While the HPC community has developed various resilience solutions, application-level techniques as well as system-based solutions, the solution space of HPC resilience techniques remains fragmented. There are no formal methods and metrics to investigate and evaluate resilience holistically in HPC systems that consider impact scope, handling coverage, and performance & power efficiency across the system stack. Additionally, few of the current approaches are portable to newer architectures and software environments that will be deployed on future systems. In this document, we develop a structured approach to the management of HPC resilience using the concept of resilience-based design patterns. A design pattern is a general repeatable solution to a commonly occurring problem. We identify the commonly occurring problems and solutions used to deal with faults, errors and failures in HPC systems. Each established solution is described in the form of a pattern that addresses concrete problems in the design of resilient systems. The complete catalog of resilience design patterns provides designers with reusable design elements. We also define a framework that enhances a designer's understanding of the important constraints and opportunities for the design patterns to be implemented and deployed at various layers of the system stack. This design framework may be used to establish mechanisms and interfaces to coordinate flexible fault management across hardware and software components. The framework also supports optimization of the cost-benefit trade-offs among performance, resilience, and power consumption. The overall goal of this work is to enable a systematic methodology for the design and evaluation of resilience technologies in extreme-scale HPC systems that keep scientific applications running to a correct solution in a timely and cost-efficient manner in spite of frequent faults, errors, and failures of various types.« less
YBYRÁ facilitates comparison of large phylogenetic trees.
Machado, Denis Jacob
2015-07-01
The number and size of tree topologies that are being compared by phylogenetic systematists is increasing due to technological advancements in high-throughput DNA sequencing. However, we still lack tools to facilitate comparison among phylogenetic trees with a large number of terminals. The "YBYRÁ" project integrates software solutions for data analysis in phylogenetics. It comprises tools for (1) topological distance calculation based on the number of shared splits or clades, (2) sensitivity analysis and automatic generation of sensitivity plots and (3) clade diagnoses based on different categories of synapomorphies. YBYRÁ also provides (4) an original framework to facilitate the search for potential rogue taxa based on how much they affect average matching split distances (using MSdist). YBYRÁ facilitates comparison of large phylogenetic trees and outperforms competing software in terms of usability and time efficiency, specially for large data sets. The programs that comprises this toolkit are written in Python, hence they do not require installation and have minimum dependencies. The entire project is available under an open-source licence at http://www.ib.usp.br/grant/anfibios/researchSoftware.html .
NASA Technical Reports Server (NTRS)
1985-01-01
The primary objective of the Test Active Control Technology (ACT) System laboratory tests was to verify and validate the system concept, hardware, and software. The initial lab tests were open loop hardware tests of the Test ACT System as designed and built. During the course of the testing, minor problems were uncovered and corrected. Major software tests were run. The initial software testing was also open loop. These tests examined pitch control laws, wing load alleviation, signal selection/fault detection (SSFD), and output management. The Test ACT System was modified to interface with the direct drive valve (DDV) modules. The initial testing identified problem areas with DDV nonlinearities, valve friction induced limit cycling, DDV control loop instability, and channel command mismatch. The other DDV issue investigated was the ability to detect and isolate failures. Some simple schemes for failure detection were tested but were not completely satisfactory. The Test ACT System architecture continues to appear promising for ACT/FBW applications in systems that must be immune to worst case generic digital faults, and be able to tolerate two sequential nongeneric faults with no reduction in performance. The challenge in such an implementation would be to keep the analog element sufficiently simple to achieve the necessary reliability.
FTMP (Fault Tolerant Multiprocessor) programmer's manual
NASA Technical Reports Server (NTRS)
Feather, F. E.; Liceaga, C. A.; Padilla, P. A.
1986-01-01
The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high level language called Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.
Dendroscope: An interactive viewer for large phylogenetic trees
Huson, Daniel H; Richter, Daniel C; Rausch, Christian; Dezulian, Tobias; Franz, Markus; Rupp, Regula
2007-01-01
Background Research in evolution requires software for visualizing and editing phylogenetic trees, for increasingly very large datasets, such as arise in expression analysis or metagenomics, for example. It would be desirable to have a program that provides these services in an effcient and user-friendly way, and that can be easily installed and run on all major operating systems. Although a large number of tree visualization tools are freely available, some as a part of more comprehensive analysis packages, all have drawbacks in one or more domains. They either lack some of the standard tree visualization techniques or basic graphics and editing features, or they are restricted to small trees containing only tens of thousands of taxa. Moreover, many programs are diffcult to install or are not available for all common operating systems. Results We have developed a new program, Dendroscope, for the interactive visualization and navigation of phylogenetic trees. The program provides all standard tree visualizations and is optimized to run interactively on trees containing hundreds of thousands of taxa. The program provides tree editing and graphics export capabilities. To support the inspection of large trees, Dendroscope offers a magnification tool. The software is written in Java 1.4 and installers are provided for Linux/Unix, MacOS X and Windows XP. Conclusion Dendroscope is a user-friendly program for visualizing and navigating phylogenetic trees, for both small and large datasets. PMID:18034891
Numerical simulation of the stress distribution in a coal mine caused by a normal fault
NASA Astrophysics Data System (ADS)
Zhang, Hongmei; Wu, Jiwen; Zhai, Xiaorong
2017-06-01
Luling coal mine was used for research using FLAC3D software to analyze the stress distribution characteristics of the two sides of a normal fault zone with two different working face models. The working faces were, respectively, on the hanging wall and the foot wall; the two directions of mining were directed to the fault. The stress distributions were different across the fault. The stress was concentrated and the influenced range of stress was gradually larger while the working face was located on the hanging wall. The fault zone played a negative effect to the stress transmission. Obviously, the fault prevented stress transmission, the stress concentrated on the fault zone and the hanging wall. In the second model, the stress on the two sides decreased at first, but then increased continuing to transmit to the hanging wall. The concentrated stress in the fault zone decreased and the stress transmission was obvious. Because of this, the result could be used to minimize roadway damage and lengthen the time available for coal mining by careful design of the roadway and working face.
Mission Services Evolution Center Message Bus
NASA Technical Reports Server (NTRS)
Mayorga, Arturo; Bristow, John O.; Butschky, Mike
2011-01-01
The Goddard Mission Services Evolution Center (GMSEC) Message Bus is a robust, lightweight, fault-tolerant middleware implementation that supports all messaging capabilities of the GMSEC API. This architecture is a distributed software system that routes messages based on message subject names and knowledge of the locations in the network of the interested software components.
NASA Technical Reports Server (NTRS)
Lindsey, Tony; Pecheur, Charles
2004-01-01
Livingstone PathFinder (LPF) is a simulation-based computer program for verifying autonomous diagnostic software. LPF is designed especially to be applied to NASA s Livingstone computer program, which implements a qualitative-model-based algorithm that diagnoses faults in a complex automated system (e.g., an exploratory robot, spacecraft, or aircraft). LPF forms a software test bed containing a Livingstone diagnosis engine, embedded in a simulated operating environment consisting of a simulator of the system to be diagnosed by Livingstone and a driver program that issues commands and faults according to a nondeterministic scenario provided by the user. LPF runs the test bed through all executions allowed by the scenario, checking for various selectable error conditions after each step. All components of the test bed are instrumented, so that execution can be single-stepped both backward and forward. The architecture of LPF is modular and includes generic interfaces to facilitate substitution of alternative versions of its different parts. Altogether, LPF provides a flexible, extensible framework for simulation-based analysis of diagnostic software; these characteristics also render it amenable to application to diagnostic programs other than Livingstone.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mays, S.E.; Poloski, J.P.; Sullivan, W.H.
1982-07-01
This report describes a risk study of the Browns Ferry, Unit 1, nuclear plant. The study is one of four such studies sponsored by the NRC Office of Research, Division of Risk Assessment, as part of its Interim Reliability Evaluation Program (IREP), Phase II. This report is contained in four volumes: a main report and three appendixes. Appendix B provides a description of Browns Ferry, Unit 1, plant systems and the failure evaluation of those systems as they apply to accidents at Browns Ferry. Information is presented concerning front-line system fault analysis; support system fault analysis; human error models andmore » probabilities; and generic control circuit analyses.« less
A Tree Locality-Sensitive Hash for Secure Software Testing
2017-09-14
errors, or to look for vulnerabilities that could allow a nefarious actor to use our software against us. Ultimately, all testing is designed to find...and an equivalent number of feasible paths discovered by Klee. 1.5 Summary This document the Tree Locality-Sensitive Hash (TLSH), a locality-senstive...performing two groups of tests that verify the accuracy and usefulness of TLSH. Chapter 5 summarizes the contents of the dissertation and lists avenues
Risk Analysis Methods for Deepwater Port Oil Transfer Systems
DOT National Transportation Integrated Search
1976-06-01
This report deals with the risk analysis methodology for oil spills from the oil transfer systems in deepwater ports. Failure mode and effect analysis in combination with fault tree analysis are identified as the methods best suited for the assessmen...
Windows .NET Network Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST)
Dowd, Scot E; Zaragoza, Joaquin; Rodriguez, Javier R; Oliver, Melvin J; Payton, Paxton R
2005-01-01
Background BLAST is one of the most common and useful tools for Genetic Research. This paper describes a software application we have termed Windows .NET Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST), which enhances the BLAST utility by improving usability, fault recovery, and scalability in a Windows desktop environment. Our goal was to develop an easy to use, fault tolerant, high-throughput BLAST solution that incorporates a comprehensive BLAST result viewer with curation and annotation functionality. Results W.ND-BLAST is a comprehensive Windows-based software toolkit that targets researchers, including those with minimal computer skills, and provides the ability increase the performance of BLAST by distributing BLAST queries to any number of Windows based machines across local area networks (LAN). W.ND-BLAST provides intuitive Graphic User Interfaces (GUI) for BLAST database creation, BLAST execution, BLAST output evaluation and BLAST result exportation. This software also provides several layers of fault tolerance and fault recovery to prevent loss of data if nodes or master machines fail. This paper lays out the functionality of W.ND-BLAST. W.ND-BLAST displays close to 100% performance efficiency when distributing tasks to 12 remote computers of the same performance class. A high throughput BLAST job which took 662.68 minutes (11 hours) on one average machine was completed in 44.97 minutes when distributed to 17 nodes, which included lower performance class machines. Finally, there is a comprehensive high-throughput BLAST Output Viewer (BOV) and Annotation Engine components, which provides comprehensive exportation of BLAST hits to text files, annotated fasta files, tables, or association files. Conclusion W.ND-BLAST provides an interactive tool that allows scientists to easily utilizing their available computing resources for high throughput and comprehensive sequence analyses. The install package for W.ND-BLAST is freely downloadable from . With registration the software is free, installation, networking, and usage instructions are provided as well as a support forum. PMID:15819992
Dependency Tree Annotation Software
2015-11-01
formats, and it provides numerous options for customizing how dependency trees are displayed. Built entirely in Java , it can run on a wide range of...tree can be saved as an image, .mxe (a mxGraph editing file), a .conll file, and several other file formats. DTE uses the open source Java version
DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony.
Wehe, André; Bansal, Mukul S; Burleigh, J Gordon; Eulenstein, Oliver
2008-07-01
DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at http://genome.cs.iastate.edu/CBL/DupTree
Computing all hybridization networks for multiple binary phylogenetic input trees.
Albrecht, Benjamin
2015-07-30
The computation of phylogenetic trees on the same set of species that are based on different orthologous genes can lead to incongruent trees. One possible explanation for this behavior are interspecific hybridization events recombining genes of different species. An important approach to analyze such events is the computation of hybridization networks. This work presents the first algorithm computing the hybridization number as well as a set of representative hybridization networks for multiple binary phylogenetic input trees on the same set of taxa. To improve its practical runtime, we show how this algorithm can be parallelized. Moreover, we demonstrate the efficiency of the software Hybroscale, containing an implementation of our algorithm, by comparing it to PIRNv2.0, which is so far the best available software computing the exact hybridization number for multiple binary phylogenetic trees on the same set of taxa. The algorithm is part of the software Hybroscale, which was developed specifically for the investigation of hybridization networks including their computation and visualization. Hybroscale is freely available(1) and runs on all three major operating systems. Our simulation study indicates that our approach is on average 100 times faster than PIRNv2.0. Moreover, we show how Hybroscale improves the interpretation of the reported hybridization networks by adding certain features to its graphical representation.
An Integrated Fault Tolerant Robotic Controller System for High Reliability and Safety
NASA Technical Reports Server (NTRS)
Marzwell, Neville I.; Tso, Kam S.; Hecht, Myron
1994-01-01
This paper describes the concepts and features of a fault-tolerant intelligent robotic control system being developed for applications that require high dependability (reliability, availability, and safety). The system consists of two major elements: a fault-tolerant controller and an operator workstation. The fault-tolerant controller uses a strategy which allows for detection and recovery of hardware, operating system, and application software failures.The fault-tolerant controller can be used by itself in a wide variety of applications in industry, process control, and communications. The controller in combination with the operator workstation can be applied to robotic applications such as spaceborne extravehicular activities, hazardous materials handling, inspection and maintenance of high value items (e.g., space vehicles, reactor internals, or aircraft), medicine, and other tasks where a robot system failure poses a significant risk to life or property.