improve software fault: Topics by Science.gov

Sample records for improve software fault

Development and validation of techniques for improving software dependability

NASA Technical Reports Server (NTRS)

Knight, John C.

1992-01-01

A collection of document abstracts are presented on the topic of improving software dependability through NASA grant NAG-1-1123. Specific topics include: modeling of error detection; software inspection; test cases; Magnetic Stereotaxis System safety specifications and fault trees; and injection of synthetic faults into software.
Coordinated Fault Tolerance for High-Performance Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dongarra, Jack; Bosilca, George; et al.

2013-04-08

Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack—from the application, math libraries, and programming language runtime to other common system software such as jobs schedulers, resource managers, and monitoring tools.
Software dependability in the Tandem GUARDIAN system

NASA Technical Reports Server (NTRS)

Lee, Inhwan; Iyer, Ravishankar K.

1995-01-01

Based on extensive field failure data for Tandem's GUARDIAN operating system this paper discusses evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup processes to take over. The paper categorizes the underlying causes of software failures and evaluates the effectiveness of the process pair technique in tolerating software faults. A model to describe the impact of software faults on the reliability of an overall system is proposed. The model is used to evaluate the significance of key factors that determine software dependability and to identify areas for improvement. An analysis of the data shows that about 77% of processor failures that are initially considered due to software are confirmed as software problems. The analysis shows that the use of process pairs to provide checkpointing and restart (originally intended for tolerating hardware faults) allows the system to tolerate about 75% of reported software faults that result in processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. Over two-thirds (72%) of measured software failures are recurrences of previously reported faults. Modeling, based on the data, shows that, in addition to reducing the number of software faults, software dependability can be enhanced by reducing the recurrence rate.
Automated Fault Interpretation and Extraction using Improved Supplementary Seismic Datasets

NASA Astrophysics Data System (ADS)

Bollmann, T. A.; Shank, R.

2017-12-01

During the interpretation of seismic volumes, it is necessary to interpret faults along with horizons of interest. With the improvement of technology, the interpretation of faults can be expedited with the aid of different algorithms that create supplementary seismic attributes, such as semblance and coherency. These products highlight discontinuities, but still need a large amount of human interaction to interpret faults and are plagued by noise and stratigraphic discontinuities. Hale (2013) presents a method to improve on these datasets by creating what is referred to as a Fault Likelihood volume. In general, these volumes contain less noise and do not emphasize stratigraphic features. Instead, planar features within a specified strike and dip range are highlighted. Once a satisfactory Fault Likelihood Volume is created, extraction of fault surfaces is much easier. The extracted fault surfaces are then exported to interpretation software for QC. Numerous software packages have implemented this methodology with varying results. After investigating these platforms, we developed a preferred Automated Fault Interpretation workflow.
Runtime Verification in Context : Can Optimizing Error Detection Improve Fault Diagnosis

NASA Technical Reports Server (NTRS)

Dwyer, Matthew B.; Purandare, Rahul; Person, Suzette

2010-01-01

Runtime verification has primarily been developed and evaluated as a means of enriching the software testing process. While many researchers have pointed to its potential applicability in online approaches to software fault tolerance, there has been a dearth of work exploring the details of how that might be accomplished. In this paper, we describe how a component-oriented approach to software health management exposes the connections between program execution, error detection, fault diagnosis, and recovery. We identify both research challenges and opportunities in exploiting those connections. Specifically, we describe how recent approaches to reducing the overhead of runtime monitoring aimed at error detection might be adapted to reduce the overhead and improve the effectiveness of fault diagnosis.
Software reliability through fault-avoidance and fault-tolerance

NASA Technical Reports Server (NTRS)

Vouk, Mladen A.; Mcallister, David F.

1993-01-01

Strategies and tools for the testing, risk assessment and risk control of dependable software-based systems were developed. Part of this project consists of studies to enable the transfer of technology to industry, for example the risk management techniques for safety-concious systems. Theoretical investigations of Boolean and Relational Operator (BRO) testing strategy were conducted for condition-based testing. The Basic Graph Generation and Analysis tool (BGG) was extended to fully incorporate several variants of the BRO metric. Single- and multi-phase risk, coverage and time-based models are being developed to provide additional theoretical and empirical basis for estimation of the reliability and availability of large, highly dependable software. A model for software process and risk management was developed. The use of cause-effect graphing for software specification and validation was investigated. Lastly, advanced software fault-tolerance models were studied to provide alternatives and improvements in situations where simple software fault-tolerance strategies break down.
Analyzing and Predicting Effort Associated with Finding and Fixing Software Faults

NASA Technical Reports Server (NTRS)

Hamill, Maggie; Goseva-Popstojanova, Katerina

2016-01-01

Context: Software developers spend a significant amount of time fixing faults. However, not many papers have addressed the actual effort needed to fix software faults. Objective: The objective of this paper is twofold: (1) analysis of the effort needed to fix software faults and how it was affected by several factors and (2) prediction of the level of fix implementation effort based on the information provided in software change requests. Method: The work is based on data related to 1200 failures, extracted from the change tracking system of a large NASA mission. The analysis includes descriptive and inferential statistics. Predictions are made using three supervised machine learning algorithms and three sampling techniques aimed at addressing the imbalanced data problem. Results: Our results show that (1) 83% of the total fix implementation effort was associated with only 20% of failures. (2) Both safety critical failures and post-release failures required three times more effort to fix compared to non-critical and pre-release counterparts, respectively. (3) Failures with fixes spread across multiple components or across multiple types of software artifacts required more effort. The spread across artifacts was more costly than spread across components. (4) Surprisingly, some types of faults associated with later life-cycle activities did not require significant effort. (5) The level of fix implementation effort was predicted with 73% overall accuracy using the original, imbalanced data. Using oversampling techniques improved the overall accuracy up to 77%. More importantly, oversampling significantly improved the prediction of the high level effort, from 31% to around 85%. Conclusions: This paper shows the importance of tying software failures to changes made to fix all associated faults, in one or more software components and/or in one or more software artifacts, and the benefit of studying how the spread of faults and other factors affect the fix implementation effort.
Hierarchical Simulation to Assess Hardware and Software Dependability

NASA Technical Reports Server (NTRS)

Ries, Gregory Lawrence

1997-01-01

This thesis presents a method for conducting hierarchical simulations to assess system hardware and software dependability. The method is intended to model embedded microprocessor systems. A key contribution of the thesis is the idea of using fault dictionaries to propagate fault effects upward from the level of abstraction where a fault model is assumed to the system level where the ultimate impact of the fault is observed. A second important contribution is the analysis of the software behavior under faults as well as the hardware behavior. The simulation method is demonstrated and validated in four case studies analyzing Myrinet, a commercial, high-speed networking system. One key result from the case studies shows that the simulation method predicts the same fault impact 87.5% of the time as is obtained by similar fault injections into a real Myrinet system. Reasons for the remaining discrepancy are examined in the thesis. A second key result shows the reduction in the number of simulations needed due to the fault dictionary method. In one case study, 500 faults were injected at the chip level, but only 255 propagated to the system level. Of these 255 faults, 110 shared identical fault dictionary entries at the system level and so did not need to be resimulated. The necessary number of system-level simulations was therefore reduced from 500 to 145. Finally, the case studies show how the simulation method can be used to improve the dependability of the target system. The simulation analysis was used to add recovery to the target software for the most common fault propagation mechanisms that would cause the software to hang. After the modification, the number of hangs was reduced by 60% for fault injections into the real system.
Software-implemented fault insertion: An FTMP example

NASA Technical Reports Server (NTRS)

Czeck, Edward W.; Siewiorek, Daniel P.; Segall, Zary Z.

1987-01-01

This report presents a model for fault insertion through software; describes its implementation on a fault-tolerant computer, FTMP; presents a summary of fault detection, identification, and reconfiguration data collected with software-implemented fault insertion; and compares the results to hardware fault insertion data. Experimental results show detection time to be a function of time of insertion and system workload. For the fault detection time, there is no correlation between software-inserted faults and hardware-inserted faults; this is because hardware-inserted faults must manifest as errors before detection, whereas software-inserted faults immediately exercise the error detection mechanisms. In summary, the software-implemented fault insertion is able to be used as an evaluation technique for the fault-handling capabilities of a system in fault detection, identification and recovery. Although the software-inserted faults do not map directly to hardware-inserted faults, experiments show software-implemented fault insertion is capable of emulating hardware fault insertion, with greater ease and automation.
Fault Management Techniques in Human Spaceflight Operations

NASA Technical Reports Server (NTRS)

O'Hagan, Brian; Crocker, Alan

2006-01-01

This paper discusses human spaceflight fault management operations. Fault detection and response capabilities available in current US human spaceflight programs Space Shuttle and International Space Station are described while emphasizing system design impacts on operational techniques and constraints. Preflight and inflight processes along with products used to anticipate, mitigate and respond to failures are introduced. Examples of operational products used to support failure responses are presented. Possible improvements in the state of the art, as well as prioritization and success criteria for their implementation are proposed. This paper describes how the architecture of a command and control system impacts operations in areas such as the required fault response times, automated vs. manual fault responses, use of workarounds, etc. The architecture includes the use of redundancy at the system and software function level, software capabilities, use of intelligent or autonomous systems, number and severity of software defects, etc. This in turn drives which Caution and Warning (C&W) events should be annunciated, C&W event classification, operator display designs, crew training, flight control team training, and procedure development. Other factors impacting operations are the complexity of a system, skills needed to understand and operate a system, and the use of commonality vs. optimized solutions for software and responses. Fault detection, annunciation, safing responses, and recovery capabilities are explored using real examples to uncover underlying philosophies and constraints. These factors directly impact operations in that the crew and flight control team need to understand what happened, why it happened, what the system is doing, and what, if any, corrective actions they need to perform. If a fault results in multiple C&W events, or if several faults occur simultaneously, the root cause(s) of the fault(s), as well as their vehicle-wide impacts, must be determined in order to maintain situational awareness. This allows both automated and manual recovery operations to focus on the real cause of the fault(s). An appropriate balance must be struck between correcting the root cause failure and addressing the impacts of that fault on other vehicle components. Lastly, this paper presents a strategy for using lessons learned to improve the software, displays, and procedures in addition to determining what is a candidate for automation. Enabling technologies and techniques are identified to promote system evolution from one that requires manual fault responses to one that uses automation and autonomy where they are most effective. These considerations include the value in correcting software defects in a timely manner, automation of repetitive tasks, making time critical responses autonomous, etc. The paper recommends the appropriate use of intelligent systems to determine the root causes of faults and correctly identify separate unrelated faults.
Analysis of a hardware and software fault tolerant processor for critical applications

NASA Technical Reports Server (NTRS)

Dugan, Joanne B.

1993-01-01

Computer systems for critical applications must be designed to tolerate software faults as well as hardware faults. A unified approach to tolerating hardware and software faults is characterized by classifying faults in terms of duration (transient or permanent) rather than source (hardware or software). Errors arising from transient faults can be handled through masking or voting, but errors arising from permanent faults require system reconfiguration to bypass the failed component. Most errors which are caused by software faults can be considered transient, in that they are input-dependent. Software faults are triggered by a particular set of inputs. Quantitative dependability analysis of systems which exhibit a unified approach to fault tolerance can be performed by a hierarchical combination of fault tree and Markov models. A methodology for analyzing hardware and software fault tolerant systems is applied to the analysis of a hypothetical system, loosely based on the Fault Tolerant Parallel Processor. The models consider both transient and permanent faults, hardware and software faults, independent and related software faults, automatic recovery, and reconfiguration.
Software reliability studies

NASA Technical Reports Server (NTRS)

Hoppa, Mary Ann; Wilson, Larry W.

1994-01-01

There are many software reliability models which try to predict future performance of software based on data generated by the debugging process. Our research has shown that by improving the quality of the data one can greatly improve the predictions. We are working on methodologies which control some of the randomness inherent in the standard data generation processes in order to improve the accuracy of predictions. Our contribution is twofold in that we describe an experimental methodology using a data structure called the debugging graph and apply this methodology to assess the robustness of existing models. The debugging graph is used to analyze the effects of various fault recovery orders on the predictive accuracy of several well-known software reliability algorithms. We found that, along a particular debugging path in the graph, the predictive performance of different models can vary greatly. Similarly, just because a model 'fits' a given path's data well does not guarantee that the model would perform well on a different path. Further we observed bug interactions and noted their potential effects on the predictive process. We saw that not only do different faults fail at different rates, but that those rates can be affected by the particular debugging stage at which the rates are evaluated. Based on our experiment, we conjecture that the accuracy of a reliability prediction is affected by the fault recovery order as well as by fault interaction.
Towards Certification of a Space System Application of Fault Detection and Isolation

NASA Technical Reports Server (NTRS)

Feather, Martin S.; Markosian, Lawrence Z.

2008-01-01

Advanced fault detection, isolation and recovery (FDIR) software is being investigated at NASA as a means to the improve reliability and availability of its space systems. Certification is a critical step in the acceptance of such software. Its attainment hinges on performing the necessary verification and validation to show that the software will fulfill its requirements in the intended setting. Presented herein is our ongoing work to plan for the certification of a pilot application of advanced FDIR software in a NASA setting. We describe the application, and the key challenges and opportunities it offers for certification.
Health management and controls for Earth-to-orbit propulsion systems

NASA Astrophysics Data System (ADS)

Bickford, R. L.

1995-03-01

Avionics and health management technologies increase the safety and reliability while decreasing the overall cost for Earth-to-orbit (ETO) propulsion systems. New ETO propulsion systems will depend on highly reliable fault tolerant flight avionics, advanced sensing systems and artificial intelligence aided software to ensure critical control, safety and maintenance requirements are met in a cost effective manner. Propulsion avionics consist of the engine controller, actuators, sensors, software and ground support elements. In addition to control and safety functions, these elements perform system monitoring for health management. Health management is enhanced by advanced sensing systems and algorithms which provide automated fault detection and enable adaptive control and/or maintenance approaches. Aerojet is developing advanced fault tolerant rocket engine controllers which provide very high levels of reliability. Smart sensors and software systems which significantly enhance fault coverage and enable automated operations are also under development. Smart sensing systems, such as flight capable plume spectrometers, have reached maturity in ground-based applications and are suitable for bridging to flight. Software to detect failed sensors has reached similar maturity. This paper will discuss fault detection and isolation for advanced rocket engine controllers as well as examples of advanced sensing systems and software which significantly improve component failure detection for engine system safety and health management.
Fault-tolerant software - Experiment with the sift operating system. [Software Implemented Fault Tolerance computer

NASA Technical Reports Server (NTRS)

Brunelle, J. E.; Eckhardt, D. E., Jr.

1985-01-01

Results are presented of an experiment conducted in the NASA Avionics Integrated Research Laboratory (AIRLAB) to investigate the implementation of fault-tolerant software techniques on fault-tolerant computer architectures, in particular the Software Implemented Fault Tolerance (SIFT) computer. The N-version programming and recovery block techniques were implemented on a portion of the SIFT operating system. The results indicate that, to effectively implement fault-tolerant software design techniques, system requirements will be impacted and suggest that retrofitting fault-tolerant software on existing designs will be inefficient and may require system modification.
Factors That Affect Software Testability

NASA Technical Reports Server (NTRS)

Voas, Jeffrey M.

1991-01-01

Software faults that infrequently affect software's output are dangerous. When a software fault causes frequent software failures, testing is likely to reveal the fault before the software is releases; when the fault remains undetected during testing, it can cause disaster after the software is installed. A technique for predicting whether a particular piece of software is likely to reveal faults within itself during testing is found in [Voas91b]. A piece of software that is likely to reveal faults within itself during testing is said to have high testability. A piece of software that is not likely to reveal faults within itself during testing is said to have low testability. It is preferable to design software with higher testabilities from the outset, i.e., create software with as high of a degree of testability as possible to avoid the problems of having undetected faults that are associated with low testability. Information loss is a phenomenon that occurs during program execution that increases the likelihood that a fault will remain undetected. In this paper, I identify two brad classes of information loss, define them, and suggest ways of predicting the potential for information loss to occur. We do this in order to decrease the likelihood that faults will remain undetected during testing.
An Empirical Approach to Logical Clustering of Software Failure Regions

DTIC Science & Technology

1994-03-01

this is a coincidence or normal behavior of failure regions. " Software faults were numbered in order as they were discovered, by the various testing...locations of the associated faults. The goal of this research will be an improved testing technique that incorporates failure region behavior . To do this...clustering behavior . This, however, does not correlate with the structural clustering of failure regions observed by Ginn (1991) on the same set of data
Developing interpretable models with optimized set reduction for identifying high risk software components

NASA Technical Reports Server (NTRS)

Briand, Lionel C.; Basili, Victor R.; Hetmanski, Christopher J.

1993-01-01

Applying equal testing and verification effort to all parts of a software system is not very efficient, especially when resources are limited and scheduling is tight. Therefore, one needs to be able to differentiate low/high fault frequency components so that testing/verification effort can be concentrated where needed. Such a strategy is expected to detect more faults and thus improve the resulting reliability of the overall system. This paper presents the Optimized Set Reduction approach for constructing such models, intended to fulfill specific software engineering needs. Our approach to classification is to measure the software system and build multivariate stochastic models for predicting high risk system components. We present experimental results obtained by classifying Ada components into two classes: is or is not likely to generate faults during system and acceptance test. Also, we evaluate the accuracy of the model and the insights it provides into the error making process.
Providing an empirical basis for optimizing the verification and testing phases of software development

NASA Technical Reports Server (NTRS)

Briand, Lionel C.; Basili, Victor R.; Hetmanski, Christopher J.

1992-01-01

Applying equal testing and verification effort to all parts of a software system is not very efficient, especially when resources are limited and scheduling is tight. Therefore, one needs to be able to differentiate low/high fault density components so that the testing/verification effort can be concentrated where needed. Such a strategy is expected to detect more faults and thus improve the resulting reliability of the overall system. This paper presents an alternative approach for constructing such models that is intended to fulfill specific software engineering needs (i.e. dealing with partial/incomplete information and creating models that are easy to interpret). Our approach to classification is as follows: (1) to measure the software system to be considered; and (2) to build multivariate stochastic models for prediction. We present experimental results obtained by classifying FORTRAN components developed at the NASA/GSFC into two fault density classes: low and high. Also we evaluate the accuracy of the model and the insights it provides into the software process.
An experimental investigation of fault tolerant software structures in an avionics application

NASA Technical Reports Server (NTRS)

Caglayan, Alper K.; Eckhardt, Dave E., Jr.

1989-01-01

The objective of this experimental investigation is to compare the functional performance and software reliability of competing fault tolerant software structures utilizing software diversity. In this experiment, three versions of the redundancy management software for a skewed sensor array have been developed using three diverse failure detection and isolation algorithms and incorporated into various N-version, recovery block and hybrid software structures. The empirical results show that, for maximum functional performance improvement in the selected application domain, the results of diverse algorithms should be voted before being processed by multiple versions without enforced diversity. Results also suggest that when the reliability gain with an N-version structure is modest, recovery block structures are more feasible since higher reliability can be obtained using an acceptance check with a modest reliability.

Risk-Significant Adverse Condition Awareness Strengthens Assurance of Fault Management Systems

NASA Technical Reports Server (NTRS)

Fitz, Rhonda

2017-01-01

As spaceflight systems increase in complexity, Fault Management (FM) systems are ranked high in risk-based assessment of software criticality, emphasizing the importance of establishing highly competent domain expertise to provide assurance. Adverse conditions (ACs) and specific vulnerabilities encountered by safety- and mission-critical software systems have been identified through efforts to reduce the risk posture of software-intensive NASA missions. Acknowledgement of potential off-nominal conditions and analysis to determine software system resiliency are important aspects of hazard analysis and FM. A key component of assuring FM is an assessment of how well software addresses susceptibility to failure through consideration of ACs. Focus on significant risk predicted through experienced analysis conducted at the NASA Independent Verification & Validation (IV&V) Program enables the scoping of effective assurance strategies with regard to overall asset protection of complex spaceflight as well as ground systems. Research efforts sponsored by NASAs Office of Safety and Mission Assurance (OSMA) defined terminology, categorized data fields, and designed a baseline repository that centralizes and compiles a comprehensive listing of ACs and correlated data relevant across many NASA missions. This prototype tool helps projects improve analysis by tracking ACs and allowing queries based on project, mission type, domain/component, causal fault, and other key characteristics. Vulnerability in off-nominal situations, architectural design weaknesses, and unexpected or undesirable system behaviors in reaction to faults are curtailed with the awareness of ACs and risk-significant scenarios modeled for analysts through this database. Integration within the Enterprise Architecture at NASA IV&V enables interfacing with other tools and datasets, technical support, and accessibility across the Agency. This paper discusses the development of an improved workflow process utilizing this database for adaptive, risk-informed FM assurance that critical software systems will safely and securely protect against faults and respond to ACs in order to achieve successful missions.
Risk-Significant Adverse Condition Awareness Strengthens Assurance of Fault Management Systems

NASA Technical Reports Server (NTRS)

Fitz, Rhonda

2017-01-01

As spaceflight systems increase in complexity, Fault Management (FM) systems are ranked high in risk-based assessment of software criticality, emphasizing the importance of establishing highly competent domain expertise to provide assurance. Adverse conditions (ACs) and specific vulnerabilities encountered by safety- and mission-critical software systems have been identified through efforts to reduce the risk posture of software-intensive NASA missions. Acknowledgement of potential off-nominal conditions and analysis to determine software system resiliency are important aspects of hazard analysis and FM. A key component of assuring FM is an assessment of how well software addresses susceptibility to failure through consideration of ACs. Focus on significant risk predicted through experienced analysis conducted at the NASA Independent Verification Validation (IVV) Program enables the scoping of effective assurance strategies with regard to overall asset protection of complex spaceflight as well as ground systems. Research efforts sponsored by NASA's Office of Safety and Mission Assurance defined terminology, categorized data fields, and designed a baseline repository that centralizes and compiles a comprehensive listing of ACs and correlated data relevant across many NASA missions. This prototype tool helps projects improve analysis by tracking ACs and allowing queries based on project, mission type, domaincomponent, causal fault, and other key characteristics. Vulnerability in off-nominal situations, architectural design weaknesses, and unexpected or undesirable system behaviors in reaction to faults are curtailed with the awareness of ACs and risk-significant scenarios modeled for analysts through this database. Integration within the Enterprise Architecture at NASA IVV enables interfacing with other tools and datasets, technical support, and accessibility across the Agency. This paper discusses the development of an improved workflow process utilizing this database for adaptive, risk-informed FM assurance that critical software systems will safely and securely protect against faults and respond to ACs in order to achieve successful missions.
Practical Methods for Estimating Software Systems Fault Content and Location

NASA Technical Reports Server (NTRS)

Nikora, A.; Schneidewind, N.; Munson, J.

1999-01-01

Over the past several years, we have developed techniques to discriminate between fault-prone software modules and those that are not, to estimate a software system's residual fault content, to identify those portions of a software system having the highest estimated number of faults, and to estimate the effects of requirements changes on software quality.
Multi-version software reliability through fault-avoidance and fault-tolerance

NASA Technical Reports Server (NTRS)

Vouk, Mladen A.; Mcallister, David F.

1989-01-01

A number of experimental and theoretical issues associated with the practical use of multi-version software to provide run-time tolerance to software faults were investigated. A specialized tool was developed and evaluated for measuring testing coverage for a variety of metrics. The tool was used to collect information on the relationships between software faults and coverage provided by the testing process as measured by different metrics (including data flow metrics). Considerable correlation was found between coverage provided by some higher metrics and the elimination of faults in the code. Back-to-back testing was continued as an efficient mechanism for removal of un-correlated faults, and common-cause faults of variable span. Software reliability estimation methods was also continued based on non-random sampling, and the relationship between software reliability and code coverage provided through testing. New fault tolerance models were formulated. Simulation studies of the Acceptance Voting and Multi-stage Voting algorithms were finished and it was found that these two schemes for software fault tolerance are superior in many respects to some commonly used schemes. Particularly encouraging are the safety properties of the Acceptance testing scheme.
Object-Oriented Algorithm For Evaluation Of Fault Trees

NASA Technical Reports Server (NTRS)

Patterson-Hine, F. A.; Koen, B. V.

1992-01-01

Algorithm for direct evaluation of fault trees incorporates techniques of object-oriented programming. Reduces number of calls needed to solve trees with repeated events. Provides significantly improved software environment for such computations as quantitative analyses of safety and reliability of complicated systems of equipment (e.g., spacecraft or factories).
Software Fault Tolerance: A Tutorial

NASA Technical Reports Server (NTRS)

Torres-Pomales, Wilfredo

2000-01-01

Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems. The root cause of software design errors is the complexity of the systems. Compounding the problems in building correct software is the difficulty in assessing the correctness of software for highly complex systems. After a brief overview of the software development processes, we note how hard-to-detect design faults are likely to be introduced during development and how software faults tend to be state-dependent and activated by particular input sequences. Although component reliability is an important quality measure for system level analysis, software reliability is hard to characterize and the use of post-verification reliability estimates remains a controversial issue. For some applications software safety is more important than reliability, and fault tolerance techniques used in those applications are aimed at preventing catastrophes. Single version software fault tolerance techniques discussed include system structuring and closure, atomic actions, inline fault detection, exception handling, and others. Multiversion techniques are based on the assumption that software built differently should fail differently and thus, if one of the redundant versions fails, it is expected that at least one of the other versions will provide an acceptable output. Recovery blocks, N-version programming, and other multiversion techniques are reviewed.
Fault tolerant software modules for SIFT

NASA Technical Reports Server (NTRS)

Hecht, M.; Hecht, H.

1982-01-01

The implementation of software fault tolerance is investigated for critical modules of the Software Implemented Fault Tolerance (SIFT) operating system to support the computational and reliability requirements of advanced fly by wire transport aircraft. Fault tolerant designs generated for the error reported and global executive are examined. A description of the alternate routines, implementation requirements, and software validation are included.
A testing-coverage software reliability model considering fault removal efficiency and error generation.

PubMed

Li, Qiuying; Pham, Hoang

2017-01-01

In this paper, we propose a software reliability model that considers not only error generation but also fault removal efficiency combined with testing coverage information based on a nonhomogeneous Poisson process (NHPP). During the past four decades, many software reliability growth models (SRGMs) based on NHPP have been proposed to estimate the software reliability measures, most of which have the same following agreements: 1) it is a common phenomenon that during the testing phase, the fault detection rate always changes; 2) as a result of imperfect debugging, fault removal has been related to a fault re-introduction rate. But there are few SRGMs in the literature that differentiate between fault detection and fault removal, i.e. they seldom consider the imperfect fault removal efficiency. But in practical software developing process, fault removal efficiency cannot always be perfect, i.e. the failures detected might not be removed completely and the original faults might still exist and new faults might be introduced meanwhile, which is referred to as imperfect debugging phenomenon. In this study, a model aiming to incorporate fault introduction rate, fault removal efficiency and testing coverage into software reliability evaluation is developed, using testing coverage to express the fault detection rate and using fault removal efficiency to consider the fault repair. We compare the performance of the proposed model with several existing NHPP SRGMs using three sets of real failure data based on five criteria. The results exhibit that the model can give a better fitting and predictive performance.
An experimental evaluation of software redundancy as a strategy for improving reliability

NASA Technical Reports Server (NTRS)

Eckhardt, Dave E., Jr.; Caglayan, Alper K.; Knight, John C.; Lee, Larry D.; Mcallister, David F.; Vouk, Mladen A.; Kelly, John P. J.

1990-01-01

The strategy of using multiple versions of independently developed software as a means to tolerate residual software design faults is suggested by the success of hardware redundancy for tolerating hardware failures. Although, as generally accepted, the independence of hardware failures resulting from physical wearout can lead to substantial increases in reliability for redundant hardware structures, a similar conclusion is not immediate for software. The degree to which design faults are manifested as independent failures determines the effectiveness of redundancy as a method for improving software reliability. Interest in multi-version software centers on whether it provides an adequate measure of increased reliability to warrant its use in critical applications. The effectiveness of multi-version software is studied by comparing estimates of the failure probabilities of these systems with the failure probabilities of single versions. The estimates are obtained under a model of dependent failures and compared with estimates obtained when failures are assumed to be independent. The experimental results are based on twenty versions of an aerospace application developed and certified by sixty programmers from four universities. Descriptions of the application, development and certification processes, and operational evaluation are given together with an analysis of the twenty versions.
A methodology for testing fault-tolerant software

NASA Technical Reports Server (NTRS)

Andrews, D. M.; Mahmood, A.; Mccluskey, E. J.

1985-01-01

A methodology for testing fault tolerant software is presented. There are problems associated with testing fault tolerant software because many errors are masked or corrected by voters, limiter, or automatic channel synchronization. This methodology illustrates how the same strategies used for testing fault tolerant hardware can be applied to testing fault tolerant software. For example, one strategy used in testing fault tolerant hardware is to disable the redundancy during testing. A similar testing strategy is proposed for software, namely, to move the major emphasis on testing earlier in the development cycle (before the redundancy is in place) thus reducing the possibility that undetected errors will be masked when limiters and voters are added.
Method and system for diagnostics of apparatus

NASA Technical Reports Server (NTRS)

Gorinevsky, Dimitry (Inventor)

2012-01-01

Proposed is a method, implemented in software, for estimating fault state of an apparatus outfitted with sensors. At each execution period the method processes sensor data from the apparatus to obtain a set of parity parameters, which are further used for estimating fault state. The estimation method formulates a convex optimization problem for each fault hypothesis and employs a convex solver to compute fault parameter estimates and fault likelihoods for each fault hypothesis. The highest likelihoods and corresponding parameter estimates are transmitted to a display device or an automated decision and control system. The obtained accurate estimate of fault state can be used to improve safety, performance, or maintenance processes for the apparatus.
A testing-coverage software reliability model considering fault removal efficiency and error generation

PubMed Central

Li, Qiuying; Pham, Hoang

2017-01-01

In this paper, we propose a software reliability model that considers not only error generation but also fault removal efficiency combined with testing coverage information based on a nonhomogeneous Poisson process (NHPP). During the past four decades, many software reliability growth models (SRGMs) based on NHPP have been proposed to estimate the software reliability measures, most of which have the same following agreements: 1) it is a common phenomenon that during the testing phase, the fault detection rate always changes; 2) as a result of imperfect debugging, fault removal has been related to a fault re-introduction rate. But there are few SRGMs in the literature that differentiate between fault detection and fault removal, i.e. they seldom consider the imperfect fault removal efficiency. But in practical software developing process, fault removal efficiency cannot always be perfect, i.e. the failures detected might not be removed completely and the original faults might still exist and new faults might be introduced meanwhile, which is referred to as imperfect debugging phenomenon. In this study, a model aiming to incorporate fault introduction rate, fault removal efficiency and testing coverage into software reliability evaluation is developed, using testing coverage to express the fault detection rate and using fault removal efficiency to consider the fault repair. We compare the performance of the proposed model with several existing NHPP SRGMs using three sets of real failure data based on five criteria. The results exhibit that the model can give a better fitting and predictive performance. PMID:28750091
Assurance of Fault Management: Risk-Significant Adverse Condition Awareness

NASA Technical Reports Server (NTRS)

Fitz, Rhonda

2016-01-01

Fault Management (FM) systems are ranked high in risk-based assessment of criticality within flight software, emphasizing the importance of establishing highly competent domain expertise to provide assurance for NASA projects, especially as spaceflight systems continue to increase in complexity. Insight into specific characteristics of FM architectures seen embedded within safety- and mission-critical software systems analyzed by the NASA Independent Verification Validation (IVV) Program has been enhanced with an FM Technical Reference (TR) suite. Benefits are aimed beyond the IVV community to those that seek ways to efficiently and effectively provide software assurance to reduce the FM risk posture of NASA and other space missions. The identification of particular FM architectures, visibility, and associated IVV techniques provides a TR suite that enables greater assurance that critical software systems will adequately protect against faults and respond to adverse conditions. The role FM has with regard to overall asset protection of flight software systems is being addressed with the development of an adverse condition (AC) database encompassing flight software vulnerabilities.Identification of potential off-nominal conditions and analysis to determine how a system responds to these conditions are important aspects of hazard analysis and fault management. Understanding what ACs the mission may face, and ensuring they are prevented or addressed is the responsibility of the assurance team, which necessarily should have insight into ACs beyond those defined by the project itself. Research efforts sponsored by NASAs Office of Safety and Mission Assurance defined terminology, categorized data fields, and designed a baseline repository that centralizes and compiles a comprehensive listing of ACs and correlated data relevant across many NASA missions. This prototype tool helps projects improve analysis by tracking ACs, and allowing queries based on project, mission type, domain component, causal fault, and other key characteristics. The repository has a firm structure, initial collection of data, and an interface established for informational queries, with plans for integration within the Enterprise Architecture at NASA IVV, enabling support and accessibility across the Agency. The development of an improved workflow process for adaptive, risk-informed FM assurance is currently underway.
Practical Issues in Implementing Software Reliability Measurement

NASA Technical Reports Server (NTRS)

Nikora, Allen P.; Schneidewind, Norman F.; Everett, William W.; Munson, John C.; Vouk, Mladen A.; Musa, John D.

1999-01-01

Many ways of estimating software systems' reliability, or reliability-related quantities, have been developed over the past several years. Of particular interest are methods that can be used to estimate a software system's fault content prior to test, or to discriminate between components that are fault-prone and those that are not. The results of these methods can be used to: 1) More accurately focus scarce fault identification resources on those portions of a software system most in need of it. 2) Estimate and forecast the risk of exposure to residual faults in a software system during operation, and develop risk and safety criteria to guide the release of a software system to fielded use. 3) Estimate the efficiency of test suites in detecting residual faults. 4) Estimate the stability of the software maintenance process.
Assessing Survivability Using Software Fault Injection

DTIC Science & Technology

2001-04-01

UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADPO10875 TITLE: Assessing Survivability Using Software Fault Injection...Esc to exit .......................................................................... = 11-1 Assessing Survivability Using Software Fault Injection...Jeffrey Voas Reliable Software Technologies 21351 Ridgetop Circle, #400 Dulles, VA 20166 jmvoas@rstcorp.crom Abstract approved sources have the
Using Remote Sensing Data to Constrain Models of Fault Interactions and Plate Boundary Deformation

NASA Astrophysics Data System (ADS)

Glasscoe, M. T.; Donnellan, A.; Lyzenga, G. A.; Parker, J. W.; Milliner, C. W. D.

2016-12-01

Determining the distribution of slip and behavior of fault interactions at plate boundaries is a complex problem. Field and remotely sensed data often lack the necessary coverage to fully resolve fault behavior. However, realistic physical models may be used to more accurately characterize the complex behavior of faults constrained with observed data, such as GPS, InSAR, and SfM. These results will improve the utility of using combined models and data to estimate earthquake potential and characterize plate boundary behavior. Plate boundary faults exhibit complex behavior, with partitioned slip and distributed deformation. To investigate what fraction of slip becomes distributed deformation off major faults, we examine a model fault embedded within a damage zone of reduced elastic rigidity that narrows with depth and forward model the slip and resulting surface deformation. The fault segments and slip distributions are modeled using the JPL GeoFEST software. GeoFEST (Geophysical Finite Element Simulation Tool) is a two- and three-dimensional finite element software package for modeling solid stress and strain in geophysical and other continuum domain applications [Lyzenga, et al., 2000; Glasscoe, et al., 2004; Parker, et al., 2008, 2010]. New methods to advance geohazards research using computer simulations and remotely sensed observations for model validation are required to understand fault slip, the complex nature of fault interaction and plate boundary deformation. These models help enhance our understanding of the underlying processes, such as transient deformation and fault creep, and can aid in developing observation strategies for sUAV, airborne, and upcoming satellite missions seeking to determine how faults behave and interact and assess their associated hazard. Models will also help to characterize this behavior, which will enable improvements in hazard estimation. Validating the model results against remotely sensed observations will allow us to better constrain fault zone rheology and physical properties, having implications for the overall understanding of earthquake physics, fault interactions, plate boundary deformation and earthquake hazard, preparedness and risk reduction.
Finite Element Simulations of Kaikoura, NZ Earthquake using DInSAR and High-Resolution DSMs

NASA Astrophysics Data System (ADS)

Barba, M.; Willis, M. J.; Tiampo, K. F.; Glasscoe, M. T.; Clark, M. K.; Zekkos, D.; Stahl, T. A.; Massey, C. I.

2017-12-01

Three-dimensional displacements from the Kaikoura, NZ, earthquake in November 2016 are imaged here using Differential Interferometric Synthetic Aperture Radar (DInSAR) and high-resolution Digital Surface Model (DSM) differencing and optical pixel tracking. Full-resolution co- and post-seismic interferograms of Sentinel-1A/B images are constructed using the JPL ISCE software. The OSU SETSM software is used to produce repeat 0.5 m posting DSMs from commercial satellite imagery, which are supplemented with UAV derived DSMs over the Kaikoura fault rupture on the eastern South Island, NZ. DInSAR provides long-wavelength motions while DSM differencing and optical pixel tracking provides both horizontal and vertical near fault motions, improving the modeling of shallow rupture dynamics. JPL GeoFEST software is used to perform finite element modeling of the fault segments and slip distributions and, in turn, the associated asperity distribution. The asperity profile is then used to simulate event rupture, the spatial distribution of stress drop, and the associated stress changes. Finite element modeling of slope stability is accomplished using the ultra high-resolution UAV derived DSMs to examine the evolution of post-earthquake topography, landslide dynamics and volumes. Results include new insights into shallow dynamics of fault slip and partitioning, estimates of stress change, and improved understanding of its relationship with the associated seismicity, deformation, and triggered cascading hazards.
Maintaining the Health of Software Monitors

NASA Technical Reports Server (NTRS)

Person, Suzette; Rungta, Neha

2013-01-01

Software health management (SWHM) techniques complement the rigorous verification and validation processes that are applied to safety-critical systems prior to their deployment. These techniques are used to monitor deployed software in its execution environment, serving as the last line of defense against the effects of a critical fault. SWHM monitors use information from the specification and implementation of the monitored software to detect violations, predict possible failures, and help the system recover from faults. Changes to the monitored software, such as adding new functionality or fixing defects, therefore, have the potential to impact the correctness of both the monitored software and the SWHM monitor. In this work, we describe how the results of a software change impact analysis technique, Directed Incremental Symbolic Execution (DiSE), can be applied to monitored software to identify the potential impact of the changes on the SWHM monitor software. The results of DiSE can then be used by other analysis techniques, e.g., testing, debugging, to help preserve and improve the integrity of the SWHM monitor as the monitored software evolves.
Operational Suitability Guide. Volume 2. Templates

DTIC Science & Technology

1990-05-01

Intended mission, and the required technical and operational characteristics. The mission must be adequately defined and key hardware and software ...operational availability. With the use of fault-tolerant computer hardware and software , the system R&M will significantly improve end-to-end...should Include both hardware and software elements, as appropriate. Unique characteristics or unique support concepts should be Identified if they result
Fault Tree Analysis Application for Safety and Reliability

NASA Technical Reports Server (NTRS)

Wallace, Dolores R.

2003-01-01

Many commercial software tools exist for fault tree analysis (FTA), an accepted method for mitigating risk in systems. The method embedded in the tools identifies a root as use in system components, but when software is identified as a root cause, it does not build trees into the software component. No commercial software tools have been built specifically for development and analysis of software fault trees. Research indicates that the methods of FTA could be applied to software, but the method is not practical without automated tool support. With appropriate automated tool support, software fault tree analysis (SFTA) may be a practical technique for identifying the underlying cause of software faults that may lead to critical system failures. We strive to demonstrate that existing commercial tools for FTA can be adapted for use with SFTA, and that applied to a safety-critical system, SFTA can be used to identify serious potential problems long before integrator and system testing.

The cost of software fault tolerance

NASA Technical Reports Server (NTRS)

Migneault, G. E.

1982-01-01

The proposed use of software fault tolerance techniques as a means of reducing software costs in avionics and as a means of addressing the issue of system unreliability due to faults in software is examined. A model is developed to provide a view of the relationships among cost, redundancy, and reliability which suggests strategies for software development and maintenance which are not conventional.
Study of fault tolerant software technology for dynamic systems

NASA Technical Reports Server (NTRS)

Caglayan, A. K.; Zacharias, G. L.

1985-01-01

The major aim of this study is to investigate the feasibility of using systems-based failure detection isolation and compensation (FDIC) techniques in building fault-tolerant software and extending them, whenever possible, to the domain of software fault tolerance. First, it is shown that systems-based FDIC methods can be extended to develop software error detection techniques by using system models for software modules. In particular, it is demonstrated that systems-based FDIC techniques can yield consistency checks that are easier to implement than acceptance tests based on software specifications. Next, it is shown that systems-based failure compensation techniques can be generalized to the domain of software fault tolerance in developing software error recovery procedures. Finally, the feasibility of using fault-tolerant software in flight software is investigated. In particular, possible system and version instabilities, and functional performance degradation that may occur in N-Version programming applications to flight software are illustrated. Finally, a comparative analysis of N-Version and recovery block techniques in the context of generic blocks in flight software is presented.
Monitoring microearthquakes with the San Andreas fault observatory at depth

USGS Publications Warehouse

Oye, V.; Ellsworth, W.L.

2007-01-01

In 2005, the San Andreas Fault Observatory at Depth (SAFOD) was drilled through the San Andreas Fault zone at a depth of about 3.1 km. The borehole has subsequently been instrumented with high-frequency geophones in order to better constrain locations and source processes of nearby microearthquakes that will be targeted in the upcoming phase of SAFOD. The microseismic monitoring software MIMO, developed by NORSAR, has been installed at SAFOD to provide near-real time locations and magnitude estimates using the high sampling rate (4000 Hz) waveform data. To improve the detection and location accuracy, we incorporate data from the nearby, shallow borehole (???250 m) seismometers of the High Resolution Seismic Network (HRSN). The event association algorithm of the MIMO software incorporates HRSN detections provided by the USGS real time earthworm software. The concept of the new event association is based on the generalized beam forming, primarily used in array seismology. The method requires the pre-computation of theoretical travel times in a 3D grid of potential microearthquake locations to the seismometers of the current station network. By minimizing the differences between theoretical and observed detection times an event is associated and the location accuracy is significantly improved.
Study of fault-tolerant software technology

NASA Technical Reports Server (NTRS)

Slivinski, T.; Broglio, C.; Wild, C.; Goldberg, J.; Levitt, K.; Hitt, E.; Webb, J.

1984-01-01

Presented is an overview of the current state of the art of fault-tolerant software and an analysis of quantitative techniques and models developed to assess its impact. It examines research efforts as well as experience gained from commercial application of these techniques. The paper also addresses the computer architecture and design implications on hardware, operating systems and programming languages (including Ada) of using fault-tolerant software in real-time aerospace applications. It concludes that fault-tolerant software has progressed beyond the pure research state. The paper also finds that, although not perfectly matched, newer architectural and language capabilities provide many of the notations and functions needed to effectively and efficiently implement software fault-tolerance.
Abnormal fault-recovery characteristics of the fault-tolerant multiprocessor uncovered using a new fault-injection methodology

NASA Technical Reports Server (NTRS)

Padilla, Peter A.

1991-01-01

An investigation was made in AIRLAB of the fault handling performance of the Fault Tolerant MultiProcessor (FTMP). Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once in every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. Byzantine faults behave such that the faulted unit points to a working unit as the source of errors. The design's problems involve: (1) the design and interface between the simplex error detection hardware and the error processing software, (2) the functional capabilities of the FTMP system bus, and (3) the communication requirements of a multiprocessor architecture. These weak areas in the FTMP's design increase the probability that, for any hardware fault, a good line replacement unit (LRU) is mistakenly disabled by the fault management software.
Detection of faults and software reliability analysis

NASA Technical Reports Server (NTRS)

Knight, J. C.

1986-01-01

Multiversion or N-version programming was proposed as a method of providing fault tolerance in software. The approach requires the separate, independent preparation of multiple versions of a piece of software for some application. Specific topics addressed are: failure probabilities in N-version systems, consistent comparison in N-version systems, descriptions of the faults found in the Knight and Leveson experiment, analytic models of comparison testing, characteristics of the input regions that trigger faults, fault tolerance through data diversity, and the relationship between failures caused by automatically seeded faults.
Model Transformation for a System of Systems Dependability Safety Case

NASA Technical Reports Server (NTRS)

Murphy, Judy; Driskell, Stephen B.

2010-01-01

Software plays an increasingly larger role in all aspects of NASA's science missions. This has been extended to the identification, management and control of faults which affect safety-critical functions and by default, the overall success of the mission. Traditionally, the analysis of fault identification, management and control are hardware based. Due to the increasing complexity of system, there has been a corresponding increase in the complexity in fault management software. The NASA Independent Validation & Verification (IV&V) program is creating processes and procedures to identify, and incorporate safety-critical software requirements along with corresponding software faults so that potential hazards may be mitigated. This Specific to Generic ... A Case for Reuse paper describes the phases of a dependability and safety study which identifies a new, process to create a foundation for reusable assets. These assets support the identification and management of specific software faults and, their transformation from specific to generic software faults. This approach also has applications to other systems outside of the NASA environment. This paper addresses how a mission specific dependability and safety case is being transformed to a generic dependability and safety case which can be reused for any type of space mission with an emphasis on software fault conditions.
Preliminary design of the redundant software experiment

NASA Technical Reports Server (NTRS)

Campbell, Roy; Deimel, Lionel; Eckhardt, Dave, Jr.; Kelly, John; Knight, John; Lauterbach, Linda; Lee, Larry; Mcallister, Dave; Mchugh, John

1985-01-01

The goal of the present experiment is to characterize the fault distributions of highly reliable software replicates, constructed using techniques and environments which are similar to those used in comtemporary industrial software facilities. The fault distributions and their effect on the reliability of fault tolerant configurations of the software will be determined through extensive life testing of the replicates against carefully constructed randomly generated test data. Each detected error will be carefully analyzed to provide insight in to their nature and cause. A direct objective is to develop techniques for reducing the intensity of coincident errors, thus increasing the reliability gain which can be achieved with fault tolerance. Data on the reliability gains realized, and the cost of the fault tolerant configurations can be used to design a companion experiment to determine the cost effectiveness of the fault tolerant strategy. Finally, the data and analysis produced by this experiment will be valuable to the software engineering community as a whole because it will provide a useful insight into the nature and cause of hard to find, subtle faults which escape standard software engineering validation techniques and thus persist far into the software life cycle.
Health management and controls for earth to orbit propulsion systems

NASA Technical Reports Server (NTRS)

Bickford, R. L.

1992-01-01

Fault detection and isolation for advanced rocket engine controllers are discussed focusing on advanced sensing systems and software which significantly improve component failure detection for engine safety and health management. Aerojet's Space Transportation Main Engine controller for the National Launch System is the state of the art in fault tolerant engine avionics. Health management systems provide high levels of automated fault coverage and significantly improve vehicle delivered reliability and lower preflight operations costs. Key technologies, including the sensor data validation algorithms and flight capable spectrometers, have been demonstrated in ground applications and are found to be suitable for bridging programs into flight applications.
Software fault tolerance for real-time avionics systems

NASA Technical Reports Server (NTRS)

Anderson, T.; Knight, J. C.

1983-01-01

Avionics systems have very high reliability requirements and are therefore prime candidates for the inclusion of fault tolerance techniques. In order to provide tolerance to software faults, some form of state restoration is usually advocated as a means of recovery. State restoration can be very expensive for systems which utilize concurrent processes. The concurrency present in most avionics systems and the further difficulties introduced by timing constraints imply that providing tolerance for software faults may be inordinately expensive or complex. A straightforward pragmatic approach to software fault tolerance which is believed to be applicable to many real-time avionics systems is proposed. A classification system for software errors is presented together with approaches to recovery and continued service for each error type.
Fault Management Architectures and the Challenges of Providing Software Assurance

NASA Technical Reports Server (NTRS)

Savarino, Shirley; Fitz, Rhonda; Fesq, Lorraine; Whitman, Gerek

2015-01-01

Fault Management (FM) is focused on safety, the preservation of assets, and maintaining the desired functionality of the system. How FM is implemented varies among missions. Common to most missions is system complexity due to a need to establish a multi-dimensional structure across hardware, software and spacecraft operations. FM is necessary to identify and respond to system faults, mitigate technical risks and ensure operational continuity. Generally, FM architecture, implementation, and software assurance efforts increase with mission complexity. Because FM is a systems engineering discipline with a distributed implementation, providing efficient and effective verification and validation (V&V) is challenging. A breakout session at the 2012 NASA Independent Verification & Validation (IV&V) Annual Workshop titled "V&V of Fault Management: Challenges and Successes" exposed this issue in terms of V&V for a representative set of architectures. NASA's Software Assurance Research Program (SARP) has provided funds to NASA IV&V to extend the work performed at the Workshop session in partnership with NASA's Jet Propulsion Laboratory (JPL). NASA IV&V will extract FM architectures across the IV&V portfolio and evaluate the data set, assess visibility for validation and test, and define software assurance methods that could be applied to the various architectures and designs. This SARP initiative focuses efforts on FM architectures from critical and complex projects within NASA. The identification of particular FM architectures and associated V&V/IV&V techniques provides a data set that can enable improved assurance that a system will adequately detect and respond to adverse conditions. Ultimately, results from this activity will be incorporated into the NASA Fault Management Handbook providing dissemination across NASA, other agencies and the space community. This paper discusses the approach taken to perform the evaluations and preliminary findings from the research.
Design study of Software-Implemented Fault-Tolerance (SIFT) computer

NASA Technical Reports Server (NTRS)

Wensley, J. H.; Goldberg, J.; Green, M. W.; Kutz, W. H.; Levitt, K. N.; Mills, M. E.; Shostak, R. E.; Whiting-Okeefe, P. M.; Zeidler, H. M.

1982-01-01

Software-implemented fault tolerant (SIFT) computer design for commercial aviation is reported. A SIFT design concept is addressed. Alternate strategies for physical implementation are considered. Hardware and software design correctness is addressed. System modeling and effectiveness evaluation are considered from a fault-tolerant point of view.
On-line early fault detection and diagnosis of municipal solid waste incinerators

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhao Jinsong; Huang Jianchao; Sun Wei

A fault detection and diagnosis framework is proposed in this paper for early fault detection and diagnosis (FDD) of municipal solid waste incinerators (MSWIs) in order to improve the safety and continuity of production. In this framework, principal component analysis (PCA), one of the multivariate statistical technologies, is used for detecting abnormal events, while rule-based reasoning performs the fault diagnosis and consequence prediction, and also generates recommendations for fault mitigation once an abnormal event is detected. A software package, SWIFT, is developed based on the proposed framework, and has been applied in an actual industrial MSWI. The application shows thatmore » automated real-time abnormal situation management (ASM) of the MSWI can be achieved by using SWIFT, resulting in an industrially acceptable low rate of wrong diagnosis, which has resulted in improved process continuity and environmental performance of the MSWI.« less
A second generation experiment in fault-tolerant software

NASA Technical Reports Server (NTRS)

Knight, J. C.

1986-01-01

The primary goal was to determine whether the application of fault tolerance to software increases its reliability if the cost of production is the same as for an equivalent nonfault tolerance version derived from the same requirements specification. Software development protocols are discussed. The feasibility of adapting to software design fault tolerance the technique of N-fold Modular Redundancy with majority voting was studied.
Award ER25750: Coordinated Infrastructure for Fault Tolerance Systems Indiana University Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lumsdaine, Andrew

2013-03-08

The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research with a goal of providing end-to-end fault tolerance on a systemwide basis for applications and other system software. While fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade, it has been treated mostly as a collection of isolated stovepipes. Visibility and response to faults has typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, allowing little flexibility ormore » control on a system-wide basis, making it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. As an example, consider faults such as communication link failures that can be seen by a network library but are not directly visible to the job scheduler, or consider faults related to node failures that can be detected by system monitoring software but are not inherently visible to the resource manager. If information about such faults could be shared by the network libraries or monitoring software, then other system software, such as a resource manager or job scheduler, could ensure that failed nodes or failed network links were excluded from further job allocations and that further diagnosis could be performed. As a founding member and one of the lead developers of the Open MPI project, our efforts over the course of this project have been focused on making Open MPI more robust to failures by supporting various fault tolerance techniques, and using fault information exchange and coordination between MPI and the HPC system software stack from the application, numeric libraries, and programming language runtime to other common system components such as jobs schedulers, resource managers, and monitoring tools.« less
Virtual Platform for See Robustness Verification of Bootloader Embedded Software on Board Solar Orbiter's Energetic Particle Detector

NASA Astrophysics Data System (ADS)

Da Silva, A.; Sánchez Prieto, S.; Polo, O.; Parra Espada, P.

2013-05-01

Because of the tough robustness requirements in space software development, it is imperative to carry out verification tasks at a very early development stage to ensure that the implemented exception mechanisms work properly. All this should be done long time before the real hardware is available. But even if real hardware is available the verification of software fault tolerance mechanisms can be difficult since real faulty situations must be systematically and artificially brought about which can be imposible on real hardware. To solve this problem the Alcala Space Research Group (SRG) has developed a LEON2 virtual platform (Leon2ViP) with fault injection capabilities. This way it is posible to run the exact same target binary software as runs on the physical system in a more controlled and deterministic environment, allowing a more strict requirements verification. Leon2ViP enables unmanned and tightly focused fault injection campaigns, not possible otherwise, in order to expose and diagnose flaws in the software implementation early. Furthermore, the use of a virtual hardware-in-the-loop approach makes it possible to carry out preliminary integration tests with the spacecraft emulator or the sensors. The use of Leon2ViP has meant a signicant improvement, in both time and cost, in the development and verification processes of the Instrument Control Unit boot software on board Solar Orbiter's Energetic Particle Detector.
Coordinated Fault-Tolerance for High-Performance Computing Final Project Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Panda, Dhabaleswar Kumar; Beckman, Pete

2011-07-28

With the Coordinated Infrastructure for Fault Tolerance Systems (CIFTS, as the original project came to be called) project, our aim has been to understand and tackle the following broad research questions, the answers to which will help the HEC community analyze and shape the direction of research in the field of fault tolerance and resiliency on future high-end leadership systems. Will availability of global fault information, obtained by fault information exchange between the different HEC software on a system, allow individual system software to better detect, diagnose, and adaptively respond to faults? If fault-awareness is raised throughout the system throughmore » fault information exchange, is it possible to get all system software working together to provide a more comprehensive end-to-end fault management on the system? What are the missing fault-tolerance features that widely used HEC system software lacks today that would inhibit such software from taking advantage of systemwide global fault information? What are the practical limitations of a systemwide approach for end-to-end fault management based on fault awareness and coordination? What mechanisms, tools, and technologies are needed to bring about fault awareness and coordination of responses on a leadership-class system? What standards, outreach, and community interaction are needed for adoption of the concept of fault awareness and coordination for fault management on future systems? Keeping our overall objectives in mind, the CIFTS team has taken a parallel fourfold approach. Our central goal was to design and implement a light-weight, scalable infrastructure with a simple, standardized interface to allow communication of fault-related information through the system and facilitate coordinated responses. This work led to the development of the Fault Tolerance Backplane (FTB) publish-subscribe API specification, together with a reference implementation and several experimental implementations on top of existing publish-subscribe tools. We enhanced the intrinsic fault tolerance capabilities representative implementations of a variety of key HPC software subsystems and integrated them with the FTB. Targeting software subsystems included: MPI communication libraries, checkpoint/restart libraries, resource managers and job schedulers, and system monitoring tools. Leveraging the aforementioned infrastructure, as well as developing and utilizing additional tools, we have examined issues associated with expanded, end-to-end fault response from both system and application viewpoints. From the standpoint of system operations, we have investigated log and root cause analysis, anomaly detection and fault prediction, and generalized notification mechanisms. Our applications work has included libraries for fault-tolerance linear algebra, application frameworks for coupled multiphysics applications, and external frameworks to support the monitoring and response for general applications. Our final goal was to engage the high-end computing community to increase awareness of tools and issues around coordinated end-to-end fault management.« less
Software quality: Process or people

NASA Technical Reports Server (NTRS)

Palmer, Regina; Labaugh, Modenna

1993-01-01

This paper will present data related to software development processes and personnel involvement from the perspective of software quality assurance. We examine eight years of data collected from six projects. Data collected varied by project but usually included defect and fault density with limited use of code metrics, schedule adherence, and budget growth information. The data are a blend of AFSCP 800-14 and suggested productivity measures in Software Metrics: A Practioner's Guide to Improved Product Development. A software quality assurance database tool, SQUID, was used to store and tabulate the data.
A software upgrade method for micro-electronics medical implants.

PubMed

Cao, Yang; Hao, Hongwei; Xue, Lin; Li, Luming; Ma, Bozhi

2006-01-01

A software upgrade method for micro-electronics medical implants is designed to enhance the devices' function or renew the software if there are some bugs found, the software updating or some memory units disabled. The implants needn't be replaced by operations if the faults can be corrected through reprogramming, which reduces the patients' pain and improves the safety effectively. This paper introduces the software upgrade method using in-application programming (IAP) and emphasizes how to insure the system, especially the implanted part's reliability and stability while upgrading.
Survey of Verification and Validation Techniques for Small Satellite Software Development

NASA Technical Reports Server (NTRS)

Jacklin, Stephen A.

2015-01-01

The purpose of this paper is to provide an overview of the current trends and practices in small-satellite software verification and validation. This document is not intended to promote a specific software assurance method. Rather, it seeks to present an unbiased survey of software assurance methods used to verify and validate small satellite software and to make mention of the benefits and value of each approach. These methods include simulation and testing, verification and validation with model-based design, formal methods, and fault-tolerant software design with run-time monitoring. Although the literature reveals that simulation and testing has by far the longest legacy, model-based design methods are proving to be useful for software verification and validation. Some work in formal methods, though not widely used for any satellites, may offer new ways to improve small satellite software verification and validation. These methods need to be further advanced to deal with the state explosion problem and to make them more usable by small-satellite software engineers to be regularly applied to software verification. Last, it is explained how run-time monitoring, combined with fault-tolerant software design methods, provides an important means to detect and correct software errors that escape the verification process or those errors that are produced after launch through the effects of ionizing radiation.

Experiments in fault tolerant software reliability

NASA Technical Reports Server (NTRS)

Mcallister, David F.; Tai, K. C.; Vouk, Mladen A.

1987-01-01

The reliability of voting was evaluated in a fault-tolerant software system for small output spaces. The effectiveness of the back-to-back testing process was investigated. Version 3.0 of the RSDIMU-ATS, a semi-automated test bed for certification testing of RSDIMU software, was prepared and distributed. Software reliability estimation methods based on non-random sampling are being studied. The investigation of existing fault-tolerance models was continued and formulation of new models was initiated.
The Curiosity Mars Rover's Fault Protection Engine

NASA Technical Reports Server (NTRS)

Benowitz, Ed

2014-01-01

The Curiosity Rover, currently operating on Mars, contains flight software onboard to autonomously handle aspects of system fault protection. Over 1000 monitors and 39 responses are present in the flight software. Orchestrating these behaviors is the flight software's fault protection engine. In this paper, we discuss the engine's design, responsibilities, and present some lessons learned for future missions.
Various Indices for Diagnosis of Air-gap Eccentricity Fault in Induction Motor-A Review

NASA Astrophysics Data System (ADS)

Nikhil; Mathew, Lini, Dr.; Sharma, Amandeep

2018-03-01

From the past few years, research has gained an ardent pace in the field of fault detection and diagnosis in induction motors. In the current scenario, software is being introduced with diagnostic features to improve stability and reliability in fault diagnostic techniques. Human involvement in decision making for fault detection is slowly being replaced by Artificial Intelligence techniques. In this paper, a brief introduction of eccentricity fault is presented along with their causes and effects on the health of induction motors. Various indices used to detect eccentricity are being introduced along with their boundary conditions and their future scope of research. At last, merits and demerits of all indices are discussed and a comparison is made between them.
Testing Scientific Software: A Systematic Literature Review.

PubMed

Kanewala, Upulee; Bieman, James M

2014-10-01

Scientific software plays an important role in critical decision making, for example making weather predictions based on climate models, and computation of evidence for research publications. Recently, scientists have had to retract publications due to errors caused by software faults. Systematic testing can identify such faults in code. This study aims to identify specific challenges, proposed solutions, and unsolved problems faced when testing scientific software. We conducted a systematic literature survey to identify and analyze relevant literature. We identified 62 studies that provided relevant information about testing scientific software. We found that challenges faced when testing scientific software fall into two main categories: (1) testing challenges that occur due to characteristics of scientific software such as oracle problems and (2) testing challenges that occur due to cultural differences between scientists and the software engineering community such as viewing the code and the model that it implements as inseparable entities. In addition, we identified methods to potentially overcome these challenges and their limitations. Finally we describe unsolved challenges and how software engineering researchers and practitioners can help to overcome them. Scientific software presents special challenges for testing. Specifically, cultural differences between scientist developers and software engineers, along with the characteristics of the scientific software make testing more difficult. Existing techniques such as code clone detection can help to improve the testing process. Software engineers should consider special challenges posed by scientific software such as oracle problems when developing testing techniques.
Software reliability models for fault-tolerant avionics computers and related topics

NASA Technical Reports Server (NTRS)

Miller, Douglas R.

1987-01-01

Software reliability research is briefly described. General research topics are reliability growth models, quality of software reliability prediction, the complete monotonicity property of reliability growth, conceptual modelling of software failure behavior, assurance of ultrahigh reliability, and analysis techniques for fault-tolerant systems.
Study of a unified hardware and software fault-tolerant architecture

NASA Technical Reports Server (NTRS)

Lala, Jaynarayan; Alger, Linda; Friend, Steven; Greeley, Gregory; Sacco, Stephen; Adams, Stuart

1989-01-01

A unified architectural concept, called the Fault Tolerant Processor Attached Processor (FTP-AP), that can tolerate hardware as well as software faults is proposed for applications requiring ultrareliable computation capability. An emulation of the FTP-AP architecture, consisting of a breadboard Motorola 68010-based quadruply redundant Fault Tolerant Processor, four VAX 750s as attached processors, and four versions of a transport aircraft yaw damper control law, is used as a testbed in the AIRLAB to examine a number of critical issues. Solutions of several basic problems associated with N-Version software are proposed and implemented on the testbed. This includes a confidence voter to resolve coincident errors in N-Version software. A reliability model of N-Version software that is based upon the recent understanding of software failure mechanisms is also developed. The basic FTP-AP architectural concept appears suitable for hosting N-Version application software while at the same time tolerating hardware failures. Architectural enhancements for greater efficiency, software reliability modeling, and N-Version issues that merit further research are identified.
Adopting software quality measures for healthcare processes.

PubMed

Yildiz, Ozkan; Demirörs, Onur

2009-01-01

In this study, we investigated the adoptability of software quality measures for healthcare process measurement. Quality measures of ISO/IEC 9126 are redefined from a process perspective to build a generic healthcare process quality measurement model. Case study research method is used, and the model is applied to a public hospital's Entry to Care process. After the application, weak and strong aspects of the process can be easily observed. Access audibility, fault removal, completeness of documentation, and machine utilization are weak aspects and these aspects are the candidates for process improvement. On the other hand, functional completeness, fault ratio, input validity checking, response time, and throughput time are the strong aspects of the process.
Analyzing Software Errors in Safety-Critical Embedded Systems

NASA Technical Reports Server (NTRS)

Lutz, Robyn R.

1994-01-01

This paper analyzes the root causes of safty-related software faults identified as potentially hazardous to the system are distributed somewhat differently over the set of possible error causes than non-safety-related software faults.
Surrogate oracles, generalized dependency and simpler models

NASA Technical Reports Server (NTRS)

Wilson, Larry

1990-01-01

Software reliability models require the sequence of interfailure times from the debugging process as input. It was previously illustrated that using data from replicated debugging could greatly improve reliability predictions. However, inexpensive replication of the debugging process requires the existence of a cheap, fast error detector. Laboratory experiments can be designed around a gold version which is used as an oracle or around an n-version error detector. Unfortunately, software developers can not be expected to have an oracle or to bear the expense of n-versions. A generic technique is being investigated for approximating replicated data by using the partially debugged software as a difference detector. It is believed that the failure rate of each fault has significant dependence on the presence or absence of other faults. Thus, in order to discuss a failure rate for a known fault, the presence or absence of each of the other known faults needs to be specified. Also, in simpler models which use shorter input sequences without sacrificing accuracy are of interest. In fact, a possible gain in performance is conjectured. To investigate these propositions, NASA computers running LIC (RTI) versions are used to generate data. This data will be used to label the debugging graph associated with each version. These labeled graphs will be used to test the utility of a surrogate oracle, to analyze the dependent nature of fault failure rates and to explore the feasibility of reliability models which use the data of only the most recent failures.
Implementation of a research prototype onboard fault monitoring and diagnosis system

NASA Technical Reports Server (NTRS)

Palmer, Michael T.; Abbott, Kathy H.; Schutte, Paul C.; Ricks, Wendell R.

1987-01-01

Due to the dynamic and complex nature of in-flight fault monitoring and diagnosis, a research effort was undertaken at NASA Langley Research Center to investigate the application of artificial intelligence techniques for improved situational awareness. Under this research effort, concepts were developed and a software architecture was designed to address the complexities of onboard monitoring and diagnosis. This paper describes the implementation of these concepts in a computer program called FaultFinder. The implementation of the monitoring, diagnosis, and interface functions as separate modules is discussed, as well as the blackboard designed for the communication of these modules. Some related issues concerning the future installation of FaultFinder in an aircraft are also discussed.
Redundant and fault-tolerant algorithms for real-time measurement and control systems for weapon equipment.

PubMed

Li, Dan; Hu, Xiaoguang

2017-03-01

Because of the high availability requirements from weapon equipment, an in-depth study has been conducted on the real-time fault-tolerance of the widely applied Compact PCI (CPCI) bus measurement and control system. A redundancy design method that uses heartbeat detection to connect the primary and alternate devices has been developed. To address the low successful execution rate and relatively large waste of time slices in the primary version of the task software, an improved algorithm for real-time fault-tolerant scheduling is proposed based on the Basic Checking available time Elimination idle time (BCE) algorithm, applying a single-neuron self-adaptive proportion sum differential (PSD) controller. The experimental validation results indicate that this system has excellent redundancy and fault-tolerance, and the newly developed method can effectively improve the system availability. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
An experiment in software reliability

NASA Technical Reports Server (NTRS)

Dunham, J. R.; Pierce, J. L.

1986-01-01

The results of a software reliability experiment conducted in a controlled laboratory setting are reported. The experiment was undertaken to gather data on software failures and is one in a series of experiments being pursued by the Fault Tolerant Systems Branch of NASA Langley Research Center to find a means of credibly performing reliability evaluations of flight control software. The experiment tests a small sample of implementations of radar tracking software having ultra-reliability requirements and uses n-version programming for error detection, and repetitive run modeling for failure and fault rate estimation. The experiment results agree with those of Nagel and Skrivan in that the program error rates suggest an approximate log-linear pattern and the individual faults occurred with significantly different error rates. Additional analysis of the experimental data raises new questions concerning the phenomenon of interacting faults. This phenomenon may provide one explanation for software reliability decay.
A coverage and slicing dependencies analysis for seeking software security defects.

PubMed

He, Hui; Zhang, Dongyan; Liu, Min; Zhang, Weizhe; Gao, Dongmin

2014-01-01

Software security defects have a serious impact on the software quality and reliability. It is a major hidden danger for the operation of a system that a software system has some security flaws. When the scale of the software increases, its vulnerability has becoming much more difficult to find out. Once these vulnerabilities are exploited, it may lead to great loss. In this situation, the concept of Software Assurance is carried out by some experts. And the automated fault localization technique is a part of the research of Software Assurance. Currently, automated fault localization method includes coverage based fault localization (CBFL) and program slicing. Both of the methods have their own location advantages and defects. In this paper, we have put forward a new method, named Reverse Data Dependence Analysis Model, which integrates the two methods by analyzing the program structure. On this basis, we finally proposed a new automated fault localization method. This method not only is automation lossless but also changes the basic location unit into single sentence, which makes the location effect more accurate. Through several experiments, we proved that our method is more effective. Furthermore, we analyzed the effectiveness among these existing methods and different faults.
Fault recovery characteristics of the fault tolerant multi-processor

NASA Technical Reports Server (NTRS)

Padilla, Peter A.

1990-01-01

The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.
Technical Reference Suite Addressing Challenges of Providing Assurance for Fault Management Architectural Design

NASA Technical Reports Server (NTRS)

Fitz, Rhonda; Whitman, Gerek

2016-01-01

Research into complexities of software systems Fault Management (FM) and how architectural design decisions affect safety, preservation of assets, and maintenance of desired system functionality has coalesced into a technical reference (TR) suite that advances the provision of safety and mission assurance. The NASA Independent Verification and Validation (IVV) Program, with Software Assurance Research Program support, extracted FM architectures across the IVV portfolio to evaluate robustness, assess visibility for validation and test, and define software assurance methods applied to the architectures and designs. This investigation spanned IVV projects with seven different primary developers, a wide range of sizes and complexities, and encompassed Deep Space Robotic, Human Spaceflight, and Earth Orbiter mission FM architectures. The initiative continues with an expansion of the TR suite to include Launch Vehicles, adding the benefit of investigating differences intrinsic to model-based FM architectures and insight into complexities of FM within an Agile software development environment, in order to improve awareness of how nontraditional processes affect FM architectural design and system health management.
Synchronization and fault-masking in redundant real-time systems

NASA Technical Reports Server (NTRS)

Krishna, C. M.; Shin, K. G.; Butler, R. W.

1983-01-01

A real time computer may fail because of massive component failures or not responding quickly enough to satisfy real time requirements. An increase in redundancy - a conventional means of improving reliability - can improve the former but can - in some cases - degrade the latter considerably due to the overhead associated with redundancy management, namely the time delay resulting from synchronization and voting/interactive consistency techniques. The implications of synchronization and voting/interactive consistency algorithms in N-modular clusters on reliability are considered. All these studies were carried out in the context of real time applications. As a demonstrative example, we have analyzed results from experiments conducted at the NASA Airlab on the Software Implemented Fault Tolerance (SIFT) computer. This analysis has indeed indicated that in most real time applications, it is better to employ hardware synchronization instead of software synchronization and not allow reconfiguration.
NHPP-Based Software Reliability Models Using Equilibrium Distribution

NASA Astrophysics Data System (ADS)

Xiao, Xiao; Okamura, Hiroyuki; Dohi, Tadashi

Non-homogeneous Poisson processes (NHPPs) have gained much popularity in actual software testing phases to estimate the software reliability, the number of remaining faults in software and the software release timing. In this paper, we propose a new modeling approach for the NHPP-based software reliability models (SRMs) to describe the stochastic behavior of software fault-detection processes. The fundamental idea is to apply the equilibrium distribution to the fault-detection time distribution in NHPP-based modeling. We also develop efficient parameter estimation procedures for the proposed NHPP-based SRMs. Through numerical experiments, it can be concluded that the proposed NHPP-based SRMs outperform the existing ones in many data sets from the perspective of goodness-of-fit and prediction performance.
A study of fault prediction and reliability assessment in the SEL environment

NASA Technical Reports Server (NTRS)

Basili, Victor R.; Patnaik, Debabrata

1986-01-01

An empirical study on estimation and prediction of faults, prediction of fault detection and correction effort, and reliability assessment in the Software Engineering Laboratory environment (SEL) is presented. Fault estimation using empirical relationships and fault prediction using curve fitting method are investigated. Relationships between debugging efforts (fault detection and correction effort) in different test phases are provided, in order to make an early estimate of future debugging effort. This study concludes with the fault analysis, application of a reliability model, and analysis of a normalized metric for reliability assessment and reliability monitoring during development of software.
Application of neural networks to software quality modeling of a very large telecommunications system.

PubMed

Khoshgoftaar, T M; Allen, E B; Hudepohl, J P; Aud, S J

1997-01-01

Society relies on telecommunications to such an extent that telecommunications software must have high reliability. Enhanced measurement for early risk assessment of latent defects (EMERALD) is a joint project of Nortel and Bell Canada for improving the reliability of telecommunications software products. This paper reports a case study of neural-network modeling techniques developed for the EMERALD system. The resulting neural network is currently in the prototype testing phase at Nortel. Neural-network models can be used to identify fault-prone modules for extra attention early in development, and thus reduce the risk of operational problems with those modules. We modeled a subset of modules representing over seven million lines of code from a very large telecommunications software system. The set consisted of those modules reused with changes from the previous release. The dependent variable was membership in the class of fault-prone modules. The independent variables were principal components of nine measures of software design attributes. We compared the neural-network model with a nonparametric discriminant model and found the neural-network model had better predictive accuracy.
Software life cycle methodologies and environments

NASA Technical Reports Server (NTRS)

Fridge, Ernest

1991-01-01

Products of this project will significantly improve the quality and productivity of Space Station Freedom Program software processes by: improving software reliability and safety; and broadening the range of problems that can be solved with computational solutions. Projects brings in Computer Aided Software Engineering (CASE) technology for: Environments such as Engineering Script Language/Parts Composition System (ESL/PCS) application generator, Intelligent User Interface for cost avoidance in setting up operational computer runs, Framework programmable platform for defining process and software development work flow control, Process for bringing CASE technology into an organization's culture, and CLIPS/CLIPS Ada language for developing expert systems; and methodologies such as Method for developing fault tolerant, distributed systems and a method for developing systems for common sense reasoning and for solving expert systems problems when only approximate truths are known.

Measurement and analysis of operating system fault tolerance

NASA Technical Reports Server (NTRS)

Lee, I.; Tang, D.; Iyer, R. K.

1992-01-01

This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.
Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer

NASA Technical Reports Server (NTRS)

Goldberg, J.; Kautz, W. H.; Melliar-Smith, P. M.; Green, M. W.; Levitt, K. N.; Schwartz, R. L.; Weinstock, C. B.

1984-01-01

SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor to processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness.
Testing Scientific Software: A Systematic Literature Review

PubMed Central

Kanewala, Upulee; Bieman, James M.

2014-01-01

Context Scientific software plays an important role in critical decision making, for example making weather predictions based on climate models, and computation of evidence for research publications. Recently, scientists have had to retract publications due to errors caused by software faults. Systematic testing can identify such faults in code. Objective This study aims to identify specific challenges, proposed solutions, and unsolved problems faced when testing scientific software. Method We conducted a systematic literature survey to identify and analyze relevant literature. We identified 62 studies that provided relevant information about testing scientific software. Results We found that challenges faced when testing scientific software fall into two main categories: (1) testing challenges that occur due to characteristics of scientific software such as oracle problems and (2) testing challenges that occur due to cultural differences between scientists and the software engineering community such as viewing the code and the model that it implements as inseparable entities. In addition, we identified methods to potentially overcome these challenges and their limitations. Finally we describe unsolved challenges and how software engineering researchers and practitioners can help to overcome them. Conclusions Scientific software presents special challenges for testing. Specifically, cultural differences between scientist developers and software engineers, along with the characteristics of the scientific software make testing more difficult. Existing techniques such as code clone detection can help to improve the testing process. Software engineers should consider special challenges posed by scientific software such as oracle problems when developing testing techniques. PMID:25125798
MER Surface Phase; Blurring the Line Between Fault Protection and What is Supposed to Happen

NASA Technical Reports Server (NTRS)

Reeves, Glenn E.

2008-01-01

An assessment on the limitations of communication with MER rovers and how such constraints drove the system design, flight software and fault protection architecture, blurring the line between traditional fault protection and expected nominal behavior, and requiring the most novel autonomous and semi-autonomous elements of the vehicle software including communication, surface mobility, attitude knowledge acquisition, fault protection, and the activity arbitration service.
Fault Study of Valve Based on Test Analysis and Comparison

NASA Astrophysics Data System (ADS)

Cheng, Li; Yang, Wukui; Liang, Tao; Xu, Yu; Chen, Chao

2017-10-01

The valve of a certain type of small engine often has the fault phenomenon of abnormal vibration noise and can’t close under the specified pressure, which may cause the engine automatic stop because of valve incomplete close leading to fuel leakage during test and startup on the bench. By test study compared to imported valve with the same use function and test condition valve, and put forward the thinking of improving valve structure, compared no-improved valve to improved valve by adopting Fluent field simulation software. As a result, improved valve can restore close pressure of valve, restrain abnormal vibration noise phenomenon, and effectively compensate compression value of spring because of steel ball contacting position downward with valve casing.
Transient Faults in Computer Systems

NASA Technical Reports Server (NTRS)

Masson, Gerald M.

1993-01-01

A powerful technique particularly appropriate for the detection of errors caused by transient faults in computer systems was developed. The technique can be implemented in either software or hardware; the research conducted thus far primarily considered software implementations. The error detection technique developed has the distinct advantage of having provably complete coverage of all errors caused by transient faults that affect the output produced by the execution of a program. In other words, the technique does not have to be tuned to a particular error model to enhance error coverage. Also, the correctness of the technique can be formally verified. The technique uses time and software redundancy. The foundation for an effective, low-overhead, software-based certification trail approach to real-time error detection resulting from transient fault phenomena was developed.
Development and realization of the open fault diagnosis system based on XPE

NASA Astrophysics Data System (ADS)

Deng, Hui; Wang, TaiYong; He, HuiLong; Xu, YongGang; Zeng, JuXiang

2005-12-01

To make the complex mechanical equipment work in good service, the technology for realizing an embedded open system is introduced systematically, including open hardware configuration, customized embedded operation system and open software structure. The ETX technology is adopted in this system, integrating the CPU main-board functions, and achieving the quick, real-time signal acquisition and intelligent data analysis with applying DSP and CPLD data acquisition card. Under the open configuration, the signal bus mode such as PCI, ISA and PC/104 can be selected and the styles of the signals can be chosen too. In addition, through customizing XPE system, adopting the EWF (Enhanced Write Filter), and realizing the open system authentically, the stability of the system is enhanced. Multi-thread and multi-task programming techniques are adopted in the software programming process. Interconnecting with the remote fault diagnosis center via the net interface, cooperative diagnosis is conducted and the intelligent degree of the fault diagnosis is improved.
Copilot: Monitoring Embedded Systems

NASA Technical Reports Server (NTRS)

Pike, Lee; Wegmann, Nis; Niller, Sebastian; Goodloe, Alwyn

2012-01-01

Runtime verification (RV) is a natural fit for ultra-critical systems, where correctness is imperative. In ultra-critical systems, even if the software is fault-free, because of the inherent unreliability of commodity hardware and the adversity of operational environments, processing units (and their hosted software) are replicated, and fault-tolerant algorithms are used to compare the outputs. We investigate both software monitoring in distributed fault-tolerant systems, as well as implementing fault-tolerance mechanisms using RV techniques. We describe the Copilot language and compiler, specifically designed for generating monitors for distributed, hard real-time systems. We also describe two case-studies in which we generated Copilot monitors in avionics systems.
USGS Imagery Applications During Disaster Response After Recent Earthquakes

NASA Astrophysics Data System (ADS)

Hudnut, K. W.; Brooks, B. A.; Glennie, C. L.; Finnegan, D. C.

2015-12-01

It is not only important to rapidly characterize surface fault rupture and related ground deformation after an earthquake, but also to repeatedly make observations following an event to forecast fault afterslip. These data may also be used by other agencies to monitor progress on damage repairs and restoration efforts by emergency responders and the public. Related requirements include repeatedly obtaining reference or baseline imagery before a major disaster occurs, as well as maintaining careful geodetic control on all imagery in a time series so that absolute georeferencing may be applied to the image stack through time. In addition, repeated post-event imagery acquisition is required, generally at a higher repetition rate soon after the event, then scaled back to less frequent acquisitions with time, to capture phenomena (such as fault afterslip) that are known to have rates that decrease rapidly with time. For example, lidar observations acquired before and after the South Napa earthquake of 2014, used in our extensive post-processing work that was funded primarily by FEMA, aided in the accurate forecasting of fault afterslip. Lidar was used to independently validate and verify the official USGS afterslip forecast. In order to keep pace with rapidly evolving technology, a development pipeline must be established and maintained to continually test and incorporate new sensors, while adapting these new components to the existing platform and linking them to the existing base software system, and then sequentially testing the system as it evolves. Improvements in system performance by incremental upgrades of system components and software are essential. Improving calibration parameters and thereby progressively eliminating artifacts requires ongoing testing, research and development. To improve the system, we have formed an interdisciplinary team with common interests and diverse sources of support. We share expertise and leverage funding while effectively and rapidly improving our system, which includes the sensor package and software for all steps in acquiring, processing and differencing repeat-pass lidar and electro-optical imagery, and the GRiD metadata and point cloud database standard, already used during disaster response surge events by other agencies (e.g., during Hurricane Sandy in 2012).
Symposium on the Interface: Computing Science and Statistics (20th). Theme: Computationally Intensive Methods in Statistics Held in Reston, Virginia on April 20-23, 1988

DTIC Science & Technology

1988-08-20

34 William A. Link, Patuxent Wildlife Research Center "Increasing reliability of multiversion fault-tolerant software design by modulation," Junryo 3... Multiversion lault-Tolerant Software Design by Modularization Junryo Miyashita Department of Computer Science California state University at san Bernardino Fault...They shall beE refered to as " multiversion fault-tolerant software design". Onel problem of developing multi-versions of a program is the high cost
User's guide to programming fault injection and data acquisition in the SIFT environment

NASA Technical Reports Server (NTRS)

Elks, Carl R.; Green, David F.; Palumbo, Daniel L.

1987-01-01

Described are the features, command language, and functional design of the SIFT (Software Implemented Fault Tolerance) fault injection and data acquisition interface software. The document is also intended to assist and guide the SIFT user in defining, developing, and executing SIFT fault injection experiments and the subsequent collection and reduction of that fault injection data. It is also intended to be used in conjunction with the SIFT User's Guide (NASA Technical Memorandum 86289) for reference to SIFT system commands, procedures and functions, and overall guidance in SIFT system programming.
The Infeasibility of Experimental Quantification of Life-Critical Software Reliability

NASA Technical Reports Server (NTRS)

Butler, Ricky W.; Finelli, George B.

1991-01-01

This paper affirms that quantification of life-critical software reliability is infeasible using statistical methods whether applied to standard software or fault-tolerant software. The key assumption of software fault tolerance|separately programmed versions fail independently|is shown to be problematic. This assumption cannot be justified by experimentation in the ultra-reliability region and subjective arguments in its favor are not sufficiently strong to justify it as an axiom. Also, the implications of the recent multi-version software experiments support this affirmation.
Combined Application of Shallow Seismic Reflection and High-resolution Refraction Exploration Approach to Active Fault Survey, Central Orogenic Belt, China

NASA Astrophysics Data System (ADS)

Lin, S.; Luo, D.; Yanlin, F.; Li, Y.

2016-12-01

Shallow Seismic Reflection (SSR) is a major geophysical exploration method with its exploration depth range, high-resolution in urban active fault exploration. In this paper, we carried out (SSR) and High-resolution refraction (HRR) test in the Liangyun Basin to explore a buried fault. We used NZ distributed 64 channel seismic instrument, 60HZ high sensitivity detector, Geode multi-channel portable acquisition system and hammer source. We selected single side hammer hit multiple overlay, 48 channels received and 12 times of coverage. As there are some coincidence measuring lines of SSR and HRR, we chose multi chase and encounter observation system. Based on the satellite positioning, we arranged 11 survey lines in our study area with total length for 8132 meters. GEOGIGA seismic reflection data processing software was used to deal with the SSR data. After repeated tests from the aspects of single shot record compilation, interference wave pressing, static correction, velocity parameter extraction, dynamic correction, eventually got the shallow seismic reflection profile images. Meanwhile, we used Canadian technology company good refraction and tomographic imaging software to deal with HRR seismic data, which is based on nonlinear first arrival wave travel time tomography. Combined with drilling geological profiles, we explained 11 measured seismic profiles. Results show 18 obvious fault feature breakpoints, including 4 normal faults of south-west, 7 reverse faults of south-west, one normal fault of north-east and 6 reverse faults of north-east. Breakpoints buried depth is 15-18 meters, and the inferred fault distance is 3-12 meters. Comprehensive analysis shows that the fault property is reverse fault with northeast incline section, and fewer branch normal faults presenting southwest incline section. Since good corresponding relationship between the seismic interpretation results, drilling data and SEM results on the property, occurrence, broken length of the fault, we considered the Liangyun fault to be an active fault which has strong activity during the Neogene Pliocene and early Pleistocene, Middle Pleistocene period. The combined application of SSR and HRR can provide more parameters to explain the seismic results, and improve the accuracy of the interpretation.
Experimental analysis of computer system dependability

NASA Technical Reports Server (NTRS)

Iyer, Ravishankar, K.; Tang, Dong

1993-01-01

This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
Software fault tolerance in computer operating systems

NASA Technical Reports Server (NTRS)

Iyer, Ravishankar K.; Lee, Inhwan

1994-01-01

This chapter provides data and analysis of the dependability and fault tolerance for three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of defects in system software that result in processor failures. The loose coupling between processors which results in the backup execution (the processor state and the sequence of events occurring) being different from the original execution is a major reason for the measured software fault tolerance. The IBM/MVS system fault tolerance almost doubles when recovery routines are provided, in comparison to the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.
A research program in empirical computer science

NASA Technical Reports Server (NTRS)

Knight, J. C.

1991-01-01

During the grant reporting period our primary activities have been to begin preparation for the establishment of a research program in experimental computer science. The focus of research in this program will be safety-critical systems. Many questions that arise in the effort to improve software dependability can only be addressed empirically. For example, there is no way to predict the performance of the various proposed approaches to building fault-tolerant software. Performance models, though valuable, are parameterized and cannot be used to make quantitative predictions without experimental determination of underlying distributions. In the past, experimentation has been able to shed some light on the practical benefits and limitations of software fault tolerance. It is common, also, for experimentation to reveal new questions or new aspects of problems that were previously unknown. A good example is the Consistent Comparison Problem that was revealed by experimentation and subsequently studied in depth. The result was a clear understanding of a previously unknown problem with software fault tolerance. The purpose of a research program in empirical computer science is to perform controlled experiments in the area of real-time, embedded control systems. The goal of the various experiments will be to determine better approaches to the construction of the software for computing systems that have to be relied upon. As such it will validate research concepts from other sources, provide new research results, and facilitate the transition of research results from concepts to practical procedures that can be applied with low risk to NASA flight projects. The target of experimentation will be the production software development activities undertaken by any organization prepared to contribute to the research program. Experimental goals, procedures, data analysis and result reporting will be performed for the most part by the University of Virginia.
Diagnostics Tools Identify Faults Prior to Failure

NASA Technical Reports Server (NTRS)

2013-01-01

Through the SBIR program, Rochester, New York-based Impact Technologies LLC collaborated with Ames Research Center to commercialize the Center s Hybrid Diagnostic Engine, or HyDE, software. The fault detecting program is now incorporated into a software suite that identifies potential faults early in the design phase of systems ranging from printers to vehicles and robots, saving time and money.
Hardware Fault Simulator for Microprocessors

NASA Technical Reports Server (NTRS)

Hess, L. M.; Timoc, C. C.

1983-01-01

Breadboarded circuit is faster and more thorough than software simulator. Elementary fault simulator for AND gate uses three gates and shaft register to simulate stuck-at-one or stuck-at-zero conditions at inputs and output. Experimental results showed hardware fault simulator for microprocessor gave faster results than software simulator, by two orders of magnitude, with one test being applied every 4 microseconds.
Fault tolerance in computational grids: perspectives, challenges, and issues.

PubMed

Haider, Sajjad; Nazir, Babar

2016-01-01

Computational grids are established with the intention of providing shared access to hardware and software based resources with special reference to increased computational capabilities. Fault tolerance is one of the most important issues faced by the computational grids. The main contribution of this survey is the creation of an extended classification of problems that incur in the computational grid environments. The proposed classification will help researchers, developers, and maintainers of grids to understand the types of issues to be anticipated. Moreover, different types of problems, such as omission, interaction, and timing related have been identified that need to be handled on various layers of the computational grid. In this survey, an analysis and examination is also performed pertaining to the fault tolerance and fault detection mechanisms. Our conclusion is that a dependable and reliable grid can only be established when more emphasis is on fault identification. Moreover, our survey reveals that adaptive and intelligent fault identification, and tolerance techniques can improve the dependability of grid working environments.
Technical Reference Suite Addressing Challenges of Providing Assurance for Fault Management Architectural Design

NASA Technical Reports Server (NTRS)

Fitz, Rhonda; Whitman, Gerek

2016-01-01

Research into complexities of software systems Fault Management (FM) and how architectural design decisions affect safety, preservation of assets, and maintenance of desired system functionality has coalesced into a technical reference (TR) suite that advances the provision of safety and mission assurance. The NASA Independent Verification and Validation (IV&V) Program, with Software Assurance Research Program support, extracted FM architectures across the IV&V portfolio to evaluate robustness, assess visibility for validation and test, and define software assurance methods applied to the architectures and designs. This investigation spanned IV&V projects with seven different primary developers, a wide range of sizes and complexities, and encompassed Deep Space Robotic, Human Spaceflight, and Earth Orbiter mission FM architectures. The initiative continues with an expansion of the TR suite to include Launch Vehicles, adding the benefit of investigating differences intrinsic to model-based FM architectures and insight into complexities of FM within an Agile software development environment, in order to improve awareness of how nontraditional processes affect FM architectural design and system health management. The identification of particular FM architectures, visibility, and associated IV&V techniques provides a TR suite that enables greater assurance that critical software systems will adequately protect against faults and respond to adverse conditions. Additionally, the role FM has with regard to strengthened security requirements, with potential to advance overall asset protection of flight software systems, is being addressed with the development of an adverse conditions database encompassing flight software vulnerabilities. Capitalizing on the established framework, this TR suite provides assurance capability for a variety of FM architectures and varied development approaches. Research results are being disseminated across NASA, other agencies, and the software community. This paper discusses the findings and TR suite informing the FM domain in best practices for FM architectural design, visibility observations, and methods employed for IV&V and mission assurance.

Production of Reliable Flight Crucial Software: Validation Methods Research for Fault Tolerant Avionics and Control Systems Sub-Working Group Meeting

NASA Technical Reports Server (NTRS)

Dunham, J. R. (Editor); Knight, J. C. (Editor)

1982-01-01

The state of the art in the production of crucial software for flight control applications was addressed. The association between reliability metrics and software is considered. Thirteen software development projects are discussed. A short term need for research in the areas of tool development and software fault tolerance was indicated. For the long term, research in format verification or proof methods was recommended. Formal specification and software reliability modeling, were recommended as topics for both short and long term research.
Algorithm-Based Fault Tolerance for Numerical Subroutines

NASA Technical Reports Server (NTRS)

Tumon, Michael; Granat, Robert; Lou, John

2007-01-01

A software library implements a new methodology of detecting faults in numerical subroutines, thus enabling application programs that contain the subroutines to recover transparently from single-event upsets. The software library in question is fault-detecting middleware that is wrapped around the numericalsubroutines. Conventional serial versions (based on LAPACK and FFTW) and a parallel version (based on ScaLAPACK) exist. The source code of the application program that contains the numerical subroutines is not modified, and the middleware is transparent to the user. The methodology used is a type of algorithm- based fault tolerance (ABFT). In ABFT, a checksum is computed before a computation and compared with the checksum of the computational result; an error is declared if the difference between the checksums exceeds some threshold. Novel normalization methods are used in the checksum comparison to ensure correct fault detections independent of algorithm inputs. In tests of this software reported in the peer-reviewed literature, this library was shown to enable detection of 99.9 percent of significant faults while generating no false alarms.
Applications of Logic Coverage Criteria and Logic Mutation to Software Testing

ERIC Educational Resources Information Center

Kaminski, Garrett K.

2011-01-01

Logic is an important component of software. Thus, software logic testing has enjoyed significant research over a period of decades, with renewed interest in the last several years. One approach to detecting logic faults is to create and execute tests that satisfy logic coverage criteria. Another approach to detecting faults is to perform mutation…
DEVELOPMENT AND TESTING OF FAULT-DIAGNOSIS ALGORITHMS FOR REACTOR PLANT SYSTEMS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grelle, Austin L.; Park, Young S.; Vilim, Richard B.

Argonne National Laboratory is further developing fault diagnosis algorithms for use by the operator of a nuclear plant to aid in improved monitoring of overall plant condition and performance. The objective is better management of plant upsets through more timely, informed decisions on control actions with the ultimate goal of improved plant safety, production, and cost management. Integration of these algorithms with visual aids for operators is taking place through a collaboration under the concept of an operator advisory system. This is a software entity whose purpose is to manage and distill the enormous amount of information an operator mustmore » process to understand the plant state, particularly in off-normal situations, and how the state trajectory will unfold in time. The fault diagnosis algorithms were exhaustively tested using computer simulations of twenty different faults introduced into the chemical and volume control system (CVCS) of a pressurized water reactor (PWR). The algorithms are unique in that each new application to a facility requires providing only the piping and instrumentation diagram (PID) and no other plant-specific information; a subject-matter expert is not needed to install and maintain each instance of an application. The testing approach followed accepted procedures for verifying and validating software. It was shown that the code satisfies its functional requirement which is to accept sensor information, identify process variable trends based on this sensor information, and then to return an accurate diagnosis based on chains of rules related to these trends. The validation and verification exercise made use of GPASS, a one-dimensional systems code, for simulating CVCS operation. Plant components were failed and the code generated the resulting plant response. Parametric studies with respect to the severity of the fault, the richness of the plant sensor set, and the accuracy of sensors were performed as part of the validation exercise. The background and overview of the software will be presented to give an overview of the approach. Following, the verification and validation effort using the GPASS code for simulation of plant transients including a sensitivity study on important parameters will be presented« less
Predeployment validation of fault-tolerant systems through software-implemented fault insertion

NASA Technical Reports Server (NTRS)

Czeck, Edward W.; Siewiorek, Daniel P.; Segall, Zary Z.

1989-01-01

Fault injection-based automated testing (FIAT) environment, which can be used to experimentally characterize and evaluate distributed realtime systems under fault-free and faulted conditions is described. A survey is presented of validation methodologies. The need for fault insertion based on validation methodologies is demonstrated. The origins and models of faults, and motivation for the FIAT concept are reviewed. FIAT employs a validation methodology which builds confidence in the system through first providing a baseline of fault-free performance data and then characterizing the behavior of the system with faults present. Fault insertion is accomplished through software and allows faults or the manifestation of faults to be inserted by either seeding faults into memory or triggering error detection mechanisms. FIAT is capable of emulating a variety of fault-tolerant strategies and architectures, can monitor system activity, and can automatically orchestrate experiments involving insertion of faults. There is a common system interface which allows ease of use to decrease experiment development and run time. Fault models chosen for experiments on FIAT have generated system responses which parallel those observed in real systems under faulty conditions. These capabilities are shown by two example experiments each using a different fault-tolerance strategy.
Advanced information processing system: Fault injection study and results

NASA Technical Reports Server (NTRS)

Burkhardt, Laura F.; Masotto, Thomas K.; Lala, Jaynarayan H.

1992-01-01

The objective of the AIPS program is to achieve a validated fault tolerant distributed computer system. The goals of the AIPS fault injection study were: (1) to present the fault injection study components addressing the AIPS validation objective; (2) to obtain feedback for fault removal from the design implementation; (3) to obtain statistical data regarding fault detection, isolation, and reconfiguration responses; and (4) to obtain data regarding the effects of faults on system performance. The parameters are described that must be varied to create a comprehensive set of fault injection tests, the subset of test cases selected, the test case measurements, and the test case execution. Both pin level hardware faults using a hardware fault injector and software injected memory mutations were used to test the system. An overview is provided of the hardware fault injector and the associated software used to carry out the experiments. Detailed specifications are given of fault and test results for the I/O Network and the AIPS Fault Tolerant Processor, respectively. The results are summarized and conclusions are given.
Diagnostic Analyzer for Gearboxes (DAG): User's Guide. Version 3.1 for Microsoft Windows 3.1

NASA Technical Reports Server (NTRS)

Jammu, Vinay B.; Kourosh, Danai

1997-01-01

This documentation describes the Diagnostic Analyzer for Gearboxes (DAG) software for performing fault diagnosis of gearboxes. First, the user would construct a graphical representation of the gearbox using the gear, bearing, shaft, and sensor tools contained in the DAG software. Next, a set of vibration features obtained by processing the vibration signals recorded from the gearbox using a signal analyzer is required. Given this information, the DAG software uses an unsupervised neural network referred to as the Fault Detection Network (FDN) to identify the occurrence of faults, and a pattern classifier called Single Category-Based Classifier (SCBC) for abnormality scaling of individual vibration features. The abnormality-scaled vibration features are then used as inputs to a Structure-Based Connectionist Network (SBCN) for identifying faults in gearbox subsystems and components. The weights of the SBCN represent its diagnostic knowledge and are derived from the structure of the gearbox graphically presented in DAG. The outputs of SBCN are fault possibility values between 0 and 1 for individual subsystems and components in the gearbox with a 1 representing a definite fault and a 0 representing normality. This manual describes the steps involved in creating the diagnostic gearbox model, along with the options and analysis tools of the DAG software.
A Voyager attitude control perspective on fault tolerant systems

NASA Technical Reports Server (NTRS)

Rasmussen, R. D.; Litty, E. C.

1981-01-01

In current spacecraft design, a trend can be observed to achieve greater fault tolerance through the application of on-board software dedicated to detecting and isolating failures. Whether fault tolerance through software can meet the desired objectives depends on very careful consideration and control of the system in which the software is imbedded. The considered investigation has the objective to provide some of the insight needed for the required analysis of the system. A description is given of the techniques which have been developed in this connection during the development of the Voyager spacecraft. The Voyager Galileo Attitude and Articulation Control Subsystem (AACS) fault tolerant design is discussed to emphasize basic lessons learned from this experience. The central driver of hardware redundancy implementation on Voyager was known as the 'single point failure criterion'.
Drive and protection circuit for converter module of cascaded H-bridge STATCOM

NASA Astrophysics Data System (ADS)

Wang, Xuan; Yuan, Hongliang; Wang, Xiaoxing; Wang, Shuai; Fu, Yongsheng

2018-04-01

Drive and protection circuit is an important part of power electronics, which is related to safe and stable operation issues in the power electronics. The drive and protection circuit is designed for the cascaded H-bridge STATCOM. This circuit can realize flexible dead-time setting, operation status self-detection, fault priority protection and detailed fault status uploading. It can help to improve the reliability of STATCOM's operation. Finally, the proposed circuit is tested and analyzed by power electronic simulation software PSPICE (Simulation Program with IC Emphasis) and a series of experiments. Further studies showed that the proposed circuit can realize drive and control of H-bridge circuit, meanwhile it also can realize fast processing faults and have advantage of high reliability.
A fault-tolerant intelligent robotic control system

NASA Technical Reports Server (NTRS)

Marzwell, Neville I.; Tso, Kam Sing

1993-01-01

This paper describes the concept, design, and features of a fault-tolerant intelligent robotic control system being developed for space and commercial applications that require high dependability. The comprehensive strategy integrates system level hardware/software fault tolerance with task level handling of uncertainties and unexpected events for robotic control. The underlying architecture for system level fault tolerance is the distributed recovery block which protects against application software, system software, hardware, and network failures. Task level fault tolerance provisions are implemented in a knowledge-based system which utilizes advanced automation techniques such as rule-based and model-based reasoning to monitor, diagnose, and recover from unexpected events. The two level design provides tolerance of two or more faults occurring serially at any level of command, control, sensing, or actuation. The potential benefits of such a fault tolerant robotic control system include: (1) a minimized potential for damage to humans, the work site, and the robot itself; (2) continuous operation with a minimum of uncommanded motion in the presence of failures; and (3) more reliable autonomous operation providing increased efficiency in the execution of robotic tasks and decreased demand on human operators for controlling and monitoring the robotic servicing routines.
[The Development and Application of the Orthopaedics Implants Failure Database Software Based on WEB].

PubMed

Huang, Jiahua; Zhou, Hai; Zhang, Binbin; Ding, Biao

2015-09-01

This article develops a new failure database software for orthopaedics implants based on WEB. The software is based on B/S mode, ASP dynamic web technology is used as its main development language to achieve data interactivity, Microsoft Access is used to create a database, these mature technologies make the software extend function or upgrade easily. In this article, the design and development idea of the software, the software working process and functions as well as relative technical features are presented. With this software, we can store many different types of the fault events of orthopaedics implants, the failure data can be statistically analyzed, and in the macroscopic view, it can be used to evaluate the reliability of orthopaedics implants and operations, it also can ultimately guide the doctors to improve the clinical treatment level.
Ground Software Maintenance Facility (GSMF) system manual

NASA Technical Reports Server (NTRS)

Derrig, D.; Griffith, G.

1986-01-01

The Ground Software Maintenance Facility (GSMF) is designed to support development and maintenance of spacelab ground support software. THE GSMF consists of a Perkin Elmer 3250 (Host computer) and a MITRA 125s (ATE computer), with appropriate interface devices and software to simulate the Electrical Ground Support Equipment (EGSE). This document is presented in three sections: (1) GSMF Overview; (2) Software Structure; and (3) Fault Isolation Capability. The overview contains information on hardware and software organization along with their corresponding block diagrams. The Software Structure section describes the modes of software structure including source files, link information, and database files. The Fault Isolation section describes the capabilities of the Ground Computer Interface Device, Perkin Elmer host, and MITRA ATE.
Autonomous power system brassboard

NASA Technical Reports Server (NTRS)

Merolla, Anthony

1992-01-01

The Autonomous Power System (APS) brassboard is a 20 kHz power distribution system which has been developed at NASA Lewis Research Center, Cleveland, Ohio. The brassboard exists to provide a realistic hardware platform capable of testing artificially intelligent (AI) software. The brassboard's power circuit topology is based upon a Power Distribution Control Unit (PDCU), which is a subset of an advanced development 20 kHz electrical power system (EPS) testbed, originally designed for Space Station Freedom (SSF). The APS program is designed to demonstrate the application of intelligent software as a fault detection, isolation, and recovery methodology for space power systems. This report discusses both the hardware and software elements used to construct the present configuration of the brassboard. The brassboard power components are described. These include the solid-state switches (herein referred to as switchgear), transformers, sources, and loads. Closely linked to this power portion of the brassboard is the first level of embedded control. Hardware used to implement this control and its associated software is discussed. An Ada software program, developed by Lewis Research Center's Space Station Freedom Directorate for their 20 kHz testbed, is used to control the brassboard's switchgear, as well as monitor key brassboard parameters through sensors located within these switches. The Ada code is downloaded from a PC/AT, and is resident within the 8086 microprocessor-based embedded controllers. The PC/AT is also used for smart terminal emulation, capable of controlling the switchgear as well as displaying data from them. Intelligent control is provided through use of a T1 Explorer and the Autonomous Power Expert (APEX) LISP software. Real-time load scheduling is implemented through use of a 'C' program-based scheduling engine. The methods of communication between these computers and the brassboard are explored. In order to evaluate the features of both the brassboard hardware and intelligent controlling software, fault circuits have been developed and integrated as part of the brassboard. A description of these fault circuits and their function is included. The brassboard has become an extremely useful test facility, promoting artificial intelligence (AI) applications for power distribution systems. However, there are elements of the brassboard which could be enhanced, thus improving system performance. Modifications and enhancements to improve the brassboard's operation are discussed.
Detection of faults and software reliability analysis

NASA Technical Reports Server (NTRS)

Knight, John C.

1987-01-01

Multi-version or N-version programming is proposed as a method of providing fault tolerance in software. The approach requires the separate, independent preparation of multiple versions of a piece of software for some application. These versions are executed in parallel in the application environment; each receives identical inputs and each produces its version of the required outputs. The outputs are collected by a voter and, in principle, they should all be the same. In practice there may be some disagreement. If this occurs, the results of the majority are taken to be the correct output, and that is the output used by the system. A total of 27 programs were produced. Each of these programs was then subjected to one million randomly-generated test cases. The experiment yielded a number of programs containing faults that are useful for general studies of software reliability as well as studies of N-version programming. Fault tolerance through data diversity and analytic models of comparison testing are discussed.
Measurement of fault latency in a digital avionic miniprocessor

NASA Technical Reports Server (NTRS)

Mcgough, J. G.; Swern, F. L.

1981-01-01

The results of fault injection experiments utilizing a gate-level emulation of the central processor unit of the Bendix BDX-930 digital computer are presented. The failure detection coverage of comparison-monitoring and a typical avionics CPU self-test program was determined. The specific tasks and experiments included: (1) inject randomly selected gate-level and pin-level faults and emulate six software programs using comparison-monitoring to detect the faults; (2) based upon the derived empirical data develop and validate a model of fault latency that will forecast a software program's detecting ability; (3) given a typical avionics self-test program, inject randomly selected faults at both the gate-level and pin-level and determine the proportion of faults detected; (4) determine why faults were undetected; (5) recommend how the emulation can be extended to multiprocessor systems such as SIFT; and (6) determine the proportion of faults detected by a uniprocessor BIT (built-in-test) irrespective of self-test.
Criteria for software modularization

NASA Technical Reports Server (NTRS)

Card, David N.; Page, Gerald T.; Mcgarry, Frank E.

1985-01-01

A central issue in programming practice involves determining the appropriate size and information content of a software module. This study attempted to determine the effectiveness of two widely used criteria for software modularization, strength and size, in reducing fault rate and development cost. Data from 453 FORTRAN modules developed by professional programmers were analyzed. The results indicated that module strength is a good criterion with respect to fault rate, whereas arbitrary module size limitations inhibit programmer productivity. This analysis is a first step toward defining empirically based standards for software modularization.
Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 4: FTMP executive summary

NASA Technical Reports Server (NTRS)

Smith, T. B., III; Lala, J. H.

1984-01-01

The FTMP architecture is a high reliability computer concept modeled after a homogeneous multiprocessor architecture. Elements of the FTMP are operated in tight synchronism with one another and hardware fault-detection and fault-masking is provided which is transparent to the software. Operating system design and user software design is thus greatly simplified. Performance of the FTMP is also comparable to that of a simplex equivalent due to the efficiency of fault handling hardware. The FTMP project constructed an engineering module of the FTMP, programmed the machine and extensively tested the architecture through fault injection and other stress testing. This testing confirmed the soundness of the FTMP concepts.
Methodology for Designing Fault-Protection Software

NASA Technical Reports Server (NTRS)

Barltrop, Kevin; Levison, Jeffrey; Kan, Edwin

2006-01-01

A document describes a methodology for designing fault-protection (FP) software for autonomous spacecraft. The methodology embodies and extends established engineering practices in the technical discipline of Fault Detection, Diagnosis, Mitigation, and Recovery; and has been successfully implemented in the Deep Impact Spacecraft, a NASA Discovery mission. Based on established concepts of Fault Monitors and Responses, this FP methodology extends the notion of Opinion, Symptom, Alarm (aka Fault), and Response with numerous new notions, sub-notions, software constructs, and logic and timing gates. For example, Monitor generates a RawOpinion, which graduates into Opinion, categorized into no-opinion, acceptable, or unacceptable opinion. RaiseSymptom, ForceSymptom, and ClearSymptom govern the establishment and then mapping to an Alarm (aka Fault). Local Response is distinguished from FP System Response. A 1-to-n and n-to- 1 mapping is established among Monitors, Symptoms, and Responses. Responses are categorized by device versus by function. Responses operate in tiers, where the early tiers attempt to resolve the Fault in a localized step-by-step fashion, relegating more system-level response to later tier(s). Recovery actions are gated by epoch recovery timing, enabling strategy, urgency, MaxRetry gate, hardware availability, hazardous versus ordinary fault, and many other priority gates. This methodology is systematic, logical, and uses multiple linked tables, parameter files, and recovery command sequences. The credibility of the FP design is proven via a fault-tree analysis "top-down" approach, and a functional fault-mode-effects-and-analysis via "bottoms-up" approach. Via this process, the mitigation and recovery strategy(s) per Fault Containment Region scope (width versus depth) the FP architecture.
Resilience Design Patterns - A Structured Approach to Resilience at Extreme Scale (version 1.0)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hukerikar, Saurabh; Engelmann, Christian

Reliability is a serious concern for future extreme-scale high-performance computing (HPC) systems. Projections based on the current generation of HPC systems and technology roadmaps suggest that very high fault rates in future systems. The errors resulting from these faults will propagate and generate various kinds of failures, which may result in outcomes ranging from result corruptions to catastrophic application crashes. Practical limits on power consumption in HPC systems will require future systems to embrace innovative architectures, increasing the levels of hardware and software complexities. The resilience challenge for extreme-scale HPC systems requires management of various hardware and software technologies thatmore » are capable of handling a broad set of fault models at accelerated fault rates. These techniques must seek to improve resilience at reasonable overheads to power consumption and performance. While the HPC community has developed various solutions, application-level as well as system-based solutions, the solution space of HPC resilience techniques remains fragmented. There are no formal methods and metrics to investigate and evaluate resilience holistically in HPC systems that consider impact scope, handling coverage, and performance & power eciency across the system stack. Additionally, few of the current approaches are portable to newer architectures and software ecosystems, which are expected to be deployed on future systems. In this document, we develop a structured approach to the management of HPC resilience based on the concept of resilience-based design patterns. A design pattern is a general repeatable solution to a commonly occurring problem. We identify the commonly occurring problems and solutions used to deal with faults, errors and failures in HPC systems. The catalog of resilience design patterns provides designers with reusable design elements. We define a design framework that enhances our understanding of the important constraints and opportunities for solutions deployed at various layers of the system stack. The framework may be used to establish mechanisms and interfaces to coordinate flexible fault management across hardware and software components. The framework also enables optimization of the cost-benefit trade-os among performance, resilience, and power consumption. The overall goal of this work is to enable a systematic methodology for the design and evaluation of resilience technologies in extreme-scale HPC systems that keep scientific applications running to a correct solution in a timely and cost-ecient manner in spite of frequent faults, errors, and failures of various types.« less
Path Searching Based Fault Automated Recovery Scheme for Distribution Grid with DG

NASA Astrophysics Data System (ADS)

Xia, Lin; Qun, Wang; Hui, Xue; Simeng, Zhu

2016-12-01

Applying the method of path searching based on distribution network topology in setting software has a good effect, and the path searching method containing DG power source is also applicable to the automatic generation and division of planned islands after the fault. This paper applies path searching algorithm in the automatic division of planned islands after faults: starting from the switch of fault isolation, ending in each power source, and according to the line load that the searching path traverses and the load integrated by important optimized searching path, forming optimized division scheme of planned islands that uses each DG as power source and is balanced to local important load. Finally, COBASE software and distribution network automation software applied are used to illustrate the effectiveness of the realization of such automatic restoration program.

Evolution of solid rocket booster component testing

NASA Technical Reports Server (NTRS)

Lessey, Joseph A.

1989-01-01

The evolution of one of the new generation of test sets developed for the Solid Rocket Booster of the U.S. Space Transportation System. Requirements leading to factory checkout of the test set are explained, including the evolution from manual to semiautomated toward fully automated status. Individual improvements in the built-in test equipment, self-calibration, and software flexibility are addressed, and the insertion of fault detection to improve reliability is discussed.
Reliable and Fault-Tolerant Software-Defined Network Operations Scheme for Remote 3D Printing

NASA Astrophysics Data System (ADS)

Kim, Dongkyun; Gil, Joon-Min

2015-03-01

The recent wide expansion of applicable three-dimensional (3D) printing and software-defined networking (SDN) technologies has led to a great deal of attention being focused on efficient remote control of manufacturing processes. SDN is a renowned paradigm for network softwarization, which has helped facilitate remote manufacturing in association with high network performance, since SDN is designed to control network paths and traffic flows, guaranteeing improved quality of services by obtaining network requests from end-applications on demand through the separated SDN controller or control plane. However, current SDN approaches are generally focused on the controls and automation of the networks, which indicates that there is a lack of management plane development designed for a reliable and fault-tolerant SDN environment. Therefore, in addition to the inherent advantage of SDN, this paper proposes a new software-defined network operations center (SD-NOC) architecture to strengthen the reliability and fault-tolerance of SDN in terms of network operations and management in particular. The cooperation and orchestration between SDN and SD-NOC are also introduced for the SDN failover processes based on four principal SDN breakdown scenarios derived from the failures of the controller, SDN nodes, and connected links. The abovementioned SDN troubles significantly reduce the network reachability to remote devices (e.g., 3D printers, super high-definition cameras, etc.) and the reliability of relevant control processes. Our performance consideration and analysis results show that the proposed scheme can shrink operations and management overheads of SDN, which leads to the enhancement of responsiveness and reliability of SDN for remote 3D printing and control processes.
Use of Field Programmable Gate Array Technology in Future Space Avionics

NASA Technical Reports Server (NTRS)

Ferguson, Roscoe C.; Tate, Robert

2005-01-01

Fulfilling NASA's new vision for space exploration requires the development of sustainable, flexible and fault tolerant spacecraft control systems. The traditional development paradigm consists of the purchase or fabrication of hardware boards with fixed processor and/or Digital Signal Processing (DSP) components interconnected via a standardized bus system. This is followed by the purchase and/or development of software. This paradigm has several disadvantages for the development of systems to support NASA's new vision. Building a system to be fault tolerant increases the complexity and decreases the performance of included software. Standard bus design and conventional implementation produces natural bottlenecks. Configuring hardware components in systems containing common processors and DSPs is difficult initially and expensive or impossible to change later. The existence of Hardware Description Languages (HDLs), the recent increase in performance, density and radiation tolerance of Field Programmable Gate Arrays (FPGAs), and Intellectual Property (IP) Cores provides the technology for reprogrammable Systems on a Chip (SOC). This technology supports a paradigm better suited for NASA's vision. Hardware and software production are melded for more effective development; they can both evolve together over time. Designers incorporating this technology into future avionics can benefit from its flexibility. Systems can be designed with improved fault isolation and tolerance using hardware instead of software. Also, these designs can be protected from obsolescence problems where maintenance is compromised via component and vendor availability.To investigate the flexibility of this technology, the core of the Central Processing Unit and Input/Output Processor of the Space Shuttle AP101S Computer were prototyped in Verilog HDL and synthesized into an Altera Stratix FPGA.
A Fuzzy Expert System for Fault Management of Water Supply Recovery in the ALSS Project

NASA Technical Reports Server (NTRS)

Tohala, Vapsi J.

1998-01-01

Modeling with a new software is a challenge. CONFIG is a challenge and is design to work with many types of systems in which discrete and continuous processes occur. The CONFIG software was used to model the two subsystem of the Water Recovery system: ICB and TFB. The model worked manually only for water flows with further implementation to be done in the future. Activities in the models are stiff need to be implemented based on testing of the hardware for phase III. More improvements to CONFIG are in progress to make it a more user friendly software.
Software reliability through fault-avoidance and fault-tolerance

NASA Technical Reports Server (NTRS)

Vouk, Mladen A.; Mcallister, David F.

1992-01-01

Accomplishments in the following research areas are summarized: structure based testing, reliability growth, and design testability with risk evaluation; reliability growth models and software risk management; and evaluation of consensus voting, consensus recovery block, and acceptance voting. Four papers generated during the reporting period are included as appendices.
Windows .NET Network Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST)

PubMed Central

Dowd, Scot E; Zaragoza, Joaquin; Rodriguez, Javier R; Oliver, Melvin J; Payton, Paxton R

2005-01-01

Background BLAST is one of the most common and useful tools for Genetic Research. This paper describes a software application we have termed Windows .NET Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST), which enhances the BLAST utility by improving usability, fault recovery, and scalability in a Windows desktop environment. Our goal was to develop an easy to use, fault tolerant, high-throughput BLAST solution that incorporates a comprehensive BLAST result viewer with curation and annotation functionality. Results W.ND-BLAST is a comprehensive Windows-based software toolkit that targets researchers, including those with minimal computer skills, and provides the ability increase the performance of BLAST by distributing BLAST queries to any number of Windows based machines across local area networks (LAN). W.ND-BLAST provides intuitive Graphic User Interfaces (GUI) for BLAST database creation, BLAST execution, BLAST output evaluation and BLAST result exportation. This software also provides several layers of fault tolerance and fault recovery to prevent loss of data if nodes or master machines fail. This paper lays out the functionality of W.ND-BLAST. W.ND-BLAST displays close to 100% performance efficiency when distributing tasks to 12 remote computers of the same performance class. A high throughput BLAST job which took 662.68 minutes (11 hours) on one average machine was completed in 44.97 minutes when distributed to 17 nodes, which included lower performance class machines. Finally, there is a comprehensive high-throughput BLAST Output Viewer (BOV) and Annotation Engine components, which provides comprehensive exportation of BLAST hits to text files, annotated fasta files, tables, or association files. Conclusion W.ND-BLAST provides an interactive tool that allows scientists to easily utilizing their available computing resources for high throughput and comprehensive sequence analyses. The install package for W.ND-BLAST is freely downloadable from . With registration the software is free, installation, networking, and usage instructions are provided as well as a support forum. PMID:15819992
Galileo spacecraft power distribution and autonomous fault recovery

NASA Technical Reports Server (NTRS)

Detwiler, R. C.

1982-01-01

There is a trend in current spacecraft design to achieve greater fault tolerance through the implemenation of on-board software dedicated to detecting and isolating failures. A combination of hardware and software is utilized in the Galileo power system for autonomous fault recovery. Galileo is a dual-spun spacecraft designed to carry a number of scientific instruments into a series of orbits around the planet Jupiter. In addition to its self-contained scientific payload, it will also carry a probe system which will be separated from the spacecraft some 150 days prior to Jupiter encounter. The Galileo spacecraft is scheduled to be launched in 1985. Attention is given to the power system, the fault protection requirements, and the power fault recovery implementation.
Detection of faults and software reliability analysis

NASA Technical Reports Server (NTRS)

Knight, J. C.

1987-01-01

Specific topics briefly addressed include: the consistent comparison problem in N-version system; analytic models of comparison testing; fault tolerance through data diversity; and the relationship between failures caused by automatically seeded faults.
Design for dependability: A simulation-based approach. Ph.D. Thesis, 1993

NASA Technical Reports Server (NTRS)

Goswami, Kumar K.

1994-01-01

This research addresses issues in simulation-based system level dependability analysis of fault-tolerant computer systems. The issues and difficulties of providing a general simulation-based approach for system level analysis are discussed and a methodology that address and tackle these issues is presented. The proposed methodology is designed to permit the study of a wide variety of architectures under various fault conditions. It permits detailed functional modeling of architectural features such as sparing policies, repair schemes, routing algorithms as well as other fault-tolerant mechanisms, and it allows the execution of actual application software. One key benefit of this approach is that the behavior of a system under faults does not have to be pre-defined as it is normally done. Instead, a system can be simulated in detail and injected with faults to determine its failure modes. The thesis describes how object-oriented design is used to incorporate this methodology into a general purpose design and fault injection package called DEPEND. A software model is presented that uses abstractions of application programs to study the behavior and effect of software on hardware faults in the early design stage when actual code is not available. Finally, an acceleration technique that combines hierarchical simulation, time acceleration algorithms and hybrid simulation to reduce simulation time is introduced.
Aircraft Fault Detection and Classification Using Multi-Level Immune Learning Detection

NASA Technical Reports Server (NTRS)

Wong, Derek; Poll, Scott; KrishnaKumar, Kalmanje

2005-01-01

This work is an extension of a recently developed software tool called MILD (Multi-level Immune Learning Detection), which implements a negative selection algorithm for anomaly and fault detection that is inspired by the human immune system. The immunity-based approach can detect a broad spectrum of known and unforeseen faults. We extend MILD by applying a neural network classifier to identify the pattern of fault detectors that are activated during fault detection. Consequently, MILD now performs fault detection and identification of the system under investigation. This paper describes the application of MILD to detect and classify faults of a generic transport aircraft augmented with an intelligent flight controller. The intelligent control architecture is designed to accommodate faults without the need to explicitly identify them. Adding knowledge about the existence and type of a fault will improve the handling qualities of a degraded aircraft and impact tactical and strategic maneuvering decisions. In addition, providing fault information to the pilot is important for maintaining situational awareness so that he can avoid performing an action that might lead to unexpected behavior - e.g., an action that exceeds the remaining control authority of the damaged aircraft. We discuss the detection and classification results of simulated failures of the aircraft's control system and show that MILD is effective at determining the problem with low false alarm and misclassification rates.
Debugging and Logging Services for Defence Service Oriented Architectures

DTIC Science & Technology

2012-02-01

Service A software component and callable end point that provides a logically related set of operations, each of which perform a logical step in a...important to note that in some cases when the fault is identified to lie in uneditable code such as program libraries, or outsourced software services ...debugging is limited to characterisation of the fault, reporting it to the software or service provider and development of work-arounds and management
Software Implemented Fault-Tolerant (SIFT) user's guide

NASA Technical Reports Server (NTRS)

Green, D. F., Jr.; Palumbo, D. L.; Baltrus, D. W.

1984-01-01

Program development for a Software Implemented Fault Tolerant (SIFT) computer system is accomplished in the NASA LaRC AIRLAB facility using a DEC VAX-11 to interface with eight Bendix BDX 930 flight control processors. The interface software which provides this SIFT program development capability was developed by AIRLAB personnel. This technical memorandum describes the application and design of this software in detail, and is intended to assist both the user in performance of SIFT research and the systems programmer responsible for maintaining and/or upgrading the SIFT programming environment.
The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software

NASA Technical Reports Server (NTRS)

Butler, Ricky W.; Finelli, George B.

1991-01-01

This paper affirms that the quantification of life-critical software reliability is infeasible using statistical methods whether applied to standard software or fault-tolerant software. The classical methods of estimating reliability are shown to lead to exhorbitant amounts of testing when applied to life-critical software. Reliability growth models are examined and also shown to be incapable of overcoming the need for excessive amounts of testing. The key assumption of software fault tolerance separately programmed versions fail independently is shown to be problematic. This assumption cannot be justified by experimentation in the ultrareliability region and subjective arguments in its favor are not sufficiently strong to justify it as an axiom. Also, the implications of the recent multiversion software experiments support this affirmation.
Proceedings of the European Seminar on Industrial Software Engineering (2nd) Held in Freiburg (Germany, F.R.) on 9-10 May 1985,

DTIC Science & Technology

1985-05-10

synchronisation , 8% cache bus monitoring ). 6. Conclusions Since the 1950’s, fault tolerance has been used to improve the reliability of hardware systems ...description. The operation may use other operations supplied with the system , here e.g. HIRE EMPLOYEE, ENTER MGR SAL etc . HIRE MRNAGR (X:PERSOW) nsot ACTOR (X...hardware design and in the operating systems software and they have developed a number of products which are of a commercial standard and of wide
Finite-Fault and Other New Capabilities of CISN ShakeAlert

NASA Astrophysics Data System (ADS)

Boese, M.; Felizardo, C.; Heaton, T. H.; Hudnut, K. W.; Hauksson, E.

2013-12-01

Over the past 6 years, scientists at Caltech, UC Berkeley, the Univ. of Southern California, the Univ. of Washington, the US Geological Survey, and ETH Zurich (Switzerland) have developed the 'ShakeAlert' earthquake early warning demonstration system for California and the Pacific Northwest. We have now started to transform this system into a stable end-to-end production system that will be integrated into the daily routine operations of the CISN and PNSN networks. To quickly determine the earthquake magnitude and location, ShakeAlert currently processes and interprets real-time data-streams from several hundred seismic stations within the California Integrated Seismic Network (CISN) and the Pacific Northwest Seismic Network (PNSN). Based on these parameters, the 'UserDisplay' software predicts and displays the arrival and intensity of shaking at a given user site. Real-time ShakeAlert feeds are currently being shared with around 160 individuals, companies, and emergency response organizations to gather feedback about the system performance, to educate potential users about EEW, and to identify needs and applications of EEW in a future operational warning system. To improve the performance during large earthquakes (M>6.5), we have started to develop, implement, and test a number of new algorithms for the ShakeAlert system: the 'FinDer' (Finite Fault Rupture Detector) algorithm provides real-time estimates of locations and extents of finite-fault ruptures from high-frequency seismic data. The 'GPSlip' algorithm estimates the fault slip along these ruptures using high-rate real-time GPS data. And, third, a new type of ground-motion prediction models derived from over 415,000 rupture simulations along active faults in southern California improves MMI intensity predictions for large earthquakes with consideration of finite-fault, rupture directivity, and basin response effects. FinDer and GPSlip are currently being real-time and offline tested in a separate internal ShakeAlert installation at Caltech. Real-time position and displacement time series from around 100 GPS sensors are obtained in JSON format from RTK/PPP(AR) solutions using the RTNet software at USGS Pasadena. However, we have also started to investigate the usage of onsite (in-receiver) processing using NetR9 with RTX and tracebuf2 output format. A number of changes to the ShakeAlert processing, xml message format, and the usage of this information in the UserDisplay software were necessary to handle the new finite-fault and slip information from the FinDer and GPSlip algorithms. In addition, we have developed a framework for end-to-end off-line testing with archived and simulated waveform data using the Earthworm tankplayer. Detailed background information about the algorithms, processing, and results from these test runs will be presented.
Effectiveness of back-to-back testing

NASA Technical Reports Server (NTRS)

Vouk, Mladen A.; Mcallister, David F.; Eckhardt, David E.; Caglayan, Alper; Kelly, John P. J.

1987-01-01

Three models of back-to-back testing processes are described. Two models treat the case where there is no intercomponent failure dependence. The third model describes the more realistic case where there is correlation among the failure probabilities of the functionally equivalent components. The theory indicates that back-to-back testing can, under the right conditions, provide a considerable gain in software reliability. The models are used to analyze the data obtained in a fault-tolerant software experiment. It is shown that the expected gain is indeed achieved, and exceeded, provided the intercomponent failure dependence is sufficiently small. However, even with the relatively high correlation the use of several functionally equivalent components coupled with back-to-back testing may provide a considerable reliability gain. Implications of this finding are that the multiversion software development is a feasible and cost effective approach to providing highly reliable software components intended for fault-tolerant software systems, on condition that special attention is directed at early detection and elimination of correlated faults.
Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake

NASA Astrophysics Data System (ADS)

Ulrich, T.; Gabriel, A. A.; Madden, E. H.; Wollherr, S.; Uphoff, C.; Rettenberger, S.; Bader, M.

2017-12-01

SeisSol (www.seissol.org) is an open-source software package based on an arbitrary high-order derivative Discontinuous Galerkin method (ADER-DG). It solves spontaneous dynamic rupture propagation on pre-existing fault interfaces according to non-linear friction laws, coupled to seismic wave propagation with high-order accuracy in space and time (minimal dispersion errors). SeisSol exploits unstructured meshes to account for complex geometries, e.g. high resolution topography and bathymetry, 3D subsurface structure, and fault networks. We present the up-to-date largest (1500 km of faults) and longest (500 s) dynamic rupture simulation modeling the 2004 Sumatra-Andaman earthquake. We demonstrate the need for end-to-end-optimization and petascale performance of scientific software to realize realistic simulations on the extreme scales of subduction zone earthquakes: Considering the full complexity of subduction zone geometries leads inevitably to huge differences in element sizes. The main code improvements include a cache-aware wave propagation scheme and optimizations of the dynamic rupture kernels using code generation. In addition, a novel clustered local-time-stepping scheme for dynamic rupture has been established. Finally, asynchronous output has been implemented to overlap I/O and compute time. We resolve the frictional sliding process on the curved mega-thrust and a system of splay faults, as well as the seismic wave field and seafloor displacement with frequency content up to 2.2 Hz. We validate the scenario by geodetic, seismological and tsunami observations. The resulting rupture dynamics shed new light on the activation and importance of splay faults.
Automatic Fault Characterization via Abnormality-Enhanced Classification

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bronevetsky, G; Laguna, I; de Supinski, B R

Enterprise and high-performance computing systems are growing extremely large and complex, employing hundreds to hundreds of thousands of processors and software/hardware stacks built by many people across many organizations. As the growing scale of these machines increases the frequency of faults, system complexity makes these faults difficult to detect and to diagnose. Current system management techniques, which focus primarily on efficient data access and query mechanisms, require system administrators to examine the behavior of various system services manually. Growing system complexity is making this manual process unmanageable: administrators require more effective management tools that can detect faults and help tomore » identify their root causes. System administrators need timely notification when a fault is manifested that includes the type of fault, the time period in which it occurred and the processor on which it originated. Statistical modeling approaches can accurately characterize system behavior. However, the complex effects of system faults make these tools difficult to apply effectively. This paper investigates the application of classification and clustering algorithms to fault detection and characterization. We show experimentally that naively applying these methods achieves poor accuracy. Further, we design novel techniques that combine classification algorithms with information on the abnormality of application behavior to improve detection and characterization accuracy. Our experiments demonstrate that these techniques can detect and characterize faults with 65% accuracy, compared to just 5% accuracy for naive approaches.« less
MIROS: A Hybrid Real-Time Energy-Efficient Operating System for the Resource-Constrained Wireless Sensor Nodes

PubMed Central

Liu, Xing; Hou, Kun Mean; de Vaulx, Christophe; Shi, Hongling; Gholami, Khalid El

2014-01-01

Operating system (OS) technology is significant for the proliferation of the wireless sensor network (WSN). With an outstanding OS; the constrained WSN resources (processor; memory and energy) can be utilized efficiently. Moreover; the user application development can be served soundly. In this article; a new hybrid; real-time; memory-efficient; energy-efficient; user-friendly and fault-tolerant WSN OS MIROS is designed and implemented. MIROS implements the hybrid scheduler and the dynamic memory allocator. Real-time scheduling can thus be achieved with low memory consumption. In addition; it implements a mid-layer software EMIDE (Efficient Mid-layer Software for User-Friendly Application Development Environment) to decouple the WSN application from the low-level system. The application programming process can consequently be simplified and the application reprogramming performance improved. Moreover; it combines both the software and the multi-core hardware techniques to conserve the energy resources; improve the node reliability; as well as achieve a new debugging method. To evaluate the performance of MIROS; it is compared with the other WSN OSes (TinyOS; Contiki; SOS; openWSN and mantisOS) from different OS concerns. The final evaluation results prove that MIROS is suitable to be used even on the tight resource-constrained WSN nodes. It can support the real-time WSN applications. Furthermore; it is energy efficient; user friendly and fault tolerant. PMID:25248069
MIROS: a hybrid real-time energy-efficient operating system for the resource-constrained wireless sensor nodes.

PubMed

Liu, Xing; Hou, Kun Mean; de Vaulx, Christophe; Shi, Hongling; El Gholami, Khalid

2014-09-22

Operating system (OS) technology is significant for the proliferation of the wireless sensor network (WSN). With an outstanding OS; the constrained WSN resources (processor; memory and energy) can be utilized efficiently. Moreover; the user application development can be served soundly. In this article; a new hybrid; real-time; memory-efficient; energy-efficient; user-friendly and fault-tolerant WSN OS MIROS is designed and implemented. MIROS implements the hybrid scheduler and the dynamic memory allocator. Real-time scheduling can thus be achieved with low memory consumption. In addition; it implements a mid-layer software EMIDE (Efficient Mid-layer Software for User-Friendly Application Development Environment) to decouple the WSN application from the low-level system. The application programming process can consequently be simplified and the application reprogramming performance improved. Moreover; it combines both the software and the multi-core hardware techniques to conserve the energy resources; improve the node reliability; as well as achieve a new debugging method. To evaluate the performance of MIROS; it is compared with the other WSN OSes (TinyOS; Contiki; SOS; openWSN and mantisOS) from different OS concerns. The final evaluation results prove that MIROS is suitable to be used even on the tight resource-constrained WSN nodes. It can support the real-time WSN applications. Furthermore; it is energy efficient; user friendly and fault tolerant.

Fault Mitigation Schemes for Future Spaceflight Multicore Processors

NASA Technical Reports Server (NTRS)

Alexander, James W.; Clement, Bradley J.; Gostelow, Kim P.; Lai, John Y.

2012-01-01

Future planetary exploration missions demand significant advances in on-board computing capabilities over current avionics architectures based on a single-core processing element. The state-of-the-art multi-core processor provides much promise in meeting such challenges while introducing new fault tolerance problems when applied to space missions. Software-based schemes are being presented in this paper that can achieve system-level fault mitigation beyond that provided by radiation-hard-by-design (RHBD). For mission and time critical applications such as the Terrain Relative Navigation (TRN) for planetary or small body navigation, and landing, a range of fault tolerance methods can be adapted by the application. The software methods being investigated include Error Correction Code (ECC) for data packet routing between cores, virtual network routing, Triple Modular Redundancy (TMR), and Algorithm-Based Fault Tolerance (ABFT). A robust fault tolerance framework that provides fail-operational behavior under hard real-time constraints and graceful degradation will be demonstrated using TRN executing on a commercial Tilera(R) processor with simulated fault injections.
Simplex GPS and InSAR Inversion Software

NASA Technical Reports Server (NTRS)

Donnellan, Andrea; Parker, Jay W.; Lyzenga, Gregory A.; Pierce, Marlon E.

2012-01-01

Changes in the shape of the Earth's surface can be routinely measured with precisions better than centimeters. Processes below the surface often drive these changes and as a result, investigators require models with inversion methods to characterize the sources. Simplex inverts any combination of GPS (global positioning system), UAVSAR (uninhabited aerial vehicle synthetic aperture radar), and InSAR (interferometric synthetic aperture radar) data simultaneously for elastic response from fault and fluid motions. It can be used to solve for multiple faults and parameters, all of which can be specified or allowed to vary. The software can be used to study long-term tectonic motions and the faults responsible for those motions, or can be used to invert for co-seismic slip from earthquakes. Solutions involving estimation of fault motion and changes in fluid reservoirs such as magma or water are possible. Any arbitrary number of faults or parameters can be considered. Simplex specifically solves for any of location, geometry, fault slip, and expansion/contraction of a single or multiple faults. It inverts GPS and InSAR data for elastic dislocations in a half-space. Slip parameters include strike slip, dip slip, and tensile dislocations. It includes a map interface for both setting up the models and viewing the results. Results, including faults, and observed, computed, and residual displacements, are output in text format, a map interface, and can be exported to KML. The software interfaces with the QuakeTables database allowing a user to select existing fault parameters or data. Simplex can be accessed through the QuakeSim portal graphical user interface or run from a UNIX command line.
Modeling and Performance Considerations for Automated Fault Isolation in Complex Systems

NASA Technical Reports Server (NTRS)

Ferrell, Bob; Oostdyk, Rebecca

2010-01-01

The purpose of this paper is to document the modeling considerations and performance metrics that were examined in the development of a large-scale Fault Detection, Isolation and Recovery (FDIR) system. The FDIR system is envisioned to perform health management functions for both a launch vehicle and the ground systems that support the vehicle during checkout and launch countdown by using suite of complimentary software tools that alert operators to anomalies and failures in real-time. The FDIR team members developed a set of operational requirements for the models that would be used for fault isolation and worked closely with the vendor of the software tools selected for fault isolation to ensure that the software was able to meet the requirements. Once the requirements were established, example models of sufficient complexity were used to test the performance of the software. The results of the performance testing demonstrated the need for enhancements to the software in order to meet the demands of the full-scale ground and vehicle FDIR system. The paper highlights the importance of the development of operational requirements and preliminary performance testing as a strategy for identifying deficiencies in highly scalable systems and rectifying those deficiencies before they imperil the success of the project
Rover Attitude and Pointing System Simulation Testbed

NASA Technical Reports Server (NTRS)

Vanelli, Charles A.; Grinblat, Jonathan F.; Sirlin, Samuel W.; Pfister, Sam

2009-01-01

The MER (Mars Exploration Rover) Attitude and Pointing System Simulation Testbed Environment (RAPSSTER) provides a simulation platform used for the development and test of GNC (guidance, navigation, and control) flight algorithm designs for the Mars rovers, which was specifically tailored to the MERs, but has since been used in the development of rover algorithms for the Mars Science Laboratory (MSL) as well. The software provides an integrated simulation and software testbed environment for the development of Mars rover attitude and pointing flight software. It provides an environment that is able to run the MER GNC flight software directly (as opposed to running an algorithmic model of the MER GNC flight code). This improves simulation fidelity and confidence in the results. Further more, the simulation environment allows the user to single step through its execution, pausing, and restarting at will. The system also provides for the introduction of simulated faults specific to Mars rover environments that cannot be replicated in other testbed platforms, to stress test the GNC flight algorithms under examination. The software provides facilities to do these stress tests in ways that cannot be done in the real-time flight system testbeds, such as time-jumping (both forwards and backwards), and introduction of simulated actuator faults that would be difficult, expensive, and/or destructive to implement in the real-time testbeds. Actual flight-quality codes can be incorporated back into the development-test suite of GNC developers, closing the loop between the GNC developers and the flight software developers. The software provides fully automated scripting, allowing multiple tests to be run with varying parameters, without human supervision.
PDSS/IMC requirements and functional specifications

NASA Technical Reports Server (NTRS)

1983-01-01

The system (software and hardware) requirements for the Payload Development Support System (PDSS)/Image Motion Compensator (IMC) are provided. The PDSS/IMC system provides the capability for performing Image Motion Compensator Electronics (IMCE) flight software test, checkout, and verification and provides the capability for monitoring the IMC flight computer system during qualification testing for fault detection and fault isolation.
Use of controlled dynamic impacts on hierarchically structured seismically hazardous faults for seismically safe relaxation of shear stresses

NASA Astrophysics Data System (ADS)

Ruzhich, Valery V.; Psakhie, Sergey G.; Levina, Elena A.; Shilko, Evgeny V.; Grigoriev, Alexandr S.

2017-12-01

In the paper we briefly outline the experience in forecasting catastrophic earthquakes and the general problems in ensuring seismic safety. The purpose of our long-term research is the development and improvement of the methods of man-caused impacts on large-scale fault segments to safely reduce the negative effect of seismodynamic failure. Various laboratory and large-scale field experiments were carried out in the segments of tectonic faults in Baikal rift zone and in main cracks in block-structured ice cove of Lake Baikal using the developed measuring systems and special software for identification and treatment of deformation response of faulty segments to man-caused impacts. The results of the study let us to ground the necessity of development of servo-controlled technologies, which are able to provide changing the shear resistance and deformation regime of fault zone segments by applying vibrational and pulse triggering impacts. We suppose that the use of triggering impacts in highly stressed segments of active faults will promote transferring the geodynamic state of these segments from a metastable to a more stable and safe state.
SIRU development. Volume 3: Software description and program documentation

NASA Technical Reports Server (NTRS)

Oehrle, J.

1973-01-01

The development and initial evaluation of a strapdown inertial reference unit (SIRU) system are discussed. The SIRU configuration is a modular inertial subsystem with hardware and software features that achieve fault tolerant operational capabilities. The SIRU redundant hardware design is formulated about a six gyro and six accelerometer instrument module package. The six axes array provides redundant independent sensing and the symmetry enables the formulation of an optimal software redundant data processing structure with self-contained fault detection and isolation (FDI) capabilities. The basic SIRU software coding system used in the DDP-516 computer is documented.
Use of Soft Computing Technologies For Rocket Engine Control

NASA Technical Reports Server (NTRS)

Trevino, Luis C.; Olcmen, Semih; Polites, Michael

2003-01-01

The problem to be addressed in this paper is to explore how the use of Soft Computing Technologies (SCT) could be employed to further improve overall engine system reliability and performance. Specifically, this will be presented by enhancing rocket engine control and engine health management (EHM) using SCT coupled with conventional control technologies, and sound software engineering practices used in Marshall s Flight Software Group. The principle goals are to improve software management, software development time and maintenance, processor execution, fault tolerance and mitigation, and nonlinear control in power level transitions. The intent is not to discuss any shortcomings of existing engine control and EHM methodologies, but to provide alternative design choices for control, EHM, implementation, performance, and sustaining engineering. The approaches outlined in this paper will require knowledge in the fields of rocket engine propulsion, software engineering for embedded systems, and soft computing technologies (i.e., neural networks, fuzzy logic, and Bayesian belief networks), much of which is presented in this paper. The first targeted demonstration rocket engine platform is the MC-1 (formerly FASTRAC Engine) which is simulated with hardware and software in the Marshall Avionics & Software Testbed laboratory that
Development and evaluation of a Fault-Tolerant Multiprocessor (FTMP) computer. Volume 3: FTMP test and evaluation

NASA Technical Reports Server (NTRS)

Lala, J. H.; Smith, T. B., III

1983-01-01

The experimental test and evaluation of the Fault-Tolerant Multiprocessor (FTMP) is described. Major objectives of this exercise include expanding validation envelope, building confidence in the system, revealing any weaknesses in the architectural concepts and in their execution in hardware and software, and in general, stressing the hardware and software. To this end, pin-level faults were injected into one LRU of the FTMP and the FTMP response was measured in terms of fault detection, isolation, and recovery times. A total of 21,055 stuck-at-0, stuck-at-1 and invert-signal faults were injected in the CPU, memory, bus interface circuits, Bus Guardian Units, and voters and error latches. Of these, 17,418 were detected. At least 80 percent of undetected faults are estimated to be on unused pins. The multiprocessor identified all detected faults correctly and recovered successfully in each case. Total recovery time for all faults averaged a little over one second. This can be reduced to half a second by including appropriate self-tests.
Fault Tolerant Software Technology for Distributed Computer Systems

DTIC Science & Technology

1989-03-01

RAY.) &-TR-88-296 I Fin;.’ Technical Report ,r 19,39 i A28 3329 F’ULT TOLERANT SOFTWARE TECHNOLOGY FOR DISTRIBUTED COMPUTER SYSTEMS Georgia Institute...GrfisABN 34-70IiWftlI NO0. IN?3. NO IACCESSION NO. 158 21 7 11. TITLE (Incld security Cassification) FAULT TOLERANT SOFTWARE FOR DISTRIBUTED COMPUTER ...Technology for Distributed Computing Systems," a two year effort performed at Georgia Institute of Technology as part of the Clouds Project. The Clouds
Resilience Design Patterns - A Structured Approach to Resilience at Extreme Scale (version 1.1)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hukerikar, Saurabh; Engelmann, Christian

Reliability is a serious concern for future extreme-scale high-performance computing (HPC) systems. Projections based on the current generation of HPC systems and technology roadmaps suggest the prevalence of very high fault rates in future systems. The errors resulting from these faults will propagate and generate various kinds of failures, which may result in outcomes ranging from result corruptions to catastrophic application crashes. Therefore the resilience challenge for extreme-scale HPC systems requires management of various hardware and software technologies that are capable of handling a broad set of fault models at accelerated fault rates. Also, due to practical limits on powermore » consumption in HPC systems future systems are likely to embrace innovative architectures, increasing the levels of hardware and software complexities. As a result the techniques that seek to improve resilience must navigate the complex trade-off space between resilience and the overheads to power consumption and performance. While the HPC community has developed various resilience solutions, application-level techniques as well as system-based solutions, the solution space of HPC resilience techniques remains fragmented. There are no formal methods and metrics to investigate and evaluate resilience holistically in HPC systems that consider impact scope, handling coverage, and performance & power efficiency across the system stack. Additionally, few of the current approaches are portable to newer architectures and software environments that will be deployed on future systems. In this document, we develop a structured approach to the management of HPC resilience using the concept of resilience-based design patterns. A design pattern is a general repeatable solution to a commonly occurring problem. We identify the commonly occurring problems and solutions used to deal with faults, errors and failures in HPC systems. Each established solution is described in the form of a pattern that addresses concrete problems in the design of resilient systems. The complete catalog of resilience design patterns provides designers with reusable design elements. We also define a framework that enhances a designer's understanding of the important constraints and opportunities for the design patterns to be implemented and deployed at various layers of the system stack. This design framework may be used to establish mechanisms and interfaces to coordinate flexible fault management across hardware and software components. The framework also supports optimization of the cost-benefit trade-offs among performance, resilience, and power consumption. The overall goal of this work is to enable a systematic methodology for the design and evaluation of resilience technologies in extreme-scale HPC systems that keep scientific applications running to a correct solution in a timely and cost-efficient manner in spite of frequent faults, errors, and failures of various types.« less
Staged-Fault Testing of Distance Protection Relay Settings

NASA Astrophysics Data System (ADS)

Havelka, J.; Malarić, R.; Frlan, K.

2012-01-01

In order to analyze the operation of the protection system during induced fault testing in the Croatian power system, a simulation using the CAPE software has been performed. The CAPE software (Computer-Aided Protection Engineering) is expert software intended primarily for relay protection engineers, which calculates current and voltage values during faults in the power system, so that relay protection devices can be properly set up. Once the accuracy of the simulation model had been confirmed, a series of simulations were performed in order to obtain the optimal fault location to test the protection system. The simulation results were used to specify the test sequence definitions for the end-to-end relay testing using advanced testing equipment with GPS synchronization for secondary injection in protection schemes based on communication. The objective of the end-to-end testing was to perform field validation of the protection settings, including verification of the circuit breaker operation, telecommunication channel time and the effectiveness of the relay algorithms. Once the end-to-end secondary injection testing had been completed, the induced fault testing was performed with three-end lines loaded and in service. This paper describes and analyses the test procedure, consisting of CAPE simulations, end-to-end test with advanced secondary equipment and staged-fault test of a three-end power line in the Croatian transmission system.
Distributed asynchronous microprocessor architectures in fault tolerant integrated flight systems

NASA Technical Reports Server (NTRS)

Dunn, W. R.

1983-01-01

The paper discusses the implementation of fault tolerant digital flight control and navigation systems for rotorcraft application. It is shown that in implementing fault tolerance at the systems level using advanced LSI/VLSI technology, aircraft physical layout and flight systems requirements tend to define a system architecture of distributed, asynchronous microprocessors in which fault tolerance can be achieved locally through hardware redundancy and/or globally through application of analytical redundancy. The effects of asynchronism on the execution of dynamic flight software is discussed. It is shown that if the asynchronous microprocessors have knowledge of time, these errors can be significantly reduced through appropiate modifications of the flight software. Finally, the papear extends previous work to show that through the combined use of time referencing and stable flight algorithms, individual microprocessors can be configured to autonomously tolerate intermittent faults.
Tutorial: Advanced fault tree applications using HARP

NASA Technical Reports Server (NTRS)

Dugan, Joanne Bechta; Bavuso, Salvatore J.; Boyd, Mark A.

1993-01-01

Reliability analysis of fault tolerant computer systems for critical applications is complicated by several factors. These modeling difficulties are discussed and dynamic fault tree modeling techniques for handling them are described and demonstrated. Several advanced fault tolerant computer systems are described, and fault tree models for their analysis are presented. HARP (Hybrid Automated Reliability Predictor) is a software package developed at Duke University and NASA Langley Research Center that is capable of solving the fault tree models presented.
Experiments in fault tolerant software reliability

NASA Technical Reports Server (NTRS)

Mcallister, David F.; Vouk, Mladen A.

1989-01-01

Twenty functionally equivalent programs were built and tested in a multiversion software experiment. Following unit testing, all programs were subjected to an extensive system test. In the process sixty-one distinct faults were identified among the versions. Less than 12 percent of the faults exhibited varying degrees of positive correlation. The common-cause (or similar) faults spanned as many as 14 components. However, a majority of these faults were trivial, and easily detected by proper unit and/or system testing. Only two of the seven similar faults were difficult faults, and both were caused by specification ambiguities. One of these faults exhibited variable identical-and-wrong response span, i.e. response span which varied with the testing conditions and input data. Techniques that could have been used to avoid the faults are discussed. For example, it was determined that back-to-back testing of 2-tuples could have been used to eliminate about 90 percent of the faults. In addition, four of the seven similar faults could have been detected by using back-to-back testing of 5-tuples. It is believed that most, if not all, similar faults could have been avoided had the specifications been written using more formal notation, the unit testing phase was subject to more stringent standards and controls, and better tools for measuring the quality and adequacy of the test data (e.g. coverage) were used.
Toward a Model-Based Approach to Flight System Fault Protection

NASA Technical Reports Server (NTRS)

Day, John; Murray, Alex; Meakin, Peter

2012-01-01

Fault Protection (FP) is a distinct and separate systems engineering sub-discipline that is concerned with the off-nominal behavior of a system. Flight system fault protection is an important part of the overall flight system systems engineering effort, with its own products and processes. As with other aspects of systems engineering, the FP domain is highly amenable to expression and management in models. However, while there are standards and guidelines for performing FP related analyses, there are not standards or guidelines for formally relating the FP analyses to each other or to the system hardware and software design. As a result, the material generated for these analyses are effectively creating separate models that are only loosely-related to the system being designed. Development of approaches that enable modeling of FP concerns in the same model as the system hardware and software design enables establishment of formal relationships that has great potential for improving the efficiency, correctness, and verification of the implementation of flight system FP. This paper begins with an overview of the FP domain, and then continues with a presentation of a SysML/UML model of the FP domain and the particular analyses that it contains, by way of showing a potential model-based approach to flight system fault protection, and an exposition of the use of the FP models in FSW engineering. The analyses are small examples, inspired by current real-project examples of FP analyses.
Using recurrence plot analysis for software execution interpretation and fault detection

NASA Astrophysics Data System (ADS)

Mosdorf, M.

2015-09-01

This paper shows a method targeted at software execution interpretation and fault detection using recurrence plot analysis. In in the proposed approach recurrence plot analysis is applied to software execution trace that contains executed assembly instructions. Results of this analysis are subject to further processing with PCA (Principal Component Analysis) method that simplifies number coefficients used for software execution classification. This method was used for the analysis of five algorithms: Bubble Sort, Quick Sort, Median Filter, FIR, SHA-1. Results show that some of the collected traces could be easily assigned to particular algorithms (logs from Bubble Sort and FIR algorithms) while others are more difficult to distinguish.
Slicken 1.0: Program for calculating the orientation of shear on reactivated faults

NASA Astrophysics Data System (ADS)

Xu, Hong; Xu, Shunshan; Nieto-Samaniego, Ángel F.; Alaniz-Álvarez, Susana A.

2017-07-01

The slip vector on a fault is an important parameter in the study of the movement history of a fault and its faulting mechanism. Although there exist many graphical programs to represent the shear stress (or slickenline) orientations on faults, programs to quantitatively calculate the orientation of fault slip based on a given stress field are scarce. In consequence, we develop Slicken 1.0, a software to rapidly calculate the orientation of maximum shear stress on any fault plane. For this direct method of calculating the resolved shear stress on a planar surface, the input data are the unit vector normal to the involved plane, the unit vectors of the three principal stress axes, and the stress ratio. The advantage of this program is that the vertical or horizontal principal stresses are not necessarily required. Due to its nimble design using Java SE 8.0, it runs on most operating systems with the corresponding Java VM. The software program will be practical for geoscience students, geologists and engineers and will help resolve a deficiency in field geology, and structural and engineering geology.
Protecting Against Faults in JPL Spacecraft

NASA Technical Reports Server (NTRS)

Morgan, Paula

2007-01-01

A paper discusses techniques for protecting against faults in spacecraft designed and operated by NASA s Jet Propulsion Laboratory (JPL). The paper addresses, more specifically, fault-protection requirements and techniques common to most JPL spacecraft (in contradistinction to unique, mission specific techniques), standard practices in the implementation of these techniques, and fault-protection software architectures. Common requirements include those to protect onboard command, data-processing, and control computers; protect against loss of Earth/spacecraft radio communication; maintain safe temperatures; and recover from power overloads. The paper describes fault-protection techniques as part of a fault-management strategy that also includes functional redundancy, redundant hardware, and autonomous monitoring of (1) the operational and health statuses of spacecraft components, (2) temperatures inside and outside the spacecraft, and (3) allocation of power. The strategy also provides for preprogrammed automated responses to anomalous conditions. In addition, the software running in almost every JPL spacecraft incorporates a general-purpose "Safe Mode" response algorithm that configures the spacecraft in a lower-power state that is safe and predictable, thereby facilitating diagnosis of more complex faults by a team of human experts on Earth.
A theoretical basis for the analysis of redundant software subject to coincident errors

NASA Technical Reports Server (NTRS)

Eckhardt, D. E., Jr.; Lee, L. D.

1985-01-01

Fundamental to the development of redundant software techniques fault-tolerant software, is an understanding of the impact of multiple-joint occurrences of coincident errors. A theoretical basis for the study of redundant software is developed which provides a probabilistic framework for empirically evaluating the effectiveness of the general (N-Version) strategy when component versions are subject to coincident errors, and permits an analytical study of the effects of these errors. The basic assumptions of the model are: (1) independently designed software components are chosen in a random sample; and (2) in the user environment, the system is required to execute on a stationary input series. The intensity of coincident errors, has a central role in the model. This function describes the propensity to introduce design faults in such a way that software components fail together when executing in the user environment. The model is used to give conditions under which an N-Version system is a better strategy for reducing system failure probability than relying on a single version of software. A condition which limits the effectiveness of a fault-tolerant strategy is studied, and it is posted whether system failure probability varies monotonically with increasing N or whether an optimal choice of N exists.

Expert systems applied to fault isolation and energy storage management, phase 2

NASA Technical Reports Server (NTRS)

1987-01-01

A user's guide for the Fault Isolation and Energy Storage (FIES) II system is provided. Included are a brief discussion of the background and scope of this project, a discussion of basic and advanced operating installation and problem determination procedures for the FIES II system and information on hardware and software design and implementation. A number of appendices are provided including a detailed specification for the microprocessor software, a detailed description of the expert system rule base and a description and listings of the LISP interface software.
Software fault tolerance using data diversity

NASA Technical Reports Server (NTRS)

Knight, John C.

1991-01-01

Research on data diversity is discussed. Data diversity relies on a different form of redundancy from existing approaches to software fault tolerance and is substantially less expensive to implement. Data diversity can also be applied to software testing and greatly facilitates the automation of testing. Up to now it has been explored both theoretically and in a pilot study, and has been shown to be a promising technique. The effectiveness of data diversity as an error detection mechanism and the application of data diversity to differential equation solvers are discussed.
Improved fault tolerance for air bag release in automobiles

NASA Astrophysics Data System (ADS)

Yeshwanth Kumar, C. H.; Prudhvi Prasad, P.; Uday Shankar, M.; Shanmugasundaram, M.

2017-11-01

In order to increase the reliability of the airbag system in automobiles which in turn increase the safety of the automobile we require improved airbag release system, our project deals with Triple Modular Redundancy (TMR) Technique where we use either three Sensors interfaced with three Microcontrollers which given as input to the software voter which produces majority output which is feed to the air compressor for releasing airbag. This concept was being used, in this project we are increasing reliability and safety of the entire system.
An empirical study of flight control software reliability

NASA Technical Reports Server (NTRS)

Dunham, J. R.; Pierce, J. L.

1986-01-01

The results of a laboratory experiment in flight control software reliability are reported. The experiment tests a small sample of implementations of a pitch axis control law for a PA28 aircraft with over 14 million pitch commands with varying levels of additive input and feedback noise. The testing which uses the method of n-version programming for error detection surfaced four software faults in one implementation of the control law. The small number of detected faults precluded the conduct of the error burst analyses. The pitch axis problem provides data for use in constructing a model in the prediction of the reliability of software in systems with feedback. The study is undertaken to find means to perform reliability evaluations of flight control software.
Adaptive neural network/expert system that learns fault diagnosis for different structures

NASA Astrophysics Data System (ADS)

Simon, Solomon H.

1992-08-01

Corporations need better real-time monitoring and control systems to improve productivity by watching quality and increasing production flexibility. The innovative technology to achieve this goal is evolving in the form artificial intelligence and neural networks applied to sensor processing, fusion, and interpretation. By using these advanced Al techniques, we can leverage existing systems and add value to conventional techniques. Neural networks and knowledge-based expert systems can be combined into intelligent sensor systems which provide real-time monitoring, control, evaluation, and fault diagnosis for production systems. Neural network-based intelligent sensor systems are more reliable because they can provide continuous, non-destructive monitoring and inspection. Use of neural networks can result in sensor fusion and the ability to model highly, non-linear systems. Improved models can provide a foundation for more accurate performance parameters and predictions. We discuss a research software/hardware prototype which integrates neural networks, expert systems, and sensor technologies and which can adapt across a variety of structures to perform fault diagnosis. The flexibility and adaptability of the prototype in learning two structures is presented. Potential applications are discussed.
An autonomous fault detection, isolation, and recovery system for a 20-kHz electric power distribution test bed

NASA Technical Reports Server (NTRS)

Quinn, Todd M.; Walters, Jerry L.

1991-01-01

Future space explorations will require long term human presence in space. Space environments that provide working and living quarters for manned missions are becoming increasingly larger and more sophisticated. Monitor and control of the space environment subsystems by expert system software, which emulate human reasoning processes, could maintain the health of the subsystems and help reduce the human workload. The autonomous power expert (APEX) system was developed to emulate a human expert's reasoning processes used to diagnose fault conditions in the domain of space power distribution. APEX is a fault detection, isolation, and recovery (FDIR) system, capable of autonomous monitoring and control of the power distribution system. APEX consists of a knowledge base, a data base, an inference engine, and various support and interface software. APEX provides the user with an easy-to-use interactive interface. When a fault is detected, APEX will inform the user of the detection. The user can direct APEX to isolate the probable cause of the fault. Once a fault has been isolated, the user can ask APEX to justify its fault isolation and to recommend actions to correct the fault. APEX implementation and capabilities are discussed.
Fault Management Architectures and the Challenges of Providing Software Assurance

NASA Technical Reports Server (NTRS)

Savarino, Shirley; Fitz, Rhonda; Fesq, Lorraine; Whitman, Gerek

2015-01-01

The satellite systems Fault Management (FM) is focused on safety, the preservation of assets, and maintaining the desired functionality of the system. How FM is implemented varies among missions. Common to most is system complexity due to a need to establish a multi-dimensional structure across hardware, software and operations. This structure is necessary to identify and respond to system faults, mitigate technical risks and ensure operational continuity. These architecture, implementation and software assurance efforts increase with mission complexity. Because FM is a systems engineering discipline with a distributed implementation, providing efficient and effective verification and validation (VV) is challenging. A breakout session at the 2012 NASA Independent Verification Validation (IVV) Annual Workshop titled VV of Fault Management: Challenges and Successes exposed these issues in terms of VV for a representative set of architectures. NASA's IVV is funded by NASA's Software Assurance Research Program (SARP) in partnership with NASA's Jet Propulsion Laboratory (JPL) to extend the work performed at the Workshop session. NASA IVV will extract FM architectures across the IVV portfolio and evaluate the data set for robustness, assess visibility for validation and test, and define software assurance methods that could be applied to the various architectures and designs. This work focuses efforts on FM architectures from critical and complex projects within NASA. The identification of particular FM architectures, visibility, and associated VVIVV techniques provides a data set that can enable higher assurance that a satellite system will adequately detect and respond to adverse conditions. Ultimately, results from this activity will be incorporated into the NASA Fault Management Handbook providing dissemination across NASA, other agencies and the satellite community. This paper discusses the approach taken to perform the evaluations and preliminary findings from the research including identification of FM architectures, visibility observations, and methods utilized for VVIVV.
Fault detection and diagnosis of photovoltaic systems

NASA Astrophysics Data System (ADS)

Wu, Xing

The rapid growth of the solar industry over the past several years has expanded the significance of photovoltaic (PV) systems. One of the primary aims of research in building-integrated PV systems is to improve the performance of the system's efficiency, availability, and reliability. Although much work has been done on technological design to increase a photovoltaic module's efficiency, there is little research so far on fault diagnosis for PV systems. Faults in a PV system, if not detected, may not only reduce power generation, but also threaten the availability and reliability, effectively the "security" of the whole system. In this paper, first a circuit-based simulation baseline model of a PV system with maximum power point tracking (MPPT) is developed using MATLAB software. MATLAB is one of the most popular tools for integrating computation, visualization and programming in an easy-to-use modeling environment. Second, data collection of a PV system at variable surface temperatures and insolation levels under normal operation is acquired. The developed simulation model of PV system is then calibrated and improved by comparing modeled I-V and P-V characteristics with measured I--V and P--V characteristics to make sure the simulated curves are close to those measured values from the experiments. Finally, based on the circuit-based simulation model, a PV model of various types of faults will be developed by changing conditions or inputs in the MATLAB model, and the I--V and P--V characteristic curves, and the time-dependent voltage and current characteristics of the fault modalities will be characterized for each type of fault. These will be developed as benchmark I-V or P-V, or prototype transient curves. If a fault occurs in a PV system, polling and comparing actual measured I--V and P--V characteristic curves with both normal operational curves and these baseline fault curves will aid in fault diagnosis.
Development and evaluation of a Fault-Tolerant Multiprocessor (FTMP) computer. Volume 2: FTMP software

NASA Technical Reports Server (NTRS)

Lala, J. H.; Smith, T. B., III

1983-01-01

The software developed for the Fault-Tolerant Multiprocessor (FTMP) is described. The FTMP executive is a timer-interrupt driven dispatcher that schedules iterative tasks which run at 3.125, 12.5, and 25 Hz. Major tasks which run under the executive include system configuration control, flight control, and display. The flight control task includes autopilot and autoland functions for a jet transport aircraft. System Displays include status displays of all hardware elements (processors, memories, I/O ports, buses), failure log displays showing transient and hard faults, and an autopilot display. All software is in a higher order language (AED, an ALGOL derivative). The executive is a fully distributed general purpose executive which automatically balances the load among available processor triads. Provisions for graceful performance degradation under processing overload are an integral part of the scheduling algorithms.
Software for determining the true displacement of faults

NASA Astrophysics Data System (ADS)

Nieto-Fuentes, R.; Nieto-Samaniego, Á. F.; Xu, S.-S.; Alaniz-Álvarez, S. A.

2014-03-01

One of the most important parameters of faults is the true (or net) displacement, which is measured by restoring two originally adjacent points, called “piercing points”, to their original positions. This measurement is not typically applicable because it is rare to observe piercing points in natural outcrops. Much more common is the measurement of the apparent displacement of a marker. Methods to calculate the true displacement of faults using descriptive geometry, trigonometry or vector algebra are common in the literature, and most of them solve a specific situation from a large amount of possible combinations of the fault parameters. True displacements are not routinely calculated because it is a tedious and tiring task, despite their importance and the relatively simple methodology. We believe that the solution is to develop software capable of performing this work. In a previous publication, our research group proposed a method to calculate the true displacement of faults by solving most combinations of fault parameters using simple trigonometric equations. The purpose of this contribution is to present a computer program for calculating the true displacement of faults. The input data are the dip of the fault; the pitch angles of the markers, slickenlines and observation lines; and the marker separation. To prevent the common difficulties involved in switching between operative systems, the software is developed using the Java programing language. The computer program could be used as a tool in education and will also be useful for the calculation of the true fault displacement in geological and engineering works. The application resolves the cases with known direction of net slip, which commonly is assumed parallel to the slickenlines. This assumption is not always valid and must be used with caution, because the slickenlines are formed during a step of the incremental displacement on the fault surface, whereas the net slip is related to the finite slip.
Progress in Computational Simulation of Earthquakes

NASA Technical Reports Server (NTRS)

Donnellan, Andrea; Parker, Jay; Lyzenga, Gregory; Judd, Michele; Li, P. Peggy; Norton, Charles; Tisdale, Edwin; Granat, Robert

2006-01-01

GeoFEST(P) is a computer program written for use in the QuakeSim project, which is devoted to development and improvement of means of computational simulation of earthquakes. GeoFEST(P) models interacting earthquake fault systems from the fault-nucleation to the tectonic scale. The development of GeoFEST( P) has involved coupling of two programs: GeoFEST and the Pyramid Adaptive Mesh Refinement Library. GeoFEST is a message-passing-interface-parallel code that utilizes a finite-element technique to simulate evolution of stress, fault slip, and plastic/elastic deformation in realistic materials like those of faulted regions of the crust of the Earth. The products of such simulations are synthetic observable time-dependent surface deformations on time scales from days to decades. Pyramid Adaptive Mesh Refinement Library is a software library that facilitates the generation of computational meshes for solving physical problems. In an application of GeoFEST(P), a computational grid can be dynamically adapted as stress grows on a fault. Simulations on workstations using a few tens of thousands of stress and displacement finite elements can now be expanded to multiple millions of elements with greater than 98-percent scaled efficiency on over many hundreds of parallel processors (see figure).
The Dangers of Failure Masking in Fault-Tolerant Software: Aspects of a Recent In-Flight Upset Event

NASA Technical Reports Server (NTRS)

Johnson, C. W.; Holloway, C. M.

2007-01-01

On 1 August 2005, a Boeing Company 777-200 aircraft, operating on an international passenger flight from Australia to Malaysia, was involved in a significant upset event while flying on autopilot. The Australian Transport Safety Bureau's investigation into the event discovered that an anomaly existed in the component software hierarchy that allowed inputs from a known faulty accelerometer to be processed by the air data inertial reference unit (ADIRU) and used by the primary flight computer, autopilot and other aircraft systems. This anomaly had existed in original ADIRU software, and had not been detected in the testing and certification process for the unit. This paper describes the software aspects of the incident in detail, and suggests possible implications concerning complex, safety-critical, fault-tolerant software.
Fault-Tree Compiler

NASA Technical Reports Server (NTRS)

Butler, Ricky W.; Boerschlein, David P.

1993-01-01

Fault-Tree Compiler (FTC) program, is software tool used to calculate probability of top event in fault tree. Gates of five different types allowed in fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. High-level input language easy to understand and use. In addition, program supports hierarchical fault-tree definition feature, which simplifies tree-description process and reduces execution time. Set of programs created forming basis for reliability-analysis workstation: SURE, ASSIST, PAWS/STEM, and FTC fault-tree tool (LAR-14586). Written in PASCAL, ANSI-compliant C language, and FORTRAN 77. Other versions available upon request.
Interaction Behavior between Thrust Faulting and the National Highway No. 3 - Tianliao III bridge as Determined using Numerical Simulation

NASA Astrophysics Data System (ADS)

Li, C. H.; Wu, L. C.; Chan, P. C.; Lin, M. L.

2016-12-01

The National Highway No. 3 - Tianliao III Bridge is located in the southwestern Taiwan mudstone area and crosses the Chekualin fault. Since the bridge was opened to traffic, it has been repaired 11 times. To understand the interaction behavior between thrust faulting and the bridge, a discrete element method-based software program, PFC, was applied to conduct a numerical analysis. A 3D model for simulating the thrust faulting and bridge was established, as shown in Fig. 1. In this conceptual model, the length and width were 50 and 10 m, respectively. Part of the box bottom was moveable, simulating the displacement of the thrust fault. The overburden stratum had a height of 5 m with fault dip angles of 20° (Fig. 2). The bottom-up strata were mudstone, clay, and sand, separately. The uplift was 1 m, which was 20% of the stratum thickness. In accordance with the investigation, the position of the fault tip was set, depending on the fault zone, and the bridge deformation was observed (Fig. 3). By setting "Monitoring Balls" in the numerical model to analyzes bridge displacement, we determined that the bridge deck deflection increased as the uplift distance increased. Furthermore, the force caused by the loading of the bridge deck and fault dislocation was determined to cause a down deflection of the P1 and P2 bridge piers. Finally, the fault deflection trajectory of the P4 pier displayed the maximum displacement (Fig. 4). Similar behavior has been observed through numerical simulation as well as field monitoring data. Usage of the discrete element model (PFC3D) to simulate the deformation behavior between thrust faulting and the bridge provided feedback for the design and improved planning of the bridge.
Fault-Tree Compiler Program

NASA Technical Reports Server (NTRS)

Butler, Ricky W.; Martensen, Anna L.

1992-01-01

FTC, Fault-Tree Compiler program, is reliability-analysis software tool used to calculate probability of top event of fault tree. Five different types of gates allowed in fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. High-level input language of FTC easy to understand and use. Program supports hierarchical fault-tree-definition feature simplifying process of description of tree and reduces execution time. Solution technique implemented in FORTRAN, and user interface in Pascal. Written to run on DEC VAX computer operating under VMS operating system.
Runtime Speculative Software-Only Fault Tolerance

DTIC Science & Technology

2012-06-01

reliability of RSFT, a in-depth analysis on its window of vulnerability is also discussed and measured via simulated fault injection. The performance...propagation of faults through the entire program. For optimal performance, these techniques have to use herotic alias analysis to find the minimum set of...affect program output. No program source code or alias analysis is needed to analyze the fault propagation ahead of time. 2.3 Limitations of Existing
An implementation and performance measurement of the progressive retry technique

NASA Technical Reports Server (NTRS)

Suri, Gaurav; Huang, Yennun; Wang, Yi-Min; Fuchs, W. Kent; Kintala, Chandra

1995-01-01

This paper describes a recovery technique called progressive retry for bypassing software faults in message-passing applications. The technique is implemented as reusable modules to provide application-level software fault tolerance. The paper describes the implementation of the technique and presents results from the application of progressive retry to two telecommunications systems. the results presented show that the technique is helpful in reducing the total recovery time for message-passing applications.
Hyperswitch Communication Network Computer

NASA Technical Reports Server (NTRS)

Peterson, John C.; Chow, Edward T.; Priel, Moshe; Upchurch, Edwin T.

1993-01-01

Hyperswitch Communications Network (HCN) computer is prototype multiple-processor computer being developed. Incorporates improved version of hyperswitch communication network described in "Hyperswitch Network For Hypercube Computer" (NPO-16905). Designed to support high-level software and expansion of itself. HCN computer is message-passing, multiple-instruction/multiple-data computer offering significant advantages over older single-processor and bus-based multiple-processor computers, with respect to price/performance ratio, reliability, availability, and manufacturing. Design of HCN operating-system software provides flexible computing environment accommodating both parallel and distributed processing. Also achieves balance among following competing factors; performance in processing and communications, ease of use, and tolerance of (and recovery from) faults.
Static and Dynamic Verification of Critical Software for Space Applications

NASA Astrophysics Data System (ADS)

Moreira, F.; Maia, R.; Costa, D.; Duro, N.; Rodríguez-Dapena, P.; Hjortnaes, K.

Space technology is no longer used only for much specialised research activities or for sophisticated manned space missions. Modern society relies more and more on space technology and applications for every day activities. Worldwide telecommunications, Earth observation, navigation and remote sensing are only a few examples of space applications on which we rely daily. The European driven global navigation system Galileo and its associated applications, e.g. air traffic management, vessel and car navigation, will significantly expand the already stringent safety requirements for space based applications Apart from their usefulness and practical applications, every single piece of onboard software deployed into the space represents an enormous investment. With a long lifetime operation and being extremely difficult to maintain and upgrade, at least when comparing with "mainstream" software development, the importance of ensuring their correctness before deployment is immense. Verification &Validation techniques and technologies have a key role in ensuring that the onboard software is correct and error free, or at least free from errors that can potentially lead to catastrophic failures. Many RAMS techniques including both static criticality analysis and dynamic verification techniques have been used as a means to verify and validate critical software and to ensure its correctness. But, traditionally, these have been isolated applied. One of the main reasons is the immaturity of this field in what concerns to its application to the increasing software product(s) within space systems. This paper presents an innovative way of combining both static and dynamic techniques exploiting their synergy and complementarity for software fault removal. The methodology proposed is based on the combination of Software FMEA and FTA with Fault-injection techniques. The case study herein described is implemented with support from two tools: The SoftCare tool for the SFMEA and SFTA, and the Xception tool for fault-injection. Keywords: Verification &Validation, RAMS, Onboard software, SFMEA, STA, Fault-injection 1 This work is being performed under the project STADY Applied Static And Dynamic Verification Of Critical Software, ESA/ESTEC Contract Nr. 15751/02/NL/LvH.
Measurement of SIFT operating system overhead

NASA Technical Reports Server (NTRS)

Palumbo, D. L.; Butler, R. W.

1985-01-01

The overhead of the software implemented fault tolerance (SIFT) operating system was measured. Several versions of the operating system evolved. Each version represents different strategies employed to improve the measured performance. Three of these versions are analyzed. The internal data structures of the operating systems are discussed. The overhead of the SIFT operating system was found to be of two types: vote overhead and executive task overhead. Both types of overhead were found to be significant in all versions of the system. Improvements substantially reduced this overhead; even with these improvements, the operating system consumed well over 50% of the available processing time.

Improving Quality Using Architecture Fault Analysis with Confidence Arguments

DTIC Science & Technology

2015-03-01

CMU/SEI-2015-TR-006 | SOFTWARE ENGINEERING INSTITUTE | CARNEGIE MELLON UNIVERSITY iii List of Figures Figure 1: Architecture-Centric...Requirements Decomposition 5 Figure 2: A System and Its Interface with Its Environment 6 Figure 3: AADL Graphical Symbols 8 Figure 4: Textual AADL Example...8 Figure 5: Textual AADL Error Model Example 9 Figure 6: Potential Hazard Sources in the Feedback Control Loop [Leveson 2012] 11 Figure 7
Using certification trails to achieve software fault tolerance

NASA Technical Reports Server (NTRS)

Sullivan, Gregory F.; Masson, Gerald M.

1993-01-01

A conceptually novel and powerful technique to achieve fault tolerance in hardware and software systems is introduced. When used for software fault tolerance, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance was formalized and it was illustrated by applying it to the fundamental problem of finding a minimum spanning tree. Cases in which the second phase can be run concorrectly with the first and act as a monitor are discussed. The certification trail approach was compared to other approaches to fault tolerance. Because of space limitations we have omitted examples of our technique applied to the Huffman tree, and convex hull problems. These can be found in the full version of this paper.
Evaluating software development characteristics: Assessment of software measures in the Software Engineering Laboratory. [reliability engineering

NASA Technical Reports Server (NTRS)

Basili, V. R.

1981-01-01

Work on metrics is discussed. Factors that affect software quality are reviewed. Metrics is discussed in terms of criteria achievements, reliability, and fault tolerance. Subjective and objective metrics are distinguished. Product/process and cost/quality metrics are characterized and discussed.
Emerging technologies for V&V of ISHM software for space exploration

NASA Technical Reports Server (NTRS)

Feather, Martin S.; Markosian, Lawrence Z.

2006-01-01

Systems1,2 required to exhibit high operational reliability often rely on some form of fault protection to recognize and respond to faults, preventing faults' escalation to catastrophic failures. Integrated System Health Management (ISHM) extends the functionality of fault protection to both scale to more complex systems (and systems of systems), and to maintain capability rather than just avert catastrophe. Forms of ISHM have been utilized to good effect in the maintenance phase of systems' total lifecycles (often referred to as 'condition-based mainte-nance'), but less so in a 'fault protection' role during actual operations. One of the impediments to such use lies in the challenges of verification, validation and certification of ISHM systems themselves. This paper makes the case that state-of-the-practice V&V and certification techniques will not suffice for emerging forms of ISHM systems; however, a number of maturing software engineering assurance technologies show particular promise for addressing these ISHM V&V challenges.
OPAD-EDIFIS Real-Time Processing

NASA Technical Reports Server (NTRS)

Katsinis, Constantine

1997-01-01

The Optical Plume Anomaly Detection (OPAD) detects engine hardware degradation of flight vehicles through identification and quantification of elemental species found in the plume by analyzing the plume emission spectra in a real-time mode. Real-time performance of OPAD relies on extensive software which must report metal amounts in the plume faster than once every 0.5 sec. OPAD software previously written by NASA scientists performed most necessary functions at speeds which were far below what is needed for real-time operation. The research presented in this report improved the execution speed of the software by optimizing the code without changing the algorithms and converting it into a parallelized form which is executed in a shared-memory multiprocessor system. The resulting code was subjected to extensive timing analysis. The report also provides suggestions for further performance improvement by (1) identifying areas of algorithm optimization, (2) recommending commercially available multiprocessor architectures and operating systems to support real-time execution and (3) presenting an initial study of fault-tolerance requirements.
Specification, Synthesis, and Verification of Software-based Control Protocols for Fault-Tolerant Space Systems

DTIC Science & Technology

2016-08-16

Force Research Laboratory Space Vehicles Directorate AFRL /RVSV 3550 Aberdeen Ave, SE 11. SPONSOR/MONITOR’S REPORT Kirtland AFB, NM 87117-5776 NUMBER...Ft Belvoir, VA 22060-6218 1 cy AFRL /RVIL Kirtland AFB, NM 87117-5776 2 cys Official Record Copy AFRL /RVSV/Richard S. Erwin 1 cy... AFRL -RV-PS- AFRL -RV-PS- TR-2016-0112 TR-2016-0112 SPECIFICATION, SYNTHESIS, AND VERIFICATION OF SOFTWARE-BASED CONTROL PROTOCOLS FOR FAULT-TOLERANT
Experience report: Using formal methods for requirements analysis of critical spacecraft software

NASA Technical Reports Server (NTRS)

Lutz, Robyn R.; Ampo, Yoko

1994-01-01

Formal specification and analysis of requirements continues to gain support as a method for producing more reliable software. However, the introduction of formal methods to a large software project is difficult, due in part to the unfamiliarity of the specification languages and the lack of graphics. This paper reports results of an investigation into the effectiveness of formal methods as an aid to the requirements analysis of critical, system-level fault-protection software on a spacecraft currently under development. Our experience indicates that formal specification and analysis can enhance the accuracy of the requirements and add assurance prior to design development in this domain. The work described here is part of a larger, NASA-funded research project whose purpose is to use formal-methods techniques to improve the quality of software in space applications. The demonstration project described here is part of the effort to evaluate experimentally the effectiveness of supplementing traditional engineering approaches to requirements specification with the more rigorous specification and analysis available with formal methods.
Diagnosis diagrams for passing signals on an automatic block signaling railway section

NASA Astrophysics Data System (ADS)

Spunei, E.; Piroi, I.; Chioncel, C. P.; Piroi, F.

2018-01-01

This work presents a diagnosis method for railway traffic security installations. More specifically, the authors present a series of diagnosis charts for passing signals on a railway block equipped with an automatic block signaling installation. These charts are based on the exploitation electric schemes, and are subsequently used to develop a diagnosis software package. The thus developed software package contributes substantially to a reduction of failure detection and remedy for these types of installation faults. The use of the software package eliminates making wrong decisions in the fault detection process, decisions that may result in longer remedy times and, sometimes, to railway traffic events.
Motion-Based System Identification and Fault Detection and Isolation Technologies for Thruster Controlled Spacecraft

NASA Technical Reports Server (NTRS)

Wilson, Edward; Sutter, David W.; Berkovitz, Dustin; Betts, Bradley J.; Kong, Edmund; delMundo, Rommel; Lages, Christopher R.; Mah, Robert W.; Papasin, Richard

2003-01-01

By analyzing the motions of a thruster-controlled spacecraft, it is possible to provide on-line (1) thruster fault detection and isolation (FDI), and (2) vehicle mass- and thruster-property identification (ID). Technologies developed recently at NASA Ames have significantly improved the speed and accuracy of these ID and FDI capabilities, making them feasible for application to a broad class of spacecraft. Since these technologies use existing sensors, the improved system robustness and performance that comes with the thruster fault tolerance and system ID can be achieved through a software-only implementation. This contrasts with the added cost, mass, and hardware complexity commonly required by FDI. Originally developed in partnership with NASA - Johnson Space Center to provide thruster FDI capability for the X-38 during re-entry, these technologies are most recently being applied to the MIT SPHERES experimental spacecraft to fly on the International Space Station in 2004. The model-based FDI uses a maximum-likelihood calculation at its core, while the ID is based upon recursive least squares estimation. Flight test results from the SPHERES implementation, as flown aboard the NASA KC-1 35A 0-g simulator aircraft in November 2003 are presented.
Modelling of 3D fractured geological systems - technique and application

NASA Astrophysics Data System (ADS)

Cacace, M.; Scheck-Wenderoth, M.; Cherubini, Y.; Kaiser, B. O.; Bloecher, G.

2011-12-01

All rocks in the earth's crust are fractured to some extent. Faults and fractures are important in different scientific and industry fields comprising engineering, geotechnical and hydrogeological applications. Many petroleum, gas and geothermal and water supply reservoirs form in faulted and fractured geological systems. Additionally, faults and fractures may control the transport of chemical contaminants into and through the subsurface. Depending on their origin and orientation with respect to the recent and palaeo stress field as well as on the overall kinematics of chemical processes occurring within them, faults and fractures can act either as hydraulic conductors providing preferential pathways for fluid to flow or as barriers preventing flow across them. The main challenge in modelling processes occurring in fractured rocks is related to the way of describing the heterogeneities of such geological systems. Flow paths are controlled by the geometry of faults and their open void space. To correctly simulate these processes an adequate 3D mesh is a basic requirement. Unfortunately, the representation of realistic 3D geological environments is limited by the complexity of embedded fracture networks often resulting in oversimplified models of the natural system. A technical description of an improved method to integrate generic dipping structures (representing faults and fractures) into a 3D porous medium is out forward. The automated mesh generation algorithm is composed of various existing routines from computational geometry (e.g. 2D-3D projection, interpolation, intersection, convex hull calculation) and meshing (e.g. triangulation in 2D and tetrahedralization in 3D). All routines have been combined in an automated software framework and the robustness of the approach has been tested and verified. These techniques and methods can be applied for fractured porous media including fault systems and therefore found wide applications in different geo-energy related topics including CO2 storage in deep saline aquifers, shale gas extraction and geothermal heat recovery. The main advantage is that dipping structures can be integrated into a 3D body representing the porous media and the interaction between the discrete flow paths through and across faults and fractures and within the rock matrix can be correctly simulated. In addition the complete workflow is captured by open-source software.
Certification of computational results

NASA Technical Reports Server (NTRS)

Sullivan, Gregory F.; Wilson, Dwight S.; Masson, Gerald M.

1993-01-01

A conceptually novel and powerful technique to achieve fault detection and fault tolerance in hardware and software systems is described. When used for software fault detection, this new technique uses time and software redundancy and can be outlined as follows. In the initial phase, a program is run to solve a problem and store the result. In addition, this program leaves behind a trail of data called a certification trail. In the second phase, another program is run which solves the original problem again. This program, however, has access to the certification trail left by the first program. Because of the availability of the certification trail, the second phase can be performed by a less complex program and can execute more quickly. In the final phase, the two results are compared and if they agree the results are accepted as correct; otherwise an error is indicated. An essential aspect of this approach is that the second program must always generate either an error indication or a correct output even when the certification trail it receives from the first program is incorrect. The certification trail approach to fault tolerance is formalized and realizations of it are illustrated by considering algorithms for the following problems: convex hull, sorting, and shortest path. Cases in which the second phase can be run concurrently with the first and act as a monitor are discussed. The certification trail approach are compared to other approaches to fault tolerance.
Fault Injection Validation of a Safety-Critical TMR Sysem

NASA Astrophysics Data System (ADS)

Irrera, Ivano; Madeira, Henrique; Zentai, Andras; Hergovics, Beata

2016-08-01

Digital systems and their software are the core technology for controlling and monitoring industrial systems in practically all activity domains. Functional safety standards such as the European standard EN 50128 for railway applications define the procedures and technical requirements for the development of software for railway control and protection systems. The validation of such systems is a highly demanding task. In this paper we discuss the use of fault injection techniques, which have been used extensively in several domains, particularly in the space domain, to complement the traditional procedures to validate a SIL (Safety Integrity Level) 4 system for railway signalling, implementing a TMR (Triple Modular Redundancy) architecture. The fault injection tool is based on JTAG technology. The results of our injection campaign showed a high degree of tolerance to most of the injected faults, but several cases of unexpected behaviour have also been observed, helping understanding worst-case scenarios.
Quantitative method of medication system interface evaluation.

PubMed

Pingenot, Alleene Anne; Shanteau, James; Pingenot, James D F

2007-01-01

The objective of this study was to develop a quantitative method of evaluating the user interface for medication system software. A detailed task analysis provided a description of user goals and essential activity. A structural fault analysis was used to develop a detailed description of the system interface. Nurses experienced with use of the system under evaluation provided estimates of failure rates for each point in this simplified fault tree. Means of estimated failure rates provided quantitative data for fault analysis. Authors note that, although failures of steps in the program were frequent, participants reported numerous methods of working around these failures so that overall system failure was rare. However, frequent process failure can affect the time required for processing medications, making a system inefficient. This method of interface analysis, called Software Efficiency Evaluation and Fault Identification Method, provides quantitative information with which prototypes can be compared and problems within an interface identified.
Fault Detection and Correction for the Solar Dynamics Observatory Attitude Control System

NASA Technical Reports Server (NTRS)

Starin, Scott R.; Vess, Melissa F.; Kenney, Thomas M.; Maldonado, Manuel D.; Morgenstern, Wendy M.

2007-01-01

The Solar Dynamics Observatory is an Explorer-class mission that will launch in early 2009. The spacecraft will operate in a geosynchronous orbit, sending data 24 hours a day to a devoted ground station in White Sands, New Mexico. It will carry a suite of instruments designed to observe the Sun in multiple wavelengths at unprecedented resolution. The Atmospheric Imaging Assembly includes four telescopes with focal plane CCDs that can image the full solar disk in four different visible wavelengths. The Extreme-ultraviolet Variability Experiment will collect time-correlated data on the activity of the Sun's corona. The Helioseismic and Magnetic Imager will enable study of pressure waves moving through the body of the Sun. The attitude control system on Solar Dynamics Observatory is responsible for four main phases of activity. The physical safety of the spacecraft after separation must be guaranteed. Fine attitude determination and control must be sufficient for instrument calibration maneuvers. The mission science mode requires 2-arcsecond control according to error signals provided by guide telescopes on the Atmospheric Imaging Assembly, one of the three instruments to be carried. Lastly, accurate execution of linear and angular momentum changes to the spacecraft must be provided for momentum management and orbit maintenance. In thsp aper, single-fault tolerant fault detection and correction of the Solar Dynamics Observatory attitude control system is described. The attitude control hardware suite for the mission is catalogued, with special attention to redundancy at the hardware level. Four reaction wheels are used where any three are satisfactory. Four pairs of redundant thrusters are employed for orbit change maneuvers and momentum management. Three two-axis gyroscopes provide full redundancy for rate sensing. A digital Sun sensor and two autonomous star trackers provide two-out-of-three redundancy for fine attitude determination. The use of software to maximize chances of recovery from any hardware or software fault is detailed. A generic fault detection and correction software structure is used, allowing additions, deletions, and adjustments to fault detection and correction rules. This software structure is fed by in-line fault tests that are also able to take appropriate actions to avoid corruption of the data stream.
Fault-tolerant clock synchronization in distributed systems

NASA Technical Reports Server (NTRS)

Ramanathan, Parameswaran; Shin, Kang G.; Butler, Ricky W.

1990-01-01

Existing fault-tolerant clock synchronization algorithms are compared and contrasted. These include the following: software synchronization algorithms, such as convergence-averaging, convergence-nonaveraging, and consistency algorithms, as well as probabilistic synchronization; hardware synchronization algorithms; and hybrid synchronization. The worst-case clock skews guaranteed by representative algorithms are compared, along with other important aspects such as time, message, and cost overhead imposed by the algorithms. More recent developments such as hardware-assisted software synchronization and algorithms for synchronizing large, partially connected distributed systems are especially emphasized.
The development of an interim generalized gate logic software simulator

NASA Technical Reports Server (NTRS)

Mcgough, J. G.; Nemeroff, S.

1985-01-01

A proof-of-concept computer program called IGGLOSS (Interim Generalized Gate Logic Software Simulator) was developed and is discussed. The simulator engine was designed to perform stochastic estimation of self test coverage (fault-detection latency times) of digital computers or systems. A major attribute of the IGGLOSS is its high-speed simulation: 9.5 x 1,000,000 gates/cpu sec for nonfaulted circuits and 4.4 x 1,000,000 gates/cpu sec for faulted circuits on a VAX 11/780 host computer.
The Design of a Fault-Tolerant COTS-Based Bus Architecture

NASA Technical Reports Server (NTRS)

Chau, Savio N.; Alkalai, Leon; Burt, John B.; Tai, Ann T.

1999-01-01

In this paper, we report our experiences and findings on the design of a fault-tolerant bus architecture comprised of two COTS buses, the IEEE 1394 and the 12C. This fault-tolerant bus is the backbone system bus for the avionics architecture of the X2000 program at the Jet Propulsion Laboratory. COTS buses are attractive because of the availability of low cost commercial products. However, they are not specifically designed for highly reliable applications such as long-life deep-space missions. The X2000 design team has devised a multi-level fault tolerance approach to compensate for this shortcoming of COTS buses. First, the approach enhances the fault tolerance capabilities of the IEEE 1394 and 12 C buses by adding a layer of fault handling hardware and software. Second, algorithms are developed to enable the IEEE 1394 and the 12 C buses assist each other to isolate and recovery from faults. Third, the set of IEEE 1394 and 12 C buses is duplicated to further enhance system reliability. The X2000 design team has paid special attention to guarantee that all fault tolerance provisions will not cause the bus design to deviate from the commercial standard specifications. Otherwise, the economic attractiveness of using COTS will be diminished. The hardware and software design of the X2000 fault-tolerant bus are being implemented and flight hardware will be delivered to the ST4 and Europa Orbiter missions.
AADL and Model-based Engineering

DTIC Science & Technology

2014-10-20

and MBE Feiler, Oct 20, 2014 © 2014 Carnegie Mellon University We Rely on Software for Safe Aircraft Operation Embedded software systems ...D eveloper Compute Platform Runtime Architecture Application Software Embedded SW System Engineer Data Stream Characteristics Latency...confusion Hardware Engineer Why do system level failures still occur despite fault tolerance techniques being deployed in systems ? Embedded software
A survey of fault diagnosis technology

NASA Technical Reports Server (NTRS)

Riedesel, Joel

1989-01-01

Existing techniques and methodologies for fault diagnosis are surveyed. The techniques run the gamut from theoretical artificial intelligence work to conventional software engineering applications. They are shown to define a spectrum of implementation alternatives where tradeoffs determine their position on the spectrum. Various tradeoffs include execution time limitations and memory requirements of the algorithms as well as their effectiveness in addressing the fault diagnosis problem.
An experiment in software reliability: Additional analyses using data from automated replications

NASA Technical Reports Server (NTRS)

Dunham, Janet R.; Lauterbach, Linda A.

1988-01-01

A study undertaken to collect software error data of laboratory quality for use in the development of credible methods for predicting the reliability of software used in life-critical applications is summarized. The software error data reported were acquired through automated repetitive run testing of three independent implementations of a launch interceptor condition module of a radar tracking problem. The results are based on 100 test applications to accumulate a sufficient sample size for error rate estimation. The data collected is used to confirm the results of two Boeing studies reported in NASA-CR-165836 Software Reliability: Repetitive Run Experimentation and Modeling, and NASA-CR-172378 Software Reliability: Additional Investigations into Modeling With Replicated Experiments, respectively. That is, the results confirm the log-linear pattern of software error rates and reject the hypothesis of equal error rates per individual fault. This rejection casts doubt on the assumption that the program's failure rate is a constant multiple of the number of residual bugs; an assumption which underlies some of the current models of software reliability. data raises new questions concerning the phenomenon of interacting faults.

Software Health Management with Bayesian Networks

NASA Technical Reports Server (NTRS)

Mengshoel, Ole; Schumann, JOhann

2011-01-01

Most modern aircraft as well as other complex machinery is equipped with diagnostics systems for its major subsystems. During operation, sensors provide important information about the subsystem (e.g., the engine) and that information is used to detect and diagnose faults. Most of these systems focus on the monitoring of a mechanical, hydraulic, or electromechanical subsystem of the vehicle or machinery. Only recently, health management systems that monitor software have been developed. In this paper, we will discuss our approach of using Bayesian networks for Software Health Management (SWHM). We will discuss SWHM requirements, which make advanced reasoning capabilities for the detection and diagnosis important. Then we will present our approach to using Bayesian networks for the construction of health models that dynamically monitor a software system and is capable of detecting and diagnosing faults.
Numerical simulations of earthquakes and the dynamics of fault systems using the Finite Element method.

NASA Astrophysics Data System (ADS)

Kettle, L. M.; Mora, P.; Weatherley, D.; Gross, L.; Xing, H.

2006-12-01

Simulations using the Finite Element method are widely used in many engineering applications and for the solution of partial differential equations (PDEs). Computational models based on the solution of PDEs play a key role in earth systems simulations. We present numerical modelling of crustal fault systems where the dynamic elastic wave equation is solved using the Finite Element method. This is achieved using a high level computational modelling language, escript, available as open source software from ACcESS (Australian Computational Earth Systems Simulator), the University of Queensland. Escript is an advanced geophysical simulation software package developed at ACcESS which includes parallel equation solvers, data visualisation and data analysis software. The escript library was implemented to develop a flexible Finite Element model which reliably simulates the mechanism of faulting and the physics of earthquakes. Both 2D and 3D elastodynamic models are being developed to study the dynamics of crustal fault systems. Our final goal is to build a flexible model which can be applied to any fault system with user-defined geometry and input parameters. To study the physics of earthquake processes, two different time scales must be modelled, firstly the quasi-static loading phase which gradually increases stress in the system (~100years), and secondly the dynamic rupture process which rapidly redistributes stress in the system (~100secs). We will discuss the solution of the time-dependent elastic wave equation for an arbitrary fault system using escript. This involves prescribing the correct initial stress distribution in the system to simulate the quasi-static loading of faults to failure; determining a suitable frictional constitutive law which accurately reproduces the dynamics of the stick/slip instability at the faults; and using a robust time integration scheme. These dynamic models generate data and information that can be used for earthquake forecasting.
SIRU utilization. Volume 2: Software description and program documentation

NASA Technical Reports Server (NTRS)

Oehrle, J.; Whittredge, R.

1973-01-01

A complete description of the additional analysis, development and evaluation provided for the SIRU system as identified in the requirements for the SIRU utilization program is presented. The SIRU configuration is a modular inertial subsystem with hardware and software features that achieve fault tolerant operational capabilities. The SIRU redundant hardware design is formulated about a six gyro and six accelerometer instrument module package. The modules are mounted in this package so that their measurement input axes form a unique symmetrical pattern that corresponds to the array of perpendiculars to the faces of a regular dodecahedron. This six axes array provides redundant independent sensing and the symmetry enables the formulation of an optimal software redundant data processing structure with self-contained fault detection and isolation (FDI) capabilities. Documentation of the additional software and software modifications required to implement the utilization capabilities includes assembly listings and flow charts
A theoretical basis for the analysis of multiversion software subject to coincident errors

NASA Technical Reports Server (NTRS)

Eckhardt, D. E., Jr.; Lee, L. D.

1985-01-01

Fundamental to the development of redundant software techniques (known as fault-tolerant software) is an understanding of the impact of multiple joint occurrences of errors, referred to here as coincident errors. A theoretical basis for the study of redundant software is developed which: (1) provides a probabilistic framework for empirically evaluating the effectiveness of a general multiversion strategy when component versions are subject to coincident errors, and (2) permits an analytical study of the effects of these errors. An intensity function, called the intensity of coincident errors, has a central role in this analysis. This function describes the propensity of programmers to introduce design faults in such a way that software components fail together when executing in the application environment. A condition under which a multiversion system is a better strategy than relying on a single version is given.
Learning from examples - Generation and evaluation of decision trees for software resource analysis

NASA Technical Reports Server (NTRS)

Selby, Richard W.; Porter, Adam A.

1988-01-01

A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.
Spacecraft fault tolerance: The Magellan experience

NASA Technical Reports Server (NTRS)

Kasuda, Rick; Packard, Donna Sexton

1993-01-01

Interplanetary and earth orbiting missions are now imposing unique fault tolerant requirements upon spacecraft design. Mission success is the prime motivator for building spacecraft with fault tolerant systems. The Magellan spacecraft had many such requirements imposed upon its design. Magellan met these requirements by building redundancy into all the major subsystem components and designing the onboard hardware and software with the capability to detect a fault, isolate it to a component, and issue commands to achieve a back-up configuration. This discussion is limited to fault protection, which is the autonomous capability to respond to a fault. The Magellan fault protection design is discussed, as well as the developmental and flight experiences and a summary of the lessons learned.
Multi-Agent Diagnosis and Control of an Air Revitalization System for Life Support in Space

NASA Technical Reports Server (NTRS)

Malin, Jane T.; Kowing, Jeffrey; Nieten, Joseph; Graham, Jeffrey s.; Schreckenghost, Debra; Bonasso, Pete; Fleming, Land D.; MacMahon, Matt; Thronesbery, Carroll

2000-01-01

An architecture of interoperating agents has been developed to provide control and fault management for advanced life support systems in space. In this adjustable autonomy architecture, software agents coordinate with human agents and provide support in novel fault management situations. This architecture combines the Livingstone model-based mode identification and reconfiguration (MIR) system with the 3T architecture for autonomous flexible command and control. The MIR software agent performs model-based state identification and diagnosis. MIR identifies novel recovery configurations and the set of commands required for the recovery. The AZT procedural executive and the human operator use the diagnoses and recovery recommendations, and provide command sequencing. User interface extensions have been developed to support human monitoring of both AZT and MIR data and activities. This architecture has been demonstrated performing control and fault management for an oxygen production system for air revitalization in space. The software operates in a dynamic simulation testbed.
Transparent Ada rendezvous in a fault tolerant distributed system

NASA Technical Reports Server (NTRS)

Racine, Roger

1986-01-01

There are many problems associated with distributing an Ada program over a loosely coupled communication network. Some of these problems involve the various aspects of the distributed rendezvous. The problems addressed involve supporting the delay statement in a selective call and supporting the else clause in a selective call. Most of these difficulties are compounded by the need for an efficient communication system. The difficulties are compounded even more by considering the possibility of hardware faults occurring while the program is running. With a hardware fault tolerant computer system, it is possible to design a distribution scheme and communication software which is efficient and allows Ada semantics to be preserved. An Ada design for the communications software of one such system will be presented, including a description of the services provided in the seven layers of an International Standards Organization (ISO) Open System Interconnect (OSI) model communications system. The system capabilities (hardware and software) that allow this communication system will also be described.
Characterization of the faulted behavior of digital computers and fault tolerant systems

NASA Technical Reports Server (NTRS)

Bavuso, Salvatore J.; Miner, Paul S.

1989-01-01

A development status evaluation is presented for efforts conducted at NASA-Langley since 1977, toward the characterization of the latent fault in digital fault-tolerant systems. Attention is given to the practical, high speed, generalized gate-level logic system simulator developed, as well as to the validation methodology used for the simulator, on the basis of faultable software and hardware simulations employing a prototype MIL-STD-1750A processor. After validation, latency tests will be performed.
Computer Sciences and Data Systems, volume 1

NASA Technical Reports Server (NTRS)

1987-01-01

Topics addressed include: software engineering; university grants; institutes; concurrent processing; sparse distributed memory; distributed operating systems; intelligent data management processes; expert system for image analysis; fault tolerant software; and architecture research.
Fault Tolerant Real-Time Systems

DTIC Science & Technology

1993-09-30

The ART (Advanced Real-Time Technology) Project of Carnegie Mellon University is engaged in wide ranging research on hard real - time systems . The...including hardware and software fault tolerance using temporal redundancy and analytic redundancy to permit the construction of real - time systems whose
Calculation and use of an environment's characteristic software metric set

NASA Technical Reports Server (NTRS)

Basili, Victor R.; Selby, Richard W., Jr.

1985-01-01

Since both cost/quality and production environments differ, this study presents an approach for customizing a characteristic set of software metrics to an environment. The approach is applied in the Software Engineering Laboratory (SEL), a NASA Goddard production environment, to 49 candidate process and product metrics of 652 modules from six (51,000 to 112,000 lines) projects. For this particular environment, the method yielded the characteristic metric set (source lines, fault correction effort per executable statement, design effort, code effort, number of I/O parameters, number of versions). The uses examined for a characteristic metric set include forecasting the effort for development, modification, and fault correction of modules based on historical data.
An Investigation of Network Enterprise Risk Management Techniques to Support Military Net-Centric Operations

DTIC Science & Technology

2009-09-01

this information supports the decison - making process as it is applied to the management of risk. 2. Operational Risk Operational risk is the threat... reasonability . However, to make a software system fault tolerant, the system needs to recognize and fix a system state condition. To detect a fault, a fault...Tracking ..........................................51 C. DECISION- MAKING PROCESS................................................................51 1. Risk
Secure Embedded System Design Methodologies for Military Cryptographic Systems

DTIC Science & Technology

2016-03-31

Fault- Tree Analysis (FTA); Built-In Self-Test (BIST) Introduction Secure access-control systems restrict operations to authorized users via methods...failures in the individual software/processor elements, the question of exactly how unlikely is difficult to answer. Fault- Tree Analysis (FTA) has a...Collins of Sandia National Laboratories for years of sharing his extensive knowledge of Fail-Safe Design Assurance and Fault- Tree Analysis
Intelligent fault management for the Space Station active thermal control system

NASA Technical Reports Server (NTRS)

Hill, Tim; Faltisco, Robert M.

1992-01-01

The Thermal Advanced Automation Project (TAAP) approach and architecture is described for automating the Space Station Freedom (SSF) Active Thermal Control System (ATCS). The baseline functionally and advanced automation techniques for Fault Detection, Isolation, and Recovery (FDIR) will be compared and contrasted. Advanced automation techniques such as rule-based systems and model-based reasoning should be utilized to efficiently control, monitor, and diagnose this extremely complex physical system. TAAP is developing advanced FDIR software for use on the SSF thermal control system. The goal of TAAP is to join Knowledge-Based System (KBS) technology, using a combination of rules and model-based reasoning, with conventional monitoring and control software in order to maximize autonomy of the ATCS. TAAP's predecessor was NASA's Thermal Expert System (TEXSYS) project which was the first large real-time expert system to use both extensive rules and model-based reasoning to control and perform FDIR on a large, complex physical system. TEXSYS showed that a method is needed for safely and inexpensively testing all possible faults of the ATCS, particularly those potentially damaging to the hardware, in order to develop a fully capable FDIR system. TAAP therefore includes the development of a high-fidelity simulation of the thermal control system. The simulation provides realistic, dynamic ATCS behavior and fault insertion capability for software testing without hardware related risks or expense. In addition, thermal engineers will gain greater confidence in the KBS FDIR software than was possible prior to this kind of simulation testing. The TAAP KBS will initially be a ground-based extension of the baseline ATCS monitoring and control software and could be migrated on-board as additional computation resources are made available.
Development of an Environment for Software Reliability Model Selection

DTIC Science & Technology

1992-09-01

now is directed to other related problems such as tools for model selection, multiversion programming, and software fault tolerance modeling... multiversion programming, 7. Hlardware can be repaired by spare modules, which is not. the case for software, 2-6 N. Preventive maintenance is very important
A study of discrete control signal fault conditions in the shuttle DPS

NASA Technical Reports Server (NTRS)

Reddi, S. S.; Retter, C. T.

1976-01-01

An analysis of the effects of discrete failures on the data processing subsystem is presented. A functional description of each discrete together with a list of software modules that use this discrete are included. A qualitative description of the consequences that may ensue due to discrete failures is given followed by a probabilistic reliability analysis of the data processing subsystem. Based on the investigation conducted, recommendations were made to improve the reliability of the subsystem.
Power plant fault detection using artificial neural network

NASA Astrophysics Data System (ADS)

Thanakodi, Suresh; Nazar, Nazatul Shiema Moh; Joini, Nur Fazriana; Hidzir, Hidzrin Dayana Mohd; Awira, Mohammad Zulfikar Khairul

2018-02-01

The fault that commonly occurs in power plants is due to various factors that affect the system outage. There are many types of faults in power plants such as single line to ground fault, double line to ground fault, and line to line fault. The primary aim of this paper is to diagnose the fault in 14 buses power plants by using an Artificial Neural Network (ANN). The Multilayered Perceptron Network (MLP) that detection trained utilized the offline training methods such as Gradient Descent Backpropagation (GDBP), Levenberg-Marquardt (LM), and Bayesian Regularization (BR). The best method is used to build the Graphical User Interface (GUI). The modelling of 14 buses power plant, network training, and GUI used the MATLAB software.
Development of N-version software samples for an experiment in software fault tolerance

NASA Technical Reports Server (NTRS)

Lauterbach, L.

1987-01-01

The report documents the task planning and software development phases of an effort to obtain twenty versions of code independently designed and developed from a common specification. These versions were created for use in future experiments in software fault tolerance, in continuation of the experimental series underway at the Systems Validation Methods Branch (SVMB) at NASA Langley Research Center. The 20 versions were developed under controlled conditions at four U.S. universities, by 20 teams of two researchers each. The versions process raw data from a modified Redundant Strapped Down Inertial Measurement Unit (RSDIMU). The specifications, and over 200 questions submitted by the developers concerning the specifications, are included as appendices to this report. Design documents, and design and code walkthrough reports for each version, were also obtained in this task for use in future studies.
Logic flowgraph methodology - A tool for modeling embedded systems

NASA Technical Reports Server (NTRS)

Muthukumar, C. T.; Guarro, S. B.; Apostolakis, G. E.

1991-01-01

The logic flowgraph methodology (LFM), a method for modeling hardware in terms of its process parameters, has been extended to form an analytical tool for the analysis of integrated (hardware/software) embedded systems. In the software part of a given embedded system model, timing and the control flow among different software components are modeled by augmenting LFM with modified Petrinet structures. The objective of the use of such an augmented LFM model is to uncover possible errors and the potential for unanticipated software/hardware interactions. This is done by backtracking through the augmented LFM mode according to established procedures which allow the semiautomated construction of fault trees for any chosen state of the embedded system (top event). These fault trees, in turn, produce the possible combinations of lower-level states (events) that may lead to the top event.

Secure it now or secure it later: the benefits of addressing cyber-security from the outset

NASA Astrophysics Data System (ADS)

Olama, Mohammed M.; Nutaro, James

2013-05-01

The majority of funding for research and development (R&D) in cyber-security is focused on the end of the software lifecycle where systems have been deployed or are nearing deployment. Recruiting of cyber-security personnel is similarly focused on end-of-life expertise. By emphasizing cyber-security at these late stages, security problems are found and corrected when it is most expensive to do so, thus increasing the cost of owning and operating complex software systems. Worse, expenditures on expensive security measures often mean less money for innovative developments. These unwanted increases in cost and potential slowing of innovation are unavoidable consequences of an approach to security that finds and remediate faults after software has been implemented. We argue that software security can be improved and the total cost of a software system can be substantially reduced by an appropriate allocation of resources to the early stages of a software project. By adopting a similar allocation of R&D funds to the early stages of the software lifecycle, we propose that the costs of cyber-security can be better controlled and, consequently, the positive effects of this R&D on industry will be much more pronounced.
Analysis Impact of Distributed Generation Injection to Profile of Voltage and Short-Circuit Fault in 20 kV Distribution Network System

NASA Astrophysics Data System (ADS)

Mulyadi, Y.; Sucita, T.; Rahmawan, M. D.

2018-01-01

This study was a case study in PT. PLN (Ltd.) APJ Bandung area with the subject taken was the installation of distributed generation (DG) on 20-kV distribution channels. The purpose of this study is to find out the effect of DG to the changes in voltage profile and three-phase short circuit fault in the 20-kV distribution system with load conditions considered to be balanced. The reason for this research is to know how far DG can improve the voltage profile of the channel and to what degree DG can increase the three-phase short circuit fault on each bus. The method used in this study was comparing the simulation results of power flow and short-circuit fault using ETAP Power System software with manual calculations. The result obtained from the power current simulation before the installation of DG voltage was the drop at the end of the channel at 2.515%. Meanwhile, the three-phase short-circuit current fault before the DG installation at the beginning of the channel was 13.43 kA. After the installation of DG with injection of 50%, DG power obtained voltage drop at the end of the channel was 1.715% and the current fault at the beginning network was 14.05 kA. In addition, with injection of 90%, DG power obtained voltage drop at the end of the channel was 1.06% and the current fault at the beginning network was 14.13%.
Improved reservoir characterization of the Rose Run sandstone on the East Randolph Field, Portage County, Ohio

DOE Office of Scientific and Technical Information (OSTI.GOV)

Safley, I.E.; Thomas, J.B.

1996-09-01

The East Randolph Field, located in Randolph Township, Portage County, Ohio, produces oil and gas from the Cambrian Rose Run sandstone unit, a member of the Knox Supergroup. Field development and infill drilling opportunities illustrate the need for improved reservoir characterization of the hydrocarbon productive intervals. This reservoir study is conducted under the Department of Energy`s Reservoir Management Program with professionals from BDM-Oklahoma and Belden & Blake Corporation. Well log and core analyses were conducted to determine the reservoir distribution, the heterogeneity of the hydrocarbon producing intervals, and the effects of faulting and fracturing on well productivity. The Rose Runmore » sandstones and interbedded dolomites were subdivided into three productive intervals. Cross sections were constructed for correlation of individual layers and identification of localized faulting. The geologic data was input into GeoGraphix software for construction of structure, net pay, production, and gas- and water-oil ratio maps.« less
Implementation of an experimental fault-tolerant memory system

NASA Technical Reports Server (NTRS)

Carter, W. C.; Mccarthy, C. E.

1976-01-01

The experimental fault-tolerant memory system described in this paper has been designed to enable the modular addition of spares, to validate the theoretical fault-secure and self-testing properties of the translator/corrector, to provide a basis for experiments using the new testing and correction processes for recovery, and to determine the practicality of such systems. The hardware design and implementation are described, together with methods of fault insertion. The hardware/software interface, including a restricted single error correction/double error detection (SEC/DED) code, is specified. Procedures are carefully described which, (1) test for specified physical faults, (2) ensure that single error corrections are not miscorrections due to triple faults, and (3) enable recovery from double errors.
Improving fault image by determination of optimum seismic survey parameters using ray-based modeling

NASA Astrophysics Data System (ADS)

Saffarzadeh, Sadegh; Javaherian, Abdolrahim; Hasani, Hossein; Talebi, Mohammad Ali

2018-06-01

In complex structures such as faults, salt domes and reefs, specifying the survey parameters is more challenging and critical owing to the complicated wave field behavior involved in such structures. In the petroleum industry, detecting faults has become crucial for reservoir potential where faults can act as traps for hydrocarbon. In this regard, seismic survey modeling is employed to construct a model close to the real structure, and obtain very realistic synthetic seismic data. Seismic modeling software, the velocity model and parameters pre-determined by conventional methods enable a seismic survey designer to run a shot-by-shot virtual survey operation. A reliable velocity model of structures can be constructed by integrating the 2D seismic data, geological reports and the well information. The effects of various survey designs can be investigated by the analysis of illumination maps and flower plots. Also, seismic processing of the synthetic data output can describe the target image using different survey parameters. Therefore, seismic modeling is one of the most economical ways to establish and test the optimum acquisition parameters to obtain the best image when dealing with complex geological structures. The primary objective of this study is to design a proper 3D seismic survey orientation to achieve fault zone structures through ray-tracing seismic modeling. The results prove that a seismic survey designer can enhance the image of fault planes in a seismic section by utilizing the proposed modeling and processing approach.
Quantitative study of tectonic geomorphology along Haiyuan fault based on airborne LiDAR

USGS Publications Warehouse

Chen, Tao; Zhang, Pei Zhen; Liu, Jing; Li, Chuan You; Ren, Zhi Kun; Hudnut, Kenneth W.

2014-01-01

High-precision and high-resolution topography are the fundamental data for active fault research. Light detection and ranging (LiDAR) presents a new approach to build detailed digital elevation models effectively. We take the Haiyuan fault in Gansu Province as an example of how LiDAR data may be used to improve the study of active faults and the risk assessment of related hazards. In the eastern segment of the Haiyuan fault, the Shaomayin site has been comprehensively investigated in previous research because of its exemplary tectonic topographic features. Based on unprecedented LiDAR data, the horizontal and vertical coseismic offsets at the Shaomayin site are described. The measured horizontal value is about 8.6 m, and the vertical value is about 0.8 m. Using prior dating ages sampled from the same location, we estimate the horizontal slip rate as 4.0 ± 1.0 mm/a with high confidence and define that the lower bound of the vertical slip rate is 0.4 ± 0.1 mm/a since the Holocene. LiDAR data can repeat the measurements of field work on quantifying offsets of tectonic landform features quite well. The offset landforms are visualized on an office computer workstation easily, and specialized software may be used to obtain displacement quantitatively. By combining precious chronological results, the fundamental link between fault activity and large earthquakes is better recognized, as well as the potential risk for future earthquake hazards.
Cost-Sensitive Radial Basis Function Neural Network Classifier for Software Defect Prediction

PubMed Central

Venkatesan, R.

2016-01-01

Effective prediction of software modules, those that are prone to defects, will enable software developers to achieve efficient allocation of resources and to concentrate on quality assurance activities. The process of software development life cycle basically includes design, analysis, implementation, testing, and release phases. Generally, software testing is a critical task in the software development process wherein it is to save time and budget by detecting defects at the earliest and deliver a product without defects to the customers. This testing phase should be carefully operated in an effective manner to release a defect-free (bug-free) software product to the customers. In order to improve the software testing process, fault prediction methods identify the software parts that are more noted to be defect-prone. This paper proposes a prediction approach based on conventional radial basis function neural network (RBFNN) and the novel adaptive dimensional biogeography based optimization (ADBBO) model. The developed ADBBO based RBFNN model is tested with five publicly available datasets from the NASA data program repository. The computed results prove the effectiveness of the proposed ADBBO-RBFNN classifier approach with respect to the considered metrics in comparison with that of the early predictors available in the literature for the same datasets. PMID:27738649
Cost-Sensitive Radial Basis Function Neural Network Classifier for Software Defect Prediction.

PubMed

Kumudha, P; Venkatesan, R

Effective prediction of software modules, those that are prone to defects, will enable software developers to achieve efficient allocation of resources and to concentrate on quality assurance activities. The process of software development life cycle basically includes design, analysis, implementation, testing, and release phases. Generally, software testing is a critical task in the software development process wherein it is to save time and budget by detecting defects at the earliest and deliver a product without defects to the customers. This testing phase should be carefully operated in an effective manner to release a defect-free (bug-free) software product to the customers. In order to improve the software testing process, fault prediction methods identify the software parts that are more noted to be defect-prone. This paper proposes a prediction approach based on conventional radial basis function neural network (RBFNN) and the novel adaptive dimensional biogeography based optimization (ADBBO) model. The developed ADBBO based RBFNN model is tested with five publicly available datasets from the NASA data program repository. The computed results prove the effectiveness of the proposed ADBBO-RBFNN classifier approach with respect to the considered metrics in comparison with that of the early predictors available in the literature for the same datasets.
Concurrent development of fault management hardware and software in the SSM/PMAD. [Space Station Module/Power Management And Distribution

NASA Technical Reports Server (NTRS)

Freeman, Kenneth A.; Walsh, Rick; Weeks, David J.

1988-01-01

Space Station issues in fault management are discussed. The system background is described with attention given to design guidelines and power hardware. A contractually developed fault management system, FRAMES, is integrated with the energy management functions, the control switchgear, and the scheduling and operations management functions. The constraints that shaped the FRAMES system and its implementation are considered.
Integrated Environment for Development and Assurance

DTIC Science & Technology

2015-01-26

Jan 26, 2015 © 2015 Carnegie Mellon University We Rely on Software for Safe Aircraft Operation Embedded software systems introduce a new class of...eveloper Compute Platform Runtime Architecture Application Software Embedded SW System Engineer Data Stream Characteristics Latency jitter affects...Why do system level failures still occur despite fault tolerance techniques being deployed in systems ? Embedded software system as major source of
Software For Fault-Tree Diagnosis Of A System

NASA Technical Reports Server (NTRS)

Iverson, Dave; Patterson-Hine, Ann; Liao, Jack

1993-01-01

Fault Tree Diagnosis System (FTDS) computer program is automated-diagnostic-system program identifying likely causes of specified failure on basis of information represented in system-reliability mathematical models known as fault trees. Is modified implementation of failure-cause-identification phase of Narayanan's and Viswanadham's methodology for acquisition of knowledge and reasoning in analyzing failures of systems. Knowledge base of if/then rules replaced with object-oriented fault-tree representation. Enhancement yields more-efficient identification of causes of failures and enables dynamic updating of knowledge base. Written in C language, C++, and Common LISP.
Flight elements: Fault detection and fault management

NASA Technical Reports Server (NTRS)

Lum, H.; Patterson-Hine, A.; Edge, J. T.; Lawler, D.

1990-01-01

Fault management for an intelligent computational system must be developed using a top down integrated engineering approach. An approach proposed includes integrating the overall environment involving sensors and their associated data; design knowledge capture; operations; fault detection, identification, and reconfiguration; testability; causal models including digraph matrix analysis; and overall performance impacts on the hardware and software architecture. Implementation of the concept to achieve a real time intelligent fault detection and management system will be accomplished via the implementation of several objectives, which are: Development of fault tolerant/FDIR requirement and specification from a systems level which will carry through from conceptual design through implementation and mission operations; Implementation of monitoring, diagnosis, and reconfiguration at all system levels providing fault isolation and system integration; Optimize system operations to manage degraded system performance through system integration; and Lower development and operations costs through the implementation of an intelligent real time fault detection and fault management system and an information management system.
MISSION: Mission and Safety Critical Support Environment. Executive overview

NASA Technical Reports Server (NTRS)

Mckay, Charles; Atkinson, Colin

1992-01-01

For mission and safety critical systems it is necessary to: improve definition, evolution and sustenance techniques; lower development and maintenance costs; support safe, timely and affordable system modifications; and support fault tolerance and survivability. The goal of the MISSION project is to lay the foundation for a new generation of integrated systems software providing a unified infrastructure for mission and safety critical applications and systems. This will involve the definition of a common, modular target architecture and a supporting infrastructure.
TES: A modular systems approach to expert system development for real-time space applications

NASA Technical Reports Server (NTRS)

Cacace, Ralph; England, Brenda

1988-01-01

A major goal of the Space Station era is to reduce reliance on support from ground based experts. The development of software programs using expert systems technology is one means of reaching this goal without requiring crew members to become intimately familiar with the many complex spacecraft subsystems. Development of an expert systems program requires a validation of the software with actual flight hardware. By combining accurate hardware and software modelling techniques with a modular systems approach to expert systems development, the validation of these software programs can be successfully completed with minimum risk and effort. The TIMES Expert System (TES) is an application that monitors and evaluates real time data to perform fault detection and fault isolation tasks as they would otherwise be carried out by a knowledgeable designer. The development process and primary features of TES, a modular systems approach, and the lessons learned are discussed.
Fault Detection, Isolation and Recovery (FDIR) Portable Liquid Oxygen Hardware Demonstrator

NASA Technical Reports Server (NTRS)

Oostdyk, Rebecca L.; Perotti, Jose M.

2011-01-01

The Fault Detection, Isolation and Recovery (FDIR) hardware demonstration will highlight the effort being conducted by Constellation's Ground Operations (GO) to provide the Launch Control System (LCS) with system-level health management during vehicle processing and countdown activities. A proof-of-concept demonstration of the FDIR prototype established the capability of the software to provide real-time fault detection and isolation using generated Liquid Hydrogen data. The FDIR portable testbed unit (presented here) aims to enhance FDIR by providing a dynamic simulation of Constellation subsystems that feed the FDIR software live data based on Liquid Oxygen system properties. The LO2 cryogenic ground system has key properties that are analogous to the properties of an electronic circuit. The LO2 system is modeled using electrical components and an equivalent circuit is designed on a printed circuit board to simulate the live data. The portable testbed is also be equipped with data acquisition and communication hardware to relay the measurements to the FDIR application running on a PC. This portable testbed is an ideal capability to perform FDIR software testing, troubleshooting, training among others.
Advanced Diagnostic and Prognostic Testbed (ADAPT) Testability Analysis Report

NASA Technical Reports Server (NTRS)

Ossenfort, John

2008-01-01

As system designs become more complex, determining the best locations to add sensors and test points for the purpose of testing and monitoring these designs becomes more difficult. Not only must the designer take into consideration all real and potential faults of the system, he or she must also find efficient ways of detecting and isolating those faults. Because sensors and cabling take up valuable space and weight on a system, and given constraints on bandwidth and power, it is even more difficult to add sensors into these complex designs after the design has been completed. As a result, a number of software tools have been developed to assist the system designer in proper placement of these sensors during the system design phase of a project. One of the key functions provided by many of these software programs is a testability analysis of the system essentially an evaluation of how observable the system behavior is using available tests. During the design phase, testability metrics can help guide the designer in improving the inherent testability of the design. This may include adding, removing, or modifying tests; breaking up feedback loops, or changing the system to reduce fault propagation. Given a set of test requirements, the analysis can also help to verify that the system will meet those requirements. Of course, a testability analysis requires that a software model of the physical system is available. For the analysis to be most effective in guiding system design, this model should ideally be constructed in parallel with these efforts. The purpose of this paper is to present the final testability results of the Advanced Diagnostic and Prognostic Testbed (ADAPT) after the system model was completed. The tool chosen to build the model and to perform the testability analysis with is the Testability Engineering and Maintenance System Designer (TEAMS-Designer). The TEAMS toolset is intended to be a solution to span all phases of the system, from design and development through health management and maintenance. TEAMS-Designer is the model-building and testability analysis software in that suite.
Reliability and coverage analysis of non-repairable fault-tolerant memory systems

NASA Technical Reports Server (NTRS)

Cox, G. W.; Carroll, B. D.

1976-01-01

A method was developed for the construction of probabilistic state-space models for nonrepairable systems. Models were developed for several systems which achieved reliability improvement by means of error-coding, modularized sparing, massive replication and other fault-tolerant techniques. From the models developed, sets of reliability and coverage equations for the systems were developed. Comparative analyses of the systems were performed using these equation sets. In addition, the effects of varying subunit reliabilities on system reliability and coverage were described. The results of these analyses indicated that a significant gain in system reliability may be achieved by use of combinations of modularized sparing, error coding, and software error control. For sufficiently reliable system subunits, this gain may far exceed the reliability gain achieved by use of massive replication techniques, yet result in a considerable saving in system cost.
Simulation of demand-response power management in smart city

NASA Astrophysics Data System (ADS)

Kadam, Kshitija

Smart Grids manage energy efficiently through intelligent monitoring and control of all the components connected to the electrical grid. Advanced digital technology, combined with sensors and power electronics, can greatly improve transmission line efficiency. This thesis proposed a model of a deregulated grid which supplied power to diverse set of consumers and allowed them to participate in decision making process through two-way communication. The deregulated market encourages competition at the generation and distribution levels through communication with the central system operator. A software platform was developed and executed to manage the communication, as well for energy management of the overall system. It also demonstrated self-healing property of the system in case a fault occurs, resulting in an outage. The system not only recovered from the fault but managed to do so in a short time with no/minimum human involvement.
Onboard Nonlinear Engine Sensor and Component Fault Diagnosis and Isolation Scheme

NASA Technical Reports Server (NTRS)

Tang, Liang; DeCastro, Jonathan A.; Zhang, Xiaodong

2011-01-01

A method detects and isolates in-flight sensor, actuator, and component faults for advanced propulsion systems. In sharp contrast to many conventional methods, which deal with either sensor fault or component fault, but not both, this method considers sensor fault, actuator fault, and component fault under one systemic and unified framework. The proposed solution consists of two main components: a bank of real-time, nonlinear adaptive fault diagnostic estimators for residual generation, and a residual evaluation module that includes adaptive thresholds and a Transferable Belief Model (TBM)-based residual evaluation scheme. By employing a nonlinear adaptive learning architecture, the developed approach is capable of directly dealing with nonlinear engine models and nonlinear faults without the need of linearization. Software modules have been developed and evaluated with the NASA C-MAPSS engine model. Several typical engine-fault modes, including a subset of sensor/actuator/components faults, were tested with a mild transient operation scenario. The simulation results demonstrated that the algorithm was able to successfully detect and isolate all simulated faults as long as the fault magnitudes were larger than the minimum detectable/isolable sizes, and no misdiagnosis occurred
Agile deployment and code coverage testing metrics of the boot software on-board Solar Orbiter's Energetic Particle Detector

NASA Astrophysics Data System (ADS)

Parra, Pablo; da Silva, Antonio; Polo, Óscar R.; Sánchez, Sebastián

2018-02-01

In this day and age, successful embedded critical software needs agile and continuous development and testing procedures. This paper presents the overall testing and code coverage metrics obtained during the unit testing procedure carried out to verify the correctness of the boot software that will run in the Instrument Control Unit (ICU) of the Energetic Particle Detector (EPD) on-board Solar Orbiter. The ICU boot software is a critical part of the project so its verification should be addressed at an early development stage, so any test case missed in this process may affect the quality of the overall on-board software. According to the European Cooperation for Space Standardization ESA standards, testing this kind of critical software must cover 100% of the source code statement and decision paths. This leads to the complete testing of fault tolerance and recovery mechanisms that have to resolve every possible memory corruption or communication error brought about by the space environment. The introduced procedure enables fault injection from the beginning of the development process and enables to fulfill the exigent code coverage demands on the boot software.

Modeling the Proterozoic Basement's Effective Stress Field, Assessing Fault Reactivation Potential Related to Increased Fluid Pressures, and Improved 3D Structural Interpretation of Faulting within Wellington and Anson-Bates Fields, Sumner County, Kansas

NASA Astrophysics Data System (ADS)

Keast, R. T.; Lacroix, B.; Raef, A. E.; Adam, C.; Bidgoli, T. S.; Leclere, H.; Daniel, G.

2017-12-01

South-central Kansas has experienced an increase in seismic activity within the Proterozoic basement. Since 2013, United States Geological Survey (USGS) seismograph stations have recorded 3414 earthquakes. Fluid pressure increases associated with recent high-rate wastewater injection into the dolomitic Arbuckle disposal zone is the hypothesized cause of reactivation of the faulted study region's Proterozoic basement. Although the magnitude of the pressure change required for reactivation of these faults is likely low given failure equilibrium conditions in the midcontinent, heterogeneities in the basement could allow for a range of fluid pressure changes associated with injection. This research aims to quantify the fluid pressure changes responsible for fault reactivation of the Proterozoic basement. To address this issue, we use 103 focal mechanisms and 3,414 seismic events, from the USGS catalog, within an area encompassing 4,000 km2. Three major fault populations have been identified using the dense seismicity and focal mechanism datasets. Win-Tensor paleostress reconstruction software was used to identify effective stress ratios, R = (σ'1/σ'3), and stress tensors for twelve 22 km by 17 km grid squares covering the study area. One fault population strikes parallel with the Nemaha Ridge basement structure ( 030˚). Another reoccurring fault population is oriented 310˚, closely parallel to the Central Kansas Uplift, a subtle anticlinal structure subjected to repeated movement during the Paleozoic. The third population of faults is parallel to the regional maximum compressive stress oriented 265˚ as determined by previous researchers using borehole image logs and shear wave anisotropy. A 3D stress modeling Matlab script was used to analyze fault reactivation potential based on results obtained from Win-Tensor to better understand fault orientations and their susceptibility to reactivation related to pore fluid pressure increases. In addition, the orientations of these normal and strike-slip fault populations suggest the development of a transtensional basin, not yet identified.
Hardware fault insertion and instrumentation system: Mechanization and validation

NASA Technical Reports Server (NTRS)

Benson, J. W.

1987-01-01

Automated test capability for extensive low-level hardware fault insertion testing is developed. The test capability is used to calibrate fault detection coverage and associated latency times as relevant to projecting overall system reliability. Described are modifications made to the NASA Ames Reconfigurable Flight Control System (RDFCS) Facility to fully automate the total test loop involving the Draper Laboratories' Fault Injector Unit. The automated capability provided included the application of sequences of simulated low-level hardware faults, the precise measurement of fault latency times, the identification of fault symptoms, and bulk storage of test case results. A PDP-11/60 served as a test coordinator, and a PDP-11/04 as an instrumentation device. The fault injector was controlled by applications test software in the PDP-11/60, rather than by manual commands from a terminal keyboard. The time base was especially developed for this application to use a variety of signal sources in the system simulator.
RAMP: A fault tolerant distributed microcomputer structure for aircraft navigation and control

NASA Technical Reports Server (NTRS)

Dunn, W. R.

1980-01-01

RAMP consists of distributed sets of parallel computers partioned on the basis of software and packaging constraints. To minimize hardware and software complexity, the processors operate asynchronously. It was shown that through the design of asymptotically stable control laws, data errors due to the asynchronism were minimized. It was further shown that by designing control laws with this property and making minor hardware modifications to the RAMP modules, the system became inherently tolerant to intermittent faults. A laboratory version of RAMP was constructed and is described in the paper along with the experimental results.
Technical Basis for Evaluating Software-Related Common-Cause Failures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Muhlheim, Michael David; Wood, Richard

2016-04-01

The instrumentation and control (I&C) system architecture at a nuclear power plant (NPP) incorporates protections against common-cause failures (CCFs) through the use of diversity and defense-in-depth. Even for well-established analog-based I&C system designs, the potential for CCFs of multiple systems (or redundancies within a system) constitutes a credible threat to defeating the defense-in-depth provisions within the I&C system architectures. The integration of digital technologies into the I&C systems provides many advantages compared to the aging analog systems with respect to reliability, maintenance, operability, and cost effectiveness. However, maintaining the diversity and defense-in-depth for both the hardware and software within themore » digital system is challenging. In fact, the introduction of digital technologies may actually increase the potential for CCF vulnerabilities because of the introduction of undetected systematic faults. These systematic faults are defined as a “design fault located in a software component” and at a high level, are predominately the result of (1) errors in the requirement specification, (2) inadequate provisions to account for design limits (e.g., environmental stress), or (3) technical faults incorporated in the internal system (or architectural) design or implementation. Other technology-neutral CCF concerns include hardware design errors, equipment qualification deficiencies, installation or maintenance errors, instrument loop scaling and setpoint mistakes.« less
SSME digital control design characteristics

NASA Technical Reports Server (NTRS)

Mitchell, W. T.; Searle, R. F.

1985-01-01

To protect against a latent programming error (software fault) existing in an untried branch combination that would render the space shuttle out of control in a critical flight phase, the Backup Flight System (BFS) was chartered to provide a safety alternative. The BFS is designed to operate in critical flight phases (ascent and descent) by monitoring the activities of the space shuttle flight subsystems that are under control of the primary flight software (PFS) (e.g., navigation, crew interface, propulsion), then, upon manual command by the flightcrew, to assume control of the space shuttle and deliver it to a noncritical flight condition (safe orbit or touchdown). The problems associated with the selection of the PFS/BFS system architecture, the internal BFS architecture, the fault tolerant software mechanisms, and the long term BFS utility are discussed.
Advanced information processing system: Hosting of advanced guidance, navigation and control algorithms on AIPS using ASTER

NASA Technical Reports Server (NTRS)

Brenner, Richard; Lala, Jaynarayan H.; Nagle, Gail A.; Schor, Andrei; Turkovich, John

1994-01-01

This program demonstrated the integration of a number of technologies that can increase the availability and reliability of launch vehicles while lowering costs. Availability is increased with an advanced guidance algorithm that adapts trajectories in real-time. Reliability is increased with fault-tolerant computers and communication protocols. Costs are reduced by automatically generating code and documentation. This program was realized through the cooperative efforts of academia, industry, and government. The NASA-LaRC coordinated the effort, while Draper performed the integration. Georgia Institute of Technology supplied a weak Hamiltonian finite element method for optimal control problems. Martin Marietta used MATLAB to apply this method to a launch vehicle (FENOC). Draper supplied the fault-tolerant computing and software automation technology. The fault-tolerant technology includes sequential and parallel fault-tolerant processors (FTP & FTPP) and authentication protocols (AP) for communication. Fault-tolerant technology was incrementally incorporated. Development culminated with a heterogeneous network of workstations and fault-tolerant computers using AP. Draper's software automation system, ASTER, was used to specify a static guidance system based on FENOC, navigation, flight control (GN&C), models, and the interface to a user interface for mission control. ASTER generated Ada code for GN&C and C code for models. An algebraic transform engine (ATE) was developed to automatically translate MATLAB scripts into ASTER.
The embedded software life cycle - An expanded view

NASA Technical Reports Server (NTRS)

Larman, Brian T.; Loesh, Robert E.

1989-01-01

Six common issues that are encountered in the development of software for embedded computer systems are discussed from the perspective of their interrelationships with the development process and/or the system itself. Particular attention is given to concurrent hardware/software development, prototyping, the inaccessibility of the operational system, fault tolerance, the long life cycle, and inheritance. It is noted that the life cycle for embedded software must include elements beyond simply the specification and implementation of the target software.
Software safety

NASA Technical Reports Server (NTRS)

Leveson, Nancy

1987-01-01

Software safety and its relationship to other qualities are discussed. It is shown that standard reliability and fault tolerance techniques will not solve the safety problem for the present. A new attitude requires: looking at what you do NOT want software to do along with what you want it to do; and assuming things will go wrong. New procedures and changes to entire software development process are necessary: special software safety analysis techniques are needed; and design techniques, especially eliminating complexity, can be very helpful.
A Genetic Representation for Evolutionary Fault Recovery in Virtex FPGAs

NASA Technical Reports Server (NTRS)

Lohn, Jason; Larchev, Greg; DeMara, Ronald; Korsmeyer, David (Technical Monitor)

2003-01-01

Most evolutionary approaches to fault recovery in FPGAs focus on evolving alternative logic configurations as opposed to evolving the intra-cell routing. Since the majority of transistors in a typical FPGA are dedicated to interconnect, nearly 80% according to one estimate, evolutionary fault-recovery systems should benefit hy accommodating routing. In this paper, we propose an evolutionary fault-recovery system employing a genetic representation that takes into account both logic and routing configurations. Experiments were run using a software model of the Xilinx Virtex FPGA. We report that using four Virtex combinational logic blocks, we were able to evolve a 100% accurate quadrature decoder finite state machine in the presence of a stuck-at-zero fault.
Reinforcement and Drill by Microcomputer.

ERIC Educational Resources Information Center

Balajthy, Ernest

1984-01-01

Points out why drill work has a role in the language arts classroom, explores the possibilities of using a microcomputer to give children drill work, and discusses the characteristics of a good software program, along with faults found in many software programs. (FL)
Graphical workstation capability for reliability modeling

NASA Technical Reports Server (NTRS)

Bavuso, Salvatore J.; Koppen, Sandra V.; Haley, Pamela J.

1992-01-01

In addition to computational capabilities, software tools for estimating the reliability of fault-tolerant digital computer systems must also provide a means of interfacing with the user. Described here is the new graphical interface capability of the hybrid automated reliability predictor (HARP), a software package that implements advanced reliability modeling techniques. The graphics oriented (GO) module provides the user with a graphical language for modeling system failure modes through the selection of various fault-tree gates, including sequence-dependency gates, or by a Markov chain. By using this graphical input language, a fault tree becomes a convenient notation for describing a system. In accounting for any sequence dependencies, HARP converts the fault-tree notation to a complex stochastic process that is reduced to a Markov chain, which it can then solve for system reliability. The graphics capability is available for use on an IBM-compatible PC, a Sun, and a VAX workstation. The GO module is written in the C programming language and uses the graphical kernal system (GKS) standard for graphics implementation. The PC, VAX, and Sun versions of the HARP GO module are currently in beta-testing stages.
MATTS- A Step Towards Model Based Testing

NASA Astrophysics Data System (ADS)

Herpel, H.-J.; Willich, G.; Li, J.; Xie, J.; Johansen, B.; Kvinnesland, K.; Krueger, S.; Barrios, P.

2016-08-01

In this paper we describe a Model Based approach to testing of on-board software and compare it with traditional validation strategy currently applied to satellite software. The major problems that software engineering will face over at least the next two decades are increasing application complexity driven by the need for autonomy and serious application robustness. In other words, how do we actually get to declare success when trying to build applications one or two orders of magnitude more complex than today's applications. To solve the problems addressed above the software engineering process has to be improved at least for two aspects: 1) Software design and 2) Software testing. The software design process has to evolve towards model-based approaches with extensive use of code generators. Today, testing is an essential, but time and resource consuming activity in the software development process. Generating a short, but effective test suite usually requires a lot of manual work and expert knowledge. In a model-based process, among other subtasks, test construction and test execution can also be partially automated. The basic idea behind the presented study was to start from a formal model (e.g. State Machines), generate abstract test cases which are then converted to concrete executable test cases (input and expected output pairs). The generated concrete test cases were applied to an on-board software. Results were collected and evaluated wrt. applicability, cost-efficiency, effectiveness at fault finding, and scalability.
Three-dimensional geologic map of the Hayward fault, northern California: Correlation of rock unites with variations in seismicity, creep rate, and fault dip

USGS Publications Warehouse

Graymer, R.W.; Ponce, D.A.; Jachens, R.C.; Simpson, R.W.; Phelps, G.A.; Wentworth, C.M.

2005-01-01

In order to better understand mechanisms of active faults, we studied relationships between fault behavior and rock units along the Hayward fault using a three-dimensional geologic map. The three-dimensional map-constructed from hypocenters, potential field data, and surface map data-provided a geologic map of each fault surface, showing rock units on either side of the fault truncated by the fault. The two fault-surface maps were superimposed to create a rock-rock juxtaposition map. The three maps were compared with seismicity, including aseismic patches, surface creep, and fault dip along the fault, by using visuallization software to explore three-dimensional relationships. Fault behavior appears to be correlated to the fault-surface maps, but not to the rock-rock juxtaposition map, suggesting that properties of individual wall-rock units, including rock strength, play an important role in fault behavior. Although preliminary, these results suggest that any attempt to understand the detailed distribution of earthquakes or creep along a fault should include consideration of the rock types that abut the fault surface, including the incorporation of observations of physical properties of the rock bodies that intersect the fault at depth. ?? 2005 Geological Society of America.
From experiment to design -- Fault characterization and detection in parallel computer systems using computational accelerators

NASA Astrophysics Data System (ADS)

Yim, Keun Soo

This dissertation summarizes experimental validation and co-design studies conducted to optimize the fault detection capabilities and overheads in hybrid computer systems (e.g., using CPUs and Graphics Processing Units, or GPUs), and consequently to improve the scalability of parallel computer systems using computational accelerators. The experimental validation studies were conducted to help us understand the failure characteristics of CPU-GPU hybrid computer systems under various types of hardware faults. The main characterization targets were faults that are difficult to detect and/or recover from, e.g., faults that cause long latency failures (Ch. 3), faults in dynamically allocated resources (Ch. 4), faults in GPUs (Ch. 5), faults in MPI programs (Ch. 6), and microarchitecture-level faults with specific timing features (Ch. 7). The co-design studies were based on the characterization results. One of the co-designed systems has a set of source-to-source translators that customize and strategically place error detectors in the source code of target GPU programs (Ch. 5). Another co-designed system uses an extension card to learn the normal behavioral and semantic execution patterns of message-passing processes executing on CPUs, and to detect abnormal behaviors of those parallel processes (Ch. 6). The third co-designed system is a co-processor that has a set of new instructions in order to support software-implemented fault detection techniques (Ch. 7). The work described in this dissertation gains more importance because heterogeneous processors have become an essential component of state-of-the-art supercomputers. GPUs were used in three of the five fastest supercomputers that were operating in 2011. Our work included comprehensive fault characterization studies in CPU-GPU hybrid computers. In CPUs, we monitored the target systems for a long period of time after injecting faults (a temporally comprehensive experiment), and injected faults into various types of program states that included dynamically allocated memory (to be spatially comprehensive). In GPUs, we used fault injection studies to demonstrate the importance of detecting silent data corruption (SDC) errors that are mainly due to the lack of fine-grained protections and the massive use of fault-insensitive data. This dissertation also presents transparent fault tolerance frameworks and techniques that are directly applicable to hybrid computers built using only commercial off-the-shelf hardware components. This dissertation shows that by developing understanding of the failure characteristics and error propagation paths of target programs, we were able to create fault tolerance frameworks and techniques that can quickly detect and recover from hardware faults with low performance and hardware overheads.
Automatically generated acceptance test: A software reliability experiment

NASA Technical Reports Server (NTRS)

Protzel, Peter W.

1988-01-01

This study presents results of a software reliability experiment investigating the feasibility of a new error detection method. The method can be used as an acceptance test and is solely based on empirical data about the behavior of internal states of a program. The experimental design uses the existing environment of a multi-version experiment previously conducted at the NASA Langley Research Center, in which the launch interceptor problem is used as a model. This allows the controlled experimental investigation of versions with well-known single and multiple faults, and the availability of an oracle permits the determination of the error detection performance of the test. Fault interaction phenomena are observed that have an amplifying effect on the number of error occurrences. Preliminary results indicate that all faults examined so far are detected by the acceptance test. This shows promise for further investigations, and for the employment of this test method on other applications.
Impact of mineralization on carbon dioxide migration in term of critical value of fault permeability.

NASA Astrophysics Data System (ADS)

Alshammari, A.; Brantley, D.; Knapp, C. C.; Lakshmi, V.

2017-12-01

In this study, multi chemical components ((H2O, H2S) will be injected with supercritical carbon dioxide in onshore part of South Georgia Rift (SGR) Basin model. Chemical reaction expected issue between these components to produce stable mineral of carbonite rocks by the time. The 3D geological model has been extracted from petrel software and computer modelling group (CMG) package software has been used to build simulation model explain the effect of mineralization on fault permeability that control on plume migration critically between (0-0.05 m Darcy). The expected results will be correlated with single component case (CO2 only) to evaluate the importance the mineralization on CO2 plume migration in structure and stratigraphic traps and detect the variation of fault leakage in case of critical values (low permeability). The results will also, show us the ratio of every trapped phase in (SGR) basin reservoir model.
An SSME High Pressure Oxidizer Turbopump diagnostic system using G2 real-time expert system

NASA Technical Reports Server (NTRS)

Guo, Ten-Huei

1991-01-01

An expert system which diagnoses various seal leakage faults in the High Pressure Oxidizer Turbopump of the SSME was developed using G2 real-time expert system. Three major functions of the software were implemented: model-based data generation, real-time expert system reasoning, and real-time input/output communication. This system is proposed as one module of a complete diagnostic system for the SSME. Diagnosis of a fault is defined as the determination of its type, severity, and likelihood. Since fault diagnosis is often accomplished through the use of heuristic human knowledge, an expert system based approach has been adopted as a paradigm to develop this diagnostic system. To implement this approach, a software shell which can be easily programmed to emulate the human decision process, the G2 Real-Time Expert System, was selected. Lessons learned from this implementation are discussed.
An SSME high pressure oxidizer turbopump diagnostic system using G2(TM) real-time expert system

NASA Technical Reports Server (NTRS)

Guo, Ten-Huei

1991-01-01

An expert system which diagnoses various seal leakage faults in the High Pressure Oxidizer Turbopump of the SSME was developed using G2(TM) real-time expert system. Three major functions of the software were implemented: model-based data generation, real-time expert system reasoning, and real-time input/output communication. This system is proposed as one module of a complete diagnostic system for Space Shuttle Main Engine. Diagnosis of a fault is defined as the determination of its type, severity, and likelihood. Since fault diagnosis is often accomplished through the use of heuristic human knowledge, an expert system based approach was adopted as a paradigm to develop this diagnostic system. To implement this approach, a software shell which can be easily programmed to emulate the human decision process, the G2 Real-Time Expert System, was selected. Lessons learned from this implementation are discussed.
Progressive retry for software error recovery in distributed systems

NASA Technical Reports Server (NTRS)

Wang, Yi-Min; Huang, Yennun; Fuchs, W. K.

1993-01-01

In this paper, we describe a method of execution retry for bypassing software errors based on checkpointing, rollback, message reordering and replaying. We demonstrate how rollback techniques, previously developed for transient hardware failure recovery, can also be used to recover from software faults by exploiting message reordering to bypass software errors. Our approach intentionally increases the degree of nondeterminism and the scope of rollback when a previous retry fails. Examples from our experience with telecommunications software systems illustrate the benefits of the scheme.
Analysis of Seismotektonic Patterns in Sumatra Region Based on the Focal Mechanism of Earthquake Period 1976-2016

NASA Astrophysics Data System (ADS)

Indah, F. P.; Syafriani, S.; Andiyansyah, Z. S.

2018-04-01

Sumatra is in an active subduction zone between the indo-australian plate and the eurasian plate and is located at a fault along the sumatra fault so that sumatra is vulnerable to earthquakes. One of the ways to find out the cause of earthquake can be done by identifying the type of earthquake-causing faults based on earthquake of focal mechanism. The data used to identify the type of fault cause of earthquake is the earth tensor moment data which is sourced from global cmt period 1976-2016. The data used in this research using magnitude m ≥ 6 sr. This research uses gmt software (generic mapping tolls) to describe the form of fault. From the research result, it is found that the characteristics of fault field that formed in every region in sumatera island based on data processing and data of earthquake history of 1976-2016 period that the type of fault in sumatera fault is strike slip, fault type in mentawai fault is reverse fault (rising faults) and dip-slip, while the fault type in the subduction zone is dip-slip.

Design and Realization of Controllable Ultrasonic Fault Detector Automatic Verification System

NASA Astrophysics Data System (ADS)

Sun, Jing-Feng; Liu, Hui-Ying; Guo, Hui-Juan; Shu, Rong; Wei, Kai-Li

The ultrasonic flaw detection equipment with remote control interface is researched and the automatic verification system is developed. According to use extensible markup language, the building of agreement instruction set and data analysis method database in the system software realizes the controllable designing and solves the diversification of unreleased device interfaces and agreements. By using the signal generator and a fixed attenuator cascading together, a dynamic error compensation method is proposed, completes what the fixed attenuator does in traditional verification and improves the accuracy of verification results. The automatic verification system operating results confirms that the feasibility of the system hardware and software architecture design and the correctness of the analysis method, while changes the status of traditional verification process cumbersome operations, and reduces labor intensity test personnel.
Extended Testability Analysis Tool

NASA Technical Reports Server (NTRS)

Melcher, Kevin; Maul, William A.; Fulton, Christopher

2012-01-01

The Extended Testability Analysis (ETA) Tool is a software application that supports fault management (FM) by performing testability analyses on the fault propagation model of a given system. Fault management includes the prevention of faults through robust design margins and quality assurance methods, or the mitigation of system failures. Fault management requires an understanding of the system design and operation, potential failure mechanisms within the system, and the propagation of those potential failures through the system. The purpose of the ETA Tool software is to process the testability analysis results from a commercial software program called TEAMS Designer in order to provide a detailed set of diagnostic assessment reports. The ETA Tool is a command-line process with several user-selectable report output options. The ETA Tool also extends the COTS testability analysis and enables variation studies with sensor sensitivity impacts on system diagnostics and component isolation using a single testability output. The ETA Tool can also provide extended analyses from a single set of testability output files. The following analysis reports are available to the user: (1) the Detectability Report provides a breakdown of how each tested failure mode was detected, (2) the Test Utilization Report identifies all the failure modes that each test detects, (3) the Failure Mode Isolation Report demonstrates the system s ability to discriminate between failure modes, (4) the Component Isolation Report demonstrates the system s ability to discriminate between failure modes relative to the components containing the failure modes, (5) the Sensor Sensor Sensitivity Analysis Report shows the diagnostic impact due to loss of sensor information, and (6) the Effect Mapping Report identifies failure modes that result in specified system-level effects.
Modeling of a latent fault detector in a digital system

NASA Technical Reports Server (NTRS)

Nagel, P. M.

1978-01-01

Methods of modeling the detection time or latency period of a hardware fault in a digital system are proposed that explain how a computer detects faults in a computational mode. The objectives were to study how software reacts to a fault, to account for as many variables as possible affecting detection and to forecast a given program's detecting ability prior to computation. A series of experiments were conducted on a small emulated microprocessor with fault injection capability. Results indicate that the detecting capability of a program largely depends on the instruction subset used during computation and the frequency of its use and has little direct dependence on such variables as fault mode, number set, degree of branching and program length. A model is discussed which employs an analog with balls in an urn to explain the rate of which subsequent repetitions of an instruction or instruction set detect a given fault.
Comparative analysis of techniques for evaluating the effectiveness of aircraft computing systems

NASA Technical Reports Server (NTRS)

Hitt, E. F.; Bridgman, M. S.; Robinson, A. C.

1981-01-01

Performability analysis is a technique developed for evaluating the effectiveness of fault-tolerant computing systems in multiphase missions. Performability was evaluated for its accuracy, practical usefulness, and relative cost. The evaluation was performed by applying performability and the fault tree method to a set of sample problems ranging from simple to moderately complex. The problems involved as many as five outcomes, two to five mission phases, permanent faults, and some functional dependencies. Transient faults and software errors were not considered. A different analyst was responsible for each technique. Significantly more time and effort were required to learn performability analysis than the fault tree method. Performability is inherently as accurate as fault tree analysis. For the sample problems, fault trees were more practical and less time consuming to apply, while performability required less ingenuity and was more checkable. Performability offers some advantages for evaluating very complex problems.
Proceedings of the Twenty-Third Annual Software Engineering Workshop

NASA Technical Reports Server (NTRS)

1999-01-01

The Twenty-third Annual Software Engineering Workshop (SEW) provided 20 presentations designed to further the goals of the Software Engineering Laboratory (SEL) of the NASA-GSFC. The presentations were selected on their creativity. The sessions which were held on 2-3 of December 1998, centered on the SEL, Experimentation, Inspections, Fault Prediction, Verification and Validation, and Embedded Systems and Safety-Critical Systems.
An Incremental Life-cycle Assurance Strategy for Critical System Certification

DTIC Science & Technology

2014-11-04

for Safe Aircraft Operation Embedded software systems introduce a new class of problems not addressed by traditional system modeling & analysis...Platform Runtime Architecture Application Software Embedded SW System Engineer Data Stream Characteristics Latency jitter affects control behavior...do system level failures still occur despite fault tolerance techniques being deployed in systems ? Embedded software system as major source of
The environmental control and life support system advanced automation project. Phase 1: Application evaluation

NASA Technical Reports Server (NTRS)

Dewberry, Brandon S.

1990-01-01

The Environmental Control and Life Support System (ECLSS) is a Freedom Station distributed system with inherent applicability to advanced automation primarily due to the comparatively large reaction times of its subsystem processes. This allows longer contemplation times in which to form a more intelligent control strategy and to detect or prevent faults. The objective of the ECLSS Advanced Automation Project is to reduce the flight and ground manpower needed to support the initial and evolutionary ECLS system. The approach is to search out and make apparent those processes in the baseline system which are in need of more automatic control and fault detection strategies, to influence the ECLSS design by suggesting software hooks and hardware scars which will allow easy adaptation to advanced algorithms, and to develop complex software prototypes which fit into the ECLSS software architecture and will be shown in an ECLSS hardware testbed to increase the autonomy of the system. Covered here are the preliminary investigation and evaluation process, aimed at searching the ECLSS for candidate functions for automation and providing a software hooks and hardware scars analysis. This analysis shows changes needed in the baselined system for easy accommodation of knowledge-based or other complex implementations which, when integrated in flight or ground sustaining engineering architectures, will produce a more autonomous and fault tolerant Environmental Control and Life Support System.
Deep Space Network Antenna Logic Controller

NASA Technical Reports Server (NTRS)

Ahlstrom, Harlow; Morgan, Scott; Hames, Peter; Strain, Martha; Owen, Christopher; Shimizu, Kenneth; Wilson, Karen; Shaller, David; Doktomomtaz, Said; Leung, Patrick

2007-01-01

The Antenna Logic Controller (ALC) software controls and monitors the motion control equipment of the 4,000-metric-ton structure of the Deep Space Network 70-meter antenna. This program coordinates the control of 42 hydraulic pumps, while monitoring several interlocks for personnel and equipment safety. Remote operation of the ALC runs via the Antenna Monitor & Control (AMC) computer, which orchestrates the tracking functions of the entire antenna. This software provides a graphical user interface for local control, monitoring, and identification of faults as well as, at a high level, providing for the digital control of the axis brakes so that the servo of the AMC may control the motion of the antenna. Specific functions of the ALC also include routines for startup in cold weather, controlled shutdown for both normal and fault situations, and pump switching on failure. The increased monitoring, the ability to trend key performance characteristics, the improved fault detection and recovery, the centralization of all control at a single panel, and the simplification of the user interface have all reduced the required workforce to run 70-meter antennas. The ALC also increases the antenna availability by reducing the time required to start up the antenna, to diagnose faults, and by providing additional insight into the performance of key parameters that aid in preventive maintenance to avoid key element failure. The ALC User Display (AUD) is a graphical user interface with hierarchical display structure, which provides high-level status information to the operation of the ALC, as well as detailed information for virtually all aspects of the ALC via drill-down displays. The operational status of an item, be it a function or assembly, is shown in the higher-level display. By pressing the item on the display screen, a new screen opens to show more detail of the function/assembly. Navigation tools and the map button allow immediate access to all screens.
Methodology for Automated Detection of Degradation and Faults in Packaged Air Conditioners and Heat Pumps Using Only Two Sensors

DOE Office of Scientific and Technical Information (OSTI.GOV)

2016-02-10

The software was created in the process of developing a system known as the Smart Monitoring and Diagnostic System (SMDS) for packaged air conditioners and heat pumps used on commercial buildings (known as RTUs). The SMDS provides automated remote monitoring and detection of performance degradation and faults in these RTUs and could increase the awareness by building owners and maintenance providers of the condition of the equipment, the cost of operating it in degraded condition, and the quality of maintenance and repair service when it is performed. The SMDS provides these capabilities and would enable conditioned-based maintenance rather than themore » reactive and schedule-based preventive maintenance commonly used today, when maintenance of RTUs is done at all. Improved maintenance would help ensure persistent peak operating efficiencies, reducing energy consumption by an estimated 10% to 30%.« less
Software Requirements Analysis as Fault Predictor

NASA Technical Reports Server (NTRS)

Wallace, Dolores

2003-01-01

Waiting until the integration and system test phase to discover errors leads to more costly rework than resolving those same errors earlier in the lifecycle. Costs increase even more significantly once a software system has become operational. WE can assess the quality of system requirements, but do little to correlate this information either to system assurance activities or long-term reliability projections - both of which remain unclear and anecdotal. Extending earlier work on requirements accomplished by the ARM tool, measuring requirements quality information against code complexity and test data for the same system may be used to predict specific software modules containing high impact or deeply embedded faults now escaping in operational systems. Such knowledge would lead to more effective and efficient test programs. It may enable insight into whether a program should be maintained or started over.
A framework for software fault tolerance in real-time systems

NASA Technical Reports Server (NTRS)

Anderson, T.; Knight, J. C.

1983-01-01

A classification scheme for errors and a technique for the provision of software fault tolerance in cyclic real-time systems is presented. The technique requires that the process structure of a system be represented by a synchronization graph which is used by an executive as a specification of the relative times at which they will communicate during execution. Communication between concurrent processes is severely limited and may only take place between processes engaged in an exchange. A history of error occurrences is maintained by an error handler. When an error is detected, the error handler classifies it using the error history information and then initiates appropriate recovery action.
The implementation and use of Ada on distributed systems with high reliability requirements

NASA Technical Reports Server (NTRS)

Knight, J. C.

1984-01-01

The use and implementation of Ada in distributed environments in which reliability is the primary concern is investigated. Emphasis is placed on the possibility that a distributed system may be programmed entirely in ADA so that the individual tasks of the system are unconcerned with which processors they are executing on, and that failures may occur in the software or underlying hardware. The primary activities are: (1) Continued development and testing of our fault-tolerant Ada testbed; (2) consideration of desirable language changes to allow Ada to provide useful semantics for failure; (3) analysis of the inadequacies of existing software fault tolerance strategies.
Regional tectonic evaluation of the Tuscan Apenine, vulcanism, thermal anomalies and the relation to structural units

NASA Technical Reports Server (NTRS)

Bodechtel, J. (Principal Investigator)

1975-01-01

The author has identified the following significant results. The geological interpretation on data exhibiting the Italian peninsula led to the recognition of tectonic features which are explained by a clockwise rotation of various blocks along left-handed transform faults. These faults can be interpreted as resulting from shear due to main stress directed north-eastwards. A land use map of the mountainous regions of Italy was produced on a scale of 1:250,000. For the digital treatment of MSS-CCTs an image processing software was written in FORTRAN 4. The software package includes descriptive statistics and also classification algorithms.
The Development of Design Tools for Fault Tolerant Quantum Dot Cellular Automata Based Logic

NASA Technical Reports Server (NTRS)

Armstrong, Curtis D.; Humphreys, William M.

2003-01-01

We are developing software to explore the fault tolerance of quantum dot cellular automata gate architectures in the presence of manufacturing variations and device defects. The Topology Optimization Methodology using Applied Statistics (TOMAS) framework extends the capabilities of the A Quantum Interconnected Network Array Simulator (AQUINAS) by adding front-end and back-end software and creating an environment that integrates all of these components. The front-end tools establish all simulation parameters, configure the simulation system, automate the Monte Carlo generation of simulation files, and execute the simulation of these files. The back-end tools perform automated data parsing, statistical analysis and report generation.
Fuzzy logic based on-line fault detection and classification in transmission line.

PubMed

Adhikari, Shuma; Sinha, Nidul; Dorendrajit, Thingam

2016-01-01

This study presents fuzzy logic based online fault detection and classification of transmission line using Programmable Automation and Control technology based National Instrument Compact Reconfigurable i/o (CRIO) devices. The LabVIEW software combined with CRIO can perform real time data acquisition of transmission line. When fault occurs in the system current waveforms are distorted due to transients and their pattern changes according to the type of fault in the system. The three phase alternating current, zero sequence and positive sequence current data generated by LabVIEW through CRIO-9067 are processed directly for relaying. The result shows that proposed technique is capable of right tripping action and classification of type of fault at high speed therefore can be employed in practical application.
Airborne Advanced Reconfigurable Computer System (ARCS)

NASA Technical Reports Server (NTRS)

Bjurman, B. E.; Jenkins, G. M.; Masreliez, C. J.; Mcclellan, K. L.; Templeman, J. E.

1976-01-01

A digital computer subsystem fault-tolerant concept was defined, and the potential benefits and costs of such a subsystem were assessed when used as the central element of a new transport's flight control system. The derived advanced reconfigurable computer system (ARCS) is a triple-redundant computer subsystem that automatically reconfigures, under multiple fault conditions, from triplex to duplex to simplex operation, with redundancy recovery if the fault condition is transient. The study included criteria development covering factors at the aircraft's operation level that would influence the design of a fault-tolerant system for commercial airline use. A new reliability analysis tool was developed for evaluating redundant, fault-tolerant system availability and survivability; and a stringent digital system software design methodology was used to achieve design/implementation visibility.
The role of thin, mechanical discontinuities on the propagation of reverse faults: insights from analogue models

NASA Astrophysics Data System (ADS)

Bonanno, Emanuele; Bonini, Lorenzo; Basili, Roberto; Toscani, Giovanni; Seno, Silvio

2016-04-01

Fault-related folding kinematic models are widely used to explain accommodation of crustal shortening. These models, however, include simplifications, such as the assumption of constant growth rate of faults. This value sometimes is not constant in isotropic materials, and even more variable if one considers naturally anisotropic geological systems. , This means that these simplifications could lead to incorrect interpretations of the reality. In this study, we use analogue models to evaluate how thin, mechanical discontinuities, such as beddings or thin weak layers, influence the propagation of reverse faults and related folds. The experiments are performed with two different settings to simulate initially-blind master faults dipping at 30° and 45°. The 30° dip represents one of the Andersonian conjugate fault, and 45° dip is very frequent in positive reactivation of normal faults. The experimental apparatus consists of a clay layer placed above two plates: one plate, the footwall, is fixed; the other one, the hanging wall, is mobile. Motor-controlled sliding of the hanging wall plate along an inclined plane reproduces the reverse fault movement. We run thirty-six experiments: eighteen with dip of 30° and eighteen with dip of 45°. For each dip-angle setting, we initially run isotropic experiments that serve as a reference. Then, we run the other experiments with one or two discontinuities (horizontal precuts performed into the clay layer). We monitored the experiments collecting side photographs every 1.0 mm of displacement of the master fault. These images have been analyzed through PIVlab software, a tool based on the Digital Image Correlation method. With the "displacement field analysis" (one of the PIVlab tools) we evaluated, the variation of the trishear zone shape and how the master-fault tip and newly-formed faults propagate into the clay medium. With the "strain distribution analysis", we observed the amount of the on-fault and off-fault deformation with respect to the faulting pattern and evolution. Secondly, using MOVE software, we extracted the positions of fault tips and folds every 5 mm of displacement on the master fault. Analyzing these positions in all of the experiments, we found that the growth rate of the faults and the related fold shape vary depending on the number of discontinuities in the clay medium. Other results can be summarized as follows: 1) the fault growth rate is not constant, but varies especially while the new faults interacts with precuts; 2) the new faults tend to crosscut the discontinuities when the angle between them is approximately 90°; 3) the trishear zone change its shape during the experiments especially when the main fault interacts with the discontinuities.
Application of majority voting and consensus voting algorithms in N-version software

NASA Astrophysics Data System (ADS)

Tsarev, R. Yu; Durmuş, M. S.; Üstoglu, I.; Morozov, V. A.

2018-05-01

N-version programming is one of the most common techniques which is used to improve the reliability of software by building in fault tolerance, redundancy and decreasing common cause failures. N different equivalent software versions are developed by N different and isolated workgroups by considering the same software specifications. The versions solve the same task and return results that have to be compared to determine the correct result. Decisions of N different versions are evaluated by a voting algorithm or the so-called voter. In this paper, two of the most commonly used software voting algorithms such as the majority voting algorithm and the consensus voting algorithm are studied. The distinctive features of Nversion programming with majority voting and N-version programming with consensus voting are described. These two algorithms make a decision about the correct result on the base of the agreement matrix. However, if the equivalence relation on the agreement matrix is not satisfied it is impossible to make a decision. It is shown that the agreement matrix can be transformed into an appropriate form by using the Boolean compositions when the equivalence relation is satisfied.
Open Energy Information System version 2.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

OpenEIS was created to provide standard methods for authoring, sharing, testing, using, and improving algorithms for operational building energy efficiency with building managers and building owners. OpenEIS is designed as a no-cost/low-cost solution that will propagate the fault detection and diagnostic (FDD) solutions into the marketplace by providing state- of- the-art analytical and diagnostic algorithms. As OpenEIS penetrates the market, demand by control system manufacturers and integrators serving small and medium commercial customers will help push these types of commercial software tool offerings into the broader marketplace.
Model-Based Fault Diagnosis: Performing Root Cause and Impact Analyses in Real Time

NASA Technical Reports Server (NTRS)

Figueroa, Jorge F.; Walker, Mark G.; Kapadia, Ravi; Morris, Jonathan

2012-01-01

Generic, object-oriented fault models, built according to causal-directed graph theory, have been integrated into an overall software architecture dedicated to monitoring and predicting the health of mission- critical systems. Processing over the generic fault models is triggered by event detection logic that is defined according to the specific functional requirements of the system and its components. Once triggered, the fault models provide an automated way for performing both upstream root cause analysis (RCA), and for predicting downstream effects or impact analysis. The methodology has been applied to integrated system health management (ISHM) implementations at NASA SSC's Rocket Engine Test Stands (RETS).

Dataflow models for fault-tolerant control systems

NASA Technical Reports Server (NTRS)

Papadopoulos, G. M.

1984-01-01

Dataflow concepts are used to generate a unified hardware/software model of redundant physical systems which are prone to faults. Basic results in input congruence and synchronization are shown to reduce to a simple model of data exchanges between processing sites. Procedures are given for the construction of congruence schemata, the distinguishing features of any correctly designed redundant system.
Fault Injection Campaign for a Fault Tolerant Duplex Framework

NASA Technical Reports Server (NTRS)

Sacco, Gian Franco; Ferraro, Robert D.; von llmen, Paul; Rennels, Dave A.

2007-01-01

Fault tolerance is an efficient approach adopted to avoid or reduce the damage of a system failure. In this work we present the results of a fault injection campaign we conducted on the Duplex Framework (DF). The DF is a software developed by the UCLA group [1, 2] that uses a fault tolerant approach and allows to run two replicas of the same process on two different nodes of a commercial off-the-shelf (COTS) computer cluster. A third process running on a different node, constantly monitors the results computed by the two replicas, and eventually restarts the two replica processes if an inconsistency in their computation is detected. This approach is very cost efficient and can be adopted to control processes on spacecrafts where the fault rate produced by cosmic rays is not very high.
Adjustable Autonomy Testbed

NASA Technical Reports Server (NTRS)

Malin, Jane T.; Schrenkenghost, Debra K.

2001-01-01

The Adjustable Autonomy Testbed (AAT) is a simulation-based testbed located in the Intelligent Systems Laboratory in the Automation, Robotics and Simulation Division at NASA Johnson Space Center. The purpose of the testbed is to support evaluation and validation of prototypes of adjustable autonomous agent software for control and fault management for complex systems. The AA T project has developed prototype adjustable autonomous agent software and human interfaces for cooperative fault management. This software builds on current autonomous agent technology by altering the architecture, components and interfaces for effective teamwork between autonomous systems and human experts. Autonomous agents include a planner, flexible executive, low level control and deductive model-based fault isolation. Adjustable autonomy is intended to increase the flexibility and effectiveness of fault management with an autonomous system. The test domain for this work is control of advanced life support systems for habitats for planetary exploration. The CONFIG hybrid discrete event simulation environment provides flexible and dynamically reconfigurable models of the behavior of components and fluids in the life support systems. Both discrete event and continuous (discrete time) simulation are supported, and flows and pressures are computed globally. This provides fast dynamic simulations of interacting hardware systems in closed loops that can be reconfigured during operations scenarios, producing complex cascading effects of operations and failures. Current object-oriented model libraries support modeling of fluid systems, and models have been developed of physico-chemical and biological subsystems for processing advanced life support gases. In FY01, water recovery system models will be developed.
Reliable High Performance Peta- and Exa-Scale Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bronevetsky, G

2012-04-02

As supercomputers become larger and more powerful, they are growing increasingly complex. This is reflected both in the exponentially increasing numbers of components in HPC systems (LLNL is currently installing the 1.6 million core Sequoia system) as well as the wide variety of software and hardware components that a typical system includes. At this scale it becomes infeasible to make each component sufficiently reliable to prevent regular faults somewhere in the system or to account for all possible cross-component interactions. The resulting faults and instability cause HPC applications to crash, perform sub-optimally or even produce erroneous results. As supercomputers continuemore » to approach Exascale performance and full system reliability becomes prohibitively expensive, we will require novel techniques to bridge the gap between the lower reliability provided by hardware systems and users unchanging need for consistent performance and reliable results. Previous research on HPC system reliability has developed various techniques for tolerating and detecting various types of faults. However, these techniques have seen very limited real applicability because of our poor understanding of how real systems are affected by complex faults such as soft fault-induced bit flips or performance degradations. Prior work on such techniques has had very limited practical utility because it has generally focused on analyzing the behavior of entire software/hardware systems both during normal operation and in the face of faults. Because such behaviors are extremely complex, such studies have only produced coarse behavioral models of limited sets of software/hardware system stacks. Since this provides little insight into the many different system stacks and applications used in practice, this work has had little real-world impact. My project addresses this problem by developing a modular methodology to analyze the behavior of applications and systems during both normal and faulty operation. By synthesizing models of individual components into a whole-system behavior models my work is making it possible to automatically understand the behavior of arbitrary real-world systems to enable them to tolerate a wide range of system faults. My project is following a multi-pronged research strategy. Section II discusses my work on modeling the behavior of existing applications and systems. Section II.A discusses resilience in the face of soft faults and Section II.B looks at techniques to tolerate performance faults. Finally Section III presents an alternative approach that studies how a system should be designed from the ground up to make resilience natural and easy.« less
Space Station Module Power Management and Distribution System (SSM/PMAD)

NASA Technical Reports Server (NTRS)

Miller, William (Compiler); Britt, Daniel (Compiler); Elges, Michael (Compiler); Myers, Chris (Compiler)

1994-01-01

This report provides an overview of the Space Station Module Power Management and Distribution (SSM/PMAD) testbed system and describes recent enhancements to that system. Four tasks made up the original contract: (1) common module power management and distribution system automation plan definition; (2) definition of hardware and software elements of automation; (3) design, implementation and delivery of the hardware and software making up the SSM/PMAD system; and (4) definition and development of the host breadboard computer environment. Additions and/or enhancements to the SSM/PMAD test bed that have occurred since July 1990 are reported. These include: (1) rehosting the MAESTRO scheduler; (2) reorganization of the automation software internals; (3) a more robust communications package; (4) the activity editor to the MAESTRO scheduler; (5) rehosting the LPLMS to execute under KNOMAD; implementation of intermediate levels of autonomy; (6) completion of the KNOMAD knowledge management facility; (7) significant improvement of the user interface; (8) soft and incipient fault handling design; (9) intermediate levels of autonomy, and (10) switch maintenance.
Lessons Learned on Implementing Fault Detection, Isolation, and Recovery (FDIR) in a Ground Launch Environment

NASA Technical Reports Server (NTRS)

Ferrell, Bob A.; Lewis, Mark E.; Perotti, Jose M.; Brown, Barbara L.; Oostdyk, Rebecca L.; Goetz, Jesse W.

2010-01-01

This paper's main purpose is to detail issues and lessons learned regarding designing, integrating, and implementing Fault Detection Isolation and Recovery (FDIR) for Constellation Exploration Program (CxP) Ground Operations at Kennedy Space Center (KSC). Part of the0 overall implementation of National Aeronautics and Space Administration's (NASA's) CxP, FDIR is being implemented in three main components of the program (Ares, Orion, and Ground Operations/Processing). While not initially part of the design baseline for the CxP Ground Operations, NASA felt that FDIR is important enough to develop, that NASA's Exploration Systems Mission Directorate's (ESMD's) Exploration Technology Development Program (ETDP) initiated a task for it under their Integrated System Health Management (ISHM) research area. This task, referred to as the FDIIR project, is a multi-year multi-center effort. The primary purpose of the FDIR project is to develop a prototype and pathway upon which Fault Detection and Isolation (FDI) may be transitioned into the Ground Operations baseline. Currently, Qualtech Systems Inc (QSI) Commercial Off The Shelf (COTS) software products Testability Engineering and Maintenance System (TEAMS) Designer and TEAMS RDS/RT are being utilized in the implementation of FDI within the FDIR project. The TEAMS Designer COTS software product is being utilized to model the system with Functional Fault Models (FFMs). A limited set of systems in Ground Operations are being modeled by the FDIR project, and the entire Ares Launch Vehicle is being modeled under the Functional Fault Analysis (FFA) project at Marshall Space Flight Center (MSFC). Integration of the Ares FFMs and the Ground Processing FFMs is being done under the FDIR project also utilizing the TEAMS Designer COTS software product. One of the most significant challenges related to integration is to ensure that FFMs developed by different organizations can be integrated easily and without errors. Software Interface Control Documents (ICDs) for the FFMs and their usage will be addressed as the solution to this issue. In particular, the advantages and disadvantages of these ICDs across physically separate development groups will be delineated.
Software Health Management: A Short Review of Challenges and Existing Techniques

NASA Technical Reports Server (NTRS)

Pipatsrisawat, Knot; Darwiche, Adnan; Mengshoel, Ole J.; Schumann, Johann

2009-01-01

Modern spacecraft (as well as most other complex mechanisms like aircraft, automobiles, and chemical plants) rely more and more on software, to a point where software failures have caused severe accidents and loss of missions. Software failures during a manned mission can cause loss of life, so there are severe requirements to make the software as safe and reliable as possible. Typically, verification and validation (V&V) has the task of making sure that all software errors are found before the software is deployed and that it always conforms to the requirements. Experience, however, shows that this gold standard of error-free software cannot be reached in practice. Even if the software alone is free of glitches, its interoperation with the hardware (e.g., with sensors or actuators) can cause problems. Unexpected operational conditions or changes in the environment may ultimately cause a software system to fail. Is there a way to surmount this problem? In most modern aircraft and many automobiles, hardware such as central electrical, mechanical, and hydraulic components are monitored by IVHM (Integrated Vehicle Health Management) systems. These systems can recognize, isolate, and identify faults and failures, both those that already occurred as well as imminent ones. With the help of diagnostics and prognostics, appropriate mitigation strategies can be selected (replacement or repair, switch to redundant systems, etc.). In this short paper, we discuss some challenges and promising techniques for software health management (SWHM). In particular, we identify unique challenges for preventing software failure in systems which involve both software and hardware components. We then present our classifications of techniques related to SWHM. These classifications are performed based on dimensions of interest to both developers and users of the techniques, and hopefully provide a map for dealing with software faults and failures.
Rupture geometry and slip distribution of the 2016 January 21st Ms6.4 Menyuan, China earthquake inferred from Sentinel-1A InSAR measurements

NASA Astrophysics Data System (ADS)

Zhou, Y.

2016-12-01

On 21 January 2016, an Ms6.4 earthquake stroke Menyuan country, Qinghai Province, China. The epicenter of the main shock and locations of its aftershocks indicate that the Menyuan earthquake occurred near the left-lateral Lenglongling fault. However, the focal mechanism suggests that the earthquake should take place on a thrust fault. In addition, field investigation indicates that the earthquake did not rupture the ground surface. Therefore, the rupture geometry is unclear as well as coseismic slip distribution. We processed two pairs of InSAR images acquired by the ESA Sentinel-1A satellite with the ISCE software, and both ascending and descending orbits were included. After subsampling the coseismic InSAR images into about 800 pixels, coseismic displacement data along LOS direction are inverted for earthquake source parameters. We employ an improved mixed linear-nonlinear Bayesian inversion method to infer fault geometric parameters, slip distribution, and the Laplacian smoothing factor simultaneously. This method incorporates a hybrid differential evolution algorithm, which is an efficient global optimization algorithm. The inversion results show that the Menyuan earthquake ruptured a blind thrust fault with a strike of 124°and a dip angle of 41°. This blind fault was never investigated before and intersects with the left-lateral Lenglongling fault, but the strikes of them are nearly parallel. The slip sense is almost pure thrusting, and there is no significant slip within 4km depth. The max slip value is up to 0.3m, and the estimated moment magnitude is Mw5.93, in agreement with the seismic inversion result. The standard error of residuals between InSAR data and model prediction is as small as 0.5cm, verifying the correctness of the inversion results.
Rupture geometry and slip distribution of the 2016 January 21st Ms6.4 Menyuan, China earthquake

NASA Astrophysics Data System (ADS)

Zhou, Y.

2017-12-01

On 21 January 2016, an Ms6.4 earthquake stroke Menyuan country, Qinghai Province, China. The epicenter of the main shock and locations of its aftershocks indicate that the Menyuan earthquake occurred near the left-lateral Lenglongling fault. However, the focal mechanism suggests that the earthquake should take place on a thrust fault. In addition, field investigation indicates that the earthquake did not rupture the ground surface. Therefore, the rupture geometry is unclear as well as coseismic slip distribution. We processed two pairs of InSAR images acquired by the ESA Sentinel-1A satellite with the ISCE software, and both ascending and descending orbits were included. After subsampling the coseismic InSAR images into about 800 pixels, coseismic displacement data along LOS direction are inverted for earthquake source parameters. We employ an improved mixed linear-nonlinear Bayesian inversion method to infer fault geometric parameters, slip distribution, and the Laplacian smoothing factor simultaneously. This method incorporates a hybrid differential evolution algorithm, which is an efficient global optimization algorithm. The inversion results show that the Menyuan earthquake ruptured a blind thrust fault with a strike of 124°and a dip angle of 41°. This blind fault was never investigated before and intersects with the left-lateral Lenglongling fault, but the strikes of them are nearly parallel. The slip sense is almost pure thrusting, and there is no significant slip within 4km depth. The max slip value is up to 0.3m, and the estimated moment magnitude is Mw5.93, in agreement with the seismic inversion result. The standard error of residuals between InSAR data and model prediction is as small as 0.5cm, verifying the correctness of the inversion results.
Abstract for 1999 Rational Software User Conference

NASA Technical Reports Server (NTRS)

Dunphy, Julia; Rouquette, Nicolas; Feather, Martin; Tung, Yu-Wen

1999-01-01

We develop spacecraft fault-protection software at NASA/JPL. Challenges exemplified by our task: 1) high-quality systems - need for extensive validation & verification; 2) multi-disciplinary context - involves experts from diverse areas; 3) embedded systems - must adapt to external practices, notations, etc.; and 4) development pressures - NASA's mandate of "better, faster, cheaper".
Advanced microprocessor based power protection system using artificial neural network techniques

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Z.; Kalam, A.; Zayegh, A.

This paper describes an intelligent embedded microprocessor based system for fault classification in power system protection system using advanced 32-bit microprocessor technology. The paper demonstrates the development of protective relay to provide overcurrent protection schemes for fault detection. It also describes a method for power fault classification in three-phase system based on the use of neural network technology. The proposed design is implemented and tested on a single line three phase power system in power laboratory. Both the hardware and software development are described in detail.
Integrated Hardware and Software for No-Loss Computing

NASA Technical Reports Server (NTRS)

James, Mark

2007-01-01

When an algorithm is distributed across multiple threads executing on many distinct processors, a loss of one of those threads or processors can potentially result in the total loss of all the incremental results up to that point. When implementation is massively hardware distributed, then the probability of a hardware failure during the course of a long execution is potentially high. Traditionally, this problem has been addressed by establishing checkpoints where the current state of some or part of the execution is saved. Then in the event of a failure, this state information can be used to recompute that point in the execution and resume the computation from that point. A serious problem arises when one distributes a problem across multiple threads and physical processors is that one increases the likelihood of the algorithm failing due to no fault of the scientist but as a result of hardware faults coupled with operating system problems. With good reason, scientists expect their computing tools to serve them and not the other way around. What is novel here is a unique combination of hardware and software that reformulates an application into monolithic structure that can be monitored in real-time and dynamically reconfigured in the event of a failure. This unique reformulation of hardware and software will provide advanced aeronautical technologies to meet the challenges of next-generation systems in aviation, for civilian and scientific purposes, in our atmosphere and in atmospheres of other worlds. In particular, with respect to NASA s manned flight to Mars, this technology addresses the critical requirements for improving safety and increasing reliability of manned spacecraft.
A Wireless Sensor System for Real-Time Monitoring and Fault Detection of Motor Arrays

PubMed Central

Medina-García, Jonathan; Sánchez-Rodríguez, Trinidad; Galán, Juan Antonio Gómez; Delgado, Aránzazu; Gómez-Bravo, Fernando; Jiménez, Raúl

2017-01-01

This paper presents a wireless fault detection system for industrial motors that combines vibration, motor current and temperature analysis, thus improving the detection of mechanical faults. The design also considers the time of detection and further possible actions, which are also important for the early detection of possible malfunctions, and thus for avoiding irreversible damage to the motor. The remote motor condition monitoring is implemented through a wireless sensor network (WSN) based on the IEEE 802.15.4 standard. The deployed network uses the beacon-enabled mode to synchronize several sensor nodes with the coordinator node, and the guaranteed time slot mechanism provides data monitoring with a predetermined latency. A graphic user interface offers remote access to motor conditions and real-time monitoring of several parameters. The developed wireless sensor node exhibits very low power consumption since it has been optimized both in terms of hardware and software. The result is a low cost, highly reliable and compact design, achieving a high degree of autonomy of more than two years with just one 3.3 V/2600 mAh battery. Laboratory and field tests confirm the feasibility of the wireless system. PMID:28245623
A Wireless Sensor System for Real-Time Monitoring and Fault Detection of Motor Arrays.

PubMed

Medina-García, Jonathan; Sánchez-Rodríguez, Trinidad; Galán, Juan Antonio Gómez; Delgado, Aránzazu; Gómez-Bravo, Fernando; Jiménez, Raúl

2017-02-25

This paper presents a wireless fault detection system for industrial motors that combines vibration, motor current and temperature analysis, thus improving the detection of mechanical faults. The design also considers the time of detection and further possible actions, which are also important for the early detection of possible malfunctions, and thus for avoiding irreversible damage to the motor. The remote motor condition monitoring is implemented through a wireless sensor network (WSN) based on the IEEE 802.15.4 standard. The deployed network uses the beacon-enabled mode to synchronize several sensor nodes with the coordinator node, and the guaranteed time slot mechanism provides data monitoring with a predetermined latency. A graphic user interface offers remote access to motor conditions and real-time monitoring of several parameters. The developed wireless sensor node exhibits very low power consumption since it has been optimized both in terms of hardware and software. The result is a low cost, highly reliable and compact design, achieving a high degree of autonomy of more than two years with just one 3.3 V/2600 mAh battery. Laboratory and field tests confirm the feasibility of the wireless system.
(Quickly) Testing the Tester via Path Coverage

NASA Technical Reports Server (NTRS)

Groce, Alex

2009-01-01

The configuration complexity and code size of an automated testing framework may grow to a point that the tester itself becomes a significant software artifact, prone to poor configuration and implementation errors. Unfortunately, testing the tester by using old versions of the software under test (SUT) may be impractical or impossible: test framework changes may have been motivated by interface changes in the tested system, or fault detection may become too expensive in terms of computing time to justify running until errors are detected on older versions of the software. We propose the use of path coverage measures as a "quick and dirty" method for detecting many faults in complex test frameworks. We also note the possibility of using techniques developed to diversify state-space searches in model checking to diversify test focus, and an associated classification of tester changes into focus-changing and non-focus-changing modifications.
Advanced information processing system: Local system services

NASA Technical Reports Server (NTRS)

Burkhardt, Laura; Alger, Linda; Whittredge, Roy; Stasiowski, Peter

1989-01-01

The Advanced Information Processing System (AIPS) is a multi-computer architecture composed of hardware and software building blocks that can be configured to meet a broad range of application requirements. The hardware building blocks are fault-tolerant, general-purpose computers, fault-and damage-tolerant networks (both computer and input/output), and interfaces between the networks and the computers. The software building blocks are the major software functions: local system services, input/output, system services, inter-computer system services, and the system manager. The foundation of the local system services is an operating system with the functions required for a traditional real-time multi-tasking computer, such as task scheduling, inter-task communication, memory management, interrupt handling, and time maintenance. Resting on this foundation are the redundancy management functions necessary in a redundant computer and the status reporting functions required for an operator interface. The functional requirements, functional design and detailed specifications for all the local system services are documented.
A second generation experiment in fault-tolerant software

NASA Technical Reports Server (NTRS)

Knight, J. C.

1986-01-01

Information was collected on the efficacy of fault-tolerant software by conducting two large-scale controlled experiments. In the first, an empirical study of multi-version software (MVS) was conducted. The second experiment is an empirical evaluation of self testing as a method of error detection (STED). The purpose ot the MVS experiment was to obtain empirical measurement of the performance of multi-version systems. Twenty versions of a program were prepared at four different sites under reasonably realistic development conditions from the same specifications. The purpose of the STED experiment was to obtain empirical measurements of the performance of assertions in error detection. Eight versions of a program were modified to include assertions at two different sites under controlled conditions. The overall structure of the testing environment for the MVS experiment and its status are described. Work to date in the STED experiment is also presented.
An Integrated Crustal Dynamics Simulator

NASA Astrophysics Data System (ADS)

Xing, H. L.; Mora, P.

2007-12-01

Numerical modelling offers an outstanding opportunity to gain an understanding of the crustal dynamics and complex crustal system behaviour. This presentation provides our long-term and ongoing effort on finite element based computational model and software development to simulate the interacting fault system for earthquake forecasting. A R-minimum strategy based finite-element computational model and software tool, PANDAS, for modelling 3-dimensional nonlinear frictional contact behaviour between multiple deformable bodies with the arbitrarily-shaped contact element strategy has been developed by the authors, which builds up a virtual laboratory to simulate interacting fault systems including crustal boundary conditions and various nonlinearities (e.g. from frictional contact, materials, geometry and thermal coupling). It has been successfully applied to large scale computing of the complex nonlinear phenomena in the non-continuum media involving the nonlinear frictional instability, multiple material properties and complex geometries on supercomputers, such as the South Australia (SA) interacting fault system, South California fault model and Sumatra subduction model. It has been also extended and to simulate the hot fractured rock (HFR) geothermal reservoir system in collaboration of Geodynamics Ltd which is constructing the first geothermal reservoir system in Australia and to model the tsunami generation induced by earthquakes. Both are supported by Australian Research Council.
Three-Dimensional Geologic Map of the Hayward Fault Zone, San Francisco Bay Region, California

USGS Publications Warehouse

Phelps, G.A.; Graymer, R.W.; Jachens, R.C.; Ponce, D.A.; Simpson, R.W.; Wentworth, C.M.

2008-01-01

A three-dimensional (3D) geologic map of the Hayward Fault zone was created by integrating the results from geologic mapping, potential field geophysics, and seismology investigations. The map volume is 100 km long, 20 km wide, and extends to a depth of 12 km below sea level. The map volume is oriented northwest and is approximately bisected by the Hayward Fault. The complex geologic structure of the region makes it difficult to trace many geologic units into the subsurface. Therefore, the map units are generalized from 1:24,000-scale geologic maps. Descriptions of geologic units and structures are offered, along with a discussion of the methods used to map them and incorporate them into the 3D geologic map. The map spatial database and associated viewing software are provided. Elements of the map, such as individual fault surfaces, are also provided in a non-proprietary format so that the user can access the map via open-source software. The sheet accompanying this manuscript shows views taken from the 3D geologic map for the user to access. The 3D geologic map is designed as a multi-purpose resource for further geologic investigations and process modeling.
Combinatorial Optimization Algorithms for Dynamic Multiple Fault Diagnosis in Automotive and Aerospace Applications

NASA Astrophysics Data System (ADS)

Kodali, Anuradha

In this thesis, we develop dynamic multiple fault diagnosis (DMFD) algorithms to diagnose faults that are sporadic and coupled. Firstly, we formulate a coupled factorial hidden Markov model-based (CFHMM) framework to diagnose dependent faults occurring over time (dynamic case). Here, we implement a mixed memory Markov coupling model to determine the most likely sequence of (dependent) fault states, the one that best explains the observed test outcomes over time. An iterative Gauss-Seidel coordinate ascent optimization method is proposed for solving the problem. A soft Viterbi algorithm is also implemented within the framework for decoding dependent fault states over time. We demonstrate the algorithm on simulated and real-world systems with coupled faults; the results show that this approach improves the correct isolation rate as compared to the formulation where independent fault states are assumed. Secondly, we formulate a generalization of set-covering, termed dynamic set-covering (DSC), which involves a series of coupled set-covering problems over time. The objective of the DSC problem is to infer the most probable time sequence of a parsimonious set of failure sources that explains the observed test outcomes over time. The DSC problem is NP-hard and intractable due to the fault-test dependency matrix that couples the failed tests and faults via the constraint matrix, and the temporal dependence of failure sources over time. Here, the DSC problem is motivated from the viewpoint of a dynamic multiple fault diagnosis problem, but it has wide applications in operations research, for e.g., facility location problem. Thus, we also formulated the DSC problem in the context of a dynamically evolving facility location problem. Here, a facility can be opened, closed, or can be temporarily unavailable at any time for a given requirement of demand points. These activities are associated with costs or penalties, viz., phase-in or phase-out for the opening or closing of a facility, respectively. The set-covering matrix encapsulates the relationship among the rows (tests or demand points) and columns (faults or locations) of the system at each time. By relaxing the coupling constraints using Lagrange multipliers, the DSC problem can be decoupled into independent subproblems, one for each column. Each subproblem is solved using the Viterbi decoding algorithm, and a primal feasible solution is constructed by modifying the Viterbi solutions via a heuristic. The proposed Viterbi-Lagrangian relaxation algorithm (VLRA) provides a measure of suboptimality via an approximate duality gap. As a major practical extension of the above problem, we also consider the problem of diagnosing faults with delayed test outcomes, termed delay-dynamic set-covering (DDSC), and experiment with real-world problems that exhibit masking faults. Also, we present simulation results on OR-library datasets (set-covering formulations are predominantly validated on these matrices in the literature), posed as facility location problems. Finally, we implement these algorithms to solve problems in aerospace and automotive applications. Firstly, we address the diagnostic ambiguity problem in aerospace and automotive applications by developing a dynamic fusion framework that includes dynamic multiple fault diagnosis algorithms. This improves the correct fault isolation rate, while minimizing the false alarm rates, by considering multiple faults instead of the traditional data-driven techniques based on single fault (class)-single epoch (static) assumption. The dynamic fusion problem is formulated as a maximum a posteriori decision problem of inferring the fault sequence based on uncertain outcomes of multiple binary classifiers over time. The fusion process involves three steps: the first step transforms the multi-class problem into dichotomies using error correcting output codes (ECOC), thereby solving the concomitant binary classification problems; the second step fuses the outcomes of multiple binary classifiers over time using a sliding window or block dynamic fusion method that exploits temporal data correlations over time. We solve this NP-hard optimization problem via a Lagrangian relaxation (variational) technique. The third step optimizes the classifier parameters, viz., probabilities of detection and false alarm, using a genetic algorithm. The proposed algorithm is demonstrated by computing the diagnostic performance metrics on a twin-spool commercial jet engine, an automotive engine, and UCI datasets (problems with high classification error are specifically chosen for experimentation). We show that the primal-dual optimization framework performed consistently better than any traditional fusion technique, even when it is forced to give a single fault decision across a range of classification problems. Secondly, we implement the inference algorithms to diagnose faults in vehicle systems that are controlled by a network of electronic control units (ECUs). The faults, originating from various interactions and especially between hardware and software, are particularly challenging to address. Our basic strategy is to divide the fault universe of such cyber-physical systems in a hierarchical manner, and monitor the critical variables/signals that have impact at different levels of interactions. The proposed diagnostic strategy is validated on an electrical power generation and storage system (EPGS) controlled by two ECUs in an environment with CANoe/MATLAB co-simulation. Eleven faults are injected with the failures originating in actuator hardware, sensor, controller hardware and software components. Diagnostic matrix is established to represent the relationship between the faults and the test outcomes (also known as fault signatures) via simulations. The results show that the proposed diagnostic strategy is effective in addressing the interaction-caused faults.

A highly reliable, high performance open avionics architecture for real time Nap-of-the-Earth operations

NASA Technical Reports Server (NTRS)

Harper, Richard E.; Elks, Carl

1995-01-01

An Army Fault Tolerant Architecture (AFTA) has been developed to meet real-time fault tolerant processing requirements of future Army applications. AFTA is the enabling technology that will allow the Army to configure existing processors and other hardware to provide high throughput and ultrahigh reliability necessary for TF/TA/NOE flight control and other advanced Army applications. A comprehensive conceptual study of AFTA has been completed that addresses a wide range of issues including requirements, architecture, hardware, software, testability, producibility, analytical models, validation and verification, common mode faults, VHDL, and a fault tolerant data bus. A Brassboard AFTA for demonstration and validation has been fabricated, and two operating systems and a flight-critical Army application have been ported to it. Detailed performance measurements have been made of fault tolerance and operating system overheads while AFTA was executing the flight application in the presence of faults.
General Purpose Data-Driven Monitoring for Space Operations

NASA Technical Reports Server (NTRS)

Iverson, David L.; Martin, Rodney A.; Schwabacher, Mark A.; Spirkovska, Liljana; Taylor, William McCaa; Castle, Joseph P.; Mackey, Ryan M.

2009-01-01

As modern space propulsion and exploration systems improve in capability and efficiency, their designs are becoming increasingly sophisticated and complex. Determining the health state of these systems, using traditional parameter limit checking, model-based, or rule-based methods, is becoming more difficult as the number of sensors and component interactions grow. Data-driven monitoring techniques have been developed to address these issues by analyzing system operations data to automatically characterize normal system behavior. System health can be monitored by comparing real-time operating data with these nominal characterizations, providing detection of anomalous data signatures indicative of system faults or failures. The Inductive Monitoring System (IMS) is a data-driven system health monitoring software tool that has been successfully applied to several aerospace applications. IMS uses a data mining technique called clustering to analyze archived system data and characterize normal interactions between parameters. The scope of IMS based data-driven monitoring applications continues to expand with current development activities. Successful IMS deployment in the International Space Station (ISS) flight control room to monitor ISS attitude control systems has led to applications in other ISS flight control disciplines, such as thermal control. It has also generated interest in data-driven monitoring capability for Constellation, NASA's program to replace the Space Shuttle with new launch vehicles and spacecraft capable of returning astronauts to the moon, and then on to Mars. Several projects are currently underway to evaluate and mature the IMS technology and complementary tools for use in the Constellation program. These include an experiment on board the Air Force TacSat-3 satellite, and ground systems monitoring for NASA's Ares I-X and Ares I launch vehicles. The TacSat-3 Vehicle System Management (TVSM) project is a software experiment to integrate fault and anomaly detection algorithms and diagnosis tools with executive and adaptive planning functions contained in the flight software on-board the Air Force Research Laboratory TacSat-3 satellite. The TVSM software package will be uploaded after launch to monitor spacecraft subsystems such as power and guidance, navigation, and control (GN&C). It will analyze data in real-time to demonstrate detection of faults and unusual conditions, diagnose problems, and react to threats to spacecraft health and mission goals. The experiment will demonstrate the feasibility and effectiveness of integrated system health management (ISHM) technologies with both ground and on-board experiments.
Novel elastic protection against DDF failures in an enhanced software-defined SIEPON

NASA Astrophysics Data System (ADS)

Pakpahan, Andrew Fernando; Hwang, I.-Shyan; Yu, Yu-Ming; Hsu, Wu-Hsiao; Liem, Andrew Tanny; Nikoukar, AliAkbar

2017-07-01

Ever-increasing bandwidth demands on passive optical networks (PONs) are pushing the utilization of every fiber strand to its limit. This is mandating comprehensive protection until the end of the distribution drop fiber (DDF). Hence, it is important to provide refined protection with an advanced fault-protection architecture and recovery mechanism that is able to cope with various DDF failures. We propose a novel elastic protection against DDF failures that incorporates a software-defined networking (SDN) capability and a bus protection line to enhance the resiliency of the existing Service Interoperability in Ethernet Passive Optical Networks (SIEPON) system. We propose the addition of an integrated SDN controller and flow tables to the optical line terminal and optical network units (ONUs) in order to deliver various DDF protection scenarios. The proposed architecture enables flexible assignment of backup ONU(s) in pre/post-fault conditions depending on the PON traffic load. A transient backup ONU and multiple backup ONUs can be deployed in the pre-fault and post-fault scenarios, respectively. Our extensively discussed simulation results show that our proposed architecture provides better overall throughput and drop probability compared to the architecture with a fixed DDF protection mechanism. It does so while still maintaining overall QoS performance in terms of packet delay, mean jitter, packet loss, and throughput under various fault conditions.
FINDS: A fault inferring nonlinear detection system programmers manual, version 3.0

NASA Technical Reports Server (NTRS)

Lancraft, R. E.

1985-01-01

Detailed software documentation of the digital computer program FINDS (Fault Inferring Nonlinear Detection System) Version 3.0 is provided. FINDS is a highly modular and extensible computer program designed to monitor and detect sensor failures, while at the same time providing reliable state estimates. In this version of the program the FINDS methodology is used to detect, isolate, and compensate for failures in simulated avionics sensors used by the Advanced Transport Operating Systems (ATOPS) Transport System Research Vehicle (TSRV) in a Microwave Landing System (MLS) environment. It is intended that this report serve as a programmers guide to aid in the maintenance, modification, and revision of the FINDS software.
Designing application software in wide area network settings

NASA Technical Reports Server (NTRS)

Makpangou, Mesaac; Birman, Ken

1990-01-01

Progress in methodologies for developing robust local area network software has not been matched by similar results for wide area settings. The design of application software spanning multiple local area environments is examined. For important classes of applications, simple design techniques are presented that yield fault tolerant wide area programs. An implementation of these techniques as a set of tools for use within the ISIS system is described.
Digital Database of Recently Active Traces of the Hayward Fault, California

USGS Publications Warehouse

Lienkaemper, James J.

2006-01-01

The purpose of this map is to show the location of and evidence for recent movement on active fault traces within the Hayward Fault Zone, California. The mapped traces represent the integration of the following three different types of data: (1) geomorphic expression, (2) creep (aseismic fault slip),and (3) trench exposures. This publication is a major revision of an earlier map (Lienkaemper, 1992), which both brings up to date the evidence for faulting and makes it available formatted both as a digital database for use within a geographic information system (GIS) and for broader public access interactively using widely available viewing software. The pamphlet describes in detail the types of scientific observations used to make the map, gives references pertaining to the fault and the evidence of faulting, and provides guidance for use of and limitations of the map. [Last revised Nov. 2008, a minor update for 2007 LiDAR and recent trench investigations; see version history below.
The Orion GN and C Data-Driven Flight Software Architecture for Automated Sequencing and Fault Recovery

NASA Technical Reports Server (NTRS)

King, Ellis; Hart, Jeremy; Odegard, Ryan

2010-01-01

The Orion Crew Exploration Vehicle (CET) is being designed to include significantly more automation capability than either the Space Shuttle or the International Space Station (ISS). In particular, the vehicle flight software has requirements to accommodate increasingly automated missions throughout all phases of flight. A data-driven flight software architecture will provide an evolvable automation capability to sequence through Guidance, Navigation & Control (GN&C) flight software modes and configurations while maintaining the required flexibility and human control over the automation. This flexibility is a key aspect needed to address the maturation of operational concepts, to permit ground and crew operators to gain trust in the system and mitigate unpredictability in human spaceflight. To allow for mission flexibility and reconfrgurability, a data driven approach is being taken to load the mission event plan as well cis the flight software artifacts associated with the GN&C subsystem. A database of GN&C level sequencing data is presented which manages and tracks the mission specific and algorithm parameters to provide a capability to schedule GN&C events within mission segments. The flight software data schema for performing automated mission sequencing is presented with a concept of operations for interactions with ground and onboard crew members. A prototype architecture for fault identification, isolation and recovery interactions with the automation software is presented and discussed as a forward work item.
Thermal Expert System (TEXSYS): Systems autonomy demonstration project, volume 2. Results

NASA Technical Reports Server (NTRS)

Glass, B. J. (Editor)

1992-01-01

The Systems Autonomy Demonstration Project (SADP) produced a knowledge-based real-time control system for control and fault detection, isolation, and recovery (FDIR) of a prototype two-phase Space Station Freedom external active thermal control system (EATCS). The Thermal Expert System (TEXSYS) was demonstrated in recent tests to be capable of reliable fault anticipation and detection, as well as ordinary control of the thermal bus. Performance requirements were addressed by adopting a hierarchical symbolic control approach-layering model-based expert system software on a conventional, numerical data acquisition and control system. The model-based reasoning capabilities of TEXSYS were shown to be advantageous over typical rule-based expert systems, particularly for detection of unforeseen faults and sensor failures. Volume 1 gives a project overview and testing highlights. Volume 2 provides detail on the EATCS testbed, test operations, and online test results. Appendix A is a test archive, while Appendix B is a compendium of design and user manuals for the TEXSYS software.
Thermal Expert System (TEXSYS): Systems automony demonstration project, volume 1. Overview

NASA Technical Reports Server (NTRS)

Glass, B. J. (Editor)

1992-01-01

The Systems Autonomy Demonstration Project (SADP) produced a knowledge-based real-time control system for control and fault detection, isolation, and recovery (FDIR) of a prototype two-phase Space Station Freedom external active thermal control system (EATCS). The Thermal Expert System (TEXSYS) was demonstrated in recent tests to be capable of reliable fault anticipation and detection, as well as ordinary control of the thermal bus. Performance requirements were addressed by adopting a hierarchical symbolic control approach-layering model-based expert system software on a conventional, numerical data acquisition and control system. The model-based reasoning capabilities of TEXSYS were shown to be advantageous over typical rule-based expert systems, particularly for detection of unforeseen faults and sensor failures. Volume 1 gives a project overview and testing highlights. Volume 2 provides detail on the EATCS test bed, test operations, and online test results. Appendix A is a test archive, while Appendix B is a compendium of design and user manuals for the TEXSYS software.
Thermal Expert System (TEXSYS): Systems autonomy demonstration project, volume 2. Results

NASA Astrophysics Data System (ADS)

Glass, B. J.

1992-10-01

The Systems Autonomy Demonstration Project (SADP) produced a knowledge-based real-time control system for control and fault detection, isolation, and recovery (FDIR) of a prototype two-phase Space Station Freedom external active thermal control system (EATCS). The Thermal Expert System (TEXSYS) was demonstrated in recent tests to be capable of reliable fault anticipation and detection, as well as ordinary control of the thermal bus. Performance requirements were addressed by adopting a hierarchical symbolic control approach-layering model-based expert system software on a conventional, numerical data acquisition and control system. The model-based reasoning capabilities of TEXSYS were shown to be advantageous over typical rule-based expert systems, particularly for detection of unforeseen faults and sensor failures. Volume 1 gives a project overview and testing highlights. Volume 2 provides detail on the EATCS testbed, test operations, and online test results. Appendix A is a test archive, while Appendix B is a compendium of design and user manuals for the TEXSYS software.
High-throughput state-machine replication using software transactional memory.

PubMed

Zhao, Wenbing; Yang, William; Zhang, Honglei; Yang, Jack; Luo, Xiong; Zhu, Yueqin; Yang, Mary; Luo, Chaomin

2016-11-01

State-machine replication is a common way of constructing general purpose fault tolerance systems. To ensure replica consistency, requests must be executed sequentially according to some total order at all non-faulty replicas. Unfortunately, this could severely limit the system throughput. This issue has been partially addressed by identifying non-conflicting requests based on application semantics and executing these requests concurrently. However, identifying and tracking non-conflicting requests require intimate knowledge of application design and implementation, and a custom fault tolerance solution developed for one application cannot be easily adopted by other applications. Software transactional memory offers a new way of constructing concurrent programs. In this article, we present the mechanisms needed to retrofit existing concurrency control algorithms designed for software transactional memory for state-machine replication. The main benefit for using software transactional memory in state-machine replication is that general purpose concurrency control mechanisms can be designed without deep knowledge of application semantics. As such, new fault tolerance systems based on state-machine replications with excellent throughput can be easily designed and maintained. In this article, we introduce three different concurrency control mechanisms for state-machine replication using software transactional memory, namely, ordered strong strict two-phase locking, conventional timestamp-based multiversion concurrency control, and speculative timestamp-based multiversion concurrency control. Our experiments show that speculative timestamp-based multiversion concurrency control mechanism has the best performance in all types of workload, the conventional timestamp-based multiversion concurrency control offers the worst performance due to high abort rate in the presence of even moderate contention between transactions. The ordered strong strict two-phase locking mechanism offers the simplest solution with excellent performance in low contention workload, and fairly good performance in high contention workload.
High-throughput state-machine replication using software transactional memory

PubMed Central

Yang, William; Zhang, Honglei; Yang, Jack; Luo, Xiong; Zhu, Yueqin; Yang, Mary; Luo, Chaomin

2017-01-01

State-machine replication is a common way of constructing general purpose fault tolerance systems. To ensure replica consistency, requests must be executed sequentially according to some total order at all non-faulty replicas. Unfortunately, this could severely limit the system throughput. This issue has been partially addressed by identifying non-conflicting requests based on application semantics and executing these requests concurrently. However, identifying and tracking non-conflicting requests require intimate knowledge of application design and implementation, and a custom fault tolerance solution developed for one application cannot be easily adopted by other applications. Software transactional memory offers a new way of constructing concurrent programs. In this article, we present the mechanisms needed to retrofit existing concurrency control algorithms designed for software transactional memory for state-machine replication. The main benefit for using software transactional memory in state-machine replication is that general purpose concurrency control mechanisms can be designed without deep knowledge of application semantics. As such, new fault tolerance systems based on state-machine replications with excellent throughput can be easily designed and maintained. In this article, we introduce three different concurrency control mechanisms for state-machine replication using software transactional memory, namely, ordered strong strict two-phase locking, conventional timestamp-based multiversion concurrency control, and speculative timestamp-based multiversion concurrency control. Our experiments show that speculative timestamp-based multiversion concurrency control mechanism has the best performance in all types of workload, the conventional timestamp-based multiversion concurrency control offers the worst performance due to high abort rate in the presence of even moderate contention between transactions. The ordered strong strict two-phase locking mechanism offers the simplest solution with excellent performance in low contention workload, and fairly good performance in high contention workload. PMID:29075049
Quantitative Measures for Software Independent Verification and Validation

NASA Technical Reports Server (NTRS)

Lee, Alice

1996-01-01

As software is maintained or reused, it undergoes an evolution which tends to increase the overall complexity of the code. To understand the effects of this, we brought in statistics experts and leading researchers in software complexity, reliability, and their interrelationships. These experts' project has resulted in our ability to statistically correlate specific code complexity attributes, in orthogonal domains, to errors found over time in the HAL/S flight software which flies in the Space Shuttle. Although only a prototype-tools experiment, the result of this research appears to be extendable to all other NASA software, given appropriate data similar to that logged for the Shuttle onboard software. Our research has demonstrated that a more complete domain coverage can be mathematically demonstrated with the approach we have applied, thereby ensuring full insight into the cause-and-effects relationship between the complexity of a software system and the fault density of that system. By applying the operational profile we can characterize the dynamic effects of software path complexity under this same approach We now have the ability to measure specific attributes which have been statistically demonstrated to correlate to increased error probability, and to know which actions to take, for each complexity domain. Shuttle software verifiers can now monitor the changes in the software complexity, assess the added or decreased risk of software faults in modified code, and determine necessary corrections. The reports, tool documentation, user's guides, and new approach that have resulted from this research effort represent advances in the state of the art of software quality and reliability assurance. Details describing how to apply this technique to other NASA code are contained in this document.
Software reliability models for critical applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pham, H.; Pham, M.

This report presents the results of the first phase of the ongoing EG G Idaho, Inc. Software Reliability Research Program. The program is studying the existing software reliability models and proposes a state-of-the-art software reliability model that is relevant to the nuclear reactor control environment. This report consists of three parts: (1) summaries of the literature review of existing software reliability and fault tolerant software reliability models and their related issues, (2) proposed technique for software reliability enhancement, and (3) general discussion and future research. The development of this proposed state-of-the-art software reliability model will be performed in the secondmore » place. 407 refs., 4 figs., 2 tabs.« less
Software reliability models for critical applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pham, H.; Pham, M.

This report presents the results of the first phase of the ongoing EG&G Idaho, Inc. Software Reliability Research Program. The program is studying the existing software reliability models and proposes a state-of-the-art software reliability model that is relevant to the nuclear reactor control environment. This report consists of three parts: (1) summaries of the literature review of existing software reliability and fault tolerant software reliability models and their related issues, (2) proposed technique for software reliability enhancement, and (3) general discussion and future research. The development of this proposed state-of-the-art software reliability model will be performed in the second place.more » 407 refs., 4 figs., 2 tabs.« less
Integrated Application of Active Controls (IAAC) technology to an advanced subsonic transport project: Test act system validation

NASA Technical Reports Server (NTRS)

1985-01-01

The primary objective of the Test Active Control Technology (ACT) System laboratory tests was to verify and validate the system concept, hardware, and software. The initial lab tests were open loop hardware tests of the Test ACT System as designed and built. During the course of the testing, minor problems were uncovered and corrected. Major software tests were run. The initial software testing was also open loop. These tests examined pitch control laws, wing load alleviation, signal selection/fault detection (SSFD), and output management. The Test ACT System was modified to interface with the direct drive valve (DDV) modules. The initial testing identified problem areas with DDV nonlinearities, valve friction induced limit cycling, DDV control loop instability, and channel command mismatch. The other DDV issue investigated was the ability to detect and isolate failures. Some simple schemes for failure detection were tested but were not completely satisfactory. The Test ACT System architecture continues to appear promising for ACT/FBW applications in systems that must be immune to worst case generic digital faults, and be able to tolerate two sequential nongeneric faults with no reduction in performance. The challenge in such an implementation would be to keep the analog element sufficiently simple to achieve the necessary reliability.
FTMP (Fault Tolerant Multiprocessor) programmer's manual

NASA Technical Reports Server (NTRS)

Feather, F. E.; Liceaga, C. A.; Padilla, P. A.

1986-01-01

The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high level language called Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.
Numerical simulation of the stress distribution in a coal mine caused by a normal fault

NASA Astrophysics Data System (ADS)

Zhang, Hongmei; Wu, Jiwen; Zhai, Xiaorong

2017-06-01

Luling coal mine was used for research using FLAC3D software to analyze the stress distribution characteristics of the two sides of a normal fault zone with two different working face models. The working faces were, respectively, on the hanging wall and the foot wall; the two directions of mining were directed to the fault. The stress distributions were different across the fault. The stress was concentrated and the influenced range of stress was gradually larger while the working face was located on the hanging wall. The fault zone played a negative effect to the stress transmission. Obviously, the fault prevented stress transmission, the stress concentrated on the fault zone and the hanging wall. In the second model, the stress on the two sides decreased at first, but then increased continuing to transmit to the hanging wall. The concentrated stress in the fault zone decreased and the stress transmission was obvious. Because of this, the result could be used to minimize roadway damage and lengthen the time available for coal mining by careful design of the roadway and working face.
Mission Services Evolution Center Message Bus

NASA Technical Reports Server (NTRS)

Mayorga, Arturo; Bristow, John O.; Butschky, Mike

2011-01-01

The Goddard Mission Services Evolution Center (GMSEC) Message Bus is a robust, lightweight, fault-tolerant middleware implementation that supports all messaging capabilities of the GMSEC API. This architecture is a distributed software system that routes messages based on message subject names and knowledge of the locations in the network of the interested software components.
Verifying Diagnostic Software

NASA Technical Reports Server (NTRS)

Lindsey, Tony; Pecheur, Charles

2004-01-01

Livingstone PathFinder (LPF) is a simulation-based computer program for verifying autonomous diagnostic software. LPF is designed especially to be applied to NASA s Livingstone computer program, which implements a qualitative-model-based algorithm that diagnoses faults in a complex automated system (e.g., an exploratory robot, spacecraft, or aircraft). LPF forms a software test bed containing a Livingstone diagnosis engine, embedded in a simulated operating environment consisting of a simulator of the system to be diagnosed by Livingstone and a driver program that issues commands and faults according to a nondeterministic scenario provided by the user. LPF runs the test bed through all executions allowed by the scenario, checking for various selectable error conditions after each step. All components of the test bed are instrumented, so that execution can be single-stepped both backward and forward. The architecture of LPF is modular and includes generic interfaces to facilitate substitution of alternative versions of its different parts. Altogether, LPF provides a flexible, extensible framework for simulation-based analysis of diagnostic software; these characteristics also render it amenable to application to diagnostic programs other than Livingstone.

Joint Inversion of 3d Mt/gravity/magnetic at Pisagua Fault.

NASA Astrophysics Data System (ADS)

Bascur, J.; Saez, P.; Tapia, R.; Humpire, M.

2017-12-01

This work shows the results of a joint inversion at Pisagua Fault using 3D Magnetotellurics (MT), gravity and regional magnetic data. The MT survey has a poor coverage of study area with only 21 stations; however, it allows to detect a low resistivity zone aligned with the Pisagua Fault trace that it is interpreted as a damage zone. The integration of gravity and magnetic data, which have more dense sampling and coverage, adds more detail and resolution to the detected low resistivity structure and helps to improve the structure interpretation using the resulted models (density, magnetic-susceptibility and electrical resistivity). The joint inversion process minimizes a multiple target function which includes the data misfit, model roughness and coupling norms (crossgradient and direct relations) for all geophysical methods considered (MT, gravity and magnetic). This process is solved iteratively using the Gauss-Newton method which updates the model of each geophysical method improving its individual data misfit, model roughness and the coupling with the other geophysical models. For solving the model updates of magnetic and gravity methods were developed dedicated 3D inversion software codes which include the coupling norms with additionals geophysical parameters. The model update of the 3D MT is calculated using an iterative method which sequentially filters the priority model and the output model of a single 3D MT inversion process for obtaining the resistivity model coupled solution with the gravity and magnetic methods.
Improving Multiple Fault Diagnosability using Possible Conflicts

NASA Technical Reports Server (NTRS)

Daigle, Matthew J.; Bregon, Anibal; Biswas, Gautam; Koutsoukos, Xenofon; Pulido, Belarmino

2012-01-01

Multiple fault diagnosis is a difficult problem for dynamic systems. Due to fault masking, compensation, and relative time of fault occurrence, multiple faults can manifest in many different ways as observable fault signature sequences. This decreases diagnosability of multiple faults, and therefore leads to a loss in effectiveness of the fault isolation step. We develop a qualitative, event-based, multiple fault isolation framework, and derive several notions of multiple fault diagnosability. We show that using Possible Conflicts, a model decomposition technique that decouples faults from residuals, we can significantly improve the diagnosability of multiple faults compared to an approach using a single global model. We demonstrate these concepts and provide results using a multi-tank system as a case study.
An Integrated Fault Tolerant Robotic Controller System for High Reliability and Safety

NASA Technical Reports Server (NTRS)

Marzwell, Neville I.; Tso, Kam S.; Hecht, Myron

1994-01-01

This paper describes the concepts and features of a fault-tolerant intelligent robotic control system being developed for applications that require high dependability (reliability, availability, and safety). The system consists of two major elements: a fault-tolerant controller and an operator workstation. The fault-tolerant controller uses a strategy which allows for detection and recovery of hardware, operating system, and application software failures.The fault-tolerant controller can be used by itself in a wide variety of applications in industry, process control, and communications. The controller in combination with the operator workstation can be applied to robotic applications such as spaceborne extravehicular activities, hazardous materials handling, inspection and maintenance of high value items (e.g., space vehicles, reactor internals, or aircraft), medicine, and other tasks where a robot system failure poses a significant risk to life or property.
Reliability of Fault Tolerant Control Systems. Part 1

NASA Technical Reports Server (NTRS)

Wu, N. Eva

2001-01-01

This paper reports Part I of a two part effort, that is intended to delineate the relationship between reliability and fault tolerant control in a quantitative manner. Reliability analysis of fault-tolerant control systems is performed using Markov models. Reliability properties, peculiar to fault-tolerant control systems are emphasized. As a consequence, coverage of failures through redundancy management can be severely limited. It is shown that in the early life of a syi1ein composed of highly reliable subsystems, the reliability of the overall system is affine with respect to coverage, and inadequate coverage induces dominant single point failures. The utility of some existing software tools for assessing the reliability of fault tolerant control systems is also discussed. Coverage modeling is attempted in Part II in a way that captures its dependence on the control performance and on the diagnostic resolution.
[Advanced Development for Space Robotics With Emphasis on Fault Tolerance Technology

NASA Technical Reports Server (NTRS)

Tesar, Delbert

1997-01-01

This report describes work developing fault tolerant redundant robotic architectures and adaptive control strategies for robotic manipulator systems which can dynamically accommodate drastic robot manipulator mechanism, sensor or control failures and maintain stable end-point trajectory control with minimum disturbance. Kinematic designs of redundant, modular, reconfigurable arms for fault tolerance were pursued at a fundamental level. The approach developed robotic testbeds to evaluate disturbance responses of fault tolerant concepts in robotic mechanisms and controllers. The development was implemented in various fault tolerant mechanism testbeds including duality in the joint servo motor modules, parallel and serial structural architectures, and dual arms. All have real-time adaptive controller technologies to react to mechanism or controller disturbances (failures) to perform real-time reconfiguration to continue the task operations. The developments fall into three main areas: hardware, software, and theoretical.
Traffic accident reconstruction and an approach for prediction of fault rates using artificial neural networks: A case study in Turkey.

PubMed

Can Yilmaz, Ali; Aci, Cigdem; Aydin, Kadir

2016-08-17

Currently, in Turkey, fault rates in traffic accidents are determined according to the initiative of accident experts (no speed analyses of vehicles just considering accident type) and there are no specific quantitative instructions on fault rates related to procession of accidents which just represents the type of collision (side impact, head to head, rear end, etc.) in No. 2918 Turkish Highway Traffic Act (THTA 1983). The aim of this study is to introduce a scientific and systematic approach for determination of fault rates in most frequent property damage-only (PDO) traffic accidents in Turkey. In this study, data (police reports, skid marks, deformation, crush depth, etc.) collected from the most frequent and controversial accident types (4 sample vehicle-vehicle scenarios) that consist of PDO were inserted into a reconstruction software called vCrash. Sample real-world scenarios were simulated on the software to generate different vehicle deformations that also correspond to energy-equivalent speed data just before the crash. These values were used to train a multilayer feedforward artificial neural network (MFANN), function fitting neural network (FITNET, a specialized version of MFANN), and generalized regression neural network (GRNN) models within 10-fold cross-validation to predict fault rates without using software. The performance of the artificial neural network (ANN) prediction models was evaluated using mean square error (MSE) and multiple correlation coefficient (R). It was shown that the MFANN model performed better for predicting fault rates (i.e., lower MSE and higher R) than FITNET and GRNN models for accident scenarios 1, 2, and 3, whereas FITNET performed the best for scenario 4. The FITNET model showed the second best results for prediction for the first 3 scenarios. Because there is no training phase in GRNN, the GRNN model produced results much faster than MFANN and FITNET models. However, the GRNN model had the worst prediction results. The R values for prediction of fault rates were close to 1 for all folds and scenarios. This study focuses on exhibiting new aspects and scientific approaches for determining fault rates of involvement in most frequent PDO accidents occurring in Turkey by discussing some deficiencies in THTA and without regard to initiative and/or experience of experts. This study yields judicious decisions to be made especially on forensic investigations and events involving insurance companies. Referring to this approach, injury/fatal and/or pedestrian-related accidents may be analyzed as future work by developing new scientific models.
The Assistant for Specifying the Quality Software (ASQS) Operational Concept Document. Volume 1

DTIC Science & Technology

1990-09-01

Assistant in which the manager supplies system-specific characteristics and needs and the Assistant fills in the software quality concepts and methods. The...member(s) of the Computer Resources Working Group (CRWG) to aid in performing a software quality engineering study. Figure 3.4-1 outlines the...need to recovery from faults more likely than need _o provide alternative functions or interfaces), and more on Autcncmy - 27 - that Modularity
An Open Avionics and Software Architecture to Support Future NASA Exploration Missions

NASA Technical Reports Server (NTRS)

Schlesinger, Adam

2017-01-01

The presentation describes an avionics and software architecture that has been developed through NASAs Advanced Exploration Systems (AES) division. The architecture is open-source, highly reliable with fault tolerance, and utilizes standard capabilities and interfaces, which are scalable and customizable to support future exploration missions. Specific focus areas of discussion will include command and data handling, software, human interfaces, communication and wireless systems, and systems engineering and integration.
Scalable cloud without dedicated storage

NASA Astrophysics Data System (ADS)

Batkovich, D. V.; Kompaniets, M. V.; Zarochentsev, A. K.

2015-05-01

We present a prototype of a scalable computing cloud. It is intended to be deployed on the basis of a cluster without the separate dedicated storage. The dedicated storage is replaced by the distributed software storage. In addition, all cluster nodes are used both as computing nodes and as storage nodes. This solution increases utilization of the cluster resources as well as improves fault tolerance and performance of the distributed storage. Another advantage of this solution is high scalability with a relatively low initial and maintenance cost. The solution is built on the basis of the open source components like OpenStack, CEPH, etc.
Semi-automatic mapping of fault rocks on a Digital Outcrop Model, Gole Larghe Fault Zone (Southern Alps, Italy)

NASA Astrophysics Data System (ADS)

Vho, Alice; Bistacchi, Andrea

2015-04-01

A quantitative analysis of fault-rock distribution is of paramount importance for studies of fault zone architecture, fault and earthquake mechanics, and fluid circulation along faults at depth. Here we present a semi-automatic workflow for fault-rock mapping on a Digital Outcrop Model (DOM). This workflow has been developed on a real case of study: the strike-slip Gole Larghe Fault Zone (GLFZ). It consists of a fault zone exhumed from ca. 10 km depth, hosted in granitoid rocks of Adamello batholith (Italian Southern Alps). Individual seismogenic slip surfaces generally show green cataclasites (cemented by the precipitation of epidote and K-feldspar from hydrothermal fluids) and more or less well preserved pseudotachylytes (black when well preserved, greenish to white when altered). First of all, a digital model for the outcrop is reconstructed with photogrammetric techniques, using a large number of high resolution digital photographs, processed with VisualSFM software. By using high resolution photographs the DOM can have a much higher resolution than with LIDAR surveys, up to 0.2 mm/pixel. Then, image processing is performed to map the fault-rock distribution with the ImageJ-Fiji package. Green cataclasites and epidote/K-feldspar veins can be quite easily separated from the host rock (tonalite) using spectral analysis. Particularly, band ratio and principal component analysis have been tested successfully. The mapping of black pseudotachylyte veins is more tricky because the differences between the pseudotachylyte and biotite spectral signature are not appreciable. For this reason we have tested different morphological processing tools aimed at identifying (and subtracting) the tiny biotite grains. We propose a solution based on binary images involving a combination of size and circularity thresholds. Comparing the results with manually segmented images, we noticed that major problems occur only when pseudotachylyte veins are very thin and discontinuous. After having tested and refined the image analysis processing for some typical images, we have recorded a macro with ImageJ-Fiji allowing to process all the images for a given DOM. As a result, the three different types of rocks can be semi-automatically mapped on large DOMs using a simple and efficient procedure. This allows to develop quantitative analyses of fault rock distribution and thickness, fault trace roughness/curvature and length, fault zone architecture, and alteration halos due to hydrothermal fluid-rock interaction. To improve our workflow, additional or different morphological operators could be integrated in our procedure to yield a better resolution on small and thin pseudotachylyte veins (e.g. perimeter/area ratio).
On the engineering of crucial software

NASA Technical Reports Server (NTRS)

Pratt, T. W.; Knight, J. C.; Gregory, S. T.

1983-01-01

The various aspects of the conventional software development cycle are examined. This cycle was the basis of the augmented approach contained in the original grant proposal. This cycle was found inadequate for crucial software development, and the justification for this opinion is presented. Several possible enhancements to the conventional software cycle are discussed. Software fault tolerance, a possible enhancement of major importance, is discussed separately. Formal verification using mathematical proof is considered. Automatic programming is a radical alternative to the conventional cycle and is discussed. Recommendations for a comprehensive approach are presented, and various experiments which could be conducted in AIRLAB are described.
Information technologies in optimization process of monitoring of software and hardware status

NASA Astrophysics Data System (ADS)

Nikitin, P. V.; Savinov, A. N.; Bazhenov, R. I.; Ryabov, I. V.

2018-05-01

The article describes a model of a hardware and software monitoring system for a large company that provides customers with software as a service (SaaS solution) using information technology. The main functions of the monitoring system are: provision of up-todate data for analyzing the state of the IT infrastructure, rapid detection of the fault and its effective elimination. The main risks associated with the provision of these services are described; the comparative characteristics of the software are given; author's methods of monitoring the status of software and hardware are proposed.
Software architecture of the Magdalena Ridge Observatory Interferometer

NASA Astrophysics Data System (ADS)

Farris, Allen; Klinglesmith, Dan; Seamons, John; Torres, Nicolas; Buscher, David; Young, John

2010-07-01

Merging software from 36 independent work packages into a coherent, unified software system with a lifespan of twenty years is the challenge faced by the Magdalena Ridge Observatory Interferometer (MROI). We solve this problem by using standardized interface software automatically generated from simple highlevel descriptions of these systems, relying only on Linux, GNU, and POSIX without complex software such as CORBA. This approach, based on gigabit Ethernet with a TCP/IP protocol, provides the flexibility to integrate and manage diverse, independent systems using a centralized supervisory system that provides a database manager, data collectors, fault handling, and an operator interface.
Processing LiDAR Data to Predict Natural Hazards

NASA Technical Reports Server (NTRS)

Fairweather, Ian; Crabtree, Robert; Hager, Stacey

2008-01-01

ELF-Base and ELF-Hazards (wherein 'ELF' signifies 'Extract LiDAR Features' and 'LiDAR' signifies 'light detection and ranging') are developmental software modules for processing remote-sensing LiDAR data to identify past natural hazards (principally, landslides) and predict future ones. ELF-Base processes raw LiDAR data, including LiDAR intensity data that are often ignored in other software, to create digital terrain models (DTMs) and digital feature models (DFMs) with sub-meter accuracy. ELF-Hazards fuses raw LiDAR data, data from multispectral and hyperspectral optical images, and DTMs and DFMs generated by ELF-Base to generate hazard risk maps. Advanced algorithms in these software modules include line-enhancement and edge-detection algorithms, surface-characterization algorithms, and algorithms that implement innovative data-fusion techniques. The line-extraction and edge-detection algorithms enable users to locate such features as faults and landslide headwall scarps. Also implemented in this software are improved methodologies for identification and mapping of past landslide events by use of (1) accurate, ELF-derived surface characterizations and (2) three LiDAR/optical-data-fusion techniques: post-classification data fusion, maximum-likelihood estimation modeling, and hierarchical within-class discrimination. This software is expected to enable faster, more accurate forecasting of natural hazards than has previously been possible.
Operations management system advanced automation: Fault detection isolation and recovery prototyping

NASA Technical Reports Server (NTRS)

Hanson, Matt

1990-01-01

The purpose of this project is to address the global fault detection, isolation and recovery (FDIR) requirements for Operation's Management System (OMS) automation within the Space Station Freedom program. This shall be accomplished by developing a selected FDIR prototype for the Space Station Freedom distributed processing systems. The prototype shall be based on advanced automation methodologies in addition to traditional software methods to meet the requirements for automation. A secondary objective is to expand the scope of the prototyping to encompass multiple aspects of station-wide fault management (SWFM) as discussed in OMS requirements documentation.
Scientific Research Program for Power, Energy, and Thermal Technologies. Task Order 0002: Power, Thermal and Control Technologies and Processes Experimental Research. Subtask: Laboratory Test Set-up to Evaluate Electromechanical Actuation Systems for Aircraft Flight Control

DTIC Science & Technology

2015-08-01

faults are incorporated into the system in order to better understand the EMA reliability, and to aid in designing fault detection software for real...to a fixed angle repeatedly and accurately [16]. The motor in the EHA is used to drive a reversible pump tied to a hydraulic cylinder which moves...24] [25] [26]. These test stands are used for the prognostic testing of EMAS that have had mechanical or electrical faults injected into them. The
An empirical study of software design practices

NASA Technical Reports Server (NTRS)

Card, David N.; Church, Victor E.; Agresti, William W.

1986-01-01

Software engineers have developed a large body of software design theory and folklore, much of which was never validated. The results of an empirical study of software design practices in one specific environment are presented. The practices examined affect module size, module strength, data coupling, descendant span, unreferenced variables, and software reuse. Measures characteristic of these practices were extracted from 887 FORTRAN modules developed for five flight dynamics software projects monitored by the Software Engineering Laboratory (SEL). The relationship of these measures to cost and fault rate was analyzed using a contingency table procedure. The results show that some recommended design practices, despite their intuitive appeal, are ineffective in this environment, whereas others are very effective.
Modelling earthquake ruptures with dynamic off-fault damage

NASA Astrophysics Data System (ADS)

Okubo, Kurama; Bhat, Harsha S.; Klinger, Yann; Rougier, Esteban

2017-04-01

Earthquake rupture modelling has been developed for producing scenario earthquakes. This includes understanding the source mechanisms and estimating far-field ground motion with given a priori constraints like fault geometry, constitutive law of the medium and friction law operating on the fault. It is necessary to consider all of the above complexities of a fault systems to conduct realistic earthquake rupture modelling. In addition to the complexity of the fault geometry in nature, coseismic off-fault damage, which is observed by a variety of geological and seismological methods, plays a considerable role on the resultant ground motion and its spectrum compared to a model with simple planer fault surrounded by purely elastic media. Ideally all of these complexities should be considered in earthquake modelling. State of the art techniques developed so far, however, cannot treat all of them simultaneously due to a variety of computational restrictions. Therefore, we adopt the combined finite-discrete element method (FDEM), which can effectively deal with pre-existing complex fault geometry such as fault branches and kinks and can describe coseismic off-fault damage generated during the dynamic rupture. The advantage of FDEM is that it can handle a wide range of length scales, from metric to kilometric scale, corresponding to the off-fault damage and complex fault geometry respectively. We used the FDEM-based software tool called HOSSedu (Hybrid Optimization Software Suite - Educational Version) for the earthquake rupture modelling, which was developed by Los Alamos National Laboratory. We firstly conducted the cross-validation of this new methodology against other conventional numerical schemes such as the finite difference method (FDM), the spectral element method (SEM) and the boundary integral equation method (BIEM), to evaluate the accuracy with various element sizes and artificial viscous damping values. We demonstrate the capability of the FDEM tool for modelling earthquake ruptures. We then modelled earthquake ruptures allowing for coseismic off-fault damage with appropriate fracture nucleation and growth criteria. We studied the effect of different conditions such as rupture speed (sub-Rayleigh or supershear), the orientation of the initial maximum principal stress with respect to the fault and the magnitude of the initial stress (to mimic depth). The comparison between the sub-Rayleigh and supershear case shows that the coseismic off-fault damage is enhanced in the supershear case when compared with the sub-Rayleigh case. The orientation of the maximum principal stress also has significant difference such that the dynamic off-fault cracking is more likely to occur on the extensional side of the fault for high principal stress orientation. It is found that the coseismic off-fault damage reduces the rupture speed due to the dissipation of the energy by dynamic off-fault cracking generated in the vicinity of the rupture front. In terms of the ground motion amplitude spectra it is shown that the high-frequency radiation is enhanced by the coseismic off-fault damage though it is quickly attenuated. This is caused by the intricate superposition of the radiation generated by the off-fault damage and the perturbation of the rupture speed on the main fault.
Lessons learned in creating spacecraft computer systems: Implications for using Ada (R) for the space station

NASA Technical Reports Server (NTRS)

Tomayko, James E.

1986-01-01

Twenty-five years of spacecraft onboard computer development have resulted in a better understanding of the requirements for effective, efficient, and fault tolerant flight computer systems. Lessons from eight flight programs (Gemini, Apollo, Skylab, Shuttle, Mariner, Voyager, and Galileo) and three reserach programs (digital fly-by-wire, STAR, and the Unified Data System) are useful in projecting the computer hardware configuration of the Space Station and the ways in which the Ada programming language will enhance the development of the necessary software. The evolution of hardware technology, fault protection methods, and software architectures used in space flight in order to provide insight into the pending development of such items for the Space Station are reviewed.
Analytical sensor redundancy assessment

NASA Technical Reports Server (NTRS)

Mulcare, D. B.; Downing, L. E.; Smith, M. K.

1988-01-01

The rationale and mechanization of sensor fault tolerance based on analytical redundancy principles are described. The concept involves the substitution of software procedures, such as an observer algorithm, to supplant additional hardware components. The observer synthesizes values of sensor states in lieu of their direct measurement. Such information can then be used, for example, to determine which of two disagreeing sensors is more correct, thus enhancing sensor fault survivability. Here a stability augmentation system is used as an example application, with required modifications being made to a quadruplex digital flight control system. The impact on software structure and the resultant revalidation effort are illustrated as well. Also, the use of an observer algorithm for wind gust filtering of the angle-of-attack sensor signal is presented.

The implementation and use of Ada on distributed systems with high reliability requirements

NASA Technical Reports Server (NTRS)

Knight, J. C.; Gregory, S. T.; Urquhart, J. I. A.

1985-01-01

The use and implementation of Ada in distributed environments in which reliability is the primary concern were investigated. In particular, the concept that a distributed system may be programmed entirely in Ada so that the individual tasks of the system are unconcerned with which processors they are executing on, and that failures may occur in the software or underlying hardware was examined. Progress is discussed for the following areas: continued development and testing of the fault-tolerant Ada testbed; development of suggested changes to Ada so that it might more easily cope with the failure of interest; and design of new approaches to fault-tolerant software in real-time systems, and integration of these ideas into Ada.
Analysis of Multilayered Printed Circuit Boards using Computed Tomography

DTIC Science & Technology

2014-05-01

complex PCBs that present a challenge for any testing or fault analysis. Set-to- work testing and fault analysis of any electronic circuit require...Electronic Warfare and Radar Division in December 2010. He is currently in Electro- Optic Countermeasures Group. Samuel works on embedded system design...and software optimisation of complex electro-optical systems, including the set to work and characterisation of these systems. He has a Bachelor of
Flight test results of the Strapdown hexad Inertial Reference Unit (SIRU). Volume 1: Flight test summary

NASA Technical Reports Server (NTRS)

Hruby, R. J.; Bjorkman, W. S.

1977-01-01

Flight test results of the strapdown inertial reference unit (SIRU) navigation system are presented. The fault-tolerant SIRU navigation system features a redundant inertial sensor unit and dual computers. System software provides for detection and isolation of inertial sensor failures and continued operation in the event of failures. Flight test results include assessments of the system's navigational performance and fault tolerance.
Attitude Determination and Control System (ADCS) and Maintenance and Diagnostic System (MDS): A maintenance and diagnostic system for Space Station Freedom

NASA Technical Reports Server (NTRS)

Toms, David; Hadden, George D.; Harrington, Jim

1990-01-01

The Maintenance and Diagnostic System (MDS) that is being developed at Honeywell to enhance the Fault Detection Isolation and Recovery system (FDIR) for the Attitude Determination and Control System on Space Station Freedom is described. The MDS demonstrates ways that AI-based techniques can be used to improve the maintainability and safety of the Station by helping to resolve fault anomalies that cannot be fully determined by built-in-test, by providing predictive maintenance capabilities, and by providing expert maintenance assistance. The MDS will address the problems associated with reasoning about dynamic, continuous information versus only about static data, the concerns of porting software based on AI techniques to embedded targets, and the difficulties associated with real-time response. An initial prototype was built of the MDS. The prototype executes on Sun and IBM PS/2 hardware and is implemented in the Common Lisp; further work will evaluate its functionality and develop mechanisms to port the code to Ada.
Developing parallel GeoFEST(P) using the PYRAMID AMR library

NASA Technical Reports Server (NTRS)

Norton, Charles D.; Lyzenga, Greg; Parker, Jay; Tisdale, Robert E.

2004-01-01

The PYRAMID parallel unstructured adaptive mesh refinement (AMR) library has been coupled with the GeoFEST geophysical finite element simulation tool to support parallel active tectonics simulations. Specifically, we have demonstrated modeling of coseismic and postseismic surface displacement due to a simulated Earthquake for the Landers system of interacting faults in Southern California. The new software demonstrated a 25-times resolution improvement and a 4-times reduction in time to solution over the sequential baseline milestone case. Simulations on workstations using a few tens of thousands of stress displacement finite elements can now be expanded to multiple millions of elements with greater than 98% scaled efficiency on various parallel platforms over many hundreds of processors. Our most recent work has demonstrated that we can dynamically adapt the computational grid as stress grows on a fault. In this paper, we will describe the major issues and challenges associated with coupling these two programs to create GeoFEST(P). Performance and visualization results will also be described.
Pratt and Whitney Overview and Advanced Health Management Program

NASA Technical Reports Server (NTRS)

Inabinett, Calvin

2008-01-01

Hardware Development Activity: Design and Test Custom Multi-layer Circuit Boards for use in the Fault Emulation Unit; Logic design performed using VHDL; Layout power system for lab hardware; Work lab issues with software developers and software testers; Interface with Engine Systems personnel with performance of Engine hardware components; Perform off nominal testing with new engine hardware.
Human factors in software development

DOE Office of Scientific and Technical Information (OSTI.GOV)

Curtis, B.

1986-01-01

This book presents an overview of ergonomics/human factors in software development, recent research, and classic papers. Articles are drawn from the following areas of psychological research on programming: cognitive ergonomics, cognitive psychology, and psycholinguistics. Topics examined include: theoretical models of how programmers solve technical problems, the characteristics of programming languages, specification formats in behavioral research and psychological aspects of fault diagnosis.
Developing Software For Monitoring And Diagnosis

NASA Technical Reports Server (NTRS)

Edwards, S. J.; Caglayan, A. K.

1993-01-01

Expert-system software shell produces executable code. Report discusses beginning phase of research directed toward development of artificial intelligence for real-time monitoring of, and diagnosis of faults in, complicated systems of equipment. Motivated by need for onboard monitoring and diagnosis of electronic sensing and controlling systems of advanced aircraft. Also applicable to such equipment systems as refineries, factories, and powerplants.
Development of preliminary design concept for a multifunction display and control system for the Orbiter crew station. Task 4: Design concept recommendation

NASA Technical Reports Server (NTRS)

Spiger, R. J.; Farrell, R. J.; Holcomb, G. A.

1982-01-01

Application of multifunction display and control systems to the NASA Orbiter spacecraft offers the potential for reducing crew workload and improving the presentation of system status and operational data to the crew. A design concept is presented for the application of a multifunction display and control system (MFDCS) to the Orbital Maneuvering System and Electrical Power Distribution and Control System on the Orbiter spacecraft. The MFDCS would provide the capability for automation of procedures, fault prioritization and software reconfiguration of the MFDCS data base. The MFDCS would operate as a stand-alone processor to minimize the impact on the current Orbiter software. Supervisory crew command of all current functions would be retained through the use of several operating modes in the system. Both the design concept and the processes followed in defining the concept are described.
Lessons Learned in the Livingstone 2 on Earth Observing One Flight Experiment

NASA Technical Reports Server (NTRS)

Hayden, Sandra C.; Sweet, Adam J.; Shulman, Seth

2005-01-01

The Livingstone 2 (L2) model-based diagnosis software is a reusable diagnostic tool for monitoring complex systems. In 2004, L2 was integrated with the JPL Autonomous Sciencecraft Experiment (ASE) and deployed on-board Goddard's Earth Observing One (EO-1) remote sensing satellite, to monitor and diagnose the EO-1 space science instruments and imaging sequence. This paper reports on lessons learned from this flight experiment. The goals for this experiment, including validation of minimum success criteria and of a series of diagnostic scenarios, have all been successfully net. Long-term operations in space are on-going, as a test of the maturity of the system, with L2 performance remaining flawless. L2 has demonstrated the ability to track the state of the system during nominal operations, detect simulated abnormalities in operations and isolate failures to their root cause fault. Specific advances demonstrated include diagnosis of ambiguity groups rather than a single fault candidate; hypothesis revision given new sensor evidence about the state of the system; and the capability to check for faults in a dynamic system without having to wait until the system is quiescent. The major benefits of this advanced health management technology are to increase mission duration and reliability through intelligent fault protection, and robust autonomous operations with reduced dependency on supervisory operations from Earth. The work-load for operators will be reduced by telemetry of processed state-of-health information rather than raw data. The long-term vision is that of making diagnosis available to the onboard planner or executive, allowing autonomy software to re-plan in order to work around known component failures. For a system that is expected to evolve substantially over its lifetime, as for the International Space Station, the model-based approach has definite advantages over rule-based expert systems and limit-checking fault protection systems, as these do not scale well. The model-based approach facilitates reuse of the L2 diagnostic software; only the model of the system to be diagnosed and telemetry monitoring software has to be rebuilt for a new system or expanded for a growing system. The hierarchical L2 model supports modularity and expendability, and as such is suitable solution for integrated system health management as envisioned for systems-of-systems.
The Importance of Architecture in DoD Software

DTIC Science & Technology

1991-07-01

01282 92 1 14 060 M91-35 The Importance of Architecture in DOD Software S ACCesion For- * DTIC "r,’L- .S Dr. Barry M. Horowitz July 1991 D;.t ibto...resource utilization: architecture determines how the system sustains , 06 operations when parts of the system fail. The architecture also determines...software maintainers to ensure that we deliver to them whatever is necessary for them Medium to sustain and use the architecture . Fault Rate 37% Getting
Software-Controlled Caches in the VMP Multiprocessor

DTIC Science & Technology

1986-03-01

programming system level that Processors is tuned for the VMP design. In this vein, we are interested in exploring how far the software support can go to ...handled in software, analogously to the handling agement of the shared program state is familiar and of virtual memory page faults. Hardware support for...ensure good behavior, as opposed to how Each cache miss results in bus traffic. Table 2 pro- vides the bus cost for the "average" cache miss. Fig
Model-Based Verification and Validation of Spacecraft Avionics

NASA Technical Reports Server (NTRS)

Khan, M. Omair; Sievers, Michael; Standley, Shaun

2012-01-01

Verification and Validation (V&V) at JPL is traditionally performed on flight or flight-like hardware running flight software. For some time, the complexity of avionics has increased exponentially while the time allocated for system integration and associated V&V testing has remained fixed. There is an increasing need to perform comprehensive system level V&V using modeling and simulation, and to use scarce hardware testing time to validate models; the norm for thermal and structural V&V for some time. Our approach extends model-based V&V to electronics and software through functional and structural models implemented in SysML. We develop component models of electronics and software that are validated by comparison with test results from actual equipment. The models are then simulated enabling a more complete set of test cases than possible on flight hardware. SysML simulations provide access and control of internal nodes that may not be available in physical systems. This is particularly helpful in testing fault protection behaviors when injecting faults is either not possible or potentially damaging to the hardware. We can also model both hardware and software behaviors in SysML, which allows us to simulate hardware and software interactions. With an integrated model and simulation capability we can evaluate the hardware and software interactions and identify problems sooner. The primary missing piece is validating SysML model correctness against hardware; this experiment demonstrated such an approach is possible.
Functional Fault Modeling Conventions and Practices for Real-Time Fault Isolation

NASA Technical Reports Server (NTRS)

Ferrell, Bob; Lewis, Mark; Perotti, Jose; Oostdyk, Rebecca; Brown, Barbara

2010-01-01

The purpose of this paper is to present the conventions, best practices, and processes that were established based on the prototype development of a Functional Fault Model (FFM) for a Cryogenic System that would be used for real-time Fault Isolation in a Fault Detection, Isolation, and Recovery (FDIR) system. The FDIR system is envisioned to perform health management functions for both a launch vehicle and the ground systems that support the vehicle during checkout and launch countdown by using a suite of complimentary software tools that alert operators to anomalies and failures in real-time. The FFMs were created offline but would eventually be used by a real-time reasoner to isolate faults in a Cryogenic System. Through their development and review, a set of modeling conventions and best practices were established. The prototype FFM development also provided a pathfinder for future FFM development processes. This paper documents the rationale and considerations for robust FFMs that can easily be transitioned to a real-time operating environment.
Investigating an API for resilient exascale computing.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stearley, Jon R.; Tomkins, James; VanDyke, John P.

2013-05-01

Increased HPC capability comes with increased complexity, part counts, and fault occurrences. In- creasing the resilience of systems and applications to faults is a critical requirement facing the viability of exascale systems, as the overhead of traditional checkpoint/restart is projected to outweigh its bene ts due to fault rates outpacing I/O bandwidths. As faults occur and propagate throughout hardware and software layers, pervasive noti cation and handling mechanisms are necessary. This report describes an initial investigation of fault types and programming interfaces to mitigate them. Proof-of-concept APIs are presented for the frequent and important cases of memory errors and nodemore » failures, and a strategy proposed for lesystem failures. These involve changes to the operating system, runtime, I/O library, and application layers. While a single API for fault handling among hardware and OS and application system-wide remains elusive, the e ort increased our understanding of both the mountainous challenges and the promising trailheads. 3« less
A study of the relationship between the performance and dependability of a fault-tolerant computer

NASA Technical Reports Server (NTRS)

Goswami, Kumar K.

1994-01-01

This thesis studies the relationship by creating a tool (FTAPE) that integrates a high stress workload generator with fault injection and by using the tool to evaluate system performance under error conditions. The workloads are comprised of processes which are formed from atomic components that represent CPU, memory, and I/O activity. The fault injector is software-implemented and is capable of injecting any memory addressable location, including special registers and caches. This tool has been used to study a Tandem Integrity S2 Computer. Workloads with varying numbers of processes and varying compositions of CPU, memory, and I/O activity are first characterized in terms of performance. Then faults are injected into these workloads. The results show that as the number of concurrent processes increases, the mean fault latency initially increases due to increased contention for the CPU. However, for even higher numbers of processes (less than 3 processes), the mean latency decreases because long latency faults are paged out before they can be activated.
Integrated software health management for aerospace guidance, navigation, and control systems: A probabilistic reasoning approach

NASA Astrophysics Data System (ADS)

Mbaya, Timmy

Embedded Aerospace Systems have to perform safety and mission critical operations in a real-time environment where timing and functional correctness are extremely important. Guidance, Navigation, and Control (GN&C) systems substantially rely on complex software interfacing with hardware in real-time; any faults in software or hardware, or their interaction could result in fatal consequences. Integrated Software Health Management (ISWHM) provides an approach for detection and diagnosis of software failures while the software is in operation. The ISWHM approach is based on probabilistic modeling of software and hardware sensors using a Bayesian network. To meet memory and timing constraints of real-time embedded execution, the Bayesian network is compiled into an Arithmetic Circuit, which is used for on-line monitoring. This type of system monitoring, using an ISWHM, provides automated reasoning capabilities that compute diagnoses in a timely manner when failures occur. This reasoning capability enables time-critical mitigating decisions and relieves the human agent from the time-consuming and arduous task of foraging through a multitude of isolated---and often contradictory---diagnosis data. For the purpose of demonstrating the relevance of ISWHM, modeling and reasoning is performed on a simple simulated aerospace system running on a real-time operating system emulator, the OSEK/Trampoline platform. Models for a small satellite and an F-16 fighter jet GN&C (Guidance, Navigation, and Control) system have been implemented. Analysis of the ISWHM is then performed by injecting faults and analyzing the ISWHM's diagnoses.
The 26 May 2006 Yogyakarta earthquake fault observed by seismic data and satellite data based surface features

NASA Astrophysics Data System (ADS)

Anggraini, Ade; Sobiesiak, Monika; Walter, Thomas R.

2010-05-01

The Mw 6.3 May 26, 2006 Yogyakarta Earthquake caused severe damage and claimed thousands lives in the Yogyakarta Special Province and Klaten District of Central Java Province. The nearby Opak River fault was thought to be the source of this earthquake disaster. However, no significant surface movement was observed along the fault which could confirm that this fault was really the source of the earthquake. To investigate the earthquake source and to understand the earthquake mechanism, a rapid response team of the German Task Force for Earthquake, together with the Seismological Division of Badan Meteorologi Klimatologi dan Geofisika and Gadjah Mada University in Yogyakarta, had installed a temporary seismic network of 12 short period seismometers. More than 3000 aftershocks were recorded during the 3-month campaign. Here we present the result of several hundred processed aftershocks. We used integrated software package GIANTPitsa to pick P and S phases manually and HYPO71 to determine the hypocenters. HypoDD software was used for hypocenters relocation to obtain high precision aftershock locations. Our aftershock distribution shows a system of lineaments in southwest-northeast direction, about 10 km east to Opak River fault, at 5-18 km depth. The b-value map from the aftershocks shows that the main lineaments have relatively low b-value at the middle part which suggests this part is still under stress. We also observe several aftershock clusters cutting these lineaments in nearly perpendicular direction. To verify the interpretation of our aftershocks analysis, we will overlay it on surface feature we delineate from satellite data. Hopefully our result will give significant contribution to understand the near surface fault systems around Yogyakarta Area in order to mitigate similar earthquake hazard in the future.
A circuit-based photovoltaic module simulator with shadow and fault settings

NASA Astrophysics Data System (ADS)

Chao, Kuei-Hsiang; Chao, Yuan-Wei; Chen, Jyun-Ping

2016-03-01

The main purpose of this study was to develop a photovoltaic (PV) module simulator. The proposed simulator, using electrical parameters from solar cells, could simulate output characteristics not only during normal operational conditions, but also during conditions of partial shadow and fault conditions. Such a simulator should possess the advantages of low cost, small size and being easily realizable. Experiments have shown that results from a proposed PV simulator of this kind are very close to that from simulation software during partial shadow conditions, and with negligible differences during fault occurrence. Meanwhile, the PV module simulator, as developed, could be used on various types of series-parallel connections to form PV arrays, to conduct experiments on partial shadow and fault events occurring in some of the modules. Such experiments are designed to explore the impact of shadow and fault conditions on the output characteristics of the system as a whole.
Narrowing the scope of failure prediction using targeted fault load injection

NASA Astrophysics Data System (ADS)

Jordan, Paul L.; Peterson, Gilbert L.; Lin, Alan C.; Mendenhall, Michael J.; Sellers, Andrew J.

2018-05-01

As society becomes more dependent upon computer systems to perform increasingly critical tasks, ensuring that those systems do not fail becomes increasingly important. Many organizations depend heavily on desktop computers for day-to-day operations. Unfortunately, the software that runs on these computers is written by humans and, as such, is still subject to human error and consequent failure. A natural solution is to use statistical machine learning to predict failure. However, since failure is still a relatively rare event, obtaining labelled training data to train these models is not a trivial task. This work presents new simulated fault-inducing loads that extend the focus of traditional fault injection techniques to predict failure in the Microsoft enterprise authentication service and Apache web server. These new fault loads were successful in creating failure conditions that were identifiable using statistical learning methods, with fewer irrelevant faults being created.

Partitioning in Avionics Architectures: Requirements, Mechanisms, and Assurance

NASA Technical Reports Server (NTRS)

Rushby, John

1999-01-01

Automated aircraft control has traditionally been divided into distinct "functions" that are implemented separately (e.g., autopilot, autothrottle, flight management); each function has its own fault-tolerant computer system, and dependencies among different functions are generally limited to the exchange of sensor and control data. A by-product of this "federated" architecture is that faults are strongly contained within the computer system of the function where they occur and cannot readily propagate to affect the operation of other functions. More modern avionics architectures contemplate supporting multiple functions on a single, shared, fault-tolerant computer system where natural fault containment boundaries are less sharply defined. Partitioning uses appropriate hardware and software mechanisms to restore strong fault containment to such integrated architectures. This report examines the requirements for partitioning, mechanisms for their realization, and issues in providing assurance for partitioning. Because partitioning shares some concerns with computer security, security models are reviewed and compared with the concerns of partitioning.
Comparison of the Structurally Controlled Landslides Numerical Model Results to the M 7.2 2013 Bohol Earthquake Co-seismic Landslides

NASA Astrophysics Data System (ADS)

Macario Galang, Jan Albert; Narod Eco, Rodrigo; Mahar Francisco Lagmay, Alfredo

2015-04-01

The M 7.2 October 15, 2013 Bohol earthquake is the most destructive earthquake to hit the Philippines since 2012. The epicenter was located in Sagbayan municipality, central Bohol and was generated by a previously unmapped reverse fault called the "Inabanga Fault". Its name, taken after the barangay (village) where the fault is best exposed and was first seen. The earthquake resulted in 209 fatalities and over 57 billion USD worth of damages. The earthquake generated co-seismic landslides most of which were related to fault structures. Unlike rainfall induced landslides, the trigger for co-seismic landslides happen without warning. Preparedness against this type of landslide therefore, relies heavily on the identification of fracture-related unstable slopes. To mitigate the impacts of co-seismic landslide hazards, morpho-structural orientations or discontinuity sets were mapped in the field with the aid of a 2012 IFSAR Digital Terrain Model (DTM) with 5-meter pixel resolution and < 0.5 meter vertical accuracy. Coltop 3D software was then used to identify similar structures including measurement of their dip and dip directions. The chosen discontinuity sets were then keyed into Matterocking software to identify potential rock slide zones due to planar or wedged discontinuities. After identifying the structurally-controlled unstable slopes, the rock mass propagation extent of the possible rock slides was simulated using Conefall. The results were compared to a post-earthquake landslide inventory of 456 landslides. Out the total number of landslides identified from post-earthquake high-resolution imagery, 366 or 80% intersect the structural-controlled hazard areas of Bohol. The results show the potential of this method to identify co-seismic landslide hazard areas for disaster mitigation. Along with computer methods to simulate shallow landslides, and debris flow paths, located structurally-controlled unstable zones can be used to mark unsafe areas for settlement. The method can be further improved with the use of Lidar DTMs, which has better accuracy than the IFSAR DTM. A nationwide effort under DOST-Project NOAH (DREAM-LIDAR) is underway, to map the Philippine archipelago using Lidar.
A Testbed for Evaluating Lunar Habitat Autonomy Architectures

NASA Technical Reports Server (NTRS)

Lawler, Dennis G.

2008-01-01

A lunar outpost will involve a habitat with an integrated set of hardware and software that will maintain a safe environment for human activities. There is a desire for a paradigm shift whereby crew will be the primary mission operators, not ground controllers. There will also be significant periods when the outpost is uncrewed. This will require that significant automation software be resident in the habitat to maintain all system functions and respond to faults. JSC is developing a testbed to allow for early testing and evaluation of different autonomy architectures. This will allow evaluation of different software configurations in order to: 1) understand different operational concepts; 2) assess the impact of failures and perturbations on the system; and 3) mitigate software and hardware integration risks. The testbed will provide an environment in which habitat hardware simulations can interact with autonomous control software. Faults can be injected into the simulations and different mission scenarios can be scripted. The testbed allows for logging, replaying and re-initializing mission scenarios. An initial testbed configuration has been developed by combining an existing life support simulation and an existing simulation of the space station power distribution system. Results from this initial configuration will be presented along with suggested requirements and designs for the incremental development of a more sophisticated lunar habitat testbed.
A Thermal Expert System (TEXSYS) development overview - AI-based control of a Space Station prototype thermal bus

NASA Technical Reports Server (NTRS)

Glass, B. J.; Hack, E. C.

1990-01-01

A knowledge-based control system for real-time control and fault detection, isolation and recovery (FDIR) of a prototype two-phase Space Station Freedom external thermal control system (TCS) is discussed in this paper. The Thermal Expert System (TEXSYS) has been demonstrated in recent tests to be capable of both fault anticipation and detection and real-time control of the thermal bus. Performance requirements were achieved by using a symbolic control approach, layering model-based expert system software on a conventional numerical data acquisition and control system. The model-based capabilities of TEXSYS were shown to be advantageous during software development and testing. One representative example is given from on-line TCS tests of TEXSYS. The integration and testing of TEXSYS with a live TCS testbed provides some insight on the use of formal software design, development and documentation methodologies to qualify knowledge-based systems for on-line or flight applications.
Model-based reasoning for power system management using KATE and the SSM/PMAD

NASA Technical Reports Server (NTRS)

Morris, Robert A.; Gonzalez, Avelino J.; Carreira, Daniel J.; Mckenzie, F. D.; Gann, Brian

1993-01-01

The overall goal of this research effort has been the development of a software system which automates tasks related to monitoring and controlling electrical power distribution in spacecraft electrical power systems. The resulting software system is called the Intelligent Power Controller (IPC). The specific tasks performed by the IPC include continuous monitoring of the flow of power from a source to a set of loads, fast detection of anomalous behavior indicating a fault to one of the components of the distribution systems, generation of diagnosis (explanation) of anomalous behavior, isolation of faulty object from remainder of system, and maintenance of flow of power to critical loads and systems (e.g. life-support) despite fault conditions being present (recovery). The IPC system has evolved out of KATE (Knowledge-based Autonomous Test Engineer), developed at NASA-KSC. KATE consists of a set of software tools for developing and applying structure and behavior models to monitoring, diagnostic, and control applications.
Autonomous Cryogenics Loading Operations Simulation Software: Knowledgebase Autonomous Test Engineer

NASA Technical Reports Server (NTRS)

Wehner, Walter S.

2012-01-01

The Simulation Software, KATE (Knowledgebase Autonomous Test Engineer), is used to demonstrate the automatic identification of faults in a system. The ACLO (Autonomous Cryogenics Loading Operation) project uses KATE to monitor and find faults in the loading of the cryogenics int o a vehicle fuel tank. The KATE software interfaces with the IHM (Integrated Health Management) systems bus to communicate with other systems that are part of ACLO. One system that KATE uses the IHM bus to communicate with is AIS (Advanced Inspection System). KATE will send messages to AIS when there is a detected anomaly. These messages include visual inspection of specific valves, pressure gauges and control messages to have AIS open or close manual valves. My goals include implementing the connection to the IHM bus within KATE and for the AIS project. I will also be working on implementing changes to KATE's Ul and implementing the physics objects in KATE that will model portions of the cryogenics loading operation.
Evaluation of an expert system for fault detection, isolation, and recovery in the manned maneuvering unit

NASA Technical Reports Server (NTRS)

Rushby, John; Crow, Judith

1990-01-01

The authors explore issues in the specification, verification, and validation of artificial intelligence (AI) based software, using a prototype fault detection, isolation and recovery (FDIR) system for the Manned Maneuvering Unit (MMU). They use this system as a vehicle for exploring issues in the semantics of C-Language Integrated Production System (CLIPS)-style rule-based languages, the verification of properties relating to safety and reliability, and the static and dynamic analysis of knowledge based systems. This analysis reveals errors and shortcomings in the MMU FDIR system and raises a number of issues concerning software engineering in CLIPs. The authors came to realize that the MMU FDIR system does not conform to conventional definitions of AI software, despite the fact that it was intended and indeed presented as an AI system. The authors discuss this apparent disparity and related questions such as the role of AI techniques in space and aircraft operations and the suitability of CLIPS for critical applications.
MAX - An advanced parallel computer for space applications

NASA Technical Reports Server (NTRS)

Lewis, Blair F.; Bunker, Robert L.

1991-01-01

MAX is a fault-tolerant multicomputer hardware and software architecture designed to meet the needs of NASA spacecraft systems. It consists of conventional computing modules (computers) connected via a dual network topology. One network is used to transfer data among the computers and between computers and I/O devices. This network's topology is arbitrary. The second network operates as a broadcast medium for operating system synchronization messages and supports the operating system's Byzantine resilience. A fully distributed operating system supports multitasking in an asynchronous event and data driven environment. A large grain dataflow paradigm is used to coordinate the multitasking and provide easy control of concurrency. It is the basis of the system's fault tolerance and allows both static and dynamical location of tasks. Redundant execution of tasks with software voting of results may be specified for critical tasks. The dataflow paradigm also supports simplified software design, test and maintenance. A unique feature is a method for reliably patching code in an executing dataflow application.
Advanced diagnostic system for piston slap faults in IC engines, based on the non-stationary characteristics of the vibration signals

NASA Astrophysics Data System (ADS)

Chen, Jian; Randall, Robert Bond; Peeters, Bart

2016-06-01

Artificial Neural Networks (ANNs) have the potential to solve the problem of automated diagnostics of piston slap faults, but the critical issue for the successful application of ANN is the training of the network by a large amount of data in various engine conditions (different speed/load conditions in normal condition, and with different locations/levels of faults). On the other hand, the latest simulation technology provides a useful alternative in that the effect of clearance changes may readily be explored without recourse to cutting metal, in order to create enough training data for the ANNs. In this paper, based on some existing simplified models of piston slap, an advanced multi-body dynamic simulation software was used to simulate piston slap faults with different speeds/loads and clearance conditions. Meanwhile, the simulation models were validated and updated by a series of experiments. Three-stage network systems are proposed to diagnose piston faults: fault detection, fault localisation and fault severity identification. Multi Layer Perceptron (MLP) networks were used in the detection stage and severity/prognosis stage and a Probabilistic Neural Network (PNN) was used to identify which cylinder has faults. Finally, it was demonstrated that the networks trained purely on simulated data can efficiently detect piston slap faults in real tests and identify the location and severity of the faults as well.
Identification of active fault using analysis of derivatives with vertical second based on gravity anomaly data (Case study: Seulimeum fault in Sumatera fault system)

NASA Astrophysics Data System (ADS)

Hududillah, Teuku Hafid; Simanjuntak, Andrean V. H.; Husni, Muhammad

2017-07-01

Gravity is a non-destructive geophysical technique that has numerous application in engineering and environmental field like locating a fault zone. The purpose of this study is to spot the Seulimeum fault system in Iejue, Aceh Besar (Indonesia) by using a gravity technique and correlate the result with geologic map and conjointly to grasp a trend pattern of fault system. An estimation of subsurface geological structure of Seulimeum fault has been done by using gravity field anomaly data. Gravity anomaly data which used in this study is from Topex that is processed up to Free Air Correction. The step in the Next data processing is applying Bouger correction and Terrin Correction to obtain complete Bouger anomaly that is topographically dependent. Subsurface modeling is done using the Gav2DC for windows software. The result showed a low residual gravity value at a north half compared to south a part of study space that indicated a pattern of fault zone. Gravity residual was successfully correlate with the geologic map that show the existence of the Seulimeum fault in this study space. The study of earthquake records can be used for differentiating the active and non active fault elements, this gives an indication that the delineated fault elements are active.
Fleet-Wide Prognostic and Health Management Suite: Asset Fault Signature Database

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vivek Agarwal; Nancy J. Lybeck; Randall Bickford

Proactive online monitoring in the nuclear industry is being explored using the Electric Power Research Institute’s Fleet-Wide Prognostic and Health Management (FW-PHM) Suite software. The FW-PHM Suite is a set of web-based diagnostic and prognostic tools and databases that serves as an integrated health monitoring architecture. The FW-PHM Suite has four main modules: (1) Diagnostic Advisor, (2) Asset Fault Signature (AFS) Database, (3) Remaining Useful Life Advisor, and (4) Remaining Useful Life Database. The paper focuses on the AFS Database of the FW-PHM Suite, which is used to catalog asset fault signatures. A fault signature is a structured representation ofmore » the information that an expert would use to first detect and then verify the occurrence of a specific type of fault. The fault signatures developed to assess the health status of generator step-up transformers are described in the paper. The developed fault signatures capture this knowledge and implement it in a standardized approach, thereby streamlining the diagnostic and prognostic process. This will support the automation of proactive online monitoring techniques in nuclear power plants to diagnose incipient faults, perform proactive maintenance, and estimate the remaining useful life of assets.« less
Effect of off-fault low-velocity elastic inclusions on supershear rupture dynamics

NASA Astrophysics Data System (ADS)

Ma, Xiao; Elbanna, A. E.

2015-10-01

Heterogeneous velocity structures are expected to affect fault rupture dynamics. To quantitatively evaluate some of these effects, we examine a model of dynamic rupture on a frictional fault embedded in an elastic full space, governed by plane strain elasticity, with a pair of off-fault inclusions that have a lower rigidity than the background medium. We solve the elastodynamic problem using the Finite Element software Pylith. The fault operates under linear slip-weakening friction law. We initiate the rupture by artificially overstressing a localized region near the left edge of the fault. We primarily consider embedded soft inclusions with 20 per cent reduction in both the pressure wave and shear wave speeds. The embedded inclusions are placed at different distances from the fault surface and have different sizes. We show that the existence of a soft inclusion may significantly shorten the transition length to supershear propagation through the Burridge-Andrews mechanism. We also observe that supershear rupture is generated at pre-stress values that are lower than what is theoretically predicted for a homogeneous medium. We discuss the implications of our results for dynamic rupture propagation in complex velocity structures as well as supershear propagation on understressed faults.
The correlation of 2D-resistivity and magnetic methods in fault verification at northern Sumatra, Indonesia

NASA Astrophysics Data System (ADS)

Kamaruddin, Nur Aminuda; Saad, Rosli; Nordiana, M. M.; Azwin, I. N.

2015-04-01

The Great Sumatra Fault system was split into two sub-parallel lines or segments at the Northern Sumatra. This event is one of the impacts of powerful earthquakes that hit Sumatra Island especially one that occurred in 2004. These two sub-parallel segments known as Aceh and Seulimeum fault. The study is focused on the Seulimeum fault and two geophysical methods chosen aimed to compare and verified the result obtained respectively. 2-D resistivity method is a common geophysical method used in determination of near surface structures such as faults, cavities, voids and sinkholes. Meanwhile, the magnetic method often chosen to delineate subsurface structures, determine depth of magnetic source bodies and possibly sediment thickness. Three survey lines of resistivity method and randomly magnetic stations were carried out covering Krueng district. The resistivity data processed using Res2Dinv and result presented using Surfer software. The fault identified by the contrast of low and high resistivity value. Meanwhile, the magnetic data were presented in magnetic residual contour map and the extended fault system is suspected represent by the contrast value of the magnetic anomalies. Within suspected fault zone, the results of resistivity are tally with magnetic result.
Using Combined SFTA and SFMECA Techniques for Space Critical Software

NASA Astrophysics Data System (ADS)

Nicodemos, F. G.; Lahoz, C. H. N.; Abdala, M. A. D.; Saotome, O.

2012-01-01

This work addresses the combined Software Fault Tree Analysis (SFTA) and Software Failure Modes, Effects and Criticality Analysis (SFMECA) techniques applied to space critical software of satellite launch vehicles. The combined approach is under research as part of the Verification and Validation (V&V) efforts to increase software dependability and as future application in other projects under development at Instituto de Aeronáutica e Espaço (IAE). The applicability of such approach was conducted on system software specification and applied to a case study based on the Brazilian Satellite Launcher (VLS). The main goal is to identify possible failure causes and obtain compensating provisions that lead to inclusion of new functional and non-functional system software requirements.
Continuous Fine-Fault Estimation with Real-Time GNSS

NASA Astrophysics Data System (ADS)

Norford, B. B.; Melbourne, T. I.; Szeliga, W. M.; Santillan, V. M.; Scrivner, C.; Senko, J.; Larsen, D.

2017-12-01

Thousands of real-time telemetered GNSS stations operate throughout the circum-Pacific that may be used for rapid earthquake characterization and estimation of local tsunami excitation. We report on the development of a GNSS-based finite-fault inversion system that continuously estimates slip using real-time GNSS position streams from the Cascadia subduction zone and which is being expanded throughout the circum-Pacific. The system uses 1 Hz precise point position streams computed in the ITRF14 reference frame using clock and satellite orbit corrections from the IGS. The software is implemented as seven independent modules that filter time series using Kalman filters, trigger and estimate coseismic offsets, invert for slip using a non-negative least squares method developed by Lawson and Hanson (1974) and elastic half-space Green's Functions developed by Okada (1985), smooth the results temporally and spatially, and write the resulting streams of time-dependent slip to a RabbitMQ messaging server for use by downstream modules such as tsunami excitation modules. Additional fault models can be easily added to the system for other circum-Pacific subduction zones as additional real-time GNSS data become available. The system is currently being tested using data from well-recorded earthquakes including the 2011 Tohoku earthquake, the 2010 Maule earthquake, the 2015 Illapel earthquake, the 2003 Tokachi-oki earthquake, the 2014 Iquique earthquake, the 2010 Mentawai earthquake, the 2016 Kaikoura earthquake, the 2016 Ecuador earthquake, the 2015 Gorkha earthquake, and others. Test data will be fed to the system and the resultant earthquake characterizations will be compared with published earthquake parameters. Seismic events will be assumed to occur on major faults, so, for example, only the San Andreas fault will be considered in Southern California, while the hundreds of other faults in the region will be ignored. Rake will be constrained along each subfault to be consistent with NUVEL-1 plate convergence directions. This software provides a basis for a GNSS-based rapid earthquake finite fault estimation system with global scope.
Design and evaluation of a fault-tolerant multiprocessor using hardware recovery blocks

NASA Technical Reports Server (NTRS)

Lee, Y. H.; Shin, K. G.

1982-01-01

A fault-tolerant multiprocessor with a rollback recovery mechanism is discussed. The rollback mechanism is based on the hardware recovery block which is a hardware equivalent to the software recovery block. The hardware recovery block is constructed by consecutive state-save operations and several state-save units in every processor and memory module. When a fault is detected, the multiprocessor reconfigures itself to replace the faulty component and then the process originally assigned to the faulty component retreats to one of the previously saved states in order to resume fault-free execution. A mathematical model is proposed to calculate both the coverage of multi-step rollback recovery and the risk of restart. A performance evaluation in terms of task execution time is also presented.
Havens: Explicit Reliable Memory Regions for HPC Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hukerikar, Saurabh; Engelmann, Christian

2016-01-01

Supporting error resilience in future exascale-class supercomputing systems is a critical challenge. Due to transistor scaling trends and increasing memory density, scientific simulations are expected to experience more interruptions caused by transient errors in the system memory. Existing hardware-based detection and recovery techniques will be inadequate to manage the presence of high memory fault rates. In this paper we propose a partial memory protection scheme based on region-based memory management. We define the concept of regions called havens that provide fault protection for program objects. We provide reliability for the regions through a software-based parity protection mechanism. Our approach enablesmore » critical program objects to be placed in these havens. The fault coverage provided by our approach is application agnostic, unlike algorithm-based fault tolerance techniques.« less
Portable Automated Test Station: Using Engineering-Design Partnerships to Replace Obsolete Test Systems

DTIC Science & Technology

2015-04-01

troubleshooting avionics system faults while the aircraft is on the ground. The core component of the PATS-30, the ruggedized laptop, is no longer sustainable...as well as trouble shooting avionics system faults while the aircraft is on the ground. The PATS-70 utilizes up-to-date, sustainable technology for...Operational Flight Program (OFP) software loading and diagnostic avionics system testing and includes additional TPSs to enhance its capability
MAGMA: A Liquid Software Approach to Fault Tolerance, Computer Network Security, and Survivable Networking

DTIC Science & Technology

2001-12-01

and Lieutenant Namik Kaplan , Turkish Navy. Maj Tiefert’s thesis, “Modeling Control Channel Dynamics of SAAM using NS Network Simulation”, helped lay...DEC99] Deconinck , Dr. ir. Geert, Fault Tolerant Systems, ESAT / Division ACCA , Katholieke Universiteit Leuven, October 1999. [FRE00] Freed...Systems”, Addison-Wesley, 1989. [KAP99] Kaplan , Namik, “Prototyping of an Active and Lightweight Router,” March 1999 [KAT99] Kati, Effraim
Flight test results of the strapdown hexad inertial reference unit (SIRU). Volume 2: Test report

NASA Technical Reports Server (NTRS)

Hruby, R. J.; Bjorkman, W. S.

1977-01-01

Results of flight tests of the Strapdown Inertial Reference Unit (SIRU) navigation system are presented. The fault tolerant SIRU navigation system features a redundant inertial sensor unit and dual computers. System software provides for detection and isolation of inertial sensor failures and continued operation in the event of failures. Flight test results include assessments of the system's navigational performance and fault tolerance. Performance shortcomings are analyzed.

Towards an operational fault isolation expert system for French telecommunication satellite Telecom 2

NASA Astrophysics Data System (ADS)

Haziza, M.

1990-10-01

The DIAMS satellite fault isolation expert system shell concept is described. The project, initiated in 1985, has led to the development of a prototype Expert System (ES) dedicated to the Telecom 1 attitude and orbit control system. The prototype ES has been installed in the Telecom 1 satellite control center and evaluated by Telecom 1 operations. The development of a fault isolation ES covering a whole spacecraft (the French telecommunication satellite Telecom 2) is currently being undertaken. Full scale industrial applications raise stringent requirements in terms of knowledge management and software development methodology. The approach used by MATRA ESPACE to face this challenge is outlined.
Multicore Considerations for Legacy Flight Software Migration

NASA Technical Reports Server (NTRS)

Vines, Kenneth; Day, Len

2013-01-01

In this paper we will discuss potential benefits and pitfalls when considering a migration from an existing single core code base to a multicore processor implementation. The results of this study present options that should be considered before migrating fault managers, device handlers and tasks with time-constrained requirements to a multicore flight software environment. Possible future multicore test bed demonstrations are also discussed.
NASA Tech Briefs, February 2004

NASA Technical Reports Server (NTRS)

2004-01-01

Topics include: Simulation Testing of Embedded Flight Software; Improved Indentation Test for Measuring Nonlinear Elasticity; Ultraviolet-Absorption Spectroscopic Biofilm Monitor; Electronic Tongue for Quantitation of Contaminants in Water; Radar for Measuring Soil Moisture Under Vegetation; Modular Wireless Data-Acquisition and Control System; Microwave System for Detecting Ice on Aircraft; Routing Algorithm Exploits Spatial Relations; Two-Finger EKG Method of Detecting Evasive Responses; Updated System-Availability and Resource-Allocation Program; Routines for Computing Pressure Drops in Venturis; Software for Fault-Tolerant Matrix Multiplication; Reproducible Growth of High-Quality Cubic-SiC Layers; Nonlinear Thermoelastic Model for SMAs and SMA Hybrid Composites; Liquid-Crystal Thermosets, a New Generation of High-Performance Liquid-Crystal Polymers; Formulations for Stronger Solid Oxide Fuel-Cell Electrolytes; Simulation of Hazards and Poses for a Rocker-Bogie Rover; Autonomous Formation Flight; Expandable Purge Chambers Would Protect Cryogenic Fittings; Wavy-Planform Helicopter Blades Make Less Noise; Miniature Robotic Spacecraft for Inspecting Other Spacecraft; Miniature Ring-Shaped Peristaltic Pump; Compact Plasma Accelerator; Improved Electrohydraulic Linear Actuators; A Software Architecture for Semiautonomous Robot Control; Fabrication of Channels for Nanobiotechnological Devices; Improved Thin, Flexible Heat Pipes; Miniature Radioisotope Thermoelectric Power Cubes; Permanent Sequestration of Emitted Gases in the Form of Clathrate Hydrates; Electrochemical, H2O2-Boosted Catalytic Oxidation System; Electrokinetic In Situ Treatment of Metal-Contaminated Soil; Pumping Liquid Oxygen by Use of Pulsed Magnetic Fields; Magnetocaloric Pumping of Liquid Oxygen; Tailoring Ion-Thruster Grid Apertures for Greater Efficiency; and Lidar for Guidance of a Spacecraft or Exploratory Robot.
Surface Rupture Map of the 2002 M7.9 Denali Fault Earthquake, Alaska: Digital Data

USGS Publications Warehouse

Haeussler, Peter J.

2009-01-01

The November 3, 2002, Mw7.9 Denali Fault earthquake produced about 340 km of surface rupture along the Susitna Glacier Thrust Fault and the right-lateral, strike-slip Denali and Totschunda Faults. Digital photogrammetric methods were primarily used to create a 1:500-scale, three-dimensional surface rupture map, and 1:6,000-scale aerial photographs were used for three-dimensional digitization in ESRI's ArcMap GIS software, using Leica's StereoAnalyst plug in. Points were digitized 4.3 m apart, on average, for the entire surface rupture. Earthquake-induced landslides, sackungen, and unruptured Holocene fault scarps on the eastern Denali Fault were also digitized where they lay within the limits of air photo coverage. This digital three-dimensional fault-trace map is superior to traditional maps in terms of relative and absolute accuracy, completeness, and detail and is used as a basis for three-dimensional visualization. Field work complements the air photo observations in locations of dense vegetation, on bedrock, or in areas where the surface trace is weakly developed. Seventeen km of the fault trace, which broke through glacier ice, were not digitized in detail due to time constraints, and air photos missed another 10 km of fault rupture through the upper Black Rapids Glacier, so that was not mapped in detail either.
Development of Asset Fault Signatures for Prognostic and Health Management in the Nuclear Industry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vivek Agarwal; Nancy J. Lybeck; Randall Bickford

2014-06-01

Proactive online monitoring in the nuclear industry is being explored using the Electric Power Research Institute’s Fleet-Wide Prognostic and Health Management (FW-PHM) Suite software. The FW-PHM Suite is a set of web-based diagnostic and prognostic tools and databases that serves as an integrated health monitoring architecture. The FW-PHM Suite has four main modules: Diagnostic Advisor, Asset Fault Signature (AFS) Database, Remaining Useful Life Advisor, and Remaining Useful Life Database. This paper focuses on development of asset fault signatures to assess the health status of generator step-up generators and emergency diesel generators in nuclear power plants. Asset fault signatures describe themore » distinctive features based on technical examinations that can be used to detect a specific fault type. At the most basic level, fault signatures are comprised of an asset type, a fault type, and a set of one or more fault features (symptoms) that are indicative of the specified fault. The AFS Database is populated with asset fault signatures via a content development exercise that is based on the results of intensive technical research and on the knowledge and experience of technical experts. The developed fault signatures capture this knowledge and implement it in a standardized approach, thereby streamlining the diagnostic and prognostic process. This will support the automation of proactive online monitoring techniques in nuclear power plants to diagnose incipient faults, perform proactive maintenance, and estimate the remaining useful life of assets.« less
On the nature of bias and defects in the software specification process

NASA Technical Reports Server (NTRS)

Straub, Pablo A.; Zelkowitz, Marvin V.

1992-01-01

Implementation bias in a specification is an arbitrary constraint in the solution space. This paper describes the problem of bias. Additionally, this paper presents a model of the specification and design processes describing individual subprocesses in terms of precision/detail diagrams and a model of bias in multi-attribute software specifications. While studying how bias is introduced into a specification we realized that software defects and bias are dual problems of a single phenomenon. This was used to explain the large proportion of faults found during the coding phase at the Software Engineering Laboratory at NASA/GSFC.
Solar Photovoltaic (PV) Distributed Generation Systems - Control and Protection

NASA Astrophysics Data System (ADS)

Yi, Zhehan

This dissertation proposes a comprehensive control, power management, and fault detection strategy for solar photovoltaic (PV) distribution generations. Battery storages are typically employed in PV systems to mitigate the power fluctuation caused by unstable solar irradiance. With AC and DC loads, a PV-battery system can be treated as a hybrid microgrid which contains both DC and AC power resources and buses. In this thesis, a control power and management system (CAPMS) for PV-battery hybrid microgrid is proposed, which provides 1) the DC and AC bus voltage and AC frequency regulating scheme and controllers designed to track set points; 2) a power flow management strategy in the hybrid microgrid to achieve system generation and demand balance in both grid-connected and islanded modes; 3) smooth transition control during grid reconnection by frequency and phase synchronization control between the main grid and microgrid. Due to the increasing demands for PV power, scales of PV systems are getting larger and fault detection in PV arrays becomes challenging. High-impedance faults, low-mismatch faults, and faults occurred in low irradiance conditions tend to be hidden due to low fault currents, particularly, when a PV maximum power point tracking (MPPT) algorithm is in-service. If remain undetected, these faults can considerably lower the output energy of solar systems, damage the panels, and potentially cause fire hazards. In this dissertation, fault detection challenges in PV arrays are analyzed in depth, considering the crossing relations among the characteristics of PV, interactions with MPPT algorithms, and the nature of solar irradiance. Two fault detection schemes are then designed as attempts to address these technical issues, which detect faults inside PV arrays accurately even under challenging circumstances, e.g., faults in low irradiance conditions or high-impedance faults. Taking advantage of multi-resolution signal decomposition (MSD), a powerful signal processing technique based on discrete wavelet transformation (DWT), the first attempt is devised, which extracts the features of both line-to-line (L-L) and line-to-ground (L-G) faults and employs a fuzzy inference system (FIS) for the decision-making stage of fault detection. This scheme is then improved as the second attempt by further studying the system's behaviors during L-L faults, extracting more efficient fault features, and devising a more advanced decision-making stage: the two-stage support vector machine (SVM). For the first time, the two-stage SVM method is proposed in this dissertation to detect L-L faults in PV system with satisfactory accuracies. Numerous simulation and experimental case studies are carried out to verify the proposed control and protection strategies. Simulation environment is set up using the PSCAD/EMTDC and Matlab/Simulink software packages. Experimental case studies are conducted in a PV-battery hybrid microgrid using the dSPACE real-time controller to demonstrate the ease of hardware implementation and the controller performance. Another small-scale grid-connected PV system is set up to verify both fault detection algorithms which demonstrate promising performances and fault detecting accuracies.
Use of Soft Computing Technologies for a Qualitative and Reliable Engine Control System for Propulsion Systems

NASA Technical Reports Server (NTRS)

Trevino, Luis; Brown, Terry; Crumbley, R. T. (Technical Monitor)

2001-01-01

The problem to be addressed in this paper is to explore how the use of Soft Computing Technologies (SCT) could be employed to improve overall vehicle system safety, reliability, and rocket engine performance by development of a qualitative and reliable engine control system (QRECS). Specifically, this will be addressed by enhancing rocket engine control using SCT, innovative data mining tools, and sound software engineering practices used in Marshall's Flight Software Group (FSG). The principle goals for addressing the issue of quality are to improve software management, software development time, software maintenance, processor execution, fault tolerance and mitigation, and nonlinear control in power level transitions. The intent is not to discuss any shortcomings of existing engine control methodologies, but to provide alternative design choices for control, implementation, performance, and sustaining engineering, all relative to addressing the issue of reliability. The approaches outlined in this paper will require knowledge in the fields of rocket engine propulsion (system level), software engineering for embedded flight software systems, and soft computing technologies (i.e., neural networks, fuzzy logic, data mining, and Bayesian belief networks); some of which are briefed in this paper. For this effort, the targeted demonstration rocket engine testbed is the MC-1 engine (formerly FASTRAC) which is simulated with hardware and software in the Marshall Avionics & Software Testbed (MAST) laboratory that currently resides at NASA's Marshall Space Flight Center, building 4476, and is managed by the Avionics Department. A brief plan of action for design, development, implementation, and testing a Phase One effort for QRECS is given, along with expected results. Phase One will focus on development of a Smart Start Engine Module and a Mainstage Engine Module for proper engine start and mainstage engine operations. The overall intent is to demonstrate that by employing soft computing technologies, the quality and reliability of the overall scheme to engine controller development is further improved and vehicle safety is further insured. The final product that this paper proposes is an approach to development of an alternative low cost engine controller that would be capable of performing in unique vision spacecraft vehicles requiring low cost advanced avionics architectures for autonomous operations from engine pre-start to engine shutdown.
SCEC-VDO: A New 3-Dimensional Visualization and Movie Making Software for Earth Science Data

NASA Astrophysics Data System (ADS)

Milner, K. R.; Sanskriti, F.; Yu, J.; Callaghan, S.; Maechling, P. J.; Jordan, T. H.

2016-12-01

Researchers and undergraduate interns at the Southern California Earthquake Center (SCEC) have created a new 3-dimensional (3D) visualization software tool called SCEC Virtual Display of Objects (SCEC-VDO). SCEC-VDO is written in Java and uses the Visualization Toolkit (VTK) backend to render 3D content. SCEC-VDO offers advantages over existing 3D visualization software for viewing georeferenced data beneath the Earth's surface. Many popular visualization packages, such as Google Earth, restrict the user to views of the Earth from above, obstructing views of geological features such as faults and earthquake hypocenters at depth. SCEC-VDO allows the user to view data both above and below the Earth's surface at any angle. It includes tools for viewing global earthquakes from the U.S. Geological Survey, faults from the SCEC Community Fault Model, and results from the latest SCEC models of earthquake hazards in California including UCERF3 and RSQSim. Its object-oriented plugin architecture allows for the easy integration of new regional and global datasets, regardless of the science domain. SCEC-VDO also features rich animation capabilities, allowing users to build a timeline with keyframes of camera position and displayed data. The software is built with the concept of statefulness, allowing for reproducibility and collaboration using an xml file. A prior version of SCEC-VDO, which began development in 2005 under the SCEC Undergraduate Studies in Earthquake Information Technology internship, used the now unsupported Java3D library. Replacing Java3D with the widely supported and actively developed VTK libraries not only ensures that SCEC-VDO can continue to function for years to come, but allows for the export of 3D scenes to web viewers and popular software such as Paraview. SCEC-VDO runs on all recent 64-bit Windows, Mac OS X, and Linux systems with Java 8 or later. More information, including downloads, tutorials, and example movies created fully within SCEC-VDO is available here: http://scecvdo.usc.edu
Onboard Sensor Data Qualification in Human-Rated Launch Vehicles

NASA Technical Reports Server (NTRS)

Wong, Edmond; Melcher, Kevin J.; Maul, William A.; Chicatelli, Amy K.; Sowers, Thomas S.; Fulton, Christopher; Bickford, Randall

2012-01-01

The avionics system software for human-rated launch vehicles requires an implementation approach that is robust to failures, especially the failure of sensors used to monitor vehicle conditions that might result in an abort determination. Sensor measurements provide the basis for operational decisions on human-rated launch vehicles. This data is often used to assess the health of system or subsystem components, to identify failures, and to take corrective action. An incorrect conclusion and/or response may result if the sensor itself provides faulty data, or if the data provided by the sensor has been corrupted. Operational decisions based on faulty sensor data have the potential to be catastrophic, resulting in loss of mission or loss of crew. To prevent these later situations from occurring, a Modular Architecture and Generalized Methodology for Sensor Data Qualification in Human-rated Launch Vehicles has been developed. Sensor Data Qualification (SDQ) is a set of algorithms that can be implemented in onboard flight software, and can be used to qualify data obtained from flight-critical sensors prior to the data being used by other flight software algorithms. Qualified data has been analyzed by SDQ and is determined to be a true representation of the sensed system state; that is, the sensor data is determined not to be corrupted by sensor faults or signal transmission faults. Sensor data can become corrupted by faults at any point in the signal path between the sensor and the flight computer. Qualifying the sensor data has the benefit of ensuring that erroneous data is identified and flagged before otherwise being used for operational decisions, thus increasing confidence in the response of the other flight software processes using the qualified data, and decreasing the probability of false alarms or missed detections.
A grid-doubling finite-element technique for calculating dynamic three-dimensional spontaneous rupture on an earthquake fault

USGS Publications Warehouse

Barall, Michael

2009-01-01

We present a new finite-element technique for calculating dynamic 3-D spontaneous rupture on an earthquake fault, which can reduce the required computational resources by a factor of six or more, without loss of accuracy. The grid-doubling technique employs small cells in a thin layer surrounding the fault. The remainder of the modelling volume is filled with larger cells, typically two or four times as large as the small cells. In the resulting non-conforming mesh, an interpolation method is used to join the thin layer of smaller cells to the volume of larger cells. Grid-doubling is effective because spontaneous rupture calculations typically require higher spatial resolution on and near the fault than elsewhere in the model volume. The technique can be applied to non-planar faults by morphing, or smoothly distorting, the entire mesh to produce the desired 3-D fault geometry. Using our FaultMod finite-element software, we have tested grid-doubling with both slip-weakening and rate-and-state friction laws, by running the SCEC/USGS 3-D dynamic rupture benchmark problems. We have also applied it to a model of the Hayward fault, Northern California, which uses realistic fault geometry and rock properties. FaultMod implements fault slip using common nodes, which represent motion common to both sides of the fault, and differential nodes, which represent motion of one side of the fault relative to the other side. We describe how to modify the traction-at-split-nodes method to work with common and differential nodes, using an implicit time stepping algorithm.
NASA integrated vehicle health management technology experiment for X-37

NASA Astrophysics Data System (ADS)

Schwabacher, Mark; Samuels, Jeff; Brownston, Lee

2002-07-01

The NASA Integrated Vehicle Health Management (IVHM) Technology Experiment for X-37 was intended to run IVHM software on board the X-37 spacecraft. The X-37 is an unpiloted vehicle designed to orbit the Earth for up to 21 days before landing on a runway. The objectives of the experiment were to demonstrate the benefits of in-flight IVHM to the operation of a Reusable Launch Vehicle, to advance the Technology Readiness Level of this IVHM technology within a flight environment, and to demonstrate that the IVHM software could operate on the Vehicle Management Computer. The scope of the experiment was to perform real-time fault detection and isolation for X-37's electrical power system and electro-mechanical actuators. The experiment used Livingstone, a software system that performs diagnosis using a qualitative, model-based reasoning approach that searches system-wide interactions to detect and isolate failures. Two of the challenges we faced were to make this research software more efficient so that it would fit within the limited computational resources that were available to us on the X-37 spacecraft, and to modify it so that it satisfied the X-37's software safety requirements. Although the experiment is currently unfunded, the development effort resulted in major improvements in Livingstone's efficiency and safety. This paper reviews some of the details of the modeling and integration efforts, and some of the lessons that were learned.
Insulation detection of electric vehicle batteries

NASA Astrophysics Data System (ADS)

Dai, Qiqi; Zhu, Zhongwen; Huang, Denggao; Du, Mingxing; Wei, Kexin

2018-06-01

In this paper, an electric vehicle insulation detection method with single side switching fixed resistance is designed, and the hardware and software design of the system are given. The experiment proves that the insulation detection system can detect the insulation resistance in a wide range of resistance values, and accurately report the fault level. This system can effectively monitor the insulation fault between the car body and the high voltage line and avoid the passengers from being injured.
Research, Development and Testing of a Fault-Tolerant FPGA-Based Sequencer for CubeSat Launching Applications

DTIC Science & Technology

2013-03-01

amounts of time and effort to implement. Future testing with commercial, fault-tolerant synthesis software, under a radiation environment, will yield ...initial viewpoint of the author is to take the flash-based FPGA route. This will yield a simple, reconfigurable circuit while providing the added...structure seen in Figure 30. Each of these full adder blocks were replaced in subsequent iterations to yield proper comparison with this baseline
Online Monitoring Technical Basis and Analysis Framework for Large Power Transformers; Interim Report for FY 2012

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nancy J. Lybeck; Vivek Agarwal; Binh T. Pham

The Light Water Reactor Sustainability program at Idaho National Laboratory (INL) is actively conducting research to develop and demonstrate online monitoring (OLM) capabilities for active components in existing Nuclear Power Plants. A pilot project is currently underway to apply OLM to Generator Step-Up Transformers (GSUs) and Emergency Diesel Generators (EDGs). INL and the Electric Power Research Institute (EPRI) are working jointly to implement the pilot project. The EPRI Fleet-Wide Prognostic and Health Management (FW-PHM) Software Suite will be used to implement monitoring in conjunction with utility partners: the Shearon Harris Nuclear Generating Station (owned by Duke Energy for GSUs, andmore » Braidwood Generating Station (owned by Exelon Corporation) for EDGs. This report presents monitoring techniques, fault signatures, and diagnostic and prognostic models for GSUs. GSUs are main transformers that are directly connected to generators, stepping up the voltage from the generator output voltage to the highest transmission voltages for supplying electricity to the transmission grid. Technical experts from Shearon Harris are assisting INL and EPRI in identifying critical faults and defining fault signatures associated with each fault. The resulting diagnostic models will be implemented in the FW-PHM Software Suite and tested using data from Shearon-Harris. Parallel research on EDGs is being conducted, and will be reported in an interim report during the first quarter of fiscal year 2013.« less
The SCEC/UseIT Intern Program: Creating Open-Source Visualization Software Using Diverse Resources

NASA Astrophysics Data System (ADS)

Francoeur, H.; Callaghan, S.; Perry, S.; Jordan, T.

2004-12-01

The Southern California Earthquake Center undergraduate IT intern program (SCEC UseIT) conducts IT research to benefit collaborative earth science research. Through this program, interns have developed real-time, interactive, 3D visualization software using open-source tools. Dubbed LA3D, a distribution of this software is now in use by the seismic community. LA3D enables the user to interactively view Southern California datasets and models of importance to earthquake scientists, such as faults, earthquakes, fault blocks, digital elevation models, and seismic hazard maps. LA3D is now being extended to support visualizations anywhere on the planet. The new software, called SCEC-VIDEO (Virtual Interactive Display of Earth Objects), makes use of a modular, plugin-based software architecture which supports easy development and integration of new data sets. Currently SCEC-VIDEO is in beta testing, with a full open-source release slated for the future. Both LA3D and SCEC-VIDEO were developed using a wide variety of software technologies. These, which included relational databases, web services, software management technologies, and 3-D graphics in Java, were necessary to integrate the heterogeneous array of data sources which comprise our software. Currently the interns are working to integrate new technologies and larger data sets to increase software functionality and value. In addition, both LA3D and SCEC-VIDEO allow the user to script and create movies. Thus program interns with computer science backgrounds have been writing software while interns with other interests, such as cinema, geology, and education, have been making movies that have proved of great use in scientific talks, media interviews, and education. Thus, SCEC UseIT incorporates a wide variety of scientific and human resources to create products of value to the scientific and outreach communities. The program plans to continue with its interdisciplinary approach, increasing the relevance of the software and expanding its use in the scientific community.
Comparison between wavelet and wavelet packet transform features for classification of faults in distribution system

NASA Astrophysics Data System (ADS)

Arvind, Pratul

2012-11-01

The ability to identify and classify all ten types of faults in a distribution system is an important task for protection engineers. Unlike transmission system, distribution systems have a complex configuration and are subjected to frequent faults. In the present work, an algorithm has been developed for identifying all ten types of faults in a distribution system by collecting current samples at the substation end. The samples are subjected to wavelet packet transform and artificial neural network in order to yield better classification results. A comparison of results between wavelet transform and wavelet packet transform is also presented thereby justifying the feature extracted from wavelet packet transform yields promising results. It should also be noted that current samples are collected after simulating a 25kv distribution system in PSCAD software.
Integrated Software Health Management for Aircraft GN and C

NASA Technical Reports Server (NTRS)

Schumann, Johann; Mengshoel, Ole

2011-01-01

Modern aircraft rely heavily on dependable operation of many safety-critical software components. Despite careful design, verification and validation (V&V), on-board software can fail with disastrous consequences if it encounters problematic software/hardware interaction or must operate in an unexpected environment. We are using a Bayesian approach to monitor the software and its behavior during operation and provide up-to-date information about the health of the software and its components. The powerful reasoning mechanism provided by our model-based Bayesian approach makes reliable diagnosis of the root causes possible and minimizes the number of false alarms. Compilation of the Bayesian model into compact arithmetic circuits makes SWHM feasible even on platforms with limited CPU power. We show initial results of SWHM on a small simulator of an embedded aircraft software system, where software and sensor faults can be injected.
Research and design of portable photoelectric rotary table data-acquisition and analysis system

NASA Astrophysics Data System (ADS)

Yang, Dawei; Yang, Xiufang; Han, Junfeng; Yan, Xiaoxu

2015-02-01

Photoelectric rotary table as the main test tracking measurement platform, widely use in shooting range and aerospace fields. In the range of photoelectric tracking measurement system, in order to meet the photoelectric testing instruments and equipment of laboratory and field application demand, research and design the portable photoelectric rotary table data acquisition and analysis system, and introduces the FPGA device based on Xilinx company Virtex-4 series and its peripheral module of the system hardware design, and the software design of host computer in VC++ 6.0 programming platform and MFC package based on class libraries. The data acquisition and analysis system for data acquisition, display and storage, commission control, analysis, laboratory wave playback, transmission and fault diagnosis, and other functions into an organic whole, has the advantages of small volume, can be embedded, high speed, portable, simple operation, etc. By photoelectric tracking turntable as experimental object, carries on the system software and hardware alignment, the experimental results show that the system can realize the data acquisition, analysis and processing of photoelectric tracking equipment and control of turntable debugging good, and measurement results are accurate, reliable and good maintainability and extensibility. The research design for advancing the photoelectric tracking measurement equipment debugging for diagnosis and condition monitoring and fault analysis as well as the standardization and normalization of the interface and improve the maintainability of equipment is of great significance, and has certain innovative and practical value.
Mechatronics technology in predictive maintenance method

NASA Astrophysics Data System (ADS)

Majid, Nurul Afiqah A.; Muthalif, Asan G. A.

2017-11-01

This paper presents recent mechatronics technology that can help to implement predictive maintenance by combining intelligent and predictive maintenance instrument. Vibration Fault Simulation System (VFSS) is an example of mechatronics system. The focus of this study is the prediction on the use of critical machines to detect vibration. Vibration measurement is often used as the key indicator of the state of the machine. This paper shows the choice of the appropriate strategy in the vibration of diagnostic process of the mechanical system, especially rotating machines, in recognition of the failure during the working process. In this paper, the vibration signature analysis is implemented to detect faults in rotary machining that includes imbalance, mechanical looseness, bent shaft, misalignment, missing blade bearing fault, balancing mass and critical speed. In order to perform vibration signature analysis for rotating machinery faults, studies have been made on how mechatronics technology is used as predictive maintenance methods. Vibration Faults Simulation Rig (VFSR) is designed to simulate and understand faults signatures. These techniques are based on the processing of vibrational data in frequency-domain. The LabVIEW-based spectrum analyzer software is developed to acquire and extract frequency contents of faults signals. This system is successfully tested based on the unique vibration fault signatures that always occur in a rotating machinery.

The application of structure from motion (SfM) to identify the geological structure and outcrop studies

NASA Astrophysics Data System (ADS)

Saputra, Aditya; Rahardianto, Trias; Gomez, Christopher

2017-07-01

Adequate knowledge of geological structure is an essential for most studies in geoscience, mineral exploration, geo-hazard and disaster management. The geological map is still one the datasets the most commonly used to obtain information about the geological structure such as fault, joint, fold, and unconformities, however in rural areas such as Central Java data is still sparse. Recent progress in data acquisition technologies and computing have increased the interest in how to capture the high-resolution geological data effectively and for a relatively low cost. Some methods such as Airborne Laser Scanning (ALS), Terrestrial Laser Scanning (TLS), and Unmanned Aerial Vehicles (UAVs) have been widely used to obtain this information, however, these methods need a significant investment in hardware, software, and time. Resolving some of those issues, the photogrammetric method structure from motion (SfM) is an image-based method, which can provide solutions equivalent to laser technologies for a relatively low-cost with minimal time, specialization and financial investment. Using SfM photogrammetry, it is possible to generate high resolution 3D images rock surfaces and outcrops, in order to improve the geological understanding of Indonesia. In the present contribution, it is shown that the information about fault and joint can be obtained at high-resolution and in a shorter time than with the conventional grid mapping and remotely sensed topographic surveying. The SfM method produces a point-cloud through image matching and computing. This task can be run with open- source or commercial image processing and 3D reconstruction software. As the point cloud has 3D information as well as RGB values, it allows for further analysis such as DEM extraction and image orthorectification processes. The present paper describes some examples of SfM to identify the fault in the outcrops and also highlight the future possibilities in terms of earthquake hazard assessment, based on fieldwork in the South of Yogyakarta City.
Failure Diagnosis for the Holdup Tank System via ISFA

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Huijuan; Bragg-Sitton, Shannon; Smidts, Carol

This paper discusses the use of the integrated system failure analysis (ISFA) technique for fault diagnosis for the holdup tank system. ISFA is a simulation-based, qualitative and integrated approach used to study fault propagation in systems containing both hardware and software subsystems. The holdup tank system consists of a tank containing a fluid whose level is controlled by an inlet valve and an outlet valve. We introduce the component and functional models of the system, quantify the main parameters and simulate possible failure-propagation paths based on the fault propagation approach, ISFA. The results show that most component failures in themore » holdup tank system can be identified clearly and that ISFA is viable as a technique for fault diagnosis. Since ISFA is a qualitative technique that can be used in the very early stages of system design, this case study provides indications that it can be used early to study design aspects that relate to robustness and fault tolerance.« less
Advanced information processing system

NASA Technical Reports Server (NTRS)

Lala, J. H.

1984-01-01

Design and performance details of the advanced information processing system (AIPS) for fault and damage tolerant data processing on aircraft and spacecraft are presented. AIPS comprises several computers distributed throughout the vehicle and linked by a damage tolerant data bus. Most I/O functions are available to all the computers, which run in a TDMA mode. Each computer performs separate specific tasks in normal operation and assumes other tasks in degraded modes. Redundant software assures that all fault monitoring, logging and reporting are automated, together with control functions. Redundant duplex links and damage-spread limitation provide the fault tolerance. Details of an advanced design of a laboratory-scale proof-of-concept system are described, including functional operations.
Java for flight software

NASA Technical Reports Server (NTRS)

Benowitz, E. G.; Niessner, A. F.

2003-01-01

We have successfully demonstrated a portion of the spacecraft attitude control and fault protection, running on a standard Java platform, and are currently in the process of taking advantage of the features provided by the RTSJ.
Software for determining the direction of movement, shear and normal stresses of a fault under a determined stress state

NASA Astrophysics Data System (ADS)

Álvarez del Castillo, Alejandra; Alaniz-Álvarez, Susana Alicia; Nieto-Samaniego, Angel Francisco; Xu, Shunshan; Ochoa-González, Gil Humberto; Velasquillo-Martínez, Luis Germán

2017-07-01

In the oil, gas and geothermal industry, the extraction or the input of fluids induces changes in the stress field of the reservoir, if the in-situ stress state of a fault plane is sufficiently disturbed, a fault may slip and can trigger fluid leakage or the reservoir might fracture and become damaged. The goal of the SSLIPO 1.0 software is to obtain data that can reduce the risk of affecting the stability of wellbores. The input data are the magnitudes of the three principal stresses and their orientation in geographic coordinates. The output data are the slip direction of a fracture in geographic coordinates, and its normal (σn) and shear (τ) stresses resolved on a single or multiple fracture planes. With this information, it is possible to calculate the slip tendency (τ/σn) and the propensity to open a fracture that is inversely proportional to σn. This software could analyze any compressional stress system, even non-Andersonian. An example is given from an oilfield in southern Mexico, in a region that contains fractures formed in three events of deformation. In the example SSLIPO 1.0 was used to determine in which deformation event the oil migrated. SSLIPO 1.0 is an open code application developed in MATLAB. The URL to obtain the source code and to download SSLIPO 1.0 are: http://www.geociencias.unam.mx/ alaniz/main_code.txt, http://www.geociencias.unam.mx/ alaniz/ SSLIPO_pkg.exe.
Data collection and analysis software development for rotor dynamics testing in spin laboratory

NASA Astrophysics Data System (ADS)

Abdul-Aziz, Ali; Arble, Daniel; Woike, Mark

2017-04-01

Gas turbine engine components undergo high rotational loading another complex environmental conditions. Such operating environment leads these components to experience damages and cracks that can cause catastrophic failure during flights. There are traditional crack detections and health monitoring methodologies currently being used which rely on periodic routine maintenances, nondestructive inspections that often times involve engine and components dis-assemblies. These methods do not also offer adequate information about the faults, especially, if these faults at subsurface or not clearly evident. At NASA Glenn research center, the rotor dynamics laboratory is presently involved in developing newer techniques that are highly dependent on sensor technology to enable health monitoring and prediction of damage and cracks in rotor disks. These approaches are noninvasive and relatively economical. Spin tests are performed using a subscale test article mimicking turbine rotor disk undergoing rotational load. Non-contact instruments such as capacitive and microwave sensors are used to measure the blade tip gap displacement and blade vibrations characteristics in an attempt develop a physics based model to assess/predict the faults in the rotor disk. Data collection is a major component in this experimental-analytical procedure and as a result, an upgrade to an older version of the data acquisition software which is based on LabVIEW program has been implemented to support efficiently running tests and analyze the results. Outcomes obtained from the tests data and related experimental and analytical rotor dynamics modeling including key features of the updated software are presented and discussed.
A Fault Oblivious Extreme-Scale Execution Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

McKie, Jim

The FOX project, funded under the ASCR X-stack I program, developed systems software and runtime libraries for a new approach to the data and work distribution for massively parallel, fault oblivious application execution. Our work was motivated by the premise that exascale computing systems will provide a thousand-fold increase in parallelism and a proportional increase in failure rate relative to today’s machines. To deliver the capability of exascale hardware, the systems software must provide the infrastructure to support existing applications while simultaneously enabling efficient execution of new programming models that naturally express dynamic, adaptive, irregular computation; coupled simulations; and massivemore » data analysis in a highly unreliable hardware environment with billions of threads of execution. Our OS research has prototyped new methods to provide efficient resource sharing, synchronization, and protection in a many-core compute node. We have experimented with alternative task/dataflow programming models and shown scalability in some cases to hundreds of thousands of cores. Much of our software is in active development through open source projects. Concepts from FOX are being pursued in next generation exascale operating systems. Our OS work focused on adaptive, application tailored OS services optimized for multi → many core processors. We developed a new operating system NIX that supports role-based allocation of cores to processes which was released to open source. We contributed to the IBM FusedOS project, which promoted the concept of latency-optimized and throughput-optimized cores. We built a task queue library based on distributed, fault tolerant key-value store and identified scaling issues. A second fault tolerant task parallel library was developed, based on the Linda tuple space model, that used low level interconnect primitives for optimized communication. We designed fault tolerance mechanisms for task parallel computations employing work stealing for load balancing that scaled to the largest existing supercomputers. Finally, we implemented the Elastic Building Blocks runtime, a library to manage object-oriented distributed software components. To support the research, we won two INCITE awards for time on Intrepid (BG/P) and Mira (BG/Q). Much of our work has had impact in the OS and runtime community through the ASCR Exascale OS/R workshop and report, leading to the research agenda of the Exascale OS/R program. Our project was, however, also affected by attrition of multiple PIs. While the PIs continued to participate and offer guidance as time permitted, losing these key individuals was unfortunate both for the project and for the DOE HPC community.« less
Development of a Software Safety Process and a Case Study of Its Use

NASA Technical Reports Server (NTRS)

Knight, J. C.

1996-01-01

Research in the year covered by this reporting period has been primarily directed toward: continued development of mock-ups of computer screens for operator of a digital reactor control system; development of a reactor simulation to permit testing of various elements of the control system; formal specification of user interfaces; fault-tree analysis including software; evaluation of formal verification techniques; and continued development of a software documentation system. Technical results relating to this grant and the remainder of the principal investigator's research program are contained in various reports and papers.
IEEE/AIAA/NASA Digital Avionics Systems Conference, 9th, Virginia Beach, VA, Oct. 15-18, 1990, Proceedings

NASA Technical Reports Server (NTRS)

1990-01-01

The present conference on digital avionics discusses vehicle-management systems, spacecraft avionics, special vehicle avionics, communication/navigation/identification systems, software qualification and quality assurance, launch-vehicle avionics, Ada applications, sensor and signal processing, general aviation avionics, automated software development, design-for-testability techniques, and avionics-software engineering. Also discussed are optical technology and systems, modular avionics, fault-tolerant avionics, commercial avionics, space systems, data buses, crew-station technology, embedded processors and operating systems, AI and expert systems, data links, and pilot/vehicle interfaces.
A novel N-input voting algorithm for X-by-wire fault-tolerant systems.

PubMed

Karimi, Abbas; Zarafshan, Faraneh; Al-Haddad, S A R; Ramli, Abdul Rahman

2014-01-01

Voting is an important operation in multichannel computation paradigm and realization of ultrareliable and real-time control systems that arbitrates among the results of N redundant variants. These systems include N-modular redundant (NMR) hardware systems and diversely designed software systems based on N-version programming (NVP). Depending on the characteristics of the application and the type of selected voter, the voting algorithms can be implemented for either hardware or software systems. In this paper, a novel voting algorithm is introduced for real-time fault-tolerant control systems, appropriate for applications in which N is large. Then, its behavior has been software implemented in different scenarios of error-injection on the system inputs. The results of analyzed evaluations through plots and statistical computations have demonstrated that this novel algorithm does not have the limitations of some popular voting algorithms such as median and weighted; moreover, it is able to significantly increase the reliability and availability of the system in the best case to 2489.7% and 626.74%, respectively, and in the worst case to 3.84% and 1.55%, respectively.
Distributed controller clustering in software defined networks.

PubMed

Abdelaziz, Ahmed; Fong, Ang Tan; Gani, Abdullah; Garba, Usman; Khan, Suleman; Akhunzada, Adnan; Talebian, Hamid; Choo, Kim-Kwang Raymond

2017-01-01

Software Defined Networking (SDN) is an emerging promising paradigm for network management because of its centralized network intelligence. However, the centralized control architecture of the software-defined networks (SDNs) brings novel challenges of reliability, scalability, fault tolerance and interoperability. In this paper, we proposed a novel clustered distributed controller architecture in the real setting of SDNs. The distributed cluster implementation comprises of multiple popular SDN controllers. The proposed mechanism is evaluated using a real world network topology running on top of an emulated SDN environment. The result shows that the proposed distributed controller clustering mechanism is able to significantly reduce the average latency from 8.1% to 1.6%, the packet loss from 5.22% to 4.15%, compared to distributed controller without clustering running on HP Virtual Application Network (VAN) SDN and Open Network Operating System (ONOS) controllers respectively. Moreover, proposed method also shows reasonable CPU utilization results. Furthermore, the proposed mechanism makes possible to handle unexpected load fluctuations while maintaining a continuous network operation, even when there is a controller failure. The paper is a potential contribution stepping towards addressing the issues of reliability, scalability, fault tolerance, and inter-operability.
Fault-Tolerant Software-Defined Radio on Manycore

NASA Technical Reports Server (NTRS)

Ricketts, Scott

2015-01-01

Software-defined radio (SDR) platforms generally rely on field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), but such architectures require significant software development. In addition, application demands for radiation mitigation and fault tolerance exacerbate programming challenges. MaXentric Technologies, LLC, has developed a manycore-based SDR technology that provides 100 times the throughput of conventional radiationhardened general purpose processors. Manycore systems (30-100 cores and beyond) have the potential to provide high processing performance at error rates that are equivalent to current space-deployed uniprocessor systems. MaXentric's innovation is a highly flexible radio, providing over-the-air reconfiguration; adaptability; and uninterrupted, real-time, multimode operation. The technology is also compliant with NASA's Space Telecommunications Radio System (STRS) architecture. In addition to its many uses within NASA communications, the SDR can also serve as a highly programmable research-stage prototyping device for new waveforms and other communications technologies. It can also support noncommunication codes on its multicore processor, collocated with the communications workload-reducing the size, weight, and power of the overall system by aggregating processing jobs to a single board computer.
Defense Small Business Innovation Research Program (SBIR). Volume 2. Navy Projects, Abstracts of Phase 1 Awards from FY 1989 SBIR Solicitation

DTIC Science & Technology

1990-04-01

DECISION AIDS HAVE CREATED A VAST NEW POTENTIAL FOR SUPPORT OF STRATEGIC AND TACTICAL OPERATIONS. THE NON-MONOTONIC PROBABILIST (NMP), DEVELOPED BY...QUALITY OF THE NEW DESIGN WILL BE EVALUATED BY CREATING A VIDEO TAPE USING A VIDEO ANIMATION SYSTEM, AND A SOFTWARE SIMULATION OF THE NEW DESIGN. THE...FAULT TOLERANT, SECURE SHIPBOARD COMMUNICATIONS. THE LAN WILL UTILIZE PHOENIX DIGITAL’S FAULT TOLERANT, " SELF - HEALING " SMALL BUSINESS INNOVATION RESEARCH
Specification of Fenix MPI Fault Tolerance library version 1.0.1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gamble, Marc; Van Der Wijngaart, Rob; Teranishi, Keita

This document provides a specification of Fenix, a software library compatible with the Message Passing Interface (MPI) to support fault recovery without application shutdown. The library consists of two modules. The first, termed process recovery , restores an application to a consistent state after it has suffered a loss of one or more MPI processes (ranks). The second specifies functions the user can invoke to store application data in Fenix managed redundant storage, and to retrieve it from that storage after process recovery.
Fix-Forward: A Comparison of the Army’s Requirements and Capabilities for Forward Support Maintenance,

DTIC Science & Technology

1983-04-01

tolerances or spaci - able assets diagnostic/fault ness float fications isolation devices Operation of cannibalL- zation point Why Sustain materiel...with diagnostic software based on "fault tree " representation of the M65 ThS) to bridge the gap in diagnostics capability was demonstrated in 1980 and... identification friend or foe) which has much lower reliability than TSQ-73 peculiar hardware). Thus, as in other examples, reported readiness does not reflect
Design of penicillin fermentation process simulation system

NASA Astrophysics Data System (ADS)

Qi, Xiaoyu; Yuan, Zhonghu; Qi, Xiaoxuan; Zhang, Wenqi

2011-10-01

Real-time monitoring for batch process attracts increasing attention. It can ensure safety and provide products with consistent quality. The design of simulation system of batch process fault diagnosis is of great significance. In this paper, penicillin fermentation, a typical non-linear, dynamic, multi-stage batch production process, is taken as the research object. A visual human-machine interactive simulation software system based on Windows operation system is developed. The simulation system can provide an effective platform for the research of batch process fault diagnosis.
Satellite Fault Diagnosis Using Support Vector Machines Based on a Hybrid Voting Mechanism

PubMed Central

Yang, Shuqiang; Zhu, Xiaoqian; Jin, Songchang; Wang, Xiang

2014-01-01

The satellite fault diagnosis has an important role in enhancing the safety, reliability, and availability of the satellite system. However, the problem of enormous parameters and multiple faults makes a challenge to the satellite fault diagnosis. The interactions between parameters and misclassifications from multiple faults will increase the false alarm rate and the false negative rate. On the other hand, for each satellite fault, there is not enough fault data for training. To most of the classification algorithms, it will degrade the performance of model. In this paper, we proposed an improving SVM based on a hybrid voting mechanism (HVM-SVM) to deal with the problem of enormous parameters, multiple faults, and small samples. Many experimental results show that the accuracy of fault diagnosis using HVM-SVM is improved. PMID:25215324
Fault Tolerant Considerations and Methods for Guidance and Control Systems

DTIC Science & Technology

1987-07-01

multifunction devices such as microprocessors with software. In striving toward the economic goal, however, a cost is incurred in a different coin, i.e...therefore been developed which reduces the software risk to acceptable proportions. Several of the techniques thus developed incur no significant cost ...complex that their design and implementation need computerized tools in order to be cost -effective (in a broad sense, including the capability of
Optimizing the Reliability and Performance of Service Composition Applications with Fault Tolerance in Wireless Sensor Networks

PubMed Central

Wu, Zhao; Xiong, Naixue; Huang, Yannong; Xu, Degang; Hu, Chunyang

2015-01-01

The services composition technology provides flexible methods for building service composition applications (SCAs) in wireless sensor networks (WSNs). The high reliability and high performance of SCAs help services composition technology promote the practical application of WSNs. The optimization methods for reliability and performance used for traditional software systems are mostly based on the instantiations of software components, which are inapplicable and inefficient in the ever-changing SCAs in WSNs. In this paper, we consider the SCAs with fault tolerance in WSNs. Based on a Universal Generating Function (UGF) we propose a reliability and performance model of SCAs in WSNs, which generalizes a redundancy optimization problem to a multi-state system. Based on this model, an efficient optimization algorithm for reliability and performance of SCAs in WSNs is developed based on a Genetic Algorithm (GA) to find the optimal structure of SCAs with fault-tolerance in WSNs. In order to examine the feasibility of our algorithm, we have evaluated the performance. Furthermore, the interrelationships between the reliability, performance and cost are investigated. In addition, a distinct approach to determine the most suitable parameters in the suggested algorithm is proposed. PMID:26561818
A Validation of Object-Oriented Design Metrics

NASA Technical Reports Server (NTRS)

Basili, Victor R.; Briand, Lionel; Melo, Walcelio L.

1995-01-01

This paper presents the results of a study conducted at the University of Maryland in which we experimentally investigated the suite of Object-Oriented (00) design metrics introduced by [Chidamber and Kemerer, 1994]. In order to do this, we assessed these metrics as predictors of fault-prone classes. This study is complementary to [Lieand Henry, 1993] where the same suite of metrics had been used to assess frequencies of maintenance changes to classes. To perform our validation accurately, we collected data on the development of eight medium-sized information management systems based on identical requirements. All eight projects were developed using a sequential life cycle model, a well-known 00 analysis/design method and the C++ programming language. Based on experimental results, the advantages and drawbacks of these 00 metrics are discussed and suggestions for improvement are provided. Several of Chidamber and Kemerer's 00 metrics appear to be adequate to predict class fault-proneness during the early phases of the life-cycle. We also showed that they are, on our data set, better predictors than "traditional" code metrics, which can only be collected at a later phase of the software development processes.

A parameters optimization method for planar joint clearance model and its application for dynamics simulation of reciprocating compressor

NASA Astrophysics Data System (ADS)

Hai-yang, Zhao; Min-qiang, Xu; Jin-dong, Wang; Yong-bo, Li

2015-05-01

In order to improve the accuracy of dynamics response simulation for mechanism with joint clearance, a parameter optimization method for planar joint clearance contact force model was presented in this paper, and the optimized parameters were applied to the dynamics response simulation for mechanism with oversized joint clearance fault. By studying the effect of increased clearance on the parameters of joint clearance contact force model, the relation of model parameters between different clearances was concluded. Then the dynamic equation of a two-stage reciprocating compressor with four joint clearances was developed using Lagrange method, and a multi-body dynamic model built in ADAMS software was used to solve this equation. To obtain a simulated dynamic response much closer to that of experimental tests, the parameters of joint clearance model, instead of using the designed values, were optimized by genetic algorithms approach. Finally, the optimized parameters were applied to simulate the dynamics response of model with oversized joint clearance fault according to the concluded parameter relation. The dynamics response of experimental test verified the effectiveness of this application.
Performance Monitoring of Chilled-Water Distribution Systems Using HVAC-Cx

PubMed Central

Ferretti, Natascha Milesi; Galler, Michael A.; Bushby, Steven T.

2017-01-01

In this research we develop, test, and demonstrate the newest extension of the software HVAC-Cx (NIST and CSTB 2014), an automated commissioning tool for detecting common mechanical faults and control errors in chilled-water distribution systems (loops). The commissioning process can improve occupant comfort, ensure the persistence of correct system operation, and reduce energy consumption. Automated tools support the process by decreasing the time and the skill level required to carry out necessary quality assurance measures, and as a result they enable more thorough testing of building heating, ventilating, and air-conditioning (HVAC) systems. This paper describes the algorithm, developed by National Institute of Standards and Technology (NIST), to analyze chilled-water loops and presents the results of a passive monitoring investigation using field data obtained from BACnet® (ASHRAE 2016) controllers and presents field validation of the findings. The tool was successful in detecting faults in system operation in its first field implementation supporting the investigation phase through performance monitoring. Its findings led to a full energy retrocommissioning of the field site. PMID:29167584
Performance Monitoring of Chilled-Water Distribution Systems Using HVAC-Cx.

PubMed

Ferretti, Natascha Milesi; Galler, Michael A; Bushby, Steven T

2017-01-01

In this research we develop, test, and demonstrate the newest extension of the software HVAC-Cx (NIST and CSTB 2014), an automated commissioning tool for detecting common mechanical faults and control errors in chilled-water distribution systems (loops). The commissioning process can improve occupant comfort, ensure the persistence of correct system operation, and reduce energy consumption. Automated tools support the process by decreasing the time and the skill level required to carry out necessary quality assurance measures, and as a result they enable more thorough testing of building heating, ventilating, and air-conditioning (HVAC) systems. This paper describes the algorithm, developed by National Institute of Standards and Technology (NIST), to analyze chilled-water loops and presents the results of a passive monitoring investigation using field data obtained from BACnet ® (ASHRAE 2016) controllers and presents field validation of the findings. The tool was successful in detecting faults in system operation in its first field implementation supporting the investigation phase through performance monitoring. Its findings led to a full energy retrocommissioning of the field site.
Software reliability experiments data analysis and investigation

NASA Technical Reports Server (NTRS)

Walker, J. Leslie; Caglayan, Alper K.

1991-01-01

The objectives are to investigate the fundamental reasons which cause independently developed software programs to fail dependently, and to examine fault tolerant software structures which maximize reliability gain in the presence of such dependent failure behavior. The authors used 20 redundant programs from a software reliability experiment to analyze the software errors causing coincident failures, to compare the reliability of N-version and recovery block structures composed of these programs, and to examine the impact of diversity on software reliability using subpopulations of these programs. The results indicate that both conceptually related and unrelated errors can cause coincident failures and that recovery block structures offer more reliability gain than N-version structures if acceptance checks that fail independently from the software components are available. The authors present a theory of general program checkers that have potential application for acceptance tests.
Predictive modelling of fault related fracturing in carbonate damage-zones: analytical and numerical models of field data (Central Apennines, Italy)

NASA Astrophysics Data System (ADS)

Mannino, Irene; Cianfarra, Paola; Salvini, Francesco

2010-05-01

Permeability in carbonates is strongly influenced by the presence of brittle deformation patterns, i.e pressure-solution surfaces, extensional fractures, and faults. Carbonate rocks achieve fracturing both during diagenesis and tectonic processes. Attitude, spatial distribution and connectivity of brittle deformation features rule the secondary permeability of carbonatic rocks and therefore the accumulation and the pathway of deep fluids (ground-water, hydrocarbon). This is particularly true in fault zones, where the damage zone and the fault core show different hydraulic properties from the pristine rock as well as between them. To improve the knowledge of fault architecture and faults hydraulic properties we study the brittle deformation patterns related to fault kinematics in carbonate successions. In particular we focussed on the damage-zone fracturing evolution. Fieldwork was performed in Meso-Cenozoic carbonate units of the Latium-Abruzzi Platform, Central Apennines, Italy. These units represent field analogues of rock reservoir in the Southern Apennines. We combine the study of rock physical characteristics of 22 faults and quantitative analyses of brittle deformation for the same faults, including bedding attitudes, fracturing type, attitudes, and spatial intensity distribution by using the dimension/spacing ratio, namely H/S ratio where H is the dimension of the fracture and S is the spacing between two analogous fractures of the same set. Statistical analyses of structural data (stereonets, contouring and H/S transect) were performed to infer a focussed, general algorithm that describes the expected intensity of fracturing process. The analytical model was fit to field measurements by a Montecarlo-convergent approach. This method proved a useful tool to quantify complex relations with a high number of variables. It creates a large sequence of possible solution parameters and results are compared with field data. For each item an error mean value is computed (RMS), representing the effectiveness of the fit and so the validity of this analysis. Eventually, the method selects the set of parameters that produced the least values. The tested algorithm describes the expected H/S values as a function of the distance from the fault core (D), the clay content (S), and the fault throw (T). The preliminary results of the Montecarlo inversion show that the distance (D) has the most effective influence in the H/S spatial distribution and the H/S value decreases with the distance from the fault-core. The rheological parameter shows a value similar to the diagenetic H/S values (1-1.5). The resulting equation has a reasonable RMS value of 0.116. The results of the Montecarlo models were finally implemented in FRAP, a fault environment modelling software. It is a true 4D tool that can predict stress conditions and permeability architecture associated to a given faults during single or multiple tectonic events. We present some models of fault-related fracturing among the studied faults performed by FRAP and we compare them with the field measurements, to test the validity of our methodology.
A novel fault location scheme for power distribution system based on injection method and transient line voltage

NASA Astrophysics Data System (ADS)

Huang, Yuehua; Li, Xiaomin; Cheng, Jiangzhou; Nie, Deyu; Wang, Zhuoyuan

2018-02-01

This paper presents a novel fault location method by injecting travelling wave current. The new methodology is based on Time Difference Of Arrival(TDOA)measurement which is available measurements the injection point and the end node of main radial. In other words, TDOA is the maximum correlation time when the signal reflected wave crest of the injected and fault appear simultaneously. Then distance calculation is equal to the wave velocity multiplied by TDOA. Furthermore, in case of some transformers connected to the end of the feeder, it’s necessary to combine with the transient voltage comparison of amplitude. Finally, in order to verify the effectiveness of this method, several simulations have been undertaken by using MATLAB/SIMULINK software packages. The proposed fault location is useful to short the positioning time in the premise of ensuring the accuracy, besides the error is 5.1% and 13.7%.
Feature Selection and Parameters Optimization of SVM Using Particle Swarm Optimization for Fault Classification in Power Distribution Systems.

PubMed

Cho, Ming-Yuan; Hoang, Thi Thom

2017-01-01

Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.
Optimization of the coherence function estimation for multi-core central processing unit

NASA Astrophysics Data System (ADS)

Cheremnov, A. G.; Faerman, V. A.; Avramchuk, V. S.

2017-02-01

The paper considers use of parallel processing on multi-core central processing unit for optimization of the coherence function evaluation arising in digital signal processing. Coherence function along with other methods of spectral analysis is commonly used for vibration diagnosis of rotating machinery and its particular nodes. An algorithm is given for the function evaluation for signals represented with digital samples. The algorithm is analyzed for its software implementation and computational problems. Optimization measures are described, including algorithmic, architecture and compiler optimization, their results are assessed for multi-core processors from different manufacturers. Thus, speeding-up of the parallel execution with respect to sequential execution was studied and results are presented for Intel Core i7-4720HQ и AMD FX-9590 processors. The results show comparatively high efficiency of the optimization measures taken. In particular, acceleration indicators and average CPU utilization have been significantly improved, showing high degree of parallelism of the constructed calculating functions. The developed software underwent state registration and will be used as a part of a software and hardware solution for rotating machinery fault diagnosis and pipeline leak location with acoustic correlation method.
Dynamic rupture simulations on complex fault zone structures with off-fault plasticity using the ADER-DG method

NASA Astrophysics Data System (ADS)

Wollherr, Stephanie; Gabriel, Alice-Agnes; Igel, Heiner

2015-04-01

In dynamic rupture models, high stress concentrations at rupture fronts have to to be accommodated by off-fault inelastic processes such as plastic deformation. As presented in (Roten et al., 2014), incorporating plastic yielding can significantly reduce earlier predictions of ground motions in the Los Angeles Basin. Further, an inelastic response of materials surrounding a fault potentially has a strong impact on surface displacement and is therefore a key aspect in understanding the triggering of tsunamis through floor uplifting. We present an implementation of off-fault-plasticity and its verification for the software package SeisSol, an arbitrary high-order derivative discontinuous Galerkin (ADER-DG) method. The software recently reached multi-petaflop/s performance on some of the largest supercomputers worldwide and was a Gordon Bell prize finalist application in 2014 (Heinecke et al., 2014). For the nonelastic calculations we impose a Drucker-Prager yield criterion in shear stress with a viscous regularization following (Andrews, 2005). It permits the smooth relaxation of high stress concentrations induced in the dynamic rupture process. We verify the implementation by comparison to the SCEC/USGS Spontaneous Rupture Code Verification Benchmarks. The results of test problem TPV13 with a 60-degree dipping normal fault show that SeisSol is in good accordance with other codes. Additionally we aim to explore the numerical characteristics of the off-fault plasticity implementation by performing convergence tests for the 2D code. The ADER-DG method is especially suited for complex geometries by using unstructured tetrahedral meshes. Local adaptation of the mesh resolution enables a fine sampling of the cohesive zone on the fault while simultaneously satisfying the dispersion requirements of wave propagation away from the fault. In this context we will investigate the influence of off-fault-plasticity on geometrically complex fault zone structures like subduction zones or branched faults. Studying the interplay of stress conditions and angle dependence of neighbouring branches including inelastic material behaviour and its effects on rupture jumps and seismic activation helps to advance our understanding of earthquake source processes. An application is the simulation of a real large-scale subduction zone scenario including plasticity to validate the coupling of our dynamic rupture calculations to a tsunami model in the framework of the ASCETE project (http://www.ascete.de/). Andrews, D. J. (2005): Rupture dynamics with energy loss outside the slip zone, J. Geophys. Res., 110, B01307. Heinecke, A. (2014), A. Breuer, S. Rettenberger, M. Bader, A.-A. Gabriel, C. Pelties, A. Bode, W. Barth, K. Vaidyanathan, M. Smelyanskiy and P. Dubey: Petascale High Order Dynamic Rupture Earthquake Simulations on Heterogeneous Supercomputers. In Supercomputing 2014, The International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, New Orleans, LA, USA, November 2014. Roten, D. (2014), K. B. Olsen, S.M. Day, Y. Cui, and D. Fäh: Expected seismic shaking in Los Angeles reduced by San Andreas fault zone plasticity, Geophys. Res. Lett., 41, 2769-2777.
HyspIRI Intelligent Payload Module(IPM) and Benchmarking Algorithms for Upload

NASA Technical Reports Server (NTRS)

Mandl, Daniel

2010-01-01

Features: Hardware: a) Xilinx Virtex-5 (GSFC Space Cube 2); b) 2 x 400MHz PPC; c) 100MHz Bus; d) 2 x 512MB SDRAM; e) Dual Gigabit Ethernet. Support Linux kernel 2.6.31 (gcc version 4.2.2). Support software running in stand alone mode for better performance. Can stream raw data up to 800 Mbps. Ready for operations. Software Application Examples: Band-stripping Algiotrhmsl:cloud, sulfur, flood, thermal, SWIL, NDVI, NDWI, SIWI, oil spills, algae blooms, etc. Corrections: geometric, radiometric, atmospheric. Core Flight System/dynamic software bus. CCSDS File Delivery Protocol. Delay Tolerant Network. CASPER /onboard planning. Fault monitoring/recovery software. S/C command and telemetry software. Data compression. Sensor Web for Autonomous Mission Operations.
Digital techniques for processing Landsat imagery

NASA Technical Reports Server (NTRS)

Green, W. B.

1978-01-01

An overview of the basic techniques used to process Landsat images with a digital computer, and the VICAR image processing software developed at JPL and available to users through the NASA sponsored COSMIC computer program distribution center is presented. Examples of subjective processing performed to improve the information display for the human observer, such as contrast enhancement, pseudocolor display and band rationing, and of quantitative processing using mathematical models, such as classification based on multispectral signatures of different areas within a given scene and geometric transformation of imagery into standard mapping projections are given. Examples are illustrated by Landsat scenes of the Andes mountains and Altyn-Tagh fault zone in China before and after contrast enhancement and classification of land use in Portland, Oregon. The VICAR image processing software system which consists of a language translator that simplifies execution of image processing programs and provides a general purpose format so that imagery from a variety of sources can be processed by the same basic set of general applications programs is described.
NASA Tech Briefs, February 2008

NASA Technical Reports Server (NTRS)

2008-01-01

Topics discussed include: Optical Measurement of Mass Flow of a Two-Phase Fluid; Selectable-Tip Corrosion-Testing Electrochemical Cell; Piezoelectric Bolt Breakers and Bolt Fatigue Testers; Improved Measurement of B(sub 22) of Macromolecules in a Flow Cell; Measurements by a Vector Network Analyzer at 325 to 508 GHz; Using Light to Treat Mucositis and Help Wounds Heal; Increasing Discharge Capacities of Li-(CF)(sub n) Cells; Dot-in-Well Quantum-Dot Infrared Photodetectors; Integrated Microbatteries for Implantable Medical Devices; Oxidation Behavior of Carbon Fiber-Reinforced Composites; GIDEP Batching Tool; Generic Spacecraft Model for Real-Time Simulation; Parallel-Processing Software for Creating Mosaic Images; Software for Verifying Image-Correlation Tie Points; Flexcam Image Capture Viewing and Spot Tracking; Low-Pt-Content Anode Catalyst for Direct Methanol Fuel Cells; Graphite/Cyanate Ester Face Sheets for Adaptive Optics; Atomized BaF2-CaF7 for Better-Flowing Plasma-Spray Feedstock; Nanophase Nickel-Zirconium Alloys for Fuel Cells; Vacuum Packaging of MEMS With Multiple Internal Seal Rings; Compact Two-Dimensional Spectrometer Optics; and Fault-Tolerant Coding for State Machines.
Using 3D visualization and seismic attributes to improve structural and stratigraphic resolution of reservoirs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kerr, J.; Jones, G.L.

1996-01-01

Recent advances in hardware and software have given the interpreter and engineer new ways to view 3D seismic data and well bore information. Recent papers have also highlighted the use of various statistics and seismic attributes. By combining new 3D rendering technologies with recent trends in seismic analysis, the interpreter can improve the structural and stratigraphic resolution of hydrocarbon reservoirs. This paper gives several examples using 3D visualization to better define both the structural and stratigraphic aspects of several different structural types from around the world. Statistics, 3D visualization techniques and rapid animation are used to show complex faulting andmore » detailed channel systems. These systems would be difficult to map using either 2D or 3D data with conventional interpretation techniques.« less
Using 3D visualization and seismic attributes to improve structural and stratigraphic resolution of reservoirs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kerr, J.; Jones, G.L.

1996-12-31

Recent advances in hardware and software have given the interpreter and engineer new ways to view 3D seismic data and well bore information. Recent papers have also highlighted the use of various statistics and seismic attributes. By combining new 3D rendering technologies with recent trends in seismic analysis, the interpreter can improve the structural and stratigraphic resolution of hydrocarbon reservoirs. This paper gives several examples using 3D visualization to better define both the structural and stratigraphic aspects of several different structural types from around the world. Statistics, 3D visualization techniques and rapid animation are used to show complex faulting andmore » detailed channel systems. These systems would be difficult to map using either 2D or 3D data with conventional interpretation techniques.« less
Recovering from "amnesia" brought about by radiation. Verification of the "Over the air" (OTA) application software update mechanism On-Board Solar Orbiter's Energetic Particle Detector

NASA Astrophysics Data System (ADS)

Da Silva, Antonio; Sánchez Prieto, Sebastián; Rodriguez Polo, Oscar; Parra Espada, Pablo

Computer memories are not supposed to forget, but they do. Because of the proximity of the Sun, from the Solar Orbiter boot software perspective, it is mandatory to look out for permanent memory errors resulting from (SEL) latch-up failures in application binaries stored in EEPROM and its SDRAM deployment areas. In this situation, the last line in defense established by FDIR mechanisms is the capability of the boot software to provide an accurate report of the memories’ damages and to perform an application software update, that avoid the harmed locations by flashing EEPROM with a new binary. This paper describes the OTA EEPROM firmware update procedure verification of the boot software that will run in the Instrument Control Unit (ICU) of the Energetic Particle Detector (EPD) on-board Solar Orbiter. Since the maximum number of rewrites on real EEPROM is limited and permanent memory faults cannot be friendly emulated in real hardware, the verification has been accomplished by the use of a LEON2 Virtual Platform (Leon2ViP) with fault injection capabilities and real SpaceWire interfaces developed by the Space Research Group (SRG) of the University of Alcalá. This way it is possible to run the exact same target binary software as if was run on the real ICU platform. Furthermore, the use of this virtual hardware-in-the-loop (VHIL) approach makes it possible to communicate with Electrical Ground Support Equipment (EGSE) through real SpaceWire interfaces in an agile, controlled and deterministic environment.
Alteration of fault rocks by CO2-bearing fluids with implications for sequestration

NASA Astrophysics Data System (ADS)

Luetkemeyer, P. B.; Kirschner, D. L.; Solum, J. G.; Naruk, S.

2011-12-01

Carbonates and sulfates commonly occur as primary (diagenetic) pore cements and secondary fluid-mobilized veins within fault zones. Stable isotope analyses of calcite, formation fluid, and fault zone fluids can help elucidate the carbon sources and the extent of fluid-rock interaction within a particular reservoir. Introduction of CO2 bearing fluids into a reservoir/fault system can profoundly affect the overall fluid chemistry of the reservoir/fault system and may lead to the enhancement or degradation of porosity within the fault zone. The extent of precipitation and/or dissolution of minerals within a fault zone can ultimately influence the sealing properties of a fault. The Colorado Plateau contains a number of large carbon dioxide reservoirs some of which leak and some of which do not. Several normal faults within the Paradox Basin (SE Utah) dissect the Green River anticline giving rise to a series of footwall reservoirs with fault-dependent columns. Numerous CO2-charged springs and geysers are associated with these faults. This study seeks to identify regional sources and subsurface migration of CO2 to these reservoirs and the effect(s) faults have on trap performance. Data provided in this study include mineralogical, elemental, and stable isotope data for fault rocks, host rocks, and carbonate veins that come from two localities along one fault that locally sealed CO2. This fault is just tens of meters away from another normal fault that has leaked CO2-charged waters to the land surface for thousands of years. These analyses have been used to determine the source of carbon isotopes from sedimentary derived carbon and deeply sourced CO2. XRF and XRD data taken from several transects across the normal faults are consistent with mechanical mixing and fluid-assisted mass transfer processes within the fault zone. δ13C range from -6% to +10% (PDB); δ18O values range from +15% to +24% (VSMOW). Geochemical modeling software is used to model the alteration productions of fault rocks from fluids of various chemistries coming from several different reservoirs within an active CO2-charged fault system. These results are compared to data obtained in the field.
First-order and subsidiary faults controlling the time-space evolution of the Central Italy 2016 seismic sequence - a multi-source data detailed 3D reconstruction

NASA Astrophysics Data System (ADS)

Lavecchia, Giusy; de nardis, Rita; Ferrarini, Federica; Cirillo, Daniele; Brozzetti, Francesco

2017-04-01

The Central Italy 2016 seismic sequence, with its three major events (24 August, Mw 6.0/6.2; 26 October Mw5.9/6.0; 30 October Mw6.5/6.6), activated a well-known active west-dipping extensional fault alignment of central Italy (Vettore-Gorzano faults, VEGO). Soon after the first event, based on geological, interferometric and at that moment available seismological data, a preliminary 3D fault model of VEGO was built. Such a model is here updated and improved at the light of a large amount of relocated earthquake data (time interval 24 August to 30 November 2016, 0.1≤ML ≤6.5, Chiaraluce at al., submitted to SRL) plus additional geological information. The 3D modeling was done using the software package MOVE from the Midland Valley. All the available data were taken into consideration (surface traces, fault-slip data, primary co-seismic surface fractures, geological maps and cross-sections, hypocentral locations and focal mechanisms of both background seismicity and seismic sequences). The VEGO geometric configuration did not substantially changed with respect to the previous model, but some additional structures involved in the sequence were reconstructed. In particular, four additional faults are well evident: a NE-dipping normal fault (dip-angle 50˚ ) antithetic to Vettore Fault, located at depths between 1 and 5 km; a WNW dipping plane (dip-angle 30˚ ) located at depth between 1 and 4 km within the Vettore footwall volume; this structure represents a splay of the late Miocene Sibillini thrust, which is evidently cross-cut and dislocated by the Vettore normal fault; a SW-dipping normal fault representing an unknown northward prosecution of the VEGO alignment, where since 26 October a relevant seismic activity was released; an unknown east-dipping low-angle detachment, where VEGO detaches at a depth of about 10-11 km. An uninterrupted microseismic activity has illuminated such a detachment not only during the overall sequence, but also in the previous months. At the light of the reconstructed geometric pattern integrated with the evidences of primary co-seismic fractures, it results evident that the Central Italy seismic sequence represents a "classic", although complex, intra-Apennine normal-faulting event, reactivating a long-term quiescent seismogenic alignment (e.g. VEGO). The reactivated and inverted compressional structures are confined at shallow depth within the Vettore footwall, and in no way control the major events of the sequence. Conversely, an important regional role is played by the east-dipping detachment. It represents the missing geometric link between the Altotiberina LANF of northern Umbria and the recently discovered LANF of Latium-Abruzzi.
Online Monitoring Technical Basis and Analysis Framework for Emergency Diesel Generators - Interim Report for FY 2013

DOE Office of Scientific and Technical Information (OSTI.GOV)

Binh T. Pham; Nancy J. Lybeck; Vivek Agarwal

The Light Water Reactor Sustainability program at Idaho National Laboratory is actively conducting research to develop and demonstrate online monitoring capabilities for active components in existing nuclear power plants. Idaho National Laboratory and the Electric Power Research Institute are working jointly to implement a pilot project to apply these capabilities to emergency diesel generators and generator step-up transformers. The Electric Power Research Institute Fleet-Wide Prognostic and Health Management Software Suite will be used to implement monitoring in conjunction with utility partners: Braidwood Generating Station (owned by Exelon Corporation) for emergency diesel generators, and Shearon Harris Nuclear Generating Station (owned bymore » Duke Energy Progress) for generator step-up transformers. This report presents monitoring techniques, fault signatures, and diagnostic and prognostic models for emergency diesel generators. Emergency diesel generators provide backup power to the nuclear power plant, allowing operation of essential equipment such as pumps in the emergency core coolant system during catastrophic events, including loss of offsite power. Technical experts from Braidwood are assisting Idaho National Laboratory and Electric Power Research Institute in identifying critical faults and defining fault signatures associated with each fault. The resulting diagnostic models will be implemented in the Fleet-Wide Prognostic and Health Management Software Suite and tested using data from Braidwood. Parallel research on generator step-up transformers was summarized in an interim report during the fourth quarter of fiscal year 2012.« less
A Novel Bearing Multi-Fault Diagnosis Approach Based on Weighted Permutation Entropy and an Improved SVM Ensemble Classifier.

PubMed

Zhou, Shenghan; Qian, Silin; Chang, Wenbing; Xiao, Yiyong; Cheng, Yang

2018-06-14

Timely and accurate state detection and fault diagnosis of rolling element bearings are very critical to ensuring the reliability of rotating machinery. This paper proposes a novel method of rolling bearing fault diagnosis based on a combination of ensemble empirical mode decomposition (EEMD), weighted permutation entropy (WPE) and an improved support vector machine (SVM) ensemble classifier. A hybrid voting (HV) strategy that combines SVM-based classifiers and cloud similarity measurement (CSM) was employed to improve the classification accuracy. First, the WPE value of the bearing vibration signal was calculated to detect the fault. Secondly, if a bearing fault occurred, the vibration signal was decomposed into a set of intrinsic mode functions (IMFs) by EEMD. The WPE values of the first several IMFs were calculated to form the fault feature vectors. Then, the SVM ensemble classifier was composed of binary SVM and the HV strategy to identify the bearing multi-fault types. Finally, the proposed model was fully evaluated by experiments and comparative studies. The results demonstrate that the proposed method can effectively detect bearing faults and maintain a high accuracy rate of fault recognition when a small number of training samples are available.
NASA ground terminal communication equipment automated fault isolation expert systems

NASA Technical Reports Server (NTRS)

Tang, Y. K.; Wetzel, C. R.

1990-01-01

The prototype expert systems are described that diagnose the Distribution and Switching System I and II (DSS1 and DSS2), Statistical Multiplexers (SM), and Multiplexer and Demultiplexer systems (MDM) at the NASA Ground Terminal (NGT). A system level fault isolation expert system monitors the activities of a selected data stream, verifies that the fault exists in the NGT and identifies the faulty equipment. Equipment level fault isolation expert systems are invoked to isolate the fault to a Line Replaceable Unit (LRU) level. Input and sometimes output data stream activities for the equipment are available. The system level fault isolation expert system compares the equipment input and output status for a data stream and performs loopback tests (if necessary) to isolate the faulty equipment. The equipment level fault isolation system utilizes the process of elimination and/or the maintenance personnel's fault isolation experience stored in its knowledge base. The DSS1, DSS2 and SM fault isolation systems, using the knowledge of the current equipment configuration and the equipment circuitry issues a set of test connections according to the predefined rules. The faulty component or board can be identified by the expert system by analyzing the test results. The MDM fault isolation system correlates the failure symptoms with the faulty component based on maintenance personnel experience. The faulty component can be determined by knowing the failure symptoms. The DSS1, DSS2, SM, and MDM equipment simulators are implemented in PASCAL. The DSS1 fault isolation expert system was converted to C language from VP-Expert and integrated into the NGT automation software for offline switch diagnoses. Potentially, the NGT fault isolation algorithms can be used for the DSS1, SM, amd MDM located at Goddard Space Flight Center (GSFC).

Space-time evolution of a growth fold (Betic Cordillera, Spain). Evidences from 3D geometrical modelling

NASA Astrophysics Data System (ADS)

Martin-Rojas, Ivan; Alfaro, Pedro; Estévez, Antonio

2014-05-01

We present a study that encompasses several software tools (iGIS©, ArcGIS©, Autocad©, etc.) and data (geological mapping, high resolution digital topographic data, high resolution aerial photographs, etc.) to create a detailed 3D geometric model of an active fault propagation growth fold. This 3D model clearly shows structural features of the analysed fold, as well as growth relationships and sedimentary patterns. The results obtained permit us to discuss the kinematics and structural evolution of the fold and the fault in time and space. The study fault propagation fold is the Crevillente syncline. This fold represents the northern limit of the Bajo Segura Basin, an intermontane basin in the Eastern Betic Cordillera (SE Spain) developed from upper Miocene on. 3D features of the Crevillente syncline, including growth pattern, indicate that limb rotation and, consequently, fault activity was higher during Messinian than during Tortonian; consequently, fault activity was also higher. From Pliocene on our data point that limb rotation and fault activity steadies or probably decreases. This in time evolution of the Crevillente syncline is not the same all along the structure; actually the 3D geometric model indicates that observed lateral heterogeneity is related to along strike variation of fault displacement.
Automated Generation of Fault Management Artifacts from a Simple System Model

NASA Technical Reports Server (NTRS)

Kennedy, Andrew K.; Day, John C.

2013-01-01

Our understanding of off-nominal behavior - failure modes and fault propagation - in complex systems is often based purely on engineering intuition; specific cases are assessed in an ad hoc fashion as a (fallible) fault management engineer sees fit. This work is an attempt to provide a more rigorous approach to this understanding and assessment by automating the creation of a fault management artifact, the Failure Modes and Effects Analysis (FMEA) through querying a representation of the system in a SysML model. This work builds off the previous development of an off-nominal behavior model for the upcoming Soil Moisture Active-Passive (SMAP) mission at the Jet Propulsion Laboratory. We further developed the previous system model to more fully incorporate the ideas of State Analysis, and it was restructured in an organizational hierarchy that models the system as layers of control systems while also incorporating the concept of "design authority". We present software that was developed to traverse the elements and relationships in this model to automatically construct an FMEA spreadsheet. We further discuss extending this model to automatically generate other typical fault management artifacts, such as Fault Trees, to efficiently portray system behavior, and depend less on the intuition of fault management engineers to ensure complete examination of off-nominal behavior.
Investigation of an advanced fault tolerant integrated avionics system

NASA Technical Reports Server (NTRS)

Dunn, W. R.; Cottrell, D.; Flanders, J.; Javornik, A.; Rusovick, M.

1986-01-01

Presented is an advanced, fault-tolerant multiprocessor avionics architecture as could be employed in an advanced rotorcraft such as LHX. The processor structure is designed to interface with existing digital avionics systems and concepts including the Army Digital Avionics System (ADAS) cockpit/display system, navaid and communications suites, integrated sensing suite, and the Advanced Digital Optical Control System (ADOCS). The report defines mission, maintenance and safety-of-flight reliability goals as might be expected for an operational LHX aircraft. Based on use of a modular, compact (16-bit) microprocessor card family, results of a preliminary study examining simplex, dual and standby-sparing architectures is presented. Given the stated constraints, it is shown that the dual architecture is best suited to meet reliability goals with minimum hardware and software overhead. The report presents hardware and software design considerations for realizing the architecture including redundancy management requirements and techniques as well as verification and validation needs and methods.
Software fault-tolerance by design diversity DEDIX: A tool for experiments

NASA Technical Reports Server (NTRS)

Avizienis, A.; Gunningberg, P.; Kelly, J. P. J.; Lyu, R. T.; Strigini, L.; Traverse, P. J.; Tso, K. S.; Voges, U.

1986-01-01

The use of multiple versions of a computer program, independently designed from a common specification, to reduce the effects of an error is discussed. If these versions are designed by independent programming teams, it is expected that a fault in one version will not have the same behavior as any fault in the other versions. Since the errors in the output of the versions are different and uncorrelated, it is possible to run the versions concurrently, cross-check their results at prespecified points, and mask errors. A DEsign DIversity eXperiments (DEDIX) testbed was implemented to study the influence of common mode errors which can result in a failure of the entire system. The layered design of DEDIX and its decision algorithm are described.
Formal specification and mechanical verification of SIFT - A fault-tolerant flight control system

NASA Technical Reports Server (NTRS)

Melliar-Smith, P. M.; Schwartz, R. L.

1982-01-01

The paper describes the methodology being employed to demonstrate rigorously that the SIFT (software-implemented fault-tolerant) computer meets its requirements. The methodology uses a hierarchy of design specifications, expressed in the mathematical domain of multisorted first-order predicate calculus. The most abstract of these, from which almost all details of mechanization have been removed, represents the requirements on the system for reliability and intended functionality. Successive specifications in the hierarchy add design and implementation detail until the PASCAL programs implementing the SIFT executive are reached. A formal proof that a SIFT system in a 'safe' state operates correctly despite the presence of arbitrary faults has been completed all the way from the most abstract specifications to the PASCAL program.
Fault Tree Analysis.

PubMed

McElroy, Lisa M; Khorzad, Rebeca; Rowe, Theresa A; Abecassis, Zachary A; Apley, Daniel W; Barnard, Cynthia; Holl, Jane L

The purpose of this study was to use fault tree analysis to evaluate the adequacy of quality reporting programs in identifying root causes of postoperative bloodstream infection (BSI). A systematic review of the literature was used to construct a fault tree to evaluate 3 postoperative BSI reporting programs: National Surgical Quality Improvement Program (NSQIP), Centers for Medicare and Medicaid Services (CMS), and The Joint Commission (JC). The literature review revealed 699 eligible publications, 90 of which were used to create the fault tree containing 105 faults. A total of 14 identified faults are currently mandated for reporting to NSQIP, 5 to CMS, and 3 to JC; 2 or more programs require 4 identified faults. The fault tree identifies numerous contributing faults to postoperative BSI and reveals substantial variation in the requirements and ability of national quality data reporting programs to capture these potential faults. Efforts to prevent postoperative BSI require more comprehensive data collection to identify the root causes and develop high-reliability improvement strategies.
A Novel Wide-Area Backup Protection Based on Fault Component Current Distribution and Improved Evidence Theory

PubMed Central

Zhang, Zhe; Kong, Xiangping; Yin, Xianggen; Yang, Zengli; Wang, Lijun

2014-01-01

In order to solve the problems of the existing wide-area backup protection (WABP) algorithms, the paper proposes a novel WABP algorithm based on the distribution characteristics of fault component current and improved Dempster/Shafer (D-S) evidence theory. When a fault occurs, slave substations transmit to master station the amplitudes of fault component currents of transmission lines which are the closest to fault element. Then master substation identifies suspicious faulty lines according to the distribution characteristics of fault component current. After that, the master substation will identify the actual faulty line with improved D-S evidence theory based on the action states of traditional protections and direction components of these suspicious faulty lines. The simulation examples based on IEEE 10-generator-39-bus system show that the proposed WABP algorithm has an excellent performance. The algorithm has low requirement of sampling synchronization, small wide-area communication flow, and high fault tolerance. PMID:25050399
Rolling Bearing Fault Diagnosis Based on an Improved HTT Transform

PubMed Central

Tang, Guiji; Tian, Tian; Zhou, Chong

2018-01-01

When rolling bearing failure occurs, vibration signals generally contain different signal components, such as impulsive fault feature signals, background noise and harmonic interference signals. One of the most challenging aspects of rolling bearing fault diagnosis is how to inhibit noise and harmonic interference signals, while enhancing impulsive fault feature signals. This paper presents a novel bearing fault diagnosis method, namely an improved Hilbert time–time (IHTT) transform, by combining a Hilbert time–time (HTT) transform with principal component analysis (PCA). Firstly, the HTT transform was performed on vibration signals to derive a HTT transform matrix. Then, PCA was employed to de-noise the HTT transform matrix in order to improve the robustness of the HTT transform. Finally, the diagonal time series of the de-noised HTT transform matrix was extracted as the enhanced impulsive fault feature signal and the contained fault characteristic information was identified through further analyses of amplitude and envelope spectrums. Both simulated and experimental analyses validated the superiority of the presented method for detecting bearing failures. PMID:29662013
Dynamic rupture simulations of the 2016 Mw7.8 Kaikōura earthquake: a cascading multi-fault event

NASA Astrophysics Data System (ADS)

Ulrich, T.; Gabriel, A. A.; Ampuero, J. P.; Xu, W.; Feng, G.

2017-12-01

The Mw7.8 Kaikōura earthquake struck the Northern part of New Zealand's South Island roughly one year ago. It ruptured multiple segments of the contractional North Canterbury fault zone and of the Marlborough fault system. Field observations combined with satellite data suggest a rupture path involving partly unmapped faults separated by large stepover distances larger than 5 km, the maximum distance usually considered by the latest seismic hazard assessment methods. This might imply distant rupture transfer mechanisms generally not considered in seismic hazard assessment. We present high-resolution 3D dynamic rupture simulations of the Kaikōura earthquake under physically self-consistent initial stress and strength conditions. Our simulations are based on recent finite-fault slip inversions that constrain fault system geometry and final slip distribution from remote sensing, surface rupture and geodetic data (Xu et al., 2017). We assume a uniform background stress field, without lateral fault stress or strength heterogeneity. We use the open-source software SeisSol (www.seissol.org) which is based on an arbitrary high-order accurate DERivative Discontinuous Galerkin method (ADER-DG). Our method can account for complex fault geometries, high resolution topography and bathymetry, 3D subsurface structure, off-fault plasticity and modern friction laws. It enables the simulation of seismic wave propagation with high-order accuracy in space and time in complex media. We show that a cascading rupture driven by dynamic triggering can break all fault segments that were involved in this earthquake without mechanically requiring an underlying thrust fault. Our prefered fault geometry connects most fault segments: it does not features stepover larger than 2 km. The best scenario matches the main macroscopic characteristics of the earthquake, including its apparently slow rupture propagation caused by zigzag cascading, the moment magnitude and the overall inferred slip distribution. We observe a high sensitivity of cascading dynamics on fault-step over distance and off-fault energy dissipation.
An Analysis of Failure Handling in Chameleon, A Framework for Supporting Cost-Effective Fault Tolerant Services

NASA Technical Reports Server (NTRS)

Haakensen, Erik Edward

1998-01-01

The desire for low-cost reliable computing is increasing. Most current fault tolerant computing solutions are not very flexible, i.e., they cannot adapt to reliability requirements of newly emerging applications in business, commerce, and manufacturing. It is important that users have a flexible, reliable platform to support both critical and noncritical applications. Chameleon, under development at the Center for Reliable and High-Performance Computing at the University of Illinois, is a software framework. for supporting cost-effective adaptable networked fault tolerant service. This thesis details a simulation of fault injection, detection, and recovery in Chameleon. The simulation was written in C++ using the DEPEND simulation library. The results obtained from the simulation included the amount of overhead incurred by the fault detection and recovery mechanisms supported by Chameleon. In addition, information about fault scenarios from which Chameleon cannot recover was gained. The results of the simulation showed that both critical and noncritical applications can be executed in the Chameleon environment with a fairly small amount of overhead. No single point of failure from which Chameleon could not recover was found. Chameleon was also found to be capable of recovering from several multiple failure scenarios.
Dynamic assertion testing of flight control software

NASA Technical Reports Server (NTRS)

Andrews, D. M.; Mahmood, A.; Mccluskey, E. J.

1985-01-01

An experiment in using assertions to dynamically test fault tolerant flight software is described. The experiment showed that 87% of typical errors introduced into the program would be detected by assertions. Detailed analysis of the test data showed that the number of assertions needed to detect those errors could be reduced to a minimal set. The analysis also revealed that the most effective assertions tested program parameters that provided greater indirect (collateral) testing of other parameters.
Fault Tolerant Hardware/Software Architecture for Flight Critical Function

DTIC Science & Technology

1985-09-01

Applications Studies Programme. The results of AGARD work are reported to the member nations and the NATO Authorities through the AGARD series of...systems, and is being advocated as a defense against design deficiencies which can plague software. - -- -- z--mm-L ___ K A critical application area for...day of the lecture series concludes with part I of a paper on the ;use of the Ada programming language In flight critical applications . Ada has been
The engine fuel system fault analysis

NASA Astrophysics Data System (ADS)

Zhang, Yong; Song, Hanqiang; Yang, Changsheng; Zhao, Wei

2017-05-01

For improving the reliability of the engine fuel system, the typical fault factor of the engine fuel system was analyzed from the point view of structure and functional. The fault character was gotten by building the fuel system fault tree. According the utilizing of fault mode effect analysis method (FMEA), several factors of key component fuel regulator was obtained, which include the fault mode, the fault cause, and the fault influences. All of this made foundation for next development of fault diagnosis system.
The NASA Integrated Vehicle Health Management Technology Experiment for X-37

NASA Technical Reports Server (NTRS)

Schwabacher, Mark; Samuels, Jeff; Brownston, Lee; Clancy, Daniel (Technical Monitor)

2002-01-01

The NASA Integrated Vehicle Health Management (IVHM) Technology Experiment for X-37 was intended to run IVHM software on-board the X-37 spacecraft. The X-37 is intended to be an unpiloted vehicle that would orbit the Earth for up to 21 days before landing on a runway. The objectives of the experiment were to demonstrate the benefits of in-flight IVHM to the operation of a Reusable Launch Vehicle, to advance the Technology Readiness Level of this IVHM technology within a flight environment, and to demonstrate that the IVHM software could operate on the Vehicle Management Computer. The scope of the experiment was to perform real-time fault detection and isolation for X-37's electrical power system and electro-mechanical actuators. The experiment used Livingstone, a software system that performs diagnosis using a qualitative, model-based reasoning approach that searches system-wide interactions to detect and isolate failures. Two of the challenges we faced were to make this research software more efficient so that it would fit within the limited computational resources that were available to us on the X-37 spacecraft, and to modify it so that it satisfied the X-37's software safety requirements. Although the experiment is currently unfunded, the development effort had value in that it resulted in major improvements in Livingstone's efficiency and safety. This paper reviews some of the details of the modeling and integration efforts, and some of the lessons that were learned.
Modelling Active Faults in Probabilistic Seismic Hazard Analysis (PSHA) with OpenQuake: Definition, Design and Experience

NASA Astrophysics Data System (ADS)

Weatherill, Graeme; Garcia, Julio; Poggi, Valerio; Chen, Yen-Shin; Pagani, Marco

2016-04-01

The Global Earthquake Model (GEM) has, since its inception in 2009, made many contributions to the practice of seismic hazard modeling in different regions of the globe. The OpenQuake-engine (hereafter referred to simply as OpenQuake), GEM's open-source software for calculation of earthquake hazard and risk, has found application in many countries, spanning a diversity of tectonic environments. GEM itself has produced a database of national and regional seismic hazard models, harmonizing into OpenQuake's own definition the varied seismogenic sources found therein. The characterization of active faults in probabilistic seismic hazard analysis (PSHA) is at the centre of this process, motivating many of the developments in OpenQuake and presenting hazard modellers with the challenge of reconciling seismological, geological and geodetic information for the different regions of the world. Faced with these challenges, and from the experience gained in the process of harmonizing existing models of seismic hazard, four critical issues are addressed. The challenge GEM has faced in the development of software is how to define a representation of an active fault (both in terms of geometry and earthquake behaviour) that is sufficiently flexible to adapt to different tectonic conditions and levels of data completeness. By exploring the different fault typologies supported by OpenQuake we illustrate how seismic hazard calculations can, and do, take into account complexities such as geometrical irregularity of faults in the prediction of ground motion, highlighting some of the potential pitfalls and inconsistencies that can arise. This exploration leads to the second main challenge in active fault modeling, what elements of the fault source model impact most upon the hazard at a site, and when does this matter? Through a series of sensitivity studies we show how different configurations of fault geometry, and the corresponding characterisation of near-fault phenomena (including hanging wall and directivity effects) within modern ground motion prediction equations, can have an influence on the seismic hazard at a site. Yet we also illustrate the conditions under which these effects may be partially tempered when considering the full uncertainty in rupture behaviour within the fault system. The third challenge is the development of efficient means for representing both aleatory and epistemic uncertainties from active fault models in PSHA. In implementing state-of-the-art seismic hazard models into OpenQuake, such as those recently undertaken in California and Japan, new modeling techniques are needed that redefine how we treat interdependence of ruptures within the model (such as mutual exclusivity), and the propagation of uncertainties emerging from geology. Finally, we illustrate how OpenQuake, and GEM's additional toolkits for model preparation, can be applied to address long-standing issues in active fault modeling in PSHA. These include constraining the seismogenic coupling of a fault and the partitioning of seismic moment between the active fault surfaces and the surrounding seismogenic crust. We illustrate some of the possible roles that geodesy can play in the process, but highlight where this may introduce new uncertainties and potential biases into the seismic hazard process, and how these can be addressed.
Logic design for dynamic and interactive recovery.

NASA Technical Reports Server (NTRS)

Carter, W. C.; Jessep, D. C.; Wadia, A. B.; Schneider, P. R.; Bouricius, W. G.

1971-01-01

Recovery in a fault-tolerant computer means the continuation of system operation with data integrity after an error occurs. This paper delineates two parallel concepts embodied in the hardware and software functions required for recovery; detection, diagnosis, and reconfiguration for hardware, data integrity, checkpointing, and restart for the software. The hardware relies on the recovery variable set, checking circuits, and diagnostics, and the software relies on the recovery information set, audit, and reconstruct routines, to characterize the system state and assist in recovery when required. Of particular utility is a handware unit, the recovery control unit, which serves as an interface between error detection and software recovery programs in the supervisor and provides dynamic interactive recovery.
Software reliability perspectives

NASA Technical Reports Server (NTRS)

Wilson, Larry; Shen, Wenhui

1987-01-01

Software which is used in life critical functions must be known to be highly reliable before installation. This requires a strong testing program to estimate the reliability, since neither formal methods, software engineering nor fault tolerant methods can guarantee perfection. Prior to the final testing software goes through a debugging period and many models have been developed to try to estimate reliability from the debugging data. However, the existing models are poorly validated and often give poor performance. This paper emphasizes the fact that part of their failures can be attributed to the random nature of the debugging data given to these models as input, and it poses the problem of correcting this defect as an area of future research.
Stress state reconstruction and tectonic evolution of the northern slope of the Baikit anteclise, Siberian Craton, based on 3D seismic data

NASA Astrophysics Data System (ADS)

Moskalenko, A. N.; Khudoley, A. K.; Khusnitdinov, R. R.

2017-05-01

In this work, we consider application of an original method for determining the indicators of the tectonic stress fields in the northern Baikit anteclise based on 3D seismic data for further reconstruction of the stress state parameters when analyzing structural maps of seismic horizons and corresponded faults. The stress state parameters are determined by the orientations of the main stress axes and shape of the stress ellipsoid. To calculate the stress state parameters from data on the spatial orientations of faults and slip vectors, we used the algorithms from quasiprimary stress computation methods and cataclastic analysis, implemented in the software products FaultKinWin and StressGeol, respectively. The results of this work show that kinematic characteristics of faults regularly change toward the top of succession and that the stress state parameters are characterized by different values of the Lode-Nadai coefficient. Faults are presented as strike-slip faults with normal or reverse component of displacement. Three stages of formation of the faults are revealed: (1) partial inversion of ancient normal faults, (2) the most intense stage with the predominance of thrust and strike-slip faults at north-northeast orientation of an axis of the main compression, and (3) strike-slip faults at the west-northwest orientation of an axis of the main compression. The second and third stages are pre-Vendian in age and correlate to tectonic events that took place during the evolution of the active southwestern margin of the Siberian Craton.
Combining Real-Time Seismic and GPS Data for Earthquake Early Warning (Invited)

NASA Astrophysics Data System (ADS)

Boese, M.; Heaton, T. H.; Hudnut, K. W.

2013-12-01

Scientists at Caltech, UC Berkeley, the Univ. of SoCal, the Univ. of Washington, the US Geological Survey, and ETH Zurich have developed an earthquake early warning (EEW) demonstration system for California and the Pacific Northwest. To quickly determine the earthquake magnitude and location, 'ShakeAlert' currently processes and interprets real-time data-streams from ~400 seismic broadband and strong-motion stations within the California Integrated Seismic Network (CISN). Based on these parameters, the 'UserDisplay' software predicts and displays the arrival and intensity of shaking at a given user site. Real-time ShakeAlert feeds are currently shared with around 160 individuals, companies, and emergency response organizations to educate potential users about EEW and to identify needs and applications of EEW in a future operational warning system. Recently, scientists at the contributing institutions have started to develop algorithms for ShakeAlert that make use of high-rate real-time GPS data to improve the magnitude estimates for large earthquakes (M>6.5) and to determine slip distributions. Knowing the fault slip in (near) real-time is crucial for users relying on or operating distributed systems, such as for power, water or transportation, especially if these networks run close to or across large faults. As shown in an earlier study, slip information is also useful to predict (in a probabilistic sense) how far a fault rupture will propagate, thus enabling more robust probabilistic ground-motion predictions at distant locations. Finally, fault slip information is needed for tsunami warning, such as in the Cascadia subduction-zone. To handle extended fault-ruptures of large earthquakes in real-time, Caltech and USGS Pasadena are currently developing and testing a two-step procedure that combines seismic and geodetic data; in the first step, high-frequency strong-motion amplitudes are used to rapidly classify near-and far-source stations. Then, the location and extent of the 2D fault rupture is determined from comparison with pre-calculated generic and fault-specific templates ('FinDer' algorithm, Finite Fault Rupture Detector). In the second step, long-period dynamic displacement amplitudes from the GPS sites are back-projected onto this rupture line/plane to estimate the slip amplitudes ('GPSlip' algorithm). The corresponding back-projection relations were empirically derived from a suite of 3D waveform simulations. We are currently testing our approach in southern California (both real-time and offline), although not yet included in the current distribution of ShakeAlert. RTK/PPP(AR) solutions from the RTNet software at USGS Pasadena currently provide 1 Hz real-time position times series at ~100 GPS sensor locations. Output is in openly available in JSON format. We and UNAVCO have tested onsite (in-receiver) PPP(AR) processing using Trimble NetR9 receivers with RTX & GLONASS options enabled, of which Caltech has recently purchased 41 new units. These special GPS receivers will provide 5 Hz position and velocity streams. We will deliver the GPS RTX output (in GSOF format) into the EEW system (in Earthworm tracebuf2 format). The new receivers are to be installed at 'zipper array' stations of the SCSN in upcoming months. In addition, we have developed a framework for end-to-end offline testing with archived and simulated waveform data.
Component Prioritization Schema for Achieving Maximum Time and Cost Benefits from Software Testing

NASA Astrophysics Data System (ADS)

Srivastava, Praveen Ranjan; Pareek, Deepak

Software testing is any activity aimed at evaluating an attribute or capability of a program or system and determining that it meets its required results. Defining the end of software testing represents crucial features of any software development project. A premature release will involve risks like undetected bugs, cost of fixing faults later, and discontented customers. Any software organization would want to achieve maximum possible benefits from software testing with minimum resources. Testing time and cost need to be optimized for achieving a competitive edge in the market. In this paper, we propose a schema, called the Component Prioritization Schema (CPS), to achieve an effective and uniform prioritization of the software components. This schema serves as an extension to the Non Homogenous Poisson Process based Cumulative Priority Model. We also introduce an approach for handling time-intensive versus cost-intensive projects.

A software engineering approach to expert system design and verification

NASA Technical Reports Server (NTRS)

Bochsler, Daniel C.; Goodwin, Mary Ann

1988-01-01

Software engineering design and verification methods for developing expert systems are not yet well defined. Integration of expert system technology into software production environments will require effective software engineering methodologies to support the entire life cycle of expert systems. The software engineering methods used to design and verify an expert system, RENEX, is discussed. RENEX demonstrates autonomous rendezvous and proximity operations, including replanning trajectory events and subsystem fault detection, onboard a space vehicle during flight. The RENEX designers utilized a number of software engineering methodologies to deal with the complex problems inherent in this system. An overview is presented of the methods utilized. Details of the verification process receive special emphasis. The benefits and weaknesses of the methods for supporting the development life cycle of expert systems are evaluated, and recommendations are made based on the overall experiences with the methods.
Remote Agent Experiment

NASA Technical Reports Server (NTRS)

Benard, Doug; Dorais, Gregory A.; Gamble, Ed; Kanefsky, Bob; Kurien, James; Millar, William; Muscettola, Nicola; Nayak, Pandu; Rouquette, Nicolas; Rajan, Kanna;

2000-01-01

Remote Agent (RA) is a model-based, reusable artificial intelligence (At) software system that enables goal-based spacecraft commanding and robust fault recovery. RA was flight validated during an experiment on board of DS1 between May 17th and May 21th, 1999.

Software augmented buildings: Exploiting existing infrastructure to improve energy efficiency and comfort in commercial buildings

NASA Astrophysics Data System (ADS)

Balaji, Bharathan

Commercial buildings consume 19% of energy in the US as of 2010, and traditionally, their energy use has been optimized through improved equipment efficiency and retrofits. Beyond improved hardware and infrastructure, there exists a tremendous potential in reducing energy use through better monitoring and operation. We present several applications that we developed and deployed to support our thesis that building energy use can be reduced through sensing, monitoring and optimization software that modulates use of building subsystems including HVAC. We focus on HVAC systems as these constitute 48-55% of building energy use. Specifically, in case of sensing, we describe an energy apportionment system that enables us to estimate real-time zonal HVAC power consumption by analyzing existing sensor information. With this energy breakdown, we can measure effectiveness of optimization solutions and identify inefficiencies. Central to energy efficiency improvement is determination of human occupancy in buildings. But this information is often unavailable or expensive to obtain using wide scale sensor deployment. We present our system that infers room level occupancy inexpensively by leveraging existing WiFi infrastructure. Occupancy information can be used not only to directly control HVAC but also to infer state of the building for predictive control. Building energy use is strongly influenced by human behaviors, and timely feedback mechanisms can encourage energy saving behavior. Occupants interact with HVAC using thermostats which has shown to be inadequate for thermal comfort. Building managers are responsible for incorporating energy efficiency measures, but our interviews reveal that they struggle to maintain efficiency due to lack of analytical tools and contextual information. We present our software services that provide energy feedback to occupants and building managers, improves comfort with personalized control and identifies energy wasting faults. For wide scale deployment of such energy saving software, they need to be portable across multiple buildings. However, buildings consist of heterogeneous equipment and use inconsistent naming schema, and developers need extensive domain knowledge to map sensor information to a standard format. To enable portability, we present an active learning algorithm that automates mapping building sensor metadata to a standard naming schema.
Experimental Fault Diagnosis in Systems Containing Finite Elements of Plate of Kirchoff by Using State Observers Methodology

NASA Astrophysics Data System (ADS)

Alegre, D. M.; Koroishi, E. H.; Melo, G. P.

2015-07-01

This paper presents a methodology for detection and localization of faults by using state observers. State Observers can rebuild the states not measured or values from points of difficult access in the system. So faults can be detected in these points without the knowledge of its measures, and can be track by the reconstructions of their states. In this paper this methodology will be applied in a system which represents a simplified model of a vehicle. In this model the chassis of the car was represented by a flat plate, which was divided in finite elements of plate (plate of Kirchoff), in addition, was considered the car suspension (springs and dampers). A test rig was built and the developed methodology was used to detect and locate faults on this system. In analyses done, the idea is to use a system with a specific fault, and then use the state observers to locate it, checking on a quantitative variation of the parameter of the system which caused this crash. For the computational simulations the software MATLAB was used.
The Design of Fault Tolerant Quantum Dot Cellular Automata Based Logic

NASA Technical Reports Server (NTRS)

Armstrong, C. Duane; Humphreys, William M.; Fijany, Amir

2002-01-01

As transistor geometries are reduced, quantum effects begin to dominate device performance. At some point, transistors cease to have the properties that make them useful computational components. New computing elements must be developed in order to keep pace with Moore s Law. Quantum dot cellular automata (QCA) represent an alternative paradigm to transistor-based logic. QCA architectures that are robust to manufacturing tolerances and defects must be developed. We are developing software that allows the exploration of fault tolerant QCA gate architectures by automating the specification, simulation, analysis and documentation processes.
Care 3, Phase 1, volume 1

NASA Technical Reports Server (NTRS)

Stiffler, J. J.; Bryant, L. A.; Guccione, L.

1979-01-01

A computer program to aid in accessing the reliability of fault tolerant avionics systems was developed. A simple mathematical expression was used to evaluate the reliability of any redundant configuration over any interval during which the failure rates and coverage parameters remained unaffected by configuration changes. Provision was made for convolving such expressions in order to evaluate the reliability of a dual mode system. A coverage model was also developed to determine the various relevant coverage coefficients as a function of the available hardware and software fault detector characteristics, and subsequent isolation and recovery delay statistics.
A fault tolerant 80960 engine controller

NASA Technical Reports Server (NTRS)

Reichmuth, D. M.; Gage, M. L.; Paterson, E. S.; Kramer, D. D.

1993-01-01

The paper describes the design of the 80960 Fault Tolerant Engine Controller for the supervision of engine operations, which was designed for the NASA Marshall Space Center. Consideration is given to the major electronic components of the controller, including the engine controller, effectors, and the sensors, as well as to the controller hardware, the controller module and the communications module, and the controller software. The architecture of the controller hardware allows modifications to be made to fit the requirements of any new propulsion systems. Multiple flow diagrams are presented illustrating the controller's operations.
Development of a space-systems network testbed

NASA Technical Reports Server (NTRS)

Lala, Jaynarayan; Alger, Linda; Adams, Stuart; Burkhardt, Laura; Nagle, Gail; Murray, Nicholas

1988-01-01

This paper describes a communications network testbed which has been designed to allow the development of architectures and algorithms that meet the functional requirements of future NASA communication systems. The central hardware components of the Network Testbed are programmable circuit switching communication nodes which can be adapted by software or firmware changes to customize the testbed to particular architectures and algorithms. Fault detection, isolation, and reconfiguration has been implemented in the Network with a hybrid approach which utilizes features of both centralized and distributed techniques to provide efficient handling of faults within the Network.
Advanced reliability modeling of fault-tolerant computer-based systems

NASA Technical Reports Server (NTRS)

Bavuso, S. J.

1982-01-01

Two methodologies for the reliability assessment of fault tolerant digital computer based systems are discussed. The computer-aided reliability estimation 3 (CARE 3) and gate logic software simulation (GLOSS) are assessment technologies that were developed to mitigate a serious weakness in the design and evaluation process of ultrareliable digital systems. The weak link is based on the unavailability of a sufficiently powerful modeling technique for comparing the stochastic attributes of one system against others. Some of the more interesting attributes are reliability, system survival, safety, and mission success.
Mark 4A antenna control system data handling architecture study

NASA Technical Reports Server (NTRS)

Briggs, H. C.; Eldred, D. B.

1991-01-01

A high-level review was conducted to provide an analysis of the existing architecture used to handle data and implement control algorithms for NASA's Deep Space Network (DSN) antennas and to make system-level recommendations for improving this architecture so that the DSN antennas can support the ever-tightening requirements of the next decade and beyond. It was found that the existing system is seriously overloaded, with processor utilization approaching 100 percent. A number of factors contribute to this overloading, including dated hardware, inefficient software, and a message-passing strategy that depends on serial connections between machines. At the same time, the system has shortcomings and idiosyncrasies that require extensive human intervention. A custom operating system kernel and an obscure programming language exacerbate the problems and should be modernized. A new architecture is presented that addresses these and other issues. Key features of the new architecture include a simplified message passing hierarchy that utilizes a high-speed local area network, redesign of particular processing function algorithms, consolidation of functions, and implementation of the architecture in modern hardware and software using mainstream computer languages and operating systems. The system would also allow incremental hardware improvements as better and faster hardware for such systems becomes available, and costs could potentially be low enough that redundancy would be provided economically. Such a system could support DSN requirements for the foreseeable future, though thorough consideration must be given to hard computational requirements, porting existing software functionality to the new system, and issues of fault tolerance and recovery.
Fault tolerant operation of switched reluctance machine

NASA Astrophysics Data System (ADS)

Wang, Wei

The energy crisis and environmental challenges have driven industry towards more energy efficient solutions. With nearly 60% of electricity consumed by various electric machines in industry sector, advancement in the efficiency of the electric drive system is of vital importance. Adjustable speed drive system (ASDS) provides excellent speed regulation and dynamic performance as well as dramatically improved system efficiency compared with conventional motors without electronics drives. Industry has witnessed tremendous grow in ASDS applications not only as a driving force but also as an electric auxiliary system for replacing bulky and low efficiency auxiliary hydraulic and mechanical systems. With the vast penetration of ASDS, its fault tolerant operation capability is more widely recognized as an important feature of drive performance especially for aerospace, automotive applications and other industrial drive applications demanding high reliability. The Switched Reluctance Machine (SRM), a low cost, highly reliable electric machine with fault tolerant operation capability, has drawn substantial attention in the past three decades. Nevertheless, SRM is not free of fault. Certain faults such as converter faults, sensor faults, winding shorts, eccentricity and position sensor faults are commonly shared among all ASDS. In this dissertation, a thorough understanding of various faults and their influence on transient and steady state performance of SRM is developed via simulation and experimental study, providing necessary knowledge for fault detection and post fault management. Lumped parameter models are established for fast real time simulation and drive control. Based on the behavior of the faults, a fault detection scheme is developed for the purpose of fast and reliable fault diagnosis. In order to improve the SRM power and torque capacity under faults, the maximum torque per ampere excitation are conceptualized and validated through theoretical analysis and experiments. With the proposed optimal waveform, torque production is greatly improved under the same Root Mean Square (RMS) current constraint. Additionally, position sensorless operation methods under phase faults are investigated to account for the combination of physical position sensor and phase winding faults. A comprehensive solution for position sensorless operation under single and multiple phases fault are proposed and validated through experiments. Continuous position sensorless operation with seamless transition between various numbers of phase fault is achieved.
Software Construction and Analysis Tools for Future Space Missions

NASA Technical Reports Server (NTRS)

Lowry, Michael R.; Clancy, Daniel (Technical Monitor)

2002-01-01

NASA and its international partners will increasingly depend on software-based systems to implement advanced functions for future space missions, such as Martian rovers that autonomously navigate long distances exploring geographic features formed by surface water early in the planet's history. The software-based functions for these missions will need to be robust and highly reliable, raising significant challenges in the context of recent Mars mission failures attributed to software faults. After reviewing these challenges, this paper describes tools that have been developed at NASA Ames that could contribute to meeting these challenges; 1) Program synthesis tools based on automated inference that generate documentation for manual review and annotations for automated certification. 2) Model-checking tools for concurrent object-oriented software that achieve memorability through synergy with program abstraction and static analysis tools.
Geological modeling of a fault zone in clay rocks at the Mont-Terri laboratory (Switzerland)

NASA Astrophysics Data System (ADS)

Kakurina, M.; Guglielmi, Y.; Nussbaum, C.; Valley, B.

2016-12-01

Clay-rich formations are considered to be a natural barrier for radionuclides or fluids (water, hydrocarbons, CO2) migration. However, little is known about the architecture of faults affecting clay formations because of their quick alteration at the Earth's surface. The Mont Terri Underground Research Laboratory provides exceptional conditions to investigate an un-weathered, perfectly exposed clay fault zone architecture and to conduct fault activation experiments that allow explore the conditions for stability of such clay faults. Here we show first results from a detailed geological model of the Mont Terri Main Fault architecture, using GoCad software, a detailed structural analysis of 6 fully cored and logged 30-to-50m long and 3-to-15m spaced boreholes crossing the fault zone. These high-definition geological data were acquired within the Fault Slip (FS) experiment project that consisted in fluid injections in different intervals within the fault using the SIMFIP probe to explore the conditions for the fault mechanical and seismic stability. The Mont Terri Main Fault "core" consists of a thrust zone about 0.8 to 3m wide that is bounded by two major fault planes. Between these planes, there is an assembly of distinct slickensided surfaces and various facies including scaly clays, fault gouge and fractured zones. Scaly clay including S-C bands and microfolds occurs in larger zones at top and bottom of the Mail Fault. A cm-thin layer of gouge, that is known to accommodate high strain parts, runs along the upper fault zone boundary. The non-scaly part mainly consists of undeformed rock block, bounded by slickensides. Such a complexity as well as the continuity of the two major surfaces are hard to correlate between the different boreholes even with the high density of geological data within the relatively small volume of the experiment. This may show that a poor strain localization occurred during faulting giving some perspectives about the potential for reactivation and leakage of faults affecting clay materials.
Physics Simulation Software for Autonomous Propellant Loading and Gas House Autonomous System Monitoring

NASA Technical Reports Server (NTRS)

Regalado Reyes, Bjorn Constant

2015-01-01

1. Kennedy Space Center (KSC) is developing a mobile launching system with autonomous propellant loading capabilities for liquid-fueled rockets. An autonomous system will be responsible for monitoring and controlling the storage, loading and transferring of cryogenic propellants. The Physics Simulation Software will reproduce the sensor data seen during the delivery of cryogenic fluids including valve positions, pressures, temperatures and flow rates. The simulator will provide insight into the functionality of the propellant systems and demonstrate the effects of potential faults. This will provide verification of the communications protocols and the autonomous system control. 2. The High Pressure Gas Facility (HPGF) stores and distributes hydrogen, nitrogen, helium and high pressure air. The hydrogen and nitrogen are stored in cryogenic liquid state. The cryogenic fluids pose several hazards to operators and the storage and transfer equipment. Constant monitoring of pressures, temperatures and flow rates are required in order to maintain the safety of personnel and equipment during the handling and storage of these commodities. The Gas House Autonomous System Monitoring software will be responsible for constantly observing and recording sensor data, identifying and predicting faults and relaying hazard and operational information to the operators.
Distributed controller clustering in software defined networks

PubMed Central

Gani, Abdullah; Akhunzada, Adnan; Talebian, Hamid; Choo, Kim-Kwang Raymond

2017-01-01

Software Defined Networking (SDN) is an emerging promising paradigm for network management because of its centralized network intelligence. However, the centralized control architecture of the software-defined networks (SDNs) brings novel challenges of reliability, scalability, fault tolerance and interoperability. In this paper, we proposed a novel clustered distributed controller architecture in the real setting of SDNs. The distributed cluster implementation comprises of multiple popular SDN controllers. The proposed mechanism is evaluated using a real world network topology running on top of an emulated SDN environment. The result shows that the proposed distributed controller clustering mechanism is able to significantly reduce the average latency from 8.1% to 1.6%, the packet loss from 5.22% to 4.15%, compared to distributed controller without clustering running on HP Virtual Application Network (VAN) SDN and Open Network Operating System (ONOS) controllers respectively. Moreover, proposed method also shows reasonable CPU utilization results. Furthermore, the proposed mechanism makes possible to handle unexpected load fluctuations while maintaining a continuous network operation, even when there is a controller failure. The paper is a potential contribution stepping towards addressing the issues of reliability, scalability, fault tolerance, and inter-operability. PMID:28384312
Fault tolerant testbed evaluation, phase 1

NASA Technical Reports Server (NTRS)

Caluori, V., Jr.; Newberry, T.

1993-01-01

In recent years, avionics systems development costs have become the driving factor in the development of space systems, military aircraft, and commercial aircraft. A method of reducing avionics development costs is to utilize state-of-the-art software application generator (autocode) tools and methods. The recent maturity of application generator technology has the potential to dramatically reduce development costs by eliminating software development steps that have historically introduced errors and the need for re-work. Application generator tools have been demonstrated to be an effective method for autocoding non-redundant, relatively low-rate input/output (I/O) applications on the Space Station Freedom (SSF) program; however, they have not been demonstrated for fault tolerant, high-rate I/O, flight critical environments. This contract will evaluate the use of application generators in these harsh environments. Using Boeing's quad-redundant avionics system controller as the target system, Space Shuttle Guidance, Navigation, and Control (GN&C) software will be autocoded, tested, and evaluated in the Johnson (Space Center) Avionics Engineering Laboratory (JAEL). The response of the autocoded system will be shown to match the response of the existing Shuttle General Purpose Computers (GPC's), thereby demonstrating the viability of using autocode techniques in the development of future avionics systems.
The Impact of Contextual Factors on the Security of Code

DTIC Science & Technology

2014-12-30

in which a system is resourced, overseen, managed and assured will have a lot to do with how successfully it performs in actual practice. Software is...ensure proper and adequate system assurance . Because of the high degree of skill and specialization required, details about software and systems are...whole has to be carefully coordinated in order to assure against the types of faults that are the basis for most of the exploits listed in the Common
Fault tolerance in a supercomputer through dynamic repartitioning

DOEpatents

Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Takken, Todd E.

2007-02-27

A multiprocessor, parallel computer is made tolerant to hardware failures by providing extra groups of redundant standby processors and by designing the system so that these extra groups of processors can be swapped with any group which experiences a hardware failure. This swapping can be under software control, thereby permitting the entire computer to sustain a hardware failure but, after swapping in the standby processors, to still appear to software as a pristine, fully functioning system.
Variation of the fractal dimension anisotropy of two major Cenozoic normal fault systems over space and time around the Snake River Plain, Idaho and SW Montana

NASA Astrophysics Data System (ADS)

Davarpanah, A.; Babaie, H. A.

2012-12-01

The interaction of the thermally induced stress field of the Yellowstone hotspot (YHS) with existing Basin and Range (BR) fault blocks, over the past 17 m.y., has produced a new, spatially and temporally variable system of normal faults around the Snake River Plain (SRP) in Idaho and Wyoming-Montana area. Data about the trace of these new cross faults (CF) and older BR normal faults were acquired from a combination of satellite imageries, DEM, and USGS geological maps and databases at scales of 1:24,000, 1:100,000, 1:250,000, 1:1000, 000, and 1:2,500, 000, and classified based on their azimuth in ArcGIS 10. The box-counting fractal dimension (Db) of the BR fault traces, determined applying the Benoit software, and the anisotropy intensity (ellipticity) of the fractal dimensions, measured with the modified Cantor dust method applying the AMOCADO software, were measured in two large spatial domains (I and II). The Db and anisotropy of the cross faults were studied in five temporal domains (T1-T5) classified based on the geologic age of successive eruptive centers (12 Ma to recent) of the YHS along the eastern SRP. The fractal anisotropy of the CF system in each temporal domain was also spatially determined in the southern part (domain S1), central part (domain S2), and northern part (domain S3) of the SRP. Line (fault trace) density maps for the BR and CF polylines reveal a higher linear density (trace length per unit area) for the BR traces in the spatial domain I, and a higher linear density of the CF traces around the present Yellowstone National Park (S1T5) where most of the seismically active faults are located. Our spatio-temporal analysis reveals that the fractal dimension of the BR system in domain I (Db=1.423) is greater than that in domain II (Db=1.307). It also shows that the anisotropy of the fractal dimension in domain I is less eccentric (axial ratio: 1.242) than that in domain II (1.355), probably reflecting the greater variation in the trend of the BR system in domain I. The CF system in the S1T5 domain has the highest fractal dimension (Db=1.37) and the lowest anisotropy eccentricity (1.23) among the five temporal domains. These values positively correlate with the observed maxima on the fault trace density maps. The major axis of the anisotropy ellipses is consistently perpendicular to the average trend of the normal fault system in each domain, and therefore approximates the orientation of extension for normal faulting in each domain. This fact gives a NE-SW and NW-SE extension direction for the BR system in domains I and II, respectively. The observed NE-SW orientation of the major axes of the anisotropy ellipses in the youngest T4 and T5 temporal domains, oriented perpendicular to the mean trend of the normal faults in the these domains, suggests extension along the NE-SW direction for cross faulting in these areas. The spatial trajectories (form lines) of the minor axes of the anisotropy ellipses, and the mean trend of fault traces in the T4 and T5 temporal domains, define a large parabolic pattern about the axis of the eastern SRP, with its apex at the Yellowstone plateau.
Instrumentation System Diagnoses a Thermocouple

NASA Technical Reports Server (NTRS)

Perotti, Jose; Santiago, Josephine; Mata, Carlos; Vokrot, Peter; Zavala, Carlos; Burns, Bradley

2008-01-01

An improved self-validating thermocouple (SVT) instrumentation system not only acquires readings from a thermocouple but is also capable of detecting deterioration and a variety of discrete faults in the thermocouple and its lead wires. Prime examples of detectable discrete faults and deterioration include open- and short-circuit conditions and debonding of the thermocouple junction from the object, the temperature of which one seeks to measure. Debonding is the most common cause of errors in thermocouple measurements, but most prior SVT instrumentation systems have not been capable of detecting debonding. The improved SVT instrumentation system includes power circuitry, a cold-junction compensator, signal-conditioning circuitry, pulse-width-modulation (PWM) thermocouple-excitation circuitry, an analog-to-digital converter (ADC), a digital data processor, and a universal serial bus (USB) interface. The system can operate in any of the following three modes: temperature measurement, thermocouple validation, and bonding/debonding detection. The software running in the processor includes components that implement statistical algorithms to evaluate the state of the thermocouple and the instrumentation system. When the power is first turned on, the user can elect to start a diagnosis/ monitoring sequence, in which the PWM is used to estimate the characteristic times corresponding to the correct configuration. The user also has the option of using previous diagnostic values, which are stored in an electrically erasable, programmable read-only memory so that they are available every time the power is turned on.

A Generic Modeling Process to Support Functional Fault Model Development

NASA Technical Reports Server (NTRS)

Maul, William A.; Hemminger, Joseph A.; Oostdyk, Rebecca; Bis, Rachael A.

2016-01-01

Functional fault models (FFMs) are qualitative representations of a system's failure space that are used to provide a diagnostic of the modeled system. An FFM simulates the failure effect propagation paths within a system between failure modes and observation points. These models contain a significant amount of information about the system including the design, operation and off nominal behavior. The development and verification of the models can be costly in both time and resources. In addition, models depicting similar components can be distinct, both in appearance and function, when created individually, because there are numerous ways of representing the failure space within each component. Generic application of FFMs has the advantages of software code reuse: reduction of time and resources in both development and verification, and a standard set of component models from which future system models can be generated with common appearance and diagnostic performance. This paper outlines the motivation to develop a generic modeling process for FFMs at the component level and the effort to implement that process through modeling conventions and a software tool. The implementation of this generic modeling process within a fault isolation demonstration for NASA's Advanced Ground System Maintenance (AGSM) Integrated Health Management (IHM) project is presented and the impact discussed.
Fault Tolerance in ZigBee Wireless Sensor Networks

NASA Technical Reports Server (NTRS)

Alena, Richard; Gilstrap, Ray; Baldwin, Jarren; Stone, Thom; Wilson, Pete

2011-01-01

Wireless sensor networks (WSN) based on the IEEE 802.15.4 Personal Area Network standard are finding increasing use in the home automation and emerging smart energy markets. The network and application layers, based on the ZigBee 2007 PRO Standard, provide a convenient framework for component-based software that supports customer solutions from multiple vendors. This technology is supported by System-on-a-Chip solutions, resulting in extremely small and low-power nodes. The Wireless Connections in Space Project addresses the aerospace flight domain for both flight-critical and non-critical avionics. WSNs provide the inherent fault tolerance required for aerospace applications utilizing such technology. The team from Ames Research Center has developed techniques for assessing the fault tolerance of ZigBee WSNs challenged by radio frequency (RF) interference or WSN node failure.
Problems related to the integration of fault tolerant aircraft electronic systems

NASA Technical Reports Server (NTRS)

Bannister, J. A.; Adlakha, V.; Triyedi, K.; Alspaugh, T. A., Jr.

1982-01-01

Problems related to the design of the hardware for an integrated aircraft electronic system are considered. Taxonomies of concurrent systems are reviewed and a new taxonomy is proposed. An informal methodology intended to identify feasible regions of the taxonomic design space is described. Specific tools are recommended for use in the methodology. Based on the methodology, a preliminary strawman integrated fault tolerant aircraft electronic system is proposed. Next, problems related to the programming and control of inegrated aircraft electronic systems are discussed. Issues of system resource management, including the scheduling and allocation of real time periodic tasks in a multiprocessor environment, are treated in detail. The role of software design in integrated fault tolerant aircraft electronic systems is discussed. Conclusions and recommendations for further work are included.
Research on the fault diagnosis of bearing based on wavelet and demodulation

NASA Astrophysics Data System (ADS)

Li, Jiapeng; Yuan, Yu

2017-05-01

As a most commonly-used machine part, antifriction bearing is extensively used in mechanical equipment. Vibration signal analysis is one of the methods to monitor and diagnose the running status of antifriction bearings. Therefore, using wavelet analysis for demising is of great importance in the engineering practice. This paper firstly presented the basic theory of wavelet analysis to study the transformation, decomposition and reconstruction of wavelet. In addition, edition software LabVIEW was adopted to conduct wavelet and demodulation upon the vibration signal of antifriction bearing collected. With the combination of Hilbert envelop demodulation analysis, the fault character frequencies of the demised signal were extracted to conduct fault diagnosis analysis, which serves as a reference for the wavelet and demodulation of the vibration signal in engineering practice.
Distributed Fault Detection Based on Credibility and Cooperation for WSNs in Smart Grids.

PubMed

Shao, Sujie; Guo, Shaoyong; Qiu, Xuesong

2017-04-28

Due to the increasingly important role in monitoring and data collection that sensors play, accurate and timely fault detection is a key issue for wireless sensor networks (WSNs) in smart grids. This paper presents a novel distributed fault detection mechanism for WSNs based on credibility and cooperation. Firstly, a reasonable credibility model of a sensor is established to identify any suspicious status of the sensor according to its own temporal data correlation. Based on the credibility model, the suspicious sensor is then chosen to launch fault diagnosis requests. Secondly, the sending time of fault diagnosis request is discussed to avoid the transmission overhead brought about by unnecessary diagnosis requests and improve the efficiency of fault detection based on neighbor cooperation. The diagnosis reply of a neighbor sensor is analyzed according to its own status. Finally, to further improve the accuracy of fault detection, the diagnosis results of neighbors are divided into several classifications to judge the fault status of the sensors which launch the fault diagnosis requests. Simulation results show that this novel mechanism can achieve high fault detection ratio with a small number of fault diagnoses and low data congestion probability.
Distributed Fault Detection Based on Credibility and Cooperation for WSNs in Smart Grids

PubMed Central

Shao, Sujie; Guo, Shaoyong; Qiu, Xuesong

2017-01-01

Due to the increasingly important role in monitoring and data collection that sensors play, accurate and timely fault detection is a key issue for wireless sensor networks (WSNs) in smart grids. This paper presents a novel distributed fault detection mechanism for WSNs based on credibility and cooperation. Firstly, a reasonable credibility model of a sensor is established to identify any suspicious status of the sensor according to its own temporal data correlation. Based on the credibility model, the suspicious sensor is then chosen to launch fault diagnosis requests. Secondly, the sending time of fault diagnosis request is discussed to avoid the transmission overhead brought about by unnecessary diagnosis requests and improve the efficiency of fault detection based on neighbor cooperation. The diagnosis reply of a neighbor sensor is analyzed according to its own status. Finally, to further improve the accuracy of fault detection, the diagnosis results of neighbors are divided into several classifications to judge the fault status of the sensors which launch the fault diagnosis requests. Simulation results show that this novel mechanism can achieve high fault detection ratio with a small number of fault diagnoses and low data congestion probability. PMID:28452925
Salton Trough Post-seismic Afterslip, Viscoelastic Response, and Contribution to Regional Hazard

NASA Astrophysics Data System (ADS)

Parker, J. W.; Donnellan, A.; Lyzenga, G. A.

2012-12-01

The El Mayor-Cucapah M7.2 April 4 2010 earthquake in Baja California may have affected accumulated hazard to Southern California cities due to loading of regional faults including the Elsinore, San Jacinto and southern San Andreas, faults which already have over a century of tectonic loading. We examine changes observed via multiple seismic and geodetic techniques, including micro seismicity and proposed seismicity-based indicators of hazard, high-quality fault models, the Plate Boundary Observatory GNSS array (with 174 stations showing post-seismic transients with greater than 1 mm amplitude), and interferometric radar maps from UAVSAR (aircraft) flights, showing a network of aseismic fault slip events at distances up to 60 km from the end of the surface rupture. Finite element modeling is used to compute the expected coseismic motions at GPS stations with general agreement, including coseismic uplift at sites ~200 km north of the rupture. Postseismic response is also compared, with GNSS and also with the CIG software "RELAX." An initial examination of hazard is made comparing micro seismicity-based metrics, fault models, and changes to coulomb stress on nearby faults using the finite element model. Comparison of seismicity with interferograms and historic earthquakes show aseismic slip occurs on fault segments that have had earthquakes in the last 70 years, while other segments show no slip at the surface but do show high triggered seismicity. UAVSAR-based estimates of fault slip can be incorporated into the finite element model to correct Coloumb stress change.
A Reference Model for Software and System Inspections. White Paper

NASA Technical Reports Server (NTRS)

He, Lulu; Shull, Forrest

2009-01-01

Software Quality Assurance (SQA) is an important component of the software development process. SQA processes provide assurance that the software products and processes in the project life cycle conform to their specified requirements by planning, enacting, and performing a set of activities to provide adequate confidence that quality is being built into the software. Typical techniques include: (1) Testing (2) Simulation (3) Model checking (4) Symbolic execution (5) Management reviews (6) Technical reviews (7) Inspections (8) Walk-throughs (9) Audits (10) Analysis (complexity analysis, control flow analysis, algorithmic analysis) (11) Formal method Our work over the last few years has resulted in substantial knowledge about SQA techniques, especially the areas of technical reviews and inspections. But can we apply the same QA techniques to the system development process? If yes, what kind of tailoring do we need before applying them in the system engineering context? If not, what types of QA techniques are actually used at system level? And, is there any room for improvement.) After a brief examination of the system engineering literature (especially focused on NASA and DoD guidance) we found that: (1) System and software development process interact with each other at different phases through development life cycle (2) Reviews are emphasized in both system and software development. (Figl.3). For some reviews (e.g. SRR, PDR, CDR), there are both system versions and software versions. (3) Analysis techniques are emphasized (e.g. Fault Tree Analysis, Preliminary Hazard Analysis) and some details are given about how to apply them. (4) Reviews are expected to use the outputs of the analysis techniques. In other words, these particular analyses are usually conducted in preparation for (before) reviews. The goal of our work is to explore the interaction between the Quality Assurance (QA) techniques at the system level and the software level.
Three-Dimensional Geologic Model of Complex Fault Structures in the Upper Seco Creek Area, Medina and Uvalde Counties, South-Central Texas

USGS Publications Warehouse

Pantea, Michael P.; Cole, James C.; Smith, Bruce D.; Faith, Jason R.; Blome, Charles D.; Smith, David V.

2008-01-01

This multimedia report shows and describes digital three-dimensional faulted geologic surfaces and volumes of the lithologic units of the Edwards aquifer in the upper Seco Creek area of Medina and Uvalde Counties in south-central Texas. This geologic framework model was produced using (1) geologic maps and interpretations of depositional environments and paleogeography; (2) lithologic descriptions, interpretations, and geophysical logs from 31 drill holes; (3) rock core and detailed lithologic descriptions from one drill hole; (4) helicopter electromagnetic geophysical data; and (5) known major and minor faults in the study area. These faults were used because of their individual and collective effects on the continuity of the aquifer-forming units in the Edwards Group. Data and information were compared and validated with each other and reflect the complex relationships of structures in the Seco Creek area of the Balcones fault zone. This geologic framework model can be used as a tool to visually explore and study geologic structures within the Seco Creek area of the Balcones fault zone and to show the connectivity of hydrologic units of high and low permeability between and across faults. The software can be used to display other data and information, such as drill-hole data, on this geologic framework model in three-dimensional space.
AGSM Functional Fault Models for Fault Isolation Project

NASA Technical Reports Server (NTRS)

Harp, Janicce Leshay

2014-01-01

This project implements functional fault models to automate the isolation of failures during ground systems operations. FFMs will also be used to recommend sensor placement to improve fault isolation capabilities. The project enables the delivery of system health advisories to ground system operators.
V&V of Fault Management: Challenges and Successes

NASA Technical Reports Server (NTRS)

Fesq, Lorraine M.; Costello, Ken; Ohi, Don; Lu, Tiffany; Newhouse, Marilyn

2013-01-01

This paper describes the results of a special breakout session of the NASA Independent Verification and Validation (IV&V) Workshop held in the fall of 2012 entitled "V&V of Fault Management: Challenges and Successes." The NASA IV&V Program is in a unique position to interact with projects across all of the NASA development domains. Using this unique opportunity, the IV&V program convened a breakout session to enable IV&V teams to share their challenges and successes with respect to the V&V of Fault Management (FM) architectures and software. The presentations and discussions provided practical examples of pitfalls encountered while performing V&V of FM including the lack of consistent designs for implementing faults monitors and the fact that FM information is not centralized but scattered among many diverse project artifacts. The discussions also solidified the need for an early commitment to developing FM in parallel with the spacecraft systems as well as clearly defining FM terminology within a project.
ASCS online fault detection and isolation based on an improved MPCA

NASA Astrophysics Data System (ADS)

Peng, Jianxin; Liu, Haiou; Hu, Yuhui; Xi, Junqiang; Chen, Huiyan

2014-09-01

Multi-way principal component analysis (MPCA) has received considerable attention and been widely used in process monitoring. A traditional MPCA algorithm unfolds multiple batches of historical data into a two-dimensional matrix and cut the matrix along the time axis to form subspaces. However, low efficiency of subspaces and difficult fault isolation are the common disadvantages for the principal component model. This paper presents a new subspace construction method based on kernel density estimation function that can effectively reduce the storage amount of the subspace information. The MPCA model and the knowledge base are built based on the new subspace. Then, fault detection and isolation with the squared prediction error (SPE) statistic and the Hotelling ( T 2) statistic are also realized in process monitoring. When a fault occurs, fault isolation based on the SPE statistic is achieved by residual contribution analysis of different variables. For fault isolation of subspace based on the T 2 statistic, the relationship between the statistic indicator and state variables is constructed, and the constraint conditions are presented to check the validity of fault isolation. Then, to improve the robustness of fault isolation to unexpected disturbances, the statistic method is adopted to set the relation between single subspace and multiple subspaces to increase the corrective rate of fault isolation. Finally fault detection and isolation based on the improved MPCA is used to monitor the automatic shift control system (ASCS) to prove the correctness and effectiveness of the algorithm. The research proposes a new subspace construction method to reduce the required storage capacity and to prove the robustness of the principal component model, and sets the relationship between the state variables and fault detection indicators for fault isolation.
Fault diagnosis for analog circuits utilizing time-frequency features and improved VVRKFA

NASA Astrophysics Data System (ADS)

He, Wei; He, Yigang; Luo, Qiwu; Zhang, Chaolong

2018-04-01

This paper proposes a novel scheme for analog circuit fault diagnosis utilizing features extracted from the time-frequency representations of signals and an improved vector-valued regularized kernel function approximation (VVRKFA). First, the cross-wavelet transform is employed to yield the energy-phase distribution of the fault signals over the time and frequency domain. Since the distribution is high-dimensional, a supervised dimensionality reduction technique—the bilateral 2D linear discriminant analysis—is applied to build a concise feature set from the distributions. Finally, VVRKFA is utilized to locate the fault. In order to improve the classification performance, the quantum-behaved particle swarm optimization technique is employed to gradually tune the learning parameter of the VVRKFA classifier. The experimental results for the analog circuit faults classification have demonstrated that the proposed diagnosis scheme has an advantage over other approaches.
Secure Proactive Recovery a Hardware Based Mission Assurance Scheme

DTIC Science & Technology

2011-08-01

Room, January. Kalbarczyk, Z., Iyer, R.K., Bagchi, S. and Whisnant, K. (1999) " Chameleon : a software infrastructure for adaptive fault tolerance...components of this evaluation include a JAVA implementation based on Chameleon ARMORs (Kalbarczyk et al. 1999), ARENA simulation (http
An experimental evaluation of the REE SIFT environment for spaceborne applications

NASA Technical Reports Server (NTRS)

Whistnant, K.; Iyer, R. K.; Jones, P.; Some, R.; Rennels, D.

2002-01-01

This paper presents an experimental evaluation of a software-implemented fault tolerance environment built around a set of self-checking ARMOR proceses running on different machines that provide error detection and recovery services to themselves and to spaceborne scientific applications.
A PC based time domain reflectometer for space station cable fault isolation

NASA Technical Reports Server (NTRS)

Pham, Michael; McClean, Marty; Hossain, Sabbir; Vo, Peter; Kouns, Ken

1994-01-01

Significant problems are faced by astronauts on orbit in the Space Station when trying to locate electrical faults in multi-segment avionics and communication cables. These problems necessitate the development of an automated portable device that will detect and locate cable faults using the pulse-echo technique known as Time Domain Reflectometry. A breadboard time domain reflectometer (TDR) circuit board was designed and developed at the NASA-JSC. The TDR board works in conjunction with a GRiD lap-top computer to automate the fault detection and isolation process. A software program was written to automatically display the nature and location of any possible faults. The breadboard system can isolate open circuit and short circuit faults within two feet in a typical space station cable configuration. Follow-on efforts planned for 1994 will produce a compact, portable prototype Space Station TDR capable of automated switching in multi-conductor cables for high fidelity evaluation. This device has many possible commercial applications, including commercial and military aircraft avionics, cable TV, telephone, communication, information and computer network systems. This paper describes the principle of time domain reflectometry and the methodology for on-orbit avionics utility distribution system repair, utilizing the newly developed device called the Space Station Time Domain Reflectometer (SSTDR).
An extensible circuit QED architecture for quantum computation

NASA Astrophysics Data System (ADS)

Dicarlo, Leo

Realizing a logical qubit robust to single errors in its constituent physical elements is an immediate challenge for quantum information processing platforms. A longer-term challenge will be achieving quantum fault tolerance, i.e., improving logical qubit resilience by increasing redundancy in the underlying quantum error correction code (QEC). In QuTech, we target these challenges in collaboration with industrial and academic partners. I will present the circuit QED quantum hardware, room-temperature control electronics, and software components of the complete architecture. I will show the extensibility of each component to the Surface-17 and -49 circuits needed to reach the objectives with surface-code QEC, and provide an overview of latest developments. Research funded by IARPA and Intel Corporation.
Mission Management Computer Software for RLV-TD

NASA Astrophysics Data System (ADS)

Manju, C. R.; Joy, Josna Susan; Vidya, L.; Sheenarani, I.; Sruthy, C. N.; Viswanathan, P. C.; Dinesh, Sudin; Jayalekshmy, L.; Karuturi, Kesavabrahmaji; Sheema, E.; Syamala, S.; Unnikrishnan, S. Manju; Ali, S. Akbar; Paramasivam, R.; Sheela, D. S.; Shukkoor, A. Abdul; Lalithambika, V. R.; Mookiah, T.

2017-12-01

The Mission Management Computer (MMC) software is responsible for the autonomous navigation, sequencing, guidance and control of the Re-usable Launch Vehicle (RLV), through lift-off, ascent, coasting, re-entry, controlled descent and splashdown. A hard real-time system has been designed for handling the mission requirements in an integrated manner and for meeting the stringent timing constraints. Redundancy management and fault-tolerance techniques are also built into the system, in order to achieve a successful mission even in presence of component failures. This paper describes the functions and features of the components of the MMC software which has accomplished the successful RLV-Technology Demonstrator mission.
Method of gear fault diagnosis based on EEMD and improved Elman neural network

NASA Astrophysics Data System (ADS)

Zhang, Qi; Zhao, Wei; Xiao, Shungen; Song, Mengmeng

2017-05-01

Aiming at crack and wear and so on of gears Fault information is difficult to diagnose usually due to its weak, a gear fault diagnosis method that is based on EEMD and improved Elman neural network fusion is proposed. A number of IMF components are obtained by decomposing denoised all kinds of fault signals with EEMD, and the pseudo IMF components is eliminated by using the correlation coefficient method to obtain the effective IMF component. The energy characteristic value of each effective component is calculated as the input feature quantity of Elman neural network, and the improved Elman neural network is based on standard network by adding a feedback factor. The fault data of normal gear, broken teeth, cracked gear and attrited gear were collected by field collecting. The results were analyzed by the diagnostic method proposed in this paper. The results show that compared with the standard Elman neural network, Improved Elman neural network has the advantages of high diagnostic efficiency.
Building geomechanical characteristic model in Ilan geothermal area, NE Taiwan

NASA Astrophysics Data System (ADS)

Chiang, Yu-Hsuan; Hung, Jih-Hao

2015-04-01

National Energy Program-Phase II (NEPPII) was initiated to understand the geomechanical characteristic in Ilan geothermal area. In this study, we integrate well cores and logs (e.g. Nature Gamma-ray, Normal resistivity, Formation Micro Imager) which were acquired in HongChaiLin (HCL), Duck-Field (DF) and IC21 to determine the depth of fracture zone, in-situ stress state, the depth of basement and lithological characters. In addition, the subsurface in-situ stress state will be helpful to analyze the fault reactivation potential and slip tendency. By retrieved core from HCL well and the results of geophysical logging, indicated that the lithological character is slate (520m ~ 1500m) and the basement depth is around 520m. To get the minimum and maximum horizontal stress, several hydraulic fracturing tests were conducted in the interval of 750~765m on HCL well. The horizontal maximum and minimum stresses including the hydrostatic pressure are calculated as 15.39MPa and 13.57MPa, respectively. The vertical stress is decided by measuring the core density from 738m to 902m depth. The average core density is 2.71 g/cm3, and the vertical stress is 19.95 MPa (at 750m). From DF well, the basement depth is 468.9m. Besides, by analyzing the IC21 well logging data, we know the in-situ orientation of maximum horizontal stress is NE-SW. Using these parameters, the fault reactivation potential and slip tendency can be analyzed with 3DStress, Traptester software and demonstrated on model. On the other hand, we interpreted the horizons and faults from the nine seismic profiles including six N-S profiles, two W-E profiles and one NE-SW profile to construct the 3D subsurface structure model with GOCAD software. The result shows that Zhuosui fault and Kankou Formation are dip to north, but Hanxi fault and Xiaonanao fault are dip to south. In addition, there is a syncline-like structure on Nansuao Formation and the Chingshuihu member of the Lushan Formation. However, there is a conflict on Szeleng sandstone. We need to more drilling data to confirm the dip of Szeleng sandstone.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.