An experimental evaluation of software redundancy as a strategy for improving reliability
NASA Technical Reports Server (NTRS)
Eckhardt, Dave E., Jr.; Caglayan, Alper K.; Knight, John C.; Lee, Larry D.; Mcallister, David F.; Vouk, Mladen A.; Kelly, John P. J.
1990-01-01
The strategy of using multiple versions of independently developed software as a means to tolerate residual software design faults is suggested by the success of hardware redundancy for tolerating hardware failures. Although, as generally accepted, the independence of hardware failures resulting from physical wearout can lead to substantial increases in reliability for redundant hardware structures, a similar conclusion is not immediate for software. The degree to which design faults are manifested as independent failures determines the effectiveness of redundancy as a method for improving software reliability. Interest in multi-version software centers on whether it provides an adequate measure of increased reliability to warrant its use in critical applications. The effectiveness of multi-version software is studied by comparing estimates of the failure probabilities of these systems with the failure probabilities of single versions. The estimates are obtained under a model of dependent failures and compared with estimates obtained when failures are assumed to be independent. The experimental results are based on twenty versions of an aerospace application developed and certified by sixty programmers from four universities. Descriptions of the application, development and certification processes, and operational evaluation are given together with an analysis of the twenty versions.
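As a point of reference for the independence assumption mentioned in this abstract, the following is a minimal sketch of the failure probability of a majority-voted N-version system when every version fails independently with the same probability. The function name and the example values are illustrative assumptions; the report's own dependent-failure model is not reproduced here.

```python
from math import comb


def majority_failure_prob(p: float, n: int) -> float:
    """Probability that a majority of n versions fail on one input,
    assuming each version fails independently with probability p."""
    m = n // 2 + 1  # smallest number of failing versions that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


if __name__ == "__main__":
    # Hypothetical single-version failure probability of 0.1
    for n in (1, 3, 5):
        print(n, majority_failure_prob(0.1, n))
```

Under this idealized assumption, adding versions lowers the system failure probability whenever p is below 0.5; whether real software versions behave this way is exactly the question the experiment addresses.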
NASA Technical Reports Server (NTRS)
Scalzo, F.
1983-01-01
Sensor redundancy management (SRM) requires a system which will detect failures and reconstruct avionics accordingly. A probability density function for determining false alarm rates was generated using an algorithmic approach. Microcomputer software was developed which will print out tables of values for the cumulative probability of being in the domain of failure; system reliability; and false alarm probability, given a signal is in the domain of failure. The microcomputer software was applied to the sensor output data for various AFTI F-16 flights and sensor parameters. Practical recommendations for further research were made.
ERIC Educational Resources Information Center
Lafferty, Mark T.
2010-01-01
The number of project failures and those projects completed over cost and over schedule has been a significant issue for software project managers. Among the many reasons for failure, inaccuracy in software estimation--the basis for project bidding, budgeting, planning, and probability estimates--has been identified as a root cause of a high…
NASA Technical Reports Server (NTRS)
Lawrence, Stella
1992-01-01
This paper is concerned with methods of measuring and developing quality software. Reliable flight and ground support software is a highly important factor in the successful operation of the space shuttle program. Reliability is probably the most important of the characteristics inherent in the concept of 'software quality'. It is the probability of failure-free operation of a computer program for a specified time and environment.
1984-09-28
variables before simulation of model - Search for reality checks - Express uncertainty as a probability density distribution... probability that the software contains errors. This prior is updated as test failure data are accumulated. Only a p of 1 (software known to contain...discussed; both parametric and nonparametric versions are presented. It is shown by the author that the bootstrap underlies the jackknife method and
Detection of faults and software reliability analysis
NASA Technical Reports Server (NTRS)
Knight, J. C.
1986-01-01
Multiversion or N-version programming was proposed as a method of providing fault tolerance in software. The approach requires the separate, independent preparation of multiple versions of a piece of software for some application. Specific topics addressed are: failure probabilities in N-version systems, consistent comparison in N-version systems, descriptions of the faults found in the Knight and Leveson experiment, analytic models of comparison testing, characteristics of the input regions that trigger faults, fault tolerance through data diversity, and the relationship between failures caused by automatically seeded faults.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mitchell, Scott A.; Ebeida, Mohamed Salah; Romero, Vicente J.
2015-09-01
This SAND report summarizes our work on the Sandia National Laboratory LDRD project titled "Efficient Probability of Failure Calculations for QMU using Computational Geometry" which was project #165617 and proposal #13-0144. This report merely summarizes our work. Those interested in the technical details are encouraged to read the full published results, and contact the report authors for the status of the software and follow-on projects.
NASA Technical Reports Server (NTRS)
Lee, Alice T.; Gunn, Todd; Pham, Tuan; Ricaldi, Ron
1994-01-01
This handbook documents the three software analysis processes the Space Station Software Analysis team uses to assess space station software, including their backgrounds, theories, tools, and analysis procedures. Potential applications of these analysis results are also presented. The first section describes how software complexity analysis provides quantitative information on code, such as code structure and risk areas, throughout the software life cycle. Software complexity analysis allows an analyst to understand the software structure, identify critical software components, assess risk areas within a software system, identify testing deficiencies, and recommend program improvements. Performing this type of analysis during the early design phases of software development can positively affect the process, and may prevent later, much larger, difficulties. The second section describes how software reliability estimation and prediction analysis, or software reliability, provides a quantitative means to measure the probability of failure-free operation of a computer program, and describes the two tools used by JSC to determine failure rates and design tradeoffs between reliability, costs, performance, and schedule.
A theoretical basis for the analysis of redundant software subject to coincident errors
NASA Technical Reports Server (NTRS)
Eckhardt, D. E., Jr.; Lee, L. D.
1985-01-01
Fundamental to the development of redundant software techniques (fault-tolerant software) is an understanding of the impact of multiple joint occurrences of coincident errors. A theoretical basis for the study of redundant software is developed which provides a probabilistic framework for empirically evaluating the effectiveness of the general (N-Version) strategy when component versions are subject to coincident errors, and permits an analytical study of the effects of these errors. The basic assumptions of the model are: (1) independently designed software components are chosen in a random sample; and (2) in the user environment, the system is required to execute on a stationary input series. The intensity of coincident errors has a central role in the model. This function describes the propensity to introduce design faults in such a way that software components fail together when executing in the user environment. The model is used to give conditions under which an N-Version system is a better strategy for reducing system failure probability than relying on a single version of software. A condition which limits the effectiveness of a fault-tolerant strategy is studied, and the question is posed whether system failure probability varies monotonically with increasing N or whether an optimal choice of N exists.
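One way to read the role of an intensity function is sketched below, under stated assumptions: each input x has an intensity theta(x), taken here as the probability that a randomly chosen version fails on x, and versions fail independently only once the input is fixed. The input classes, their weights, and the intensity values are hypothetical, and the code is an illustration of the idea rather than the paper's formulation.

```python
from math import comb


def majority_fail(theta: float, n: int) -> float:
    """Probability that a majority of n versions fail on an input whose
    per-version failure intensity is theta."""
    m = n // 2 + 1
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k)
               for k in range(m, n + 1))


def system_failure_prob(intensities, weights, n: int) -> float:
    """Average the majority-failure probability over the input distribution.

    intensities[i] is the assumed intensity for input class i and
    weights[i] is the assumed usage probability of that class."""
    return sum(w * majority_fail(t, n) for t, w in zip(intensities, weights))


if __name__ == "__main__":
    # Hypothetical profile: failures concentrate on one "hard" input class.
    thetas = [0.0, 0.0, 0.0, 0.4]
    usage = [0.25, 0.25, 0.25, 0.25]
    mean_theta = sum(t * w for t, w in zip(thetas, usage))
    print("mean single-version failure probability:", mean_theta)
    print("3-version system, varying intensity:", system_failure_prob(thetas, usage, 3))
    print("3-version system, independence assumed:", majority_fail(mean_theta, 3))
```

In this made-up profile the single-version failure probability averages 0.1 in both calculations, yet the system with varying intensity fails roughly three times as often as the independence calculation suggests, because the versions tend to fail on the same inputs.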
Effectiveness of back-to-back testing
NASA Technical Reports Server (NTRS)
Vouk, Mladen A.; Mcallister, David F.; Eckhardt, David E.; Caglayan, Alper; Kelly, John P. J.
1987-01-01
Three models of back-to-back testing processes are described. Two models treat the case where there is no intercomponent failure dependence. The third model describes the more realistic case where there is correlation among the failure probabilities of the functionally equivalent components. The theory indicates that back-to-back testing can, under the right conditions, provide a considerable gain in software reliability. The models are used to analyze the data obtained in a fault-tolerant software experiment. It is shown that the expected gain is indeed achieved, and exceeded, provided the intercomponent failure dependence is sufficiently small. However, even with relatively high correlation, the use of several functionally equivalent components coupled with back-to-back testing may provide a considerable reliability gain. Implications of this finding are that multiversion software development is a feasible and cost-effective approach to providing highly reliable software components intended for fault-tolerant software systems, on condition that special attention is directed at early detection and elimination of correlated faults.
Variation of Time Domain Failure Probabilities of Jack-up with Wave Return Periods
NASA Astrophysics Data System (ADS)
Idris, Ahmad; Harahap, Indra S. H.; Ali, Montassir Osman Ahmed
2018-04-01
This study evaluated failure probabilities of jack-up units within the framework of time-dependent reliability analysis, using uncertainty from different sea states representing different return periods of the design wave. Surface elevation for each sea state was represented by the Karhunen-Loeve expansion method using the eigenfunctions of prolate spheroidal wave functions in order to obtain the wave load. The stochastic wave load was propagated on a simplified jack-up model developed in commercial software to obtain the structural response due to the wave loading. Analysis of the stochastic response to determine the failure probability in excessive deck displacement, within the framework of time-dependent reliability analysis, was performed using Matlab codes developed on a personal computer. Results from the study indicated that the failure probability increases with increasing severity of the sea state, which represents a longer return period. Although the results agree with those of a study of a similar jack-up model using a time-independent method at higher values of maximum allowable deck displacement, they differ at lower values of the criterion, where that study reported that failure probability decreases with increasing severity of the sea state.
Hardware and software reliability estimation using simulations
NASA Technical Reports Server (NTRS)
Swern, Frederic L.
1994-01-01
The simulation technique is used to explore the validation of both hardware and software. It was concluded that simulation is a viable means for validating both hardware and software and associating a reliability number with each. This is useful in determining the overall probability of system failure of an embedded processor unit, and improving both the code and the hardware where necessary to meet reliability requirements. The methodologies were proved using some simple programs, and simple hardware models.
CARES/Life Software for Designing More Reliable Ceramic Parts
NASA Technical Reports Server (NTRS)
Nemeth, Noel N.; Powers, Lynn M.; Baker, Eric H.
1997-01-01
Products made from advanced ceramics show great promise for revolutionizing aerospace and terrestrial propulsion, and power generation. However, ceramic components are difficult to design because brittle materials in general have widely varying strength values. The CARES/Life software eases this task by providing a tool to optimize the design and manufacture of brittle material components using probabilistic reliability analysis techniques. Probabilistic component design involves predicting the probability of failure for a thermomechanically loaded component from specimen rupture data. Typically, these experiments are performed using many simple geometry flexural or tensile test specimens. A static, dynamic, or cyclic load is applied to each specimen until fracture. Statistical strength and SCG (fatigue) parameters are then determined from these data. Using these parameters and the results obtained from a finite element analysis, the time-dependent reliability for a complex component geometry and loading is then predicted. Appropriate design changes are made until an acceptable probability of failure has been reached.
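To illustrate what "probability of failure from statistical strength parameters" can look like in its simplest form, here is a sketch of the two-parameter Weibull model for a specimen under uniform uniaxial stress. This is a textbook simplification, not the CARES/Life algorithm; the function name and the parameter values (Weibull modulus and characteristic strength) are assumptions chosen only for the example.

```python
import math


def weibull_failure_probability(stress: float, char_strength: float,
                                modulus: float) -> float:
    """Two-parameter Weibull failure probability for uniform uniaxial stress:
    P_f = 1 - exp(-(stress / char_strength) ** modulus)."""
    if stress <= 0.0:
        return 0.0  # no tensile load, so no failure in this simple model
    return 1.0 - math.exp(-((stress / char_strength) ** modulus))


if __name__ == "__main__":
    # Hypothetical material: characteristic strength 400 MPa, modulus 10
    for s in (200.0, 300.0, 400.0):
        print(s, weibull_failure_probability(s, 400.0, 10.0))
```

The steep rise of failure probability near the characteristic strength reflects the wide strength scatter of brittle materials that the abstract describes; a design loop would lower the stress or change the geometry until the computed value is acceptable.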
NASA Technical Reports Server (NTRS)
Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.
1992-01-01
An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with engineering analysis to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in engineering analyses of failure phenomena, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which engineering analysis models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. Conventional engineering analysis models currently employed for design of failure prediction are used in this methodology. The PFA methodology is described and examples of its application are presented. Conventional approaches to failure risk evaluation for spaceflight systems are discussed, and the rationale for the approach taken in the PFA methodology is presented. The statistical methods, engineering models, and computer software used in fatigue failure mode applications are thoroughly documented.
Reliability analysis of the F-8 digital fly-by-wire system
NASA Technical Reports Server (NTRS)
Brock, L. D.; Goodman, H. A.
1981-01-01
The F-8 Digital Fly-by-Wire (DFBW) flight test program, intended to provide the technology for advanced control systems giving aircraft enhanced performance and operational capability, is addressed. A detailed analysis of the experimental system was performed to estimate the probabilities of two significant safety-critical events: (1) loss of primary flight control function, causing reversion to the analog bypass system; and (2) loss of the aircraft due to failure of the electronic flight control system. The analysis covers appraisal of risks due to random equipment failure, generic faults in design of the system or its software, and induced failure due to external events. A unique diagrammatic technique was developed which details the combinatorial reliability equations for the entire system, promotes understanding of system failure characteristics, and identifies the most likely failure modes. The technique provides a systematic method of applying basic probability equations and is augmented by a computer program written in a modular fashion that duplicates the structure of these equations.
Probabilistic Prediction of Lifetimes of Ceramic Parts
NASA Technical Reports Server (NTRS)
Nemeth, Noel N.; Gyekenyesi, John P.; Jadaan, Osama M.; Palfi, Tamas; Powers, Lynn; Reh, Stefan; Baker, Eric H.
2006-01-01
ANSYS/CARES/PDS is a software system that combines the ANSYS Probabilistic Design System (PDS) software with a modified version of the Ceramics Analysis and Reliability Evaluation of Structures Life (CARES/Life) Version 6.0 software. [A prior version of CARES/Life was reported in Program for Evaluation of Reliability of Ceramic Parts (LEW-16018), NASA Tech Briefs, Vol. 20, No. 3 (March 1996), page 28.] CARES/Life models effects of stochastic strength, slow crack growth, and stress distribution on the overall reliability of a ceramic component. The essence of the enhancement in CARES/Life 6.0 is the capability to predict the probability of failure using results from transient finite-element analysis. ANSYS PDS models the effects of uncertainty in material properties, dimensions, and loading on the stress distribution and deformation. ANSYS/CARES/PDS accounts for the effects of probabilistic strength, probabilistic loads, probabilistic material properties, and probabilistic tolerances on the lifetime and reliability of the component. Even failure probability becomes a stochastic quantity that can be tracked as a response variable. ANSYS/CARES/PDS enables tracking of all stochastic quantities in the design space, thereby enabling more precise probabilistic prediction of lifetimes of ceramic components.
NASA Technical Reports Server (NTRS)
Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.
1992-01-01
An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with analytical modeling of failure phenomena to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in analytical modeling, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which analytical models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. State-of-the-art analytical models currently employed for design, failure prediction, or performance analysis are used in this methodology. The rationale for the statistical approach taken in the PFA methodology is discussed, the PFA methodology is described, and examples of its application to structural failure modes are presented. The engineering models and computer software used in fatigue crack growth and fatigue crack initiation applications are thoroughly documented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoisak, J; Manger, R; Dragojevic, I
Purpose: To perform a failure mode and effects analysis (FMEA) of the process for treating superficial skin cancers with the Xoft Axxent electronic brachytherapy (eBx) system, given the recent introduction of expanded quality control (QC) initiatives at our institution. Methods: A process map was developed listing all steps in superficial treatments with Xoft eBx, from the initial patient consult to the completion of the treatment course. The process map guided the FMEA to identify the failure modes for each step in the treatment workflow and assign Risk Priority Numbers (RPN), calculated as the product of the failure mode’s probability of occurrence (O), severity (S) and lack of detectability (D). FMEA was done with and without the inclusion of recent QC initiatives such as increased staffing, physics oversight, standardized source calibration, treatment planning and documentation. The failure modes with the highest RPNs were identified and contrasted before and after introduction of the QC initiatives. Results: Based on the FMEA, the failure modes with the highest RPN were related to source calibration, treatment planning, and patient setup/treatment delivery (Fig. 1). The introduction of additional physics oversight, standardized planning and safety initiatives such as checklists and time-outs reduced the RPNs of these failure modes. High-risk failure modes that could be mitigated with improved hardware and software interlocks were identified. Conclusion: The FMEA analysis identified the steps in the treatment process presenting the highest risk. The introduction of enhanced QC initiatives mitigated the risk of some of these failure modes by decreasing their probability of occurrence and increasing their detectability. This analysis demonstrates the importance of well-designed QC policies, procedures and oversight in a Xoft eBx programme for treatment of superficial skin cancers. Unresolved high risk failure modes highlight the need for non-procedural quality initiatives such as improved planning software and more robust hardware interlock systems.
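The RPN calculation described in this abstract (the product of occurrence, severity, and lack of detectability) can be sketched as follows. The failure-mode names echo the categories named in the abstract, but every score below is a hypothetical placeholder, as is the 1-to-10 scoring scale; none of the numbers come from the study.

```python
def rpn(occurrence: int, severity: int, detectability: int) -> int:
    """Risk Priority Number: product of the O, S and D scores."""
    return occurrence * severity * detectability


def rank_failure_modes(modes: dict) -> list:
    """Return (name, RPN) pairs sorted from highest to lowest RPN.

    modes maps a failure-mode name to an (O, S, D) tuple."""
    scored = [(name, rpn(*osd)) for name, osd in modes.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores on an assumed 1-10 scale
    before_qc = {
        "source calibration": (4, 8, 6),
        "treatment planning": (5, 7, 5),
        "patient setup": (6, 6, 4),
    }
    for name, score in rank_failure_modes(before_qc):
        print(f"{name}: RPN = {score}")
```

Re-scoring the same modes after a QC change (for example, a lower O or D score) and ranking again gives the before-and-after comparison the abstract describes.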
Risk-Based Object Oriented Testing
NASA Technical Reports Server (NTRS)
Rosenberg, Linda H.; Stapko, Ruth; Gallo, Albert
2000-01-01
Software testing is a well-defined phase of the software development life cycle. Functional ("black box") testing and structural ("white box") testing are two methods of test case design commonly used by software developers. A lesser known testing method is risk-based testing, which takes into account the probability of failure of a portion of code as determined by its complexity. For object oriented programs, a methodology is proposed for identification of risk-prone classes. Risk-based testing is a highly effective testing technique that can be used to find and fix the most important problems as quickly as possible.
Development Of Knowledge Systems For Trouble Shooting Complex Production Machinery
NASA Astrophysics Data System (ADS)
Sanford, Richard L.; Novak, Thomas; Meigs, James R.
1987-05-01
This paper discusses the use of knowledge base system software for microcomputers to aid repairmen in diagnosing electrical failures in complex mining machinery. The knowledge base is constructed to allow the user to input initial symptoms of the failed machine, and the most probable cause of failure is traced through the knowledge base, with the software requesting additional information such as voltage or resistance measurements as needed. Although the case study presented is for an underground mining machine, results have application to any industry using complex machinery. Two commercial expert-system development tools (M1™ and Insight 2+™) and an AI language (Turbo Prolog™) are discussed with emphasis on ease of application and suitability for this study.
System Risk Balancing Profiles: Software Component
NASA Technical Reports Server (NTRS)
Kelly, John C.; Sigal, Burton C.; Gindorf, Tom
2000-01-01
The Software QA / V&V guide will be reviewed and updated based on feedback from NASA organizations and others with a vested interest in this area. Hardware, EEE Parts, Reliability, and Systems Safety are a sample of the future guides that will be developed. Cost Estimates, Lessons Learned, Probability of Failure and PACTS (Prevention, Avoidance, Control or Test) are needed to provide a more complete risk management strategy. This approach to risk management is designed to help balance the resources and program content for risk reduction for NASA's changing environment.
Analysis of whisker-toughened CMC structural components using an interactive reliability model
NASA Technical Reports Server (NTRS)
Duffy, Stephen F.; Palko, Joseph L.
1992-01-01
Realizing wider utilization of ceramic matrix composites (CMC) requires the development of advanced structural analysis technologies. This article focuses on the use of interactive reliability models to predict component probability of failure. The deterministic Willam-Warnke failure criterion serves as theoretical basis for the reliability model presented here. The model has been implemented in a test-bed software program. This computer program has been coupled to a general-purpose finite element program. A simple structural problem is presented to illustrate the reliability model and the computer algorithm.
NASA Technical Reports Server (NTRS)
Nemeth, Noel
2013-01-01
Models that predict the failure probability of monolithic glass and ceramic components under multiaxial loading have been developed by authors such as Batdorf, Evans, and Matsuo. These "unit-sphere" failure models assume that the strength-controlling flaws are randomly oriented, noninteracting planar microcracks of specified geometry but of variable size. This report develops a formulation to describe the probability density distribution of the orientation of critical strength-controlling flaws that results from an applied load. This distribution is a function of the multiaxial stress state, the shear sensitivity of the flaws, the Weibull modulus, and the strength anisotropy. Examples are provided showing the predicted response on the unit sphere for various stress states for isotropic and transversely isotropic (anisotropic) materials--including the most probable orientation of critical flaws for offset uniaxial loads with strength anisotropy. The author anticipates that this information could be used to determine anisotropic stiffness degradation or anisotropic damage evolution for individual brittle (or quasi-brittle) composite material constituents within finite element or micromechanics-based software
A method for producing digital probabilistic seismic landslide hazard maps
Jibson, R.W.; Harp, E.L.; Michael, J.A.
2000-01-01
The 1994 Northridge, California, earthquake is the first earthquake for which we have all of the data sets needed to conduct a rigorous regional analysis of seismic slope instability. These data sets include: (1) a comprehensive inventory of triggered landslides, (2) about 200 strong-motion records of the mainshock, (3) 1:24,000-scale geologic mapping of the region, (4) extensive data on engineering properties of geologic units, and (5) high-resolution digital elevation models of the topography. All of these data sets have been digitized and rasterized at 10 m grid spacing using ARC/INFO GIS software on a UNIX computer. Combining these data sets in a dynamic model based on Newmark's permanent-deformation (sliding-block) analysis yields estimates of coseismic landslide displacement in each grid cell from the Northridge earthquake. The modeled displacements are then compared with the digital inventory of landslides triggered by the Northridge earthquake to construct a probability curve relating predicted displacement to probability of failure. This probability function can be applied to predict and map the spatial variability in failure probability in any ground-shaking conditions of interest. We anticipate that this mapping procedure will be used to construct seismic landslide hazard maps that will assist in emergency preparedness planning and in making rational decisions regarding development and construction in areas susceptible to seismic slope failure. © 2000 Elsevier Science B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Ortega, J. M.
1986-01-01
Various graduate research activities in the field of computer science are reported. Among the topics discussed are: (1) failure probabilities in multi-version software; (2) Gaussian elimination on parallel computers; (3) three dimensional Poisson solvers on parallel/vector computers; (4) automated task decomposition for multiple robot arms; (5) multi-color incomplete Cholesky conjugate gradient methods on the Cyber 205; and (6) parallel implementation of iterative methods for solving linear equations.
Requirements: Towards an understanding on why software projects fail
NASA Astrophysics Data System (ADS)
Hussain, Azham; Mkpojiogu, Emmanuel O. C.
2016-08-01
Requirements engineering is at the foundation of every successful software project. There are many reasons for software project failures; however, a poorly engineered requirements process contributes immensely to why software projects fail. Software project failure is usually costly and risky and can also be life threatening. Projects that undermine requirements engineering suffer, or are likely to suffer, from failures, challenges and other attendant risks. The estimated cost of project failures and overruns is very large. Furthermore, software project failures or overruns pose a challenge in today's competitive market environment. They affect a company's image, goodwill, and revenue drive and decrease the perceived satisfaction of customers and clients. In this paper, requirements engineering is discussed and its role in software project success is elaborated. The place of the software requirements process in relation to software project failure is explored and examined. Project success and failure factors are also discussed, with emphasis placed on requirements factors, as they play a major role in software projects' challenges, successes and failures. The paper relies on secondary data and empirical statistics to explore and examine factors responsible for the successes, challenges and failures of software projects in large, medium and small scale software companies.
The predictive information obtained by testing multiple software versions
NASA Technical Reports Server (NTRS)
Lee, Larry D.
1987-01-01
Multiversion programming is a redundancy approach to developing highly reliable software. In applications of this method, two or more versions of a program are developed independently by different programmers and the versions are combined to form a redundant system. One variation of this approach consists of developing a set of n program versions and testing the versions to predict the failure probability of a particular program or a system formed from a subset of the programs. The precision that might be obtained, and also the effect of programmer variability if predictions are made over repetitions of the process of generating different program versions, are examined.
NASA Astrophysics Data System (ADS)
Chen, Po-Hao; Botzolakis, Emmanuel; Mohan, Suyash; Bryan, R. N.; Cook, Tessa
2016-03-01
In radiology, diagnostic errors occur either through the failure of detection or incorrect interpretation. Errors are estimated to occur in 30-35% of all exams and contribute to 40-54% of medical malpractice litigations. In this work, we focus on reducing incorrect interpretation of known imaging features. Existing literature categorizes cognitive bias leading a radiologist to an incorrect diagnosis despite having correctly recognized the abnormal imaging features: anchoring bias, framing effect, availability bias, and premature closure. Computational methods make a unique contribution, as they do not exhibit the same cognitive biases as a human. Bayesian networks formalize the diagnostic process. They modify pre-test diagnostic probabilities using clinical and imaging features, arriving at a post-test probability for each possible diagnosis. To translate Bayesian networks to clinical practice, we implemented an entirely web-based open-source software tool. In this tool, the radiologist first selects a network of choice (e.g. basal ganglia). Then, large, clearly labeled buttons displaying salient imaging features are displayed on the screen serving both as a checklist and for input. As the radiologist inputs the value of an extracted imaging feature, the conditional probabilities of each possible diagnosis are updated. The software presents its level of diagnostic discrimination using a Pareto distribution chart, updated with each additional imaging feature. Active collaboration with the clinical radiologist is a feasible approach to software design and leads to design decisions closely coupling the complex mathematics of conditional probability in Bayesian networks with practice.
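The step in which pre-test probabilities become post-test probabilities after one imaging feature is entered can be sketched with a single application of Bayes' rule. This is a deliberately reduced illustration, not the tool's implementation: a full Bayesian network handles many interdependent features, and the diagnosis labels, priors, and likelihoods below are hypothetical.

```python
def update_diagnoses(priors: dict, likelihoods: dict) -> dict:
    """Return post-test probabilities after observing one feature.

    priors[d] is the pre-test probability of diagnosis d and
    likelihoods[d] is the assumed probability of the observed feature
    value given diagnosis d."""
    unnormalized = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observed feature is impossible under every diagnosis")
    return {d: value / total for d, value in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical pre-test probabilities and feature likelihoods
    pre_test = {"diagnosis A": 0.5, "diagnosis B": 0.3, "diagnosis C": 0.2}
    feature_given_dx = {"diagnosis A": 0.9, "diagnosis B": 0.2, "diagnosis C": 0.1}
    post_test = update_diagnoses(pre_test, feature_given_dx)
    for dx, prob in sorted(post_test.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{dx}: {prob:.3f}")
```

Feeding the returned dictionary back in as the prior for the next feature mimics the incremental updating described in the abstract, provided the features are treated as conditionally independent given the diagnosis, which is an additional assumption of this sketch.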
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dana L. Kelly
Typical engineering systems in applications with high failure consequences such as nuclear reactor plants often employ redundancy and diversity of equipment in an effort to lower the probability of failure and therefore risk. However, it has long been recognized that dependencies exist in these redundant and diverse systems. Some dependencies, such as common sources of electrical power, are typically captured in the logic structure of the risk model. Others, usually referred to as intercomponent dependencies, are treated implicitly by introducing one or more statistical parameters into the model. Such common-cause failure models have limitations in a simulation environment. In addition, substantial subjectivity is associated with parameter estimation for these models. This paper describes an approach in which system performance is simulated by drawing samples from the joint distributions of dependent variables. The approach relies on the notion of a copula distribution, a notion which has been employed by the actuarial community for ten years or more, but which has seen only limited application in technological risk assessment. The paper also illustrates how equipment failure data can be used in a Bayesian framework to estimate the parameter values in the copula model. This approach avoids much of the subjectivity required to estimate parameters in traditional common-cause failure models. Simulation examples are presented for failures in time. The open-source software package R is used to perform the simulations. The open-source software package WinBUGS is used to perform the Bayesian inference via Markov chain Monte Carlo sampling.
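Drawing dependent failure times from a copula can be sketched as below. The paper reports using R and WinBUGS; this Python version is only an illustration, and it assumes a Gaussian copula with exponential marginals, which may differ from the copula family and marginals used in the paper. The failure rates, the correlation parameter rho, and the function names are all hypothetical.

```python
import math
import random


def normal_survival(z: float) -> float:
    """Upper-tail probability 1 - Phi(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def sample_failure_times(rate_a: float, rate_b: float, rho: float,
                         n: int, seed: int = 0) -> list:
    """Draw n pairs of failure times with exponential marginals whose
    dependence comes from a Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # Correlated standard normals
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Invert the exponential CDF: t = -ln(1 - u) / rate, with 1 - u
        # taken directly from the normal survival function
        t1 = -math.log(normal_survival(z1)) / rate_a
        t2 = -math.log(normal_survival(z2)) / rate_b
        pairs.append((t1, t2))
    return pairs


def joint_early_failure_fraction(pairs: list, mission_time: float) -> float:
    """Fraction of samples in which both components fail before mission_time."""
    return sum(1 for a, b in pairs if a < mission_time and b < mission_time) / len(pairs)


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        draws = sample_failure_times(1.0, 1.0, rho, 5000, seed=1)
        print(rho, joint_early_failure_fraction(draws, 0.1))
```

Comparing the two printed fractions shows how positive dependence raises the chance that both redundant components fail early, even though each component's marginal failure behavior is unchanged.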
Development of GENOA Progressive Failure Parallel Processing Software Systems
NASA Technical Reports Server (NTRS)
Abdi, Frank; Minnetyan, Levon
1999-01-01
A capability consisting of software development and experimental techniques has been developed and is described. The capability is integrated into GENOA-PFA to model polymer matrix composite (PMC) structures. The capability considers the physics and mechanics of composite materials and structure by integration of a hierarchical multilevel macro-scale (lamina, laminate, and structure) and micro scale (fiber, matrix, and interface) simulation analyses. The modeling involves (1) ply layering methodology utilizing FEM elements with through-the-thickness representation, (2) simulation of effects of material defects and conditions (e.g., voids, fiber waviness, and residual stress) on global static and cyclic fatigue strengths, (3) including material nonlinearities (by updating properties periodically) and geometrical nonlinearities (by Lagrangian updating), (4) simulating crack initiation and growth to failure under static, cyclic, creep, and impact loads, (5) progressive fracture analysis to determine durability and damage tolerance, (6) identifying the percent contribution of various possible composite failure modes involved in critical damage events, and (7) determining sensitivities of failure modes to design parameters (e.g., fiber volume fraction, ply thickness, fiber orientation, and adhesive-bond thickness). GENOA-PFA progressive failure analysis is now ready for use to investigate the effects on structural responses to PMC material degradation from damage induced by static, cyclic (fatigue), creep, and impact loading in 2D/3D PMC structures subjected to hygrothermal environments. Its use will significantly facilitate targeting design parameter changes that will be most effective in reducing the probability of a given failure mode occurring.
Enhanced CARES Software Enables Improved Ceramic Life Prediction
NASA Technical Reports Server (NTRS)
Janosik, Lesley A.
1997-01-01
The NASA Lewis Research Center has developed award-winning software that enables American industry to establish the reliability and life of brittle material (e.g., ceramic, intermetallic, graphite) structures in a wide variety of 21st century applications. The CARES (Ceramics Analysis and Reliability Evaluation of Structures) series of software is successfully used by numerous engineers in industrial, academic, and government organizations as an essential element of the structural design and material selection processes. The latest version of this software, CARES/Life, provides a general-purpose design tool that predicts the probability of failure of a ceramic component as a function of its time in service. CARES/Life was recently enhanced by adding new modules designed to improve functionality and user-friendliness. In addition, a beta version of the newly developed CARES/Creep program (for determining the creep life of monolithic ceramic components) has just been released to selected organizations.
NASA Astrophysics Data System (ADS)
Dulo, D. A.
Safety critical software systems permeate spacecraft, and in a long term venture like a starship would be pervasive in every system of the spacecraft. Yet software failure today continues to plague both the systems and the organizations that develop them resulting in the loss of life, time, money, and valuable system platforms. A starship cannot afford this type of software failure in long journeys away from home. A single software failure could have catastrophic results for the spaceship and the crew onboard. This paper will offer a new approach to developing safe reliable software systems through focusing not on the traditional safety/reliability engineering paradigms but rather by focusing on a new paradigm: Resilience and Failure Obviation Engineering. The foremost objective of this approach is the obviation of failure, coupled with the ability of a software system to prevent or adapt to complex changing conditions in real time as a safety valve should failure occur to ensure safe system continuity. Through this approach, safety is ensured through foresight to anticipate failure and to adapt to risk in real time before failure occurs. In a starship, this type of software engineering is vital. Through software developed in a resilient manner, a starship would have reduced or eliminated software failure, and would have the ability to rapidly adapt should a software system become unstable or unsafe. As a result, long term software safety, reliability, and resilience would be present for a successful long term starship mission.
Failure analysis and modeling of a VAXcluster system
NASA Technical Reports Server (NTRS)
Tang, Dong; Iyer, Ravishankar K.; Subramani, Sujatha S.
1990-01-01
This paper discusses the results of a measurement-based analysis of real error data collected from a DEC VAXcluster multicomputer system. In addition to evaluating basic system dependability characteristics such as error and failure distributions and hazard rates for both individual machines and for the VAXcluster, reward models were developed to analyze the impact of failures on the system as a whole. The results show that more than 46 percent of all failures were due to errors in shared resources. This is despite the fact that these errors have a recovery probability greater than 0.99. The hazard rate calculations show that not only errors, but also failures occur in bursts. Approximately 40 percent of all failures occur in bursts and involved multiple machines. This result indicates that correlated failures are significant. Analysis of rewards shows that software errors have the lowest reward (0.05 vs 0.74 for disk errors). The expected reward rate (reliability measure) of the VAXcluster drops to 0.5 in 18 hours for the 7-out-of-7 model and in 80 days for the 3-out-of-7 model.
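For orientation, the k-out-of-n structures named in this abstract can be sketched under the simplest possible assumption: identical machines that fail independently. The paper itself reports that correlated failures are significant and uses measurement-based reward models, so the sketch below is only an optimistic baseline and not the authors' model. The function name and the per-machine probability are hypothetical.

```python
from math import comb


def k_out_of_n_reliability(k, n, p_up):
    """Probability that at least k of n machines are up.

    Assumption: machines are identical and independent, each up with
    probability p_up.
    """
    return sum(
        comb(n, i) * p_up**i * (1.0 - p_up) ** (n - i)
        for i in range(k, n + 1)
    )


p_up = 0.9  # hypothetical per-machine probability of being up
all_seven = k_out_of_n_reliability(7, 7, p_up)
three_of_seven = k_out_of_n_reliability(3, 7, p_up)
```

The gap between `all_seven` and `three_of_seven` mirrors, qualitatively, the difference the abstract reports between the 7-out-of-7 and 3-out-of-7 models.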
Estimation of probability of failure for damage-tolerant aerospace structures
NASA Astrophysics Data System (ADS)
Halbert, Keith
The majority of aircraft structures are designed to be damage-tolerant such that safe operation can continue in the presence of minor damage. It is necessary to schedule inspections so that minor damage can be found and repaired. It is generally not possible to perform structural inspections prior to every flight. The scheduling is traditionally accomplished through a deterministic set of methods referred to as Damage Tolerance Analysis (DTA). DTA has proven to produce safe aircraft but does not provide estimates of the probability of failure of future flights or the probability of repair of future inspections. Without these estimates maintenance costs cannot be accurately predicted. Also, estimation of failure probabilities is now a regulatory requirement for some aircraft. The set of methods concerned with the probabilistic formulation of this problem are collectively referred to as Probabilistic Damage Tolerance Analysis (PDTA). The goal of PDTA is to control the failure probability while holding maintenance costs to a reasonable level. This work focuses specifically on PDTA for fatigue cracking of metallic aircraft structures. The growth of a crack (or cracks) must be modeled using all available data and engineering knowledge. The length of a crack can be assessed only indirectly through evidence such as non-destructive inspection results, failures or lack of failures, and the observed severity of usage of the structure. The current set of industry PDTA tools are lacking in several ways: they may in some cases yield poor estimates of failure probabilities, they cannot realistically represent the variety of possible failure and maintenance scenarios, and they do not allow for model updates which incorporate observed evidence. A PDTA modeling methodology must be flexible enough to estimate accurately the failure and repair probabilities under a variety of maintenance scenarios, and be capable of incorporating observed evidence as it becomes available. 
This dissertation describes and develops new PDTA methodologies that directly address the deficiencies of the currently used tools. The new methods are implemented as a free, publicly licensed and open source R software package that can be downloaded from the Comprehensive R Archive Network. The tools consist of two main components. First, an explicit (and expensive) Monte Carlo approach is presented which simulates the life of an aircraft structural component flight-by-flight. This straightforward MC routine can be used to provide defensible estimates of the failure probabilities for future flights and repair probabilities for future inspections under a variety of failure and maintenance scenarios. This routine is intended to provide baseline estimates against which to compare the results of other, more efficient approaches. Second, an original approach is described which models the fatigue process and future scheduled inspections as a hidden Markov model. This model is solved using a particle-based approximation and the sequential importance sampling algorithm, which provides an efficient solution to the PDTA problem. Sequential importance sampling is an extension of importance sampling to a Markov process, allowing for efficient Bayesian updating of model parameters. This model updating capability, the benefit of which is demonstrated, is lacking in other PDTA approaches. The results of this approach are shown to agree with the results of the explicit Monte Carlo routine for a number of PDTA problems. Extensions to the typical PDTA problem, which cannot be solved using currently available tools, are presented and solved in this work. These extensions include incorporating observed evidence (such as non-destructive inspection results), more realistic treatment of possible future repairs, and the modeling of failure involving more than one crack (the so-called continuing damage problem). 
The described hidden Markov model / sequential importance sampling approach to PDTA has the potential to improve aerospace structural safety and reduce maintenance costs by providing a more accurate assessment of the risk of failure and the likelihood of repairs throughout the life of an aircraft.
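The dissertation's sequential importance sampling builds on ordinary importance sampling. A minimal sketch of that underlying idea follows, applied to a toy rare-event problem (the tail probability of a standard normal variable) and not to crack growth. It is written in Python with NumPy, whereas the dissertation's package is in R; the function name and the threshold are hypothetical.

```python
import math
import numpy as np


def tail_probability_is(threshold, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling.

    Samples are drawn from the proposal N(threshold, 1), which places about
    half of the draws in the rare region, and are reweighted by the ratio
    target density / proposal density.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=threshold, scale=1.0, size=n_samples)
    # phi(x) / phi(x - threshold) = exp(-threshold * x + threshold**2 / 2)
    weights = np.exp(-threshold * x + 0.5 * threshold**2)
    return float(np.mean(weights * (x > threshold)))


estimate = tail_probability_is(4.0, 100_000)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
```

A plain Monte Carlo run of the same size would see only a handful of exceedances of the threshold, which is why reweighted sampling is attractive when the failure probability of interest is small.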
NASA Technical Reports Server (NTRS)
Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.
1992-01-01
An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with engineering analysis to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in engineering analyses of failure phenomena, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which engineering analysis models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. Conventional engineering analysis models currently employed for design of failure prediction are used in this methodology. The PFA methodology is described and examples of its application are presented. Conventional approaches to failure risk evaluation for spaceflight systems are discussed, and the rationale for the approach taken in the PFA methodology is presented. The statistical methods, engineering models, and computer software used in fatigue failure mode applications are thoroughly documented.
Failure detection system risk reduction assessment
NASA Technical Reports Server (NTRS)
Aguilar, Robert B. (Inventor); Huang, Zhaofeng (Inventor)
2012-01-01
A process includes determining a probability of a failure mode of a system being analyzed reaching a failure limit as a function of time to failure limit, determining a probability of a mitigation of the failure mode as a function of a time to failure limit, and quantifying a risk reduction based on the probability of the failure mode reaching the failure limit and the probability of the mitigation.
NASA Technical Reports Server (NTRS)
Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.
1992-01-01
An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with analytical modeling of failure phenomena to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in analytical modeling, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which analytical models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. State-of-the-art analytical models currently employed for design, failure prediction, or performance analysis are used in this methodology. The rationale for the statistical approach taken in the PFA methodology is discussed, the PFA methodology is described, and examples of its application to structural failure modes are presented. The engineering models and computer software used in fatigue crack growth and fatigue crack initiation applications are thoroughly documented.
Onboard Sensor Data Qualification in Human-Rated Launch Vehicles
NASA Technical Reports Server (NTRS)
Wong, Edmond; Melcher, Kevin J.; Maul, William A.; Chicatelli, Amy K.; Sowers, Thomas S.; Fulton, Christopher; Bickford, Randall
2012-01-01
The avionics system software for human-rated launch vehicles requires an implementation approach that is robust to failures, especially the failure of sensors used to monitor vehicle conditions that might result in an abort determination. Sensor measurements provide the basis for operational decisions on human-rated launch vehicles. This data is often used to assess the health of system or subsystem components, to identify failures, and to take corrective action. An incorrect conclusion and/or response may result if the sensor itself provides faulty data, or if the data provided by the sensor has been corrupted. Operational decisions based on faulty sensor data have the potential to be catastrophic, resulting in loss of mission or loss of crew. To prevent these latter situations from occurring, a Modular Architecture and Generalized Methodology for Sensor Data Qualification in Human-rated Launch Vehicles has been developed. Sensor Data Qualification (SDQ) is a set of algorithms that can be implemented in onboard flight software, and can be used to qualify data obtained from flight-critical sensors prior to the data being used by other flight software algorithms. Qualified data has been analyzed by SDQ and is determined to be a true representation of the sensed system state; that is, the sensor data is determined not to be corrupted by sensor faults or signal transmission faults. Sensor data can become corrupted by faults at any point in the signal path between the sensor and the flight computer. Qualifying the sensor data has the benefit of ensuring that erroneous data is identified and flagged before otherwise being used for operational decisions, thus increasing confidence in the response of the other flight software processes using the qualified data, and decreasing the probability of false alarms or missed detections.
Software dependability in the Tandem GUARDIAN system
NASA Technical Reports Server (NTRS)
Lee, Inhwan; Iyer, Ravishankar K.
1995-01-01
Based on extensive field failure data for Tandem's GUARDIAN operating system this paper discusses evaluation of the dependability of operational software. Software faults considered are major defects that result in processor failures and invoke backup processes to take over. The paper categorizes the underlying causes of software failures and evaluates the effectiveness of the process pair technique in tolerating software faults. A model to describe the impact of software faults on the reliability of an overall system is proposed. The model is used to evaluate the significance of key factors that determine software dependability and to identify areas for improvement. An analysis of the data shows that about 77% of processor failures that are initially considered due to software are confirmed as software problems. The analysis shows that the use of process pairs to provide checkpointing and restart (originally intended for tolerating hardware faults) allows the system to tolerate about 75% of reported software faults that result in processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. Over two-thirds (72%) of measured software failures are recurrences of previously reported faults. Modeling, based on the data, shows that, in addition to reducing the number of software faults, software dependability can be enhanced by reducing the recurrence rate.
Methods, apparatus and system for notification of predictable memory failure
Cher, Chen-Yong; Andrade Costa, Carlos H.; Park, Yoonho; Rosenburg, Bryan S.; Ryu, Kyung D.
2017-01-03
A method for providing notification of a predictable memory failure includes the steps of: obtaining information regarding at least one condition associated with a memory; calculating a memory failure probability as a function of the obtained information; calculating a failure probability threshold; and generating a signal when the memory failure probability exceeds the failure probability threshold, the signal being indicative of a predicted future memory failure.
Importance Sampling in the Evaluation and Optimization of Buffered Failure Probability
2015-07-01
12th International Conference on Applications of Statistics and Probability in Civil Engineering, ICASP12 Vancouver, Canada, July 12-15, 2015...Importance Sampling in the Evaluation and Optimization of Buffered Failure Probability Marwan M. Harajli Graduate Student, Dept. of Civil and Environ...criterion is usually the failure probability . In this paper, we examine the buffered failure probability as an attractive alternative to the failure
NASA Technical Reports Server (NTRS)
Wallace, Dolores R.
2003-01-01
In FY01 we learned that hardware reliability models need substantial changes to account for differences in software, thus making software reliability measurements more effective, accurate, and easier to apply. These reliability models are generally based on familiar distributions or parametric methods. An obvious question is "What new statistical and probability models can be developed using non-parametric and distribution-free methods instead of the traditional parametric method?" Two approaches to software reliability engineering appear somewhat promising. The first study, begun in FY01, is based in hardware reliability, a very well established science that has many aspects that can be applied to software. This research effort has investigated mathematical aspects of hardware reliability and has identified those applicable to software. Currently the research effort is applying and testing these approaches to software reliability measurement. These parametric models require much project data that may be difficult to apply and interpret. Projects at GSFC are often complex in both technology and schedules. Assessing and estimating reliability of the final system is extremely difficult when various subsystems are tested and completed long before others. Parametric and distribution free techniques may offer a new and accurate way of modeling failure time and other project data to provide earlier and more accurate estimates of system reliability.
Advanced Information Processing System (AIPS)
NASA Technical Reports Server (NTRS)
Pitts, Felix L.
1993-01-01
Advanced Information Processing System (AIPS) is a computer systems philosophy, a set of validated hardware building blocks, and a set of validated services as embodied in system software. The goal of AIPS is to provide the knowledgebase which will allow achievement of validated fault-tolerant distributed computer system architectures, suitable for a broad range of applications, having failure probability requirements of 10E-9 at 10 hours. A background and description is given followed by program accomplishments, the current focus, applications, technology transfer, FY92 accomplishments, and funding.
Survival Predictions of Ceramic Crowns Using Statistical Fracture Mechanics
Nasrin, S.; Katsube, N.; Seghi, R.R.; Rokhlin, S.I.
2017-01-01
This work establishes a survival probability methodology for interface-initiated fatigue failures of monolithic ceramic crowns under simulated masticatory loading. A complete 3-dimensional (3D) finite element analysis model of a minimally reduced molar crown was developed using commercially available hardware and software. Estimates of material surface flaw distributions and fatigue parameters for 3 reinforced glass-ceramics (fluormica [FM], leucite [LR], and lithium disilicate [LD]) and a dense sintered yttrium-stabilized zirconia (YZ) were obtained from the literature and incorporated into the model. Utilizing the proposed fracture mechanics–based model, crown survival probability as a function of loading cycles was obtained from simulations performed on the 4 ceramic materials utilizing identical crown geometries and loading conditions. The weaker ceramic materials (FM and LR) resulted in lower survival rates than the more recently developed higher-strength ceramic materials (LD and YZ). The simulated 10-y survival rate of crowns fabricated from YZ was only slightly better than those fabricated from LD. In addition, 2 of the model crown systems (FM and LD) were expanded to determine regional-dependent failure probabilities. This analysis predicted that the LD-based crowns were more likely to fail from fractures initiating from margin areas, whereas the FM-based crowns showed a slightly higher probability of failure from fractures initiating from the occlusal table below the contact areas. These 2 predicted fracture initiation locations have some agreement with reported fractographic analyses of failed crowns. In this model, we considered the maximum tensile stress tangential to the interfacial surface, as opposed to the more universally reported maximum principal stress, because it more directly impacts crack propagation. 
While the accuracy of these predictions needs to be experimentally verified, the model can provide a fundamental understanding of the importance that pre-existing flaws at the intaglio surface have on fatigue failures. PMID:28107637
Contemporary issues in HIM. Software engineering--what does it mean to you?
Wear, L L
1994-02-01
There have been significant advances in the way we develop software in the last two decades. Many companies are using the new process oriented approach to software development. Companies that use the new techniques and tools have reported improvements in both productivity and quality, but there are still companies developing software the way we did 30 years ago. If you saw the movie Jurassic Park, you saw the perfect way not to develop software. The programmer in the movie was the only person who knew the details of the system. No processes were followed, and there was no documentation. This was an absolutely perfect prescription for failure. Some of you are probably familiar with the term hacker which describes a person who spends hours sitting at a terminal hacking out code. Hackers have created some outstanding software products, but with today's complex systems, most companies are trying to get away from their dependence on hackers. They are instead turning to the process-oriented approach. When selecting software vendors, don't just look at the functionality of a product. Try to determine how the vendor develops software, and determine if you are dealing with hackers or a process-driven company. In the long run, you should get better, more reliable products from the latter.
Fault Tree Based Diagnosis with Optimal Test Sequencing for Field Service Engineers
NASA Technical Reports Server (NTRS)
Iverson, David L.; George, Laurence L.; Patterson-Hine, F. A.; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
When field service engineers go to customer sites to service equipment, they want to diagnose and repair failures quickly and cost effectively. Symptoms exhibited by failed equipment frequently suggest several possible causes which require different approaches to diagnosis. This can lead the engineer to follow several fruitless paths in the diagnostic process before they find the actual failure. To assist in this situation, we have developed the Fault Tree Diagnosis and Optimal Test Sequence (FTDOTS) software system that performs automated diagnosis and ranks diagnostic hypotheses based on failure probability and the time or cost required to isolate and repair each failure. FTDOTS first finds a set of possible failures that explain exhibited symptoms by using a fault tree reliability model as a diagnostic knowledge base. It then ranks the hypothesized failures based on how likely they are and how long it would take or how much it would cost to isolate and repair them. This ordering suggests an optimal sequence for the field service engineer to investigate the hypothesized failures in order to minimize the time or cost required to accomplish the repair task. Previously, field service personnel would arrive at the customer site and choose which components to investigate based on past experience and service manuals. Using FTDOTS running on a portable computer, they can now enter a set of symptoms and get a list of possible failures ordered in an optimal test sequence to help them in their decisions. If facilities are available, the field engineer can connect the portable computer to the malfunctioning device for automated data gathering. FTDOTS is currently being applied to field service of medical test equipment. The techniques are flexible enough to use for many different types of devices. 
If a fault tree model of the equipment and information about component failure probabilities and isolation times or costs are available, a diagnostic knowledge base for that device can be developed easily.
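The ranking idea in this abstract can be sketched with a classic sequencing rule: when exactly one hypothesis is the true cause and candidates are checked one at a time, ordering them by probability divided by cost minimizes the expected cost of finding the failure. Whether FTDOTS uses exactly this rule is not stated in the abstract, so the rule is an assumption here, and the component names, probabilities, and costs are hypothetical.

```python
def optimal_test_sequence(hypotheses):
    """Order (name, probability, cost) tuples by probability-to-cost ratio.

    Assumption: a single failure is present and each check either confirms
    or rules out its hypothesis.
    """
    return sorted(hypotheses, key=lambda h: h[1] / h[2], reverse=True)


def expected_cost(sequence):
    """Expected cost of checking hypotheses in order until the cause is found."""
    total, spent = 0.0, 0.0
    for _name, probability, cost in sequence:
        spent += cost
        total += probability * spent
    return total


# Hypothetical candidates: (name, probability of being the cause, cost to check).
hypotheses = [("pump", 0.5, 10.0), ("valve", 0.3, 2.0), ("sensor", 0.2, 1.0)]
sequence = optimal_test_sequence(hypotheses)
```

In this example the most probable cause (the pump) is checked last because it is also by far the most expensive to isolate, which illustrates why probability alone is not enough to rank hypotheses.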
NASA Technical Reports Server (NTRS)
Craig, Larry G.
2010-01-01
This slide presentation reviews three failures of software and how the failures contributed to or caused the failure of a launch or payload insertion into orbit. In order to avoid these systematic failures in the future, failure mitigation strategies are suggested for use.
Integrated Hardware and Software for No-Loss Computing
NASA Technical Reports Server (NTRS)
James, Mark
2007-01-01
When an algorithm is distributed across multiple threads executing on many distinct processors, a loss of one of those threads or processors can potentially result in the total loss of all the incremental results up to that point. When implementation is massively hardware distributed, then the probability of a hardware failure during the course of a long execution is potentially high. Traditionally, this problem has been addressed by establishing checkpoints where the current state of some or part of the execution is saved. Then in the event of a failure, this state information can be used to recompute that point in the execution and resume the computation from that point. A serious problem that arises when one distributes a problem across multiple threads and physical processors is that one increases the likelihood of the algorithm failing due to no fault of the scientist but as a result of hardware faults coupled with operating system problems. With good reason, scientists expect their computing tools to serve them and not the other way around. What is novel here is a unique combination of hardware and software that reformulates an application into a monolithic structure that can be monitored in real-time and dynamically reconfigured in the event of a failure. This unique reformulation of hardware and software will provide advanced aeronautical technologies to meet the challenges of next-generation systems in aviation, for civilian and scientific purposes, in our atmosphere and in atmospheres of other worlds. In particular, with respect to NASA's manned flight to Mars, this technology addresses the critical requirements for improving safety and increasing reliability of manned spacecraft.
Experimental analysis of computer system dependability
NASA Technical Reports Server (NTRS)
Iyer, Ravishankar K.; Tang, Dong
1993-01-01
This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion involves several important issues studied in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.
Assessing performance and validating finite element simulations using probabilistic knowledge
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dolin, Ronald M.; Rodriguez, E. A.
Two probabilistic approaches for assessing performance are presented. The first approach assesses probability of failure by simultaneously modeling all likely events. The probability each event causes failure along with the event's likelihood of occurrence contribute to the overall probability of failure. The second assessment method is based on stochastic sampling using an influence diagram. Latin-hypercube sampling is used to stochastically assess events. The overall probability of failure is taken as the maximum probability of failure of all the events. The Likelihood of Occurrence simulation suggests failure does not occur while the Stochastic Sampling approach predicts failure. The Likelihood of Occurrence results are used to validate finite element predictions.
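Latin-hypercube sampling, used in the second approach above, can be sketched as follows. This is a generic illustration on the unit hypercube, not the authors' implementation; the function name `latin_hypercube` is an assumption. Each dimension is divided into n equal strata, and every stratum receives exactly one sample.

```python
import random


def latin_hypercube(n, dims, seed=0):
    """Return n points in [0, 1)^dims with one point per stratum per dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(dims):
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        # Place one uniformly distributed point inside each stratum.
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]
```

Compared with plain random sampling, this stratification guarantees that the full range of every input variable is covered even with few samples.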
NASA Astrophysics Data System (ADS)
Li, Xin; Zhang, Lu; Tang, Ying; Huang, Shanguo
2018-03-01
The light-tree-based optical multicasting (LT-OM) scheme provides a spectrum- and energy-efficient method to accommodate emerging multicast services. Some studies focus on survivability technologies for LTs against a fixed number of link failures, such as single-link failure. However, few studies involve failure probability constraints when building LTs. It is worth noting that the links of an LT differ in importance under failure scenarios. When calculating the failure probability of an LT, the importance of each of its links should be considered. We design a link importance incorporated failure probability measuring solution (LIFPMS) for multicast LTs under the independent failure model and the shared risk link group failure model. Based on the LIFPMS, we put forward the minimum failure probability (MFP) problem for the LT-OM scheme. Heuristic approaches are developed to address the MFP problem in elastic optical networks. Numerical results show that the LIFPMS provides an accurate metric for calculating the failure probability of multicast LTs and enhances the reliability of the LT-OM scheme while accommodating multicast services.
Effect of system workload on operating system reliability - A study on IBM 3081
NASA Technical Reports Server (NTRS)
Iyer, R. K.; Rossetti, D. J.
1985-01-01
This paper presents an analysis of operating system failures on an IBM 3081 running VM/SP. Three broad categories of software failures are found: error handling, program control or logic, and hardware related; it is found that more than 25 percent of software failures occur in the hardware/software interface. Measurements show that results on software reliability cannot be considered representative unless the system workload is taken into account. The overall CPU execution rate, although measured to be close to 100 percent most of the time, is not found to correlate strongly with the occurrence of failures. Possible reasons for the observed workload failure dependency, based on detailed investigations of the failure data, are discussed.
UQTools: The Uncertainty Quantification Toolbox - Introduction and Tutorial
NASA Technical Reports Server (NTRS)
Kenny, Sean P.; Crespo, Luis G.; Giesy, Daniel P.
2012-01-01
UQTools is the short name for the Uncertainty Quantification Toolbox, a software package designed to efficiently quantify the impact of parametric uncertainty on engineering systems. UQTools is a MATLAB-based software package and was designed to be discipline independent, employing very generic representations of the system models and uncertainty. Specifically, UQTools accepts linear and nonlinear system models and permits arbitrary functional dependencies between the system's measures of interest and the probabilistic or non-probabilistic parametric uncertainty. One of the most significant features incorporated into UQTools is the theoretical development centered on homothetic deformations and their application to set bounding and approximating failure probabilities. Beyond the set bounding technique, UQTools provides a wide range of probabilistic and uncertainty-based tools to solve key problems in science and engineering.
New Approach For Prediction Groundwater Depletion
NASA Astrophysics Data System (ADS)
Moustafa, Mahmoud
2017-01-01
Current approaches to quantifying groundwater depletion involve water balance and satellite gravity. However, the water balance technique includes uncertain estimation of parameters such as evapotranspiration and runoff, and the satellite method consumes time and effort. The work reported in this paper proposes using failure theory in a novel way to predict the depletion of groundwater saturated thickness. An important issue in the proposed failure theory is determining the failure point (the depletion case). The proposed technique uses depth of water as the net result of recharge/discharge processes in the aquifer to calculate the remaining saturated thickness resulting from the applied pumping rates in an area, and so to evaluate the groundwater depletion. Two parameters, the Weibull function and Bayes analysis, were used to model and analyze data collected from 1962 to 2009. The proposed methodology was tested in a nonrenewable aquifer with no recharge. Consequently, the continuous decline in water depth has been the main criterion used to estimate the depletion. The value of the proposed approach is that it predicts the probable effect of the currently applied pumping rates on the saturated thickness based on the remaining saturated thickness data. The limitation of the suggested approach is that it assumes the applied management practices are constant during the prediction period. The study predicted an 80% probability that the saturated aquifer would be depleted after 300 years. Lifetime (failure) theory offers a simple alternative way to predict the depletion of the remaining saturated thickness without time-consuming processes or sophisticated software.
Orbiter subsystem hardware/software interaction analysis. Volume 8: Forward reaction control system
NASA Technical Reports Server (NTRS)
Becker, D. D.
1980-01-01
The results of the orbiter hardware/software interaction analysis for the AFT reaction control system are presented. The interaction between hardware failure modes and software are examined in order to identify associated issues and risks. All orbiter subsystems and interfacing program elements which interact with the orbiter computer flight software are analyzed. The failure modes identified in the subsystem/element failure mode and effects analysis are discussed.
Failure probability under parameter uncertainty.
Gerrard, R; Tsanakas, A
2011-05-01
In many problems of risk analysis, failure is equivalent to the event of a random risk factor exceeding a given threshold. Failure probabilities can be controlled if a decisionmaker is able to set the threshold at an appropriate level. This abstract situation applies, for example, to environmental risks with infrastructure controls; to supply chain risks with inventory controls; and to insurance solvency risks with capital controls. However, uncertainty around the distribution of the risk factor implies that parameter error will be present and the measures taken to control failure probabilities may not be effective. We show that parameter uncertainty increases the probability (understood as expected frequency) of failures. For a large class of loss distributions, arising from increasing transformations of location-scale families (including the log-normal, Weibull, and Pareto distributions), the article shows that failure probabilities can be exactly calculated, as they are independent of the true (but unknown) parameters. Hence it is possible to obtain an explicit measure of the effect of parameter uncertainty on failure probability. Failure probability can be controlled in two different ways: (1) by reducing the nominal required failure probability, depending on the size of the available data set, and (2) by modifying the distribution itself that is used to calculate the risk control. Approach (1) corresponds to a frequentist/regulatory view of probability, while approach (2) is consistent with a Bayesian/personalistic view. We furthermore show that the two approaches are consistent in achieving the required failure probability. Finally, we briefly discuss the effects of data pooling and its systemic risk implications. © 2010 Society for Risk Analysis.
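The claim that parameter uncertainty raises the expected failure frequency can be checked with a small simulation. The sketch below is an illustration under assumptions that are not from the article: the risk factor is normal, its mean and standard deviation are estimated from a small sample, and the threshold is set at the estimated quantile for a nominal failure probability. The function name `failure_frequency` and all numeric settings are hypothetical.

```python
import math
import random
from statistics import NormalDist


def failure_frequency(n_obs, nominal, trials, mu=0.0, sigma=1.0, seed=0):
    """Fraction of trials in which a new loss exceeds an estimated threshold.

    In each trial the threshold is mean + z * sd, with mean and sd
    estimated from n_obs observations and z the normal quantile for
    the nominal failure probability.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - nominal)
    failures = 0
    for _ in range(trials):
        data = [rng.gauss(mu, sigma) for _ in range(n_obs)]
        mean = sum(data) / n_obs
        var = sum((x - mean) ** 2 for x in data) / (n_obs - 1)
        threshold = mean + z * math.sqrt(var)
        if rng.gauss(mu, sigma) > threshold:  # a new, independent loss
            failures += 1
    return failures / trials
```

With ten observations and a nominal level of 1%, the observed frequency in this sketch comes out well above 1%, and, consistent with the abstract's location-scale result, changing `mu` or `sigma` should not change it beyond sampling noise.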
Develop advanced nonlinear signal analysis topographical mapping system
NASA Technical Reports Server (NTRS)
Jong, Jen-Yi
1993-01-01
This study will provide timely assessment of SSME component operational status, identify probable causes of malfunction, and indicate feasible engineering solutions. The final result of this program will yield an advanced nonlinear signal analysis topographical mapping system (ATMS) of nonlinear and nonstationary spectral analysis software package integrated with the Compressed SSME TOPO Data Base (CSTDB) on the same platform. This system will allow NASA engineers to retrieve any unique defect signatures and trends associated with different failure modes and anomalous phenomena over the entire SSME test history across turbopump families.
Approximation of Failure Probability Using Conditional Sampling
NASA Technical Reports Server (NTRS)
Giesy, Daniel P.; Crespo, Luis G.; Kenny, Sean P.
2008-01-01
In analyzing systems which depend on uncertain parameters, one technique is to partition the uncertain parameter domain into a failure set and its complement, and judge the quality of the system by estimating the probability of failure. If this is done by a sampling technique such as Monte Carlo and the probability of failure is small, accurate approximation can require so many sample points that the computational expense is prohibitive. Previous work of the authors has shown how to bound the failure event by sets of such simple geometry that their probabilities can be calculated analytically. In this paper, it is shown how to make use of these failure bounding sets and conditional sampling within them to substantially reduce the computational burden of approximating failure probability. It is also shown how the use of these sampling techniques improves the confidence intervals for the failure probability estimate for a given number of sample points and how they reduce the number of sample point analyses needed to achieve a given level of confidence.
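The idea of sampling only inside a bounding set whose probability is known analytically can be sketched on a toy problem. Everything in this example is an assumption for illustration: the two uncertain parameters are independent and uniform on [0, 1], the failure set is a small corner region, and the bounding set is the box [0.9, 1]^2 with probability 0.01. The estimate is P(bounding set) times the conditional failure fraction observed inside the box.

```python
import random


def in_failure_set(x1, x2):
    """Hypothetical failure region: a small triangle in the upper-right corner."""
    return x1 > 0.9 and x2 > 0.9 and x1 + x2 > 1.95


def conditional_estimate(n, seed=0):
    """Estimate P(failure) by sampling only inside the bounding box."""
    rng = random.Random(seed)
    p_box = 0.1 * 0.1  # probability of the bounding set, known analytically
    hits = 0
    for _ in range(n):
        x1 = rng.uniform(0.9, 1.0)
        x2 = rng.uniform(0.9, 1.0)
        hits += in_failure_set(x1, x2)
    return p_box * hits / n
```

Here the exact failure probability is 0.01 × 0.125 = 0.00125. Sampling the whole unit square would place only about one point in a hundred inside the box, so the conditional scheme reaches the same accuracy with far fewer model evaluations.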
Conceptual modeling of coincident failures in multiversion software
NASA Technical Reports Server (NTRS)
Littlewood, Bev; Miller, Douglas R.
1989-01-01
Recent work by Eckhardt and Lee (1985) shows that independently developed program versions fail dependently (specifically, the probability of simultaneous failure of several versions is greater than would be the case under true independence). The present authors show there is a precise duality between input choice and program choice in this model and consider a generalization in which different versions can be developed using diverse methodologies. The use of diverse methodologies is shown to decrease the probability of the simultaneous failure of several versions. Indeed, it is theoretically possible to obtain versions which exhibit better than independent failure behavior. The authors try to formalize the notion of methodological diversity by considering the sequence of decision outcomes that constitute a methodology. They show that diversity of decision implies likely diversity of behavior for the different versions developed under such forced diversity. For certain one-out-of-n systems the authors obtain an optimal method for allocating diversity between versions. For two-out-of-three systems there seem to be no simple optimality results which do not depend on constraints which cannot be verified in practice.
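The dependence result attributed to Eckhardt and Lee can be illustrated numerically. In their model, each input x has a "difficulty" θ(x): the probability that a randomly developed version fails on x. Two independently developed versions both fail on a random input with probability E[θ(X)²], which is at least (E[θ(X)])², the value expected under independence. The sketch below uses a made-up difficulty profile; the function name and numbers are assumptions.

```python
def coincident_failure(theta, weights):
    """Return (single-version failure prob, joint failure prob, independence prediction).

    theta[i]   -- probability that a random version fails on input i
    weights[i] -- probability that input i is selected in operation
    """
    single = sum(w * t for w, t in zip(weights, theta))
    joint = sum(w * t * t for w, t in zip(weights, theta))
    return single, joint, single * single


# Hypothetical profile: three easy inputs and one hard input.
single, joint, independent = coincident_failure(
    theta=[0.0, 0.0, 0.0, 0.4],
    weights=[0.25, 0.25, 0.25, 0.25],
)
```

With this profile a single version fails with probability 0.1, and two versions fail together with probability 0.04 rather than the 0.01 that independence would predict. The two values coincide only when every input is equally difficult.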
Guest Editor's Introduction: Special section on dependable distributed systems
NASA Astrophysics Data System (ADS)
Fetzer, Christof
1999-09-01
We rely more and more on computers. For example, the Internet reshapes the way we do business. A `computer outage' can cost a company a substantial amount of money, not only with respect to the business lost during an outage, but also with respect to the negative publicity the company receives. This is especially true for Internet companies. After recent computer outages of Internet companies, we have seen a drastic fall of the shares of the affected companies. There are multiple causes for computer outages. Although computer hardware becomes more reliable, hardware related outages remain an important issue. For example, some of the recent computer outages of companies were caused by failed memory and system boards, and even by crashed disks - a failure type which can easily be masked using disk mirroring. Transient hardware failures might also look like software failures and, hence, might be incorrectly classified as such. However, many outages are software related. Faulty system software, middleware, and application software can crash a system. Dependable computing systems are systems we can rely on. Dependable systems are, by definition, reliable, available, safe and secure [3]. This special section focuses on issues related to dependable distributed systems. Distributed systems have the potential to be more dependable than a single computer because the probability that all computers in a distributed system fail is smaller than the probability that a single computer fails. However, if a distributed system is not built well, it is potentially less dependable than a single computer since the probability that at least one computer in a distributed system fails is higher than the probability that one computer fails. For example, if the crash of any computer in a distributed system can bring the complete system to a halt, the system is less dependable than a single-computer system. Building dependable distributed systems is an extremely difficult task.
There is no silver bullet solution. Instead one has to apply a variety of engineering techniques [2]: fault-avoidance (minimize the occurrence of faults, e.g. by using a proper design process), fault-removal (remove faults before they occur, e.g. by testing), fault-evasion (predict faults by monitoring and reconfigure the system before failures occur), and fault-tolerance (mask and/or contain failures). Building a system from scratch is an expensive and time consuming effort. To reduce the cost of building dependable distributed systems, one would choose to use commercial off-the-shelf (COTS) components whenever possible. The usage of COTS components has several potential advantages beyond minimizing costs. For example, through the widespread usage of a COTS component, design failures might be detected and fixed before the component is used in a dependable system. Custom-designed components have to mature without the widespread in-field testing of COTS components. COTS components have various potential disadvantages when used in dependable systems. For example, minimizing the time to market might lead to the release of components with inherent design faults (e.g. use of `shortcuts' that only work most of the time). In addition, the components might be more complex than needed and, hence, potentially have more design faults than simpler components. However, given economic constraints and the ability to cope with some of the problems using fault-evasion and fault-tolerance, only for a small percentage of systems can one justify not using COTS components. Distributed systems built from current COTS components are asynchronous systems in the sense that there exists no a priori known bound on the transmission delay of messages or the execution time of processes. When designing a distributed algorithm, one would like to make sure (e.g. by testing or verification) that it is correct, i.e. satisfies its specification. 
Many distributed algorithms make use of consensus (eventually all non-crashed processes have to agree on a value), leader election (a crashed leader is eventually replaced by a new leader, but at any time there is at most one leader) or a group membership detection service (a crashed process is eventually suspected to have crashed but only crashed processes are suspected). From a theoretical point of view, the service specifications given for such services are not implementable in asynchronous systems. In particular, for each implementation one can derive a counter example in which the service violates its specification. From a practical point of view, the consensus, the leader election, and the membership detection problem are solvable in asynchronous distributed systems. In this special section, Raynal and Tronel show how to bridge this difference by showing how to implement the group membership detection problem with a negligible probability [1] to fail in an asynchronous system. The group membership detection problem is specified by a liveness condition (L) and a safety property (S): (L) if a process p crashes, then eventually every non-crashed process q has to suspect that p has crashed; and (S) if a process q suspects p, then p has indeed crashed. One can show that either (L) or (S) is implementable, but one cannot implement both (L) and (S) at the same time in an asynchronous system. In practice, one only needs to implement (L) and (S) such that the probability that (L) or (S) is violated becomes negligible. Raynal and Tronel propose and analyse a protocol that implements (L) with certainty and that can be tuned such that the probability that (S) is violated becomes negligible. Designing and implementing distributed fault-tolerant protocols for asynchronous systems is a difficult but not an impossible task. A fault-tolerant protocol has to detect and mask certain failure classes, e.g. crash failures and message omission failures. 
There is a trade-off between the performance of a fault-tolerant protocol and the failure classes the protocol can tolerate. One wants to tolerate as many failure classes as needed to satisfy the stochastic requirements of the protocol [1] while still maintaining a sufficient performance. Since clients of a protocol have different requirements with respect to the performance/fault-tolerance trade-off, one would like to be able to customize protocols such that one can select an appropriate performance/fault-tolerance trade-off. In this special section Hiltunen et al describe how one can compose protocols from micro-protocols in their Cactus system. They show how a group RPC system can be tailored to the needs of a client. In particular, they show how considering additional failure classes affects the performance of a group RPC system. References [1] Cristian F 1991 Understanding fault-tolerant distributed systems Communications of ACM 34 (2) 56-78 [2] Heimerdinger W L and Weinstock C B 1992 A conceptual framework for system fault tolerance Technical Report 92-TR-33, CMU/SEI [3] Laprie J C (ed) 1992 Dependability: Basic Concepts and Terminology (Vienna: Springer)
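The editorial's opening argument, that a distributed system can be either more or less dependable than a single computer, rests on two simple probabilities. The sketch below makes them concrete under an assumption the editorial does not state explicitly: the n computers fail independently, each with the same probability p. The function names are hypothetical.

```python
def p_all_fail(p, n):
    """Probability that every one of n independent computers fails."""
    return p ** n


def p_any_fail(p, n):
    """Probability that at least one of n independent computers fails."""
    return 1.0 - (1.0 - p) ** n
```

With p = 0.01 and n = 3, a system that survives as long as one computer is up fails with probability 1e-6, while a system that halts when any computer crashes fails with probability of about 0.03, roughly three times worse than a single machine.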
Software Risk Identification for Interplanetary Probes
NASA Technical Reports Server (NTRS)
Dougherty, Robert J.; Papadopoulos, Periklis E.
2005-01-01
The need for a systematic and effective software risk identification methodology is critical for interplanetary probes that are using increasingly complex and critical software. Several probe failures are examined that suggest more attention and resources need to be dedicated to identifying software risks. The direct causes of these failures can often be traced to systemic problems in all phases of the software engineering process. These failures have led to the development of a practical methodology to identify risks for interplanetary probes. The proposed methodology is based upon the tailoring of the Software Engineering Institute's (SEI) method of taxonomy-based risk identification. The use of this methodology will ensure a more consistent and complete identification of software risks in these probes.
Novel elastic protection against DDF failures in an enhanced software-defined SIEPON
NASA Astrophysics Data System (ADS)
Pakpahan, Andrew Fernando; Hwang, I.-Shyan; Yu, Yu-Ming; Hsu, Wu-Hsiao; Liem, Andrew Tanny; Nikoukar, AliAkbar
2017-07-01
Ever-increasing bandwidth demands on passive optical networks (PONs) are pushing the utilization of every fiber strand to its limit. This is mandating comprehensive protection until the end of the distribution drop fiber (DDF). Hence, it is important to provide refined protection with an advanced fault-protection architecture and recovery mechanism that is able to cope with various DDF failures. We propose a novel elastic protection against DDF failures that incorporates a software-defined networking (SDN) capability and a bus protection line to enhance the resiliency of the existing Service Interoperability in Ethernet Passive Optical Networks (SIEPON) system. We propose the addition of an integrated SDN controller and flow tables to the optical line terminal and optical network units (ONUs) in order to deliver various DDF protection scenarios. The proposed architecture enables flexible assignment of backup ONU(s) in pre/post-fault conditions depending on the PON traffic load. A transient backup ONU and multiple backup ONUs can be deployed in the pre-fault and post-fault scenarios, respectively. Our extensively discussed simulation results show that our proposed architecture provides better overall throughput and drop probability compared to the architecture with a fixed DDF protection mechanism. It does so while still maintaining overall QoS performance in terms of packet delay, mean jitter, packet loss, and throughput under various fault conditions.
Failure probability analysis of optical grid
NASA Astrophysics Data System (ADS)
Zhong, Yaoquan; Guo, Wei; Sun, Weiqiang; Jin, Yaohui; Hu, Weisheng
2008-11-01
The optical grid, an integrated computing environment based on an optical network, is expected to be an efficient infrastructure for supporting advanced data-intensive grid applications. In an optical grid, faults of both computational and network resources are inevitable because of the large scale and high complexity of the system. As distributed computing systems based on optical networks are widely applied to data processing, the application failure probability has become an important indicator of application quality and an important consideration for operators. This paper presents a task-based method for analyzing the application failure probability in an optical grid. With it, the failure probability of the entire application can be quantified, and the effectiveness of different backup strategies in reducing the application failure probability can be compared, so that the different requirements of different clients can be satisfied according to their respective application failure probabilities. In an optical grid, when an application represented as a DAG (directed acyclic graph) is executed under different backup strategies, the application failure probability and the application completion time differ. This paper proposes a new multi-objective differentiated services algorithm (MDSA). The new application scheduling algorithm can guarantee the failure probability requirement and improve network resource utilization, realizing a compromise between the network operator and the application submitter. Differentiated services can thus be achieved in an optical grid.
Duan, Yuanyuan; Gonzalez, Jorge A; Kulkarni, Pratim A; Nagy, William W; Griggs, Jason A
2018-06-16
To validate the fatigue lifetime of a reduced-diameter dental implant system predicted by three-dimensional finite element analysis (FEA) by testing physical implant specimens using an accelerated lifetime testing (ALT) strategy with the apparatus specified by ISO 14801. A commercially-available reduced-diameter titanium dental implant system (Straumann Standard Plus NN) was digitized using a micro-CT scanner. Axial slices were processed using an interactive medical image processing software (Mimics) to create 3D models. FEA analysis was performed in ABAQUS, and fatigue lifetime was predicted using fe-safe ® software. The same implant specimens (n=15) were tested at a frequency of 2Hz on load frames using apparatus specified by ISO 14801 and ALT. Multiple step-stress load profiles with various aggressiveness were used to improve testing efficiency. Fatigue lifetime statistics of physical specimens were estimated in a reliability analysis software (ALTA PRO). Fractured specimens were examined using SEM with fractographic technique to determine the failure mode. FEA predicted lifetime was within the 95% confidence interval of lifetime estimated by experimental results, which suggested that FEA prediction was accurate for this implant system. The highest probability of failure was located at the root of the implant body screw thread adjacent to the simulated bone level, which also agreed with the failure origin in physical specimens. Fatigue lifetime predictions based on finite element modeling could yield similar results in lieu of physical testing, allowing the use of virtual testing in the early stages of future research projects on implant fatigue. Copyright © 2018 The Academy of Dental Materials. Published by Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Putcha, Chandra S.; Mikula, D. F. Kip; Dueease, Robert A.; Dang, Lan; Peercy, Robert L.
1997-01-01
This paper deals with the development of a reliability methodology to assess the consequences of using hardware, without failure analysis or corrective action, that has previously demonstrated that it did not perform per specification. The subject of this paper arose from the need to provide a detailed probabilistic analysis to calculate the change in probability of failures with respect to the base or non-failed hardware. The methodology used for the analysis is primarily based on principles of Monte Carlo simulation. The random variables in the analysis are the Maximum Time of Operation (MTO) and the Operation Time of each Unit (OTU). The failure of a unit is considered to happen if OTU is less than MTO for the Normal Operational Period (NOP) in which this unit is used. NOP as a whole uses a total of 4 units. Two cases are considered. In the first specialized scenario, the failure of any operation or system failure is considered to happen if any of the units used during the NOP fail. In the second specialized scenario, the failure of any operation or system failure is considered to happen only if any two of the units used during the NOP fail together. The probability of failure of the units and the system as a whole is determined for 3 kinds of systems - Perfect System, Imperfect System 1 and Imperfect System 2. In a Perfect System, the operation time of the failed unit is the same as that of the MTO. In an Imperfect System 1, the operation time of the failed unit is assumed as 1 percent of the MTO. In an Imperfect System 2, the operation time of the failed unit is assumed as zero. In addition, simulated operation time of failed units is assumed as 10 percent of the corresponding units before zero value. Monte Carlo simulation analysis is used for this study. Necessary software has been developed as part of this study to perform the reliability calculations.
The results of the analysis showed that the predicted change in failure probability (P(sub F)) for the previously failed units is as high as 49 percent above the baseline (perfect system) for the worst case. The predicted change in system P(sub F) for the previously failed units is as high as 36% for single unit failure without any redundancy. For redundant systems, with dual unit failure, the predicted change in P(sub F) for the previously failed units is as high as 16%. These results will help management to make decisions regarding the consequences of using previously failed units without adequate failure analysis or corrective action.
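The two system-failure scenarios in this abstract (failure if any of the four units fails, versus failure only if two units fail together) can be sketched with a small Monte Carlo simulation. This is not the software developed in the study. It assumes, for illustration only, that units fail independently with a common probability `p_unit` rather than modeling the MTO and OTU distributions, and it reads "any two fail together" as "at least two fail". All names and numbers are hypothetical.

```python
import random


def simulate(p_unit, n_units, trials, seed=0):
    """Monte Carlo estimate of (P[at least one unit fails], P[at least two fail])."""
    rng = random.Random(seed)
    any_fail = 0
    two_fail = 0
    for _ in range(trials):
        failed = sum(rng.random() < p_unit for _ in range(n_units))
        any_fail += failed >= 1  # scenario 1: no redundancy
        two_fail += failed >= 2  # scenario 2: redundant, dual failure needed
    return any_fail / trials, two_fail / trials


def analytic(p_unit, n_units):
    """Closed-form binomial values for the same two probabilities."""
    none = (1.0 - p_unit) ** n_units
    one = n_units * p_unit * (1.0 - p_unit) ** (n_units - 1)
    return 1.0 - none, 1.0 - none - one
```

With a per-unit failure probability of 0.1 and four units, the non-redundant system fails in roughly a third of the simulated periods, while the redundant one fails in about one period in twenty, which mirrors the qualitative gap between the single-failure and dual-failure results reported above.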
Cross-layer restoration with software defined networking based on IP over optical transport networks
NASA Astrophysics Data System (ADS)
Yang, Hui; Cheng, Lei; Deng, Junni; Zhao, Yongli; Zhang, Jie; Lee, Young
2015-10-01
The IP over optical transport network is a very promising networking architecture applied to the interconnection of geographically distributed data centers due to the performance guarantee of low delay, huge bandwidth and high reliability at a low cost. It can enable efficient resource utilization and support heterogeneous bandwidth demands in a highly available, cost-effective and energy-effective manner. In the case of cross-layer link failure, ensuring a high level of quality of service (QoS) for user requests after the failure has become a research focus. In this paper, we propose a novel cross-layer restoration scheme for data center services with software defined networking based on IP over optical network. The cross-layer restoration scheme can enable joint optimization of IP network and optical network resources, and enhance the data center service restoration responsiveness to the dynamic end-to-end service demands. We quantitatively evaluate the feasibility and performance through simulation under a heavy traffic load scenario in terms of path blocking probability and path restoration latency. Numeric results show that the cross-layer restoration scheme improves the recovery success rate and minimizes the overall recovery time.
Formal Validation of Aerospace Software
NASA Astrophysics Data System (ADS)
Lesens, David; Moy, Yannick; Kanig, Johannes
2013-08-01
Any single error in critical software can have catastrophic consequences. Even though failures are usually not advertised, some software bugs have become famous, such as the error in the MIM-104 Patriot. For space systems, experience shows that software errors are a serious concern: more than half of all satellite failures from 2000 to 2003 involved software. To address this concern, this paper addresses the use of formal verification of software developed in Ada.
NASA Technical Reports Server (NTRS)
1997-01-01
Products made from advanced ceramics show great promise for revolutionizing aerospace and terrestrial propulsion and power generation. However, ceramic components are difficult to design because brittle materials in general have widely varying strength values. The CARES/Life software developed at the NASA Lewis Research Center eases this by providing a tool that uses probabilistic reliability analysis techniques to optimize the design and manufacture of brittle material components. CARES/Life is an integrated package that predicts the probability of a monolithic ceramic component's failure as a function of its time in service. It couples commercial finite element programs, which resolve a component's temperature and stress distribution, with reliability evaluation and fracture mechanics routines for modeling strength-limiting defects. These routines are based on calculations of the probabilistic nature of the brittle material's strength.
Failure-Modes-And-Effects Analysis Of Software Logic
NASA Technical Reports Server (NTRS)
Garcia, Danny; Hartline, Thomas; Minor, Terry; Statum, David; Vice, David
1996-01-01
Rigorous analysis applied early in design effort. Method of identifying potential inadequacies and modes and effects of failures caused by inadequacies (failure-modes-and-effects analysis or "FMEA" for short) devised for application to software logic.
Software Health Management: A Short Review of Challenges and Existing Techniques
NASA Technical Reports Server (NTRS)
Pipatsrisawat, Knot; Darwiche, Adnan; Mengshoel, Ole J.; Schumann, Johann
2009-01-01
Modern spacecraft (as well as most other complex mechanisms like aircraft, automobiles, and chemical plants) rely more and more on software, to a point where software failures have caused severe accidents and loss of missions. Software failures during a manned mission can cause loss of life, so there are severe requirements to make the software as safe and reliable as possible. Typically, verification and validation (V&V) has the task of making sure that all software errors are found before the software is deployed and that it always conforms to the requirements. Experience, however, shows that this gold standard of error-free software cannot be reached in practice. Even if the software alone is free of glitches, its interoperation with the hardware (e.g., with sensors or actuators) can cause problems. Unexpected operational conditions or changes in the environment may ultimately cause a software system to fail. Is there a way to surmount this problem? In most modern aircraft and many automobiles, hardware such as central electrical, mechanical, and hydraulic components are monitored by IVHM (Integrated Vehicle Health Management) systems. These systems can recognize, isolate, and identify faults and failures, both those that already occurred as well as imminent ones. With the help of diagnostics and prognostics, appropriate mitigation strategies can be selected (replacement or repair, switch to redundant systems, etc.). In this short paper, we discuss some challenges and promising techniques for software health management (SWHM). In particular, we identify unique challenges for preventing software failure in systems which involve both software and hardware components. We then present our classifications of techniques related to SWHM. These classifications are performed based on dimensions of interest to both developers and users of the techniques, and hopefully provide a map for dealing with software faults and failures.
NASA Astrophysics Data System (ADS)
D'silva, Oneil; Kerrison, Roger
2013-09-01
A key feature for the increased utilization of space robotics is to automate Extra-Vehicular manned space activities and thus significantly reduce the potential for catastrophic hazards while simultaneously minimizing the overall costs associated with manned space. The principal scope of the paper is to evaluate the use of industry-standard probabilistic risk/safety assessment (PRA/PSA) methodologies and hazard risk frequency criteria as a hazard control. This paper illustrates the applicability of combining the selected probabilistic risk assessment methodology and hazard risk frequency criteria in order to apply the necessary safety controls that allow for the increased use of the Mobile Servicing System (MSS) robotic system on the International Space Station. This document considers factors such as component failure rate reliability, software reliability, periods of operation and dormancy, and fault tree analyses, and their effects on the probabilistic risk assessments. The paper concludes with suggestions for the incorporation of existing industry Risk/Safety plans to create an applicable safety process for future activities/programs.
Sakatani, Tomohiko; Shimoo, Satoshi; Takamatsu, Kazuaki; Kyodo, Atsushi; Tsuji, Yumika; Mera, Kayoko; Koide, Masahiro; Isodono, Koji; Tsubakimoto, Yoshinori; Matsuo, Akiko; Inoue, Keiji; Fujita, Hiroshi
2016-12-01
Myocardial perfusion single-photon emission-computed tomography (SPECT) can predict cardiac events in patients with coronary artery disease with high accuracy; however, pseudo-negative cases sometimes occur. Heart Risk View, which is based on the prospective cohort study (J-ACCESS), is software for evaluating cardiac event probability. We examined whether Heart Risk View was useful to evaluate the cardiac risk in patients with normal myocardial perfusion SPECT (MPS). We studied 3461 consecutive patients who underwent MPS to detect myocardial ischemia, and those who had normal MPS were enrolled in this study (n = 698). We calculated cardiac event probability by Heart Risk View and followed the patients for 3.8 ± 2.4 years. The cardiac events were defined as cardiac death, non-fatal myocardial infarction, and heart failure requiring hospitalization. During the follow-up period, 21 patients (3.0 %) had cardiac events. The event probability calculated by Heart Risk View was higher in the event group (5.5 ± 2.6 vs. 2.9 ± 2.6 %, p < 0.001). According to the receiver-operating characteristics curve, the cut-off point of the event probability for predicting cardiac events was 3.4 % (sensitivity 0.76, specificity 0.72, and AUC 0.85). Kaplan-Meier curves revealed that a higher event rate was observed in the high-event probability group by the log-rank test (p < 0.001). Although myocardial perfusion SPECT is useful for the prediction of cardiac events, risk estimation by Heart Risk View adds more prognostic information, especially in patients with normal MPS.
NASA Technical Reports Server (NTRS)
Becker, D. D.
1980-01-01
The orbiter subsystems and interfacing program elements which interact with the orbiter computer flight software are analyzed. The failure modes identified in the subsystem/element failure mode and effects analysis are examined. Potential interaction with the software is examined through an evaluation of the software requirements. The analysis is restricted to flight software requirements and excludes utility/checkout software. The results of the hardware/software interaction analysis for the forward reaction control system are presented.
SMART: A Propositional Logic-Based Trade Analysis and Risk Assessment Tool for a Complex Mission
NASA Technical Reports Server (NTRS)
Ono, Masahiro; Nicholas, Austin; Alibay, Farah; Parrish, Joseph
2015-01-01
This paper introduces a new trade analysis software called the Space Mission Architecture and Risk Analysis Tool (SMART). This tool supports a high-level system trade study on a complex mission, such as a potential Mars Sample Return (MSR) mission, in an intuitive and quantitative manner. In a complex mission, a common approach to increase the probability of success is to have redundancy and prepare backups. Quantitatively evaluating the utility of adding redundancy to a system is important but not straightforward, particularly when the failures of parallel subsystems are correlated.
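To illustrate why correlated failures complicate the value of redundancy, the following is a minimal sketch and not the SMART method itself. It assumes a simple common-cause model in which a single shared event (probability q_common) fails every parallel unit at once, and otherwise each unit fails independently with probability q_independent. The function name and the numbers are hypothetical.

```python
def parallel_failure_probability(q_independent: float, q_common: float, n_units: int) -> float:
    """Probability that all n redundant units fail.

    Assumed model (illustrative only): a common-cause event with probability
    q_common fails every unit; if it does not occur, units fail independently
    with probability q_independent each.
    """
    all_fail_independently = q_independent ** n_units
    return q_common + (1.0 - q_common) * all_fail_independently


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        independent = parallel_failure_probability(0.1, 0.0, n)
        correlated = parallel_failure_probability(0.1, 0.01, n)
        print(f"n={n}: independent={independent:.6f}  with common cause={correlated:.6f}")
```

Under this assumed model the system failure probability can never drop below q_common, so each additional backup buys less once the independent term becomes small relative to the shared one.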
Towards improving software security by using simulation to inform requirements and conceptual design
Nutaro, James J.; Allgood, Glenn O.; Kuruganti, Teja
2015-06-17
We illustrate the use of modeling and simulation early in the system life-cycle to improve security and reduce costs. The models that we develop for this illustration are inspired by problems in reliability analysis and supervisory control, for which similar models are used to quantify failure probabilities and rates. In the context of security, we propose that models of this general type can be used to understand trades between risk and cost while writing system requirements and during conceptual design, and thereby significantly reduce the need for expensive security corrections after a system enters operation.
On the use and the performance of software reliability growth models
NASA Technical Reports Server (NTRS)
Keiller, Peter A.; Miller, Douglas R.
1991-01-01
We address the problem of predicting future failures for a piece of software. The number of failures occurring during a finite future time interval is predicted from the number of failures observed during an initial period of usage by using software reliability growth models. Two different methods for using the models are considered: straightforward use of individual models, and dynamic selection among models based on goodness-of-fit and quality-of-prediction criteria. Performance is judged by the relative error of the predicted number of failures over future finite time intervals relative to the number of failures eventually observed during the intervals. Six of the former models and eight of the latter are evaluated, based on their performance on twenty data sets. Many open questions remain regarding the use and the performance of software reliability growth models.
Using Utility Functions to Control a Distributed Storage System
2008-05-01
Pinheiro et al. [2007] suggest this is not an accurate assumption. Nicola and Goyal [1990] examined correlated failures across multiversion software...F. and Goyal, A. (1990). Modeling of correlated failures and community error recovery in multiversion software. IEEE Transactions on Software
Huang, Jiahua; Zhou, Hai; Zhang, Binbin; Ding, Biao
2015-09-01
This article develops new web-based failure database software for orthopaedic implants. The software uses a browser/server (B/S) architecture: ASP dynamic web technology is the main development language for data interactivity, and Microsoft Access is used for the database. These mature technologies make the software easy to extend or upgrade. The article presents the design and development approach, the working process and functions, and the relevant technical features of the software. The software can store many different types of fault events for orthopaedic implants and supports statistical analysis of the failure data. At the macroscopic level, it can be used to evaluate the reliability of orthopaedic implants and operations, and ultimately to guide doctors in improving clinical treatment.
Software reliability experiments data analysis and investigation
NASA Technical Reports Server (NTRS)
Walker, J. Leslie; Caglayan, Alper K.
1991-01-01
The objectives are to investigate the fundamental reasons which cause independently developed software programs to fail dependently, and to examine fault tolerant software structures which maximize reliability gain in the presence of such dependent failure behavior. The authors used 20 redundant programs from a software reliability experiment to analyze the software errors causing coincident failures, to compare the reliability of N-version and recovery block structures composed of these programs, and to examine the impact of diversity on software reliability using subpopulations of these programs. The results indicate that both conceptually related and unrelated errors can cause coincident failures and that recovery block structures offer more reliability gain than N-version structures if acceptance checks that fail independently from the software components are available. The authors present a theory of general program checkers that have potential application for acceptance tests.
NASA Astrophysics Data System (ADS)
Mulyana, Cukup; Muhammad, Fajar; Saad, Aswad H.; Mariah, Riveli, Nowo
2017-03-01
The storage tank is the most critical component in an LNG regasification terminal. It carries a risk of failure and accident that affects human health and the environment. Risk assessment is conducted to detect and reduce the risk of failure in the storage tank. The aim of this research is to determine and calculate the probability of failure in the LNG regasification unit. In this case, the failure is caused by Boiling Liquid Expanding Vapor Explosion (BLEVE) and jet fire in the LNG storage tank component. The failure probability can be determined by using Fault Tree Analysis (FTA). In addition, the impact of the heat radiation generated is calculated. Fault trees for BLEVE and jet fire on the storage tank component were constructed, giving failure probabilities of 5.63 × 10^-19 for BLEVE and 9.57 × 10^-3 for jet fire. The failure probability for jet fire is high and needs to be reduced by customizing the PID scheme of the LNG regasification unit in pipeline number 1312 and unit 1. After customization, the failure probability obtained is 4.22 × 10^-6.
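As a rough illustration of how a fault tree turns basic-event probabilities into a top-event probability, here is a minimal sketch. It assumes statistically independent basic events and uses made-up probabilities; the tree shape, event names, and values are hypothetical and are not taken from the study above.

```python
import math
from typing import Iterable


def and_gate(probabilities: Iterable[float]) -> float:
    """Output event occurs only if every input event occurs (independence assumed)."""
    return math.prod(probabilities)


def or_gate(probabilities: Iterable[float]) -> float:
    """Output event occurs if at least one input event occurs (independence assumed)."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


if __name__ == "__main__":
    # Hypothetical tree: top = (valve_fails AND alarm_fails) OR pipe_leak
    valve_fails, alarm_fails, pipe_leak = 1e-3, 2e-3, 5e-4
    top_event = or_gate([and_gate([valve_fails, alarm_fails]), pipe_leak])
    print(f"top event probability: {top_event:.6e}")
```

An AND gate multiplies small probabilities and so can produce very small values, while an OR gate is dominated by its most likely input, which is one way a single branch can end up driving the top-event result.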
Optimization Testbed Cometboards Extended into Stochastic Domain
NASA Technical Reports Server (NTRS)
Patnaik, Surya N.; Pai, Shantaram S.; Coroneos, Rula M.; Patnaik, Surya N.
2010-01-01
COMparative Evaluation Testbed of Optimization and Analysis Routines for the Design of Structures (CometBoards) is a multidisciplinary design optimization software. It was originally developed for deterministic calculation. It has now been extended into the stochastic domain for structural design problems. For deterministic problems, CometBoards is introduced through its subproblem solution strategy as well as the approximation concept in optimization. In the stochastic domain, a design is formulated as a function of the risk or reliability. Optimum solution including the weight of a structure, is also obtained as a function of reliability. Weight versus reliability traced out an inverted-S-shaped graph. The center of the graph corresponded to 50 percent probability of success, or one failure in two samples. A heavy design with weight approaching infinity could be produced for a near-zero rate of failure that corresponded to unity for reliability. Weight can be reduced to a small value for the most failure-prone design with a compromised reliability approaching zero. The stochastic design optimization (SDO) capability for an industrial problem was obtained by combining three codes: MSC/Nastran code was the deterministic analysis tool, fast probabilistic integrator, or the FPI module of the NESSUS software, was the probabilistic calculator, and CometBoards became the optimizer. The SDO capability requires a finite element structural model, a material model, a load model, and a design model. The stochastic optimization concept is illustrated considering an academic example and a real-life airframe component made of metallic and composite materials.
NASA Technical Reports Server (NTRS)
Shih, Ann T.; Lo, Yunnhon; Ward, Natalie C.
2010-01-01
Quantifying the probability of significant launch vehicle failure scenarios for a given design, while still in the design process, is critical to mission success and to the safety of the astronauts. Probabilistic risk assessment (PRA) is chosen from many system safety and reliability tools to verify the loss of mission (LOM) and loss of crew (LOC) requirements set by the NASA Program Office. To support the integrated vehicle PRA, probabilistic design analysis (PDA) models are developed by using vehicle design and operation data to better quantify failure probabilities and to better understand the characteristics of a failure and its outcome. This PDA approach uses a physics-based model to describe the system behavior and response for a given failure scenario. Each driving parameter in the model is treated as a random variable with a distribution function. Monte Carlo simulation is used to perform probabilistic calculations to statistically obtain the failure probability. Sensitivity analyses are performed to show how input parameters affect the predicted failure probability, providing insight for potential design improvements to mitigate the risk. The paper discusses the application of the PDA approach in determining the probability of failure for two scenarios from the NASA Ares I project.
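The Monte Carlo step described above can be sketched in a few lines. This is only an illustration under strong assumptions, not the Ares I models: the "physics" is reduced to a single load-versus-capacity comparison, both quantities are assumed normally distributed and independent, and all names and parameter values are hypothetical. The normal assumption is chosen because it allows a closed-form check.

```python
import random
from statistics import NormalDist


def monte_carlo_failure_probability(n_samples, capacity_mean, capacity_sd,
                                    load_mean, load_sd, seed=0):
    """Estimate P(load > capacity) by simple random sampling."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    failures = 0
    for _ in range(n_samples):
        capacity = rng.gauss(capacity_mean, capacity_sd)
        load = rng.gauss(load_mean, load_sd)
        if load > capacity:
            failures += 1
    return failures / n_samples


def exact_failure_probability(capacity_mean, capacity_sd, load_mean, load_sd):
    """Closed form for independent normals: P(capacity - load < 0)."""
    margin_mean = capacity_mean - load_mean
    margin_sd = (capacity_sd ** 2 + load_sd ** 2) ** 0.5
    return NormalDist(margin_mean, margin_sd).cdf(0.0)


if __name__ == "__main__":
    estimate = monte_carlo_failure_probability(100_000, 10.0, 1.0, 8.0, 1.0)
    exact = exact_failure_probability(10.0, 1.0, 8.0, 1.0)
    print(f"Monte Carlo estimate: {estimate:.4f}   closed form: {exact:.4f}")
```

A crude sensitivity check in the same spirit is to rerun the estimate while shifting one input parameter at a time and observing how the failure probability moves.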
Commercialization of NESSUS: Status
NASA Technical Reports Server (NTRS)
Thacker, Ben H.; Millwater, Harry R.
1991-01-01
A plan was initiated in 1988 to commercialize the Numerical Evaluation of Stochastic Structures Under Stress (NESSUS) probabilistic structural analysis software. The goal of the on-going commercialization effort is to begin the transfer of Probabilistic Structural Analysis Method (PSAM) developed technology into industry and to develop additional funding resources in the general area of structural reliability. The commercialization effort is summarized. The SwRI NESSUS Software System is a general purpose probabilistic finite element computer program using state of the art methods for predicting stochastic structural response due to random loads, material properties, part geometry, and boundary conditions. NESSUS can be used to assess structural reliability, to compute probability of failure, to rank the input random variables by importance, and to provide a more cost effective design than traditional methods. The goal is to develop a general probabilistic structural analysis methodology to assist in the certification of critical components in the next generation Space Shuttle Main Engine.
van Walraven, Carl
2017-04-01
Diagnostic codes used in administrative databases cause bias due to misclassification of patient disease status. It is unclear which methods minimize this bias. Serum creatinine measures were used to determine severe renal failure status in 50,074 hospitalized patients. The true prevalence of severe renal failure and its association with covariates were measured. These were compared to results for which renal failure status was determined using surrogate measures including the following: (1) diagnostic codes; (2) categorization of probability estimates of renal failure determined from a previously validated model; or (3) bootstrap methods imputation of disease status using model-derived probability estimates. Bias in estimates of severe renal failure prevalence and its association with covariates were minimal when bootstrap methods were used to impute renal failure status from model-based probability estimates. In contrast, biases were extensive when renal failure status was determined using codes or methods in which model-based condition probability was categorized. Bias due to misclassification from inaccurate diagnostic codes can be minimized using bootstrap methods to impute condition status using multivariable model-derived probability estimates. Copyright © 2017 Elsevier Inc. All rights reserved.
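The contrast between categorizing model probabilities and imputing disease status from them can be shown with a small sketch. It is not the study's procedure: it assumes each patient already has a model-derived probability, draws a Bernoulli status per patient in each replicate, and averages the resulting prevalence. The probabilities, the 0.5 threshold, and the function names are hypothetical.

```python
import random


def imputed_prevalence(probabilities, n_replicates=500, seed=0):
    """Average prevalence over replicates in which each patient's status
    is drawn at random according to that patient's model probability."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_replicates):
        cases = sum(1 for p in probabilities if rng.random() < p)
        total += cases / len(probabilities)
    return total / n_replicates


def categorized_prevalence(probabilities, threshold=0.5):
    """Prevalence when each patient is labeled diseased if p >= threshold."""
    return sum(1 for p in probabilities if p >= threshold) / len(probabilities)


if __name__ == "__main__":
    # Hypothetical cohort: mostly low-risk patients, a few high-risk ones.
    probs = [0.05] * 80 + [0.4] * 15 + [0.9] * 5
    print("expected prevalence :", sum(probs) / len(probs))
    print("imputed prevalence  :", round(imputed_prevalence(probs), 3))
    print("categorized at 0.5  :", categorized_prevalence(probs))
```

In this toy cohort, thresholding discards the cases expected among the many low-probability patients, whereas random imputation recovers them on average.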
Risk Based Inspection Methodology and Software Applied to Atmospheric Storage Tanks
NASA Astrophysics Data System (ADS)
Topalis, P.; Korneliussen, G.; Hermanrud, J.; Steo, Y.
2012-05-01
A new risk-based inspection (RBI) methodology and software is presented in this paper. The objective of this work is to allow management of the inspections of atmospheric storage tanks in the most efficient way, while, at the same time, accident risks are minimized. The software has been built on the new risk framework architecture, a generic platform facilitating efficient and integrated development of software applications using risk models. The framework includes a library of risk models and the user interface is automatically produced on the basis of editable schemas. This risk-framework-based RBI tool has been applied in the context of RBI for above-ground atmospheric storage tanks (AST) but it has been designed with the objective of being generic enough to allow extension to the process plants in general. This RBI methodology is an evolution of an approach and mathematical models developed for Det Norske Veritas (DNV) and the American Petroleum Institute (API). The methodology assesses damage mechanism potential, degradation rates, probability of failure (PoF), consequence of failure (CoF) in terms of environmental damage and financial loss, risk and inspection intervals and techniques. The scope includes assessment of the tank floor for soil-side external corrosion and product-side internal corrosion and the tank shell courses for atmospheric corrosion and internal thinning. It also includes preliminary assessment for brittle fracture and cracking. The data are structured according to an asset hierarchy including Plant, Production Unit, Process Unit, Tag, Part and Inspection levels and the data are inherited / defaulted seamlessly from a higher hierarchy level to a lower level. The user interface includes synchronized hierarchy tree browsing, dynamic editor and grid-view editing and active reports with drill-in capability.
Development of STS/Centaur failure probabilities liftoff to Centaur separation
NASA Technical Reports Server (NTRS)
Hudson, J. M.
1982-01-01
The results of an analysis to determine STS/Centaur catastrophic vehicle response probabilities for the phases of vehicle flight from STS liftoff to Centaur separation from the Orbiter are presented. The analysis considers only category one component failure modes as contributors to the vehicle response mode probabilities. The relevant component failure modes are grouped into one of fourteen categories of potential vehicle behavior. By assigning failure rates to each component, for each of its failure modes, the STS/Centaur vehicle response probabilities in each phase of flight can be calculated. The results of this study will be used in a DOE analysis to ascertain the hazard from carrying a nuclear payload on the STS.
NASA Technical Reports Server (NTRS)
Tamayo, Tak Chai
1987-01-01
Software quality is not only vital to the successful operation of the space station; it is also an important factor in establishing testing requirements, the time needed for software verification and integration, and launch schedules for the space station. Defense of management decisions can be greatly strengthened by combining engineering judgments with statistical analysis. Unlike hardware, software does not wear out and is costly to make redundant, which makes traditional statistical analysis unsuitable for evaluating software reliability. A statistical model was developed to represent the number and types of failures that occur during software testing and verification. From this model, quantitative measures of software reliability based on failure history during testing are derived. Criteria to terminate testing based on reliability objectives and methods to estimate the expected number of fixes required are also presented.
Automatic Monitoring System Design and Failure Probability Analysis for River Dikes on Steep Channel
NASA Astrophysics Data System (ADS)
Chang, Yin-Lung; Lin, Yi-Jun; Tung, Yeou-Koung
2017-04-01
The purposes of this study are to: (1) design an automatic monitoring system for river dikes; and (2) develop a framework that enables the determination of dike failure probabilities for various failure modes during a rainstorm. The historical dike failure data collected in this study indicate that most dikes in Taiwan collapsed under the 20-year return period discharge, which means the probability of dike failure is much higher than that of overtopping. We installed the dike monitoring system on the Chiu-She Dike, which is located on the middle reach of the Dajia River, Taiwan. The system includes: (1) vertically distributed pore water pressure sensors in front of and behind the dike; (2) Time Domain Reflectometry (TDR) to measure the displacement of the dike; (3) a wireless floating device to measure the scouring depth at the toe of the dike; and (4) a water level gauge. The monitoring system recorded the variation of pore pressure inside the Chiu-She Dike and the scouring depth during Typhoon Megi. The recorded data showed that the highest groundwater level inside the dike occurred 15 hours after the peak discharge. We developed a framework that accounts for the uncertainties from return period discharge, Manning's n, scouring depth, soil cohesion, and friction angle and enables the determination of dike failure probabilities for various failure modes such as overtopping, surface erosion, mass failure, toe sliding, and overturning. The framework was applied to the Chiu-She, Feng-Chou, and Ke-Chuang Dikes on the Dajia River. The results indicate that toe sliding or overturning has a higher probability than the other failure modes. Furthermore, the overall failure probability (integrating the different failure modes) reaches 50% under the 10-year return period flood, which agrees with the historical failure data for the study reaches.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Jay Dean; Oberkampf, William Louis; Helton, Jon Craig
2004-12-01
Relationships to determine the probability that a weak link (WL)/strong link (SL) safety system will fail to function as intended in a fire environment are investigated. In the systems under study, failure of the WL system before failure of the SL system is intended to render the overall system inoperational and thus prevent the possible occurrence of accidents with potentially serious consequences. Formal developments of the probability that the WL system fails to deactivate the overall system before failure of the SL system (i.e., the probability of loss of assured safety, PLOAS) are presented for several WL/SL configurations: (i) one WL, one SL, (ii) multiple WLs, multiple SLs with failure of any SL before any WL constituting failure of the safety system, (iii) multiple WLs, multiple SLs with failure of all SLs before any WL constituting failure of the safety system, and (iv) multiple WLs, multiple SLs and multiple sublinks in each SL with failure of any sublink constituting failure of the associated SL and failure of all SLs before failure of any WL constituting failure of the safety system. The indicated probabilities derive from time-dependent temperatures in the WL/SL system and variability (i.e., aleatory uncertainty) in the temperatures at which the individual components of this system fail and are formally defined as multidimensional integrals. Numerical procedures based on quadrature (i.e., trapezoidal rule, Simpson's rule) and also on Monte Carlo techniques (i.e., simple random sampling, importance sampling) are described and illustrated for the evaluation of these integrals. Example uncertainty and sensitivity analyses for PLOAS involving the representation of uncertainty (i.e., epistemic uncertainty) with probability theory and also with evidence theory are presented.
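For the simplest configuration (one WL, one SL), the quadrature idea can be sketched as follows. This is a heavily simplified illustration and not the report's formulation: it assumes both links experience the same monotonically rising temperature, so the link with the lower failure temperature fails first, and it assumes independent, normally distributed failure temperatures with hypothetical parameters. Under those assumptions PLOAS reduces to the one-dimensional integral of f_SL(t) * (1 - F_WL(t)) over temperature t, which the trapezoidal rule can evaluate and a closed form can check.

```python
from statistics import NormalDist


def ploas_trapezoid(weak: NormalDist, strong: NormalDist,
                    lower: float, upper: float, n_intervals: int = 2000) -> float:
    """Trapezoidal estimate of P(strong link fails at a lower temperature
    than the weak link) = integral of f_SL(t) * (1 - F_WL(t)) dt."""
    step = (upper - lower) / n_intervals

    def integrand(t: float) -> float:
        return strong.pdf(t) * (1.0 - weak.cdf(t))

    total = 0.5 * (integrand(lower) + integrand(upper))
    for i in range(1, n_intervals):
        total += integrand(lower + i * step)
    return total * step


if __name__ == "__main__":
    # Hypothetical failure-temperature distributions (degrees C).
    weak_link = NormalDist(mu=300.0, sigma=30.0)
    strong_link = NormalDist(mu=400.0, sigma=40.0)
    estimate = ploas_trapezoid(weak_link, strong_link, 0.0, 800.0)
    # Closed form for independent normals: P(T_SL - T_WL < 0).
    exact = NormalDist(mu=100.0, sigma=50.0).cdf(0.0)
    print(f"trapezoidal PLOAS: {estimate:.6f}   closed form: {exact:.6f}")
```

With time-dependent temperatures that differ between links, or with multiple links, the integral no longer collapses to a closed form, which is where quadrature or Monte Carlo sampling becomes necessary.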
Anusavice, Kenneth J; Jadaan, Osama M; Esquivel-Upshaw, Josephine F
2013-11-01
Recent reports on bilayer ceramic crown prostheses suggest that fractures of the veneering ceramic represent the most common reason for prosthesis failure. The aims of this study were to test the hypotheses that: (1) an increase in core ceramic/veneer ceramic thickness ratio for a crown thickness of 1.6mm reduces the time-dependent fracture probability (Pf) of bilayer crowns with a lithium-disilicate-based glass-ceramic core, and (2) oblique loading, within the central fossa, increases Pf for 1.6-mm-thick crowns compared with vertical loading. Time-dependent fracture probabilities were calculated for 1.6-mm-thick, veneered lithium-disilicate-based glass-ceramic molar crowns as a function of core/veneer thickness ratio and load orientation in the central fossa area. Time-dependent fracture probability analyses were computed by CARES/Life software and finite element analysis, using dynamic fatigue strength data for monolithic discs of a lithium-disilicate glass-ceramic core (Empress 2), and ceramic veneer (Empress 2 Veneer Ceramic). Predicted fracture probabilities (Pf) for centrally loaded 1.6-mm-thick bilayer crowns over periods of 1, 5, and 10 years are 1.2%, 2.7%, and 3.5%, respectively, for a core/veneer thickness ratio of 1.0 (0.8mm/0.8mm), and 2.5%, 5.1%, and 7.0%, respectively, for a core/veneer thickness ratio of 0.33 (0.4mm/1.2mm). CARES/Life results support the proposed crown design and load orientation hypotheses. The application of dynamic fatigue data, finite element stress analysis, and CARES/Life analysis represent an optimal approach to optimize fixed dental prosthesis designs produced from dental ceramics and to predict time-dependent fracture probabilities of ceramic-based fixed dental prostheses that can minimize the risk for clinical failures. Copyright © 2013 Academy of Dental Materials. All rights reserved.
Anusavice, Kenneth J.; Jadaan, Osama M.; Esquivel–Upshaw, Josephine
2013-01-01
Recent reports on bilayer ceramic crown prostheses suggest that fractures of the veneering ceramic represent the most common reason for prosthesis failure. Objective: The aims of this study were to test the hypotheses that: (1) an increase in core ceramic/veneer ceramic thickness ratio for a crown thickness of 1.6 mm reduces the time-dependent fracture probability (Pf) of bilayer crowns with a lithium-disilicate-based glass-ceramic core, and (2) oblique loading, within the central fossa, increases Pf for 1.6-mm-thick crowns compared with vertical loading. Materials and methods: Time-dependent fracture probabilities were calculated for 1.6-mm-thick, veneered lithium-disilicate-based glass-ceramic molar crowns as a function of core/veneer thickness ratio and load orientation in the central fossa area. Time-dependent fracture probability analyses were computed by CARES/Life software and finite element analysis, using dynamic fatigue strength data for monolithic discs of a lithium-disilicate glass-ceramic core (Empress 2), and ceramic veneer (Empress 2 Veneer Ceramic). Results: Predicted fracture probabilities (Pf) for centrally-loaded 1.6-mm-thick bilayer crowns over periods of 1, 5, and 10 years are 1.2%, 2.7%, and 3.5%, respectively, for a core/veneer thickness ratio of 1.0 (0.8 mm/0.8 mm), and 2.5%, 5.1%, and 7.0%, respectively, for a core/veneer thickness ratio of 0.33 (0.4 mm/1.2 mm). Conclusion: CARES/Life results support the proposed crown design and load orientation hypotheses. Significance: The application of dynamic fatigue data, finite element stress analysis, and CARES/Life analysis represent an optimal approach to optimize fixed dental prosthesis designs produced from dental ceramics and to predict time-dependent fracture probabilities of ceramic-based fixed dental prostheses that can minimize the risk for clinical failures. PMID:24060349
Questioning the Role of Requirements Engineering in the Causes of Safety-Critical Software Failures
NASA Technical Reports Server (NTRS)
Johnson, C. W.; Holloway, C. M.
2006-01-01
Many software failures stem from inadequate requirements engineering. This view has been supported both by detailed accident investigations and by a number of empirical studies; however, such investigations can be misleading. It is often difficult to distinguish between failures in requirements engineering and problems elsewhere in the software development lifecycle. Further pitfalls arise from the assumption that inadequate requirements engineering is a cause of all software related accidents for which the system fails to meet its requirements. This paper identifies some of the problems that have arisen from an undue focus on the role of requirements engineering in the causes of major accidents. The intention is to provoke further debate within the emerging field of forensic software engineering.
Real-time software failure characterization
NASA Technical Reports Server (NTRS)
Dunham, Janet R.; Finelli, George B.
1990-01-01
A series of studies aimed at characterizing the fundamentals of the software failure process has been undertaken as part of a NASA project on the modeling of a real-time aerospace vehicle software reliability. An overview of these studies is provided, and the current study, an investigation of the reliability of aerospace vehicle guidance and control software, is examined. The study approach provides for the collection of life-cycle process data, and for the retention and evaluation of interim software life-cycle products.
Probabilistic confidence for decisions based on uncertain reliability estimates
NASA Astrophysics Data System (ADS)
Reid, Stuart G.
2013-05-01
Reliability assessments are commonly carried out to provide a rational basis for risk-informed decisions concerning the design or maintenance of engineering systems and structures. However, calculated reliabilities and associated probabilities of failure often have significant uncertainties associated with the possible estimation errors relative to the 'true' failure probabilities. For uncertain probabilities of failure, a measure of 'probabilistic confidence' has been proposed to reflect the concern that uncertainty about the true probability of failure could result in a system or structure that is unsafe and could subsequently fail. The paper describes how the concept of probabilistic confidence can be applied to evaluate and appropriately limit the probabilities of failure attributable to particular uncertainties such as design errors that may critically affect the dependability of risk-acceptance decisions. This approach is illustrated with regard to the dependability of structural design processes based on prototype testing with uncertainties attributable to sampling variability.
Modeling Finite-Time Failure Probabilities in Risk Analysis Applications.
Dimitrova, Dimitrina S; Kaishev, Vladimir K; Zhao, Shouqi
2015-10-01
In this article, we introduce a framework for analyzing the risk of systems failure based on estimating the failure probability. The latter is defined as the probability that a certain risk process, characterizing the operations of a system, reaches a possibly time-dependent critical risk level within a finite-time interval. Under general assumptions, we define two dually connected models for the risk process and derive explicit expressions for the failure probability and also the joint probability of the time of the occurrence of failure and the excess of the risk process over the risk level. We illustrate how these probabilistic models and results can be successfully applied in several important areas of risk analysis, among which are systems reliability, inventory management, flood control via dam management, infectious disease spread, and financial insolvency. Numerical illustrations are also presented. © 2015 Society for Risk Analysis.
The consistency service of the ATLAS Distributed Data Management system
NASA Astrophysics Data System (ADS)
Serfon, Cédric; Garonne, Vincent; ATLAS Collaboration
2011-12-01
With the continuously increasing volume of data produced by ATLAS and stored on the WLCG sites, the probability of data corruption or data losses, due to software and hardware failures is increasing. In order to ensure the consistency of all data produced by ATLAS a Consistency Service has been developed as part of the DQ2 Distributed Data Management system. This service is fed by the different ATLAS tools, i.e. the analysis tools, production tools, DQ2 site services or by site administrators that report corrupted or lost files. It automatically corrects the errors reported and informs the users in case of irrecoverable file loss.
Analyzing and Predicting Effort Associated with Finding and Fixing Software Faults
NASA Technical Reports Server (NTRS)
Hamill, Maggie; Goseva-Popstojanova, Katerina
2016-01-01
Context: Software developers spend a significant amount of time fixing faults. However, not many papers have addressed the actual effort needed to fix software faults. Objective: The objective of this paper is twofold: (1) analysis of the effort needed to fix software faults and how it was affected by several factors and (2) prediction of the level of fix implementation effort based on the information provided in software change requests. Method: The work is based on data related to 1200 failures, extracted from the change tracking system of a large NASA mission. The analysis includes descriptive and inferential statistics. Predictions are made using three supervised machine learning algorithms and three sampling techniques aimed at addressing the imbalanced data problem. Results: Our results show that (1) 83% of the total fix implementation effort was associated with only 20% of failures. (2) Both safety critical failures and post-release failures required three times more effort to fix compared to non-critical and pre-release counterparts, respectively. (3) Failures with fixes spread across multiple components or across multiple types of software artifacts required more effort. The spread across artifacts was more costly than spread across components. (4) Surprisingly, some types of faults associated with later life-cycle activities did not require significant effort. (5) The level of fix implementation effort was predicted with 73% overall accuracy using the original, imbalanced data. Using oversampling techniques improved the overall accuracy up to 77%. More importantly, oversampling significantly improved the prediction of the high level effort, from 31% to around 85%. 
Conclusions: This paper shows the importance of tying software failures to changes made to fix all associated faults, in one or more software components and/or in one or more software artifacts, and the benefit of studying how the spread of faults and other factors affect the fix implementation effort.
Chambers, David W
2010-01-01
Every plan contains risk. To proceed without planning some means of managing that risk is to court failure. The basic logic of risk is explained. It consists in identifying a threshold where some corrective action is necessary, the probability of exceeding that threshold, and the attendant cost should the undesired outcome occur. This is the probable cost of failure. Various risk categories in dentistry are identified, including lack of liquidity; poor quality; equipment or procedure failures; employee slips; competitive environments; new regulations; unreliable suppliers, partners, and patients; and threats to one's reputation. It is prudent to make investments in risk management to the extent that the cost of managing the risk is less than the probable loss due to risk failure and when risk management strategies can be matched to type of risk. Four risk management strategies are discussed: insurance, reducing the probability of failure, reducing the costs of failure, and learning. A risk management accounting of the financial meltdown of October 2008 is provided.
Probabilistic safety analysis of earth retaining structures during earthquakes
NASA Astrophysics Data System (ADS)
Grivas, D. A.; Souflis, C.
1982-07-01
A procedure is presented for determining the probability of failure of earth retaining structures under static or seismic conditions. Four possible modes of failure (overturning, base sliding, bearing capacity, and overall sliding) are examined and their combined effect is evaluated with the aid of combinatorial analysis. The probability of failure is shown to be a more adequate measure of safety than the customary factor of safety. As earth retaining structures may fail in four distinct modes, a system analysis can provide a single estimate for the possibility of failure. A Bayesian formulation of the safety of retaining walls is found to provide an improved measure for the predicted probability of failure under seismic loading. The presented Bayesian analysis can account for the damage incurred to a retaining wall during an earthquake to provide an improved estimate for its probability of failure during future seismic events.
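As a rough illustration of how several failure modes can be rolled up into one system-level estimate, the sketch below treats the wall as a series system that fails if any mode occurs. It assumes statistically independent modes, which is a simplification and not necessarily the combinatorial analysis used in the paper; the per-mode probabilities and function names are hypothetical.

```python
from math import prod


def series_failure_probability(mode_probs):
    """P(at least one mode occurs), ASSUMING the modes are independent."""
    return 1.0 - prod(1.0 - p for p in mode_probs)


def simple_bounds(mode_probs):
    """Bounds that hold without independence: max(p_i) <= P_sys <= min(1, sum p_i)."""
    return max(mode_probs), min(1.0, sum(mode_probs))


# Hypothetical per-mode failure probabilities, for illustration only.
modes = {
    "overturning": 0.010,
    "base_sliding": 0.020,
    "bearing_capacity": 0.005,
    "overall_sliding": 0.001,
}

probs = list(modes.values())
p_sys = series_failure_probability(probs)
lower, upper = simple_bounds(probs)
print(f"system failure probability (independent modes): {p_sys:.5f}")
print(f"dependence-free bounds: [{lower:.5f}, {upper:.5f}]")
```

The bounds are useful when the dependence between modes is unknown, since the independent-mode value always lies between them.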
An Empirical Approach to Logical Clustering of Software Failure Regions
1994-03-01
this is a coincidence or normal behavior of failure regions." Software faults were numbered in order as they were discovered, by the various testing...locations of the associated faults. The goal of this research will be an improved testing technique that incorporates failure region behavior. To do this...clustering behavior. This, however, does not correlate with the structural clustering of failure regions observed by Ginn (1991) on the same set of data
Capturing a failure of an ASIC in-situ, using infrared radiometry and image processing software
NASA Technical Reports Server (NTRS)
Ruiz, Ronald P.
2003-01-01
Failures in electronic devices can sometimes be tricky to locate, especially if they are buried inside radiation-shielded containers designed to work in outer space. Such was the case with a malfunctioning ASIC (Application Specific Integrated Circuit) that was drawing excessive power at a specific temperature during temperature cycle testing. To analyze the failure, infrared radiometry (thermography) was used in combination with image processing software to locate precisely where the power was being dissipated at the moment the failure took place. The IR imaging software was used to make the image of the target and background appear as unity. As testing proceeded and the failure mode was reached, temperature changes revealed the precise location of the fault. The results gave the design engineers the information they needed to fix the problem. This paper describes the techniques and equipment used to accomplish this failure analysis.
NASA Astrophysics Data System (ADS)
Edwards, John L.; Beekman, Randy M.; Buchanan, David B.; Farner, Scott; Gershzohn, Gary R.; Khuzadi, Mbuyi; Mikula, D. F.; Nissen, Gerry; Peck, James; Taylor, Shaun
2007-04-01
Human space travel is inherently dangerous. Hazardous conditions will exist. Real time health monitoring of critical subsystems is essential for providing a safe abort timeline in the event of a catastrophic subsystem failure. In this paper, we discuss a practical and cost effective process for developing critical subsystem failure detection, diagnosis and response (FDDR). We also present the results of a real time health monitoring simulation of a propellant ullage pressurization subsystem failure. The health monitoring development process identifies hazards, isolates hazard causes, defines software partitioning requirements and quantifies software algorithm development. The process provides a means to establish the number and placement of sensors necessary to provide real time health monitoring. We discuss how health monitoring software tracks subsystem control commands, interprets off-nominal operational sensor data, predicts failure propagation timelines, corroborates failure predictions and formats failure protocols.
Unbiased multi-fidelity estimate of failure probability of a free plane jet
NASA Astrophysics Data System (ADS)
Marques, Alexandre; Kramer, Boris; Willcox, Karen; Peherstorfer, Benjamin
2017-11-01
Estimating failure probability related to fluid flows is a challenge because it requires a large number of evaluations of expensive models. We address this challenge by leveraging multiple low fidelity models of the flow dynamics to create an optimal unbiased estimator. In particular, we investigate the effects of uncertain inlet conditions in the width of a free plane jet. We classify a condition as failure when the corresponding jet width is below a small threshold, such that failure is a rare event (failure probability is smaller than 0.001). We estimate failure probability by combining the frameworks of multi-fidelity importance sampling and optimal fusion of estimators. Multi-fidelity importance sampling uses a low fidelity model to explore the parameter space and create a biasing distribution. An unbiased estimate is then computed with a relatively small number of evaluations of the high fidelity model. In the presence of multiple low fidelity models, this framework offers multiple competing estimators. Optimal fusion combines all competing estimators into a single estimator with minimal variance. We show that this combined framework can significantly reduce the cost of estimating failure probabilities, and thus can have a large impact in fluid flow applications. This work was funded by DARPA.
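The importance-sampling ingredient of this approach can be sketched on a toy problem. The example below is single-fidelity only: it estimates a rare tail probability of a standard normal variable using a biasing distribution centred on the failure threshold. In the multi-fidelity setting the biasing distribution would instead be built from low fidelity model evaluations; the threshold, sample size, and function names here are assumptions for illustration.

```python
import math
import random


def exact_tail(t):
    """Exact P(Z > t) for Z ~ N(0, 1), used only to check the estimate."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def importance_sampling_tail(t, n, rng):
    """Unbiased estimate of P(Z > t) using samples from the biasing density N(t, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1.0)  # draw from the biasing distribution
        if x > t:  # failure indicator
            # Likelihood ratio phi(x) / phi(x - t) = exp(-t*x + t^2/2).
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n


rng = random.Random(0)
threshold = 3.1  # chosen so that the failure probability is below 0.001
p_is = importance_sampling_tail(threshold, 20000, rng)
p_exact = exact_tail(threshold)
print(f"importance sampling: {p_is:.3e}, exact: {p_exact:.3e}")
```

Plain Monte Carlo with the same 20000 samples would see only about twenty failures, so its relative error would be far larger than that of the reweighted estimate.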
Bounding the Failure Probability Range of Polynomial Systems Subject to P-box Uncertainties
NASA Technical Reports Server (NTRS)
Crespo, Luis G.; Kenny, Sean P.; Giesy, Daniel P.
2012-01-01
This paper proposes a reliability analysis framework for systems subject to multiple design requirements that depend polynomially on the uncertainty. Uncertainty is prescribed by probability boxes, also known as p-boxes, whose distribution functions have free or fixed functional forms. An approach based on the Bernstein expansion of polynomials and optimization is proposed. In particular, we search for the elements of a multi-dimensional p-box that minimize (i.e., the best-case) and maximize (i.e., the worst-case) the probability of inner and outer bounding sets of the failure domain. This technique yields intervals that bound the range of failure probabilities. The offset between this bounding interval and the actual failure probability range can be made arbitrarily tight with additional computational effort.
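The range-enclosure property of the Bernstein expansion that underlies this kind of bounding can be shown in one dimension. The sketch below computes the Bernstein coefficients of a polynomial on [0, 1]; their minimum and maximum enclose the polynomial's range there. The example polynomial is hypothetical, and the multi-dimensional p-box optimization described in the abstract is not reproduced.

```python
from math import comb


def bernstein_coefficients(a):
    """Bernstein coefficients on [0, 1] of p(x) = sum_j a[j] * x**j."""
    n = len(a) - 1
    return [
        sum(comb(i, j) / comb(n, j) * a[j] for j in range(i + 1))
        for i in range(n + 1)
    ]


def range_enclosure(a):
    """Interval [min b_i, max b_i] that contains p(x) for every x in [0, 1]."""
    b = bernstein_coefficients(a)
    return min(b), max(b)


def evaluate(a, x):
    """Evaluate p(x) directly from its power-basis coefficients."""
    return sum(c * x**j for j, c in enumerate(a))


# Hypothetical requirement function g(x) = 1 - 3x + 2x^2 on [0, 1].
coeffs = [1.0, -3.0, 2.0]
lower, upper = range_enclosure(coeffs)
print(f"g(x) lies in [{lower}, {upper}] on [0, 1]")
```

If a requirement is violated only when g(x) exceeds some level above the upper bound, the whole interval can be certified safe without sampling; subdividing the interval tightens the enclosure.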
The determination of measures of software reliability
NASA Technical Reports Server (NTRS)
Maxwell, F. D.; Corn, B. C.
1978-01-01
Measurement of software reliability was carried out during the development of data base software for a multi-sensor tracking system. The failure ratio and failure rate were found to be consistent measures. Trend lines could be established from these measurements that provide good visualization of the progress on the job as a whole as well as on individual modules. Over one-half of the observed failures were due to factors associated with the individual run submission rather than with the code proper. Possible application of these findings for line management, project managers, functional management, and regulatory agencies is discussed. Steps for simplifying the measurement process and for use of these data in predicting operational software reliability are outlined.
Risk-based decision making to manage water quality failures caused by combined sewer overflows
NASA Astrophysics Data System (ADS)
Sriwastava, A. K.; Torres-Matallana, J. A.; Tait, S.; Schellart, A.
2017-12-01
Regulatory authorities set environmental permits for water utilities such that the combined sewer overflows (CSO) managed by these companies conform to the regulations. These utility companies face the risk of paying penalties or of negative publicity if they breach the environmental permit. These risks can be addressed by designing appropriate solutions, such as investing in additional infrastructure that improves the system capacity and reduces the impact of CSO spills. The performance of these solutions is often estimated using urban drainage models. Hence, any uncertainty in these models can have a significant effect on the decision making process. This study outlines a risk-based decision making approach to address water quality failure caused by CSO spills. A calibrated lumped urban drainage model is used to simulate CSO spill quality in the Haute-Sûre catchment in Luxembourg. Uncertainty in rainfall and model parameters is propagated through Monte Carlo simulations to quantify uncertainty in the concentration of ammonia in the CSO spill. A combination of decision alternatives, such as the construction of a storage tank at the CSO and the reduction in the flow contribution of catchment surfaces, is selected as planning measures to avoid the water quality failure. Failure is defined as exceedance, with a certain frequency, of a concentration-duration based threshold derived from Austrian emission standards for ammonia (De Toffol, 2006). For each decision alternative, uncertainty quantification results in a probability distribution of the number of annual CSO spill events which exceed the threshold. For each alternative, a buffered failure probability, as defined in Rockafellar & Royset (2010), is estimated. Buffered failure probability (pbf) is a conservative estimate of failure probability (pf); however, unlike failure probability, it includes information about the upper tail of the distribution.
A Pareto-optimal set of solutions is obtained by performing mean-pbf optimization. The effectiveness of using buffered failure probability compared to the failure probability is tested by comparing the solutions obtained by using mean-pbf and mean-pf optimizations.
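The difference between pf and pbf can be made concrete with an empirical sketch. Here pbf is approximated from samples as the largest tail fraction whose average still reaches the threshold, a discrete stand-in for the superquantile-based definition. The sample values and the threshold are hypothetical, not data from the study.

```python
def failure_probability(samples, threshold):
    """Fraction of samples strictly above the threshold (empirical pf)."""
    return sum(1 for x in samples if x > threshold) / len(samples)


def buffered_failure_probability(samples, threshold):
    """Largest tail fraction k/n whose tail average is still >= threshold (empirical pbf)."""
    ordered = sorted(samples, reverse=True)
    running_sum = 0.0
    best_k = 0
    for k, x in enumerate(ordered, start=1):
        running_sum += x
        if running_sum / k >= threshold:
            best_k = k
        else:
            break  # tail averages only decrease as k grows
    return best_k / len(ordered)


# Hypothetical simulated counts of annual spill events exceeding the limit.
samples = [0, 1, 2, 3, 4, 5, 6, 7, 8, 12]
threshold = 7

pf = failure_probability(samples, threshold)
pbf = buffered_failure_probability(samples, threshold)
print(f"pf = {pf}, pbf = {pbf}")
```

Because the tail average is pulled up by the extreme value 12, pbf counts more of the distribution as at risk than pf does, which is the conservatism and tail sensitivity the abstract refers to.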
Risk Analysis of Earth-Rock Dam Failures Based on Fuzzy Event Tree Method
Fu, Xiao; Gu, Chong-Shi; Su, Huai-Zhi; Qin, Xiang-Nan
2018-01-01
Earth-rock dams make up a large proportion of the dams in China, and their failures can induce great risks. In this paper, the risks associated with earth-rock dam failure are analyzed from two aspects: the probability of a dam failure and the resulting life loss. An event tree analysis method based on fuzzy set theory is proposed to calculate the dam failure probability. The life loss associated with dam failure is summarized and refined to be suitable for Chinese dams from previous studies. The proposed method and model are applied to one reservoir dam in Jiangxi province. Both engineering and non-engineering measures are proposed to reduce the risk. The risk analysis of the dam failure has essential significance for reducing dam failure probability and improving dam risk management level. PMID:29710824
Weighting and Bayes Nets for Rollup of Surveillance Metrics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Henson, Kriste; Sentz, Kari; Hamada, Michael
2012-04-30
The LANL IKE team proposes that the surveillance metrics for several data streams that are used to detect the same failure mode be weighted. Similarly, the failure mode metrics are weighted to obtain a subsystem metric. E.g., if there are n data streams (nodes 1-n), the failure mode (node 0) metric is obtained as $M_0 = w_1 M_1 + \cdots + w_n M_n$ (1), where $\sum_{i=1}^{n} w_i = 1$. This proposal has been implemented with Bayes Nets using the Netica/IKE software by specifying an appropriate conditional probability table (CPT). This CPT is calculated using the same form as (1), where the data stream metrics for the true (T) and false (F) states are replaced by 1 and 0, respectively. Then using this CPT, the failure mode metric calculated by Netica/IKE equals (1). This result has two nice features. First, the rollup that the Bayes net is doing can be easily explained. Second, because Bayes Nets can implement this rollup using Netica/IKE, data marshalling (allocating next year's budget) can be studied. A proof of the claim 'failure mode metric calculated by Netica/IKE equals (1)' for n = 2 and n = 3 follows, as well as the sketch of a proof by induction for general n.
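The claim that a CPT of this form reproduces the weighted sum can be checked numerically by brute-force enumeration, without any Bayes net software. The sketch below assumes each metric is the probability that its data stream node is in the true state and that the parent nodes are independent; the weights and metric values are hypothetical, and no Netica/IKE call is used.

```python
from itertools import product


def rollup_by_enumeration(weights, metrics):
    """P(node 0 = T), summing over all parent states with CPT entry sum_i w_i * s_i."""
    total = 0.0
    for states in product((0, 1), repeat=len(weights)):
        # Probability of this joint parent state (parents ASSUMED independent).
        p_state = 1.0
        for s, m in zip(states, metrics):
            p_state *= m if s == 1 else (1.0 - m)
        # CPT entry: the weighted sum with T replaced by 1 and F replaced by 0.
        cpt_true = sum(w * s for w, s in zip(weights, states))
        total += cpt_true * p_state
    return total


weights = [0.5, 0.3, 0.2]  # hypothetical weights, summing to 1
metrics = [0.9, 0.6, 0.4]  # hypothetical data stream metrics

net_value = rollup_by_enumeration(weights, metrics)
weighted_sum = sum(w * m for w, m in zip(weights, metrics))
print(f"enumeration: {net_value:.6f}, weighted sum: {weighted_sum:.6f}")
```

The two values agree because the CPT entry is linear in the parent states, so taking its expectation over the parents returns the weighted sum of the individual metrics.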
A methodology for estimating risks associated with landslides of contaminated soil into rivers.
Göransson, Gunnel; Norrman, Jenny; Larson, Magnus; Alén, Claes; Rosén, Lars
2014-02-15
Urban areas adjacent to surface water are exposed to soil movements such as erosion and slope failures (landslides). A landslide is a potential mechanism for mobilisation and spreading of pollutants. This mechanism is in general not included in environmental risk assessments for contaminated sites, and the consequences associated with contamination in the soil are typically not considered in landslide risk assessments. This study suggests a methodology to estimate the environmental risks associated with landslides in contaminated sites adjacent to rivers. The methodology is probabilistic and allows for datasets with large uncertainties and the use of expert judgements, providing quantitative estimates of probabilities for defined failures. The approach is illustrated by a case study along the river Göta Älv, Sweden, where failures are defined and probabilities for those failures are estimated. Failures are defined from a pollution perspective and in terms of exceeding environmental quality standards (EQSs) and acceptable contaminant loads. Models are then suggested to estimate probabilities of these failures. A landslide analysis is carried out to assess landslide probabilities based on data from a recent landslide risk classification study along the river Göta Älv. The suggested methodology is meant to be a supplement to either landslide risk assessment (LRA) or environmental risk assessment (ERA), providing quantitative estimates of the risks associated with landslide in contaminated sites. The proposed methodology can also act as a basis for communication and discussion, thereby contributing to intersectoral management solutions. From the case study it was found that the defined failures are governed primarily by the probability of a landslide occurring. 
The overall probabilities for failure are low; however, if a landslide occurs the probabilities of exceeding EQS are high and the probability of having at least a 10% increase in the contamination load within one year is also high. Copyright © 2013 Elsevier B.V. All rights reserved.
Statistical modeling of software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1992-01-01
This working paper discusses the statistical simulation part of a controlled software development experiment being conducted under the direction of the System Validation Methods Branch, Information Systems Division, NASA Langley Research Center. The experiment uses guidance and control software (GCS) aboard a fictitious planetary landing spacecraft: real-time control software operating on a transient mission. Software execution is simulated to study the statistical aspects of reliability and other failure characteristics of the software during development, testing, and random usage. Quantification of software reliability is a major goal. Various reliability concepts are discussed. Experiments are described for performing simulations and collecting appropriate simulated software performance and failure data. This data is then used to make statistical inferences about the quality of the software development and verification processes as well as inferences about the reliability of software versions and reliability growth under random testing and debugging.
NASA Astrophysics Data System (ADS)
Gan, Luping; Li, Yan-Feng; Zhu, Shun-Peng; Yang, Yuan-Jian; Huang, Hong-Zhong
2014-06-01
Failure mode, effects and criticality analysis (FMECA) and Fault tree analysis (FTA) are powerful tools to evaluate reliability of systems. Although single failure mode issues can be efficiently addressed by traditional FMECA, multiple failure modes and component correlations in complex systems cannot be effectively evaluated. In addition, correlated variables and parameters are often assumed to be precisely known in quantitative analysis. In fact, due to the lack of information, epistemic uncertainty commonly exists in engineering design. To solve these problems, the advantages of FMECA, FTA, fuzzy theory, and Copula theory are integrated into a unified hybrid method called fuzzy probability weighted geometric mean (FPWGM) risk priority number (RPN) method. The epistemic uncertainty of risk variables and parameters is characterized by fuzzy numbers to obtain the fuzzy weighted geometric mean (FWGM) RPN for a single failure mode. Multiple failure modes are connected using minimum cut sets (MCS), and Boolean logic is used to combine the fuzzy risk priority number (FRPN) of each MCS. Moreover, Copula theory is applied to analyze the correlation of multiple failure modes in order to derive the failure probabilities of each MCS. Compared to the case where dependency among multiple failure modes is not considered, the Copula modeling approach eliminates the error of reliability analysis. Furthermore, for the purpose of quantitative analysis, probability importance weights derived from failure probabilities are assigned to the FWGM RPN to reassess the risk priority, which generalizes the definition of probability weight and FRPN, resulting in a more accurate estimation than that of the traditional models. Finally, a basic fatigue analysis case drawn from turbine and compressor blades in an aeroengine is used to demonstrate the effectiveness and robustness of the presented method.
The result provides some important insights on fatigue reliability analysis and risk priority assessment of structural systems under failure correlations.
NASA Astrophysics Data System (ADS)
Wang, Yu; Jiang, Wenchun; Luo, Yun; Zhang, Yucai; Tu, Shan-Tung
2017-12-01
The reduction and re-oxidation of the anode have significant effects on the integrity of the solid oxide fuel cell (SOFC) sealed by the glass-ceramic (GC). The mechanical failure is mainly controlled by the stress distribution. Therefore, a three-dimensional model of the SOFC is established in this paper to investigate the stress evolution during the reduction and re-oxidation by the finite element method (FEM), and the failure probability is calculated using the Weibull method. The results demonstrate that the reduction of the anode can decrease the thermal stresses and reduce the failure probability due to the volumetric contraction and increasing porosity. The re-oxidation can result in a remarkable increase of the thermal stresses, and the failure probabilities of anode, cathode, electrolyte and GC all increase to 1, which is mainly due to the large linear strain rather than the decreasing porosity. The cathode and electrolyte fail as soon as the linear strains are about 0.03% and 0.07%. Therefore, the re-oxidation should be controlled to ensure the integrity, and a lower re-oxidation temperature can decrease the stress and failure probability.
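For readers unfamiliar with Weibull-based failure probabilities for brittle components, a minimal sketch follows. It uses the basic two-parameter form for a uniformly stressed component, which is an assumption; the abstract does not state which Weibull formulation or material parameters were used, and the numbers below are hypothetical.

```python
import math


def weibull_failure_probability(stress, scale, modulus):
    """Two-parameter Weibull: Pf = 1 - exp(-(stress / scale) ** modulus)."""
    if stress <= 0.0:
        return 0.0  # no tensile stress, no failure in this simple model
    return 1.0 - math.exp(-((stress / scale) ** modulus))


# Hypothetical characteristic strength (MPa) and Weibull modulus.
SCALE_MPA = 150.0
MODULUS = 7.0

for stress_mpa in (50.0, 100.0, 150.0, 200.0):
    pf = weibull_failure_probability(stress_mpa, SCALE_MPA, MODULUS)
    print(f"stress = {stress_mpa:6.1f} MPa -> Pf = {pf:.4f}")
```

The steep rise of Pf around the characteristic strength illustrates why a moderate increase in thermal stress can push the failure probability of a brittle layer close to 1.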
Reliability and Failure Modes of a Hybrid Ceramic Abutment Prototype.
Silva, Nelson Rfa; Teixeira, Hellen S; Silveira, Lucas M; Bonfante, Estevam A; Coelho, Paulo G; Thompson, Van P
2018-01-01
A ceramic and metal abutment prototype was fatigue tested to determine the probability of survival at various loads. Lithium disilicate CAD-milled abutments (n = 24) were cemented to titanium sleeve inserts and then screw attached to titanium fixtures. The assembly was then embedded at a 30° angle in polymethylmethacrylate. Each (n = 24) was restored with a resin-cemented machined lithium disilicate all-ceramic central incisor crown. Single load (lingual-incisal contact) to failure was determined for three specimens. Fatigue testing (n = 21) was conducted employing the step-stress method with lingual mouth motion loading. Failures were recorded, and reliability calculations were performed using proprietary software. Probability Weibull curves were calculated with 90% confidence bounds. Fracture modes were classified with a stereomicroscope, and representative samples imaged with scanning electron microscopy. Fatigue results indicated that the limiting factor in the current design is the fatigue strength of the abutment screw, where screw fracture often leads to failure of the abutment metal sleeve and/or cracking in the implant fixture. Reliability for completion of a mission at 200 N load for 50K cycles was 0.38 (0.52 to 0.25, 90% CI) and for 100K cycles was only 0.12 (0.26 to 0.05), i.e., only 12% predicted to survive. These results are similar to those from previous studies on metal to metal abutment/fixture systems where screw failure is a limitation. No ceramic crown or ceramic abutment initiated fractures occurred, supporting the research hypothesis. The limiting factor in performance was the screw failure in the metal-to-metal connection between the prototyped abutment and the fixture, indicating that this configuration should function clinically with no abutment ceramic complications. The combined ceramic with titanium sleeve abutment prototype performance was limited by the fatigue degradation of the abutment screw.
In fatigue, no ceramic crown or ceramic abutment components failed, supporting the research hypothesis with a reliability similar to that of all-metal abutment fixture systems. A lithium disilicate abutment with a Ti alloy sleeve in combination with an all-ceramic crown should be expected to function clinically in a satisfactory manner. © 2016 by the American College of Prosthodontists.
Man-rated flight software for the F-8 DFBW program
NASA Technical Reports Server (NTRS)
Bairnsfather, R. R.
1975-01-01
The design, implementation, and verification of the flight control software used in the F-8 DFBW program are discussed. Since the DFBW utilizes an Apollo computer and hardware, the procedures, controls, and basic management techniques employed are based on those developed for the Apollo software system. Program Assembly Control, simulator configuration control, erasable-memory load generation, change procedures and anomaly reporting are discussed. The primary verification tools--the all-digital simulator, the hybrid simulator, and the Iron Bird simulator--are described, as well as the program test plans and their implementation on the various simulators. Failure-effects analysis and the creation of special failure-generating software for testing purposes are described. The quality of the end product is evidenced by the F-8 DFBW flight test program in which 42 flights, totaling 58 hours of flight time, were successfully made without any DFCS inflight software, or hardware, failures.
NASA Technical Reports Server (NTRS)
Dunn, William R.; Corliss, Lloyd D.
1991-01-01
Paper examines issue of software safety. Presents four case histories of software-safety analysis. Concludes that, to be safe, software, for all practical purposes, must be free of errors. Backup systems still needed to prevent catastrophic software failures.
Contraceptive failure in the United States
Trussell, James
2013-01-01
This review provides an update of previous estimates of first-year probabilities of contraceptive failure for all methods of contraception available in the United States. Estimates are provided of probabilities of failure during typical use (which includes both incorrect and inconsistent use) and during perfect use (correct and consistent use). The difference between these two probabilities reveals the consequences of imperfect use; it depends both on how unforgiving of imperfect use a method is and on how hard it is to use that method perfectly. These revisions reflect new research on contraceptive failure both during perfect use and during typical use. PMID:21477680
CARES/Life Ceramics Durability Evaluation Software Used for Mars Microprobe Aeroshell
NASA Technical Reports Server (NTRS)
Nemeth, Noel N.
1998-01-01
The CARES/Life computer program, which was developed at the NASA Lewis Research Center, predicts the probability of a monolithic ceramic component's failure as a function of time in service. The program has many features and options for materials evaluation and component design. It couples commercial finite element programs, which resolve a component's temperature and stress distribution, to reliability evaluation and fracture mechanics routines for modeling strength-limiting defects. These routines are based on calculations of the probabilistic nature of the brittle material's strength. The capability, flexibility, and uniqueness of CARES/Life have attracted many users representing a broad range of interests and have resulted in numerous awards for technological achievements and technology transfer.
NASA Astrophysics Data System (ADS)
Alvarez, Diego A.; Uribe, Felipe; Hurtado, Jorge E.
2018-02-01
Random set theory is a general framework which comprises uncertainty in the form of probability boxes, possibility distributions, cumulative distribution functions, Dempster-Shafer structures or intervals; in addition, the dependence between the input variables can be expressed using copulas. In this paper, the lower and upper bounds on the probability of failure are calculated by means of random set theory. In order to accelerate the calculation, a well-known and efficient probability-based reliability method known as subset simulation is employed. This method is especially useful for finding small failure probabilities in both low- and high-dimensional spaces, disjoint failure domains and nonlinear limit state functions. The proposed methodology represents a drastic reduction of the computational labor implied by plain Monte Carlo simulation for problems defined with a mixture of representations for the input variables, while delivering similar results. Numerical examples illustrate the efficiency of the proposed approach.
Failure detection and recovery in the assembly/contingency subsystem
NASA Technical Reports Server (NTRS)
Gantenbein, Rex E.
1993-01-01
The Assembly/Contingency Subsystem (ACS) is the primary communications link on board the Space Station. Any failure in a component of this system or in the external devices through which it communicates with ground-based systems will isolate the Station. The ACS software design includes a failure management capability (ACFM) that provides protocols for failure detection, isolation, and recovery (FDIR). The ACFM design requirements as outlined in the current ACS software requirements specification document are reviewed. The activities carried out in this review include: (1) an informal, but thorough, end-to-end failure mode and effects analysis of the proposed software architecture for the ACFM; and (2) a prototype of the ACFM software, implemented as a C program under the UNIX operating system. The purpose of this review is to evaluate the FDIR protocols specified in the ACS design and the specifications themselves in light of their use in implementing the ACFM. The basis of failure detection in the ACFM is the loss of signal between the ground and the Station, which (under the appropriate circumstances) will initiate recovery to restore communications. This recovery involves the reconfiguration of the ACS to either a backup set of components or to a degraded communications mode. The initiation of recovery depends largely on the criticality of the failure mode, which is defined by tables in the ACFM and can be modified to provide a measure of flexibility in recovery procedures.
Multichannel analysis of the surface waves of earth materials in some parts of Lagos State, Nigeria
NASA Astrophysics Data System (ADS)
Adegbola, R. B.; Oyedele, K. F.; Adeoti, L.; Adeloye, A. B.
2016-09-01
We present a method that utilizes multichannel analysis of surface waves (MASW), which was used to measure shear wave velocities, with a view to establishing the probable causes of road failure, subsidence and weakening of structures in some local government areas in Lagos, Nigeria. MASW data were acquired using a 24-channel seismograph. The acquired data were processed and transformed into a two-dimensional (2-D) structure reflective of the depth and surface wave velocity distribution within a depth of 0-15 m beneath the surface using SURFSEIS software. The shear wave velocity data were compared with other geophysical/borehole data that were acquired along the same profile. The comparison and correlation illustrate the accuracy and consistency of MASW-derived shear wave velocity profiles. Rigidity modulus and N-value were also generated. The study showed that the low velocity/very low velocity data are reflective of organic clay/peat materials and thus likely responsible for the failure, subsidence and weakening of structures within the study areas.
Risk and Vulnerability Analysis of Satellites Due to MM/SD with PIRAT
NASA Astrophysics Data System (ADS)
Kempf, Scott; Schafer, Frank; Rudolph, Martin; Welty, Nathan; Donath, Therese; Destefanis, Roberto; Grassi, Lilith; Janovsky, Rolf; Evans, Leanne; Winterboer, Arne
2013-08-01
Until recently, the state-of-the-art assessment of the threat posed to spacecraft by micrometeoroids and space debris was limited to the application of ballistic limit equations to the outer hull of a spacecraft. The probability of no penetration (PNP) is acceptable for assessing the risk and vulnerability of manned space missions; however, for unmanned missions, in which penetrations of the spacecraft exterior do not necessarily constitute satellite or mission failure, these values are overly conservative. The new software tool PIRAT (Particle Impact Risk and Vulnerability Analysis Tool) has been developed based on the Schäfer-Ryan-Lambert (SRL) triple-wall ballistic limit equation (BLE), which is applicable to various satellite components. As a result, it has become possible to assess the individual failure rates of satellite components. This paper demonstrates the modeling of an example satellite, the performance of a PIRAT analysis and the potential for subsequent design optimizations with respect to micrometeoroid and space debris (MM/SD) impact risk.
Using Combined SFTA and SFMECA Techniques for Space Critical Software
NASA Astrophysics Data System (ADS)
Nicodemos, F. G.; Lahoz, C. H. N.; Abdala, M. A. D.; Saotome, O.
2012-01-01
This work addresses the combined Software Fault Tree Analysis (SFTA) and Software Failure Modes, Effects and Criticality Analysis (SFMECA) techniques applied to space critical software of satellite launch vehicles. The combined approach is under research as part of the Verification and Validation (V&V) efforts to increase software dependability and as future application in other projects under development at Instituto de Aeronáutica e Espaço (IAE). The applicability of such approach was conducted on system software specification and applied to a case study based on the Brazilian Satellite Launcher (VLS). The main goal is to identify possible failure causes and obtain compensating provisions that lead to inclusion of new functional and non-functional system software requirements.
Analysis of Emergency Diesel Generators Failure Incidents in Nuclear Power Plants
NASA Astrophysics Data System (ADS)
Hunt, Ronderio LaDavis
In early years of operation, emergency diesel generators had a minimal rate of demand failures. Emergency diesel generators (EDGs) are designed to operate as a backup when the main source of electricity has been disrupted. Recently, EDGs have been failing at nuclear power plants (NPPs) around the United States, causing either station blackouts or loss of onsite and offsite power. These failures were of a specific type called demand failures. This thesis evaluated a problem that raised concern in the nuclear industry: the rate rose from an average of 1 EDG demand failure per year in 1997 to an excessive event of 4 EDG demand failures in a year, which occurred in 2011. To determine the next occurrence of such an extreme event and its possible cause, two analyses were conducted: a statistical analysis and a root cause analysis. In the statistical analysis, an extreme event probability approach was applied to determine the next occurrence year of an excessive event as well as the probability of that event occurring. In the root cause analysis, the potential causes of the excessive event were evaluated by considering the EDG manufacturers, aging, policy changes/maintenance practices, and failure components. The root cause analysis investigated the correlation between demand failure data and historical data. Final results from the statistical analysis showed expectations of an excessive event occurring in a fixed range of probability and a wider range of probability from the extreme event probability approach. The root cause analysis of the demand failure data followed historical statistics for the EDG manufacturer, aging, and policy changes/maintenance practices, but indicated a possible cause of the excessive event in the failure components.
Conclusions showed that predicting the next excessive demand failure year, and the probability of such failures, with an acceptable confidence level was difficult, but it was likely that this type of failure will not be a 100-year event. The majority of the EDG demand failures occurred within the main components as of 2005. The overall analysis, based on percentages, indicated that it would be appropriate to state that the excessive event was caused by the overall age (wear and tear) of the emergency diesel generators in nuclear power plants. Future work will be to better determine the return period of the excessive event, once it has occurred a second time, by implementing the extreme event probability approach.
NASA Technical Reports Server (NTRS)
Smart, Christian
1998-01-01
During 1997, a team from Hernandez Engineering, MSFC, Rocketdyne, Thiokol, Pratt & Whitney, and USBI completed the first phase of a two-year Quantitative Risk Assessment (QRA) of the Space Shuttle. The models for the Shuttle systems were entered and analyzed by a new QRA software package. This system, termed the Quantitative Risk Assessment System (QRAS), was designed by NASA and programmed by the University of Maryland. The software is a groundbreaking PC-based risk assessment package that allows the user to model complex systems in a hierarchical fashion. Features of the software include the ability to easily select quantifications of failure modes, draw Event Sequence Diagrams (ESDs) interactively, perform uncertainty and sensitivity analysis, and document the modeling. This paper illustrates both the approach used in modeling and the particular features of the software package. The software is general and can be used in a QRA of any complex engineered system. The author is the project lead for the modeling of the Space Shuttle Main Engines (SSMEs), and this paper focuses on the modeling completed for the SSMEs during 1997. In particular, the groundrules for the study, the databases used, the way in which ESDs were used to model catastrophic failure of the SSMEs, the methods used to quantify the failure rates, and how QRAS was used in the modeling effort are discussed. Groundrules were necessary to limit the scope of such a complex study, especially with regard to a liquid rocket engine such as the SSME, which can be shut down after ignition either on the pad or in flight. The SSME was divided into its constituent components and subsystems. These were ranked on the basis of the possibility of being upgraded and risk of catastrophic failure. Once this was done the Shuttle program Hazard Analysis and Failure Modes and Effects Analysis (FMEA) were used to create a list of potential failure modes to be modeled. 
The groundrules and other criteria were used to screen out the many failure modes that did not contribute significantly to the catastrophic risk. The Hazard Analysis and FMEA for the SSME were also used to build ESDs that show the chain of events leading from the failure mode occurrence to one of the following end states: catastrophic failure, engine shutdown, or successful operation (successful with respect to the failure mode under consideration).
On Correlated Failures in Survivable Storage Systems
2002-05-01
Littlewood, D.R. Miller, “Conceptual modeling of coincident failures in multiversion software”, IEEE Transactions on Software Engineering, Volume: 15 Issue...Recovery in Multiversion Software”. IEEE Transaction on Software Engineering, Vol. 16 No.3, March 1990 [Plank1997] J. Plank “A tutorial on Reed-Solomon
Game-Theoretic strategies for systems of components using product-form utilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rao, Nageswara S; Ma, Cheng-Yu; Hausken, K.
Many critical infrastructures are composed of multiple systems of components which are correlated so that disruptions to one may propagate to others. We consider such infrastructures with correlations characterized in two ways: (i) an aggregate failure correlation function specifies the conditional failure probability of the infrastructure given the failure of an individual system, and (ii) a pairwise correlation function between two systems specifies the failure probability of one system given the failure of the other. We formulate a game for ensuring the resilience of the infrastructure, wherein the utility functions of the provider and attacker are products of an infrastructure survival probability term and a cost term, both expressed in terms of the numbers of system components attacked and reinforced. The survival probabilities of individual systems satisfy first-order differential conditions that lead to simple Nash Equilibrium conditions. We then derive sensitivity functions that highlight the dependence of infrastructure resilience on the cost terms, correlation functions, and individual system survival probabilities. We apply these results to simplified models of distributed cloud computing and energy grid infrastructures.
Low Fidelity Simulation of a Zero-Y Robot
NASA Technical Reports Server (NTRS)
Sweet, Adam
2001-01-01
The item to be cleared is a low-fidelity software simulation model of a hypothetical freeflying robot designed for use in zero gravity environments. This simulation model works with the HCC simulation system that was developed by Xerox PARC and NASA Ames Research Center. HCC has been previously cleared for distribution. When used with the HCC software, the model computes the location and orientation of the simulated robot over time. Failures (such as a broken motor) can be injected into the simulation to produce simulated behavior corresponding to the failure. Release of this simulation will allow researchers to test their software diagnosis systems by attempting to diagnose the simulated failure from the simulated behavior. This model does not contain any encryption software nor can it perform any control tasks that might be export controlled.
Concept Development for Software Health Management
NASA Technical Reports Server (NTRS)
Riecks, Jung; Storm, Walter; Hollingsworth, Mark
2011-01-01
This report documents the work performed by Lockheed Martin Aeronautics (LM Aero) under NASA contract NNL06AA08B, delivery order NNL07AB06T. The Concept Development for Software Health Management (CD-SHM) program was a NASA-funded effort sponsored by the Integrated Vehicle Health Management Project, one of the four pillars of the NASA Aviation Safety Program. The CD-SHM program focused on defining a structured approach to software health management (SHM) through the development of a comprehensive failure taxonomy that is used to characterize the fundamental failure modes of safety-critical software.
NASA Technical Reports Server (NTRS)
Motyka, P.
1983-01-01
A methodology is developed and applied for quantitatively analyzing the reliability of a dual, fail-operational redundant strapdown inertial measurement unit (RSDIMU). A Markov evaluation model is defined in terms of the operational states of the RSDIMU to predict system reliability. A 27 state model is defined based upon a candidate redundancy management system which can detect and isolate a spectrum of failure magnitudes. The results of parametric studies are presented which show the effect on reliability of the gyro failure rate, both the gyro and accelerometer failure rates together, false alarms, probability of failure detection, probability of failure isolation, and probability of damage effects and mission time. A technique is developed and evaluated for generating dynamic thresholds for detecting and isolating failures of the dual, separated IMU. Special emphasis is given to the detection of multiple, nonconcurrent failures. Digital simulation time histories are presented which show the thresholds obtained and their effectiveness in detecting and isolating sensor failures.
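The Markov approach described above can be illustrated with a much smaller model than the 27-state one in the report. The following is a minimal sketch, assuming a hypothetical two-channel unit with a constant per-channel failure rate `lam` and a coverage probability `coverage` (the chance that a first failure is detected and isolated). These names and the three-state structure are illustrative assumptions, not the RSDIMU model itself.

```python
import math

def dual_unit_reliability(lam, coverage, t):
    """Reliability at time t of a hypothetical two-channel redundant unit.

    States: both channels good -> one channel good (first failure was
    detected and isolated) -> system failed. A first failure that is not
    covered sends the system directly to the failed state.
    """
    if lam < 0 or t < 0 or not 0.0 <= coverage <= 1.0:
        raise ValueError("invalid parameters")
    # Probability that neither channel has failed by time t.
    p_both = math.exp(-2.0 * lam * t)
    # Probability of sitting in the degraded one-channel state at time t
    # (closed-form solution of the three-state Markov chain).
    p_one = 2.0 * coverage * (math.exp(-lam * t) - math.exp(-2.0 * lam * t))
    return p_both + p_one

# Hypothetical numbers: failure rate per hour, mission time in hours.
r_perfect = dual_unit_reliability(lam=1e-4, coverage=1.0, t=10.0)
r_partial = dual_unit_reliability(lam=1e-4, coverage=0.9, t=10.0)
```

Varying `coverage` in a sketch like this mirrors, in a very reduced form, the kind of parametric study of detection and isolation probabilities that the abstract mentions.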
Lifecycle Prognostics Architecture for Selected High-Cost Active Components
DOE Office of Scientific and Technical Information (OSTI.GOV)
N. Lybeck; B. Pham; M. Tawfik
There is an extensive body of knowledge, and some commercial products are available, for calculating prognostics, remaining useful life, and damage index parameters. The application of these technologies within the nuclear power community is still in its infancy. Online monitoring and condition-based maintenance are seeing increasing acceptance and deployment, and these activities provide the technological bases for expanding to add predictive/prognostics capabilities. In looking to deploy prognostics there are three key aspects of systems that are presented and discussed: (1) component/system/structure selection, (2) prognostic algorithms, and (3) prognostics architectures. Criteria are presented for component selection: feasibility, failure probability, consequences of failure, and benefits of the prognostics and health management (PHM) system. The basis and methods commonly used for prognostics algorithms are reviewed and summarized. Criteria for evaluating PHM architectures are presented: open, modular architecture; platform independence; graphical user interface for system development and/or results viewing; web enabled tools; scalability; and standards compatibility. Thirteen software products were identified and discussed in the context of being potentially useful for deployment in a PHM program applied to systems in a nuclear power plant (NPP). These products were evaluated by using information available from company websites, product brochures, fact sheets, scholarly publications, and direct communication with vendors. The thirteen products were classified into four groups of software: (1) research tools, (2) PHM system development tools, (3) deployable architectures, and (4) peripheral tools. Eight software tools fell into the deployable architectures category. Of those eight, only two employ all six modules of a full PHM system. 
Five systems did not offer prognostic estimates, and one system employed the full health monitoring suite but lacked operations and maintenance support. Each product is briefly described in Appendix A. Selection of the most appropriate software package for a particular application will depend on the chosen component, system, or structure. Ongoing research will determine the most appropriate choices for a successful demonstration of PHM systems in aging NPPs.
Improving Software Engineering on NASA Projects
NASA Technical Reports Server (NTRS)
Crumbley, Tim; Kelly, John C.
2010-01-01
Software Engineering Initiative: Reduces risk of software failure -Increases mission safety. More predictable software cost estimates and delivery schedules. Smarter buyer of contracted out software. More defects found and removed earlier. Reduces duplication of efforts between projects. Increases ability to meet the challenges of evolving software technology.
[Comments on the use of the "life-table method" in orthopedics].
Hassenpflug, J; Hahne, H J; Hedderich, J
1992-01-01
In the description of long term results, e.g. of joint replacements, survivorship analysis is used increasingly in orthopaedic surgery. The survivorship analysis is more useful to describe the frequency of failure rather than global statements in percentage. The relative probability of failure for fixed intervals is drawn from the number of controlled patients and the frequency of failure. The complementary probabilities of success are linked in their temporal sequence thus representing the probability of survival at a fixed endpoint. Necessary condition for the use of this procedure is the exact definition of moment and manner of failure. It is described how to establish survivorship tables.
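The linking of interval success probabilities described above can be sketched in a few lines. This is a minimal illustration, assuming each interval is given as a hypothetical pair `(at_risk, failures)`; it omits the adjustment for patients withdrawn or lost to follow-up that a full actuarial life table would include, and the numbers are invented for the example.

```python
def life_table(intervals):
    """Cumulative survival after each fixed follow-up interval.

    intervals: list of (at_risk, failures) pairs in time order, where
    at_risk is the number of controlled patients entering the interval.
    """
    survival = []
    cumulative = 1.0
    for at_risk, failures in intervals:
        if at_risk <= 0 or not 0 <= failures <= at_risk:
            raise ValueError("invalid interval counts")
        # Relative probability of failure within this interval.
        p_fail = failures / at_risk
        # Link the complementary success probabilities in sequence.
        cumulative *= 1.0 - p_fail
        survival.append(cumulative)
    return survival

# Hypothetical yearly follow-up of joint replacements.
yearly = [(100, 2), (90, 3), (80, 4)]
curve = life_table(yearly)
```

Each entry of `curve` is the estimated probability that an implant survives to the end of the corresponding interval, which is the quantity a survivorship table reports.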
Sensor Data Quality and Angular Rate Down-Selection Algorithms on SLS EM-1
NASA Technical Reports Server (NTRS)
Park, Thomas; Smith, Austin; Oliver, T. Emerson
2018-01-01
The NASA Space Launch System Block 1 launch vehicle is equipped with an Inertial Navigation System (INS) and multiple Rate Gyro Assemblies (RGA) that are used in the Guidance, Navigation, and Control (GN&C) algorithms. The INS provides the inertial position, velocity, and attitude of the vehicle along with both angular rate and specific force measurements. Additionally, multiple sets of co-located rate gyros supply angular rate data. The collection of angular rate data, taken along the launch vehicle, is used to separate out vehicle motion from flexible body dynamics. Since the system architecture uses redundant sensors, the capability was developed to evaluate the health (or validity) of the independent measurements. A suite of Sensor Data Quality (SDQ) algorithms is responsible for assessing the angular rate data from the redundant sensors. When failures are detected, SDQ will take the appropriate action and disqualify or remove faulted sensors from forward processing. Additionally, the SDQ algorithms contain logic for down-selecting the angular rate data used by the GNC software from the set of healthy measurements. This paper explores the trades and analyses that were performed in selecting a set of robust fault-detection algorithms included in the GN&C flight software. These trades included both an assessment of hardware-provided health and status data as well as an evaluation of different algorithms based on time-to-detection, type of failures detected, and probability of detecting false positives. We then provide an overview of the algorithms used for both fault-detection and measurement down selection. We next discuss the role of trajectory design, flexible-body models, and vehicle response to off-nominal conditions in setting the detection thresholds. Lastly, we present lessons learned from software integration and hardware-in-the-loop testing.
Ye, Qing; Pan, Hao; Liu, Changhua
2015-01-01
This research proposes a novel framework for simultaneous failure diagnosis of final drives, comprising feature extraction, training of paired diagnostic models, generation of a decision threshold, and recognition of simultaneous failure modes. The feature extraction module adopts wavelet packet transform and fuzzy entropy to reduce noise interference and extract representative features of each failure mode. Single-failure samples are used to construct probability classifiers based on a paired sparse Bayesian extreme learning machine, which is trained only on single failure modes and has the high generalization and sparsity of the sparse Bayesian learning approach. To generate the optimal decision threshold, which converts the probability output obtained from the classifiers into final simultaneous failure modes, this research proposes using samples containing both single and simultaneous failure modes together with a grid search method, which is superior to traditional techniques in global optimization. Compared with other frequently used diagnostic approaches based on support vector machines and probabilistic neural networks, experimental results based on the F1-measure verify that the diagnostic accuracy and efficiency of the proposed framework, which are crucial for simultaneous failure diagnosis, are superior to those of the existing approaches. PMID:25722717
Engineering and Software Engineering
NASA Astrophysics Data System (ADS)
Jackson, Michael
The phrase ‘software engineering' has many meanings. One central meaning is the reliable development of dependable computer-based systems, especially those for critical applications. This is not a solved problem. Failures in software development have played a large part in many fatalities and in huge economic losses. While some of these failures may be attributable to programming errors in the narrowest sense—a program's failure to satisfy a given formal specification—there is good reason to think that most of them have other roots. These roots are located in the problem of software engineering rather than in the problem of program correctness. The famous 1968 conference was motivated by the belief that software development should be based on “the types of theoretical foundations and practical disciplines that are traditional in the established branches of engineering.” Yet after forty years of currency the phrase ‘software engineering' still denotes no more than a vague and largely unfulfilled aspiration. Two major causes of this disappointment are immediately clear. First, too many areas of software development are inadequately specialised, and consequently have not developed the repertoires of normal designs that are the indispensable basis of reliable engineering success. Second, the relationship between structural design and formal analytical techniques for software has rarely been one of fruitful synergy: too often it has defined a boundary between competing dogmas, at which mutual distrust and incomprehension deprive both sides of advantages that should be within their grasp. This paper discusses these causes and their effects. Whether the common practice of software development will eventually satisfy the broad aspiration of 1968 is hard to predict; but an understanding of past failure is surely a prerequisite of future success.
From Bridges and Rockets, Lessons for Software Systems
NASA Technical Reports Server (NTRS)
Holloway, C. Michael
2004-01-01
Although differences exist between building software systems and building physical structures such as bridges and rockets, enough similarities exist that software engineers can learn lessons from failures in traditional engineering disciplines. This paper draws lessons from two well-known failures, the collapse of the Tacoma Narrows Bridge in 1940 and the destruction of the space shuttle Challenger in 1986, and applies these lessons to software system development. The following specific applications are made: (1) the verification and validation of a software system should not be based on a single method, or a single style of methods; (2) the tendency to embrace the latest fad should be overcome; and (3) the introduction of software control into safety-critical systems should be done cautiously.
Rock face stability analysis and potential rockfall source detection in Yosemite Valley
NASA Astrophysics Data System (ADS)
Matasci, B.; Stock, G. M.; Jaboyedoff, M.; Oppikofer, T.; Pedrazzini, A.; Carrea, D.
2012-04-01
Rockfall hazard in Yosemite Valley is especially high owing to the great cliff heights (~1 km), the fracturing of the steep granitic cliffs, and the widespread occurrence of surface parallel sheeting or exfoliation joints. Between 1857 and 2011, 890 documented rockfalls and other slope movements caused 15 fatalities and at least 82 injuries. The first part of this study focused on realizing a structural study for Yosemite Valley at both regional (valley-wide) and local (rockfall source area) scales. The dominant joint sets were completely characterized by their orientation, persistence, spacing, roughness and opening. Spacing and trace length for each joint set were accurately measured on terrestrial laser scanning (TLS) point clouds with the software PolyWorks (InnovMetric). Based on this fundamental information the second part of the study aimed to detect the most important failure mechanisms leading to rockfalls. With the software Matterocking and the 1m cell size DEM, we calculated the number of possible failure mechanisms (wedge sliding, planar sliding, toppling) per cell, for several cliffs of the valley. Orientation, spacing and persistence measurements directly issued from field and TLS data were inserted in the Matterocking calculations. TLS point clouds are much more accurate than the 1m DEM and show the overhangs of the cliffs. Accordingly, with the software Coltop 3D we developed a methodology similar to the one used with Matterocking to identify on the TLS point clouds the areas of a cliff with the highest number of failure mechanisms. Exfoliation joints are included in this stability analysis in the same way as the other joint sets, with the only difference that their orientation is parallel to the local cliff orientation and thus variable. This means that, in two separate areas of a cliff, the exfoliation joint set is taken into account with different dip direction and dip, but its effect on the stability assessment is the same. 
Areas with a high density of possible failure mechanisms are shown to be more susceptible to rockfalls, demonstrating a link between high fracture density and rockfall susceptibility. This approach enables locating the most probable future rockfall sources and provides key elements needed to evaluate the potential volume and run-out distance of rockfall blocks. This information is used to improve rockfall hazard assessment in Yosemite Valley and elsewhere.
A risk assessment method for multi-site damage
NASA Astrophysics Data System (ADS)
Millwater, Harry Russell, Jr.
This research focused on developing probabilistic methods suitable for computing small probabilities of failure, e.g., 10^-6, of structures subject to multi-site damage (MSD). MSD is defined as the simultaneous development of fatigue cracks at multiple sites in the same structural element such that the fatigue cracks may coalesce to form one large crack. MSD is modeled as an array of collinear cracks with random initial crack lengths with the centers of the initial cracks spaced uniformly apart. The data used was chosen to be representative of aluminum structures. The structure is considered failed whenever any two adjacent cracks link up. A fatigue computer model is developed that can accurately and efficiently grow a collinear array of arbitrary length cracks from initial size until failure. An algorithm is developed to compute the stress intensity factors of all cracks considering all interaction effects. The probability of failure of two to 100 cracks is studied. Lower bounds on the probability of failure are developed based upon the probability of the largest crack exceeding a critical crack size. The critical crack size is based on the initial crack size that will grow across the ligament when the neighboring crack has zero length. The probability is evaluated using extreme value theory. An upper bound is based on the probability of the maximum sum of initial cracks being greater than a critical crack size. A weakest link sampling approach is developed that can accurately and efficiently compute small probabilities of failure. This methodology is based on predicting the weakest link, i.e., the two cracks to link up first, for a realization of initial crack sizes, and computing the cycles-to-failure using these two cracks. Criteria to determine the weakest link are discussed. Probability results using the weakest link sampling method are compared to Monte Carlo-based benchmark results. 
The results indicate that very small probabilities can be computed accurately in a few minutes using a Hewlett-Packard workstation.
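The lower bound mentioned above rests on the probability that the largest of several initial cracks exceeds a critical size. As a minimal sketch, assuming the initial crack sizes are independent and identically distributed, this probability follows directly from the single-crack exceedance probability. The function name and the numbers are hypothetical, and this is the exact distribution of the maximum rather than the extreme value approximation used in the thesis.

```python
import math

def prob_largest_exceeds(p_single, n_cracks):
    """P(max of n i.i.d. crack sizes > critical size).

    p_single: probability that one initial crack exceeds the critical size.
    Computes 1 - (1 - p_single)**n_cracks in a way that stays accurate
    when p_single is very small.
    """
    if not 0.0 <= p_single < 1.0:
        raise ValueError("p_single must be in [0, 1)")
    if n_cracks < 1:
        raise ValueError("n_cracks must be at least 1")
    # log1p and expm1 avoid the cancellation that a naive 1 - (1 - p)**n
    # suffers for probabilities near 1e-6 and below.
    return -math.expm1(n_cracks * math.log1p(-p_single))

# Hypothetical case: 100 cracks, each with a 1e-8 chance of exceeding
# the critical size.
p_bound = prob_largest_exceeds(1e-8, 100)
```

For small `p_single` the result is close to `n_cracks * p_single`, which shows how quickly the bound grows with the number of crack sites.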
Accident hazard evaluation and control decisions on forested recreation sites
Lee A. Paine
1971-01-01
Accident hazard associated with trees on recreation sites is inherently concerned with probabilities. The major factors include the probabilities of mechanical failure and of target impact if failure occurs, the damage potential of the failure, and the target value. Hazard may be evaluated as the product of these factors; i.e., expected loss during the current...
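The product form of the hazard evaluation can be written out as a short calculation. This is a minimal sketch, assuming the damage potential is expressed as a fraction of the target value; the function name, parameter names, and numbers are hypothetical and only illustrate the multiplication of the four factors named above.

```python
def expected_loss(p_failure, p_impact_given_failure, damage_fraction, target_value):
    """Expected loss as the product of the four hazard factors."""
    for p in (p_failure, p_impact_given_failure, damage_fraction):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities and fractions must be in [0, 1]")
    if target_value < 0:
        raise ValueError("target_value must be non-negative")
    return p_failure * p_impact_given_failure * damage_fraction * target_value

# Hypothetical tree: 1% chance of mechanical failure, 20% chance of
# hitting the target if it fails, half the target value damaged.
loss = expected_loss(0.01, 0.2, 0.5, 10000.0)
```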
2010-09-01
The MasterNet project continued to expand in software and hardware complexity until its failure (Szilagyi, n.d.). Despite all of the issues...were used for MasterNet (Szilagyi, n.d.). Although executive management committed significant financial resources to MasterNet, Bank of America...implementation failure as well as project-management failure as a whole (Szilagyi, n.d.). The lesson learned from this vignette is the importance of setting
Fault tolerance in a supercomputer through dynamic repartitioning
Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Takken, Todd E.
2007-02-27
A multiprocessor, parallel computer is made tolerant to hardware failures by providing extra groups of redundant standby processors and by designing the system so that these extra groups of processors can be swapped with any group which experiences a hardware failure. This swapping can be under software control, thereby permitting the entire computer to sustain a hardware failure but, after swapping in the standby processors, to still appear to software as a pristine, fully functioning system.
Reliability Evaluation of Machine Center Components Based on Cascading Failure Analysis
NASA Astrophysics Data System (ADS)
Zhang, Ying-Zhi; Liu, Jin-Tong; Shen, Gui-Xiang; Long, Zhe; Sun, Shu-Guang
2017-07-01
To address the problems that, in traditional reliability evaluation of machine center components, the component reliability model exhibits deviation and the evaluation result is low because failure propagation is overlooked, a new reliability evaluation method based on cascading failure analysis and failure influenced degree assessment is proposed. A directed graph model of cascading failure among components is established according to cascading failure mechanism analysis and graph theory. The failure influenced degrees of the system components are assessed by the adjacency matrix and its transpose, combined with the PageRank algorithm. Based on the comprehensive failure probability function and the total probability formula, the inherent failure probability function is determined to realize the reliability evaluation of the system components. Finally, the method is applied to a machine center, with the following results: 1) the reliability evaluation values of the proposed method are at least 2.5% higher than those of the traditional method; 2) the difference between the comprehensive and inherent reliability of a system component is positively correlated with the failure influenced degree of that component, which provides a theoretical basis for reliability allocation of the machine center system.
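The ranking step on the cascading failure graph can be sketched with a plain power-iteration PageRank. This is a minimal illustration, assuming `adj[i][j] = 1` means a failure of component `i` can propagate to component `j`; the three-component graph, the damping factor of 0.85, and the function name are assumptions for the example, not values from the paper.

```python
def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on a 0/1 adjacency matrix (list of lists)."""
    n = len(adj)
    rank = [1.0 / n] * n
    out_deg = [sum(row) for row in adj]
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_deg[i] == 0)
        new = []
        for j in range(n):
            inflow = sum(rank[i] * adj[i][j] / out_deg[i]
                         for i in range(n) if out_deg[i] > 0)
            new.append((1.0 - damping) / n + damping * (inflow + dangling / n))
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank

# Hypothetical cascade: A -> B, A -> C, B -> C.
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
transposed = [list(col) for col in zip(*adj)]

influenced = pagerank(adj)          # high score: often hit by cascades
influencing = pagerank(transposed)  # high score: often the origin
```

Running the algorithm on the adjacency matrix scores how strongly a component is affected by others, while running it on the transpose scores how strongly it affects others, which is one way to read the use of both matrices in the abstract.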
Reliability measurement during software development. [for a multisensor tracking system
NASA Technical Reports Server (NTRS)
Hecht, H.; Sturm, W. A.; Trattner, S.
1977-01-01
During the development of data base software for a multi-sensor tracking system, reliability was measured. The failure ratio and failure rate were found to be consistent measures. Trend lines were established from these measurements that provided good visualization of the progress on the job as a whole as well as on individual modules. Over one-half of the observed failures were due to factors associated with the individual run submission rather than with the code proper. Possible application of these findings for line management, project managers, functional management, and regulatory agencies is discussed. Steps for simplifying the measurement process and for use of these data in predicting operational software reliability are outlined.
Estimating earthquake-induced failure probability and downtime of critical facilities.
Porter, Keith; Ramer, Kyle
2012-01-01
Fault trees have long been used to estimate failure risk in earthquakes, especially for nuclear power plants (NPPs). One interesting application is that one can assess and manage the probability that two facilities - a primary and backup - would be simultaneously rendered inoperative in a single earthquake. Another is that one can calculate the probabilistic time required to restore a facility to functionality, and the probability that, during any given planning period, the facility would be rendered inoperative for any specified duration. A large new peer-reviewed library of component damageability and repair-time data for the first time enables fault trees to be used to calculate the seismic risk of operational failure and downtime for a wide variety of buildings other than NPPs. With the new library, seismic risk of both the failure probability and probabilistic downtime can be assessed and managed, considering the facility's unique combination of structural and non-structural components, their seismic installation conditions, and the other systems on which the facility relies. An example is offered of real computer data centres operated by a California utility. The fault trees were created and tested in collaboration with utility operators, and the failure probability and downtime results validated in several ways.
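The primary-and-backup question can be sketched with two basic fault tree gates. This is a minimal illustration, assuming that, for one given level of shaking, component failures are independent and a facility is inoperative if any of its listed components fails; the gate functions and all probabilities are hypothetical and are not taken from the component library described above.

```python
def or_gate(probs):
    """Probability that at least one independent input event occurs."""
    survive = 1.0
    for p in probs:
        survive *= 1.0 - p
    return 1.0 - survive

def and_gate(probs):
    """Probability that all independent input events occur."""
    result = 1.0
    for p in probs:
        result *= p
    return result

# Hypothetical component failure probabilities at one shaking level.
primary_down = or_gate([0.10, 0.20])   # e.g. generator, cooling unit
backup_down = or_gate([0.05, 0.10])
both_down = and_gate([primary_down, backup_down])
```

A fuller analysis would repeat this calculation over the range of shaking intensities and weight each result by how likely that intensity is, since both facilities experience the same earthquake.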
Man-rated flight software for the F-8 DFBW program
NASA Technical Reports Server (NTRS)
Bairnsfather, R. R.
1976-01-01
The design, implementation, and verification of the flight control software used in the F-8 DFBW program are discussed. Since the DFBW utilizes an Apollo computer and hardware, the procedures, controls, and basic management techniques employed are based on those developed for the Apollo software system. Program assembly control, simulator configuration control, erasable-memory load generation, change procedures and anomaly reporting are discussed. The primary verification tools are described, as well as the program test plans and their implementation on the various simulators. Failure effects analysis and the creation of special failure generating software for testing purposes are described.
Time-dependent earthquake probabilities
Gomberg, J.; Belardinelli, M.E.; Cocco, M.; Reasenberg, P.
2005-01-01
We have attempted to provide a careful examination of a class of approaches for estimating the conditional probability of failure of a single large earthquake, particularly approaches that account for static stress perturbations to tectonic loading as in the approaches of Stein et al. (1997) and Hardebeck (2004). We have developed a general framework based on a simple, generalized rate change formulation and applied it to these two approaches to show how they relate to one another. We also have attempted to show the connection between models of seismicity rate changes applied to (1) populations of independent faults as in background and aftershock seismicity and (2) changes in estimates of the conditional probability of failure of a single fault. In the first application, the notion of failure rate corresponds to successive failures of different members of a population of faults. The latter application requires specification of some probability distribution (density function or PDF) that describes some population of potential recurrence times. This PDF may reflect our imperfect knowledge of when past earthquakes have occurred on a fault (epistemic uncertainty), the true natural variability in failure times, or some combination of both. We suggest two end-member conceptual single-fault models that may explain natural variability in recurrence times and suggest how they might be distinguished observationally. When viewed deterministically, these single-fault patch models differ significantly in their physical attributes, and when faults are immature, they differ in their responses to stress perturbations. Estimates of conditional failure probabilities effectively integrate over a range of possible deterministic fault models, usually with ranges that correspond to mature faults. Thus conditional failure probability estimates usually should not differ significantly for these models. Copyright 2005 by the American Geophysical Union.
Code of Federal Regulations, 2010 CFR
2010-01-01
... staff; (4) Committee computer, software or Internet service provider failures; (5) A committee's failure... software despite the respondent seeking technical assistance from Commission personnel and resources; (2) A... Commission's or respondent's computer systems or Internet service provider; and (3) Severe weather or other...
Two-IMU FDI performance of the sequential probability ratio test during shuttle entry
NASA Technical Reports Server (NTRS)
Rich, T. M.
1976-01-01
Performance data for the sequential probability ratio test (SPRT) during shuttle entry are presented. Current modeling constants and failure thresholds are included for the full mission 3B from entry through landing trajectory. Minimum 100 percent detection/isolation failure levels and a discussion of the effects of failure direction are presented. Finally, a limited comparison of failures introduced at trajectory initiation shows that the SPRT algorithm performs slightly worse than the data tracking test.
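For readers unfamiliar with the test named in this abstract, the sketch below shows the textbook Wald SPRT for a shift in the mean of Gaussian residuals. It is not the shuttle implementation: the abstract does not give the modeling constants or thresholds, so the function name sprt, the parameters mu0, mu1 and sigma, and the error rates alpha and beta are all assumptions made for illustration.

```python
import math

def sprt(samples, mu0, mu1, sigma, alpha, beta):
    """Textbook Wald SPRT: H0 mean=mu0 (no failure) vs H1 mean=mu1 (failure).

    alpha = tolerated false-alarm probability, beta = tolerated missed-detection
    probability. Returns (decision, number_of_samples_used).
    """
    upper = math.log((1.0 - beta) / alpha)   # crossing this accepts H1
    lower = math.log(beta / (1.0 - alpha))   # crossing this accepts H0
    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood-ratio increment for equal-variance Gaussians.
        llr += (mu1 - mu0) / sigma**2 * (x - 0.5 * (mu0 + mu1))
        if llr >= upper:
            return "failure", n
        if llr <= lower:
            return "no_failure", n
    return "undecided", n
```

The test accumulates evidence sample by sample and stops as soon as either threshold is crossed, which is why detection time depends on the failure magnitude relative to sigma.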
Medium Fidelity Simulation of Oxygen Tank Venting
NASA Technical Reports Server (NTRS)
Sweet, Adam; Kurien, James; Lau, Sonie (Technical Monitor)
2001-01-01
The item to be cleared is a medium-fidelity software simulation model of a vented cryogenic tank. Such tanks are commonly used to transport cryogenic liquids such as liquid oxygen via truck, and have appeared on liquid-fueled rockets for decades. This simulation model works with the HCC simulation system that was developed by Xerox PARC and NASA Ames Research Center. HCC has been previously cleared for distribution. When used with the HCC software, the model generates simulated readings for the tank pressure and temperature as the simulated cryogenic liquid boils off and is vented. Failures (such as a broken vent valve) can be injected into the simulation to produce readings corresponding to the failure. Release of this simulation will allow researchers to test their software diagnosis systems by attempting to diagnose the simulated failure from the simulated readings. This model does not contain any encryption software nor can it perform any control tasks that might be export controlled.
NASA Technical Reports Server (NTRS)
Bergmann, E.
1976-01-01
The current baseline method and software implementation of the space shuttle reaction control subsystem failure detection and identification (RCS FDI) system is presented. This algorithm is recommended for inclusion in the redundancy management (RM) module of the space shuttle guidance, navigation, and control system. Supporting software is presented, and recommended for inclusion in the system management (SM) and display and control (D&C) systems. RCS FDI uses data from sensors in the jets, in the manifold isolation valves, and in the RCS fuel and oxidizer storage tanks. A list of jet failures and fuel imbalance warnings is generated for use by the jet selection algorithm of the on-orbit and entry flight control systems, and to inform the crew and ground controllers of RCS failure status. Manifold isolation valve close commands are generated in the event of failed-on or leaking jets to prevent loss of large quantities of RCS fuel.
DEPEND - A design environment for prediction and evaluation of system dependability
NASA Technical Reports Server (NTRS)
Goswami, Kumar K.; Iyer, Ravishankar K.
1990-01-01
The development of DEPEND, an integrated simulation environment for the design and dependability analysis of fault-tolerant systems, is described. DEPEND models both hardware and software components at a functional level, and allows automatic failure injection to assess system performance and reliability. It relieves the user of the work needed to inject failures, maintain statistics, and output reports. The automatic failure injection scheme is geared toward evaluating a system under high stress (workload) conditions. The failures that are injected can affect both hardware and software components. To illustrate the capability of the simulator, a distributed system which employs a prediction-based, dynamic load-balancing heuristic is evaluated. Experiments were conducted to determine the impact of failures on system performance and to identify the failures to which the system is especially susceptible.
Orion Burn Management, Nominal and Response to Failures
NASA Technical Reports Server (NTRS)
Odegard, Ryan; Goodman, John L.; Barrett, Charles P.; Pohlkamp, Kara; Robinson, Shane
2016-01-01
An approach for managing Orion on-orbit burn execution is described for nominal and failure response scenarios. The burn management strategy for Orion takes into account per-burn variations in targeting, timing, and execution; crew and ground operator intervention and overrides; defined burn failure triggers and responses; and corresponding on-board software sequencing functionality. Burn-to-burn variations are managed through the identification of specific parameters that may be updated for each progressive burn. Failure triggers and automatic responses during the burn timeframe are defined to provide safety for the crew in the case of vehicle failures, along with override capabilities to ensure operational control of the vehicle. On-board sequencing software provides the timeline coordination for performing the required activities related to targeting, burn execution, and responding to burn failures.
NASA Technical Reports Server (NTRS)
Lawrence, Stella
1991-01-01
The object of this project was to develop and calibrate quantitative models for predicting the quality of software. Reliable flight and supporting ground software is a highly important factor in the successful operation of the space shuttle program. The models used in the present study consisted of SMERFS (Statistical Modeling and Estimation of Reliability Functions for Software). There are ten models in SMERFS. For a first run, the results obtained in modeling the cumulative number of failures versus execution time showed fairly good results for our data. Plots of cumulative software failures versus calendar weeks were made and the model results were compared with the historical data on the same graph. If the model agrees with actual historical behavior for a set of data then there is confidence in future predictions for this data. Considering the quality of the data, the models have given some significant results, even at this early stage. With better care in data collection, data analysis, recording of the fixing of failures and CPU execution times, the models should prove extremely helpful in making predictions regarding the future pattern of failures, including an estimate of the number of errors remaining in the software and the additional testing time required for the software quality to reach acceptable levels. It appears that there is no one 'best' model for all cases. It is for this reason that the aim of this project was to test several models. One of the recommendations resulting from this study is that great care must be taken in the collection of data. When using a model, the data should satisfy the model assumptions.
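To make the quantities mentioned in this abstract concrete (cumulative failures versus time, errors remaining, additional testing time), here is a small sketch using the Goel-Okumoto reliability growth model. That choice is an assumption: the abstract does not list which ten models SMERFS contains or which were fitted, and the parameters a (total expected faults) and b (per-fault detection rate) would in practice be estimated from the failure data rather than supplied by hand.

```python
import math

def expected_failures(a, b, t):
    """Goel-Okumoto mean value function: expected cumulative failures by time t."""
    return a * (1.0 - math.exp(-b * t))

def remaining_faults(a, b, t):
    """Expected number of faults still undetected at time t."""
    return a * math.exp(-b * t)

def time_to_reach(a, b, target_remaining):
    """Test time at which the expected remaining faults drop to target_remaining."""
    return math.log(a / target_remaining) / b
```

For example, with hypothetical values a=100 and b=0.1 per week, time_to_reach(100, 0.1, 10) gives the test duration at which about ten faults are expected to remain; subtracting the time already spent gives the additional testing time.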
Study of fault tolerant software technology for dynamic systems
NASA Technical Reports Server (NTRS)
Caglayan, A. K.; Zacharias, G. L.
1985-01-01
The major aim of this study is to investigate the feasibility of using systems-based failure detection isolation and compensation (FDIC) techniques in building fault-tolerant software and extending them, whenever possible, to the domain of software fault tolerance. First, it is shown that systems-based FDIC methods can be extended to develop software error detection techniques by using system models for software modules. In particular, it is demonstrated that systems-based FDIC techniques can yield consistency checks that are easier to implement than acceptance tests based on software specifications. Next, it is shown that systems-based failure compensation techniques can be generalized to the domain of software fault tolerance in developing software error recovery procedures. Finally, the feasibility of using fault-tolerant software in flight software is investigated. In particular, possible system and version instabilities, and functional performance degradation that may occur in N-Version programming applications to flight software are illustrated. Finally, a comparative analysis of N-Version and recovery block techniques in the context of generic blocks in flight software is presented.
Nouri Gharahasanlou, Ali; Mokhtarei, Ashkan; Khodayarei, Aliasqar; Ataei, Mohammad
2014-01-01
Evaluating and analyzing the risk in the mining industry is a new approach for improving the machinery performance. Reliability, safety, and maintenance management based on the risk analysis can enhance the overall availability and utilization of the mining technological systems. This study investigates the failure occurrence probability of the crushing and mixing bed hall department at Azarabadegan Khoy cement plant by using the fault tree analysis (FTA) method. The results of the analysis over a 200 h operating interval show that the probability of failure occurrence for the crushing, conveyor systems, and crushing and mixing bed hall department is 73, 64, and 95 percent respectively, and the conveyor belt subsystem was found to be the most probable system for failure. Finally, maintenance is proposed as a method to control and prevent the occurrence of failure. PMID:26779433
Nouri Gharahasanlou, Ali; Mokhtarei, Ashkan; Khodayarei, Aliasqar; Ataei, Mohammad
2014-04-01
Evaluating and analyzing the risk in the mining industry is a new approach for improving the machinery performance. Reliability, safety, and maintenance management based on the risk analysis can enhance the overall availability and utilization of the mining technological systems. This study investigates the failure occurrence probability of the crushing and mixing bed hall department at Azarabadegan Khoy cement plant by using the fault tree analysis (FTA) method. The results of the analysis over a 200 h operating interval show that the probability of failure occurrence for the crushing, conveyor systems, and crushing and mixing bed hall department is 73, 64, and 95 percent respectively, and the conveyor belt subsystem was found to be the most probable system for failure. Finally, maintenance is proposed as a method to control and prevent the occurrence of failure.
NASA Technical Reports Server (NTRS)
Uber, James G.
1988-01-01
Software itself is not hazardous, but since software and hardware share common interfaces there is an opportunity for software to create hazards. Further, these software systems are complex, and proven methods for the design, analysis, and measurement of software safety are not yet available. Some past software failures, future NASA software trends, software engineering methods, and tools and techniques for various software safety analyses are reviewed. Recommendations to NASA are made based on this review.
Reliability Analysis of Systems Subject to First-Passage Failure
NASA Technical Reports Server (NTRS)
Lutes, Loren D.; Sarkani, Shahram
2009-01-01
An obvious goal of reliability analysis is the avoidance of system failure. However, it is generally recognized that it is often not feasible to design a practical or useful system for which failure is impossible. Thus it is necessary to use techniques that estimate the likelihood of failure based on modeling the uncertainty about such items as the demands on and capacities of various elements in the system. This usually involves the use of probability theory, and a design is considered acceptable if it has a sufficiently small probability of failure. This report contains findings of analyses of systems subject to first-passage failure.
Understanding Acceptance of Software Metrics--A Developer Perspective
ERIC Educational Resources Information Center
Umarji, Medha
2009-01-01
Software metrics are measures of software products and processes. Metrics are widely used by software organizations to help manage projects, improve product quality and increase efficiency of the software development process. However, metrics programs tend to have a high failure rate in organizations, and developer pushback is one of the sources…
Certification Processes for Safety-Critical and Mission-Critical Aerospace Software
NASA Technical Reports Server (NTRS)
Nelson, Stacy
2003-01-01
This document is a quick reference guide with an overview of the processes required to certify safety-critical and mission-critical flight software at selected NASA centers and the FAA. Researchers and software developers can use this guide to jumpstart their understanding of how to get new or enhanced software onboard an aircraft or spacecraft. The introduction contains aerospace industry definitions of safety and safety-critical software, as well as, the current rationale for certification of safety-critical software. The Standards for Safety-Critical Aerospace Software section lists and describes current standards including NASA standards and RTCA DO-178B. The Mission-Critical versus Safety-Critical software section explains the difference between two important classes of software: safety-critical software involving the potential for loss of life due to software failure and mission-critical software involving the potential for aborting a mission due to software failure. The DO-178B Safety-critical Certification Requirements section describes special processes and methods required to obtain a safety-critical certification for aerospace software flying on vehicles under auspices of the FAA. The final two sections give an overview of the certification process used at Dryden Flight Research Center and the approval process at the Jet Propulsion Lab (JPL).
Integration and Assessment of Component Health Prognostics in Supervisory Control Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramuhalli, Pradeep; Bonebrake, Christopher A.; Dib, Gerges
Enhanced risk monitors (ERMs) for active components in advanced reactor concepts use predictive estimates of component failure to update, in real time, predictive safety and economic risk metrics. These metrics have been shown to be capable of use in optimizing maintenance scheduling and managing plant maintenance costs. Integrating this information with plant supervisory control systems increases the potential for making control decisions that utilize real-time information on component conditions. Such decision making would limit the possibility of plant operations that increase the likelihood of degrading the functionality of one or more components while maintaining the overall functionality of the plant. ERM uses sensor data for providing real-time information about equipment condition for deriving risk monitors. This information is used to estimate the remaining useful life and probability of failure of these components. By combining this information with plant probabilistic risk assessment models, predictive estimates of risk posed by continued plant operation in the presence of detected degradation may be estimated. In this paper, we describe this methodology in greater detail, and discuss its integration with a prototypic software-based plant supervisory control platform. In order to integrate these two technologies and evaluate the integrated system, software to simulate the sensor data was developed, prognostic models for feedwater valves were developed, and several use cases defined. The full paper will describe these use cases, and the results of the initial evaluation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Groth, Katrina M.; Zumwalt, Hannah Ruth; Clark, Andrew Jordan
2016-03-01
Hydrogen Risk Assessment Models (HyRAM) is a prototype software toolkit that integrates data and methods relevant to assessing the safety of hydrogen fueling and storage infrastructure. The HyRAM toolkit integrates deterministic and probabilistic models for quantifying accident scenarios, predicting physical effects, and characterizing the impact of hydrogen hazards, including thermal effects from jet fires and thermal pressure effects from deflagration. HyRAM version 1.0 incorporates generic probabilities for equipment failures for nine types of components, and probabilistic models for the impact of heat flux on humans and structures, with computationally and experimentally validated models of various aspects of gaseous hydrogen release and flame physics. This document provides an example of how to use HyRAM to conduct analysis of a fueling facility. This document will guide users through the software and how to enter and edit certain inputs that are specific to the user-defined facility. Description of the methodology and models contained in HyRAM is provided in [1]. This User’s Guide is intended to capture the main features of HyRAM version 1.0 (any HyRAM version numbered as 1.0.X.XXX). This user guide was created with HyRAM 1.0.1.798. Due to ongoing software development activities, newer versions of HyRAM may have differences from this guide.
SureTrak Probability of Impact Display
NASA Technical Reports Server (NTRS)
Elliott, John
2012-01-01
The SureTrak Probability of Impact Display software was developed for use during rocket launch operations. The software displays probability of impact information for each ship near the hazardous area during the time immediately preceding the launch of an unguided vehicle. Wallops range safety officers need to be sure that the risk to humans is below a certain threshold during each use of the Wallops Flight Facility Launch Range. Under the variable conditions that can exist at launch time, the decision to launch must be made in a timely manner to ensure a successful mission while not exceeding those risk criteria. Range safety officers need a tool that can give them the needed probability of impact information quickly, and in a format that is clearly understandable. This application is meant to fill that need. The software is a reuse of part of software developed for an earlier project: Ship Surveillance Software System (S4). The S4 project was written in C++ using Microsoft Visual Studio 6. The data structures and dialog templates from it were copied into a new application that calls the implementation of the algorithms from S4 and displays the results as needed. In the S4 software, the list of ships in the area was received from one local radar interface and from operators who entered the ship information manually. The SureTrak Probability of Impact Display application receives ship data from two local radars as well as the SureTrak system, eliminating the need for manual data entry.
NASA Technical Reports Server (NTRS)
Vitali, Roberto; Lutomski, Michael G.
2004-01-01
The National Aeronautics and Space Administration's (NASA) International Space Station (ISS) Program uses Probabilistic Risk Assessment (PRA) as part of its Continuous Risk Management Process. It is used as a decision and management support tool not only to quantify risk for specific conditions but, more importantly, to compare different operational and management options, determine the lowest risk option, and provide rationale for management decisions. This paper presents the derivation of the probability distributions used to quantify the failure rates and the probability of failures of the basic events employed in the PRA model of the ISS. The paper will show how a Bayesian approach was used with different sources of data, including the actual ISS on-orbit failures, to enhance the confidence in results of the PRA. As time progresses and more meaningful data is gathered from on-orbit failures, an increasingly accurate failure rate probability distribution for the basic events of the ISS PRA model can be obtained. The ISS PRA has been developed by mapping the ISS critical systems such as propulsion, thermal control, or power generation into event sequence diagrams and fault trees. The lowest level of indenture of the fault trees was the orbital replacement unit (ORU). The ORU level was chosen consistently with the level of statistically meaningful data that could be obtained from the aerospace industry and from the experts in the field. For example, data was gathered for the solenoid valves present in the propulsion system of the ISS. However, valves themselves are composed of parts, and the individual failure of these parts was not accounted for in the PRA model. In other words, the failure of a spring within a valve was considered a failure of the valve itself.
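One common way to combine generic industry data with observed on-orbit failures is a conjugate gamma-Poisson update of a constant failure rate. The sketch below assumes that form; the abstract says only that a Bayesian approach was used, so the prior family, the function names, and every number here are illustrative assumptions rather than the ISS PRA's actual method.

```python
def update_gamma(shape, rate, failures, exposure_hours):
    """Posterior Gamma(shape, rate) for a failure rate after observing
    `failures` events in `exposure_hours` of operation (Poisson likelihood)."""
    return shape + failures, rate + exposure_hours

def mean_rate(shape, rate):
    """Mean of a Gamma(shape, rate) distribution, in failures per hour."""
    return shape / rate

# Hypothetical generic prior for a valve: mean 5e-6 failures per hour.
prior = (0.5, 1.0e5)
# Hypothetical on-orbit evidence: 1 failure in 100,000 operating hours.
posterior = update_gamma(*prior, failures=1, exposure_hours=1.0e5)
posterior_mean = mean_rate(*posterior)
```

As exposure hours accumulate, the observed data increasingly dominate the prior, which matches the abstract's point that the distributions sharpen as more on-orbit data are gathered.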
ERIC Educational Resources Information Center
Ichu, Emmanuel A.
2010-01-01
Software quality is perhaps one of the most sought-after attributes in product development; however, this goal is unattained. Problem factors in software development and how these have affected the maintainability of the delivered software systems require a thorough investigation. It was, therefore, very important to understand software…
A Comparison of Learning Technologies for Teaching Spacecraft Software Development
ERIC Educational Resources Information Center
Straub, Jeremy
2014-01-01
The development of software for spacecraft represents a particular challenge and is, in many ways, a worst case scenario from a design perspective. Spacecraft software must be "bulletproof" and operate for extended periods of time without user intervention. If the software fails, it cannot be manually serviced. Software failure may…
Software reliability models for fault-tolerant avionics computers and related topics
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1987-01-01
Software reliability research is briefly described. General research topics are reliability growth models, quality of software reliability prediction, the complete monotonicity property of reliability growth, conceptual modelling of software failure behavior, assurance of ultrahigh reliability, and analysis techniques for fault-tolerant systems.
Study of a unified hardware and software fault-tolerant architecture
NASA Technical Reports Server (NTRS)
Lala, Jaynarayan; Alger, Linda; Friend, Steven; Greeley, Gregory; Sacco, Stephen; Adams, Stuart
1989-01-01
A unified architectural concept, called the Fault Tolerant Processor Attached Processor (FTP-AP), that can tolerate hardware as well as software faults is proposed for applications requiring ultrareliable computation capability. An emulation of the FTP-AP architecture, consisting of a breadboard Motorola 68010-based quadruply redundant Fault Tolerant Processor, four VAX 750s as attached processors, and four versions of a transport aircraft yaw damper control law, is used as a testbed in the AIRLAB to examine a number of critical issues. Solutions of several basic problems associated with N-Version software are proposed and implemented on the testbed. This includes a confidence voter to resolve coincident errors in N-Version software. A reliability model of N-Version software that is based upon the recent understanding of software failure mechanisms is also developed. The basic FTP-AP architectural concept appears suitable for hosting N-Version application software while at the same time tolerating hardware failures. Architectural enhancements for greater efficiency, software reliability modeling, and N-Version issues that merit further research are identified.
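For orientation, the simplest decision element in N-Version software is an exact-match majority voter, sketched below. This is deliberately not the confidence voter proposed in the abstract, whose design is not described there; the function name majority_vote is hypothetical, and real control-law outputs would normally be compared within a numeric tolerance rather than for equality.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of versions, else None.

    `outputs` holds one hashable result per software version.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None
```

With four versions, a 2-2 split returns None, which illustrates why coincident errors in two versions are a problem for a plain majority voter and motivate more elaborate voting schemes.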
NASA Technical Reports Server (NTRS)
McCarty, John P.; Lyles, Garry M.
1997-01-01
Propulsion system quality is defined in this paper as having high reliability, that is, quality is a high probability of within-tolerance performance or operation. Since failures are out-of-tolerance performance, the probability of failures and their occurrence is the difference between high and low quality systems. Failures can be described at 3 levels: the system failure (which is the detectable end of a failure), the failure mode (which is the failure process), and the failure cause (which is the start). Failure causes can be evaluated and classified by type. The results of typing flight history failures show that most failures are in unrecognized modes and result from human error or noise, i.e. failures are when engineers learn how things really work. Although the study is based on US launch vehicles, a sampling of failures from other countries indicates the finding has broad application. The parameters of the design of a propulsion system are not single valued, but have dispersions associated with the manufacturing of parts. Many tests are needed to find failures if the dispersions are large relative to tolerances, which could contribute to the large number of failures in unrecognized modes.
An experiment in software reliability
NASA Technical Reports Server (NTRS)
Dunham, J. R.; Pierce, J. L.
1986-01-01
The results of a software reliability experiment conducted in a controlled laboratory setting are reported. The experiment was undertaken to gather data on software failures and is one in a series of experiments being pursued by the Fault Tolerant Systems Branch of NASA Langley Research Center to find a means of credibly performing reliability evaluations of flight control software. The experiment tests a small sample of implementations of radar tracking software having ultra-reliability requirements and uses n-version programming for error detection, and repetitive run modeling for failure and fault rate estimation. The experiment results agree with those of Nagel and Skrivan in that the program error rates suggest an approximate log-linear pattern and the individual faults occurred with significantly different error rates. Additional analysis of the experimental data raises new questions concerning the phenomenon of interacting faults. This phenomenon may provide one explanation for software reliability decay.
Quantitative method of medication system interface evaluation.
Pingenot, Alleene Anne; Shanteau, James; Pingenot, James D F
2007-01-01
The objective of this study was to develop a quantitative method of evaluating the user interface for medication system software. A detailed task analysis provided a description of user goals and essential activity. A structural fault analysis was used to develop a detailed description of the system interface. Nurses experienced with use of the system under evaluation provided estimates of failure rates for each point in this simplified fault tree. Means of estimated failure rates provided quantitative data for fault analysis. Authors note that, although failures of steps in the program were frequent, participants reported numerous methods of working around these failures so that overall system failure was rare. However, frequent process failure can affect the time required for processing medications, making a system inefficient. This method of interface analysis, called Software Efficiency Evaluation and Fault Identification Method, provides quantitative information with which prototypes can be compared and problems within an interface identified.
Probability of failure prediction for step-stress fatigue under sine or random stress
NASA Technical Reports Server (NTRS)
Lambert, R. G.
1979-01-01
A previously proposed cumulative fatigue damage law is extended to predict the probability of failure or fatigue life for structural materials with S-N fatigue curves represented as a scatterband of failure points. The proposed law applies to structures subjected to sinusoidal or random stresses and includes the effect of initial crack (i.e., flaw) sizes. The corrected cycle ratio damage function is shown to have physical significance.
Practical Application of PRA as an Integrated Design Tool for Space Systems
NASA Technical Reports Server (NTRS)
Kalia, Prince; Shi, Ying; Pair, Robin; Quaney, Virginia; Uhlenbrock, John
2013-01-01
This paper presents the application of the first comprehensive Probabilistic Risk Assessment (PRA) during the design phase of a joint NASA/NOAA weather satellite program, Geostationary Operational Environmental Satellite Series R (GOES-R). GOES-R is the next generation weather satellite primarily to help understand the weather and help save human lives. PRA has been used at NASA for Human Space Flight for many years. PRA was initially adopted and implemented in the operational phase of manned space flight programs and more recently for the next generation human space systems. Since its first use at NASA, PRA has become recognized throughout the Agency as a method of assessing complex mission risks as part of an overall approach to assuring safety and mission success throughout project lifecycles. PRA is now included as a requirement during the design phase of both NASA next generation manned space vehicles as well as for high priority robotic missions. The influence of PRA on GOES-R design and operation concepts is discussed in detail. The GOES-R PRA is unique at NASA for its early implementation. It also represents a pioneering effort to integrate risks from both Spacecraft (SC) and Ground Segment (GS) to fully assess the probability of achieving mission objectives. PRA analysts were actively involved in system engineering and design engineering to ensure that a comprehensive set of technical risks were correctly identified and properly understood from a design and operations perspective. The analysis included an assessment of SC hardware and software, SC fault management system, GS hardware and software, common cause failures, human error, natural hazards, solar weather and infrastructure (such as network and telecommunications failures, fire). PRA findings directly resulted in design changes to reduce SC risk from micro-meteoroids. PRA results also led to design changes in several SC subsystems, e.g. propulsion, guidance, navigation and control (GNC), communications, mechanisms, and command and data handling (C&DH). The fault tree approach assisted in the development of the fault management system design. Human error analysis, which examined human response to failure, indicated areas where automation could reduce the overall probability of gaps in operation by half. In addition, the PRA brought to light many potential root causes of system disruptions, including earthquakes, inclement weather, solar storms, blackouts and other extreme conditions not considered in the typical reliability and availability analyses. Ultimately the PRA served to identify potential failures that, when mitigated, resulted in a more robust design, as well as to influence the program's concept of operations. The early and active integration of PRA with system and design engineering provided a well-managed approach for risk assessment that increased reliability and availability, optimized lifecycle costs, and unified the SC and GS developments.
Probabilistic analysis on the failure of reactivity control for the PWR
NASA Astrophysics Data System (ADS)
Sony Tjahyani, D. T.; Deswandri; Sunaryo, G. R.
2018-02-01
The fundamental safety functions of a power reactor are to control reactivity, to remove heat from the reactor, and to confine radioactive material. Safety analysis, carried out by deterministic and probabilistic methods, is used to ensure that each of these is fulfilled in the design. The analysis of reactivity control is important because it affects the other fundamental safety functions. The purpose of this research is to determine the failure probability of the reactivity control and its failure contribution on a PWR design. The analysis is carried out by determining the intermediate events that cause the failure of reactivity control. The basic events are then determined deductively using fault tree analysis. The AP1000 is used as the object of research. The probability data for component failure and human error used in the analysis are collected from IAEA, Westinghouse, NRC and other published documents. The results show that there are six intermediate events that can cause the failure of the reactivity control. These intermediate events are uncontrolled rod bank withdrawal at low power or full power, malfunction of boron dilution, misalignment of control rod withdrawal, improper position of a fuel assembly, and ejection of a control rod. The failure probability of reactivity control is 1.49E-03 per year. The failure causes affected by human factors are boron dilution, misalignment of control rod withdrawal, and improper position of a fuel assembly. Based on the assessment, it is concluded that the failure probability of reactivity control on the PWR is still within the IAEA criteria.
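As a rough illustration of how a fault tree combines event probabilities into a top-event probability, the following Python sketch evaluates OR and AND gates. It assumes the input events are statistically independent, and the function names and numbers are hypothetical placeholders rather than the AP1000 data used in the study.

```python
def or_gate(probabilities):
    """Probability that at least one independent input event occurs."""
    none_occur = 1.0
    for p in probabilities:
        none_occur *= 1.0 - p
    return 1.0 - none_occur


def and_gate(probabilities):
    """Probability that all independent input events occur together."""
    all_occur = 1.0
    for p in probabilities:
        all_occur *= p
    return all_occur


if __name__ == "__main__":
    # Hypothetical annual probabilities for three intermediate events.
    intermediate = [1e-4, 5e-4, 2e-4]
    print("top event (OR of intermediates):", or_gate(intermediate))
    # A redundant pair that must both fail, feeding the same OR gate.
    redundant_pair = and_gate([1e-2, 1e-2])
    print("with redundant pair:", or_gate(intermediate + [redundant_pair]))
```

For small probabilities the OR gate result is close to the simple sum of the inputs, which is why rare-event approximations are common in such analyses.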
Quantifying Safety Margin Using the Risk-Informed Safety Margin Characterization (RISMC)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grabaskas, David; Bucknor, Matthew; Brunett, Acacia
2015-04-26
The Risk-Informed Safety Margin Characterization (RISMC), developed by Idaho National Laboratory as part of the Light-Water Reactor Sustainability Project, utilizes a probabilistic safety margin comparison between a load and capacity distribution, rather than a deterministic comparison between two values, as is usually done in best-estimate plus uncertainty analyses. The goal is to determine the failure probability, or in other words, the probability of the system load equaling or exceeding the system capacity. While this method has been used in pilot studies, there has been little work conducted investigating the statistical significance of the resulting failure probability. In particular, it is difficult to determine how many simulations are necessary to properly characterize the failure probability. This work uses classical (frequentist) statistics and confidence intervals to examine the impact in statistical accuracy when the number of simulations is varied. Two methods are proposed to establish confidence intervals related to the failure probability established using a RISMC analysis. The confidence interval provides information about the statistical accuracy of the method utilized to explore the uncertainty space, and offers a quantitative method to gauge the increase in statistical accuracy due to performing additional simulations.
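The load-versus-capacity comparison and the effect of simulation count on statistical accuracy can be sketched in Python as follows. This is only an illustration: the normal load and capacity distributions are hypothetical, and the Wilson score interval is one common frequentist interval chosen here as an assumption; the abstract does not say which two methods the authors propose.

```python
import math
import random


def wilson_interval(failures, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p_hat = failures / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def simulate_failures(n, seed=0):
    """Count samples where load >= capacity (hypothetical distributions)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        load = rng.gauss(1.0, 0.2)      # assumed load distribution
        capacity = rng.gauss(1.6, 0.2)  # assumed capacity distribution
        if load >= capacity:
            failures += 1
    return failures


if __name__ == "__main__":
    for n in (1_000, 10_000, 50_000):
        k = simulate_failures(n)
        lo, hi = wilson_interval(k, n)
        print(f"n={n}: p_hat={k / n:.4f}, interval=({lo:.4f}, {hi:.4f})")
```

Running the loop with increasing `n` shows the interval narrowing, which is the quantitative gauge of added accuracy that the abstract describes.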
STS-51 pad abort. OV103-engine 2033 (ME-2) fuel flowmeter sensor open circuit
NASA Technical Reports Server (NTRS)
1993-01-01
The STS-51 initial launch attempt of Discovery (OV-103) was terminated on KSC launch pad 39B on 12 Aug. 1993 at 9:12 AM E.S.T. due to a sensor redundancy failure in the liquid hydrogen system of ME-2 (Engine 2033). The event description and time line are summarized. Propellant loading was initiated on 12 Aug. 1993 at 12:00 AM EST. All space shuttle main engine (SSME) chill parameters and Launch Commit Criteria (LCC) were nominal. At engine start plus 1.34 seconds a Failure Identification (FID) was posted against Engine 2033 for exceeding the 1800 spin intra-channel (A1-A2) Fuel Flowrate sensor channel qualification limit. The engine was shut down at 1.50 seconds followed by Engines 2032 and 2030. All shut down sequences were nominal and the mission was safely aborted. SSME Avionics hardware and software performed nominally during the incident. A review of vehicle data table (VDT) data and controller software logic revealed no failure indications other than the single FID 111-101, Fuel Flowrate Intra-Channel Test Channel A disqualification. Software logic was executed according to requirements and there was no anomalous controller software operation. Immediately following the abort, a Rocketdyne/NASA failure investigation team was assembled. The team successfully isolated the failure cause to an open circuit in a Fuel Flowrate Sensor. This type of failure has occurred eight previous times in ground testing. The sensor had performed acceptably on three previous flights of the engine and SSME flight history shows 684 combined fuel flow rate sensor channel flights without failure. The disqualification of an Engine 2 (SSME No. 2033) Fuel Flowrate sensor channel was a result of an instrumentation failure and not engine performance. All other engine operations were nominal. This disqualification resulted in an engine shutdown and safe sequential shutdown of all three engines prior to ignition of the solid boosters.
14 CFR 417.224 - Probability of failure analysis.
Code of Federal Regulations, 2013 CFR
2013-01-01
... 14 Aeronautics and Space 4 2013-01-01 2013-01-01 false Probability of failure analysis. 417.224 Section 417.224 Aeronautics and Space COMMERCIAL SPACE TRANSPORTATION, FEDERAL AVIATION ADMINISTRATION... phase of normal flight or when any anomalous condition exhibits the potential for a stage or its debris...
14 CFR 417.224 - Probability of failure analysis.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 14 Aeronautics and Space 4 2010-01-01 2010-01-01 false Probability of failure analysis. 417.224 Section 417.224 Aeronautics and Space COMMERCIAL SPACE TRANSPORTATION, FEDERAL AVIATION ADMINISTRATION... phase of normal flight or when any anomalous condition exhibits the potential for a stage or its debris...
14 CFR 417.224 - Probability of failure analysis.
Code of Federal Regulations, 2012 CFR
2012-01-01
... 14 Aeronautics and Space 4 2012-01-01 2012-01-01 false Probability of failure analysis. 417.224 Section 417.224 Aeronautics and Space COMMERCIAL SPACE TRANSPORTATION, FEDERAL AVIATION ADMINISTRATION... phase of normal flight or when any anomalous condition exhibits the potential for a stage or its debris...
14 CFR 417.224 - Probability of failure analysis.
Code of Federal Regulations, 2011 CFR
2011-01-01
... 14 Aeronautics and Space 4 2011-01-01 2011-01-01 false Probability of failure analysis. 417.224 Section 417.224 Aeronautics and Space COMMERCIAL SPACE TRANSPORTATION, FEDERAL AVIATION ADMINISTRATION... phase of normal flight or when any anomalous condition exhibits the potential for a stage or its debris...
14 CFR 417.224 - Probability of failure analysis.
Code of Federal Regulations, 2014 CFR
2014-01-01
... 14 Aeronautics and Space 4 2014-01-01 2014-01-01 false Probability of failure analysis. 417.224 Section 417.224 Aeronautics and Space COMMERCIAL SPACE TRANSPORTATION, FEDERAL AVIATION ADMINISTRATION... phase of normal flight or when any anomalous condition exhibits the potential for a stage or its debris...
Extended Testability Analysis Tool
NASA Technical Reports Server (NTRS)
Melcher, Kevin; Maul, William A.; Fulton, Christopher
2012-01-01
The Extended Testability Analysis (ETA) Tool is a software application that supports fault management (FM) by performing testability analyses on the fault propagation model of a given system. Fault management includes the prevention of faults through robust design margins and quality assurance methods, or the mitigation of system failures. Fault management requires an understanding of the system design and operation, potential failure mechanisms within the system, and the propagation of those potential failures through the system. The purpose of the ETA Tool software is to process the testability analysis results from a commercial software program called TEAMS Designer in order to provide a detailed set of diagnostic assessment reports. The ETA Tool is a command-line process with several user-selectable report output options. The ETA Tool also extends the COTS testability analysis and enables variation studies with sensor sensitivity impacts on system diagnostics and component isolation using a single testability output. The ETA Tool can also provide extended analyses from a single set of testability output files. The following analysis reports are available to the user: (1) the Detectability Report provides a breakdown of how each tested failure mode was detected, (2) the Test Utilization Report identifies all the failure modes that each test detects, (3) the Failure Mode Isolation Report demonstrates the system's ability to discriminate between failure modes, (4) the Component Isolation Report demonstrates the system's ability to discriminate between failure modes relative to the components containing the failure modes, (5) the Sensor Sensitivity Analysis Report shows the diagnostic impact due to loss of sensor information, and (6) the Effect Mapping Report identifies failure modes that result in specified system-level effects.
NASA Astrophysics Data System (ADS)
Pommatau, Gilles
2014-06-01
The present paper deals with the industrial application, via software developed by Thales Alenia Space, of a new failure criterion named "Tsai-Hill equivalent criterion" for composite structural parts of satellites. The first part of the paper briefly describes the main hypotheses of the software and its failure analysis capabilities. The second part recalls the quadratic and conservative nature of the new failure criterion, already presented at an ESA conference in a previous paper. The third part presents the statistical calculation capabilities of the software, and the associated sensitivity analysis, via results obtained on different composites. Then a methodology, proposed to customers and agencies, is presented with its limitations and advantages. It is concluded that this methodology is an efficient industrial way to perform mechanical analysis on quasi-isotropic composite parts.
NASA Technical Reports Server (NTRS)
Tischer, A. E.
1987-01-01
The failure information propagation model (FIPM) data base was developed to store and manipulate the large amount of information anticipated for the various Space Shuttle Main Engine (SSME) FIPMs. The organization and structure of the FIPM data base is described, including a summary of the data fields and key attributes associated with each FIPM data file. The menu-driven software developed to facilitate and control the entry, modification, and listing of data base records is also discussed. The transfer of the FIPM data base and software to the NASA Marshall Space Flight Center is described. Complete listings of all of the data base definition commands and software procedures are included in the appendixes.
AADL and Model-based Engineering
2014-10-20
and MBE Feiler, Oct 20, 2014 © 2014 Carnegie Mellon University We Rely on Software for Safe Aircraft Operation Embedded software systems ...D eveloper Compute Platform Runtime Architecture Application Software Embedded SW System Engineer Data Stream Characteristics Latency...confusion Hardware Engineer Why do system level failures still occur despite fault tolerance techniques being deployed in systems ? Embedded software
49 CFR Appendix C to Part 236 - Safety Assurance Criteria and Processes
Code of Federal Regulations, 2010 CFR
2010-10-01
... system (all its elements including hardware and software) must be designed to assure safe operation with... unsafe errors in the software due to human error in the software specification, design, or coding phases... (hardware or software, or both) are used in combination to ensure safety. If a common mode failure exists...
49 CFR 238.105 - Train electronic hardware and software safety.
Code of Federal Regulations, 2010 CFR
2010-10-01
... and software system safety as part of the pre-revenue service testing of the equipment. (d)(1... safely by initiating a full service brake application in the event of a hardware or software failure that... 49 Transportation 4 2010-10-01 2010-10-01 false Train electronic hardware and software safety. 238...
Sundaram, Aparna; Vaughan, Barbara; Kost, Kathryn; Bankole, Akinrinola; Finer, Lawrence; Singh, Susheela; Trussell, James
2017-03-01
Contraceptive failure rates measure a woman's probability of becoming pregnant while using a contraceptive. Information about these rates enables couples to make informed contraceptive choices. Failure rates were last estimated for 2002, and social and economic changes that have occurred since then necessitate a reestimation. To estimate failure rates for the most commonly used reversible methods in the United States, data from the 2006-2010 National Survey of Family Growth were used; some 15,728 contraceptive use intervals, contributed by 6,683 women, were analyzed. Data from the Guttmacher Institute's 2008 Abortion Patient Survey were used to adjust for abortion underreporting. Kaplan-Meier methods were used to estimate the associated single-decrement probability of failure by duration of use. Failure rates were compared with those from 1995 and 2002. Long-acting reversible contraceptives (the IUD and the implant) had the lowest failure rates of all methods (1%), while condoms and withdrawal carried the highest probabilities of failure (13% and 20%, respectively). However, the failure rate for the condom had declined significantly since 1995 (from 18%), as had the failure rate for all hormonal methods combined (from 8% to 6%). The failure rate for all reversible methods combined declined from 12% in 2002 to 10% in 2006-2010. These broad-based declines in failure rates reverse a long-term pattern of minimal change. Future research should explore what lies behind these trends, as well as possibilities for further improvements. © 2017 The Authors. Perspectives on Sexual and Reproductive Health published by Wiley Periodicals, Inc., on behalf of the Guttmacher Institute.
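The Kaplan-Meier estimate of the cumulative probability of failure by duration of use can be sketched in Python as below. This is a minimal, unweighted illustration with made-up intervals; it does not reproduce the survey weighting or the abortion underreporting adjustment described in the abstract, and the function name and data layout are assumptions.

```python
def kaplan_meier_failure(intervals):
    """Cumulative failure probability by duration.

    intervals: list of (duration_months, failed) pairs, where failed is
    True if the use interval ended in failure and False if it was censored.
    Returns a list of (duration, cumulative failure probability).
    """
    event_times = sorted({d for d, failed in intervals if failed})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d, _ in intervals if d >= t)
        events = sum(1 for d, failed in intervals if failed and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, 1.0 - survival))
    return curve


if __name__ == "__main__":
    # Hypothetical use intervals: (months of use, ended in failure?)
    data = [(3, True), (5, False), (7, True), (12, False), (12, False)]
    for months, p_fail in kaplan_meier_failure(data):
        print(f"by month {months}: P(failure) = {p_fail:.3f}")
```

Censored intervals (use that stopped for reasons other than failure) still count in the at-risk denominator up to the time they end, which is what distinguishes this estimate from a simple proportion.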
Mechanical failure probability of glasses in Earth orbit
NASA Technical Reports Server (NTRS)
Kinser, Donald L.; Wiedlocher, David E.
1992-01-01
Results of five years of earth-orbital exposure on mechanical properties of glasses indicate that radiation effects on mechanical properties of glasses, for the glasses examined, are less than the probable error of measurement. During the 5 year exposure, seven micrometeorite or space debris impacts occurred on the samples examined. These impacts occurred at locations that were not subjected to effective mechanical testing; hence only limited information on their influence upon mechanical strength was obtained. Combination of these results with micrometeorite and space debris impact frequency obtained by other experiments permits estimates of the failure probability of glasses exposed to mechanical loading under earth-orbit conditions. This probabilistic failure prediction is described and illustrated with examples.
Teaching Probability with the Support of the R Statistical Software
ERIC Educational Resources Information Center
dos Santos Ferreira, Robson; Kataoka, Verônica Yumi; Karrer, Monica
2014-01-01
The objective of this paper is to discuss aspects of high school students' learning of probability in a context where they are supported by the statistical software R. We report on the application of a teaching experiment, constructed using the perspective of Gal's probabilistic literacy and Papert's constructionism. The results show improvement…
On defense strategies for system of systems using aggregated correlations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rao, Nageswara S.; Imam, Neena; Ma, Chris Y. T.
2017-04-01
We consider a System of Systems (SoS) wherein each system S_i, i = 1, 2, ..., N, is composed of discrete cyber and physical components which can be attacked and reinforced. We characterize the disruptions using aggregate failure correlation functions given by the conditional failure probability of SoS given the failure of an individual system. We formulate the problem of ensuring the survival of SoS as a game between an attacker and a provider, each with a utility function composed of a survival probability term and a cost term, both expressed in terms of the number of components attacked and reinforced. The survival probabilities of systems satisfy simple product-form, first-order differential conditions, which simplify the Nash Equilibrium (NE) conditions. We derive the sensitivity functions that highlight the dependence of SoS survival probability at NE on cost terms, correlation functions, and individual system survival probabilities. We apply these results to a simplified model of distributed cloud computing infrastructure.
A new algorithm for finding survival coefficients employed in reliability equations
NASA Technical Reports Server (NTRS)
Bouricius, W. G.; Flehinger, B. J.
1973-01-01
Product reliabilities are predicted from past failure rates and reasonable estimate of future failure rates. Algorithm is used to calculate probability that product will function correctly. Algorithm sums the probabilities of each survival pattern and number of permutations for that pattern, over all possible ways in which product can survive.
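The idea of summing, over all surviving patterns, the pattern probability times the number of permutations of that pattern can be sketched for the simplest case. The Python example below assumes a hypothetical product of n identical, independent units that functions if at least k units survive; this k-of-n structure is an illustrative assumption and is not claimed to be the algorithm from the report.

```python
from math import comb


def k_of_n_reliability(n, k, p):
    """Probability that at least k of n independent units survive.

    Each survival pattern with j working units has probability
    p**j * (1 - p)**(n - j), and there are comb(n, j) permutations of it.
    """
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    # Triple redundancy with majority voting, unit survival probability 0.9:
    # the result is approximately 0.972.
    print(k_of_n_reliability(3, 2, 0.9))
```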
DOE Office of Scientific and Technical Information (OSTI.GOV)
Conover, W.J.; Cox, D.D.; Martz, H.F.
1997-12-01
When using parametric empirical Bayes estimation methods for estimating the binomial or Poisson parameter, the validity of the assumed beta or gamma conjugate prior distribution is an important diagnostic consideration. Chi-square goodness-of-fit tests of the beta or gamma prior hypothesis are developed for use when the binomial sample sizes or Poisson exposure times vary. Nine examples illustrate the application of the methods, using real data from such diverse applications as the loss of feedwater flow rates in nuclear power plants, the probability of failure to run on demand and the failure rates of the high pressure coolant injection systems at US commercial boiling water reactors, the probability of failure to run on demand of emergency diesel generators in US commercial nuclear power plants, the rate of failure of aircraft air conditioners, baseball batting averages, the probability of testing positive for toxoplasmosis, and the probability of tumors in rats. The tests are easily applied in practice by means of corresponding Mathematica® computer programs which are provided.
Amols, Howard I
2008-11-01
New technologies such as intensity modulated and image guided radiation therapy, computer controlled linear accelerators, record and verify systems, electronic charts, and digital imaging have revolutionized radiation therapy over the past 10-15 y. Quality assurance (QA) as historically practiced and as recommended in reports such as American Association of Physicists in Medicine Task Groups 40 and 53 needs to be updated to address the increasing complexity and computerization of radiotherapy equipment, and the increased quantity of data defining a treatment plan and treatment delivery. While new technology has reduced the probability of many types of medical events, new types of errors caused by improper use of new technology, communication failures between computers, corrupted or erroneous computer data files, and "software bugs" are now being seen. The use of computed tomography, magnetic resonance, and positron emission tomography imaging has become routine for many types of radiotherapy treatment planning, and QA for imaging modalities is beyond the expertise of most radiotherapy physicists. Errors in radiotherapy rarely result solely from hardware failures. More commonly they are a combination of computer and human errors. The increased use of radiosurgery, hypofractionation, more complex intensity modulated treatment plans, image guided radiation therapy, and increasing financial pressures to treat more patients in less time will continue to fuel this reliance on high technology and complex computer software. Clinical practitioners and regulatory agencies are beginning to realize that QA for new technologies is a major challenge and poses dangers different in nature from those that are historically familiar.
NASA/CARES dual-use ceramic technology spinoff applications
NASA Technical Reports Server (NTRS)
Powers, Lynn M.; Janosik, Lesley A.; Gyekenyesi, John P.; Nemeth, Noel N.
1994-01-01
NASA has developed software that enables American industry to establish the reliability and life of ceramic structures in a wide variety of 21st Century applications. Designing ceramic components to survive at higher temperatures than the capability of most metals and in severe loading environments involves the disciplines of statistics and fracture mechanics. Successful application of advanced ceramics depends on material properties and the use of a probabilistic brittle material design methodology. The NASA program, known as CARES (Ceramics Analysis and Reliability Evaluation of Structures), is a comprehensive general purpose design tool that predicts the probability of failure of a ceramic component as a function of its time in service. The latest version of this software, CARES/LIFE, is coupled to several commercially available finite element analysis programs (ANSYS, MSC/NASTRAN, ABAQUS, COSMOS/M, MARC), resulting in an advanced integrated design tool which is adapted to the computing environment of the user. The NASA-developed CARES software has been successfully used by industrial, government, and academic organizations to design and optimize ceramic components for many demanding applications. Industrial sectors impacted by this program include aerospace, automotive, electronic, medical, and energy applications. Dual-use applications include engine components, graphite and ceramic high temperature valves, TV picture tubes, ceramic bearings, electronic chips, glass building panels, infrared windows, radiant heater tubes, heat exchangers, and artificial hips, knee caps, and teeth.
An Empirical Approach to Analysis of Similarities between Software Failure Regions
1991-09-01
cycle costs after the software has been marketed (Alberts, 1976). Unfortunately, extensive software testing is frequently necessary in spite of...incidence is primarily syntactic. This mixing of semantic and syntactic forms in the same analysis could lead to some distortion, especially since the...of formulae to improve readability or to indicate precedence of operations. * All definitions within 'Condition I' of a failure region are assumed to
Practical, redundant, failure-tolerant, self-reconfiguring embedded system architecture
Klarer, Paul R.; Hayward, David R.; Amai, Wendy A.
2006-10-03
This invention relates to system architectures, specifically failure-tolerant and self-reconfiguring embedded system architectures. The invention provides both a method and architecture for redundancy. There can be redundancy in both software and hardware for multiple levels of redundancy. The invention provides a self-reconfiguring architecture for activating redundant modules whenever other modules fail. The architecture comprises: a communication backbone connected to two or more processors and software modules running on each of the processors. Each software module runs on one processor and resides on one or more of the other processors to be available as a backup module in the event of failure. Each module and backup module reports its status over the communication backbone. If a primary module does not report, its backup module takes over its function. If the primary module becomes available again, the backup module returns to its backup status.
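The takeover rule described above (a backup takes over when its primary stops reporting and steps back when the primary returns) can be sketched in Python. Everything in this example is a simplified assumption for illustration: the class and function names are hypothetical, status reports are modeled as a set of processors heard from in one cycle, and none of it is taken from the patent's actual implementation.

```python
class Module:
    """A software module with one primary and one backup processor."""

    def __init__(self, name, primary, backup):
        self.name = name
        self.primary = primary
        self.backup = backup
        self.active_on = primary  # where the module currently runs


def reconfigure(modules, reporting):
    """Apply the takeover rule for one reporting cycle.

    reporting: set of processor names whose status reports arrived over
    the (assumed) communication backbone during this cycle.
    """
    for m in modules:
        if m.primary in reporting:
            m.active_on = m.primary   # primary is back: backup stands down
        elif m.backup in reporting:
            m.active_on = m.backup    # primary silent: backup takes over
        else:
            m.active_on = None        # no copy reported this cycle


if __name__ == "__main__":
    mods = [Module("nav", "cpu0", "cpu1"), Module("comm", "cpu1", "cpu0")]
    for cycle in ({"cpu0", "cpu1"}, {"cpu1"}, {"cpu0", "cpu1"}):
        reconfigure(mods, cycle)
        print(sorted(cycle), {m.name: m.active_on for m in mods})
```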
Operational excellence (six sigma) philosophy: Application to software quality assurance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lackner, M.
1997-11-01
This report contains viewgraphs on operational excellence philosophy of six sigma applied to software quality assurance. This report outlines the following: goal of six sigma; six sigma tools; manufacturing vs administrative processes; Software quality assurance document inspections; map software quality assurance requirements document; failure mode effects analysis for requirements document; measuring the right response variables; and questions.
Data-Driven Decision Making as a Tool to Improve Software Development Productivity
ERIC Educational Resources Information Center
Brown, Mary Erin
2013-01-01
The worldwide software project failure rate, based on a survey of information technology software managers' views of user satisfaction, product quality, and staff productivity, is estimated to be between 24% and 36%, and software project success has not kept pace with the advances in hardware. The problem addressed by this study was the limited…
Fuzzy-information-based robustness of interconnected networks against attacks and failures
NASA Astrophysics Data System (ADS)
Zhu, Qian; Zhu, Zhiliang; Wang, Yifan; Yu, Hai
2016-09-01
Cascading failure can be fatal in applications, so its investigation is essential, and it has become a focal topic in the field of complex networks over the last decade. In this paper, a cascading failure model is established for interconnected networks and the associated data-packet transport problem is discussed. A distinguishing feature of the new model is its utilization of fuzzy information in resisting uncertain failures and malicious attacks. We find numerically that the giant component of the network after failures increases with the tolerance parameter for any coupling preference and attacking ambiguity. Moreover, considering the effect of the coupling probability on the robustness of the networks, we find that the robustness of the assortative coupling and random coupling of the network model increases with the coupling probability. However, for disassortative coupling, there exists a critical phenomenon for coupling probability. In addition, a critical value of the attacking information accuracy that affects the network robustness is observed. Finally, as a practical example, the interconnected AS-level Internet in South Korea and Japan is analyzed. The actual data validate the theoretical model and analytic results. This paper thus provides some guidelines for preventing cascading failures in the design of architecture and optimization of real-world interconnected networks.
ERIC Educational Resources Information Center
Brookhart, Susan M.; And Others
1997-01-01
Process Analysis is described as a method for identifying and measuring the probability of events that could cause the failure of a program, resulting in a cause-and-effect tree structure of events. The method is illustrated through the evaluation of a pilot instructional program at an elementary school. (SLD)
ERIC Educational Resources Information Center
Dougherty, Michael R.; Sprenger, Amber
2006-01-01
This article introduces 2 new sources of bias in probability judgment, discrimination failure and inhibition failure, which are conceptualized as arising from an interaction between error prone memory processes and a support theory like comparison process. Both sources of bias stem from the influence of irrelevant information on participants'…
A detailed description of the sequential probability ratio test for 2-IMU FDI
NASA Technical Reports Server (NTRS)
Rich, T. M.
1976-01-01
The sequential probability ratio test (SPRT) for 2-IMU FDI (inertial measuring unit failure detection/isolation) is described. The SPRT is a statistical technique for detecting and isolating soft IMU failures originally developed for the strapdown inertial reference unit. The flowchart of a subroutine incorporating the 2-IMU SPRT is included.
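For readers unfamiliar with the technique, a generic Wald sequential probability ratio test can be sketched in Python as below. The sketch assumes Gaussian residuals with known standard deviation, a zero-mean "no failure" hypothesis, and a fixed bias under the "failure" hypothesis; these modeling choices, the function name, and the thresholds are assumptions for illustration and are not taken from the report's subroutine.

```python
import math


def sprt(samples, mean_fail, sigma, alpha=0.01, beta=0.01):
    """Wald SPRT for H0: mean 0 versus H1: mean = mean_fail.

    alpha: allowed false-alarm probability; beta: allowed missed-detection
    probability. Returns (decision, number of samples used).
    """
    upper = math.log((1 - beta) / alpha)  # crossing it accepts H1 (failure)
    lower = math.log(beta / (1 - alpha))  # crossing it accepts H0 (healthy)
    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood ratio increment for N(mean_fail, sigma^2) vs N(0, sigma^2).
        llr += (mean_fail / sigma**2) * (x - mean_fail / 2)
        if llr >= upper:
            return "failure", n
        if llr <= lower:
            return "no failure", n
    return "undecided", n


if __name__ == "__main__":
    # Hypothetical residuals, e.g. the difference between two IMU outputs.
    print(sprt([1.0] * 20, mean_fail=1.0, sigma=1.0))
    print(sprt([0.0] * 20, mean_fail=1.0, sigma=1.0))
```

Because the test accumulates evidence sample by sample, a small persistent bias (a soft failure) is eventually detected even when no single sample looks abnormal.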
Optimized Vertex Method and Hybrid Reliability
NASA Technical Reports Server (NTRS)
Smith, Steven A.; Krishnamurthy, T.; Mason, B. H.
2002-01-01
A method of calculating the fuzzy response of a system is presented. This method, called the Optimized Vertex Method (OVM), is based upon the vertex method but requires considerably fewer function evaluations. The method is demonstrated by calculating the response membership function of strain-energy release rate for a bonded joint with a crack. The possibility of failure of the bonded joint was determined over a range of loads. After completing the possibilistic analysis, the possibilistic (fuzzy) membership functions were transformed to probability density functions and the probability of failure of the bonded joint was calculated. This approach is called a possibility-based hybrid reliability assessment. The possibility and probability of failure are presented and compared to a Monte Carlo Simulation (MCS) of the bonded joint.
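As background, the plain vertex method on which the OVM builds can be sketched in Python: for each alpha-cut, the function is evaluated at every combination of interval endpoints and the minimum and maximum are kept. This is the basic method, not the optimized variant from the paper, and the triangular membership functions, names, and numbers below are hypothetical.

```python
from itertools import product


def alpha_cut(triangular, alpha):
    """Interval of a triangular fuzzy number (low, mode, high) at level alpha."""
    low, mode, high = triangular
    return (low + alpha * (mode - low), high - alpha * (high - mode))


def vertex_method(func, fuzzy_inputs, alphas):
    """Response interval at each alpha level, from endpoint combinations.

    Exact only when func has no interior extrema within the cut; otherwise
    the extrema must be checked separately.
    """
    response = {}
    for a in alphas:
        cuts = [alpha_cut(t, a) for t in fuzzy_inputs]
        values = [func(*vertex) for vertex in product(*cuts)]
        response[a] = (min(values), max(values))
    return response


if __name__ == "__main__":
    # Hypothetical example: stress = load / area with fuzzy load and area.
    stress = lambda load, area: load / area
    inputs = [(90.0, 100.0, 110.0), (1.8, 2.0, 2.2)]
    for level, interval in vertex_method(stress, inputs, [0.0, 0.5, 1.0]).items():
        print(level, interval)
```

With m fuzzy inputs the plain method needs 2^m function evaluations per alpha level, which is the cost an optimized variant aims to reduce.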
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dickson, T.L.; Simonen, F.A.
1992-05-01
Probabilistic fracture mechanics analysis is a major element of comprehensive probabilistic methodology on which current NRC regulatory requirements for pressurized water reactor vessel integrity evaluation are based. Computer codes such as OCA-P and VISA-II perform probabilistic fracture analyses to estimate the increase in vessel failure probability that occurs as the vessel material accumulates radiation damage over the operating life of the vessel. The results of such analyses, when compared with limits of acceptable failure probabilities, provide an estimation of the residual life of a vessel. Such codes can be applied to evaluate the potential benefits of plant-specific mitigating actions designed to reduce the probability of failure of a reactor vessel. 10 refs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dickson, T.L.; Simonen, F.A.
1992-01-01
Probabilistic fracture mechanics analysis is a major element of comprehensive probabilistic methodology on which current NRC regulatory requirements for pressurized water reactor vessel integrity evaluation are based. Computer codes such as OCA-P and VISA-II perform probabilistic fracture analyses to estimate the increase in vessel failure probability that occurs as the vessel material accumulates radiation damage over the operating life of the vessel. The results of such analyses, when compared with limits of acceptable failure probabilities, provide an estimation of the residual life of a vessel. Such codes can be applied to evaluate the potential benefits of plant-specific mitigating actions designed to reduce the probability of failure of a reactor vessel. 10 refs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Takeda, Masatoshi; Komura, Toshiyuki; Hirotani, Tsutomu
1995-12-01
Annual failure probabilities of buildings and equipment were roughly evaluated for two fusion-reactor-like buildings, with and without seismic base isolation, in order to examine the effectiveness of the base isolation system regarding siting issues. The probabilities are calculated considering nonlinearity and rupture of isolators. While the probability of building failure for both buildings on the same site was almost equal, the function failures for equipment showed that the base-isolated building had higher reliability than the non-isolated building. Even if the base-isolated building alone is located on a higher seismic hazard area, it could compete favorably with the ordinary one in reliability of equipment.
Estimation of submarine mass failure probability from a sequence of deposits with age dates
Geist, Eric L.; Chaytor, Jason D.; Parsons, Thomas E.; ten Brink, Uri S.
2013-01-01
The empirical probability of submarine mass failure is quantified from a sequence of dated mass-transport deposits. Several different techniques are described to estimate the parameters for a suite of candidate probability models. The techniques, previously developed for analyzing paleoseismic data, include maximum likelihood and Type II (Bayesian) maximum likelihood methods derived from renewal process theory and Monte Carlo methods. The estimated mean return time from these methods, unlike estimates from a simple arithmetic mean of the center age dates and standard likelihood methods, includes the effects of age-dating uncertainty and of open time intervals before the first and after the last event. The likelihood techniques are evaluated using Akaike’s Information Criterion (AIC) and Akaike’s Bayesian Information Criterion (ABIC) to select the optimal model. The techniques are applied to mass transport deposits recorded in two Integrated Ocean Drilling Program (IODP) drill sites located in the Ursa Basin, northern Gulf of Mexico. Dates of the deposits were constrained by regional bio- and magnetostratigraphy from a previous study. Results of the analysis indicate that submarine mass failures in this location occur primarily according to a Poisson process in which failures are independent and return times follow an exponential distribution. However, some of the model results suggest that submarine mass failures may occur quasiperiodically at one of the sites (U1324). The suite of techniques described in this study provides quantitative probability estimates of submarine mass failure occurrence, for any number of deposits and age uncertainty distributions.
NASA Astrophysics Data System (ADS)
Radakovic, Nenad; McDougall, Douglas
2012-10-01
This classroom note illustrates how dynamic visualization can be used to teach conditional probability and Bayes' theorem. There are two features of the visualization that make it an ideal pedagogical tool in probability instruction. The first feature is the use of area-proportional Venn diagrams that, along with showing qualitative relationships, describe the quantitative relationship between two sets. The second feature is the slider and animation component of dynamic geometry software enabling students to observe how the change in the base rate of an event influences conditional probability. A hypothetical instructional sequence using a well-known breast cancer example is described.
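The dependence of the conditional probability on the base rate, which the slider in the visualization lets students explore, can be sketched numerically with Bayes' theorem. The sensitivity and false-positive values below are illustrative assumptions, not figures taken from the classroom note, and the function name is hypothetical.

```python
def posterior(base_rate, sensitivity, false_positive_rate):
    """P(condition | positive test) from Bayes' theorem."""
    # Total probability of a positive test (true positives + false positives).
    p_positive = sensitivity * base_rate + false_positive_rate * (1.0 - base_rate)
    return sensitivity * base_rate / p_positive

# Sweep the base rate, as the slider would, with assumed test characteristics.
for base_rate in (0.001, 0.01, 0.1):
    print(base_rate, round(posterior(base_rate, 0.8, 0.096), 3))
```

Even with a fairly sensitive test, the posterior stays small when the base rate is low, because false positives from the large unaffected group dominate the positive results.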
Skerjanc, William F.; Maki, John T.; Collin, Blaise P.; ...
2015-12-02
The success of modular high temperature gas-cooled reactors is highly dependent on the performance of the tristructural-isotopic (TRISO) coated fuel particle and the quality to which it can be manufactured. During irradiation, TRISO-coated fuel particles act as a pressure vessel to contain fission gas and mitigate the diffusion of fission products to the coolant boundary. The fuel specifications place limits on key attributes to minimize fuel particle failure under irradiation and postulated accident conditions. PARFUME (an integrated mechanistic coated particle fuel performance code developed at the Idaho National Laboratory) was used to calculate fuel particle failure probabilities. By systematically varying key TRISO-coated particle attributes, failure probability functions were developed to understand how each attribute contributes to fuel particle failure. Critical manufacturing limits were calculated for the key attributes of a low enriched TRISO-coated nuclear fuel particle with a kernel diameter of 425 μm. As a result, these critical manufacturing limits identify ranges beyond where an increase in fuel particle failure probability is expected to occur.
Lin, Chun-Li; Chang, Yen-Hsiang; Pa, Che-An
2009-10-01
This study evaluated the risk of failure for an endodontically treated premolar with mesio occlusodistal palatal (MODP) preparation and 3 different computer-aided design/computer-aided manufacturing (CAD/CAM) ceramic restoration configurations. Three 3-dimensional finite element (FE) models designed with CAD/CAM ceramic onlay, endocrown, and conventional crown restorations were constructed to perform simulations. The Weibull function was incorporated with FE analysis to calculate the long-term failure probability relative to different load conditions. The results indicated that the stress values on the enamel, dentin, and luting cement for endocrown restoration were the lowest values relative to the other 2 restorations. Weibull analysis revealed that the individual failure probability in the endocrown enamel, dentin, and luting cement obviously diminished more than those for onlay and conventional crown restorations. The overall failure probabilities were 27.5%, 1%, and 1% for onlay, endocrown, and conventional crown restorations, respectively, in normal occlusal condition. This numeric investigation suggests that endocrown and conventional crown restorations for endodontically treated premolars with MODP preparation present similar longevity.
Common-Cause Failure Treatment in Event Assessment: Basis for a Proposed New Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dana Kelly; Song-Hua Shen; Gary DeMoss
2010-06-01
Event assessment is an application of probabilistic risk assessment in which observed equipment failures and outages are mapped into the risk model to obtain a numerical estimate of the event’s risk significance. In this paper, we focus on retrospective assessments to estimate the risk significance of degraded conditions such as equipment failure accompanied by a deficiency in a process such as maintenance practices. In modeling such events, the basic events in the risk model that are associated with observed failures and other off-normal situations are typically configured to be failed, while those associated with observed successes and unchallenged components are assumed capable of failing, typically with their baseline probabilities. This is referred to as the failure memory approach to event assessment. The conditioning of common-cause failure probabilities for the common cause component group associated with the observed component failure is particularly important, as it is insufficient to simply leave these probabilities at their baseline values, and doing so may result in a significant underestimate of risk significance for the event. Past work in this area has focused on the mathematics of the adjustment. In this paper, we review the Basic Parameter Model for common-cause failure, which underlies most current risk modelling, discuss the limitations of this model with respect to event assessment, and introduce a proposed new framework for common-cause failure, which uses a Bayesian network to model underlying causes of failure, and which has the potential to overcome the limitations of the Basic Parameter Model with respect to event assessment.
Cyber-Physical Correlations for Infrastructure Resilience: A Game-Theoretic Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rao, Nageswara S; He, Fei; Ma, Chris Y. T.
In several critical infrastructures, the cyber and physical parts are correlated so that disruptions to one affect the other and hence the whole system. These correlations may be exploited to strategically launch component attacks, and hence must be accounted for in ensuring infrastructure resilience, specified by its survival probability. We characterize the cyber-physical interactions at two levels: (i) the failure correlation function specifies the conditional survival probability of cyber sub-infrastructure given the physical sub-infrastructure as a function of their marginal probabilities, and (ii) the individual survival probabilities of both sub-infrastructures are characterized by first-order differential conditions. We formulate a resilience problem for infrastructures composed of discrete components as a game between the provider and attacker, wherein their utility functions consist of an infrastructure survival probability term and a cost term expressed in terms of the number of components attacked and reinforced. We derive Nash Equilibrium conditions and sensitivity functions that highlight the dependence of infrastructure resilience on the cost term, correlation function and sub-infrastructure survival probabilities. These results generalize earlier ones based on linear failure correlation functions and independent component failures. We apply the results to models of cloud computing infrastructures and energy grids.
Structural Reliability Analysis and Optimization: Use of Approximations
NASA Technical Reports Server (NTRS)
Grandhi, Ramana V.; Wang, Liping
1999-01-01
This report is intended for the demonstration of function approximation concepts and their applicability in reliability analysis and design. Particularly, approximations in the calculation of the safety index, failure probability and structural optimization (modification of design variables) are developed. With this scope in mind, extensive details on probability theory are avoided. Definitions relevant to the stated objectives have been taken from standard text books. The idea of function approximations is to minimize the repetitive use of computationally intensive calculations by replacing them with simpler closed-form equations, which could be nonlinear. Typically, the approximations provide good accuracy around the points where they are constructed, and they need to be periodically updated to extend their utility. There are approximations in calculating the failure probability of a limit state function. The first one, which is most commonly discussed, is how the limit state is approximated at the design point. Most of the time this could be a first-order Taylor series expansion, also known as the First Order Reliability Method (FORM), or a second-order Taylor series expansion (paraboloid), also known as the Second Order Reliability Method (SORM). From the computational procedure point of view, this step comes after the design point identification; however, the order of approximation for the probability of failure calculation is discussed first, and it is denoted by either FORM or SORM. The other approximation of interest is how the design point, or the most probable failure point (MPP), is identified. For iteratively finding this point, again the limit state is approximated. 
The accuracy and efficiency of the approximations make the search process quite practical for analysis-intensive approaches such as the finite element methods; therefore, the crux of this research is to develop excellent approximations for MPP identification and also different approximations including the higher-order reliability methods (HORM) for representing the failure surface. This report is divided into several parts to emphasize different segments of the structural reliability analysis and design. Broadly, it consists of mathematical foundations, methods and applications. Chapter 1 discusses the fundamental definitions of the probability theory, which are mostly available in standard text books. Probability density function descriptions relevant to this work are addressed. In Chapter 2, the concept and utility of function approximation are discussed for a general application in engineering analysis. Various forms of function representations and the latest developments in nonlinear adaptive approximations are presented with comparison studies. Research work accomplished in reliability analysis is presented in Chapter 3. First, the definitions of safety index and most probable point of failure are introduced. Efficient ways of computing the safety index with fewer iterations are emphasized. In Chapter 4, the probability of failure prediction is presented using first-order, second-order and higher-order methods. System reliability methods are discussed in Chapter 5. Chapter 6 presents optimization techniques for the modification and redistribution of structural sizes for improving the structural reliability. The report also contains several appendices on probability parameters.
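The relation between the safety index and the failure probability in a first-order method can be illustrated with the simplest case, where no iteration is needed. The sketch below assumes a linear limit state g = R - S with independent, normally distributed resistance R and load S; the numerical values and function names are hypothetical and are not taken from the report. For this special case the first-order result is exact.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def safety_index_linear(mu_r, sigma_r, mu_s, sigma_s):
    """Safety index for g = R - S with independent normal R and S."""
    # Mean of g divided by the standard deviation of g.
    return (mu_r - mu_s) / math.hypot(sigma_r, sigma_s)

# Assumed resistance and load statistics (illustrative only).
beta = safety_index_linear(200.0, 20.0, 140.0, 15.0)
pf = std_normal_cdf(-beta)  # first-order estimate of the failure probability
print(beta, pf)
```

For nonlinear limit states or non-normal variables, the design point has to be found iteratively in standard normal space, which is where the approximations discussed in the report come in.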
Zhao, Yongli; He, Ruiying; Chen, Haoran; Zhang, Jie; Ji, Yuefeng; Zheng, Haomian; Lin, Yi; Wang, Xinbo
2014-04-21
Software defined networking (SDN) has become the focus in the current information and communication technology area because of its flexibility and programmability. It has been introduced into various network scenarios, such as datacenter networks, carrier networks, and wireless networks. Optical transport network is also regarded as an important application scenario for SDN, which is adopted as the enabling technology of data communication networks (DCN) instead of general multi-protocol label switching (GMPLS). However, the practical performance of SDN based DCN for large scale optical networks, which is very important for the technology selection in the future optical network deployment, has not been evaluated up to now. In this paper we have built a large scale flexi-grid optical network testbed with 1000 virtual optical transport nodes to evaluate the performance of SDN based DCN, including network scalability, DCN bandwidth limitation, and restoration time. A series of network performance parameters including blocking probability, bandwidth utilization, average lightpath provisioning time, and failure restoration time have been demonstrated under various network environments, such as with different traffic loads and different DCN bandwidths. The demonstration in this work can be taken as a proof for the future network deployment.
Software Assurance Curriculum Project Volume 2: Undergraduate Course Outlines
2010-08-01
Contents Acknowledgments iii Abstract v 1 An Undergraduate Curriculum Focus on Software Assurance 1 2 Computer Science I 7 3 Computer Science II...confidence that can be integrated into traditional software development and acquisition process models . Thus, in addition to a technology focus...testing throughout the software development life cycle ( SDLC ) AP Security and complexity—system development challenges: security failures
Integrated Environment for Development and Assurance
2015-01-26
Jan 26, 2015 © 2015 Carnegie Mellon University We Rely on Software for Safe Aircraft Operation Embedded software systems introduce a new class of...eveloper Compute Platform Runtime Architecture Application Software Embedded SW System Engineer Data Stream Characteristics Latency jitter affects...Why do system level failures still occur despite fault tolerance techniques being deployed in systems ? Embedded software system as major source of
Investigation of possible wellbore cement failures during hydraulic fracturing operations
Researchers used the peer-reviewed TOUGH+ geomechanics computational software and simulation system to investigate the possibility of fractures and shear failure along vertical wells during hydraulic fracturing operations.
48 CFR 1552.215-72 - Instructions for the Preparation of Proposals.
Code of Federal Regulations, 2013 CFR
2013-10-01
... of the information, to expedite review of the proposal, submit an IBM-compatible software or storage... offeror used another spreadsheet program, indicate the software program used to create this information... submission of a compatible software or device will expedite review, failure to submit a disk will not affect...
48 CFR 1552.215-72 - Instructions for the Preparation of Proposals.
Code of Federal Regulations, 2014 CFR
2014-10-01
... of the information, to expedite review of the proposal, submit an IBM-compatible software or storage... offeror used another spreadsheet program, indicate the software program used to create this information... submission of a compatible software or device will expedite review, failure to submit a disk will not affect...
Stochastic damage evolution in textile laminates
NASA Technical Reports Server (NTRS)
Dzenis, Yuris A.; Bogdanovich, Alexander E.; Pastore, Christopher M.
1993-01-01
A probabilistic model utilizing random material characteristics to predict damage evolution in textile laminates is presented. The model is based on a division of each ply into two sublaminas consisting of cells. The probability of cell failure is calculated using stochastic function theory and the maximal strain failure criterion. Three modes of failure, i.e. fiber breakage, matrix failure in the transverse direction, and matrix or interface shear cracking, are taken into account. Computed failure probabilities are utilized in reducing cell stiffness based on the mesovolume concept. A numerical algorithm is developed for predicting the damage evolution and deformation history of textile laminates. The effect of scatter of fiber orientation on cell properties is discussed. Weave influence on damage accumulation is illustrated with an example of a Kevlar/epoxy laminate.
Differential reliability : probabilistic engineering applied to wood members in bending-tension
Stanley K. Suddarth; Frank E. Woeste; William L. Galligan
1978-01-01
Reliability analysis is a mathematical technique for appraising the design and materials of engineered structures to provide a quantitative estimate of probability of failure. Two or more cases which are similar in all respects but one may be analyzed by this method; the contrast between the probabilities of failure for these cases allows strong analytical focus on the...
Fuzzy Bayesian Network-Bow-Tie Analysis of Gas Leakage during Biomass Gasification
Yan, Fang; Xu, Kaili; Yao, Xiwen; Li, Yang
2016-01-01
Biomass gasification technology has developed rapidly in recent years, but fire and poisoning accidents caused by gas leakage restrict its development and promotion. Probabilistic safety assessment (PSA) is therefore necessary for biomass gasification systems. Bayesian network-bow-tie (BN-bow-tie) analysis was proposed by mapping bow-tie analysis into a Bayesian network (BN). Causes of gas leakage and the accidents triggered by gas leakage can be obtained by bow-tie analysis, and the BN was used to identify the critical nodes of accidents by introducing three corresponding importance measures. PSA also requires definite occurrence probabilities of failure. Because failure data for biomass gasification are insufficient, the occurrence probabilities of failure that cannot be obtained from standard reliability data sources were determined by fuzzy methods based on expert judgment. An improved approach that uses expert weighting to aggregate fuzzy numbers, including triangular and trapezoidal numbers, was proposed, and the occurrence probability of failure was obtained. Finally, safety measures were indicated based on the identified critical nodes. The theoretical occurrence probabilities in one year of gas leakage and the accidents caused by it were reduced to 1/10.3 of the original values by these safety measures. PMID:27463975
Closed-form solution of decomposable stochastic models
NASA Technical Reports Server (NTRS)
Sjogren, Jon A.
1990-01-01
Markov and semi-Markov processes are increasingly being used in the modeling of complex reconfigurable systems (fault tolerant computers). The estimation of the reliability (or some measure of performance) of the system reduces to solving the process for its state probabilities. Such a model may exhibit numerous states and complicated transition distributions, contributing to an expensive and numerically delicate solution procedure. Thus, when a system exhibits a decomposition property, either structurally (autonomous subsystems), or behaviorally (component failure versus reconfiguration), it is desirable to exploit this decomposition in the reliability calculation. In interesting cases there can be failure states which arise from non-failure states of the subsystems. Equations are presented which allow the computation of failure probabilities of the total (combined) model without requiring a complete solution of the combined model. This material is presented within the context of closed-form functional representation of probabilities as utilized in the Symbolic Hierarchical Automated Reliability and Performance Evaluator (SHARPE) tool. The techniques adopted enable one to compute such probability functions for a much wider class of systems at a reduced computational cost. Several examples show how the method is used, especially in enhancing the versatility of the SHARPE tool.
Wang, Yao; Jing, Lei; Ke, Hong-Liang; Hao, Jian; Gao, Qun; Wang, Xiao-Xun; Sun, Qiang; Xu, Zhi-Jun
2016-09-20
The accelerated aging tests under electric stress for one type of LED lamp are conducted, and the differences between online and offline tests of the degradation of luminous flux are studied in this paper. The transformation of the two test modes is achieved with an adjustable AC voltage stabilized power source. Experimental results show that the exponential fitting of the luminous flux degradation in online tests possesses a higher fitting degree for most lamps, and the degradation rate of the luminous flux by online tests is always lower than that by offline tests. Bayes estimation and Weibull distribution are used to calculate the failure probabilities under the accelerated voltages, and then the reliability of the lamps under rated voltage of 220 V is estimated by use of the inverse power law model. Results show that the relative error of the lifetime estimation by offline tests increases as the failure probability decreases, and it cannot be neglected when the failure probability is less than 1%. The relative errors of lifetime estimation are 7.9%, 5.8%, 4.2%, and 3.5%, at the failure probabilities of 0.1%, 1%, 5%, and 10%, respectively.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jiangjiang; Li, Weixuan; Lin, Guang
In decision-making for groundwater management and contamination remediation, it is important to accurately evaluate the probability of the occurrence of a failure event. For small failure probability analysis, a large number of model evaluations are needed in the Monte Carlo (MC) simulation, which is impractical for CPU-demanding models. One approach to alleviate the computational cost caused by the model evaluations is to construct a computationally inexpensive surrogate model instead. However, using a surrogate approximation can cause an extra error in the failure probability analysis. Moreover, constructing accurate surrogates is challenging for high-dimensional models, i.e., models containing many uncertain input parameters. To address these issues, we propose an efficient two-stage MC approach for small failure probability analysis in high-dimensional groundwater contaminant transport modeling. In the first stage, a low-dimensional representation of the original high-dimensional model is sought with Karhunen–Loève expansion and sliced inverse regression jointly, which allows for the easy construction of a surrogate with polynomial chaos expansion. Then a surrogate-based MC simulation is implemented. In the second stage, the small number of samples that are close to the failure boundary are re-evaluated with the original model, which corrects the bias introduced by the surrogate approximation. The proposed approach is tested with a numerical case study and is shown to be 100 times faster than the traditional MC approach in achieving the same level of estimation accuracy.
Thorndahl, S; Willems, P
2008-01-01
Failure of urban drainage systems may occur due to surcharge or flooding at specific manholes in the system, or due to overflows from combined sewer systems to receiving waters. To quantify the probability or return period of failure, standard approaches make use of the simulation of design storms or long historical rainfall series in a hydrodynamic model of the urban drainage system. In this paper, an alternative probabilistic method is investigated: the first-order reliability method (FORM). To apply this method, a long rainfall time series was divided in rainstorms (rain events), and each rainstorm conceptualized to a synthetic rainfall hyetograph by a Gaussian shape with the parameters rainstorm depth, duration and peak intensity. Probability distributions were calibrated for these three parameters and used on the basis of the failure probability estimation, together with a hydrodynamic simulation model to determine the failure conditions for each set of parameters. The method takes into account the uncertainties involved in the rainstorm parameterization. Comparison is made between the failure probability results of the FORM method, the standard method using long-term simulations and alternative methods based on random sampling (Monte Carlo direct sampling and importance sampling). It is concluded that without crucial influence on the modelling accuracy, the FORM is very applicable as an alternative to traditional long-term simulations of urban drainage systems.
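The two random-sampling alternatives mentioned above, Monte Carlo direct sampling and importance sampling, can be contrasted on a toy problem. The sketch below is not the drainage model from the paper: it assumes failure means a single standard normal variable exceeding a threshold, and the function names, threshold, and sample size are hypothetical. Importance sampling draws from a normal density shifted to the threshold and reweights each sample by the ratio of the original to the shifted density.

```python
import math
import random

def direct_mc(threshold, n, rng):
    """Fraction of standard normal samples that exceed the threshold."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n

def importance_sampling(threshold, n, rng):
    """Estimate P(Z > threshold) by sampling from N(threshold, 1)."""
    total = 0.0
    for _ in range(n):
        z = rng.gauss(threshold, 1.0)
        if z > threshold:
            # Likelihood ratio phi(z) / phi(z - threshold).
            total += math.exp(-threshold * z + 0.5 * threshold**2)
    return total / n

rng = random.Random(0)
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
print(exact, direct_mc(3.0, 20000, rng), importance_sampling(3.0, 20000, rng))
```

With the same number of samples, direct sampling sees only a handful of failures at this threshold, while about half of the shifted samples fall in the failure region, so the importance sampling estimate has a much smaller variance.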
Free-Swinging Failure Tolerance for Robotic Manipulators
NASA Technical Reports Server (NTRS)
English, James
1997-01-01
Under this GSRP fellowship, software-based failure-tolerance techniques were developed for robotic manipulators. The focus was on failures characterized by the loss of actuator torque at a joint, called free-swinging failures. The research results spanned many aspects of the free-swinging failure-tolerance problem, from preparing for an expected failure to discovery of postfailure capabilities to establishing efficient methods to realize those capabilities. Developed algorithms were verified using computer-based dynamic simulations, and these were further verified using hardware experiments at Johnson Space Center.
van der Burg-de Graauw, N; Cobbaert, C M; Middelhoff, C J F M; Bantje, T A; van Guldener, C
2009-05-01
B-type natriuretic peptide (BNP) and its inactive counterpart NT-proBNP can help to identify or rule out heart failure in patients presenting with acute dyspnoea. It is not well known whether measurement of these peptides can be omitted in certain patient groups. We conducted a prospective observational study of 221 patients presenting with acute dyspnoea at the emergency department. The attending physicians estimated the probability of heart failure by clinical judgement. NT-proBNP was measured, but not reported. An independent panel made a final diagnosis of all available data including NT-proBNP level and judged whether and how NT-proBNP would have altered patient management. NT-proBNP levels were highest in patients with heart failure, alone or in combination with pulmonary failure. Additive value of NT-proBNP was present in 40 of 221 (18%) of the patients, and it mostly indicated that a more intensive treatment for heart failure would have been needed. Clinical judgement was an independent predictor of additive value of NT-proBNP with a maximum at a clinical probability of heart failure of 36%. NT-proBNP measurement has additive value in a substantial number of patients presenting with acute dyspnoea, but can possibly be omitted in patients with a clinical probability of heart failure of >70%.
Reliability-based management of buried pipelines considering external corrosion defects
NASA Astrophysics Data System (ADS)
Miran, Seyedeh Azadeh
Corrosion is one of the main deterioration mechanisms that degrade energy pipeline integrity, because pipelines carry corrosive fluid or gas and interact with a corrosive environment. Corrosion defects are usually detected by periodic inspections using in-line inspection (ILI) methods. In order to ensure pipeline safety, this study develops a cost-effective maintenance strategy that consists of three aspects: corrosion growth model development using ILI data, time-dependent performance evaluation, and optimal inspection interval determination. In particular, the proposed study is applied to a cathodically protected buried steel pipeline located in Mexico. First, a time-dependent power-law formulation is adopted to probabilistically characterize growth of the maximum depth and length of the external corrosion defects. Dependency between defect depth and length is considered in the model development, and generation of the corrosion defects over time is characterized by the homogeneous Poisson process. The unknown parameters of the growth models are evaluated based on the ILI data through the Bayesian updating method with the Markov Chain Monte Carlo (MCMC) simulation technique. The proposed corrosion growth models can be used when either matched or non-matched defects are available, and are able to consider defects newly generated since the last inspection. Results of this part of the study show that both depth and length growth models can predict damage quantities reasonably well, and a strong correlation between defect depth and length is found. Next, time-dependent system failure probabilities are evaluated using the developed corrosion growth models considering prevailing uncertainties, where three failure modes, namely small leak, large leak and rupture, are considered. Performance of the pipeline is evaluated through the failure probability per km (a sub-system), where each sub-system is considered as a series system of detected and newly generated defects within that sub-system.
Sensitivity analysis is also performed to determine to which parameter(s) in the growth models the reliability of the studied pipeline is most sensitive. The reliability analysis results suggest that newly generated defects should be considered in calculating failure probability, especially for prediction of long-term performance of the pipeline, and that the impact of the statistical uncertainty in the model parameters is significant and should be considered in the reliability analysis. Finally, with the evaluated time-dependent failure probabilities, a life-cycle cost analysis is conducted to determine the optimal inspection interval of the studied pipeline. The expected total life-cycle cost consists of the construction cost and the expected costs of inspection, repair, and failure. Repair is conducted when the failure probability from any described failure mode exceeds a pre-defined probability threshold after each inspection. Moreover, this study also investigates the impact of repair threshold values and unit costs of inspection and failure on the expected total life-cycle cost and optimal inspection interval through a parametric study. The analysis suggests that a smaller inspection interval leads to higher inspection costs but can lower failure cost, and that repair cost is less significant compared to inspection and failure costs.
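The series-system idea for a one-kilometre sub-system and the threshold-based repair rule can be sketched in a few lines. This is an illustration under a strong simplifying assumption, namely that defect failures are statistically independent, which the thesis does not state; the per-defect probabilities, the threshold, and the function names are hypothetical.

```python
import math

def series_failure_probability(defect_probs):
    """Failure probability of a series system of independent defects.

    The sub-system survives only if every defect survives.
    """
    survival = math.prod(1.0 - p for p in defect_probs)
    return 1.0 - survival

def needs_repair(defect_probs, threshold):
    """Repair rule: act when the sub-system failure probability exceeds the threshold."""
    return series_failure_probability(defect_probs) > threshold

# Assumed per-defect failure probabilities for one sub-system (illustrative only).
segment = [1e-4, 5e-4, 2e-3]
print(series_failure_probability(segment), needs_repair(segment, 1e-3))
```

Because the series-system probability can only grow as defects are added, ignoring newly generated defects would tend to understate the long-term failure probability, which is consistent with the conclusion reported above.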
Contingency Software in Autonomous Systems
NASA Technical Reports Server (NTRS)
Lutz, Robyn; Patterson-Hine, Ann
2006-01-01
This viewgraph presentation reviews the development of contingency software for autonomous systems. Autonomous vehicles currently have a limited capacity to diagnose and mitigate failures. There is a need to be able to handle a broader range of contingencies. The goals of the project are: 1. Speed up diagnosis and mitigation of anomalous situations. 2. Automatically handle contingencies, not just failures. 3. Enable projects to select a degree of autonomy consistent with their needs and to incrementally introduce more autonomy. 4. Augment on-board fault protection with verified contingency scripts.
Alani, Amir M.; Faramarzi, Asaad
2015-01-01
In this paper, a stochastic finite element method (SFEM) is employed to investigate the probability of failure of cementitious buried sewer pipes subjected to the combined effect of corrosion and stresses. A non-linear time-dependent model is used to determine the extent of concrete corrosion. Using the SFEM, the effects of different random variables, including loads, pipe material, and corrosion, on the remaining safe life of the cementitious sewer pipes are explored. A numerical example is presented to demonstrate the merit of the proposed SFEM in evaluating the effects of the contributing parameters upon the probability of failure of cementitious sewer pipes. The developed SFEM offers many advantages over traditional probabilistic techniques since it does not use any empirical equations to determine failure of pipes. The results of the SFEM can help the industries concerned (e.g., water companies) to better plan their resources by providing accurate predictions of the remaining safe life of cementitious sewer pipes. PMID:26068092
NASA Astrophysics Data System (ADS)
Zhang, H.; Guan, Z. W.; Wang, Q. Y.; Liu, Y. J.; Li, J. K.
2018-05-01
The effects of microstructure and stress ratio on high cycle fatigue of nickel superalloy Nimonic 80A were investigated. The stress ratios of 0.1, 0.5 and 0.8 were chosen to perform fatigue tests at a frequency of 110 Hz. Cleavage failure was observed, and three competing failure crack initiation modes were discovered by a scanning electron microscope, which were classified as surface without facets, surface with facets and subsurface with facets. As the stress ratio increased from 0.1 to 0.8, the occurrence probability of surface and subsurface with facets also increased and reached the maximum value at R = 0.5, while the probability of surface initiation without facets decreased. The effect of microstructure on the fatigue fracture behavior at different stress ratios was also observed and discussed. Based on the Goodman diagram, it was concluded that the fatigue strength at 50% probability of failure at R = 0.1, 0.5 and 0.8 is lower than the modified Goodman line.
Proceedings of the Center for National Software Studies Workshop on Trustworthy Software
2004-05-10
just the development cost) to achieve a sustained level of software trustworthiness. • Reforming the procurement process. We could reform the...failure or breach of security. Some examples include software used in safety systems of nuclear power plants, transportation systems, medical devices...issue in many vital systems, including those found in transportation, telecommunications, utilities, health care, and financial services. Any lack of
Deviation from Power Law Behavior in Landslide Phenomenon
NASA Astrophysics Data System (ADS)
Li, L.; Lan, H.; Wu, Y.
2013-12-01
Power law distribution of magnitude is widely observed in many natural hazards (e.g., earthquakes, floods, tornadoes, and forest fires). Landslides are unique in that their size distribution is characterized by a power law decrease with a rollover at the small-size end. Yet the emergence of the rollover, i.e., the deviation from power law behavior for small landslides, remains a mystery. In this contribution, we grouped the forces applied on landslide bodies into two categories: 1) the forces proportional to the volume of the failure mass (gravity and friction), and 2) the forces proportional to the area of the failure surface (cohesion). Failure occurs when the forces proportional to volume exceed the forces proportional to surface area. As such, given a certain mechanical configuration, the ratio of failure volume to failure surface area must exceed a corresponding threshold to guarantee a failure. Assuming all landslides share a uniform shape, which means the volume to surface area ratio increases regularly with landslide volume, a cutoff of the landslide volume distribution at the small-size end can be defined. However, in realistic landslide phenomena, where heterogeneities of landslide shape and mechanical configuration exist, a simple cutoff of the landslide volume distribution does not exist. The stochasticity of landslide shape introduces a probability distribution of the volume to surface area ratio with regard to landslide volume, with which the probability that the volume to surface area ratio exceeds the threshold can be estimated for given values of landslide volume. An experiment based on empirical data showed that this probability can cause the power law distribution of landslide volume to roll over at the small-size end.
We therefore propose that the constraints on the failure volume to failure surface area ratio, together with the heterogeneity of landslide geometry and mechanical configuration, account for the deviation from power law behavior in landslide phenomena. The figure shows that a rollover of the landslide size distribution at the small-size end is produced when the probability of V/S (the failure volume to failure surface area ratio of a landslide) exceeding the mechanical threshold is applied to the power law distribution of landslide volume.
POF-Darts: Geometric adaptive sampling for probability of failure
Ebeida, Mohamed S.; Mitchell, Scott A.; Swiler, Laura P.; ...
2016-06-18
We introduce a novel technique, POF-Darts, to estimate the Probability Of Failure based on random disk-packing in the uncertain parameter space. POF-Darts uses hyperplane sampling to explore the unexplored part of the uncertain space. We use the function evaluation at a sample point to determine whether it belongs to failure or non-failure regions, and surround it with a protection sphere region to avoid clustering. We decompose the domain into Voronoi cells around the function evaluations as seeds and choose the radius of the protection sphere depending on the local Lipschitz continuity. As sampling proceeds, regions uncovered with spheres will shrink, improving the estimation accuracy. After exhausting the function evaluation budget, we build a surrogate model using the function evaluations associated with the sample points and estimate the probability of failure by exhaustive sampling of that surrogate. In comparison to other similar methods, our algorithm has the advantages of decoupling the sampling step from the surrogate construction one, the ability to reach target POF values with fewer samples, and the capability of estimating the number and locations of disconnected failure regions, not just the POF value. Furthermore, we present various examples to demonstrate the efficiency of our novel approach.
Diverse Redundant Systems for Reliable Space Life Support
NASA Technical Reports Server (NTRS)
Jones, Harry W.
2015-01-01
Reliable life support systems are required for deep space missions. The probability of a fatal life support failure should be less than one in a thousand in a multi-year mission. It is far too expensive to develop a single system with such high reliability. Using three redundant units would require only that each have a failure probability of one in ten over the mission. Since the system development cost is inverse to the failure probability, this would cut cost by a factor of one hundred. Using replaceable subsystems instead of full systems would further cut cost. Using full sets of replaceable components improves reliability more than using complete systems as spares, since a set of components could repair many different failures instead of just one. Replaceable components would require more tools, space, and planning than full systems or replaceable subsystems. However, identical system redundancy cannot be relied on in practice. Common cause failures can disable all the identical redundant systems. Typical levels of common cause failures will defeat redundancy greater than two. Diverse redundant systems are required for reliable space life support. Three, four, or five diverse redundant systems could be needed for sufficient reliability. One system with lower level repair could be substituted for two diverse systems to save cost.
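The arithmetic behind this redundancy argument can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's model: the first function assumes fully independent unit failures (three units at 0.1 each give 0.1^3 = 0.001, and a development cost inverse to failure probability gives the ratio (1/0.001)/(1/0.1) = 100), while the second uses a simple beta-factor common-cause model. The function names and the 5% common-cause fraction are hypothetical choices made for the example.

```python
def independent_system_failure(p_unit: float, n_units: int) -> float:
    """Probability that all n redundant units fail, assuming independent failures."""
    return p_unit ** n_units


def beta_factor_system_failure(p_unit: float, n_units: int, beta: float) -> float:
    """Beta-factor sketch (an assumed model, not taken from the abstract).

    A fraction `beta` of each unit's failure probability is treated as a
    common-cause event that disables every identical unit at once; the
    remainder is treated as independent between units.
    """
    p_common = beta * p_unit
    p_indep = (1.0 - beta) * p_unit
    # The system fails if the common-cause event occurs, or if it does not
    # occur and every unit fails independently.
    return p_common + (1.0 - p_common) * p_indep ** n_units


if __name__ == "__main__":
    # Independent case from the abstract: three units at one in ten each.
    print(independent_system_failure(0.1, 3))
    # Hypothetical 5% common-cause fraction: extra identical units barely help.
    for n in (2, 3, 4, 5):
        print(n, beta_factor_system_failure(0.1, n, beta=0.05))
```

Under the assumed common-cause fraction, the system failure probability is bounded below by the common-cause term (0.005 here) however many identical units are added, which is the effect the abstract describes as defeating identical redundancy.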
Defense strategies for asymmetric networked systems under composite utilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rao, Nageswara S.; Ma, Chris Y. T.; Hausken, Kjell
We consider an infrastructure of networked systems with discrete components that can be reinforced at certain costs to guard against attacks. The communications network plays a critical, asymmetric role of providing the vital connectivity between the systems. We characterize the correlations within this infrastructure at two levels using (a) an aggregate failure correlation function that specifies the infrastructure failure probability given the failure of an individual system or network, and (b) first order differential conditions on system survival probabilities that characterize component-level correlations. We formulate an infrastructure survival game between an attacker and a provider, who attack and reinforce individual components, respectively. They use the composite utility functions composed of a survival probability term and a cost term, and the previously studied sum-form and product-form utility functions are their special cases. At Nash Equilibrium, we derive expressions for individual system survival probabilities and the expected total number of operational components. We apply and discuss these estimates for a simplified model of distributed cloud computing infrastructure.
HPC Software Stack Testing Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garvey, Cormac
The HPC Software stack testing framework (hpcswtest) is used in the INL Scientific Computing Department to test the basic sanity and integrity of the HPC Software stack (Compilers, MPI, Numerical libraries and Applications) and to quickly discover hard failures, and as a by-product it will indirectly check the HPC infrastructure (network, PBS and licensing servers).
ERIC Educational Resources Information Center
Rhein, Deborah; Alibrandi, Mary; Lyons, Mary; Sammons, Janice; Doyle, Luther
This bibliography, developed by Project RIMES (Reading Instructional Methods of Efficacy with Students) lists 80 software packages for teaching early reading and spelling to students at risk for reading and spelling failure. The software packages are presented alphabetically by title. Entries usually include a grade level indicator, a brief…
NASA Astrophysics Data System (ADS)
Wang, Qiang
2017-09-01
As an important part of software engineering, the software process determines the success or failure of a software product. The design and development features of a security software process are discussed, as are the necessity and present significance of using such a process. In coordination with the functional software, the process for security software and its testing are discussed in depth. The process includes requirement analysis, design, coding, debugging and testing, submission, and maintenance. For each phase, the paper proposes subprocesses to support software security. As an example, the paper applies the above process to a power information platform.
What Software to Use in the Teaching of Mathematical Subjects?
ERIC Educational Resources Information Center
Berežný, Štefan
2015-01-01
We can consider two basic views, when using mathematical software in the teaching of mathematical subjects. First: How to learn to use specific software for the specific tasks, e. g., software Statistica for the subjects of Applied statistics, probability and mathematical statistics, or financial mathematics. Second: How to learn to use the…
Time-dependent landslide probability mapping
Campbell, Russell H.; Bernknopf, Richard L.
1993-01-01
Case studies where time of failure is known for rainfall-triggered debris flows can be used to estimate the parameters of a hazard model in which the probability of failure is a function of time. As an example, a time-dependent function for the conditional probability of a soil slip is estimated from independent variables representing hillside morphology, approximations of material properties, and the duration and rate of rainfall. If probabilities are calculated in a GIS (geomorphic information system) environment, the spatial distribution of the result for any given hour can be displayed on a map. Although the probability levels in this example are uncalibrated, the method offers a potential for evaluating different physical models and different earth-science variables by comparing the map distribution of predicted probabilities with inventory maps for different areas and different storms. If linked with spatial and temporal socio-economic variables, this method could be used for short-term risk assessment.
Failure analysis and modeling of a multicomputer system. M.S. Thesis
NASA Technical Reports Server (NTRS)
Subramani, Sujatha Srinivasan
1990-01-01
This thesis describes the results of an extensive measurement-based analysis of real error data collected from a 7-machine DEC VaxCluster multicomputer system. In addition to evaluating basic system error and failure characteristics, we develop reward models to analyze the impact of failures and errors on the system. The results show that, although 98 percent of errors in the shared resources recover, they result in 48 percent of all system failures. The analysis of rewards shows that the expected reward rate for the VaxCluster decreases to 0.5 in 100 days for a 3-out-of-7 model, which is well over 100 times that for a 7-out-of-7 model. A comparison of the reward rates for a range of k-out-of-n models indicates that the maximum increase in reward rate (0.25) occurs in going from the 6-out-of-7 model to the 5-out-of-7 model. The analysis also shows that software errors have the lowest reward (0.2 vs. 0.91 for network errors). The large loss in reward rate for software errors is due to the fact that a large proportion (94 percent) of software errors lead to failure. In comparison, the high reward rate for network errors is due to fast recovery from a majority of these errors (median recovery duration is 0 seconds).
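For readers unfamiliar with the k-out-of-n terminology used above, the following is a minimal sketch of the textbook binomial form, in which the system is considered up when at least k of n machines are up. It assumes independent, identical machines, which the thesis's measurement-based reward models need not assume; the function name and the per-machine availability of 0.9 are hypothetical.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n machines are up.

    Assumes the machines are independent and each is up with probability r
    (a simplifying assumption made for this sketch).
    """
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Compare the k-out-of-7 configurations for an assumed availability of 0.9.
    for k in range(7, 0, -1):
        print(f"{k}-out-of-7:", k_out_of_n_reliability(k, 7, 0.9))
```

Relaxing the requirement from 7-out-of-7 toward smaller k can only raise this probability, which is the qualitative trend behind the comparison of k-out-of-n reward rates in the abstract.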
A simplified fragility analysis of fan type cable stayed bridges
NASA Astrophysics Data System (ADS)
Khan, R. A.; Datta, T. K.; Ahmad, S.
2005-06-01
A simplified fragility analysis of fan type cable stayed bridges using the Probabilistic Risk Analysis (PRA) procedure is presented for determining their failure probability under random ground motion. Seismic input to the bridge support is considered to be a risk consistent response spectrum which is obtained from a separate analysis. For the response analysis, the bridge deck is modeled as a beam supported on springs at different points. The stiffnesses of the springs are determined by a separate 2D static analysis of the cable-tower-deck system. The analysis provides a coupled stiffness matrix for the spring system. A continuum method of analysis using dynamic stiffness is used to determine the dynamic properties of the bridges. The response of the bridge deck is obtained by the response spectrum method of analysis as applied to a multi-degree-of-freedom system, which duly takes into account the quasi-static component of bridge deck vibration. The fragility analysis includes uncertainties arising due to the variation in ground motion, material property, modeling, method of analysis, ductility factor and damage concentration effect. Probability of failure of the bridge deck is determined by the First Order Second Moment (FOSM) method of reliability. A three span double plane symmetrical fan type cable stayed bridge of total span 689 m is used as an illustrative example. The fragility curves for the bridge deck failure are obtained under a number of parametric variations. Some of the important conclusions of the study indicate that (i) not only the vertical component but also the horizontal component of ground motion has considerable effect on the probability of failure; (ii) ground motion with no time lag between support excitations provides a smaller probability of failure as compared to ground motion with very large time lag between support excitations; and (iii) probability of failure may considerably increase under soft soil conditions.
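The FOSM step mentioned above can be illustrated with the simplest textbook case. The sketch below assumes a linear limit state g = R - S with uncorrelated capacity R and demand S, and converts the reliability index to a failure probability with the standard normal CDF, which is exact only if R and S are normal. The function name and the numbers are hypothetical and are not taken from the bridge study.

```python
from math import sqrt
from statistics import NormalDist


def fosm_failure_probability(mu_r, sigma_r, mu_s, sigma_s):
    """First-order second-moment estimate for the limit state g = R - S.

    Assumes R (capacity) and S (demand) are uncorrelated. Returns the
    reliability index beta and the failure probability Phi(-beta).
    """
    mu_g = mu_r - mu_s                      # mean of the safety margin
    sigma_g = sqrt(sigma_r**2 + sigma_s**2)  # standard deviation of the margin
    beta = mu_g / sigma_g
    return beta, NormalDist().cdf(-beta)


if __name__ == "__main__":
    # Hypothetical capacity and demand statistics in consistent units.
    beta, pf = fosm_failure_probability(mu_r=10.0, sigma_r=3.0, mu_s=6.0, sigma_s=4.0)
    print("reliability index:", beta)
    print("failure probability:", pf)
```

Repeating such a calculation over a range of ground motion intensities is one way a fragility curve could be traced, though the study's own formulation includes further uncertainty factors that this sketch omits.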
NASA Technical Reports Server (NTRS)
Hatfield, Glen S.; Hark, Frank; Stott, James
2016-01-01
Launch vehicle reliability analysis is largely dependent upon using predicted failure rates from data sources such as MIL-HDBK-217F. Reliability prediction methodologies based on component data do not take into account risks attributable to manufacturing, assembly, and process controls. These sources often dominate component level reliability or risk of failure probability. While consequences of failure are often understood in assessing risk, using predicted values in a risk model to estimate the probability of occurrence will likely underestimate the risk. Managers and decision makers often use the probability of occurrence in determining whether to accept the risk or require a design modification. Due to the absence of system level test and operational data inherent in aerospace applications, the actual risk threshold for acceptance may not be appropriately characterized for decision making purposes. This paper will establish a method and approach to identify the pitfalls and precautions of accepting risk based solely upon predicted failure data. This approach will provide a set of guidelines that may be useful to arrive at a more realistic quantification of risk prior to acceptance by a program.
Specifying design conservatism: Worst case versus probabilistic analysis
NASA Technical Reports Server (NTRS)
Miles, Ralph F., Jr.
1993-01-01
Design conservatism is the difference between specified and required performance, and is introduced when uncertainty is present. The classical approach of worst-case analysis for specifying design conservatism is presented, along with the modern approach of probabilistic analysis. The appropriate degree of design conservatism is a tradeoff between the required resources and the probability and consequences of a failure. A probabilistic analysis properly models this tradeoff, while a worst-case analysis reveals nothing about the probability of failure, and can significantly overstate the consequences of failure. Two aerospace examples will be presented that illustrate problems that can arise with a worst-case analysis.
NASA Technical Reports Server (NTRS)
Vesely, William E.; Colon, Alfredo E.
2010-01-01
Design Safety/Reliability is associated with the probability of no failure-causing faults existing in a design. Confidence in the non-existence of failure-causing faults is increased by performing tests with no failure. Reliability-Growth testing requirements are based on initial assurance and fault detection probability. Using binomial tables generally gives too many required tests compared to reliability-growth requirements. Reliability-Growth testing requirements are based on reliability principles and factors and should be used.
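The binomial baseline that this abstract compares against can be sketched as the classical zero-failure calculation: the smallest number of consecutive failure-free tests n such that R^n <= 1 - C, where R is the reliability to be demonstrated and C the required confidence. This is the standard binomial result only; the reliability-growth requirement itself is not given in the abstract and is not reproduced here. The function name and example values are hypothetical.

```python
from math import ceil, log


def zero_failure_tests_required(reliability: float, confidence: float) -> int:
    """Smallest n with reliability**n <= 1 - confidence.

    Classical binomial zero-failure requirement: after n tests with no
    failure, a reliability of at least `reliability` is demonstrated at
    the given confidence level.
    """
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must lie strictly between 0 and 1")
    return ceil(log(1.0 - confidence) / log(reliability))


if __name__ == "__main__":
    for r in (0.9, 0.99, 0.999):
        print(r, zero_failure_tests_required(r, confidence=0.9))
```

The test count grows roughly in proportion to 1/(1 - R), which illustrates why a purely binomial requirement becomes demanding at high reliability targets.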
Beeler, Nicholas M.; Roeloffs, Evelyn A.; McCausland, Wendy
2013-01-01
Mazzotti and Adams (2004) estimated that rapid deep slip during typically two week long episodes beneath northern Washington and southern British Columbia increases the probability of a great Cascadia earthquake by 30–100 times relative to the probability during the ∼58 weeks between slip events. Because the corresponding absolute probability remains very low at ∼0.03% per week, their conclusion is that though it is more likely that a great earthquake will occur during a rapid slip event than during other times, a great earthquake is unlikely to occur during any particular rapid slip event. This previous estimate used a failure model in which great earthquakes initiate instantaneously at a stress threshold. We refine the estimate, assuming a delayed failure model that is based on laboratory‐observed earthquake initiation. Laboratory tests show that failure of intact rock in shear and the onset of rapid slip on pre‐existing faults do not occur at a threshold stress. Instead, slip onset is gradual and shows a damped response to stress and loading rate changes. The characteristic time of failure depends on loading rate and effective normal stress. Using this model, the probability enhancement during the period of rapid slip in Cascadia is negligible (<10%) for effective normal stresses of 10 MPa or more and only increases by 1.5 times for an effective normal stress of 1 MPa. We present arguments that the hypocentral effective normal stress exceeds 1 MPa. In addition, the probability enhancement due to rapid slip extends into the interevent period. With this delayed failure model for effective normal stresses greater than or equal to 50 kPa, it is more likely that a great earthquake will occur between the periods of rapid deep slip than during them. Our conclusion is that great earthquake occurrence is not significantly enhanced by episodic deep slip events.
2014-08-01
technologies and processes to achieve a required level of confidence that software systems and services function in the intended manner. 1.3 Security Example...that took three high-voltage lines out of service and a software failure (a race condition) that disabled the computing service that notified the... service had failed. Instead of analyzing the details of the alarm server failure, the reviewers asked why the following software assurance claim had
Information Extraction for System-Software Safety Analysis: Calendar Year 2007 Year-End Report
NASA Technical Reports Server (NTRS)
Malin, Jane T.
2008-01-01
This annual report describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis on the models to identify possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations; 4) perform discrete-time-based simulation on the models to investigate scenarios where these paths may play a role in failures and mishaps; and 5) identify resulting candidate scenarios for software integration testing. This paper describes new challenges in a NASA abort system case, and enhancements made to develop the integrated tool set.
A Comparison of Two Approaches to Safety Analysis Based on Use Cases
NASA Astrophysics Data System (ADS)
Stålhane, Tor; Sindre, Guttorm
Engineering has a long tradition in analyzing the safety of mechanical, electrical and electronic systems. Important methods like HazOp and FMEA have also been adopted by the software engineering community. The misuse case method, on the other hand, has been developed by the software community as an alternative to FMEA and preliminary HazOp for software development. To compare the two methods, misuse case and FMEA, we have run a small experiment involving 42 third year software engineering students. In the experiment, the students were asked to identify and analyze failure modes from one of the use cases for a commercial electronic patient journals system. The results of the experiment show that, on average, the group that used misuse cases identified and analyzed more user related failure modes than the persons using FMEA. In addition, the persons who used the misuse cases scored better on perceived ease of use and intention to use.
The assessment of low probability containment failure modes using dynamic PRA
NASA Astrophysics Data System (ADS)
Brunett, Acacia Joann
Although low probability containment failure modes in nuclear power plants may lead to large releases of radioactive material, these modes are typically crudely modeled in system level codes and have large associated uncertainties. Conventional risk assessment techniques (i.e. the fault-tree/event-tree methodology) are capable of accounting for these failure modes to some degree, however, they require the analyst to pre-specify the ordering of events, which can vary within the range of uncertainty of the phenomena. More recently, dynamic probabilistic risk assessment (DPRA) techniques have been developed which remove the dependency on the analyst. Through DPRA, it is now possible to perform a mechanistic and consistent analysis of low probability phenomena, with the timing of the possible events determined by the computational model simulating the reactor behavior. The purpose of this work is to utilize DPRA tools to assess low probability containment failure modes and the driving mechanisms. Particular focus is given to the risk-dominant containment failure modes considered in NUREG-1150, which has long been the standard for PRA techniques. More specifically, this work focuses on the low probability phenomena occurring during a station blackout (SBO) with late power recovery in the Zion Nuclear Power Plant, a Westinghouse pressurized water reactor (PWR). Subsequent to the major risk study performed in NUREG-1150, significant experimentation and modeling regarding the mechanisms driving containment failure modes have been performed. In light of this improved understanding, NUREG-1150 containment failure modes are reviewed in this work using the current state of knowledge. For some unresolved mechanisms, such as containment loading from high pressure melt ejection and combustion events, additional analyses are performed using the accident simulation tool MELCOR to explore the bounding containment loads for realistic scenarios. 
A dynamic treatment in the characterization of combustible gas ignition is also presented in this work. In most risk studies, combustion is treated simplistically in that it is assumed an ignition occurs if the gas mixture achieves a concentration favorable for ignition under the premise that an adequate ignition source is available. However, the criteria affecting ignition (such as the magnitude, location and frequency of the ignition sources) are complicated. This work demonstrates a technique for characterizing the properties of an ignition source to determine a probability of ignition. The ignition model developed in this work and implemented within a dynamic framework is utilized to analyze the implications and risk significance of late combustion events. This work also explores the feasibility of using dynamic event trees (DETs) with a deterministic sampling approach to analyze low probability phenomena. The flexibility of this approach is demonstrated through the rediscretization of containment fragility curves used in construction of the DET to show convergence to a true solution. Such a rediscretization also reduces the computational burden introduced through extremely fine fragility curve discretization by subsequent refinement of fragility curve regions of interest. Another advantage of the approach is the ability to perform sensitivity studies on the cumulative distribution functions (CDFs) used to determine branching probabilities without the need for rerunning the simulation code. Through review of the NUREG-1150 containment failure modes using the current state of knowledge, it is found that some failure modes, such as Alpha and rocket, can be excluded from further studies; other failure modes, such as failure to isolate, bypass, high pressure melt ejection (HPME), combustion-induced failure and overpressurization are still concerns to varying degrees. 
As part of this analysis, scoping studies performed in MELCOR show that HPME and the resulting direct containment heating (DCH) do not impose a significant threat to containment integrity. Additional scoping studies regarding the effect of recovery actions on in-vessel hydrogen generation show that reflooding a partially degraded core does not significantly affect hydrogen generation in-vessel, and the NUREG-1150 assumption that insufficient hydrogen is generated in-vessel to produce an energetic deflagration is confirmed. The DET analyses performed in this work show that very late power recovery produces the potential for very energetic combustion events which are capable of failing containment with a non-negligible probability, and that containment cooling systems have a significant impact on core concrete attack, and therefore combustible gas generation ex-vessel. Ultimately, the overall risk of combustion-induced containment failure is low, but its conditional likelihood can have a significant effect on accident mitigation strategies. It is also shown in this work that DETs are particularly well suited to examine low probability events because of their ability to rediscretize CDFs and observe solution convergence.
Progressive retry for software error recovery in distributed systems
NASA Technical Reports Server (NTRS)
Wang, Yi-Min; Huang, Yennun; Fuchs, W. K.
1993-01-01
In this paper, we describe a method of execution retry for bypassing software errors based on checkpointing, rollback, message reordering and replaying. We demonstrate how rollback techniques, previously developed for transient hardware failure recovery, can also be used to recover from software faults by exploiting message reordering to bypass software errors. Our approach intentionally increases the degree of nondeterminism and the scope of rollback when a previous retry fails. Examples from our experience with telecommunications software systems illustrate the benefits of the scheme.
Probability techniques for reliability analysis of composite materials
NASA Technical Reports Server (NTRS)
Wetherhold, Robert C.; Ucci, Anthony M.
1994-01-01
Traditional design approaches for composite materials have employed deterministic criteria for failure analysis. New approaches are required to predict the reliability of composite structures since strengths and stresses may be random variables. This report will examine and compare methods used to evaluate the reliability of composite laminae. The two types of methods that will be evaluated are fast probability integration (FPI) methods and Monte Carlo methods. In these methods, reliability is formulated as the probability that an explicit function of random variables is less than a given constant. Using failure criteria developed for composite materials, a function of design variables can be generated which defines a 'failure surface' in probability space. A number of methods are available to evaluate the integration over the probability space bounded by this surface; this integration delivers the required reliability. The methods which will be evaluated are: the first order, second moment FPI methods; second order, second moment FPI methods; the simple Monte Carlo; and an advanced Monte Carlo technique which utilizes importance sampling. The methods are compared for accuracy, efficiency, and for the conservatism of the reliability estimation. The methodology involved in determining the sensitivity of the reliability estimate to the design variables (strength distributions) and importance factors is also presented.
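The contrast between simple Monte Carlo and importance sampling described above can be sketched on a toy problem. The example below is not the report's composite failure criterion: it assumes a hypothetical limit state g = strength - stress with independent normal variables, and an importance-sampling density obtained by shifting both means to the point on the failure surface, so the estimate can be checked against the closed-form answer. All names and numbers are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def simple_monte_carlo(limit_state, dists, n, rng):
    """Fraction of samples drawn from the true densities with limit_state(x) < 0."""
    failures = 0
    for _ in range(n):
        x = [rng.gauss(d.mean, d.stdev) for d in dists]
        if limit_state(x) < 0.0:
            failures += 1
    return failures / n


def importance_sampling(limit_state, dists, shifted, n, rng):
    """Sample from the shifted densities and reweight failures by f(x)/h(x)."""
    total = 0.0
    for _ in range(n):
        x = [rng.gauss(h.mean, h.stdev) for h in shifted]
        if limit_state(x) < 0.0:
            weight = 1.0
            for xi, f, h in zip(x, dists, shifted):
                weight *= f.pdf(xi) / h.pdf(xi)
            total += weight
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical strength and stress distributions (independent normals).
    dists = [NormalDist(10.0, 1.0), NormalDist(6.0, 1.0)]
    # Assumed sampling densities centred where strength equals stress.
    shifted = [NormalDist(8.0, 1.0), NormalDist(8.0, 1.0)]
    g = lambda x: x[0] - x[1]
    print("simple MC:          ", simple_monte_carlo(g, dists, 20000, rng))
    print("importance sampling:", importance_sampling(g, dists, shifted, 20000, rng))
    print("closed form:        ", NormalDist(4.0, sqrt(2.0)).cdf(0.0))
```

Because roughly half of the shifted samples fall in the failure region, the importance-sampling estimate typically needs far fewer samples than simple Monte Carlo for the same accuracy, at the price of having to choose a suitable sampling density.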
Free-Swinging Failure Tolerance for Robotic Manipulators. Degree awarded by Purdue Univ.
NASA Technical Reports Server (NTRS)
English, James
1997-01-01
Under this GSRP fellowship, software-based failure-tolerance techniques were developed for robotic manipulators. The focus was on failures characterized by the loss of actuator torque at a joint, called free-swinging failures. The research results spanned many aspects of the free-swinging failure-tolerance problem, from preparing for an expected failure to discovery of postfailure capabilities to establishing efficient methods to realize those capabilities. Developed algorithms were verified using computer-based dynamic simulations, and these were further verified using hardware experiments at Johnson Space Center.
1981-05-15
Crane. is capable of imagining unicorns -- and we expect he is -- why does he find it relatively difficult to imagine himself avoiding a 30 minute...probability that the plan will succeed and to evaluate the risk of various causes of failure . We have suggested that the construction of scenarios is...expect that events will unfold as planned. However, the cumulative probability of at least one fatal failure could be overwhelmingly high even when
1986-04-07
34 Blackhol -" * Success/failure is too clear cut * The probability of failure is greater than the probability of success The Job Itself (59) • Does not...indeed, it is not -- or as one officer in the survey commented "a blackhole." USAHEC is a viable career opportunity; it is career enhancing; and
VHSIC/VHSIC-Like Reliability Prediction Modeling
1989-10-01
prediction would require knowledge of event statistics as well as device robustness. Additionally, although this is primarily a theoretical, bottom...Degradation in Section 5.3 P = Power PDIP = Plastic DIP P(f) = Probability of Failure due to EOS or ESD P(f|c) = Probability of Failure given Contact from an...the results of those stresses: Device Stress Part Number Power Dissipation Manufacturer Test Type Part Description Junction Temperature Package Type
Will They Report It? Ethical Attitude of Graduate Software Engineers in Reporting Bad News
ERIC Educational Resources Information Center
Sajeev, A. S. M.; Crnkovic, Ivica
2012-01-01
Hiding critical information has resulted in disastrous failures of some major software projects. This paper investigates, using a subset of Keil's test, how graduates (70% of them with work experience) from different cultural backgrounds who are enrolled in a postgraduate course on global software development would handle negative information that…
Lodi, Sara; Phillips, Andrew; Fidler, Sarah; Hawkins, David; Gilson, Richard; McLean, Ken; Fisher, Martin; Post, Frank; Johnson, Anne M.; Walker-Nthenda, Louise; Dunn, David; Porter, Kholoud
2013-01-01
Background: The development of HIV drug resistance and subsequent virological failure are often cited as potential disadvantages of early cART initiation. However, their long-term probability is not known, and neither is the role of duration of infection at the time of initiation. Methods: Patients enrolled in the UK Register of HIV seroconverters were followed-up from cART initiation to last HIV-RNA measurement. Through survival analysis we examined predictors of virologic failure (2 HIV-RNA ≥400 c/l while on cART) including CD4 count and HIV duration at initiation. We also estimated the cumulative probabilities of failure and drug resistance (from the available HIV nucleotide sequences) for early initiators (cART within 12 months of seroconversion). Results: Of 1075 starting cART at a median (IQR) CD4 count 272 (190,370) cells/mm3 and HIV duration 3 (1,6) years, virological failure occurred in 163 (15%). Higher CD4 count at initiation, but not HIV infection duration at cART initiation, was independently associated with lower risk of failure (p=0.033 and 0.592 respectively). Among 230 patients initiating cART early, 97 (42%) discontinued it after a median of 7 months; cumulative probabilities of resistance and failure by 8 years were 7% (95% CI 4,11) and 19% (13,25), respectively. Conclusion: Although the rate of discontinuation of early cART in our cohort was high, the long-term rate of virological failure was low. Our data do not support early cART initiation being associated with increased risk of failure and drug resistance. PMID:24086588
NASA Astrophysics Data System (ADS)
Papaioannou, Athanasios; Mavromichalaki, Helen; Souvatzoglou, George; Paschalis, Pavlos; Sarlanis, Christos; Dimitroulakos, John; Gerontidou, Maria
2013-04-01
High-energy particles released at the Sun during a solar flare or a very energetic coronal mass ejection result in a significant intensity increase in neutron monitor measurements, known as Ground Level Enhancements (GLEs). Due to their space weather impact (i.e. risks and failures at communication and navigation systems, spacecraft electronics and operations, space power systems, manned space missions, and commercial aircraft operations) it is crucial to establish a real-time operational system that would be in place to issue reliable and timely GLE Alerts. Currently, the Cosmic Ray group of the National and Kapodistrian University of Athens is working towards the establishment of a Neutron Monitor Service that will be made available via the Space Weather Portal operated by the European Space Agency (ESA), under the Space Situational Awareness (SSA) Program. To this end, a web interface providing data from multiple Neutron Monitor stations as well as an upgraded GLE Alert will be provided. Both services are now under testing and validation and they will probably enter an operational phase next year. The core of this Neutron Monitor Service is the GLE Alert software, and therefore, the main goal of this research effort is to upgrade the existing GLE Alert software, to minimize the probability of a false alarm and to enhance the usability of the corresponding results. The ESA Neutron Monitor Service is building upon the infrastructure made available with the implementation of the High-Resolution Neutron Monitor Database (NMDB). In this work the structure of the Neutron Monitor Service for ESA SSA Program and the impact of the novel GLE Alert Service that will be made available to future users via ESA SSA web portal will be presented and further discussed.
Holbrook, Christopher M.; Perry, Russell W.; Brandes, Patricia L.; Adams, Noah S.
2013-01-01
In telemetry studies, premature tag failure causes negative bias in fish survival estimates because tag failure is interpreted as fish mortality. We used mark-recapture modeling to adjust estimates of fish survival for a previous study where premature tag failure was documented. High rates of tag failure occurred during the Vernalis Adaptive Management Plan’s (VAMP) 2008 study to estimate survival of fall-run Chinook salmon (Oncorhynchus tshawytscha) during migration through the San Joaquin River and Sacramento-San Joaquin Delta, California. Due to a high rate of tag failure, the observed travel time distribution was likely negatively biased, resulting in an underestimate of tag survival probability in this study. Consequently, the bias-adjustment method resulted in only a small increase in estimated fish survival when the observed travel time distribution was used to estimate the probability of tag survival. Since the bias-adjustment failed to remove bias, we used historical travel time data and conducted a sensitivity analysis to examine how fish survival might have varied across a range of tag survival probabilities. Our analysis suggested that fish survival estimates were low (95% confidence bounds range from 0.052 to 0.227) over a wide range of plausible tag survival probabilities (0.48–1.00), and this finding is consistent with other studies in this system. When tags fail at a high rate, available methods to adjust for the bias may perform poorly. Our example highlights the importance of evaluating the tag life assumption during survival studies, and presents a simple framework for evaluating adjusted survival estimates when auxiliary travel time data are available.
Performance analysis of the word synchronization properties of the outer code in a TDRSS decoder
NASA Technical Reports Server (NTRS)
Costello, D. J., Jr.; Lin, S.
1984-01-01
A self-synchronizing coding scheme for NASA's TDRSS satellite system is a concatenation of a (2,1,7) inner convolutional code with a (255,223) Reed-Solomon outer code. Both symbol and word synchronization are achieved without requiring that any additional symbols be transmitted. An important parameter which determines the performance of the word sync procedure is the ratio of the decoding failure probability to the undetected error probability. Ideally, the former should be as small as possible compared to the latter when the error correcting capability of the code is exceeded. A computer simulation of a (255,223) Reed-Solomon code was carried out. Results for decoding failure probability and for undetected error probability are tabulated and compared.
Gaussian process surrogates for failure detection: A Bayesian experimental design approach
NASA Astrophysics Data System (ADS)
Wang, Hongqiao; Lin, Guang; Li, Jinglai
2016-05-01
An important task of uncertainty quantification is to identify the probability of undesired events, in particular, system failures, caused by various sources of uncertainties. In this work we consider the construction of Gaussian process surrogates for failure detection and failure probability estimation. In particular, we consider the situation that the underlying computer models are extremely expensive, and in this setting, determining the sampling points in the state space is of essential importance. We formulate the problem as an optimal experimental design for Bayesian inferences of the limit state (i.e., the failure boundary) and propose an efficient numerical scheme to solve the resulting optimization problem. In particular, the proposed limit-state inference method is capable of determining multiple sampling points at a time, and thus it is well suited for problems where multiple computer simulations can be performed in parallel. The accuracy and performance of the proposed method are demonstrated by both academic and practical examples.
The influence of microstructure on the probability of early failure in aluminum-based interconnects
NASA Astrophysics Data System (ADS)
Dwyer, V. M.
2004-09-01
For electromigration in short aluminum interconnects terminated by tungsten vias, the well known "short-line" effect applies. In a similar manner, for longer lines, early failure is determined by a critical value Lcrit for the length of polygranular clusters. Any cluster shorter than Lcrit is "immortal" on the time scale of early failure where the figure of merit is not the standard t50 value (the time to 50% failures), but rather the total probability of early failure, Pcf. Pcf is a complex function of current density, linewidth, line length, and material properties (the median grain size d50 and grain size shape factor σd). It is calculated here using a model based around the theory of runs, which has proved itself to be a useful tool for assessing the probability of extreme events. Our analysis shows that Pcf is strongly dependent on σd, and a change in σd from 0.27 to 0.5 can cause an order of magnitude increase in Pcf under typical test conditions. This has implications for the web-based two-dimensional grain-growth simulator MIT/EmSim, which generates grain patterns with σd=0.27, while typical as-patterned structures are better represented by a σd in the range 0.4 - 0.6. The simulator will consequently overestimate interconnect reliability due to this particular electromigration failure mode.
Product-oriented Software Certification Process for Software Synthesis
NASA Technical Reports Server (NTRS)
Nelson, Stacy; Fischer, Bernd; Denney, Ewen; Schumann, Johann; Richardson, Julian; Oh, Phil
2004-01-01
The purpose of this document is to propose a product-oriented software certification process to facilitate use of software synthesis and formal methods. Why is such a process needed? Currently, software is tested until deemed bug-free rather than proving that certain software properties exist. This approach has worked well in most cases, but unfortunately, deaths still occur due to software failure. Using formal methods (techniques from logic and discrete mathematics like set theory, automata theory and formal logic as opposed to continuous mathematics like calculus) and software synthesis, it is possible to reduce this risk by proving certain software properties. Additionally, software synthesis makes it possible to automate some phases of the traditional software development life cycle resulting in a more streamlined and accurate development process.
A new computer code for discrete fracture network modelling
NASA Astrophysics Data System (ADS)
Xu, Chaoshui; Dowd, Peter
2010-03-01
The authors describe a comprehensive software package for two- and three-dimensional stochastic rock fracture simulation using marked point processes. Fracture locations can be modelled by a Poisson, a non-homogeneous, a cluster or a Cox point process; fracture geometries and properties are modelled by their respective probability distributions. Virtual sampling tools such as plane, window and scanline sampling are included in the software together with a comprehensive set of statistical tools including histogram analysis, probability plots, rose diagrams and hemispherical projections. The paper describes in detail the theoretical basis of the implementation and provides a case study in rock fracture modelling to demonstrate the application of the software.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jadaan, O.M.; Powers, L.M.; Nemeth, N.N.
1995-08-01
A probabilistic design methodology which predicts the fast fracture and time-dependent failure behavior of thermomechanically loaded ceramic components is discussed using the CARES/LIFE integrated design computer program. Slow crack growth (SCG) is assumed to be the mechanism responsible for delayed failure behavior. Inert strength and dynamic fatigue data obtained from testing coupon specimens (O-ring and C-ring specimens) are initially used to calculate the fast fracture and SCG material parameters as a function of temperature using the parameter estimation techniques available with the CARES/LIFE code. Finite element analysis (FEA) is used to compute the stress distributions for the tube as a function of applied pressure. Knowing the stress and temperature distributions and the fast fracture and SCG material parameters, the life time for a given tube can be computed. A stress-failure probability-time to failure (SPT) diagram is subsequently constructed for these tubes. Such a diagram can be used by design engineers to estimate the time to failure at a given failure probability level for a component subjected to a given thermomechanical load.
Uncertainty Analysis via Failure Domain Characterization: Polynomial Requirement Functions
NASA Technical Reports Server (NTRS)
Crespo, Luis G.; Munoz, Cesar A.; Narkawicz, Anthony J.; Kenny, Sean P.; Giesy, Daniel P.
2011-01-01
This paper proposes an uncertainty analysis framework based on the characterization of the uncertain parameter space. This characterization enables the identification of worst-case uncertainty combinations and the approximation of the failure and safe domains with a high level of accuracy. Because these approximations are comprised of subsets of readily computable probability, they enable the calculation of arbitrarily tight upper and lower bounds to the failure probability. A Bernstein expansion approach is used to size hyper-rectangular subsets while a sum of squares programming approach is used to size quasi-ellipsoidal subsets. These methods are applicable to requirement functions whose functional dependency on the uncertainty is a known polynomial. Some of the most prominent features of the methodology are the substantial desensitization of the calculations from the uncertainty model assumed (i.e., the probability distribution describing the uncertainty) as well as the accommodation for changes in such a model with a practically insignificant amount of computational effort.
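The idea of bracketing the failure probability with subsets of readily computable probability can be sketched on a toy case. The sketch rests on assumptions that are not in the paper: two uncertain parameters uniformly distributed on the unit square, a hypothetical requirement function g(x, y) = x + y - 1.5 with failure when g > 0, and a plain grid of rectangles classified by corner values (valid here only because g is increasing in both arguments). It does not use the Bernstein expansion or sum-of-squares sizing named in the abstract.

```python
def requirement(x, y):
    """Hypothetical requirement function; the failure domain is g > 0."""
    return x + y - 1.5


def failure_probability_bounds(n):
    """Bound P(g > 0) for (x, y) uniform on the unit square using an n-by-n grid.

    Because g is increasing in x and y, a cell lies entirely in the failure
    domain if g > 0 at its lower-left corner, and entirely in the safe domain
    if g <= 0 at its upper-right corner. Cells that straddle the boundary are
    left unclassified and account for the gap between the two bounds.
    """
    cell = 1.0 / n
    fail_cells = 0
    safe_cells = 0
    for i in range(n):
        for j in range(n):
            x_lo, y_lo = i * cell, j * cell
            x_hi, y_hi = x_lo + cell, y_lo + cell
            if requirement(x_lo, y_lo) > 0.0:
                fail_cells += 1
            elif requirement(x_hi, y_hi) <= 0.0:
                safe_cells += 1
    area = cell * cell
    lower = fail_cells * area          # probability of cells known to fail
    upper = 1.0 - safe_cells * area    # complement of cells known to be safe
    return lower, upper


for n in (8, 64, 128):
    lo, hi = failure_probability_bounds(n)
    print(f"n={n:4d}  {lo:.4f} <= P(failure) <= {hi:.4f}")
```

For this toy problem the true value is 0.125, and refining the grid tightens both bounds around it. Changing the assumed distribution only changes how the probability of each rectangle is computed, not the classification of the rectangles.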
Uncertainty Analysis via Failure Domain Characterization: Unrestricted Requirement Functions
NASA Technical Reports Server (NTRS)
Crespo, Luis G.; Kenny, Sean P.; Giesy, Daniel P.
2011-01-01
This paper proposes an uncertainty analysis framework based on the characterization of the uncertain parameter space. This characterization enables the identification of worst-case uncertainty combinations and the approximation of the failure and safe domains with a high level of accuracy. Because these approximations are comprised of subsets of readily computable probability, they enable the calculation of arbitrarily tight upper and lower bounds to the failure probability. The methods developed herein, which are based on nonlinear constrained optimization, are applicable to requirement functions whose functional dependency on the uncertainty is arbitrary and whose explicit form may even be unknown. Some of the most prominent features of the methodology are the substantial desensitization of the calculations from the assumed uncertainty model (i.e., the probability distribution describing the uncertainty) as well as the accommodation for changes in such a model with a practically insignificant amount of computational effort.
Oman India Pipeline: An operational repair strategy based on a rational assessment of risk
DOE Office of Scientific and Technical Information (OSTI.GOV)
German, P.
1996-12-31
This paper describes the development of a repair strategy for the operational phase of the Oman India Pipeline based upon the probability and consequences of a pipeline failure. Risk analyses and cost benefit analyses performed provide guidance on the level of deepwater repair development effort appropriate for the Oman India Pipeline project and identify critical areas toward which more intense development effort should be directed. The risk analysis results indicate that the likelihood of a failure of the Oman India Pipeline during its 40-year life is low. Furthermore, the probability of operational failure of the pipeline in deepwater regions is extremely low, the major proportion of operational failure risk being associated with the shallow water regions.
Game-theoretic strategies for asymmetric networked systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rao, Nageswara S.; Ma, Chris Y. T.; Hausken, Kjell
We consider an infrastructure consisting of a network of systems each composed of discrete components that can be reinforced at a certain cost to guard against attacks. The network provides the vital connectivity between systems, and hence plays a critical, asymmetric role in the infrastructure operations. We characterize the system-level correlations using the aggregate failure correlation function that specifies the infrastructure failure probability given the failure of an individual system or network. The survival probabilities of systems and network satisfy first-order differential conditions that capture the component-level correlations. We formulate the problem of ensuring the infrastructure survival as a game between an attacker and a provider, using the sum-form and product-form utility functions, each composed of a survival probability term and a cost term. We derive Nash Equilibrium conditions which provide expressions for individual system survival probabilities, and also the expected capacity specified by the total number of operational components. These expressions differ only in a single term for the sum-form and product-form utilities, despite their significant differences. We apply these results to simplified models of distributed cloud computing infrastructures.
A Model for Assessing the Liability of Seemingly Correct Software
NASA Technical Reports Server (NTRS)
Voas, Jeffrey M.; Voas, Larry K.; Miller, Keith W.
1991-01-01
Current research on software reliability does not lend itself to quantitatively assessing the risk posed by a piece of life-critical software. Black-box software reliability models are too general and make too many assumptions to be applied confidently to assessing the risk of life-critical software. We present a model for assessing the risk caused by a piece of software; this model combines software testing results and Hamlet's probable correctness model. We show how this model can assess software risk for those who insure against a loss that can occur if life-critical software fails.
Making statistical inferences about software reliability
NASA Technical Reports Server (NTRS)
Miller, Douglas R.
1988-01-01
Failure times of software undergoing random debugging can be modelled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.
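The order-statistics view of failure times can be sketched in a few lines. The sketch assumes, for illustration only, that each latent fault i manifests at an independent exponential time with its own rate and is removed when it first manifests; the rates, the seed, and the function names are hypothetical and not taken from the paper.

```python
import random


def simulate_detection_times(rates, rng):
    """Draw one independent Exp(rate_i) detection time per latent fault."""
    return [rng.expovariate(rate) for rate in rates]


def residual_failure_rate(rates, detection_times, t):
    """Total rate of the faults that have not yet been found at time t."""
    return sum(rate for rate, found_at in zip(rates, detection_times) if found_at > t)


# Hypothetical fault rates spanning several orders of magnitude.
rates = [1.0, 0.3, 0.1, 0.03, 0.01, 0.003, 0.001]

rng = random.Random(1)
detection_times = simulate_detection_times(rates, rng)

# The observed failure times are the order statistics of the detection times.
failure_times = sorted(detection_times)

for t in (0.0, 10.0, 100.0, 1000.0):
    rate_now = residual_failure_rate(rates, detection_times, t)
    print(f"t={t:7.1f}  residual failure rate={rate_now:.4f}")
```

Faults with large rates tend to surface early, so the residual rate falls quickly at first and then very slowly. The low-rate faults that remain are seldom observed during testing, which is one way to see why statistically verifying very high reliability is difficult.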
An Incremental Life-cycle Assurance Strategy for Critical System Certification
2014-11-04
for Safe Aircraft Operation Embedded software systems introduce a new class of problems not addressed by traditional system modeling & analysis...Platform Runtime Architecture Application Software Embedded SW System Engineer Data Stream Characteristics Latency jitter affects control behavior...do system level failures still occur despite fault tolerance techniques being deployed in systems ? Embedded software system as major source of
Ellefsen, Karl J.
2017-06-27
MapMark4 is a software package that implements the probability calculations in three-part mineral resource assessments. Functions within the software package are written in the R statistical programming language. These functions, their documentation, and a copy of this user’s guide are bundled together in R’s unit of shareable code, which is called a “package.” This user’s guide includes step-by-step instructions showing how the functions are used to carry out the probability calculations. The calculations are demonstrated using test data, which are included in the package.
The Importance of HRA in Human Space Flight: Understanding the Risks
NASA Technical Reports Server (NTRS)
Hamlin, Teri
2010-01-01
Human performance is critical to crew safety during space missions. Humans interact with hardware and software during ground processing, normal flight, and in response to events. Human interactions with hardware and software can cause Loss of Crew and/or Vehicle (LOCV) through improper actions, or may prevent LOCV through recovery and control actions. Humans have the ability to deal with complex situations and system interactions beyond the capability of machines. Human Reliability Analysis (HRA) is a method used to qualitatively and quantitatively assess the occurrence of human failures that affect availability and reliability of complex systems. Modeling human actions with their corresponding failure probabilities in a Probabilistic Risk Assessment (PRA) provides a more complete picture of system risks and risk contributions. A high-quality HRA can provide valuable information on potential areas for improvement, including training, procedures, human interfaces design, and the need for automation. Modeling human error has always been a challenge in part because performance data is not always readily available. For spaceflight, the challenge is amplified not only because of the small number of participants and limited amount of performance data available, but also due to the lack of definition of the unique factors influencing human performance in space. These factors, called performance shaping factors in HRA terminology, are used in HRA techniques to modify basic human error probabilities in order to capture the context of an analyzed task. Many of the human error modeling techniques were developed within the context of nuclear power plants and therefore the methodologies do not address spaceflight factors such as the effects of microgravity and longer duration missions. This presentation will describe the types of human error risks which have shown up as risk drivers in the Shuttle PRA which may be applicable to commercial space flight. 
As with other large PRAs of complex machines, human error in the Shuttle PRA proved to be an important contributor (12 percent) to LOCV. An existing HRA technique was adapted for use in the Shuttle PRA, but additional guidance and improvements are needed to make the HRA task in space-related PRAs easier and more accurate. Therefore, this presentation will also outline plans for expanding current HRA methodology to more explicitly cover spaceflight performance shaping factors.
2008-12-01
between our current project and the historical projects. Therefore to refine the historical volatility estimate of the previously completed software... historical volatility estimates obtained in the form of beliefs and plausibility based on subjective probabilities that take into consideration unique
Bonnabry, P; Cingria, L; Sadeghipour, F; Ing, H; Fonzo-Christe, C; Pfister, R
2005-01-01
Background: Until recently, the preparation of paediatric parenteral nutrition formulations in our institution included re-transcription and manual compounding of the mixture. Although no significant clinical problems have occurred, re-engineering of this high risk activity was undertaken to improve its safety. Several changes have been implemented including new prescription software, direct recording on a server, automatic printing of the labels, and creation of a file used to pilot a BAXA MM 12 automatic compounder. The objectives of this study were to compare the risks associated with the old and new processes, to quantify the improved safety with the new process, and to identify the major residual risks. Methods: A failure modes, effects, and criticality analysis (FMECA) was performed by a multidisciplinary team. A cause-effect diagram was built, the failure modes were defined, and the criticality index (CI) was determined for each of them on the basis of the likelihood of occurrence, the severity of the potential effect, and the detection probability. The CIs for each failure mode were compared for the old and new processes and the risk reduction was quantified. Results: The sum of the CIs of all 18 identified failure modes was 3415 for the old process and 1397 for the new (reduction of 59%). The new process reduced the CIs of the different failure modes by a mean factor of 7. The CI was smaller with the new process for 15 failure modes, unchanged for two, and slightly increased for one. The greatest reduction (by a factor of 36) concerned re-transcription errors, followed by readability problems (by a factor of 30) and chemical cross contamination (by a factor of 10). The most critical steps in the new process were labelling mistakes (CI 315, maximum 810), failure to detect a dosage or product mistake (CI 288), failure to detect a typing error during the prescription (CI 175), and microbial contamination (CI 126). 
Conclusions: Modification of the process resulted in a significant risk reduction as shown by risk analysis. Residual failure opportunities were also quantified, allowing additional actions to be taken to reduce the risk of labelling mistakes. This study illustrates the usefulness of prospective risk analysis methods in healthcare processes. More systematic use of risk analysis is needed to guide continuous safety improvement of high risk activities. PMID:15805453
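The arithmetic behind the reported risk reduction can be checked with a short sketch. It assumes the usual FMECA convention that the criticality index is the product of the occurrence, severity, and detection scores; the abstract says only that the CI was determined on the basis of these three factors. The example scores are hypothetical and are not the study's ratings; only the totals 3415 and 1397 come from the abstract.

```python
def criticality_index(occurrence, severity, detection):
    """Criticality index as the product of three scores (assumed convention)."""
    return occurrence * severity * detection


def risk_reduction(ci_old, ci_new):
    """Fractional reduction in summed criticality between two processes."""
    return (ci_old - ci_new) / ci_old


# Hypothetical scores for a single failure mode, for illustration only.
example_ci = criticality_index(occurrence=7, severity=9, detection=5)
print(f"example CI = {example_ci}")

# Totals reported in the abstract for the 18 failure modes.
reduction = risk_reduction(3415, 1397)
print(f"reduction in summed CI = {reduction:.1%}")
```

The reported totals give a reduction of about 59%, matching the figure in the abstract. The mean factor of 7 quoted there is an average over individual failure modes, so it is not expected to equal the ratio of the two totals.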
Bonnabry, P; Cingria, L; Sadeghipour, F; Ing, H; Fonzo-Christe, C; Pfister, R E
2005-04-01
Until recently, the preparation of paediatric parenteral nutrition formulations in our institution included re-transcription and manual compounding of the mixture. Although no significant clinical problems have occurred, re-engineering of this high risk activity was undertaken to improve its safety. Several changes have been implemented including new prescription software, direct recording on a server, automatic printing of the labels, and creation of a file used to pilot a BAXA MM 12 automatic compounder. The objectives of this study were to compare the risks associated with the old and new processes, to quantify the improved safety with the new process, and to identify the major residual risks. A failure modes, effects, and criticality analysis (FMECA) was performed by a multidisciplinary team. A cause-effect diagram was built, the failure modes were defined, and the criticality index (CI) was determined for each of them on the basis of the likelihood of occurrence, the severity of the potential effect, and the detection probability. The CIs for each failure mode were compared for the old and new processes and the risk reduction was quantified. The sum of the CIs of all 18 identified failure modes was 3415 for the old process and 1397 for the new (reduction of 59%). The new process reduced the CIs of the different failure modes by a mean factor of 7. The CI was smaller with the new process for 15 failure modes, unchanged for two, and slightly increased for one. The greatest reduction (by a factor of 36) concerned re-transcription errors, followed by readability problems (by a factor of 30) and chemical cross contamination (by a factor of 10). The most critical steps in the new process were labelling mistakes (CI 315, maximum 810), failure to detect a dosage or product mistake (CI 288), failure to detect a typing error during the prescription (CI 175), and microbial contamination (CI 126). 
Modification of the process resulted in a significant risk reduction as shown by risk analysis. Residual failure opportunities were also quantified, allowing additional actions to be taken to reduce the risk of labelling mistakes. This study illustrates the usefulness of prospective risk analysis methods in healthcare processes. More systematic use of risk analysis is needed to guide continuous safety improvement of high risk activities.
Jahanfar, Ali; Amirmojahedi, Mohsen; Gharabaghi, Bahram; Dubey, Brajesh; McBean, Edward; Kumar, Dinesh
2017-03-01
Rapid population growth of major urban centres in many developing countries has created massive landfills with extraordinary heights and steep side-slopes, which are frequently surrounded by illegal low-income residential settlements developed too close to landfills. These extraordinary landfills are facing high risks of catastrophic failure with potentially large numbers of fatalities. This study presents a novel method for risk assessment of landfill slope failure, using probabilistic analysis of potential failure scenarios and associated fatalities. The conceptual framework of the method includes selecting appropriate statistical distributions for the municipal solid waste (MSW) material shear strength and rheological properties for potential failure scenario analysis. The MSW material properties for a given scenario are then used to analyse the probability of slope failure and the resulting run-out length to calculate the potential risk of fatalities. In comparison with existing methods, which are solely based on the probability of slope failure, this method provides a more accurate estimate of the risk of fatalities associated with a given landfill slope failure. The application of the new risk assessment method is demonstrated with a case study for a landfill located within a heavily populated area of New Delhi, India.
Software For Computing Reliability Of Other Software
NASA Technical Reports Server (NTRS)
Nikora, Allen; Antczak, Thomas M.; Lyu, Michael
1995-01-01
Computer Aided Software Reliability Estimation (CASRE) computer program developed for use in measuring reliability of other software. Easier for non-specialists in reliability to use than many other currently available programs developed for same purpose. CASRE incorporates mathematical modeling capabilities of public-domain Statistical Modeling and Estimation of Reliability Functions for Software (SMERFS) computer program and runs in Windows software environment. Provides menu-driven command interface; enabling and disabling of menu options guides user through (1) selection of set of failure data, (2) execution of mathematical model, and (3) analysis of results from model. Written in C language.
Transient Reliability Analysis Capability Developed for CARES/Life
NASA Technical Reports Server (NTRS)
Nemeth, Noel N.
2001-01-01
The CARES/Life software developed at the NASA Glenn Research Center provides a general-purpose design tool that predicts the probability of the failure of a ceramic component as a function of its time in service. This award-winning software has been widely used by U.S. industry to establish the reliability and life of brittle material (e.g., ceramic, intermetallic, and graphite) structures in a wide variety of 21st century applications. Present capabilities of the NASA CARES/Life code include probabilistic life prediction of ceramic components subjected to fast fracture, slow crack growth (stress corrosion), and cyclic fatigue failure modes. Currently, this code can compute the time-dependent reliability of ceramic structures subjected to simple time-dependent loading. For example, in slow crack growth failure conditions CARES/Life can handle sustained and linearly increasing time-dependent loads, whereas in cyclic fatigue applications various types of repetitive constant-amplitude loads can be accounted for. However, in real applications applied loads are rarely that simple but vary with time in more complex ways such as engine startup, shutdown, and dynamic and vibrational loads. In addition, when a given component is subjected to transient environmental and/or thermal conditions, the material properties also vary with time. A methodology has now been developed to allow the CARES/Life computer code to perform reliability analysis of ceramic components undergoing transient thermal and mechanical loading. This means that CARES/Life will be able to analyze finite element models of ceramic components that simulate dynamic engine operating conditions. The methodology developed is generalized to account for material property variation (on strength distribution and fatigue) as a function of temperature. This allows CARES/Life to analyze components undergoing rapid temperature change; in other words, components undergoing thermal shock.
In addition, the capability has been developed to perform reliability analysis for components that undergo proof testing involving transient loads. This methodology was developed for environmentally assisted crack growth (crack growth as a function of time and loading), but it will be extended to account for cyclic fatigue (crack growth as a function of load cycles) as well.
Addressing failures in exascale computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Snir, Marc; Wisniewski, Robert W.; Abraham, Jacob A.
2014-05-01
We present here a report produced by a workshop on “Addressing Failures in Exascale Computing” held in Park City, Utah, August 4–11, 2012. The charter of this workshop was to establish a common taxonomy about resilience across all the levels in a computing system; discuss existing knowledge on resilience across the various hardware and software layers of an exascale system; and build on those results, examining potential solutions from both a hardware and software perspective and focusing on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware; they came from industry, government, and academia; and their interests ranged from theory to implementation. The combination allowed broad and comprehensive discussions and led to this document, which summarizes and builds on those discussions.
Addressing Failures in Exascale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Snir, Marc; Wisniewski, Robert; Abraham, Jacob
2014-01-01
We present here a report produced by a workshop on 'Addressing failures in exascale computing' held in Park City, Utah, 4-11 August 2012. The charter of this workshop was to establish a common taxonomy about resilience across all the levels in a computing system, discuss existing knowledge on resilience across the various hardware and software layers of an exascale system, and build on those results, examining potential solutions from both a hardware and software perspective and focusing on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware; they came from industry, government, and academia, and their interests ranged from theory to implementation. The combination allowed broad and comprehensive discussions and led to this document, which summarizes and builds on those discussions.
A Voyager attitude control perspective on fault tolerant systems
NASA Technical Reports Server (NTRS)
Rasmussen, R. D.; Litty, E. C.
1981-01-01
In current spacecraft design, a trend can be observed to achieve greater fault tolerance through the application of on-board software dedicated to detecting and isolating failures. Whether fault tolerance through software can meet the desired objectives depends on very careful consideration and control of the system in which the software is imbedded. The considered investigation has the objective to provide some of the insight needed for the required analysis of the system. A description is given of the techniques which have been developed in this connection during the development of the Voyager spacecraft. The Voyager Galileo Attitude and Articulation Control Subsystem (AACS) fault tolerant design is discussed to emphasize basic lessons learned from this experience. The central driver of hardware redundancy implementation on Voyager was known as the 'single point failure criterion'.
Yang, Hui; Zhang, Jie; Zhao, Yongli; Ji, Yuefeng; Wu, Jialin; Lin, Yi; Han, Jianrui; Lee, Young
2015-05-18
Inter-data center interconnect with IP over elastic optical network (EON) is a promising scenario to meet the high burstiness and high-bandwidth requirements of data center services. In our previous work, we implemented multi-stratum resources integration among IP networks, optical networks and application stratum resources, which allows data center services to be accommodated. In view of this, this study extends that work to consider service resilience in case of edge optical node failure. We propose a novel multi-stratum resources integrated resilience (MSRIR) architecture for the services in software defined inter-data center interconnect based on IP over EON. A global resources integrated resilience (GRIR) algorithm is introduced based on the proposed architecture. The MSRIR can enable cross stratum optimization, provide resilience using the resources of multiple strata, and enhance the responsiveness of data center service resilience to dynamic end-to-end service demands. The overall feasibility and efficiency of the proposed architecture are experimentally verified on the control plane of our OpenFlow-based enhanced SDN (eSDN) testbed. The performance of the GRIR algorithm under a heavy traffic load scenario is also quantitatively evaluated based on the MSRIR architecture in terms of path blocking probability, resilience latency and resource utilization, compared with other resilience algorithms.
Ferret, Yann; Caillault, Aurélie; Sebda, Shéhérazade; Duez, Marc; Grardel, Nathalie; Duployez, Nicolas; Villenet, Céline; Figeac, Martin; Preudhomme, Claude; Salson, Mikaël; Giraud, Mathieu
2016-05-01
High-throughput sequencing (HTS) is considered a technical revolution that has improved our knowledge of lymphoid and autoimmune diseases, changing our approach to leukaemia both at diagnosis and during follow-up. As part of an immunoglobulin/T cell receptor-based minimal residual disease (MRD) assessment of acute lymphoblastic leukaemia patients, we assessed the performance and feasibility of the replacement of the first steps of the approach based on DNA isolation and Sanger sequencing, using a HTS protocol combined with bioinformatics analysis and visualization using the Vidjil software. We prospectively analysed the diagnostic and relapse samples of 34 paediatric patients, thus identifying 125 leukaemic clones with recombinations on multiple loci (TRG, TRD, IGH and IGK), including Dd2/Dd3 and Intron/KDE rearrangements. Sequencing failures were halved (14% vs. 34%, P = 0.0007), enabling more patients to be monitored. Furthermore, more markers per patient could be monitored, reducing the probability of false negative MRD results. The whole analysis, from sample receipt to clinical validation, was shorter than our current diagnostic protocol, with equal resources. V(D)J recombination was successfully assigned by the software, even for unusual recombinations. This study emphasizes the progress that HTS with adapted bioinformatics tools can bring to the diagnosis of leukaemia patients. © 2016 John Wiley & Sons Ltd.
Effects of footwear and stride length on metatarsal strains and failure in running.
Firminger, Colin R; Fung, Anita; Loundagin, Lindsay L; Edwards, W Brent
2017-11-01
The metatarsal bones of the foot are particularly susceptible to stress fracture owing to the high strains they experience during the stance phase of running. Shoe cushioning and stride length reduction represent two potential interventions to decrease metatarsal strain and thus stress fracture risk. Fourteen male recreational runners ran overground at a 5-km pace while motion capture and plantar pressure data were collected during four experimental conditions: traditional shoe at preferred and 90% preferred stride length, and minimalist shoe at preferred and 90% preferred stride length. Combined musculoskeletal - finite element modeling based on motion analysis and computed tomography data were used to quantify metatarsal strains and the probability of failure was determined using stress-life predictions. No significant interactions between footwear and stride length were observed. Running in minimalist shoes increased strains for all metatarsals by 28.7% (SD 6.4%; p<0.001) and probability of failure for metatarsals 2-4 by 17.3% (SD 14.3%; p≤0.005). Running at 90% preferred stride length decreased strains for metatarsal 4 by 4.2% (SD 2.0%; p≤0.007), and no differences in probability of failure were observed. Significant increases in metatarsal strains and the probability of failure were observed for recreational runners acutely transitioning to minimalist shoes. Running with a 10% reduction in stride length did not appear to be a beneficial technique for reducing the risk of metatarsal stress fracture, however the increased number of loading cycles for a given distance was not detrimental either. Copyright © 2017 Elsevier Ltd. All rights reserved.
GCS configuration management plan
NASA Technical Reports Server (NTRS)
Withers, B. Edward
1990-01-01
The tools and methods used for the configuration management of the artifacts (including software and documentation) associated with the Guidance and Control Software (GCS) project are described. The GCS project is part of a software error studies research program. Three implementations of GCS are being produced in order to study the fundamental characteristics of the software failure process. The Code Management System (CMS) is used to track and retrieve versions of the documentation and software. Application of the CMS for this project is described and the numbering scheme is delineated for the versions of the project artifacts.
Program to Diagnose Probability of Aspiration Pneumonia in Patients with Ischemic Stroke
Pinto, Gisele; Zétola, Viviane; Lange, Marcos; Gomes, Guilherme; Nunes, Maria Cristina; Hirata, Gisela; Lagos-Guimarães, Hellen Nataly
2014-01-01
Introduction Stroke is a major cause of death and disability worldwide, with a strong economic and social impact. Approximately 40% of patients show motor, language, and swallowing disorders after stroke. Objective To evaluate the use of software to infer the probability of pneumonia in patients with ischemic stroke. Methods Prospective and cross-sectional study conducted in a university hospital from March 2010 to August 2012. After confirmation of ischemic stroke by computed axial tomography, a clinical and flexible endoscopic evaluation of swallowing was performed within 72 hours of onset of symptoms. All patients received speech therapy poststroke, and the data were subsequently analyzed by the software. The patients were given medical treatment and speech therapy for 3 months. Results The study examined 52 patients with a mean age of 62.05 ± 13.88 years, with 23 (44.2%) women. Of the 52 patients, only 3 (5.7%) had a probability of pneumonia between 80 and 100% as identified by the software. Of all patients, 32 (61.7%) had pneumonia probability between 0 and 19%, 5 (9.5%) between 20 and 49%, 3 (5.8%) between 50 and 79%, and 12 (23.0%) between 80 and 100%. Conclusion The computer program indicates the probability of a patient having aspiration pneumonia after ischemic stroke. PMID:25992100
NASA Technical Reports Server (NTRS)
2001-01-01
Qualtech Systems, Inc. developed a complete software system with capabilities of multisignal modeling, diagnostic analysis, run-time diagnostic operations, and intelligent interactive reasoners. Commercially available as the TEAMS (Testability Engineering and Maintenance System) tool set, the software can be used to reveal unanticipated system failures. The TEAMS software package is broken down into four companion tools: TEAMS-RT, TEAMATE, TEAMS-KB, and TEAMS-RDS. TEAMS-RT identifies good, bad, and suspect components in the system in real-time. It reports system health results from onboard tests, and detects and isolates failures within the system, allowing for rapid fault isolation. TEAMATE takes over from where TEAMS-RT left off by intelligently guiding the maintenance technician through the troubleshooting procedure, repair actions, and operational checkout. TEAMS-KB serves as a model management and collection tool. TEAMS-RDS (TEAMS-Remote Diagnostic Server) has the ability to continuously assess a system and isolate any failure in that system or its components, in real time. RDS incorporates TEAMS-RT, TEAMATE, and TEAMS-KB in a large-scale server architecture capable of providing advanced diagnostic and maintenance functions over a network, such as the Internet, with a web browser user interface.
A Summary of Taxonomies of Digital System Failure Modes Provided by the DigRel Task Group
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chu T. L.; Yue M.; Postma, W.
2012-06-25
Recently, the CSNI directed WGRisk to set up a task group called DIGREL to initiate a new task on developing a taxonomy of failure modes of digital components for the purposes of PSA. It is an important step towards standardized digital I&C reliability assessment techniques for PSA. The objective of this paper is to provide a comparison of the failure mode taxonomies provided by the participants. The failure modes are classified in terms of their levels of detail. Software and hardware failure modes are discussed separately.
Experiences with Probabilistic Analysis Applied to Controlled Systems
NASA Technical Reports Server (NTRS)
Kenny, Sean P.; Giesy, Daniel P.
2004-01-01
This paper presents a semi-analytic method for computing frequency dependent means, variances, and failure probabilities for arbitrarily large-order closed-loop dynamical systems possessing a single uncertain parameter or with multiple highly correlated uncertain parameters. The approach will be shown to not suffer from the same computational challenges associated with computing failure probabilities using conventional FORM/SORM techniques. The approach is demonstrated by computing the probabilistic frequency domain performance of an optimal feed-forward disturbance rejection scheme.
Improved Correction of Misclassification Bias With Bootstrap Imputation.
van Walraven, Carl
2018-07-01
Diagnostic codes used in administrative database research can create bias due to misclassification. Quantitative bias analysis (QBA) can correct for this bias, requires only code sensitivity and specificity, but may return invalid results. Bootstrap imputation (BI) can also address misclassification bias but traditionally requires multivariate models to accurately estimate disease probability. This study compared misclassification bias correction using QBA and BI. Serum creatinine measures were used to determine severe renal failure status in 100,000 hospitalized patients. Prevalence of severe renal failure in 86 patient strata and its association with 43 covariates was determined and compared with results in which renal failure status was determined using diagnostic codes (sensitivity 71.3%, specificity 96.2%). Differences in results (misclassification bias) were then corrected with QBA or BI (using progressively more complex methods to estimate disease probability). In total, 7.4% of patients had severe renal failure. Imputing disease status with diagnostic codes exaggerated prevalence estimates [median relative change (range), 16.6% (0.8%-74.5%)] and its association with covariates [median (range) exponentiated absolute parameter estimate difference, 1.16 (1.01-2.04)]. QBA produced invalid results 9.3% of the time and increased bias in estimates of both disease prevalence and covariate associations. BI decreased misclassification bias with increasingly accurate disease probability estimates. QBA can produce invalid results and increase misclassification bias. BI avoids invalid results and can importantly decrease misclassification bias when accurate disease probability estimates are used.
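The remark that QBA needs only code sensitivity and specificity, yet may return invalid results, can be illustrated with the standard algebraic back-calculation of prevalence. The sketch below is an assumption about the general form of such a correction, not the exact procedure used in the study; the function names and the stratum prevalences in the loop are hypothetical, while the sensitivity and specificity are the values quoted in the abstract.

```python
def qba_corrected_prevalence(observed, sensitivity, specificity):
    """Back-calculate true prevalence p from a code-based prevalence.

    Solves observed = Se * p + (1 - Sp) * (1 - p) for p.
    The result is NOT clipped, so it can fall outside [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (observed + specificity - 1.0) / denom


def is_valid_probability(p):
    """True when p is a usable probability estimate."""
    return 0.0 <= p <= 1.0


SENS, SPEC = 0.713, 0.962  # values reported in the abstract

if __name__ == "__main__":
    # Hypothetical code-based prevalences for three patient strata.
    for observed in (0.02, 0.09, 0.20):
        p = qba_corrected_prevalence(observed, SENS, SPEC)
        status = "valid" if is_valid_probability(p) else "INVALID"
        print(f"observed={observed:.2f} corrected={p:.4f} ({status})")
```

Whenever the observed prevalence in a stratum is below the false-positive rate (1 - specificity, here 3.8%), the corrected value is negative, which is one way a correction of this form yields an invalid result.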
Software Design Improvements. Part 2; Software Quality and the Design and Inspection Process
NASA Technical Reports Server (NTRS)
Lalli, Vincent R.; Packard, Michael H.; Ziemianski, Tom
1997-01-01
The application of assurance engineering techniques improves the duration of failure-free performance of software. The totality of features and characteristics of a software product are what determine its ability to satisfy customer needs. Software in safety-critical systems is very important to NASA. We follow the System Safety Working Group's definition for system safety software as: 'The optimization of system safety in the design, development, use and maintenance of software and its integration with safety-critical systems in an operational environment.' 'If it is not safe, say so' has become our motto. This paper goes over methods that have been used by NASA to make software design improvements by focusing on software quality and the design and inspection process.
Coordination in Large Scale Software Development
1990-01-01
toward achieving common and explicitly recognized goals" (Blau and Scott, 1962) and "the integration or linking together of different parts of an...require a strong degree of integration of its components. Much software is built of thousands of modules that must mesh with each other perfectly for the...coordination between subgroups producing software modules could lead to failure in integrating the modules themselves. Informal communication. Both
49 CFR Appendix A to Part 238 - Schedule of Civil Penalties 1
Code of Federal Regulations, 2014 CFR
2014-10-01
....15Movement of power brake defects: (b) Improper movement from Class I or IA brake test 5,000 7,500 (c... required design features 5,000 7,500 (e) Failure to comply with hardware and software safety program 5,000... test previously used equipment 7,500 11,000 (b)(1) Failure to develop plan 7,500 11,000 (b)(2) Failure...
49 CFR Appendix A to Part 238 - Schedule of Civil Penalties 1
Code of Federal Regulations, 2010 CFR
2010-10-01
....15Movement of power brake defects: (b) Improper movement from Class I or IA brake test 5,000 7,500 (c... required design features 5,000 7,500 (e) Failure to comply with hardware and software safety program 5,000... test previously used equipment 7,500 11,000 (b)(1) Failure to develop plan 7,500 11,000 (b)(2) Failure...
49 CFR Appendix A to Part 238 - Schedule of Civil Penalties 1
Code of Federal Regulations, 2013 CFR
2013-10-01
... movement from Class I or IA brake test 5,000 7,500 (c) Improper movement of en route defect 2,500 5,000 (2...) Failure to include required design features 5,000 7,500 (e) Failure to comply with hardware and software... properly test previously used equipment 7,500 11,000 (b)(1) Failure to develop plan 7,500 11,000 (b)(2...
A Stress Gradient Failure Theory for Textile Structural Composites
2006-05-01
additional element failures occur. Incorporation of thermal stresses and investigation of the coefficient of thermal expansion is another potential...avenue for further development of the failure modeling. Due to mismatches between the coefficient of thermal expansion of constituent materials...directly from ABAQUS software, which yields element volumes as outputs, thus the volume of all matrix elements can be compared to the volume of all
Borzecki, Ann M; Chen, Qi; Mull, Hillary J; Shwartz, Michael; Bhatt, Deepak L; Hanchate, Amresh; Rosen, Amy K
2016-09-01
The 3M Potentially Preventable Readmissions (3M-PPR) software matches clinically related index admission and readmission diagnoses that may signify in-hospital or postdischarge quality problems. To assess whether the PPR algorithm identifies preventable readmissions, we compared processes of care between PPR software-flagged and nonflagged cases. Using 2006 to 2010 national VA administrative data, we identified acute myocardial infarction and heart failure discharges associated with 30-day all-cause readmissions, then flagged cases (PPR-Yes/PPR-No) using the 3M-PPR software. To assess care quality, we abstracted medical records of 100 readmissions per condition using tools containing explicit processes organized into admission work-up, in-hospital evaluation/treatment, discharge readiness, postdischarge period. We derived quality scores, scaled to a maximum of 25 per section (maximum total score=100) and compared cases on total and section-specific mean scores. For acute myocardial infarction, 77 of 100 cases were flagged as PPR-Yes. Section quality scores were highest for in-hospital evaluation/treatment (20.5±2.8) and lowest for postdischarge care (6.8±9.1). Total and section-related mean scores did not differ by PPR status; respective PPR-Yes versus PPR-No total scores were 61.6±11.1 and 60.4±9.4; P=0.98. For heart failure, 86 of 100 cases were flagged as PPR-Yes. Section scores were highest for discharge readiness (18.8±2.4) and lowest for postdischarge care (7.3±8.1). Like acute myocardial infarction, total and section-related mean scores did not differ by PPR status; PPR-Yes versus PPR-No total scores were 61.2±10.8 and 63.4±7.0, respectively; P=0.47. Among VA acute myocardial infarction and heart failure readmissions, the 3M-PPR software does not distinguish differences in case-level quality of care. 
Whether 3M-PPR software better identifies preventable readmissions by using other methods to capture poorly documented processes or performing different comparisons requires further study. © 2016 American Heart Association, Inc.
STS-55 pad abort: Engine 2011 oxidizer preburner augmented spark igniter check valve leak
NASA Technical Reports Server (NTRS)
1993-01-01
The STS-55 initial launch attempt of Columbia (OV102) was terminated on KSC launch pad A March 22, 1993 at 9:51 AM E.S.T. due to violation of an ME-3 (Engine 2011) Launch Commit Criteria (LCC) limit exceedance. The event description and timeline are summarized. Propellant loading was initiated on 22 March, 1993 at 1:15 AM EST. All SSME chill parameters and launch commit criteria (LCC) were nominal. At engine start plus 1.44 seconds, a Failure Identification (FID) was posted against Engine 2011 for exceeding the 50 psia Oxidizer Preburner (OPB) purge pressure redline. The engine was shut down at 1.50 seconds followed by Engines 2034 and 2030. All shut down sequences were nominal and the mission was safely aborted. The OPB purge pressure redline violation and the abort profile/overlay for all three engines are depicted. SSME Avionics hardware and software performed nominally during the incident. A review of vehicle data table (VDT) data and controller software logic revealed no failure indications other than the single FID 013-414, OPB purge pressure redline exceeded. Software logic was executed according to requirements and there was no anomalous controller software operation. Immediately following the abort, a Rocketdyne/NASA failure investigation team was assembled. The team successfully isolated the failure cause to the oxidizer preburner augmented spark igniter purge check valve not being fully closed due to contamination. The source of the contaminant was traced to a cut segment from a rubber O-ring which was used in a fine clean tool during valve production prior to 1992. The valve was apparently contaminated during its fabrication in 1985. The valve had performed acceptably on four previous flights of the engine, and SSME flight history shows 780 combined check valve flights without failure. The failure of an Engine 3 (SSME No. 2011) check valve to close was sensed by onboard engine instruments even though all other engine operations were normal. 
This resulted in an engine shutdown and safe sequential shutdown of all three engines prior to ignition of the solid boosters.
Collaboration Strategies to Reduce Technical Debt
ERIC Educational Resources Information Center
Miko, Jeffrey Allen
2017-01-01
Inadequate software development collaboration processes can allow technical debt to accumulate, increasing future maintenance costs and the chance of system failures. The purpose of this qualitative case study was to explore collaboration strategies software development leaders use to reduce the amount of technical debt created by software…
Hybrid Modeling for Testing Intelligent Software for Lunar-Mars Closed Life Support
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Nicholson, Leonard S. (Technical Monitor)
1999-01-01
Intelligent software is being developed for closed life support systems with biological components, for human exploration of the Moon and Mars. The intelligent software functions include planning/scheduling, reactive discrete control and sequencing, management of continuous control, and fault detection, diagnosis, and management of failures and errors. Four types of modeling information have been essential to system modeling and simulation to develop and test the software and to provide operational model-based what-if analyses: discrete component operational and failure modes; continuous dynamic performance within component modes, modeled qualitatively or quantitatively; configuration of flows and power among components in the system; and operations activities and scenarios. CONFIG, a multi-purpose discrete event simulation tool that integrates all four types of models for use throughout the engineering and operations life cycle, has been used to model components and systems involved in the production and transfer of oxygen and carbon dioxide in a plant-growth chamber and between that chamber and a habitation chamber with physicochemical systems for gas processing.
1999-01-01
published in December of 1998. In addition, Mr. Drake is the author of a theme article entitled: "Measuring Software Quality: A Case Study...and services may run on different platforms in differing combinations , • Partial application failure (e.g., a client running, service down) is...result in a combined utility function that is some aggregation of the underlying utility functions. The benefit a client receives from a service
Probability of loss of assured safety in systems with multiple time-dependent failure modes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Helton, Jon Craig; Pilch, Martin.; Sallaberry, Cedric Jean-Marie.
2012-09-01
Weak link (WL)/strong link (SL) systems are important parts of the overall operational design of high-consequence systems. In such designs, the SL system is very robust and is intended to permit operation of the entire system under, and only under, intended conditions. In contrast, the WL system is intended to fail in a predictable and irreversible manner under accident conditions and render the entire system inoperable before an accidental operation of the SL system. The likelihood that the WL system will fail to deactivate the entire system before the SL system fails (i.e., degrades into a configuration that could allow an accidental operation of the entire system) is referred to as probability of loss of assured safety (PLOAS). Representations for PLOAS for situations in which both link physical properties and link failure properties are time-dependent are derived and numerically evaluated for a variety of WL/SL configurations, including PLOAS defined by (i) failure of all SLs before failure of any WL, (ii) failure of any SL before failure of any WL, (iii) failure of all SLs before failure of all WLs, and (iv) failure of any SL before failure of all WLs. The effects of aleatory uncertainty and epistemic uncertainty in the definition and numerical evaluation of PLOAS are considered.
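The four PLOAS definitions (i) to (iv) differ only in which order statistics of the link failure times are compared, which a small Monte Carlo sketch makes concrete. Everything below is illustrative: the function name, the link counts, and the exponential failure-time model with constant rates are assumptions made for the example, whereas the report itself treats time-dependent link properties.

```python
import random


def estimate_ploas(n_wl, n_sl, wl_rate, sl_rate, trials=100_000, seed=0):
    """Monte Carlo estimate of PLOAS under the four definitions.

    Assumption: each link fails at an independent exponential time
    (wl_rate for weak links, sl_rate for strong links).
    """
    rng = random.Random(seed)
    counts = {"i": 0, "ii": 0, "iii": 0, "iv": 0}
    for _ in range(trials):
        wl = [rng.expovariate(wl_rate) for _ in range(n_wl)]
        sl = [rng.expovariate(sl_rate) for _ in range(n_sl)]
        counts["i"] += max(sl) < min(wl)    # all SLs before any WL
        counts["ii"] += min(sl) < min(wl)   # any SL before any WL
        counts["iii"] += max(sl) < max(wl)  # all SLs before all WLs
        counts["iv"] += min(sl) < max(wl)   # any SL before all WLs
    return {name: hits / trials for name, hits in counts.items()}


if __name__ == "__main__":
    # Hypothetical system: 2 weak links that fail fast, 2 robust strong links.
    for name, value in estimate_ploas(2, 2, wl_rate=1.0, sl_rate=0.25).items():
        print(f"PLOAS ({name}): {value:.4f}")
```

By construction, definition (i) gives the smallest estimate and definition (iv) the largest, since the event in (i) implies the events in (ii) and (iii), each of which implies the event in (iv).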
Enhancing MPLS Protection Method with Adaptive Segment Repair
NASA Astrophysics Data System (ADS)
Chen, Chin-Ling
We propose a novel adaptive segment repair mechanism to improve traditional MPLS (Multi-Protocol Label Switching) failure recovery. The proposed mechanism protects one or more contiguous high failure probability links by dynamic setup of segment protection. Simulations demonstrate that the proposed mechanism reduces failure recovery time while also increasing network resource utilization.
Probabilistic inspection strategies for minimizing service failures
NASA Technical Reports Server (NTRS)
Brot, Abraham
1994-01-01
The INSIM computer program is described which simulates the 'limited fatigue life' environment in which aircraft structures generally operate. The use of INSIM to develop inspection strategies which aim to minimize service failures is demonstrated. Damage-tolerance methodology, inspection thresholds and customized inspections are simulated using the probability of failure as the driving parameter.
Caballero Morales, Santiago Omar
2013-01-01
Preventive Maintenance (PM) and Statistical Process Control (SPC) are important practices for achieving high product quality, a low frequency of failures, and cost reduction in a production process. However, some aspects of their joint application have not been explored in depth. First, most SPC is performed with the X-bar control chart, which does not fully consider the variability of the production process. Second, many studies of the design of control charts consider just the economic aspect, while statistical restrictions must be considered to achieve charts with low probabilities of false detection of failures. Third, the effect of PM on processes with different failure probability distributions has not been studied. Hence, this paper covers these points, presenting the Economic Statistical Design (ESD) of joint X-bar-S control charts with a cost model that integrates PM with general failure distribution. Experiments showed statistically significant reductions in costs when PM is performed on processes with high failure rates and reductions in the sampling frequency of units for testing under SPC. PMID:23527082
Effect of Preconditioning and Soldering on Failures of Chip Tantalum Capacitors
NASA Technical Reports Server (NTRS)
Teverovsky, Alexander A.
2014-01-01
Soldering of molded case tantalum capacitors can result in damage to the Ta2O5 dielectric and first turn-on failures due to thermo-mechanical stresses caused by CTE mismatch between materials used in the capacitors. It is also known that the presence of moisture might cause damage to plastic cases due to the pop-corning effect. However, there are only scarce literature data on the effect of moisture content on the probability of post-soldering electrical failures. In this work, which is based on a case history, different groups of similar types of CWR tantalum capacitors from two lots were prepared for soldering by bake, moisture saturation, and long-term storage at room conditions. Results of the testing showed that both factors, the initial quality of the lot and preconditioning, affect the probability of failures. Baking before soldering was shown to be effective in preventing failures even in lots susceptible to pop-corning damage. The mechanism of failures is discussed, and recommendations for pre-soldering bake are suggested based on analysis of moisture characteristics of materials used in the capacitors' design.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lumsdaine, Andrew
2013-03-08
The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research with a goal of providing end-to-end fault tolerance on a systemwide basis for applications and other system software. While fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade, it has been treated mostly as a collection of isolated stovepipes. Visibility and response to faults has typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, allowing little flexibility or control on a system-wide basis, making it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. As an example, consider faults such as communication link failures that can be seen by a network library but are not directly visible to the job scheduler, or consider faults related to node failures that can be detected by system monitoring software but are not inherently visible to the resource manager. If information about such faults could be shared by the network libraries or monitoring software, then other system software, such as a resource manager or job scheduler, could ensure that failed nodes or failed network links were excluded from further job allocations and that further diagnosis could be performed. As a founding member and one of the lead developers of the Open MPI project, our efforts over the course of this project have been focused on making Open MPI more robust to failures by supporting various fault tolerance techniques, and using fault information exchange and coordination between MPI and the HPC system software stack from the application, numeric libraries, and programming language runtime to other common system components such as job schedulers, resource managers, and monitoring tools.
An Evidence Theoretic Approach to Design of Reliable Low-Cost UAVs
2009-07-28
given period. For complex systems with various stages of missions, “success” becomes hard to define. For a UAV, for example, is success defined as...For this reason, the proposed methods in this thesis investigate probability of failure (PoF) rather than probability of success. Further, failure will...reduction in system PoF. Figure 25 illustrates this; a single component 43 (A) from the original system (Figure 25a) is modified to act in a subsystem with
On the estimation of risk associated with an attenuation prediction
NASA Technical Reports Server (NTRS)
Crane, R. K.
1992-01-01
Viewgraphs from a presentation on the estimation of risk associated with an attenuation prediction are presented. Topics covered include: link failure - attenuation exceeding a specified threshold for a specified time interval or intervals; risk - the probability of one or more failures during the lifetime of the link or during a specified accounting interval; the problem - modeling the probability of attenuation by rainfall to provide a prediction of the attenuation threshold for a specified risk; and an accounting for the inadequacy of a model or models.
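The definition of risk given above (the probability of one or more failures during the link lifetime) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each accounting interval fails independently with the same probability; the function name and the numbers are hypothetical and do not come from the presentation.

```python
def risk_of_any_failure(p_interval: float, n_intervals: int) -> float:
    """Probability of at least one link failure over n_intervals.

    Assumes (illustrative only) independent intervals that each fail
    with the same probability p_interval.
    """
    if not 0.0 <= p_interval <= 1.0:
        raise ValueError("p_interval must lie in [0, 1]")
    if n_intervals < 0:
        raise ValueError("n_intervals must be non-negative")
    # Complement of "no failure in any interval".
    return 1.0 - (1.0 - p_interval) ** n_intervals


# Hypothetical numbers: 1% failure probability per year, 10-year lifetime.
lifetime_risk = risk_of_any_failure(0.01, 10)
```

Under these assumptions the lifetime risk is always at least as large as the per-interval probability, which is why a threshold chosen for a single interval may be too optimistic for the whole lifetime.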
DOE Office of Scientific and Technical Information (OSTI.GOV)
Helton, Jon C.; Brooks, Dusty Marie; Sallaberry, Cedric Jean-Marie.
Probability of loss of assured safety (PLOAS) is modeled for weak link (WL)/strong link (SL) systems in which one or more WLs or SLs could potentially degrade into a precursor condition to link failure that will be followed by an actual failure after some amount of elapsed time. The following topics are considered: (i) Definition of precursor occurrence time cumulative distribution functions (CDFs) for individual WLs and SLs, (ii) Formal representation of PLOAS with constant delay times, (iii) Approximation and illustration of PLOAS with constant delay times, (iv) Formal representation of PLOAS with aleatory uncertainty in delay times, (v) Approximation and illustration of PLOAS with aleatory uncertainty in delay times, (vi) Formal representation of PLOAS with delay times defined by functions of link properties at occurrence times for failure precursors, (vii) Approximation and illustration of PLOAS with delay times defined by functions of link properties at occurrence times for failure precursors, and (viii) Procedures for the verification of PLOAS calculations for the three indicated definitions of delayed link failure.
Fishnet statistics for probabilistic strength and scaling of nacreous imbricated lamellar materials
NASA Astrophysics Data System (ADS)
Luo, Wen; Bažant, Zdeněk P.
2017-12-01
Similar to nacre (or brick masonry), imbricated (or staggered) lamellar structures are widely found in nature and man-made materials, and are of interest for biomimetics. They can achieve high defect insensitivity and fracture toughness, as demonstrated in previous studies. But the probability distribution with a realistic far-left tail is apparently unknown. Here, strictly for statistical purposes, the microstructure of nacre is approximated by a diagonally pulled fishnet with quasibrittle links representing the shear bonds between parallel lamellae (or platelets). The probability distribution of fishnet strength is calculated as a sum of a rapidly convergent series of the failure probabilities after the rupture of one, two, three, etc., links. Each of them represents a combination of joint probabilities and of additive probabilities of disjoint events, modified near the zone of failed links by the stress redistributions caused by previously failed links. Based on previous nano- and multi-scale studies at Northwestern, the strength distribution of each link, characterizing the interlamellar shear bond, is assumed to be a Gauss-Weibull graft, but with a deeper Weibull tail than in Type 1 failure of non-imbricated quasibrittle materials. The autocorrelation length is considered equal to the link length. The size of the zone of failed links at maximum load increases with the coefficient of variation (CoV) of link strength, and also with fishnet size. With an increasing width-to-length aspect ratio, a rectangular fishnet gradually transits from the weakest-link chain to the fiber bundle, as the limit cases. The fishnet strength at failure probability 10^-6 grows with the width-to-length ratio. For a square fishnet boundary, the strength at 10^-6 failure probability is about 11% higher, while at fixed load the failure probability is about 25 times lower than it is for the non-imbricated case.
This is a major safety advantage of the fishnet architecture over particulate or fiber reinforced materials. There is also a strong size effect, partly similar to that of Type 1 while the curves of log-strength versus log-size for different sizes could cross each other. The predicted behavior is verified by about a million Monte Carlo simulations for each of many fishnet geometries, sizes and CoVs of link strength. In addition to the weakest-link or fiber bundle, the fishnet becomes the third analytically tractable statistical model of structural strength, and has the former two as limit cases.
WinDAM C earthen embankment internal erosion analysis software
USDA-ARS's Scientific Manuscript database
Two primary causes of dam failure are overtopping and internal erosion. For the purpose of evaluating dam safety for existing earthen embankment dams and proposed earthen embankment dams, Windows Dam Analysis Modules C (WinDAM C) software will simulate either internal erosion or erosion resulting f...
14 CFR 35.23 - Propeller control system.
Code of Federal Regulations, 2013 CFR
2013-01-01
... between operating modes, performs the functions defined by the applicant throughout the declared operating... system imbedded software must be designed and implemented by a method approved by the Administrator that... software errors. (d) The propeller control system must be designed and constructed so that the failure or...
14 CFR 35.23 - Propeller control system.
Code of Federal Regulations, 2012 CFR
2012-01-01
... between operating modes, performs the functions defined by the applicant throughout the declared operating... system imbedded software must be designed and implemented by a method approved by the Administrator that... software errors. (d) The propeller control system must be designed and constructed so that the failure or...
14 CFR 35.23 - Propeller control system.
Code of Federal Regulations, 2014 CFR
2014-01-01
... between operating modes, performs the functions defined by the applicant throughout the declared operating... system imbedded software must be designed and implemented by a method approved by the Administrator that... software errors. (d) The propeller control system must be designed and constructed so that the failure or...
NASA Astrophysics Data System (ADS)
Popov, V. D.; Khamidullina, N. M.
2006-10-01
In developing radio-electronic devices (RED) of spacecraft operating in the fields of ionizing radiation in space, one of the most important problems is the correct estimation of their radiation tolerance. The “weakest link” in the element base of onboard microelectronic devices under radiation effect is the integrated microcircuits (IMC), especially of large scale (LSI) and very large scale (VLSI) degree of integration. The main characteristic of IMC, which is taken into account when making decisions on using some particular type of IMC in the onboard RED, is the probability of non-failure operation (NFO) at the end of the spacecraft’s lifetime. It should be noted that, until now, the NFO has been calculated only from the reliability characteristics, disregarding the radiation effect. This paper presents the so-called “reliability” approach to determination of radiation tolerance of IMC, which allows one to estimate the probability of non-failure operation of various types of IMC with due account of radiation-stimulated dose failures. The described technique is applied to RED onboard the Spektr-R spacecraft to be launched in 2007.
14 CFR 25.729 - Retracting mechanism.
Code of Federal Regulations, 2014 CFR
2014-01-01
... design take-off weight), occurring during retraction and extension at any airspeed up to 1.5 VSR1 (with... of— (1) Any reasonably probable failure in the normal retraction system; or (2) The failure of any...
14 CFR 25.729 - Retracting mechanism.
Code of Federal Regulations, 2013 CFR
2013-01-01
... design take-off weight), occurring during retraction and extension at any airspeed up to 1.5 VSR1 (with... of— (1) Any reasonably probable failure in the normal retraction system; or (2) The failure of any...
Software Considerations for Subscale Flight Testing of Experimental Control Laws
NASA Technical Reports Server (NTRS)
Murch, Austin M.; Cox, David E.; Cunningham, Kevin
2009-01-01
The NASA AirSTAR system has been designed to address the challenges associated with safe and efficient subscale flight testing of research control laws in adverse flight conditions. In this paper, software elements of this system are described, with an emphasis on components which allow for rapid prototyping and deployment of aircraft control laws. Through model-based design and automatic coding a common code-base is used for desktop analysis, piloted simulation and real-time flight control. The flight control system provides the ability to rapidly integrate and test multiple research control laws and to emulate component or sensor failures. Integrated integrity monitoring systems provide aircraft structural load protection, isolate the system from control algorithm failures, and monitor the health of telemetry streams. Finally, issues associated with software configuration management and code modularity are briefly discussed.
Failure Surfaces for the Design of Ceramic-Lined Gun Tubes
2004-12-01
density than steel making them attractive candidates as gun tube liners. A new design approach is necessary to address the large variability in strength...systems. Having established the failure criterion for the ceramic liner as the Weibull probability of failure, the need for a suitable failure...Report AMMRC SP-82-1, Materials Technology Laboratory, Watertown, Massachusetts, 1982. 7 R. Katz, Ceramic Gun Barrel Liners: Retrospect and Prospect
NASA Technical Reports Server (NTRS)
Wilmot, Jonathan
2005-01-01
The contents include the following: High availability. Hardware is in a harsh environment. Flight processors (constraints) vary widely due to power and weight constraints. Software must be remotely modifiable and still operate while changes are being made. Many custom one-of-a-kind interfaces for one-of-a-kind missions. Sustaining engineering. Price of failure is high, tens to hundreds of millions of dollars.
Mechanistic Considerations Used in the Development of the PROFIT PCI Failure Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pankaskie, P. J.
A fuel Pellet-Zircaloy Cladding (thermo-mechanical-chemical) Interactions (PCI) failure model for estimating the probability of failure in transient increases in power (PROFIT) was developed. PROFIT is based on 1) standard statistical methods applied to available PCI fuel failure data and 2) a mechanistic analysis of the environmental and strain-rate-dependent stress versus strain characteristics of Zircaloy cladding. The statistical analysis of fuel failures attributable to PCI suggested that parameters in addition to power, transient increase in power, and burnup are needed to define PCI fuel failures in terms of probability estimates with known confidence limits. The PROFIT model, therefore, introduces an environmental and strain-rate dependent strain energy absorption to failure (SEAF) concept to account for the stress versus strain anomalies attributable to interstitial-dislocation interaction effects in the Zircaloy cladding. Assuming that the power ramping rate is the operating corollary of strain-rate in the Zircaloy cladding, then the variables of first order importance in the PCI fuel failure phenomenon are postulated to be: 1. pre-transient fuel rod power, P_I, 2. transient increase in fuel rod power, ΔP, 3. fuel burnup, Bu, and 4. the constitutive material property of the Zircaloy cladding, SEAF.
Ayas, Mouhab; Eapen, Mary; Le-Rademacher, Jennifer; Carreras, Jeanette; Abdel-Azim, Hisham; Alter, Blanche P.; Anderlini, Paolo; Battiwalla, Minoo; Bierings, Marc; Buchbinder, David K.; Bonfim, Carmem; Camitta, Bruce M.; Fasth, Anders L.; Gale, Robert Peter; Lee, Michelle A.; Lund, Troy C.; Myers, Kasiani C.; Olsson, Richard F.; Page, Kristin M.; Prestidge, Tim D.; Radhi, Mohamed; Shah, Ami J.; Schultz, Kirk R.; Wirk, Baldeep; Wagner, John E.; Deeg, H. Joachim
2015-01-01
Second allogeneic hematopoietic cell transplantation (HCT) is the only salvage option for those who develop graft failure after their first HCT. Data on outcomes after second HCT in Fanconi anemia (FA) are scarce. We report outcomes after second allogeneic HCT for FA (n=81). The indication for second HCT was graft failure after the first HCT. Transplants occurred between 1990 and 2012. The timing of second transplantation predicted subsequent graft failure and survival. Graft failure was high when the second transplant occurred less than 3 months from the first. The 3-month probability of graft failure was 69% when the interval between first and second transplant was less than 3 months compared to 23% when the interval was longer (p<0.001). Consequently, survival rates were substantially lower when the interval between first and second transplant was less than 3 months, 23% at 1-year compared to 58%, when the interval was longer (p=0.001). The corresponding 5-year probabilities of survival were 16% and 45%, respectively (p=0.006). Taken together, these data suggest that fewer than half of FA patients undergoing a second HCT for graft failure are long-term survivors. There is an urgent need to develop strategies to lower graft failure after first HCT. PMID:26116087
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thomas, Ken D.; Quinn, Edward L.; Mauck, Jerry L.
The nuclear industry has been slow to incorporate digital sensor technology into nuclear plant designs due to concerns with digital qualification issues. However, the benefits of digital sensor technology for nuclear plant instrumentation are substantial in terms of accuracy and reliability. This paper, which refers to a final report issued in 2013, demonstrates these benefits in direct comparisons of digital and analog sensor applications. Improved accuracy results from the superior operating characteristics of digital sensors. These include improvements in sensor accuracy and drift and other related parameters which reduce total loop uncertainty and thereby increase safety and operating margins. An example instrument loop uncertainty calculation for a pressure sensor application is presented to illustrate these improvements. This is a side-by-side comparison of the instrument loop uncertainty for both an analog and a digital sensor in the same pressure measurement application. Similarly, improved sensor reliability is illustrated with a sample calculation for determining the probability of failure on demand, an industry standard reliability measure. This looks at equivalent analog and digital temperature sensors to draw the comparison. The results confirm substantial reliability improvement with the digital sensor, due in large part to the ability to continuously monitor the health of a digital sensor such that problems can be immediately identified and corrected. This greatly reduces the likelihood of a latent failure condition of the sensor at the time of a design basis event. Notwithstanding the benefits of digital sensors, there are certain qualification issues that are inherent with digital technology and these are described in the report. One major qualification impediment for digital sensor implementation is software common cause failure (SCCF).
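The report's own sample calculation is not reproduced in this abstract. As a rough sketch of why faster detection of latent faults lowers the probability of failure on demand (PFD), the snippet below uses a common textbook model for a single channel: a constant rate of dangerous undetected failures and a fixed interval between checks that reveal them. The function name, failure rate, and intervals are hypothetical.

```python
import math


def average_pfd(failure_rate_per_h: float, check_interval_h: float) -> float:
    """Time-averaged probability that a single channel is in a latent
    failed state, assuming a constant failure rate and that every check
    restores the channel to as-good-as-new (illustrative model only).

    This is the mean of 1 - exp(-rate * t) over one check interval; for
    small rate * interval it is close to rate * interval / 2.
    """
    x = failure_rate_per_h * check_interval_h
    if x < 0:
        raise ValueError("rate and interval must be non-negative")
    if x == 0:
        return 0.0
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return 1.0 - (-math.expm1(-x)) / x


# Hypothetical comparison: yearly proof test vs. a daily health check.
pfd_yearly = average_pfd(1e-6, 8760.0)
pfd_daily = average_pfd(1e-6, 24.0)
```

With the same assumed failure rate, shortening the interval between checks reduces the averaged PFD roughly in proportion, which is the mechanism the abstract attributes to continuous health monitoring.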
NASA Astrophysics Data System (ADS)
Jackson, Andrew
2015-07-01
On launch, one of Swarm's absolute scalar magnetometers (ASMs) failed to function, leaving an asymmetrical arrangement of redundant spares on different spacecraft. A decision was required concerning the deployment of individual satellites into the low-orbit pair or the higher "lonely" orbit. I analyse the probabilities for successful operation of two of the science components of the Swarm mission in terms of a classical probabilistic failure analysis, with a view to concluding a favourable assignment for the satellite with the single working ASM. I concentrate on the following two science aspects: the east-west gradiometer aspect of the lower pair of satellites and the constellation aspect, which requires a working ASM in each of the two orbital planes. I use the so-called "expert solicitation" probabilities for instrument failure solicited from Mission Advisory Group (MAG) members. My conclusion from the analysis is that it is better to have redundancy of ASMs in the lonely satellite orbit. Although the opposite scenario, having redundancy (and thus four ASMs) in the lower orbit, increases the chance of a working gradiometer late in the mission, it does so at the expense of a likely constellation. Although the results are presented based on actual MAG members' probabilities, the results are rather generic, excepting the case when the probability of individual ASM failure is very small; in this case, any arrangement will ensure a successful mission since there is essentially no failure expected at all. Since the very design of the lower pair is to enable common mode rejection of external signals, it is likely that its work can be successfully achieved during the first 5 years of the mission.
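The trade-off described above can be sketched with elementary probability. The model below is an assumption made for illustration, not the paper's calculation: every ASM fails independently with the same probability p, the gradiometer needs a working ASM on both lower satellites, and the constellation needs at least one working ASM in the lower plane and one on the lonely satellite. The function names and the value of p are hypothetical.

```python
def single_asm_in_lower_pair(p: float) -> dict:
    """Lower pair carries 1 + 2 ASMs; lonely satellite carries 2."""
    return {
        "gradiometer": (1 - p) * (1 - p**2),
        "constellation": (1 - p**3) * (1 - p**2),
    }


def single_asm_in_lonely_orbit(p: float) -> dict:
    """Lower pair carries 2 + 2 ASMs; lonely satellite carries 1."""
    return {
        "gradiometer": (1 - p**2) ** 2,
        "constellation": (1 - p**4) * (1 - p),
    }


p = 0.3  # hypothetical per-ASM failure probability
redundant_lonely = single_asm_in_lower_pair(p)
redundant_lower = single_asm_in_lonely_orbit(p)
```

Under these assumptions, putting the redundancy in the lower pair favours the gradiometer while putting it on the lonely satellite favours the constellation, and both arrangements approach certainty of success as p goes to zero, in line with the qualitative statements in the abstract.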
NASA Astrophysics Data System (ADS)
Iwakoshi, Takehisa; Hirota, Osamu
2014-10-01
This study tests an interpretation in quantum key distribution (QKD) that the trace distance between the distributed quantum state and the ideal mixed state is a maximum failure probability of the protocol. Around 2004, this interpretation was proposed and standardized to satisfy both the key uniformity in the context of universal composability and the operational meaning of the failure probability of the key extraction. However, this proposal has not been verified concretely for many years, while H. P. Yuen and O. Hirota have cast doubt on this interpretation since 2009. To ascertain this interpretation, a physical random number generator was employed to evaluate key uniformity in QKD. In this way, we calculated the statistical distance, which corresponds to the trace distance in quantum theory after a quantum measurement is done, and then compared it with the failure probability to check whether universal composability was obtained. As a result, the statistical distance between the probability distribution of the physical random numbers and the ideal uniform distribution was very large. It is also explained why trace distance is not suitable to guarantee the security in QKD from the viewpoint of quantum binary decision theory.
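The statistical distance mentioned above is commonly taken to be the total variation distance between two discrete distributions. A minimal sketch follows, assuming samples are integers in a known finite range; the function name and sample data are hypothetical and the paper's own estimator is not given in the abstract.

```python
from collections import Counter


def statistical_distance_from_uniform(samples, n_outcomes: int) -> float:
    """Total variation distance between the empirical distribution of
    `samples` (integers in range(n_outcomes)) and the uniform
    distribution over n_outcomes values."""
    if not samples:
        raise ValueError("samples must be non-empty")
    counts = Counter(samples)
    n = len(samples)
    uniform = 1.0 / n_outcomes
    # Half the L1 distance between the two probability vectors.
    return 0.5 * sum(
        abs(counts.get(k, 0) / n - uniform) for k in range(n_outcomes)
    )


# A perfectly balanced sample has distance 0; a constant one does not.
balanced = statistical_distance_from_uniform([0, 1, 2, 3], 4)
constant = statistical_distance_from_uniform([0, 0, 0, 0], 2)
```

Note that for long keys the number of possible outcomes grows exponentially, so an empirical estimate of this kind is only practical for short blocks.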
Cycles till failure of silver-zinc cells with competing failure modes - Preliminary data analysis
NASA Technical Reports Server (NTRS)
Sidik, S. M.; Leibecki, H. F.; Bozek, J. M.
1980-01-01
The data analysis of cycles to failure of silver-zinc electrochemical cells with competing failure modes is presented. The test ran 129 cells through charge-discharge cycles until failure; preliminary data analysis consisted of response surface estimate of life. Batteries fail through low voltage condition and an internal shorting condition; a competing failure modes analysis was made using maximum likelihood estimation for the extreme value life distribution. Extensive residual plotting and probability plotting were used to verify data quality and selection of model.
NASA Technical Reports Server (NTRS)
Lovejoy, Andrew E.; Jegley, Dawn C. (Technical Monitor)
2007-01-01
Structures often comprise smaller substructures that are connected to each other or attached to the ground by a set of finite connections. Under static loading one or more of these connections may exceed allowable limits and be deemed to fail. Of particular interest is the structural response when a connection is severed (failed) while the structure is under static load. A transient failure analysis procedure was developed by which it is possible to examine the dynamic effects that result from introducing a discrete failure while a structure is under static load. The failure is introduced by replacing a connection load history by a time-dependent load set that removes the connection load at the time of failure. The subsequent transient response is examined to determine the importance of the dynamic effects by comparing the structural response with the appropriate allowables. Additionally, this procedure utilizes a standard finite element transient analysis that is readily available in most commercial software, permitting the study of dynamic failures without the need to purchase software specifically for this purpose. The procedure is developed and explained, demonstrated on a simple cantilever box example, and finally demonstrated on a real-world example, the American Airlines Flight 587 (AA587) vertical tail plane (VTP).
Fatigue Failure of External Hexagon Connections on Cemented Implant-Supported Crowns.
Malta Barbosa, João; Navarro da Rocha, Daniel; Hirata, Ronaldo; Freitas, Gileade; Bonfante, Estevam A; Coelho, Paulo G
2018-01-17
To evaluate the probability of survival and failure modes of different external hexagon connection systems restored with anterior cement-retained single-unit crowns. The postulated null hypothesis was that there would be no differences under accelerated life testing. Fifty-four external hexagon dental implants (∼4 mm diameter) were used for single cement-retained crown replacement and divided into 3 groups: (3i) Full OSSEOTITE, Biomet 3i (n = 18); (OL) OEX P4, Osseolife Implants (n = 18); and (IL) Unihex, Intra-Lock International (n = 18). Abutments were torqued to the implants, and maxillary central incisor crowns were cemented and subjected to step-stress-accelerated life testing in water. Use-level probability Weibull curves and probability of survival for a mission of 100,000 cycles at 200 N (95% 2-sided confidence intervals) were calculated. Stereo and scanning electron microscopes were used for failure inspection. The beta values for 3i, OL, and IL (1.60, 1.69, and 1.23, respectively) indicated that fatigue accelerated the failure of the 3 groups. Reliability for the 3i and OL (41% and 68%, respectively) was not different between each other, but both were significantly lower than IL group (98%). Abutment screw fracture was the failure mode consistently observed in all groups. Because the reliability was significantly different between the 3 groups, our postulated null hypothesis was rejected.
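For readers unfamiliar with the Weibull quantities quoted above, the two-parameter Weibull reliability function can be sketched as follows. Only the shape values (beta) come from the abstract; the characteristic life `eta` is not reported there, so any value passed in is hypothetical, and the study's step-stress analysis is not reproduced here.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull survival probability R(t) = exp(-(t/eta)**beta).

    beta > 1 means the failure rate increases with time (wear-out or
    fatigue); eta is the characteristic life at which R = exp(-1).
    """
    if t < 0 or beta <= 0 or eta <= 0:
        raise ValueError("t must be >= 0 and beta, eta must be > 0")
    return math.exp(-((t / eta) ** beta))


# beta = 1.60 is the value reported for group 3i; eta is a placeholder.
r_mission = weibull_reliability(100_000, beta=1.60, eta=100_000)
```

By definition, the reliability at t equal to eta is exp(-1), about 37%, regardless of beta, which gives a quick sanity check on fitted parameters.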
In the right order of brush strokes: a sketch of a software philosophy retrospective.
Pyshkin, Evgeny
2014-01-01
This paper follows a discourse on software recognized as a product of art and human creativity progressing probably for as long as software exists. A retrospective view on computer science and software philosophy development is introduced. In so doing we discover parallels between software and various branches of human creative manifestations. Aesthetic properties and mutual dependency of the form and matter of art works are examined in their application to software programs. While exploring some philosophical and even artistic reflection on software we consider extended comprehension of technical sciences of programming and software engineering within the realm of liberal arts.
NASA Astrophysics Data System (ADS)
Iskandar, I.
2018-03-01
The exponential distribution is the most widely used distribution in reliability analysis. This distribution is very suitable for representing the lengths of life in many cases and is available in a simple statistical form. The characteristic of this distribution is a constant hazard rate. The exponential distribution is a special case of the Weibull distribution. In this paper our effort is to introduce the basic notions that constitute an exponential competing risks model in reliability analysis using a Bayesian analysis approach and to present their analytic methods. The cases are limited to the models with independent causes of failure. A non-informative prior distribution is used in our analysis. This model describes the likelihood function, followed by the description of the posterior function and the estimations of the point, interval, hazard function, and reliability. The net probability of failure if only one specific risk is present, the crude probability of failure due to a specific risk in the presence of other causes, and partial crude probabilities are also included.
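The net and crude probabilities named above have simple closed forms when the causes are independent and exponential. The sketch below shows only those standard formulas with known rates; it does not implement the paper's Bayesian estimation, and the function names and rates are hypothetical.

```python
import math


def net_probability(rate: float, t: float) -> float:
    """Probability of failure by time t if only this one risk acted."""
    return 1.0 - math.exp(-rate * t)


def crude_probabilities(rates, t: float):
    """Probability of failing from each cause by time t when all
    independent exponential causes act together."""
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    p_any = 1.0 - math.exp(-total * t)  # failure from any cause
    # Each cause claims a share of failures proportional to its rate.
    return [rate / total * p_any for rate in rates]


rates = [0.001, 0.003]  # hypothetical hazard rates per hour
crude = crude_probabilities(rates, t=100.0)
net_first = net_probability(rates[0], t=100.0)
```

The crude probabilities sum to the overall failure probability, and each one is no larger than the corresponding net probability, because other causes can remove a unit before the cause of interest acts.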
Security Threat Assessment of an Internet Security System Using Attack Tree and Vague Sets
2014-01-01
Security threat assessment of the Internet security system has become a greater concern in recent years because of the progress and diversification of information technology. Traditionally, the failure probabilities of bottom events of an Internet security system are treated as exact values when the failure probability of the entire system is estimated. However, security threat assessment when the malfunction data of the system's elementary event are incomplete—the traditional approach for calculating reliability—is no longer applicable. Moreover, it does not consider the failure probability of the bottom events suffered in the attack, which may bias conclusions. In order to effectively solve the problem above, this paper proposes a novel technique, integrating attack tree and vague sets for security threat assessment. For verification of the proposed approach, a numerical example of an Internet security system security threat assessment is adopted in this paper. The result of the proposed method is compared with the listing approaches of security threat assessment methods. PMID:25405226
Cascading failures with local load redistribution in interdependent Watts-Strogatz networks
NASA Astrophysics Data System (ADS)
Hong, Chen; Zhang, Jun; Du, Wen-Bo; Sallan, Jose Maria; Lordan, Oriol
2016-05-01
Cascading failures of loads in isolated networks have been studied extensively over the last decade. Since 2010, such research has extended to interdependent networks. In this paper, we study cascading failures with local load redistribution in interdependent Watts-Strogatz (WS) networks. The effects of rewiring probability and coupling strength on the resilience of interdependent WS networks have been extensively investigated. It has been found that, for small values of the tolerance parameter, interdependent networks are more vulnerable as rewiring probability increases. For larger values of the tolerance parameter, the robustness of interdependent networks firstly decreases and then increases as rewiring probability increases. Coupling strength has a different impact on robustness. For low values of coupling strength, the resilience of interdependent networks decreases with the increment of the coupling strength until it reaches a certain threshold value. For values of coupling strength above this threshold, the opposite effect is observed. Our results are helpful to understand and design resilient interdependent networks.
Sequential experimental design based generalised ANOVA
NASA Astrophysics Data System (ADS)
Chakraborty, Souvik; Chowdhury, Rajib
2016-07-01
Over the last decade, surrogate modelling technique has gained wide popularity in the field of uncertainty quantification, optimization, model exploration and sensitivity analysis. This approach relies on experimental design to generate training points and regression/interpolation for generating the surrogate. In this work, it is argued that conventional experimental design may render a surrogate model inefficient. In order to address this issue, this paper presents a novel distribution adaptive sequential experimental design (DA-SED). The proposed DA-SED has been coupled with a variant of generalised analysis of variance (G-ANOVA), developed by representing the component function using the generalised polynomial chaos expansion. Moreover, generalised analytical expressions for calculating the first two statistical moments of the response, which are utilized in predicting the probability of failure, have also been developed. The proposed approach has been utilized in predicting probability of failure of three structural mechanics problems. It is observed that the proposed approach yields accurate and computationally efficient estimate of the failure probability.
Sequential experimental design based generalised ANOVA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chakraborty, Souvik, E-mail: csouvik41@gmail.com; Chowdhury, Rajib, E-mail: rajibfce@iitr.ac.in
Over the last decade, surrogate modelling technique has gained wide popularity in the field of uncertainty quantification, optimization, model exploration and sensitivity analysis. This approach relies on experimental design to generate training points and regression/interpolation for generating the surrogate. In this work, it is argued that conventional experimental design may render a surrogate model inefficient. In order to address this issue, this paper presents a novel distribution adaptive sequential experimental design (DA-SED). The proposed DA-SED has been coupled with a variant of generalised analysis of variance (G-ANOVA), developed by representing the component function using the generalised polynomial chaos expansion. Moreover, generalised analytical expressions for calculating the first two statistical moments of the response, which are utilized in predicting the probability of failure, have also been developed. The proposed approach has been utilized in predicting probability of failure of three structural mechanics problems. It is observed that the proposed approach yields accurate and computationally efficient estimate of the failure probability.
Security threat assessment of an Internet security system using attack tree and vague sets.
Chang, Kuei-Hu
2014-01-01
Security threat assessment of the Internet security system has become a greater concern in recent years because of the progress and diversification of information technology. Traditionally, the failure probabilities of bottom events of an Internet security system are treated as exact values when the failure probability of the entire system is estimated. However, when the malfunction data of the system's elementary events are incomplete, the traditional approach for calculating reliability is no longer applicable. Moreover, it does not consider the failure probability of the bottom events suffered in the attack, which may bias conclusions. To solve these problems, this paper proposes a novel technique that integrates attack trees and vague sets for security threat assessment. To verify the proposed approach, a numerical example of an Internet security system security threat assessment is presented. The result of the proposed method is compared with the listed approaches to security threat assessment.
Predicting Quarantine Failure Rates
2004-01-01
Preemptive quarantine through contact-tracing effectively controls emerging infectious diseases. Occasionally this quarantine fails, however, and infected persons are released. The probability of quarantine failure is typically estimated from disease-specific data. Here a simple, exact estimate of the failure rate is derived that does not depend on disease-specific parameters. This estimate is universally applicable to all infectious diseases. PMID:15109418
Markov chains for testing redundant software
NASA Technical Reports Server (NTRS)
White, Allan L.; Sjogren, Jon A.
1988-01-01
A preliminary design for a validation experiment has been developed that addresses several problems unique to assuring the extremely high quality of multiple-version programs in process-control software. The procedure uses Markov chains to model the error states of the multiple version programs. The programs are observed during simulated process-control testing, and estimates are obtained for the transition probabilities between the states of the Markov chain. The experimental Markov chain model is then expanded into a reliability model that takes into account the inertia of the system being controlled. The reliability of the multiple version software is computed from this reliability model at a given confidence level using confidence intervals obtained for the transition probabilities during the experiment. An example demonstrating the method is provided.
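The estimation step in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical three-state chain (0 = all versions correct, 1 = one version in error, 2 = system failure), estimates transition probabilities by counting observed transitions, and propagates the state distribution forward. The state labels, function names and observed sequence are illustrative assumptions, not taken from the report, and the confidence-interval and process-inertia steps it describes are omitted.

```python
from collections import Counter


def estimate_transitions(sequence, n_states):
    """Estimate a transition matrix by counting observed state-to-state moves."""
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # Assumption: a state never observed to be left is treated as absorbing.
            row = [1.0 if j == i else 0.0 for j in range(n_states)]
        else:
            row = [counts[(i, j)] / row_total for j in range(n_states)]
        matrix.append(row)
    return matrix


def failure_probability(matrix, start, failed, steps):
    """Probability of being in `failed` after `steps` transitions.

    This equals the probability of having failed within `steps`
    only when `failed` is an absorbing state.
    """
    n = len(matrix)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist[failed]


# Hypothetical error-state sequence observed during simulated testing.
observed = [0, 0, 1, 0, 0, 0, 1, 2]
P = estimate_transitions(observed, n_states=3)
print(failure_probability(P, start=0, failed=2, steps=2))
```

With so few observations the point estimates are very uncertain, which is why the report works with confidence intervals on the transition probabilities rather than point estimates alone.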
Sharland, Michael J; Waring, Stephen C; Johnson, Brian P; Taran, Allise M; Rusin, Travis A; Pattock, Andrew M; Palcher, Jeanette A
2018-01-01
Assessing test performance validity is a standard clinical practice and although studies have examined the utility of cognitive/memory measures, few have examined attention measures as indicators of performance validity beyond the Reliable Digit Span. The current study further investigates the classification probability of embedded Performance Validity Tests (PVTs) within the Brief Test of Attention (BTA) and the Conners' Continuous Performance Test (CPT-II), in a large clinical sample. This was a retrospective study of 615 patients consecutively referred for comprehensive outpatient neuropsychological evaluation. Non-credible performance was defined two ways: failure on one or more PVTs and failure on two or more PVTs. Classification probability of the BTA and CPT-II into non-credible groups was assessed. Sensitivity, specificity, positive predictive value, and negative predictive value were derived to identify clinically relevant cut-off scores. When using failure on two or more PVTs as the indicator for non-credible responding compared to failure on one or more PVTs, highest classification probability, or area under the curve (AUC), was achieved by the BTA (AUC = .87 vs. .79). CPT-II Omission, Commission, and Total Errors exhibited higher classification probability as well. Overall, these findings corroborate previous findings, extending them to a large clinical sample. BTA and CPT-II are useful embedded performance validity indicators within a clinical battery but should not be used in isolation without other performance validity indicators.
Progressive failure on the North Anatolian fault since 1939 by earthquake stress triggering
Stein, R.S.; Barka, A.A.; Dieterich, J.H.
1997-01-01
Ten M ≥ 6.7 earthquakes ruptured 1000 km of the North Anatolian fault (Turkey) during 1939-1992, providing an unsurpassed opportunity to study how one large shock sets up the next. We use the mapped surface slip and fault geometry to infer the transfer of stress throughout the sequence. Calculations of the change in Coulomb failure stress reveal that nine out of 10 ruptures were brought closer to failure by the preceding shocks, typically by 1-10 bar, equivalent to 3-30 years of secular stressing. We translate the calculated stress changes into earthquake probability gains using an earthquake-nucleation constitutive relation, which includes both permanent and transient effects of the sudden stress changes. The transient effects of the stress changes dominate during the mean 10 yr period between triggering and subsequent rupturing shocks in the Anatolia sequence. The stress changes result in an average three-fold gain in the net earthquake probability during the decade after each event. Stress is calculated to be high today at several isolated sites along the fault. During the next 30 years, we estimate a 15 per cent probability of a M ≥ 6.7 earthquake east of the major eastern centre of Erzincan, and a 12 per cent probability for a large event south of the major western port city of Izmit. Such stress-based probability calculations may thus be useful to assess and update earthquake hazards elsewhere.
Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs
2017-01-01
Prediction of RNA tertiary structure from sequence is an important problem, but generating accurate structure models for even short sequences remains difficult. Predictions of RNA tertiary structure tend to be least accurate in loop regions, where non-canonical pairs are important for determining the details of structure. Non-canonical pairs can be predicted using a knowledge-based model of structure that scores nucleotide cyclic motifs, or NCMs. In this work, a partition function algorithm is introduced that allows the estimation of base pairing probabilities for both canonical and non-canonical interactions. Pairs that are predicted to be probable are more likely to be found in the true structure than pairs of lower probability. Pair probability estimates can be further improved by predicting the structure conserved across multiple homologous sequences using the TurboFold algorithm. These pairing probabilities, used in concert with prior knowledge of the canonical secondary structure, allow accurate inference of non-canonical pairs, an important step towards accurate prediction of the full tertiary structure. Software to predict non-canonical base pairs and pairing probabilities is now provided as part of the RNAstructure software package. PMID:29107980
A Mathematics Software Database Update.
ERIC Educational Resources Information Center
Cunningham, R. S.; Smith, David A.
1987-01-01
Contains an update of an earlier listing of software for mathematics instruction at the college level. Topics are: advanced mathematics, algebra, calculus, differential equations, discrete mathematics, equation solving, general mathematics, geometry, linear and matrix algebra, logic, statistics and probability, and trigonometry. (PK)
The implementation and use of Ada on distributed systems with high reliability requirements
NASA Technical Reports Server (NTRS)
Knight, J. C.
1984-01-01
The use and implementation of Ada in distributed environments in which reliability is the primary concern is investigated. Emphasis is placed on the possibility that a distributed system may be programmed entirely in Ada so that the individual tasks of the system are unconcerned with which processors they are executing on, and that failures may occur in the software or underlying hardware. The primary activities are: (1) continued development and testing of our fault-tolerant Ada testbed; (2) consideration of desirable language changes to allow Ada to provide useful semantics for failure; (3) analysis of the inadequacies of existing software fault tolerance strategies.
Reliability analysis based on the losses from failures.
Todinov, M T
2006-04-01
The conventional reliability analysis is based on the premise that increasing the reliability of a system will decrease the losses from failures. On the basis of counterexamples, it is demonstrated that this is valid only if all failures are associated with the same losses. In case of failures associated with different losses, a system with larger reliability is not necessarily characterized by smaller losses from failures. Consequently, a theoretical framework and models are proposed for a reliability analysis, linking reliability and the losses from failures. Equations related to the distributions of the potential losses from failure have been derived. It is argued that the classical risk equation only estimates the average value of the potential losses from failure and does not provide insight into the variability associated with the potential losses. Equations have also been derived for determining the potential and the expected losses from failures for nonrepairable and repairable systems with components arranged in series, with arbitrary life distributions. The equations are also valid for systems/components with multiple mutually exclusive failure modes. The expected losses given failure is a linear combination of the expected losses from failure associated with the separate failure modes scaled by the conditional probabilities with which the failure modes initiate failure. On this basis, an efficient method for simplifying complex reliability block diagrams has been developed. Branches of components arranged in series whose failures are mutually exclusive can be reduced to single components with equivalent hazard rate, downtime, and expected costs associated with intervention and repair. A model for estimating the expected losses from early-life failures has also been developed. 
For a specified time interval, the expected losses from early-life failures are a sum of the products of the expected number of failures in the specified time intervals covering the early-life failures region and the expected losses given failure characterizing the corresponding time intervals. For complex systems whose components are not logically arranged in series, discrete simulation algorithms and software have been created for determining the losses from failures in terms of expected lost production time, cost of intervention, and cost of replacement. Different system topologies are assessed to determine the effect of modifications of the system topology on the expected losses from failures. It is argued that the reliability allocation in a production system should be done to maximize the profit/value associated with the system. Consequently, a method for setting reliability requirements and reliability allocation maximizing the profit by minimizing the total cost has been developed. Reliability allocation that maximizes the profit in case of a system consisting of blocks arranged in series is achieved by determining for each block individually the reliabilities of the components in the block that minimize the sum of the capital, operation costs, and the expected losses from failures. A Monte Carlo simulation based net present value (NPV) cash-flow model has also been proposed, which has significant advantages to cash-flow models based on the expected value of the losses from failures per time interval. Unlike these models, the proposed model has the capability to reveal the variation of the NPV due to different number of failures occurring during a specified time interval (e.g., during one year). The model also permits tracking the impact of the distribution pattern of failure occurrences and the time dependence of the losses from failures.
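The abstract's central claim, that a more reliable system does not necessarily have smaller losses from failures, can be illustrated with a small sketch. It uses the stated relation that the expected loss given failure is a sum of per-mode expected losses weighted by the conditional probabilities with which the modes initiate failure. The two systems, their probabilities and their loss figures are invented for illustration, and the one-shot (single mission) setting is a simplifying assumption rather than the paper's full time-dependent model.

```python
import math


def expected_loss_given_failure(modes):
    """modes: list of (conditional probability the mode initiates failure,
    expected loss for that mode). The probabilities must sum to 1."""
    total = sum(p for p, _ in modes)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError("conditional mode probabilities must sum to 1")
    return sum(p * loss for p, loss in modes)


def expected_loss(p_failure, modes):
    """Expected loss = P(failure) * E[loss | failure]."""
    return p_failure * expected_loss_given_failure(modes)


# Hypothetical systems: B is more reliable (lower failure probability),
# but its failures are dominated by the expensive failure mode.
system_a = (0.10, [(0.9, 1_000.0), (0.1, 10_000.0)])
system_b = (0.05, [(0.2, 1_000.0), (0.8, 10_000.0)])

loss_a = expected_loss(*system_a)
loss_b = expected_loss(*system_b)
print(loss_a, loss_b)
```

Under these assumed numbers, system B fails half as often as system A yet carries the larger expected loss, which is the kind of counterexample the abstract refers to.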
NASA Astrophysics Data System (ADS)
Zhong, Yaoquan; Guo, Wei; Jin, Yaohui; Sun, Weiqiang; Hu, Weisheng
2010-12-01
A cost-effective and service-differentiated provisioning strategy is very desirable to service providers so that they can offer users satisfactory services, while optimizing network resource allocation. Providing differentiated protection services to connections for surviving link failure has been extensively studied in recent years. However, the differentiated protection services for workflow-based applications, which consist of many interdependent tasks, have scarcely been studied. This paper investigates the problem of providing differentiated services for workflow-based applications in optical grid. In this paper, we develop three differentiated protection services provisioning strategies which can provide security level guarantee and network-resource optimization for workflow-based applications. The simulation demonstrates that these heuristic algorithms provide protection cost-effectively while satisfying the applications' failure probability requirements.
Real-time sensor data validation
NASA Technical Reports Server (NTRS)
Bickmore, Timothy W.
1994-01-01
This report describes the status of an on-going effort to develop software capable of detecting sensor failures on rocket engines in real time. This software could be used in a rocket engine controller to prevent the erroneous shutdown of an engine due to sensor failures which would otherwise be interpreted as engine failures by the control software. The approach taken combines analytical redundancy with Bayesian belief networks to provide a solution which has well defined real-time characteristics and well-defined error rates. Analytical redundancy is a technique in which a sensor's value is predicted by using values from other sensors and known or empirically derived mathematical relations. A set of sensors and a set of relations among them form a network of cross-checks which can be used to periodically validate all of the sensors in the network. Bayesian belief networks provide a method of determining if each of the sensors in the network is valid, given the results of the cross-checks. This approach has been successfully demonstrated on the Technology Test Bed Engine at the NASA Marshall Space Flight Center. Current efforts are focused on extending the system to provide a validation capability for 100 sensors on the Space Shuttle Main Engine.
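As a rough illustration of how cross-check results could be turned into a belief about one sensor, the sketch below applies Bayes' rule to a list of pass/fail cross-checks. It is a naive simplification that assumes the checks are conditionally independent; it is not the report's Bayesian belief network, and the prior, the likelihoods, the tolerance and the function names are all assumed values chosen for illustration.

```python
def cross_check_failed(measured, predicted, tolerance):
    """A cross-check fails when the measured value disagrees with the value
    predicted from other sensors by more than the tolerance."""
    return abs(measured - predicted) > tolerance


def posterior_sensor_failed(check_results, prior=0.01,
                            p_fail_if_bad=0.9, p_fail_if_good=0.05):
    """Posterior probability that the sensor is faulty given cross-check results.

    check_results: iterable of booleans, True where a cross-check failed.
    All probabilities here are assumed, illustrative values.
    """
    like_bad = 1.0
    like_good = 1.0
    for failed in check_results:
        like_bad *= p_fail_if_bad if failed else 1.0 - p_fail_if_bad
        like_good *= p_fail_if_good if failed else 1.0 - p_fail_if_good
    numerator = prior * like_bad
    return numerator / (numerator + (1.0 - prior) * like_good)


# Hypothetical readings: one sensor compared against three predictions.
checks = [
    cross_check_failed(100.0, 103.0, tolerance=2.0),
    cross_check_failed(100.0, 104.5, tolerance=2.0),
    cross_check_failed(100.0, 97.0, tolerance=2.0),
]
print(posterior_sensor_failed(checks))
```

In a real network a failed cross-check implicates every sensor in the relation, so the beliefs for all sensors have to be updated jointly; that is the part a belief network handles and this sketch does not.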
NASA Technical Reports Server (NTRS)
Quintana, Rolando
2003-01-01
The goal of this research was to integrate a previously validated and reliable safety model, called Continuous Hazard Tracking and Failure Prediction Methodology (CHTFPM), into a software application. This led to the development of a safety management information system (PSMIS). Because the theory and principles of the CHTFPM were incorporated in a software package, the PSMIS is referred to as the CHTFPM management information system (CHTFPM MIS). The purpose of the PSMIS is to reduce the time and manpower required to perform predictive studies and to facilitate the handling of the enormous quantities of information involved in this type of study. The CHTFPM theory looks at safety engineering from a proactive, rather than a reactive, viewpoint: corrective measures are taken before a problem occurs instead of after it has happened. The CHTFPM is therefore a predictive safety methodology: it anticipates accidents, system failures and unacceptable risks so that corrective action can be taken to prevent them. Consequently, the safety and reliability of systems or processes can be further improved by taking proactive and timely corrective actions.
Operationalizing Cyberspace for Today’s Combat Air Force
2010-04-01
rootkit techniques to run inside common Windows services (sometimes bundled with fake antivirus software) or in Windows safe mode, and it can hide...has shifted to downloading other malware, with its main focus on fake alerts and rogue antivirus software. 5. TR/Dldr.Agent.JKH - Compromised U.S...patch, software update, or security breach away from failure. In short, what works today, may not work tomorrow; this fact
Patel, Teresa; Fisher, Stanley P.
2016-01-01
Objective: This study aimed to utilize failure modes and effects analysis (FMEA) to transform clinical insights into a risk mitigation plan for intrathecal (IT) drug delivery in pain management. Methods: The FMEA methodology, which has been used for quality improvement, was adapted to assess risks (i.e., failure modes) associated with IT therapy. Ten experienced pain physicians scored 37 failure modes in the following categories: patient selection for therapy initiation (efficacy and safety concerns), patient safety during IT therapy, and product selection for IT therapy. Participants assigned severity, probability, and detection scores for each failure mode, from which a risk priority number (RPN) was calculated. Failure modes with the highest RPNs (i.e., most problematic) were discussed, and strategies were proposed to mitigate risks. Results: Strategic discussions focused on 17 failure modes with the most severe outcomes, the highest probabilities of occurrence, and the most challenging detection. The topic of the highest-ranked failure mode (RPN = 144) was manufactured monotherapy versus compounded combination products. Addressing failure modes associated with appropriate patient and product selection was predicted to be clinically important for the success of IT therapy. Conclusions: The methodology of FMEA offers a systematic approach to prioritizing risks in a complex environment such as IT therapy. Unmet needs and information gaps are highlighted through the process. Risk mitigation and strategic planning to prevent and manage critical failure modes can contribute to therapeutic success. PMID:27477689
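The scoring step can be sketched as follows. The abstract does not state how the RPN was computed from the three scores; the sketch assumes the conventional FMEA definition, RPN = severity × probability × detection. The failure-mode names and scores below are invented placeholders, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int      # higher = more severe outcome
    probability: int   # higher = more likely to occur
    detection: int     # higher = harder to detect

    @property
    def rpn(self) -> int:
        # Conventional FMEA risk priority number (assumed definition).
        return self.severity * self.probability * self.detection


def rank_by_rpn(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical failure modes with invented scores.
modes = [
    FailureMode("hypothetical mode A", severity=7, probability=5, detection=4),
    FailureMode("hypothetical mode B", severity=9, probability=2, detection=4),
    FailureMode("hypothetical mode C", severity=5, probability=5, detection=5),
]

for mode in rank_by_rpn(modes):
    print(mode.name, mode.rpn)
```

Because the RPN is a product of ordinal scores, it is best read as a ranking aid for discussion, which is how the abstract describes its use.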
Software verification plan for GCS. [guidance and control software
NASA Technical Reports Server (NTRS)
Dent, Leslie A.; Shagnea, Anita M.; Hayhurst, Kelly J.
1990-01-01
This verification plan is written as part of an experiment designed to study the fundamental characteristics of the software failure process. The experiment will be conducted using several implementations of software that were produced according to industry-standard guidelines, namely the Radio Technical Commission for Aeronautics RTCA/DO-178A guidelines, Software Consideration in Airborne Systems and Equipment Certification, for the development of flight software. This plan fulfills the DO-178A requirements for providing instructions on the testing of each implementation of software. The plan details the verification activities to be performed at each phase in the development process, contains a step by step description of the testing procedures, and discusses all of the tools used throughout the verification process.
Software For Fault-Tree Diagnosis Of A System
NASA Technical Reports Server (NTRS)
Iverson, Dave; Patterson-Hine, Ann; Liao, Jack
1993-01-01
Fault Tree Diagnosis System (FTDS) computer program is automated-diagnostic-system program identifying likely causes of specified failure on basis of information represented in system-reliability mathematical models known as fault trees. Is modified implementation of failure-cause-identification phase of Narayanan's and Viswanadham's methodology for acquisition of knowledge and reasoning in analyzing failures of systems. Knowledge base of if/then rules replaced with object-oriented fault-tree representation. Enhancement yields more-efficient identification of causes of failures and enables dynamic updating of knowledge base. Written in C language, C++, and Common LISP.
The Identification of Software Failure Regions
1990-06-01
be used to detect non-obviously redundant test cases. A preliminary examination of the manual analysis method is performed with a set of programs ...failure regions are defined and a method of failure region analysis is described in detail. The thesis describes how this analysis may be used to detect...is the termination of the ability of a functional unit to perform its required function. (Glossary, 1983) The presence of faults in program code
More About Software for No-Loss Computing
NASA Technical Reports Server (NTRS)
Edmonds, Iarina
2007-01-01
A document presents some additional information on the subject matter of "Integrated Hardware and Software for No-Loss Computing" (NPO-42554), which appears elsewhere in this issue of NASA Tech Briefs. To recapitulate: The hardware and software designs of a developmental parallel computing system are integrated to effectuate a concept of no-loss computing (NLC). The system is designed to reconfigure an application program such that it can be monitored in real time and further reconfigured to continue a computation in the event of failure of one of the computers. The design provides for (1) a distributed class of NLC computation agents, denoted introspection agents, that effects hierarchical detection of anomalies; (2) enhancement of the compiler of the parallel computing system to cause generation of state vectors that can be used to continue a computation in the event of a failure; and (3) activation of a recovery component when an anomaly is detected.
Browndye: A Software Package for Brownian Dynamics
McCammon, J. Andrew
2010-01-01
A new software package, Browndye, is presented for simulating the diffusional encounter of two large biological molecules. It can be used to estimate second-order rate constants and encounter probabilities, and to explore reaction trajectories. Browndye builds upon previous knowledge and algorithms from software packages such as UHBD, SDA, and Macrodox, while implementing algorithms that scale to larger systems. PMID:21132109
Software requirements for the study of contextual classifiers and label imperfections
NASA Technical Reports Server (NTRS)
Chittineni, C. B.
1979-01-01
The software requirements for the study of contextual classifiers and imperfections in the labels are presented. In particular, the requirements are described for updating the posteriori probability of the picture element under consideration using information from its local neighborhood, designing the Fisher classifier, and other required routines. Only the necessary equations are given for the development of software.
Reliability and availability analysis of a 10 kW@20 K helium refrigerator
NASA Astrophysics Data System (ADS)
Li, J.; Xiong, L. Y.; Liu, L. Q.; Wang, H. R.; Wang, B. M.
2017-02-01
A 10 kW@20 K helium refrigerator has been established in the Technical Institute of Physics and Chemistry, Chinese Academy of Sciences. To evaluate and improve this refrigerator’s reliability and availability, a reliability and availability analysis is performed. According to the mission profile of this refrigerator, a functional analysis is performed. The failure data of the refrigerator components are collected and failure rate distributions are fitted by software Weibull++ V10.0. A Failure Modes, Effects & Criticality Analysis (FMECA) is performed and the critical components with higher risks are pointed out. Software BlockSim V9.0 is used to calculate the reliability and the availability of this refrigerator. The result indicates that compressors, turbine and vacuum pump are the critical components and the key units of this refrigerator. The mitigation actions with respect to design, testing, maintenance and operation are proposed to decrease those major and medium risks.
ERIC Educational Resources Information Center
Radakovic, Nenad; McDougall, Douglas
2012-01-01
This classroom note illustrates how dynamic visualization can be used to teach conditional probability and Bayes' theorem. There are two features of the visualization that make it an ideal pedagogical tool in probability instruction. The first feature is the use of area-proportional Venn diagrams that, along with showing qualitative relationships,…
A Numerical Round Robin for the Reliability Prediction of Structural Ceramics
NASA Technical Reports Server (NTRS)
Powers, Lynn M.; Janosik, Lesley A.
1993-01-01
A round robin has been conducted on integrated fast fracture design programs for brittle materials. An informal working group (WELFEP-WEakest Link failure probability prediction by Finite Element Postprocessors) was formed to discuss and evaluate the implementation of the programs examined in the study. Results from the study have provided insight on the differences between the various programs examined. Conclusions from the study have shown that when brittle materials are used in design, analysts must understand how to apply the concepts presented herein to failure probability analysis.
ERIC Educational Resources Information Center
Adnan, Nor Hafizah; Ritzhaupt, Albert D.
2018-01-01
The failure of many instructional design initiatives is often attributed to poor instructional design. Current instructional design models do not provide much insight into design processes for creating e-learning instructional solutions. Given the similarities between the fields of instructional design and software engineering, instructional…
A Constrained and Guided Approach for Managing Software Engineering Course Projects
ERIC Educational Resources Information Center
Cheng, Y.-P.; Lin, J. M.-C.
2010-01-01
This paper documents several years of experimentation with a new approach to organizing and managing projects in a software engineering course. The initial failure and subsequent refinements that the new approach has been through since 2004 are described herein. The "constrained and guided" approach, as it is called, has helped to reduce…
NASA Astrophysics Data System (ADS)
Gurov, V. V.
2017-01-01
Software tools for educational purposes, such as e-lessons and computer-based testing systems, have a number of distinctive features from the point of view of reliability. The main ones are the need to ensure a sufficiently high probability of faultless operation for a specified time, and the impossibility of rapid recovery by replacing a failed program with a similar running program during classes. The article considers the peculiarities of evaluating the reliability of programs in contrast to assessments of hardware reliability. The basic requirements for the reliability of software used for practical and laboratory classes in the form of computer-based training programs are given. A mathematical tool based on Markov chains is presented; applied to the graph of interactions among software modules, it makes it possible to determine the degree of debugging of a training program intended for use in the educational process.
Effects of pressure angle and tip relief on the life of speed increasing gearbox: a case study.
Shanmugasundaram, Sankar; Kumaresan, Manivarma; Muthusamy, Nataraj
2014-01-01
This paper examines the failure of a helical gear in a speed-increasing gearbox used in a wind turbine generator (WTG). In addition, an attempt has been made to find suitable gear micro-geometry, such as pressure angle and tip relief, to minimize gear failure in wind turbines. Because the gear trains in a wind turbine gearbox are arranged with a high speed ratio and the gearboxes experience shock loads due to atmospheric turbulence, gusts, non-synchronization of pitching, frequent grid drops and braking failures, gear failure occurs in either the intermediate or the high-speed stage pinion. KISSsoft gear calculation software was used to determine the gear specifications, and analysis was carried out in ANSYS version 11.0 for the existing and the proposed gear to evaluate bending stress, tooth deflection and stiffness. The main objective of this research is to propose a suitable gear micro-geometry, that is, a blend of tip relief and pressure angle, that increases the tooth strength of the helical gear used in the wind turbine for trouble-free operation.
Robotic and Human-Tended Collaborative Drilling Automation for Subsurface Exploration
NASA Technical Reports Server (NTRS)
Glass, Brian; Cannon, Howard; Stoker, Carol; Davis, Kiel
2005-01-01
Future in-situ lunar/martian resource utilization and characterization, as well as the scientific search for life on Mars, will require access to the subsurface and hence drilling. Drilling on Earth is hard - an art form more than an engineering discipline. Human operators listen and feel drill string vibrations coming from kilometers underground. Abundant mass and energy make it possible for terrestrial drilling to employ brute-force approaches to failure recovery and system performance issues. Space drilling will require intelligent and autonomous systems for robotic exploration and to support human exploration. Eventual in-situ resource utilization will require deep drilling with probable human-tended operation of large-bore drills, but initial lunar subsurface exploration and near-term ISRU will be accomplished with lightweight, rover-deployable or standalone drills capable of penetrating a few tens of meters in depth. These lightweight exploration drills have a direct counterpart in terrestrial prospecting and ore-body location, and will be designed to operate either human-tended or automated. NASA and industry now are acquiring experience in developing and building low-mass automated planetary prototype drills to design and build a pre-flight lunar prototype targeted for 2011-12 flight opportunities. A successful system will include development of drilling hardware, and automated control software to operate it safely and effectively. This includes control of the drilling hardware, state estimation of both the hardware and the lithology being drilled and state of the hole, and potentially planning and scheduling software suitable for uncertain situations such as drilling. Given that humans on the Moon or Mars are unlikely to be able to spend protracted EVA periods at a drill site, both human-tended and robotic access to planetary subsurfaces will require some degree of standalone, autonomous drilling capability.
Human-robotic coordination will be important, either between a robotic drill and humans on Earth, or a human-tended drill and its visiting crew. The Mars Analog Rio Tinto Experiment (MARTE) is a current project that studies and simulates the remote science operations between an automated drill in Spain and a distant, distributed human science team. The Drilling Automation for Mars Exploration (DAME) project, by contrast, is developing and testing standalone automation at a lunar/martian impact crater analog site in Arctic Canada. The drill hardware in both projects is a hardened, evolved version of the Advanced Deep Drill (ADD) developed by Honeybee Robotics for the Mars Subsurface Program. The current ADD is capable of 20m, and the DAME project is developing diagnostic and executive software for hands-off surface operations of the evolved version of this drill. The current drill automation architecture being developed by NASA and tested in 2004-06 at analog sites in the Arctic and Spain will add downhole diagnosis of different strata, bit wear detection, and dynamic replanning capabilities when unexpected failures or drilling conditions are discovered in conjunction with simulated mission operations and remote science planning. The most important determinant of future lunar and martian drilling automation and staffing requirements will be the actual performance of automated prototype drilling hardware systems in field trials in simulated mission operations. It is difficult to accurately predict the level of automation and human interaction that will be needed for a lunar-deployed drill without first having extensive experience with the robotic control of prototype drill systems under realistic analog field conditions. Drill-specific failure modes and software design flaws will become most apparent at this stage. DAME will develop and test drill automation software and hardware under stressful operating conditions during several planned field campaigns.
Initial results from summer 2004 tests show seven identified distinct failure modes of the drill, cuttings-removal issues with low-power drilling into permafrost, and successful steps at executive control and initial automation.
A Probability Problem from Real Life: The Tire Exploded.
ERIC Educational Resources Information Center
Bartlett, Albert A.
1993-01-01
Discusses the probability of seeing a tire explode or disintegrate while traveling down the highway. Suggests that a person observing 10 hours a day would see a failure on the average of once every 300 years. (MVL)
NASA Astrophysics Data System (ADS)
Gu, Jian; Lei, YongPing; Lin, Jian; Fu, HanGuang; Wu, Zhongwei
2017-02-01
The reliability of Sn-3.0Ag-0.5Cu (SAC 305) solder joints under a broad range of drop impacts was studied. The failure performance of the solder joint, failure probability and failure position were analyzed under two shock test conditions, i.e., 1000 g for 1 ms and 300 g for 2 ms. The stress distribution on the solder joint was calculated by ABAQUS. The results revealed that the dominant cause of failure was the tension due to the difference in stiffness between the printed circuit board and the ball grid array. The maximum tension, 121.1 MPa and 31.1 MPa under the 1000 g and 300 g drop impacts, respectively, was concentrated at the corner of the solder joint located in the outermost corner of the solder ball row. The failure modes were summarized into the following four modes: initiation and propagation through the (1) intermetallic compound layer, (2) Ni layer, (3) Cu pad, or (4) Sn-matrix. The outermost corner of the solder ball row had a high failure probability under both 1000 g and 300 g drop impact. The number of failures of solder ball under the 300 g drop impact was higher than that under the 1000 g drop impact. The characteristic drop values for failure were 41 and 15,199, respectively, following the statistics.
NASA Astrophysics Data System (ADS)
Mbaya, Timmy
Embedded Aerospace Systems have to perform safety and mission critical operations in a real-time environment where timing and functional correctness are extremely important. Guidance, Navigation, and Control (GN&C) systems substantially rely on complex software interfacing with hardware in real-time; any faults in software or hardware, or their interaction could result in fatal consequences. Integrated Software Health Management (ISWHM) provides an approach for detection and diagnosis of software failures while the software is in operation. The ISWHM approach is based on probabilistic modeling of software and hardware sensors using a Bayesian network. To meet memory and timing constraints of real-time embedded execution, the Bayesian network is compiled into an Arithmetic Circuit, which is used for on-line monitoring. This type of system monitoring, using an ISWHM, provides automated reasoning capabilities that compute diagnoses in a timely manner when failures occur. This reasoning capability enables time-critical mitigating decisions and relieves the human agent from the time-consuming and arduous task of foraging through a multitude of isolated---and often contradictory---diagnosis data. For the purpose of demonstrating the relevance of ISWHM, modeling and reasoning is performed on a simple simulated aerospace system running on a real-time operating system emulator, the OSEK/Trampoline platform. Models for a small satellite and an F-16 fighter jet GN&C (Guidance, Navigation, and Control) system have been implemented. Analysis of the ISWHM is then performed by injecting faults and analyzing the ISWHM's diagnoses.
Tsauo, Jiaywei; Luo, Xuefeng; Ye, Linchao; Li, Xiao
2015-06-01
This study was designed to report our results with a modified technique of three-dimensional (3D) path planning software assisted transjugular intrahepatic portosystemic shunt (TIPS). 3D path planning software was recently developed to facilitate TIPS creation by using two carbon dioxide portograms acquired at least 20° apart to generate a 3D path for overlay needle guidance. However, one shortcoming is that puncturing along the overlay would be technically impossible if the angle of the liver access set and the angle of the 3D path are not the same. To solve this problem, a prototype 3D path planning software was fitted with a utility to calculate the angle of the 3D path. Using this, we modified the angle of the liver access set accordingly during the procedure in ten patients. Failure for technical reasons occurred in three patients (unsuccessful wedged hepatic venography in two cases, software technical failure in one case). The procedure was successful in the remaining seven patients, and only one needle pass was required to obtain portal vein access in each case. The course of puncture was comparable to the 3D path in all patients. No procedure-related complication occurred following the procedures. Adjusting the angle of the liver access set to match the angle of the 3D path determined by the software appears to be a favorable modification to the technique of 3D path planning software assisted TIPS.
NASA Astrophysics Data System (ADS)
Sorrentino, Valerio; Matasci, Battista; Abellan, Antonio; Jaboyedoff, Michel; Marino, Ermanno; Pignalosa, Antonio; Santo, Antonio
2016-04-01
Rockfalls and other types of landslides are the dominant processes causing a retreat of sea cliffs. The coastal areas constitute an important tourist attraction and a large number of people rest beneath the cliffs on a daily basis, considerably increasing the risk associated to rockfalls. We present an approach to assess rockfall susceptibility at the cliff scale based on terrestrial laser scanner (TLS) point clouds. The test area is a coastal cliff situated in the southern part of the Cilento (Centola Municipality, Campania Region), in which a natural arch was formed. This cliff is constituted by heavy fractured carbonate rock mass with a strong structural control. In June 2015 TLS data were acquired with long-range scanner RIEGL VZ1000®. The structural analysis of the cliff was performed in the field and using Coltop 3D software on the point cloud. As a result, 10 discontinuity sets (joint, faults and bedding planes) were individuated and the different characteristics such as orientation, spacing and persistence were measured. The kinematically unstable areas were highlighted using a script that computes an index of susceptibility to rockfalls based on the spatial distribution of failure mechanisms. The susceptibility index computation is based on the average surface that every joint set (or combinations of two joint sets in the case of wedge failure) forms on the topography according to its spacing, trace length, and incidence angle. This susceptibility index also depends on the steepness of the joint set (or of the intersection line in the case of wedge failure). As a result the most important discontinuity sets in terms of potential planar failure, wedge failure and toppling were individuated and an assessment of rockfall susceptibility at the cliff scale was achieved. Results show that the kinematically feasible failures are not equally distributed along the cliff but concentrated on certain areas. 
The most susceptible areas for planar failure are related to the discontinuity set K10 (71/097), whereas for toppling the highest susceptibility is reached with K1 (60/218). Concerning wedge failure, the combination of K10 and K1 yields the highest susceptibility values. It shows also clustering with higher density which is probably related to regional structures. More detailed investigations of the rockfall susceptibility and failure mechanisms will be performed during the forthcoming months. The relationship with regional structures will be also investigated in more detail. Perspectives also include using the methodology on the other side of the natural arch in order to provide a global susceptibility assessment of the area.
2014-01-01
Introduction Prolonged ventilation and failed extubation are associated with increased harm and cost. The added value of heart and respiratory rate variability (HRV and RRV) during spontaneous breathing trials (SBTs) to predict extubation failure remains unknown. Methods We enrolled 721 patients in a multicenter (12 sites), prospective, observational study, evaluating clinical estimates of risk of extubation failure, physiologic measures recorded during SBTs, HRV and RRV recorded before and during the last SBT prior to extubation, and extubation outcomes. We excluded 287 patients because of protocol or technical violations, or poor data quality. Measures of variability (97 HRV, 82 RRV) were calculated from electrocardiogram and capnography waveforms followed by automated cleaning and variability analysis using Continuous Individualized Multiorgan Variability Analysis (CIMVA™) software. Repeated randomized subsampling with training, validation, and testing were used to derive and compare predictive models. Results Of 434 patients with high-quality data, 51 (12%) failed extubation. Two HRV and eight RRV measures showed statistically significant association with extubation failure (P <0.0041, 5% false discovery rate). An ensemble average of five univariate logistic regression models using RRV during SBT, yielding a probability of extubation failure (called WAVE score), demonstrated optimal predictive capacity. With repeated random subsampling and testing, the model showed mean receiver operating characteristic area under the curve (ROC AUC) of 0.69, higher than heart rate (0.51), rapid shallow breathing index (RSBI; 0.61) and respiratory rate (0.63).
After deriving a WAVE model based on all data, training-set performance demonstrated that the model increased its predictive power when applied to patients conventionally considered high risk: a WAVE score >0.5 in patients with RSBI >105 and perceived high risk of failure yielded a fold increase in risk of extubation failure of 3.0 (95% confidence interval (CI) 1.2 to 5.2) and 3.5 (95% CI 1.9 to 5.4), respectively. Conclusions Altered HRV and RRV (during the SBT prior to extubation) are significantly associated with extubation failure. A predictive model using RRV during the last SBT provided optimal accuracy of prediction in all patients, with improved accuracy when combined with clinical impression or RSBI. This model requires a validation cohort to evaluate accuracy and generalizability. Trial registration ClinicalTrials.gov NCT01237886. Registered 13 October 2010. PMID:24713049
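The "ensemble average of five univariate logistic regression models" described above can be sketched as follows. This is only an illustration of the averaging step: the feature names and coefficients below are hypothetical placeholders, not the RRV measures or fitted values from the study.

```python
import math

def logistic(z):
    """Standard logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def ensemble_score(features, models):
    """Average the predicted probabilities of several univariate logistic models.

    features: dict mapping feature name -> measured value
    models:   dict mapping feature name -> (intercept, slope); hypothetical values
    """
    probs = [logistic(b0 + b1 * features[name]) for name, (b0, b1) in models.items()]
    return sum(probs) / len(probs)

# Hypothetical coefficients for five made-up variability measures.
example_models = {
    "rrv_1": (-2.0, 0.8),
    "rrv_2": (-1.5, 0.5),
    "rrv_3": (-2.2, 1.1),
    "rrv_4": (-1.8, 0.3),
    "rrv_5": (-2.5, 0.9),
}
example_features = {"rrv_1": 1.2, "rrv_2": 0.4, "rrv_3": 0.9, "rrv_4": 2.0, "rrv_5": 0.1}

score = ensemble_score(example_features, example_models)
print(f"ensemble probability of failure: {score:.3f}")
```

Averaging probabilities rather than fitting one multivariate model keeps each component model simple, at the cost of ignoring interactions between the measures.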
Surrogate oracles, generalized dependency and simpler models
NASA Technical Reports Server (NTRS)
Wilson, Larry
1990-01-01
Software reliability models require the sequence of interfailure times from the debugging process as input. It was previously illustrated that using data from replicated debugging could greatly improve reliability predictions. However, inexpensive replication of the debugging process requires the existence of a cheap, fast error detector. Laboratory experiments can be designed around a gold version which is used as an oracle or around an n-version error detector. Unfortunately, software developers cannot be expected to have an oracle or to bear the expense of n-versions. A generic technique is being investigated for approximating replicated data by using the partially debugged software as a difference detector. It is believed that the failure rate of each fault has significant dependence on the presence or absence of other faults. Thus, in order to discuss a failure rate for a known fault, the presence or absence of each of the other known faults needs to be specified. Also of interest are simpler models which use shorter input sequences without sacrificing accuracy. In fact, a possible gain in performance is conjectured. To investigate these propositions, NASA computers running LIC (RTI) versions are used to generate data. This data will be used to label the debugging graph associated with each version. These labeled graphs will be used to test the utility of a surrogate oracle, to analyze the dependent nature of fault failure rates and to explore the feasibility of reliability models which use the data of only the most recent failures.
Probabilistic framework for product design optimization and risk management
NASA Astrophysics Data System (ADS)
Keski-Rahkonen, J. K.
2018-05-01
Probabilistic methods have gradually gained ground within engineering practices but currently it is still the industry standard to use deterministic safety margin approaches to dimensioning components and qualitative methods to manage product risks. These methods are suitable for baseline design work but quantitative risk management and product reliability optimization require more advanced predictive approaches. Ample research has been published on how to predict failure probabilities for mechanical components and furthermore to optimize reliability through life cycle cost analysis. This paper reviews the literature for existing methods and tries to harness their best features and simplify the process to be applicable in practical engineering work. Recommended process applies Monte Carlo method on top of load-resistance models to estimate failure probabilities. Furthermore, it adds on existing literature by introducing a practical framework to use probabilistic models in quantitative risk management and product life cycle costs optimization. The main focus is on mechanical failure modes due to the well-developed methods used to predict these types of failures. However, the same framework can be applied on any type of failure mode as long as predictive models can be developed.
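As a rough illustration of applying the Monte Carlo method on top of a load-resistance model, the sketch below estimates P(load > resistance) by sampling. The normal distributions and all numeric parameters are assumptions chosen for the example; the abstract does not state which distributions or values the paper uses.

```python
import math
import random

def estimate_failure_probability(n_samples, load_mean, load_sd,
                                 resistance_mean, resistance_sd, seed=0):
    """Estimate P(load > resistance) by plain Monte Carlo sampling.

    Load and resistance are modeled as independent normal variables,
    which is an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        load = rng.gauss(load_mean, load_sd)
        resistance = rng.gauss(resistance_mean, resistance_sd)
        if load > resistance:
            failures += 1
    return failures / n_samples

# Hypothetical component: load ~ N(300, 40), resistance ~ N(450, 30).
p_fail = estimate_failure_probability(200_000, 300.0, 40.0, 450.0, 30.0)

# For two independent normals the exact answer is available as a cross-check:
# the margin (resistance - load) is N(150, 50), so P(fail) = Phi(-3).
p_exact = 0.5 * (1.0 + math.erf(-3.0 / math.sqrt(2.0)))
print(f"Monte Carlo: {p_fail:.5f}, closed form: {p_exact:.5f}")
```

The closed-form check only exists because both variables are normal; the sampling loop itself works unchanged for any distributions that can be sampled.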
NESTEM-QRAS: A Tool for Estimating Probability of Failure
NASA Technical Reports Server (NTRS)
Patel, Bhogilal M.; Nagpal, Vinod K.; Lalli, Vincent A.; Pai, Shantaram; Rusick, Jeffrey J.
2002-01-01
An interface between two NASA GRC specialty codes, NESTEM and QRAS, has been developed. This interface enables users to estimate, in advance, the risk of failure of a component, a subsystem, and/or a system under given operating conditions. This capability would provide a needed input for estimating the success rate of any mission. The NESTEM code, under development for the last 15 years at NASA Glenn Research Center, has the capability of estimating the probability of failure of components under varying loading and environmental conditions. This code performs sensitivity analysis of all the input variables and provides their influence on the response variables in the form of cumulative distribution functions. QRAS, also developed by NASA, assesses the risk of failure of a system or a mission based on the quantitative information provided by NESTEM or other similar codes, and on a user-provided fault tree and modes of failure. This paper will briefly describe the capabilities of NESTEM, QRAS and the interface. This presentation will also describe, using an example, the stepwise process the interface uses.
NESTEM-QRAS: A Tool for Estimating Probability of Failure
NASA Astrophysics Data System (ADS)
Patel, Bhogilal M.; Nagpal, Vinod K.; Lalli, Vincent A.; Pai, Shantaram; Rusick, Jeffrey J.
2002-10-01
An interface between two NASA GRC specialty codes, NESTEM and QRAS, has been developed. This interface enables users to estimate, in advance, the risk of failure of a component, a subsystem, and/or a system under given operating conditions. This capability would provide a needed input for estimating the success rate of any mission. The NESTEM code, under development for the last 15 years at NASA Glenn Research Center, has the capability of estimating the probability of failure of components under varying loading and environmental conditions. This code performs sensitivity analysis of all the input variables and provides their influence on the response variables in the form of cumulative distribution functions. QRAS, also developed by NASA, assesses the risk of failure of a system or a mission based on the quantitative information provided by NESTEM or other similar codes, and on a user-provided fault tree and modes of failure. This paper will briefly describe the capabilities of NESTEM, QRAS and the interface. This presentation will also describe, using an example, the stepwise process the interface uses.
Assessing changes in failure probability of dams in a changing climate
NASA Astrophysics Data System (ADS)
Mallakpour, I.; AghaKouchak, A.; Moftakhari, H.; Ragno, E.
2017-12-01
Dams are crucial infrastructures and provide resilience against hydrometeorological extremes (e.g., droughts and floods). In 2017, California experienced series of flooding events terminating a 5-year drought, and leading to incidents such as structural failure of Oroville Dam's spillway. Because of large socioeconomic repercussions of such incidents, it is of paramount importance to evaluate dam failure risks associated with projected shifts in the streamflow regime. This becomes even more important as the current procedures for design of hydraulic structures (e.g., dams, bridges, spillways) are based on the so-called stationary assumption. Yet, changes in climate are anticipated to result in changes in statistics of river flow (e.g., more extreme floods) and possibly increasing the failure probability of already aging dams. Here, we examine changes in discharge under two representative concentration pathways (RCPs): RCP4.5 and RCP8.5. In this study, we used routed daily streamflow data from ten global climate models (GCMs) in order to investigate possible climate-induced changes in streamflow in northern California. Our results show that while the average flow does not show a significant change, extreme floods are projected to increase in the future. Using the extreme value theory, we estimate changes in the return periods of 50-year and 100-year floods in the current and future climates. Finally, we use the historical and future return periods to quantify changes in failure probability of dams in a warming climate.
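The link between a return period and a failure probability can be sketched with the textbook relation P = 1 - (1 - 1/T)^n, the chance of at least one exceedance of the T-year flood in n years. The sketch assumes independent years and a fixed return period within each scenario, and the shortened future return period is a made-up value, not a result from the study.

```python
def lifetime_exceedance_probability(return_period_years, horizon_years):
    """Probability of at least one exceedance of the T-year event in n years.

    Assumes independent years and a constant annual exceedance
    probability of 1/T over the horizon (an assumption of this sketch).
    """
    annual_p = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_p) ** horizon_years

# Historical climate: the design flood has a 100-year return period.
p_historical = lifetime_exceedance_probability(100.0, 50)

# Hypothetical future climate: the same flood magnitude recurs every 50 years.
p_future = lifetime_exceedance_probability(50.0, 50)

print(f"historical: {p_historical:.3f}, future: {p_future:.3f}")
```

Under these illustrative numbers, halving the return period raises the 50-year exceedance probability from roughly 0.39 to roughly 0.64.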
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chu, Tsong-Lun; Varuttamaseni, Athi; Baek, Joo-Seok
The U.S. Nuclear Regulatory Commission (NRC) encourages the use of probabilistic risk assessment (PRA) technology in all regulatory matters, to the extent supported by the state-of-the-art in PRA methods and data. Although much has been accomplished in the area of risk-informed regulation, risk assessment for digital systems has not been fully developed. The NRC established a plan for research on digital systems to identify and develop methods, analytical tools, and regulatory guidance for (1) including models of digital systems in the PRAs of nuclear power plants (NPPs), and (2) incorporating digital systems in the NRC's risk-informed licensing and oversight activities. Under NRC's sponsorship, Brookhaven National Laboratory (BNL) explored approaches for addressing the failures of digital instrumentation and control (I and C) systems in the current NPP PRA framework. Specific areas investigated included PRA modeling digital hardware, development of a philosophical basis for defining software failure, and identification of desirable attributes of quantitative software reliability methods. Based on the earlier research, statistical testing is considered a promising method for quantifying software reliability. This paper describes a statistical software testing approach for quantifying software reliability and applies it to the loop-operating control system (LOCS) of an experimental loop of the Advanced Test Reactor (ATR) at Idaho National Laboratory (INL).
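One common way statistical testing yields a quantitative reliability claim is the zero-failure binomial bound: if n independent test demands drawn from the operational profile all succeed, a per-demand failure probability above p can be rejected with confidence C once (1 - p)^n <= 1 - C. The sketch below computes the required n. Whether the paper uses this particular bound is not stated in the abstract, so treat it as a generic illustration with hypothetical targets.

```python
import math

def failure_free_tests_needed(p_max, confidence):
    """Smallest n with (1 - p_max)**n <= 1 - confidence.

    Assumes independent test demands sampled from the operational
    profile and zero observed failures (assumptions of this sketch).
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_max))

# Hypothetical target: failure probability per demand below 1e-3 at 95% confidence.
n_tests = failure_free_tests_needed(1e-3, 0.95)
print(n_tests)
```

The required number of tests grows roughly as 3/p for 95% confidence, which is why very low failure probability targets are expensive to demonstrate by testing alone.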
Costa, Dorcas Lamounier; Rocha, Regina Lunardi; Chaves, Eldo de Brito Ferreira; Batista, Vivianny Gonçalves de Vasconcelos; Costa, Henrique Lamounier; Costa, Carlos Henrique Nery
2016-01-01
Early identification of patients at higher risk of progressing to severe disease and death is crucial for implementing therapeutic and preventive measures; this could reduce the morbidity and mortality from kala-azar. We describe a score set composed of four scales in addition to software for quick assessment of the probability of death from kala-azar at the point of care. Data from 883 patients diagnosed between September 2005 and August 2008 were used to derive the score set, and data from 1,031 patients diagnosed between September 2008 and November 2013 were used to validate the models. Stepwise logistic regression analyses were used to derive the optimal multivariate prediction models. Model performance was assessed by its discriminatory accuracy. A computational specialist system (Kala-Cal®) was developed to speed up the calculation of the probability of death based on clinical scores. The clinical prediction score showed high discrimination (area under the curve [AUC] 0.90) for distinguishing death from survival for children ≤2 years old. Performance improved after adding laboratory variables (AUC 0.93). The clinical score showed equivalent discrimination (AUC 0.89) for older children and adults, which also improved after including laboratory data (AUC 0.92). The score set also showed a high, although lower, discrimination when applied to the validation cohort. This score set and Kala-Cal® software may help identify individuals with the greatest probability of death. The associated software may speed up the calculation of the probability of death based on clinical scores and assist physicians in decision-making.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Powell, Danny H; Elwood Jr, Robert H
2011-01-01
Analysis of the material protection, control, and accountability (MPC&A) system is necessary to understand the limits and vulnerabilities of the system to internal threats. A self-appraisal helps the facility be prepared to respond to internal threats and reduce the risk of theft or diversion of nuclear material. The material control and accountability (MC&A) system effectiveness tool (MSET) fault tree was developed to depict the failure of the MPC&A system as a result of poor practices and random failures in the MC&A system. It can also be employed as a basis for assessing deliberate threats against a facility. MSET uses fault tree analysis, which is a top-down approach to examining system failure. The analysis starts with identifying a potential undesirable event called a 'top event' and then determining the ways it can occur (e.g., 'Fail To Maintain Nuclear Materials Under The Purview Of The MC&A System'). The analysis proceeds by determining how the top event can be caused by individual or combined lower level faults or failures. These faults, which are the causes of the top event, are 'connected' through logic gates. The MSET model uses AND-gates and OR-gates and propagates the effect of event failure using Boolean algebra. To enable the fault tree analysis calculations, the basic events in the fault tree are populated with probability risk values derived by conversion of questionnaire data to numeric values. The basic events are treated as independent variables. This assumption affects the Boolean algebraic calculations used to calculate results. All the necessary calculations are built into the fault tree codes, but it is often useful to estimate the probabilities manually as a check on code functioning. The probability of failure of a given basic event is the probability that the basic event primary question fails to meet the performance metric for that question.
The failure probability is related to how well the facility performs the task identified in that basic event over time (not just one performance or exercise). Fault tree calculations provide a failure probability for the top event in the fault tree. The basic fault tree calculations establish a baseline relative risk value for the system. This probability depicts relative risk, not absolute risk. Subsequent calculations are made to evaluate the change in relative risk that would occur if system performance is improved or degraded. During the development effort of MSET, the fault tree analysis program used was SAPHIRE. SAPHIRE is an acronym for 'Systems Analysis Programs for Hands-on Integrated Reliability Evaluations.' Version 1 of the SAPHIRE code was sponsored by the Nuclear Regulatory Commission in 1987 as an innovative way to draw, edit, and analyze graphical fault trees primarily for safe operation of nuclear power reactors. When the fault tree calculations are performed, the fault tree analysis program will produce several reports that can be used to analyze the MPC&A system. SAPHIRE produces reports showing risk importance factors for all basic events in the operational MC&A system. The risk importance information is used to examine the potential impacts when performance of certain basic events increases or decreases. The initial results produced by the SAPHIRE program are considered relative risk values. None of the results can be interpreted as absolute risk values since the basic event probability values represent estimates of risk associated with the performance of MPC&A tasks throughout the material balance area (MBA). The RRR for a basic event represents the decrease in total system risk that would result from improvement of that one event to a perfect performance level. Improvement of the basic event with the greatest RRR value produces a greater decrease in total system risk than improvement of any other basic event. 
Basic events with the greatest potential for system risk reduction are assigned performance improvement values, and new fault tree calculations show the improvement in total system risk. The operational impact or cost-effectiveness from implementing the performance improvements can then be evaluated. The improvements being evaluated can be system performance improvements, or they can be potential, or actual, upgrades to the system. The RIR for a basic event represents the increase in total system risk that would result from failure of that one event. Failure of the basic event with the greatest RIR value produces a greater increase in total system risk than failure of any other basic event. Basic events with the greatest potential for system risk increase are assigned failure performance values, and new fault tree calculations show the increase in total system risk. This evaluation shows the importance of preventing performance degradation of the basic events. SAPHIRE identifies combinations of basic events where concurrent failure of the events results in failure of the top event.
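The manual check mentioned above (estimating top-event probability by hand from independent basic events) can be sketched for a toy tree. The tree structure, event names, and probabilities are hypothetical and unrelated to the actual MSET model. The two importance quantities are computed here as simple differences in top-event probability, following the abstract's wording for RRR and RIR; the exact definitions used by SAPHIRE are not given in the abstract and may differ.

```python
def and_gate(probs):
    """Probability that all independent input events occur."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def or_gate(probs):
    """Probability that at least one independent input event occurs."""
    none_occur = 1.0
    for p in probs:
        none_occur *= (1.0 - p)
    return 1.0 - none_occur

def top_event(p):
    """Hypothetical tree: TOP = A OR (B AND C)."""
    return or_gate([p["A"], and_gate([p["B"], p["C"]])])

def risk_decrease_if_perfect(p, name):
    """Drop in top-event probability if one basic event never fails."""
    return top_event(p) - top_event({**p, name: 0.0})

def risk_increase_if_failed(p, name):
    """Rise in top-event probability if one basic event always fails."""
    return top_event({**p, name: 1.0}) - top_event(p)

# Hypothetical basic-event failure probabilities.
basic = {"A": 0.01, "B": 0.1, "C": 0.2}
baseline = top_event(basic)
for name in basic:
    print(name, risk_decrease_if_perfect(basic, name), risk_increase_if_failed(basic, name))
```

In this toy tree, improving B yields a larger decrease than improving A, while failure of A yields the largest increase because A alone triggers the top event.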
Coelli, Fernando C; Almeida, Renan M V R; Pereira, Wagner C A
2010-12-01
This work develops a cost analysis estimation for a mammography clinic, taking into account resource utilization and equipment failure rates. Two standard clinic models were simulated, the first with one mammography unit, two technicians and one doctor, and the second (based on an actually functioning clinic) with two units, three technicians and one doctor. Cost data and model parameters were obtained by direct measurements, literature reviews and other hospital data. A discrete-event simulation model was developed, in order to estimate the unit cost (total costs/number of examinations in a defined period) of mammography examinations at those clinics. The cost analysis considered simulated changes in resource utilization rates and in examination failure probabilities (failures on the image acquisition system). In addition, a sensitivity analysis was performed, taking into account changes in the probabilities of equipment failure types. For the two clinic configurations, the estimated mammography unit costs were, respectively, US$ 41.31 and US$ 53.46 in the absence of examination failures. As the examination failures increased up to 10% of total examinations, unit costs approached US$ 54.53 and US$ 53.95, respectively. The sensitivity analysis showed that type 3 (the most serious) failure increases had a very large impact on the patient attendance, up to the point of actually making attendance unfeasible. Discrete-event simulation allowed for the definition of the more efficient clinic, contingent on the expected prevalence of resource utilization and equipment failures. © 2010 Blackwell Publishing Ltd.
NASA Technical Reports Server (NTRS)
Anderson, Leif; Box, Neil; Carter, Katrina; DiFilippo, Denise; Harrington, Sean; Jackson, David; Lutomski, Michael
2012-01-01
There are two general shortcomings to the current annual sparing assessment: 1. The vehicle functions are currently assessed according to confidence targets, which can be misleading (overly conservative or optimistic). 2. The current confidence levels are arbitrarily determined and do not account for epistemic uncertainty (lack of knowledge) in the ORU failure rate. There are two major categories of uncertainty that impact Sparing Assessment: (a) Aleatory Uncertainty: natural variability in the distribution of actual failures around a Mean Time Between Failures (MTBF); (b) Epistemic Uncertainty: lack of knowledge about the true value of an Orbital Replacement Unit's (ORU) MTBF. We propose an approach to revise confidence targets and account for both categories of uncertainty, an approach we call Probability and Confidence Trade-space (PACT) evaluation.
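The distinction between the two uncertainty categories can be illustrated with a small sketch. Aleatory variability is modeled as a Poisson count of failures for a known MTBF, and epistemic uncertainty as a lognormal spread on the MTBF itself. Both modeling choices, and every number below, are assumptions for illustration; the abstract does not describe how PACT actually represents either category.

```python
import math
import random

def prob_spares_sufficient(spares, hours, mtbf_hours):
    """P(number of failures <= spares) for a Poisson process with known MTBF.

    Captures aleatory variability only.
    """
    lam = hours / mtbf_hours
    return sum(math.exp(-lam) * lam ** k / math.factorial(k)
               for k in range(spares + 1))

def prob_with_epistemic(spares, hours, mtbf_median, log_sd, n=20_000, seed=1):
    """Average the sufficiency probability over an uncertain MTBF.

    The MTBF is drawn from a lognormal distribution (hypothetical choice)
    to represent lack of knowledge about its true value.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        mtbf = rng.lognormvariate(math.log(mtbf_median), log_sd)
        total += prob_spares_sufficient(spares, hours, mtbf)
    return total / n

# Hypothetical ORU: 2 spares, one year of operation, MTBF estimate of 50,000 hours.
point_estimate = prob_spares_sufficient(2, 8760.0, 50_000.0)
with_uncertainty = prob_with_epistemic(2, 8760.0, 50_000.0, 1.0)
print(f"MTBF known: {point_estimate:.4f}, MTBF uncertain: {with_uncertainty:.4f}")
```

With these made-up inputs, treating the MTBF as exactly known gives a more optimistic answer than averaging over plausible MTBF values, which is the kind of effect a fixed confidence target can hide.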
Architecture for Integrated Medical Model Dynamic Probabilistic Risk Assessment
NASA Technical Reports Server (NTRS)
Jaworske, D. A.; Myers, J. G.; Goodenow, D.; Young, M.; Arellano, J. D.
2016-01-01
Probabilistic Risk Assessment (PRA) is a modeling tool used to predict potential outcomes of a complex system based on a statistical understanding of many initiating events. Utilizing a Monte Carlo method, thousands of instances of the model are considered and outcomes are collected. PRA is considered static, utilizing probabilities alone to calculate outcomes. Dynamic Probabilistic Risk Assessment (dPRA) is an advanced concept where modeling predicts the outcomes of a complex system based not only on the probabilities of many initiating events, but also on a progression of dependencies brought about by progressing down a time line. Events are placed in a single time line, adding each event to a queue, as managed by a planner. Progression down the time line is guided by rules, as managed by a scheduler. The recently developed Integrated Medical Model (IMM) summarizes astronaut health as governed by the probabilities of medical events and mitigation strategies. Managing the software architecture process provides a systematic means of creating, documenting, and communicating a software design early in the development process. The software architecture process begins with establishing requirements and the design is then derived from the requirements.
Probabilistic structural analysis methods for select space propulsion system components
NASA Technical Reports Server (NTRS)
Millwater, H. R.; Cruse, T. A.
1989-01-01
The Probabilistic Structural Analysis Methods (PSAM) project developed at the Southwest Research Institute integrates state-of-the-art structural analysis techniques with probability theory for the design and analysis of complex large-scale engineering structures. An advanced efficient software system (NESSUS) capable of performing complex probabilistic analysis has been developed. NESSUS contains a number of software components to perform probabilistic analysis of structures. These components include: an expert system, a probabilistic finite element code, a probabilistic boundary element code and a fast probability integrator. The NESSUS software system is shown. An expert system is included to capture and utilize PSAM knowledge and experience. NESSUS/EXPERT is an interactive menu-driven expert system that provides information to assist in the use of the probabilistic finite element code NESSUS/FEM and the fast probability integrator (FPI). The expert system menu structure is summarized. The NESSUS system contains a state-of-the-art nonlinear probabilistic finite element code, NESSUS/FEM, to determine the structural response and sensitivities. A broad range of analysis capabilities and an extensive element library are present.
FINDS: A fault inferring nonlinear detection system programmers manual, version 3.0
NASA Technical Reports Server (NTRS)
Lancraft, R. E.
1985-01-01
Detailed software documentation of the digital computer program FINDS (Fault Inferring Nonlinear Detection System) Version 3.0 is provided. FINDS is a highly modular and extensible computer program designed to monitor and detect sensor failures, while at the same time providing reliable state estimates. In this version of the program the FINDS methodology is used to detect, isolate, and compensate for failures in simulated avionics sensors used by the Advanced Transport Operating Systems (ATOPS) Transport System Research Vehicle (TSRV) in a Microwave Landing System (MLS) environment. It is intended that this report serve as a programmers guide to aid in the maintenance, modification, and revision of the FINDS software.
Compounding effects of sea level rise and fluvial flooding.
Moftakhari, Hamed R; Salvadori, Gianfausto; AghaKouchak, Amir; Sanders, Brett F; Matthew, Richard A
2017-09-12
Sea level rise (SLR), a well-documented and urgent aspect of anthropogenic global warming, threatens population and assets located in low-lying coastal regions all around the world. Common flood hazard assessment practices typically account for one driver at a time (e.g., either fluvial flooding only or ocean flooding only), whereas coastal cities vulnerable to SLR are at risk for flooding from multiple drivers (e.g., extreme coastal high tide, storm surge, and river flow). Here, we propose a bivariate flood hazard assessment approach that accounts for compound flooding from river flow and coastal water level, and we show that a univariate approach may not appropriately characterize the flood hazard if there are compounding effects. Using copulas and bivariate dependence analysis, we also quantify the increases in failure probabilities for 2030 and 2050 caused by SLR under representative concentration pathways 4.5 and 8.5. Additionally, the increase in failure probability is shown to be strongly affected by compounding effects. The proposed failure probability method offers an innovative tool for assessing compounding flood hazards in a warming climate.
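As a rough illustration of how a copula turns two marginal non-exceedance probabilities into a compound failure probability, the sketch below uses a Gumbel copula. The choice of copula family, the "OR" failure definition (either driver exceeds its threshold), and the assumption of independent years are all assumptions made here for illustration; the abstract does not state which were used.

```python
import math

def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v); theta = 1 is independence, larger theta is stronger dependence."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

def annual_failure_probability(u, v, theta):
    """P(river flow OR coastal water level exceeds its threshold) in one year.

    u and v are the marginal non-exceedance probabilities of the two thresholds.
    """
    return 1.0 - gumbel_copula(u, v, theta)

def lifetime_failure_probability(p_annual, years):
    """Probability of at least one failure in `years`, assuming independent years."""
    return 1.0 - (1.0 - p_annual) ** years

if __name__ == "__main__":
    for theta in (1.0, 2.0):
        p = annual_failure_probability(0.99, 0.99, theta)
        print(theta, p, lifetime_failure_probability(p, 30))
```

Comparing `theta = 1.0` with a larger value shows how the assumed dependence between the two drivers changes the estimate relative to treating them as independent.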
Improving online risk assessment with equipment prognostics and health monitoring
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coble, Jamie B.; Liu, Xiaotong; Briere, Chris
The current approach to evaluating the risk of nuclear power plant (NPP) operation relies on static probabilities of component failure, which are based on industry experience with the existing fleet of nominally similar light water reactors (LWRs). As the nuclear industry looks to advanced reactor designs that feature non-light water coolants (e.g., liquid metal, high temperature gas, molten salt), this operating history is not available. Many advanced reactor designs use advanced components, such as electromagnetic pumps, that have not been used in the US commercial nuclear fleet. Given the lack of rich operating experience, we cannot accurately estimate the evolving probability of failure for basic components to populate the fault trees and event trees that typically comprise probabilistic risk assessment (PRA) models. Online equipment prognostics and health management (PHM) technologies can bridge this gap to estimate the failure probabilities for components under operation. The enhanced risk monitor (ERM) incorporates equipment condition assessment into the existing PRA and risk monitor framework to provide accurate and timely estimates of operational risk.
Statistical Performance Evaluation Of Soft Seat Pressure Relief Valves
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harris, Stephen P.; Gross, Robert E.
2013-03-26
Risk-based inspection methods enable estimation of the probability of failure on demand for spring-operated pressure relief valves at the United States Department of Energy's Savannah River Site in Aiken, South Carolina. This paper presents a statistical performance evaluation of soft seat spring-operated pressure relief valves. These pressure relief valves are typically smaller and of lower cost than hard seat (metal to metal) pressure relief valves and can provide substantial cost savings in fluid service applications (air, gas, liquid, and steam), provided that the probability of failure on demand (the probability that the pressure relief valve fails to perform its intended safety function during a potentially dangerous over pressurization) is at least as good as that for hard seat valves. The research in this paper shows that the proportion of soft seat spring-operated pressure relief valves failing is the same as or less than that of hard seat valves, and that for failed valves, soft seat valves typically have failure ratios of proof test pressure to set pressure less than that of hard seat valves.
ERIC Educational Resources Information Center
Pitts, Laura; Dymond, Simon
2012-01-01
Research on the high-probability (high-p) request sequence shows that compliance with low-probability (low-p) requests generally increases when preceded by a series of high-p requests. Few studies have conducted formal preference assessments to identify the consequences used for compliance, which may partly explain treatment failures, and still…
An overview of the mathematical and statistical analysis component of RICIS
NASA Technical Reports Server (NTRS)
Hallum, Cecil R.
1987-01-01
Mathematical and statistical analysis components of RICIS (Research Institute for Computing and Information Systems) can be used in the following problem areas: (1) quantification and measurement of software reliability; (2) assessment of changes in software reliability over time (reliability growth); (3) analysis of software-failure data; and (4) decision logic for whether to continue or stop testing software. Other areas of interest to NASA/JSC where mathematical and statistical analysis can be successfully employed include: math modeling of physical systems, simulation, statistical data reduction, evaluation methods, optimization, algorithm development, and mathematical methods in signal processing.
Dam failure analysis for the Lago El Guineo Dam, Orocovis, Puerto Rico
Gómez-Fragoso, Julieta; Heriberto Torres-Sierra,
2016-08-09
The U.S. Geological Survey, in cooperation with the Puerto Rico Electric Power Authority, completed hydrologic and hydraulic analyses to assess the potential hazard to human life and property associated with the hypothetical failure of the Lago El Guineo Dam. The Lago El Guineo Dam is within the headwaters of the Río Grande de Manatí and impounds a drainage area of about 4.25 square kilometers. The hydrologic assessment was designed to determine the outflow hydrographs and peak discharges for Lago El Guineo and other subbasins in the Río Grande de Manatí hydrographic basin for three extreme rainfall events: (1) a 6-hour probable maximum precipitation event, (2) a 24-hour probable maximum precipitation event, and (3) a 24-hour, 100-year recurrence rainfall event. The hydraulic study simulated a dam failure of Lago El Guineo Dam using flood hydrographs generated from the hydrologic study. The simulated dam failure generated a hydrograph that was routed downstream from Lago El Guineo Dam through the lower reaches of the Río Toro Negro and the Río Grande de Manatí to determine water-surface profiles developed from the event-based hydrologic scenarios and “sunny day” conditions. The Hydrologic Engineering Center’s Hydrologic Modeling System (HEC–HMS) and Hydrologic Engineering Center’s River Analysis System (HEC–RAS) computer programs, developed by the U.S. Army Corps of Engineers, were used for the hydrologic and hydraulic modeling, respectively. The flow routing in the hydraulic analyses was completed using the unsteady flow module available in the HEC–RAS model. Above the Lago El Guineo Dam, the simulated inflow peak discharges from HEC–HMS resulted in about 550 and 414 cubic meters per second for the 6- and 24-hour probable maximum precipitation events, respectively. The 24-hour, 100-year recurrence storm simulation resulted in a peak discharge of about 216 cubic meters per second. For the hydrologic analysis, no dam failure conditions are considered within the model.
The results of the hydrologic simulations indicated that for all hydrologic conditions scenarios, the Lago El Guineo Dam would not experience overtopping. For the dam breach hydraulic analysis, failure by piping was the selected hypothetical failure mode for the Lago El Guineo Dam. Results from the simulated dam failure of the Lago El Guineo Dam using the HEC–RAS model for the 6- and 24-hour probable maximum precipitation events indicated peak discharges below the dam of 1,342.43 and 1,434.69 cubic meters per second, respectively. Dam failure during the 24-hour, 100-year recurrence rainfall event resulted in a peak discharge directly downstream from Lago El Guineo Dam of 1,183.12 cubic meters per second. Dam failure during sunny-day conditions (no precipitation) produced a peak discharge at Lago El Guineo Dam of 1,015.31 cubic meters per second assuming the initial water-surface elevation was at the morning-glory spillway invert elevation. The results of the hydraulic analysis indicate that the flood would extend to many inhabited areas along the stream banks from the Lago El Guineo Dam to the mouth of the Río Grande as a result of the simulated failure of the Lago El Guineo Dam. Low-lying regions in the vicinity of Ciales, Manatí, and Barceloneta, Puerto Rico, are among the regions that would be most affected by failure of the Lago El Guineo Dam. Effects of the flood control (levee) structure constructed in 2000 to provide protection to the low-lying populated areas of Barceloneta, Puerto Rico, were considered in the hydraulic analysis of dam failure. The results indicate that overtopping can be expected in the aforementioned levee during 6- and 24-hour probable maximum precipitation events. The levee was not overtopped during dam failure scenarios under the 24-hour, 100-year recurrence rainfall event or sunny-day conditions.
Independent Orbiter Assessment (IOA): Analysis of the guidance, navigation, and control subsystem
NASA Technical Reports Server (NTRS)
Trahan, W. H.; Odonnell, R. A.; Pietz, K. C.; Hiott, J. M.
1986-01-01
The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. The independent analysis results corresponding to the Orbiter Guidance, Navigation, and Control (GNC) Subsystem hardware are documented. The function of the GNC hardware is to respond to guidance, navigation, and control software commands to effect vehicle control and to provide sensor and controller data to GNC software. The GNC hardware for which failure modes analysis was performed includes: hand controllers; Rudder Pedal Transducer Assembly (RPTA); Speed Brake Thrust Controller (SBTC); Inertial Measurement Unit (IMU); Star Tracker (ST); Crew Optical Alignment Site (COAS); Air Data Transducer Assembly (ADTA); Rate Gyro Assemblies; Accelerometer Assembly (AA); Aerosurface Servo Amplifier (ASA); and Ascent Thrust Vector Control (ATVC). The IOA analysis process utilized available GNC hardware drawings, workbooks, specifications, schematics, and systems briefs for defining hardware assemblies, components, and circuits. Each hardware item was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode.
Measurement and Analysis of Failures in Computer Systems
NASA Technical Reports Server (NTRS)
Thakur, Anshuman
1997-01-01
This thesis presents a study of software failures spanning several different releases of Tandem's NonStop-UX operating system running on Tandem Integrity S2(TMR) systems. NonStop-UX is based on UNIX System V and is fully compliant with industry standards, such as the X/Open Portability Guide, the IEEE POSIX standards, and the System V Interface Definition (SVID) extensions. In addition to providing a general UNIX interface to the hardware, the operating system has built-in recovery mechanisms and audit routines that check the consistency of the kernel data structures. The analysis is based on data on software failures and repairs collected from Tandem's product report (TPR) logs for a period exceeding three years. A TPR log is created when a customer or an internal developer observes a failure in a Tandem Integrity system. This study concentrates primarily on those TPRs that report a UNIX panic that subsequently crashes the system. Approximately 200 of the TPRs fall into this category. Approximately 50% of the failures reported are from field systems, and the rest are from the testing and development sites. It has been observed by Tandem developers that fewer cases are encountered from the field than from the test centers. Thus, the data selection mechanism has introduced a slight skew.
Huq, M. Saiful; Fraass, Benedick A.; Dunscombe, Peter B.; Gibbons, John P.; Mundt, Arno J.; Mutic, Sasa; Palta, Jatinder R.; Rath, Frank; Thomadsen, Bruce R.; Williamson, Jeffrey F.; Yorke, Ellen D.
2016-01-01
The increasing complexity of modern radiation therapy planning and delivery challenges traditional prescriptive quality management (QM) methods, such as many of those included in guidelines published by organizations such as the AAPM, ASTRO, ACR, ESTRO, and IAEA. These prescriptive guidelines have traditionally focused on monitoring all aspects of the functional performance of radiotherapy (RT) equipment by comparing parameters against tolerances set at strict but achievable values. Many errors that occur in radiation oncology are not due to failures in devices and software; rather they are failures in workflow and process. A systematic understanding of the likelihood and clinical impact of possible failures throughout a course of radiotherapy is needed to direct limited QM resources efficiently to produce maximum safety and quality of patient care. Task Group 100 of the AAPM has taken a broad view of these issues and has developed a framework for designing QM activities, based on estimates of the probability of identified failures and their clinical outcome through the RT planning and delivery process. The Task Group has chosen a specific radiotherapy process required for “intensity modulated radiation therapy (IMRT)” as a case study. The goal of this work is to apply modern risk-based analysis techniques to this complex RT process in order to demonstrate to the RT community that such techniques may help identify more effective and efficient ways to enhance the safety and quality of our treatment processes. The task group generated by consensus an example quality management program strategy for the IMRT process performed at the institution of one of the authors. This report describes the methodology and nomenclature developed, presents the process maps, FMEAs, fault trees, and QM programs developed, and makes suggestions on how this information could be used in the clinic. The development and implementation of risk-assessment techniques will make radiation therapy safer and more efficient. PMID:27370140
ERIC Educational Resources Information Center
Proffitt, Curtis K.
2012-01-01
Project failure remains a challenge within the software development field especially during the early stages of the IT project development. Despite the herculean efforts by project managers and organizations to identify and offset problems, projects remain plagued with issues. If these challenges are not mitigated, to a successful degree,…
Telemetry Option in the Measurement of Physical Activity for Patients with Heart Failure
ERIC Educational Resources Information Center
Melczer, Csaba; Melczer, László; Oláh, András; Sélleyné-Gyúró, Mónika; Welker, Zsanett; Ács, Pongrác
2015-01-01
Measurement of physical activity among patients with heart failure typically requires a special approach due to the patients' physical status. Nowadays, a technology is already available that can measure the kinematic movements in 3-D by a pacemaker and implantable defibrillator giving an assessment on software. The telemetry data can be…
NASA Technical Reports Server (NTRS)
Bueno, R.; Chow, E.; Gershwin, S. B.; Willsky, A. S.
1975-01-01
Research on the problems of failure detection and reliable system design for digital aircraft control systems is reported. Failure modes, cross detection probability, wrong time detection, application of performance tools, and the GLR computer package are discussed.
A physically-based earthquake recurrence model for estimation of long-term earthquake probabilities
Ellsworth, William L.; Matthews, Mark V.; Nadeau, Robert M.; Nishenko, Stuart P.; Reasenberg, Paul A.; Simpson, Robert W.
1999-01-01
A physically-motivated model for earthquake recurrence based on the Brownian relaxation oscillator is introduced. The renewal process defining this point process model can be described by the steady rise of a state variable from the ground state to failure threshold as modulated by Brownian motion. Failure times in this model follow the Brownian passage time (BPT) distribution, which is specified by the mean time to failure, μ, and the aperiodicity of the mean, α (equivalent to the familiar coefficient of variation). Analysis of 37 series of recurrent earthquakes, M -0.7 to 9.2, suggests a provisional generic value of α = 0.5. For this value of α, the hazard function (instantaneous failure rate of survivors) exceeds the mean rate for times > μ/2, and is approximately 2/μ for all times > μ. Application of this model to the next M 6 earthquake on the San Andreas fault at Parkfield, California suggests that the annual probability of the earthquake is between 1:10 and 1:13.
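The hazard behaviour quoted in this abstract can be checked numerically. The sketch below assumes the standard inverse Gaussian form of the BPT density with mean μ and aperiodicity α (shape parameter μ/α²); the function names are illustrative and are not taken from the authors' work.

```python
import math

def _phi(x):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bpt_pdf(t, mu, alpha):
    """BPT (inverse Gaussian) density with mean mu and aperiodicity alpha."""
    coeff = math.sqrt(mu / (2.0 * math.pi * alpha**2 * t**3))
    return coeff * math.exp(-((t - mu) ** 2) / (2.0 * mu * alpha**2 * t))

def bpt_survival(t, mu, alpha):
    """Probability that the failure time exceeds t."""
    scale = alpha * math.sqrt(mu * t)
    a = (t - mu) / scale
    b = (t + mu) / scale
    return _phi(-a) - math.exp(2.0 / alpha**2) * _phi(-b)

def bpt_hazard(t, mu, alpha):
    """Instantaneous failure rate of survivors at time t."""
    return bpt_pdf(t, mu, alpha) / bpt_survival(t, mu, alpha)

if __name__ == "__main__":
    mu, alpha = 1.0, 0.5
    for t in (0.25, 0.5, 1.0, 2.0, 3.0):
        print(t, bpt_hazard(t, mu, alpha))
```

For α = 0.5 the hazard of this distribution tends to 1/(2μα²) = 2/μ at long times, which is consistent with the value stated in the abstract.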
Lee, L.; Helsel, D.
2007-01-01
Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
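One common way to apply K-M to left-censored concentrations is to flip the data about a constant larger than the maximum, so that nondetects become right-censored, and then flip the result back. The sketch below illustrates that idea in Python; it is not the authors' S-language software, and the function names and sample values are hypothetical.

```python
def kaplan_meier(times, observed):
    """Right-censored K-M estimate: list of (t, S(t)) at each distinct event time.

    observed[i] is True for an event at times[i], False for a censored value.
    """
    event_times = sorted({t for t, obs in zip(times, observed) if obs})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        events = sum(1 for x, obs in zip(times, observed) if obs and x == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

def km_left_censored(values, detected):
    """Estimate P(X < x) at each detected value x for left-censored data.

    detected[i] is False when values[i] is a detection limit (a "<DL" result).
    """
    flip = max(values) + 1.0                  # any constant above the maximum works
    flipped = [flip - v for v in values]      # left-censoring becomes right-censoring
    curve = kaplan_meier(flipped, detected)
    return sorted((flip - t, s) for t, s in curve)

if __name__ == "__main__":
    # One nondetect reported as "<0.5" and three detected concentrations.
    print(km_left_censored([0.5, 1.0, 1.0, 2.0], [False, True, True, True]))
```

The estimate changes only at detected values, which reflects the point made in the abstract that K-M methods neither interpolate nor extrapolate.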
Reliability, Safety and Error Recovery for Advanced Control Software
NASA Technical Reports Server (NTRS)
Malin, Jane T.
2003-01-01
For long-duration automated operation of regenerative life support systems in space environments, there is a need for advanced integration and control systems that are significantly more reliable and safe, and that support error recovery and minimization of operational failures. This presentation outlines some challenges of hazardous space environments and complex system interactions that can lead to system accidents. It discusses approaches to hazard analysis and error recovery for control software and challenges of supporting effective intervention by safety software and the crew.
Passive Superconducting Shielding: Experimental Results and Computer Models
NASA Technical Reports Server (NTRS)
Warner, B. A.; Kamiya, K.
2003-01-01
Passive superconducting shielding for magnetic refrigerators has advantages over active shielding and passive ferromagnetic shielding in that it is lightweight and easy to construct. However, it is not as easy to model and does not fail gracefully. Failure of a passive superconducting shield may lead to persistent flux and persistent currents. Unfortunately, modeling software for superconducting materials is not as easily available as is software for simple coils or for ferromagnetic materials. This paper will discuss ways of using available software to model passive superconducting shielding.
Chan, Jennifer S K
2016-05-01
Dropouts are common in longitudinal studies. If the dropout probability depends on the missing observations at or after dropout, this type of dropout is called informative (or nonignorable) dropout (ID). Failure to accommodate such a dropout mechanism in the model will bias the parameter estimates. We propose a conditional autoregressive model for longitudinal binary data with an ID model such that the probabilities of positive outcomes as well as the drop-out indicator in each occasion are logit linear in some covariates and outcomes. This model, which adopts a marginal model for outcomes and a conditional model for dropouts, is called a selection model. To allow for the heterogeneity and clustering effects, the outcome model is extended to incorporate mixture and random effects. Lastly, the model is further extended to a novel model that models the outcome and dropout jointly such that their dependency is formulated through an odds ratio function. Parameters are estimated by a Bayesian approach implemented using the user-friendly Bayesian software WinBUGS. A methadone clinic dataset is analyzed to illustrate the proposed models. Results show that the treatment time effect is still significant but weaker after allowing for an ID process in the data. Finally, the effect of drop-out on parameter estimates is evaluated through simulation studies. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
D0 General Support: The Use of Programmable Logic Controllers (PLCS) at D0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hance, R.; /Fermilab
With the exception of control of heating, ventilation, and air conditioning (HVAC) ventilation fans, and their shutdown in the case of smoke in the ducts, all implementations of Programmable Logic Controllers (PLCs) in Dzero have been made within the fundamental premise that no uncertified PLC apparatus shall be entrusted with the safety of equipment or personnel. Thus although PLCs are used to control and monitor all manner of intricate equipment, simple hardware interlocks and relief devices provide basic protection against component failure, control failure, or inappropriate control operation. Nevertheless, this report includes two observations as follows: (1) It may be prudent to reconfigure the link between the Pyrotronics system and the HVAC system such that the Pyrotronics system provides interlocks to the ventilation fans instead of control inputs to the uncertified HVAC PLCs. Although the Pyrotronics system is certified and maintained to life safety standards, the HVAC system is not. A hardware or software failure of the HVAC system probably should not be allowed to result in the situation where the ventilation fans in a smoke filled duct continue to operate. Dan Markley is investigating this matter. (2) It may also be prudent to examine the network security of those systems connected to the Fermilab WAN (HVAC, Cryo, and Solenoid Controls). Even though the impact of a successful hack might only be to operations, it might nevertheless be disruptive and could be expensive. The risks should perhaps be analyzed. One of the most attractive features of these systems, from a user's viewpoint, is their unlimited networking. The unlimited networking that makes the systems so convenient to legitimate access also makes them vulnerable to illegitimate access.
NASA Astrophysics Data System (ADS)
Kraus, E. I.; Shabalin, I. I.; Shabalin, T. I.
2018-04-01
The main points of development of numerical tools for simulation of deformation and failure of complex technical objects under nonstationary conditions of extreme loading are presented. The possibility of extending the dynamic method for construction of difference grids to the 3D case is shown. A 3D realization of discrete-continuum approach to the deformation and failure of complex technical objects is carried out. The efficiency of the existing software package for 3D modelling is shown.
How to avoid the ten most frequent EMS pitfalls
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andrews, W.
1982-04-19
It pays to do your homework before investing in an energy management system if you want to avoid the 10 most common pitfalls listed by users, consultants, and manufacturers: oversimplification, improper maintenance, failure to involve operating personnel, inaccurate savings estimates, failure to include monitoring capability, incompetent or fraudulent firms, improper load control, not allowing for a de-bugging period, failure to include manual override, and software problems. The article describes how each of these pitfalls can lead to poor decisions and poor results. (DCK)
NASA Technical Reports Server (NTRS)
Lalli, Vincent R. (Editor); Malec, Henry A. (Editor); Dillard, Richard B.; Wong, Kam L.; Barber, Frank J.; Barina, Frank J.
1992-01-01
Discussed here is failure physics, the study of how products, hardware, software, and systems fail and what can be done about it. The intent is to impart useful information, to extend the limits of production capability, and to assist in achieving low cost reliable products. A review of reliability for the years 1940 to 2000 is given. Next, a review of mathematics is given as well as a description of what elements contribute to product failures. Basic reliability theory and the disciplines that allow us to control and eliminate failures are elucidated.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alfonsi, Andrea; Rabiti, Cristian; Mandelli, Diego
2016-06-01
RAVEN is a software framework able to perform parametric and stochastic analysis based on the response of complex system codes. The initial development was aimed at providing dynamic risk analysis capabilities to the thermohydraulic code RELAP-7, currently under development at Idaho National Laboratory (INL). Although the initial goal has been fully accomplished, RAVEN is now a multi-purpose stochastic and uncertainty quantification platform, capable of communicating with any system code. In fact, the provided Application Programming Interfaces (APIs) allow RAVEN to interact with any code as long as all the parameters that need to be perturbed are accessible by input files or via Python interfaces. RAVEN is capable of investigating system response and exploring the input space using various sampling schemes such as Monte Carlo, grid, or Latin hypercube. However, RAVEN's strength lies in its system feature discovery capabilities such as: constructing limit surfaces, separating regions of the input space leading to system failure, and using dynamic supervised learning techniques. The development of RAVEN started in 2012 when, within the Nuclear Energy Advanced Modeling and Simulation (NEAMS) program, the need to provide a modern risk evaluation framework arose. RAVEN’s principal assignment is to provide the necessary software and algorithms in order to employ the concepts developed by the Risk Informed Safety Margin Characterization (RISMC) program. RISMC is one of the pathways defined within the Light Water Reactor Sustainability (LWRS) program. In the RISMC approach, the goal is not just to identify the frequency of an event potentially leading to a system failure, but the proximity (or lack thereof) to key safety-related events. Hence, the approach is interested in identifying and increasing the safety margins related to those events. A safety margin is a numerical value quantifying the probability that a safety metric (e.g. peak pressure in a pipe) is exceeded under certain conditions. Most of the capabilities, implemented having RELAP-7 as a principal focus, are easily deployable to other system codes. For this reason, several side activities have been carried out (e.g. RELAP5-3D, any MOOSE-based App, etc.) or are currently ongoing for coupling RAVEN with several different software packages. The aim of this document is to provide a set of commented examples that can help the user to become familiar with the RAVEN code usage.
Disasters as a necessary part of benefit-cost analyses.
Mark, R K; Stuart-Alexander, D E
1977-09-16
Benefit-cost analyses for water projects generally have not included the expected costs (residual risk) of low-probability disasters such as dam failures, impoundment-induced earthquakes, and landslides. Analysis of the history of these types of events demonstrates that dam failures are not uncommon and that the probability of a reservoir-triggered earthquake increases with increasing reservoir depth. Because the expected costs from such events can be significant and risk is project-specific, estimates should be made for each project. The cost of expected damage from a "high-risk" project in an urban area could be comparable to project benefits.
NASA Technical Reports Server (NTRS)
Naumann, R. J.; Oran, W. A.; Whymark, R. R.; Rey, C.
1981-01-01
The single axis acoustic levitator that was flown on SPAR VI malfunctioned. The results of a series of tests, analyses, and investigation of hypotheses that were undertaken to determine the probable cause of failure are presented, together with recommendations for future flights of the apparatus. The most probable causes of the SPAR VI failure were lower than expected sound intensity due to mechanical degradation of the sound source, and an unexpected external force that caused the experiment sample to move radially and eventually be lost from the acoustic energy well.
A Brownian model for recurrent earthquakes
Matthews, M.V.; Ellsworth, W.L.; Reasenberg, P.A.
2002-01-01
We construct a probability model for rupture times on a recurrent earthquake source. Adding Brownian perturbations to steady tectonic loading produces a stochastic load-state process. Rupture is assumed to occur when this process reaches a critical-failure threshold. An earthquake relaxes the load state to a characteristic ground level and begins a new failure cycle. The load-state process is a Brownian relaxation oscillator. Intervals between events have a Brownian passage-time distribution that may serve as a temporal model for time-dependent, long-term seismic forecasting. This distribution has the following noteworthy properties: (1) the probability of immediate rerupture is zero; (2) the hazard rate increases steadily from zero at t = 0 to a finite maximum near the mean recurrence time and then decreases asymptotically to a quasi-stationary level, in which the conditional probability of an event becomes time independent; and (3) the quasi-stationary failure rate is greater than, equal to, or less than the mean failure rate because the coefficient of variation is less than, equal to, or greater than 1/√2 ≈ 0.707. In addition, the model provides expressions for the hazard rate and probability of rupture on faults for which only a bound can be placed on the time of the last rupture. The Brownian relaxation oscillator provides a connection between observable event times and a formal state variable that reflects the macromechanics of stress and strain accumulation. Analysis of this process reveals that the quasi-stationary distance to failure has a gamma distribution, and residual life has a related exponential distribution. It also enables calculation of "interaction" effects due to external perturbations to the state, such as stress-transfer effects from earthquakes outside the target source. The influence of interaction effects on recurrence times is transient and strongly dependent on when in the loading cycle step perturbations occur. Transient effects may be much stronger than would be predicted by the "clock change" method and characteristically decay inversely with elapsed time after the perturbation.
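Property (3) can be checked numerically. The sketch below is not taken from the paper's text: it assumes the Brownian passage-time distribution is parameterized as an inverse Gaussian with mean recurrence time `mu` and coefficient of variation `alpha`, for which the limiting hazard rate is 1/(2·mu·alpha²). All function names are illustrative.

```python
import math

def _phi(x):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bpt_pdf(t, mu, alpha):
    """Density of the passage time (inverse Gaussian, mean mu, CV alpha)."""
    return (math.sqrt(mu / (2.0 * math.pi * alpha**2 * t**3))
            * math.exp(-(t - mu)**2 / (2.0 * mu * alpha**2 * t)))

def bpt_cdf(t, mu, alpha):
    """CDF of the same distribution; exp(2/alpha^2) overflows for very small alpha."""
    lam = mu / alpha**2                      # inverse Gaussian shape parameter
    root = math.sqrt(lam / t)
    return (_phi(root * (t / mu - 1.0))
            + math.exp(2.0 * lam / mu) * _phi(-root * (t / mu + 1.0)))

def bpt_hazard(t, mu, alpha):
    """Hazard rate: density divided by the survival probability."""
    return bpt_pdf(t, mu, alpha) / (1.0 - bpt_cdf(t, mu, alpha))

def quasi_stationary_rate(mu, alpha):
    """Limiting hazard rate as t grows, under the assumed parameterization."""
    return 1.0 / (2.0 * mu * alpha**2)

if __name__ == "__main__":
    mu = 100.0                               # hypothetical mean recurrence time
    for alpha in (0.5, 1.0 / math.sqrt(2.0), 1.0):
        print(alpha, quasi_stationary_rate(mu, alpha), 1.0 / mu)
```

With `alpha` below 1/√2 the limiting rate exceeds the mean rate 1/`mu`, and with `alpha` above it the limiting rate is smaller, which is the threshold behaviour stated in the abstract.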
Bordin, Dimorvan; Bergamo, Edmara T P; Fardin, Vinicius P; Coelho, Paulo G; Bonfante, Estevam A
2017-07-01
To assess the probability of survival (reliability) and failure modes of narrow implants with different diameters. For fatigue testing, 42 implants with the same macrogeometry and internal conical connection were divided, according to diameter, as follows: narrow (Ø3.3×10mm) and extra-narrow (Ø2.9×10mm) (21 per group). Identical abutments were torqued to the implants and standardized maxillary incisor crowns were cemented and subjected to step-stress accelerated life testing (SSALT) in water. The use-level probability Weibull curves, and reliability for a mission of 50,000 and 100,000 cycles at 50, 100, 150 and 180N were calculated. For the finite element analysis (FEA), two virtual models, simulating the samples tested in fatigue, were constructed. Loads of 50N and 100N were applied 30° off-axis at the crown. The von-Mises stress was calculated for implant and abutment. The beta (β) values were: 0.67 for narrow and 1.32 for extra-narrow implants, indicating that failure rates did not increase with fatigue in the former, but more likely were associated with damage accumulation and wear-out failures in the latter. Both groups showed high reliability (up to 97.5%) at 50 and 100N. A decreased reliability was observed for both groups at 150 and 180N (ranging from 0 to 82.3%), but no significant difference was observed between groups. Failure predominantly involved abutment fracture for both groups. FEA at 50N-load, Ø3.3mm showed higher von-Mises stress for abutment (7.75%) and implant (2%) when compared to the Ø2.9mm. There was no significant difference between narrow and extra-narrow implants regarding probability of survival. The failure mode was similar for both groups, restricted to abutment fracture. Copyright © 2017 Elsevier Ltd. All rights reserved.
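The interpretation of the β values follows from the shape of the two-parameter Weibull hazard function. The sketch below is a minimal illustration, assuming a plain Weibull model in cycles; the scale parameter `eta` is a hypothetical value, since the abstract reports only β.

```python
import math

def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate of a two-parameter Weibull model."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)

def weibull_reliability(t, beta, eta):
    """Probability of surviving beyond t cycles."""
    return math.exp(-((t / eta) ** beta))

if __name__ == "__main__":
    eta = 1.0e5                        # hypothetical scale, not from the study
    for beta in (0.67, 1.32):          # shape values reported in the abstract
        early = weibull_hazard(5.0e4, beta, eta)
        late = weibull_hazard(1.0e5, beta, eta)
        trend = "decreasing" if late < early else "increasing"
        print(f"beta={beta}: hazard is {trend} with cycles")
```

A shape below 1 gives a hazard that falls with elapsed cycles, while a shape above 1 gives a rising hazard, which is the wear-out pattern the abstract attributes to the extra-narrow group.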
Fishnet model for failure probability tail of nacre-like imbricated lamellar materials
NASA Astrophysics Data System (ADS)
Luo, Wen; Bažant, Zdeněk P.
2017-12-01
Nacre, the iridescent material of the shells of pearl oysters and abalone, consists mostly of aragonite (a form of CaCO3), a brittle constituent of relatively low strength (≈10 MPa). Yet it has astonishing mean tensile strength (≈150 MPa) and fracture energy (≈350 to 1,240 J/m^2). The reasons have recently become well understood: (i) the nanoscale thickness (≈300 nm) of nacre's building blocks, the aragonite lamellae (or platelets), and (ii) the imbricated, or staggered, arrangement of these lamellae, bound by biopolymer layers only ≈25 nm thick, occupying <5% of volume. These properties inspire manmade biomimetic materials. For engineering applications, however, the failure probability of ≤10^-6 is generally required. To guarantee it, the type of probability density function (pdf) of strength, including its tail, must be determined. This objective, not pursued previously, is hardly achievable by experiments alone, since >10^8 tests of specimens would be needed. Here we outline a statistical model of strength that resembles a fishnet pulled diagonally, captures the tail of pdf of strength and, importantly, allows analytical safety assessments of nacreous materials. The analysis shows that, in terms of safety, the imbricated lamellar structure provides a major additional advantage: a ~10% strength increase at tail failure probability 10^-6 and a 1 to 2 orders of magnitude tail probability decrease at fixed stress. Another advantage is that a high scatter of microstructure properties diminishes the strength difference between the mean and the probability tail, compared with the weakest link model. These advantages of nacre-like materials are here justified analytically and supported by millions of Monte Carlo simulations.
Effect of Progressive Heart Failure on Cerebral Hemodynamics and Monoamine Metabolism in CNS.
Mamalyga, M L; Mamalyga, L M
2017-07-01
Compensated and decompensated heart failure are characterized by different associations of disorders in the brain and heart. In compensated heart failure, the blood flow in the common carotid and basilar arteries does not change. Exacerbation of heart failure leads to severe decompensation and is accompanied by a decrease in blood flow in the carotid and basilar arteries. Changes in monoamine content occurring in the brain at different stages of heart failure are determined by various factors. The functional exercise test showed unequal monoamine-synthesizing capacities of the brain in compensated and decompensated heart failure. Reduced capacity of the monoaminergic systems in decompensated heart failure probably leads to overstrain of the central regulatory mechanisms, their gradual exhaustion, and failure of the compensatory mechanisms, which contributes to progression of heart failure.
Failure Investigation of Radiant Platen Superheater Tube of Thermal Power Plant Boiler
NASA Astrophysics Data System (ADS)
Ghosh, D.; Ray, S.; Mandal, A.; Roy, H.
2015-04-01
This paper highlights a case study of typical premature failure of a radiant platen superheater tube of a 210 MW thermal power plant boiler. Visual examination, dimensional measurement and chemical analysis are conducted as part of the investigations. Apart from these, metallographic analysis and fractography are also conducted to ascertain the probable cause of failure. Finally, it has been concluded that the premature failure of the superheater tube can be attributed to localized creep at high temperature. Corrective actions have also been suggested to avoid this type of failure in the near future.
Reliability analysis of redundant systems. [a method to compute transition probabilities
NASA Technical Reports Server (NTRS)
Yeh, H. Y.
1974-01-01
A method is proposed to compute the transition probability (the probability of partial or total failure) of a parallel redundant system. The effect of the geometry of the system, the direction of load, and the degree of redundancy on the probability of complete survival of a parachute-like system are also studied. The results show that the probability of complete survival of a three-member parachute-like system is very sensitive to the variation of the horizontal angle of the load. However, this sensitivity becomes insignificant as the degree of redundancy increases.
CytoBayesJ: software tools for Bayesian analysis of cytogenetic radiation dosimetry data.
Ainsbury, Elizabeth A; Vinnikov, Volodymyr; Puig, Pedro; Maznyk, Nataliya; Rothkamm, Kai; Lloyd, David C
2013-08-30
A number of authors have suggested that a Bayesian approach may be most appropriate for analysis of cytogenetic radiation dosimetry data. In the Bayesian framework, probability of an event is described in terms of previous expectations and uncertainty. Previously existing, or prior, information is used in combination with experimental results to infer probabilities or the likelihood that a hypothesis is true. It has been shown that the Bayesian approach increases both the accuracy and quality assurance of radiation dose estimates. New software entitled CytoBayesJ has been developed with the aim of bringing Bayesian analysis to cytogenetic biodosimetry laboratory practice. CytoBayesJ takes a number of Bayesian or 'Bayesian like' methods that have been proposed in the literature and presents them to the user in the form of simple user-friendly tools, including testing for the most appropriate model for distribution of chromosome aberrations and calculations of posterior probability distributions. The individual tools are described in detail and relevant examples of the use of the methods and the corresponding CytoBayesJ software tools are given. In this way, the suitability of the Bayesian approach to biological radiation dosimetry is highlighted and its wider application encouraged by providing a user-friendly software interface and manual in English and Russian. Copyright © 2013 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sallaberry, Cedric Jean-Marie; Helton, Jon C.
2015-05-01
Weak link (WL)/strong link (SL) systems are important parts of the overall operational design of high-consequence systems. In such designs, the SL system is very robust and is intended to permit operation of the entire system under, and only under, intended conditions. In contrast, the WL system is intended to fail in a predictable and irreversible manner under accident conditions and render the entire system inoperable before an accidental operation of the SL system. The likelihood that the WL system will fail to deactivate the entire system before the SL system fails (i.e., degrades into a configuration that could allow an accidental operation of the entire system) is referred to as probability of loss of assured safety (PLOAS). This report describes the Fortran 90 program CPLOAS_2 that implements the following representations for PLOAS for situations in which both link physical properties and link failure properties are time-dependent: (i) failure of all SLs before failure of any WL, (ii) failure of any SL before failure of any WL, (iii) failure of all SLs before failure of all WLs, and (iv) failure of any SL before failure of all WLs. The effects of aleatory uncertainty and epistemic uncertainty in the definition and numerical evaluation of PLOAS can be included in the calculations performed by CPLOAS_2. Keywords: Aleatory uncertainty, CPLOAS_2, Epistemic uncertainty, Probability of loss of assured safety, Strong link, Uncertainty analysis, Weak link
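The four PLOAS definitions (i) to (iv) reduce to comparisons between the earliest and latest failure times in each link group. The sketch below is not the CPLOAS_2 algorithm: it is a crude Monte Carlo illustration that assumes, purely for demonstration, independent exponentially distributed link failure times with hypothetical rates, and every name in it is invented.

```python
import random

def loss_of_assured_safety(sl_times, wl_times):
    """Evaluate the four loss-of-assured-safety events for one accident sample."""
    return {
        "all_SL_before_any_WL": max(sl_times) < min(wl_times),   # case (i)
        "any_SL_before_any_WL": min(sl_times) < min(wl_times),   # case (ii)
        "all_SL_before_all_WL": max(sl_times) < max(wl_times),   # case (iii)
        "any_SL_before_all_WL": min(sl_times) < max(wl_times),   # case (iv)
    }

def estimate_ploas(n_sl, n_wl, sl_rate, wl_rate, n_trials=10000, seed=0):
    """Monte Carlo estimate of each probability under the assumed model."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_trials):
        # Hypothetical failure-time model: exponential, independent links.
        sl = [rng.expovariate(sl_rate) for _ in range(n_sl)]
        wl = [rng.expovariate(wl_rate) for _ in range(n_wl)]
        for name, occurred in loss_of_assured_safety(sl, wl).items():
            counts[name] = counts.get(name, 0) + int(occurred)
    return {name: c / n_trials for name, c in counts.items()}

if __name__ == "__main__":
    # Weak links fail much faster than strong links in this made-up setting.
    print(estimate_ploas(n_sl=2, n_wl=2, sl_rate=0.1, wl_rate=5.0))
```

Because the latest failure in a group can never precede the earliest, case (i) is always the least likely event and case (iv) the most likely, whatever failure-time model is substituted.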
King, C.-Y.; Luo, G.
1990-01-01
Electric resistance and emissions of hydrogen and radon isotopes of concrete (which is somewhat similar to fault-zone materials) under increasing uniaxial compression were continuously monitored to check whether they show any pre- and post-failure changes that may correspond to similar changes reported for earthquakes. The results show that all these parameters generally begin to increase when the applied stresses reach 20% to 90% of the corresponding failure stresses, probably due to the occurrence and growth of dilatant microcracks in the specimens. The prefailure changes have different patterns for different specimens, probably because of differences in spatial and temporal distributions of the microcracks. The resistance shows large co-failure increases, and the gas emissions show large post-failure increases. The post-failure increase of radon persists longer and stays at a higher level than that of hydrogen, suggesting a difference in the emission mechanisms for these two kinds of gases. The H2 increase may be mainly due to chemical reaction at the crack surfaces while they are fresh, whereas the Rn increases may be mainly the result of the increased emanation area of such surfaces. The results suggest that monitoring of resistivity and gas emissions may be useful for predicting earthquakes and failures of concrete structures. © 1990 Birkhäuser Verlag.
Advances on the Failure Analysis of the Dam-Foundation Interface of Concrete Dams.
Altarejos-García, Luis; Escuder-Bueno, Ignacio; Morales-Torres, Adrián
2015-12-02
Failure analysis of the dam-foundation interface in concrete dams is characterized by complexity, uncertainties on models and parameters, and a strong non-linear softening behavior. In practice, these uncertainties are dealt with through a well-structured mixture of experience, best practices and prudent, conservative design approaches based on the safety factor concept. Yet a sound, deep knowledge of some aspects of this failure mode remains elusive, as they have been offset in practical applications by the use of this conservative approach. In this paper we show a strategy to analyse this failure mode under a reliability-based approach. The proposed methodology of analysis integrates epistemic uncertainty on spatial variability of strength parameters and data from dam monitoring. The purpose is to produce meaningful and useful information regarding the probability of occurrence of this failure mode that can be incorporated in risk-informed dam safety reviews. In addition, relationships between probability of failure and factors of safety are obtained. This research is supported by more than a decade of intensive professional practice on real world cases and its final purpose is to bring some clarity and guidance and to contribute to the improvement of current knowledge and best practices on such an important dam safety concern. PMID:28793709
Prognostic Factors in Severe Chagasic Heart Failure
Costa, Sandra de Araújo; Rassi, Salvador; Freitas, Elis Marra da Madeira; Gutierrez, Natália da Silva; Boaventura, Fabiana Miranda; Sampaio, Larissa Pereira da Costa; Silva, João Bastista Masson
2017-01-01
Background: Prognostic factors are extensively studied in heart failure; however, their role in severe Chagasic heart failure has not been established. Objectives: To identify the association of clinical and laboratory factors with the prognosis of severe Chagasic heart failure, as well as the association of these factors with mortality and survival in a 7.5-year follow-up. Methods: 60 patients with severe Chagasic heart failure were evaluated regarding the following variables: age, blood pressure, ejection fraction, serum sodium, creatinine, 6-minute walk test, non-sustained ventricular tachycardia, QRS width, indexed left atrial volume, and functional class. Results: 53 (88.3%) patients died during follow-up, and 7 (11.7%) remained alive. Cumulative overall survival probability was approximately 11%. Non-sustained ventricular tachycardia (HR = 2.11; 95% CI: 1.04 - 4.31; p<0.05) and indexed left atrial volume ≥ 72 mL/m2 (HR = 3.51; 95% CI: 1.63 - 7.52; p<0.05) were the only variables that remained as independent predictors of mortality. Conclusions: The presence of non-sustained ventricular tachycardia on Holter and indexed left atrial volume > 72 mL/m2 are independent predictors of mortality in severe Chagasic heart failure, with cumulative survival probability of only 11% in 7.5 years. PMID:28443956
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tsauo, Jiaywei; Luo, Xuefeng; Ye, Linchao
2015-06-15
Purpose: This study was designed to report our results with a modified technique of three-dimensional (3D) path planning software assisted transjugular intrahepatic portosystemic shunt (TIPS). Methods: 3D path planning software was recently developed to facilitate TIPS creation by using two carbon dioxide portograms acquired at least 20° apart to generate a 3D path for overlay needle guidance. However, one shortcoming is that puncturing along the overlay would be technically impossible if the angle of the liver access set and the angle of the 3D path are not the same. To solve this problem, a prototype 3D path planning software was fitted with a utility to calculate the angle of the 3D path. Using this, we modified the angle of the liver access set accordingly during the procedure in ten patients. Results: Failure for technical reasons occurred in three patients (unsuccessful wedged hepatic venography in two cases, software technical failure in one case). The procedure was successful in the remaining seven patients, and only one needle pass was required to obtain portal vein access in each case. The course of puncture was comparable to the 3D path in all patients. No procedure-related complication occurred following the procedures. Conclusions: Adjusting the angle of the liver access set to match the angle of the 3D path determined by the software appears to be a favorable modification to the technique of 3D path planning software assisted TIPS.
Fatigue analysis of composite materials using the fail-safe concept
NASA Technical Reports Server (NTRS)
Stievenard, G.
1982-01-01
If R1 is the probability of having a crack on a flight component and R2 is the probability of seeing this crack propagate between two scheduled inspections, the global failure regulation states that this product must not exceed 0.0000001.
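Written out as an inequality, the stated rule bounds one probability once the other is known. The numerical case below uses a made-up value of R1 purely for illustration; it is not taken from the report.

```latex
\[
R_1 \, R_2 \le 10^{-7}
\quad\Longrightarrow\quad
R_2 \le \frac{10^{-7}}{R_1},
\qquad \text{e.g. } R_1 = 10^{-3} \;\Rightarrow\; R_2 \le 10^{-4}.
\]
```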
Fault Tree Analysis Application for Safety and Reliability
NASA Technical Reports Server (NTRS)
Wallace, Dolores R.
2003-01-01
Many commercial software tools exist for fault tree analysis (FTA), an accepted method for mitigating risk in systems. The method embedded in the tools identifies a root cause in system components, but when software is identified as a root cause, it does not build trees into the software component. No commercial software tools have been built specifically for development and analysis of software fault trees. Research indicates that the methods of FTA could be applied to software, but the method is not practical without automated tool support. With appropriate automated tool support, software fault tree analysis (SFTA) may be a practical technique for identifying the underlying cause of software faults that may lead to critical system failures. We strive to demonstrate that existing commercial tools for FTA can be adapted for use with SFTA, and that applied to a safety-critical system, SFTA can be used to identify serious potential problems long before integration and system testing.
Human versus automation in responding to failures: an expected-value analysis
NASA Technical Reports Server (NTRS)
Sheridan, T. B.; Parasuraman, R.
2000-01-01
A simple analytical criterion is provided for deciding whether a human or automation is best for a failure detection task. The method is based on expected-value decision theory in much the same way as is signal detection. It requires specification of the probabilities of misses (false negatives) and false alarms (false positives) for both human and automation being considered, as well as factors independent of the choice--namely, costs and benefits of incorrect and correct decisions as well as the prior probability of failure. The method can also serve as a basis for comparing different modes of automation. Some limiting cases of application are discussed, as are some decision criteria other than expected value. Actual or potential applications include the design and evaluation of any system in which either humans or automation are being considered.
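The criterion described above can be sketched as a comparison of two expected values. The code below is a minimal illustration of an expected-value calculation built from the quantities the abstract lists (miss and false-alarm probabilities, payoffs, prior probability of failure); the class names, function names, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Detector:
    p_miss: float          # P(no alarm | failure present)
    p_false_alarm: float   # P(alarm | no failure)

@dataclass
class Payoffs:
    hit: float                 # value of correctly detecting a failure
    miss: float                # value (usually negative) of missing a failure
    false_alarm: float         # value (usually negative) of a false alarm
    correct_rejection: float   # value of correctly staying quiet

def expected_value(d, payoffs, p_failure):
    """Expected payoff of assigning the detection task to detector d."""
    ev_failure = (1.0 - d.p_miss) * payoffs.hit + d.p_miss * payoffs.miss
    ev_normal = (d.p_false_alarm * payoffs.false_alarm
                 + (1.0 - d.p_false_alarm) * payoffs.correct_rejection)
    return p_failure * ev_failure + (1.0 - p_failure) * ev_normal

def better_choice(human, automation, payoffs, p_failure):
    """Return the label of the option with the higher expected value."""
    ev_h = expected_value(human, payoffs, p_failure)
    ev_a = expected_value(automation, payoffs, p_failure)
    return "human" if ev_h >= ev_a else "automation"

if __name__ == "__main__":
    payoffs = Payoffs(hit=100.0, miss=-1000.0, false_alarm=-10.0, correct_rejection=0.0)
    human = Detector(p_miss=0.10, p_false_alarm=0.05)
    automation = Detector(p_miss=0.01, p_false_alarm=0.20)
    for p_failure in (0.01, 0.5):
        print(p_failure, better_choice(human, automation, payoffs, p_failure))
```

In this made-up setting the preferred option changes with the prior probability of failure: when failures are rare the automation's higher false-alarm rate dominates, and when they are common its lower miss rate does.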
Conesa, David; López-Quílez, Antonio; Martínez-Beneito, Miguel Angel; Miralles, María Teresa; Verdejo, Francisco
2009-07-29
The early identification of influenza outbreaks has become a priority in public health practice. A large variety of statistical algorithms for the automated monitoring of influenza surveillance have been proposed, but most of them require not only a lot of computational effort but also operation of sometimes not-so-friendly software. In this paper, we introduce FluDetWeb, an implementation of a prospective influenza surveillance methodology based on a client-server architecture with a thin (web-based) client application design. Users can introduce and edit their own data consisting of a series of weekly influenza incidence rates. The system returns the probability of being in an epidemic phase (via e-mail if desired). When the probability is greater than 0.5, it also returns the probability of an increase in the incidence rate during the following week. The system also provides two complementary graphs. This system has been implemented using statistical free-software (R and WinBUGS), a web server environment for Java code (Tomcat) and a software module created by us (Rdp) responsible for managing internal tasks; the software package MySQL has been used to construct the database management system. The implementation is available on-line from: http://www.geeitema.org/meviepi/fludetweb/. The ease of use of FluDetWeb and its on-line availability can make it a valuable tool for public health practitioners who want to obtain information about the probability that their system is in an epidemic phase. Moreover, the architecture described can also be useful for developers of systems based on computationally intensive methods. PMID:19640304
Wenk, Jonathan F; Wall, Samuel T; Peterson, Robert C; Helgerson, Sam L; Sabbah, Hani N; Burger, Mike; Stander, Nielen; Ratcliffe, Mark B; Guccione, Julius M
2009-12-01
Heart failure continues to present a significant medical and economic burden throughout the developed world. Novel treatments involving the injection of polymeric materials into the myocardium of the failing left ventricle (LV) are currently being developed, which may reduce elevated myofiber stresses during the cardiac cycle and act to retard the progression of heart failure. A finite element (FE) simulation-based method was developed in this study that can automatically optimize the injection pattern of the polymeric "inclusions" according to a specific objective function, using commercially available software tools. The FE preprocessor TRUEGRID® was used to create a parametric axisymmetric LV mesh matched to experimentally measured end-diastole and end-systole metrics from dogs with coronary microembolization-induced heart failure. Passive and active myocardial material properties were defined by a pseudo-elastic-strain energy function and a time-varying elastance model of active contraction, respectively, that were implemented in the FE software LS-DYNA. The companion optimization software LS-OPT was used to communicate directly with TRUEGRID® to determine FE model parameters, such as defining the injection pattern and inclusion characteristics. The optimization resulted in an intuitive optimal injection pattern (i.e., the one with the greatest number of inclusions) when the objective function was weighted to minimize mean end-diastolic and end-systolic myofiber stress and ignore LV stroke volume. In contrast, the optimization resulted in a nonintuitive optimal pattern (i.e., 3 inclusions longitudinally × 6 inclusions circumferentially) when both myofiber stress and stroke volume were incorporated into the objective function with different weights.
Runtime Verification of Pacemaker Functionality Using Hierarchical Fuzzy Colored Petri-nets.
Majma, Negar; Babamir, Seyed Morteza; Monadjemi, Amirhassan
2017-02-01
Today, implanted medical devices are increasingly used for many patients and for diverse health problems. However, several runtime problems and errors have been reported by the relevant organizations, some even resulting in patient death. One of those devices is the pacemaker, a device that helps regulate the patient's heartbeat by connecting to the cardiac vessels. This device is directed by its software, so any failure in this software causes a serious malfunction. Therefore, this study seeks a better way to monitor the device's software behavior in order to decrease the failure risk. Accordingly, we supervise the runtime function and status of the software. Software verification here means checking the limitations and needs of the system users against the running software. In this paper, a method to verify the pacemaker software, based on the fuzzy function of the device, is presented. The functional limitations of the device are identified and expressed as fuzzy rules, and the device is then verified with a hierarchical Fuzzy Colored Petri-net (HFCPN), which is formed considering the software limits. Building on our previous studies, which used 1) Fuzzy Petri-nets (FPN) to verify insulin pumps, 2) Colored Petri-nets (CPN) to verify the pacemaker, and 3) a software agent with Petri-net-based knowledge to verify the pacemaker, the runtime behavior of the pacemaker software is examined by HFCPN in this paper. This is a step forward compared with the earlier work. The HFCPN in this paper reduces complexity compared with the FPN and CPN used in our previous studies. By presenting the Petri-net (PN) in a hierarchical form, the verification runtime decreased by 90.61% compared with the verification runtime in the earlier work. Since we need an inference engine in the runtime verification, we used the HFCPN to enhance the performance of the inference engine.
Li, Qiuying; Pham, Hoang
2017-01-01
In this paper, we propose a software reliability model that considers not only error generation but also fault removal efficiency combined with testing coverage information based on a nonhomogeneous Poisson process (NHPP). During the past four decades, many software reliability growth models (SRGMs) based on NHPP have been proposed to estimate software reliability measures, most of which share the following assumptions: 1) the fault detection rate commonly changes during the testing phase; 2) as a result of imperfect debugging, fault removal is accompanied by a fault re-introduction rate. However, few SRGMs in the literature differentiate between fault detection and fault removal, i.e. they seldom consider imperfect fault removal efficiency. In practical software development, fault removal efficiency cannot always be perfect: detected failures might not be removed completely, the original faults might still exist, and new faults might be introduced in the meantime, which is referred to as the imperfect debugging phenomenon. In this study, a model that incorporates fault introduction rate, fault removal efficiency and testing coverage into software reliability evaluation is developed, using testing coverage to express the fault detection rate and fault removal efficiency to account for fault repair. We compare the performance of the proposed model with several existing NHPP SRGMs using three sets of real failure data based on five criteria. The results show that the model gives better fitting and predictive performance.
Diagnostics Tools Identify Faults Prior to Failure
NASA Technical Reports Server (NTRS)
2013-01-01
Through the SBIR program, Rochester, New York-based Impact Technologies LLC collaborated with Ames Research Center to commercialize the Center's Hybrid Diagnostic Engine, or HyDE, software. The fault-detecting program is now incorporated into a software suite that identifies potential faults early in the design phase of systems ranging from printers to vehicles and robots, saving time and money.
Markov Chains For Testing Redundant Software
NASA Technical Reports Server (NTRS)
White, Allan L.; Sjogren, Jon A.
1990-01-01
Preliminary design developed for validation experiment that addresses problems unique to assuring extremely high quality of multiple-version programs in process-control software. Approach takes into account inertia of controlled system in sense it takes more than one failure of control program to cause controlled system to fail. Verification procedure consists of two steps: experimentation (numerical simulation) and computation, with Markov model for each step.
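The inertia idea in this abstract can be illustrated with a toy Markov chain. The sketch below assumes, purely for illustration, that the control program fails independently with probability `p` in each control cycle and that the controlled system fails once `k` consecutive program failures occur; neither assumption nor the function name comes from the report.

```python
def system_failure_probability(p, k, n):
    """Probability that k consecutive program failures occur within n cycles.

    States 0..k-1 count the current run of consecutive program failures;
    reaching a run of length k is an absorbing system-failure state.
    """
    state = [0.0] * k          # state[i]: probability of a current run of length i
    state[0] = 1.0
    absorbed = 0.0             # probability mass already in the failure state
    for _ in range(n):
        new_state = [0.0] * k
        for run_length, prob in enumerate(state):
            new_state[0] += prob * (1.0 - p)      # a success resets the run
            if run_length + 1 == k:
                absorbed += prob * p              # k-th consecutive failure
            else:
                new_state[run_length + 1] += prob * p
        state = new_state
    return absorbed


if __name__ == "__main__":
    # Hypothetical numbers: per-cycle failure probability 1e-3, 1000 cycles.
    for k in (1, 2, 3):
        print(k, system_failure_probability(1e-3, k, 1000))
```

Even in this simplified setting, requiring more than one consecutive failure lowers the system failure probability by orders of magnitude, which is why the inertia of the controlled system matters for validation.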
Exponential order statistic models of software reliability growth
NASA Technical Reports Server (NTRS)
Miller, D. R.
1985-01-01
Failure times of a software reliability growth process are modeled as order statistics of independent, nonidentically distributed exponential random variables. The Jelinski-Moranda, Goel-Okumoto, Littlewood, Musa-Okumoto Logarithmic, and Power Law models are all special cases of Exponential Order Statistic Models, but there are many additional examples also. Various characterizations, properties and examples of this class of models are developed and presented.
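The construction described here can be sketched in a few lines: draw one exponential detection time per fault and sort the draws. The helper names and rate values below are hypothetical; the equal-rate case is used because it corresponds to the Jelinski-Moranda model named in the abstract.

```python
import math
import random


def simulate_failure_times(rates, rng):
    """Sample failure times as order statistics of independent exponentials.

    Each fault i has its own detection rate rates[i]; the observed failure
    times are the sorted per-fault detection times.
    """
    return sorted(rng.expovariate(rate) for rate in rates)


def expected_failures_by(t, rates):
    """Expected number of failures observed by time t.

    Each fault has been detected by time t with probability 1 - exp(-rate * t).
    """
    return sum(1.0 - math.exp(-rate * t) for rate in rates)


if __name__ == "__main__":
    rng = random.Random(42)
    # Equal rates give the Jelinski-Moranda special case (illustrative values).
    jm_rates = [0.02] * 30
    times = simulate_failure_times(jm_rates, rng)
    print("first five failure times:", times[:5])
    print("expected failures by t=50:", expected_failures_by(50.0, jm_rates))
```

Replacing the constant rate list with unequal rates gives other members of the class without changing either function.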
Evolution of a modular software network
Fortuna, Miguel A.; Bonachela, Juan A.; Levin, Simon A.
2011-01-01
“Evolution behaves like a tinkerer” (François Jacob, Science, 1977). Software systems provide a singular opportunity to understand biological processes using concepts from network theory. The Debian GNU/Linux operating system allows us to explore the evolution of a complex network in a unique way. The modular design detected during its growth is based on the reuse of existing code in order to minimize costs during programming. The increase of modularity experienced by the system over time has not counterbalanced the increase in incompatibilities between software packages within modules. This negative effect is far from being a failure of design. A random process of package installation shows that the higher the modularity, the larger the fraction of packages working properly in a local computer. The decrease in the relative number of conflicts between packages from different modules avoids a failure in the functionality of one package spreading throughout the entire system. Some potential analogies with the evolutionary and ecological processes determining the structure of ecological networks of interacting species are discussed. PMID:22106260
An experimental investigation of fault tolerant software structures in an avionics application
NASA Technical Reports Server (NTRS)
Caglayan, Alper K.; Eckhardt, Dave E., Jr.
1989-01-01
The objective of this experimental investigation is to compare the functional performance and software reliability of competing fault tolerant software structures utilizing software diversity. In this experiment, three versions of the redundancy management software for a skewed sensor array have been developed using three diverse failure detection and isolation algorithms and incorporated into various N-version, recovery block and hybrid software structures. The empirical results show that, for maximum functional performance improvement in the selected application domain, the results of diverse algorithms should be voted before being processed by multiple versions without enforced diversity. Results also suggest that when the reliability gain with an N-version structure is modest, recovery block structures are more feasible since higher reliability can be obtained using an acceptance check with a modest reliability.
NASA Technical Reports Server (NTRS)
Schumann, Johann; Rozier, Kristin Y.; Reinbacher, Thomas; Mengshoel, Ole J.; Mbaya, Timmy; Ippolito, Corey
2013-01-01
Unmanned aerial systems (UASs) can only be deployed if they can effectively complete their missions and respond to failures and uncertain environmental conditions while maintaining safety with respect to other aircraft as well as humans and property on the ground. In this paper, we design a real-time, on-board system health management (SHM) capability to continuously monitor sensors, software, and hardware components for detection and diagnosis of failures and violations of safety or performance rules during the flight of a UAS. Our approach to SHM is three-pronged, providing: (1) real-time monitoring of sensor and/or software signals; (2) signal analysis, preprocessing, and advanced on-the-fly temporal and Bayesian probabilistic fault diagnosis; (3) an unobtrusive, lightweight, read-only, low-power realization using Field Programmable Gate Arrays (FPGAs) that avoids overburdening limited computing resources or costly re-certification of flight software due to instrumentation. Our implementation provides a novel approach of combining modular building blocks, integrating responsive runtime monitoring of temporal logic system safety requirements with model-based diagnosis and Bayesian network-based probabilistic analysis. We demonstrate this approach using actual data from the NASA Swift UAS, an experimental all-electric aircraft.
Software-Implemented Fault Tolerance in Communications Systems
NASA Technical Reports Server (NTRS)
Gantenbein, Rex E.
1994-01-01
Software-implemented fault tolerance (SIFT) is used in many computer-based command, control, and communications (C(3)) systems to provide the nearly continuous availability that they require. In the communications subsystem of Space Station Alpha, SIFT algorithms are used to detect and recover from failures in the data and command link between the Station and its ground support. The paper presents a review of these algorithms and discusses how such techniques can be applied to similar systems found in applications such as manufacturing control, military communications, and programmable devices such as pacemakers. With support from the Tracking and Communication Division of NASA's Johnson Space Center, researchers at the University of Wyoming are developing a testbed for evaluating the effectiveness of these algorithms prior to their deployment. This testbed will be capable of simulating a variety of C(3) system failures and recording the response of the Space Station SIFT algorithms to these failures. The design of this testbed and the applicability of the approach in other environments is described.
NASA Technical Reports Server (NTRS)
Delaat, J. C.; Merrill, W. C.
1983-01-01
A sensor failure detection, isolation, and accommodation algorithm was developed which incorporates analytic sensor redundancy through software. This algorithm was implemented in a high level language on a microprocessor based controls computer. Parallel processing and state-of-the-art 16-bit microprocessors are used along with efficient programming practices to achieve real-time operation.
Bowden, Vanessa K; Visser, Troy A W; Loft, Shayne
2017-06-01
It is generally assumed that drivers speed intentionally because of factors such as frustration with the speed limit or general impatience. The current study examined whether speeding following an interruption could be better explained by unintentional prospective memory (PM) failure. In these situations, interrupting drivers may create a PM task, with speeding the result of drivers forgetting their newly encoded intention to travel at a lower speed after interruption. Across 3 simulated driving experiments, corrected or uncorrected speeding in recently reduced speed zones (from 70 km/h to 40 km/h) increased on average from 8% when uninterrupted to 33% when interrupted. Conversely, the probability that participants traveled under their new speed limit in recently increased speed zones (from 40 km/h to 70 km/h) increased from 1% when uninterrupted to 23% when interrupted. Consistent with a PM explanation, this indicates that interruptions lead to a general failure to follow changed speed limits, not just to increased speeding. Further testing a PM explanation, Experiments 2 and 3 manipulated variables expected to influence the probability of PM failures and subsequent speeding after interruptions. Experiment 2 showed that performing a cognitively demanding task during the interruption, when compared with unfilled interruptions, increased the probability of initially speeding from 1% to 11%, but that participants were able to correct (reduce) their speed. In Experiment 3, providing participants with 10s longer to encode the new speed limit before interruption decreased the probability of uncorrected speeding after an unfilled interruption from 30% to 20%. Theoretical implications and implications for road design interventions are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Neutron-Induced Failures in Semiconductor Devices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wender, Stephen Arthur
2017-03-13
Single Event Effects are a very significant failure mode in modern semiconductor devices that may limit their reliability. Accelerated testing is important for the semiconductor industry. Considerably more work is needed in this field to mitigate the problem. Mitigation of this problem will probably come from physicists and electrical engineers working together.
14 CFR 29.729 - Retracting mechanism.
Code of Federal Regulations, 2014 CFR
2014-01-01
... loads occurring during retraction and extension at any airspeed up to the design maximum landing gear... of— (1) Any reasonably probable failure in the normal retraction system; or (2) The failure of any... location and operation of the retraction control must meet the requirements of §§ 29.777 and 29.779. (g...
14 CFR 27.729 - Retracting mechanism.
Code of Federal Regulations, 2010 CFR
2010-01-01
... loads occurring during retraction and extension at any airspeed up to the design maximum landing gear... of— (1) Any reasonably probable failure in the normal retraction system; or (2) The failure of any... location and operation of the retraction control must meet the requirements of §§ 27.777 and 27.779. (g...
14 CFR 29.729 - Retracting mechanism.
Code of Federal Regulations, 2012 CFR
2012-01-01
... loads occurring during retraction and extension at any airspeed up to the design maximum landing gear... of— (1) Any reasonably probable failure in the normal retraction system; or (2) The failure of any... location and operation of the retraction control must meet the requirements of §§ 29.777 and 29.779. (g...
14 CFR 27.729 - Retracting mechanism.
Code of Federal Regulations, 2013 CFR
2013-01-01
... loads occurring during retraction and extension at any airspeed up to the design maximum landing gear... of— (1) Any reasonably probable failure in the normal retraction system; or (2) The failure of any... location and operation of the retraction control must meet the requirements of §§ 27.777 and 27.779. (g...
14 CFR 29.729 - Retracting mechanism.
Code of Federal Regulations, 2011 CFR
2011-01-01
... loads occurring during retraction and extension at any airspeed up to the design maximum landing gear... of— (1) Any reasonably probable failure in the normal retraction system; or (2) The failure of any... location and operation of the retraction control must meet the requirements of §§ 29.777 and 29.779. (g...
14 CFR 29.729 - Retracting mechanism.
Code of Federal Regulations, 2013 CFR
2013-01-01
... loads occurring during retraction and extension at any airspeed up to the design maximum landing gear... of— (1) Any reasonably probable failure in the normal retraction system; or (2) The failure of any... location and operation of the retraction control must meet the requirements of §§ 29.777 and 29.779. (g...
14 CFR 27.729 - Retracting mechanism.
Code of Federal Regulations, 2012 CFR
2012-01-01
... loads occurring during retraction and extension at any airspeed up to the design maximum landing gear... of— (1) Any reasonably probable failure in the normal retraction system; or (2) The failure of any... location and operation of the retraction control must meet the requirements of §§ 27.777 and 27.779. (g...
14 CFR 27.729 - Retracting mechanism.
Code of Federal Regulations, 2011 CFR
2011-01-01
... loads occurring during retraction and extension at any airspeed up to the design maximum landing gear... of— (1) Any reasonably probable failure in the normal retraction system; or (2) The failure of any... location and operation of the retraction control must meet the requirements of §§ 27.777 and 27.779. (g...
14 CFR 27.729 - Retracting mechanism.
Code of Federal Regulations, 2014 CFR
2014-01-01
... loads occurring during retraction and extension at any airspeed up to the design maximum landing gear... of— (1) Any reasonably probable failure in the normal retraction system; or (2) The failure of any... location and operation of the retraction control must meet the requirements of §§ 27.777 and 27.779. (g...
14 CFR 29.729 - Retracting mechanism.
Code of Federal Regulations, 2010 CFR
2010-01-01
... loads occurring during retraction and extension at any airspeed up to the design maximum landing gear... of— (1) Any reasonably probable failure in the normal retraction system; or (2) The failure of any... location and operation of the retraction control must meet the requirements of §§ 29.777 and 29.779. (g...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ebeida, Mohamed S.; Mitchell, Scott A.; Swiler, Laura P.
We introduce a novel technique, POF-Darts, to estimate the Probability Of Failure based on random disk-packing in the uncertain parameter space. POF-Darts uses hyperplane sampling to explore the unexplored part of the uncertain space. We use the function evaluation at a sample point to determine whether it belongs to failure or non-failure regions, and surround it with a protection sphere region to avoid clustering. We decompose the domain into Voronoi cells around the function evaluations as seeds and choose the radius of the protection sphere depending on the local Lipschitz continuity. As sampling proceeds, regions uncovered with spheres will shrink, improving the estimation accuracy. After exhausting the function evaluation budget, we build a surrogate model using the function evaluations associated with the sample points and estimate the probability of failure by exhaustive sampling of that surrogate. In comparison to other similar methods, our algorithm has the advantages of decoupling the sampling step from the surrogate construction one, the ability to reach target POF values with fewer samples, and the capability of estimating the number and locations of disconnected failure regions, not just the POF value. Furthermore, we present various examples to demonstrate the efficiency of our novel approach.
NASA Astrophysics Data System (ADS)
Massmann, Joel; Freeze, R. Allan
1987-02-01
This paper puts in place a risk-cost-benefit analysis for waste management facilities that explicitly recognizes the adversarial relationship that exists in a regulated market economy between the owner/operator of a waste management facility and the government regulatory agency under whose terms the facility must be licensed. The risk-cost-benefit analysis is set up from the perspective of the owner/operator. It can be used directly by the owner/operator to assess alternative design strategies. It can also be used by the regulatory agency to assess alternative regulatory policy, but only in an indirect manner, by examining the response of an owner/operator to the stimuli of various policies. The objective function is couched in terms of a discounted stream of benefits, costs, and risks over an engineering time horizon. Benefits are in the form of revenues for services provided; costs are those of construction and operation of the facility. Risk is defined as the cost associated with the probability of failure, with failure defined as the occurrence of a groundwater contamination event that violates the licensing requirements established for the facility. Failure requires a breach of the containment structure and contaminant migration through the hydrogeological environment to a compliance surface. The probability of failure can be estimated on the basis of reliability theory for the breach of containment and with a Monte-Carlo finite-element simulation for the advective contaminant transport. In the hydrogeological environment the hydraulic conductivity values are defined stochastically. The probability of failure is reduced by the presence of a monitoring network operated by the owner/operator and located between the source and the regulatory compliance surface. The level of reduction in the probability of failure depends on the probability of detection of the monitoring network, which can be calculated from the stochastic contaminant transport simulations. 
While the framework is quite general, the development in this paper is specifically suited for a landfill in which the primary design feature is one or more synthetic liners in parallel. Contamination is brought about by the release of a single, inorganic nonradioactive species into a saturated, high-permeability, advective, steady state horizontal flow system which can be analyzed with a two-dimensional analysis. It is possible to carry out sensitivity analyses for a wide variety of influences on this system, including landfill size, liner design, hydrogeological parameters, amount of exploration, extent of monitoring network, nature of remedial schemes, economic factors, and regulatory policy.
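The objective function described in this abstract can be written compactly. The following is a sketch under assumed notation, not the paper's exact formulation: $T$ is the engineering time horizon, $i$ the discount rate, $B(t)$ and $C(t)$ the benefits and costs in year $t$, $P_f(t)$ the probability of failure in year $t$, and $C_f(t)$ the cost associated with a failure in that year.

```latex
\Phi \;=\; \sum_{t=0}^{T} \frac{1}{(1+i)^{t}}
      \bigl[\, B(t) - C(t) - R(t) \,\bigr],
\qquad
R(t) \;=\; P_f(t)\, C_f(t)
```

Read this way, a monitoring network improves the objective when the reduction it brings in $P_f(t)$, weighted by $C_f(t)$, outweighs the construction and operating costs it adds to $C(t)$.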
Defense Strategies for Asymmetric Networked Systems with Discrete Components.
Rao, Nageswara S V; Ma, Chris Y T; Hausken, Kjell; He, Fei; Yau, David K Y; Zhuang, Jun
2018-05-03
We consider infrastructures consisting of a network of systems, each composed of discrete components. The network provides the vital connectivity between the systems and hence plays a critical, asymmetric role in the infrastructure operations. The individual components of the systems can be attacked by cyber and physical means and can be appropriately reinforced to withstand these attacks. We formulate the problem of ensuring the infrastructure performance as a game between an attacker and a provider, who choose the numbers of the components of the systems and network to attack and reinforce, respectively. The costs and benefits of attacks and reinforcements are characterized using the sum-form, product-form and composite utility functions, each composed of a survival probability term and a component cost term. We present a two-level characterization of the correlations within the infrastructure: (i) the aggregate failure correlation function specifies the infrastructure failure probability given the failure of an individual system or network, and (ii) the survival probabilities of the systems and network satisfy first-order differential conditions that capture the component-level correlations using multiplier functions. We derive Nash equilibrium conditions that provide expressions for individual system survival probabilities and also the expected infrastructure capacity specified by the total number of operational components. We apply these results to derive and analyze defense strategies for distributed cloud computing infrastructures using cyber-physical models.
Reducing the Risk of Human Space Missions with INTEGRITY
NASA Technical Reports Server (NTRS)
Jones, Harry W.; Dillon-Merill, Robin L.; Tri, Terry O.; Henninger, Donald L.
2003-01-01
The INTEGRITY Program will design and operate a test bed facility to help prepare for future beyond-LEO missions. The purpose of INTEGRITY is to enable future missions by developing, testing, and demonstrating advanced human space systems. INTEGRITY will also implement and validate advanced management techniques including risk analysis and mitigation. One important way INTEGRITY will help enable future missions is by reducing their risk. A risk analysis of human space missions is important in defining the steps that INTEGRITY should take to mitigate risk. This paper describes how a Probabilistic Risk Assessment (PRA) of human space missions will help support the planning and development of INTEGRITY to maximize its benefits to future missions. PRA is a systematic methodology to decompose the system into subsystems and components, to quantify the failure risk as a function of the design elements and their corresponding probability of failure. PRA provides a quantitative estimate of the probability of failure of the system, including an assessment and display of the degree of uncertainty surrounding the probability. PRA provides a basis for understanding the impacts of decisions that affect safety, reliability, performance, and cost. Risks with both high probability and high impact are identified as top priority. The PRA of human missions beyond Earth orbit will help indicate how the risk of future human space missions can be reduced by integrating and testing systems in INTEGRITY.
Defense Strategies for Asymmetric Networked Systems with Discrete Components
Rao, Nageswara S. V.; Ma, Chris Y. T.; Hausken, Kjell; He, Fei; Yau, David K. Y.
2018-01-01
We consider infrastructures consisting of a network of systems, each composed of discrete components. The network provides the vital connectivity between the systems and hence plays a critical, asymmetric role in the infrastructure operations. The individual components of the systems can be attacked by cyber and physical means and can be appropriately reinforced to withstand these attacks. We formulate the problem of ensuring the infrastructure performance as a game between an attacker and a provider, who choose the numbers of the components of the systems and network to attack and reinforce, respectively. The costs and benefits of attacks and reinforcements are characterized using the sum-form, product-form and composite utility functions, each composed of a survival probability term and a component cost term. We present a two-level characterization of the correlations within the infrastructure: (i) the aggregate failure correlation function specifies the infrastructure failure probability given the failure of an individual system or network, and (ii) the survival probabilities of the systems and network satisfy first-order differential conditions that capture the component-level correlations using multiplier functions. We derive Nash equilibrium conditions that provide expressions for individual system survival probabilities and also the expected infrastructure capacity specified by the total number of operational components. We apply these results to derive and analyze defense strategies for distributed cloud computing infrastructures using cyber-physical models. PMID:29751588
Jibson, Randall W.; Harp, Edwin L.; Michael, John A.
1998-01-01
The 1994 Northridge, California, earthquake is the first earthquake for which we have all of the data sets needed to conduct a rigorous regional analysis of seismic slope instability. These data sets include (1) a comprehensive inventory of triggered landslides, (2) about 200 strong-motion records of the mainshock, (3) 1:24,000-scale geologic mapping of the region, (4) extensive data on engineering properties of geologic units, and (5) high-resolution digital elevation models of the topography. All of these data sets have been digitized and rasterized at 10-m grid spacing in the ARC/INFO GIS platform. Combining these data sets in a dynamic model based on Newmark's permanent-deformation (sliding-block) analysis yields estimates of coseismic landslide displacement in each grid cell from the Northridge earthquake. The modeled displacements are then compared with the digital inventory of landslides triggered by the Northridge earthquake to construct a probability curve relating predicted displacement to probability of failure. This probability function can be applied to predict and map the spatial variability in failure probability in any ground-shaking conditions of interest. We anticipate that this mapping procedure will be used to construct seismic landslide hazard maps that will assist in emergency preparedness planning and in making rational decisions regarding development and construction in areas susceptible to seismic slope failure.
Wang, X-M; Yin, S-H; Du, J; Du, M-L; Wang, P-Y; Wu, J; Horbinski, C M; Wu, M-J; Zheng, H-Q; Xu, X-Q; Shu, W; Zhang, Y-J
2017-07-01
Retreatment of tuberculosis (TB) often fails in China, yet the risk factors associated with the failure remain unclear. To identify risk factors for the treatment failure of retreated pulmonary tuberculosis (PTB) patients, we analyzed the data of 395 retreated PTB patients who received retreatment between July 2009 and July 2011 in China. PTB patients were categorized into 'success' and 'failure' groups by their treatment outcome. Univariable and multivariable logistic regression were used to evaluate the association between treatment outcome and socio-demographic as well as clinical factors. We also created an optimized risk score model to evaluate the predictive values of these risk factors on treatment failure. Of 395 patients, 99 (25·1%) were diagnosed as retreatment failure. Our results showed that risk factors associated with treatment failure included drug resistance, low education level, low body mass index (6 months), standard treatment regimen, retreatment type, positive culture result after 2 months of treatment, and the place where the first medicine was taken. An Optimized Framingham risk model was then used to calculate the risk scores of these factors. Place where first medicine was taken (temporary living places) received a score of 6, which was highest among all the factors. The predicted probability of treatment failure increases as risk score increases. Ten out of 359 patients had a risk score >9, which corresponded to an estimated probability of treatment failure >70%. In conclusion, we have identified multiple clinical and socio-demographic factors that are associated with treatment failure of retreated PTB patients. We also created an optimized risk score model that was effective in predicting the retreatment failure. These results provide novel insights for the prognosis and improvement of treatment for retreated PTB patients.
The implementation and use of Ada on distributed systems with high reliability requirements
NASA Technical Reports Server (NTRS)
Knight, J. C.; Gregory, S. T.; Urquhart, J. I. A.
1985-01-01
The use and implementation of Ada in distributed environments in which reliability is the primary concern were investigated. In particular, the concept that a distributed system may be programmed entirely in Ada so that the individual tasks of the system are unconcerned with which processors they are executing on, and that failures may occur in the software or underlying hardware was examined. Progress is discussed for the following areas: continued development and testing of the fault-tolerant Ada testbed; development of suggested changes to Ada so that it might more easily cope with the failure of interest; and design of new approaches to fault-tolerant software in real-time systems, and integration of these ideas into Ada.
Nygård, Lotte; Vogelius, Ivan R; Fischer, Barbara M; Kjær, Andreas; Langer, Seppo W; Aznar, Marianne C; Persson, Gitte F; Bentzen, Søren M
2018-04-01
The aim of the study was to build a model of first failure site- and lesion-specific failure probability after definitive chemoradiotherapy for inoperable NSCLC. We retrospectively analyzed 251 patients receiving definitive chemoradiotherapy for NSCLC at a single institution between 2009 and 2015. All patients were scanned by fludeoxyglucose positron emission tomography/computed tomography for radiotherapy planning. Clinical patient data and fludeoxyglucose positron emission tomography standardized uptake values from primary tumor and nodal lesions were analyzed by using multivariate cause-specific Cox regression. In patients experiencing locoregional failure, multivariable logistic regression was applied to assess risk of each lesion being the first site of failure. The two models were used in combination to predict probability of lesion failure accounting for competing events. Adenocarcinoma had a lower hazard ratio (HR) of locoregional failure than squamous cell carcinoma (HR = 0.45, 95% confidence interval [CI]: 0.26-0.76, p = 0.003). Distant failures were more common in the adenocarcinoma group (HR = 2.21, 95% CI: 1.41-3.48, p < 0.001). Multivariable logistic regression of individual lesions at the time of first failure showed that primary tumors were more likely to fail than lymph nodes (OR = 12.8, 95% CI: 5.10-32.17, p < 0.001). Increasing peak standardized uptake value was significantly associated with lesion failure (OR = 1.26 per unit increase, 95% CI: 1.12-1.40, p < 0.001). The electronic model is available at http://bit.ly/LungModelFDG. We developed a failure site-specific competing risk model based on patient- and lesion-level characteristics. Failure patterns differed between adenocarcinoma and squamous cell carcinoma, illustrating the limitation of aggregating them into NSCLC. Failure site-specific models add complementary information to conventional prognostic models. Copyright © 2018 International Association for the Study of Lung Cancer. 
Landslide Probability Assessment by the Derived Distributions Technique
NASA Astrophysics Data System (ADS)
Muñoz, E.; Ochoa, A.; Martínez, H.
2012-12-01
Landslides are potentially disastrous events that cause human and economic losses, especially in cities where accelerated and unorganized growth leads to settlements on steep and potentially unstable areas. Among the main causes of landslides are geological, geomorphological, geotechnical, climatological, and hydrological conditions and anthropic intervention. This paper studies landslides triggered by rain, commonly known as "soil-slip", which are characterized by a shallow failure surface (typically between 1 and 1.5 m deep) parallel to the slope face and by being triggered by intense and/or sustained periods of rain. This type of landslide is caused by changes in pore pressure produced by a decrease in suction when a wetting front enters, as a consequence of the infiltration initiated by rain and governed by the hydraulic characteristics of the soil. Failure occurs when this front reaches a critical depth and the shear strength of the soil is not enough to guarantee the stability of the mass. Critical rainfall thresholds in combination with a slope stability model are widely used for assessing landslide probability. In this paper we present a model for estimating the occurrence of landslides based on the derived distributions technique. Since the work of Eagleson in the 1970s, the derived distributions technique has been widely used in hydrology to estimate the probability of occurrence of extreme flows. The model estimates the probability density function (pdf) of the Factor of Safety (FOS) from the statistical behavior of the rainfall process and some slope parameters. The stochastic character of the rainfall is transformed by means of a deterministic failure model into the FOS pdf. Exceedance probability and return period estimation is then straightforward. The rainfall process is modeled as a Rectangular Pulses Poisson Process (RPPP) with independent exponential pdfs for the mean intensity and duration of the storms. 
The Philip infiltration model is used along with the soil characteristic curve (suction vs. moisture) and the Mohr-Coulomb failure criterion to calculate the FOS of the slope. Data from two slopes located in steep tropical regions of the cities of Medellín (Colombia) and Rio de Janeiro (Brazil) were used to verify the model's performance. The results indicated significant differences between the obtained FOS values and the behavior observed in the field. The model shows relatively high values of FOS that do not reflect the instability of the analyzed slopes. For the two cases studied, applying a simpler reliability concept (such as the Probability of Failure, PR, and the Reliability Index, β) instead of a FOS could lead to more realistic results.
Introduction: Cybersecurity and Software Assurance Minitrack
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burns, Luanne; George, Richard; Linger, Richard C
Modern society is dependent on software systems of remarkable scope and complexity. Yet methods for assuring their security and functionality have not kept pace. The result is persistent compromises and failures despite best efforts. Cybersecurity methods must work together for situational awareness, attack prevention and detection, threat attribution, minimization of consequences, and attack recovery. Because defective software cannot be secure, assurance technologies must play a central role in cybersecurity approaches. There is increasing recognition of the need for rigorous methods for cybersecurity and software assurance. The goal of this minitrack is to develop science foundations, technologies, and practices that can improve the security and dependability of complex systems.
Cycles till failure of silver-zinc cells with competing failure modes: Preliminary data analysis
NASA Technical Reports Server (NTRS)
Sidik, S. M.; Leibecki, H. F.; Bozek, J. M.
1980-01-01
One hundred and twenty-nine cells were run through charge-discharge cycles until failure. The experiment design was a variant of a central composite factorial in five factors. Preliminary data analysis consisted of response surface estimation of life. Batteries fail under two basic modes: a low voltage condition and an internal shorting condition. A competing failure modes analysis using maximum likelihood estimation for the extreme value life distribution was performed. Extensive diagnostics such as residual plotting and probability plotting were employed to verify data quality and choice of model.
Failure Modes and Effects Analysis (FMEA) Assistant Tool Feasibility Study
NASA Technical Reports Server (NTRS)
Flores, Melissa; Malin, Jane T.
2013-01-01
An effort to determine the feasibility of a software tool to assist in Failure Modes and Effects Analysis (FMEA) has been completed. This new and unique approach to FMEA uses model based systems engineering concepts to recommend failure modes, causes, and effects to the user after they have made several selections from pick lists about a component's functions and inputs/outputs. Recommendations are made based on a library using common failure modes identified over the course of several major human spaceflight programs. However, the tool could be adapted for use in a wide range of applications from NASA to the energy industry.
Failure Modes and Effects Analysis (FMEA) Assistant Tool Feasibility Study
NASA Astrophysics Data System (ADS)
Flores, Melissa D.; Malin, Jane T.; Fleming, Land D.
2013-09-01
An effort to determine the feasibility of a software tool to assist in Failure Modes and Effects Analysis (FMEA) has been completed. This new and unique approach to FMEA uses model based systems engineering concepts to recommend failure modes, causes, and effects to the user after they have made several selections from pick lists about a component's functions and inputs/outputs. Recommendations are made based on a library using common failure modes identified over the course of several major human spaceflight programs. However, the tool could be adapted for use in a wide range of applications from NASA to the energy industry.
Software requirements: Guidance and control software development specification
NASA Technical Reports Server (NTRS)
Withers, B. Edward; Rich, Don C.; Lowman, Douglas S.; Buckland, R. C.
1990-01-01
The software requirements for an implementation of Guidance and Control Software (GCS) are specified. The purpose of the GCS is to provide guidance and engine control to a planetary landing vehicle during its terminal descent onto a planetary surface and to communicate sensory information about that vehicle and its descent to some receiving device. The specification was developed using the structured analysis for real time system specification methodology by Hatley and Pirbhai and was based on a simulation program used to study the probability of success of the 1976 Viking Lander missions to Mars. Three versions of GCS are being generated for use in software error studies.
Remediating Non-Positive Definite State Covariances for Collision Probability Estimation
NASA Technical Reports Server (NTRS)
Hall, Doyle T.; Hejduk, Matthew D.; Johnson, Lauren C.
2017-01-01
The NASA Conjunction Assessment Risk Analysis team estimates the probability of collision (Pc) for a set of Earth-orbiting satellites. The Pc estimation software processes satellite position+velocity states and their associated covariance matrices. On occasion, the software encounters non-positive definite (NPD) state covariances, which can adversely affect or prevent the Pc estimation process. Interpolation inaccuracies appear to account for the majority of such covariances, although other mechanisms contribute also. This paper investigates the origin of NPD state covariance matrices, three different methods for remediating these covariances when and if necessary, and the associated effects on the Pc estimation process.
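As a toy illustration of what remediating a non-positive definite covariance can mean, the sketch below raises any eigenvalue below a floor and rebuilds the matrix. This is a generic eigenvalue-clipping idea shown on a 2x2 symmetric matrix so that it needs no linear-algebra library; the abstract does not say which three methods the paper studies, real state covariances are larger (position plus velocity), and the function name and floor value are assumptions.

```python
import math


def clip_to_psd_2x2(a, b, c, floor=0.0):
    """Return (a', b', c') for the symmetric matrix [[a, b], [b, c]]
    after raising every eigenvalue below `floor` up to `floor`.

    Hypothetical helper for illustration only.
    """
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    lam1, lam2 = mean + radius, mean - radius      # lam1 >= lam2
    theta = 0.5 * math.atan2(2.0 * b, a - c)       # angle of lam1's eigenvector
    cs, sn = math.cos(theta), math.sin(theta)
    lam1, lam2 = max(lam1, floor), max(lam2, floor)
    # Rebuild lam1 * v1 v1^T + lam2 * v2 v2^T with v1 = (cs, sn), v2 = (-sn, cs).
    return (lam1 * cs * cs + lam2 * sn * sn,
            (lam1 - lam2) * cs * sn,
            lam1 * sn * sn + lam2 * cs * cs)
```

For example, [[1, 2], [2, 1]] has eigenvalues 3 and -1; clipping the negative eigenvalue to zero leaves a matrix with all entries equal to 1.5, while a matrix that is already positive definite is returned unchanged up to rounding.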
NASA Astrophysics Data System (ADS)
Kim, Kwang Hyeon; Lee, Suk; Shim, Jang Bo; Yang, Dae Sik; Yoon, Won Sup; Park, Young Je; Kim, Chul Yong; Cao, Yuan Jie; Chang, Kyung Hwan
2018-01-01
The aim of this study was to derive a new plan-scoring index using normal tissue complication probabilities to verify different plans in the selection of personalized treatment. Plans for 12 patients treated with tomotherapy were used to compare scoring for ranking. Dosimetric and biological indexes were analyzed for the plans for a clearly distinguishable group (n = 7) and a similar group (n = 12), using treatment plan verification software that we developed. The quality factor (QF) of our support software for treatment decisions was consistent with the final treatment plan for the clearly distinguishable group (average QF = 1.202, 100% match rate, n = 7) and the similar group (average QF = 1.058, 33% match rate, n = 12). Therefore, we propose a normal tissue complication probability (NTCP) based on the plan scoring index for verification of different plans for personalized treatment-plan selection. Scoring using the new QF showed a 100% match rate (average NTCP QF = 1.0420). The NTCP-based new QF scoring method was adequate for obtaining biological verification quality and organ risk saving using the treatment-planning decision-support software we developed for prostate cancer.
Applying formal methods and object-oriented analysis to existing flight software
NASA Technical Reports Server (NTRS)
Cheng, Betty H. C.; Auernheimer, Brent
1993-01-01
Correctness is paramount for safety-critical software control systems. Critical software failures in medical radiation treatment, communications, and defense are familiar to the public. The significant quantity of software malfunctions regularly reported to the software engineering community, the laws concerning liability, and a recent NRC Aeronautics and Space Engineering Board report additionally motivate the use of error-reducing and defect detection software development techniques. The benefits of formal methods in requirements driven software development ('forward engineering') are well documented. One advantage of rigorously engineering software is that formal notations are precise, verifiable, and facilitate automated processing. This paper describes the application of formal methods to reverse engineering, where formal specifications are developed for a portion of the shuttle on-orbit digital autopilot (DAP). Three objectives of the project were to: demonstrate the use of formal methods on a shuttle application, facilitate the incorporation and validation of new requirements for the system, and verify the safety-critical properties to be exhibited by the software.
SMART-FDIR: Use of Artificial Intelligence in the Implementation of a Satellite FDIR
NASA Astrophysics Data System (ADS)
Guiotto, A.; Martelli, A.; Paccagnini, C.
Nowadays space activities are characterized by increased constraints in terms of on-board computing power and functional complexity, combined with reduction of costs and schedule. This scenario necessarily affects the on-board software, with particular emphasis on the interfaces between on-board software and system/mission level requirements. The questions are: How can the effectiveness of Space System Software design be improved? How can we increase sophistication in the area of autonomy and failure tolerance, maintaining the necessary quality with acceptable risks?
Diagnostic reasoning techniques for selective monitoring
NASA Technical Reports Server (NTRS)
Homem-De-mello, L. S.; Doyle, R. J.
1991-01-01
An architecture for using diagnostic reasoning techniques in selective monitoring is presented. Given the sensor readings and a model of the physical system, a number of assertions are generated and expressed as Boolean equations. The resulting system of Boolean equations is solved symbolically. Using a priori probabilities of component failure and Bayes' rule, revised probabilities of failure can be computed. These will indicate what components have failed or are the most likely to have failed. This approach is suitable for systems that are well understood and for which the correctness of the assertions can be guaranteed. Also, the system must be such that changes are slow enough to allow the computation.
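The Bayesian revision step mentioned in the abstract above can be sketched for a single component and a single observation. This is a minimal illustration of Bayes' rule, not the authors' Boolean-equation architecture; the function name, parameter names, and numbers are assumptions.

```python
def posterior_failure_probability(prior, p_obs_given_failed, p_obs_given_ok):
    """Revise a component's failure probability after one observation.

    prior              -- a priori probability that the component has failed
    p_obs_given_failed -- probability of the observation if it has failed
    p_obs_given_ok     -- probability of the observation if it is healthy
    """
    evidence = p_obs_given_failed * prior + p_obs_given_ok * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under both hypotheses")
    return p_obs_given_failed * prior / evidence


# Hypothetical numbers: a 1% prior, and a sensor reading that is seen 90% of
# the time when the component has failed but only 5% of the time otherwise.
revised = posterior_failure_probability(0.01, 0.90, 0.05)
```

With these made-up inputs the revised probability is 0.009 / 0.0585, roughly 0.154, so a single anomalous reading raises suspicion substantially without making failure the most likely explanation.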
Probabilistic metrology or how some measurement outcomes render ultra-precise estimates
NASA Astrophysics Data System (ADS)
Calsamiglia, J.; Gendra, B.; Muñoz-Tapia, R.; Bagan, E.
2016-10-01
We show on theoretical grounds that, even in the presence of noise, probabilistic measurement strategies (which have a certain probability of failure or abstention) can provide, upon a heralded successful outcome, estimates with a precision that exceeds the deterministic bounds for the average precision. This establishes a new ultimate bound on the phase estimation precision of particular measurement outcomes (or sequence of outcomes). For probe systems subject to local dephasing, we quantify such precision limit as a function of the probability of failure that can be tolerated. Our results show that the possibility of abstaining can set back the detrimental effects of noise.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Qi, Junjian; Pfenninger, Stefan
In this paper, we propose a strategy to control the self-organizing dynamics of the Bak-Tang-Wiesenfeld (BTW) sandpile model on complex networks by allowing some degree of failure tolerance for the nodes and introducing additional active dissipation while taking the risk of possible node damage. We show that the probability for large cascades significantly increases or decreases respectively when the risk for node damage outweighs the active dissipation and when the active dissipation outweighs the risk for node damage. By considering the potential additional risk from node damage, a non-trivial optimal active dissipation control strategy which minimizes the total cost in the system can be obtained. Under some conditions the introduced control strategy can decrease the total cost in the system compared to the uncontrolled model. Moreover, when the probability of damaging a node experiencing failure tolerance is greater than the critical value, then no matter how successful the active dissipation control is, the total cost of the system will have to increase. This critical damage probability can be used as an indicator of the robustness of a network or system. Copyright (C) EPLA, 2015
GCS plan for software aspects of certification
NASA Technical Reports Server (NTRS)
Shagnea, Anita M.; Lowman, Douglas S.; Withers, B. Edward
1990-01-01
As part of the Guidance and Control Software (GCS) research project being sponsored by NASA to evaluate the failure processes of software, standard industry software development procedures are being employed. To ensure that these procedures are authentic, the guidelines outlined in the Radio Technical Commission for Aeronautics (RTCA) DO-178A document, entitled Software Considerations in Airborne Systems and Equipment Certification, were adopted. A major aspect of these guidelines is proper documentation. As such, this report, the plan for software aspects of certification, was produced in accordance with DO-178A. An overview is given of the GCS research project, including the goals of the project, project organization, and project schedules. It also specifies the plans for all aspects of the project which relate to the certification of the GCS implementations developed under a NASA contract. These plans include decisions made regarding the software specification, accuracy requirements, configuration management, implementation development and verification, and the development of the GCS simulator.
Dalthorp, Daniel; Huso, Manuela M. P.; Dail, David; Kenyon, Jessica
2014-01-01
Evidence of Absence software (EoA) is a user-friendly application used for estimating bird and bat fatalities at wind farms and designing search protocols. The software is particularly useful in addressing whether the number of fatalities has exceeded a given threshold and what search parameters are needed to give assurance that thresholds were not exceeded. The software is applicable even when zero carcasses have been found in searches. Depending on the effectiveness of the searches, such an absence of evidence of mortality may or may not be strong evidence that few fatalities occurred. Under a search protocol in which carcasses are detected with nearly 100 percent certainty, finding zero carcasses would be convincing evidence that overall mortality rate was near zero. By contrast, with a less effective search protocol with low probability of detecting a carcass, finding zero carcasses does not rule out the possibility that large numbers of animals were killed but not detected in the searches. EoA uses information about the search process and scavenging rates to estimate detection probabilities to determine a maximum credible number of fatalities, even when zero or few carcasses are observed.
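The reasoning about zero-carcass searches in the abstract above can be illustrated with a deliberately simplified calculation. The sketch assumes each carcass is found independently with the same overall detection probability; it is not EoA's actual estimator, which the abstract does not spell out, and the function names and the 5 percent cutoff are assumptions.

```python
def prob_zero_found(fatalities, detection_prob):
    """Probability that none of `fatalities` carcasses is found."""
    return (1.0 - detection_prob) ** fatalities


def max_credible_fatalities(detection_prob, alpha=0.05):
    """Largest fatality count for which finding zero carcasses still has
    probability greater than `alpha` (hypothetical helper)."""
    if not 0.0 < detection_prob < 1.0:
        raise ValueError("detection_prob must be strictly between 0 and 1")
    m = 0
    while prob_zero_found(m + 1, detection_prob) > alpha:
        m += 1
    return m
```

Under these assumptions, a search protocol with a 90 percent detection probability that finds nothing is compatible with at most 1 fatality at the 5 percent level, whereas a protocol with a 10 percent detection probability leaves up to 28 fatalities plausible.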
Performance of concatenated Reed-Solomon/Viterbi channel coding
NASA Technical Reports Server (NTRS)
Divsalar, D.; Yuen, J. H.
1982-01-01
The concatenated Reed-Solomon (RS)/Viterbi coding system is reviewed. The performance of the system is analyzed and results are derived with a new simple approach. A functional model for the input RS symbol error probability is presented. Based on this new functional model, we compute the performance of a concatenated system in terms of RS word error probability, output RS symbol error probability, bit error probability due to decoding failure, and bit error probability due to decoding error. Finally we analyze the effects of the noisy carrier reference and the slow fading on the system performance.
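For orientation, the relationship between input RS symbol error probability and RS word error probability can be sketched with the textbook binomial expression. This is not the paper's functional model: it assumes symbol errors at the RS decoder input are independent (ideal interleaving after the Viterbi decoder) and that the decoder corrects up to t = (n - k) / 2 symbol errors, and it lumps decoding failure and decoding error together. The function name is an assumption.

```python
from math import comb


def rs_word_error_probability(n, k, symbol_error_prob):
    """P(more than t symbol errors in an n-symbol word), t = (n - k) // 2,
    assuming independent symbol errors with probability `symbol_error_prob`."""
    t = (n - k) // 2
    p = symbol_error_prob
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i)
               for i in range(t + 1, n + 1))


# Example call for a (255, 223) code, which corrects up to 16 symbol errors.
word_error = rs_word_error_probability(255, 223, 0.01)
```

The sharp dependence of the result on the input symbol error probability is what makes the model for that input quantity important in a concatenated system.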
Li, Jun; Zhang, Hong; Han, Yinshan; Wang, Baodong
2016-01-01
Focusing on the diversity, complexity and uncertainty of third-party damage accidents, the failure probability of third-party damage to urban gas pipelines was evaluated based on the theory of the analytic hierarchy process and fuzzy mathematics. The fault tree of third-party damage, containing 56 basic events, was built by hazard identification of third-party damage. The fuzzy evaluation of basic event probabilities was conducted by the expert judgment method, using the membership function of a fuzzy set. The determination of the weight of each expert and the modification of the evaluation opinions were accomplished using the improved analytic hierarchy process, and the failure possibility of third-party damage to the urban gas pipeline was calculated. Taking the gas pipelines of a certain large provincial capital city as an example, the risk assessment structure of the method was shown to conform to the actual situation, which provides the basis for safety risk prevention.
Role of stress triggering in earthquake migration on the North Anatolian fault
Stein, R.S.; Dieterich, J.H.; Barka, A.A.
1996-01-01
Ten M ≥ 6.7 earthquakes ruptured 1,000 km of the North Anatolian fault (Turkey) during 1939-92, providing an unsurpassed opportunity to study how one large shock sets up the next. Calculations of the change in Coulomb failure stress reveal that 9 out of 10 ruptures were brought closer to failure by the preceding shocks, typically by 5 bars, equivalent to 20 years of secular stressing. We translate the calculated stress changes into earthquake probabilities using an earthquake-nucleation constitutive relation, which includes both permanent and transient stress effects. For the typical 10-year period between triggering and subsequent rupturing shocks in the Anatolia sequence, the stress changes yield an average three-fold gain in the ensuing earthquake probability. Stress is now calculated to be high at several isolated sites along the fault. During the next 30 years, we estimate a 15% probability of a M ≥ 6.7 earthquake east of the major eastern center of Erzincan, and a 12% probability for a large event south of the major western port city of Izmit. Such stress-based probability calculations may thus be useful to assess and update earthquake hazards elsewhere. © 1997 Elsevier Science Ltd.
A Compatible Hardware/Software Reliability Prediction Model.
1981-07-22
machines. In particular, he was interested in the following problem: assume that one has a collection of connected elements computing and transmitting...software reliability prediction model is desirable, the findings about the Weibull distribution are intriguing. After collecting failure data from several...capacitor, some of the added charge carriers are collected by the capacitor. If the added charge is sufficiently large, the information stored is changed
Case Study of Using High Performance Commercial Processors in Space
NASA Technical Reports Server (NTRS)
Ferguson, Roscoe C.; Olivas, Zulema
2009-01-01
The purpose of the Space Shuttle Cockpit Avionics Upgrade project (1999-2004) was to reduce crew workload and improve situational awareness. The upgrade was to augment the Shuttle avionics system with new hardware and software. A major success of this project was the validation of the hardware architecture and software design. This was significant because the project incorporated new technology and approaches for the development of human rated space software. An early version of this system was tested at the Johnson Space Center for one month by teams of astronauts. The results were positive, but NASA eventually cancelled the project towards the end of the development cycle. The goal to reduce crew workload and improve situational awareness resulted in the need for high performance Central Processing Units (CPUs). The choice of CPU selected was the PowerPC family, which is a reduced instruction set computer (RISC) known for its high performance. However, the requirement for radiation tolerance resulted in the re-evaluation of the selected family member of the PowerPC line. Radiation testing revealed that the original selected processor (PowerPC 7400) was too soft to meet mission objectives and an effort was established to perform trade studies and performance testing to determine a feasible candidate. At that time, the PowerPC RAD750s were radiation tolerant, but did not meet the required performance needs of the project. Thus, the final solution was to select the PowerPC 7455. This processor did not have a radiation tolerant version, but had some ability to detect failures. However, its cache tags did not provide parity and thus the project incorporated a software strategy to detect radiation failures. The strategy was to incorporate dual paths for software generating commands to the legacy Space Shuttle avionics to prevent failures due to the softness of the upgraded avionics.
Case Study of Using High Performance Commercial Processors in a Space Environment
NASA Technical Reports Server (NTRS)
Ferguson, Roscoe C.; Olivas, Zulema
2009-01-01
The purpose of the Space Shuttle Cockpit Avionics Upgrade project was to reduce crew workload and improve situational awareness. The upgrade was to augment the Shuttle avionics system with new hardware and software. A major success of this project was the validation of the hardware architecture and software design. This was significant because the project incorporated new technology and approaches for the development of human rated space software. An early version of this system was tested at the Johnson Space Center for one month by teams of astronauts. The results were positive, but NASA eventually cancelled the project towards the end of the development cycle. The goal to reduce crew workload and improve situational awareness resulted in the need for high performance Central Processing Units (CPUs). The choice of CPU selected was the PowerPC family, which is a reduced instruction set computer (RISC) known for its high performance. However, the requirement for radiation tolerance resulted in the reevaluation of the selected family member of the PowerPC line. Radiation testing revealed that the original selected processor (PowerPC 7400) was too soft to meet mission objectives and an effort was established to perform trade studies and performance testing to determine a feasible candidate. At that time, the PowerPC RAD750s were radiation tolerant, but did not meet the required performance needs of the project. Thus, the final solution was to select the PowerPC 7455. This processor did not have a radiation tolerant version, but fared better than the 7400 in the ability to detect failures. However, its cache tags did not provide parity and thus the project incorporated a software strategy to detect radiation failures. The strategy was to incorporate dual paths for software generating commands to the legacy Space Shuttle avionics to prevent failures due to the softness of the upgraded avionics.
A Survey of Reliability, Maintainability, Supportability, and Testability Software Tools
1991-04-01
designs in terms of their contributions toward forced mission termination and vehicle or function loss. Includes the ability to treat failure modes of...ABSTRACT: Inputs: MTBFs, MTTRs, support equipment costs, equipment weights and costs, available targets, military occupational specialty skill level and...US Army CECOM NAME: SPARECOST ABSTRACT: Calculates expected number of failures and performs spares holding optimization based on cost, weight, or
Gamell, Marc; Teranishi, Keita; Kolla, Hemanth; ...
2017-10-26
In order to achieve exascale systems, application resilience needs to be addressed. Some programming models, such as task-DAG (directed acyclic graphs) architectures, currently embed resilience features whereas traditional SPMD (single program, multiple data) and message-passing models do not. Since a large part of the community's code base follows the latter models, it is still required to take advantage of application characteristics to minimize the overheads of fault tolerance. To that end, this paper explores how recovering from hard process/node failures in a local manner is a natural approach for certain applications to obtain resilience at lower costs in faulty environments. In particular, this paper targets enabling online, semitransparent local recovery for stencil computations on current leadership-class systems as well as presents programming support and scalable runtime mechanisms. Also described and demonstrated in this paper is the effect of failure masking, which allows the effective reduction of impact on total time to solution due to multiple failures. Furthermore, we discuss, implement, and evaluate ghost region expansion and cell-to-rank remapping to increase the probability of failure masking. To conclude, this paper shows the integration of all aforementioned mechanisms with the S3D combustion simulation through an experimental demonstration (using the Titan system) of the ability to tolerate high failure rates (i.e., node failures every five seconds) with low overhead while sustaining performance at large scales. In addition, this demonstration also displays the failure masking probability increase resulting from the combination of both ghost region expansion and cell-to-rank remapping.
SU-E-T-627: Failure Modes and Effect Analysis for Monthly Quality Assurance of Linear Accelerator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, J; Xiao, Y; Wang, J
2014-06-15
Purpose: To develop and implement a failure mode and effect analysis (FMEA) on routine monthly Quality Assurance (QA) tests (physical tests part) of linear accelerator. Methods: A systematic failure mode and effect analysis method was performed for monthly QA procedures. A detailed process tree of monthly QA was created and potential failure modes were defined. Each failure mode may have many influencing factors. For each factor, a risk probability number (RPN) was calculated from the product of probability of occurrence (O), the severity of effect (S), and detectability of the failure (D). The RPN scores are in a range of 1 to 1000, with higher scores indicating stronger correlation to a given influencing factor of a failure mode. Five medical physicists in our institution were responsible for discussing and defining the O, S, D values. Results: 15 possible failure modes were identified, the RPN scores of all influencing factors of these 15 failure modes ranged from 8 to 150, and the checklist of FMEA in monthly QA was drawn. The system showed consistent and accurate response to erroneous conditions. Conclusion: The influencing factors with RPN greater than 50 were considered as highly-correlated factors of a certain out-of-tolerance monthly QA test. FMEA is a fast and flexible tool to develop and implement a quality management (QM) framework of monthly QA, which improved the QA efficiency of our QA team. The FMEA work may incorporate more quantification and monitoring functions in future.
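The RPN calculation described in the abstract above can be sketched as follows. The 1-to-10 scale for each score is inferred from the stated 1-to-1000 range, the threshold of 50 is the one quoted in the conclusion, and the factor names and scores in the example are invented for illustration.

```python
def rpn(occurrence, severity, detectability):
    """RPN = O * S * D, with each score assumed to lie on a 1-10 scale."""
    for score in (occurrence, severity, detectability):
        if not 1 <= score <= 10:
            raise ValueError("each score must be between 1 and 10")
    return occurrence * severity * detectability


def flag_factors(factors, threshold=50):
    """Return (name, RPN) pairs above `threshold`, highest RPN first."""
    scored = [(name, rpn(o, s, d)) for name, (o, s, d) in factors.items()]
    return sorted((item for item in scored if item[1] > threshold),
                  key=lambda item: item[1], reverse=True)


# Hypothetical influencing factors with made-up (O, S, D) scores.
example = {
    "phantom misaligned": (5, 6, 5),
    "wrong energy selected": (2, 4, 1),
    "chamber not warmed up": (4, 5, 3),
}
flagged = flag_factors(example)
```

In this made-up example the first and third factors exceed the threshold (RPN 150 and 60), while the second (RPN 8) does not.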
Schmeida, Mary; Savrin, Ronald A
2012-01-01
Heart failure readmission among the elderly is frequent and costly to both the patient and the Medicare trust fund. In this study, the authors explore the factors that are associated with states having heart failure readmission rates that are higher than the U.S. national rate. The setting is acute inpatient hospitals. State-level data for the 50 states and multivariate regression analysis are used. The dependent variable, Heart Failure 30-day Readmission Worse than U.S. Rate, is based on adult Medicare Fee-for-Service patients hospitalized with a primary discharge diagnosis of heart failure and for which a subsequent inpatient readmission occurred within 30 days of their last discharge. One key variable was found to be significantly associated with a decrease in the probability of states ranking "worse" on heart failure 30-day readmission: a higher resident population speaking a primary language other than English at home. By contrast, states with a higher median income, more total days of care per 1,000 Medicare enrollees, and a greater percentage of Medicare enrollees with prescription drug coverage have a greater probability for heart failure 30-day readmission to be "worse" than the U.S. national rate. Case management interventions targeting health literacy may be more effective than other factors to improve state-level hospital status on heart failure 30-day readmission. Factors such as total days of care per 1,000 Medicare enrollees and improving patient access to postdischarge medication(s) may not be as important as literacy. Interventions aimed to prevent disparities should consider higher income population groups as vulnerable for readmission.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gamell, Marc; Teranishi, Keita; Kolla, Hemanth
In order to achieve exascale systems, application resilience needs to be addressed. Some programming models, such as task-DAG (directed acyclic graphs) architectures, currently embed resilience features whereas traditional SPMD (single program, multiple data) and message-passing models do not. Since a large part of the community's code base follows the latter models, it is still required to take advantage of application characteristics to minimize the overheads of fault tolerance. To that end, this paper explores how recovering from hard process/node failures in a local manner is a natural approach for certain applications to obtain resilience at lower costs in faulty environments. In particular, this paper targets enabling online, semitransparent local recovery for stencil computations on current leadership-class systems as well as presents programming support and scalable runtime mechanisms. Also described and demonstrated in this paper is the effect of failure masking, which allows the effective reduction of impact on total time to solution due to multiple failures. Furthermore, we discuss, implement, and evaluate ghost region expansion and cell-to-rank remapping to increase the probability of failure masking. To conclude, this paper shows the integration of all aforementioned mechanisms with the S3D combustion simulation through an experimental demonstration (using the Titan system) of the ability to tolerate high failure rates (i.e., node failures every five seconds) with low overhead while sustaining performance at large scales. In addition, this demonstration also displays the failure masking probability increase resulting from the combination of both ghost region expansion and cell-to-rank remapping.
Arancibia, F; Ewig, S; Martinez, J A; Ruiz, M; Bauer, T; Marcos, M A; Mensa, J; Torres, A
2000-07-01
The aim of the study was to determine the causes and prognostic implications of antimicrobial treatment failures in patients with nonresponding and progressive life-threatening, community-acquired pneumonia. Forty-nine patients hospitalized with a presumptive diagnosis of community-acquired pneumonia during a 16-mo period, failure to respond to antimicrobial treatment, and documented repeated microbial investigation ≥ 72 h after initiation of in-hospital antimicrobial treatment were recorded. A definite etiology of treatment failure could be established in 32 of 49 (65%) patients, and nine additional patients (18%) had a probable etiology. Treatment failures were mainly infectious in origin and included primary, persistent, and nosocomial infections (n = 10 [19%], 13 [24%], and 11 [20%] of causes, respectively). Definite but not probable persistent infections were mostly due to microbial resistance to the administered initial empiric antimicrobial treatment. Nosocomial infections were particularly frequent in patients with progressive pneumonia. Definite persistent infections and nosocomial infections had the highest associated mortality rates (75 and 88%, respectively). Nosocomial pneumonia was the only cause of treatment failure independently associated with death in multivariate analysis (RR, 16.7; 95% CI, 1.4 to 194.9; p = 0.03). We conclude that the detection of microbial resistance and the diagnosis of nosocomial pneumonia are the two major challenges in hospitalized patients with community-acquired pneumonia who do not respond to initial antimicrobial treatment. In order to establish these potentially life-threatening etiologies, a regular microbial reinvestigation seems mandatory for all patients presenting with antimicrobial treatment failures.
Predictors of treatment failure in young patients undergoing in vitro fertilization.
Jacobs, Marni B; Klonoff-Cohen, Hillary; Agarwal, Sanjay; Kritz-Silverstein, Donna; Lindsay, Suzanne; Garzo, V Gabriel
2016-08-01
The purpose of the study was to evaluate whether routinely collected clinical factors can predict in vitro fertilization (IVF) failure among young, "good prognosis" patients predominantly with secondary infertility who are less than 35 years of age. Using de-identified clinic records, 414 women <35 years undergoing their first autologous IVF cycle were identified. Logistic regression was used to identify patient-driven clinical factors routinely collected during fertility treatment that could be used to model predicted probability of cycle failure. One hundred ninety-seven patients with both primary and secondary infertility had a failed IVF cycle, and 217 with secondary infertility had a successful live birth. None of the women with primary infertility had a successful live birth. The significant predictors for IVF cycle failure among young patients were fewer previous live births, history of biochemical pregnancies or spontaneous abortions, lower baseline antral follicle count, higher total gonadotropin dose, unknown infertility diagnosis, and lack of at least one fair to good quality embryo. The full model showed good predictive value (c = 0.885) for estimating risk of cycle failure; at ≥80% predicted probability of failure, sensitivity = 55.4%, specificity = 97.5%, positive predictive value = 95.4%, and negative predictive value = 69.8%. If this predictive model is validated in future studies, it could be beneficial for predicting IVF failure in good prognosis women under the age of 35 years.
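The four performance measures quoted in the abstract above follow from a two-by-two table of predicted versus observed cycle failure at the chosen probability cutoff. The sketch below shows the standard definitions; the counts are hypothetical and are not the study's data, and the function name is an assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard two-by-two table measures.

    tp -- predicted failure, observed failure
    fp -- predicted failure, observed success
    fn -- predicted success, observed failure
    tn -- predicted success, observed success
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts chosen only to show the arithmetic.
metrics = diagnostic_metrics(tp=50, fp=5, fn=50, tn=95)
```

A high cutoff such as the 80 percent used in the abstract typically trades sensitivity for specificity and positive predictive value, which is the pattern the reported figures show.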
NASA Technical Reports Server (NTRS)
Hatfield, Glen S.; Hark, Frank; Stott, James
2016-01-01
Launch vehicle reliability analysis is largely dependent upon using predicted failure rates from data sources such as MIL-HDBK-217F. Reliability prediction methodologies based on component data do not take into account system integration risks such as those attributable to manufacturing and assembly. These sources often dominate component level risk. While consequence of failure is often understood, using predicted values in a risk model to estimate the probability of occurrence may underestimate the actual risk. Managers and decision makers use the probability of occurrence to influence the determination whether to accept the risk or require a design modification. The actual risk threshold for acceptance may not be fully understood due to the absence of system level test data or operational data. This paper will establish a method and approach to identify the pitfalls and precautions of accepting risk based solely upon predicted failure data. This approach will provide a set of guidelines that may be useful to arrive at a more realistic quantification of risk prior to acceptance by a program.
Probability of in-vessel steam explosion-induced containment failure for a KWU PWR
DOE Office of Scientific and Technical Information (OSTI.GOV)
Esmaili, H.; Khatib-Rahbar, M.; Zuchuat, O.
During postulated core meltdown accidents in light water reactors, there is a likelihood for an in-vessel steam explosion when the melt contacts the coolant in the lower plenum. The objective of the work described in this paper is to determine the conditional probability of in-vessel steam explosion-induced containment failure for a Kraftwerk Union (KWU) pressurized water reactor (PWR). The energetics of the explosion depends on the mass of the molten fuel that mixes with the coolant and participates in the explosion and on the conversion of fuel thermal energy into mechanical work. The work can result in the generation of dynamic pressures that affect the lower head (and possibly lead to its failure), and it can cause acceleration of a slug (fuel and coolant material) upward that can affect the upper internal structures and vessel head and ultimately cause the failure of the upper head. If the upper head missile has sufficient energy, it can reach the containment shell and penetrate it. The analysis must, therefore, take into account all possible dissipation mechanisms.
Ruggeri, Annalisa; Labopin, Myriam; Sormani, Maria Pia; Sanz, Guillermo; Sanz, Jaime; Volt, Fernanda; Michel, Gerard; Locatelli, Franco; Diaz De Heredia, Cristina; O'Brien, Tracey; Arcese, William; Iori, Anna Paola; Querol, Sergi; Kogler, Gesine; Lecchi, Lucilla; Pouthier, Fabienne; Garnier, Federico; Navarrete, Cristina; Baudoux, Etienne; Fernandes, Juliana; Kenzey, Chantal; Eapen, Mary; Gluckman, Eliane; Rocha, Vanderson; Saccardi, Riccardo
2014-09-01
Umbilical cord blood transplant recipients are exposed to an increased risk of graft failure, a complication leading to a higher rate of transplant-related mortality. The decision and timing to offer a second transplant after graft failure is challenging. With the aim of addressing this issue, we analyzed engraftment kinetics and outcomes of 1268 patients (73% children) with acute leukemia (64% acute lymphoblastic leukemia, 36% acute myeloid leukemia) in remission who underwent single-unit umbilical cord blood transplantation after a myeloablative conditioning regimen. The median follow-up was 31 months. The overall survival rate at 3 years was 47%; the 100-day cumulative incidence of transplant-related mortality was 16%. Longer time to engraftment was associated with increased transplant-related mortality and shorter overall survival. The cumulative incidence of neutrophil engraftment at day 60 was 86%, while the median time to achieve engraftment was 24 days. Probability density analysis showed that the likelihood of engraftment after umbilical cord blood transplantation increased after day 10, peaked on day 21 and slowly decreased to 21% by day 31. Beyond day 31, the probability of engraftment dropped rapidly, and the residual probability of engrafting after day 42 was 5%. Graft failure was reported in 166 patients, and 66 of them received a second graft (allogeneic, n=45). Rescue actions, such as the search for another graft, should be considered starting after day 21. A diagnosis of graft failure can be established in patients who have not achieved neutrophil recovery by day 42. Moreover, subsequent transplants should not be postponed after day 42.
Peron, Guillaume; Hines, James E.
2014-01-01
Many industrial and agricultural activities involve wildlife fatalities by collision, poisoning or other involuntary harvest: wind turbines, highway network, utility network, tall structures, pesticides, etc. Impacted wildlife may benefit from official protection, including the requirement to monitor the impact. Carcass counts can often be conducted to quantify the number of fatalities, but they need to be corrected for carcass persistence time (removal by scavengers and decay) and detection probability (searcher efficiency). In this article we introduce a new piece of software that fits a superpopulation capture-recapture model to raw count data. It uses trial data to estimate detection and daily persistence probabilities. A recurrent issue is that fatalities of rare, protected species are infrequent, in which case the software offers the option to switch to an ‘evidence of absence’ mode, i.e., estimate the number of carcasses that may have been missed by field crews. The software allows distinguishing between different turbine types (e.g., different vegetation cover under turbines, or different technical properties), as well as between two carcass age-classes or states, with transition between those classes (e.g., fresh and dry). There is a data simulation capacity that may be used at the planning stage to optimize sampling design. Resulting mortality estimates can be used to 1) quantify the required amount of compensation, 2) inform mortality projections for proposed development sites, and 3) inform decisions about management of existing sites.
Cascading failures in ac electricity grids.
Rohden, Martin; Jung, Daniel; Tamrakar, Samyak; Kettemann, Stefan
2016-09-01
Sudden failure of a single transmission element in a power grid can induce a domino effect of cascading failures, which can lead to the isolation of a large number of consumers or even to the failure of the entire grid. Here we present results of the simulation of cascading failures in power grids, using an alternating current (AC) model. We first apply this model to a regular square grid topology. For a random placement of consumers and generators on the grid, the probability to find more than a certain number of unsupplied consumers decays as a power law and obeys a scaling law with respect to system size. Varying the transmitted power threshold above which a transmission line fails does not seem to change the power-law exponent q≈1.6. Furthermore, we study the influence of the placement of generators and consumers on the number of affected consumers and demonstrate that large clusters of generators and consumers are especially vulnerable to cascading failures. As a real-world topology, we consider the German high-voltage transmission grid. Applying the dynamic AC model and considering a random placement of consumers, we find that the probability to disconnect more than a certain number of consumers depends strongly on the threshold. For large thresholds the decay is clearly exponential, while for small ones the decay is slow, indicating a power-law decay.
Recent advances in computational structural reliability analysis methods
NASA Astrophysics Data System (ADS)
Thacker, Ben H.; Wu, Y.-T.; Millwater, Harry R.; Torng, Tony Y.; Riha, David S.
1993-10-01
The goal of structural reliability analysis is to determine the probability that the structure will adequately perform its intended function when operating under the given environmental conditions. Thus, the notion of reliability admits the possibility of failure. Given the fact that many different modes of failure are usually possible, achievement of this goal is a formidable task, especially for large, complex structural systems. The traditional (deterministic) design methodology attempts to assure reliability by the application of safety factors and conservative assumptions. However, the safety factor approach lacks a quantitative basis in that the level of reliability is never known and usually results in overly conservative designs because of compounding conservatisms. Furthermore, problem parameters that control the reliability are not identified, nor their importance evaluated. A summary of recent advances in computational structural reliability assessment is presented. A significant level of activity in the research and development community was seen recently, much of which was directed towards the prediction of failure probabilities for single mode failures. The focus is to present some early results and demonstrations of advanced reliability methods applied to structural system problems. This includes structures that can fail as a result of multiple component failures (e.g., a redundant truss), or structural components that may fail due to multiple interacting failure modes (e.g., excessive deflection, resonant vibration, or creep rupture). From these results, some observations and recommendations are made with regard to future research needs.
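To make the notion of a single-mode failure probability concrete, the sketch below estimates P(failure) = P(R - S < 0) for one failure mode by plain Monte Carlo sampling and compares it with the closed-form value available when resistance R and load S are independent normal variables. This is a generic textbook illustration, not the advanced methods surveyed in the paper; the distributions, parameter values, and function names are all assumptions made for the example.

```python
import math
import random


def exact_failure_probability(mu_r, sd_r, mu_s, sd_s):
    """Closed form for independent normal R and S: Pf = Phi(-beta)."""
    beta = (mu_r - mu_s) / math.sqrt(sd_r**2 + sd_s**2)  # reliability index
    return 0.5 * math.erfc(beta / math.sqrt(2.0))        # standard normal CDF at -beta


def monte_carlo_failure_probability(mu_r, sd_r, mu_s, sd_s, n_samples, seed=0):
    """Estimate Pf as the fraction of samples with limit state g = R - S < 0."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    failures = 0
    for _ in range(n_samples):
        resistance = rng.gauss(mu_r, sd_r)
        load = rng.gauss(mu_s, sd_s)
        if resistance - load < 0.0:
            failures += 1
    return failures / n_samples


# Hypothetical parameters (arbitrary units), chosen only for illustration.
params = dict(mu_r=10.0, sd_r=1.0, mu_s=6.0, sd_s=1.5)
pf_exact = exact_failure_probability(**params)
pf_mc = monte_carlo_failure_probability(**params, n_samples=200_000)
print(f"exact Pf = {pf_exact:.5f}, Monte Carlo Pf = {pf_mc:.5f}")
```

Unlike a deterministic safety factor, this formulation yields an explicit reliability level. Plain sampling becomes expensive for very small failure probabilities, which is one motivation for more efficient reliability methods.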
Recent advances in computational structural reliability analysis methods
NASA Technical Reports Server (NTRS)
Thacker, Ben H.; Wu, Y.-T.; Millwater, Harry R.; Torng, Tony Y.; Riha, David S.
1993-01-01
The goal of structural reliability analysis is to determine the probability that the structure will adequately perform its intended function when operating under the given environmental conditions. Thus, the notion of reliability admits the possibility of failure. Given the fact that many different modes of failure are usually possible, achievement of this goal is a formidable task, especially for large, complex structural systems. The traditional (deterministic) design methodology attempts to assure reliability by the application of safety factors and conservative assumptions. However, the safety factor approach lacks a quantitative basis in that the level of reliability is never known and usually results in overly conservative designs because of compounding conservatisms. Furthermore, problem parameters that control the reliability are not identified, nor their importance evaluated. A summary of recent advances in computational structural reliability assessment is presented. A significant level of activity in the research and development community was seen recently, much of which was directed towards the prediction of failure probabilities for single mode failures. The focus is to present some early results and demonstrations of advanced reliability methods applied to structural system problems. This includes structures that can fail as a result of multiple component failures (e.g., a redundant truss), or structural components that may fail due to multiple interacting failure modes (e.g., excessive deflection, resonant vibration, or creep rupture). From these results, some observations and recommendations are made with regard to future research needs.