Alias: http://resiliencepapers.club (thanks to John Allspaw).
This doc contains notes about people active in resilience engineering, as well as some influential researchers who are no longer with us, organized alphabetically. It also includes people and papers from related fields, such as cognitive systems engineering and naturalistic decision-making.
Some papers have a (TWRR) link next to them. This stands for Thai Wood's Resilience Roundup. Thai publishes a newsletter that summarizes resilience engineering papers.
If you're not sure what to read first, check out:
For a collection of talks, check out the Resilience Engineering, Cognitive Systems Engineering, and Human Factors Concepts in Software Contexts YouTube playlist maintained by John Allspaw.
You might also be interested in my notes on David Woods's Resilience Engineering short course.
The papers linked here are also in the Zotero res-eng group.
For each person, I list concepts that they reference in their writings, along with some publications. The publications lists aren't comprehensive: they're ones I've read or have added to my to-read list.
Note: there are now multiple contributors to this repository.
Allspaw is the former CTO of Etsy. He applies concepts from resilience engineering to the tech industry. He is one of the founders of Adaptive Capacity Labs, a resilience engineering consultancy.
Allspaw tweets as @allspaw.
Bainbridge is (was?) a psychology researcher. (I have not been able to find any recent information about her).
Bainbridge is famous for her 1983 Ironies of automation paper, which continues to be frequently cited.
Baker is a practitioner who provides training services in human and organizational performance (HOP) and learning teams.
Baker tweets as @thehopmentor.
Bergström is a safety researcher and consultant. He runs the Master Program of Human Factors and Systems Safety at Lund University.
Bergström tweets as @bergstrom_johan.
Conklin's books are on my reading list, but I haven't read anything by him yet. I have listened to his great Preaccident investigation podcast.
Conklin tweets as @preaccident.
Quanta - Risk and Safety Conf 2019
Cook is an anesthesiologist who studies failures in complex systems. He is one of the founders of Adaptive Capacity Labs, a resilience engineering consultancy.
Cook tweets as @ri_cook.
Dekker is a human factors and safety researcher with a background in aviation. His books aimed at a lay audience (Drift Into Failure, Just Culture, The Field Guide to 'Human Error' Investigations) have been enormously influential. He was a founder of the MSc programme in Human Factors & Systems Safety at Lund University. His PhD advisor was David Woods.
Dekker tweets as @sidneydekkercom.
Dekker developed the theory of drift, characterized by five concepts:
Dekker examines how cultural norms defining justice can be re-oriented to minimize the negative impact and maximize learning when things go wrong.
Doyle is a control systems researcher. He is seeking to identify the universal laws that capture the behavior of resilient systems, and is concerned with the architecture of such systems.
Doyle's catch is a term introduced by David Woods, but attributed to John Doyle. Here's how Woods quotes Doyle:
Computer-based simulation and rapid prototyping tools are now broadly available and powerful enough that it is relatively easy to demonstrate almost anything, provided that conditions are made sufficiently idealized. However, the real world is typically far from idealized, and thus a system must have enough robustness in order to close the gap between demonstration and the real thing.
Edwards is a practitioner who provides training services in human and organizational performance (HOP).
Edwards tweets as @thehopcoach.
Ericsson introduced the idea of deliberate practice as a mechanism for achieving a high level of expertise.
Ericsson isn't directly associated with the field of resilience engineering. However, Gary Klein's work is informed by his, and I have a particular interest in how people improve in expertise, so I'm including him here.
Feltovich is a retired Senior Research Scientist at the Florida Institute for Human & Machine Cognition (IHMC), who has done extensive research in human expertise.
Finkel is a Colonel in the Israeli Defense Force (IDF) and the Director of the IDF's Ground Forces Concept Development and Doctrine Department.
Grayson is a cognitive systems engineer at Mile Two, LLC.
Herrera is an associate professor in the department of industrial economics and technology management at NTNU and a senior research scientist at SINTEF. Her areas of expertise include safety management and resilience engineering in avionics and air traffic management.
See also: list of publications
Hoffman is a senior research scientist at the Florida Institute for Human & Machine Cognition (IHMC), who has done extensive research in human expertise.
Hollnagel proposed that there is always a fundamental tradeoff between efficiency and thoroughness, which he called the ETTO principle.
Safety-I: avoiding things that go wrong
Safety-II: performance variability rather than bimodality
Hollnagel proposed the Functional Resonance Analysis Method (FRAM) for modeling complex socio-technical systems.
Johannesen is currently a UX researcher and community advocate at IBM. Her PhD dissertation work examined how humans cooperate, including studies of anesthesiologists.
Klein studies how experts are able to quickly make effective decisions in high-tempo situations.
Klein tweets as @KleInsight.
Nancy Leveson is a computer science researcher with a focus in software safety.
Leveson developed the accident causality model known as STAMP: the Systems-Theoretic Accident Model and Processes.
See STAMP for some more detailed notes of mine.
Macrae is a social psychology researcher who has done safety research in multiple domains, including aviation and healthcare. He helped set up the new healthcare investigation agency in England. He is currently a professor of organizational behavior and psychology at the Nottingham University Business School.
Macrae tweets as @CarlMacrae.
Maguire is a cognitive systems engineering researcher with a PhD from Ohio State University. Maguire has done safety work in multiple domains, including forestry, avalanches, and software services. She currently works as a researcher at jeli.io.
Maguire tweets as @LauraMDMaguire.
Nemeth is a principal scientist at Applied Research Associates, Inc.
Nyssen is a psychology professor at the University of Liège, who does research on human error in complex systems, in particular in medicine.
A list of publications can be found on her website linked above.
Ostrom was a Nobel Prize-winning economics and political science researcher.
Pariès is the president of Dédale, a safety and human factors consultancy.
Patterson is a researcher who applies human factors engineering to improve patient safety in healthcare.
Perrow is a sociologist who studied the Three Mile Island disaster. His book "Normal Accidents" is cited by numerous other influential safety publications, such as Vaughan's "The Challenger Launch Decision".
Perry is a medical researcher who studies emergency medicine.
Jens Rasmussen was an enormously influential researcher in human factors and safety systems. In particular, you can see his influence in the work of Sidney Dekker, Nancy Leveson, and David Woods.
Rasmussen proposed three models of human performance.
Skill-based behavior doesn't require conscious attention. The prototypical example is riding a bicycle.
Rule-based behavior is based on a set of rules that we have internalized in advance. We select which rule to use based on experience, and then carry it out. An example would be: if threads are blocked, restart the server. You can think of rule-based behavior as a memorized runbook.
Knowledge-based behavior comes into play when facing an unfamiliar situation. The person generates a set of plans based on their understanding of the environment, and then selects which one to use. The challenging incidents are the ones that require knowledge-based behavior to resolve.
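To make the rule-based level concrete for a software audience, here is a minimal Python sketch that treats a memorized runbook as an ordered list of condition/action rules. The runbook entries, the state keys, and the `rule_based_response` function are hypothetical illustrations, not part of Rasmussen's framework. When no rule matches, the function returns `None`. This stands in for the point where a responder has to fall back on knowledge-based behavior.

```python
from typing import Callable, Optional

# A rule pairs a recognizable condition with a pre-learned action.
# These entries are hypothetical examples of a "memorized runbook".
Rule = tuple[Callable[[dict], bool], str]

RUNBOOK: list[Rule] = [
    (lambda state: state.get("threads_blocked", False), "restart the server"),
    (lambda state: state.get("disk_full", False), "rotate logs"),
]


def rule_based_response(state: dict) -> Optional[str]:
    """Return the action of the first rule whose condition matches.

    Returns None when no rule applies: the situation is unfamiliar, and
    (in Rasmussen's terms) knowledge-based behavior is needed instead.
    """
    for condition, action in RUNBOOK:
        if condition(state):
            return action
    return None


print(rule_based_response({"threads_blocked": True}))  # restart the server
print(rule_based_response({"latency_spike": True}))    # None
```

The sketch deliberately leaves out the hard part of knowledge-based behavior, which is generating and evaluating new plans from an understanding of the environment. A lookup table can only represent the rules someone internalized in advance.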
He also proposed three types of information that humans process as they perform work.
Signals. Example: weather vane
Signs. Example: stop sign
Symbols. Example: written language
Rasmussen proposed a state-based model of a socio-technical system as a system that moves within a region of a state space. The region is surrounded by different boundaries:
Source: Risk management in a dynamic society: a modelling problem
Incentives push the system towards the boundary of acceptable performance: accidents happen when the boundary is exceeded.
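One rough way to see this dynamic is to collapse the state space to one dimension and simulate it. The sketch below is a toy, not Rasmussen's model itself. It assumes a constant per-step drift toward the boundary of acceptable performance, which stands in for the combined pull of efficiency pressure and the gradient toward least effort. It also assumes Gaussian performance variability and an optional counter-gradient, such as a safety campaign. The function name, parameters, and default values are hypothetical. The economic and workload boundaries are crudely reduced to a floor at 0.

```python
import random
from typing import Optional


def steps_until_boundary(
    drift: float = 0.01,
    noise: float = 0.02,
    counter_gradient: float = 0.0,
    max_steps: int = 10_000,
    seed: int = 0,
) -> Optional[int]:
    """Toy 1-D migration model (hypothetical parameters).

    Position 0.0 is far from the boundary of acceptable performance;
    position 1.0 is the boundary itself, and crossing it is an accident.
    Returns the step at which the boundary is reached, or None if it
    is never reached within max_steps.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    position = 0.0
    for step in range(1, max_steps + 1):
        # Efficiency pressure and least effort nudge the system toward the boundary.
        position += drift
        # Everyday performance variability.
        position += rng.gauss(0.0, noise)
        # Optional counter-pressure pushing back from the boundary.
        position -= counter_gradient
        # Stand-in for the economic and workload boundaries on the other side.
        position = max(position, 0.0)
        if position >= 1.0:
            return step
    return None
```

For example, `steps_until_boundary(drift=0.125, noise=0.0)` reaches the boundary on step 8. A counter-gradient that exactly cancels the drift, with no variability (`steps_until_boundary(drift=0.01, noise=0.0, counter_gradient=0.01)`), never reaches it and returns `None`.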
TBD
Rasmussen proposed a multi-layer view of socio-technical systems:
Source: Risk management in a dynamic society: a modelling problem
(These are written by others about Rasmussen's work.)
Reason is a psychology researcher who did work on understanding and categorizing human error.
Reason developed an accident causation model that is sometimes known as the Swiss cheese model of accidents. In this model, Reason introduced the terms "sharp end" and "blunt end".
Reason developed a model of the types of errors that humans make:
Reed is a Senior Applied Resilience Engineer at Netflix and runs REdeploy, a conference focused on Resilience Engineering in the software development and operations industry.
Reed tweets as @jpaulreed.
Roth is a cognitive psychologist who serves as the principal scientist at Roth Cognitive Engineering, a small company that conducts research and application in the areas of human factors and applied cognitive psychology (cognitive engineering).
Sarter is a researcher in industrial and operations engineering. She is the director of the Center for Ergonomics at the University of Michigan.
Scott is an anthropologist who also does research in political science. While Scott is not a member of the resilience engineering community, his book Seeing Like a State has long been a staple of the cognitive systems engineering and resilience engineering communities.
Shorrock is a chartered psychologist and a chartered ergonomist and human factors specialist. He is the editor-in-chief of EUROCONTROL HindSight magazine. He runs the excellent Humanistic Systems blog.
Shorrock tweets as @StevenShorrock.
Life After Human Error (Velocity Europe 2014 keynote)
Vaughan is a sociology researcher who did a famous study of the NASA Challenger accident, concluding that it was the result of organizational failure rather than a technical failure. Specifically, production pressure overrode the rigorous scientific safety culture in place at NASA.
Turner was a sociologist who greatly influenced the field of organization studies.
Wears was a medical researcher who also had a PhD in industrial safety.
Woods has a research background in cognitive systems engineering and did work researching NASA accidents. He is one of the founders of Adaptive Capacity Labs, a resilience engineering consultancy.
Woods tweets as @ddwoods2.
Woods has contributed an enormous number of concepts.
Woods uses the adaptive universe as a lens for understanding the behavior of all different kinds of systems.
All systems exist in a dynamic environment, and must adapt to change.
A successful system will need to adapt by virtue of its success.
Systems can be viewed as units of adaptive behavior (UAB) that interact. UABs exist at different scales (e.g., cell, organ, individual, group, organization).
All systems have competence envelopes, which are constrained by boundaries.
The resilience of a system is determined by how it behaves when it comes near to a boundary.
See Resilience Engineering Short Course for more details.
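The idea that resilience shows up near the boundary can be illustrated, very loosely, in software terms. The following is a toy sketch, not Woods's formal theory. Two hypothetical request handlers share the same capacity, but one fails outright once load exceeds it while the other sheds the excess and keeps serving. The function names and numbers are illustrative assumptions. Both handlers look identical well inside the competence envelope and only differ as load reaches and passes the boundary.

```python
def brittle(load: int, capacity: int) -> int:
    """Serves every request up to capacity, then collapses entirely."""
    return load if load <= capacity else 0


def graceful(load: int, capacity: int) -> int:
    """Sheds load beyond capacity but keeps serving what it can."""
    return min(load, capacity)


# Compare requests served by each policy at, below, and beyond capacity.
for load in (80, 100, 120):
    print(load, brittle(load, 100), graceful(load, 100))
# 80 80 80
# 100 100 100
# 120 0 100
```

Measuring either handler only under normal load (80 or 100 requests) would not reveal any difference between them. The difference appears only when the system is pushed past its boundary.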
From The theory of graceful extensibility: basic rules that govern adaptive systems:
(Longer wording)
(Shorter wording)
For more details, see summary of graceful extensibility theorems.
(tbd)
Many of these are mentioned in Woods's short course.
Wreathall is an expert in human performance in safety. He works at the WreathWood Group, a risk and safety studies consultancy. Wreathall tweets as @wreathall.