Before we define operational risk, let us understand what we mean by risk. Risk is the probability and severity of an event that results in a financial or reputational loss. Note that Risk is about the downside, that is, losses. Some like to include the upside under the Risk label. But why do that? There is already a word for the upside: Reward. That is why we talk about the Risk-Reward trade-off. Why create confusion by including the upside as part of Risk?
Now that we have clarified what Risk is and is not, let us proceed to define what Operational Risk is.
At one time, operational risk was known as the Other Risk, that is, any risk that was not credit, market, or liquidity risk. Over time, regulators, after industry consultation, settled on defining operational risk as the risk of loss resulting from inadequate or failed internal processes, people, and systems or from external events. This definition includes legal risk but excludes strategic and reputational risk.
Although this definition is better than Other Risk, it is still too generic to be useful for managing operational risk.
To develop a more operationally useful definition of operational risk, you need a categorization scheme that distinguishes between different types of operational risk. In market risk, for example, categories include equity price risk, interest rate risk, FX risk, and so on.
How would you go about creating such a categorization scheme for operational risk?
As in any situation where no ready-made solution exists, you would apply Agile.
The first step in Agile is to have a clear understanding of the end state. Here, that is straightforward: we want a definition of operational risk that makes it easy to distinguish what is operational risk from what is not, and easy to distinguish between the different types of operational risk.
The second step is to search for potential solutions, then remix and repurpose them. An internet search for something like "top operational risks" will likely turn up something better than a regulatory definition, such as:
Is this good enough? After all, it comes from a widely respected industry publication.
Now we apply the third Agile step: testing.
Suppose your financial institution suffers a cyber attack from a hostile nation that takes down your trading system. Is this an IT disruption or a geopolitical risk? Now suppose the same attack compromised the security protections of your mobile banking app, allowing fraudsters to defraud bank clients. Which is it now: an IT disruption, a geopolitical risk, or Fraud? You soon find yourself in a confusing mess. The problem with this approach is that some categories are causes (some proximate, such as IT disruption, and some root, such as geopolitical risk in this case), while others are outcomes, such as Fraud. Since outcomes usually have multiple causes, whenever you mix causes and outcomes as categories there is no unique, easy way to categorize operational risk events. Instead, complex and often confusing decision trees have to be implemented to achieve some level of consistency in event categorization across the firm.
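To make the problem concrete, here is a minimal sketch, assuming a hypothetical taxonomy in which each category is a yes/no test over a simple event record. The flag names (`system_down`, `state_actor`, `client_funds_stolen`) and the functions are illustrative inventions, not part of any standard. The cyber attack above satisfies all three categories at once, and the "decision tree" (here reduced to a precedence list) only produces a single label by imposing an arbitrary ordering: change the order and the same event gets a different label.

```python
# Hypothetical mixed taxonomy: some labels are causes, one is an outcome.
# Each category is a predicate over a simple event record (a dict of flags).
MIXED_CATEGORIES = {
    "IT disruption": lambda e: e.get("system_down", False),        # proximate cause
    "Geopolitical risk": lambda e: e.get("state_actor", False),    # root cause
    "Fraud": lambda e: e.get("client_funds_stolen", False),        # outcome
}


def matching_categories(event):
    """Return every category whose predicate the event satisfies."""
    return [name for name, test in MIXED_CATEGORIES.items() if test(event)]


def classify_by_precedence(event, precedence):
    """Force a single label by taking the first match in a precedence list.

    This mimics the decision trees firms add to get a unique answer:
    the result depends entirely on the (arbitrary) ordering chosen.
    """
    matches = matching_categories(event)
    for name in precedence:
        if name in matches:
            return name
    return None


cyber_attack = {"system_down": True, "state_actor": True, "client_funds_stolen": True}

print(matching_categories(cyber_attack))
# ['IT disruption', 'Geopolitical risk', 'Fraud']
print(classify_by_precedence(cyber_attack, ["Fraud", "IT disruption", "Geopolitical risk"]))
# Fraud
print(classify_by_precedence(cyber_attack, ["Geopolitical risk", "Fraud", "IT disruption"]))
# Geopolitical risk
```

If two desks or two firms pick different precedence orders, they will record the same event under different categories, which is exactly the inconsistency described above.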
Too complicated. So you quickly discard this categorization approach and start looking for alternatives. You will find that other industry publications and websites use similar categories and therefore suffer from the same severe shortcomings.
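A more refined internet search would quickly lead you to the Basel categories for operational risk. The first level of the Basel hierarchy consists of seven event types:
Internal Fraud
External Fraud
Employment Practices and Workplace Safety
Clients, Products & Business Practices
Damage to Physical Assets
Business Disruption and System Failures
Execution, Delivery & Process Management
Each of these is broken down into two further levels (page 17+).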
Basel settled on these categories after extensive industry consultation. They are an improvement over the industry publication's categories, but a quick test surfaces the same major issue: there is no way to classify operational risk events uniquely, because the scheme also mixes outcomes, such as Fraud or Damage to Physical Assets, with causes, such as a system failure. For example, suppose computers are damaged because a system failure set off the sprinkler system. Is it Damage to Physical Assets or a System Failure? Again, without a complex set of rules there would be a lot of confusion and no consistency across the firm. The categories also seem incomplete: where does damage to information assets (data), increasingly one of the most important assets of any firm, fit in? So this is better, but not good enough!
The next Agile step is to iterate. But instead of continuing to search for solutions, this time we will build our own. We will collect a large number of operational risk events, some readily available on the internet and some from operational risk databases such as the SAS Operational Risk Database. Once we have gathered such a collection, we can start classifying the events into a MECE (mutually exclusive, collectively exhaustive) structure.
After several iterations, you will likely come up with something like this.
Note that these are Mutually Exclusive (no complex categorization rules are needed)
and Collectively Exhaustive (every event can easily be fitted into one of these categories).
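Once events have been collected, the two MECE properties can be checked mechanically: an event that fits more than one category breaks mutual exclusivity, and an event that fits none breaks exhaustiveness. The sketch below assumes each category can be written as a predicate over a simple event record. `mece_report`, the flag names, and the three-category "Basel-like" scheme are hypothetical illustrations. The scheme deliberately contains only three of the seven Basel event types, so the report shows how the check works on the sprinkler and data examples discussed earlier, not a verdict on the full Basel list.

```python
from typing import Callable, Dict, List

Event = Dict[str, bool]
Scheme = Dict[str, Callable[[Event], bool]]


def mece_report(scheme: Scheme, events: Dict[str, Event]) -> Dict[str, List[str]]:
    """Check a categorization scheme against a collection of events.

    - "overlaps": events matching more than one category (not mutually exclusive)
    - "gaps": events matching no category (not collectively exhaustive)
    """
    overlaps, gaps = [], []
    for event_id, event in events.items():
        hits = [name for name, test in scheme.items() if test(event)]
        if len(hits) > 1:
            overlaps.append(event_id)
        elif not hits:
            gaps.append(event_id)
    return {"overlaps": overlaps, "gaps": gaps}


# Hypothetical predicates loosely modelled on three Basel Level 1 event types.
basel_like: Scheme = {
    "Damage to Physical Assets": lambda e: e.get("physical_assets_damaged", False),
    "Business Disruption and System Failures": lambda e: e.get("system_failure", False),
    "External Fraud": lambda e: e.get("third_party_fraud", False),
}

events: Dict[str, Event] = {
    # A system failure sets off the sprinklers, which damage computers.
    "sprinkler": {"system_failure": True, "physical_assets_damaged": True},
    # Client data is corrupted; no physical damage, outage, or fraud is flagged.
    "data_corruption": {"data_damaged": True},
    "card_skimming": {"third_party_fraud": True},
}

report = mece_report(basel_like, events)
print(report)  # {'overlaps': ['sprinkler'], 'gaps': ['data_corruption']}
```

Running the same check on each iteration of a candidate scheme gives a simple stopping criterion: keep refining until both lists come back empty for the whole event collection.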
Unfortunately, the Basel categories have become the industry standard: each firm has its own variation of them, but all variations are mapped back to the Basel categories. Since a universal mapping scheme would require MECE structures, none can exist, and the individual firm mappings introduce further inconsistencies across firms. Such is life!
The Basel categories, with all their shortcomings, will be used throughout these lessons.
What role do operational risk databases such as SAS play in improving the categorization of risk events, and are there any challenges in using these resources effectively? Also, for some operational risks, such as internal and external fraud, we may use large databases to help quantify the risk and losses. But for execution and process management risks that are not readily quantifiable, how do we measure and quantify them using large databases? Are there other approaches to dealing with them?
How can an institution determine when an operational risk event, such as a cyberattack, should be categorized under multiple causes (e.g., IT disruption vs. geopolitical risk), and how does this affect the firm's risk mitigation strategies?
How can we effectively distinguish between causes and outcomes in operational risk categorization to avoid confusion and ensure consistency? I think we can create a mapping system that links causes to their potential outcomes. This can help in visualizing the relationship between different risk factors and understanding how one category of risk can lead to various outcomes. We can also develop clear rules for categorizing risks as causes or outcomes. For example, if an incident leads to a direct financial loss, categorize it as an outcome. If it is due to a system failure, categorize it as a cause.
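One way to sketch the mapping idea raised in these questions is to record cause and outcome as two separate fields instead of forcing them into one category list. The sketch below assumes the cause sources named in the regulatory definition (processes, people, systems, external events) and a hypothetical outcome list; `RiskEvent`, the event IDs, and the optional `root_cause` field are illustrative inventions. The cyber attack from earlier is treated as two loss events (trading outage and mobile-app fraud) sharing the same proximate and root causes.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Cause(Enum):
    # Cause sources taken from the regulatory definition of operational risk.
    PROCESSES = "inadequate or failed internal processes"
    PEOPLE = "people"
    SYSTEMS = "systems"
    EXTERNAL = "external events"


class Outcome(Enum):
    # Hypothetical outcome list, for illustration only.
    FRAUD_LOSS = "fraud loss"
    PHYSICAL_DAMAGE = "damage to physical assets"
    DATA_DAMAGE = "damage to information assets"
    SERVICE_OUTAGE = "service outage"


@dataclass(frozen=True)
class RiskEvent:
    event_id: str
    proximate_cause: Cause
    outcome: Outcome
    root_cause: Optional[Cause] = None  # optional deeper cause, e.g. a hostile state


events = [
    RiskEvent("cyber-trading", Cause.SYSTEMS, Outcome.SERVICE_OUTAGE, Cause.EXTERNAL),
    RiskEvent("cyber-mobile", Cause.SYSTEMS, Outcome.FRAUD_LOSS, Cause.EXTERNAL),
    RiskEvent("sprinkler", Cause.SYSTEMS, Outcome.PHYSICAL_DAMAGE),
]

# Cause x outcome matrix: each event lands in exactly one cell.
matrix = Counter((e.proximate_cause, e.outcome) for e in events)

# Enum members are not orderable, so sort the cells by member name.
for (cause, outcome), n in sorted(matrix.items(), key=lambda kv: (kv[0][0].name, kv[0][1].name)):
    print(f"{cause.name} -> {outcome.name}: {n}")
```

The design choice is that the "IT disruption or Fraud?" question never arises: both facts are recorded, and any view (by cause for mitigation, by outcome for loss reporting) is a simple aggregation over one axis of the matrix.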
How can a financial institution develop a categorization scheme for operational risk that is both mutually exclusive and collectively exhaustive, avoiding the confusion of mixing causes and outcomes?
How has the definition of operational risk evolved over time, and what challenges are associated with categorizing operational risk in a way that allows for clear and consistent classification of different types of operational risk events?