knowledge-base

Architecture Risk

Every architecture has risk associated with it. Analyzing risk is one of the key activities of architecture. By analyzing risk, the architect can address deficiencies within the architecture and take corrective action to mitigate the risk.

Risk Matrix

When assessing architecture risk it’s hard and fairly subjective to qualify a risk as low, medium, or high. The risk matrix can be leverages to reduce the level of subjectiveness.

The architecture risk matrix uses 2 dimensions to qualify risk: the overall impact of the risk and the likelihood of that risk occurring. Each dimension can be qualified as low (1), medium (2), or high (3). These numbers are multiplied together within each grid of the matrix, providing an objective numerical representation of that risk.

TIP: Considering to start first with the “overall impact” dimension before rating the likelihood.

Matrix

Overall Impact / Likelihood Of Risk Low (1) Medium (2) High (3)
Low (1) 1 (green) 2 (green) 3 (yellow)
Medium (2) 2 (green) 4 (yellow) 6 (red)
High (3) 3 (yellow) 6 (red) 9 (red)

Colored Risk Matrix

Example

There is a concern about availability with regard to a primary central database used in the application:

Risk Assessments

Given that we have the risk matrix to help grade risk, we can make a summarized report of the overall risk of an architecture. That is what we call a risk assessment.

Risk Criteria Customer Registration Catalog Checkout Order Fulfillment Order Shipment Total Risk
Scalability 2 6 1 2 11
Availability 2 4 2 1 10
Performance 4 2 3 6 15
Security 6 3 1 1 11
Data Integrity 9 6 1 1 17
Total Risk 24 21 8 11  

Such a Risk Assessments allows to find critical areas that need attention and prioritize accordingly, by looking at the risk by domain or criteria. Based on the business and domain, certain criteria or domains will have priority over the others (e.g Checkout might be considered the most critical for some businesses).

Risk assessments should be done periodically, as these are not static factors. Measure them over time and track the trends of the values. Annual risk assessments are a good starting point.

Visual Variations for better story telling

ISO 27001 & Security

Doing risk assessments is considered a best practice for security. The ISO 27001 information security standard has one of its central tenets the process of doing periodical risk assessment.

Identifying Risk

There are various methods to identify and address risks in your architecture.

  Threat Modeling Risk Storming
Definition - Systematic Process
- Can be done collaborative but emphasize is on the process.
- Collaborative Technique
Participants Usually security experts Various stakeholders, from developers to business analysts to users
Scope Primarily focused on security threats Looks at a broader rang of risk, which can include security, but also usability, performance, compliance, and other concerns.
Output Typically results in a list of potential threats, their severity, and suggested mitigation strategies. Outputs not only a list of risks but also a prioritized set of actions based on the collaborative discussions.
Notes Strong security emphasize Might also result in a better understanding of business priorities, user concerns, and technical constraints.

Threat Modeling deserves its own place, so we will cover here only Risk Storming.

Addressing Risk

Once risks/threats are identified with their respective risk rating, identified risks can be addressed in various ways:

Risk Storming

Risk storming is a collaborative exercise used to determine architectural risk within a specific dimension. Common dimensions (areas of risk) include unproven technology, performance, scalability, availability (including transitive dependencies), data loss, single points of failure, and security. Risk storming efforts include architects, senior developers, and tech leads. Providing an implementation perspective for architects, but also for non-architects to gain a better understanding of the architecture.

Risk storming involves an individual part and a collaborative part. An architecture diagram is used for both parts. Based on the scope of the risk storming (entire architecture or a specific context) appropriate architecture diagram should be provided.

Step 1: Identification (individual)

Each participant identifies areas of risk within the architecture on their own. This is essential so that other participants don’t influence each other or direct attention away. This should be done ahead of time before the collaborative part starts. This preparation can be done with a risk per post-it, which is color-coded based on the overall risk rating (e.g. High - red). This means that each participant must rate each risk using the risk matrix ahead of time.

Step 2: Consensus (collaborative)

The goal is gaining consensus among all participants in a meeting regarding the risk within the architecture. Ideally the architecture diagram is printed or visible on a large screen and at the start of the session, all participants put their post-its on the diagram, based on where they identified the risks.

Once all post-its in place, the collaborative part on gaining consensus in terms of the risk qualification. All post-its should be covered.

Step 3: Mitigation (collaborative)

Now risks must be addressed, some will be mitigated, others might not, see Addressing Risk about what options exist. Key (business) stakeholders should be involved on deciding how risks are addressed as they are usually best positioned to decide on accepting risk or accepting costs to mitigate a risk.

Example: Risk Storming

Example: Complete Risk Assessment Table

ID Risk Overall Impact Likelihood Rating Address Notes
001 Database becomes unavailable High Medium High Mitigate Risk Configure Database as HA with cluster configuration
002 Hardware accessed by unauthorized person High Medium High Transfer Risk Azure is responsible for hardware security
….

Resources