Architecture Risk

Architecture Risk #

Every architecture has risk associated with it. Analyzing risk is one of the key activities of architecture. By analyzing risk, the architect can address deficiencies within the architecture and take corrective action to mitigate the risk.

Risk Matrix #

When assessing architecture risk it’s hard and fairly subjective to qualify a risk as low, medium, or high. The risk matrix can be leverages to reduce the level of subjectiveness.

The architecture risk matrix uses 2 dimensions to qualify risk: the overall impact of the risk and the likelihood of that risk occurring. Each dimension can be qualified as low (1), medium (2), or high (3). These numbers are multiplied together within each grid of the matrix, providing an objective numerical representation of that risk.

  • Risk 1 - 2 : Low Risk (green)
  • Risk 3 - 4 : Medium Risk (yellow)
  • Risk 6 - 9 : High Risk (red)

TIP: Considering to start first with the “overall impact” dimension before rating the likelihood.

Matrix #

Overall Impact / Likelihood Of RiskLow (1)Medium (2)High (3)
Low (1)1 (green)2 (green)3 (yellow)
Medium (2)2 (green)4 (yellow)6 (red)
High (3)3 (yellow)6 (red)9 (red)

Colored Risk Matrix

Example #

There is a concern about availability with regard to a primary central database used in the application:

  • Overall Impact: Loosing the databases has a high impact if the application needs the database to do anything.
  • Likelihood Of Risk: If the database is Highly Available in a cluster configuration, then there is a low likelihood of risk.
  • Overall Risk Rating: High (3) * Low (1) = 3 (Medium Risk)

Risk Assessments #

Given that we have the risk matrix to help grade risk, we can make a summarized report of the overall risk of an architecture. That is what we call a risk assessment.

Risk CriteriaCustomer RegistrationCatalog CheckoutOrder FulfillmentOrder ShipmentTotal Risk
Scalability261211
Availability242110
Performance423615
Security631111
Data Integrity961117
Total Risk2421811

Such a Risk Assessments allows to find critical areas that need attention and prioritize accordingly, by looking at the risk by domain or criteria. Based on the business and domain, certain criteria or domains will have priority over the others (e.g Checkout might be considered the most critical for some businesses).

Risk assessments should be done periodically, as these are not static factors. Measure them over time and track the trends of the values. Annual risk assessments are a good starting point.

Visual Variations for better story telling #

ISO 27001 & Security #

Doing risk assessments is considered a best practice for security. The ISO 27001 information security standard has one of its central tenets the process of doing periodical risk assessment.

Identifying Risk #

There are various methods to identify and address risks in your architecture.

Threat ModelingRisk Storming
Definition- Systematic Process- Can be done collaborative but emphasize is on the process.- Collaborative Technique
ParticipantsUsually security expertsVarious stakeholders, from developers to business analysts to users
ScopePrimarily focused on security threatsLooks at a broader rang of risk, which can include security, but also usability, performance, compliance, and other concerns.
OutputTypically results in a list of potential threats, their severity, and suggested mitigation strategies.Outputs not only a list of risks but also a prioritized set of actions based on the collaborative discussions.
NotesStrong security emphasizeMight also result in a better understanding of business priorities, user concerns, and technical constraints.

Threat Modeling deserves its own place, so we will cover here only Risk Storming.

Addressing Risk #

Once risks/threats are identified with their respective risk rating, identified risks can be addressed in various ways:

  • Avoid Risk: Stop doing the activity that causes the risk.
    • Example: Delete the web server if it’s not used anymore.
  • Mitigate Risk: Implement controls or measures to reduce impact or likelihood.
    • Example: Move a database to a High Available cluster configuration.
  • Transfer Risk: Transfer the risk to a 3th party.
    • Example: Buy insurance, outsource activity, … When using Azure, you transfer any risk regarding physical server security to Azure.
  • Accept Risk: We don’t do anything, and accept consciously this risk. Usually when measures/controls would outweigh the risk. Or the risk is just too low.
    • Example: The risk rating is to low
    • Example: Special software would cost $100 000, while your company has revenue of $20 000.

Risk Storming #

Risk storming is a collaborative exercise used to determine architectural risk within a specific dimension. Common dimensions (areas of risk) include unproven technology, performance, scalability, availability (including transitive dependencies), data loss, single points of failure, and security. Risk storming efforts include architects, senior developers, and tech leads. Providing an implementation perspective for architects, but also for non-architects to gain a better understanding of the architecture.

Risk storming involves an individual part and a collaborative part. An architecture diagram is used for both parts. Based on the scope of the risk storming (entire architecture or a specific context) appropriate architecture diagram should be provided.

Step 1: Identification (individual) #

Each participant identifies areas of risk within the architecture on their own. This is essential so that other participants don’t influence each other or direct attention away. This should be done ahead of time before the collaborative part starts. This preparation can be done with a risk per post-it, which is color-coded based on the overall risk rating (e.g. High - red). This means that each participant must rate each risk using the risk matrix ahead of time.

Step 2: Consensus (collaborative) #

The goal is gaining consensus among all participants in a meeting regarding the risk within the architecture. Ideally the architecture diagram is printed or visible on a large screen and at the start of the session, all participants put their post-its on the diagram, based on where they identified the risks.

Once all post-its in place, the collaborative part on gaining consensus in terms of the risk qualification. All post-its should be covered.

Step 3: Mitigation (collaborative) #

Now risks must be addressed, some will be mitigated, others might not, see Addressing Risk about what options exist. Key (business) stakeholders should be involved on deciding how risks are addressed as they are usually best positioned to decide on accepting risk or accepting costs to mitigate a risk.

Example: Risk Storming #

Example: Complete Risk Assessment Table #

IDRiskOverall ImpactLikelihoodRatingAddressNotes
001Database becomes unavailableHighMediumHighMitigate RiskConfigure Database as HA with cluster configuration
002Hardware accessed by unauthorized personHighMediumHighTransfer RiskAzure is responsible for hardware security
….

Resources #