In this first part we are going to share with you some common logical and high level steps we tend to follow to design detection logic for a certain attack technique. To make it simple and straightforward we will start with some definitions (to align) and then analyze the following diagram that summarizes big chunks of the process.
- attack technique: group of small blocks (primitives) chained to bypass a certain security control (e.g. steal secrets, elevate privileges, execute code remote or locally).
- datasource: mainly logs (host and network) and OS telemetry such as processes execution, file modifications, network connection.
- detection resilience: high level qualitative metric to measure how easy for an attacker to bypass a certain detection logic (e.g. to detect LSASS memory dump creation we monitor file creation with the name "lsass.dmp". this can be easily bypassed if the attacker has control over the file name).
- unique changes: if a certain attack primitive performs a change that happens a lot and in normal conditions (e.g. create a file with extension .tmp or .js in the user temporary directory) then this change is not unique enough and hence can't be used as an indicator of suspicious activity.
- context: if a certain change is unique enough to use it as an indicator of suspicious activity, we still have to assess if it provides enough context or it can be associated to 100 techniques.
Step A, consist of identifying all building blocks of certain attack technique, in our example we have 8 primitives for the attack technique X (often involves reading documentation and source code if available and needed).
Step B, consist of identifying what's necessary for the technique success and what's optional from an attacker perspective for the success of the technique, in our example out of 8 primitives only 5 are needed (still green) and the rest are optional and if omitted the technique still works.
Step C, consist of identifying what's under the attacker control and what's not (e.g. in PM1 the technique needs drop a dll file in the system32 directory, the default name is abc.dll (still can be used as signature) but the attacker controls the name and can set it to more than 20 different unique names). In our example out of 5 necessary PMs, only 3 are non modifiable (still green) and 2 are modifiable (marked as dark green).
Step D, consist of mapping the 5 necessary PM to the relevant datasources we have at our disposition, (e.g. in PM8 Explorer.exe will perform a network connection but we don't collect processes network telemetry). In our example out of 5 PMs we have telemetry for only 3 PMs and the 2 others are opportunities for improvement (marked in purple) and if we encounter a medium to high number of techniques that requires the same type of telemetry then it's worth using it as a justification to enable visibility on those gaps.
Step E, mainly consist of identifying what's normal (happens a lot and if enabled as a detection will DoS your mailbox and SIEM), exclusion opportunities and what's unique enough to use it as an indicator of suspicious activity. This usually involves querying the history of existing datastores and if the number of hits is medium to low then its worth moving to the next step. In our example out of 3 remaining PMs we are left with 2 .
Step F, In this step we are are left with 2/8 PMs, that can serve as our initial detection scope, we need to assess the detection opportunities we have in term of performance impact, alert context and enrichment options. for instance if PM4 alone is indeed indicative of something suspicious still it can be also associated to other unrelated malicious techniques (context), and for PM5 we need to create a rule that matches against 100 different file names (query time and high performance impact).
Following those steps in order is not necessary, and we may have missed (unintentionally) some other important steps. It usually comes to having a good understanding of the offensive technique, filtering out normal behavior while in the same time balancing detection resilience, alert context and performance impact. Also not always we have guarantees to come up with a detection for a TTP, but the ultimate goal is to capture gaps and potential opportunities of improvement. In the upcoming parts we will try to cover each step in details with some practical examples.