Monday, 4 January 2021

How to Design Abnormal Child Processes Rules without Telemetry

In detection engineering we often encounter attack techniques that result in a system process spawning an unusual child process, which makes for good detection or hunting logic. The remaining problem is excluding legitimate, expected child processes. That usually requires production endpoint telemetry (the more the better), and unfortunately not everyone has that privilege. In this post we share some quick steps you can follow to tune such a rule with no telemetry access.

For this, let's take the example of malware masquerading as WerFault.exe via hollowing or an equivalent form of injection. Our goal is to detect suspicious instances by looking for any abnormal child process (e.g. cmd.exe):

1) Imported Modules: Step 1 consists of identifying all imported DLLs that are specific to the functioning of the subject process (not generic ones such as kernel32.dll, ntdll.dll etc.). In our example we can see 2 such modules, wer.dll and faultrep.dll:
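As a rough illustration of Step 1, the filtering can be sketched in Python. This is only a sketch: the GENERIC_DLLS set and the sample import list are assumptions made for the example (they are not taken from WerFault.exe's real import table), and the function name is hypothetical.

```python
# Generic system DLLs that most Windows binaries import.
# Assumption: this set is illustrative, not exhaustive.
GENERIC_DLLS = {
    "kernel32.dll", "ntdll.dll", "kernelbase.dll", "advapi32.dll",
    "user32.dll", "msvcrt.dll", "ole32.dll", "oleaut32.dll", "rpcrt4.dll",
}


def function_specific_dlls(imported_dlls):
    """Return the imported DLLs that are not generic (lower-cased, de-duplicated)."""
    result = []
    for name in imported_dlls:
        lowered = name.lower()
        # API set contracts (api-ms-win-*, ext-ms-win-*) are generic forwarders.
        if lowered.startswith(("api-ms-win-", "ext-ms-win-")):
            continue
        if lowered in GENERIC_DLLS:
            continue
        if lowered not in result:
            result.append(lowered)
    return result


# Hypothetical import list for demonstration purposes only.
sample_imports = [
    "KERNEL32.dll", "ntdll.dll", "wer.dll", "faultrep.dll",
    "api-ms-win-core-synch-l1-1-0.dll",
]
print(function_specific_dlls(sample_imports))
```

The import list itself can come from any PE viewer or from the imports view of your disassembler.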


  

2) Strings: Step 2 consists of identifying all executable names in the strings of the process (werfault.exe) and of the previously identified function-specific DLLs (wer.dll, faultrep.dll):


Of course, not all program names are valid child processes of WerFault.exe. To confirm which ones are potential benign/expected child processes, we need to move to the next step.
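If you prefer to script the strings extraction from Step 2 rather than use a strings utility, a minimal Python sketch could look like the following. The regular expressions and the function name are assumptions made for illustration. They only catch simple names ending in .exe, in both ASCII and UTF-16LE form (Windows binaries commonly store strings as UTF-16LE).

```python
import re

# ASCII form, e.g. b"cofire.exe"
ASCII_EXE = re.compile(rb"[A-Za-z0-9_\-]{1,64}\.exe", re.IGNORECASE)
# UTF-16LE form, e.g. b"p\x00s\x00r\x00.\x00e\x00x\x00e\x00"
UTF16_EXE = re.compile(
    rb"(?:[A-Za-z0-9_\-]\x00){1,64}\.\x00e\x00x\x00e\x00", re.IGNORECASE
)


def executable_names(data: bytes) -> set:
    """Return the lower-cased *.exe names found in a raw binary blob."""
    names = {m.group().decode("ascii").lower() for m in ASCII_EXE.finditer(data)}
    names |= {m.group().decode("utf-16-le").lower() for m in UTF16_EXE.finditer(data)}
    return names


# Usage (the path is only an example):
# with open(r"C:\Windows\System32\WerFault.exe", "rb") as handle:
#     print(sorted(executable_names(handle.read())))
```

Run it against the subject process and each function-specific DLL, then keep the resulting names as candidates to confirm in the next step.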

3) Process Creation APIs: The goal here is to identify all references to process creation related APIs (CreateProcessA/W, CreateProcessAsUserA/W, CreateProcessInternal, WinExec, ShellExecute, ShellExecuteEx, NtCreateProcess, ZwCreateProcess etc.). We start with those referenced directly by WerFault.exe, and then repeat the same steps for the function-specific DLLs.
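The API lookup can also be approximated in a script once you have a list of imported function names (exported from your disassembler or a PE viewer). The sketch below is an assumption-laden illustration: the base names come from the list above, the A/W suffix handling is a simplification, and the sample import list is made up.

```python
# Base names taken from the list above; the A/W variants are matched by suffix.
PROCESS_CREATION_BASE_NAMES = {
    "CreateProcess", "CreateProcessAsUser", "CreateProcessInternal",
    "WinExec", "ShellExecute", "ShellExecuteEx",
    "NtCreateProcess", "ZwCreateProcess",
}


def is_process_creation_api(name: str) -> bool:
    """True if the imported function is one of the process creation APIs."""
    candidates = {name}
    if name.endswith(("A", "W")):
        candidates.add(name[:-1])  # strip the ANSI/Unicode suffix
    return any(c in PROCESS_CREATION_BASE_NAMES for c in candidates)


def process_creation_imports(imported_functions):
    """Return the sorted subset of imports that can create a process."""
    return sorted(n for n in set(imported_functions) if is_process_creation_api(n))


# Hypothetical import names for demonstration purposes only.
sample = ["CreateFileW", "CreateProcessW", "ShellExecuteExW", "RegOpenKeyExW"]
print(process_creation_imports(sample))
```

A hit only tells you that the module can spawn processes. You still need the cross-references to find out which ones.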

For brevity we will show the steps for WerFault.exe only. Open your subject process in your favorite disassembler (you don't need to be a reverser!) and go to the imports view, then search for the process creation related APIs:


Next, double-click on the matched API name, then right-click or press X to display the functions that use this API:


As you can see below, we have only 6 functions to check. You can also start from the process names identified in Step 2 (Strings view), but for better flow and understanding start with the API XREFs:


The CreateProcess API arguments that we care about (those that point to the potential benign child process we are looking for) are lpCommandLine and lpApplicationName:


In the example of the CInpagePlugIn::StartCoFireProcess function we can see that cofire.exe is a potential child process.


In this case it was easy (the name is adjacent to the API call). In other cases you will need to drill down a couple of functions to find where the ApplicationName or CommandLine is populated. You can always go back to the program names extracted in Step 2 and cross-reference the functions that use them for correlation.

Psr.exe is another potential child process referenced in CAppRecorderPlugin::StopRecordingSession 


4) Going back to Step 1 if needed: Before repeating steps 1, 2 and 3 on wer.dll and faultrep.dll, first check, via strings or the Import Table, for the presence of any process creation related APIs:


In case of no references to process names or process creation APIs, it's safe to move directly to step 5.

5) Detection Logic: The last step is straightforward. Look for processes whose parent process name is WerFault.exe and whose process name is not one of the identified potential benign child processes:

process where process.parent.name == "werfault.exe" and not process.name in ("cofire.exe", "psr.exe", "VsJITDebugger.exe", "TTTracer.exe", "rundll32.exe")
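To sanity check this logic offline, for instance against a handful of lab events, the same condition can be sketched in Python. The event field names (parent_name, name) and the sample events are assumptions made for this example. The comparison is case-insensitive on purpose, since process names on Windows are not case-sensitive.

```python
PARENT = "werfault.exe"
# Potential benign child processes identified in the previous steps.
EXPECTED_CHILDREN = {
    "cofire.exe", "psr.exe", "vsjitdebugger.exe", "tttracer.exe", "rundll32.exe",
}


def is_abnormal_child(event: dict) -> bool:
    """True if the parent is WerFault.exe and the child is not an expected one."""
    parent = event.get("parent_name", "").lower()
    child = event.get("name", "").lower()
    return parent == PARENT and child not in EXPECTED_CHILDREN


# Hypothetical process creation events for demonstration purposes only.
events = [
    {"parent_name": "WerFault.exe", "name": "cmd.exe"},
    {"parent_name": "WerFault.exe", "name": "PSR.exe"},
    {"parent_name": "explorer.exe", "name": "cmd.exe"},
]
for event in events:
    print(event["name"], is_abnormal_child(event))
```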


Of course, this approach will miss some legitimate child processes (and therefore produce false positives), such as processes created from arguments passed via standard input, config files, registry values, COM, RPC and equivalent:





Above you can see an example where a potential child process name is read from the registry values ReflectDebugger or Debugger:




This method is time consuming, but if applied to a limited number of target processes it can provide you with an initial working detection rule with minimal noise and with no access to production endpoint process execution telemetry.

References:

https://twitter.com/SBousseaden/status/1235533224337641473

https://www.hexacorn.com/blog/2018/08/31/beyond-good-ol-run-key-part-85/

https://github.com/elastic/detection-rules/blob/main/rules/windows/defense_evasion_masquerading_suspicious_werfault_childproc.toml



Friday, 27 November 2020

How to Design Detection Logic - Part 1

   In this first part we are going to share with you some common logical and high level steps we tend to follow to design detection logic for a certain attack technique. To make it simple and straightforward we will start with some definitions (to align) and then analyze the following diagram that summarizes big chunks of the process.

Definitions:

  • attack technique: group of small blocks (primitives) chained to bypass a certain security control (e.g. steal secrets, elevate privileges, execute code remotely or locally).
  • datasource: mainly logs (host and network) and OS telemetry such as process execution, file modifications and network connections.
  • detection resilience: high level qualitative metric to measure how easy it is for an attacker to bypass a certain detection logic (e.g. to detect LSASS memory dump creation we monitor file creation with the name "lsass.dmp". This can be easily bypassed if the attacker has control over the file name).
  • unique changes: if a certain attack primitive performs a change that happens a lot under normal conditions (e.g. creating a file with extension .tmp or .js in the user temporary directory) then this change is not unique enough and hence can't be used as an indicator of suspicious activity.
  • context: if a certain change is unique enough to use as an indicator of suspicious activity, we still have to assess whether it provides enough context or whether it can be associated with 100 different techniques.



Step A consists of identifying all building blocks of a certain attack technique. In our example we have 8 primitives for the attack technique X (this often involves reading documentation and source code, if available and needed).

Step B consists of identifying what's necessary for the technique to succeed and what's optional from an attacker perspective. In our example, out of 8 primitives only 5 are needed (still green); the rest are optional, and if they are omitted the technique still works.

Step C consists of identifying what's under the attacker's control and what's not (e.g. in PM1 the technique needs to drop a DLL file in the system32 directory; the default name is abc.dll (which can still be used as a signature), but the attacker controls the name and can set it to more than 20 different unique names). In our example, out of 5 necessary PMs, only 3 are non-modifiable (still green) and 2 are modifiable (marked as dark green).

Step D consists of mapping the 5 necessary PMs to the relevant datasources we have at our disposal (e.g. in PM8 Explorer.exe will perform a network connection, but we don't collect process network telemetry). In our example, out of 5 PMs we have telemetry for only 3; the 2 others are opportunities for improvement (marked in purple). If we encounter a medium to high number of techniques that require the same type of telemetry, then it's worth using that as a justification to enable visibility on those gaps.

Step E mainly consists of identifying what's normal (happens a lot and, if enabled as a detection, will DoS your mailbox and SIEM), what the exclusion opportunities are, and what's unique enough to use as an indicator of suspicious activity. This usually involves querying the history of existing datastores; if the number of hits is medium to low then it's worth moving to the next step. In our example, out of 3 remaining PMs we are left with 2.

Step F: In this step we are left with 2/8 PMs that can serve as our initial detection scope. We need to assess the detection opportunities we have in terms of performance impact, alert context and enrichment options. For instance, even if PM4 alone is indeed indicative of something suspicious, it can also be associated with other unrelated malicious techniques (context), and for PM5 we need to create a rule that matches against 100 different file names (query time and high performance impact).
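Steps B to E can be read as a series of filters over the list of primitives. The Python sketch below models that funnel. Note that the attribute values assigned to each PM are entirely hypothetical: they were chosen only so that the counts match the example (8 primitives, 5 necessary, 2 of those modifiable, 3 with telemetry, 2 left at the end), and they do not describe any real technique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    name: str
    necessary: bool            # Step B: required for the technique to work
    attacker_controlled: bool  # Step C: attacker can modify it
    has_telemetry: bool        # Step D: we collect a relevant datasource
    unique: bool               # Step E: rare enough in normal conditions


def detection_scope(primitives):
    """Return (candidate primitives, telemetry gaps) for necessary primitives."""
    necessary = [p for p in primitives if p.necessary]
    gaps = [p for p in necessary if not p.has_telemetry]
    visible = [p for p in necessary if p.has_telemetry]
    candidates = [p for p in visible if p.unique]
    return candidates, gaps


# Hypothetical attribute assignment, for illustration only.
technique_x = [
    Primitive("PM1", True, True, True, False),
    Primitive("PM2", False, False, True, False),
    Primitive("PM3", True, False, False, True),
    Primitive("PM4", True, False, True, True),
    Primitive("PM5", True, True, True, True),
    Primitive("PM6", False, False, False, False),
    Primitive("PM7", False, True, True, False),
    Primitive("PM8", True, False, False, True),
]

candidates, gaps = detection_scope(technique_x)
print("candidates:", [p.name for p in candidates])
print("telemetry gaps:", [p.name for p in gaps])
```

The attacker_controlled flag is kept on each candidate because it feeds the resilience and performance assessment in Step F rather than acting as a hard filter.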

Following those steps in order is not necessary, and we may have (unintentionally) missed some other important steps. It usually comes down to having a good understanding of the offensive technique and filtering out normal behavior, while at the same time balancing detection resilience, alert context and performance impact. We are also not always guaranteed to come up with a detection for a TTP, but the ultimate goal is to capture gaps and potential opportunities for improvement. In the upcoming parts we will try to cover each step in detail with some practical examples.



Friday, 4 September 2020

Hunting Local Accounts and Groups Changes using Sysmon

Visibility on local account and group changes is as important as for domain ones, for both good systems hygiene and security. Attackers may add or change a local account to persist, escalate privileges or simply bypass any existing monitoring of known accounts. In this post we will try to highlight some of the standard options to enable monitoring of local accounts and also share some less known tricks that you can leverage using your existing Sysmon or EDR for hunting local account activities.

A) Windows Native Event Logs:

Windows provides good auditing for this category of changes under Account Management Audit Policy:

 

Below is an example of event-id 4720 recording a local account creation activity:

Adding the user support to the local Administrators group is also covered, by event-id 4732:

 

As can be seen, both events provide good details such as when the action happened and who performed it, and it's important to capture those events where feasible. This method provides a reasonable resilience level but is often subject to common audit policy and central log collection issues.

B) Process Activity:

This is the most common approach. Monitoring command lines of system utilities such as net.exe or net1.exe with arguments containing keywords such as "/add" or "administrators" is good and a must-have, but it is not that resilient if the same activity is done via APIs or using an uncommon utility.
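As a minimal sketch of this approach, assuming you can export process events with a process name and a command line, the keyword matching could look like this in Python. The function name and sample command lines are hypothetical, and the keyword list only contains the two keywords mentioned above.

```python
NET_BINARIES = {"net.exe", "net1.exe"}
KEYWORDS = ("/add", "administrators")


def is_suspicious_net_command(process_name: str, command_line: str) -> bool:
    """True if net.exe/net1.exe runs with account or group change keywords."""
    if process_name.lower() not in NET_BINARIES:
        return False
    lowered = command_line.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


# Hypothetical command lines for demonstration purposes only.
print(is_suspicious_net_command("net.exe", "net user support Passw0rd! /add"))
print(is_suspicious_net_command("net.exe", r"net use \\server\share"))
```

Because the match is keyword based, merely listing the group (net localgroup administrators) also hits, which is usually acceptable for hunting but may need tuning for alerting.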

C) APIs Hooking:

This approach consists of hooking relevant system APIs such as NetUserAdd, NetLocalGroupAddMember, NetUserSetInfo and NetLocalGroupSetInfo. It is indeed more resilient than the process command-line approach, but it is still subject to evasion techniques such as hooking/unhooking, direct syscalls or RPC via MS-SAMR (SamrCreateUserInDomain, SamrAddMemberToGroup):

 

D) Sysmon:

Sysmon provides a great set of events covering different types of actions, but none of them is specific to local account changes. One easy approach is to monitor process creation with a user name like "MachineNamePatterns\*", but this only provides clues about activities conducted by a local account; it says nothing about account creation or modification.
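A rough sketch of that user name check, assuming the event exposes the User field as DOMAIN\account and the computer name possibly as an FQDN, could be written as follows. The function name and the sample values are hypothetical.

```python
def is_local_account(user: str, computer: str) -> bool:
    """True if the domain part of the user name equals the machine host name."""
    domain, separator, _account = user.partition("\\")
    if not separator:
        return False  # no DOMAIN\account form, cannot decide
    hostname = computer.split(".", 1)[0]  # drop the DNS suffix if present
    return domain.lower() == hostname.lower()


# Hypothetical values for demonstration purposes only.
print(is_local_account("WS01\\support", "WS01.corp.local"))
print(is_local_account("CORP\\alice", "WS01.corp.local"))
```

Built-in identities such as NT AUTHORITY\SYSTEM are not flagged, since their domain part never equals the host name.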

We also know that most local account activity tends to be saved in the SAM registry hive, and that Sysmon provides visibility on registry changes via events 12 (key creation or deletion) and 13 (registry value modification). So let's repeat the same actions we did before with ProcMon ON and see if there are any relevant changes we can use for hunting:

Main changes that are relevant to our immediate hunting needs are:

  • A new local account name means a new registry key HKLM\SAM\SAM\DOMAINS\Account\Users\Names\<accountname> (Sysmon 12 👍):

  • Similar to account creation, local account deletion can be detected using Sysmon EventID 12 (EventType equal to DeleteKey):

  • An account added to or removed from the local Administrators group means changes to HKLM\SAM\SAM\Domains\Builtin\Aliases\00000220\ (00000220 is the local Administrators group alias on Windows, i.e. RID 544 in hexadecimal).
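The registry changes listed above can be turned into a small triage helper. The Python sketch below assumes Sysmon registry events have already been parsed into dictionaries with EventID, EventType and TargetObject keys; the dictionary layout, the function name and the sample events are assumptions made for this example.

```python
import re

NAMES_KEY = re.compile(
    r"^HKLM\\SAM\\SAM\\Domains\\Account\\Users\\Names\\(?P<account>[^\\]+)$",
    re.IGNORECASE,
)
# 00000220 (hex) is RID 544, the local Administrators alias.
ADMIN_ALIAS = re.compile(
    r"^HKLM\\SAM\\SAM\\Domains\\Builtin\\Aliases\\00000220(?:\\|$)",
    re.IGNORECASE,
)


def classify(event: dict):
    """Map a Sysmon registry event (ID 12 or 13) to a hunting label, or None."""
    event_id = event.get("EventID")
    event_type = event.get("EventType", "")
    target = event.get("TargetObject", "")
    match = NAMES_KEY.match(target)
    if event_id == 12 and match:
        if event_type == "CreateKey":
            return f"local account created: {match['account']}"
        if event_type == "DeleteKey":
            return f"local account deleted: {match['account']}"
    if event_id in (12, 13) and ADMIN_ALIAS.match(target):
        return "local Administrators group changed"
    return None


# Hypothetical events for demonstration purposes only.
created = {
    "EventID": 12,
    "EventType": "CreateKey",
    "TargetObject": r"HKLM\SAM\SAM\DOMAINS\Account\Users\Names\support",
}
print(classify(created))
```

The Sysmon configuration must include these SAM paths in its registry event rules for any of these events to be logged.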


We can also confirm this by looking at the changed binary data value:

000003F8 is the unique key name (RID) associated with the account support, which was appended to the Administrators alias C value: