Writing Advanced Sigma Detection Rules: Using Correlation Rules

If you are reading this blog post via a 3rd party source it is very likely that many parts of it will not render correctly (usually, the interactive graphs). Please view the post on dogesec.com for the full interactive viewing experience.

tl;dr

One of my previous posts, Writing Effective Sigma Detection Rules: A Guide for Novice Detection Engineers, led to a lot of questions.

Specifically, readers asked how behavior-based detections spanning multiple events could be achieved in Sigma to detect TTPs (e.g. this happens, then this happens, and then this happens).

By popular demand, here’s a follow-up to that post where I dig into correlations in Sigma, which achieve just that.

Why correlations?

Sigma Rules have largely focused on detections of single significant events. Of course, it is rare that a single event can provide enough context to indicate an incident.

When I worked at Splunk, over 12 years ago now, one of the most popular ways to demonstrate correlations in SPL was using the example of detecting failed logons.

5 failed logon events from a single IP that has previously logged on successfully could be considered benign.

However, 5 failed logons from 5 different IPs that have never been seen before are much more likely to indicate a security incident.

By linking multiple events together, based on characteristics (like source IP and machine), it is possible to identify patterns which a single event would not reveal.

The reality is, if you’re using detections on single events, it is likely you’re going to be triaging a lot of false positives.

Meta rules

Sigma Meta rules are an extension of the Sigma detection format. They allow for the definition of more advanced detection techniques, such as correlations and filters (I covered filters in the last post).

Meta rules share many of the same metadata properties as a base Sigma Rule (e.g. title, level, references, etc.).

However, instead of logsource and detection sections, Correlation rules have a correlation property.

Correlation rules themselves reference one or more base Sigma rules to perform the correlation logic on.

Correlation rules packaged for sharing are often stored in the same YAML file as the core Sigma rule(s) they reference, with each rule separated by ---.

e.g.

<SIGMA CORRELATION RULE HERE>
---
<SIGMA CORE RULE HERE>
---
<SIGMA CORE RULE HERE>

Note, the order of rules in the file has no significance, but in my experience it’s generally clearer to put the correlation rule at the top of the file.

Let’s look at some correlation rules in practice.

The Structure of a Sigma Correlation attribute

The correlation attribute has the following structure:

type: The type of correlation to be used (see types below).
rules: A list of Sigma rules that are used for the correlation (either by name or id values).
group-by: A list of fields to group the events by.
timespan: The time frame in which the events are aggregated.
condition: The condition that has to be met for the correlation to match.

It’s easiest to demonstrate how these work using some real examples.

There are currently four correlation types available to use in Correlation rules;

event_count: A threshold rule where an alert is generated if the number of times a particular rule fires within a given timeframe crosses a threshold, with one or more fields specified to meaningfully group the events.
value_count: A threshold rule where an alert fires if the number of unique values for a certain field in a given rule is exceeded within a given timeframe, grouped by a particular field which links the events.
temporal: A cluster rule where an alert is generated if multiple distinct but related rules fire within a given timeframe.
ordered_temporal: A chain rule where an alert is generated if distinct but related rules fire within a given timeframe in a particular order.

Correlation types: `event_count`

This correlation type is typically chosen when the frequency of an event within a given time frame is relevant.

Here’s an example correlation using event_count type along with the base rule;

### CORRELATION
title: Multiple failed logons for a single user (possible brute force attack)
status: test
correlation: 
    type: event_count
    rules:
        - failed_logon
    group-by:
        - TargetUserName
        - TargetDomainName
    timespan: 5m
    condition:
        gte: 10
tags:
    - brute_force
    - attack.t1110
---
### BASE RULE
title: Windows Failed Logon Event
name: failed_logon
description: Detects failed logon events on Windows systems.
logsource:
    product: windows
    service: security
detection:
    selection:
        EventID: 4625
    condition: selection

The base rule searches Windows security logs for EventID = 4625 (an account failed to log on).

The Sigma correlation rule defines how the rule (failed_logon) should be considered.

Literally the correlation rule reads;

for the rule name = failed_logon,
where 10 or more matching events are seen within 5m (minutes)
with the same TargetUserName and TargetDomainName

Trigger an alert.

Just to stress the benefit of the Correlation rule here: without the correlation part, the rule would simply trigger every time a failed logon event happens. With the correlation rule, the logic is much more targeted at logon attempts that look suspicious.
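To make the counting logic concrete, here is a minimal Python sketch of how an event_count correlation could be evaluated over events that have already matched the base rule. This is not how any particular Sigma backend implements it; the function name, the `timestamp` key, and the sample data are all hypothetical, and real backends may treat window boundaries differently.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def event_count_alerts(events, group_by, timespan, gte):
    """Return group keys that reach `gte` events inside a sliding window.

    `events` are dicts that already matched the base rule. Each needs a
    `timestamp` (datetime) plus every field named in `group_by`.
    """
    windows = defaultdict(deque)  # group key -> timestamps still in the window
    alerts = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        key = tuple(event[field] for field in group_by)
        window = windows[key]
        window.append(event["timestamp"])
        # Drop timestamps that are older than the timespan.
        while event["timestamp"] - window[0] > timespan:
            window.popleft()
        if len(window) >= gte and key not in alerts:
            alerts.append(key)
    return alerts


start = datetime(2024, 1, 1, 9, 0, 0)
# Hypothetical data: 10 failures for alice in 90 seconds, 3 for bob.
events = [
    {"timestamp": start + timedelta(seconds=10 * i),
     "TargetUserName": "alice", "TargetDomainName": "CORP"}
    for i in range(10)
] + [
    {"timestamp": start + timedelta(minutes=i),
     "TargetUserName": "bob", "TargetDomainName": "CORP"}
    for i in range(3)
]

alerts = event_count_alerts(
    events,
    group_by=["TargetUserName", "TargetDomainName"],
    timespan=timedelta(minutes=5),
    gte=10,
)
print(alerts)  # [('alice', 'CORP')]
```

Only alice crosses the threshold, which mirrors the rule above: bob's three failures would each match the base rule, but never trigger the correlation.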

I also want to go back to the earlier point about Sigma Correlations also containing the same metadata properties as a base rule. Here you can see title, status, and tags being used (inc. an ATT&CK tag for T1110).

Correlation types: `value_count`

value_count differs from event_count because it counts field values, rather than events themselves.

value_count is useful for detecting a high or low number of unique entities.

For example;

### CORRELATION
title: Enumeration of multiple high-privilege groups by tools like BloodHound
status: stable
correlation:
  type: value_count
  rules:
    - privileged_group_enumeration
  group-by:
    - SubjectUserName
  timespan: 15m
  condition:
    gte: 4
    field: TargetUserName
level: high
falsepositives:
  - Administrative activity
  - Directory assessment tools
---
### BASE RULE
title: High-privilege group enumeration
name: privileged_group_enumeration
status: stable
logsource:
  product: windows
  service: security
detection:
  selection:
    EventID: 4799
    CallerProcessId: 0x0
    TargetUserName:
      - Administrators
      - Remote Desktop Users
      - Remote Management Users
      - Distributed COM Users
  condition: selection
level: informational
falsepositives:
  - Administrative activity
  - Directory assessment tools

The base rule searches Windows security logs for events where EventID = 4799 (a security-enabled local group membership was enumerated) AND CallerProcessId = 0x0 AND (TargetUserName = "Administrators" OR TargetUserName = "Remote Desktop Users" OR TargetUserName = "Remote Management Users" OR TargetUserName = "Distributed COM Users").

The Sigma correlation rule defines how the rule (privileged_group_enumeration) should be considered.

Literally the correlation rule reads;

for the rule name = privileged_group_enumeration,
where 4 or more distinct TargetUserName values are seen within 15m (minutes)
for the same SubjectUserName

Trigger an alert.
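The difference from event_count is easiest to see in code. Below is a rough Python sketch of value_count semantics, counting distinct values of one field per group rather than counting events. The function name, the `timestamp` key, and the sample accounts (svc_scan, helpdesk) are hypothetical, and this is an illustration rather than what a Sigma backend actually generates.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def value_count_alerts(events, group_by, field, timespan, gte):
    """Return group keys with at least `gte` distinct values of `field`
    inside some window of length `timespan`."""
    by_group = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        key = tuple(event[name] for name in group_by)
        by_group[key].append(event)

    alerts = []
    for key, group in by_group.items():
        for i, first in enumerate(group):
            # Distinct values seen in the window opened by `first`.
            values = {
                e[field] for e in group[i:]
                if e["timestamp"] - first["timestamp"] <= timespan
            }
            if len(values) >= gte:
                alerts.append(key)
                break
    return alerts


start = datetime(2024, 1, 1, 9, 0, 0)
groups = ["Administrators", "Remote Desktop Users",
          "Remote Management Users", "Distributed COM Users"]
# Hypothetical data: svc_scan touches 4 different groups in 3 minutes,
# helpdesk enumerates the same group 6 times.
events = [
    {"timestamp": start + timedelta(minutes=i),
     "SubjectUserName": "svc_scan", "TargetUserName": group}
    for i, group in enumerate(groups)
] + [
    {"timestamp": start + timedelta(minutes=i),
     "SubjectUserName": "helpdesk", "TargetUserName": "Administrators"}
    for i in range(6)
]

alerts = value_count_alerts(
    events,
    group_by=["SubjectUserName"],
    field="TargetUserName",
    timespan=timedelta(minutes=15),
    gte=4,
)
print(alerts)  # [('svc_scan',)]
```

helpdesk generates more events than svc_scan, but only one distinct TargetUserName, so it does not alert. An event_count rule with the same threshold would have flagged both accounts.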

Correlation types: `temporal`

This type is useful for identifying related events that happen within a specified timeframe.

For example;

### CORRELATION RULE
title: CVE-2023-22518 Exploit Chain
description: Access to endpoint vulnerable to CVE-2023-22518 with suspicious process creation.
status: experimental
correlation:
  type: temporal
  rules:
    - a902d249-9b9c-4dc4-8fd0-fbe528ef965c
    - f8987c03-4290-4c96-870f-55e75ee377f4
  timespan: 10s
level: high
---
### BASE RULE
title: CVE-2023-22518 Exploitation Attempt - Vulnerable Endpoint Connection (Webserver)
id: a902d249-9b9c-4dc4-8fd0-fbe528ef965c
related:
    - id: 27d2cdde-9778-490e-91ec-9bd0be6e8cc6
      type: similar
status: test
description: |
    Detects exploitation attempt of CVE-2023-22518 (Confluence Data Center / Confluence Server), where an attacker can exploit vulnerable endpoints to e.g. create admin accounts and execute arbitrary commands.
references:
    - https://confluence.atlassian.com/security/cve-2023-22518-improper-authorization-vulnerability-in-confluence-data-center-and-server-1311473907.html
    - https://www.huntress.com/blog/confluence-to-cerber-exploitation-of-cve-2023-22518-for-ransomware-deployment
    - https://github.com/ForceFledgling/CVE-2023-22518
author: Andreas Braathen (mnemonic.io)
date: 2023-11-14
tags:
    - detection.emerging-threats
    - attack.initial-access
    - attack.t1190
    - cve.2023-22518
logsource:
    category: webserver
detection:
    selection_method:
        cs-method: 'POST'
    selection_uris:
        cs-uri-query|contains:
          # Exploitable endpoints
            - '/json/setup-restore-local.action'
            - '/json/setup-restore-progress.action'
            - '/json/setup-restore.action'
            - '/server-info.action'
            - '/setup/setupadministrator.action'
    selection_status:
        # Response code may be indicative of exploitation success, but is not always the case
        sc-status:
            - 200
            - 302
            - 405
    condition: all of selection_*
falsepositives:
    - Vulnerability scanners
level: medium
---
### BASE RULE
title: CVE-2023-22518 Exploitation Attempt - Suspicious Confluence Child Process (Linux)
id: f8987c03-4290-4c96-870f-55e75ee377f4
related:
    - id: 1ddaa9a4-eb0b-4398-a9fe-7b018f9e23db
      type: similar
status: test
description: |
    Detects exploitation attempt of CVE-2023-22518 (Confluence Data Center / Confluence Server), where an attacker can exploit vulnerable endpoints to e.g. create admin accounts and execute arbitrary commands.
references:
    - https://confluence.atlassian.com/security/cve-2023-22518-improper-authorization-vulnerability-in-confluence-data-center-and-server-1311473907.html
    - https://www.huntress.com/blog/confluence-to-cerber-exploitation-of-cve-2023-22518-for-ransomware-deployment
    - https://github.com/ForceFledgling/CVE-2023-22518
author: Andreas Braathen (mnemonic.io)
date: 2023-11-14
tags:
    - detection.emerging-threats
    - attack.execution
    - attack.t1059
    - attack.initial-access
    - attack.t1190
    - cve.2023-22518
logsource:
    category: process_creation
    product: linux
detection:
    selection_parent:
        ParentImage|endswith: '/java'
        ParentCommandLine|contains: 'confluence'
    selection_child:
        # Only children associated with known campaigns
        Image|endswith:
            - '/bash'
            - '/curl'
            - '/echo'
            - '/wget'
    filter_main_ulimit:
        CommandLine|contains: 'ulimit -u'
    condition: all of selection_* and not 1 of filter_main_*
falsepositives:
    - Unlikely
level: high

Here, two base rules are called by the Correlation rule.

As a quick aside, unlike the two previous examples, the Correlation rule uses the base rule ids not their names (either method is supported, just wanted to point this out should there be any confusion).

The first core rule in the file (a902d249-9b9c-4dc4-8fd0-fbe528ef965c) uses webserver logs, matching events where the cs-method field is POST AND the cs-uri-query field contains /json/setup-restore-local.action OR /json/setup-restore-progress.action OR /json/setup-restore.action OR /server-info.action OR /setup/setupadministrator.action AND the sc-status field is 200 OR 302 OR 405.

The second core rule (f8987c03-4290-4c96-870f-55e75ee377f4) uses Linux process creation logs, matching events where (ParentImage|endswith: '/java' AND ParentCommandLine|contains: 'confluence') AND (Image|endswith /bash OR /curl OR /echo OR /wget), excluding any event where the CommandLine field contains ulimit -u.

The correlation rule then looks for instances where both these rules trigger within a timespan of 10s (seconds).
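As a rough illustration of the temporal semantics, the Python sketch below checks whether every listed rule fired at least once inside some window, regardless of order. The function name and the rule labels (webserver_rule, process_rule) are hypothetical stand-ins for the two base rules, and grouping is left out because the example rule has no group-by.

```python
from datetime import datetime, timedelta


def temporal_match(hits, rules, timespan):
    """Return True if every rule in `rules` fired inside some window of
    length `timespan`, in any order.

    `hits` is a list of (timestamp, rule_name) tuples.
    """
    hits = sorted(hits)
    for i, (window_start, _) in enumerate(hits):
        # Rules that fired in the window opened by this hit.
        seen = {rule for ts, rule in hits[i:] if ts - window_start <= timespan}
        if set(rules) <= seen:
            return True
    return False


t0 = datetime(2024, 1, 1, 9, 0, 0)
rules = ["webserver_rule", "process_rule"]
window = timedelta(seconds=10)

# Both rules fire 4 seconds apart: match.
print(temporal_match(
    [(t0, "webserver_rule"), (t0 + timedelta(seconds=4), "process_rule")],
    rules, window))  # True

# Same hits in the opposite order: still a match for temporal.
print(temporal_match(
    [(t0, "process_rule"), (t0 + timedelta(seconds=4), "webserver_rule")],
    rules, window))  # True

# 30 seconds apart: outside the 10s timespan, no match.
print(temporal_match(
    [(t0, "webserver_rule"), (t0 + timedelta(seconds=30), "process_rule")],
    rules, window))  # False
```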

Correlation types: `ordered_temporal`

The ordered_temporal correlation is similar to temporal, but adds the order of the events in the rules attribute as another condition (e.g. rule 1 must trigger before rule 2).

Generally speaking, a well-written temporal correlation is better to use than ordered_temporal (due to efficiency and conversion issues). Often the appearance of certain events within a specific time frame is already sufficient for a detection. For example, a rule for multiple failed logons followed by a successful one delivers mostly the same results if it only requires that the failed and successful logons appear within the same time frame.

Using the same example as I showed last for a temporal correlation but changing the type to ordered_temporal;

### CORRELATION RULE
title: CVE-2023-22518 Exploit Chain
description: Access to endpoint vulnerable to CVE-2023-22518 with suspicious process creation.
status: experimental
correlation:
  type: ordered_temporal
  rules:
    - a902d249-9b9c-4dc4-8fd0-fbe528ef965c
    - f8987c03-4290-4c96-870f-55e75ee377f4
  timespan: 10s
level: high
---
...

This time, the webserver rule (a902d249-9b9c-4dc4-8fd0-fbe528ef965c) must trigger before the process creation rule (f8987c03-4290-4c96-870f-55e75ee377f4), and both must fire within a 10 second time window. With a temporal correlation, these rules could have fired in any order to trigger a detection.
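The ordering requirement can be sketched in Python as follows. This is a simplified illustration, not a backend implementation: the function name and the rule labels (webserver_rule, process_rule) are hypothetical, and events with identical timestamps are not handled specially.

```python
from datetime import datetime, timedelta


def ordered_temporal_match(hits, rules, timespan):
    """Return True if the rules fired in the listed order inside some
    window of length `timespan`.

    `hits` is a list of (timestamp, rule_name) tuples.
    """
    hits = sorted(hits, key=lambda hit: hit[0])
    for i, (window_start, rule) in enumerate(hits):
        if rule != rules[0]:
            continue  # a window can only open on the first rule in the list
        position = 1  # index of the next rule we are waiting for
        for ts, later_rule in hits[i + 1:]:
            if ts - window_start > timespan:
                break
            if position < len(rules) and later_rule == rules[position]:
                position += 1
        if position == len(rules):
            return True
    return False


t0 = datetime(2024, 1, 1, 9, 0, 0)
rules = ["webserver_rule", "process_rule"]
window = timedelta(seconds=10)

# Web request first, then the child process: match.
print(ordered_temporal_match(
    [(t0, "webserver_rule"), (t0 + timedelta(seconds=4), "process_rule")],
    rules, window))  # True

# Child process first: a temporal rule would match, ordered_temporal does not.
print(ordered_temporal_match(
    [(t0, "process_rule"), (t0 + timedelta(seconds=4), "webserver_rule")],
    rules, window))  # False
```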

Dealing with different field names

If you specify more than one core rule in a Sigma correlation rule, you might find that the fields you need are named differently across the log sources.

For example, the correlation might need to group a source IP in one event type with a destination IP in another.

Sigma correlations support log field normalisation using aliases to fix this problem. Here’s an example of a correlation rule with an alias defined;

correlation:
  type: temporal
  rules:
    - rule_with_src_ip_field
    - rule_with_dest_ip_field
  aliases:
    ip:
      rule_with_src_ip_field: src_ip
      rule_with_dest_ip_field: dest_ip
  group-by:
    - ip
  timespan: 5m

The aliases attribute defines a virtual field ip that is mapped from the field src_ip in events matched by the rule rule_with_src_ip_field, and from dest_ip in events matched by the rule rule_with_dest_ip_field. The virtual field ip is then used in the group-by list as the aggregation field for IPs across both event types.
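One way to picture what the alias does is as a small normalisation step applied before grouping. The Python sketch below is only an illustration under that assumption: the helper name apply_aliases and the sample events are hypothetical, and the timespan check is left out to keep the focus on the field mapping.

```python
from collections import defaultdict

# Mirrors the aliases block above: virtual field -> rule name -> real field.
ALIASES = {
    "ip": {
        "rule_with_src_ip_field": "src_ip",
        "rule_with_dest_ip_field": "dest_ip",
    }
}


def apply_aliases(rule_name, event, aliases):
    """Return a copy of `event` with each alias added as a virtual field."""
    normalised = dict(event)
    for alias, mapping in aliases.items():
        source_field = mapping.get(rule_name)
        if source_field in event:
            normalised[alias] = event[source_field]
    return normalised


# Hypothetical (rule name, matched event) pairs.
matched = [
    ("rule_with_src_ip_field", {"src_ip": "10.0.0.5", "dest_ip": "192.0.2.1"}),
    ("rule_with_dest_ip_field", {"src_ip": "198.51.100.7", "dest_ip": "10.0.0.5"}),
    ("rule_with_dest_ip_field", {"src_ip": "10.0.0.9", "dest_ip": "10.0.0.8"}),
]

# Group by the virtual field and record which rules fired for each IP.
rules_by_ip = defaultdict(set)
for rule_name, event in matched:
    ip = apply_aliases(rule_name, event, ALIASES)["ip"]
    rules_by_ip[ip].add(rule_name)

both = sorted(ip for ip, rules in rules_by_ip.items() if len(rules) == 2)
print(both)  # ['10.0.0.5']
```

10.0.0.5 appears as src_ip in one event and dest_ip in another, so it is the only address where both rules fired once the fields are normalised to ip.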