Everything you ever wanted to know about Sigma rules. Part 1

In creating products and developing expertise, we are primarily guided by the desire to improve the security of companies. However, our research is driven by more than customer care. We have long wanted to do volunteer research for the information security community, and now we actively do so: we publish detections of high-profile network attacks on Twitter, supply traffic analysis rules to the ANY.RUN service, and contribute rules to ETOpen. There are many open source projects that accept pull requests, but until recently we had not gotten around to host-based detections.



And then we learned that a group of enthusiasts had decided to hold a two-week sprint on writing rules for the Sigma project, which was created to develop a unified format for describing rules for SIEM systems and is supported by more than 140 contributors. The news caught our interest, since as a SIEM vendor we closely follow the development of the community.



Imagine our surprise when the organizers contacted us and invited the PT Expert Security Center team to participate in the sprint! The event participants formed Open Security Collaborative Development (OSCD), an international initiative of information security specialists aimed at spreading knowledge and improving computer security in general. We gladly agreed to participate in order to apply our experience for the benefit of everyone's security.



How this article came about



When we started writing rules, we realized that there was no exhaustive description of the Sigma rule syntax, especially in Russian. The main sources of knowledge are GitHub and personal experience. There are several good articles (in Russian and in English), but they focus less on the rule syntax than on analyzing where Sigma rules are applicable or on creating one specific rule. We decided to make it easier for beginners to get acquainted with the Sigma project, share our own experience, and collect in one place information about the syntax and the specifics of its use. And of course we hope this will help expand the OSCD initiative and build a large community in the future.



Since there was a lot of material, we decided to publish a description in a series of three articles:



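  1. The structure of a rule shown by example, and the event sources section (this article).
  2. The section describing the detection logic.
  3. Collections of rules and the remaining attributes.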


Sigma



Sigma is a unified format for describing detection rules based on log data. Rules are stored in separate YAML files. Sigma lets you write a rule once in a unified syntax and then use a converter to obtain the rule in the syntax of a supported SIEM system. Besides the query syntaxes of various SIEM systems, the following query types are supported:



  • Elasticsearch queries,
  • a grep command line with the required parameters,
  • a PowerShell command line for querying Windows audit logs.


The last two types are notable in that they require no additional software for analyzing logs: the grep utility and the PowerShell interpreter are available out of the box on Linux and Windows, respectively.



The existence of a unified format for describing detections based on logs makes it easier to share knowledge, develop open-source security, and help the information security community fight emerging threats.



General syntax



First of all, a rule has required and optional parts. This is documented in the official wiki on GitHub. The outline of a rule (source: official wiki) is presented below:







Almost any rule can be roughly divided into three parts:



  1. attributes describing the rule (meta information);
  2. attributes describing data sources;
  3. attributes describing the conditions for triggering the rule.


Each of these parts corresponds to a required high-level attribute: title (besides title, this group includes other, optional high-level attributes), logsource and detection.
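To make the three-part split concrete, here is a minimal Python sketch that checks whether a rule carries the three required high-level attributes. It assumes the rule has already been parsed from YAML into a plain dictionary; the rule content, the helper name missing_required and the executable name example.exe are made up for illustration and do not come from the Sigma repository.

```python
# Illustrative only: a rule represented as a plain Python dict, as it might
# look after the YAML file has been parsed. All values are hypothetical.
REQUIRED_ATTRIBUTES = ("title", "logsource", "detection")


def missing_required(rule: dict) -> list:
    """Return the required high-level attributes that are absent from the rule."""
    return [name for name in REQUIRED_ATTRIBUTES if name not in rule]


example_rule = {
    # 1. attributes describing the rule (meta information)
    "title": "Hypothetical example rule",
    "status": "experimental",
    # 2. attributes describing data sources
    "logsource": {"category": "process_creation", "product": "windows"},
    # 3. attributes describing the conditions for triggering the rule
    "detection": {
        "selection": {"Image": "*\\example.exe"},
        "condition": "selection",
    },
}

print(missing_required(example_rule))               # nothing is missing
print(missing_required({"title": "Only a title"}))  # two attributes are missing
```

A check like this is only a sanity test of the outline; it says nothing about whether the values inside logsource and detection are meaningful.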



The rule structure has one more feature worth mentioning. Rules are written in YAML, and the YAML format allows several YAML documents to be placed in one file. The Sigma developers put this to use: several rules can be combined in one file to form a "rule collection". This approach is convenient when there are several ways to detect an attack and you do not want to duplicate the descriptive part (as will be described in the corresponding section, more than just the descriptive part of a rule can be shared).



In this case, the rule is conventionally divided into two parts:



  • a part with general attributes for collection items (usually all fields, except for the logsource and detection sections),
  • one or several parts describing the detection (sections logsource and detection).


If the file contains a single rule, this still holds: we simply get a degenerate collection consisting of one rule. Collections of rules will be discussed in detail in the third part of this article series.
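The idea of a shared part plus one or more detection parts can be sketched in a few lines of Python. This is an illustration of the concept only, not the way the Sigma converter implements collections; the function name expand_collection and all rule contents are hypothetical.

```python
def expand_collection(common: dict, detection_parts: list) -> list:
    """Combine the shared attributes with each detection part into full rules."""
    rules = []
    for part in detection_parts:
        rule = dict(common)  # copy the attributes shared by all items
        rule.update(part)    # add this item's logsource and detection
        rules.append(rule)
    return rules


# Hypothetical shared (descriptive) part of a collection.
common = {"title": "Hypothetical attack detection", "status": "experimental"}

# Two hypothetical ways of detecting the same attack.
detection_parts = [
    {
        "logsource": {"product": "windows", "service": "security"},
        "detection": {"selection": {"EventID": 4688}, "condition": "selection"},
    },
    {
        "logsource": {"product": "windows", "service": "sysmon"},
        "detection": {"selection": {"EventID": 1}, "condition": "selection"},
    },
]

for rule in expand_collection(common, detection_parts):
    print(rule["title"], "->", rule["logsource"]["service"])
```

Passing a list with a single detection part gives the degenerate collection of one rule.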



Next, we'll look at an example of a hypothetical rule. Note that comments like these are not normally used in rules; here they serve only to describe the fields.



Description of the typical rule





An example of creating a Sigma rule



Before describing the details of the syntax and the capabilities of Sigma rules, let us walk through a small example of creating such a rule, to show where the attribute values come from in practice. There is a good article on this topic in English. If you have already tried writing your own rules and know what data goes into the attributes of the YAML file, you can skip to the section with a detailed description of the event sources section (we will also call it the log sources section).



Let's describe how to create a rule that detects the use of SettingSyncHost.exe as a Living Off The Land Binary (LOLBin). Rule creation usually involves three stages:



  1. carrying out the attack and collecting the necessary logs,
  2. describing the detection as a rule,
  3. checking the created rule.


Conducting an attack



The idea for the rule is well documented on the Hexacorn blog . After careful reading, it becomes clear what steps need to be taken to repeat the result described in the article:



  1. Copy the program you want to run to any writable directory. The article suggests %TEMP%, but you can choose any path. Keep in mind that a subdirectory will be created in this directory with the name that you specify in step 4.
  2. , , , (wevtutil.exe, makecab.exe, reg.exe, ipconfig.exe, settingsynchost.exe, tracelog.exe). , findstr.exe. , SettingSyncHost.exe Binary Search Order Hijacking (MITRE ATT&CK ID: T1574.008).
  3. , ( settingsynchost.exe cmd PowerShell, cd < >).
  4. : c:\windows\system32\SettingSyncHost.exe -LoadAndRunDiagScript <___>
  5. SettingSyncHost.exe.






Sysmon is installed on the system with a configuration file from the sysmon-modular project, so the logs were collected automatically. Which logs are useful for writing the detection will become clear as we write the rule.



Description of the detection in the form of a Sigma rule



At this step, two approaches are possible: find an existing rule that is closest in the detection logic and modify it to suit your needs, or write a rule from scratch. In the initial stages, it is recommended to stick to the first approach. For clarity, we will write a rule using the second approach.



We create a new file and try to briefly and succinctly describe its essence in the name. Here you should adhere to the style of the existing rules. In our case, we chose the name win_using_settingsynchost_to_run_hijacked_binary.yml. Next, we start filling it with content. Let's start by filling in the meta information at the beginning of the rule. We already have all the data necessary for this.

We briefly describe what kind of attack the rule detects in the title field and give more detailed explanations in the description field; for new rules it is customary to set status: experimental. The unique identifier can be generated in a number of ways; on Windows, the easiest way is to run the following code in a PowerShell interpreter:



PS C:\> "id: $(New-Guid)"
id: b2ddd389-f676-4ac4-845a-e00781a48e5f
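On systems without PowerShell, an identifier of the same kind can be produced with Python's standard uuid module. This is only an alternative sketch: it assumes a random (version 4) UUID is acceptable for the id field, and the variable name rule_id is ours.

```python
import uuid

# Generate a random (version 4) UUID and print it as a ready-to-paste
# line for the rule's id field.
rule_id = uuid.uuid4()
print(f"id: {rule_id}")
```

The printed value differs on every run, which is exactly what is wanted for a unique rule identifier.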


The rest of the fields speak for themselves; I will only note that it is advisable to link to the sources that helped you understand the attack. This helps people who later try to understand the rule, and it also gives credit to the author of the original research for the effort spent describing the attack.



Our rule at this stage is as follows:







Next, you need to describe the log sources. As mentioned above, we rely on Sysmon logs; however, since generalized categories were introduced, it is customary to use the process_creation category for process creation events. Generalized categories are discussed in more detail below. Note that comments and advice on configuring the sources, such as Sysmon configuration specifics, are customarily written in the definition field:







Now we need to describe the detection logic. This is the most time-consuming part. The attack can be detected by many criteria, and our example does not claim to cover all possible ways of detection, so we will describe just one of the options.



If you look at the events that happened, you can build the following chain.

First, we started the process (PID: 4712) with the command line c:\windows\system32\SettingSyncHost.exe -LoadAndRunDiagScript join_oscd







Note that the current working directory of the process is the user's TEMP directory.



Next, the running process creates a batch file and starts its execution.











The process executing the batch file instructions received the identifier 7076. On further analysis of the events, we see the launch of an ipconfig.exe file that lacks the metadata typical of system files and, moreover, is located in the folder with temporary files:







We propose to flag process launches whose executable file is not located in the system directory (C:\Windows\System32) when, in addition, the parent process command line contains the substrings "cmd.exe /c", "RoamDiag.cmd" and "-outputpath". Let's describe this in Sigma syntax and get the final rule (the constructs that can be used to describe detection logic will be analyzed in detail in the next part of our series of articles about Sigma):
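Independently of the Sigma syntax, the condition described above can be written as a plain Python predicate over a process creation event. This is a sketch of the logic only, not the rule itself. It assumes the event is a dictionary with the Image and ParentCommandLine fields and that matching is case-insensitive; the function name is_suspicious and the sample paths are hypothetical.

```python
SYSTEM_DIR = "c:\\windows\\system32\\"
PARENT_MARKERS = ("cmd.exe /c", "roamdiag.cmd", "-outputpath")


def is_suspicious(event: dict) -> bool:
    """True if the image is outside System32 and the parent command line
    contains all three marker substrings (case-insensitive, an assumption)."""
    image = event.get("Image", "").lower()
    parent_cmd = event.get("ParentCommandLine", "").lower()
    outside_system_dir = not image.startswith(SYSTEM_DIR)
    parent_matches = all(marker in parent_cmd for marker in PARENT_MARKERS)
    return outside_system_dir and parent_matches


# Hypothetical events; the paths are made up for the example.
parent_cmd = "cmd.exe /c RoamDiag.cmd -outputpath C:\\Temp\\join_oscd"
hijacked = {"Image": "C:\\Temp\\ipconfig.exe", "ParentCommandLine": parent_cmd}
legitimate = {"Image": "C:\\Windows\\System32\\ipconfig.exe",
              "ParentCommandLine": parent_cmd}

print(is_suspicious(hijacked))    # binary outside System32, parent matches
print(is_suspicious(legitimate))  # binary inside System32
```

Both parts of the condition must hold at once: the location check alone would flag every third-party program, and the parent check alone would flag the legitimate system utilities started by the diagnostic script.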







Checking that the rule works



We run the converter to turn the rule into a PowerShell query:







In our case this query will not give the desired result, since the exclusion filter also matches the path to the parent process executable (the ParentImage field). Therefore, we simply require that the word Image is not preceded by the letter t, the last letter of the word Parent:
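The same trick can be illustrated with a Python regular expression rather than the PowerShell query used in the article. The sketch assumes the event text contains field names followed by values; the sample strings and the name IMAGE_FIELD are made up.

```python
import re

# "Image" that is NOT immediately preceded by the letter "t",
# so the tail of "ParentImage" is skipped (negative lookbehind).
IMAGE_FIELD = re.compile(r"(?<!t)Image")

parent_line = "ParentImage: C:\\Windows\\System32\\cmd.exe"
image_line = "Image: C:\\Temp\\ipconfig.exe"

print(bool(IMAGE_FIELD.search(parent_line)))  # the ParentImage field is ignored
print(bool(IMAGE_FIELD.search(image_line)))   # the Image field is matched
```

The lookbehind only inspects the single character before the word, so it is a narrow fix for this particular field name collision rather than a general way of parsing events.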







We see that our event was found. The rule works.



This is how Sigma rules are created in practice. Next, we will describe in detail the fields responsible for the detection, namely the description of the log sources.



Detection description



The main part of the rule is the description of the detection, since this is where information is contained about where and how to look for signs of an attack. This information is contained in the fields of the logsource (where) and detection (how) attributes. In this article we will take a closer look at the logsource section, and describe the detection section in the next part of our series.



Description of the event sources section (logsource attribute)



The description of event sources is contained in the value of the logsource field. This section describes the data sources from which events are delivered to the detection section (the detection attribute is discussed in the next part of this series). It describes the source itself, the platform, and the application required for detection. It can contain three attributes that converters process automatically, plus an arbitrary number of optional elements. Basic fields:



  • category - describes a class of products. Example values for this field: firewall, web, antivirus. The field can also contain generalized categories, which are discussed below.
  • product - the software product or operating system that creates the logs.
  • service - restricts the logs to a certain subset of services, for example "sshd" for Linux or "Security" for Windows.
  • definition - an additional field for describing the specifics of the source, for example requirements for configuring auditing (rarely used; an example of a rule with this field can be found on GitHub). It is recommended to use this attribute if the source has any specifics.


 

The official wiki on GitHub defines a set of fields that must be used in order for the rules to be cross-product. These fields are summarized in the table below.



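Category          Product   Service
-                 windows   security
-                 windows   system
-                 windows   sysmon
-                 windows   taskscheduler
-                 windows   wmi
-                 windows   application
-                 windows   dns-server
-                 windows   driver-framework
-                 windows   powershell
-                 windows   powershell-classic
-                 linux     auth
-                 linux     auditd
-                 linux     clamav
-                 apache    access
-                 apache    error
process_creation  windows   -
proxy             -         -
firewall          -         -
webserver         -         -
dns               -         -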


Next, we will describe in more detail some sources of logs, indicating the used event fields and give examples of rules in which these fields are used.



Proxy category event fields



Category  Product / Service  Field            Example
proxy     -                  c-uri            proxy_ursnif_malware.yml
proxy     -                  c-uri-extension  proxy_download_susp_tlds_blacklist.yml
proxy     -                  c-uri-query      proxy_susp_flash_download_loc.yml
proxy     -                  c-uri-stem       proxy_susp_flash_download_loc.yml
proxy     -                  c-useragent      proxy_powershell_ua.yml
proxy     -                  cs-bytes         -
proxy     -                  cs-cookie        proxy_cobalt_amazon.yml
proxy     -                  cs-host          proxy_cobalt_ocsp.yml
proxy     -                  cs-method        proxy_downloadcradle_webdav.yml
proxy     -                  r-dns            proxy_apt40.yml
proxy     -                  cs-referrer      -
proxy     -                  cs-version       -
proxy     -                  sc-bytes         -
proxy     -                  sc-status        proxy_ursnif_malware.yml
proxy     -                  src_ip           -
proxy     -                  dst_ip           -


Description of the event fields of this source
---------------------------------------------------------------

c-uri - URI,  

c-uri-extension -  URI.     

c-uri-query -  URI,     

c-uri-stem -    URL   ( :)    .    URIstem        -

c-useragent -  UserAgent  HTTP- 

cs-bytes -  ,     

cs-cookie -  cookie,     

cs-host -  Host  HTTP- 

cs-method -  HTTP- 

r-dns - DNS-  

cs-referrer -  Referrer  HTTP- 

cs-version -   HTTP,   

sc-bytes -  ,     

sc-status -  HTTP-

src_ip - IP- 

dst_ip - IP- 


Firewall Event Fields



Category  Product / Service  Field     Example
firewall  -                  src_ip    -
firewall  -                  src_port  -
firewall  -                  dst_ip    -
firewall  -                  dst_port  net_high_dns_bytes_out.yml
firewall  -                  username  -


Description of the event fields of this source
---------------------------------------------------------------
src_ip - IP-  
src_port - ,     
dst_ip - IP-  
dst_port - ,     
username -  ,    




Event fields of the Web server category



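Category   Product / Service  Field            Example
webserver  -                  c-uri            web_cve_2020_0688_msexchange.yml
webserver  -                  c-uri-extension  -
webserver  -                  c-uri-query      -
webserver  -                  c-uri-stem       -
webserver  -                  c-useragent      -
webserver  -                  cs-bytes         -
webserver  -                  cs-cookie        -
webserver  -                  cs-host          -
webserver  -                  cs-method        web_cve_2020_0688_msexchange.yml
webserver  -                  r-dns            -
webserver  -                  cs-referrer      -
webserver  -                  cs-version       -
webserver  -                  sc-bytes         -
webserver  -                  sc-status        -
webserver  -                  src_ip           -
webserver  -                  dst_ip           -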


Description of the event fields of this source
---------------------------------------------------------------
c-uri  - URI,   
c-uri-extension -  URI.      
c-uri-query -  URI,      
c-uri-stem  -    URI   ( :)    .    URI stem        - 
c-useragent  -  UserAgent  HTTP-  
cs-bytes  -  ,      
cs-cookie -  cookie,      
cs-host -  Host  HTTP-  
cs-method -  HTTP-  
r-dns  - DNS-   
cs-referrer -  Referrer  HTTP-  
cs-version -   HTTP,    
sc-bytes -  ,      
sc-status -  HTTP- 
src_ip - IP-  
dst_ip - IP- 




Generalized categories



Since Sigma is a generalized format for describing log-based detection rules, its syntax must be able to express detection logic for different systems. Some systems use tables with aggregated data instead of events, and data describing the same situation may arrive from different sources. To unify the syntax and solve such problems, the mechanism of generic log sources was introduced. At the moment, one such category has been created: process_creation. You can read more about this on the patzke.org blog. The list of fields for this category can be found on the taxonomy page (this page also describes other supported categories).
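As a rough illustration of what a generic log source buys you, the following Python sketch resolves a generalized category into the concrete sources it stands for. The mapping uses the two sources named in the statistics table later in this article (Sysmon EventID 1 and Windows Security Event Log EventID 4688); the dictionary layout and the function name concrete_sources are our own assumptions, not the converter's actual configuration format.

```python
# Hypothetical mapping from (category, product) to concrete event sources.
GENERIC_SOURCES = {
    ("process_creation", "windows"): [
        {"service": "sysmon", "EventID": 1},
        {"service": "security", "EventID": 4688},
    ],
}


def concrete_sources(logsource: dict) -> list:
    """Return the concrete sources behind a generic logsource, if any."""
    key = (logsource.get("category"), logsource.get("product"))
    return GENERIC_SOURCES.get(key, [])


generic = {"category": "process_creation", "product": "windows"}
for source in concrete_sources(generic):
    print(source["service"], source["EventID"])
```

A rule written against the generic category therefore does not need to be duplicated for each concrete log that records process creation.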



Event fields of the generalized category process_creation



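Category          Product  Field              Example
process_creation  windows  UtcTime            -
process_creation  windows  ProcessGuid        -
process_creation  windows  ProcessId          sysmon_raw_disk_access_using_illegitimate_tools.yml
process_creation  windows  Image              win_susp_regsvr32_anomalies.yml
process_creation  windows  FileVersion        sysmon_susp_file_characteristics.yml
process_creation  windows  Description        sysmon_susp_file_characteristics.yml
process_creation  windows  Product            sysmon_susp_file_characteristics.yml
process_creation  windows  Company            sysmon_susp_file_characteristics.yml
process_creation  windows  CommandLine        win_meterpreter_or_cobaltstrike_getsystem_service_start.yml
process_creation  windows  CurrentDirectory   win_susp_powershell_parent_combo.yml
process_creation  windows  User               win_susp_schtask_creation.yml
process_creation  windows  LogonGuid          -
process_creation  windows  LogonId            -
process_creation  windows  TerminalSessionId  -
process_creation  windows  IntegrityLevel     -
process_creation  windows  imphash            win_renamed_paexec.yml
process_creation  windows  md5                -
process_creation  windows  sha256             -
process_creation  windows  ParentProcessGuid  -
process_creation  windows  ParentProcessId    -
process_creation  windows  ParentImage        win_meterpreter_or_cobaltstrike_getsystem_service_start.yml
process_creation  windows  ParentCommandLine  win_cmstp_com_object_access.yml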


Description of the event fields of this source
---------------------------------------------------------------
UtcTime -    UTC 
ProcessGuid - GUID   
ProcessId - PID   
Image -      
FileVersion -  ,      
Description -  ,      
Product -  ,      
Company -  ,
CommandLine -     
CurrentDirectory -     
User - ,      
LogonGuid - GUID    
LogonId -     
TerminalSessionId -     
IntegrityLevel -  ,     
imphash - -         
md5 - MD5-  ,      
sha256 - SHA256-  ,      
ParentProcessGuid - GUID   
ParentProcessId - PID   
ParentImage -       
ParentCommandLine -    




UPDATE



While this article was being prepared, new generalized categories were added:





All of them cover information that is duplicated between Windows log events and Sysmon log events. We advise you to use the existing generalized categories when writing your rules. Since the project is actively developing, it is worth watching for new categories and updating your rules accordingly.



Event source usage statistics in existing rules



The table below shows the most common constructs for describing log sources. Most likely, you will find among them the one that suits your rule.



Statistics on the use of a combination of description fields for some of the most common sources (a dash means the absence of this field):

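Number of rules: 197
Category: process_creation
Product: windows
Service: -
Sample syntax:
    logsource:
        category: process_creation
        product: windows
Comment: Generalized category of process creation logs on Windows systems. Includes Sysmon (EventID = 1) and Windows Security Event Log (EventID = 4688)

Number of rules: 68
Category: -
Product: windows
Service: sysmon
Sample syntax:
    logsource:
        product: windows
        service: sysmon
Comment: Sysmon events

Number of rules: 48
Category: -
Product: windows
Service: security
Sample syntax:
    logsource:
        product: windows
        service: security
Comment: Windows Security Event Log

Number of rules: 24
Category: proxy
Product: -
Service: -
Sample syntax:
    logsource:
        category: proxy
Comment: -

Number of rules: 15
Category: -
Product: windows
Service: system
Sample syntax:
    logsource:
        product: windows
        service: system
Comment: Windows System Event Log

Number of rules: 12
Category: accounting
Product: cisco
Service: aaa
Sample syntax:
    logsource:
        category: accounting
        product: cisco
        service: aaa
Comment: Cisco AAA Security Services

Number of rules: 10
Category: -
Product: windows
Service: powershell
Sample syntax:
    logsource:
        product: windows
        service: powershell
Comment: Microsoft Windows PowerShell Event Log

Number of rules: 9
Category: -
Product: linux
Service: -
Sample syntax:
    logsource:
        product: linux
Comment: Linux

Number of rules: 8
Category: -
Product: linux
Service: auditd
Sample syntax:
    logsource:
        product: linux
        service: auditd
Comment: Linux events, narrowed down to the logs of a specific service (AuditD subsystem)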
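Statistics of this kind can be gathered with a short script. The sketch below counts the combinations of category, product and service over a list of rules that are assumed to be already parsed into dictionaries; the three sample rules and the helper name logsource_key are hypothetical, and reading the YAML files from the repository is left out.

```python
from collections import Counter


def logsource_key(rule: dict) -> tuple:
    """Return (category, product, service), with '-' for an absent field."""
    logsource = rule.get("logsource", {})
    return tuple(logsource.get(field, "-")
                 for field in ("category", "product", "service"))


# Hypothetical parsed rules; only the logsource part matters here.
rules = [
    {"logsource": {"category": "process_creation", "product": "windows"}},
    {"logsource": {"category": "process_creation", "product": "windows"}},
    {"logsource": {"product": "windows", "service": "sysmon"}},
]

stats = Counter(logsource_key(rule) for rule in rules)
for (category, product, service), count in stats.most_common():
    print(count, category, product, service)
```

Using a tuple as the key keeps the three fields separate, so the same counter can later be regrouped by any single field.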


Tips for writing rules



When writing a new rule, the following situations are possible:



  1. The correct event source has already been used in existing rules.
  2. There is not a single rule in the repository that uses this event source.


If you are in the first situation, use one of the existing rules as a template. If the required log source is already used in other rules, the authors of the plugins (backend converters) for the various SIEM systems have most likely accounted for it in their mappings, and your rule should be processed correctly right away.



In the second situation, study the existing rules to understand how to use the category, product and service identifiers correctly. When creating your own log source, it is recommended to add it to the mappings of all existing backends. Other contributors or even the developers can do this; the main thing is to let them know it is needed.



We have created a visualization of the combination of log source description fields in the existing rules:



Distribution of log sources







Use statistics for combinations of logsource attribute subfields







In this article, we gave an example of creating a simple rule and talked about the description of event sources. Now you can apply the knowledge gained, look at the rules in the Sigma repository and figure out which sources are used in a particular rule. Follow our publications: in the next part we will look at the most difficult part of Sigma rules - the section describing the detection logic.



Author: Anton Kutepov, specialist at the department of expert services and development, Positive Technologies (PT Expert Security Center)


