Making Friends with ELK and Exchange. Part 1





I'm starting a series of articles in which I want to share my experience of connecting Exchange and ELK. This stack helps process large volumes of logs without wondering at what size the usual log-analysis tools will give up on us. Let's get acquainted with our new log-fighting recruit.



Exchange has a fairly extensive logging system. The most popular logs are the tracking logs, which record the step-by-step path of a particular message through the mail organization; the web server logs, which record each new user session in the system; and the logs of individual web applications, with varying degrees of session detail. Exchange can also store raw SMTP, IMAP, and POP3 protocol logs.



What tools can we use to work with logs:



  • The standard Get-MessageTrackingLog cmdlet: convenient for processing tracking logs;
  • The logparser utility: uses a pseudo-SQL query language to search the logs and works fairly quickly;
  • An external SQL server: for particularly special cases (for example, analyzing data over long periods of time).


All this works well while we have a couple of servers and the volume of processed logs is measured in tens or hundreds of gigabytes. But what if there are dozens of servers and the logs have grown past a terabyte? That setup most likely starts to crumble.



And here is what happens: Get-MessageTrackingLog starts failing with timeouts, logparser hits the ceiling of its 32-bit architecture, and the upload to the SQL server breaks at the most inopportune moment, choking on a multi-line exception from the service.



Here a new player enters the scene: the ELK stack, which is purpose-built for juggling huge volumes of logs in reasonable time and with tolerable resource consumption.



In the first part, I will explain in detail how to set up filebeat, the part of the ELK stack that is responsible for reading plain text files to which various applications write their logs, and shipping them onward. The following articles will cover the Logstash and Kibana components in more detail.



Installation



So, the archive with the filebeat agent can be downloaded from the official Elastic downloads page.



Installation amounts to simply unpacking the contents of the zip archive, for example into c:\Program Files\filebeat. Then run the PowerShell script install-service-filebeat.ps1 that comes with the kit to install the filebeat service.



We are now ready to start customizing the configuration file.



Fault tolerance



Filebeat guarantees delivery of logs to the log collection system. It accomplishes this by maintaining a registry of log file state. The registry stores information about which records have been read from the log files and marks the specific records that have been delivered to the destination.



If a record cannot be delivered, filebeat will retry sending it until it receives a delivery confirmation from the receiving system, or until the original log file is deleted during rotation.



When the service restarts, filebeat reads the information about the last read and delivered records from the registry and resumes reading the log files from that point.



This minimizes the risk of losing log data that has to be sent to the Elasticsearch/Logstash servers during unforeseen failures and server maintenance operations.
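The registry mechanics can be illustrated with a minimal sketch. This is purely illustrative Python (filebeat itself is written in Go, and its real registry tracks file identity and more than just byte offsets):

```python
import os
import tempfile

def read_new_lines(log_path, registry):
    """Read only the lines appended since the offset recorded in the registry."""
    offset = registry.get(log_path, 0)
    with open(log_path, "r") as f:
        f.seek(offset)
        new_data = f.read()
        registry[log_path] = f.tell()  # remember how far we have read
    return new_data.splitlines()

# Simulate a log file that grows between two harvester passes.
fd, log_path = tempfile.mkstemp(suffix=".log")
os.close(fd)
with open(log_path, "w") as f:
    f.write("first record\n")

registry = {}  # filebeat persists this state to a registry file on disk
first_pass = read_new_lines(log_path, registry)   # only "first record"

with open(log_path, "a") as f:
    f.write("second record\n")

second_pass = read_new_lines(log_path, registry)  # only "second record"
os.remove(log_path)
```

In filebeat the registry survives service restarts because it is written to disk, so the second pass picks up exactly where the first left off.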



You can read more about this in the documentation sections How does Filebeat keep the state of files? and How does Filebeat ensure at-least-once delivery?



Setting up



All configuration is done in a configuration file in YAML format, which is divided into several sections. Let's look at the sections involved in collecting logs from Exchange servers.



Log processing block



The log processing block begins with the field:



filebeat.inputs:


We will use a common log collection tool:



- type: log


Next, we set the status (enabled) and the paths to the folders with logs. For example, for IIS logs the settings can be as follows:



  enabled: true
  paths:
    - C:\inetpub\logs\LogFiles\W3SVC1\*.log
    - C:\inetpub\logs\LogFiles\W3SVC2\*.log


Another important setting is how filebeat should read multi-line records. By default, filebeat treats each line of a log file as a separate record. This works fine until exceptions caused by a malfunctioning service start appearing in the log: an exception can span several lines. In Exchange, every new record in a log file begins with a date, so we tell filebeat to treat any line that does not start with a date as a continuation of the previous record. In the configuration this condition looks like this:



  multiline:
    pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
    negate: true
    match: after
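To make the semantics of negate: true and match: after concrete, here is an illustrative Python sketch of the grouping logic (the sample log lines and the exception text are made up for the example):

```python
import re

# Same pattern as in the filebeat config: a record starts with a date.
DATE_PATTERN = re.compile(r'^[0-9]{4}-[0-9]{2}-[0-9]{2}')

def group_multiline(lines):
    """Mimic multiline with negate: true, match: after:
    a line that does NOT match the pattern is appended to the
    previous event instead of starting a new one."""
    events = []
    for line in lines:
        if DATE_PATTERN.match(line) or not events:
            events.append(line)          # a new record begins
        else:
            events[-1] += "\n" + line    # continuation of the previous record
    return events

lines = [
    "2020-06-01 12:00:00 GET /owa 200",
    "2020-06-01 12:00:01 POST /ews 500",
    "System.Exception: something went wrong",
    "   at SomeModule.SomeMethod()",
    "2020-06-01 12:00:02 GET /ecp 200",
]
events = group_multiline(lines)
# 3 events: the two exception lines are folded into the 500 record
```

This way a multi-line stack trace arrives at the collector as a single event rather than as several meaningless fragments.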


It also makes sense to add tags to the events you send, for example:



  tags: ['IIS', 'ex-srv1']


And do not forget to exclude from processing the comment lines starting with a hash symbol:



  exclude_lines: ['^#']


So, the block for reading logs will look like this:



filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\inetpub\logs\LogFiles\W3SVC1\*.log
    - C:\inetpub\logs\LogFiles\W3SVC2\*.log
  multiline:
    pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
    negate: true
    match: after
  tags: ['IIS', 'ex-srv1']
  exclude_lines: ['^#']


Log sending block



Filebeat sends each log file entry as a JSON object, in which the log entry itself is contained in a single message field. If we want to do anything useful with this information, we first need to parse this field into separate fields. This can be done, for example, in Logstash, which will receive the records from filebeat. Here is how that might look in the filebeat configuration file:



output.logstash:
  hosts: ["logstash1.domain.com:5044"]
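For a sense of what that later parsing will do with the message field, here is a hypothetical Python sketch (the sample event, the column names, and the log line are all assumptions for illustration; in practice Logstash filters will do this job):

```python
import json

# A made-up event roughly as filebeat would ship it: the raw IIS log
# line sits in a single "message" field.
event = json.loads("""
{
  "@timestamp": "2020-06-01T12:00:00.000Z",
  "message": "2020-06-01 12:00:00 GET /owa 443 10.0.0.5 Mozilla/5.0 200",
  "tags": ["IIS", "ex-srv1"]
}
""")

# Split the message into named fields (hypothetical column layout),
# the way a Logstash filter would.
columns = ["date", "time", "method", "uri", "port", "client_ip", "agent", "status"]
fields = dict(zip(columns, event["message"].split()))
```

Once the message field is broken into named fields like these, Elasticsearch can index and query each of them separately.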


If there are several Logstash servers, load balancing can be enabled for them: filebeat will then distribute the logs among the servers in the list instead of always sending to the first available one:



  hosts: ["logstash1.domain.com:5044", "logstash2.domain.com:5044"]
  loadbalance: true


When filebeat wraps log entries into the outgoing JSON, it adds, alongside the log entry contained in the message field, a certain amount of metadata, which increases the size of the document that ends up in Elasticsearch. This metadata can be selectively removed before sending. This is done in the processors block using the drop_fields processor. For example, the following fields can be excluded:



processors:
- drop_fields:
    fields: ["agent.ephemeral_id", "agent.hostname", "agent.id", "agent.type", "agent.version", "agent", "ecs.version", "ecs", "input.type", "input", "log.offset", "version"]


The choice of excluded fields should be approached carefully, because some of them may be used on the Elasticsearch side to build indexes.
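The effect of drop_fields can be sketched like this. This is a simplified Python model operating on a flat dict with made-up values; real filebeat events are nested JSON documents:

```python
def drop_fields(event, fields):
    """Remove the listed keys from an event; missing keys are ignored."""
    return {k: v for k, v in event.items() if k not in set(fields)}

# A simplified, flattened view of a filebeat event (values are made up).
event = {
    "message": "2020-06-01 12:00:00 GET /owa 200",
    "agent.hostname": "ex-srv1",
    "agent.version": "7.8.0",
    "log.offset": 1024,
}

slim = drop_fields(event, ["agent.hostname", "agent.version", "log.offset"])
# slim now contains only the "message" field
```

Across millions of log entries per day, trimming this per-event metadata noticeably reduces the storage the documents consume in Elasticsearch.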



So, the block for sending logs will look like this:



output.logstash:
  hosts: ["logstash1.domain.com:5044", "logstash2.domain.com:5044"]
  loadbalance: true
 
processors:
- drop_fields:
    fields: ["agent.ephemeral_id", "agent.hostname", "agent.id", "agent.type", "agent.version", "agent", "ecs.version", "ecs", "input.type", "input", "log.offset", "version"]


Filebeat logging settings



It makes sense to set the following logging settings:



  • Logging level: info;
  • Write logs to files in the default location (the logs directory inside the filebeat installation directory);
  • Log file name: filebeat;
  • Keep the last 10 log files;
  • Start rotation when a file reaches 1 MB.


Finally, the logging settings block will look like this:



logging.level: info
logging.to_files: true
logging.files:
  name: filebeat
  keepfiles: 10
  rotateeverybytes: 1048576


Final configuration



Putting all the pieces together, the configuration looks like this:



filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\inetpub\logs\LogFiles\W3SVC1\*.log
    - C:\inetpub\logs\LogFiles\W3SVC2\*.log
  multiline:
    pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
    negate: true
    match: after
  tags: ['IIS', 'ex-srv1']
  exclude_lines: ['^#']
 
output.logstash:
  hosts: ["logstash1.domain.com:5044", "logstash2.domain.com:5044"]
  loadbalance: true
 
processors:
- drop_fields:
    fields: ["agent.ephemeral_id", "agent.hostname", "agent.id", "agent.type", "agent.version", "agent", "ecs.version", "ecs", "input.type", "input", "log.offset", "version"]
 
logging.level: info
logging.to_files: true
logging.files:
  name: filebeat
  keepfiles: 10
  rotateeverybytes: 1048576


It is important to remember that the configuration file format is YAML, so the placement of spaces and minus signs matters, and indentation must use spaces, not tabs.



Filebeat can validate the configuration file and, if the syntax contains errors, report the line and the position within it where the syntax is incorrect. The check is run as follows:



.\filebeat.exe test config


Filebeat can also check the network availability of the log receiver. The check is started like this:



.\filebeat.exe test output


In the next parts, I'll talk about connecting the Exchange logs themselves and about making friends with the Logstash and Kibana components.


