Fluentd: why tuning the output buffer is important



Today it is hard to imagine a Kubernetes-based project without an ELK stack for storing the logs of both applications and cluster system components. In our practice, we use the EFK stack, with Fluentd in place of Logstash.



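Fluentd is a modern, general-purpose log collector. It is a Cloud Native Computing Foundation project, so its development is oriented toward use alongside Kubernetes.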



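Using Fluentd instead of Logstash does not change the overall idea of the stack, but Fluentd has its own specifics that come from its versatility.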



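For example, when we started using EFK in a busy project with intensive logging, we found that some messages appeared in Kibana several times. This article explains why that happens and how to solve the problem.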





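In our projects, Fluentd is deployed as a DaemonSet (one instance runs automatically on every Kubernetes node) and tracks the stdout logs of containers in /var/log/containers. After collection and processing, the logs are sent as JSON documents to ElasticSearch, which runs either as a cluster or standalone, depending on the scale of the project and its performance and fault-tolerance requirements. Kibana is used as the graphical interface.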



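When using Fluentd with an output buffering plugin, we ran into a situation where some documents in ElasticSearch had exactly the same content and differed only in their identifier. An Nginx log makes it easy to confirm that such documents are repeats of one message. In the log file, the message exists in a single copy: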



127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0" -


However, ElasticSearch holds several documents that contain this message:



{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "HgGl_nIBR8C-2_33RlQV",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}

{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "IgGm_nIBR8C-2_33e2ST",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}


Moreover, there can be more than two repeats.



While this is happening, the Fluentd logs show a large number of warnings like this one:



2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4 next_retry_seconds=2020-01-16 01:46:53 +0000 chunk="59c37fc3fb320608692c352802b973ce" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"http\", :user=>\"elastic\", :password=>\"obfuscated\"}): read timeout reached"


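These warnings appear when ElasticSearch cannot answer a request within the time set by the request_timeout parameter, so the forwarded buffer chunk cannot be cleared. Fluentd then tries to send the chunk to ElasticSearch again, and after some number of attempts the operation succeeds: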



2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fc3fb320608692c352802b973ce" 
2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fad241ab300518b936e27200747" 
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fc11f7ab707ca5de72a88321cc2" 
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fb5adb70c06e649d8c108318c9b" 
2020-01-16 01:47:15 +0000 [warn]: [kube-system] retry succeeded. chunk_id="59c37f63a9046e6dff7e9987729be66f"


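However, ElasticSearch treats each resent buffer chunk as new data and assigns its documents unique _id values. This is how the repeated messages appear.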



In Kibana, the same message then shows up several times.







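There are several ways to solve this problem. One of them is the mechanism built into the fluent-plugin-elasticsearch plugin that generates a unique hash for each document. With it, ElasticSearch recognizes repeats at the forwarding stage and does not create duplicate documents. However, this approach fights the consequences and does not eliminate the timeout error itself, so we decided not to use it.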
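The idea behind identifier-based deduplication can be illustrated with a toy model. The sketch below is not the plugin's implementation: the in-memory dict standing in for an index, the content_id helper, and the sample record are all assumptions made for illustration. It only shows why a resent chunk creates new documents when identifiers are assigned on arrival, and overwrites the same document when the identifier is derived from the record before sending.

```python
import hashlib
import itertools
import json


def content_id(record: dict) -> str:
    """Derive a stable identifier from the record's content (hypothetical helper)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def index_chunk(index: dict, chunk: list, id_source) -> None:
    """Write every record of a chunk into a toy in-memory 'index' keyed by _id."""
    for record in chunk:
        index[id_source(record)] = record


# A single record, sent three times: the first attempt plus two retries
# that follow a read timeout even though the write actually went through.
chunk = [{"log": "GET / HTTP/1.1 200", "tag": "custom-log"}]

auto_ids = itertools.count(1)
auto_index = {}    # identifiers assigned on arrival, as with auto-generated _id
hashed_index = {}  # identifiers derived from the record itself

for _attempt in range(3):
    index_chunk(auto_index, chunk, lambda _record: f"auto-{next(auto_ids)}")
    index_chunk(hashed_index, chunk, content_id)

print(len(auto_index))    # 3 documents: every retry created a duplicate
print(len(hashed_index))  # 1 document: retries overwrote the same _id
```

Note that a purely content-based identifier would also merge two log lines that are legitimately identical, which is one more reason to treat this as a workaround for the symptom and not a fix for the timeout.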



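We use buffering on the Fluentd side to avoid losing logs during short-term network problems or bursts of logging. If for some reason ElasticSearch cannot write a document to the index immediately, the document goes into a queue that is stored on disk. So, in our case, eliminating the source of the error described above means choosing buffering parameter values that make the Fluentd output buffer large enough and, at the same time, allow it to be flushed in the allotted time.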



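Note that the parameter values discussed below are individual for each use of buffering in output plugins, because they depend on many factors: how intensively services write to the log, the performance of the disk subsystem, and the load and bandwidth of the network channel. To arrive at settings that fit a particular case without being excessive, and without a long blind search, you can use the debugging information that Fluentd writes to its own log during operation.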



At the time the problem was discovered, the configuration looked like this:



<buffer>
  @type file
  path /var/log/fluentd-buffers/kubernetes.test.buffer
  flush_mode interval
  retry_type exponential_backoff
  flush_thread_count 2
  flush_interval 5s
  retry_forever
  retry_max_interval 30
  chunk_limit_size 8M
  queue_limit_length 8
  overflow_action block
</buffer>


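While solving the problem, we selected the values of the following parameters manually:

  • chunk_limit_size: the size of the chunks into which messages in the buffer are split.
  • flush_interval: the interval after which the buffer is flushed.
  • queue_limit_length: the maximum number of chunks in the queue.
  • request_timeout: the time for which a connection between Fluentd and ElasticSearch is established.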


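The total buffer size can be calculated by multiplying queue_limit_length by chunk_limit_size, which can be read as "the maximum number of chunks in the queue, each of a given size". If the buffer is too small, the following warning appears in the logs: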



2020-01-21 10:22:57 +0000 [warn]: [test-prod] failed to write data into buffer by buffer overflow action=:block


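It means that the buffer cannot be flushed in the allotted time, and the data arriving at the full buffer is blocked, which leads to the loss of part of the logs.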
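The buffer arithmetic is easy to check by hand or with a few lines of code. The sketch below assumes that a size suffix such as M denotes a binary multiple (1024 * 1024 bytes); the ingest and drain rates are made-up numbers used only to show how quickly a buffer of that size would fill when logs arrive faster than they are flushed.

```python
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(value: str) -> int:
    """Convert a size such as '8M' into bytes (binary multiples assumed)."""
    value = value.strip().lower()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def buffer_capacity(chunk_limit_size: str, queue_limit_length: int) -> int:
    """Total buffer size: number of queued chunks times the size of each chunk."""
    return parse_size(chunk_limit_size) * queue_limit_length


def seconds_until_full(capacity_bytes: int, ingest_bps: float, drain_bps: float) -> float:
    """Time until the buffer overflows when data arrives faster than it drains."""
    net_rate = ingest_bps - drain_bps
    if net_rate <= 0:
        return float("inf")  # the buffer drains at least as fast as it fills
    return capacity_bytes / net_rate


# Values from the configuration in this article: 8 chunks of 8M each.
capacity = buffer_capacity("8M", 8)
print(capacity // 1024**2)  # 64 (MiB)

# Hypothetical rates: 2 MiB/s written by services, 1 MiB/s accepted downstream.
print(seconds_until_full(capacity, ingest_bps=2 * 1024**2, drain_bps=1 * 1024**2))  # 64.0
```

With these illustrative rates the buffer would overflow in about a minute, after which overflow_action block starts holding back new data.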



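The buffer can be enlarged in two ways: by increasing either the size of each chunk in the queue or the number of chunks the queue can hold.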



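If chunk_limit_size is set to more than 32 megabytes, ElasticSearch will not accept the chunk, because the incoming request will be too large. So if the buffer needs to grow further, it is better to increase the maximum queue length, queue_limit_length.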



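Once the buffer stops overflowing and only the timeout message remains, you can start increasing request_timeout. However, if it is set to more than 20 seconds, the following warnings begin to appear in the Fluentd logs: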



2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=20.85753920301795 slow_flush_log_threshold=20.0 plugin_id="postgresql-dev" 


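This message does not affect the operation of the system. It means that flushing the buffer took longer than the slow_flush_log_threshold parameter allows. This is debugging information, and we use it when choosing the value of request_timeout.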



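The general selection algorithm is as follows:



  1. Set request_timeout to a value that is guaranteed to be larger than necessary (hundreds of seconds). During tuning, the main sign that this parameter is set correctly is that the timeout warnings disappear.
  2. Wait for messages about exceeding slow_flush_log_threshold. The elapsed_time field in the warning text shows the real time it took to flush the buffer.
  3. Set request_timeout to a value larger than the maximum elapsed_time seen during the observation period. We calculate request_timeout as elapsed_time + 50%.
  4. To remove the warnings about long buffer flushes from the log, raise slow_flush_log_threshold. We calculate this value as elapsed_time + 25%.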
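Steps 2 to 4 can be sketched as a small script that reads the slow-flush warnings, takes the largest elapsed_time, and applies the +50% and +25% margins. The function names are illustrative, and rounding up to whole seconds is an assumption of this sketch; the sample line is the warning quoted earlier in this article.

```python
import math
import re

# Matches the elapsed_time field of a slow-flush warning.
ELAPSED_RE = re.compile(r"elapsed_time=(\d+(?:\.\d+)?)")


def max_elapsed_time(log_lines):
    """Return the largest elapsed_time found in the lines, or None if there is none."""
    values = [
        float(match.group(1))
        for line in log_lines
        for match in ELAPSED_RE.finditer(line)
    ]
    return max(values) if values else None


def suggest_timeouts(elapsed: float) -> dict:
    """Apply the margins from the algorithm: +50% and +25%, rounded up to seconds."""
    return {
        "request_timeout": math.ceil(elapsed * 1.5),
        "slow_flush_log_threshold": math.ceil(elapsed * 1.25),
    }


lines = [
    "2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time than "
    "slow_flush_log_threshold: elapsed_time=20.85753920301795 "
    'slow_flush_log_threshold=20.0 plugin_id="postgresql-dev"',
]

worst = max_elapsed_time(lines)
print(suggest_timeouts(worst))
# {'request_timeout': 32, 'slow_flush_log_threshold': 27}
```

In practice the input would be the warnings collected over the whole observation period, so that the margins are applied to the worst flush time actually seen.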


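The final values of these parameters, as noted earlier, are individual for each case. Following the algorithm above, we are guaranteed to eliminate the error that leads to repeated messages.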



, , , :



                             node-1         node-2         node-3         node-4
                             before/after   before/after   before/after   before/after
failed to flush the buffer   1749/2         694/2          47/0           1121/2
retry succeeded              410/2          205/1          24/0           241/2


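Keep in mind that the resulting settings can become outdated as the project grows and the volume of logs increases. The first sign of an insufficient timeout is the return of long-flush messages to the Fluentd log, that is, slow_flush_log_threshold being exceeded. From that point there is still a small margin before request_timeout is exceeded, so it is important to react to these messages in time and repeat the selection process described above.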
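To notice such a regression early, the warnings can be counted regularly, for example once a day per node. The sketch below is one possible way to do that; it assumes the warning format shown in the log excerpts in this article, where the bracketed label follows the severity, and the sample lines are shortened copies of those excerpts.

```python
import re
from collections import Counter

# Captures the bracketed label after the severity and the kind of warning.
WARNING_RE = re.compile(
    r"\[warn\]: \[(?P<label>[^\]]+)\] "
    r"(?P<kind>failed to flush the buffer|retry succeeded)"
)


def count_warnings(log_lines) -> Counter:
    """Count flush failures and successful retries per label."""
    counts = Counter()
    for line in log_lines:
        match = WARNING_RE.search(line)
        if match:
            counts[(match.group("label"), match.group("kind"))] += 1
    return counts


sample = [
    "2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4",
    '2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fc3fb320608692c352802b973ce"',
    '2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fad241ab300518b936e27200747"',
    '2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fc11f7ab707ca5de72a88321cc2"',
    '2020-01-16 01:47:15 +0000 [warn]: [kube-system] retry succeeded. chunk_id="59c37f63a9046e6dff7e9987729be66f"',
]

for (label, kind), total in sorted(count_warnings(sample).items()):
    print(f"{label:12} {kind:28} {total}")
```

A steady rise in either counter after a quiet period suggests that the timeouts should be selected again.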





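Fine-tuning the Fluentd output buffer is one of the main steps in configuring the EFK stack: it determines how stable the stack is and whether documents land in the indexes correctly. By following the tuning algorithm described above, you can be confident that all logs are written to the ElasticSearch index in the correct order, without repeats or losses.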


