Let's talk about centralized logging

This article is a continuation of the text on monitoring . Here I propose to talk to you about the role of logs in assessing the state of the observed site, see what they can give us, and also raise the question - "is it possible to separate logs from metrics?"





Along the way, I will return to some of the theses expressed in the previous publication, so I recommend that you familiarize yourself with it first.





So let's talk about logging.





By the way, what will be correct: logging or logging? Personally, I lean towards the second option, simply because loGGing, but I notice that most people prefer the first. And you?






Debriefing

Before starting a new article, I want to briefly return to the previous one. Several topics were raised in the comments, which, in my opinion, should be given a few suggestions.





Collect everything or just the minimum amount?

Here my position is that you need to collect all the metrics that the object is able to give. As @BugM noted , they are in the database, they don't ask for food, they don't bother anyone. But if you don't have them, but you suddenly needed them, especially for, say, last month, then nothing can be done.





: « – , , , ».





ML, . , , () . , , ( ML), .





, , ?

. , , :





… ,





, , . , . , .





@sizziff .





«» , 150%, , , :





Engineer inundated with alerts
,

@Dr_Wut :





— , — spf. , , . — .





, , - , – , .





- -

.





- – «» , (, …). – .





- - – , - . , – BI-.





.





.






, , , , , , , .





, , . :





– ; , :





2019-04-23 00:39:10,092  INFO  DatabaseConnector – Connection estabilished
      
      



. – . /, , , .





– ; , . API. , , Nginx:





66.249.65.62 - - [06/Nov/2014:19:12:14 +0600] "GET /?q=%E0%A6%A6%E0%A7%8B%E0%A7%9F%E0%A6%BE HTTP/1.1" 200 4356 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
      
      



. , .





– , .





: , , , , – , , , , . , , , , , .





, « ?».





– , , .





– , database_error_count. , , - , , ( ) . :





2019-04-27 00:39:10,092  ERROR  DatabaseConnector – Error connecting to database MSSQLDB – connection refused on port 1433
      
      



– .





, . , , «» , , , , , , .





, . -, , , , … !





– HTTP- , , , , , , :





, , , . Observability – .





? ? ? ? , .





:





  • (99% - API - - , - )





  • (- API)





  • ( )





. .





, HTTP – .





. :





  1. DMZ (trace ID) ; !





  2. , , -, , -,





, trace ID , – .





, :





– , :









  • ;









, – Pull Push.





Pull – ( , , ), , //- . – ; – , .





Push – / / . , , .





, , ( , ), .





– plain text, jsonl, logsft, . – , .





– , .





:





@timestamp<time>:      
application<string>:  ,    ;      
host<string>:         ,    
log_type<string>:     ; application|access|.... (     application )
trace_id<string>:      ( )
      
      



.





, :





message<string>:           
generic_message<string>:    
level<string>:              
level_value<int>:           
logger_name<string>:      ,   ( )
thread_name<string>:      ,   ( )
stack_trace<string>:      ;     -      ( )
      
      



:





status_code<int>:              
elapsed_time<int>:          ,      
requested_resource<string>:  
method<string>:              
      
      



.





, .





:





  • – NoSQL , , . , , –





  • – , - , . , ,





, , «EMERGENCY», , , , , . , «FATAL» - .





, «generic_message». .





– ( , ).





– , . :





:





Error on AMQP connection <0.12956.79> (127.0.0.1:52879 -> 127.0.0.1:5672, state: starting):
      
      



, :





Error on AMQP connection <{connection_id}> ({remote_host} -> {destination_host}, state: {connection_state}):
      
      



.





? :





  • ; , , . ,





  • ; «session_id»





  • , ( , ), ( )





. .





, . Elasticsearch, , , Loki . , - https://habr.com/ru/company/badoo/blog/507718/.





, , .





:

























, ( , ).





:





  1. - , ERROR





  2. – , ( , )





  3. , – , , ,





:





The monitoring user moves from top to bottom, analyzing the incident
,

, :





  • ;





  • , ; , ,





, ?





, – , , . .





, , , – .





Perhaps later another article will appear, already with examples of the use of specific technologies and practices, in which we will try to implement what was described earlier and see how it works.








All Articles