Customize observability quickly and flexibly with canonical log lines

In posts on Habr, structured logging comes up often, but only in passing. So when I came across this detailed article by Brandur Leach of Stripe, I decided to translate it and share it with the community.





Badoo . — , id , — . , — , .





Brandur Leach , . — Stripe , , — ( ).





!






— . , «». .





— , - « » . , .





Stripe , , , (canonical log lines). : , , . .





, , -, (operational visibility) , . API, -, PCI- (PCI vault) Stripe Dashboard.





API -, . API :





[2019-03-18 22:48:32.990] Request started

[2019-03-18 22:48:32.991] User authenticated

[2019-03-18 22:48:32.992] Rate limiting ran

[2019-03-18 22:48:32.998] Charge created

[2019-03-18 22:48:32.999] Request finished
      
      



, . «» : JSON, , «-» ( logfmt). , .





:





[2019-03-18 22:48:32.990] Request started http_method=POST http_path=/v1/charges request_id=req_123

[2019-03-18 22:48:32.991] User authenticated auth_type=api_key key_id=mk_123 user_id=usr_123

[2019-03-18 22:48:32.992] Rate limiting ran rate_allowed=true rate_quota=100 rate_remaining=99

[2019-03-18 22:48:32.998] Charge created charge_id=ch_123 permissions_used=account_write team=acquiring

[2019-03-18 22:48:32.999] Request finished alloc_count=9123 database_queries=34 duration=0.009 http_status=200
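As a rough illustration of how key=value lines like the ones above might be produced, here is a minimal Ruby sketch. The helper name `logfmt_line` and its quoting rule are assumptions made for this example, not part of any library or of Stripe's code.

```ruby
# Hypothetical helper (name and quoting rule are assumptions for this sketch):
# renders a message plus a hash of fields as a logfmt-style line.
def logfmt_line(message, fields = {})
  pairs = fields.map do |key, value|
    text = value.to_s
    # Quote empty values and values containing whitespace, quotes, or "="
    # so that the line can still be split back into pairs.
    text = text.inspect if text.empty? || text.match?(/[\s"=]/)
    "#{key}=#{text}"
  end
  ([message] + pairs).join(" ")
end

puts logfmt_line("Request started",
                 http_method: "POST",
                 http_path: "/v1/charges",
                 request_id: "req_123")
# prints: Request started http_method=POST http_path=/v1/charges request_id=req_123
```

Keeping the formatting in one place means every call site emits fields in the same shape, which is what makes the lines searchable later.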
      
      



( - , , ). , . 





, , API . Splunk :





"Request started" | head
      
      



, - API:





"Rate limiting ran" rate_allowed=false
      
      



API :





"Request finished" earliest=-1h | stats count p50(duration) p95(duration) p99(duration)
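To make the aggregation in the query above concrete, the following Ruby sketch extracts `duration` from "Request finished" lines and computes percentiles with the nearest-rank method. Both function names are hypothetical, and Splunk may compute its percentiles differently (for example, by approximation on large result sets), so treat this only as an illustration of the idea.

```ruby
# Nearest-rank percentile: the smallest sample such that at least
# pct percent of all samples are less than or equal to it.
def percentile(values, pct)
  raise ArgumentError, "no samples" if values.empty?

  sorted = values.sort
  rank = (pct / 100.0 * sorted.length).ceil
  sorted[[rank, 1].max - 1]
end

# Hypothetical helper: pulls the duration= field out of every
# "Request finished" line and converts it to a Float.
def request_durations(lines)
  lines
    .select { |line| line.include?("Request finished") }
    .map { |line| line[/\bduration=(\S+)/, 1] }
    .compact
    .map(&:to_f)
end
```

With the durations in hand, `percentile(durations, 50)`, `percentile(durations, 95)`, and `percentile(durations, 99)` correspond to the three columns the query asks for.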
      
      



, Graphite StatsD, . , , , - . .





, — , . , , HTTP- :





"Request started" | stats count by http_path
      
      



API 500 ( ), , , - :





"Request finished" http_status=500 | stats count p50(duration) p95(duration) p99(duration)
      
      



, . , . , .





:

, , , . , (rate limiting) API, : « ?» - , .





, . - . — . , , .





. : ( ) , . :





[2019-03-18 22:48:32.999] canonical-log-line alloc_count=9123 auth_type=api_key database_queries=34 duration=0.009 http_method=POST http_path=/v1/charges http_status=200 key_id=mk_123 permissions_used=account_write rate_allowed=true rate_quota=100 rate_remaining=99 request_id=req_123 team=acquiring user_id=usr_123
      
      



, :





  • HTTP-, ;





  • , ( API, ), API-;





  • (rate limiters), ;





  • , ;





  • , .





, — , IETF URL.





. «» , , , . , , , . . , , , .





:





canonical-log-line rate_allowed=false | stats count by user_id
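For readers without a log aggregation system at hand, the same question (which users are being rate limited, and how often) can be answered over raw canonical lines with a few lines of Ruby. This is only a sketch: `parse_fields` and `rate_limited_counts` are hypothetical names, and the parser assumes simple unquoted key=value pairs.

```ruby
# Hypothetical parser: splits simple key=value pairs (no quoted values)
# out of a log line and returns them as a Hash of strings.
def parse_fields(line)
  line.scan(/(\w+)=(\S+)/).to_h
end

# Counts rate-limited requests per user across canonical log lines,
# mirroring "rate_allowed=false | stats count by user_id".
def rate_limited_counts(lines)
  lines
    .select { |line| line.include?("canonical-log-line") }
    .map { |line| parse_fields(line) }
    .select { |fields| fields["rate_allowed"] == "false" }
    .group_by { |fields| fields["user_id"] }
    .transform_values(&:count)
end
```

Because every field of interest sits on one line per request, no joining across lines is needed: a single filter and a single grouping are enough.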
      
      







, , , . , , , .





. charges



, 4, . , , . :





canonical-log-line user_id=usr_123 http_method=POST http_path=/v1/charges http_status!=4* | timechart p50(duration) p95(duration) p99(duration)
      
      



API request duration at the 50th, 95th, and 99th percentiles (generated on the fly from logs)

middleware

, , .





API Stripe middleware . , , , middleware .





:





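class CanonicalLineLogger
  # Standard Rack middleware constructor: keep a reference to the
  # next application (or middleware) in the stack
  def initialize(app)
    @app = app
  end

  def call(env)
    # Call into the core application and inner middleware
    status, headers, body = @app.call(env)

    # Emit the canonical line using response status and other
    # information embedded in the request environment
    # (log_canonical_line is omitted here for brevity)
    log_canonical_line(status, env)

    # Return results upstream
    [status, headers, body]
  end
end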
      
      







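This example is simplified and omits a few details. The logging call belongs in an ensure block (Ruby's counterpart to finally in other languages) so that the line is still emitted when an exception is raised. The call itself should be wrapped in begin/rescue (Ruby's try/catch) so that a problem in logging never fails the original request (which most likely succeeded).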
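One way to apply ensure and begin/rescue to the middleware above is sketched here. Several details are assumptions made for the example rather than Stripe's implementation: the `logger:` keyword argument, the `"canonical_log.fields"` env key that inner layers would use to contribute fields, and reporting an unhandled exception as status 500. `REQUEST_METHOD` and `PATH_INFO` are standard Rack env keys.

```ruby
require "logger"

class CanonicalLineLogger
  # `logger:` is an assumption of this sketch; any object that responds to #info works.
  def initialize(app, logger: Logger.new($stdout))
    @app = app
    @logger = logger
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status = nil
    status, headers, body = @app.call(env)
    [status, headers, body]
  ensure
    # Runs on normal return and when @app.call raises, so the canonical
    # line is emitted either way; the exception still propagates afterwards.
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    log_canonical_line(status, env, duration)
  end

  private

  def log_canonical_line(status, env, duration)
    fields = {
      duration: duration.round(3),
      http_method: env["REQUEST_METHOD"],
      http_path: env["PATH_INFO"],
      # Assumption: a request that raised is reported as a 500.
      http_status: status || 500
    }
    # Hypothetical convention: inner layers add their own fields under this key.
    fields.merge!(env.fetch("canonical_log.fields", {}))
    pairs = fields.sort_by { |key, _| key.to_s }.map { |key, value| "#{key}=#{value}" }
    @logger.info("canonical-log-line #{pairs.join(' ')}")
  rescue StandardError => e
    # A failure while logging must never fail the request itself.
    warn "canonical log line failed: #{e.class}: #{e.message}"
  end
end
```

In a Rack application this would be mounted with `use CanonicalLineLogger` in `config.ru`, placed near the top of the stack so that it wraps as much of the request as possible.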





. -, , — , . , , .





Stripe , . , , , . , Google Protocol Buffers.





API Kafka. , S3. Presto Redshift, , .





, . , Go, , API-:





Go version usage (data drawn from canonical log lines archived in our data warehouse)

, SQL, , . 





:





SELECT
    DATE_TRUNC('week', created) AS week,
    REGEXP_SUBSTR(language_version, '\\d*\\.\\d*') AS major_minor,
    COUNT(DISTINCT user)
FROM events.canonical_log_lines
WHERE created > CURRENT_DATE - interval '2 months'
    AND language = 'go'
GROUP BY 1, 2
ORDER BY 1, 3 DESC
      
      



Google Protocol Buffers , Stripe . Developer Dashboard, API- .





Developer Dashboard displays the number of successful API requests for the Stripe account (data generated from canonical log lines archived in S3)

. MapReduce , S3, . , Google Protocol Buffer, .





, . , .





. , .





, . Kubernetes Elasticsearch, GCP — Google Stackdriver Logging. AWS CloudWatch. Fluentd . , : , , .





, - . , , . Kafka , . - Redis. Redshift BigQuery. , .





, .





  • . .





  • , , .





  • Kafka , .





  • . Stripe Developer Dashboard.





— , , , . , , .











