In posts on Habr, structured logging is often mentioned, but only in passing. So when I came across this detailed article by Brandur Leach of Stripe, I decided to translate it and share it with the community.
Badoo . — , id , — . , — , .
Brandur Leach , . — Stripe , , — ( ).
At Stripe, we use a logging technique called canonical log lines.
We use logs for operational visibility into services such as the API, the PCI vault, and the Stripe Dashboard.
Consider a single request to the API. A typical API request might produce a log trace like this:
[2019-03-18 22:48:32.990] Request started
[2019-03-18 22:48:32.991] User authenticated
[2019-03-18 22:48:32.992] Rate limiting ran
[2019-03-18 22:48:32.998] Charge created
[2019-03-18 22:48:32.999] Request finished
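A common improvement is structured logging: each line carries machine-readable fields, written as JSON or as key-value pairs (for example, logfmt).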
With structured fields attached, the same trace looks like this:
[2019-03-18 22:48:32.990] Request started http_method=POST http_path=/v1/charges request_id=req_123
[2019-03-18 22:48:32.991] User authenticated auth_type=api_key key_id=mk_123 user_id=usr_123
[2019-03-18 22:48:32.992] Rate limiting ran rate_allowed=true rate_quota=100 rate_remaining=99
[2019-03-18 22:48:32.998] Charge created charge_id=ch_123 permissions_used=account_write team=acquiring
[2019-03-18 22:48:32.999] Request finished alloc_count=9123 database_queries=34 duration=0.009 http_status=200
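To make the key-value format concrete, here is a minimal Ruby sketch of how such lines could be produced. The helper names `format_logfmt` and `log_line` are hypothetical and are not taken from Stripe's code; the quoting rule is a simplifying assumption rather than a full logfmt implementation.

```ruby
# Hypothetical helper: renders a hash as logfmt-style key=value pairs.
def format_logfmt(fields)
  fields.map do |key, value|
    text = value.to_s
    # Assumption: quote values that are empty or contain whitespace,
    # quotes, or '=' so the line stays parseable.
    text = text.inspect if text.empty? || text.match?(/[\s"=]/)
    "#{key}=#{text}"
  end.join(" ")
end

# Hypothetical helper: prefixes the message with a millisecond timestamp.
def log_line(message, fields = {}, now = Time.now)
  timestamp = now.strftime("%Y-%m-%d %H:%M:%S.%L")
  "[#{timestamp}] #{message} #{format_logfmt(fields)}".rstrip
end

puts log_line("Request started",
              http_method: "POST", http_path: "/v1/charges", request_id: "req_123")
```

Keeping the human-readable message in front of the fields means the line still reads naturally in a terminal, while the fields stay easy to query.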
For example, to see the most recent API requests in Splunk:
"Request started" | head
Or to find requests that were rejected by API rate limiting:
"Rate limiting ran" rate_allowed=false
Or to get API request duration statistics for the last hour:
"Request finished" earliest=-1h | stats count p50(duration) p95(duration) p99(duration)
, Graphite StatsD, . , , , - . .
For example, to count requests by HTTP path:
"Request started" | stats count by http_path
Or, if the API starts returning 500 errors, to look at request durations for the failing requests:
"Request finished" http_status=500 | stats count p50(duration) p95(duration) p99(duration)
, , , . , (rate limiting) API, : « ?» - , .
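A canonical log line is a single line, emitted at the end of each request, that collects the request's key fields in one place: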
[2019-03-18 22:48:32.999] canonical-log-line alloc_count=9123 auth_type=api_key database_queries=34 duration=0.009 http_method=POST http_path=/v1/charges http_status=200 key_id=mk_123 permissions_used=account_write rate_allowed=true rate_quota=100 rate_remaining=99 request_id=req_123 team=acquiring user_id=usr_123
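This sample includes, among other things:
the HTTP method, path, and response status;
the authenticated user and how they authenticated (here, an API key), along with the ID of the API key;
whether the rate limiters allowed the request, and the quota and remaining capacity;
timing information, such as the total duration of the request;
the number of database queries issued and objects allocated.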
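One way to collect these fields is a small per-request accumulator that each layer of the stack writes into, rendered once at the end. The sketch below is only an illustration: the `CanonicalLine` class and its methods are hypothetical names, not Stripe's implementation.

```ruby
# Hypothetical per-request accumulator; class and method names are illustrative.
class CanonicalLine
  def initialize
    @fields = {}
  end

  # Record or overwrite a field as the request moves through the stack.
  def set(key, value)
    @fields[key] = value
  end

  # Increment a counter field, such as the number of database queries.
  def increment(key, by = 1)
    @fields[key] = @fields.fetch(key, 0) + by
  end

  # Render one line with keys sorted alphabetically, as in the sample line above.
  def to_s
    pairs = @fields.sort_by { |key, _| key.to_s }.map { |key, value| "#{key}=#{value}" }
    (["canonical-log-line"] + pairs).join(" ")
  end
end

line = CanonicalLine.new
line.set(:http_method, "POST")          # set by the routing layer
line.set(:http_path, "/v1/charges")
line.set(:user_id, "usr_123")           # set after authentication
line.set(:rate_allowed, true)           # set by the rate limiter
2.times { line.increment(:database_queries) }
line.set(:http_status, 200)             # set once the response is known
puts line
```

Sorting the keys keeps the output stable from request to request, which makes lines easier to compare by eye.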
. «» , , , . , , , . . , , , .
For example, to count rate-limited requests per user:
canonical-log-line rate_allowed=false | stats count by user_id
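For readers without Splunk at hand, the same question can be answered over raw log text. The Ruby sketch below parses canonical lines and counts rejected requests per user. The sample lines are made up for illustration, `parse_fields` and `rate_limited_by_user` are hypothetical helper names, and the parser assumes unquoted values without spaces.

```ruby
# Parse a logfmt-style line into a hash of string keys and values.
# Assumption: values are unquoted and contain no whitespace.
def parse_fields(line)
  line.scan(/(\w+)=(\S+)/).to_h
end

# Count canonical lines with rate_allowed=false, grouped by user_id.
def rate_limited_by_user(lines)
  lines
    .select { |line| line.include?("canonical-log-line") }
    .map { |line| parse_fields(line) }
    .select { |fields| fields["rate_allowed"] == "false" }
    .map { |fields| fields["user_id"] }
    .tally
end

# Made-up sample data, not real log output.
sample = [
  "[2019-03-18 22:48:32.999] canonical-log-line http_status=200 rate_allowed=true user_id=usr_123",
  "[2019-03-18 22:48:33.120] canonical-log-line http_status=429 rate_allowed=false user_id=usr_456",
  "[2019-03-18 22:48:33.450] canonical-log-line http_status=429 rate_allowed=false user_id=usr_456",
  "[2019-03-18 22:48:33.900] canonical-log-line http_status=429 rate_allowed=false user_id=usr_789",
]
p rate_limited_by_user(sample)
```

This only works because the user and the rate-limiting decision sit on the same line; with the multi-line trace, the lines would first have to be joined on a request ID.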
. charges
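The query below charts duration percentiles for one user's POST requests to /v1/charges, excluding responses with 4xx statuses: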
canonical-log-line user_id=usr_123 http_method=POST http_path=/v1/charges http_status!=4* | timechart p50(duration) p95(duration) p99(duration)
middleware
In the Stripe API, the canonical log line is emitted by a middleware.
A simplified version looks like this:
class CanonicalLineLogger
  def initialize(app)
    @app = app
  end

  def call(env)
    # Call into the core application and inner middleware
    status, headers, body = @app.call(env)

    # Emit the canonical line using response status and other
    # information embedded in the request environment
    log_canonical_line(status, env)

    # Return results upstream
    [status, headers, body]
  end
end
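In production, the line should be emitted from an ensure block (Ruby's equivalent of finally) so that it is written even when an exception is raised. The logging call itself should be wrapped in begin/rescue (Ruby's try/catch) so that a failure while logging cannot fail the request (for example, because of a bug in the logging code).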
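The two safeguards can be sketched as follows. This is an illustration under stated assumptions, not Stripe's middleware: the class name `SafeCanonicalLineLogger`, the chosen fields, and the decision to report an unhandled exception as status 500 are all hypothetical. `REQUEST_METHOD` and `PATH_INFO` are standard Rack environment keys.

```ruby
require "logger"

# Sketch of a Rack-style middleware that always emits a canonical line.
class SafeCanonicalLineLogger
  def initialize(app, logger: Logger.new($stdout))
    @app = app
    @logger = logger
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status = nil
    status, headers, body = @app.call(env)
    [status, headers, body]
  ensure
    # Runs on a normal return and when an exception propagates.
    log_canonical_line(status, env, started)
  end

  private

  def log_canonical_line(status, env, started)
    duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    fields = {
      http_method: env["REQUEST_METHOD"],
      http_path: env["PATH_INFO"],
      http_status: status || 500, # assumption: an unhandled exception is reported as 500
      duration: duration.round(3),
    }
    @logger.info("canonical-log-line " + fields.map { |k, v| "#{k}=#{v}" }.join(" "))
  rescue StandardError => e
    # A failure while logging must never fail the request itself.
    warn("canonical line logging failed: #{e.class}")
  end
end

app = SafeCanonicalLineLogger.new(->(_env) { [200, {}, ["ok"]] })
app.call({ "REQUEST_METHOD" => "POST", "PATH_INFO" => "/v1/charges" })
```

The method-level `rescue` inside `log_canonical_line` plays the role of the begin/rescue wrapper, and the return value of `call` is unaffected by the `ensure` clause.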
. -, , — , . , , .
Stripe , . , , , . , Google Protocol Buffers.
The API emits the lines to Kafka. From there they are archived to S3 and made available for querying through Presto and Redshift.
, . , Go, , API-:
, SQL, , .
For example, this query counts distinct users per week for each Go major.minor version:
SELECT
  DATE_TRUNC('week', created) AS week,
  REGEXP_SUBSTR(language_version, '\\d*\\.\\d*') AS major_minor,
  COUNT(DISTINCT user)
FROM events.canonical_log_lines
WHERE created > CURRENT_DATE - interval '2 months'
  AND language = 'go'
GROUP BY 1, 2
ORDER BY 1, 3 DESC
Google Protocol Buffers , Stripe . Developer Dashboard, API- .
. MapReduce , S3, . , Google Protocol Buffer, .
, . Kubernetes Elasticsearch, GCP — Google Stackdriver Logging. AWS CloudWatch. Fluentd . , : , , .
, - . , , . Kafka , . - Redis. Redshift BigQuery. , .
Kafka , .
. Stripe Developer Dashboard.