The other day I watched an OWOX webinar in which Andrey Osipov (web analyst, author of the web-analytics.me blog, and lecturer at the Andrey Osipov School of Web Analytics) shared his experience of using dbt. He explained who the tool is useful for, what problems it solves, and, most importantly, how to avoid getting lost in a complex table hierarchy and be sure that all data is calculated correctly. I decided to transcribe the webinar into an article because it's more convenient to return to the information in written form, and, believe me, it's worth it.
Why do you need dbt
SQL-? Google BigQuery, Google Cloud , , scheduled queries.
, ( ) 2-4 , .
, .
. , CRM, , , , Google BigQuery .
, β . , , , , .
, . , , β .
( scheduled queries) :
. scheduled query , . , , . , 5 . - , 7 . . - , 40.
. Google Docs , , .. , . , . .
. , , , Google BigQuery . , . , . - . , timestamp .
SQL. , , . BigQuery. , dbt GBQ 20 , . , .
. , dbt.
dbt (data build tool)
SQL , . dbt? -, , -, , . , .
:
dbt.
:
.
, . Cloud Functions, Cloud Run, OWOX BI Pipeline .
, : , , .
, BI- - .
dbt β , , . , scheduled queries.
dbt
dbt : .
β , , view table.
(.sql) β , SELECT-.
, . , (, ) descriptions .
, BigQuery. , , GBQ.
(.yml) β , , , .
dbt CLI
dbt : cloud. , Google Cloud .
. β dbt run, . . dbt run, dbt test, , , .
dbt Cloud Source Repositories GitHub. cloud-, dbt .
dbt :
, Atom, , , , . - .
git push . dbt Cloud Build .
, , - .
dbt Cloud
, dbt Cloud. - . , , . , . . , β $50 .
dbt
(Refs…)
β . dbt . , -, , -, (Directed Acyclic Graph).
(Directed Acyclic Graph)
, dbt. , . , , Google BigQuery. , 2-3 , , .
dbt , . . , . - , .
, , - . , ( , ), . , .
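To make the idea of references forming a Directed Acyclic Graph more concrete, here is a minimal Python sketch. It is not how dbt is implemented. It only illustrates the principle: each model names the models it depends on through `ref`, and a build order follows from those dependencies. The model names, the SQL text, and the regex-based parsing are all assumptions made up for this illustration.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL text with dbt-style {{ ref('...') }} calls.
MODELS = {
    "stg_sessions": "select * from raw.sessions",
    "stg_orders": "select * from raw.orders",
    "sessions_with_orders": (
        "select * from {{ ref('stg_sessions') }} s "
        "left join {{ ref('stg_orders') }} o using (session_id)"
    ),
    "daily_report": "select * from {{ ref('sessions_with_orders') }}",
}

# Simplified pattern for {{ ref('model_name') }}; real Jinja parsing is richer.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_graph(models):
    """Map every model to the set of models it references."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def build_order(models):
    """Return the models so that every dependency comes before its dependents.

    graphlib raises CycleError if the references form a loop, which is why
    the graph has to be acyclic.
    """
    return list(TopologicalSorter(build_graph(models)).static_order())


if __name__ == "__main__":
    for model_name in build_order(MODELS):
        print(model_name)
```

The point of the sketch is that nobody has to maintain the run order by hand: it is derived from the references themselves, so adding a new model only requires writing its `ref` calls.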
(Loops)
β Jinja. . , , . .
, Google Analytics , , . , - , . .
, dbt , .. β , , . , . , , .
(Variables)
dbt : .
(Macros)
, «Macros». . , , . , .
, dbt .
, , , , .. , - .
Incremental
, .
, dbt. BigQuery , select * from [ ], . , GA 4 OWOX BI , . , .
dbt (, order_id), .
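The idea of an incremental update keyed on a unique field such as order_id can be sketched in a few lines of Python. This is only an illustration of the principle (new rows are appended, and rows whose key already exists are overwritten instead of duplicated), not dbt's actual implementation, which runs SQL in the warehouse. The function name and the sample rows are hypothetical.

```python
def incremental_merge(existing_rows, new_rows, unique_key="order_id"):
    """Merge new rows into existing ones without creating duplicates.

    Rows are plain dicts. A new row whose key already exists replaces the old
    row; a row with an unseen key is appended.
    """
    merged = {row[unique_key]: row for row in existing_rows}
    for row in new_rows:
        merged[row[unique_key]] = row
    return list(merged.values())


if __name__ == "__main__":
    table = [
        {"order_id": 1, "status": "new"},
        {"order_id": 2, "status": "new"},
    ]
    fresh = [
        {"order_id": 2, "status": "paid"},  # updates an existing order
        {"order_id": 3, "status": "new"},   # a brand-new order
    ]
    for row in incremental_merge(table, fresh):
        print(row)
```

Only the fresh batch has to be read on each run, which is where the saving comes from compared with rebuilding the whole table.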
, GA 4 , events, . , , . intraday events .
, , . , . , (- ), . , , .
, , . .
, , .
, dbt:
Not Null.
Unique.
Reference Integrity (for example, customer_id in orders matches id in customers).
.
Custom data tests.
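To show what the Not Null, Unique, and Reference Integrity checks from the list above verify, here is a small Python sketch. It mirrors the logic only: each check returns the offending records, and the test passes when nothing is returned. In dbt itself such tests are declared in configuration and executed as SQL; the function names and sample data here are hypothetical.

```python
from collections import Counter


def not_null_failures(rows, column):
    """Rows in which the column is missing or None."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows, column):
    """Non-null values that occur more than once in the column."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


def relationship_failures(child_rows, child_col, parent_rows, parent_col):
    """Child rows whose non-null key has no matching parent row."""
    parent_keys = {row[parent_col] for row in parent_rows}
    return [
        row
        for row in child_rows
        if row.get(child_col) is not None and row[child_col] not in parent_keys
    ]


if __name__ == "__main__":
    customers = [{"id": 1}, {"id": 2}]
    orders = [
        {"order_id": 10, "customer_id": 1},
        {"order_id": 11, "customer_id": 3},     # no such customer
        {"order_id": 11, "customer_id": None},  # duplicate id, null customer
    ]
    print(not_null_failures(orders, "customer_id"))
    print(unique_failures(orders, "order_id"))
    print(relationship_failures(orders, "customer_id", customers, "id"))
```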
β , , .
dbt . descriptions, , , , , , Google BigQuery.
.
DEV - TEST - PROD
dbt . , , , , , . , Google BigQuery.
Git
dbt Git, GitHub Google Cloud Source Repositories, , .
(Logging via webhooks)
- pop-up Google Cloud, . GBQ , .
dbt
dbt Cloud
dbt cloud : , , - cron. . , , - . , , .
schedule, .
Google Cloud - Cloud Shell
dbt. Google Cloud , Cloud Shell. , AppEngine. dbt , AppEngine.
, , , . , .
Google Cloud - Cloud Run
dbt, Google Cloud Cloud Run :
Atom , .
Cloud .
On each push, Cloud Build generates a new version of our Cloud Run service, and Cloud Scheduler launches it on the required schedule.
dbt, running in Cloud Run, calculates everything in BigQuery, and from there the data goes to Data Studio.
Logging can be added via notifications in Telegram according to certain rules: for example, not for every push, but only for important changes.
Such an infrastructure makes it quite easy to transfer queries from one Cloud project to another and to control everything that happens with calculations in dbt. By using Git, you clearly understand who on your team pushed what, where, and why.