Working with dbt powered by Google BigQuery

The other day I watched an OWOX webinar in which Andrey Osipov (web analyst, author of the web-analytics.me blog, and lecturer at the Andrey Osipov School of Web Analytics) talked about his experience with dbt. He explained who will find the tool useful and what problems it solves. Most importantly, he showed how to avoid getting lost in a complex table hierarchy and how to be sure that all data is calculated correctly. I decided to turn the webinar into an article because written material is easier to come back to, and, believe me, it's worth it.





Why do you need dbt

SQL-? Google BigQuery, Google Cloud , , scheduled queries.





, ( ) 2-4 , . 





, .





. , CRM, , , , Google BigQuery .





, - . , , , , .





, . , , β€” . 





( scheduled queries) : 





  1. . scheduled query , . , , . , 5 . - , 7 . . - , 40.





  2. . Google Docs , , .. , . , . .





  3. . , , , Google BigQuery . , . , . - . , timestamp . 





  4. SQL. , , . BigQuery. , dbt GBQ 20 , . , .





. , dbt. 





dbt (data build tool)

SQL , . dbt? -, , -, , . , . 









dbt.









  1. .





  2. , . Cloud Functions, Cloud Run, OWOX BI Pipeline .





  3. , : , , .





  4. , BI- - . 





dbt - , , . , scheduled queries.





dbt

dbt : . 





- , , view or table.





Models (.sql): files, each containing a SELECT query.





, . , (, ) descriptions . 





, BigQuery. , , GBQ.





(.yml) - , , , .
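To make the split between model files and materialization more concrete, here is a simplified Python sketch of the idea. It is not dbt's own code. The model file holds only a SELECT, and the chosen materialization decides whether that SELECT is wrapped into a view or a table. The function `materialize` and the project, dataset, and model names are hypothetical, and the DDL that dbt actually generates for BigQuery contains more detail.

```python
from typing import Literal

Materialization = Literal["view", "table"]


def materialize(project: str, dataset: str, model_name: str,
                select_sql: str, materialized: Materialization = "view") -> str:
    """Wrap a model's SELECT into BigQuery DDL (simplified, not dbt's real output)."""
    if materialized not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {materialized}")
    relation = f"`{project}.{dataset}.{model_name}`"
    # The model file contains only the SELECT; a trailing semicolon is dropped
    # because the query gets embedded inside the DDL statement.
    body = select_sql.strip().rstrip(";")
    return f"create or replace {materialized} {relation} as (\n{body}\n)"


# Hypothetical model file models/orders_daily.sql
orders_daily = """
select date(created_at) as day, count(*) as orders
from `my-project.raw.orders`
group by day
"""

print(materialize("my-project", "analytics", "orders_daily", orders_daily, "table"))
```

Switching a model from a view to a table changes only the materialization setting. The SELECT itself stays the same.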





dbt CLI

dbt : cloud. , Google Cloud . 





. - dbt run, . . dbt run, dbt test, , , .
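Here is a minimal Python sketch of that CLI workflow. It assumes dbt is installed and a profile with a target named `dev` exists; the target name and both helper functions are hypothetical. The sketch builds `dbt run` and `dbt test` command lines with the real `--target` and `--select` flags. It stops at the first command that exits with a non-zero code, so tests are not run against models that failed to build.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def build_commands(target: str = "dev", select: Optional[str] = None) -> List[List[str]]:
    """Return `dbt run` and `dbt test` command lines for one target and selection."""
    commands = []
    for verb in ("run", "test"):
        cmd = ["dbt", verb, "--target", target]
        if select:
            cmd += ["--select", select]  # e.g. a model name or "tag:daily"
        commands.append(cmd)
    return commands


def run_in_order(commands: Sequence[List[str]],
                 runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Run each command; stop at the first one that exits with a non-zero code."""
    for cmd in commands:
        result = runner(cmd)
        if result.returncode != 0:
            print(f"stopped: {' '.join(cmd)} exited with {result.returncode}")
            return False
    return True


# Usage (needs dbt installed and a configured profile):
# ok = run_in_order(build_commands(target="dev", select="orders_daily"))
```

The `runner` parameter exists only so the logic can be exercised without dbt. In normal use it stays `subprocess.run`.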





dbt Cloud Source Repositories GitHub. cloud-, dbt . 





dbt : 





  • , Atom, , , , . - . 





  • git push . dbt Cloud Build .





, , - . 





dbt Cloud

, dbt Cloud. - . , , . , . . , - $50 .





dbt 

Refs…

, - . dbt . , -, , -, (Directed Acyclic Graph).





Directed Acyclic Graph

, dbt. , . , , Google BigQuery. , 2-3 , , . 





dbt , . . , . - , .





, , - . , ( , ), . , . 
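A small Python sketch can illustrate the dependency graph. It scans each model's SQL for `{{ ref('...') }}` calls and treats every reference as an edge. It then orders the models with the standard library's `graphlib.TopologicalSorter`, which raises `CycleError` if the references form a loop. This only approximates what dbt does when it parses a project. The regex handles just the single-argument `ref()` form, and the model names are hypothetical.

```python
import re
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Matches the single-argument form {{ ref('model_name') }} with either quote style.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"](\w+)['"]\s*\)\s*\}\}""")


def find_refs(sql: str) -> set[str]:
    """Names of the models a model's SQL points to via ref()."""
    return set(REF_PATTERN.findall(sql))


def run_order(models: dict[str, str]) -> list[str]:
    """Order models so each one comes after every model it references.

    Raises graphlib.CycleError when the references loop back on themselves,
    i.e. when the graph is not acyclic.
    """
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical project with four models.
models = {
    "stg_orders": "select * from `my-project.raw.orders`",
    "stg_customers": "select * from `my-project.raw.customers`",
    "customer_orders": (
        "select c.customer_id, count(*) as orders "
        "from {{ ref('stg_customers') }} c "
        "join {{ ref('stg_orders') }} o using (customer_id) "
        "group by c.customer_id"
    ),
    "customer_report": "select * from {{ ref('customer_orders') }}",
}

print(run_order(models))

try:
    run_order({"a": "select * from {{ ref('b') }}", "b": "select * from {{ ref('a') }}"})
except CycleError as err:
    print("cycle detected:", err.args[1])
```

Because the order is derived from the references themselves, adding a new model never requires manually re-timing the scheduled queries that depend on it.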





Loops

- Jinja. . , , . .





, Google Analytics , , . , - , . . 





, dbt , .. - , , . , . , , .
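In dbt, this kind of loop is written with Jinja's `{% for %}` tag inside the model. To stay within one language, the sketch below builds the same SQL in plain Python, and the comment shows roughly what the Jinja version could look like. The table name, the `event_name`/`event_date` columns, and the event list are assumptions modeled on a GA4-style events table.

```python
# Roughly equivalent Jinja inside a dbt model (hypothetical):
#   select
#     event_date,
#     {% for event in events %}
#     countif(event_name = '{{ event }}') as {{ event }}_count{% if not loop.last %},{% endif %}
#     {% endfor %}
#   from {{ source('ga4', 'events') }}
#   group by event_date


def event_counts_sql(source_table: str, events: list[str]) -> str:
    """Build one countif() column per event, as the Jinja loop above would.

    Event names are used as column name prefixes, so they are assumed to be
    valid SQL identifiers.
    """
    if not events:
        raise ValueError("need at least one event name")
    columns = [f"countif(event_name = '{event}') as {event}_count" for event in events]
    return (
        "select\n  event_date,\n  "
        + ",\n  ".join(columns)
        + f"\nfrom {source_table}\ngroup by event_date"
    )


print(event_counts_sql("`my-project.analytics_123.events_*`",
                       ["page_view", "add_to_cart", "purchase"]))
```

Adding a new event to the list adds a column without copying another line of SQL by hand.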





Variables

dbt : . 
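In dbt, a model reads a variable with `{{ var('name') }}` or `{{ var('name', default) }}`. Values can be set in `dbt_project.yml` or passed on the command line, e.g. `dbt run --vars '{"start_date": "2021-06-01"}'`. The Python sketch below mimics that lookup order: command line first, then the project file, then the default. The function and variable names are hypothetical. Real dbt also supports package-scoped variables, which are left out here.

```python
import json
from typing import Any, Mapping

_MISSING = object()  # sentinel for "no default given", so None can still be a valid default


def resolve_var(name: str, cli_vars: Mapping[str, Any],
                project_vars: Mapping[str, Any], default: Any = _MISSING) -> Any:
    """Look a variable up the way var() does: --vars, then dbt_project.yml, then default."""
    if name in cli_vars:
        return cli_vars[name]
    if name in project_vars:
        return project_vars[name]
    if default is not _MISSING:
        return default
    raise KeyError(f"required variable '{name}' was not provided")


# Values from dbt_project.yml (hypothetical) and from
#   dbt run --vars '{"start_date": "2021-06-01"}'
project_vars = {"start_date": "2021-01-01", "country": "UA"}
cli_vars = json.loads('{"start_date": "2021-06-01"}')  # JSON is valid YAML, so dbt reads it the same way

print(resolve_var("start_date", cli_vars, project_vars))        # command line wins
print(resolve_var("country", cli_vars, project_vars))           # falls back to the project file
print(resolve_var("lookback_days", cli_vars, project_vars, 3))  # falls back to the default
```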





Macros

, "Macros". . , , . , .





, dbt .





, , , , .. , - . 
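As an illustration, suppose several models need to turn a microsecond timestamp (such as `event_timestamp` in the GA4 BigQuery export) into a local date. In dbt, that logic would live in one macro. In the Python sketch below, a function that returns the SQL fragment plays the macro's role, and the comment shows a possible Jinja equivalent. The macro name `micros_to_date`, the time zone, and the table names are hypothetical.

```python
# Possible macro file macros/micros_to_date.sql (hypothetical):
#   {% macro micros_to_date(column, timezone='UTC') %}
#     date(timestamp_micros({{ column }}), '{{ timezone }}')
#   {% endmacro %}
# Models would then call {{ micros_to_date('event_timestamp', 'Europe/London') }}.


def micros_to_date(column: str, timezone: str = "UTC") -> str:
    """Return the SQL fragment that converts a microsecond timestamp to a local date."""
    return f"date(timestamp_micros({column}), '{timezone}')"


def daily_events_sql(table: str, timezone: str) -> str:
    # First model reusing the shared fragment.
    day = micros_to_date("event_timestamp", timezone)
    return f"select {day} as day, count(*) as events from {table} group by day"


def daily_users_sql(table: str, timezone: str) -> str:
    # Second model: same conversion, no copy-pasted expression.
    day = micros_to_date("event_timestamp", timezone)
    return (f"select {day} as day, count(distinct user_pseudo_id) as users "
            f"from {table} group by day")


print(daily_events_sql("`my-project.analytics_123.events_*`", "Europe/London"))
```

If the conversion rule ever changes (for example, a different time zone), only the macro has to be edited.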





Incremental

, . 





, dbt. BigQuery , select * from [ ], . , GA4 OWOX BI , . , .





dbt (, order_id), . 





, GA4 , events, . , , . intraday events .





, , . , . , (- ), . , , . 





, , . . 
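Here is a minimal Python sketch of the incremental logic discussed above, under simplifying assumptions. The existing table and the source are lists of dicts, and new rows are picked by date, which is the role the `is_incremental()` filter plays in a dbt model. A lookback window re-reads the last days to catch late-arriving data such as GA4 intraday events. The rows are then merged on a unique key such as `order_id`, so a re-read row replaces its old version instead of creating a duplicate. Column names and the lookback size are hypothetical.

```python
from datetime import date, timedelta
from typing import Any, Dict, List

Row = Dict[str, Any]


def incremental_batch(source: List[Row], existing: List[Row],
                      date_field: str, lookback_days: int = 0) -> List[Row]:
    """Pick the source rows to (re)process, like an is_incremental() filter.

    First run (empty target): take everything. Later runs: take rows from the
    latest date already loaded minus `lookback_days`, so late data is re-read.
    """
    if not existing:
        return list(source)
    latest = max(row[date_field] for row in existing)
    cutoff = latest - timedelta(days=lookback_days)
    return [row for row in source if row[date_field] >= cutoff]


def merge_on_key(existing: List[Row], batch: List[Row], unique_key: str) -> List[Row]:
    """Upsert: a row whose key already exists replaces the old one, others are appended."""
    merged = {row[unique_key]: row for row in existing}
    for row in batch:
        merged[row[unique_key]] = row
    return list(merged.values())


existing = [
    {"order_id": 1, "day": date(2021, 5, 1), "status": "new"},
    {"order_id": 2, "day": date(2021, 5, 2), "status": "new"},
]
source = [
    {"order_id": 1, "day": date(2021, 5, 1), "status": "new"},
    {"order_id": 2, "day": date(2021, 5, 2), "status": "paid"},  # changed after the last run
    {"order_id": 3, "day": date(2021, 5, 3), "status": "new"},   # new row
]

batch = incremental_batch(source, existing, "day", lookback_days=0)
table = merge_on_key(existing, batch, "order_id")
print([(r["order_id"], r["status"]) for r in table])
```

Without the unique key, re-reading the last day would append order 2 a second time instead of updating it.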





, , .





, dbt:





  • Not Null.





  • Unique.





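  • Referential Integrity: checks that values in one table refer to existing records in another (for example, that every customer_id in orders matches an id in customers).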





  • Accepted Values.





  • Custom data tests.
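To show what these checks actually verify, here is a Python sketch with one function per test. It follows the convention of dbt tests, where a test passes when its query finds no failing rows. Each function returns the offending rows or values, and an empty list means the test passed. Like the SQL behind dbt's built-in tests, `unique`, `relationships`, and `accepted_values` skip NULLs. The column names and the custom check (no negative revenue) are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def not_null(rows: List[Row], column: str) -> List[Row]:
    """Rows where the column is NULL (None)."""
    return [r for r in rows if r.get(column) is None]


def unique(rows: List[Row], column: str) -> List[Any]:
    """Non-NULL values that occur more than once."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


def relationships(child: List[Row], column: str, parent: List[Row], parent_column: str) -> List[Row]:
    """Child rows whose non-NULL key has no matching row in the parent table."""
    parent_keys = {r[parent_column] for r in parent}
    return [r for r in child if r.get(column) is not None and r[column] not in parent_keys]


def accepted_values(rows: List[Row], column: str, values: Iterable[Any]) -> List[Row]:
    """Rows with a non-NULL value outside the allowed list."""
    allowed = set(values)
    return [r for r in rows if r.get(column) is not None and r[column] not in allowed]


def no_negative_revenue(rows: List[Row]) -> List[Row]:
    """A custom check (hypothetical business rule): revenue must not be negative."""
    return [r for r in rows if (r.get("revenue") or 0) < 0]


customers = [{"id": 10}, {"id": 11}]
orders = [
    {"id": 1, "customer_id": 10, "status": "paid", "revenue": 120},
    {"id": 2, "customer_id": 99, "status": "lost", "revenue": -5},
    {"id": 2, "customer_id": None, "status": "new", "revenue": 40},
]

checks = {
    "unique(orders.id)": unique(orders, "id"),
    "not_null(orders.customer_id)": not_null(orders, "customer_id"),
    "relationships(orders.customer_id -> customers.id)": relationships(orders, "customer_id", customers, "id"),
    "accepted_values(orders.status)": accepted_values(orders, "status", ["new", "paid"]),
    "no_negative_revenue(orders)": no_negative_revenue(orders),
}
for name, failures in checks.items():
    status = "PASS" if not failures else f"FAIL ({len(failures)})"
    print(f"{name}: {status}")
```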





- , , .





dbt . descriptions, , , , , , Google BigQuery. 





.





DEV / TEST / PROD

dbt . , , , , , . , Google BigQuery.
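One common way to separate environments in dbt is to define several targets (for example `dev`, `test`, and `prod`) in `profiles.yml` and pick one with `--target`. For BigQuery, the target decides which project and dataset the models land in. dbt's default `generate_schema_name` macro also appends a model's custom schema, if one is set, to the target dataset. The Python sketch below reproduces that default naming rule. The project and dataset names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Target:
    name: str     # target name as in profiles.yml, e.g. "dev"
    project: str  # GCP project the target writes to
    dataset: str  # BigQuery dataset (dbt's "schema")


def generate_schema_name(custom_schema: Optional[str], target: Target) -> str:
    """Python version of the rule in dbt's default generate_schema_name macro."""
    if custom_schema is None:
        return target.dataset
    return f"{target.dataset}_{custom_schema.strip()}"


def relation(model: str, target: Target, custom_schema: Optional[str] = None) -> str:
    """Fully qualified BigQuery name the model would be built as in this target."""
    return f"`{target.project}.{generate_schema_name(custom_schema, target)}.{model}`"


# Hypothetical targets; in a real project these live in profiles.yml.
TARGETS: Dict[str, Target] = {
    "dev": Target("dev", "my-project-dev", "dbt_dev"),
    "test": Target("test", "my-project-test", "dbt_test"),
    "prod": Target("prod", "my-project", "analytics"),
}

for name, target in TARGETS.items():
    print(name, relation("orders_daily", target, custom_schema="marts"))
```

The same model code therefore builds into a separate dataset per environment, so experiments in `dev` never overwrite production tables.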





Git

dbt Git, GitHub Google Cloud Source Repositories, , .





Logging via webhooks

- pop-up Google Cloud, . GBQ , .
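The article later mentions sending notifications to Telegram. Here is a minimal Python sketch of such a hook. It assumes dbt has written its `target/run_results.json` file, where recent dbt versions store a `results` list with a `unique_id` and a `status` for each node. The sketch summarizes problem nodes and builds a request to the Telegram Bot API `sendMessage` method. Which statuses count as problems, the message format, and the bot token and chat id are assumptions. The actual network call is left commented out.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Statuses treated as worth an alert (an assumption; adjust as needed).
PROBLEM_STATUSES = {"error", "fail", "warn"}


def summarize_run(run_results: Dict[str, Any]) -> str:
    """Turn the contents of target/run_results.json into a short text message."""
    results: List[Dict[str, Any]] = run_results.get("results", [])
    problems = [r for r in results if r.get("status") in PROBLEM_STATUSES]
    if not problems:
        return f"dbt: {len(results)} node(s), no problems"
    lines = [f"dbt: {len(problems)} problem(s)"]
    lines += [f"- {r['unique_id']}: {r['status']}" for r in problems]
    return "\n".join(lines)


def telegram_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a POST to the Telegram Bot API sendMessage method (not sent here)."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


# Usage (hypothetical token and chat id; performs a real network call):
# with open("target/run_results.json", encoding="utf-8") as f:
#     message = summarize_run(json.load(f))
# urllib.request.urlopen(telegram_request("<BOT_TOKEN>", "<CHAT_ID>", message), timeout=10)
```

Filtering on status keeps the chat quiet on normal runs and only reports when something needs attention.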





dbt

dbt Cloud

dbt Cloud : , , - cron. . , , - . , , .





schedule, .





Google Cloud: Cloud Shell

dbt. Google Cloud , Cloud Shell. , App Engine. dbt , App Engine.





, , , . , .





Google Cloud: Cloud Run

dbt, Google Cloud Cloud Run : 





  1. Atom , .





  2. Cloud .





  3. When changes are pushed, Cloud Build builds a new version of our Cloud Run service, which Cloud Scheduler then launches on the required schedule.





  4. dbt, running in Cloud Run, performs all these calculations in BigQuery, and from there the results go to Data Studio.





  5. Log notifications can be sent to Telegram according to certain rules: for example, not for every push, but only for important changes.
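Steps 3 and 4 above can be sketched as a small HTTP service for the Cloud Run container, using only the Python standard library. Cloud Run passes the port in the `PORT` environment variable. A Cloud Scheduler job is assumed to send a POST request, which triggers `dbt run` and then `dbt test`. The handler, the command list, and the response format are hypothetical. The service's request timeout also has to be long enough to cover the whole run.

```python
import os
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, List, Sequence, Tuple

# Commands executed on every scheduled call (hypothetical; could also pass --target prod).
DBT_COMMANDS: List[List[str]] = [["dbt", "run"], ["dbt", "test"]]


def run_dbt(commands: Sequence[List[str]] = DBT_COMMANDS,
            runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> Tuple[int, str]:
    """Run the commands in order; map the outcome to an HTTP status and a message."""
    for cmd in commands:
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Return the tail of dbt's output so the failure is visible in the logs.
            return 500, f"{' '.join(cmd)} failed:\n{(result.stdout or '')[-2000:]}"
    return 200, "dbt run and dbt test finished successfully"


class DbtHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:  # the Cloud Scheduler job is assumed to send POST
        status, message = run_dbt()
        body = message.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve() -> None:
    """Start the server; Cloud Run provides the port in the PORT variable."""
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("", port), DbtHandler).serve_forever()


# Container entry point, assuming this file is saved as app.py, e.g. in the Dockerfile:
#   CMD ["python", "-c", "import app; app.serve()"]
```

Returning a 500 status on failure lets Cloud Scheduler record the job as failed, which is another place to see that a run went wrong.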





Such an infrastructure makes it fairly easy to move queries from one Google Cloud project to another and to keep track of everything that happens with the calculations in dbt. And because everything goes through Git, you can see exactly who on your team pushed what, where, and why.







