Azure SQL End-to-End Analytics + dbt + Github Actions + Metabase

Title







Hello, Habr! My name is Artemy Kozyr.







In recent years, I have accumulated quite extensive experience with data and what is now called Big Data .







Not so long ago, interest in the field of Internet marketing and End-to-End Analytics has also flared up , and not from scratch. My friend from the acting agency provided me with data and cases of real clients, and hereskippedaway we go.

It turns out quite interesting: Azure SQL + dbt + Github Actions + Metabase.







Half of the success in getting the task right



So, let's try without praises and straight to the point. Client portrait (this is the one for whom we are doing the service): the owner of an online store / retail network / mobile application / educational platform. It pursues the following goals:







  • Positioning and promoting your product; business growth
  • Optimization of promotion channels: focus on areas that give the best result
  • Control of expenses and return indicators for each invested ruble
  • Segmentation of users and building communication with them


On average, he uses 4 groups of services:







  • CRM ( AmoCRM , Bitrix24 ) - leads, funnels and sales; life cycle and actual attributes of transactions
  • (Yandex.Metrika, Google Analytics) — , , ,
  • (Facebook, Google Adwords, Yandex.Direct) — , -, ,
  • Call-tracking/Event tracking — , -


, : , , CRM, . . :







Agreed reporting form of the Agency to the Client







, , . :







  • ()
  • ( !)
  • ( )
  • ( )


()



- . : : ? ( 3:13).







myBI Connect. - , -. , :







1.







( ), , ETL-.







, Facebook (Campaigns), (Adsets) (Ads), , , , , ..







Facebook myBI Connect detailed layer diagram

Facebook myBI Connect







2. //







.? .Clickhouse . , , , ( ).







API .
https://api-metrika.yandex.net/stat/v1/data.csv
   ?ids=55254416
   &dimensions=ym:s:date, ym:s:UTMSource, ym:s:UTMMedium, ym:s:UTMCampaign
   &metrics=ym:s:visits, ym:s:ecommercePurchases, ym:s:ecommerceRevenue
   &date1=2020-12-01
   &date2=2020-12-31
   &group=day
   &lang=en
   &accuracy=full
   &sort=ym:s:date
   &limit=100000
   &pretty=true
      
      





JSON-, ( cron), (requests), .. .







myBI Connect .







Custom upload from Yandex.Metrica

.







3. Webhook ,







  • CRM? ?
  • ! Webhook .


. inhouse-CRM, XML:







# get data from XML endpoint with curl utility
curl "https://www.internal-crm.ru/order-list.xml?date-from=01-12-2020&date-to=31-12-2020" -o export.xml

# convert XML to JSON with xq utility
xq . export.xml > export.json

# parse JSON doc with jq utility
jq '[."order-list".date[] | .order[]]' export.json > parsed.json

# post to myBI Connect Webhook endpoint with curl utility
curl --header "Content-Type: application/json" \
 --request POST \
 --data @parsed.json \
 https://app.mybi.ru/webhook/23576/xhsfcxmlyh/
      
      





XML, JSON, , myBI Connect Webhook. shell-, .







4.









, . .









(Data Modeling), . , , () - .







Basic DWH blocks: sources, detail layer, data marts

DWH: , ,







DWH :







  • (1) — , , CRM
  • (2) — (, );
  • (3) — -, ;


dbt. git-, .sql ( ) .yaml (). dbt : Data Build Tool .







- : BigQuery, Redshift, Snowflake, Postgres, Spark, Presto. Azure SQL Database (managed SQL Server). , ; , -, ( ).







Project structure: git repo with code (.sql) and configuration (.yaml)

: git- (.sql) (.yaml)







DWH :







Model dependency chain: Sources -> Stage -> Subs -> Showcases

: -> -> ->







1. (Sources)







- , myBI Connect. .







2. (Staging)







( views), . :







  • , UTM-
  • : , ..
  • ( )


3. (Auxiliary)







Aux . () :







  • — , , ( )
  • : , ..


4. (Marts)







— , . , - . . :







  • , ,
  • -: ,
  • : full join,




— ?

— , .

— . , API, , Github Action, .







Visualization of the dynamics of key indicators on an interactive dashboard







. . , ().







, , . , . Slack.







— . , . .







Semantic layer for access to storefront metadata and detail layer







:







  • ,
  • , , -
  • (x-ray)


Open Source BI Metabase (!). Amazon Elastic Beanstalk, :







  • Docker-
  • Postgres (AWS RDS)
  • (Load Balancing) -Healthcheck
  • Metabase


Productively deploying BI Metabase in the AWS Elastic Beanstalk cloud

BI Metabase AWS Elastic Beanstalk









? ! , , . , .







1.







, . , .. .

. - — . .







(1) : , , { }, UTM- ( !), , . , , . , .







(2) -. - - php-, .







(3) , . , Google Adwords aud-, kwd-, pla-.







(4) , . !







Examples of taking into account the peculiarities of markup for the subsequent parsing of identifiers







, .







2. CRM







CRM, , . , , - . , , .







— , CRM-, ?

— : .

? , .







Checking dashboard for problem deals in CRM

CRM







3. ( )







, , . . : . - :







from costs c
   full join conversions cv on
           c.[] = cv.[]
       and c.[ ] = cv.[ ]
       and c.[  ] = cv.[  ]
       and c.[  ] = cv.[  ]   
      
      





NULL? ( NULL = NULL).







: , -:







--      
{%- set key_field_list = [
       '[]',
       '[ ]',
       '[  ]',
       '[  ]'
   ]
-%}

--  -     
select

     {{ concat_key(key_field_list) }} as concat_key
   , {{ surrogate_key(key_field_list) }} as hash_key

...

--    :
from costs c
   full join conversions cv on c.hash_key = cv.hash_key
      
      





. .







A surrogate hash key is ideal for a join;  the concatenation key is human-readable

- ;







4.







, full join. , , , : CRM, , ..







-, , ( ). -, .







-, - , . -:







  • meta_is_row_match (true/false) — ?
  • meta_row_origin — (././AmoCRM)?


Meta-columns is_match, row_origin help in finding sources of problems

- is_match, row_origin







, CRM - ? . , , . :







   select 
       ...
       , sum(1) as []
       , sum(CASE WHEN [] LIKE '%%' THEN 1 ELSE 0 END) as [ -  ]
       , sum(CASE WHEN [ ] LIKE '%%' THEN 1 ELSE 0 END) as [ -  ]
       , sum(CASE WHEN [ ] LIKE '%%' THEN [] ELSE 0 END) as [ -  ]
       ...
      
      





5.







. . , API .. , ym:s:<AttributionModel>UTMSource:







ym:s:lastsignUTMSource --   
ym:s:firstUTMSource --  
ym:s:lastUTMSource --  
ym:s:last_yandex_direct_clickUTMSource --    

      
      





6. ()







- . , .







Automating data marts with Github Action

Github Action







Github Action kzzzr / mybi-dbt-action — , Docker-, , , , . , , .









. . , , . .







, . — . , . . .







? ?

1 5 .

, ?

.

, TODO:







  • : , API, Webhook
  • , dbt CORE . .
  • + + + (- -)
  • ( + ),
  • ( ) Github Actions
  • , , (DAG)
  • : Git-, DEV PROD.
  • : , -, .




, . :









, ?



pet-project, . :







  • , . , ?
  • . Private, public Open Source.
  • . , - .
  • . — .


I will publish news related to this project in the telegram channel https://t.me/enthusiastech .







Stay tuned for updates and ask questions, I will be happy to answer them.







Thank you for attention.








All Articles