Hello, Habr! My name is Artemy Kozyr.
In recent years, I have accumulated quite extensive experience with data and what is now called Big Data .
Not so long ago, interest in the field of Internet marketing and End-to-End Analytics has also flared up , and not from scratch. My friend from the acting agency provided me with data and cases of real clients, and hereskippedaway we go.
It turns out quite interesting: Azure SQL + dbt + Github Actions + Metabase.
Half of the success in getting the task right
So, let's try without praises and straight to the point. Client portrait (this is the one for whom we are doing the service): the owner of an online store / retail network / mobile application / educational platform. It pursues the following goals:
- Positioning and promoting your product; business growth
- Optimization of promotion channels: focus on areas that give the best result
- Control of expenses and return indicators for each invested ruble
- Segmentation of users and building communication with them
On average, he uses 4 groups of services:
- CRM ( AmoCRM , Bitrix24 ) - leads, funnels and sales; life cycle and actual attributes of transactions
- (Yandex.Metrika, Google Analytics) — , , ,
- (Facebook, Google Adwords, Yandex.Direct) — , -, ,
- Call-tracking/Event tracking — , -
, : , , CRM, . . :
, , . :
- ()
- ( !)
- ( )
- ( )
()
- . : : ? ( 3:13).
myBI Connect. - , -. , :
1.
( ), , ETL-.
, Facebook (Campaigns), (Adsets) (Ads), , , , , ..
2. //
.? .Clickhouse . , , , ( ).
https://api-metrika.yandex.net/stat/v1/data.csv ?ids=55254416 &dimensions=ym:s:date, ym:s:UTMSource, ym:s:UTMMedium, ym:s:UTMCampaign &metrics=ym:s:visits, ym:s:ecommercePurchases, ym:s:ecommerceRevenue &date1=2020-12-01 &date2=2020-12-31 &group=day &lang=en &accuracy=full &sort=ym:s:date &limit=100000 &pretty=true
JSON-, ( cron), (requests), .. .
myBI Connect .
3. Webhook ,
- CRM? ?
- ! Webhook .
. inhouse-CRM, XML:
# get data from XML endpoint with curl utility
curl "https://www.internal-crm.ru/order-list.xml?date-from=01-12-2020&date-to=31-12-2020" -o export.xml
# convert XML to JSON with xq utility
xq . export.xml > export.json
# parse JSON doc with jq utility
jq '[."order-list".date[] | .order[]]' export.json > parsed.json
# post to myBI Connect Webhook endpoint with curl utility
curl --header "Content-Type: application/json" \
--request POST \
--data @parsed.json \
https://app.mybi.ru/webhook/23576/xhsfcxmlyh/
XML, JSON, , myBI Connect Webhook. shell-, .
4.
, . .
(Data Modeling), . , , () - .
DWH :
- (1) — , , CRM
- (2) — (, );
- (3) — -, ;
dbt. git-, .sql ( ) .yaml (). dbt : Data Build Tool .
- : BigQuery, Redshift, Snowflake, Postgres, Spark, Presto. Azure SQL Database (managed SQL Server). , ; , -, ( ).
DWH :
1. (Sources)
- , myBI Connect. .
2. (Staging)
( views), . :
- , UTM-
- : , ..
- ( )
3. (Auxiliary)
Aux . () :
- — , , ( )
- : , ..
4. (Marts)
— , . , - . . :
- , ,
- -: ,
- : full join,
— ?
— , .
— . , API, , Github Action, .
. . , ().
, , . , . Slack.
— . , . .
:
- ,
- , , -
- (x-ray)
Open Source BI Metabase (!). Amazon Elastic Beanstalk, :
- Docker-
- Postgres (AWS RDS)
- (Load Balancing) -Healthcheck
- Metabase
BI Metabase AWS Elastic Beanstalk
? ! , , . , .
1.
, . , ...
. - — . .
(1) : , , { }, UTM- ( !), , . , , . , .
(2) -. - - php-, .
(3) , . , Google Adwords aud-, kwd-, pla-.
(4) , . !
, .
2. CRM
CRM, , . , , - . , , .
— , CRM-, ?
— : .
? , .
3. ( )
, , . . : . - :
from costs c full join conversions cv on c.[] = cv.[] and c.[ ] = cv.[ ] and c.[ ] = cv.[ ] and c.[ ] = cv.[ ]
NULL? ( NULL = NULL).
: , -:
--
{%- set key_field_list = [
'[]',
'[ ]',
'[ ]',
'[ ]'
]
-%}
-- -
select
{{ concat_key(key_field_list) }} as concat_key
, {{ surrogate_key(key_field_list) }} as hash_key
...
-- :
from costs c
full join conversions cv on c.hash_key = cv.hash_key
. .
4.
, full join. , , , : CRM, , ..
-, , ( ). -, .
-, - , . -:
- meta_is_row_match (true/false) — ?
- meta_row_origin — (././AmoCRM)?
, CRM - ? . , , . :
select
...
, sum(1) as []
, sum(CASE WHEN [] LIKE '%%' THEN 1 ELSE 0 END) as [ - ]
, sum(CASE WHEN [ ] LIKE '%%' THEN 1 ELSE 0 END) as [ - ]
, sum(CASE WHEN [ ] LIKE '%%' THEN [] ELSE 0 END) as [ - ]
...
5.
. . , API .. , ym:s:<AttributionModel>UTMSource:
ym:s:lastsignUTMSource --
ym:s:firstUTMSource --
ym:s:lastUTMSource --
ym:s:last_yandex_direct_clickUTMSource --
6. ()
- . , .
Github Action kzzzr / mybi-dbt-action — , Docker-, , , , . , , .
. . , , . .
, . — . , . . .
— ? ?
— 1 5 .
— , ?
— .
, TODO:
- : , API, Webhook
- , dbt CORE . .
- + + + (- -)
- ( + ),
- ( ) Github Actions
- , , (DAG)
- : Git-, DEV PROD.
- : , -, .
, . :
- — Continuous Integration & Data Tests. , .
- (Data Quality): , . , Wheely : : QA .
- 1- . , http://spasibo-metabase.eu-central-1.elasticbeanstalk.com/
- Apache Superset Metabase. , .
- - . : , .
- self-service BI. . , , , .
- : , , .
- . . ! .
, ?
pet-project, . :
- , . , ?
- . Private, public Open Source.
- . , - .
- . — .
I will publish news related to this project in the telegram channel https://t.me/enthusiastech .
Stay tuned for updates and ask questions, I will be happy to answer them.
Thank you for attention.