Despite the fact that there is a lot of data now almost everywhere, analytical databases are still quite exotic. They are poorly known and even worse know how to use them effectively. Many continue to "eat a cactus" with MySQL or PostgreSQL, which are designed for different scenarios, struggle with NoSQL, or overpay for commercial solutions. ClickHouse is a game changer and significantly lowers the threshold for entering the world of analytical DBMS.
Report from BackEnd Conf 2018 and published with the permission of the speaker.
Who am I and why am I talking about ClickHouse? I am the Development Director at LifeStreet, which uses ClickHouse. Also, I am the founder of Altinity. This is a Yandex partner who promotes ClickHouse and helps Yandex make ClickHouse more successful. I am also ready to share my knowledge about ClickHouse.
. . , .
ยซ ยป, ClickHouse:
- ,
- ,
- .
, .
, , ClickHouse, .
, ClickHouse , ClickHouse , .
, ClickHouse . , .
: ยซ ClickHouse?ยป. , , .
- โ . ClickHouse . ClickHouse . , - .
- โ . . , Vertica โ . , . , . . ClickHouse .
- โ . - . RedShift โ . RedShift . , , Amazon, . Google BigQuery . - , , .
ClickHouse .
ClickHouse ? ClickHouse .
- -, . . use case, .
- AdTech ClickHouse.
- , .
- ClickHouse . ClickHouse, , .
- , . . ClickHouse.
- CloudFlare. - ClickHouse , . community. ClickHouse-. , Kafka Engine ClickHouse.
- . ClickHouse proof on concept, production.
- ClickHouse . , , 2 000 . โ .
- -. , Bloxy.info. ethereum-. ClickHouse.
. , . . .
, :
- : 500+ , 25 .
- LifeStreet: 60 , 75 . , , .
- CloudFlare: 36 , 200 . .
- Bloomberg: 102 , . .
โ . heatmap, ClickHouse . , , . . 4 .
, . , Altinity, . , , , . . , . . . , . , ClickHouse โ , , .
? , ClickHouse . , . , , , .
ClickHouse .
- โ : Vertica ClickHouse. , Vertica .
- โ ClickHouse. . , ClickHouse , . , . , .
- โ ClickHouse. , ClickHouse Hadoop . , ClickHouse - map reduce , . , .
- LifeStreet โ Ad Tech , , .
- , programmatic bidding.
- : 10 . .
- , , โ , programmatic bidding.
. HighLoad. LifeStreet MySQL ( Oracle) Vertica. .
, , Vertica โ . . . proof of concept performance testing , 13- 16- . HighLoad.
โ Vertica , . . , . , , - , , . , , .
? , , , . , Vertica, . , .
, open source , , . . . , . .
, , , open source, โ .
, , , ClickHouse. , : ยซ?ยป, .
2016- , ClickHouse. , Vertica. . , . . (join), ClickHouse Vertica .
. : ClickHouse Vertica, .
(join), . ClickHouse Vertica . - , . . .
, , LifeStreet ClickHouse.
16- , . , , . , . .
, , .
โ :
- .
- . 10 , , LifeStreet 75 3 . , . SQL- , .
- , ClickHouse , Vertica, , SAS-. ClickHouse SATA. ? Vertica insert . , , , , . . . ClickHouse insert . , , , ClickHouse , . . SATA, RAID , .
- , . . 3 60 (20 โ ) 6 . Vertica .
.
- โ . .
- โ SQL.
OLAP- โ select. group by, . where, . group by . .
star-, , .
, , , . , . , . . - .
ClickHouse . :
- โ ClickHouse (join), . . (join) , . .
- โ , . , star-, - . , , . .
ClickHouse . :
- โ . External Dictionaries โ , 99 % star-, .
- โ . (join) .
- (join).
- . 2018- ( ) , . . , . โ .
- , (join) , , , , , , .
- (join).
- 1 .
- , . - .
. , .
, . :
- . - .
- key-value . - .
- , - .
. - , (join).
ClickHouse , string key-value.
โ , , . , .
. has
, , . , , .
subid - . , , , . .
, , , , -, . , -, , .
. , ID. . arrayMap
. -. -. ID .
. , .
.
, , .
- ClickHouse . .
- . ?
- (join), . , , .
- โ . , , . (join), .
- .
. , -5 . 2,5 , -. , - . , , (int). . . 2,5 1,5 . .
. . 5 . , , - , , .
. , , , , , . .
- .
- , .
- - (join), , , , (join) .
- .
, , . .
. . ?
:
- - .
- .
?
, , 20 , . - , , . , , - .
: ยซ , ?ยป ยซ , ?ยป. . . , , , ?
, , - , . . - . DWH. , .
โ , , - , , , , . , , , - , . , . .
.
โ :
- Last Interaction, interaction โ , .
- First Interaction, . . , .
- โ .
- .
- .
? Runtime Cassandra. Cassandra transaction storage, . . . - Runtime, , - - , Cassandra โ . , . .
, transaction id, . . . .
, . , , 10 , 300 , . Cassandra , , Runtime , 10-15 .
, . ? , 30 . , , 30 . , - . 500 , , , Runtime , - . , .
ClickHouse. ClickHouse? , .
- , , . . mutable, ClickHouse mutable-.
- , , visit id. point query, ClickHouse . ClickHouse โฆ, . .
- , json, , json , , - . .
. . .
, .
? ClickHouse, , . attributed , ClickHouse . visit id , , . . , . , , . . ClickHouse, . . . , DWH, .
. ClickHouse , visit id, 1 000-2 000 visit id 1 000-2 000 . .
ClickHouse, 3 , .
, , .
. materialized view , attributed , . . . materialized view . . . materialized view , .
SQL. .
โ ClickHouse json , . . . ClickHouse json. - .
visitParamExtractInt json , . . . transaction id visit id. .
โ materialized . ? , , . . , . ClickHouse . json , .
materialized view โ . . ? -, , . . visit id, .
โ index_granularity. MergeTree, 8 192 index_granularity. ? . ClickHouse , . 8 192. , , , , overhead. index granularity, overhead. , . .
ClickHouse.
-, AggregatingMergeTree. AggregatingMergeTree argMax, . . , timestamp. . . ClickHouse. argMax .
- ยซยป Runtime.
- 3 . , Cassandra, . . .
- 25 ClickHouse. 5 . , Cassandra , click based , impression based. . . , 30 , .
โ Y, .
:
- 5 000 .
- 100 .
- 10 . , , .
- 100 .
.
. , , , . . , . , , , , . . , .
.
. ? 100 , : , . 100 runningDifference . RunningDifference โ ClickHouse, .
, . 5 000 12,5 . , . . 12,5 .
- , อx อy โ . . . . , . - 12,5 , . . 60 . .
-, - , , ClickHouse.
Hadoop , Spark, Greenplum. . . . - , .
ClickHouse .
, , . , , .
? . . . logReturn, .
, . , . . . . . , . . .
. ยซrยป , . . . โ .
12,5 , , . . . 2 500 5 000 . ClickHouse-. , .
, . : , , . logReturn, . . , logReturn. , . . groupArray . . ClickHouse, . .
proof of concept โ , . . . .
: Log_return .
- 50 . 50 โ , . . , 70 .
, , . . . . .
- โ . โ ClickHouse.
- Summing/AggregatingMergeTrees โ , state . .
- Materialized Views . , , , , attribute , . . , , . , . Materialized Views , , ClickHouse.
- .
- , . , , .
, , ClickHouse , open source , . . . . , , , ClickHouse, InfiniDB. , , , SQL. !
- ! ! - Apache Phoenix?
-, , - . ClickHouse . - ClickHouse, . .
( ) Apache Phoenix โ SQL- Hbase. Hbase key-value. . Hbase, Cassandra. . , , ClickHouse.
-
! , . ClickHouse, , ClickHouse , mutable. - , ClickHouse, , ? , . , ?
. . , , mutable, , . . . ClickHouse Oracle, - . ClickHouse , , ClickHouse . . . , , . , - ClickHouse , .
! . . , -. , ? , , , - , ? . . - , ?
, ClickHouse . Hadoop. Hadoop โ . Hadoop . โ , ClickHouse , , . . , - , .
. , 50 . , ?
-.
, .
3- .
! ! . , ClickHouse . . . -, ? ClickHouse? , ? , , ClickHouse , - .
, . ClickHouse . , . ? , . - 2012-2013- ClickHouse. . . - , , . . ClickHouse โ . . . .
! , . ? -. , , , - .
, , . . , Materialized View, , . . . ClickHouse, . . extract - , primary key, , Materialized View. , . Materialized View, . .
.