Theory and practice of using ClickHouse in real applications. Alexander Zaitsev (2018)



Despite the fact that there is a lot of data now almost everywhere, analytical databases are still quite exotic. They are poorly known and even worse know how to use them effectively. Many continue to "eat a cactus" with MySQL or PostgreSQL, which are designed for different scenarios, struggle with NoSQL, or overpay for commercial solutions. ClickHouse is a game changer and significantly lowers the threshold for entering the world of analytical DBMS.



Report from BackEnd Conf 2018 and published with the permission of the speaker.







Who am I and why am I talking about ClickHouse? I am the Development Director at LifeStreet, which uses ClickHouse. Also, I am the founder of Altinity. This is a Yandex partner who promotes ClickHouse and helps Yandex make ClickHouse more successful. I am also ready to share my knowledge about ClickHouse.





. . , .





ยซ ยป, ClickHouse:



  • ,
  • ,
  • .


, .





, , ClickHouse, .



, ClickHouse , ClickHouse , .



, ClickHouse . , .





: ยซ ClickHouse?ยป. , , .





  • โ€“ . ClickHouse . ClickHouse . , - .
  • โ€“ . . , Vertica โ€“ . , . , . . ClickHouse .
  • โ€“ . - . RedShift โ€“ . RedShift . , , Amazon, . Google BigQuery . - , , .


ClickHouse .





ClickHouse ? ClickHouse .



  • -, . . use case, .
  • AdTech ClickHouse.
  • , .
  • ClickHouse . ClickHouse, , .
  • , . . ClickHouse.
  • CloudFlare. - ClickHouse , . community. ClickHouse-. , Kafka Engine ClickHouse.
  • . ClickHouse proof on concept, production.
  • ClickHouse . , , 2 000 . โ€“ .
  • -. , Bloxy.info. ethereum-. ClickHouse.




. , . . .



, :



  • : 500+ , 25 .
  • LifeStreet: 60 , 75 . , , .
  • CloudFlare: 36 , 200 . .
  • Bloomberg: 102 , . .




โ€“ . heatmap, ClickHouse . , , . . 4 .



, . , Altinity, . , , , . . , . . . , . , ClickHouse โ€“ , , .





? , ClickHouse . , . , , , .





ClickHouse .



  • โ€“ : Vertica ClickHouse. , Vertica .
  • โ€“ ClickHouse. . , ClickHouse , . , . , .
  • โ€“ ClickHouse. , ClickHouse Hadoop . , ClickHouse - map reduce , . , .




  • LifeStreet โ€“ Ad Tech , , .
  • , programmatic bidding.
  • : 10 . .
  • , , โ€“ , programmatic bidding.




. HighLoad. LifeStreet MySQL ( Oracle) Vertica. .



, , Vertica โ€“ . . . proof of concept performance testing , 13- 16- . HighLoad.





โ€“ Vertica , . . , . , , - , , . , , .





? , , , . , Vertica, . , .



, open source , , . . . , . .



, , , open source, โ€” .





, , , ClickHouse. , : ยซ?ยป, .





2016- , ClickHouse. , Vertica. . , . . (join), ClickHouse Vertica .



. : ClickHouse Vertica, .



(join), . ClickHouse Vertica . - , . . .





, , LifeStreet ClickHouse.





16- , . , , . , . .





, , .



โ€“ :



  • .
  • . 10 , , LifeStreet 75 3 . , . SQL- , .
  • , ClickHouse , Vertica, , SAS-. ClickHouse SATA. ? Vertica insert . , , , , . . . ClickHouse insert . , , , ClickHouse , . . SATA, RAID , .
  • , . . 3 60 (20 โ€“ ) 6 . Vertica .




.



  • โ€“ . .
  • โ€“ SQL.




OLAP- โ€“ select. group by, . where, . group by . .





star-, , .





, , , . , . , . . - .



ClickHouse . :



  • โ€“ ClickHouse (join), . . (join) , . .
  • โ€“ , . , star-, - . , , . .


ClickHouse . :



  • โ€“ . External Dictionaries โ€“ , 99 % star-, .
  • โ€“ . (join) .




  • (join).
  • . 2018- ( ) , . . , . โ€“ .
  • , (join) , , , , , , .




  • (join).
  • 1 .
  • , . - .


. , .





, . :



  • . - .
  • key-value . - .
  • , - .


. - , (join).





ClickHouse , string key-value.



โ€“ , , . , .



. has, , . , , .



subid - . , , , . .



, , , , -, . , -, , .





. , ID. . arrayMap. -. -. ID .



. , .





.



, , .



  • ClickHouse . .
  • . ?
  • (join), . , , .
  • โ€“ . , , . (join), .


- .





. , -5 . 2,5 , -. , - . , , (int). . . 2,5 1,5 . .





. . 5 . , , - , , .





. , , , , , . .





  • .
  • , .
  • - (join), , , , (join) .
  • .


, , . .





. . ?



:



  • - .
  • .




?



, , 20 , . - , , . , , - .



: ยซ , ?ยป ยซ , ?ยป. . . , , , ?



, , - , . . - . DWH. , .





โ€“ , , - , , , , . , , , - , . , . .





.



โ€“ :



  • Last Interaction, interaction โ€“ , .
  • First Interaction, . . , .
  • โ€“ .
  • .
  • .




? Runtime Cassandra. Cassandra transaction storage, . . . - Runtime, , - - , Cassandra โ€“ . , . .



, transaction id, . . . .



, . , , 10 , 300 , . Cassandra , , Runtime , 10-15 .



, . ? , 30 . , , 30 . , - . 500 , , , Runtime , - . , .





ClickHouse. ClickHouse? , .



  • , , . . mutable, ClickHouse mutable-.
  • , , visit id. point query, ClickHouse . ClickHouse โ€ฆ, . .
  • , json, , json , , - . .


. . .





, .



? ClickHouse, , . attributed , ClickHouse . visit id , , . . , . , , . . ClickHouse, . . . , DWH, .



. ClickHouse , visit id, 1 000-2 000 visit id 1 000-2 000 . .





ClickHouse, 3 , .



, , .



. materialized view , attributed , . . . materialized view . . . materialized view , .





SQL. .



โ€“ ClickHouse json , . . . ClickHouse json. - .



visitParamExtractInt json , . . . transaction id visit id. .



โ€“ materialized . ? , , . . , . ClickHouse . json , .



materialized view โ€“ . . ? -, , . . visit id, .



โ€“ index_granularity. MergeTree, 8 192 index_granularity. ? . ClickHouse , . 8 192. , , , , overhead. index granularity, overhead. , . .





ClickHouse.



-, AggregatingMergeTree. AggregatingMergeTree argMax, . . , timestamp. . . ClickHouse. argMax .





  • ยซยป Runtime.
  • 3 . , Cassandra, . . .
  • 25 ClickHouse. 5 . , Cassandra , click based , impression based. . . , 30 , .




โ€“ Y, .



:



  • 5 000 .
  • 100 .
  • 10 . , , .
  • 100 .


.





. , , , . . , . , , , , . . , .



.





. ? 100 , : , . 100 runningDifference . RunningDifference โ€“ ClickHouse, .



, . 5 000 12,5 . , . . 12,5 .



- , อžx อžy โ€“ . . . . , . - 12,5 , . . 60 . .





-, - , , ClickHouse.





Hadoop , Spark, Greenplum. . . . - , .





ClickHouse .



, , . , , .



? . . . logReturn, .



, . , . . . . . , . . .



. ยซrยป , . . . โ€“ .



12,5 , , . . . 2 500 5 000 . ClickHouse-. , .





, . : , , . logReturn, . . , logReturn. , . . groupArray . . ClickHouse, . .





proof of concept โ€“ , . . . .



: Log_return .



- 50 . 50 โ€“ , . . , 70 .



, , . . . . .





  • โ€“ . โ€“ ClickHouse.
  • Summing/AggregatingMergeTrees โ€“ , state . .
  • Materialized Views . , , , , attribute , . . , , . , . Materialized Views , , ClickHouse.
  • .
  • , . , , .




, , ClickHouse , open source , . . . . , , , ClickHouse, InfiniDB. , , , SQL. !





- ! ! - Apache Phoenix?



-, , - . ClickHouse . - ClickHouse, . .



  • ( ) Apache Phoenix โ€“ SQL- Hbase. Hbase key-value. . Hbase, Cassandra. . , , ClickHouse.





    • ! , . ClickHouse, , ClickHouse , mutable. - , ClickHouse, , ? , . , ?



    • . . , , mutable, , . . . ClickHouse Oracle, - . ClickHouse , , ClickHouse . . . , , . , - ClickHouse , .







! . . , -. , ? , , , - , ? . . - , ?



, ClickHouse . Hadoop. Hadoop โ€“ . Hadoop . โ€“ , ClickHouse , , . . , - , .



. , 50 . , ?



-.



, .



3- .



! ! . , ClickHouse . . . -, ? ClickHouse? , ? , , ClickHouse , - .



, . ClickHouse . , . ? , . - 2012-2013- ClickHouse. . . - , , . . ClickHouse โ€“ . . . .



! , . ? -. , , , - .



, , . . , Materialized View, , . . . ClickHouse, . . extract - , primary key, , Materialized View. , . Materialized View, . .



.




All Articles