Effective use of ClickHouse. Alexey Milovidov (Yandex)



Since ClickHouse is a specialized system, it is important to take into account the peculiarities of its architecture when using it. In this talk, Alexey will talk about examples of common mistakes when using ClickHouse that can lead to ineffective work. Practical examples will show how the choice of one or another data processing scheme can change performance by orders of magnitude.



Video:





Hello! My name is Alexey, I am doing ClickHouse.





Firstly, I am in a hurry to please you, I will not tell you today what ClickHouse is. To be honest, I'm sick of this. Every time I tell you what it is. And, probably, everyone already knows.





, , . . ClickHouse. , ClickHouse , , , . , .



- , , .



, ? . , , , , - .





, , , , inserts batches, . . inserts.



, ClickHouse insert, . .



, . , .. . 105 - . 700 . - .



MergeTree, . . โ€“ , 400 000 .



, , , 250 000 . โ€“ ClickHouse*.



* 2020 , .





, ? MergeTree 59 . 10 000 . ReplicatedMergeTree โ€“ 6 . , 2 . -, - . ? , ClickHouse . .





โ€“ . , , . , โ€“ . . , . . , , . , , . , . - .



, insert ClickHouse, memtable. log structure MergeTree, MergeTree, logโ€™, memTable. , . 100 , 200 . .





: ยซ ?ยป, , - - ClickHouse.



1. . - . , Kafka. Kafka, . , , .



, Kafka โ€“ . , Kafka. , . , , . .





2. . - , . . , , , . cron, - daemon ClickHouse. , .



, , - , .





3. , . , - - daemon, . , . , , , , , ClickHouse.



kill -9 . , . , , . , .





4. . - . ClickHouse , . , http- transfer-encoding: chunked insertโ€™. , , overhead .



ClickHouse . ClickHouse .



. , , , ClickHouse , insert. ClickHouse inserts . , . .





5. . - community โ€“ . , . , ClickHouse . open source, , , . โ€“ , GitHub, . , - .



* 2020 , KittenHouse.





6. โ€“ Buffer . , . Buffer .



, . MergeTree , buffer , . 10 000 , . , , . .



buffer . - , .





ClickHouse Kafka. โ€“ Kafka. . . Kafka .



, . community . ยซcommunity ยป, . , , .



* 2020 , RabbitMQ.





? insert values values - . , now() โ€“ . ClickHouse , . .



* , , VALUES .



, , . ClickHouse . , , . , , .



* ClickHouse write-ahead log, .





โ€“ .



, . โ€“ , , string. . .



, , , - , , ClickHouse , . - .





, IP-. . , 192.168.1.1. โ€“ UInt32*. 32 IPv4 .



-, , . , , . - .



.



IP-, . 137 . , 37 . , . . 4 .



, . - , IP- . , .



. , , , , , .





.



\1. , . , , , . ClickHouse. . .



, , . . : . , ยซยป, , , - . .



Ulnt32 250. 250 , , , -. , ClickHouse . , , . . , , , . .





, ClickHouse. Enum. Enum . , : , , , . 4 .



, . . alter table. alter table ClickHouse . Enum, . alter * , selects. alter , . . - .



* ClickHouse, ALTER .





ClickHouse โ€“ . ClickHouse , . , : MySQL, Mongo, Postgres. , http . ClickHouse , .



, join . . , . , , .



. .. . , , . . โ€“ , . MySQL.



, , hit rate 100 %. , MySQL. ClickHouse , โ€“ , , .



, โ€“ ClickHouse . . . , , . , .





, , . . โ€“ 64 .



, 64- , . , .



. , - .



. , , - , . , . .. , . , , , . , , - , .



โ€“ .





, , , . . , , ru โ€“ 2 . , , , 2 . , , .





, , , , . โ€“ . , - . , . , . , , , . . .



โ€“ , , join. join โ€“ , . , .



in place, .





- , , - .



, , . , , , .



, . ClickHouse , . , . , , RFC, , , .



. 166 . , 67 , . . . - , - , - , .



- , , . . . , .



, , 126 , 5 . 25 . 4 . . , 25 - -.



, , , - 4. - 25 . ? - . , . .





, , , , . IPv4, UInt32*. IPv6, FixedString(16), IPv6 โ€“ 128 , . . .



, IPv4 , IPv6? , . IPv4, IPv6. , IPv4 IPv6. , IPv4 , .



* ClickHouse IPv4, IPv6, , , , .





, . , - . , , ClickHouse, . - , .



, . , , , . . : 12.3. , , , . , . , . , . .



4 . , ClickHouse. ClickHouse โ€“ . , . 5 BrowserVersion, 5 . .





, , , . ClickHouse . ClickHouse - . - .



, . , 512. 512 โ€“ .





, ClickHouse, Log, . , , .



* ClickHouse LowCardinality .





. - . . , - , , MySQL 3.23.



, .



, , , .





- . , . , MyISAM . .



โ€“ , alter . . MySQL .



, , , .





ClickHouse , , -, , .



: ยซ ClickHouse ?ยป. , . , . . , .



, -, . , , , โ€“ . - . . . ClickHouse .





Alter ClickHouse , alter add/drop column.



, 10 10 000 , . ClickHouse โ€“ , throughput, latency, 10 .





. , .



, maintenance .



, , , - , . โ€“ StripeLog. TinyLog, .



* ClickHouse input.





โ€“ . , 5 , 6 . , . 5 , 1 000 . . , , 200 ClickHouse, . instance .





ClickHouse . instance ClickHouse . . . - , , 56 . , , 56 . 200 ClickHouse , , 10 000 . , .



, instances . - , - . instance, ClickHouse .



, TCP. , . .





, . .



, โ€“ . , 1 000 , . . . ClickHouse AggregatingMergeTree, .



, , . , , SummingMergeTree , 20 - . , .





. -, . , . โ€“ , , , . , .



? , , . . . , . ClickHouse alter . - C++. , C++.



ClickHouse, โ€“ , . ClickHouse , . , .





โ€“ . - production show processlist. , - .



, . , . url in .





โ€“ ? , . , , ru url = - . , url, . . ClickHouse .



- , , ClickHouse . , -. . , . , , .





, ClickHouse IN. , MySQL IN, , 100 - , MySQL 10 , .



โ€“ , ClickHouse, , , full scan, . . , . , .



. , , IN . . . *.



โ€“ , , . , 100 500 . , , 50 . .



* ; , .





, API. , - . - , API , - . - , .



. API, . , -. . .



ClickHouse โ€“ . . , , . .





. .



, , , ClickHouse , ClickHouse .



? pipeline . , , -. ClickHouse . , , - - . , .



. , rsync.



ClickHouse . , ZooKeeper. ZooKeeper , , , , - java-, ClickHouse โ€“ , C++, . ZooKeeper java. - , .





ClickHouse โ€“ . . , Distributed , failover. , , .





, table engines. ClickHouse โ€“ , . , , MergeTree. โ€“ , .



MergeTree , - . . , , default โ€“ 2000 . .



, . .





, . , , . Log.



โ€“ StripeLog TinyLog.



Memory , - .





ClickHouse .



. . . JOIN, , , ClickHouse Hash JOIN. , , JOIN *.



, , , inplace .



* ClickHouse merge join , , . .





, .



ClickHouse . *. . - , , , , .



* update delete batch .



, . , ReplaceMergeTree. merges. optimize table. , .



JOIN ClickHouse โ€“ .



, .



ClickHouse , select.*



* ClickHouse . , . ClickHouse โ€“ Catboost. , : ยซ . !ยป. .





, ClickHouse, . , . , โ€“ . - , , , , .





! ClickHouse?



.



ClickHouse. cli .



.



selectโ€™.



.



GitHub, .



.



, , .



.



. .



. . LZ4, ZSTD*. 64 1 .



* , .



?



. . , .



.



, , uniqExact , . . , uniqExact , . ? . . , , . , , ? - ? , , - .



, , . IP- . , , ClickHouse IP- . , . . uniqExact , . -.



? , user id, in, , , ?



. , , - โ€“ . , , , , , .



, ! ClickHouse! . ?



. . ?



-. MySQL, . . after, , .



. , , - . , .



, .



, , . . , . . . : -. , , , . ?



, , , C++.



C++ ?



-.



*.



* โ€“ pull request.



!



! ! , ClickHouse , . . , ClickHouse, , . , ZooKeeper . - ClickHouse, -, ?



, . , , set max_threads = 1. , . . . , , .



, ! . , ClickHouse . , , . . . ?



-, โ€“ , , . , , . , - , java, exception, . , . . , . ? โ€“ *.



* ClickHouse, " ", .



โ€“ ?



.



! ! , , . - WITH CTE?



. WITH . .



. !



! ! . , , - ?



. . , . *.



* .



- ? , ?



, deletes, updates , selects inserts.



. primary key. , , , ? , , , ?



.



. - primary key, ยซยป , , ? , ?



.



Maybe it makes sense to put such a field in the primary key, according to which the data will be better compressed if they are sorted by this field. For example, user ID. User, for example, goes to the same site. In this case, put the user id and time. And then your data will be better compressed. As for the date, if you really do not have and never have range queries by dates, then you can not put the date in the primary key.



OK thank you very much!




All Articles