The talk covers features of ClickHouse that are little known or not well covered in the documentation: incremental aggregation and manipulating the states of aggregate functions, inter-cluster copying, running queries without a server, and more. Examples from the development of Yandex services show how to get the most out of the system.
! ! ClickHouse. , , .
, . , , . . . .
ClickHouse, , . - clickstream. , , -. , , ..
, , . . - .
. , , pupkin.narod.ru; , yandex.ru. . ? .
. , . MergeTree, .
, . CREATE TABLE. ORDER BY - . , , . . , . . , , Hash . . . hash .
, . - . , . .
4,5 , . . 3,5 . . 4,5 .
1/10. from . 0,6 . , - , 2,5 . , , , . , - . , 10 . , .
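The sketch below shows the idea behind sampling by a hashed key. It assumes a 32-bit sampling expression such as `intHash32(UserID)` that is part of the table's ORDER BY. The `sample_hash` helper is a hypothetical stand-in built on `hashlib`, not ClickHouse's hash function. A row belongs to `SAMPLE 1/10` when its hash falls into the first tenth of the hash range. As a result, the same user is always either in or out of the sample, and a count over the sample is scaled back up by the inverse of the ratio.

```python
import hashlib
from fractions import Fraction

HASH_RANGE = 2 ** 32  # size of a 32-bit sampling key's value range


def sample_hash(user_id: int) -> int:
    """Hypothetical stand-in for a 32-bit hash such as intHash32(UserID)."""
    digest = hashlib.blake2b(str(user_id).encode(), digest_size=4).digest()
    return int.from_bytes(digest, "little")


def in_sample(user_id: int, ratio: Fraction) -> bool:
    """Deterministic: a key is in SAMPLE ratio iff its hash is in [0, ratio * HASH_RANGE)."""
    return sample_hash(user_id) < ratio * HASH_RANGE


def estimate_count(user_ids, ratio: Fraction) -> float:
    """Count the sampled rows and scale by 1 / ratio to estimate the full count."""
    hits = sum(1 for u in user_ids if in_sample(u, ratio))
    return float(hits / ratio)
```

Membership depends only on the key. In this model, any table sampled by the same key expression therefore selects the same users.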
, ?
- -, . . , .
- . , unix timestamp.
- timestamp, ( , ). , , DevOps, . . , .
- - . , .
- , .
- , url, hash . , , , . , url . - hash-.
- - .
- . , - .
- , . , . - hash. , . ? . - , . , - , .
- . - , . . .
, , :
- -, , , hash-, . . , , .
- . , . . . . . 1/10 hashes .
- , .
:
- - 1/10 .
- - - - , , 1 000 000. . , , 1 000 000, . , , , , . . . 1 000 000 10 000 000 1 000 000 000? - _sample_factor. : x _sample_factor, , .
- - SAMPLE OFFSET. . 1/10 , : « , , 1/10 ». : SAMPLE 1/10 OFFSET 1/10. .
- . , , . , . , , . ? , , , , , overhead.
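The sampling forms in the list above can be modelled the same way. This is a simplified sketch with the following assumptions:

- The `rows` list, its `key` and `x` fields, and the helper names are hypothetical.
- `SAMPLE n` is modelled as choosing the ratio from a known total row count.

Each sampled row carries `_sample_factor = 1 / ratio`, so `sum(x * _sample_factor)` estimates `sum(x)`. `SAMPLE 1/10 OFFSET 1/10` takes the next tenth of the hash range and never overlaps `SAMPLE 1/10`.

```python
import hashlib
from fractions import Fraction
from typing import Dict, List

HASH_RANGE = 2 ** 32


def sample_hash(key: int) -> int:
    """Hypothetical stand-in for a 32-bit sampling hash such as intHash32."""
    digest = hashlib.blake2b(str(key).encode(), digest_size=4).digest()
    return int.from_bytes(digest, "little")


def sample_rows(rows: List[Dict], ratio: Fraction, offset: Fraction = Fraction(0)) -> List[Dict]:
    """SAMPLE ratio OFFSET offset: keep rows whose hash lies in
    [offset, offset + ratio) of the hash range."""
    lo, hi = offset * HASH_RANGE, (offset + ratio) * HASH_RANGE
    return [r for r in rows if lo <= sample_hash(r["key"]) < hi]


def sample_n(rows: List[Dict], n: int) -> List[Dict]:
    """SAMPLE n: choose a ratio so that about n rows are read (here from the
    known row count) and attach _sample_factor = 1 / ratio to each row."""
    ratio = min(Fraction(1), Fraction(n, len(rows)))
    factor = 1 / ratio
    return [dict(r, _sample_factor=factor) for r in sample_rows(rows, ratio)]
```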
- . , , , count distinct, uniq. uniq 4 .
. . , If. , : sumIf. , . - , , - , - UInt8. , , . , , .
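The -If combinator can be sketched as a Python higher-order function; the names are illustrative, not a ClickHouse API. `sumIf(x, cond)` aggregates only the rows whose condition is non-zero. This lets one query compute several conditional aggregates in a single pass, instead of running one query with its own WHERE clause per condition.

```python
from typing import Callable, Iterable, Sequence

Aggregate = Callable[[Iterable[float]], float]


def if_combinator(agg: Aggregate) -> Callable[[Sequence[float], Sequence[int]], float]:
    """Turn agg into aggIf: only values whose condition is non-zero are aggregated.
    In ClickHouse the condition argument is a UInt8 (0 or 1)."""
    def agg_if(values: Sequence[float], conditions: Sequence[int]) -> float:
        return agg(v for v, c in zip(values, conditions) if c)
    return agg_if


sum_if = if_combinator(sum)
count_if = if_combinator(lambda vs: sum(1 for _ in vs))
```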
? - . ..
Google. .
- Array. , Array. , , . .
. . , .
groupArray, . , … . . . .
groupUniqArray. .
Array - , , .
, groupArrayArray? , , . .
groupUniqArrayArray - , .
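The functions in this part can be modelled as an illustrative sketch with hypothetical Python names:

- `groupArray` collects a group's values.
- `groupUniqArray` keeps only distinct values.
- The -Array combinator makes an aggregate take arrays and process every element, which yields `groupArrayArray` and `groupUniqArrayArray`.

ClickHouse does not guarantee the order of elements in these results. The sketch keeps first-seen order only so that it is deterministic.

```python
from itertools import chain
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def group_array(values: Iterable[T]) -> List[T]:
    """groupArray(x): all values of the group in one list."""
    return list(values)


def group_uniq_array(values: Iterable[T]) -> List[T]:
    """groupUniqArray(x): distinct values (first-seen order, for determinism only)."""
    return list(dict.fromkeys(values))


def array_combinator(agg: Callable[[Iterable[T]], List[T]]) -> Callable[[Iterable[Iterable[T]]], List[T]]:
    """-Array: the argument is an array, and agg sees every element of every array."""
    def agg_array(arrays: Iterable[Iterable[T]]) -> List[T]:
        return agg(chain.from_iterable(arrays))
    return agg_array


group_array_array = array_combinator(group_array)
group_uniq_array_array = array_combinator(group_uniq_array)
```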
, . , . , , , , , .
. . . sum, Array, If , , . . . sumArrayIf sumIfArray.
? . , . - , , - : . sumIfArray. . . Array sumIf .
- sumArrayIf. , - : sumArray .
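The difference between `sumArrayIf` and `sumIfArray` comes from which combinator wraps which. Both helpers below are hypothetical, and both take lists of per-row arrays.

- In `sumArrayIf`, -If wraps `sumArray`. The condition is one value per row, and it includes or excludes that row's whole array.
- In `sumIfArray`, -Array wraps `sumIf`. Both arguments are arrays, and the condition is checked element by element.

```python
from itertools import chain
from typing import Sequence

Arrays = Sequence[Sequence[float]]


def sum_array(arrays: Arrays) -> float:
    """sumArray(arr): sum of every element of every row's array."""
    return sum(chain.from_iterable(arrays))


def sum_array_if(arrays: Arrays, conds: Sequence[int]) -> float:
    """sumArrayIf(arr, cond): -If wraps sumArray, so cond is one value per row
    and keeps or drops that row's whole array."""
    return sum_array([a for a, c in zip(arrays, conds) if c])


def sum_if_array(arrays: Arrays, cond_arrays: Sequence[Sequence[int]]) -> float:
    """sumIfArray(arr, cond_arr): -Array wraps sumIf, so cond is an array too
    and is checked element by element (arrays in a row must have equal length)."""
    return sum(
        x
        for xs, cs in zip(arrays, cond_arrays)
        for x, c in zip(xs, cs)
        if c
    )
```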
. : sumForEachStateForEachIfArrayIfState. , , , . . , .
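Combinators compose, which is why long names like the one above are legal. Another combinator in that name is -ForEach. It aggregates the i-th elements of all arrays separately and returns an array of results. For example, `sumForEach` over `[1, 2]`, `[3, 4, 5]` and `[6, 7]` gives `[10, 13, 5]`. The helper below is an illustrative sketch, not ClickHouse code; it treats positions past the end of a shorter array as absent.

```python
from itertools import zip_longest
from typing import Callable, Iterable, List, Sequence

_MISSING = object()  # marks positions past the end of a shorter array


def for_each(agg: Callable[[Iterable[float]], float]) -> Callable[[Sequence[Sequence[float]]], List[float]]:
    """-ForEach: apply agg to each array position independently."""
    def agg_for_each(arrays: Sequence[Sequence[float]]) -> List[float]:
        columns = zip_longest(*arrays, fillvalue=_MISSING)
        return [agg(v for v in column if v is not _MISSING) for column in columns]
    return agg_for_each


sum_for_each = for_each(sum)
max_for_each = for_each(max)
```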
. , , . , . .
, , . . .
, - , .
, count distinct? Hash table. hash- .
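Exact count distinct needs a hash table that holds every distinct value, which is what makes it expensive. ClickHouse documents `uniq` as using adaptive sampling. What follows is a simplified sketch of that idea, not ClickHouse's implementation; the `max_size` default and the `blake2b`-based hash are assumptions.

- The state keeps only hashes whose lowest `skip_degree` bits are zero.
- When the state grows too large, `skip_degree` is increased, and roughly half of the kept hashes are dropped.
- The estimate is the number of kept hashes times `2 ** skip_degree`.

Two such states can be merged, which is what the -State and -Merge combinators rely on.

```python
import hashlib


class AdaptiveSample:
    """Simplified adaptive-sampling state for approximate count distinct."""

    def __init__(self, max_size: int = 1024):
        self.max_size = max_size
        self.skip_degree = 0
        self.hashes = set()

    @staticmethod
    def _hash(value) -> int:
        digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def _good(self, h: int) -> bool:
        # Keep a hash only if its lowest skip_degree bits are all zero.
        return h & ((1 << self.skip_degree) - 1) == 0

    def _shrink(self) -> None:
        while len(self.hashes) > self.max_size:
            self.skip_degree += 1
            self.hashes = {h for h in self.hashes if self._good(h)}

    def add(self, value) -> None:
        h = self._hash(value)
        if self._good(h):
            self.hashes.add(h)
            self._shrink()

    def merge(self, other: "AdaptiveSample") -> None:
        """Align to the stricter skip degree, union, and re-filter."""
        self.skip_degree = max(self.skip_degree, other.skip_degree)
        self.hashes = {h for h in self.hashes | other.hashes if self._good(h)}
        self._shrink()

    def estimate(self) -> int:
        return len(self.hashes) << self.skip_degree
```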
, , . : Β« , , Β». - .
, ClickHouse . -State, . , , AggregateFunction - .
, . . . AggregateFunction. , , .
-Merge. , , .
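Conceptually, `avgState` returns an intermediate state rather than a number. `avgMerge` combines such states and only then produces the final result. The sketch models this with a hypothetical `AvgState` that holds a running sum and count. ClickHouse's real state is a binary value of type `AggregateFunction(avg, ...)`, and its layout is not modelled here. The point of the sketch is that merging states gives the correct average, while averaging per-part averages does not.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AvgState:
    """Conceptual avg state: running sum and count."""
    total: float = 0.0
    count: int = 0

    def add(self, x: float) -> None:
        self.total += x
        self.count += 1

    def merge(self, other: "AvgState") -> "AvgState":
        return AvgState(self.total + other.total, self.count + other.count)

    def finalize(self) -> float:
        return self.total / self.count if self.count else float("nan")


def avg_state(values: Iterable[float]) -> AvgState:
    """Like avgState(x): build a state instead of a final number."""
    state = AvgState()
    for v in values:
        state.add(v)
    return state


def avg_merge(states: Iterable[AvgState]) -> float:
    """Like avgMerge(state): combine states, then finalize once."""
    merged = AvgState()
    for s in states:
        merged = merged.merge(s)
    return merged.finalize()
```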
, .
uniq . .
. -State. - , . - . , UTF-8. , , UTF-8.
? AggregateFunction . . - , - .
. , , . ?
- .
, ClickHouse - , - . clickstream, , , . , . ClickHouse , .
. - , , .
. AggregatingMergeTree. , . . , - . . , . . , , count distinct, .
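A rough model of incremental aggregation with AggregatingMergeTree, under simplifying assumptions:

- Each INSERT becomes a data part of `(key, state)` rows, produced by something like `INSERT ... SELECT key, avgState(x) ... GROUP BY key`.
- The state is a `(sum, count)` pair.
- A background merge collapses rows with equal keys by merging their states.

All names are hypothetical. Because merges happen at unpredictable times, a query still has to apply the -Merge step itself and cannot assume one row per key.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

State = Tuple[float, int]  # (sum, count): a stand-in for an avg state


def insert_part(rows: Iterable[Tuple[str, float]]) -> List[Tuple[str, State]]:
    """Like INSERT ... SELECT key, avgState(x) ... GROUP BY key: one state per key."""
    states: Dict[str, State] = defaultdict(lambda: (0.0, 0))
    for key, x in rows:
        s, c = states[key]
        states[key] = (s + x, c + 1)
    return sorted(states.items())


def merge_parts(parts: Iterable[List[Tuple[str, State]]]) -> List[Tuple[str, State]]:
    """What a background merge does: rows with the same key get their states merged."""
    merged: Dict[str, State] = defaultdict(lambda: (0.0, 0))
    for part in parts:
        for key, (s, c) in part:
            ms, mc = merged[key]
            merged[key] = (ms + s, mc + c)
    return sorted(merged.items())


def avg_merge(rows: Iterable[Tuple[str, State]]) -> Dict[str, float]:
    """The query side: merge whatever rows exist per key, then finalize."""
    return {key: s / c for key, (s, c) in merge_parts([list(rows)])}
```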
, . ClickHouse, , .
. ?
- - , . , . , . , ClickHouse-, . .
- , , , , - . sum sumIf - , .
- , *. , , , arrayReduce. , , , , . . -State, .
* 2020 , initializeAggregation.
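`arrayReduce` applies an aggregate function, given by name, to the elements of an array, so aggregates can run over the array inside a single row. Below is a tiny sketch with a hypothetical registry of a few functions; the `avg` entry uses `statistics.mean`, and none of this is ClickHouse code. The footnoted `initializeAggregation` works in the other direction: it builds an aggregate state from a value. This sketch does not model it.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, List

# Hypothetical registry: aggregate name -> function over a list of values.
AGGREGATES: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "min": min,
    "max": max,
    "avg": mean,
    "uniqExact": lambda values: len(set(values)),
}


def array_reduce(name: str, array: Iterable[float]) -> float:
    """Apply the named aggregate to all elements of one array."""
    return AGGREGATES[name](list(array))
```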
ClickHouse - .
, ClickHouse , . . INSERT, , . . INSERT , . , .
conflict-free , update. INSERT. , .
multi-master. . , . , , . . . exactly-once .
, . , INSERT, , , , .
. , INSERT.
. INSERT, SELECT, .
- INSERT. , , . insert_quorum = 2. .
, INSERT , , INSERT, .
SELECT , select_sequential_consistency. , . select_linearizability, .
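Below is a toy model of quorum inserts and sequentially consistent reads. It makes heavy simplifications:

- Replicas are in-memory objects.
- A write goes to every available replica at once. In ClickHouse, one replica receives the data and the others fetch it, with coordination through ZooKeeper.
- An INSERT that cannot reach the quorum is rejected up front.

With `insert_quorum = 2`, the INSERT is acknowledged only after two replicas hold the block. With `select_sequential_consistency` enabled, a SELECT may only be served by a replica that has every quorum-acknowledged block.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    name: str
    available: bool = True
    blocks: List[str] = field(default_factory=list)


class QuorumError(Exception):
    """Raised when fewer than insert_quorum replicas can take the block."""


def quorum_insert(replicas: List[Replica], block: str, insert_quorum: int) -> List[str]:
    """Acknowledge the INSERT only if at least insert_quorum replicas store the block.
    Simplification: the block is written to every available replica at once."""
    targets = [r for r in replicas if r.available]
    if len(targets) < insert_quorum:
        raise QuorumError(f"{len(targets)} replicas available, quorum is {insert_quorum}")
    for r in targets:
        r.blocks.append(block)
    return [r.name for r in targets]


def replicas_for_select(replicas: List[Replica], acknowledged: List[str]) -> List[str]:
    """With select_sequential_consistency: only replicas holding every
    quorum-acknowledged block may serve the SELECT."""
    return [
        r.name
        for r in replicas
        if r.available and all(b in r.blocks for b in acknowledged)
    ]
```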
, SELECT, ZooKeeper. SELECT , , . . INSERT, .
, , - , . , , .
, . , , , .
. .
, , GROUP BY, , , - . , .
. . . .
. progress-bar, . ClickHouse- - , . , , . , 9,31 , 10 .
, : « 10 ?». . .
, . , , . - 0,5 . , , .
- , 10 . , , . , - .
: « max_memory_usage »?
, . . , production .
- GROUP BY . ? , - . , . , . , . . . . . - . , .
, :
- - , .
- - merge, . . merge , .
. dataset - . . datasets - , buckets. .
. max_memory_usage 10 . 8 (max_bytes_before_external_group_by = 8 000 000 000, distributed_aggregation_memory_efficient = 1). progress-bar - . . ? , . , , .
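The two-level scheme described above can be sketched as a simplified in-memory model, not ClickHouse code. The memory limit is expressed as a number of keys rather than bytes, and Python lists stand in for temporary files.

1. Keys are split into buckets by hash.
2. When the in-memory hash table grows past the limit, it is dumped bucket by bucket.
3. At the end, each bucket is merged on its own, so only one bucket's keys are in memory during the final merge.

The 256-bucket count is an assumption for the sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List

NUM_BUCKETS = 256  # assumed bucket count for two-level aggregation


def bucket_of(key: Hashable) -> int:
    return hash(key) % NUM_BUCKETS


def spill(table: Counter, spilled: Dict[int, List[Counter]]) -> None:
    """Dump the in-memory table, split by bucket (lists stand in for temp files)."""
    by_bucket: Dict[int, Counter] = defaultdict(Counter)
    for key, count in table.items():
        by_bucket[bucket_of(key)][key] += count
    for bucket, part in by_bucket.items():
        spilled[bucket].append(part)


def external_count(keys: Iterable[Hashable], max_keys_in_memory: int) -> Dict[Hashable, int]:
    """SELECT key, count() ... GROUP BY key with spilling and a bucket-wise merge."""
    spilled: Dict[int, List[Counter]] = defaultdict(list)
    current: Counter = Counter()
    for key in keys:
        current[key] += 1
        if len(current) > max_keys_in_memory:
            spill(current, spilled)
            current = Counter()
    spill(current, spilled)
    result: Dict[Hashable, int] = {}
    for bucket in sorted(spilled):
        merged: Counter = Counter()
        for part in spilled[bucket]:
            merged.update(part)  # only this bucket's keys are held here
        result.update(merged)
    return result
```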
. , , . , , . . . , .
, . . . , 10 GROUP BY, , 1 .
, , , .
. .
, ClickHouse - - geospatial-, .
- . . , , .
? , . , - , . , , .
? , .
pointInPolygon. - : lat, lon, . *. - .
* 2020 , .
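`pointInPolygon` answers whether a point lies inside a polygon. One standard way to do this is even-odd ray casting, sketched below as an illustration. It is not necessarily the algorithm ClickHouse uses, it ignores holes, and a point exactly on an edge may go either way. Coordinates are plain `(x, y)` pairs, for example `(lon, lat)`.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Even-odd ray casting: cast a ray towards +x and count edge crossings.
    The polygon is a list of vertices and is closed implicitly."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```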
, . , -, . , . . .
, pointInPolygon, . pointInEllipses, ellipses . , , , . , . . 0 1, ellipses.
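`pointInEllipses` returns 1 if a point lies inside at least one of several axis-aligned ellipses, and 0 otherwise. This makes it a cheap filter compared to polygons. The sketch below assumes each ellipse is given by its centre `(x0, y0)` and semi-axes `a` and `b`, in the order of ClickHouse's argument list. Treating points on the boundary as inside is a choice made for this sketch.

```python
from typing import Iterable, Tuple

Ellipse = Tuple[float, float, float, float]  # (x0, y0, a, b)


def point_in_ellipses(x: float, y: float, ellipses: Iterable[Ellipse]) -> int:
    """1 if (x, y) is inside (or on) at least one axis-aligned ellipse, else 0."""
    for x0, y0, a, b in ellipses:
        if ((x - x0) / a) ** 2 + ((y - y0) / b) ** 2 <= 1:
            return 1
    return 0
```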
- greatCircleDistance. *. . , .
* 2020 , geoDistance, WGS-84 .
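`greatCircleDistance` treats the Earth as a sphere. Below is a haversine sketch that uses the same argument order as ClickHouse: longitude first, degrees in, metres out. The radius of 6,371,000 m is an assumption, and ClickHouse's own constant and numeric method may differ slightly. As the footnote notes, `geoDistance` uses the WGS-84 ellipsoid instead; this sketch does not attempt that.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # assumed mean radius; ClickHouse's constant may differ


def great_circle_distance(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Haversine distance in metres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))
```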
https://events.yandex.ru/lib/talks/5330/
- ClickHouse . ClickHouse. , , - . ClickHouse. .
. . , . modelEvaluate.
, , .
, CTR . . , : https://events.yandex.ru/lib/talks/5330/. .
. CatBoost. CatBoost? .
, ?
- - , . , - , . *. , . - , , . .
- , , ClickHouse . . , - . , , . , . , ClickHouse. . .
- , , -, . . .
* ClickHouse.
. ClickHouse ClickHouse-.
- . ClickHouse. - . .
? grep, sed, awk perl. , , - , ClickHouse , . ClickHouse - , . . clickhouse-local.
. - , . . . , ClickHouse - .
, . , , JSON, JSONEachRow. stdin . , , . , , - ClickHouse. , , . , awk, perl, sed. , grep, . . .
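To make the data flow concrete, here is a few-line Python approximation of what clickhouse-local does for such a task. It reads JSONEachRow (one JSON object per line) from a stream and runs a GROUP BY. With clickhouse-local itself, the same job is roughly a shell pipeline that feeds stdin into a command with three parts:

- `--input-format JSONEachRow`
- a `--structure` describing the columns
- a `--query` over the default table named `table`

The field names in the test data are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_field(lines: Iterable[str], field: str) -> Counter:
    """Roughly SELECT field, count() FROM table GROUP BY field over JSONEachRow input.
    Rows without the field are counted under None; blank lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line).get(field)] += 1
    return counts
```

In a real script, `count_by_field(sys.stdin, "domain")` would read the piped input line by line.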
, , , , β ClickHouse-. . .
ClickHouse-, ClickHouse-local , . , ClickHouse- .
? , ?
Date DateTime. Date , , ISO 8601
, DateTime , - , . , , , *.
, . parseDateTimeBestEffort. , . , . . , , , .
* , date_time_input_format
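`parseDateTimeBestEffort` accepts many date and time notations and picks the right interpretation. The sketch below only imitates the idea. It tries a hypothetical, much shorter list of layouts and also accepts Unix timestamps; it is not ClickHouse's algorithm and covers far fewer cases. Results without an explicit offset come back as naive datetimes, whereas ClickHouse would interpret them in the server's or a given time zone.

```python
from datetime import datetime, timezone

# Hypothetical list of layouts; ClickHouse's parser is far more general.
FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S",
    "%d/%m/%Y %H:%M:%S",
    "%Y-%m-%d",
)


def parse_datetime_best_effort(text: str) -> datetime:
    """Try a Unix timestamp first, then each layout in turn."""
    text = text.strip()
    if text.isdigit():  # treat a bare integer as a Unix timestamp
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    if text.endswith("Z"):  # normalise the UTC designator for %z
        text = text[:-1] + "+0000"
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"cannot parse {text!r}")
```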
.
, , Hadoop . ClickHouse-local MapReduce jobs Hadoop.
, Parquet. pull request. , *.
* , - !
, , trash SQL. , - regexp- , clickhouse-local. , Awk, , ClickHouse*.
* - Regexp
.
, , .
? ClickHouse . . . , , . , .
? . , INSERT SELECT . . , .
, . . , , , , , - , , . - . .
, ClickHouse-copier. , . ZooKeeper. - , .
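Below is a toy model of the resharding part of inter-cluster copying. Its assumptions are made for this sketch and are not taken from clickhouse-copier:

- Work is split into small, independent tasks, one per (source shard, partition).
- Rows go to destination shard `key % n_dest`.
- Each task writes its output under its own slot, so a retried task replaces its previous output instead of duplicating it.

The real tool coordinates tasks between workers through ZooKeeper, which is not modelled here.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Task = Tuple[int, str]  # (source shard, partition)


def plan_tasks(source: Dict[int, List[Row]]) -> List[Task]:
    """One small, independent task per (source shard, partition)."""
    return sorted({(shard, row["partition"]) for shard, rows in source.items() for row in rows})


def run_task(task: Task, source: Dict[int, List[Row]], n_dest: int,
             dest: Dict[Tuple[int, str, int], List[Row]]) -> None:
    """Copy one partition of one source shard, resharding rows by key % n_dest.
    Output goes to dest[(dest_shard, partition, source_shard)], so a rerun overwrites it."""
    shard, partition = task
    out: Dict[int, List[Row]] = defaultdict(list)
    for row in source[shard]:
        if row["partition"] == partition:
            out[row["key"] % n_dest].append(row)
    for dest_shard, rows in out.items():
        dest[(dest_shard, partition, shard)] = rows
```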
. . . , .
- , . - .
- production.
.. 538 . 240 . , . .
. , . . lz4, zstd . . , . ClickHouse-copier. , , . - .
, - . , , . , , , , .
, !
! . . real time , ?
. , .
. ., , , ? ? ?
. , , , . - . , . . . . , . . . . , . .
. . ClickHouse?
, ClickHouse. . . . , . , , . , , , .
! Avro, Parquet, MapReduce machine learning, - , ClickHouse - jobs YARN, Mesos? . . , Spark, , , data locality , ?
, , , . , Hadoop MapReduce YT. , ClickHouse YT, , . - , .
. . . YT - ? . . , ClickHouse, tool CSV job?
*. , - . ClickHouse-, - , ClickHouse, YT. .
* .
! ClickHouse-local - - - , ?
, standalone-. - ClickHouse. . . ClickHouse, , , local, , .
! . groupUniqArrayArray order , ?
, , , .
! ! ClickHouse. . Vertica. , . ? CSV. CSV ClickHouse. ClickHouse , . . , , , . , , . - ?
ClickHouse JOIN , ? *. ClickHouse hash JOIN, . . , hash- . . . , , - . , merge JOIN, , .
* , join_algorithm
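The difference between the two JOIN strategies mentioned here can be sketched in Python. This illustrates the textbook algorithms, not ClickHouse's implementation, and the row format (dicts joined on one key column) is an assumption.

- A hash join builds a hash table from the right-hand table, so that table has to fit in memory.
- A merge join walks two inputs sorted by the key, holding only one run of equal keys from the right side at a time.

In later ClickHouse versions, the choice is exposed through the `join_algorithm` setting mentioned in the footnote.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Build a hash table on the right side, then probe it with each left row."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for rrow in right:
        table[rrow[key]].append(rrow)
    return [(lrow, rrow) for lrow in left for rrow in table.get(lrow[key], [])]


def merge_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Walk both inputs in key order and pair up runs of equal keys."""
    # Sorted in memory only for the sketch; a real merge join consumes sorted streams.
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out: List[Tuple[Row, Row]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            run_end = j
            while run_end < len(right) and right[run_end][key] == lk:
                run_end += 1
            while i < len(left) and left[i][key] == lk:
                out.extend((left[i], rrow) for rrow in right[j:run_end])
                i += 1
            j = run_end
    return out
```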
. , , .
! , , master-master , , , ? , , ?
- , pull request. . , . Master-master . -, ClickHouse ZooKeeper. . , , , - - .
! ! ClickHouse-copier , , , ?
ClickHouse-copier , operation-. . , , , . . . , . . , -, . . , ZooKeeper. - *.
* , .