Alexey Milovidov develops ClickHouse and, of course, knows it inside and out, including how it can be used beyond its standard, well-known functions.
Today he talks about these unusual uses, some of which may have nothing to do with storing and processing data.
ClickHouse for hardware tests
The easiest thing to do with ClickHouse, if you have spare servers, is to use it for hardware tests. Its test dataset contains the same data as Yandex production, only anonymized, and it is publicly available for testing. I talked about how to prepare good anonymized data at Saint HighLoad++ 2019 in St. Petersburg.
ClickHouse can be installed on any Linux (x86_64, AArch64) or on Mac OS. How? We build it for every commit and pull request, and the ClickHouse Build Check shows the details of all available builds:
β gcc clang , debug, , x86, ARM Mac OS. ClickHouse : CPU, . β , .
, . 30 ClickHouse. ClickHouse, .
β . , , , SPECint SPEC. ClickHouse , .
ClickHouse
, ClickHouse β + . - . ClickHouse, code.txt:
, , , C++ . shell- , . , , , β , β , Β«return falseΒ».
The run takes 1.665 seconds. With LC_ALL=C set, it takes 0.376 seconds, roughly 4 to 5 times faster.
? , clickhouse-local, .
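The same task can be done with clickhouse-local, which runs an SQL query over a file: you specify the structure and the input format (here TabSeparated) and write the query. It finishes in 0.103 seconds, 3.7 to 16 times faster than the shell runs above (with and without LC_ALL=C, respectively).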
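To make the comparison concrete, here is a minimal sketch of both approaches. The sample file is made up for illustration (it is not the real code.txt), and the SQL is an assumption about the general shape of such a query, not the exact one from the talk.

```shell
# Hypothetical sample standing in for code.txt (not the real ClickHouse sources).
workdir=$(mktemp -d)
printf '%s\n' '}' 'return false;' '{' '}' 'return false;' '}' > "$workdir/code.txt"

# Classic shell pipeline: count identical lines, most frequent first.
# LC_ALL=C makes sort compare raw bytes instead of using locale collation.
top_shell=$(LC_ALL=C sort "$workdir/code.txt" | uniq -c | sort -rn | head -n 1 | awk '{print $2}')
echo "most frequent line (sort | uniq -c): $top_shell"

# The same question as SQL. --structure and --input-format describe stdin,
# which clickhouse-local exposes as a table named "table".
if command -v clickhouse-local >/dev/null 2>&1; then
    clickhouse-local --structure 'line String' --input-format TabSeparated \
        --query "SELECT line, count() AS c FROM table GROUP BY line ORDER BY c DESC LIMIT 3" \
        < "$workdir/code.txt"
else
    echo "clickhouse-local not found; skipping the SQL variant"
fi
```

On a file this small the timings mean nothing; the point is only that the two commands answer the same question.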
- , GitHub Archive β , GitHub, , issue, , -. https://www.gharchive.org/ ( 890 ):
The data can be queried in place with clickhouse-local:
time clickhouse-local --query "SELECT * FROM
file('*.json.gz', TSV, 'data String')
WHERE JSONExtractString(data, 'actor', 'login') = 'alexey-milovidov'
LIMIT 10" | jq
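The query uses the file table function over *.json.gz and reads every line in TSV format as a single String column. It then extracts the 'login' key inside 'actor' from the JSON with JSONExtractString and returns the first 10 matching GitHub events.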
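The query can be tried without downloading the archive. The sketch below builds a tiny stand-in file; both events and the file name sample.json.gz are invented for illustration and only mimic the one-JSON-object-per-line layout.

```shell
workdir=$(mktemp -d)

# Two made-up events, one JSON object per line, compressed like the real dumps.
printf '%s\n' \
    '{"type":"PushEvent","actor":{"login":"alexey-milovidov"}}' \
    '{"type":"IssuesEvent","actor":{"login":"someone-else"}}' \
    | gzip > "$workdir/sample.json.gz"

# Baseline without ClickHouse: decompress and count matching lines.
matches=$(gzip -dc "$workdir/sample.json.gz" | grep -c '"login":"alexey-milovidov"')
echo "matching events (gzip | grep): $matches"

# Same filter through clickhouse-local; the glob is resolved relative to
# the current directory, so run it from inside the working directory.
if command -v clickhouse-local >/dev/null 2>&1; then
    (
        cd "$workdir" || exit 1
        clickhouse-local --query "SELECT count() FROM
            file('*.json.gz', TSV, 'data String')
            WHERE JSONExtractString(data, 'actor', 'login') = 'alexey-milovidov'"
    )
else
    echo "clickhouse-local not found; skipping the SQL variant"
fi
```

Unlike the grep baseline, the SQL filter looks at the parsed 'actor.login' value, so it will not match the same text appearing in some other field.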
, 890 1,3 . . , 10 , . , , , GitHub.
clickhouse-local --query "SELECT count() FROM
file('*.json.gz', TSV, 'data String')
WHERE JSONExtractString(data, 'actor', 'login') = 'alexey-milovidov'"
The SELECT count() query has to read the whole dataset. Resource usage during the run can be watched with dstat:
, 530 / β ( RAID HDD).
ClickHouse local 980 . ClickHouse url β file https://.../*.json.gz, .
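This example used several ClickHouse features:
The file table function.
Glob patterns in the file path.
Transparent decompression of gzip, xz and zstd files; here the .gz extension is enough.
JSON functions that extract values from JSON stored in a string.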
. , . , .
.
, , . β MergeTree. : , SELECT clickhouse-client. β , protobuf JSON :
clickhouse-local --input-format Protobuf --format-schema -
--output-format JSONEachRow ...
Serverless ClickHouse
ClickHouse can also run in a serverless environment. For example, Alex Reid ran ClickHouse on Google Cloud Run: https://mybranch.dev/posts/clickhouse-on-cloud-run/.
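Text data usually arrives as tab-separated (TSV) or comma-separated (CSV) files. For other layouts there is the CustomSeparated format, in which the delimiters and the escaping rule are configurable.
CustomSeparated settings: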
format_custom_escaping_rule
format_custom_field_delimiter
format_custom_row_before/between/after_delimiter
format_custom_result_before/after_delimiter
, . β . , CSV, JSON, CSV. , . | . , ..
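As a rough illustration of these settings, the sketch below prints a generated three-row table with ' | ' between fields and CSV-style escaping. The query and the column names are made up; only the two settings come from the list above, and they are passed as command-line options.

```shell
# Made-up query over the built-in numbers() table function.
query="SELECT number AS id, concat('item ', toString(number)) AS name
FROM numbers(3)
FORMAT CustomSeparated"

if command -v clickhouse-local >/dev/null 2>&1; then
    # Format settings are passed as ordinary command-line options.
    clickhouse-local \
        --format_custom_field_delimiter ' | ' \
        --format_custom_escaping_rule CSV \
        --query "$query"
else
    echo "clickhouse-local not found; the query would have been:"
    echo "$query"
fi
```

The row and result-set delimiters from the list can be set in the same way when the output needs a prefix, suffix or wrapper.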
Another format is Template:
format_template_resultset
format_template_row
format_template_rows_between_delimiter
, , , . XML, .
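Here is a minimal sketch of producing XML-like output with the Template format. The template file names, the tag names and the query are invented for illustration. The sketch also assumes that clickhouse-local accepts absolute paths to template files and that rows are separated by a newline when format_template_rows_between_delimiter is left unset.

```shell
workdir=$(mktemp -d)

# Row template: each placeholder is ${column:EscapingRule}.
# Single quotes keep the shell from expanding ${...}.
printf '%s' '  <row><id>${id:XML}</id><name>${name:XML}</name></row>' > "$workdir/row.tpl"

# Result-set template: ${data} is replaced by all formatted rows.
printf '%s\n' '<rows>' '${data}' '</rows>' > "$workdir/resultset.tpl"

if command -v clickhouse-local >/dev/null 2>&1; then
    clickhouse-local \
        --format_template_row "$workdir/row.tpl" \
        --format_template_resultset "$workdir/resultset.tpl" \
        --query "SELECT number AS id, concat('item <', toString(number), '>') AS name
                 FROM numbers(2) FORMAT Template"
else
    echo "clickhouse-local not found; templates were written to $workdir"
fi
```

The XML escaping rule is what keeps the angle brackets inside the generated names from breaking the markup.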
Regexp:
format_regexp
format_regexp_escaping_rule
format_regexp_skip_unmatched
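With this format, clickhouse-local can be used in place of awk. The Regexp format takes a regular expression with subpatterns, and each subpattern is parsed as a column.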
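A small sketch of the awk comparison, assuming a made-up three-field log format (method, path, status). The log lines, the regular expression and the column names are all hypothetical.

```shell
workdir=$(mktemp -d)
printf '%s\n' \
    'GET /index.html 200' \
    'POST /login 403' \
    'GET /missing 404' > "$workdir/access.log"

# awk baseline: count requests whose status code is 400 or higher.
errors_awk=$(awk '$3 >= 400 { n++ } END { print n + 0 }' "$workdir/access.log")
echo "errors (awk): $errors_awk"

# Same count with the Regexp input format: each capture group becomes a
# column of the declared structure, in order.
if command -v clickhouse-local >/dev/null 2>&1; then
    clickhouse-local \
        --input-format Regexp \
        --format_regexp '(\w+) (\S+) (\d+)' \
        --format_regexp_escaping_rule Escaped \
        --format_regexp_skip_unmatched 1 \
        --structure 'method String, path String, status UInt16' \
        --query "SELECT count() FROM table WHERE status >= 400" \
        < "$workdir/access.log"
else
    echo "clickhouse-local not found; skipping the SQL variant"
fi
```

With format_regexp_skip_unmatched set to 1, lines that do not match the expression are skipped rather than raising an error.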
ClickHouse
β ClickHouse. Mongo, . ClickHouse, β .
, , , β . 'message' String. JSON, JSON . β , , 'actor.login', JSON β . ClickHouse , ALTER :
, actor_login , SELECT , β . :
ALTER TABLE logs UPDATE actor_login = actor_login WHERE 1
, .
MySQL
ClickHouse MySQL. : , , , , ( , ), SELECT 15 :
: MySQL , MySQL , β 15 . , MySQL ?
5 41 β ! ClickHouse - β MySQL ClickHouse . MySQL β ?
β . ClickHouse ββ (20577 13772), MySQL β (44744), collation ( ) GROUP BY. , , :
, . ClickHouse , . . MySQL ClickHouse . MySQL :
, . SELECT:
6 , β , , , . MySQL ClickHouse . MySQL , MySQL ClickHouse-, ClickHouse. Distributed , , ClickHouse- ClickHouse, , MySQL.
, - ( ClickHouse). :
, β MergeTree . SELECT:
, SELECT 0,6 . , β ClickHouse!
ClickHouse MySQL. MySQL ClickHouse , , , MySQL. ClickHouse:
ClickHouse . , odbc PostgreSQL, url β REST-. :
: ClickHouse postgresql, PostgreSQL PostgreSQL. .
ClickHouse
ClickHouse can apply trained CatBoost models with the modelEvaluate function.
. : , , : , , . β , , . ClickHouse CatBoost, .
ClickHouse. β , . , , , GROUP BY:
State:
SELECT stochasticLogisticRegressionState(...
k . AggregateFunction(stochasticLogisticRegression(01, 00, 10, 'Adam'), ...), . applyMLModel:
. , , :
ClickHouse
, ClickHouse β , . , . , , pagerank:
, , , . , Amos Bird. , open-source. .
UDF ClickHouse
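ClickHouse does not have user defined functions (UDF) yet. There is a workaround, though: a cache dictionary with an executable source. ClickHouse writes the requested keys to the program's stdin and reads the values from its stdout.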
Python, , β , β ClickHouse, user defined function.
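The program behind such a dictionary can be very small. Below is a sketch of one, under two assumptions: the source is configured with the TabSeparated format, and the dictionary has a UInt64 key and a single attribute. The function name square_udf and the squaring logic are hypothetical, and the dictionary configuration itself is not shown.

```shell
# Hypothetical "function": the square of a UInt64 key.
# The program reads requested keys from stdin, one per line, and prints
# "key<TAB>value" rows to stdout for the dictionary to load.
square_udf() {
    awk '{ printf "%s\t%s\n", $1, $1 * $1 }'
}

# Local check without ClickHouse: feed keys the way the server would.
printf '2\n3\n' | square_udf
```

Once a dictionary (say, one named square with an attribute value, both names being assumptions) points at this program, a lookup such as dictGet('square', 'value', toUInt64(3)) behaves like a function call, and the cache layout avoids rerunning the program for keys it has already seen.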
Note: UDFs are on the roadmap for 2021.
ClickHouse GPU Application Server
Zhang2014 β ClickHouse Application Server. Zhang2014 pull request, HTTP- (SELECT INSERT). POST - , - GET , , SELECT .
ClickHouse β , - , , , - . , ClickHouse - . production!