How to use ClickHouse for things it was not designed for

Alexey Milovidov develops ClickHouse and, of course, knows it inside and out, including how it can be used beyond its standard, well-known functions.





Today he will talk about these unusual uses, some of which may not even involve storing and processing data.





ClickHouse for hardware tests

The easiest thing to do with ClickHouse if you have spare servers is to use it for hardware tests. Its test dataset contains real production data from Yandex, anonymized and publicly available for testing. I talked about how to prepare good anonymized data at Saint HighLoad++ 2019 in St. Petersburg.





ClickHouse can be installed on any Linux (x86_64, AArch64) or Mac OS. How? We build it for every commit and pull request. The ClickHouse Build Check shows the details of all available builds:





β€” gcc clang , debug, , x86, ARM Mac OS. ClickHouse : CPU, . β€” , . 





, . 30 ClickHouse. ClickHouse, .





:





:





β€” . , , , SPECint SPEC. ClickHouse , .





ClickHouse

, ClickHouse β€” + . - . ClickHouse, code.txt:





, , , C++ . shell- , . , , , β€” , β€” , Β«return falseΒ».





1,665 . . , LC_ALL=C, 0,376 , 5 . - . 





? , clickhouse-local, . 





- , , β€” clickhouse-local SQL . , ( , β€” TabSeparated), . 0.103 β€” 3,7–16 ( , ).





- , GitHub Archive β€” , GitHub, , issue, , -. https://www.gharchive.org/ ( 890 ):





- , ClickHouse local:





time clickhouse-local --query "SELECT * FROM
file('*.json.gz', TSV, 'data String')
WHERE JSONExtractString(data, 'actor', 'login') = 'alexey-milovidov'
LIMIT 10" | jq








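The query uses the file table function to read all *.json.gz files as TSV with a single String column. From each JSON line it extracts the 'login' field inside 'actor', keeps the rows where it matches my login, and prints the first 10 of my actions on GitHub.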





, 890 1,3 . . , 10 , . , , , GitHub.





clickhouse-local --query "SELECT count() FROM
file('*.json.gz', TSV, 'data String')
WHERE JSONExtractString(data, 'actor', 'login') = 'alexey-milovidov'"
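To try the shape of this query without downloading the archive, one could run it against a tiny synthetic file. This is a minimal sketch, assuming clickhouse-local is on PATH; the three sample events and the file name sample.json.gz are made up for illustration and are not GitHub Archive data.

```shell
# Work in a scratch directory so the glob only matches the sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three made-up events, one JSON object per line (hypothetical data).
cat > sample.json <<'EOF'
{"type":"PushEvent","actor":{"login":"alexey-milovidov"}}
{"type":"IssuesEvent","actor":{"login":"someone-else"}}
{"type":"WatchEvent","actor":{"login":"alexey-milovidov"}}
EOF
gzip sample.json   # produces sample.json.gz

# Same query as in the text; only the input data is synthetic.
if command -v clickhouse-local >/dev/null 2>&1; then
    clickhouse-local --query "SELECT count() FROM
        file('*.json.gz', TSV, 'data String')
        WHERE JSONExtractString(data, 'actor', 'login') = 'alexey-milovidov'"
else
    echo "clickhouse-local is not installed; skipping the query" >&2
fi
```

Reading JSON lines as TSV works here because the sample contains no tabs or backslashes; real data may need more care.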








SELECT COUNT... , . , dstat:





, 530 / β€” ( RAID HDD).





ClickHouse local 980 . ClickHouse url β€” file https://.../*.json.gz, . 





ClickHouse, :





  1. file.





  2. glob patterns. glob patterns (, .)





  3. gzip, xz zstd . gz .





  4. JSON. , JSON, . - , .





  5. . , . , .





  6. .





, , . β€” MergeTree. : , SELECT clickhouse-client. β€” ,   protobuf JSON : 





clickhouse-local --input-format Protobuf --format-schema -

--output-format JSONEachRow ...
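The same conversion idea can be tried with formats that need no schema file. The sketch below swaps Protobuf for CSV as the input format (an assumption made only so the example is self-contained); the column names and the input.csv file are hypothetical.

```shell
workdir=$(mktemp -d)

# Hypothetical two-row input file.
printf '1,hello\n2,world\n' > "$workdir/input.csv"

# Read CSV from stdin, describe its columns with --structure,
# and write every row back out as one JSON object per line.
if command -v clickhouse-local >/dev/null 2>&1; then
    clickhouse-local \
        --input-format CSV \
        --structure 'id UInt32, name String' \
        --output-format JSONEachRow \
        --query "SELECT * FROM table" \
        < "$workdir/input.csv"
else
    echo "clickhouse-local is not installed; skipping the conversion" >&2
fi
```

Data piped to stdin is exposed under the default table name `table`, which is why the query selects from it.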








: GitHub Archive .





Serverless ClickHouse

ClickHouse serverless-. , ClickHouse - Google Cloud Run: https://mybranch.dev/posts/clickhouse-on-cloud-run/ (Alex Reid). ClickHouse .





, , tab separated (TSV) comma separated (CSV). CustomSeparated, , .





CustomSeparated:





format_custom_escaping_rule







format_custom_field_delimiter







format_custom_row_before/between/after_delimiter







format_custom_result_before/after_delimiter







, . β€” . , CSV, JSON, CSV. , . | . , ..
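A rough illustration of how the CustomSeparated settings listed above might be combined. It assumes clickhouse-local is on PATH and that settings can be passed as command-line options; the query and the chosen delimiters are only examples.

```shell
# Hypothetical demo query; any SELECT would do.
query="SELECT number AS id, concat('item ', toString(number)) AS name FROM numbers(3)"

if command -v clickhouse-local >/dev/null 2>&1; then
    # Fields are escaped by the CSV rules and separated by '|';
    # each row is wrapped in square brackets and ends with a newline.
    clickhouse-local \
        --output-format CustomSeparated \
        --format_custom_escaping_rule CSV \
        --format_custom_field_delimiter '|' \
        --format_custom_row_before_delimiter '[' \
        --format_custom_row_after_delimiter ']
' \
        --query "$query"
else
    echo "clickhouse-local is not installed; skipping the demo" >&2
fi
```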





β€” Template:





format_template_resultset







format_template_row







format_template_rows_between_delimiter







, , , . XML, .
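As a sketch of the Template settings above, the following writes two small template files and asks for XML-like output. The file names, tag names and query are hypothetical, and the example assumes clickhouse-local is on PATH.

```shell
workdir=$(mktemp -d)

# Row template: one line per row, each value escaped by the XML rule.
cat > "$workdir/row.tpl" <<'EOF'
  <row><id>${id:XML}</id><name>${name:XML}</name></row>
EOF

# Result set template: ${data} is replaced by all formatted rows.
cat > "$workdir/resultset.tpl" <<'EOF'
<result>
${data}
</result>
EOF

if command -v clickhouse-local >/dev/null 2>&1; then
    clickhouse-local \
        --output-format Template \
        --format_template_resultset "$workdir/resultset.tpl" \
        --format_template_row "$workdir/row.tpl" \
        --format_template_rows_between_delimiter '
' \
        --query "SELECT number AS id, concat('a<b ', toString(number)) AS name FROM numbers(2)"
else
    echo "clickhouse-local is not installed; skipping the demo" >&2
fi
```

The `<` inside the name values is there to show that escaping is applied per field rather than to the template text.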





Regexp:





format_regexp







format_regexp_escaping_rule







format_regexp_skip_unmatched







clickhouse-local awk. , Regexp subpatterns, subpattern . . β€” , , .
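To make the awk comparison concrete, here is a minimal sketch that parses a made-up log with the Regexp input format, mapping each subpattern to a column. The log lines, column names and regular expression are assumptions for illustration; clickhouse-local is assumed to be on PATH.

```shell
workdir=$(mktemp -d)

# Hypothetical log file; the third line intentionally does not match.
cat > "$workdir/access.log" <<'EOF'
GET /index.html 200
POST /api/items 201
this line does not match
GET /missing 404
EOF

if command -v clickhouse-local >/dev/null 2>&1; then
    # Each capture group becomes one column, in order;
    # lines that do not match the expression are skipped.
    clickhouse-local \
        --input-format Regexp \
        --format_regexp '^(\S+) (\S+) (\d+)$' \
        --format_regexp_escaping_rule Escaped \
        --format_regexp_skip_unmatched 1 \
        --structure 'method String, path String, status UInt16' \
        --query "SELECT method, count() AS requests FROM table GROUP BY method ORDER BY method" \
        < "$workdir/access.log"
else
    echo "clickhouse-local is not installed; skipping the demo" >&2
fi
```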





ClickHouse

β€” ClickHouse. Mongo, . ClickHouse, β€” .





, , , β€” . 'message' String. JSON, JSON . β€” , , 'actor.login', JSON β€” . ClickHouse , ALTER :





, actor_login , SELECT , β€” . : 





ALTER TABLE logs UPDATE actor_login = actor_login WHERE 1







, .
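The whole sequence might look roughly like the sketch below. The table definition and the DEFAULT expression are assumptions reconstructed from the names that appear in the text (logs, message, actor_login); only the final UPDATE comes from the article. The script just writes the statements to a file, because running them needs a ClickHouse server.

```shell
workdir=$(mktemp -d)

cat > "$workdir/schemaless.sql" <<'EOF'
-- Hypothetical table: every record is stored as one raw JSON string.
CREATE TABLE logs (message String) ENGINE = MergeTree ORDER BY tuple();

-- Add a column computed from the JSON; it is evaluated on read
-- until the values are physically written.
ALTER TABLE logs
    ADD COLUMN actor_login String
    DEFAULT JSONExtractString(message, 'actor', 'login');

-- Materialize the column for existing rows.
-- ALTER ... UPDATE needs a WHERE clause; WHERE 1 selects every row.
ALTER TABLE logs UPDATE actor_login = actor_login WHERE 1;
EOF

# One possible way to run it against a server:
#   clickhouse-client --multiquery < "$workdir/schemaless.sql"
echo "statements written to $workdir/schemaless.sql"
```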





MySQL

ClickHouse MySQL. : , , , , ( , ), SELECT 15 :





: MySQL , MySQL , β€” 15 . , MySQL ?





5 41 β€” ! ClickHouse - β€” MySQL ClickHouse . MySQL β€” ? 





β€” . ClickHouse β€œβ€ (20577 13772), MySQL β€” (44744), collation ( ) GROUP BY. , , :





, . ClickHouse , . . MySQL ClickHouse .   MySQL :





, . SELECT:





6 , β€” , , , . MySQL ClickHouse . MySQL , MySQL ClickHouse-, ClickHouse. Distributed , , ClickHouse- ClickHouse, , MySQL. 





, - ( ClickHouse). :





, β€” MergeTree . SELECT:





, SELECT 0,6 . , β€” ClickHouse!





ClickHouse MySQL. MySQL ClickHouse , , , MySQL. ClickHouse:





ClickHouse . , odbc PostgreSQL, url β€” REST-. :





: ClickHouse postgresql, PostgreSQL PostgreSQL. .





ClickHouse

ClickHouse CatBoost. , modelEvaluate. 





. : , , : , , . β€” , , . ClickHouse CatBoost, .





ClickHouse. β€” , . , , , GROUP BY:





State:





SELECT stochasticLogisticRegressionState(...







k . AggregateFunction(stochasticLogisticRegression(01, 00, 10, 'Adam'), ...), .   applyMLModel:





. , , :





.
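A small end-to-end sketch of training and applying such a model, under several assumptions: the tables and data are invented, the parameters (0.1, 0.0, 10, 'Adam') are one plausible reading of the values in the text, and the model is applied with evalMLMethod, the function name I know from the ClickHouse documentation, which may differ from the name used in the text.

```shell
workdir=$(mktemp -d)

cat > "$workdir/ml.sql" <<'EOF'
-- Hypothetical training data; labels for logistic regression are -1 or 1.
CREATE TABLE train (x1 Float64, x2 Float64, label Float64) ENGINE = Memory;
INSERT INTO train VALUES (0.1, 0.2, 1), (0.9, 0.8, -1), (0.2, 0.1, 1), (0.8, 0.9, -1);

-- Train: the -State combinator keeps the fitted model as an aggregate state.
CREATE TABLE model ENGINE = Memory AS
    SELECT stochasticLogisticRegressionState(0.1, 0.0, 10, 'Adam')(label, x1, x2) AS state
    FROM train;

-- Apply the saved state to rows (here, the training rows themselves).
WITH (SELECT state FROM model) AS m
SELECT x1, x2, evalMLMethod(m, x1, x2) AS prediction FROM train;
EOF

if command -v clickhouse-local >/dev/null 2>&1; then
    clickhouse-local --multiquery --query "$(cat "$workdir/ml.sql")"
else
    echo "clickhouse-local is not installed; statements are in $workdir/ml.sql" >&2
fi
```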





ClickHouse

, ClickHouse β€” , . , . , , pagerank: 





, , , . , Amos Bird. , open-source. .





UDF ClickHouse

, ClickHouse (user defined functions). . , cache- executable, . stdin , stdout . , . 





Python, , β€”  , β€” ClickHouse, user defined function.
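The program on the other end of such a dictionary can be very small. Below is a sketch of one, written as a shell function for brevity: it reads one key per line from stdin and writes the key and a computed value, tab-separated, to stdout. The function name and the "string length" logic are hypothetical, and the exact input and output layout that a dictionary expects depends on its configuration.

```shell
# Hypothetical "user defined function": returns the length of each key.
udf_string_length() {
    while IFS= read -r key; do
        # Output row: key <TAB> computed value
        printf '%s\t%s\n' "$key" "${#key}"
    done
}

# Local check without ClickHouse: feed two keys through the function.
printf 'clickhouse\nudf\n' | udf_string_length
```

Keeping the program a pure stdin-to-stdout filter makes it easy to test on its own before wiring it into a dictionary.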





: UDF roadmap 2021 .





ClickHouse GPU Application Server

. nVidia ClickHouse , . 





Zhang2014 β€” ClickHouse Application Server. Zhang2014 pull request, HTTP- (SELECT INSERT). POST - , - GET , , SELECT .





ClickHouse β€” , - , , , - . , ClickHouse - . production!







