failure stories , . , β , Β« Β» , . , , . β !
β , . , , , ( ) . . β , .
…
ClickHouse- HDD-. 2 , 2 . ( ), : β raw-, β .
, . . : , .
ClickHouse, Β« Β»: , . . . (NVMe) , . , , ( ) : .
, , ZooKeeper. , , , .. . , ? , , : .
ZooKeeper? ZK , ClickHouse, , ( VM β ). ZK β , ( ).
: CH, ZK, .
, :
…
. . ClickHouse: CH, , .
Linux-, ClickHouse. : , .
, CH, . :
2020.10.24 18:50:28.105250 [ 75 ] {} <Error> enriched_distributed.Distributed.DirectoryMonitor: Code: 252, e.displayText() = DB::Exception: Received from 192.168.1.4:9000. DB::Exception: Too many parts (300). Merges are processing significantly slower than inserts.. Stack trace:
replication_queue
, β . error- :
/var/log/clickhouse-server/clickhouse-server.err.log.0.gz:2020.10.24 16:18:33.233639 [ 16 ] {} <Error> DB.TABLENAME: DB::StorageReplicatedMergeTree::queueTask()::<lambda(DB::StorageReplicatedMergeTree::LogEntryPtr&)>: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = DNS error: Temporary DNS error while resolving: clickhouse2-0 (version 19.15.2.2 (official build))
:
2020.10.24 18:38:51.192075 [ 53 ] {} <Error> search_analyzer_distributed.Distributed.DirectoryMonitor: Code: 210, e.displayText() = DB::NetException: Connection refused (192.168.1.3:9000), Stack trace:
2020.10.24 18:38:57.871637 [ 58 ] {} <Error> raw_distributed.Distributed.DirectoryMonitor: Code: 210, e.displayText() = DB::NetException: Connection refused (192.168.1.3:9000), Stack trace:
, . , SHOW CREATE TABLE
. , β ZKPATH!
ENGINE = ReplicatedMergeTree('/clickhouse/tables/DBNAME/{shard}/TABLENAME', '{replica}')
:
ZK- ;
β CH-;
β ;
ZKPATH
, .
, ClickHouse, . , .
: , ZK , ZKPATH .
«» . «» , . , .
, : .
, , , :
/ ;
, (
replication_queue
);
«» .
( ) , .
β DROP
, ( ) :
,
DROP
( , .. , );
ZK, , .
, , clickhouse2-0
clickhouse2-1
( 2-0 2-1 ).
DROP TABLE
2-0 , 2-1 - : ZK «».
:
rmr /clickhouse/tables/DBNAME/2/TABLENAME/replicas/clickhouse2-1
… node not empty
, . znode «».
ClickHouse DROPβ ZK-, . , ClickHouse 19, 20-. CH , , :
SYSTEM DROP REPLICA 'replica_name' FROM ZKPATH '/path/to/table/in/zk';
β¦ !
( , ) ZK-, CH- - . ? replication_queue
. , ( ) .
, ZK-. :
/clickhouse/tables/DBNAME/1/TABLENAME/replicas/clickhouse-0/queue/
, , replication_queue
CH. SELECT
nodename
, .
β . , :
clickhouse-client -h 127.0.0.1 --pass PASS -q "select node_name from system.replication_queue where source_replica='clickhouse2-1'" > bad_queueid
ZK:
for id in $(cat bad_queueid); do
  # remove one stuck queue entry from ZooKeeper, then pause briefly
  /usr/share/zookeeper/bin/zkCli.sh rmr "/clickhouse/tables/DBNAME/2/TABLENAME/replicas/clickhouse-1/queue/$id"
  sleep 2
done
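Since the loop above deletes znodes irreversibly, it can help to review the exact commands first. Below is a minimal sketch of such a dry run, not the procedure used in the original incident. The function name `build_rm_commands` is hypothetical, and the check assumes that the node names exported from `system.replication_queue` have the form `queue-` followed by digits. Anything that does not match is skipped and reported on stderr. Note also that newer ZooKeeper CLI versions replace `rmr` with `deleteall`.

```shell
# Hypothetical helper: read queue node names from stdin and print the
# zkCli commands that would remove them. Nothing is deleted here.
build_rm_commands() {
  queue_path=$1   # ZK path of the replica's queue, passed by the caller
  while IFS= read -r id || [ -n "$id" ]; do
    suffix=${id#queue-}          # strip the expected "queue-" prefix
    case "$suffix" in
      "$id"|""|*[!0-9]*)
        # no prefix, empty suffix, or non-digit characters: not a queue node
        printf 'skipping unexpected entry: %s\n' "$id" >&2 ;;
      *)
        printf 'rmr %s/%s\n' "$queue_path" "$id" ;;
    esac
  done
}
```

Usage, with the file produced by the `SELECT` above: `build_rm_commands /clickhouse/tables/DBNAME/2/TABLENAME/replicas/clickhouse-1/queue < bad_queueid`. Once the printed list looks right, the same ids can be fed to the actual removal loop.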
, SELECT
.
, «» ZK, β .
, .
?
, , . , .
Β« β Β». , - ( ). , , , . , , .
- , , . «» β , , Dev ( ), , Ops. .
hardcode, , : ZooKeeper, / , β¦ β . , .
( ) ClickHouse , β . CH, , ( , ), .