Apache Kafka is a distributed message broker used to process large volumes of data in real time. Its distinctive features are reliability, scalability, and high performance. In this talk we will go through Kafka's main architectural features and use cases, and look at the non-obvious details and pitfalls we ran into while building Vostok.
Hello everyone! My name is Grigory, and today we will talk about Kafka.
Here is the plan:
- Kafka, .
- , , . . Kafka.
- .
- – - . - «, », .
- . : - .
Kafka?
- Vostok.
, . . . Hercules . , , , , .
- , , - .
Kafka .
. . , , . Kafka - . Spark, , . .
Kafka 2 , . . .
? Java stack.
Kafka 0.11, Python, DotNet Kafka 1 . stack – DotNet, . Kafka .
Kafka , . , - . - . .
, Apache Kafka
Apache Kafka
, , , Kafka . – producer. Producer – , -.
consumer. , - .
. , Apache Kafka – . , .
.
.
, producer consumer , . , Message Queue. , .
Publish-Subscribe, . . , consumers . consumers.
, poll-, . . : «, consumer, . ». consumer Kafka : « - ?». .
, , consumers.
Apache Kafka
- Topic
- Broker
- Producer
- Consumer
4 . , , producer consumer. .
Kafka Topic
– , producers, consumers. - . – .
. . ?
- .
- . , .
, offset. 0.
. Offset = 0
.
. , . , .
- .
.
, . . 1 , 2 . .
.
- . , . , . .
.
offset, .
. offset .
, , . data, index, timeindex – .
Data – , . . - . . . Kafka , , .
index? - , . index .
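Each index entry takes 8 bytes and holds two ints: relative offset and position.
Relative offset is the message's offset counted from the base offset of the segment.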
offset , offset, .
, offset int. , offset int.
Next comes position. Position is the byte offset of the message in the data file.
, relative offset , .
-.
. , . offset relative offset , .
, , - . , . Kafka.
timeindex. ? Kafka - . . Timeindex .
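To make the index layout concrete, here is a minimal Python sketch of 8-byte entries made of two ints (relative offset and position) and a lookup over them. The names `build_index` and `lookup` are made up for illustration, and the binary search is an assumption about how such an index would typically be read, not Kafka's actual code.

```python
import struct

# One index entry: 4-byte relative offset + 4-byte position in the data file.
ENTRY = struct.Struct(">ii")


def build_index(base_offset, entries):
    """Pack (absolute_offset, position) pairs into 8-byte index entries."""
    return b"".join(
        ENTRY.pack(offset - base_offset, position) for offset, position in entries
    )


def lookup(index_bytes, base_offset, target_offset):
    """Return the position of the last indexed entry at or before target_offset.

    If no entry qualifies, 0 is returned, meaning "scan from the start of the file".
    """
    relative_target = target_offset - base_offset
    low, high = 0, len(index_bytes) // ENTRY.size - 1
    best_position = 0
    while low <= high:
        middle = (low + high) // 2
        relative_offset, position = ENTRY.unpack_from(index_bytes, middle * ENTRY.size)
        if relative_offset <= relative_target:
            best_position = position
            low = middle + 1
        else:
            high = middle - 1
    return best_position


index = build_index(1000, [(1000, 0), (1050, 4096), (1100, 8192)])
start_position = lookup(index, 1000, 1075)  # start reading the data file from here
```

Storing the offset relative to the segment's base offset is what lets it fit into an int even though absolute offsets keep growing.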
Kafka Broker
, . – .
.
. .
, . . .
Kafka , replication factor . , .
, . , .
4 , .
Kafka .
, , .
– , . producers , .
. , . .
– , . . - , - .
.
, . .
Kafka , , -, . , .
? , , . , , follower. .
, , , in sync replica. , .
, . . in sync replica.
, - . follower, ? ? follower . , .
, ?
3 2, . , .
Kafka , . . , .
. .
, , , , .
. . . .
Kafka , – .
? , . . , . Kafka . , .
Kafka Producer
, . . , .
. , , .
Kafka , – , , . , . consumer , .
? , . MurmurHash , .
, round robin, , . . , ..
point: . . . - , . . , , . .
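Roughly, the choice of partition can be sketched like this: messages with a key go to a partition derived from a hash of the key, and messages without a key are spread round robin. This is a simplified illustration only: `zlib.crc32` stands in for MurmurHash, and the class name `SketchPartitioner` is hypothetical.

```python
import zlib
from itertools import count


class SketchPartitioner:
    """Illustrative partitioner: hash of the key, or round robin without a key."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = count()

    def partition(self, key):
        if key is None:
            # No key: cycle through partitions one by one.
            return next(self._counter) % self.num_partitions
        # Key present: same key always lands in the same partition.
        # crc32 is a stand-in here, the real client uses MurmurHash.
        return zlib.crc32(key) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=3)
keyless = [partitioner.partition(None) for _ in range(4)]
first = partitioner.partition(b"service-42")
second = partitioner.partition(b"service-42")
```

Note that the mapping depends on the number of partitions, so adding partitions to a topic changes where a given key ends up.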
, Kafka , . – offset, timestamp . .
. . .
. 9 .
, . Kafka, acknowledgement, . . . ,
. ?
. , .
.
Kafka, . - . 0 . . .
. 1.
? . Kafka. Kafka . .
. , .
, .
?
Followers : « ?». : «». . . . , , .
, , . all, .
? .
, .
followers .
, . , .
, , , , .
, .
. min.insync.replicas.
3, , , 3 , . . + 2.
, , , , -1
.
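The acknowledgement modes discussed above (0, 1, and all, which is also written as -1) together with min.insync.replicas can be summarized in a small sketch. The function `write_outcome` and its return strings are illustrative assumptions, not a Kafka API.

```python
def write_outcome(acks, isr_size, min_insync_replicas=1):
    """Describe what happens to a write for a given acks setting.

    isr_size is the current number of in-sync replicas, leader included.
    """
    if acks == 0:
        # The producer does not wait for any response from the broker.
        return "sent, no acknowledgement awaited"
    if acks == 1:
        # Only the leader has to write the message to its log.
        return "acknowledged by leader only"
    if acks in ("all", -1):
        # min.insync.replicas is only enforced in this mode.
        if isr_size < min_insync_replicas:
            return "rejected: not enough in-sync replicas"
        return "acknowledged by %d in-sync replicas" % isr_size
    raise ValueError("unsupported acks value: %r" % (acks,))


healthy = write_outcome("all", isr_size=3, min_insync_replicas=2)
degraded = write_outcome("all", isr_size=1, min_insync_replicas=2)
```

With acks=all the write is only as durable as the current in-sync set, which is why min.insync.replicas matters: without it, an in-sync set shrunk to the leader alone would still accept writes.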
Kafka Consumer
, consumer. . consumer . , . - .
, .
.
?
- .
.
- .
version polled Kafka . , , .
. , , . , . . , Kafka. , commit offset.
?
consumer, .
, , . Kafka: « ». , .
- .
. . . .
consumer , , .
.
, .
- .
consumer, .
. ? , consumer , consumer . , Kafka .
consumers . .
, .
- offset, .
, .
– , -.
? -, .
.
consumer .
.
.
consumer , . .
- , .
, , . , .
Kafka
.
, .
, - .
, Kafka.
—
. ? . .
Kafka. log.dirs. , , .
. , - .
– , . , , , .
?
1 , . GC .
. , . Kafka , . , .
The fix: unclean.leader.election.enable=false. Since 0.11 this is the default.
KIP-106: Change Default unclean.leader.election.enabled from True to False (0.11)
. . . ? , .
. , . ? , . . 1 - , 0 .
, 0.
1 , . .
, , . , , .
https://issues.apache.org/jira/browse/KAFKA-3410
Kafka . , 1.1.
. .
—
. . default.replication.factor = 1.
, . , Kafka , dev- , , default.replication.factor - , . .
, replication.factor, .
– auto.create.topics.enable = true.
. ? , , - , Kafka : «, - ». . , . . replication.factor =1. production, replication.factor, - replication.factor = 1. , - . - .
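A quick way to catch these settings before they reach production is to scan the broker config for them. Below is a minimal sketch; the helper names `parse_properties` and `risky_settings` are hypothetical, and the script only flags values that are set explicitly, it does not know the defaults of your Kafka version.

```python
# Values the talk calls out as dangerous when they reach production.
RISKY_VALUES = {
    "default.replication.factor": "1",
    "auto.create.topics.enable": "true",
    "unclean.leader.election.enable": "true",
}


def parse_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            props[key.strip()] = value.strip()
    return props


def risky_settings(text):
    """Return the names of explicitly configured settings with a risky value."""
    props = parse_properties(text)
    return sorted(
        key for key, bad in RISKY_VALUES.items() if props.get(key, "").lower() == bad
    )


sample = """
# broker config copied from a dev environment
default.replication.factor = 1
auto.create.topics.enable=true
unclean.leader.election.enable=false
"""
warnings = risky_settings(sample)
```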
, , .
. , , , consumer . , .
, Kafka .
, message.max.bytes. 1 000 012 .
? , . , Kafka.
, , . , . consumer , . , consumer , , . - . Kafka Java , Scala.
? - , , , production. , , . . . , .
.
—
. . , . . , Kafka .
Kafka , batch.size. , . 16 – , Kafka- , . . . Kafka, , . , . .
. , 10. . . 160 performance.
, , .
production. , 10. , 160 , 10 . production 1,5 .
? , , . . , . . - . - . .
?
Kafka has KIP-126: Allow KafkaProducer to split and resend oversized batches (0.11)
KIP (Kafka Improvement Proposals) – , Kafka, . . .
KIP 0.11 . Kafka producer, , - , Kafka.
, . , . , . , , , , .
, . . . .
. : « ». . , . .
API — send
API, send. ? , Kafka API, . . , Kafka, future. . , , . , send , . . future , . – , . , , , . . . . , . 60 .
. - , . , , Kafka , .
KIP-286: producer.send() should not block on metadata update (discuss)
, ? , , , . , : « ». KIP discuss. , - . , Kafka . , .
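Until send stops blocking on metadata, one workaround is to call it from a dedicated thread so the caller gets a future immediately. The sketch below assumes a hypothetical `SlowMetadataProducer` that imitates a blocking send; neither class is part of any Kafka client.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class SlowMetadataProducer:
    """Hypothetical stand-in for a producer whose send() blocks on metadata."""

    def __init__(self, metadata_delay):
        self.metadata_delay = metadata_delay
        self.sent = []

    def send(self, topic, value):
        time.sleep(self.metadata_delay)  # imitates waiting for cluster metadata
        self.sent.append((topic, value))
        return len(self.sent) - 1


class NonBlockingSender:
    """Runs send() on one worker thread, so callers never wait on metadata."""

    def __init__(self, producer):
        self._producer = producer
        # A single worker keeps messages in the order they were submitted.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def send(self, topic, value):
        return self._executor.submit(self._producer.send, topic, value)

    def close(self):
        self._executor.shutdown(wait=True)


producer = SlowMetadataProducer(metadata_delay=0.3)
sender = NonBlockingSender(producer)
started = time.monotonic()
future = sender.send("logs", b"hello")
elapsed = time.monotonic() - started  # returns right away, no 0.3 s wait
result = future.result(timeout=5)
sender.close()
```

The trade-off is that the executor's queue is unbounded, so if the cluster is unavailable for a long time, pending messages pile up in memory and need their own limit.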
API —
, . , , consumer. , poll-, . . consumer Kafka . , consumer :
, Kafka. - - . , . .
consumer.poll ConsumerRecords. Key Event. - , . - Kafka , ?
, . , , . , . .
? fetcher, stack trace , poll.
, message exception: , - - offset. , - . – , . , . offset, offset.
, exception, , partition, , . , , offset .
, , . , .
, . .
.
, . , , .
. ?
, ? . . .
, ?
, .
, exception, . null, .
, .
. . , , , , , .
, consumer.
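One possible way to keep a single undecodable record from breaking the whole poll loop is to receive raw bytes and deserialize them yourself, catching the exception instead of letting it escape. Here is a minimal sketch assuming JSON values; `try_deserialize` and `decode_batch` are hypothetical helpers, and records are modeled as plain (partition, offset, bytes) tuples.

```python
import json


def try_deserialize(raw):
    """Return (True, value) on success and (False, None) on a broken payload."""
    try:
        return True, json.loads(raw)
    except ValueError:  # covers both invalid JSON and invalid UTF-8
        return False, None


def decode_batch(raw_records):
    """Decode what can be decoded and remember where the broken records were."""
    decoded, skipped = [], []
    for partition, offset, raw in raw_records:
        ok, value = try_deserialize(raw)
        if ok:
            decoded.append(value)
        else:
            # Keep partition and offset so the bad record can be logged or inspected.
            skipped.append((partition, offset))
    return decoded, skipped


records = [
    (0, 10, b'{"level": "info"}'),
    (0, 11, b"{not json"),
    (1, 5, b"\xff"),
]
decoded, skipped = decode_batch(records)
```

Because the broken records are skipped rather than raised, the consumer keeps moving past them instead of getting stuck on the same offset.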
API —
. . .
, . .
.
, , , , .
, . .
?
KIP-41: KafkaConsumer Max Records (0.10).
, , . KIP .
round robin, . consumer’ . . , . , . .
KIP-387: Fair Message Consumption Across Partitions in KafkaConsumer (discuss)
Google: « ?». KIP – .
discuss. , , , , -. , , . . .
—
. DevOps. .
- : « . ? ». . , .
, . – retention.bytes.
Retention.bytes , . . log.retention.bytes – , .
, , per partition, . . .
. – 1 . 3 – . . . , .
, .
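Because retention.bytes applies per partition, the disk space a topic can occupy grows with the number of partitions and the replication factor. Here is a rough sizing sketch; the function name and the extra-segment allowance are assumptions for illustration (old data is removed in whole segments, so a partition can sit above the limit by roughly one segment).

```python
GIB = 1024 ** 3


def topic_disk_estimate(retention_bytes, partitions, replication_factor, segment_bytes):
    """Rough cluster-wide disk estimate for one topic, in bytes."""
    per_partition = retention_bytes + segment_bytes  # limit plus one segment of slack
    return per_partition * partitions * replication_factor


estimate = topic_disk_estimate(
    retention_bytes=1 * GIB,
    partitions=12,
    replication_factor=3,
    segment_bytes=1 * GIB,
)
estimate_gib = estimate // GIB
```

In this example a "1 GiB" retention setting turns into about 72 GiB across the cluster, which is the kind of surprise this pitfall is about.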
—
KIP-113: Support replicas movement between log directories (1.1)
, , , Kafka . , KIP, . , , . ! , «».
Kafka - «». . - , . , .
- , - , - . , - .
, . , .
KIP-178: Size-based log directory selection strategy (discuss)
KIP, - , discuss, . . , . , .
—
, ? . .
. bash-, partition reassignment.
«», , JSON, bash-, - .
:
, - .
. . , , .
«».
preferred leader. , , , , – ? .
, JSON. , , . , . , .
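The JSON consumed by the partition reassignment script has a simple shape: a version plus a list of topic, partition, and replicas entries, where the first replica in each list is the preferred leader. Writing it by hand gets tedious, so here is a sketch of generating it. The function `build_reassignment` is hypothetical, and the round robin placement is only an illustrative choice, not a recommendation from the talk.

```python
import json


def build_reassignment(topic, num_partitions, broker_ids, replication_factor):
    """Build a reassignment plan that spreads replicas round robin over brokers."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor exceeds the number of brokers")
    partitions = []
    for partition in range(num_partitions):
        # Shift the starting broker per partition so that preferred leaders
        # (the first replica in each list) are spread across brokers too.
        replicas = [
            broker_ids[(partition + i) % len(broker_ids)]
            for i in range(replication_factor)
        ]
        partitions.append(
            {"topic": topic, "partition": partition, "replicas": replicas}
        )
    return {"version": 1, "partitions": partitions}


plan = build_reassignment("logs", num_partitions=3, broker_ids=[1, 2, 3], replication_factor=2)
plan_json = json.dumps(plan, indent=2)  # save this to a file and pass it to the script
```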
? ?
. . .
API. , . , .
, DevOps
, . KIP . , , , .
! Kafka – , . .
, . !
, , ? . . , ? , , – , 3 , 4, ?
. : – . ? , . . , . . . partition reassignments. , , . , . , bash- . 5 , 10 . .
, , . .
! Kafka . , , consumers . -, Kafka. , , bash-, - , , . -. true way, , . , .
! , partitions ? , partitions ?
, . , , , . , , . , ( value ), . compaction. Kafka , , , , 10 . . . , , . .
. - best practices partitions, . . consumers consumer groups?
. , , , – . ? , . , , . , consumer group , . - consumers , , . . . .
! ! , ? . - ?
. , , . . , . , Dot Net . – - -.
! , Kafka, . . Kubernetes? Kubernetes, helm charts ? failed offset commit?
Kubernetes. Kubernetes , . . Kafka Kubernetes . . . . , Kafka Kubernetes, . failed commit offset, , , consumers . Consumer, , , . , : « » . . . . GC consumer, . , , , .
. . , consumer . ? , , . , , consumers . consumers . . . . , consumers. -. . .
! consumers, . . , , , , consumer?
, consumer , . . , consumer 3 , . .
! . use case, Kafka ? Postgres, , Kafka , Postgres. , .
The most absurd example was picked there on purpose. I wanted to show an absurd situation: when you get to know Kafka and its architecture, you see how well it is designed and simply fall in love with the technology. The technology really is cool. That was the point. It is not about replacing Postgres with Kafka. No, of course not.
What do we use Kafka for? We have a bunch of microservices, with thousands of instances running. We collect logs from them, and tracing is planned for the future, that is, we want to aggregate information about network interactions between microservices through Kafka. We also collect application metrics: the number of requests, errors, and so on.