exactly once semantics in kafka (atomic broadcast)
Tags: Kafka, atomic broadcast
Exactly once semantics in Kafka are defined in terms of atomic broadcast:
Messages are published and they will be delivered one time exactly by one or more receiving applications.
Design docs:
- https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
- https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics
- https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.xq0ee1vnpz4o
Notes
- Original impossibility of consensus with one faulty process describes failures within extremely limited scenarios
- few main problems:
- If the producer writes to the log but doesn’t get an ack
- solved by the idempotence of kafka, producer can write as much as it wants (and kafka will just drop duplicates)
- consumer reads from the log but doesn’t update its offset
- Consumer ensures the dervied states it creates and the offsets in the upstream need to stay in sync
- so we can:
- store offsets in the same db as derive state, and only update them as a transaction, restarting reads current offset from the db
- write both state updates and offsets in idempotent fashion
- not technically “exactly once”, more like “effectively once”
- leverages kafka streams
- 3 components:
- log compaction in kafka allows it to be used for a journal and snapshot state changes
- reading data from kafka = advancing offsets
- writing to kafka
- wrapping all 3 in a kafka stream as a whole transaction allows for the atomic broadcasts
- 3 components:
- If the producer writes to the log but doesn’t get an ack