عجفت الغور

exactly once semantics in kafka (atomic broadcast)

Tags: Kafka, atomic broadcast

Exactly once semantics in Kafka are defined in terms of atomic broadcast:

Messages are published and they will be delivered one time exactly by one or more receiving applications.

Design docs:

  1. https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
  2. https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics
  3. https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.xq0ee1vnpz4o

Notes

  • Original impossibility of consensus with one faulty process describes failures within extremely limited scenarios
  • few main problems:
    • If the producer writes to the log but doesn’t get an ack
      • solved by the idempotence of kafka, producer can write as much as it wants (and kafka will just drop duplicates)
    • consumer reads from the log but doesn’t update its offset
      • Consumer ensures the dervied states it creates and the offsets in the upstream need to stay in sync
      • so we can:
        • store offsets in the same db as derive state, and only update them as a transaction, restarting reads current offset from the db
        • write both state updates and offsets in idempotent fashion
      • not technically “exactly once”, more like “effectively once”
    • leverages kafka streams
      • 3 components:
        • log compaction in kafka allows it to be used for a journal and snapshot state changes
        • reading data from kafka = advancing offsets
        • writing to kafka
      • wrapping all 3 in a kafka stream as a whole transaction allows for the atomic broadcasts