عجفت الغور

Hebert - Erlang in Anger

books, erlang

Chapter 1: How to Dive into a Codebase

  • Raw erlang apps
    • old fashioned way, pray for a readme or something
  • OTP Application
    • usually there’s an appname_app module (the application callback) that starts the top-level supervisor, often appname_sup; from there we can typically follow the supervision tree down for information
    • one_for_one and simple_one_for_one are for processes that don’t depend on each other directly, although their failures still count collectively toward total application shutdown
    • rest_for_one represents processes that depend on each other in a linear manner
    • one_for_all is used for processes that entirely depend on each other
    • gen_server holds resources and tends to follow client/server patterns
    • gen_fsm deals with sequences of events or inputs, usually used to implement protocols (parse X, see Y, do Z)
    • gen_event is the event hub for callbacks, usually the way to deal with notifications
  • OTP Releases
    • Same as OTP applications, except they might package the Erlang VM with them. Check for something being built with a relx.config file or a relx tuple in the rebar.config file
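
To make the restart strategies above concrete, here is a minimal sketch of a top-level supervisor using rest_for_one. The module name myapp_sup and the children myapp_conn and myapp_cache are hypothetical; the assumption is that the cache depends on the connection, so a crash of the connection also restarts the cache, but not the other way around.

```erlang
-module(myapp_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% rest_for_one: when a child dies, it and every child listed
    %% after it are restarted. Children listed before it are untouched.
    SupFlags = #{strategy => rest_for_one,
                 intensity => 5,   % at most 5 restarts...
                 period => 10},    % ...within 10 seconds, then give up
    Children = [
        %% myapp_conn and myapp_cache are hypothetical worker modules.
        #{id => myapp_conn,
          start => {myapp_conn, start_link, []},
          type => worker},
        #{id => myapp_cache,
          start => {myapp_cache, start_link, []},
          type => worker}
    ],
    {ok, {SupFlags, Children}}.
```

When reading an unfamiliar codebase, the init/1 function of each supervisor is the place to look: the strategy and the order of the child list tell you which processes the author considers dependent on each other.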

Chapter 2: Building Open Source Erlang Software

  • For OTP applications, usually just look for rebar.config; sources are under src/, or under apps/<appname>/src when the repository bundles several applications
  • A key property of Erlang supervisors and supervision trees is that their start phases are synchronous: a subtree that fails to start blocks everything after it, and is restarted until it works or fails too often and dies
    • There are no backoffs or cooldowns for supervisors because restarts need to bring the application back to a known stable state; if initialization is unstable, supervision means nothing
    • Supervised processes provide guarantees, not best effort
    • Note that Erlang supervision trees are started depth first
  • For remote resources, consider a manager pattern, where we guarantee that the manager is up, but not the actual thing it’s holding
  • OTP applications can be started with one of 3 restart types:
    • permanent: if the application terminates, the entire system shuts down
    • transient: termination with reason normal is not a problem; any other reason shuts down the entire system
    • temporary: the application is allowed to stop for any reason
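
The manager pattern for remote resources mentioned above can be sketched as a gen_server whose init/1 never touches the remote resource. This is a minimal sketch under stated assumptions: the module name remote_manager, the request/1 API, the 1000 ms retry delay, and the connect/0 stub (which always fails here, standing in for a socket or database connection) are all hypothetical.

```erlang
-module(remote_manager).
-behaviour(gen_server).

-export([start_link/0, request/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Callers must be prepared to receive {error, disconnected}.
request(Req) ->
    gen_server:call(?MODULE, {request, Req}).

init([]) ->
    %% Guarantee only that the manager is up. Connecting happens
    %% after init/1 returns, so a dead remote cannot block the boot.
    self() ! reconnect,
    {ok, #{conn => undefined}}.

handle_call({request, _Req}, _From, State = #{conn := undefined}) ->
    {reply, {error, disconnected}, State};
handle_call({request, Req}, _From, State = #{conn := Conn}) ->
    %% Placeholder: a real manager would send Req over Conn.
    {reply, {ok, {Conn, Req}}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(reconnect, State) ->
    case connect() of
        {ok, Conn} ->
            {noreply, State#{conn := Conn}};
        {error, _Reason} ->
            %% Retry later without crashing the manager.
            erlang:send_after(1000, self(), reconnect),
            {noreply, State}
    end;
handle_info(_Other, State) ->
    {noreply, State}.

%% Hypothetical stub standing in for the real connection attempt.
connect() ->
    {error, unavailable}.
```

The design choice is that the supervisor's guarantee covers the manager process only; whether the remote resource is reachable becomes an ordinary runtime condition that callers handle through the {error, disconnected} reply.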

Chapter 3: Planning for Overload

  • error_logger is usually the first thing to explode under overload; people usually use lager as a substitute logging library
  • Blocking on TCP sockets might also cause messages to pile up
    • anything that acts as a central hub for receiving messages should have lengthy tasks moved out of it
  • Backlogs of unexpected messages are rare, since things usually just blow up when a process gets a message it doesn’t handle
  • Backpressure is one of the easiest approaches, but we should check how long the timeout should be
    • either timeout based, or permission based
    • We can drop messages with queue buffers, which are at risk of bufferbloat, where messages pile up in the buffer and everyone slows down
    • or we can use a stack buffer, where only a restricted number of elements are kept waiting while newer ones are processed first
    • if an element in the stack is too old for a QoS requirement, just drop the rest of the stack and go from there; this is bad for a sequence of events that must be handled in order
  • Constant overload situations are usually handled in two ways: use processes that act as buffers and load balancers, or use ETS tables that act as locks and counters
    • Note that we want to avoid dynamically generated atoms, so people usually use ETS tables that have read_concurrency set to true.
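
As an illustration of an ETS table acting as a counter and lock, here is a minimal sketch of a concurrency limiter. The module name limiter, the key in_flight, and the new/acquire/release API are hypothetical; the only library calls used are ets:new/2, ets:insert/2 and ets:update_counter/3, which updates the counter atomically.

```erlang
-module(limiter).
-export([new/0, acquire/2, release/1]).

%% Create a public table holding a single counter row.
%% The key in_flight is an illustrative name.
new() ->
    Tab = ets:new(limiter, [public, set]),
    true = ets:insert(Tab, {in_flight, 0}),
    Tab.

%% Ask permission to start a job. The increment is atomic, so
%% concurrent callers cannot both take the last slot.
acquire(Tab, Limit) ->
    case ets:update_counter(Tab, in_flight, {2, 1}) of
        N when N =< Limit ->
            ok;
        _TooMany ->
            %% Over the limit: undo our increment and refuse.
            ets:update_counter(Tab, in_flight, {2, -1}),
            {error, overloaded}
    end.

%% Give the slot back once the job is done.
release(Tab) ->
    ets:update_counter(Tab, in_flight, {2, -1}),
    ok.
```

A caller would wrap its work between acquire/2 and release/1 and drop or reject the request on {error, overloaded}. Because the check happens in the caller's own process, no single process's mailbox becomes the bottleneck.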