Hebert - Erlang in Anger

Chapter 1: How to Dive into a Codebase

Raw Erlang apps
- no standard layout, so read it the old-fashioned way and hope for a README or similar
OTP Application
- usually there’s an appname or appname_app module (the application callback module) that starts the top-level supervisor; we can typically follow the supervision tree down from there for information
- one_for_one and simple_one_for_one are used for processes that aren’t directly dependent on each other, although their failures are counted together towards total application shutdown
- rest_for_one represents processes that depend on each other in a linear manner
- one_for_all is used for processes that entirely depend on each other
- gen_server holds resources and tends to follow client/server patterns
- gen_fsm (superseded by gen_statem in newer OTP versions) deals with sequences of events or inputs, usually used to implement protocols (parse X, see Y, do Z)
- gen_event is the event hub for callbacks, usually the way to deal with notifications
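The restart strategies listed above are chosen in a supervisor's init/1. A minimal sketch, assuming two hypothetical workers (myapp_db and myapp_cache, where the cache depends on the database) and the map-based specs available since OTP 18; the module names and restart limits are illustrative, not taken from the book:

```erlang
-module(myapp_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% rest_for_one: if myapp_db dies, myapp_cache (started after it) is
    %% restarted as well; if myapp_cache dies, only it is restarted.
    SupFlags = #{strategy => rest_for_one,
                 intensity => 5,   % at most 5 restarts ...
                 period => 10},    % ... within 10 seconds, then give up
    %% Hypothetical children, started in list order.
    Children = [
        #{id => myapp_db,    start => {myapp_db, start_link, []}},
        #{id => myapp_cache, start => {myapp_cache, start_link, []}}
    ],
    {ok, {SupFlags, Children}}.
```

Swapping the strategy for one_for_one or one_for_all changes only which siblings are restarted; the intensity and period limits are what eventually escalate repeated failures up the tree.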
OTP Releases
- Much the same as OTP applications, except they are packaged for deployment and might bundle the Erlang VM. Look for a relx.config file, or a relx tuple in rebar.config, to see what is being built
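For orientation, a relx tuple in a rebar.config might look roughly like the fragment below. This is a sketch assuming rebar3; the application name myapp and the version string are placeholders:

```erlang
%% rebar.config (rebar3 style); myapp and "0.1.0" are placeholders.
{relx, [
    %% Release name, version, and the applications it includes.
    {release, {myapp, "0.1.0"}, [myapp, sasl]},
    %% Bundle the Erlang runtime (ERTS) with the release.
    {include_erts, true},
    {extended_start_script, true}
]}.
```

Spotting this tuple tells you which applications ship together and whether the VM is packaged with them.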

For OTP applications, usually just look for rebar.config; sources are typically under /apps/<appname>/src
An important property of Erlang supervisors and supervision trees is that their start phases are synchronous; a tree that fails to start is restarted until it either comes up or fails too often and dies
- The reason there are no backoffs or cooldowns for supervisors is that restarts need to bring the application back to a known stable state; if initialization itself is unstable, supervision means nothing
- Supervised processes provide guarantees, not best effort
- Note that Erlang supervision trees are started depth-first
For remote resources, consider a manager pattern, where we guarantee that the manager is up, but not the actual thing it’s holding (e.g. the connection)
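The manager pattern can be sketched as a gen_server whose init/1 returns immediately and connects afterwards. This is a minimal illustration, assuming a plain TCP resource; the module name conn_manager, the retry interval, and the {error, disconnected} reply are hypothetical choices:

```erlang
-module(conn_manager).
-behaviour(gen_server).

-export([start_link/2, send/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(RETRY_MS, 1000).

start_link(Host, Port) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, {Host, Port}, []).

%% Callers must be prepared for {error, disconnected}.
send(Data) ->
    gen_server:call(?MODULE, {send, Data}).

init({Host, Port}) ->
    %% No connection attempt here: init/1 only guarantees that the
    %% manager exists, not that the remote resource is reachable.
    self() ! reconnect,
    {ok, #{host => Host, port => Port, sock => undefined}}.

handle_call({send, _Data}, _From, State = #{sock := undefined}) ->
    {reply, {error, disconnected}, State};
handle_call({send, Data}, _From, State = #{sock := Sock}) ->
    case gen_tcp:send(Sock, Data) of
        ok ->
            {reply, ok, State};
        {error, _} = Err ->
            gen_tcp:close(Sock),
            self() ! reconnect,
            {reply, Err, State#{sock := undefined}}
    end.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(reconnect, State = #{sock := undefined, host := Host, port := Port}) ->
    case gen_tcp:connect(Host, Port, [binary, {active, false}], 2000) of
        {ok, Sock} ->
            {noreply, State#{sock := Sock}};
        {error, _Reason} ->
            %% Keep the manager alive and retry later.
            erlang:send_after(?RETRY_MS, self(), reconnect),
            {noreply, State}
    end;
handle_info(_Other, State) ->
    {noreply, State}.
```

Because init/1 never blocks on the network, the supervisor can start this process reliably even while the remote side is down; the unreliability is pushed to callers of send/1, who have to handle it anyway.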
OTP Applications are started in 3 ways:
- permanent: the application going down means the entire system (node) shuts down
- transient: terminating with reason normal is not a problem; any other reason shuts the system down
- temporary: the application is allowed to stop for any reason
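As a small illustration, the start type is passed as the second argument when starting an application by hand. The application name myapp below is a placeholder:

```erlang
%% Run from a shell or boot code; myapp is a placeholder application name.
ok = application:start(myapp, permanent).

%% application:start(myapp) is equivalent to
%% application:start(myapp, temporary).

%% ensure_all_started/2 starts missing dependencies first, using the
%% same start type for everything it starts:
%% {ok, Started} = application:ensure_all_started(myapp, permanent).
```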

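error_logger is usually the one that explodes under load (its mailbox fills faster than it can log); people often use lager as a substitute logging library
Blocking on TCP sockets can also cause messages to pile up in a process's mailbox
- any process that is a central hub for receiving messages should have blocking or slow work moved out of it
Backlogs of unexpected messages are rare in OTP applications: behaviours hand every message to a callback such as handle_info/2, so an unexpected message gets dealt with (or blows the process up) rather than piling up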
Backpressure (restricting input) is one of the easiest ways to handle overload, but we need to work out how long the timeout should be
- either timeout based, or permission based (callers ask before sending)
- We can instead drop messages using queue buffers, which are at risk of bufferbloat, where messages sit in the buffer and everything behind them slows down
- or we can use a stack buffer, where only a restricted number of elements are kept waiting
- if an element in the stack is beyond a QoS requirement (e.g. too old), just drop the rest of the stack and go from there; this is bad when the order of a sequence of events matters
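The stack buffer idea above can be sketched as a small pure module. This is an illustration under assumptions: the module name stack_buf, the map layout, and the policy of also dropping the backlog when the stack is full are choices made here, and timestamps are passed in by the caller (for instance from erlang:monotonic_time(millisecond)):

```erlang
-module(stack_buf).
-export([new/2, push/3, pop/2]).

%% MaxSize: maximum number of waiting elements.
%% MaxAge: oldest acceptable age, in the same unit as the timestamps.
new(MaxSize, MaxAge) ->
    #{items => [], size => 0, max_size => MaxSize, max_age => MaxAge}.

%% Push a message on top of the stack. When the stack is already full,
%% drop the whole backlog and start again from the new message.
push(Msg, Now, Buf = #{size := Size, max_size := Max}) when Size >= Max ->
    Buf#{items := [{Now, Msg}], size := 1};
push(Msg, Now, Buf = #{items := Items, size := Size}) ->
    Buf#{items := [{Now, Msg} | Items], size := Size + 1}.

%% Pop the newest message. If even the newest one is too old, everything
%% beneath it is older still, so the entire stack is dropped.
pop(_Now, Buf = #{items := []}) ->
    {empty, Buf};
pop(Now, Buf = #{items := [{T, Msg} | Rest], size := Size, max_age := MaxAge}) ->
    case Now - T > MaxAge of
        true  -> {empty, Buf#{items := [], size := 0}};
        false -> {{value, Msg}, Buf#{items := Rest, size := Size - 1}}
    end.
```

Newest-first service keeps latency low for the requests that are still worth answering, at the cost of reordering, which is why this does not suit sequences of events.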
Constant overload situations are usually handled in one of two ways: use processes that act as buffers and load balance through them, or use ETS tables that act as locks and counters
- Note that we want to avoid dynamically generated atoms when naming or looking up these processes, so people usually register them in ETS tables that have read_concurrency set to true.
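Both uses of ETS can be sketched in one small module: a read-mostly table that maps integer slots to buffer process pids (so no atoms are created per process), and a counter used as a crude lock on concurrent work. All names here (overload_tabs, the table names, in_flight) are hypothetical, and the worker list is assumed to be non-empty:

```erlang
-module(overload_tabs).
-export([init/1, pick_worker/1, acquire/1, release/0]).

-define(WORKERS, overload_workers).
-define(COUNTERS, overload_counters).

%% Call from a long-lived process: ETS tables die with their owner.
init(WorkerPids) ->
    ?WORKERS = ets:new(?WORKERS, [named_table, protected, {read_concurrency, true}]),
    ?COUNTERS = ets:new(?COUNTERS, [named_table, public, {write_concurrency, true}]),
    N = length(WorkerPids),
    Indexed = lists:zip(lists:seq(1, N), WorkerPids),
    true = ets:insert(?WORKERS, [{size, N} | Indexed]),
    true = ets:insert(?COUNTERS, {in_flight, 0}),
    ok.

%% Look up a buffer process by hashing the key onto a slot number.
pick_worker(Key) ->
    N = ets:lookup_element(?WORKERS, size, 2),
    ets:lookup_element(?WORKERS, erlang:phash2(Key, N) + 1, 2).

%% Try to take one of Max slots. update_counter/3 is atomic, so two
%% concurrent callers cannot both take the last slot.
acquire(Max) ->
    case ets:update_counter(?COUNTERS, in_flight, {2, 1}) of
        N when N =< Max ->
            ok;
        _ ->
            ets:update_counter(?COUNTERS, in_flight, {2, -1}),
            {error, overloaded}
    end.

release() ->
    ets:update_counter(?COUNTERS, in_flight, {2, -1}),
    ok.
```

The lookup table is tuned for reads because it is written once and read on every request, while the counter table is tuned for writes; a caller that gets {error, overloaded} can drop the request or push back on its client.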