Updated 2019-10-08
We are planning to introduce Elixir into our toolbox. This page summarizes the key resources we have used or are using to learn Elixir and push it to production. Feel free to propose changes via pull request.
Deployment & Containers/Kubernetes
The motivation is to be able to deploy apps that leverage OTP to k8s (running in containers). An especially important part of OTP support is being able to use long-running GenServer processes, migrate state, etc. Valuable resources for this topic are:
- Elixir OTP applications on Kubernetes
- ElixirConf 2018 - Docker and OTP Friends or Foes - Daniel Azuma (source code, blog post)
- Graceful shutdown on Kubernetes with signals & Erlang OTP 20
- An alternative approach seems to be Lasp
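The process side of graceful shutdown can be sketched minimally as follows. It assumes the behaviour described in the OTP 20 signals article: on SIGTERM from Kubernetes, the VM calls :init.stop/0, which stops the supervision tree in order. A GenServer only gets its terminate/2 callback on that path if it traps exits. The cleanup must also fit within the child's shutdown timeout (5000 ms by default for workers) and within the pod's terminationGracePeriodSeconds. GracefulWorker and its notify-pid "cleanup" are hypothetical names used for illustration only.

```elixir
defmodule GracefulWorker do
  # Hypothetical worker illustrating cleanup on an orderly shutdown
  # (e.g. :init.stop/0 triggered by SIGTERM, or a supervisor stopping).
  use GenServer

  def start_link(notify_pid), do: GenServer.start_link(__MODULE__, notify_pid)

  @impl true
  def init(notify_pid) do
    # Without trapping exits, the supervisor's :shutdown exit signal would
    # kill the process immediately and terminate/2 would not run.
    Process.flag(:trap_exit, true)
    {:ok, notify_pid}
  end

  @impl true
  def terminate(reason, notify_pid) do
    # Placeholder for real cleanup (flush buffers, hand off state, ...);
    # here it only reports that it ran and with which reason.
    send(notify_pid, {:cleaned_up, reason})
    :ok
  end
end
```

Stopping the supervisor with Supervisor.stop/1 follows the same ordered-shutdown path as :init.stop/0, which makes it a convenient way to check the callback locally.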
Together, these requirements lead to the following key building blocks:
- Establishing Erlang cluster - libcluster
- Moving/Restarting/Monitoring Processes - Horde Supervisor
- Sharing process state / data across nodes - CRDT
- Service discovery - Horde Registry
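A minimal state-based grow-only counter (G-Counter), one of the simplest CRDTs, makes the CRDT building block more concrete. This is a teaching sketch. It is not the API of Horde or of any CRDT library. Each node increments only its own entry, and merge takes the per-node maximum. Because that merge is commutative, associative and idempotent, replicas converge no matter how often, or in which order, states are exchanged.

```elixir
defmodule GCounter do
  @moduledoc """
  Hypothetical grow-only counter CRDT: a map of node id => count.
  """

  @type t :: %{optional(term()) => non_neg_integer()}

  @spec new() :: t
  def new, do: %{}

  # A node only ever bumps its own entry.
  @spec increment(t, term(), pos_integer()) :: t
  def increment(counter, node_id, by \\ 1) when is_integer(by) and by > 0 do
    Map.update(counter, node_id, by, &(&1 + by))
  end

  # The counter's value is the sum of all per-node entries.
  @spec value(t) :: non_neg_integer()
  def value(counter), do: counter |> Map.values() |> Enum.sum()

  # Per-node maximum: safe to apply repeatedly and in any order.
  @spec merge(t, t) :: t
  def merge(a, b), do: Map.merge(a, b, fn _node, x, y -> max(x, y) end)
end
```

Node ids here are plain atoms. In a real cluster you would probably key by node() and ship the maps between nodes, which this sketch leaves out.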
Distributed systems / data-types
- Using Rust to Scale Elixir for 11 Million Concurrent Users
- Rust implementation of SortedSet, which is then used by the Elixir backend
- An Adventure in Distributed Programming by Wiebe-Marten Wijnja
- Open-source chat application
- Intro to distributed systems (CAP, Byzantine faults). Rundown of Mnesia, Cassandra, CouchDB and Riak. They are working on an Ecto adapter for Riak.
- Distributing Phoenix -- Part 2: Learn You a 𝛿-CRDT for Great Good
- Building Resilient Systems with Stacking by Chris Keathley
- Recording from ElixirConf EU 2019
Overview of techniques that help in building more resilient systems. Refers to How Complex Systems Fail for parallels between medical systems and complex distributed services.
Circuit breakers: the recommended implementation is fuse.
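For readers new to the pattern, a toy Agent-based circuit breaker shows the idea. After max_failures consecutive {:error, _} results, the circuit opens. Calls are then rejected with {:error, :circuit_open} until reset_ms has elapsed. SimpleBreaker and its options are hypothetical, and this is not fuse's API. The sketch ignores races between concurrent callers and has no half-open probing, which is one reason to prefer a maintained library such as fuse in production.

```elixir
defmodule SimpleBreaker do
  # Hypothetical circuit breaker; NOT the fuse API.
  # The wrapped function must return {:ok, _} or {:error, _}.
  use Agent

  def start_link(opts) do
    state = %{
      failures: 0,
      max: Keyword.get(opts, :max_failures, 3),
      reset_ms: Keyword.get(opts, :reset_ms, 1_000),
      opened_at: nil
    }

    Agent.start_link(fn -> state end, name: Keyword.fetch!(opts, :name))
  end

  def call(name, fun) when is_function(fun, 0) do
    if open?(name) do
      {:error, :circuit_open}
    else
      case fun.() do
        {:ok, _} = ok ->
          # Any success resets the consecutive-failure counter.
          Agent.update(name, fn state -> %{state | failures: 0} end)
          ok

        {:error, _} = error ->
          record_failure(name)
          error
      end
    end
  end

  # The circuit is open until reset_ms has passed since it tripped;
  # after that it closes fully (no half-open state in this sketch).
  defp open?(name) do
    Agent.get_and_update(name, fn
      %{opened_at: nil} = state ->
        {false, state}

      %{opened_at: opened_at, reset_ms: reset_ms} = state ->
        if System.monotonic_time(:millisecond) - opened_at >= reset_ms do
          {false, %{state | opened_at: nil, failures: 0}}
        else
          {true, state}
        end
    end)
  end

  defp record_failure(name) do
    Agent.update(name, fn state ->
      failures = state.failures + 1

      opened_at =
        if failures >= state.max,
          do: System.monotonic_time(:millisecond),
          else: state.opened_at

      %{state | failures: failures, opened_at: opened_at}
    end)
  end
end
```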
Configuration: avoid relying on "mix configs"; instead, he pointed to his own project, Vapor. Example from the talk (a hand-rolled loader that copies environment variables into ETS at startup; check the project for other examples):
    defmodule Jenga.Application do
      use Application

      def start(_type, _args) do
        # Maps config keys to the environment variables they are read from.
        config = [
          port: "PORT",
          db_url: "DB_URL"
        ]

        children = [
          {Jenga.Config, config}
        ]

        opts = [strategy: :one_for_one, name: Jenga.Supervisor]
        Supervisor.start_link(children, opts)
      end
    end

    defmodule Jenga.Config do
      use GenServer

      def start_link(desired_config) do
        GenServer.start_link(__MODULE__, desired_config, name: __MODULE__)
      end

      @impl true
      def init(desired) do
        # :protected + :named_table: only this process writes, any process can read.
        :jenga_config = :ets.new(:jenga_config, [:set, :protected, :named_table])

        case load_config(:jenga_config, desired) do
          :ok -> {:ok, %{table: :jenga_config, desired: desired}}
          :error -> {:stop, :could_not_load_config}
        end
      end

      # Copies each env var into ETS; gives up after 10 failed lookups in total.
      defp load_config(table, config, retry_count \\ 0)
      defp load_config(_table, [], _), do: :ok
      defp load_config(_table, _, 10), do: :error

      defp load_config(table, [{k, v} | tail], retry_count) do
        case System.get_env(v) do
          nil ->
            # Pause before retrying; the 100 ms delay is our assumption.
            Process.sleep(100)
            load_config(table, [{k, v} | tail], retry_count + 1)

          value ->
            :ets.insert(table, {k, value})
            load_config(table, tail, retry_count)
        end
      end
    end
Monitoring: you can use Erlang's alarms (SASL's alarm_handler). Example from the talk, which treats the database as a dependency and raises an alarm when it becomes unreachable:
    defmodule Jenga.Database.Watchdog do
      use GenServer

      # Must match the id Jenga.AlarmHandler pattern-matches on.
      @alarm_id :database_disconnected
      # Assumed interval between health checks.
      @check_interval 1_000

      def start_link(opts \\ []) do
        GenServer.start_link(__MODULE__, :ok, opts)
      end

      @impl true
      def init(:ok) do
        schedule_check()
        {:ok, %{status: :degraded, passing_checks: 0}}
      end

      @impl true
      def handle_info(:check_db, state) do
        status = Jenga.Database.check_status()
        state = change_state(status, state)
        schedule_check()
        {:noreply, state}
      end

      defp change_state(result, %{status: status, passing_checks: count}) do
        case {result, status, count} do
          {:ok, :connected, count} ->
            # Clear the alarm only after several consecutive passing checks.
            if count == 3 do
              :alarm_handler.clear_alarm(@alarm_id)
            end

            %{status: :connected, passing_checks: count + 1}

          {:ok, :degraded, _} ->
            %{status: :connected, passing_checks: 0}

          {:error, :connected, _} ->
            :alarm_handler.set_alarm({@alarm_id, "We cannot connect to the database"})
            %{status: :degraded, passing_checks: 0}

          {:error, :degraded, _} ->
            %{status: :degraded, passing_checks: 0}
        end
      end

      defp schedule_check do
        Process.send_after(self(), :check_db, @check_interval)
      end
    end
Then an alarm handler can be installed:
    defmodule Jenga.Application do
      use Application

      def start(_type, _args) do
        config = [
          port: "PORT",
          db_url: "DB_URL"
        ]

        # Replace SASL's default alarm handler with ours. The :alarm_handler
        # event manager is started by SASL, so :sasl must be running
        # (e.g. listed in extra_applications in mix.exs).
        :gen_event.swap_handler(
          :alarm_handler,
          {:alarm_handler, :swap},
          {Jenga.AlarmHandler, :ok}
        )

        children = [
          {Jenga.Config, config},
          Jenga.Database.Supervisor
        ]

        opts = [strategy: :one_for_one, name: Jenga.Supervisor]
        Supervisor.start_link(children, opts)
      end
    end

    defmodule Jenga.AlarmHandler do
      @behaviour :gen_event
      require Logger

      @impl true
      def init({:ok, {:alarm_handler, _old_alarms}}) do
        Logger.info("Installing alarm handler")
        {:ok, %{}}
      end

      # :alarm_handler.set_alarm/1 emits {:set_alarm, {alarm_id, description}}.
      @impl true
      def handle_event({:set_alarm, {:database_disconnected, _description}}, alarms) do
        # Do something with the alarm being raised (e.g. notify monitoring)
        Logger.error("Database connection lost")
        {:ok, alarms}
      end

      # :alarm_handler.clear_alarm/1 emits {:clear_alarm, alarm_id}.
      def handle_event({:clear_alarm, :database_disconnected}, alarms) do
        # Do something with the alarm being cleared (e.g. notify monitoring)
        Logger.info("Database connection recovered")
        {:ok, alarms}
      end

      def handle_event(event, state) do
        Logger.info("Unhandled alarm event: #{inspect(event)}")
        {:ok, state}
      end

      @impl true
      def handle_call(_request, state), do: {:ok, :ok, state}
    end
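You can exercise a handler without touching the node-wide :alarm_handler by adding it to a private gen_event manager. Send it the same event shapes that :alarm_handler.set_alarm/1 ({:set_alarm, {id, description}}) and :alarm_handler.clear_alarm/1 ({:clear_alarm, id}) produce. RecordingAlarmHandler is a hypothetical test double that forwards events to a process instead of logging them.

```elixir
defmodule RecordingAlarmHandler do
  # Hypothetical gen_event handler for tests: forwards alarm events
  # to the pid given at install time.
  @behaviour :gen_event

  @impl true
  def init(test_pid), do: {:ok, test_pid}

  @impl true
  def handle_event({:set_alarm, {alarm_id, description}}, test_pid) do
    send(test_pid, {:alarm_set, alarm_id, description})
    {:ok, test_pid}
  end

  def handle_event({:clear_alarm, alarm_id}, test_pid) do
    send(test_pid, {:alarm_cleared, alarm_id})
    {:ok, test_pid}
  end

  def handle_event(_other, test_pid), do: {:ok, test_pid}

  @impl true
  def handle_call(_request, test_pid), do: {:ok, :ok, test_pid}
end
```

A private manager comes from :gen_event.start_link/0. The handler is attached with :gen_event.add_handler/3, and :gen_event.sync_notify/2 delivers an event and waits until all handlers have processed it.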
Best practices
Code/Development
- The usual suspect: credo
    mix credo --strict
- Spend time writing documentation in code with ExDoc
- Use the official formatter in your projects
    mix format --check-formatted
- A nice write-up about putting these tools together: https://itnext.io/enforcing-code-quality-in-elixir-20f87efc7e66
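One way to put these tools together is a Mix alias, so that CI and developers run the same checks. This is a fragment of a hypothetical mix.exs. The :lint alias name, the app name and the dependency version requirements are our assumptions.

```elixir
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_app,
      version: "0.1.0",
      elixir: "~> 1.9",
      deps: deps(),
      aliases: aliases()
    ]
  end

  defp deps do
    [
      # Version requirements are assumptions; use whatever is current.
      {:credo, "~> 1.1", only: [:dev, :test], runtime: false},
      {:ex_doc, "~> 0.21", only: :dev, runtime: false}
    ]
  end

  defp aliases do
    # `mix lint` runs the formatter check, then credo in strict mode;
    # both have to pass for the command to succeed.
    [lint: ["format --check-formatted", "credo --strict"]]
  end
end
```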
Code Design
GenServer is a great abstraction, but beware of it becoming a bottleneck: it handles its messages one at a time, so all callers are serialized through a single process.
- Optimizing Your Elixir and Phoenix projects with ETS
- Avoiding GenServer bottlenecks
- You may not need GenServers and Supervision Trees
- Elixir and Phoenix Performance
- To spawn, or not to spawn?
- The Primitives of Elixir Concurrency: a Full Example
- Elixir Streams to process large HTTP responses on the fly
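A common way around the bottleneck, and the idea behind the ETS article in the list, is to let a GenServer own an ETS table while readers query the table directly from their own processes. ReadMostlyCache is a hypothetical name. Writes still go through the GenServer because the table is :protected, so this pays off mainly for read-heavy data.

```elixir
defmodule ReadMostlyCache do
  # Hypothetical cache: the GenServer owns the ETS table and serializes
  # writes; reads bypass it and run concurrently in each caller.
  use GenServer

  @table :read_mostly_cache

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Read path: a direct ETS lookup, no message to the GenServer.
  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> value
      [] -> nil
    end
  end

  # Write path: only the owner may write to a :protected table.
  def put(key, value), do: GenServer.call(__MODULE__, {:put, key, value})

  @impl true
  def init(:ok) do
    :ets.new(@table, [:set, :protected, :named_table, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:put, key, value}, _from, state) do
    true = :ets.insert(@table, {key, value})
    {:reply, :ok, state}
  end
end
```

read_concurrency: true is an ETS tuning option aimed at read-heavy tables. Whether it actually helps should be measured on the real workload.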
Ops/Infrastructure/Monitoring
Codify / agree on generally shared metrics (e.g. rps, run queue, query time, atoms, memory, latency). Useful libraries: exometer, statix (for statsd-compatible backends). We still need to evaluate integration with Prometheus, but projects to look at are:
- prometheus.erl & prometheus.ex & prometheus-phoenix - seem to cover a lot of ground
- prometheus_exometer
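Several of the BEAM-level metrics in that list (run queue, atoms, memory) can be read straight from the runtime without any library. VMMetrics.snapshot/0 is a hypothetical helper. Pushing the map to statsd (e.g. via statix) or exposing it to Prometheus is left out, because those client APIs are not covered here. The atom_count and atom_limit keys require OTP 20 or later.

```elixir
defmodule VMMetrics do
  # Hypothetical helper collecting a few VM-level gauges in one map.
  def snapshot do
    %{
      # Processes ready to run but waiting for a scheduler.
      run_queue: :erlang.statistics(:run_queue),
      # Atoms are never garbage collected; watch count vs. limit.
      atom_count: :erlang.system_info(:atom_count),
      atom_limit: :erlang.system_info(:atom_limit),
      process_count: :erlang.system_info(:process_count),
      memory_total_bytes: :erlang.memory(:total),
      memory_atom_bytes: :erlang.memory(:atom)
    }
  end
end
```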
We built our own tracing platform a long time ago, but it is time to go with the crowd and adopt OpenTelemetry, which is a merger of OpenTracing and OpenCensus. There is an official OpenCensus client, opencensus-erlang. OpenTracing is supported via the Spandex project, but Datadog seems to be the only exporter implemented so far.
A nice write-up about low-level ad hoc tracing: A guide to tracing in Elixir!
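Higher-level tools such as :dbg (from OTP's runtime_tools) are built on two BIFs, :erlang.trace/3 and :erlang.trace_pattern/3. The sketch below uses them directly to trace calls to one function made by one process. Each call arrives at the caller as a {:trace, pid, :call, {module, function, args}} message. The AdHocTrace module is hypothetical. Be careful with broad patterns on a production node, since every matching call produces a message.

```elixir
defmodule AdHocTrace do
  # Hypothetical wrapper around the raw tracing BIFs.

  # Starts tracing calls to `mfa` made by `pid`; the caller becomes the tracer.
  # Returns the number of processes whose trace flags were changed.
  def start(pid, {_module, _fun, _arity} = mfa) do
    # [:local] matches both local and fully qualified calls to the function.
    :erlang.trace_pattern(mfa, true, [:local])
    :erlang.trace(pid, true, [:call, {:tracer, self()}])
  end

  # Removes the pattern and, if the process is still alive, its call flag.
  def stop(pid, mfa) do
    :erlang.trace_pattern(mfa, false, [:local])
    if Process.alive?(pid), do: :erlang.trace(pid, false, [:call])
    :ok
  end
end
```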