Updated 2019-10-08
We are planning to introduce Elixir into our toolbox. This page summarizes key resources we have used / are using for learning Elixir and pushing it to production. Feel free to propose changes via a pull request.
Deployment & Containers/Kubernetes
The motivation is to be able to deploy apps that leverage OTP to Kubernetes (k8s), running in containers. An especially important part of OTP support is being able to use things like long-running GenServer processes, migrate their state, etc. Valuable resources for this topic are:
- Elixir OTP applications on Kubernetes
- ElixirConf 2018 - Docker and OTP: Friends or Foes? - Daniel Azuma (source code, blog post)
- Graceful shutdown on Kubernetes with signals & Erlang OTP 20
- An alternative approach seems to be Lasp
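Since OTP 20, a SIGTERM (which Kubernetes sends when stopping a pod) makes the VM call `init:stop()`, which shuts the supervision tree down in order. A process only gets its `terminate/2` callback on that path if it traps exits. A minimal sketch, assuming a hypothetical `Drainer` worker whose "in-flight" counter stands in for real work to drain or state to hand over:

```elixir
defmodule Drainer do
  # Hypothetical worker. shutdown: 10_000 gives terminate/2 up to 10 s; keep it
  # below the pod's terminationGracePeriodSeconds (30 s by default).
  use GenServer, shutdown: 10_000

  def start_link(notify_pid) do
    GenServer.start_link(__MODULE__, notify_pid, name: __MODULE__)
  end

  @impl true
  def init(notify_pid) do
    # Without trapping exits, the supervisor's :shutdown exit kills the process
    # immediately and terminate/2 is never called.
    Process.flag(:trap_exit, true)
    {:ok, %{notify: notify_pid, in_flight: 3}}
  end

  @impl true
  def terminate(reason, state) do
    # Drain in-flight work / persist state here; this sketch only reports back.
    send(state.notify, {:drained, reason, state.in_flight})
    :ok
  end
end
```

Stopping the supervisor (as `init:stop()` would) delivers `{:drained, :shutdown, 3}` to the notify process.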
This leads to the following key building blocks:
- Establishing Erlang cluster - libcluster
- Moving/Restarting/Monitoring Processes - Horde Supervisor
- Sharing process state / data across nodes - CRDT
- Service discovery - Horde Registry
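For the first building block, libcluster is driven by a "topology" in the application config. A sketch of a config fragment, assuming libcluster's Kubernetes strategy (pods discovered through the Kubernetes API by label selector); `myapp` and `app=myapp` are placeholders, and option names should be checked against the libcluster version in use:

```elixir
# config/config.exs (fragment); "myapp" / "app=myapp" are placeholder values.
import Config

config :libcluster,
  topologies: [
    k8s: [
      strategy: Cluster.Strategy.Kubernetes,
      config: [
        # Nodes are expected to be named <basename>@<pod IP>.
        mode: :ip,
        kubernetes_node_basename: "myapp",
        kubernetes_selector: "app=myapp",
        polling_interval: 10_000
      ]
    ]
  ]
```

The topologies are then passed to the cluster supervisor in the application's children, e.g. `{Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]}`, where `MyApp.ClusterSupervisor` is a hypothetical name.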
Distributed systems / data-types
- Using Rust to Scale Elixir for 11 Million Concurrent Users
  - Rust implementation of SortedSet, which is then used by the Elixir backend
- An Adventure in Distributed Programming by Wiebe-Marten Wijnja
  - Open-source chat application
  - Intro to distributed systems (CAP, Byzantine faults). Rundown of Mnesia, Cassandra, CouchDB and Riak. They are working on an Ecto adapter for Riak.
- Distributing Phoenix – Part 2: Learn You a 𝛿-CRDT for Great Good
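To make the CRDT idea concrete: a grow-only counter (G-Counter) is about the simplest state-based CRDT. A minimal sketch, not taken from the linked article or any library; the hypothetical `increment_delta/2` illustrates the δ-CRDT idea of shipping only the changed entry instead of the whole state:

```elixir
defmodule GCounter do
  # State: a map of node_id => number of increments performed on that node.
  def new, do: %{}

  # Only the local node's own entry is ever incremented.
  def increment(counter, node_id, by \\ 1) when by > 0 do
    Map.update(counter, node_id, by, &(&1 + by))
  end

  # δ-variant: also return the delta (just the changed entry) to send to peers.
  def increment_delta(counter, node_id) do
    updated = increment(counter, node_id)
    {updated, %{node_id => updated[node_id]}}
  end

  # The counter's value is the sum over all nodes.
  def value(counter), do: counter |> Map.values() |> Enum.sum()

  # Merge takes the per-node maximum. It is commutative, associative and
  # idempotent, so replicas converge regardless of order or duplicate delivery.
  def merge(a, b), do: Map.merge(a, b, fn _node, x, y -> max(x, y) end)
end
```

Merging a delta into a replica gives the same result as merging the full state it came from, which is what makes delta shipping safe.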
Building Resilient Systems with Stacking by Chris Keathley
- Recording from ElixirConf EU 2019
- Overview of techniques which help build more resilient systems. Refers to How Complex Systems Fail for parallels between medical systems and complex distributed services.
- Circuit breakers: the recommended implementation is fuse.
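fuse is an Erlang library: a fuse is installed with `:fuse.install/2`, checked with `:fuse.ask/2` and reported failures go through `:fuse.melt/1`. To show the idea without the dependency, here is a hypothetical, deliberately tiny breaker (it is NOT fuse, and unlike fuse it never heals on its own; only `reset/0` closes it again):

```elixir
defmodule TinyBreaker do
  # Hypothetical, dependency-free circuit breaker sketch; not the fuse API.
  use Agent

  def start_link(max_failures) do
    Agent.start_link(fn -> %{failures: 0, max: max_failures} end, name: __MODULE__)
  end

  # Run `fun` only while the breaker is closed; {:error, _} results count as failures.
  def call(fun) do
    if blown?() do
      {:error, :circuit_open}
    else
      result = fun.()
      if match?({:error, _}, result), do: melt()
      result
    end
  end

  def blown?, do: Agent.get(__MODULE__, &(&1.failures >= &1.max))
  def melt, do: Agent.update(__MODULE__, &%{&1 | failures: &1.failures + 1})
  def reset, do: Agent.update(__MODULE__, &%{&1 | failures: 0})
end
```

Once the breaker is blown, callers fail fast with `{:error, :circuit_open}` instead of piling up on a broken dependency.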
Configuration: avoid “mix configs”; instead, he pointed to his own project, Vapor. Example of usage (from the talk; check the project for other examples):
```elixir
defmodule Jenga.Application do
  use Application

  def start(_type, _args) do
    # Config keys mapped to the environment variables they are read from.
    config = [
      port: "PORT",
      db_url: "DB_URL"
    ]

    children = [
      {Jenga.Config, config}
    ]

    opts = [strategy: :one_for_one, name: Jenga.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

defmodule Jenga.Config do
  use GenServer

  def start_link(desired_config) do
    GenServer.start_link(__MODULE__, desired_config, name: __MODULE__)
  end

  @impl true
  def init(desired) do
    # Protected named table: only this process writes, any process can read.
    :jenga_config = :ets.new(:jenga_config, [:set, :protected, :named_table])

    case load_config(:jenga_config, desired) do
      :ok -> {:ok, %{table: :jenga_config, desired: desired}}
      # Refuse to start, so the application fails fast on missing config.
      :error -> {:stop, :could_not_load_config}
    end
  end

  defp load_config(table, config, retry_count \\ 0)
  defp load_config(_table, [], _), do: :ok
  # Give up after 10 failed lookups (the retry budget is shared by all keys).
  defp load_config(_table, _, 10), do: :error

  defp load_config(table, [{k, v} | tail], retry_count) do
    case System.get_env(v) do
      # Variable not set: retry the same key immediately.
      nil ->
        load_config(table, [{k, v} | tail], retry_count + 1)

      value ->
        :ets.insert(table, {k, value})
        load_config(table, tail, retry_count)
    end
  end
end
```
Monitoring: you can use Erlang’s alarms. Example from the talk, which treats the database as a dependency and raises an alarm if it is not reachable:
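```elixir
defmodule Jenga.Database.Watchdog do
  use GenServer

  @alarm_id :database_disconnected
  # Check interval is an assumption; the value used in the talk is not shown.
  @check_interval :timer.seconds(5)

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, opts)
  end

  @impl true
  def init(:ok) do
    schedule_check()
    {:ok, %{status: :degraded, passing_checks: 0}}
  end

  @impl true
  def handle_info(:check_db, state) do
    # Jenga.Database.check_status/0 is assumed to return :ok or :error.
    status = Jenga.Database.check_status()
    state = change_state(status, state)
    schedule_check()
    {:noreply, state}
  end

  defp schedule_check do
    Process.send_after(self(), :check_db, @check_interval)
  end

  defp change_state(result, %{status: status, passing_checks: count}) do
    case {result, status, count} do
      # Still connected: clear the alarm after 3 consecutive passing checks.
      {:ok, :connected, count} ->
        if count == 3 do
          :alarm_handler.clear_alarm(@alarm_id)
        end

        %{status: :connected, passing_checks: count + 1}

      {:ok, :degraded, _} ->
        %{status: :connected, passing_checks: 0}

      # Connection just dropped: raise the alarm.
      {:error, :connected, _} ->
        :alarm_handler.set_alarm({@alarm_id, "We cannot connect to the database"})
        %{status: :degraded, passing_checks: 0}

      {:error, :degraded, _} ->
        %{status: :degraded, passing_checks: 0}
    end
  end
end
```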
Then an alarm handler can be installed:
```elixir
defmodule Jenga.Application do
  use Application

  def start(_type, _args) do
    config = [
      port: "PORT",
      db_url: "DB_URL"
    ]

    # Replace SASL's default alarm handler with ours (requires :sasl in
    # extra_applications); the old handler's alarms are passed to init/1.
    :gen_event.swap_handler(
      :alarm_handler,
      {:alarm_handler, :swap},
      {Jenga.AlarmHandler, :ok}
    )

    children = [
      {Jenga.Config, config},
      Jenga.Database.Supervisor
    ]

    opts = [strategy: :one_for_one, name: Jenga.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

defmodule Jenga.AlarmHandler do
  @behaviour :gen_event
  require Logger

  @impl true
  def init({:ok, {:alarm_handler, _old_alarms}}) do
    Logger.info("Installing alarm handler")
    {:ok, %{}}
  end

  # set_alarm/1 delivers {alarm_id, description}; clear_alarm/1 delivers the id only.
  @impl true
  def handle_event({:set_alarm, {:database_disconnected, _description}}, alarms) do
    # Do something with the alarm being raised (e.g. notify monitoring)
    Logger.error("Database connection lost")
    {:ok, alarms}
  end

  def handle_event({:clear_alarm, :database_disconnected}, alarms) do
    # Do something with the alarm being cleared (e.g. notify monitoring)
    Logger.info("Database connection recovered")
    {:ok, alarms}
  end

  def handle_event(event, state) do
    Logger.info("Unhandled alarm event: #{inspect(event)}")
    {:ok, state}
  end

  @impl true
  def handle_call(_request, state), do: {:ok, {:error, :not_supported}, state}
end
```
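Handlers receive `{:set_alarm, {alarm_id, description}}` when an alarm is raised and `{:clear_alarm, alarm_id}` when it is cleared, so patterns have to match those shapes. A small sketch to check them, assuming the SASL application (which starts the `:alarm_handler` event manager) is available; `ForwardingAlarmHandler` is hypothetical and simply forwards every event to a given process:

```elixir
defmodule ForwardingAlarmHandler do
  # Hypothetical gen_event handler that forwards alarm events to `target`.
  @behaviour :gen_event

  @impl true
  def init(target), do: {:ok, target}

  @impl true
  def handle_event(event, target) do
    send(target, {:alarm_event, event})
    {:ok, target}
  end

  @impl true
  def handle_call(_request, target), do: {:ok, :ok, target}
end
```

Unlike `swap_handler/3`, `:gen_event.add_handler/3` installs it next to the default handler, so `:alarm_handler.get_alarms/0` keeps working.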
Best practices
Code/Development
the usual suspect - credo
mix credo --strict
spend time writing documentation in your code and generate it with ExDoc
use the official formatter in your projects
mix format --check-formatted
nice write-up about putting these tools together - https://itnext.io/enforcing-code-quality-in-elixir-20f87efc7e66
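One way to tie these checks together is a Mix alias, so CI and developers run the same command. A sketch of a `mix.exs` fragment; the app name, the version requirements and the `lint` alias name are assumptions:

```elixir
# mix.exs (fragment); :my_app, versions and the `lint` alias are placeholders.
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_app,
      version: "0.1.0",
      elixir: "~> 1.9",
      deps: deps(),
      aliases: aliases()
    ]
  end

  defp deps do
    [
      # Tooling only: runtime: false keeps these out of the application/release.
      {:credo, "~> 1.1", only: [:dev, :test], runtime: false},
      {:ex_doc, "~> 0.21", only: :dev, runtime: false}
    ]
  end

  # `mix lint` runs the formatter check first, then Credo in strict mode.
  defp aliases do
    [lint: ["format --check-formatted", "credo --strict"]]
  end
end
```

With this in place, `mix lint` should fail when code is unformatted or Credo reports issues, and `mix docs` builds the ExDoc documentation.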
Code Design
GenServer is a great abstraction, but be aware that it can become a bottleneck, because the messages sent to it are processed one at a time.
- Optimizing Your Elixir and Phoenix projects with ETS
- Avoiding GenServer bottlenecks
- You may not need GenServers and Supervision Trees
- Elixir and Phoenix Performance
- To spawn, or not to spawn?
- The Primitives of Elixir Concurrency: a Full Example
- Elixir Streams to process large HTTP responses on the fly
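A common way around the GenServer bottleneck is to keep the GenServer as the single writer but let callers read from an ETS table directly. A minimal sketch; the `RateCache` module and `:rate_cache` table are hypothetical:

```elixir
defmodule RateCache do
  use GenServer

  @table :rate_cache

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Writes are serialized through the GenServer, which owns the table.
  def put(key, value), do: GenServer.call(__MODULE__, {:put, key, value})

  # Reads skip the GenServer and hit ETS from the caller's own process,
  # so concurrent readers never queue up in a single mailbox.
  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  @impl true
  def init(:ok) do
    # :protected = owner writes, everyone reads; read_concurrency suits read-heavy use.
    :ets.new(@table, [:set, :protected, :named_table, read_concurrency: true])
    {:ok, nil}
  end

  @impl true
  def handle_call({:put, key, value}, _from, state) do
    true = :ets.insert(@table, {key, value})
    {:reply, :ok, state}
  end
end
```

The trade-off: reads are no longer ordered with respect to in-flight writes in the GenServer's mailbox, which is usually acceptable for caches.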
Ops/Infrastructure/Monitoring
codify / agree on generally shared metrics (e.g. rps, run queue, query time, atoms, memory, latency). Useful libraries: exometer, statix (for statsd-compatible backends). We still need to evaluate integration with Prometheus, but projects to look at are:
- prometheus.erl & prometheus.ex & prometheus-phoenix - seem to cover a lot of ground
- prometheus_exometer
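Several of the VM-level metrics above can be read directly from the runtime before a reporting library is chosen. A hypothetical helper using only built-in `:erlang` functions (`atom_count` / `atom_limit` need OTP 20+); shipping the numbers to statsd or Prometheus is left out:

```elixir
defmodule VmMetrics do
  # Hypothetical helper: a snapshot of a few VM metrics from built-in :erlang calls.
  def snapshot do
    %{
      # Processes ready to run but waiting for a scheduler.
      run_queue: :erlang.statistics(:run_queue),
      # Atoms are never garbage collected; alert well before the limit.
      atom_count: :erlang.system_info(:atom_count),
      atom_limit: :erlang.system_info(:atom_limit),
      process_count: :erlang.system_info(:process_count),
      memory_total_bytes: :erlang.memory(:total)
    }
  end
end
```

Calling `VmMetrics.snapshot()` periodically and forwarding each value to a backend gives a baseline for the shared metric set.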
We built our very own tracing platform a long time ago, but it is time to go with the crowd and adopt OpenTelemetry, which is a merger of OpenTracing and OpenCensus. There is an official client for OpenCensus: opencensus-erlang. OpenTracing is supported via the Spandex project, but Datadog currently seems to be the only implemented exporter.
Nice write-up about low-level ad hoc tracing: A guide to tracing in Elixir!
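For quick ad hoc tracing without extra dependencies, OTP's `:dbg` (part of runtime_tools) can report calls and their return values. A minimal sketch; the `Greeter` and `CallTracer` modules are hypothetical. Be careful on a busy production node, since every matching call produces a trace message:

```elixir
defmodule Greeter do
  # Hypothetical module to trace.
  def hello(name), do: "Hello, #{name}"
end

defmodule CallTracer do
  # Match spec: trace every call and also report the return value.
  @return_trace [{:_, [], [{:return_trace}]}]

  def trace_calls(module, function, target) do
    # The tracer process runs this fun for each trace message.
    handler = fn msg, acc ->
      send(target, {:traced, msg})
      acc
    end

    {:ok, _tracer} = :dbg.tracer(:process, {handler, nil})
    # Enable call tracing in all processes...
    {:ok, _} = :dbg.p(:all, :c)
    # ...but only for module.function (any arity).
    {:ok, _} = :dbg.tp(module, function, @return_trace)
    :ok
  end

  def stop, do: :dbg.stop()
end
```

After `CallTracer.trace_calls(Greeter, :hello, self())`, a call to `Greeter.hello("world")` produces a `:call` message with the arguments and a `:return_from` message with the result; `CallTracer.stop()` removes the tracer.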