Prometheus and Aegis
Read time in minutes: 10Last time in the xeiaso dot net cinematic universe:
Unix sockets started to be used to grace the cluster. Things were at peace. Then, a realization came through:

This could not do! Without observability the people of the Discord wouldn't have a livefeed of the infrastructure falling over! This cannot stand! Look, our hero takes action!

In order to help keep an eye on all of the services I run, I use Prometheus for collecting metrics. For an example of the kind of metrics I collect, see here (1). In the configuration that I have, Prometheus runs on a server in my apartment and reaches out to my other machines to scrape metrics over the network. This worked great when I had my major services listen over TCP, I could just point Prometheus at the backend port over my tunnel.
When I started using Unix sockets for hosting my services, this stopped working. It became very clear very quickly that I needed some kind of shim. This shim needed to do the following things:
- Listen over the network as a HTTP server
- Connect to the unix sockets for relevant services based on the path (eg.
/xesite
should get the metrics from/srv/within/run/xesite.sock
) - Do nothing else
The Go standard library has a tool for doing reverse proxying in the standard
library:
net/http/httputil#ReverseProxy
.
Maybe we could build something with this?

Transport
field?type ReverseProxy struct {
// ...
// The transport used to perform proxy requests.
// If nil, http.DefaultTransport is used.
Transport http.RoundTripper
// ...
}

RoundTripper
, which is a
function that takes a request and returns a response somehow. It uses
http.DefaultTransport
by default, which reads from the network. So at a
minimum we're gonna need: - a
ReverseProxy
- a
Transport
- a dialing function
- Right?
Yep! Unix sockets can be used like normal sockets, so all you need is something like this:
func proxyToUnixSocket(w http.ResponseWriter, r *http.Request) {
name := path.Base(r.URL.Path)
fname := filepath.Join(*sockdir, name+".sock")
_, err := os.Stat(fname)
if os.IsNotExist(err) {
http.NotFound(w, r)
return
}
ts := &http.Transport{
Dial: func(_, _ string) (net.Conn, error) {
return net.Dial("unix", fname)
},
DisableKeepAlives: true,
}
rp := httputil.ReverseProxy{
Director: func(req *http.Request) {
req.URL.Scheme = "http"
req.URL.Host = "aegis"
req.URL.Path = "/metrics"
req.URL.RawPath = "/metrics"
},
Transport: ts,
}
rp.ServeHTTP(w, r)
}

name := path.Base(r.URL.Path)
fname := filepath.Join(*sockdir, name+".sock")
_, err := os.Stat(fname)
if os.IsNotExist(err) {
http.NotFound(w, r)
return
}
ts := &http.Transport{
Dial: func(_, _ string) (net.Conn, error) {
return net.Dial("unix", fname)
},
DisableKeepAlives: true,
}

Yep. Then the rest is really just boilerplate:
package main
import (
"flag"
"log"
"net"
"net/http"
"net/http/httputil"
"os"
"path"
"path/filepath"
)
var (
hostport = flag.String("hostport", "[::]:31337", "TCP host:port to listen on")
sockdir = flag.String("sockdir", "./run", "directory full of unix sockets to monitor")
)
func main() {
flag.Parse()
log.SetFlags(0)
log.Printf("%s -> %s", *hostport, *sockdir)
http.DefaultServeMux.HandleFunc("/", proxyToUnixSocket)
log.Fatal(http.ListenAndServe(*hostport, nil))
}
Now all that's needed is to build a NixOS service out of this:
{ config, lib, pkgs, ... }:
let cfg = config.within.services.aegis;
in
with lib; {
# Mara\ this describes all of the configuration options for Aegis.
options.within.services.aegis = {
enable = mkEnableOption "Activates Aegis (unix socket prometheus proxy)";
# Mara\ This is the IPv6 host:port that the service should listen on.
# It's IPv6 because this is $CURRENT_YEAR.
hostport = mkOption {
type = types.str;
default = "[::1]:31337";
description = "The host:port that aegis should listen for traffic on";
};
# Mara\ This is the folder full of unix sockets. In the previous post we
# mentioned that the sockets should go somewhere like /tmp, however this
# may be a poor life decision:
# https://lobste.rs/s/fqqsct/unix_domain_sockets_for_serving_http#c_g4ljpf
sockdir = mkOption {
type = types.str;
default = "/srv/within/run";
example = "/srv/within/run";
description =
"The folder that aegis will read from";
};
};
# Mara\ The configuration that will arise from this module if it's enabled
config = mkIf cfg.enable {
# Mara\ Aegis has its own user account to keep things tidy. It doesn't need
# root to run so we don't give it root.
users.users.aegis = {
createHome = true;
description = "tulpa.dev/cadey/aegis";
isSystemUser = true;
group = "within";
home = "/srv/within/aegis";
};
# Mara\ The systemd service that actually runs Aegis.
systemd.services.aegis = {
wantedBy = [ "multi-user.target" ];
# Mara\ These correlate to the [Service] block in the systemd unit.
serviceConfig = {
User = "aegis";
Group = "within";
Restart = "on-failure";
WorkingDirectory = "/srv/within/aegis";
RestartSec = "30s";
};
# Mara\ When the service starts up, run this script.
script = let aegis = pkgs.tulpa.dev.cadey.aegis;
in ''
exec ${aegis}/bin/aegis -sockdir="${cfg.sockdir}" -hostport="${cfg.hostport}"
'';
};
};
}

within.services.aegis = {
enable = true;
hostport = "[fda2:d982:1da2:180d:b7a4:9c5c:989b:ba02]:43705";
sockdir = "/srv/within/run";
};

curl
:$ curl http://[fda2:d982:1da2:180d:b7a4:9c5c:989b:ba02]:43705/printerfacts
# HELP printerfacts_hits Number of hits to various pages
# TYPE printerfacts_hits counter
printerfacts_hits{page="fact"} 15
printerfacts_hits{page="index"} 23
printerfacts_hits{page="not_found"} 17
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.06
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 12
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 5296128
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1617458164.36
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 911777792


This is another experiment in writing these kinds of posts in more of a Socratic method. I'm trying to strike a balance with a limited pool of stickers while I wait for more stickers/emoji to come in. Feedback is always welcome.
(1): These metrics are not perfect because of the level of caching that Cloudflare does for me.