# process-exporter Prometheus exporter that mines /proc to report on selected processes. The premise for this exporter is that sometimes you have apps that are impractical to instrument directly, either because you don't control the code or they're written in a language that isn't easy to instrument with Prometheus. A fair bit of information can be gleaned from /proc, especially for long-running programs. For most systems it won't be beneficial to create metrics for every process by name: there are just too many of them and most don't do enough to merit it. Various command-line options are provided to control how processes are grouped and the groups are named. Run "process-exporter -man" to see a help page giving details. Metrics available currently include CPU usage, bytes written and read, and number of processes in each group. Bytes read and written come from /proc/[pid]/io in recent enough kernels. These correspond to the fields `read_bytes` and `write_bytes` respectively. These IO stats come with plenty of caveats, see either the Linux kernel documentation or man 5 proc. CPU usage comes from /proc/[pid]/stat fields utime (user time) and stime (system time.) It has been translated into fractional seconds of CPU consumed. Since it is a counter, using rate() will tell you how many fractional cores were running code from this process during the interval given. An example Grafana dashboard to view the metrics is available at https://grafana.net/dashboards/249 ## Instrumentation cost process-exporter will consume CPU in proportion to the number of processes in the system and the rate at which new ones are created. The most expensive parts - applying regexps and executing templates - are only applied once per process seen. If you have mostly long-running processes process-exporter should be lightweight: each time a scrape occurs, parsing of /proc/$pid/stat and /proc/$pid/cmdline for every process being monitored and adding a few numbers. ## Config To select and group the processes to monitor, either provide command-line arguments or use a YAML configuration file. To avoid confusion with the cmdline YAML element, we'll refer to the null-delimited contents of `/proc//cmdline` as the array `argv[]`. Each item in `process_names` gives a recipe for identifying and naming processes. The optional `name` tag defines a template to use to name matching processes; if not specified, `name` defaults to `{{.ExeBase}}`. Template variables available: - `{{.ExeBase}}` contains the basename of the executable - `{{.ExeFull}}` contains the fully qualified path of the executable - `{{.Matches}}` map contains all the matches resulting from applying cmdline regexps Each item in `process_names` must contain one or more selectors (`comm`, `exe` or `cmdline`); if more than one selector is present, they must all match. Each selector is a list of strings to match against a process's `comm`, `argv[0]`, or in the case of `cmdline`, a regexp to apply to the command line. For `comm` and `exe`, the list of strings is an OR, meaning any process matching any of the strings will be added to the item's group. For `cmdline`, the list of regexes is an AND, meaning they all must match. Any capturing groups in a regexp must use the `?P` option to assign a name to the capture, which is used to populate `.Matches`. A process may only belong to one group: even if multiple items would match, the first one listed in the file wins. Other performance tips: give an exe or comm clause in addition to any cmdline clause, so you avoid executing the regexp when the executable name doesn't match. ``` process_names: # comm is the second field of /proc//stat minus parens. # It is the base executable name, truncated at 15 chars. # It cannot be modified by the program, unlike exe. - comm: - bash # exe is argv[0]. If no slashes, only basename of argv[0] need match. # If exe contains slashes, argv[0] must match exactly. - exe: - postgres - /usr/local/bin/prometheus # cmdline is a list of regexps applied to argv. # Each must match, and any captures are added to the .Matches map. - name: "{{.ExeFull}}:{{.Matches.Cfgfile}}" exe: - /usr/local/bin/process-exporter cmdline: - -config.path\\s+(?P\\S+) ``` Here's the config I use on my home machine: ``` process_names: - comm: - chromium-browse - bash - prometheus - gvim - exe: - /sbin/upstart cmdline: - --user name: upstart:-user ``` ## Docker A docker image can be created with ``` make docker ``` Then run the docker, e.g. ``` docker run --privileged --name pexporter -d -v /proc:/host/proc -p 127.0.0.1:9256:9256 process-exporter:master -procfs /host/proc -procnames chromium-browse,bash,prometheus,gvim,upstart:-user -namemapping "upstart,(-user)" ``` This will expose metrics on http://localhost:9256/metrics. Leave off the `127.0.0.1:` to publish on all interfaces. Leave off the --priviliged and add the --user docker run argument if you only need to monitor processes belonging to a single user. ## History An earlier version of this exporter had options to enable auto-discovery of which processes were consuming resources. This functionality has been removed. These options were based on a percentage of resource usage, e.g. if an untracked process consumed X% of CPU during a scrape, start tracking processes with that name. However during any given scrape it's likely that most processes are idle, so we could add a process that consumes minimal resources but which happened to be active during the interval preceding the current scrape. Over time this means that a great many processes wind up being scraped, which becomes unmanageable to visualize. This could be mitigated by looking at resource usage over longer intervals, but ultimately I didn't feel this feature was important enough to invest more time in at this point. It may re-appear at some point in the future, but no promises. Another lost feature: the "other" group was used to count usage by non-tracked procs. This was useful to get an idea of what wasn't being monitored. But it comes at a high cost: if you know what processes you care about, you're wasting a lot of CPU to compute the usage of everything else that you don't care about. The new approach is to minimize resources expended on non-tracked processes and to require the user to whitelist the processes to track.