Monitoring for sfp server

OpenTelemetry Monitoring Setup

The sfp-pro stack includes an optional OpenTelemetry Collector that monitors all Docker containers (CPU, memory, network, block I/O) and host-level metrics, then exports them via OTLP HTTP to any compatible observability backend.

Quick Start

  1. Add the required env vars to your .env (or secrets provider)

  2. Enable the monitoring profile: COMPOSE_PROFILES=monitoring

  3. If your backend uses an auth header other than api-key, edit the headers section of the collector config (see Changing the Auth Header)

The collector config is mounted at ./config/otel-collector-config.yaml and can be customized per deployment.
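To illustrate how the profile and the mounted config fit together, here is a minimal sketch of what the collector service might look like in a Compose file. It is not the actual sfp-pro definition: the image tag, the in-container config path, and the Docker socket mount are assumptions, and only the service name otel-collector is inferred from the container name used in Troubleshooting.

```yaml
# Hypothetical docker-compose.yml excerpt; the real sfp-pro file may differ.
services:
  otel-collector:
    # Assumed image: the contrib distribution includes the docker_stats receiver.
    image: otel/opentelemetry-collector-contrib:latest
    # The service starts only when COMPOSE_PROFILES includes "monitoring".
    profiles: ["monitoring"]
    volumes:
      # Collector config from this guide, mounted read-only at the
      # contrib image's default config path (assumed).
      - ./config/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml:ro
      # Assumed: the docker_stats receiver needs access to the Docker socket.
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: ${OTEL_EXPORTER_OTLP_ENDPOINT}
      OTEL_AUTH_API_KEY: ${OTEL_AUTH_API_KEY}
```

With a layout like this, leaving COMPOSE_PROFILES unset simply skips the collector, which is what makes the monitoring stack optional.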

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| OTEL_EXPORTER_OTLP_ENDPOINT | Yes | OTLP HTTP endpoint URL of your backend |
| OTEL_AUTH_API_KEY | Yes* | Auth value sent in the configured header |

* Not required for backends that don't need authentication (e.g. self-hosted Jaeger).

When a secrets provider (e.g. Infisical) injects OTEL_EXPORTER_OTLP_ENDPOINT at runtime, the monitoring profile activates automatically.

Changing the Auth Header

The default config sends auth via the api-key header, which works for New Relic. Other backends use different header names. To change it, edit the headers section in ./config/otel-collector-config.yaml:

exporters:
  otlp_http:
    endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}
    headers:
      api-key: ${OTEL_AUTH_API_KEY}        # <-- change this key

Backend Configuration Examples

New Relic

Works with the default config, no changes needed.

Env vars:

Config header (default, no change needed):

Reference: New Relic OTLP endpoint docs


Datadog

Datadog uses the dd-api-key header and requires delta metrics (the sfp collector config already includes the cumulativetodelta processor).

Env vars:

Config header - change api-key to dd-api-key:
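A minimal sketch of that change, based on the default exporter block shown under Changing the Auth Header. Only the header key differs; the environment variable names are the ones already used by the default config.

```yaml
exporters:
  otlp_http:
    endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}
    headers:
      # Header key renamed from api-key; the value still comes from OTEL_AUTH_API_KEY.
      dd-api-key: ${OTEL_AUTH_API_KEY}
```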

Optionally add metric translation config:

For EU, use https://otlp.datadoghq.eu as the endpoint.

Reference: Datadog OTLP metrics intake


Grafana Cloud

Grafana Cloud uses HTTP Basic auth via the Authorization header. The value is Basic <base64(instanceID:apiToken)>.

Generate the auth value:

Env vars:

Config header - change api-key to Authorization:
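One way this could look, assuming OTEL_AUTH_API_KEY holds the complete header value, including the Basic prefix. If you store only the base64 string instead, the prefix would have to be added in the config.

```yaml
exporters:
  otlp_http:
    endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}
    headers:
      # Assumption: OTEL_AUTH_API_KEY is set to "Basic <base64(instanceID:apiToken)>".
      Authorization: ${OTEL_AUTH_API_KEY}
```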

Replace <zone> with your Grafana Cloud zone (e.g. prod-us-east-0, prod-eu-north-0). Find your endpoint in Grafana Cloud Portal > Stack > Configure > OpenTelemetry.

Reference: Grafana Cloud OTLP docs


Honeycomb

Honeycomb uses the x-honeycomb-team header.

Env vars:

Config header - change api-key to x-honeycomb-team:
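As a sketch, the exporter block would then read roughly as follows, with everything except the header key left as in the default config.

```yaml
exporters:
  otlp_http:
    endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}
    headers:
      # Header key renamed from api-key; the value is your Honeycomb API key.
      x-honeycomb-team: ${OTEL_AUTH_API_KEY}
```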

For EU, use https://api.eu1.honeycomb.io as the endpoint.

Reference: Honeycomb collector docs


Self-Hosted / Generic OTLP (Jaeger, SigNoz, etc.)

For self-hosted backends that don't require authentication, you only need the endpoint.

Env vars:

Config header - remove the headers section entirely:
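A sketch of the resulting exporter block, assuming the backend accepts unauthenticated OTLP over HTTP at the configured endpoint. OTEL_AUTH_API_KEY is no longer referenced, so it can stay unset.

```yaml
exporters:
  otlp_http:
    # Only the endpoint remains; no headers block is sent.
    endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}
```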


What Gets Collected

The collector gathers two categories of metrics:

Container metrics (via docker_stats receiver):

  • CPU usage per container (including per-CPU)

  • Memory usage and limits

  • Network I/O (bytes/packets in/out)

  • Block I/O (reads/writes)

Host metrics (via hostmetrics receiver):

  • CPU utilization

  • Memory utilization

  • Disk I/O and operations

  • Filesystem usage

  • Network traffic

  • System load

  • Paging and process counts
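For orientation, the metric categories above roughly correspond to a receivers section like the sketch below. It is an illustration, not a copy of the shipped config: the collection intervals and the Docker socket path are assumptions, and the shipped file may enable a different set of optional metrics.

```yaml
receivers:
  docker_stats:
    endpoint: unix:///var/run/docker.sock   # assumed socket path
    collection_interval: 10s                # assumed interval
    metrics:
      # Per-CPU usage is an optional metric and has to be switched on explicitly.
      container.cpu.usage.percpu:
        enabled: true
  hostmetrics:
    collection_interval: 10s                # assumed interval
    scrapers:
      cpu:
        metrics:
          # Utilization metrics are optional in the hostmetrics receiver.
          system.cpu.utilization:
            enabled: true
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
      disk:         # disk I/O and operations
      filesystem:   # filesystem usage
      network:      # network traffic
      load:         # system load
      paging:       # paging
      processes:    # process counts
```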

All metrics are tagged with:

  • sfp.tenant - your tenant name (TENANT_NAME env var)

  • deployment.environment - your environment (NODE_ENV env var)

  • Host and OS metadata (auto-detected)
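The tagging described above is typically done with the resource and resourcedetection processors. The following is a sketch under that assumption; the action and the detector list are illustrative choices rather than the confirmed sfp configuration.

```yaml
processors:
  resource:
    attributes:
      - key: sfp.tenant
        value: ${TENANT_NAME}     # tenant name from the environment
        action: upsert            # assumed action
      - key: deployment.environment
        value: ${NODE_ENV}        # environment name from the environment
        action: upsert            # assumed action
  resourcedetection:
    # The system detector adds host and OS attributes such as host.name and os.type.
    detectors: [system]
```

Because both values are read from the environment, an empty variable produces an empty attribute value, which is why the Compose defaults mentioned in Troubleshooting matter.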

Troubleshooting

Collector not starting:

  • Check docker logs <project>-otel-collector-1 for config errors

  • An empty TENANT_NAME or NODE_ENV causes the resource processor to fail; the Docker Compose file provides defaults (unknown and production, respectively)
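Defaults of this kind are usually expressed with Compose variable substitution. The excerpt below is a hypothetical sketch of how that could look, not the actual sfp-pro Compose file.

```yaml
# Hypothetical docker-compose.yml excerpt.
services:
  otel-collector:
    environment:
      # ":-" falls back to the default when the variable is unset or empty.
      TENANT_NAME: ${TENANT_NAME:-unknown}
      NODE_ENV: ${NODE_ENV:-production}
```

If you override the environment block in your own deployment, keeping the same fallbacks avoids the empty-value failure described above.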

401 / 403 errors:

  • Verify your API key is correct

  • Verify the header name matches your backend (see examples above)

  • Check you're using the right regional endpoint

No data in your backend:

  • Metrics are batched every 10 seconds; allow 30 seconds after startup

  • The collector logs only export errors, not successful exports, so an absence of errors usually means data is being sent

  • Verify COMPOSE_PROFILES includes monitoring
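If you want positive confirmation that metrics are flowing, one option is to temporarily add the collector's debug exporter alongside the OTLP exporter. The fragment below is a sketch to merge into the existing config, not a complete file: it assumes the metrics pipeline is named metrics and that the batch processor is where the 10-second interval is set.

```yaml
processors:
  batch:
    timeout: 10s          # assumed source of the 10-second batching interval

exporters:
  debug:
    verbosity: basic      # logs a short summary for each exported batch

service:
  pipelines:
    metrics:
      # Keep your existing receivers and processors; only the exporters list changes.
      exporters: [otlp_http, debug]
```

Remove the debug exporter once the backend shows data, since it adds a log entry for every batch.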
