# Monitoring for sfp server

## OpenTelemetry Monitoring Setup

The sfp-pro stack includes an optional OpenTelemetry Collector that monitors all Docker containers (CPU, memory, network, block I/O) and host-level metrics, then exports them via OTLP HTTP to any compatible observability backend.

### Quick Start

1. Add the required env vars to your `.env` (or secrets provider)
2. Enable the monitoring profile: `COMPOSE_PROFILES=monitoring`
3. If your backend uses a header other than `api-key`, edit the config file

The collector config is mounted at `./config/otel-collector-config.yaml` and can be customized per deployment.
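
If you already run other Compose profiles, step 2 needs to append `monitoring` rather than overwrite them, since `COMPOSE_PROFILES` takes a comma-separated list. The sketch below is one way to do that; `add_compose_profile` is a hypothetical helper name, not part of sfp-pro, and the `docker compose up -d` invocation in the comment assumes a standard Compose setup.

```bash
# Hypothetical helper (not shipped with sfp-pro): add a profile to a
# comma-separated COMPOSE_PROFILES value without duplicating it.
add_compose_profile() (
  current="$1"
  profile="$2"
  case ",$current," in
    *",$profile,"*) printf '%s\n' "$current" ;;           # already enabled
    ",,")           printf '%s\n' "$profile" ;;           # no profiles set yet
    *)              printf '%s\n' "$current,$profile" ;;  # append to existing list
  esac
)

# Usage (assumes a standard Compose setup; not run here):
#   COMPOSE_PROFILES="$(add_compose_profile "$COMPOSE_PROFILES" monitoring)" docker compose up -d
```

Setting `COMPOSE_PROFILES=monitoring` directly in `.env` is equivalent when no other profiles are in use.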

### Environment Variables

| Variable                      | Required | Description                              |
| ----------------------------- | -------- | ---------------------------------------- |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Yes      | OTLP HTTP endpoint URL of your backend   |
| `OTEL_AUTH_API_KEY`           | Yes\*    | Auth value sent in the configured header |

\* Not required for backends that don't need authentication (e.g. self-hosted Jaeger).

The monitoring profile activates automatically when `OTEL_EXPORTER_OTLP_ENDPOINT` is set and a secrets provider (e.g. Infisical) injects it at runtime.
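
A quick pre-flight check can catch a missing or malformed endpoint before the collector starts. The function below is a minimal sketch: `check_otel_env` is a hypothetical name, and the rule that the endpoint must start with `http://` or `https://` is an assumption based on the examples on this page.

```bash
# Hypothetical pre-flight check (not part of sfp-pro).
# Prints one line per problem and exits non-zero if the endpoint is unusable.
check_otel_env() (
  status=0
  case "${OTEL_EXPORTER_OTLP_ENDPOINT:-}" in
    http://*|https://*) ;;  # looks like an OTLP HTTP URL
    "") echo "OTEL_EXPORTER_OTLP_ENDPOINT is not set"; status=1 ;;
    *)  echo "OTEL_EXPORTER_OTLP_ENDPOINT must start with http:// or https://"; status=1 ;;
  esac
  # An empty key is only a warning: unauthenticated backends do not need one.
  if [ -z "${OTEL_AUTH_API_KEY:-}" ]; then
    echo "note: OTEL_AUTH_API_KEY is empty (fine only for unauthenticated backends)"
  fi
  exit "$status"
)
```

Run `check_otel_env` in the same shell or environment that starts the stack, so it sees the values your `.env` or secrets provider supplies.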

### Changing the Auth Header

The default config sends auth via the `api-key` header, which works for New Relic. Other backends use different header names. To change it, edit the `headers` section in `./config/otel-collector-config.yaml`:

```yaml
exporters:
  otlp_http:
    endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}
    headers:
      api-key: ${OTEL_AUTH_API_KEY}        # <-- change this key
```

***

### Backend Configuration Examples

#### New Relic

Works with the default config; no changes are needed.

**Env vars:**

```env
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net
OTEL_AUTH_API_KEY=<your-new-relic-license-key>
```

**Config header** (default, no change needed):

```yaml
headers:
  api-key: ${OTEL_AUTH_API_KEY}
```

> Reference: [New Relic OTLP endpoint docs](https://docs.newrelic.com/docs/opentelemetry/best-practices/opentelemetry-otlp/)

***

#### Datadog

Datadog uses the `dd-api-key` header and requires delta metrics (the sfp collector config already includes the `cumulativetodelta` processor).

**Env vars:**

```env
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.datadoghq.com
OTEL_AUTH_API_KEY=<your-datadog-api-key>
```

**Config header** - change `api-key` to `dd-api-key`:

```yaml
headers:
  dd-api-key: ${OTEL_AUTH_API_KEY}
```

Optionally add metric translation config:

```yaml
headers:
  dd-api-key: ${OTEL_AUTH_API_KEY}
  dd-otel-metric-config: '{"resource_attributes_as_tags": true}'
```

> For EU, use `https://otlp.datadoghq.eu` as the endpoint.
>
> Reference: [Datadog OTLP metrics intake](https://docs.datadoghq.com/opentelemetry/setup/otlp_ingest/metrics/)

***

#### Grafana Cloud

Grafana Cloud uses HTTP Basic auth via the `Authorization` header. The value is `Basic <base64(instanceID:apiToken)>`.

**Generate the auth value:**

```bash
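# printf avoids a trailing newline; tr removes the line wrapping some base64 builds add to long output
printf '%s' "<instance-id>:<api-token>" | base64 | tr -d '\n'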
```

**Env vars:**

```env
OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-<zone>.grafana.net/otlp
OTEL_AUTH_API_KEY=Basic <base64-encoded-credentials>
```

**Config header** - change `api-key` to `Authorization`:

```yaml
headers:
  Authorization: ${OTEL_AUTH_API_KEY}
```

> Replace `<zone>` with your Grafana Cloud zone (e.g. `prod-us-east-0`, `prod-eu-north-0`). Find your endpoint in Grafana Cloud Portal > Stack > Configure > OpenTelemetry.
>
> Reference: [Grafana Cloud OTLP docs](https://grafana.com/docs/grafana-cloud/send-data/otlp/send-data-otlp/)

***

#### Honeycomb

Honeycomb uses the `x-honeycomb-team` header.

**Env vars:**

```env
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
OTEL_AUTH_API_KEY=<your-honeycomb-api-key>
```

**Config header** - change `api-key` to `x-honeycomb-team`:

```yaml
headers:
  x-honeycomb-team: ${OTEL_AUTH_API_KEY}
```

> For EU, use `https://api.eu1.honeycomb.io` as the endpoint.
>
> Reference: [Honeycomb collector docs](https://docs.honeycomb.io/send-data/opentelemetry/collector)

***

#### Self-Hosted / Generic OTLP (Jaeger, SigNoz, etc.)

For self-hosted backends that don't require authentication, you only need the endpoint.

**Env vars:**

```env
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
```

**Config header** - remove the headers section entirely:

```yaml
exporters:
  otlp_http:
    endpoint: ${OTEL_EXPORTER_OTLP_ENDPOINT}
```

***

### What Gets Collected

The collector gathers two categories of metrics:

**Container metrics** (via `docker_stats` receiver):

* CPU usage per container (including per-CPU)
* Memory usage and limits
* Network I/O (bytes/packets in/out)
* Block I/O (reads/writes)

**Host metrics** (via `hostmetrics` receiver):

* CPU utilization
* Memory utilization
* Disk I/O and operations
* Filesystem usage
* Network traffic
* System load
* Paging and process counts

All metrics are tagged with:

* `sfp.tenant` - your tenant name (`TENANT_NAME` env var)
* `deployment.environment` - your environment (`NODE_ENV` env var)
* Host and OS metadata (auto-detected)

### Troubleshooting

**Collector not starting:**

* Check `docker logs <project>-otel-collector-1` for config errors
* Empty `TENANT_NAME` or `NODE_ENV` will cause the `resource` processor to fail; the docker-compose provides defaults (`unknown` / `production`)

**401 / 403 errors:**

* Verify your API key is correct
* Verify the header name matches your backend (see examples above)
* Check you're using the right regional endpoint
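
To separate credential problems from collector problems, you can send an empty OTLP request straight to the backend and look at the status code. This is a sketch under assumptions: `otlp_metrics_url` and `probe_otlp_auth` are hypothetical helper names, the `/v1/metrics` path is the standard OTLP/HTTP metrics path appended to the base endpoint, and the probe assumes your backend accepts OTLP as JSON (some accept only protobuf, in which case an unexpected status is not conclusive).

```bash
# Build the metrics URL from the base endpoint: strip one trailing slash,
# then append the standard OTLP/HTTP metrics path.
otlp_metrics_url() {
  printf '%s/v1/metrics\n' "${1%/}"
}

# Hypothetical probe: POST an empty OTLP/JSON payload and print only the
# HTTP status code. Usage: probe_otlp_auth <header-name>
probe_otlp_auth() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    -X POST "$(otlp_metrics_url "$OTEL_EXPORTER_OTLP_ENDPOINT")" \
    -H "Content-Type: application/json" \
    -H "$1: $OTEL_AUTH_API_KEY" \
    -d '{"resourceMetrics":[]}'
}
```

For example, `probe_otlp_auth api-key` exercises the default header. A `401` or `403` here points at the key, header name, or region rather than at the collector.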

**No data in your backend:**

* Metrics are batched every 10 seconds; allow 30 seconds after startup
* The collector logs only export errors, not successful exports, so an absence of errors in the logs normally means exports are succeeding
* Verify `COMPOSE_PROFILES` includes `monitoring`
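
Because the logs only matter when they contain errors, it can help to filter them down to error-level lines. The filter below is a sketch: `collector_errors` is a hypothetical name, and it assumes the collector's default console log format, in which the level appears as its own whitespace-separated field.

```bash
# Hypothetical filter: read collector logs on stdin and print only lines
# whose level field is "error". Exits non-zero when no such line is found.
# Assumes the default console log format (timestamp, level, caller, message).
collector_errors() {
  grep -E '(^|[[:space:]])error([[:space:]]|$)'
}

# Usage (container name as in the troubleshooting steps above; not run here):
#   docker logs <project>-otel-collector-1 2>&1 | collector_errors
```

No output after the first 30 seconds suggests exports are going through; any lines that do appear usually name the failing exporter and the HTTP status returned by the backend.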

