Observability

tileserver-rs exports observability data via two independent pipelines:

  • OTLP push — traces + metrics pushed to an OpenTelemetry collector over gRPC (Grafana Alloy, OTel Collector, Jaeger, …).
  • Prometheus pull — a dedicated /metrics HTTP listener exposing the same metrics in Prometheus text format for direct scraping.

Both pipelines feed off the same instrument handles, so you can enable either, both, or neither without affecting hot-path performance: when neither pipeline is active, recording a metric costs a single atomic load.

Quick Start

[telemetry]
prometheus_bind = "127.0.0.1:9100"
metrics_label_cardinality = "strict"

Then point Prometheus at http://your-host:9100/metrics. No OTLP collector required.

OTLP push (traces + metrics)

[telemetry]
enabled = true
endpoint = "http://localhost:4317"

This enables both traces and metrics, exported to an OTLP-compatible collector on port 4317 (gRPC).

Both at once

[telemetry]
enabled = true                       # OTLP push
endpoint = "http://localhost:4317"
prometheus_bind = "127.0.0.1:9100"   # Prometheus pull
metrics_label_cardinality = "strict"

What's Exported

Traces

Every HTTP request creates a span with method, path, status, and duration. Traces use the standard tracing crate integration — all structured log events in the codebase become span events. Traces are OTLP-only (Prometheus does not have a trace data model).

Metrics

The same 10 metrics are emitted to both pipelines:

Metric                          Type             Unit  Labels                             Description
http_requests_total             counter          -     route, status_class                HTTP requests by matched route + status
http_request_duration_seconds   histogram        s     route, status_class                Per-request latency
http_requests_in_flight         up-down counter  -     -                                  In-flight HTTP requests
tile_requests_total             counter          -     source, format, z_bucket, outcome  Tile lookups (hit/miss/not_found/error)
tile_request_duration_seconds   histogram        s     source, format, z_bucket, outcome  End-to-end tile latency
tile_request_bytes              histogram        By    source, format                     Tile response payload size
tile_cache_hits_total           counter          -     source                             In-process tile cache hits
tile_cache_misses_total         counter          -     source                             In-process tile cache misses
render_duration_seconds         histogram        s     style, format                      Native MapLibre raster render duration
render_errors_total             counter          -     style, reason                      Native render failures bucketed by reason

The route label uses the matched Axum route template (/data/{source}/{z}/{x}/{y_fmt}) — never the raw URL — so paths stay bounded. The status_class label collapses HTTP status codes into 1xx/2xx/3xx/4xx/5xx.
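The status-code collapsing above can be sketched in a few lines. This is a hypothetical helper mirroring the described behavior, not the server's actual implementation:

```rust
/// Collapse an HTTP status code into its class label (illustrative
/// helper; the real mapping lives inside tileserver-rs).
fn status_class(status: u16) -> &'static str {
    match status / 100 {
        1 => "1xx",
        2 => "2xx",
        3 => "3xx",
        4 => "4xx",
        5 => "5xx",
        _ => "other",
    }
}

fn main() {
    assert_eq!(status_class(200), "2xx");
    assert_eq!(status_class(404), "4xx");
    assert_eq!(status_class(503), "5xx");
}
```

Because the label set is fixed at five classes (plus a fallback), the status dimension contributes at most a small constant factor to series cardinality.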

Configuration Reference

[telemetry]
# OTLP push pipeline
enabled = true                       # Enable OTLP traces + metrics push (default: false)
endpoint = "http://localhost:4317"   # OTLP gRPC endpoint
service_name = "tileserver-rs"       # Service name in traces/metrics
sample_rate = 1.0                    # Trace sampling (0.0–1.0; metrics ignore this)
metrics_enabled = true               # Enable OTLP metrics push (default: true)
metrics_export_interval_secs = 60    # OTLP metrics push interval (default: 60)

# Prometheus pull endpoint (independent of OTLP push)
prometheus_bind = "127.0.0.1:9100"   # Bind address (unset = disabled, default)
prometheus_path = "/metrics"         # HTTP path (default: "/metrics")
metrics_label_cardinality = "strict" # "strict" (default) | "verbose"

Info

When neither enabled = true nor prometheus_bind is set (the default), all instruments are no-ops — recording a metric compiles to a single atomic load.
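A minimal sketch of that no-op pattern, assuming instruments sit behind a OnceLock as described (the Counter type and names here are illustrative, not the server's actual types):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

// Illustrative stand-in for a real OTel counter handle.
struct Counter {
    value: AtomicU64,
}

// Instruments are initialized once, only if telemetry is configured.
static HTTP_REQUESTS: OnceLock<Counter> = OnceLock::new();

fn record_request() {
    // When telemetry is disabled, OnceLock::get is a single atomic
    // load that returns None, and recording is a no-op.
    if let Some(counter) = HTTP_REQUESTS.get() {
        counter.value.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    record_request(); // telemetry not initialized: no-op
    HTTP_REQUESTS
        .set(Counter { value: AtomicU64::new(0) })
        .ok();
    record_request(); // telemetry enabled: lock-free increment
    assert_eq!(
        HTTP_REQUESTS.get().unwrap().value.load(Ordering::Relaxed),
        1
    );
}
```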

Prometheus Scrape Endpoint

prometheus_bind opts into a separate HTTP listener for Prometheus to scrape directly. Following the official Axum example, the listener runs on its own port so you can:

  • Bind it to a private interface (127.0.0.1 for sidecar scraping, or a VPC-internal address) while keeping /data, /styles, and /__admin behind their own ACLs.
  • Skip authentication — the metrics endpoint never sees your tile traffic.
  • Restart/reload the metrics listener independently of the main server.

Scrape config

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: tileserver-rs
    scrape_interval: 15s
    static_configs:
      - targets: ['tileserver-rs.internal:9100']
    # For Kubernetes service discovery:
    # kubernetes_sd_configs:
    #   - role: pod
    # relabel_configs:
    #   - source_labels: [__meta_kubernetes_pod_label_app]
    #     regex: tileserver-rs
    #     action: keep

Health check

The metrics listener also exposes GET /metrics/health returning 200 OK for liveness/readiness probes.

Label Cardinality

Unbounded labels are the #1 cause of Prometheus memory blowups. The metrics_label_cardinality setting controls the cardinality budget:

strict (default — production-safe)

  • Zoom is collapsed into z_bucket = low (z 0–6), mid (z 7–12), or high (z 13+).
  • Tile x/y coordinates are dropped entirely.
  • HTTP route uses Axum's matched-path template, never the raw URL.
  • HTTP status_class collapses status codes into 1xx/2xx/3xx/4xx/5xx.

Worst-case cardinality for a typical config (5 sources × 3 formats × 3 z_buckets × 4 outcomes = 180 series per metric) is comfortably within the <10,000 total series rule of thumb.
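The zoom bucketing can be sketched as follows; this is a hypothetical helper mirroring the ranges above, not the server's actual code:

```rust
/// Collapse a tile zoom level into the strict-mode z_bucket label
/// (illustrative; ranges taken from the documentation above).
fn z_bucket(z: u8) -> &'static str {
    match z {
        0..=6 => "low",
        7..=12 => "mid",
        _ => "high",
    }
}

fn main() {
    assert_eq!(z_bucket(4), "low");
    assert_eq!(z_bucket(12), "mid");
    assert_eq!(z_bucket(18), "high");
}
```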

verbose (debug-only)

  • Raw zoom levels 0..=22 are passed through.
  • All other labels behave like strict.

Warning

A globe-scale source served at z=22 has trillions of unique tile coordinates. Even with coordinates dropped, multiplying 5 sources × 3 formats × 23 zoom levels × 4 outcomes = 1,380 series per metric — manageable, but storage grows fast under sustained traffic. Do not leave verbose enabled in production. Use it for short-window investigations and switch back to strict afterwards.

Backend Setup Examples

Grafana Alloy + Tempo + Prometheus

This is the recommended stack for production. Grafana Alloy (formerly Grafana Agent) receives OTLP and forwards traces to Tempo and metrics to Prometheus.

# compose.yml
services:
  tileserver:
    image: ghcr.io/vinayakkulkarni/tileserver-rs:latest
    ports:
      - '8080:8080'
    volumes:
      - ./data:/data:ro
      - ./config.toml:/app/config.toml:ro

  alloy:
    image: grafana/alloy:latest
    ports:
      - '4317:4317' # OTLP gRPC
      - '12345:12345' # Alloy UI
    volumes:
      - ./alloy-config.alloy:/etc/alloy/config.alloy
    command: ['run', '/etc/alloy/config.alloy']

  tempo:
    image: grafana/tempo:latest
    ports:
      - '3200:3200'
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
    command: ['-config.file=/etc/tempo.yaml']

  prometheus:
    image: prom/prometheus:latest
    ports:
      - '9090:9090'
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:latest
    ports:
      - '3000:3000'
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin

# config.toml
[telemetry]
enabled = true
endpoint = "http://alloy:4317"
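The compose file mounts ./alloy-config.alloy, whose contents are not shown above. A minimal sketch that receives OTLP and fans traces out to Tempo and metrics to Prometheus might look like this (component labels are illustrative; check the Alloy component reference for your version):

```hcl
// alloy-config.alloy — minimal sketch, not a tested production config
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  output {
    traces  = [otelcol.exporter.otlp.tempo.input]
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls {
      insecure = true
    }
  }
}

otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
  }
}
```

Prometheus must be started with remote-write receiving enabled (--web.enable-remote-write-receiver) for the last component to work.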

Jaeger (Development)

For quick local development, Jaeger's all-in-one image accepts OTLP directly:

# compose.yml
services:
  tileserver:
    image: ghcr.io/vinayakkulkarni/tileserver-rs:latest
    ports:
      - '8080:8080'
    volumes:
      - ./data:/data:ro
      - ./config.toml:/app/config.toml:ro

  jaeger:
    image: jaegertracing/jaeger:latest
    ports:
      - '4317:4317' # OTLP gRPC
      - '16686:16686' # Jaeger UI
    environment:
      - COLLECTOR_OTLP_ENABLED=true

# config.toml
[telemetry]
enabled = true
endpoint = "http://jaeger:4317"
metrics_enabled = false  # Jaeger is traces-only

Open http://localhost:16686 to view traces.

OpenTelemetry Collector

For maximum flexibility, use the official OTel Collector to fan-out to multiple backends:

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]

# config.toml
[telemetry]
enabled = true
endpoint = "http://otel-collector:4317"

Performance Considerations

  • Traces use batch export — spans are buffered and sent periodically, not per-request
  • Metrics use a PeriodicReader (default: every 60 seconds) — metric data points are aggregated in-memory and pushed at the configured interval
  • Instruments (counters, histograms) are lock-free — recording a metric is an atomic operation with negligible overhead
  • When disabled, all instruments are no-ops. The only per-request cost is a single atomic load from OnceLock

Sampling

For high-traffic deployments, reduce trace sampling to control export volume:

[telemetry]
enabled = true
sample_rate = 0.1  # Only export 10% of traces

Metrics are always exported at full fidelity regardless of sample_rate — sampling only affects traces.
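Head sampling of this kind is typically deterministic per trace ID, so all spans of one trace share the same decision. An illustrative sketch of a ratio-based decision (not the exact OpenTelemetry sampler algorithm):

```rust
// Illustrative ratio-based sampling decision: compare (part of) the
// trace ID against a threshold derived from the sample rate, so the
// same trace always gets the same verdict.
fn sampled(trace_id_bits: u64, sample_rate: f64) -> bool {
    if sample_rate >= 1.0 {
        return true; // rate 1.0 keeps every trace
    }
    (trace_id_bits as f64) < sample_rate * (u64::MAX as f64)
}

fn main() {
    assert!(sampled(12_345, 1.0));        // everything kept at 1.0
    assert!(!sampled(u64::MAX / 2, 0.1)); // high IDs dropped at 10%
}
```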

Troubleshooting

No data appearing in backend

  1. Verify the endpoint is reachable: curl -v http://localhost:4317
  2. Check tileserver logs for OpenTelemetry initialized with metrics=true
  3. Ensure your collector accepts OTLP gRPC (not HTTP) on the configured port
  4. Wait at least metrics_export_interval_secs for the first metrics push

High memory usage

If metrics cardinality is too high (many unique URL paths), consider using a collector with attribute filtering to drop or group url.path values.
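As a sketch, the OTel Collector's attributes processor can delete a high-cardinality attribute before export (the processor key name below is illustrative; remember to add it to your metrics pipeline's processors list):

```yaml
# otel-collector-config.yaml (fragment) — illustrative attribute filtering
processors:
  attributes/drop-url-path:
    actions:
      - key: url.path
        action: delete
```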