Observability

tileserver-rs exports observability data via two independent pipelines:

  • OTLP push — traces + metrics pushed to an OpenTelemetry collector over gRPC (Grafana Alloy, OTel Collector, Jaeger, …).
  • Prometheus pull — a dedicated /metrics HTTP listener exposing the same metrics in Prometheus text format for direct scraping.

Both pipelines feed off the same instrument handles, so you can enable either, both, or neither without affecting hot-path performance: when neither pipeline is active, recording a metric costs a single atomic load.

Quick Start

[telemetry]
prometheus_bind = "127.0.0.1:9100"
metrics_label_cardinality = "strict"

Then point Prometheus at http://your-host:9100/metrics. No OTLP collector required.

OTLP push (traces + metrics)

[telemetry]
enabled = true
endpoint = "http://localhost:4317"

This enables both traces and metrics, exported to an OTLP-compatible collector on port 4317 (gRPC).

Both at once

[telemetry]
enabled = true                       # OTLP push
endpoint = "http://localhost:4317"
prometheus_bind = "127.0.0.1:9100"   # Prometheus pull
metrics_label_cardinality = "strict"

What's Exported

Traces

Every HTTP request creates a span with method, path, status, and duration. Traces use the standard tracing crate integration — all structured log events in the codebase become span events. Traces are OTLP-only (Prometheus does not have a trace data model).

Metrics

The same 10 metrics are emitted to both pipelines:

Metric                          Type             Unit  Labels                             Description
http_requests_total             counter          -     route, status_class                HTTP requests by matched route + status
http_request_duration_seconds   histogram        s     route, status_class                Per-request latency
http_requests_in_flight         up-down counter  -     -                                  In-flight HTTP requests
tile_requests_total             counter          -     source, format, z_bucket, outcome  Tile lookups (hit/miss/not_found/error)
tile_request_duration_seconds   histogram        s     source, format, z_bucket, outcome  End-to-end tile latency
tile_request_bytes              histogram        By    source, format                     Tile response payload size
tile_cache_hits_total           counter          -     source                             In-process tile cache hits
tile_cache_misses_total         counter          -     source                             In-process tile cache misses
render_duration_seconds         histogram        s     style, format                      Native MapLibre raster render duration
render_errors_total             counter          -     style, reason                      Native render failures bucketed by reason

The route label uses the matched Axum route template (/data/{source}/{z}/{x}/{y_fmt}) — never the raw URL — so paths stay bounded. The status_class label collapses HTTP status codes into 1xx/2xx/3xx/4xx/5xx.
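The status-code collapsing above can be sketched in a few lines. This is a hypothetical helper mirroring the described behavior, not the server's actual implementation:

```rust
/// Collapse an HTTP status code into its class label (illustrative
/// helper; the real mapping lives inside tileserver-rs).
fn status_class(status: u16) -> &'static str {
    match status / 100 {
        1 => "1xx",
        2 => "2xx",
        3 => "3xx",
        4 => "4xx",
        5 => "5xx",
        _ => "other",
    }
}

fn main() {
    assert_eq!(status_class(200), "2xx");
    assert_eq!(status_class(404), "4xx");
    assert_eq!(status_class(503), "5xx");
}
```

Because the label set is fixed at five classes (plus a fallback), the status dimension contributes at most a small constant factor to series cardinality.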

Configuration Reference

[telemetry]
# OTLP push pipeline
enabled = true                       # Enable OTLP traces + metrics push (default: false)
endpoint = "http://localhost:4317"   # OTLP gRPC endpoint
service_name = "tileserver-rs"       # Service name in traces/metrics
sample_rate = 1.0                    # Trace sampling (0.0–1.0; metrics ignore this)
metrics_enabled = true               # Enable OTLP metrics push (default: true)
metrics_export_interval_secs = 60    # OTLP metrics push interval (default: 60)

# Prometheus pull endpoint (independent of OTLP push)
prometheus_bind = "127.0.0.1:9100"   # Bind address (unset = disabled, default)
prometheus_path = "/metrics"         # HTTP path (default: "/metrics")
metrics_label_cardinality = "strict" # "strict" (default) | "verbose"

Info

When neither enabled = true nor prometheus_bind is set (the default), all instruments are no-ops — recording a metric compiles to a single atomic load.
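A minimal sketch of that no-op pattern, assuming instruments sit behind a OnceLock as described (the Counter type and names here are illustrative, not the server's actual types):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

// Illustrative stand-in for a real OTel counter handle.
struct Counter {
    value: AtomicU64,
}

// Instruments are initialized once, only if telemetry is configured.
static HTTP_REQUESTS: OnceLock<Counter> = OnceLock::new();

fn record_request() {
    // When telemetry is disabled, OnceLock::get is a single atomic
    // load that returns None, and recording is a no-op.
    if let Some(counter) = HTTP_REQUESTS.get() {
        counter.value.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    record_request(); // telemetry not initialized: no-op
    HTTP_REQUESTS
        .set(Counter { value: AtomicU64::new(0) })
        .ok();
    record_request(); // telemetry enabled: lock-free increment
    assert_eq!(
        HTTP_REQUESTS.get().unwrap().value.load(Ordering::Relaxed),
        1
    );
}
```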

Prometheus Scrape Endpoint

prometheus_bind opts into a separate HTTP listener for Prometheus to scrape directly. Following the official Axum example, the listener runs on its own port so you can:

  • Bind it to a private interface (127.0.0.1 for sidecar scraping, or a VPC-internal address) while keeping /data, /styles, and /__admin behind their own ACLs.
  • Skip authentication — the metrics endpoint never sees your tile traffic.
  • Restart/reload the metrics listener independently of the main server.

Scrape config

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: tileserver-rs
    scrape_interval: 15s
    static_configs:
      - targets: ['tileserver-rs.internal:9100']
    # For Kubernetes service discovery:
    # kubernetes_sd_configs:
    #   - role: pod
    # relabel_configs:
    #   - source_labels: [__meta_kubernetes_pod_label_app]
    #     regex: tileserver-rs
    #     action: keep

Health check

The metrics listener also exposes GET /metrics/health returning 200 OK for liveness/readiness probes.

Label Cardinality

Unbounded labels are the #1 cause of Prometheus memory blowups. The metrics_label_cardinality setting controls the cardinality budget:

strict (default — production-safe)

  • Zoom is collapsed into z_bucket = low (z 0–6), mid (z 7–12), or high (z 13+).
  • Tile x/y coordinates are dropped entirely.
  • HTTP route uses Axum's matched-path template, never the raw URL.
  • HTTP status_class collapses status codes into 1xx/2xx/3xx/4xx/5xx.

Worst-case cardinality for a typical config (5 sources × 3 formats × 3 z_buckets × 4 outcomes = 180 series per metric) is comfortably within the <10,000 total series rule of thumb.
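The zoom bucketing can be sketched as follows; this is a hypothetical helper mirroring the ranges above, not the server's actual code:

```rust
/// Collapse a tile zoom level into the strict-mode z_bucket label
/// (illustrative; ranges taken from the documentation above).
fn z_bucket(z: u8) -> &'static str {
    match z {
        0..=6 => "low",
        7..=12 => "mid",
        _ => "high",
    }
}

fn main() {
    assert_eq!(z_bucket(4), "low");
    assert_eq!(z_bucket(12), "mid");
    assert_eq!(z_bucket(18), "high");
}
```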

verbose (debug-only)

  • Raw zoom levels 0..=22 are passed through.
  • All other labels behave like strict.

Warning

A globe-scale source served at z=22 has trillions of unique tile coordinates. Even with coordinates dropped, multiplying 5 sources × 3 formats × 23 zoom levels × 4 outcomes = 1,380 series per metric — manageable, but storage grows fast under sustained traffic. Do not leave verbose enabled in production. Use it for short-window investigations and switch back to strict afterwards.

Backend Setup Examples

Grafana Alloy + Tempo + Prometheus

This is the recommended stack for production. Grafana Alloy (formerly Grafana Agent) receives OTLP and forwards traces to Tempo and metrics to Prometheus.

# compose.yml
services:
  tileserver:
    image: ghcr.io/vinayakkulkarni/tileserver-rs:latest
    ports:
      - '8080:8080'
    volumes:
      - ./data:/data:ro
      - ./config.toml:/app/config.toml:ro

  alloy:
    image: grafana/alloy:latest
    ports:
      - '4317:4317' # OTLP gRPC
      - '12345:12345' # Alloy UI
    volumes:
      - ./alloy-config.alloy:/etc/alloy/config.alloy
    command: ['run', '/etc/alloy/config.alloy']

  tempo:
    image: grafana/tempo:latest
    ports:
      - '3200:3200'
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
    command: ['-config.file=/etc/tempo.yaml']

  prometheus:
    image: prom/prometheus:latest
    ports:
      - '9090:9090'
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:latest
    ports:
      - '3000:3000'
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin

# config.toml
[telemetry]
enabled = true
endpoint = "http://alloy:4317"
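The compose file mounts ./alloy-config.alloy, whose contents are not shown above. A minimal sketch that receives OTLP and fans traces out to Tempo and metrics to Prometheus might look like this (component labels are illustrative; check the Alloy component reference for your version):

```hcl
// alloy-config.alloy — minimal sketch, not a tested production config
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  output {
    traces  = [otelcol.exporter.otlp.tempo.input]
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls {
      insecure = true
    }
  }
}

otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
  }
}
```

Prometheus must be started with remote-write receiving enabled (--web.enable-remote-write-receiver) for the last component to work.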

Jaeger (Development)

For quick local development, Jaeger's all-in-one image accepts OTLP directly:

# compose.yml
services:
  tileserver:
    image: ghcr.io/vinayakkulkarni/tileserver-rs:latest
    ports:
      - '8080:8080'
    volumes:
      - ./data:/data:ro
      - ./config.toml:/app/config.toml:ro

  jaeger:
    image: jaegertracing/jaeger:latest
    ports:
      - '4317:4317' # OTLP gRPC
      - '16686:16686' # Jaeger UI
    environment:
      - COLLECTOR_OTLP_ENABLED=true

# config.toml
[telemetry]
enabled = true
endpoint = "http://jaeger:4317"
metrics_enabled = false  # Jaeger is traces-only

Open http://localhost:16686 to view traces.

OpenTelemetry Collector

For maximum flexibility, use the official OTel Collector to fan-out to multiple backends:

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]

# config.toml
[telemetry]
enabled = true
endpoint = "http://otel-collector:4317"

Performance Considerations

  • Traces use batch export — spans are buffered and sent periodically, not per-request
  • Metrics use a PeriodicReader (default: every 60 seconds) — metric data points are aggregated in-memory and pushed at the configured interval
  • Instruments (counters, histograms) are lock-free — recording a metric is an atomic operation with negligible overhead
  • When disabled, all instruments are no-ops. The only per-request cost is a single atomic load from OnceLock

Sampling

For high-traffic deployments, reduce trace sampling to control export volume:

[telemetry]
enabled = true
sample_rate = 0.1  # Only export 10% of traces

Metrics are always exported at full fidelity regardless of sample_rate — sampling only affects traces.
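Head sampling of this kind is typically deterministic per trace ID, so all spans of one trace share the same decision. An illustrative sketch of a ratio-based decision (not the exact OpenTelemetry sampler algorithm):

```rust
// Illustrative ratio-based sampling decision: compare (part of) the
// trace ID against a threshold derived from the sample rate, so the
// same trace always gets the same verdict.
fn sampled(trace_id_bits: u64, sample_rate: f64) -> bool {
    if sample_rate >= 1.0 {
        return true; // rate 1.0 keeps every trace
    }
    (trace_id_bits as f64) < sample_rate * (u64::MAX as f64)
}

fn main() {
    assert!(sampled(12_345, 1.0));        // everything kept at 1.0
    assert!(!sampled(u64::MAX / 2, 0.1)); // high IDs dropped at 10%
}
```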

Troubleshooting

No data appearing in backend

  1. Verify the endpoint is reachable: curl -v http://localhost:4317
  2. Check tileserver logs for OpenTelemetry initialized with metrics=true
  3. Ensure your collector accepts OTLP gRPC (not HTTP) on the configured port
  4. Wait at least metrics_export_interval_secs for the first metrics push

High memory usage

If metrics cardinality is too high (many unique URL paths), consider using a collector with attribute filtering to drop or group url.path values.
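As a sketch, the OTel Collector's attributes processor can delete a high-cardinality attribute before export (the processor key name below is illustrative; remember to add it to your metrics pipeline's processors list):

```yaml
# otel-collector-config.yaml (fragment) — illustrative attribute filtering
processors:
  attributes/drop-url-path:
    actions:
      - key: url.path
        action: delete
```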