tileserver-rs exports observability data via two independent pipelines:
- OTLP push — traces + metrics pushed to an OpenTelemetry collector over gRPC (Grafana Alloy, OTel Collector, Jaeger, …).
- Prometheus pull — a dedicated `/metrics` HTTP listener exposing the same metrics in Prometheus text format for direct scraping.
Both pipelines feed off the same instrument handles, so they can be enabled in any combination without adding hot-path cost; when neither pipeline is active, recording a metric is a single atomic load.
Quick Start
Prometheus scraping (recommended for production)
[telemetry]
prometheus_bind = "127.0.0.1:9100"
metrics_label_cardinality = "strict"
Then point Prometheus at http://your-host:9100/metrics. No OTLP collector required.
OTLP push (traces + metrics)
[telemetry]
enabled = true
endpoint = "http://localhost:4317"
This enables both traces and metrics, exported to an OTLP-compatible collector on port 4317 (gRPC).
Both at once
[telemetry]
enabled = true # OTLP push
endpoint = "http://localhost:4317"
prometheus_bind = "127.0.0.1:9100" # Prometheus pull
metrics_label_cardinality = "strict"
What's Exported
Traces
Every HTTP request creates a span with method, path, status, and duration. Traces use the standard tracing crate integration — all structured log events in the codebase become span events. Traces are OTLP-only (Prometheus does not have a trace data model).
Metrics
The same 10 metrics are emitted to both pipelines:
| Metric | Type | Unit | Labels | Description |
|---|---|---|---|---|
| `http_requests_total` | counter | — | route, status_class | HTTP requests by matched route + status |
| `http_request_duration_seconds` | histogram | s | route, status_class | Per-request latency |
| `http_requests_in_flight` | up-down | — | — | In-flight HTTP requests |
| `tile_requests_total` | counter | — | source, format, z_bucket, outcome | Tile lookups (hit/miss/not_found/error) |
| `tile_request_duration_seconds` | histogram | s | source, format, z_bucket, outcome | End-to-end tile latency |
| `tile_request_bytes` | histogram | By | source, format | Tile response payload size |
| `tile_cache_hits_total` | counter | — | source | In-process tile cache hits |
| `tile_cache_misses_total` | counter | — | source | In-process tile cache misses |
| `render_duration_seconds` | histogram | s | style, format | Native MapLibre raster render duration |
| `render_errors_total` | counter | — | style, reason | Native render failures bucketed by reason |
The `route` label uses the matched Axum route template (`/data/{source}/{z}/{x}/{y_fmt}`) — never the raw URL — so paths stay bounded. The `status_class` label collapses HTTP status codes into `1xx`/`2xx`/`3xx`/`4xx`/`5xx`.
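To make the table concrete, here are a few illustrative PromQL queries over these metrics. They are sketches rather than shipped dashboards: they assume the histograms appear with the conventional `_bucket` series and that your scrape config attaches the usual `job` label.

```promql
# p99 tile latency per source over the last 5 minutes
histogram_quantile(0.99, sum by (source, le) (rate(tile_request_duration_seconds_bucket[5m])))

# In-process tile cache hit ratio per source
sum by (source) (rate(tile_cache_hits_total[5m]))
  / (sum by (source) (rate(tile_cache_hits_total[5m])) + sum by (source) (rate(tile_cache_misses_total[5m])))

# Share of HTTP requests ending in a 5xx
sum(rate(http_requests_total{status_class="5xx"}[5m])) / sum(rate(http_requests_total[5m]))
```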
Configuration Reference
[telemetry]
# OTLP push pipeline
enabled = true # Enable OTLP traces + metrics push (default: false)
endpoint = "http://localhost:4317" # OTLP gRPC endpoint
service_name = "tileserver-rs" # Service name in traces/metrics
sample_rate = 1.0 # Trace sampling (0.0–1.0; metrics ignore this)
metrics_enabled = true # Enable OTLP metrics push (default: true)
metrics_export_interval_secs = 60 # OTLP metrics push interval (default: 60)
# Prometheus pull endpoint (independent of OTLP push)
prometheus_bind = "127.0.0.1:9100" # Bind address (unset = disabled, default)
prometheus_path = "/metrics" # HTTP path (default: "/metrics")
metrics_label_cardinality = "strict" # "strict" (default) | "verbose"
When neither enabled = true nor prometheus_bind is set (the default), all instruments are no-ops — recording a metric compiles to a single atomic load.
Prometheus Scrape Endpoint
prometheus_bind opts into a separate HTTP listener for Prometheus to scrape directly. Following the official Axum example, the listener runs on its own port so you can:
- Bind it to a private interface (`127.0.0.1` for sidecar scraping, or a VPC-internal address) while keeping `/data`, `/styles`, and `/__admin` behind their own ACLs.
- Skip authentication — the metrics endpoint never sees your tile traffic.
- Restart/reload the metrics listener independently of the main server.
Scrape config
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: tileserver-rs
    scrape_interval: 15s
    static_configs:
      - targets: ['tileserver-rs.internal:9100']
    # For Kubernetes service discovery:
    # kubernetes_sd_configs:
    #   - role: pod
    # relabel_configs:
    #   - source_labels: [__meta_kubernetes_pod_label_app]
    #     regex: tileserver-rs
    #     action: keep
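If you also run Alertmanager, the exported metrics translate directly into alerting rules. The following is a minimal sketch; the file name, thresholds, and severities are assumptions to adapt.

```yaml
# tileserver-alerts.yml (load via rule_files: in prometheus.yml)
groups:
  - name: tileserver-rs
    rules:
      - alert: TileserverHighErrorRate
        expr: |
          sum(rate(http_requests_total{status_class="5xx"}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of tileserver-rs requests are returning 5xx"
      - alert: TileserverRenderErrors
        expr: sum by (style, reason) (rate(render_errors_total[5m])) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Native MapLibre renders are failing ({{ $labels.style }}: {{ $labels.reason }})"
```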
Health check
The metrics listener also exposes GET /metrics/health returning 200 OK for liveness/readiness probes.
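If you deploy on Kubernetes, this route is a convenient probe target. The fragment below is a minimal sketch: the container name, image, ports, and timings are placeholders, and `prometheus_bind` must listen on a pod-reachable address (e.g. `0.0.0.0:9100`) rather than `127.0.0.1` for the kubelet to reach it.

```yaml
# Pod spec fragment: probe the metrics listener, not the tile-serving port
containers:
  - name: tileserver
    image: ghcr.io/vinayakkulkarni/tileserver-rs:latest
    ports:
      - containerPort: 8080 # tile traffic
      - containerPort: 9100 # prometheus_bind listener
    livenessProbe:
      httpGet:
        path: /metrics/health
        port: 9100
      initialDelaySeconds: 5
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /metrics/health
        port: 9100
      periodSeconds: 5
```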
Label Cardinality
Unbounded labels are the #1 cause of Prometheus memory blowups. The metrics_label_cardinality setting controls the cardinality budget:
strict (default — production-safe)
- Zoom is collapsed into `z_bucket=low` (z 0–6), `mid` (z 7–12), or `high` (z 13+).
- Tile `x`/`y` coordinates are dropped entirely.
- HTTP `route` uses Axum's matched-path template, never the raw URL.
- HTTP `status_class` collapses status codes into `1xx`/`2xx`/`3xx`/`4xx`/`5xx`.
Worst-case cardinality for a typical config (5 sources × 3 formats × 3 z_buckets × 4 outcomes = 180 series per metric) is comfortably within the <10,000 total series rule of thumb.
verbose (debug-only)
- Raw zoom levels 0..=22 are passed through.
- All other labels behave like `strict`.
A globe-scale source served at z=22 has trillions of unique tile coordinates. Even with coordinates dropped, multiplying 5 sources × 3 formats × 23 zoom levels × 4 outcomes = 1,380 series per metric — manageable, but storage grows fast under sustained traffic. Do not leave verbose enabled in production. Use it for short-window investigations and switch back to strict afterwards.
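If you do enable `verbose`, watch the real series count while it is on. A rough check, assuming the `job_name: tileserver-rs` scrape config shown earlier:

```promql
# Total active series contributed by the tileserver job
count({job="tileserver-rs"})

# The same count broken down per metric name
count by (__name__) ({job="tileserver-rs"})
```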
Backend Setup Examples
Grafana Alloy + Tempo + Prometheus
This is the recommended stack for production. Grafana Alloy (formerly Grafana Agent) receives OTLP and forwards traces to Tempo and metrics to Prometheus.
# compose.yml
services:
  tileserver:
    image: ghcr.io/vinayakkulkarni/tileserver-rs:latest
    ports:
      - '8080:8080'
    volumes:
      - ./data:/data:ro
      - ./config.toml:/app/config.toml:ro

  alloy:
    image: grafana/alloy:latest
    ports:
      - '4317:4317' # OTLP gRPC
      - '12345:12345' # Alloy UI
    volumes:
      - ./alloy-config.alloy:/etc/alloy/config.alloy
    command: ['run', '/etc/alloy/config.alloy']

  tempo:
    image: grafana/tempo:latest
    ports:
      - '3200:3200'
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
    command: ['-config.file=/etc/tempo.yaml']

  prometheus:
    image: prom/prometheus:latest
    ports:
      - '9090:9090'
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana:latest
    ports:
      - '3000:3000'
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
# config.toml
[telemetry]
enabled = true
endpoint = "http://alloy:4317"
Jaeger (Development)
For quick local development, Jaeger's all-in-one image accepts OTLP directly:
# compose.yml
services:
  tileserver:
    image: ghcr.io/vinayakkulkarni/tileserver-rs:latest
    ports:
      - '8080:8080'
    volumes:
      - ./data:/data:ro
      - ./config.toml:/app/config.toml:ro

  jaeger:
    image: jaegertracing/jaeger:latest
    ports:
      - '4317:4317' # OTLP gRPC
      - '16686:16686' # Jaeger UI
    environment:
      - COLLECTOR_OTLP_ENABLED=true
# config.toml
[telemetry]
enabled = true
endpoint = "http://jaeger:4317"
metrics_enabled = false # Jaeger is traces-only
Open http://localhost:16686 to view traces.
OpenTelemetry Collector
For maximum flexibility, use the official OTel Collector to fan out to multiple backends:
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
# config.toml
[telemetry]
enabled = true
endpoint = "http://otel-collector:4317"
Performance Considerations
- Traces use batch export — spans are buffered and sent periodically, not per-request
- Metrics use a `PeriodicReader` (default: every 60 seconds) — metric data points are aggregated in-memory and pushed at the configured interval
- Instruments (counters, histograms) are lock-free — recording a metric is an atomic operation with negligible overhead
- When disabled, all instruments are no-ops. The only per-request cost is a single atomic load from `OnceLock`
Sampling
For high-traffic deployments, reduce trace sampling to control export volume:
[telemetry]
enabled = true
sample_rate = 0.1 # Only export 10% of traces
Metrics are always exported at full fidelity regardless of sample_rate — sampling only affects traces.
Troubleshooting
No data appearing in backend
- Verify the endpoint is reachable: `curl -v http://localhost:4317`
- Check tileserver logs for `OpenTelemetry initialized` with `metrics=true`
- Ensure your collector accepts OTLP gRPC (not HTTP) on the configured port
- Wait at least `metrics_export_interval_secs` for the first metrics push
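If you still see nothing, a temporary debug exporter on the collector side shows everything it receives in its own logs, which separates "the server is not exporting" from "the backend is not ingesting". This is a sketch against the OTel Collector config shown earlier; recent collector releases ship the `debug` exporter, while older ones call it `logging`.

```yaml
# otel-collector-config.yaml: temporarily mirror received data to the collector's stdout
exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo, debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite, debug]
```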
High memory usage
If metric cardinality is too high (for example, many unique URL paths, or `verbose` zoom labels left enabled), consider switching `metrics_label_cardinality` back to `strict`, or use a collector with attribute filtering to drop or group `url.path` values, as sketched below.
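The following is a sketch of what that filtering could look like with the OTel Collector's `attributes` processor; the pipeline and exporter names match the collector example above, while the processor name and attribute key are illustrative.

```yaml
# otel-collector-config.yaml: drop a high-cardinality attribute before export
processors:
  batch:
    timeout: 10s
  attributes/drop-url-path:
    actions:
      - key: url.path
        action: delete

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [attributes/drop-url-path, batch]
      exporters: [prometheusremotewrite]
```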