
Trino Buffer Exporter

A Prometheus exporter that collects output buffer metrics from Trino workers, enabling safe Horizontal Pod Autoscaler (HPA) scale-down decisions for Trino clusters running on Kubernetes.

Why This Exists

Trino workers involved in distributed queries hold output buffers containing intermediate data destined for downstream stages. Scaling down a worker while it still has active output buffers fails every query it is participating in. Standard CPU and memory metrics cannot detect this; you need visibility into the query execution layer.

This exporter solves the problem by:

  1. Querying each Trino coordinator for all running queries
  2. Stream-parsing the (often massive) query detail JSON to extract per-worker output buffer state
  3. Mapping worker IPs to Kubernetes pod names via Prometheus
  4. Exposing per-pod buffer metrics that HPA or custom controllers can use to protect active workers from scale-down

Architecture

+----------------------+          +------------------------+
|  Trino Coordinator   |          |  Prometheus Server     |
|  /v1/query (REST)    |<---------+  (IP-to-pod mapping)   |
+----------+-----------+          +-----------+------------+
           |                                  |
           v                                  v
+----------+----------------------------------+-------------+
|                   Trino Buffer Exporter                    |
|                                                            |
|  1. Query Prometheus for worker IP -> pod name mapping     |
|  2. GET /v1/query?state=RUNNING from each coordinator      |
|  3. Stream-parse /v1/query/{id} for outputBuffers stats    |
|  4. Aggregate per-worker and expose as Prometheus gauges   |
+-----------------------------+------------------------------+
                              |
                              v
                    +---------+----------+
                    |  /metrics endpoint |
                    |  (port 8000)       |
                    +---------+----------+
                              |
                              v
                    +---------+----------+
                    |  Prometheus scrape |-----> HPA / Alerts
                    +--------------------+

Exported Metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `trino_worker_output_buffered_bytes` | Gauge | `pod`, `namespace`, `release` | Total bytes in output buffers waiting to be consumed |
| `trino_worker_output_buffered_pages` | Gauge | `pod`, `namespace`, `release` | Total pages in output buffers |
| `trino_worker_active_output_buffers` | Gauge | `pod`, `namespace`, `release` | Count of output buffers NOT in FINISHED state |
| `trino_worker_output_pages_sent` | Gauge | `pod`, `namespace`, `release` | Total pages already sent from output buffers |
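
A scrape of the /metrics endpoint returns standard Prometheus exposition format; the lines below are an illustrative example (pod and release names are placeholders, not output from a real cluster):

```
# HELP trino_worker_output_buffered_bytes Total bytes in output buffers waiting to be consumed
# TYPE trino_worker_output_buffered_bytes gauge
trino_worker_output_buffered_bytes{pod="trino-worker-0",namespace="trino",release="trino-blue"} 5242880.0
# HELP trino_worker_active_output_buffers Count of output buffers NOT in FINISHED state
# TYPE trino_worker_active_output_buffers gauge
trino_worker_active_output_buffers{pod="trino-worker-0",namespace="trino",release="trino-blue"} 3.0
```

A scale-down controller can then treat any pod with a nonzero `trino_worker_active_output_buffers` value as unsafe to terminate.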

Quick Start

Build the Docker Image

docker build -t trino-buffer-exporter:latest .

Deploy with Helm

helm upgrade --install trino-buffer-exporter ./chart \
  -n trino \
  -f chart/values.yaml

For environment-specific overrides, layer an additional values file:

helm upgrade --install trino-buffer-exporter ./chart \
  -n trino \
  -f chart/values.yaml \
  -f examples/values-blue-green.yaml

Configuration Reference

chart/values.yaml

| Parameter | Default | Description |
|-----------|---------|-------------|
| `image.repository` | `trino-buffer-exporter` | Docker image repository |
| `image.tag` | `latest` | Docker image tag |
| `image.pullPolicy` | `IfNotPresent` | Image pull policy |
| `namespace` | `trino` | Kubernetes namespace for deployment |
| `prometheus.url` | `http://prometheus-server.prometheus.svc:80` | Prometheus server URL for IP-to-pod queries |
| `prometheus.auth.type` | `none` | Prometheus auth type: `none`, `basic`, `bearer` |
| `prometheus.auth.existingSecret` | `""` | K8s Secret name for Prometheus credentials |
| `trinoAuth.type` | `header` | Trino auth type: `header`, `basic`, `bearer` |
| `trinoAuth.headerName` | `X-Trino-User` | Header name for header-based auth |
| `trinoAuth.headerValue` | `""` | Header value for header-based auth |
| `trinoAuth.existingSecret` | `""` | K8s Secret name for Trino credentials |
| `coordinators` | `{default: {url: ..., release: ...}}` | Map of Trino coordinator endpoints |
| `workerHttpPort` | `8080` | HTTP port workers listen on (for IP mapping) |
| `ipToPodMetric` | `trino_execution_executor_TaskExecutor_Tasks` | Prometheus metric used for IP-to-pod mapping |
| `pollIntervalSeconds` | `15` | Seconds between collection cycles |
| `requestTimeoutSeconds` | `60` | HTTP request timeout for Trino API calls |
| `metricsPort` | `8000` | Port for the /metrics endpoint |
| `logging.level` | `INFO` | Log level: `DEBUG`, `INFO`, `WARNING`, `ERROR` |
| `serviceMonitor.enabled` | `true` | Create a Prometheus ServiceMonitor resource |
| `serviceMonitor.interval` | `15s` | Scrape interval for the ServiceMonitor |
| `serviceMonitor.namespace` | `""` | Namespace for the ServiceMonitor (defaults to the deployment namespace) |
| `serviceMonitor.additionalLabels` | `{}` | Extra labels on the ServiceMonitor |
| `resources.requests.cpu` | `100m` | CPU request |
| `resources.requests.memory` | `256Mi` | Memory request |
| `resources.limits.cpu` | `500m` | CPU limit |
| `resources.limits.memory` | `1Gi` | Memory limit |
| `nodeAffinity.requiredLabels` | `{}` | Node affinity label requirements (empty = no constraint) |
| `tolerations` | `[]` | Pod tolerations |
| `nodeSelector` | `{}` | Pod node selector |
| `datadog.enabled` | `false` | Enable Datadog log annotations |
| `datadog.source` | `trino-buffer-exporter` | Datadog log source |
| `datadog.service` | `trino-buffer-exporter` | Datadog log service |

Coordinator Configuration

Each coordinator entry requires:

coordinators:
  <name>:
    url: "http://<service>.<namespace>.svc:<port>"
    release: "<helm-release-name>"

The release label is attached to exported metrics so you can distinguish workers from different Trino deployments.
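
For example, a blue/green setup might register both stacks under one exporter; the names, service URLs, and release values below are illustrative:

```yaml
coordinators:
  blue:
    url: "http://trino-blue-coordinator.trino.svc:8080"
    release: "trino-blue"
  green:
    url: "http://trino-green-coordinator.trino.svc:8080"
    release: "trino-green"
```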

Authentication

Trino Authentication

Three auth modes are supported:

Header-based (default): Sends a custom header with each request. This is the standard Trino approach for single-user service accounts.

trinoAuth:
  type: "header"
  headerName: "X-Trino-User"
  headerValue: "monitoring"
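
With this configuration, every coordinator call carries the configured header. A minimal standard-library sketch of such a request (the coordinator URL is a placeholder):

```python
import urllib.request

# Build a request for the running-query list, authenticated with the
# X-Trino-User header (placeholder coordinator URL).
req = urllib.request.Request(
    "http://trino-coordinator.trino.svc:8080/v1/query?state=RUNNING"
)
req.add_header("X-Trino-User", "monitoring")

# The exporter would then send it, e.g. urllib.request.urlopen(req),
# and receive a JSON array of running-query summaries.
```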

Basic auth: Username/password via HTTP Basic Authentication. Store the password in a Kubernetes Secret.

trinoAuth:
  type: "basic"
  existingSecret: "trino-credentials"  # Must have a 'password' key

Bearer token: Token-based authentication (e.g., OAuth2, JWT).

trinoAuth:
  type: "bearer"
  existingSecret: "trino-credentials"  # Must have a 'token' key

Prometheus Authentication

If your Prometheus server requires authentication:

prometheus:
  url: "https://prometheus.example.com"
  auth:
    type: "bearer"
    existingSecret: "prometheus-credentials"  # Must have a 'token' key

Secret Format

For Trino basic auth:

apiVersion: v1
kind: Secret
metadata:
  name: trino-credentials
type: Opaque
stringData:
  password: "your-password"

For bearer token auth:

apiVersion: v1
kind: Secret
metadata:
  name: trino-credentials
type: Opaque
stringData:
  token: "your-token"

Examples

See the examples/ directory for ready-to-use values files.

How It Works

The exporter runs a continuous collection loop:

  1. Build IP-to-pod map: Queries Prometheus for a known Trino worker metric (default: trino_execution_executor_TaskExecutor_Tasks) to map each worker's instance IP to its Kubernetes pod name.

  2. Discover running queries: For each configured coordinator, calls GET /v1/query?state=RUNNING to get the list of active query IDs.

  3. Stream-parse query details: For each running query, calls GET /v1/query/{queryId} with streaming enabled. The response can be tens of megabytes for complex queries. The exporter uses ijson to incrementally parse the JSON without loading it into memory, extracting taskStatus.self (worker address) and outputBuffers stats (buffered bytes, pages, state).

  4. Aggregate and export: Buffer stats are aggregated per worker across all queries. Workers are mapped to pod names, and Prometheus gauges are updated. Metrics are cleared each cycle to avoid stale label sets.

  5. Sleep and repeat: The loop sleeps for the remaining time in the poll interval before starting the next cycle.
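
Steps 3 and 4 amount to a per-worker reduction over the tasks of all running queries. A simplified sketch of that aggregation, with hypothetical task records standing in for the parsed query detail (field shapes are illustrative, not Trino's exact schema):

```python
from collections import defaultdict

# Hypothetical parsed task records: (worker URI from taskStatus.self,
# buffered bytes, buffer state from outputBuffers).
tasks = [
    ("http://10.0.0.5:8080/v1/task/q1.t0", 1_048_576, "OPEN"),
    ("http://10.0.0.5:8080/v1/task/q2.t3", 524_288, "FLUSHING"),
    ("http://10.0.0.6:8080/v1/task/q1.t1", 0, "FINISHED"),
]

# Illustrative IP -> pod mapping, as built from Prometheus in step 1.
ip_to_pod = {"10.0.0.5": "trino-worker-0", "10.0.0.6": "trino-worker-1"}

buffered = defaultdict(int)  # pod -> total buffered bytes
active = defaultdict(int)    # pod -> count of buffers not yet FINISHED

for self_uri, buffered_bytes, state in tasks:
    ip = self_uri.split("//")[1].split(":")[0]  # worker IP from the URI
    pod = ip_to_pod.get(ip, ip)                 # fall back to raw IP if unmapped
    buffered[pod] += buffered_bytes
    if state != "FINISHED":
        active[pod] += 1

# buffered["trino-worker-0"] == 1572864; active["trino-worker-1"] == 0,
# so only trino-worker-1 is safe to scale down this cycle.
```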

Memory Efficiency

The streaming JSON parser is critical for production use. A single Trino query detail response can exceed 100MB when the query has many splits. Loading these into memory would cause OOM kills. The ijson-based parser processes the response incrementally, keeping memory usage constant regardless of response size.

Development

Run Locally

pip install -r requirements.txt
python trino-buffer-exporter.py --config config.yaml --log-level DEBUG

Lint and Validate

# Python syntax check (compiles without executing)
python -m py_compile trino-buffer-exporter.py

# Helm chart lint
helm lint ./chart

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

About Simon

Trino Buffer Exporter is maintained by Simon, the agentic marketing platform that combines customer data with real-world signals to orchestrate personalized, 1:1 campaigns at scale. We built this tool to safely autoscale our Trino query clusters and open-sourced it so others can benefit.

License

Apache License 2.0. See LICENSE for details.
