Workflow Examples

This guide lists all of Minitrino’s modules as well as some of the workflows that the tool is best suited for.

Overview

Module Documentation

Each module has a readme associated with it. The list below points to the readme files for each module.

Enterprise vs Open Source Modules

Minitrino supports both Trino (open-source) and Starburst Enterprise modules. The table below shows which modules require a Starburst license.

Note: Enterprise modules require the Starburst distribution, set with IMAGE=starburst.

| Module Type | Module Name | License | Description |
|---|---|---|---|
| Admin | cache-service | ⭐ Enterprise | Materialized view caching and table scan redirection |
| Admin | data-products | ⭐ Enterprise | Data products management and governance |
| Admin | file-group-provider | OSS | File-based group provider for authorization |
| Admin | insights | ⭐ Enterprise | Starburst Insights analytics and monitoring UI |
| Admin | ldap-group-provider | ⭐ Enterprise | LDAP-based group integration |
| Admin | minio | OSS | S3-compatible object storage (MinIO) |
| Admin | mysql-event-listener | OSS | MySQL-based audit logging and event tracking |
| Admin | resource-groups | OSS | Resource management and query queueing |
| Admin | results-cache | ⭐ Enterprise | Query result caching for improved performance |
| Admin | scim | ⭐ Enterprise | SCIM 2.0 user and group synchronization |
| Admin | session-property-manager | OSS | Manage session properties and defaults |
| Admin | spooling-protocol | OSS | Client-side result spooling protocol |
| Admin | starburst-gateway | ⭐ Enterprise | Starburst Gateway for query federation |
| Catalog | clickhouse | OSS | ClickHouse database connector |
| Catalog | db2 | ⭐ Enterprise | IBM Db2 database connector |
| Catalog | delta-lake | OSS | Delta Lake table format support |
| Catalog | elasticsearch | OSS | Elasticsearch search engine connector |
| Catalog | faker | OSS | Faker data generator for testing |
| Catalog | hive | OSS | Hive Metastore with MinIO storage |
| Catalog | iceberg | OSS | Apache Iceberg REST catalog |
| Catalog | iceberg-hms | ⭐ Enterprise | Iceberg with Hive Metastore catalog |
| Catalog | mariadb | OSS | MariaDB database connector |
| Catalog | mysql | OSS | MySQL database connector |
| Catalog | pinot | OSS | Apache Pinot real-time analytics connector |
| Catalog | postgres | OSS | PostgreSQL database connector |
| Catalog | sqlserver | OSS | Microsoft SQL Server connector |
| Catalog | stargate | ⭐ Enterprise | Stargate data federation connector |
| Catalog | stargate-parallel | ⭐ Enterprise | Stargate with parallel query execution |
| Security | biac | ⭐ Enterprise | Built-in access control (BIAC) |
| Security | file-access-control | OSS | File-based authorization rules |
| Security | kerberos | OSS | Kerberos authentication integration |
| Security | ldap | OSS | LDAP authentication |
| Security | oauth2 | OSS | OAuth 2.0 authentication |
| Security | password-file | OSS | Password file-based authentication |
| Security | tls | OSS | TLS/SSL encryption for secure connections |

Using Enterprise Modules:

Enterprise modules require:

  1. Starburst Enterprise distribution: IMAGE=starburst

  2. Valid Starburst license file (see license configuration)

# Example: Provision with enterprise module
minitrino -e IMAGE=starburst -e LIC_PATH=~/starburstdata.license provision -m insights
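A small pre-flight check can catch a missing license or the wrong image before provisioning starts. The sketch below relies only on the two requirements listed above. The function name check_enterprise_env is hypothetical and is not part of Minitrino.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Minitrino): verify the two requirements
# for enterprise modules before calling `minitrino provision`.
check_enterprise_env() {
    local image="${1:-}" lic_path="${2:-}"

    if [ "${image}" != "starburst" ]; then
        echo "IMAGE must be 'starburst' for enterprise modules" >&2
        return 1
    fi
    if [ ! -f "${lic_path}" ]; then
        echo "License file not found: ${lic_path}" >&2
        return 1
    fi
    return 0
}

# Possible use:
#   check_enterprise_env starburst "${HOME}/starburstdata.license" && \
#     minitrino -e IMAGE=starburst -e LIC_PATH="${HOME}/starburstdata.license" provision -m insights
```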

Administration Modules

Catalog Modules

Security Modules

CLI Examples

Choosing a Trino or Starburst Version

Each Minitrino release uses a default Trino version, specified in lib/minitrino.env via the CLUSTER_VER variable. Unless overridden by an environment variable, this is the version that will be used for all provision commands.

To use Starburst instead of Trino, set the IMAGE environment variable to starburst. Starburst Enterprise (SEP) releases are based on Trino releases, so SEP release 400-e maps directly to Trino release 400. A few more examples make the relationship clear:

  • SEP 413-e.9 -> Trino 413

  • SEP 446-e -> Trino 446

  • SEP 453-e.4 -> Trino 453

  • SEP 460-e -> Trino 460

Nearly all Trino plugins are exposed in SEP, so anything documented in the Trino docs should be configurable in the related SEP image.
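The version mapping above can be expressed as a one-line string operation. This is a minimal sketch, assuming SEP versions always take the form of a Trino release number followed by -e and an optional patch suffix, as in the examples listed. The function name sep_to_trino_version is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: derive the Trino release from an SEP version string
# by dropping the "-e" suffix and anything that follows it.
sep_to_trino_version() {
    local sep_version="$1"
    printf '%s\n' "${sep_version%%-e*}"
}

sep_to_trino_version "413-e.9"   # prints 413
sep_to_trino_version "460-e"     # prints 460
```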

Run Commands in Verbose Mode

Running most Minitrino commands in verbose mode is recommended. To do so, add the -v option to any command:

minitrino -v ...

List Modules

To see which modules exist (without needing to reference this wiki), make use of the modules command.

List all modules:

minitrino modules

List all modules of a given type (one of admin, catalog, security):

minitrino modules --type admin
minitrino modules --type catalog
minitrino modules --type security

The --json option dumps all metadata tied to one or more modules, such as each module’s directory, its Docker Compose file location, and a JSON representation of the entire Docker Compose file.

Using jq, the command can be manipulated to show many different groupings of information.

minitrino modules -m ldap --json

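# Return the module directory (--arg keeps hyphenated module names such as password-file valid in jq)
minitrino modules -m ${module} --json | jq --arg m "${module}" '.[$m].module_dir'

# Return the JSON representation of the module's Compose file
minitrino modules -m ${module} --json | jq --arg m "${module}" '.[$m].yaml_dict'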

Provision an Environment

When a given CLUSTER_VER is deployed for the first time, Minitrino must first build the image. This typically takes about five minutes, depending on network speed. Once the image for a given CLUSTER_VER is built, it is reused for all future commands that specify the same version.

Provision a single-node cluster using the default Trino version:

minitrino -v provision

Provision a multi-node cluster with two worker nodes:

minitrino -v provision --workers 2

Provision the postgres catalog module with a specific Trino version:

minitrino -v -e CLUSTER_VER=${VER} provision -m postgres

Provision with Starburst instead of Trino:

minitrino -v -e IMAGE=starburst -e CLUSTER_VER=476-e provision -m postgres

Provision the hive catalog module with two worker nodes:

minitrino -v provision -m hive --workers 2

Add the iceberg module to the running Hive environment and downsize to one worker (the hive module is included automatically):

minitrino -v provision -m iceberg --workers 1

Provision Trino with password authentication, which will include the tls module as a dependency and expose the service on https://localhost:8443:

minitrino -v provision -m password-file

Provision multiple password authenticators in tandem:

minitrino -v provision -m ldap -m password-file

Access the UI

All environments expose the Trino or Starburst service on localhost:8080, so you can visit the web UI directly or connect external clients, such as DBeaver, to the localhost service.

The coordinator container can be directly accessed via:

docker exec -it minitrino-default bash

Note: The default cluster name is default. If you provisioned with a custom cluster name (e.g., --cluster my-cluster), replace minitrino-default with minitrino-my-cluster.

Worker Provisioning Overview

When you provision an environment with one or more workers, the following events take place:

  • The coordinator container is deployed and any relevant bootstrap scripts are executed inside of it.

  • The coordinator is restarted.

  • Once the coordinator is up, the worker containers are deployed, and the coordinator’s /mnt/etc/ directory is compressed, copied, and extracted to all of the worker containers.

  • The workers’ config.properties files are overwritten with basic configurations for connectivity to the coordinator.

This ensures that any distributed files, such as catalog files, are placed on every container in the cluster. It also ensures that coordinator-specific configurations do not remain on the workers.
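The sync steps above can be simulated locally with plain directories standing in for containers. This is only an illustration of the compress, copy, extract, and overwrite sequence and does not reproduce Minitrino's actual implementation. The function name and the two worker properties written to config.properties are assumptions.

```shell
#!/usr/bin/env bash

# Local simulation of the worker sync steps using directories in place of
# containers. Directory names and property values are illustrative only.
sync_coordinator_etc() {
    local coordinator_etc="$1"
    shift
    local archive
    archive="$(mktemp)"

    # Compress the coordinator's configuration directory.
    tar -czf "${archive}" -C "${coordinator_etc}" .

    local worker_etc
    for worker_etc in "$@"; do
        mkdir -p "${worker_etc}"
        # Extract the archive so catalog files land on every worker.
        tar -xzf "${archive}" -C "${worker_etc}"
        # Overwrite config.properties so coordinator-only settings do not
        # remain. These two properties are an assumed minimal worker config.
        printf '%s\n' 'coordinator=false' 'discovery.uri=http://minitrino:8080' \
            > "${worker_etc}/config.properties"
    done
    rm -f "${archive}"
}
```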

Modify Files in a Running Container

You can modify files inside a running container. For example:

# Update coordinator logging settings
docker exec -it minitrino-default bash
echo "io.trino=DEBUG" >> /mnt/etc/log.properties
exit
docker restart minitrino-default

# Update worker logging settings (assuming default cluster name)
docker exec -it minitrino-worker-1-default bash
echo "io.trino=DEBUG" >> /mnt/etc/log.properties
exit
docker restart minitrino-worker-1-default

Restarting the container allows Trino to register the configuration change.

Note: Replace default with your cluster name if using a custom cluster.
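When scripting against several clusters, it can help to build container names from the cluster name rather than hard-coding them. The helpers below are hypothetical and simply follow the naming patterns shown in the commands above (minitrino-<cluster> and minitrino-worker-<n>-<cluster>).

```shell
#!/usr/bin/env bash

# Hypothetical helpers that build container names from the patterns used in
# this guide. The cluster name falls back to "default" when omitted.
coordinator_container() {
    printf 'minitrino-%s\n' "${1:-default}"
}

worker_container() {
    local worker_number="$1" cluster="${2:-default}"
    printf 'minitrino-worker-%s-%s\n' "${worker_number}" "${cluster}"
}

coordinator_container              # prints minitrino-default
worker_container 1 my-cluster      # prints minitrino-worker-1-my-cluster
```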

Access the Trino CLI

docker exec -it minitrino-default bash
trino-cli --debug --user admin --execute "SELECT * FROM tpch.tiny.customer LIMIT 10"

Restart Cluster Containers

Restart all containers in the cluster without reprovisioning:

minitrino restart

This is useful when you’ve made configuration changes and need to reload the cluster without tearing it down completely.

View Cluster Resources

View all resources associated with your cluster:

minitrino resources

Filter by specific resource types:

# View only containers
minitrino resources --container

# View only volumes
minitrino resources --volume

# View only images
minitrino resources --image

# View only networks
minitrino resources --network

Shut Down an Environment

minitrino down

To skip graceful shutdown and stop all containers immediately, run:

minitrino down --sig-kill

The default behavior of the down command removes containers. To stop the containers instead of removing them, run:

minitrino down --keep

Remove Minitrino Resources

Minitrino creates volumes and pulls/builds various images to support each module. All resources are labeled with project-specific metadata, so all remove commands will target Docker resources specifically tied to Minitrino modules.

Remove all Minitrino-labeled images (requires --cluster all or -c all):

minitrino -c all remove --images

Remove images from a specific module:

minitrino remove --images \
  --label org.minitrino.module.${MODULE_TYPE}.${MODULE}=true

Where ${MODULE_TYPE} is one of: admin, catalog, security.

Remove all Minitrino-labeled volumes:

minitrino remove --volumes

Remove volumes from a specific module:

minitrino remove --volumes \
  --label org.minitrino.module.${MODULE_TYPE}.${MODULE}=true
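Because the label embeds both the module type and the module name, a typo in either silently matches nothing. The following sketch builds the label string and rejects unknown module types. The function name module_label is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the module label passed to `minitrino remove`,
# rejecting module types other than admin, catalog, or security.
module_label() {
    local module_type="$1" module="$2"
    case "${module_type}" in
        admin|catalog|security) ;;
        *)
            echo "Unknown module type: ${module_type}" >&2
            return 1
            ;;
    esac
    printf 'org.minitrino.module.%s.%s=true\n' "${module_type}" "${module}"
}

module_label catalog hive   # prints org.minitrino.module.catalog.hive=true

# Possible use:
#   minitrino remove --volumes --label "$(module_label catalog hive)"
```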

Snapshot a Customized Module

Users designing and customizing their own modules can persist them for later usage with the snapshot command. For example, if a user modifies the hive module, they can persist their changes by running:

minitrino snapshot --name ${SNAPSHOT_NAME} -m hive

By default, the snapshot file is placed in ${LIB_PATH}/${SNAPSHOT_NAME}.tar.gz. The snapshot can be saved to a different directory by passing the --directory option to the snapshot command.

Point to a Starburst License File for Enterprise Modules

See the Enterprise vs Open Source Modules table above for a complete list of enterprise and open-source modules.

Pass an environment variable directly to the command:

minitrino -e LIC_PATH=/path/to/starburstdata.license provision -m insights

Export the environment variable to the shell:

export LIC_PATH=/path/to/starburstdata.license
minitrino provision -m insights

Add the variable to the minitrino.cfg file:

minitrino config

# Edit the config file
[config]
LIC_PATH=~/work/license/starburstdata.license

# Enterprise modules now automatically receive the license file
minitrino provision -m insights

More information about environment variables can be found here.

Modify the Trino config.properties and jvm.config Files

Many modules change Trino’s config.properties and jvm.config files. There are three supported ways to modify these files.

Method One: Docker Compose Environment Variables

Minitrino has special support for two Trino-specific environment variables: CONFIG_PROPERTIES and JVM_CONFIG. Below is an example of setting these variables in a Docker Compose file:

minitrino:
  environment:
    CONFIG_PROPERTIES: |-
      insights.jdbc.url=jdbc:postgresql://postgresdb:5432/insights
      insights.jdbc.user=admin
      insights.jdbc.password=password
      insights.persistence-enabled=true
    JVM_CONFIG: |-
      -Xlog:gc:/var/log/sep-gc-%t.log:time:filecount=10

Method Two: Environment Variables

Environment variables can be exported prior to executing a provision command:

export CONFIG_PROPERTIES=$'query.max-stage-count=85\ndiscovery.uri=http://trino:8080'
export JVM_CONFIG=$'-Xmx2G\n-Xms1G'

minitrino -v provision

Note that multiple configs can be separated with a newline, though $'...' syntax must be used, as it allows you to include escape sequences like \n that are interpreted as actual newlines.
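The difference between the two quoting styles is easy to verify in Bash. The sketch below counts the lines in each value. The count_lines function is a hypothetical helper used only for this demonstration.

```shell
#!/usr/bin/env bash

# Count the lines in a value, to show how the two quoting styles differ.
count_lines() {
    printf '%s\n' "$1" | wc -l | tr -d ' '
}

ansi_quoted=$'query.max-stage-count=85\ndiscovery.uri=http://trino:8080'
single_quoted='query.max-stage-count=85\ndiscovery.uri=http://trino:8080'

count_lines "${ansi_quoted}"     # prints 2: \n became a real newline
count_lines "${single_quoted}"   # prints 1: \n stayed a literal backslash and n
```

With plain single quotes, the whole value would reach the container as one malformed property line.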

As an alternative to using export to set environment variables, they can also be passed directly to the provision command:

minitrino -v \
  --env CONFIG_PROPERTIES=$'query.max-stage-count=85\ndiscovery.uri=http://trino:8080' \
  --env JVM_CONFIG=$'-Xmx2G\n-Xms1G' \
  provision

Single configs are simpler and do not require $'...' syntax:

minitrino -v \
  --env CONFIG_PROPERTIES='query.max-stage-count=85' \
  --env JVM_CONFIG='-Xmx2G' \
  provision

Method Three: Bootstrap Scripts

The config.properties and jvm.config files can be modified directly with a module bootstrap script.

Bootstrap Scripts

Bootstrap scripts allow you to customize container initialization with arbitrary shell commands. These scripts execute at specific lifecycle stages and can modify configurations, load data, or initialize external services.

How Bootstrap Scripts Work

  1. Minitrino copies bootstrap scripts from the library to the container

  2. Scripts execute in two phases: before_start and after_start

  3. Container restarts after each bootstrap execution

  4. Idempotency checks: Minitrino checksums scripts and only re-executes if content changes

Important: Bootstrap scripts do NOT replace the container’s entrypoint. They augment the startup process.
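The checksum-based idempotency check in step 4 can be sketched in a few lines of shell. This is an illustration of the general technique, not Minitrino's code: the function names, the use of cksum, and the one-value state file are all assumptions.

```shell
#!/usr/bin/env bash

# Sketch of checksum-based idempotency. The state-file format and function
# names are hypothetical and do not mirror Minitrino's internals.
script_checksum() {
    cksum < "$1" | awk '{print $1}'
}

# Returns 0 when the script has never run or its content has changed.
needs_rerun() {
    local script="$1" state_file="$2"
    if [ ! -f "${state_file}" ]; then
        return 0
    fi
    [ "$(script_checksum "${script}")" != "$(cat "${state_file}")" ]
}

# Record the current checksum after a successful run.
record_run() {
    local script="$1" state_file="$2"
    script_checksum "${script}" > "${state_file}"
}
```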

Execution Context

User and Permissions:

  • Scripts initially run as root

  • After execution, ownership of /etc/${CLUSTER_DIST} changes to the service user (trino or starburst)

  • Scripts can use sudo for privileged operations

  • Files created should be owned by ${SERVICE_USER} for persistence

Execution Phases:

  1. before_start():

    • Runs before Trino/Starburst server starts

    • Use for: config generation, certificate setup, schema initialization

    • Trino/Starburst service is NOT available yet

  2. after_start():

    • Runs after Trino/Starburst is accepting queries

    • Use for: data loading, SQL execution, post-startup validation

    • Trino/Starburst service IS available (can use trino-cli)
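The two-phase contract can be pictured as a runner that calls each phase function by name. The sketch below is a hypothetical runner, not Minitrino's own. It assumes that a phase the script does not define is skipped, which the guide does not state.

```shell
#!/usr/bin/env bash

# Hypothetical runner: call a phase function only if the script defined it.
# This illustrates the two-phase contract, not Minitrino's actual runner.
run_phase() {
    local phase="$1"
    if declare -F "${phase}" > /dev/null; then
        "${phase}"
    else
        echo "No ${phase} function defined, skipping"
    fi
}

# A bootstrap script that defines only before_start.
before_start() {
    echo "before_start: server not running yet"
}

run_phase before_start   # runs the function defined above
run_phase after_start    # prints the skip message
```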

Available Environment Variables

Your bootstrap script has access to these key variables:

$CLUSTER_DIST         # "trino" or "starburst"
$CLUSTER_VER          # Version number (e.g., "476")
$CLUSTER_NAME         # Cluster name (e.g., "default")
$SERVICE_USER         # Service user ("trino" or "starburst")
$HOSTNAME             # Container hostname
$COORDINATOR          # "true" for coordinator, "false" for workers

All Docker Compose environment variables are also available.
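A bootstrap script can use these variables to avoid hard-coding paths or running coordinator-only steps on workers. The helpers below are a minimal sketch. Their names are hypothetical, and they rely only on CLUSTER_DIST and COORDINATOR as described above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: resolve the config directory from CLUSTER_DIST and
# fail with a clear message if the variable is missing.
cluster_etc_dir() {
    if [ -z "${CLUSTER_DIST:-}" ]; then
        echo "CLUSTER_DIST is not set" >&2
        return 1
    fi
    printf '/etc/%s\n' "${CLUSTER_DIST}"
}

# Hypothetical helper: succeeds only when COORDINATOR is "true".
is_coordinator() {
    [ "${COORDINATOR:-false}" = "true" ]
}

# Possible use inside before_start():
#   if is_coordinator; then
#       echo "query.max-stage-count=85" >> "$(cluster_etc_dir)/config.properties"
#   fi
```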

Available Tools and Utilities

Bootstrap scripts have access to a rich toolset:

  • Networking: curl, wget, ping, telnet

  • Data: jq (JSON processor)

  • Trino: trino-cli (for SQL execution in after_start)

  • Security: keytool (Java keystore), openssl, ldap-utils, krb5-user

  • Utilities: wait-for-it (wait for services), vim, tree, less

  • Python: Python 3 with pip (can install additional packages)

Creating a Bootstrap Script

1. Create the script directory:

mkdir -p lib/modules/${type}/${module}/resources/bootstrap/

2. Write the script:

#!/usr/bin/env bash

set -euxo pipefail

before_start() {
    echo "Executing before Trino starts..."

    # Example: Modify config file
    echo "discovery.uri=http://custom-coordinator:8080" >> /etc/${CLUSTER_DIST}/config.properties

    # Example: Wait for external service
    wait-for-it postgres:5432 --strict --timeout=30

    # Example: Generate certificates (as root)
    keytool -genkeypair -alias mykey ...

    # Example: Fix ownership
    chown -R ${SERVICE_USER}:root /etc/${CLUSTER_DIST}/custom/
}

after_start() {
    echo "Executing after Trino started..."

    # Example: Load data via SQL
    trino-cli --user admin --execute "CREATE SCHEMA IF NOT EXISTS hive.default"

    # Example: Validate setup
    if trino-cli --user admin --execute "SELECT 1"; then
        echo "Cluster is ready!"
    else
        echo "Cluster validation failed!"
        exit 1
    fi
}

3. Reference in Docker Compose YAML:

services:
  minitrino:
    environment:
      MINITRINO_BOOTSTRAP: bootstrap.sh

4. Test the module:

minitrino -v provision -m ${module}

Bootstrap Script Best Practices

Idempotency

Always check if operations were already completed:

before_start() {
    # Check if index already exists
    if curl -s http://elasticsearch:9200/_cat/indices | grep -q 'myindex'; then
        echo "Index already exists, skipping creation"
        return 0
    fi

    # Create index
    curl -XPUT http://elasticsearch:9200/myindex ...
}

Error Handling

Use set -e to exit on errors, but handle expected failures:

set -euxo pipefail  # Exit on error, undefined vars, and pipe failures; -x traces each command

before_start() {
    # This will exit script if command fails
    keytool -genkeypair ...

    # Handle expected failures
    if ! some_optional_command; then
        echo "Optional command failed, continuing..."
    fi
}

External Service Dependencies

Wait for services to be ready:

before_start() {
    # Wait for service to be available
    wait-for-it elasticsearch:9200 --strict --timeout=60 -- \
        echo "Elasticsearch is up"

    # Alternative: manual retry logic. Postgres does not speak HTTP, so this
    # probes the TCP port with Bash's /dev/tcp instead of curl.
    retries=30
    until (echo > /dev/tcp/postgres/5432) 2>/dev/null; do
        retries=$((retries - 1))
        if [ "${retries}" -eq 0 ]; then
            echo "Postgres failed to start"
            exit 1
        fi
        sleep 1
    done
}
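The manual retry loop generalizes into a reusable function that accepts any command. The retry helper below is a hypothetical sketch and is not shipped with Minitrino.

```shell
#!/usr/bin/env bash

# Generic retry helper: run a command until it succeeds or attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    until "$@"; do
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            echo "Command failed after ${attempt} attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${delay}"
    done
}

# Possible use inside before_start():
#   retry 30 1 curl -sf http://elasticsearch:9200 > /dev/null
```

Returning a status instead of calling exit leaves the decision to abort with the caller, which keeps optional dependencies from killing the whole bootstrap.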

File Ownership

Ensure files are owned by the service user:

before_start() {
    # Create files as root
    echo "config-value" > /etc/${CLUSTER_DIST}/custom.properties

    # Fix ownership for service user
    chown ${SERVICE_USER}:root /etc/${CLUSTER_DIST}/custom.properties
    chmod 644 /etc/${CLUSTER_DIST}/custom.properties
}

Using Python in Bootstraps

Install packages and run Python scripts:

before_start() {
    # Install Python packages
    sudo pip install requests faker

    # Run inline Python
    python3 << 'EOF'
import requests
response = requests.get('http://service:8080/health')
print(f"Health check: {response.status_code}")
EOF

    # Or execute external script
    python3 /mnt/bootstrap/mymodule/myscript.py
}

Debugging Bootstrap Scripts

View Bootstrap Logs

# Check if bootstraps completed
docker logs minitrino-default 2>&1 | grep BOOTSTRAP

# View bootstrap script execution
docker logs minitrino-default 2>&1 | grep -A 50 "before_start"

Manually Execute Bootstrap

# Enter container
docker exec -it minitrino-default bash

# View bootstrap script
cat /tmp/minitrino/bootstrap/mymodule/bootstrap.sh

# Execute manually
bash -x /tmp/minitrino/bootstrap/mymodule/bootstrap.sh

Force Re-execution

Minitrino tracks bootstrap checksums to avoid re-running unchanged scripts:

# Remove checksum file to force re-execution
docker exec minitrino-default rm /etc/trino/.minitrino/bootstrap_checksums.json

# Restart container (will re-run bootstraps)
docker restart minitrino-default

Check Bootstrap Files

# View bootstrap directory structure
docker exec minitrino-default tree /tmp/minitrino/bootstrap/

# Check what's mounted
docker exec minitrino-default ls -la /mnt/bootstrap/

Real-World Examples

Example 1: TLS Certificate Generation

From security/tls/resources/cluster/bootstrap.sh:

before_start() {
    local ssl_dir=/mnt/etc/tls

    # Check if certificates exist
    if [[ -f "${ssl_dir}/keystore.jks" ]]; then
        echo "TLS artifacts already exist, skipping generation"
        return 0
    fi

    # Generate keystore
    keytool -genkeypair \
        -alias minitrino \
        -keyalg RSA \
        -keystore "${ssl_dir}/keystore.jks" \
        -storepass changeit \
        -validity 3650 \
        -dname "CN=*.minitrino.com"

    # Export certificate
    keytool -export \
        -alias minitrino \
        -keystore "${ssl_dir}/keystore.jks" \
        -file "${ssl_dir}/minitrino_cert.cer" \
        -storepass changeit
}

Example 2: Data Loading

From catalog/elasticsearch/resources/cluster/bootstrap-es.sh:

before_start() {
    # Wait for Elasticsearch
    wait-for-it elasticsearch:9200 --strict --timeout=60

    # Check if index exists
    if curl -s http://elasticsearch:9200/_cat/indices | grep -q 'user'; then
        echo "Index already exists, skipping"
        return 0
    fi

    # Create index
    curl -XPUT http://elasticsearch:9200/user -H 'Content-Type: application/json' -d '{
        "settings": { "number_of_replicas": 0 }
    }'

    # Install Python dependencies
    sudo pip install faker requests

    # Generate and load data
    python3 << 'EOF'
import requests
from faker import Faker
fake = Faker()
for i in range(1, 500):
    user = {"name": fake.name(), "email": fake.email()}
    requests.post("http://elasticsearch:9200/user/_doc/" + str(i), json=user)
EOF
}

Example 3: Configuration File Modification

before_start() {
    # Append to JVM config
    echo "-Djava.security.krb5.conf=/etc/krb5.conf" >> /etc/${CLUSTER_DIST}/jvm.config

    # Modify config.properties
    cat << EOF >> /etc/${CLUSTER_DIST}/config.properties
http-server.authentication.krb5.service-name=trino
http-server.authentication.krb5.keytab=/etc/trino.keytab
EOF

    # Ensure ownership
    chown -R ${SERVICE_USER}:root /etc/${CLUSTER_DIST}/
}
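Appending with >> adds the same lines again if the script is ever re-executed. One way to keep such edits idempotent is to append a line only when it is absent. The append_once helper below is a hypothetical sketch of that approach.

```shell
#!/usr/bin/env bash

# Hypothetical helper: append a line to a file only if that exact line is
# not already present, so repeated bootstrap runs do not duplicate it.
append_once() {
    local line="$1" file="$2"
    touch "${file}"
    # -x matches whole lines, -F treats the line as a fixed string, and
    # "--" protects lines that begin with a dash (such as JVM flags).
    if ! grep -qxF -- "${line}" "${file}"; then
        printf '%s\n' "${line}" >> "${file}"
    fi
}

# Possible use inside before_start():
#   append_once "-Djava.security.krb5.conf=/etc/krb5.conf" "/etc/${CLUSTER_DIST}/jvm.config"
```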

Testing Bootstrap Scripts Locally

Before adding to a module, test bootstrap logic locally:

# 1. Provision cluster without custom bootstrap
minitrino -v provision -m base-module

# 2. Copy test script into container
docker cp my-bootstrap.sh minitrino-default:/tmp/test-bootstrap.sh

# 3. Execute manually
docker exec -it minitrino-default bash -x /tmp/test-bootstrap.sh

# 4. Verify results
docker exec -it minitrino-default cat /etc/trino/config.properties

When Bootstrap Scripts Fail

If provisioning hangs or fails during bootstrap:

  1. Check logs: docker logs minitrino-default 2>&1 | tail -100

  2. Look for errors: Search for “ERROR”, “failed”, or stack traces

  3. Verify external services: Ensure dependencies (DBs, etc.) are running

  4. Test interactively: Enter container and run script manually

  5. Simplify script: Comment out sections to isolate the failure

  6. Check permissions: Verify files/directories are accessible
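Steps 1 and 2 can be combined by piping the container logs through a small filter. The function below is a hypothetical helper that reads log text on stdin. The search pattern covers only the literal strings "ERROR" and "failed" and will miss other failure messages.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read log text on stdin and print lines that mention
# "ERROR" or "failed", prefixed with their line numbers.
scan_bootstrap_log() {
    grep -nE 'ERROR|failed' || echo "No error lines found"
}

# Possible use (assumes the default cluster name):
#   docker logs minitrino-default 2>&1 | scan_bootstrap_log
```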

After fixing issues, destroy and re-provision:

minitrino down
minitrino remove --volumes
minitrino -v provision -m ${module}

More Examples

Additional bootstrap examples can be found in the library:

  • TLS setup: lib/modules/security/tls/resources/cluster/bootstrap.sh

  • LDAP configuration: lib/modules/security/ldap/resources/cluster/bootstrap.sh

  • OAuth2 setup: lib/modules/security/oauth2/resources/cluster/bootstrap.sh

  • BIAC initialization: lib/modules/security/biac/resources/cluster/bootstrap.sh

  • Elasticsearch data loading: lib/modules/catalog/elasticsearch/resources/cluster/bootstrap-es.sh