Skip to main content
Availability: Pre-aggregates are an Early Access feature available on Enterprise plans only.
Pre-aggregates let you define materialized summaries of your data directly in your dbt YAML. When a user runs a query in Lightdash, the system checks if the query can be answered from a pre-aggregate instead of querying your warehouse. If it matches, the query is served from the pre-computed results, making it significantly faster and reducing warehouse load. This is especially useful for dashboards with high traffic or expensive aggregations that don’t need real-time data.

Getting started

Define pre-aggregates in your dbt project and configure scheduling.

Monitoring and debugging

Track materialization status, debug query matching, and view hit/miss stats.

How it works

Pre-aggregates follow a four-step cycle:
  1. Define — You add a pre_aggregates block to your dbt model YAML, specifying which dimensions and metrics to include.
  2. Materialize — Lightdash runs the aggregation query against your warehouse and stores the results. This happens automatically on compile, on a cron schedule you define, or when you trigger it manually.
  3. Match — When a user runs a query, Lightdash checks if every requested dimension, metric, and filter is covered by a pre-aggregate.
  4. Serve — If a match is found, the query is served from the materialized data instead of hitting your warehouse.

Example

Suppose you have an orders table with thousands of rows, and you define a pre-aggregate with dimensions status and metrics total_amount (sum) and order_count (count), with a day granularity on order_date. Your warehouse data:
order_datestatuscustomeramount
2024-01-15shippedAlice$100
2024-01-15shippedBob$50
2024-01-15pendingCharlie$75
2024-01-16shippedAlice$200
2024-01-16pendingCharlie$30
Lightdash materializes this into a pre-aggregate:
order_date_daystatustotal_amountorder_count
2024-01-15shipped$1502
2024-01-15pending$751
2024-01-16shipped$2001
2024-01-16pending$301
Now when a user queries “total amount by status, grouped by month”, Lightdash re-aggregates from the daily pre-aggregate instead of scanning the full table:
order_date_monthstatustotal_amount
January 2024shipped$350
January 2024pending$105
This works because sum can be re-aggregated — summing daily sums gives the correct monthly sum.

Query matching

When a user runs a query, Lightdash automatically checks if a pre-aggregate can serve the results. A pre-aggregate matches when all of the following are true:
  • Every dimension in the query is included in the pre-aggregate
  • Every metric in the query is included in the pre-aggregate
  • Every dimension used in filters is included in the pre-aggregate
  • All metrics use supported metric types
  • The query does not contain custom dimensions, custom metrics, or table calculations
  • If the query uses a time dimension, the requested granularity is equal to or coarser than the pre-aggregate’s granularity (for example, a day pre-aggregate can serve day, week, month, or year queries, but not hour)
When multiple pre-aggregates match a query, Lightdash picks the smallest one (fewest dimensions, then fewest metrics as tiebreaker).

Dimensions from joined tables

Pre-aggregates support dimensions from joined tables. Reference them by their full name (for example, customers.first_name) in the dimensions list.

Supported metric types

Pre-aggregates support metrics that can be re-aggregated from pre-computed results:
  • sum
  • count
  • min
  • max
  • average

Current limitations

Not all metrics work this way. Consider count_distinct with the same daily pre-aggregate from above. If a daily pre-aggregate stores “2 distinct customers on 2024-01-15” and “1 distinct customer on 2024-01-16”, you can’t sum those to get the monthly distinct count — Alice ordered on both days and would be counted twice:
order_date_daystatusdistinct_customers
2024-01-15shipped2 (Alice, Bob)
2024-01-16shipped1 (Alice)
Re-aggregating: 2 + 1 = 3, but the correct monthly answer is 2 (Alice, Bob). The pre-aggregate lost track of which customers were counted. We’re investigating supporting count_distinct through approximation algorithms. Follow this issue for updates. For similar reasons, the following metric types are also not supported:
  • sum_distinct, average_distinct
  • median, percentile
  • percent_of_total, percent_of_previous
  • running_total
  • Custom SQL metrics — Follow this issue
  • number, string, date, timestamp, boolean
For metrics that can’t be pre-aggregated, consider using caching instead.