Abstract

Query optimizers use knowledge of your data's nature, such as statistics and schema, to find optimal plans. Apache Calcite collectively refers to this information as metadata and provides a convenient API to extract operator's metadata within optimization routines. In this blog post, we will discuss the design of the metadata framework in Apache Calcite.

Example

Recall the query from our previous blog post about join planning:


SELECT 
  lineitem.*
FROM 
  customer,
  orders,
  lineitem
WHERE
  c_custkey = ? 
  AND c_custkey = o_custkey
  AND o_orderkey = l_orderkey

Cheaper plans tend to generate smaller intermediate relations. To ensure that the optimizer prefers such plans, we may make the `Join` operator cost proportional to the number of produced rows.

But how to estimate the number of rows (cardinality) in the first place? For the `Scan` operator, we may rely on table statistics maintained by the database.

For the `Filter` operator, we may estimate the fraction of rows that satisfy the predicate (selectivity) and multiply it by input's cardinality. For example, the selectivity of the equality condition on a non-unique attribute could be estimated as the number of distinct attribute values divided by the total number of rows. The equality condition on a unique attribute would produce no more than one row.

For the `Join` operator, we may multiply the predicate's selectivity by the cardinalities of both inputs. To make the estimation more accurate, we may want to propagate information about predicates already applied to the given attribute in the child operators.

We already defined quite a few metadata classes which might be required for the join order planning:

Operator cardinalities that depend on ...
Predicate selectivities that depend on ...
Number of attribute's distinct values (NDV) that might depend on ...
Attribute's uniqueness and applied predicates.

We need some powerful infrastructure to propagate all these pieces of information efficiently across operators.

Design Considerations

As we understand the problem, let's outline the possible design consideration for the metadata infrastructure.

First, we define the metadata consumers. In cost-based optimizers, metadata is used extensively to estimate the operator's cost. In rule-based optimizers, we may want to access metadata from within the optimization rules. For example, we may use the information about the attribute's uniqueness to eliminate the unnecessary `DISTINCT` clause from queries like `SELECT DISTINCT unique_column FROM t`. Therefore, metadata API should be part of the global context available to different optimizer parts.

Second, in rule-based optimizers, you typically do not have access to the complete operator tree until the end of the optimization process. For example, cost-based optimizers often use the MEMO data structure, where normal operator inputs are replaced with dynamically changing equivalence groups. Therefore, metadata calculation must be performed on the operator level rather than the whole query plan. On the other hand, the derivation of a particular metadata class might depend on other metadata classes. For example, `Filter` cardinality might require `Filter` selectivity and input cardinality. Therefore, the API must allow for recursive access to input metadata.

Third, SQL queries may produce complex plans with tens of initial operators that expand to thousands and even millions of other operators during the planning. The straightforward recursive dives might become too expensive. Caching is essential to mitigate the performance impact.

Finally, if you create a query optimization framework, like Apache Calcite, you may want to decouple metadata from operators. This allows you to provide foundational operators and associated optimization rules from the framework while still allowing users to change their costs.

Metadata in Apache Calcite

We defined the requirements of the API. Now let's take a look at how metadata management works in Apache Calcite.

API

Apache Calcite provides a single entry point to all metadata through the RelMetadataQuery interface. The interface contains a single method for each metadata class that accepts the target operator and optional parameters specific to the concrete metadata class. For example, the cardinality requires only the target operator, while selectivity also requires the predicate that is going to be analyzed:


class RelMetadataQuery {
  // Cardinality
  public Double getRowCount(RelNode rel) { ... }
  
  // Selectivity
  public Double getSelectivity(RelNode rel, RexNode predicate) { ... }
}

The `RelMetadataQuery` object is available from the global optimization context called RelOptCluster. `RelOptCluster` is passed as a constructor argument to every operator. Therefore you may access metadata easily from any part of the optimizer's infrastructure, such as the operator's cost function, optimization rule, or even the metadata handler routines that we explain below.

Dispatching

Internally, `RelMetadataQuery` dispatches metadata requests to dedicated handler functions. To install the handlers, we create a class that contains a set of methods with signatures similar to the public API plus the additional `RelMetadataQuery` argument, one method per operator type.

For example, if the public row count API accepts `RelNode` (operator), the handler must accept both operator and `RelMetadataQuery`.


class RelMetadataQuery {
  public Double getRowCount(RelNode rel) { ... }  
}

class RelMdRowCount {
  // Handler for scan.
  Double getRowCount(TableScan scan, RelMetadataQuery mq) { ... }  
  
  // Handler for filter.
  Double getRowCount(Filter filter, RelMetadataQuery mq) { ... }
  
  // Handler for the equivalence set. Required for the cost-based
  // optimization with VolcanoPlanner.
  Double getRowCount(RelSubset rel, RelMetadataQuery mq) { ... }
  
  // Catch-all handler invoked if there is no dedicated handler
  // for the operator class.
  Double getRowCount(RelNode rel, RelMetadataQuery mq) { ... }
}

Finally, you assemble all available handler classes into a composite object and install it to the global context, `RelOptCluster`. We omit the details for brevity, but you may take a look at RelMdRowCount, BuiltInMetadata.RowCount, DefaultRelMetadataProvider, and RelOptCluster.setMetadataProvider for more detail.

Once you provided all handler functions, magic happens. Apache Calcite will analyze handler function signatures and various marker interfaces and link them together inside the `RelMetadataQuery` instance. Now, the invocation of`RelMetadataQuery.getRowCount(Filter)` will trigger the relevant handler function.

Handler functions might be overridden if needed. By extending the `RelMetadataQuery` class, you can also add new metadata classes.

Previously, Apache Calcite used Java reflection to dispatch metadata requests, see ReflectiveRelMetadataProvider. However, due to performance concerns, the reflective approach was replaced with code generation using the Janino compiler, see JaninoRelMetadataProvider. Internally, the generated code is basically a large `switch` block that dispatches the metadata request to a proper handler function.

Caching

Metadata calculation might be expensive. Intermediate operators, such as `Filter` or `Join`, often rely on children's metadata. This leads to recursive calls, which makes the complexity of metadata calculation proportional to the size of the query plan.

A key observation is that metadata of a given operator remains stable for so long there are no changes to the operator's children. Therefore, we may cache the operator's metadata and invalidate it when a change to a child node is detected. Apache Calcite tracks connections between operators, which allows it to detect such changes and provide metadata caching capabilities out-of-the-box.

Useful Metadata Classes

In this section, we describe Apache Calcite metadata classes often used in practice.

Cardinality - estimates the number of rows emitted from the operator. Used by operator cost functions.
A number of distinct values (NDV)- estimates the number of distinct values for the given set of attributes. Facilitates cardinality estimation for operators with predicates and aggregations.
Selectivity - estimates the fraction of rows that pass the predicate. Helps to estimate cardinalities for operators with predicates, such as `Filter` and `Join`.
Attribute uniqueness - provides information about unique attributes. Used by optimization rules to simplify or reduce operators. E.g., to eliminate unnecessary aggregates.
Predicates - deduces the restrictions that hold on rows emitted from an operator. Used by optimization rules for operator simplification, transitive predicate derivation, etc.

Summary

Metadata is auxiliary information that helps optimizer find better plans. Examples are operator cardinality, predicate selectivity, attribute uniqueness.

Apache Calcite comes with a rich metadata management framework. Users may access metadata through a single gateway, `RelMetadataQuery`, from any part of theoptimizer's code (operators, rules, metadata).

Internally, Apache Calcite works with isolated metadata handler functions, one per metadata class per operator. You may override existing handler functions and provide new ones. Apache Calcite uses code generation to wire independent handler functions into a single facade exposed to the user. Additionally, Apache Calcite uses aggressive caching to minimize the overhead on recursive metadata calls.

In further posts, we will explore in detail how cardinality is derived for different operators. Stay tuned!

We are always ready to help you with your query engine design. Just let us know.

Metadata Management in Apache Calcite