Written byRoman Kondakov
August 23, 2021
Query optimizers use knowledge of your data's nature, such as statistics and schema, to find optimal plans. Apache Calcite collectively refers to this information as metadata and provides a convenient API to extract operator's metadata within optimization routines. In this blog post, we will discuss the design of the metadata framework in Apache Calcite.
Recall the query from our previous blog post about join planning:
Cheaper plans tend to generate smaller intermediate relations. To ensure that the optimizer prefers such plans, we may make the `Join` operator cost proportional to the number of produced rows.
But how to estimate the number of rows (cardinality) in the first place? For the `Scan` operator, we may rely on table statistics maintained by the database.
For the `Filter` operator, we may estimate the fraction of rows that satisfy the predicate (selectivity) and multiply it by input's cardinality. For example, the selectivity of the equality condition on a non-unique attribute could be estimated as the number of distinct attribute values divided by the total number of rows. The equality condition on a unique attribute would produce no more than one row.
For the `Join` operator, we may multiply the predicate's selectivity by the cardinalities of both inputs. To make the estimation more accurate, we may want to propagate information about predicates already applied to the given attribute in the child operators.
We already defined quite a few metadata classes which might be required for the join order planning:
We need some powerful infrastructure to propagate all these pieces of information efficiently across operators.
As we understand the problem, let's outline the possible design consideration for the metadata infrastructure.
First, we define the metadata consumers. In cost-based optimizers, metadata is used extensively to estimate the operator's cost. In rule-based optimizers, we may want to access metadata from within the optimization rules. For example, we may use the information about the attribute's uniqueness to eliminate the unnecessary `DISTINCT` clause from queries like `SELECT DISTINCT unique_column FROM t`. Therefore, metadata API should be part of the global context available to different optimizer parts.
Second, in rule-based optimizers, you typically do not have access to the complete operator tree until the end of the optimization process. For example, cost-based optimizers often use the MEMO data structure, where normal operator inputs are replaced with dynamically changing equivalence groups. Therefore, metadata calculation must be performed on the operator level rather than the whole query plan. On the other hand, the derivation of a particular metadata class might depend on other metadata classes. For example, `Filter` cardinality might require `Filter` selectivity and input cardinality. Therefore, the API must allow for recursive access to input metadata.
Third, SQL queries may produce complex plans with tens of initial operators that expand to thousands and even millions of other operators during the planning. The straightforward recursive dives might become too expensive. Caching is essential to mitigate the performance impact.
Finally, if you create a query optimization framework, like Apache Calcite, you may want to decouple metadata from operators. This allows you to provide foundational operators and associated optimization rules from the framework while still allowing users to change their costs.
We defined the requirements of the API. Now let's take a look at how metadata management works in Apache Calcite.
Apache Calcite provides a single entry point to all metadata through the RelMetadataQuery interface. The interface contains a single method for each metadata class that accepts the target operator and optional parameters specific to the concrete metadata class. For example, the cardinality requires only the target operator, while selectivity also requires the predicate that is going to be analyzed:
The `RelMetadataQuery` object is available from the global optimization context called RelOptCluster. `RelOptCluster` is passed as a constructor argument to every operator. Therefore you may access metadata easily from any part of the optimizer's infrastructure, such as the operator's cost function, optimization rule, or even the metadata handler routines that we explain below.
Internally, `RelMetadataQuery` dispatches metadata requests to dedicated handler functions. To install the handlers, we create a class that contains a set of methods with signatures similar to the public API plus the additional `RelMetadataQuery` argument, one method per operator type.
For example, if the public row count API accepts `RelNode` (operator), the handler must accept both operator and `RelMetadataQuery`.
Finally, you assemble all available handler classes into a composite object and install it to the global context, `RelOptCluster`. We omit the details for brevity, but you may take a look at RelMdRowCount, BuiltInMetadata.RowCount, DefaultRelMetadataProvider, and RelOptCluster.setMetadataProvider for more detail.
Once you provided all handler functions, magic happens. Apache Calcite will analyze handler function signatures and various marker interfaces and link them together inside the `RelMetadataQuery` instance. Now, the invocation of`RelMetadataQuery.getRowCount(Filter)` will trigger the relevant handler function.
Handler functions might be overridden if needed. By extending the `RelMetadataQuery` class, you can also add new metadata classes.
Previously, Apache Calcite used Java reflection to dispatch metadata requests, see ReflectiveRelMetadataProvider. However, due to performance concerns, the reflective approach was replaced with code generation using the Janino compiler, see JaninoRelMetadataProvider. Internally, the generated code is basically a large `switch` block that dispatches the metadata request to a proper handler function.
Metadata calculation might be expensive. Intermediate operators, such as `Filter` or `Join`, often rely on children's metadata. This leads to recursive calls, which makes the complexity of metadata calculation proportional to the size of the query plan.
A key observation is that metadata of a given operator remains stable for so long there are no changes to the operator's children. Therefore, we may cache the operator's metadata and invalidate it when a change to a child node is detected. Apache Calcite tracks connections between operators, which allows it to detect such changes and provide metadata caching capabilities out-of-the-box.
In this section, we describe Apache Calcite metadata classes often used in practice.
Metadata is auxiliary information that helps optimizer find better plans. Examples are operator cardinality, predicate selectivity, attribute uniqueness.
Apache Calcite comes with a rich metadata management framework. Users may access metadata through a single gateway, `RelMetadataQuery`, from any part of theoptimizer's code (operators, rules, metadata).
Internally, Apache Calcite works with isolated metadata handler functions, one per metadata class per operator. You may override existing handler functions and provide new ones. Apache Calcite uses code generation to wire independent handler functions into a single facade exposed to the user. Additionally, Apache Calcite uses aggressive caching to minimize the overhead on recursive metadata calls.
In further posts, we will explore in detail how cardinality is derived for different operators. Stay tuned!
We are always ready to help you with your query engine design. Just let us know.