Avoiding unnecessary computations is essential for high-performance query engines. This blog post discusses search arguments, or SARGs, a technique that derives data restrictions from query predicates to enable index selection, data pruning, and query plan simplification.
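To make the idea of deriving a data restriction from predicates concrete, here is a minimal sketch. It assumes a conjunction of simple comparisons on a single numeric column; the names `Range`, `tighten`, and `derive_range` are hypothetical and do not correspond to any particular engine's API.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Range:
    """Hypothetical value range for one column; unbounded by default."""
    lo: float = -math.inf
    hi: float = math.inf
    lo_inclusive: bool = False
    hi_inclusive: bool = False

    def is_empty(self) -> bool:
        # A single-point range is non-empty only if both ends are inclusive.
        if self.lo == self.hi:
            return not (self.lo_inclusive and self.hi_inclusive)
        return self.lo > self.hi


def tighten(r: Range, op: str, value: float) -> Range:
    """Intersect the range with one comparison `column <op> value`."""
    if op == "=":
        return tighten(tighten(r, ">=", value), "<=", value)
    if op in (">", ">="):
        if value > r.lo:
            return replace(r, lo=value, lo_inclusive=(op == ">="))
        if value == r.lo and op == ">":
            return replace(r, lo_inclusive=False)
        return r
    if op in ("<", "<="):
        if value < r.hi:
            return replace(r, hi=value, hi_inclusive=(op == "<="))
        if value == r.hi and op == "<":
            return replace(r, hi_inclusive=False)
        return r
    raise ValueError(f"unsupported operator: {op}")


def derive_range(predicates) -> Range:
    """Fold a conjunction of (op, value) comparisons into a single range."""
    r = Range()
    for op, value in predicates:
        r = tighten(r, op, value)
    return r


# WHERE a > 5 AND a <= 10 AND a >= 7 restricts a to [7, 10].
restriction = derive_range([(">", 5), ("<=", 10), (">=", 7)])
# WHERE a > 5 AND a < 3 can never be true, so the scan could be skipped.
contradiction = derive_range([(">", 5), ("<", 3)])
```

An engine could use a non-empty range to choose an index scan or skip data blocks, and an empty range to replace the whole scan with an empty result.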
Distributed SQL engines process queries on several nodes. Nodes may need to exchange tuples during query execution to ensure correctness and maintain a high degree of parallelism. This blog post discusses the concept of data shuffling in distributed query engines.
In this blog post, we discuss cross-product suppression, an important heuristic that powers join order planning in modern query optimizers.
Query optimizers use knowledge of your data's nature, such as statistics and schema, to find optimal plans. Apache Calcite collectively refers to this information as metadata and provides a convenient API to extract an operator's metadata within optimization routines. In this blog post, we will discuss the design of the metadata framework in Apache Calcite.
When a user submits a query to a database, the optimizer translates the query string to an intermediate representation (IR) and applies various transformations to find the optimal execution plan. Apache Calcite uses relational operators as the intermediate representation. In this blog post, we discuss the design of the relational operators in Apache Calcite.
A typical database may execute an SQL query in multiple ways, depending on the selected operators' order and algorithms. One crucial decision is the order in which the optimizer should join relations. In this blog post, we define the join ordering problem and estimate the complexity of join planning.
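As a rough illustration of how quickly the join search space grows, the sketch below counts join orders under two simplifying assumptions: cross products are allowed, so every order is valid, and `A JOIN B` is counted separately from `B JOIN A`. The function names are hypothetical.

```python
from math import factorial


def left_deep_join_orders(n: int) -> int:
    """Left-deep trees over n relations: one per permutation, so n!."""
    if n < 1:
        raise ValueError("need at least one relation")
    return factorial(n)


def bushy_join_orders(n: int) -> int:
    """All binary join trees over n relations.

    Catalan(n - 1) tree shapes times n! leaf assignments,
    which simplifies to (2n - 2)! / (n - 1)!.
    """
    if n < 1:
        raise ValueError("need at least one relation")
    return factorial(2 * n - 2) // factorial(n - 1)


for n in (3, 5, 10):
    print(n, left_deep_join_orders(n), bushy_join_orders(n))
```

Under these assumptions, ten relations already yield 3,628,800 left-deep orders and 17,643,225,600 bushy trees, which is why exhaustive enumeration is impractical for larger queries.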
In this blog post, we will discuss what the cost of a query plan is and how it can drive optimizer decisions.
Query optimization is an expensive process that needs to explore multiple alternative ways to execute the query. The query optimization problem is NP-hard, with the number of possible plans growing exponentially with the query's complexity. This blog post will discuss memoization, an important technique that allows rule-based optimizers to consider billions of alternative plans in a reasonable time.
In this blog post, we discuss rule-based optimization, a common pattern that modern optimizers use to explore equivalent plans. Then we analyze rule-based optimization in Apache Calcite, Presto, and CockroachDB.
Presto is an open-source distributed SQL query engine for big data. In this blog post series, we explore the internals of the Presto query optimizer. In the first part, we discuss the relational tree organization, the optimizer interface, and the design of the rule-based planner.