Querify Labs

What we do

We create high-performance data processing engines with Apache Arrow and Apache Calcite and advanced analytical solutions with Trino / Presto. Learn more about our team.

Query Engines

Building a query engine is a challenging task that requires careful design of the executor, scheduler, memory manager, and other key components. We have built multiple distributed query engines for both transactional and analytical workloads using modern approaches such as vectorization and compiled execution.

We routinely use Apache Arrow for high-performance columnar processing.

SQL Optimizers

The query optimizer is one of the most important components of a modern data management system and has a critical impact on performance. We create powerful cost-based optimizers for distributed, federated, and analytical engines.

We frequently use Apache Calcite as the backbone of our optimizers.

Analytical Platforms

Data is at the heart of any modern business. The ability to analyze large volumes of data quickly is essential to staying ahead of your competitors.

We build custom analytical solutions using the modern open-source stack, including Apache Spark, Apache Flink, Apache Kafka, Trino, and open data formats such as Parquet, ORC, and Avro, managed by Apache Hive or Apache Iceberg.

Clients

Services

Design

Designing a new data management system is a challenging task. We create prototypes and conduct design reviews to ensure that you consider all trade-offs as early as possible.

Research

Data processing is an active area of research. We bridge academic knowledge and practice to help you make better design decisions.

Training

In-house expertise is essential for long-term product success. We conduct training to help your team accumulate solid knowledge of distributed systems and query processing.

Composable Data Systems: Lessons from Apache Calcite Success

Apr 1, 2024

Apache Calcite has achieved tremendous success, powering query optimization in many popular systems, such as Apache Hive and Apache Flink. But even though such a great library has existed for more than ten years, query optimizer development is still remarkably complicated and hardly "commoditized." Why is that? We will discuss which technical decisions contributed to Apache Calcite's success, what role the community plays in such projects, why it is still so difficult to integrate "composable" libraries into real products, and why I personally do not believe that the composable data systems trend will fundamentally change the competitive dynamics in the market.

Distinct aggregation optimization in Apache Calcite and Trino

Feb 22, 2023

Aggregation is one of the most frequently encountered operations in analytics. In SQL, aggregations are performed using aggregate functions (e.g., `SUM`, `COUNT`) with an optional `GROUP BY` clause. An aggregate function may contain the `DISTINCT` keyword, which can be non-trivial to implement in a query engine. This blog post explains how the Apache Calcite and Trino optimizers rewrite distinct aggregates so that the underlying query engine can process them.
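To give a flavor of what such a rewrite can look like, here is a minimal sketch of one common textbook approach: a single `COUNT(DISTINCT b)` is replaced by a two-level aggregation that first deduplicates on the grouping key plus the distinct column and then counts. The table `t`, its columns, and the sample rows are assumptions made up for illustration, SQLite stands in for a real engine, and this is not necessarily the exact plan that Calcite or Trino produces.

```python
import sqlite3

# Hypothetical sample data: table "t" with a grouping column "a" and a value column "b".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("x", 1), ("x", 1), ("x", 2), ("y", 3), ("y", None)],
)

# Original form: the engine has to support DISTINCT inside the aggregate.
distinct_sql = "SELECT a, COUNT(DISTINCT b) FROM t GROUP BY a ORDER BY a"

# Rewritten form: the inner GROUP BY removes duplicate (a, b) pairs,
# so the outer aggregate is a plain COUNT without DISTINCT.
rewritten_sql = """
SELECT a, COUNT(b)
FROM (SELECT a, b FROM t GROUP BY a, b) AS deduplicated
GROUP BY a
ORDER BY a
"""

original = conn.execute(distinct_sql).fetchall()
rewritten = conn.execute(rewritten_sql).fetchall()
print(original, rewritten)
```

Both queries return the same rows on this data. The sketch covers only the single-distinct case; queries that mix several distinct aggregates, or distinct and non-distinct aggregates, need more elaborate rewrites.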

Recent blog posts

Composable Data Systems: Lessons from Apache Calcite Success

Dynamic Filtering: a Critical Performance Optimization in Analytical Engines

Distinct aggregation optimization in Apache Calcite and Trino
