We create query engines, distributed protocols, and data management systems for large-scale data processing.
Building an SQL query engine is a challenging task, requiring careful optimizer and executor design. We have created multiple query engines for different backends, including distributed and federated systems and custom hardware.
We have extensive experience with Apache Calcite, a framework for building query engines, and regularly contribute to it.
Distributed execution is essential for horizontal scalability and fault tolerance. We build complex distributed protocols for infrastructure products, including replication and transactional layers, and use formal methods to verify their correctness.
Almost every database requires persistent storage for business data. We have solid experience building distributed storage engines, including ARIES-style recovery protocols, write-ahead logging, and B+tree indexes.
Designing a new data management system is a challenging task. We create prototypes and conduct design reviews to ensure that you have considered all trade-offs as early as possible.
Data processing is an active area of research. We bridge academic knowledge and practice to help you make better design decisions.
In-house expertise is essential for long-term product success. We conduct training to help your team accumulate solid knowledge of distributed systems and query processing.
Query optimizers use knowledge of your data's nature, such as statistics and schema, to find optimal plans. Apache Calcite collectively refers to this information as metadata and provides a convenient API to extract an operator's metadata within optimization routines. In this blog post, we discuss the design of the metadata framework in Apache Calcite.
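To give a flavor of the idea, the sketch below models metadata dispatch in Python: each operator type has a handler that derives a metadata value, here an estimated row count, recursively from its inputs. This mirrors the spirit of Calcite's `RelMetadataQuery`, but the class and handler names are illustrative, not Calcite's actual API.

```python
# Sketch of a metadata dispatch pattern: a MetadataQuery object routes
# a row-count request to a per-operator handler, which combines catalog
# statistics and child estimates. Names are illustrative, not Calcite's.

from dataclasses import dataclass

@dataclass
class Scan:
    table: str
    rows: float  # base-table cardinality from catalog statistics

@dataclass
class Filter:
    input: object
    selectivity: float  # fraction of input rows passing the predicate

@dataclass
class Join:
    left: object
    right: object
    selectivity: float  # fraction of the cross product that joins

class MetadataQuery:
    """Dispatches row-count requests to per-operator handlers."""

    def row_count(self, op) -> float:
        handler = getattr(self, f"_rows_{type(op).__name__.lower()}")
        return handler(op)

    def _rows_scan(self, op):
        return op.rows

    def _rows_filter(self, op):
        return op.selectivity * self.row_count(op.input)

    def _rows_join(self, op):
        return op.selectivity * self.row_count(op.left) * self.row_count(op.right)

# Usage: estimate the output of a filter over a join of two tables.
mq = MetadataQuery()
plan = Filter(Join(Scan("orders", 1_000_000), Scan("customers", 10_000), 0.0001), 0.5)
print(mq.row_count(plan))  # 0.5 * 0.0001 * 1e6 * 1e4 = 500000.0
```

In Calcite, the same recursive structure lets any optimization rule ask for metadata of any operator without knowing how the estimate is computed.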
When a user submits a query to a database, the optimizer translates the query string to an intermediate representation (IR) and applies various transformations to find the optimal execution plan. Apache Calcite uses relational operators as the intermediate representation. In this blog post, we discuss the design of the relational operators in Apache Calcite.
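As a small illustration of the intermediate representation, a query such as `SELECT name FROM emp WHERE age > 30` becomes a tree of relational operators that can be rewritten and then interpreted. The operator names below echo Calcite's logical operators; the row-at-a-time evaluation and data are illustrative assumptions, not Calcite code.

```python
# Sketch of a relational-operator IR: a query is a tree of operators
# (Scan -> Filter -> Project) that an executor interprets bottom-up.
# Operator names echo Calcite's logical operators; the rest is illustrative.

from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]

@dataclass
class Scan:
    rows: List[Row]
    def execute(self):
        yield from self.rows

@dataclass
class Filter:
    input: object
    predicate: Callable[[Row], bool]
    def execute(self):
        for row in self.input.execute():
            if self.predicate(row):
                yield row

@dataclass
class Project:
    input: object
    columns: List[str]
    def execute(self):
        for row in self.input.execute():
            yield {c: row[c] for c in self.columns}

# SELECT name FROM emp WHERE age > 30, as an operator tree:
emp = Scan([{"name": "Ann", "age": 35}, {"name": "Bob", "age": 28}])
plan = Project(Filter(emp, lambda r: r["age"] > 30), ["name"])
print(list(plan.execute()))  # [{'name': 'Ann'}]
```

Because the plan is just a tree, optimizer rules can rewrite it, for example pushing the filter below a join, before execution begins.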
A typical database may execute an SQL query in many different ways, depending on the chosen order of operators and their algorithms. One crucial decision is the order in which the optimizer should join relations. In this blog post, we define the join ordering problem and estimate the complexity of join planning.
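To get a feel for the scale of the problem: the number of bushy join trees over n relations, counting operand orders and ignoring cross-product restrictions, is the (n-1)-th Catalan number of tree shapes times n! leaf orderings, which simplifies to (2(n-1))!/(n-1)!. A quick computation (illustrative, not tied to any particular engine):

```python
# Count bushy join trees for n relations: Catalan(n-1) tree shapes
# times n! leaf orderings, which simplifies to (2(n-1))! / (n-1)!.
# The search space grows factorially with the number of relations.

from math import factorial

def join_tree_count(n: int) -> int:
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (2, 3, 5, 10):
    print(n, join_tree_count(n))
# 2 relations -> 2 trees, 3 -> 12, 5 -> 1680, 10 -> 17643225600
```

This factorial blow-up is why practical optimizers rely on dynamic programming, heuristics, or randomized search rather than exhaustive enumeration.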
Vladimir founded Querify Labs with a vision to bridge the gap between cutting-edge research in data management and practical systems. Before that, Vladimir worked at Hazelcast, where he led the development of a distributed SQL engine for in-memory data. Vladimir loves to hack on open-source products and talk about them at international IT conferences. Vladimir is a contributor to the Apache Calcite project.
Alexey brings 10 years of experience building complex distributed systems and data storage engines. Before Querify Labs, Alexey worked at GridGain, where he was responsible for the overall product architecture, playing a pivotal role in developing the persistence, replication, and transaction protocols for the Apache Ignite project. Alexey's areas of interest include query optimizers, concurrent algorithms, and formal methods. Alexey is a committer to the Apache Ignite project.
Roman is responsible for the architecture of data management systems at Querify Labs. Before joining the company, Roman worked at Yandex, the largest internet company in Eastern Europe, where he built the query optimization engine for massively parallel data processing. Roman is passionate about query optimization, looking for practical solutions to NP-hard problems on a daily basis. Roman is a contributor to the Apache Calcite and Apache Ignite projects.