We create query engines, distributed protocols, and data management systems for large-scale data processing.
Building an SQL query engine is a challenging task, requiring careful optimizer and executor design. We have built multiple query engines for different backends, including distributed and federated systems and custom hardware.
We have extensive experience with Apache Calcite, a framework for building query engines, and regularly contribute to it.
Distributed execution is essential for horizontal scalability and fault tolerance. We build complex distributed protocols for infrastructure products, including replication and transactional layers, and use formal methods to verify their correctness.
Almost every database requires persistent storage for business data. We have deep experience building distributed storage engines, including ARIES-style recovery protocols, write-ahead logs, and B+Tree indexes.
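To illustrate the core idea behind write-ahead logging, here is a minimal in-memory sketch (class and method names are our own, not from any particular engine): every change is appended to the log before it touches the data store, so the store can always be rebuilt by replaying the log. A real engine, as in ARIES, additionally tracks LSNs, flushes the log to durable media, and undoes uncommitted transactions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal in-memory sketch of write-ahead logging (illustrative names only).
public class WalSketch {
    record LogRecord(String key, String value) {}

    private final List<LogRecord> log = new ArrayList<>();        // the write-ahead log
    private final Map<String, String> pageStore = new HashMap<>(); // the "data pages"

    public void put(String key, String value) {
        log.add(new LogRecord(key, value)); // 1. append to the log first...
        pageStore.put(key, value);          // 2. ...only then apply the change
    }

    // Recovery: rebuild the store by replaying the log from the beginning.
    public static WalSketch recover(List<LogRecord> savedLog) {
        WalSketch db = new WalSketch();
        for (LogRecord r : savedLog) {
            db.pageStore.put(r.key(), r.value());
        }
        db.log.addAll(savedLog);
        return db;
    }

    public String get(String key) { return pageStore.get(key); }
    public List<LogRecord> log() { return log; }
}
```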
Designing a new data management system is a challenging task. We create prototypes and do design reviews to ensure that you considered all trade-offs as early as possible.
Data processing is an active area of research. We bridge academic knowledge and practice to help you make better design decisions.
In-house expertise is essential for long-term product success. We conduct training to help your team accumulate solid knowledge of distributed systems and query processing.
Distributed SQL engines process queries on several nodes. Nodes may need to exchange tuples during query execution to ensure correctness and maintain a high degree of parallelism. This blog post discusses the concept of data shuffling in distributed query engines.
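As a taste of the topic, here is a minimal sketch of hash-based shuffling (names are illustrative, not from any particular engine): each node routes a tuple to a target partition by hashing its distribution key, so tuples with equal keys land on the same node and a distributed join or aggregation can then proceed locally.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of hash-based data shuffling (illustrative names only).
public class HashShuffle {
    // Route a tuple to one of `nodeCount` nodes by hashing its distribution key.
    public static int targetNode(Object key, int nodeCount) {
        // Mask out the sign bit to avoid negative partition numbers.
        return (key.hashCode() & Integer.MAX_VALUE) % nodeCount;
    }

    // Split a batch of rows into per-node output buffers before the exchange.
    public static List<List<Object[]>> partition(List<Object[]> rows, int keyColumn, int nodeCount) {
        List<List<Object[]>> buffers = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            buffers.add(new ArrayList<>());
        }
        for (Object[] row : rows) {
            buffers.get(targetNode(row[keyColumn], nodeCount)).add(row);
        }
        return buffers;
    }
}
```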
In this blog post, we discuss cross-product suppression, an important heuristic that powers join order planning in modern query optimizers.
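The essence of the heuristic can be sketched in a few lines (illustrative names, not taken from any particular optimizer): during join enumeration, two sub-plans are combined only if some join predicate connects them; otherwise the resulting cross product is pruned from the search space.

```java
import java.util.Set;

// Illustrative sketch of cross-product suppression during join enumeration.
public class CrossProductSuppression {
    // edges[i][j] == true when a join predicate links relation i and relation j.
    public static boolean connected(Set<Integer> left, Set<Integer> right, boolean[][] edges) {
        for (int i : left) {
            for (int j : right) {
                if (edges[i][j]) {
                    return true; // at least one predicate links the two sides
                }
            }
        }
        return false; // no predicate: combining the sides is a cross product
    }
}
```

A planner would call `connected` before emitting a join of two sub-plans and skip the pair when it returns false, shrinking the search space dramatically for star and chain queries.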
Query optimizers use knowledge of your data's nature, such as statistics and schema, to find optimal plans. Apache Calcite collectively refers to this information as metadata and provides a convenient API to extract an operator's metadata within optimization routines. In this blog post, we will discuss the design of the metadata framework in Apache Calcite.
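The gist of the design can be sketched with a toy row-count metadata provider (all classes below are hypothetical simplifications, not the real Calcite API): a metadata query dispatches to a handler registered for the operator's class, and handlers for composite operators recurse into their inputs, much like Calcite's `RelMetadataQuery.getRowCount` does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified sketch of a Calcite-style metadata framework (hypothetical classes).
public class MetadataSketch {
    interface Operator {}
    record Scan(double tableRows) implements Operator {}
    record Filter(Operator input, double selectivity) implements Operator {}

    // One handler per operator class, analogous to per-operator metadata handlers.
    static final Map<Class<?>, Function<Operator, Double>> ROW_COUNT_HANDLERS = new HashMap<>();
    static {
        ROW_COUNT_HANDLERS.put(Scan.class, op -> ((Scan) op).tableRows());
        ROW_COUNT_HANDLERS.put(Filter.class, op -> {
            Filter f = (Filter) op;
            // Recurse into the input, then scale by the filter's selectivity.
            return rowCount(f.input()) * f.selectivity();
        });
    }

    // Entry point: dispatch on the operator's class, like a metadata query.
    static double rowCount(Operator op) {
        return ROW_COUNT_HANDLERS.get(op.getClass()).apply(op);
    }
}
```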
Vladimir founded Querify Labs with a vision to bridge the gap between cutting-edge research in data management and practical systems. Before that, Vladimir worked at Hazelcast, where he led the development of a distributed SQL engine for in-memory data. Vladimir loves to hack on open-source products and talk about them at international IT conferences. Vladimir is a contributor to the Apache Calcite project.
Alexey brings 10 years of experience building complex distributed systems and data storage systems. Before Querify Labs, Alexey worked at GridGain, where he was responsible for the overall product architecture, playing a pivotal role in developing persistence, replication, and transaction protocols for the Apache Ignite project. Alexey's areas of interest include query optimizers, concurrent algorithms, and formal methods. Alexey is a committer to the Apache Ignite project.
Roman is responsible for the architecture of data management systems at Querify Labs. Before joining the company, Roman worked at Yandex, the largest internet company in Eastern Europe, where he built the query optimization engine for massively parallel data processing. Roman is passionate about query optimization, looking for practical solutions to NP-hard problems on a daily basis. Roman is a contributor to the Apache Calcite and Apache Ignite projects.