We create query engines, distributed protocols, and data management systems for large-scale data processing.
Building an SQL query engine is a challenging task, requiring careful optimizer and executor design. We have created multiple query engines for different backends, including distributed and federated systems and custom hardware.
We have extensive experience with Apache Calcite, a framework for building query engines, and regularly contribute to it.
Distributed execution is essential for horizontal scalability and fault tolerance. We build complex distributed protocols for infrastructure products, including replication and transactional layers, and use formal methods to verify their correctness.
Almost every database requires persistent storage for business data. We have extensive experience building distributed storage engines, including ARIES-style recovery protocols, write-ahead logs, and B+Tree indexes.
Designing a new data management system is a challenging task. We create prototypes and conduct design reviews to ensure that you consider all trade-offs as early as possible.
Data processing is an active area of research. We bridge academic knowledge and practice to help you make better design decisions.
In-house expertise is essential for long-term product success. We conduct training to help your team accumulate solid knowledge of distributed systems and query processing.
A typical database may execute an SQL query in multiple ways, depending on the chosen order and algorithms of its operators. One crucial decision is the order in which the optimizer joins relations. In this blog post, we define the join ordering problem and estimate the complexity of join planning.
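To get a feel for why join planning is hard, here is a quick sketch of how fast the search space grows. Assuming we count every bushy join tree over n distinct relations (treating joins as both commutative and associative), the number of alternatives is (2(n-1))! / (n-1)!:

```python
from math import factorial

def bushy_join_orders(n: int) -> int:
    """Number of distinct bushy join trees over n labeled relations:
    (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (2, 3, 5, 10):
    print(n, bushy_join_orders(n))
# 2 relations -> 2 plans, 10 relations -> 17,643,225,600 plans
```

Even a ten-table query already admits billions of join orders, which is why exhaustive enumeration quickly becomes infeasible.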
In this blog post, we will discuss what the cost of a query plan is and how it can drive optimizer decisions.
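The core idea can be sketched in a few lines. The cost formulas and row counts below are deliberately toy numbers, not a real optimizer's model; the point is only that the optimizer estimates a cost for each physical alternative and picks the cheapest:

```python
def nested_loop_cost(outer_rows: int, inner_rows: int) -> int:
    # Toy model: scan the inner relation once per outer row.
    return outer_rows * inner_rows

def hash_join_cost(build_rows: int, probe_rows: int) -> int:
    # Toy model: build a hash table (weighted x2), then probe it once per row.
    return 2 * build_rows + probe_rows

# Two physical implementations of the same logical join.
orders, items = 1_000, 100_000
plans = {
    "nested_loop": nested_loop_cost(orders, items),
    "hash_join": hash_join_cost(orders, items),
}
best = min(plans, key=plans.get)  # the optimizer keeps the cheapest plan
```

With these estimates the hash join wins by orders of magnitude, which is exactly the kind of decision a cost model exists to make.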
Query optimization is an expensive process that needs to explore multiple alternative ways to execute the query. The query optimization problem is NP-hard, with the number of possible plans growing exponentially with the query's complexity. This blog post will discuss memoization, an important technique that allows rule-based optimizers to consider billions of alternative plans in a reasonable time.
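A minimal sketch of the idea, with hypothetical base-table cardinalities and a single assumed join selectivity: the memo table (here just `lru_cache`) ensures each subset of relations is planned once, so its best plan is reused inside every larger plan instead of being re-derived:

```python
from functools import lru_cache
from math import prod

# Hypothetical inputs: base-table row counts and one assumed selectivity.
CARD = {"A": 1_000, "B": 10_000, "C": 100, "D": 5_000}
SEL = 0.01

def rows(rels) -> float:
    # Estimated output size of joining a set of relations.
    return prod(CARD[r] for r in rels) * SEL ** (len(rels) - 1)

@lru_cache(maxsize=None)
def best_cost(rels: frozenset) -> float:
    """Cheapest cost of any bushy join tree over `rels`. The memo
    (lru_cache) plans each of the 2^n subsets only once."""
    if len(rels) == 1:
        return 0.0  # base-table scan cost folded into rows()
    members = sorted(rels)
    best = float("inf")
    # Enumerate each split into two non-empty halves exactly once.
    for mask in range(1, 2 ** len(members) // 2):
        left = frozenset(m for i, m in enumerate(members) if mask >> i & 1)
        right = rels - left
        # Toy join cost: read both inputs on top of their own plan costs.
        cost = best_cost(left) + best_cost(right) + rows(left) + rows(right)
        best = min(best, cost)
    return best
```

Without the memo this recursion would revisit the same subsets exponentially many times; with it, the work is bounded by the number of distinct subsets, which is what makes exhaustive-style planning tractable in practice.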
Vladimir founded Querify Labs with a vision to bridge the gap between cutting-edge research in data management and practical systems. Before that, Vladimir worked at Hazelcast, where he led the development of a distributed SQL engine for in-memory data. Vladimir loves to hack open-source products and talk about them at international IT conferences. Vladimir is a contributor to the Apache Calcite project.
Alexey brings 10 years of experience building complex distributed systems and data storage engines. Before Querify Labs, Alexey worked at GridGain, where he was responsible for the overall product architecture, playing a pivotal role in developing the persistence, replication, and transaction protocols for the Apache Ignite project. Alexey's areas of interest include query optimizers, concurrent algorithms, and formal methods. Alexey is a committer to the Apache Ignite project.
Roman is responsible for the architecture of data management systems at Querify Labs. Before joining the company, Roman worked at Yandex, the largest internet company in Eastern Europe, where he built the query optimization engine for massively parallel data processing. Roman is passionate about query optimization, looking for practical solutions to NP-hard problems on a daily basis. Roman is a contributor to the Apache Calcite and Apache Ignite projects.