Paper - Optimizing Database-Backed Applications with Query Synthesis
Query-by-Synthesis (QBS) [@cheung2013optimizing] is a system developed to optimize database applications via static analysis. It is a very powerful system that, in the end, converts sections of code that would otherwise perform relational-query-like operations on the application server to a semantically equivalent SQL query that can be run on the DBMS.
Summary
The general process they use is:
- Convert the code (specifically Java, although there is nothing really Java specific about this) to a kernel language, which is chunked into optimizable sections.
- Based on the kernel language, create a set of Hoare-logic post conditions and invariants that must hold after each section of code. This is done in terms of a theory they develop, which is very similar to relational algebra, except that it is defined in terms of ordered lists instead of sets.
- Create queries that satisfy the post conditions.
Seems simple enough, but that second step is doing a lot of heavy lifting, both in theoretic complexity and computation time. They find that an analysis like this can dramatically increase throughput and decrease latency, even on code that has already had significant manual optimizations performed on it.
There are some limitations:
- The analysis cannot handle all Java features. For instance, if there is a side effect, that forces a "break" in the analysis. On either side of the side effect, optimizations can be performed, but no code including a side effect can be optimized. (Note: for the purposes of the paper, it seems like they were not including some kinds of state as a side effect)
- The second step involves generation of loop invariants. Since this is an undecidable problem, there are some programs on which this analysis will time out.
- This analysis takes a significant amount of time (they mention up to ten minutes). While they didn't talk about what exactly took so long, my best guess is the fact that there is a constraint solver substep of step 2.
- Even some generally used relational operations (
unique
andappend
) are unsupported in their analysis
Koby's thoughts
While this paper presents a very powerful system, it still suffers from the same issues I mentioned in my post on Ramachandra et. al.'s work: namely that the optimizations done by this analysis do not leave any trace in the code, while also not being universal. This puts a developer at risk of unknowingly causing significant performance regressions. I'm also pretty disappointed in the time it takes to run. Ideally, we would have some kind of a tool that could run at interactive latencies.
One thing that I've found interesting reading this paper (and a couple of others) is that there seems to be a focus on moving certain kinds of computation to the DBMS to take advantage of things like indexing and optimized query plans. While this is a great reason to do so, equally important in my mind is simply reducing the number of round-trips between the application server and database server. QBS does accomplish this, but it seems like it mostly does it incidentally, or at least very little attention is payed to it in the paper.