By Nandakumar Edamana
I’ve written a lot of code that involves the tree data structure, most of it in the context of language processors like transcompilers. After parsing, a compiler internally represents the program as a tree called the Abstract Syntax Tree (AST).
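For a rough idea of what that looks like, an AST node is typically just a small struct that names a language construct and links to its children. The sketch below is illustrative only; the field names are my own, not taken from any particular compiler.

```c
/* Illustrative AST node: each node names a construct and links to its
   children, so the whole program becomes one tree. */
struct ast_node {
    const char      *kind;       /* e.g. "if", "assignment", "call" */
    struct ast_node *children;   /* first child */
    struct ast_node *sibling;    /* next child of the same parent */
};
```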
Although compilers are complicated, the basic tree-handling code in them is not particularly challenging. But I’ve also worked on programs that have to process the data of millions of users, arranged in a tree that can go really deep. That is very challenging.
Two critical problems can arise: (i) slow execution, and (ii) crashes due to stack overflow. The latter is obviously fatal, but don’t assume the former is only an inconvenience. Slow execution is also fatal when you are up against timeouts, as in a serverless environment.
So how did I overcome these issues?
The solution to the first problem is classic performance engineering: everything from writing low-level code to profiling and optimization, backed by a basic knowledge of concepts like the memory hierarchy and cache locality.
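To give one hedged example of what cache locality can mean for tree code: allocating every node with its own malloc call scatters them across the heap, while carving nodes out of one contiguous pool keeps nodes that are created together close together in memory. This is only a minimal sketch of the idea, with a made-up pool type, not the code I actually shipped.

```c
#include <stdlib.h>

struct node {
    long         value;
    struct node *left, *right;
};

/* Hypothetical node pool: nodes come from one contiguous block instead of
   individual malloc calls, which tends to be friendlier to the CPU cache. */
struct node_pool {
    struct node *slots;
    size_t       used, cap;
};

struct node_pool pool_create(size_t cap)
{
    struct node_pool p = { malloc(cap * sizeof(struct node)), 0, cap };
    return p;
}

struct node *pool_alloc(struct node_pool *p)
{
    if (p->slots == NULL || p->used == p->cap)
        return NULL;   /* a real pool would grow or chain further blocks */
    return &p->slots[p->used++];
}
```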
What about the second problem?
Tree-related code is prone to stack overflow because basic tree-traversal algorithms use recursion. Remember that the trees I was working on had millions of nodes and could go really deep.
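To illustrate the problem, here is the textbook way to walk a tree. Every level of depth costs one call-stack frame, so a degenerate, list-like tree with a few hundred thousand nodes can be enough to exhaust a typical thread stack. The node type is a stand-in for illustration, not the actual structures I worked with.

```c
#include <stdio.h>

struct node {
    long         value;
    struct node *left, *right;
};

/* Classic recursive in-order traversal: simple, but the call depth equals
   the depth of the tree, so a very deep tree overflows the stack. */
void visit_recursive(const struct node *n)
{
    if (n == NULL)
        return;
    visit_recursive(n->left);
    printf("%ld\n", n->value);
    visit_recursive(n->right);
}
```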
Functional programming languages are usually good at handling situations like this without a crash, since recursion is a central concept in them. But I was working with C and Go, so I had to solve the issue myself. Yes, Go’s stack can grow to handle really deep recursion, and C compilers like GCC can convert some recursive code to iteration on their own at higher optimization levels, but neither was enough in my case. What I did was convert some of the recursive functions to tail calls and others to stack-based iteration, both manually.
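To make the two transformations concrete, here is a hedged sketch on a toy problem (summing node values), using the same stand-in node type as above rather than my production code. The first variant puts the second recursive call into tail position, which an optimizing C compiler such as GCC at -O2 can turn into a jump; the second replaces recursion entirely with an explicit stack on the heap, so the depth of the tree never touches the call stack.

```c
#include <stdlib.h>

struct node {
    long         value;
    struct node *left, *right;
};

/* Variant 1: the second recursive call is in tail position, so an
   optimizing compiler can replace it with a jump; only the left-hand
   calls still consume stack frames. */
long sum_tail(const struct node *n, long acc)
{
    if (n == NULL)
        return acc;
    acc = sum_tail(n->left, acc + n->value);
    return sum_tail(n->right, acc);   /* tail call */
}

/* Variant 2: recursion replaced by an explicit stack on the heap,
   so arbitrarily deep trees cannot overflow the call stack. */
long sum_iterative(const struct node *root)
{
    long sum = 0;
    size_t cap = 64, top = 0;
    const struct node **stack = malloc(cap * sizeof *stack);
    if (stack == NULL)
        return 0;                     /* real code would report the error */

    if (root != NULL)
        stack[top++] = root;

    while (top > 0) {
        const struct node *n = stack[--top];
        sum += n->value;

        if (top + 2 > cap) {          /* grow the heap-allocated stack */
            const struct node **grown =
                realloc(stack, 2 * cap * sizeof *stack);
            if (grown == NULL)
                break;                /* real code would report the error */
            stack = grown;
            cap *= 2;
        }
        if (n->right != NULL) stack[top++] = n->right;
        if (n->left  != NULL) stack[top++] = n->left;
    }

    free(stack);
    return sum;
}
```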
The end result was programs that process millions of hierarchical, interdependent user records in a matter of seconds (a few minutes when a database is involved).