Sunday, 29 January 2017

The Page Rank Mystery

I kick-started my quest for some long, slow code out there, lost in the GitHub world, waiting to be rescued by a parallel programmer who'd walk by, wave a magic OpenMP wand, and change the world. Eventually I discovered that hunting for "slow code" is a wild goose chase; it makes much more sense to look for "interesting code". Voilà: algorithm implementations!

So I stumbled upon this repository implementing the famous PageRank algorithm. Much to my disappointment, it was very well written, albeit in Greek. The readme directed me to a useful link, How Google Finds Your Needle in the Web's Haystack, which explains the ugly math lurking behind the algorithm. From there, I dived into the C++ code to get a clearer picture of what was happening.

The code was surprisingly fast for large inputs (about 1.2 seconds for 100,000 vertices), probably because of its extensive use of library functions, which tend to be compiler friendly. An extremely useful lesson: when you write code, exploit the language's standard library as much as you can! Nevertheless, when I saw the 0.65-second response time on a Google search results page, I hoped there might still be some scope for improvement.
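To make that lesson concrete, here is a toy illustration of the kind of thing I mean; it is not code from the repository, just an example of letting the standard library do the looping for you.

```cpp
// Toy illustration only -- not code from the repository.
#include <numeric>
#include <vector>

// A hand-rolled loop works too, but std::inner_product says the same thing
// in one line and hands the compiler a well-known pattern to optimize.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```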


A quick look at the code through a profiler's eyes yields the obvious result: the PageRank computation is essentially the bottleneck. I suppressed my instinct to instantly chant the #pragma mantra and played around with the serial code first. For instance, it might have caught your eye that the current element of the H vector is computed in each and every iteration!

But wait! Aren't computations involving doubles expensive? Don't they lose precision? Doesn't it depend on the representation? Either way, why would I want to compute the value every single time? Can't I just compute it once and store it in an array? Wouldn't that be more cache friendly?

Yes, I had the same set of questions on my mind. As it turns out, I could not have been more wrong. The computation, though expensive, is conditional: it only happens for the entries that actually exist. For a sparse matrix, pre-computing the values into an array hardly makes any difference; if anything, it risks adding overhead. This led me to my second lesson: don't always blame the expensive computation.
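Here is a rough sketch of the kind of loop I am talking about. The names and data layout are my own guesses, not the repository's actual code: each row lists only the links that exist, so the H entry is computed on the fly exactly when it is needed.

```cpp
// Sketch only -- rows[i] lists the pages linking to page i (a sparse
// representation); dangling nodes are ignored here for brevity.
#include <cstddef>
#include <vector>

void iterate(const std::vector<std::vector<int>>& rows,
             const std::vector<int>& outdegree,
             const std::vector<double>& rank,
             std::vector<double>& next,
             double d /* damping factor, e.g. 0.85 */) {
    const std::size_t n = rank.size();
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int j : rows[i]) {
            // The H entry 1/outdegree[j] is computed here, on the fly --
            // but only for links that actually exist. In a sparse graph
            // that is a handful of entries per row, so caching them all
            // up front buys very little and costs extra memory traffic.
            sum += rank[j] / outdegree[j];
        }
        next[i] = (1.0 - d) / n + d * sum;
    }
}
```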

Parallelizing the computation across rows with good old OpenMP reduced the running time by 0.2 seconds (a 16% speedup). Frankly, I was underwhelmed. Ouch! Expectations hurt. I welcome any comments speculating on the reason.
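For the record, the change itself is tiny. A minimal sketch, again assuming the row-wise loop from the sketch above rather than the repository's exact code:

```cpp
// Each thread gets its own chunk of rows and writes only its own entries
// of 'next', so no synchronization is needed. Compile with -fopenmp.
#include <cstddef>
#include <vector>

void iterate_parallel(const std::vector<std::vector<int>>& rows,
                      const std::vector<int>& outdegree,
                      const std::vector<double>& rank,
                      std::vector<double>& next,
                      double d) {
    const long long n = static_cast<long long>(rank.size());
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int j : rows[i])
            sum += rank[j] / outdegree[j];
        next[i] = (1.0 - d) / n + d * sum;
    }
}
```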

Moving on, my next step was to address the smaller elephant in the room: the file input. Reading a huge file line by line seems highly inefficient for several reasons, such as the increased number of system calls (and hence interrupts) and the more frequent movement of the file pointer.

So, what is the most efficient way to read a file? Perhaps reading the entire file into a string and then parsing the string in memory. I tested this with a sample file and found that it was indeed faster.
Hurrah! Our PageRank implementation got 0.15 s faster!
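Here is a sketch of what I mean by "slurp and parse", assuming a simple edge-list input with one pair of vertex ids per line; the actual input format of the repository may differ.

```cpp
// Read the whole file in one go, then parse it entirely in memory.
// The function name and input format are placeholders, not the repo's.
#include <fstream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

std::vector<std::pair<int, int>> read_edges(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();                 // one bulk read instead of many getline() calls
    std::istringstream parser(buffer.str());

    std::vector<std::pair<int, int>> edges;
    int from, to;
    while (parser >> from >> to)          // parse from memory, no further I/O
        edges.emplace_back(from, to);
    return edges;
}
```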


Summing up, the optimization is still a "work in progress". But that is for another time and another post. Till then, happy coding in a parallel world. 😊

P.S.: All the relevant code and readme files will be available in my GitHub account.
