[mkgmap-dev] Increasing the performance of whole-planet splits.

Sat Jun 12 00:52:29 BST 2010

Hi Scott,

> I created a binary OSM format that is 30-50% smaller than the bzipped
> planet (about 5.3gb without metadata) and 5x-10x faster to parse than
> the gzipped planet. As that format is substantially smaller than the
> splitter's cache, and to simplify my patch, I removed the cache before
> refactoring the splitter. My original announcement is at
> http://lists.openstreetmap.org/pipermail/dev/2010-April/019370.html
> Since that post I have fixed the issue of unwanted extra precision.
> I want your thoughts on the binary format before submitting it.

I've finally had a bit of time to read over your original post and the follow-ups
to it. (Note that, as I understand it, the OSM dev mailing list and Osmosis
don't really have any direct relationship with the splitter or mkgmap. mkgmap
is a tool originally written by Steve Ratcliffe purely to port OSM data to
Garmin devices; the splitter was an additional tool he wrote to help
achieve that goal, and it has evolved from there. This probably explains why
no one from the mkgmap/splitter community noticed or commented on your original
post on the osm-dev list. If they're anything like me, they don't pay much
attention to what's going on there, so I guess it got missed.)

I really like what you've done and can see the benefits of having a compact 
& fast binary format when it comes to processing large amounts of osm data. 
I can also see you've put a lot of thought and effort into how the file is 
structured. Having something like this as a standard for tools like mkgmap 
and splitter would be very beneficial in terms of performance and interoperability.

As far as the splitter is concerned, I'd be happy to use it internally instead 
of the current cache format given the speed boost it appears to offer. I've 
always considered the cache to be a short lived/temporary thing anyway, purely 
as an aid to speeding up the split, so anything that helps that is a win. 
Using a binary format between the splitter and mkgmap would speed things
up immensely too; in fact, we've discussed it a bit on the mailing list in
the past. Before we jump in too deep, however, can you comment on how well your
format has been accepted/integrated into Osmosis or any other OSM projects,
and how stable the file format is now? (i.e., has it been battle-tested? ;) If
it looks likely to become a standard/supported format in the OSM community, then
supporting it as both an input and output format in the splitter is a bit of
a no-brainer, as is supporting it as an input format for mkgmap (Steve and co,
correct me if I'm wrong!). I admit I'm a bit wary of the complexity of the format
and the potential maintenance effort required, especially if the format isn't
used anywhere in the OSM community other than the splitter and mkgmap.

Also, I see you complained about the lack of common libraries/data structures 
between Osmosis, splitter and mkgmap. I guess it's the way it is because 
they've all evolved independently with no real incentive to extract common 
code. If your file format were to become standardised, there's no reason
why the splitter couldn't be refactored to build on a standard library
and data structures. Hopefully other projects could too, but I'm not involved
in them, so I'll leave that aspect for others to comment on separately.

> My multithread changes reduced the real and user CPU time by 30-50% on
> a UK extract. Other tests, e.g., a whole planet, show almost no
> effect. I think this is because a lot of my changes increase the
> efficiency of code that is parallelized, which paradoxically ends up
> reducing the relative benefits of parallelization.

I've had a look at this patch - I'd be interested to hear your thinking behind 
the changes you made (because I'm too lazy to try and get my head around 
it by studying the code!). Are you sure it's behaving as you intended? The
--max-threads parameter is being ignored; instead, you're creating as many
threads as there are areas in a pass (up to 255). This seems to result in
fairly unpredictable bursts of activity, sometimes quite aggressive, but generally
most of the threads are sitting idle. In my benchmarks I'm seeing your patch
run a bit quicker and with higher CPU utilisation than the r109 code. It's 
about 20% faster for a UK extract, 9% faster for the planet (on a core i7). 
Note that WanMil's original patch uses (cores - 1) for maxthreads.
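For comparison, honouring --max-threads with a (cores - 1) fallback, as WanMil's patch does, might look something like this. It's only a sketch: the class and method names are hypothetical rather than taken from the splitter, and passing the option in as a nullable Integer is an assumption about how it gets parsed.

```java
public class ThreadCount {
    /**
     * Hypothetical helper: use an explicit --max-threads value if one was given,
     * otherwise fall back to (cores - 1), but never drop below one thread.
     */
    static int resolveMaxThreads(Integer maxThreadsOption, int availableCores) {
        if (maxThreadsOption != null && maxThreadsOption > 0) {
            return maxThreadsOption;
        }
        return Math.max(1, availableCores - 1);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("default max-threads: " + resolveMaxThreads(null, cores));
        System.out.println("explicit max-threads=2: " + resolveMaxThreads(2, cores));
    }
}
```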

I suspect the planet doesn't benefit as much as a smaller area because
the planet has more areas in the split, so it takes longer to fill up a given
'bundle' and trigger some processing. This reduces the amount of CPU used
on average, since we'll get more idle periods (when many areas are still 
filling their buffers), but the maximum possible work being done when buffers 
fill up is still throttled by the number of cores.
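To illustrate the effect, here's a rough sketch of per-area buffering where a bundle is only handed off for processing once it's full. All names are hypothetical (not the splitter's actual classes), elements are assumed to arrive evenly round-robin across areas, and the end-of-input flush of partially filled bundles is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical per-area bundler: a bundle is dispatched only once it holds bundleSize elements. */
public class AreaBundler {
    private final int bundleSize;
    private final List<List<Long>> buffers = new ArrayList<>();
    private final Consumer<List<Long>> onFull;

    public AreaBundler(int areaCount, int bundleSize, Consumer<List<Long>> onFull) {
        this.bundleSize = bundleSize;
        this.onFull = onFull;
        for (int i = 0; i < areaCount; i++) {
            buffers.add(new ArrayList<>(bundleSize));
        }
    }

    /** Adds a node id to an area's buffer and dispatches the bundle when it fills up. */
    public void add(int area, long nodeId) {
        List<Long> buf = buffers.get(area);
        buf.add(nodeId);
        if (buf.size() == bundleSize) {
            onFull.accept(buf);
            buffers.set(area, new ArrayList<>(bundleSize));
        }
    }

    /** Counts full bundles dispatched when elements are spread round-robin over areaCount areas. */
    static int dispatchesFor(int elements, int areaCount, int bundleSize) {
        int[] count = {0};
        AreaBundler bundler = new AreaBundler(areaCount, bundleSize, bundle -> count[0]++);
        for (int i = 0; i < elements; i++) {
            bundler.add(i % areaCount, i);
        }
        return count[0]; // partial bundles still sitting in buffers are not counted
    }

    public static void main(String[] args) {
        System.out.println("10 areas, bundle 500:  " + dispatchesFor(10_000, 10, 500));
        System.out.println("200 areas, bundle 500: " + dispatchesFor(10_000, 200, 500));
        System.out.println("200 areas, bundle 50:  " + dispatchesFor(10_000, 200, 50));
    }
}
```

With 10,000 elements and a bundle size of 500, 10 areas produce 20 full bundles while 200 areas produce none in the same window; dropping the bundle size to 50 brings the 200-area case back up to 200 bundles.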

A couple of ideas I have here... perhaps reducing the bundle size would help
keep the workers fed (especially when there is a larger number of areas),
and using a thread pool of max-threads workers, rather than
one thread per area, would help prevent CPU starvation for other applications
without adversely affecting performance? I know people leave the splitter
running as a background task on their desktop PC, so if we're periodically
bursting 10+ threads at full CPU I can imagine they might get a bit annoyed
at times. Thoughts?
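A minimal sketch of the thread pool idea, assuming the per-area work can be wrapped up as independent tasks: full bundles are submitted to a fixed pool of max-threads workers, so no matter how many areas there are, at most max-threads bundles are processed at once. The class name and the short sleep standing in for the real work are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BundlePool {
    /**
     * Hypothetical sketch: run bundleCount bundle-processing tasks on a fixed
     * pool of maxThreads workers and report the peak number running at once.
     */
    static int peakConcurrency(int bundleCount, int maxThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < bundleCount; i++) {
                pending.add(pool.submit(() -> {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5); // stand-in for the real per-area work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        active.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any worker failure
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        int maxThreads = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
        // 255 bundles, but never more than maxThreads of them in flight.
        System.out.println("peak concurrent workers: " + peakConcurrency(255, maxThreads));
    }
}
```

Because Executors.newFixedThreadPool queues excess tasks instead of starting new threads, even the 255-area case is capped at max-threads busy workers, which should keep a desktop responsive while still using the available cores.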

Chris