
[mkgmap-dev] Increasing the performance of whole-planet splits.

From Scott Crosby scrosby06 at gmail.com on Thu Jun 10 20:35:44 BST 2010

Hello! I would like to submit the first of a set of performance
optimizations to the splitter that improve its speed and reduce its
memory usage. The end result is that a core 2 duo can split the entire
planet in 3.5 hours with -Xmx1700m and without using any temporary
space (67 minutes on an 8 GB machine). Everything is up on github.
I'm sending the first set of patches now.

Perhaps the biggest change is that I've made the splitter accept my
binary OSM format. That format is 30-50% smaller than the bzipped
planet (about 5.3 GB without metadata) and 5x-10x faster to parse than
the gzipped planet. As the format is substantially smaller than the
splitter's cache, and to simplify my patch, I removed the cache before
refactoring the splitter. My original announcement is at
Since that post I have fixed the issue of unwanted extra precision.
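For a feel of why such a format can beat compressed XML on both size and parse speed, here is a minimal sketch assuming only the generic idea (delta coding plus variable-length integers), not the actual on-disk layout; DeltaVarint is my own illustrative name:

```java
import java.io.ByteArrayOutputStream;

// Hedged sketch of one generic binary-format trick, not the real format:
// sorted, dense node IDs are stored as base-128 varint deltas, so most
// IDs cost a byte or two instead of a fixed-width 8-byte field.
public class DeltaVarint {
    // Write v as a varint: 7 payload bits per byte,
    // high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7FL) | 0x80L));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Encode ascending IDs as deltas from the previous ID.
    static byte[] encode(long[] sortedIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (long id : sortedIds) {
            writeVarint(out, id - prev);
            prev = id;
        }
        return out.toByteArray();
    }
}
```

Three consecutive IDs around one million encode in five bytes this way, versus 24 bytes as raw 64-bit values, and decoding needs no text tokenizing at all.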

I want your thoughts on the binary format before I submit it; please
have a look and tell me what you think. In this email I am submitting
some other optimizations and patches. The performance numbers I quote
assume the binary format; I am unsure what benefit these patches have
on trunk.

In IntIntMap, the current offset value used for collision rehashing is
way too low, given the dense nature of OSM node ID values and a
sequential insertion order. The problem is most obvious on a
whole-planet split: it results in huge numbers of collisions, requiring
30+ probes on average. Note when benchmarking that the collision rate
is irregular throughout the planet file; with a --max-nodes=4000000
whole-planet split, there is virtually no performance difference over
the first 90M nodes, yet by 120M nodes, using a random 31-bit prime
number as the offset is over 3x faster.
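To see why the probe offset matters with dense, sequential keys, here is a stripped-down open-addressed map; IntIntMapProbe and its constants are my own illustration, not the splitter's actual IntIntMap. The step added after a collision decides whether displaced keys pile into long runs or scatter across the table:

```java
// Minimal open-addressed int-to-int map. Key 0 is reserved to mark an
// empty slot. STEP must be coprime with the capacity so probing can
// eventually visit every slot; a large prime spreads dense sequential
// keys apart instead of letting collisions cluster into runs.
public class IntIntMapProbe {
    private static final int STEP = 514229; // a large prime (illustrative)
    private final int[] keys;
    private final int[] values;
    private final int capacity;

    IntIntMapProbe(int capacity) {
        this.capacity = capacity;
        keys = new int[capacity];   // 0 == empty slot
        values = new int[capacity];
    }

    // Returns the number of probes used, to make clustering measurable.
    int put(int key, int value) {
        int idx = Math.floorMod(key, capacity);
        int probes = 1;
        while (keys[idx] != 0 && keys[idx] != key) {
            idx = Math.floorMod(idx + STEP, capacity);
            probes++;
        }
        keys[idx] = key;
        values[idx] = value;
        return probes;
    }

    int get(int key) {
        int idx = Math.floorMod(key, capacity);
        while (keys[idx] != 0) {
            if (keys[idx] == key) return values[idx];
            idx = Math.floorMod(idx + STEP, capacity);
        }
        return -1; // not present
    }
}
```

With a tiny step, a burst of sequential node IDs that hashes into an already-occupied run keeps landing back inside the same run; with a large prime step each collision jumps far away, which is the behavior the 31-bit-prime benchmark above reflects.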

My multithreading changes reduced the real and user CPU time by 30-50%
on a UK extract. Other tests, e.g. a whole planet, show almost no
effect. I think this is because many of my other changes increase the
efficiency of exactly the code that is parallelized, which
paradoxically ends up reducing the relative benefit of parallelization.
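The effect can be put in toy numbers (my own illustration, not measurements from the splitter): if a run spends 10 s in serial code and 90 s in parallelizable code, a second thread nearly halves the total; halve the parallel work first and the serial part dominates, so the same second thread buys less.

```java
// Toy model of the observation above: wall-clock time for a job with a
// serial part and a perfectly parallel part spread over N threads.
public class Amdahl {
    static double time(double serialSecs, double parallelSecs, int threads) {
        return serialSecs + parallelSecs / threads;
    }

    public static void main(String[] args) {
        // Before optimization: 10 s serial + 90 s parallel work.
        double before = time(10, 90, 1) / time(10, 90, 2); // ~1.82x from 2 threads
        // After halving the parallel work: same serial part remains.
        double after = time(10, 45, 1) / time(10, 45, 2);  // ~1.69x from 2 threads
        System.out.printf("speedup before=%.2f after=%.2f%n", before, after);
    }
}
```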

The single biggest remaining slowdown is that the splitter sends each
Node to every area (represented by an OSMWriter), which writes the node
to its output file if it lies within the bounds of the region (with
overlap), and does nothing otherwise. On a whole planet, with 840 areas
and 550M nodes, that is nearly half a trillion bounds checks that are
not run in parallel. An index that can map a node to the few regions
that may contain it could avoid 99% of these checks.
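Such an index could be as simple as a coarse grid, sketched below under assumed names (AreaGridIndex, Area) and assumed non-negative integer map units; each cell lists only the areas overlapping it, so a node is bounds-checked against a handful of candidates instead of all 840 writers:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical grid index, not existing splitter code: precompute which
// areas overlap each grid cell, then test a node only against the areas
// registered for its cell.
public class AreaGridIndex {
    static class Area {
        final int minLat, minLon, maxLat, maxLon; // half-open bounds, overlap included
        Area(int minLat, int minLon, int maxLat, int maxLon) {
            this.minLat = minLat; this.minLon = minLon;
            this.maxLat = maxLat; this.maxLon = maxLon;
        }
        boolean contains(int lat, int lon) {
            return lat >= minLat && lat < maxLat && lon >= minLon && lon < maxLon;
        }
    }

    private final List<Area>[] cells;
    private final int cellSize;
    private final int cols;

    @SuppressWarnings("unchecked")
    AreaGridIndex(Area[] areas, int worldSize, int cols) {
        this.cols = cols;
        this.cellSize = worldSize / cols;
        cells = new List[cols * cols];
        for (int i = 0; i < cells.length; i++) cells[i] = new ArrayList<>();
        // Register each area in every cell its bounding box touches.
        for (Area a : areas)
            for (int r = a.minLat / cellSize; r <= (a.maxLat - 1) / cellSize; r++)
                for (int c = a.minLon / cellSize; c <= (a.maxLon - 1) / cellSize; c++)
                    cells[r * cols + c].add(a);
    }

    // Candidate areas for a node; only these need the full bounds check.
    List<Area> candidates(int lat, int lon) {
        return cells[(lat / cellSize) * cols + (lon / cellSize)];
    }
}
```

Since the splitter's areas are axis-aligned rectangles, most cells would hold only a few candidates, turning the per-node cost from 840 checks into a near-constant few.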

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-Lazily-generate-the-tag-hash-table.patch
Type: text/x-patch
Size: 1595 bytes
Desc: not available
Url : http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20100610/d5415cb9/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0005-Change-the-offset-for-the-hash-table.patch
Type: text/x-patch
Size: 827 bytes
Desc: not available
Url : http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20100610/d5415cb9/attachment-0001.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0006-IntIntMap-Simplify-hash-search-function.patch
Type: text/x-patch
Size: 1740 bytes
Desc: not available
Url : http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20100610/d5415cb9/attachment-0002.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0008-New-thread-design.patch
Type: text/x-patch
Size: 7485 bytes
Desc: not available
Url : http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20100610/d5415cb9/attachment-0003.bin 
