
[mkgmap-dev] Increasing the performance of whole-planet splits.

From WanMil wmgcnfg at web.de on Sat Jun 12 19:43:15 BST 2010

Hi Chris, hi Scott,

I like new ideas and the lively discussion about your new concepts! This 
is good for the evolution of the OSM project!

> On Fri, Jun 11, 2010 at 6:52 PM, Chris Miller
> <chris_overseas at hotmail.com <mailto:chris_overseas at hotmail.com>> wrote:
>     Hi Scott,
>     Using a binary format between the splitter and mkgmap would speed things
>     up immensely too, in fact we've discussed it a bit in the past on
>     the mailing
>     list. Before we jump in too deep however, can you comment on how
>     well your
>     format has been accepted/integrated into Osmosis or any other osm
>     projects,
>     and how stable the file format is now? (ie, has it been battle tested? ;)
> No use that I know of. I got busy after my initial emails, so I haven't
> had time to clean it up and package it as a true osmosis plugin. Once I
> do that, the chance that it gets more use increases, especially
> if other tools are engineered to use it. I've been using it myself
> for the last couple of months.

So that's the typical chicken-and-egg situation.
As far as I understand, mkgmap's design supports multiple input formats. 
At the moment most of the effort goes into the OSM format, but it should 
be possible to write new reader classes to support the new format. Some 
basic handling would have to be moved out of the OSM reader so that the 
new reader can share it, but refactoring of that kind is not a 
show-stopper.

>     If
>     it looks like becoming a standard/supported format in the osm
>     community then
>     supporting it as both input and output formats in the splitter is a
>     bit of
>     a no-brainer, likewise as an input format for mkgmap (Steve and co,
>     correct
>     me if I'm wrong!). I admit I'm a bit wary of the complexity of the
>     format
>     and the potential maintenance effort required, especially if the
>     format isn't
>     used elsewhere in the osm community other than splitter and mkgmap.
>     I've had a look at this patch - I'd be interested to hear your
>     thinking behind
>     the changes you made (because I'm too lazy to try and get my head around
>     it by studying the code!). Are you sure it's behaving as you intended?
> Yes.
>     The
>     --max-threads parameter is being ignored, instead you're creating as
>     many
>     threads as there are areas in a pass (up to 255). This seems to
>     result in
>     fairly unpredictable bursts in activity, sometimes quite aggressive,
>     but generally
>     most of the threads are sitting idle.
> Yes, to some extent the activity is variable. However it seems that the
> current design has the worker threads effectively busywaiting, trying
> each of the queues to see if there is data in it to process?

Yes, that's true. In my tests all threads could be kept fed with data 
all the time on my 4-thread system, but of course that need not hold on 
every system. It would be easy to write a patch that adds some sleep 
time to the busy-wait when no data is available; please let me know if 
such a patch is needed.
A better fix would be to use an ExecutorService with a thread pool.
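A minimal sketch of that idea (the `Bundle` class and `processBundle` method are illustrative stand-ins, not splitter's actual API): a fixed-size pool whose worker threads block inside the ExecutorService's internal queue until a bundle is submitted, so nothing busy-waits.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: Bundle and processBundle are hypothetical stand-ins for
// splitter's real classes. Pool threads sleep until work is submitted,
// so no worker ever polls an empty queue.
public class PoolSketch {

    static class Bundle {
        final int areaId;
        final int nodeCount;
        Bundle(int areaId, int nodeCount) {
            this.areaId = areaId;
            this.nodeCount = nodeCount;
        }
    }

    static final AtomicLong nodesWritten = new AtomicLong();

    // Stand-in for writing a bundle's nodes to the area's output file.
    static void processBundle(Bundle b) {
        nodesWritten.addAndGet(b.nodeCount);
    }

    static long run(int maxThreads, int bundles) throws InterruptedException {
        nodesWritten.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (int area = 0; area < bundles; area++) {
            final Bundle b = new Bundle(area, 1000);
            pool.submit(() -> processBundle(b));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return nodesWritten.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 8 bundles of 1000 nodes each, processed by 4 pooled threads
        System.out.println(run(4, 8));
    }
}
```

A side benefit: because the pool is bounded, --max-threads would be honoured again no matter how many areas a pass contains.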

>     In my benchmarks I'm seeing your patch
>     run a bit quicker and with higher CPU utilisation than the r109
>     code. It's
>     about 20% faster for a UK extract, 9% faster for the planet (on a
>     core i7).
>     Note that WanMil's original patch uses (cores - 1) for maxthreads.

One requirement I tried to meet is the possibility to set the exact 
number of threads used by the splitter. I think this is interesting and 
important for server-hosted map providers (Lambertus, AllInOne etc.) who 
are not alone on their host.
The splitter uses (cores - 1) writing threads to ensure that the reading 
thread always has one core to itself. This might be improved with 
different thread priorities, but everything I have read about Java 
thread priorities says they are close to useless... Do you know more?
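As a sketch of that sizing rule (the helper and option handling are hypothetical, not splitter's actual code): an explicit --max-threads value overrides the default of (cores - 1), which keeps one core free for the reader.

```java
// Hypothetical helper: choose the number of writer threads. An explicit
// --max-threads value wins; otherwise default to (cores - 1), reserving
// one core for the reading thread.
public class WriterThreads {

    static int writerThreads(Integer maxThreadsOption) {
        int cores = Runtime.getRuntime().availableProcessors();
        int dflt = Math.max(1, cores - 1); // never fewer than one writer
        return (maxThreadsOption != null) ? maxThreadsOption.intValue() : dflt;
    }

    public static void main(String[] args) {
        System.out.println(writerThreads(null)); // machine-dependent default
        System.out.println(writerThreads(2));    // explicit override
    }
}
```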

>     I suspect the reason the planet doesn't do so well as a smaller area
>     is because
>     the planet has more areas in the split and so takes longer to fill
>     up a given
>     'bundle' to trigger some processing.
> I suspect it is from a different cause. A planet split has
> proportionally more serial code than a UK extract. On a planet, summing
> over each pass, each node is sent to all 800 areas to see if it is to be
> included. On UK, each node is only sent to the 20 or so areas. Now that
> reading is so fast, I strongly suspect the system is bottlenecked in
> this serial code.

Sounds like a profiling session is needed...
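To make the suspected bottleneck concrete, here is a toy model of the serial dispatch (areas reduced to 1-D intervals for illustration; the real splitter uses 2-D area bounds): each node is tested against every area, so per-node cost grows linearly with the area count even though almost all tests fail.

```java
// Toy model of the serial dispatch loop Scott describes: with 800 areas a
// planet split performs 800 containment tests per node, while a UK extract
// with ~20 areas performs only ~20. Intervals are a 1-D stand-in for the
// splitter's 2-D area bounding boxes.
public class DispatchCost {

    static class Area {
        final int lo, hi; // 1-D stand-in for an area's bounding box
        Area(int lo, int hi) { this.lo = lo; this.hi = hi; }
        boolean contains(int pos) { return pos >= lo && pos < hi; }
    }

    // Returns the number of containment tests performed: nodes * areas.
    static long dispatch(int[] nodePositions, Area[] areas) {
        long tests = 0;
        for (int pos : nodePositions) {
            for (Area a : areas) {
                tests++;
                if (a.contains(pos)) {
                    // real code would hand the node to this area's queue
                }
            }
        }
        return tests;
    }

    public static void main(String[] args) {
        Area[] areas = new Area[800];
        for (int i = 0; i < 800; i++) areas[i] = new Area(i * 10, i * 10 + 10);
        int[] nodes = new int[1000]; // 1000 nodes at position 0
        System.out.println(dispatch(nodes, areas)); // 800 tests per node
    }
}
```

A profiler would confirm whether this loop really dominates; if so, some spatial index over the areas (the density-map idea points in that direction) would cut the per-node work.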

>     This reduces the amount of CPU used
>     on average, since we'll get more idle periods (when many areas are still
>     filling their buffers), but the maximum possible work being done
>     when buffers
>     fill up is still throttled by the number of cores.
> I don't believe so. On a core i7, the system is rarely using more than 2
> cores, which indicates that parallel code cannot be a bottleneck. In the
> steady state, a bundle can be processed in a few ms each (my core2 duo
> can do 500k nodes/sec), the aggregate unprocessed work of all incomplete
> bundles is a second or two of CPU time. This work is not avoided, only
> delayed. In the steady-state, this is lost in the noise. Reducing the
> bundle size will only change the delay before the work is done, at a
> cost of increasing the number of memory barriers in the concurrent queue
> code.
>     I know people leave the splitter
>     running as a background task on their desktop PC so if we're
>     periodically
>     bursting 10+ threads at full CPU I can imagine they might get a bit
>     annoyed
>     at times. Thoughts?
> From so many cores being idle, I don't see how we could be getting
> bursts of 10+ threads unless the serial code gets sped up. This could be
> a risk if the density-map idea goes in and massively reduces the serial
> work. If this worries you, maybe hold off on this part of the patch
> until after the densitymap is in, or put it in and be prepared to revert
> later?
>     A couple of ideas I have here... perhaps reducing the bundle size
>     will help
>     keep the workers fed (especially when there is a larger number of
>     areas),
>     and also using a threadpool of max-threads to do the work on, rather
>     than
>     one thread per area, would help prevent CPU starvation for other
>     applications,
>     without adversely affecting performance?
> If CPU starvation is a problem, can't the splitter be set to run with a
> lower priority?

That should be possible. But in my experience a well-dimensioned thread 
pool is the better solution.
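The two ideas can also be combined: a bounded pool whose workers are created at minimum priority via a custom ThreadFactory. A sketch (all names illustrative, not splitter's code), with the caveat from above that on most JVMs the priority is only a scheduler hint, so the bounded pool size does the real work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Sketch: a fixed-size pool whose worker threads also run at minimum
// priority, so a background split stays out of the way of interactive
// applications as far as the OS scheduler honours the hint.
public class LowPriorityPool {

    static final ThreadFactory LOW_PRIORITY = r -> {
        Thread t = new Thread(r, "splitter-worker");
        t.setPriority(Thread.MIN_PRIORITY); // a hint to the scheduler only
        t.setDaemon(true);
        return t;
    };

    static int observedPriority() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2, LOW_PRIORITY);
        final int[] prio = new int[1];
        pool.submit(() -> prio[0] = Thread.currentThread().getPriority()).get();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return prio[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(observedPriority()); // Thread.MIN_PRIORITY is 1
    }
}
```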


> Scott
