logo separator

[mkgmap-dev] splitter memory usage

From Scott Crosby scott at sacrosby.com on Sun Oct 30 20:38:49 GMT 2011

On Thu, Oct 27, 2011 at 4:56 AM, GerdP <gpetermann_muenchen at hotmail.com> wrote:
> Hello list,
>
> since a few days I try to understand the idea of the class
> SparseInt2ShortMapInline in splitter because this seems to waste memory
> although it is documented to save it. If I got this right, the amount of
> needed memory directly depends on the highest id that is saved, thus for
> example the data for a small country like german "bundesland" saarland (from
> geofabrik) is not processed with
> java -Xmx700 -jar splitter.jar --max-areas=2 --max-nodes=200000
> e:\dwnload\saarland.osm.pbf

There are two main algorithms.

   O(A*# distinct ID's), using a hash table, with A being around 64
bits (with 32 bit node IDs) or 100 bits (with 64 bit node IDs)

But, this would result in too much memory usage for a whole planet
due to the constant factor of the A. The current algorithm is

   O(B*#maximum_id) with the constant

But, factor B is a lot smaller than A, around 4 bits unused entry and
20 bytes per used entry For a whole planet split, this is a third or
half of the memory of the first choice.

> I tried that with R181 and R180.
> The program runs without problems when I change the code so that
> SparseInt2ShortMultiMap always uses Int2ShortOpenHashMap and not
> SparseInt2ShortMapInline.
> Was this code optimized for a special input (e.g.with small id values) ?

No. The code is tuned for whole-planet splits, where minimizing memory
consumption is paramount. The right algorithm would adapt the number
of nodes, using Int2ShortOpenHashMap when for small splits (say,
country-sized or smaller) and SparseInt2ShortMapInline for whole
planet splits. Maybe do it as a command line option?

I recommend against your proposed change later in the thread because
it will increase the memory usage by at least 500mb for whole-planet
splits. (1.5B nodes requires at least 25M Storers, with 8 bytes object
overhead +  4 bytes hashtable key + 4 bytes hashtable value pointer +
50% hashtable overhead, or about 20 bytes each)

Scott



More information about the mkgmap-dev mailing list