
[mkgmap-dev] Problems with latest splitter versions

From Scott Crosby scott at sacrosby.com on Sat Mar 5 16:00:23 GMT 2011

On Fri, Mar 4, 2011 at 6:12 PM, Michael Prinzing <mipri at gmx.net> wrote:
> On Thu, 3 Mar 2011 21:16:26 -0600, Scott Crosby wrote:
>>On Wed, Mar 2, 2011 at 5:36 PM, Michael Prinzing <mipri at gmx.net> wrote:
>

>
> If the node IDs are starting at 2073741824 (= 2^30 + 1e09) there is
> still an exception. It happens immediately when the splitter begins to
> write out the data. I posted an example yesterday, but with the new
> splitter there is another exception:

Yes, I would expect this. *All* node IDs must be less than
2,000,000,000; start no higher than about 1,950,000,000.

>
> Exception in thread "main" java.lang.IndexOutOfBoundsException: Index (32402221) is greater than or equal to list size (31250001)
>        at it.unimi.dsi.fastutil.objects.ObjectArrayList.get(ObjectArrayList.java:258)
>        at uk.me.parabola.splitter.SparseInt2ShortMapInline.put(SparseInt2ShortMapInline.java:128)
>        at uk.me.parabola.splitter.SparseInt2ShortMultiMap$Inner.put(SparseInt2ShortMultiMap.java:81)
>        at uk.me.parabola.splitter.SparseInt2ShortMultiMap.put(SparseInt2ShortMultiMap.java:31)

>
>
> Next try with node IDs beginning at 1773741824. Again I get an
> exception, but this time after the splitter has written a part of the
> output, not immediately as above. This time the splitter uses a
> huge amount of memory, and so the exception says:
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>        at it.unimi.dsi.fastutil.longs.LongArrays.ensureCapacity(LongArrays.java:107)
>        at it.unimi.dsi.fastutil.longs.LongArrayList.ensureCapacity(LongArrayList.java:202)
>        at it.unimi.dsi.fastutil.longs.LongArrayList.size(LongArrayList.java:271)
>        at uk.me.parabola.splitter.SparseInt2ShortMapInline.resizeTo(SparseInt2ShortMapInline.java:97)
>        at uk.me.parabola.splitter.SparseInt2ShortMapInline.put(SparseInt2ShortMapInline.java:125)
>        at uk.me.parabola.splitter.SparseInt2ShortMultiMap$Inner.put(SparseInt2ShortMultiMap.java:81)
>        at uk.me.parabola.splitter.SparseInt2ShortMultiMap$Inner.put(SparseInt2ShortMultiMap.java:79)
>        at uk.me.parabola.splitter.SparseInt2ShortMultiMap.put(SparseInt2ShortMultiMap.java:31)
>        at uk.me.parabola.splitter.SplitProcessor.writeWay(SplitProcessor.java:231)
>        at uk.me.parabola.splitter.SplitProcessor.processWay(SplitProcessor.java:134)
>        at uk.me.parabola.splitter.OSMParser.endElement(OSMParser.java:253)
>        at uk.me.parabola.splitter.AbstractXppParser.parse(AbstractXppParser.java:57)
>        at uk.me.parabola.splitter.Main.processMap(Main.java:399)
>        at uk.me.parabola.splitter.Main.writeAreas(Main.java:355)
>        at uk.me.parabola.splitter.Main.split(Main.java:188)
>        at uk.me.parabola.splitter.Main.start(Main.java:116)
>        at uk.me.parabola.splitter.Main.main(Main.java:105)
>
> This looks as if the fix that should extend the possible range for
> the IDs to 1,999,999,999 does not work yet.

This illustrates another problem. SparseInt2ShortMapInline uses less
than a third of the memory of a typical hash table when node IDs are
dense, i.e. when max(id)/count(nodes) is small. For typical OSM
datasets this is true, so we benefit from the reduced memory usage.

However, unlike a typical hash table, SparseInt2ShortMapInline's
memory usage is proportional to max(id). Putting in a single
node with an ID around 2,000,000,000 blew up its memory usage.
Even without that, the splitter's current memory usage on current
planets is getting close to the limit available in a 32-bit JVM on
Windows.
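To make the trade-off concrete, here is a minimal sketch of an array-backed
id-to-short map. The class name and code are hypothetical illustrations, not
the splitter's actual SparseInt2ShortMapInline, but they show the same
property: the backing array grows to max(id), so one stray large ID dominates
the memory footprint no matter how few nodes are stored.

```java
// Hypothetical sketch: an id -> short map backed by a plain array.
// Memory is proportional to max(id), not to the number of entries.
class ArrayBackedMap {
    private short[] values = new short[0];     // one slot per possible ID

    void put(int id, short v) {
        if (id >= values.length) {             // grow to cover max(id) seen
            short[] bigger = new short[id + 1];
            System.arraycopy(values, 0, bigger, 0, values.length);
            values = bigger;
        }
        values[id] = v;
    }

    short get(int id) {
        return id < values.length ? values[id] : 0;
    }

    long slots() {                             // allocated slots == max(id)+1
        return values.length;
    }
}

public class SparseDemo {
    public static void main(String[] args) {
        ArrayBackedMap m = new ArrayBackedMap();
        for (int id = 0; id < 1000; id++) {
            m.put(id, (short) 1);              // dense IDs: 1000 slots
        }
        System.out.println("dense:  " + m.slots() + " slots for 1000 nodes");
        m.put(2_000_000, (short) 1);           // one sparse ID
        System.out.println("sparse: " + m.slots() + " slots for 1001 nodes");
    }
}
```

With dense IDs 0..999 the map needs 1000 slots; adding a single node with ID
2,000,000 forces 2,000,001 slots. Scale that single outlier up to an ID near
2,000,000,000 and the backing storage alone approaches 4 GB of shorts, which
is exactly the blow-up described above.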

> Unfortunately it is not
> sufficient in this case to have just a few nodes and ways, so I cannot
> provide a small piece of data to reproduce it. Of course I could send
> you the whole file which has about 60MB in PBF format.
>
> If I am using IDs beginning at 1573741824, everything works fine (even
> with the same memory settings as before).
>
> For now I can assign IDs that the splitter is able to handle, but
> sooner or later this will be a problem. If OSM is growing like it did
> in the last months, it will reach node IDs of 2e09 and above pretty
> soon (next year). And the new version of srtm2osm is also generating
> node IDs from 2e09 on upwards to avoid collisions with the OSM data.
> While splitter r161 could handle this, it is not possible to process
> this data with the new splitter versions.

The splitter *will* blow up with any node ID > 2**31. And after making
it 64-bit clean, the splitter's memory usage will still grow with max(id).
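The 2**31 limit is Java's signed int range. A short illustration (generic, not
the splitter's actual code) of what happens when a node ID above
Integer.MAX_VALUE is narrowed from long to int:

```java
// Any code path that stores node IDs in a Java int silently corrupts IDs
// above Integer.MAX_VALUE (2,147,483,647): the narrowing conversion wraps.
public class IdOverflow {
    public static void main(String[] args) {
        long nodeId = 2_147_483_648L;   // 2**31, one past Integer.MAX_VALUE
        int narrowed = (int) nodeId;    // narrowing conversion wraps around
        System.out.println(narrowed);   // prints -2147483648
    }
}
```

A negative ID then fails range checks or indexes arrays out of bounds, which
is why the splitter must be made 64-bit clean before IDs cross 2**31.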

Scott



More information about the mkgmap-dev mailing list