[mkgmap-dev] Patch to reduce memory usage by interning strings.

From Scott A Crosby scrosby at cs.rice.edu on Wed Mar 31 21:10:40 BST 2010

On Wed, 31 Mar 2010 21:13:49 +0200, WanMil <wmgcnfg at web.de> writes:

>> I noticed that mkgmap does not intern any strings. In particular, this
>> tile, generated by the splitter, fails to build with -Xmx3000m on
>> 64-bit jdk under linux. With my patch, mkgmap generates the tile with
>> -Xmx1000m.
>>
>>      <bounds minlat='55.1953125' minlon='9.4921875' maxlat='56.6015625'
>>      maxlon='11.513671875'/>
>>
>> This tile has 1m nodes. Among the nodes and ways on this tile, there
>> are 12m tags, yet only 100k distinct tag key/value pairs; on average
>> each value occurs 120 times.
>>
>> I explicitly do not use normal string interning because
>> String.intern() strings are kept forever, and I want these strings to
>> be GC'able after the tile is done. I trade GCability for having the
>> occasional string duplicated in memory by flushing the interning table
>> every 10k unique strings.
>>
>> This code is not presently multithread safe; ideally there should be
>> one string interning table for each parser/thread.
>>
>> Scott
>>
>
> Hi Scott!
>
> I think that's a good idea to intern the strings.
> As far as I know the LossyIntern class is not needed. The .intern()
> function of a string does exactly the same.

You are right. Strings interned with String.intern() have not been
kept forever since at least Java 1.2.
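
For reference, a minimal sketch of what String.intern() does for
duplicated tag strings. The class name, helper method and tag string
are made up for illustration and are not part of mkgmap or either
patch:

```java
// Illustrative only: InternDemo and the tag string are hypothetical.
class InternDemo {
    // True when both strings map to the same canonical instance.
    static boolean sameInstanceAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // Two equal strings held as two separate objects, as a parser
        // would produce when it reads the same tag value twice.
        String a = new String("highway=residential");
        String b = new String("highway=residential");

        System.out.println(a == b);                        // prints false
        System.out.println(sameInstanceAfterIntern(a, b)); // prints true
    }
}
```

After interning, only one copy needs to stay reachable, so the
duplicates can be collected.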

> Some time ago I sent a very similar patch to the mailing list which
> is not yet committed. Could you please test with your use case if it
> performs a similar memory reduction?

You can run it if you want, but from the numbers I gave above for this
tile, interning values as in my patch will decrease the number of
value strings in RAM from 12M to fewer than 100k. Interning only keys
would reduce the number of strings in RAM from 24M to 12M.
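
To make the idea concrete, here is a rough sketch of an interning
table that is flushed once it holds a fixed number of unique strings,
along the lines described in my first mail. It is not the code from
the patch: the class name, the constructor parameter and the size()
helper are assumptions for illustration. Like the patch, it is not
thread safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a flushable interning table; not the patch itself.
class LossyInternSketch {
    private final int maxEntries; // flush threshold, e.g. 10000
    private final Map<String, String> table = new HashMap<>();

    LossyInternSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Returns a canonical instance equal to s; null is passed through.
    String intern(String s) {
        if (s == null) {
            return null;
        }
        String canonical = table.get(s);
        if (canonical != null) {
            return canonical;
        }
        if (table.size() >= maxEntries) {
            // Flush the table. Strings seen earlier may be duplicated
            // once more, but the table no longer keeps them alive.
            table.clear();
        }
        table.put(s, s);
        return s;
    }

    // Number of unique strings currently held.
    int size() {
        return table.size();
    }
}
```

A parser would call intern() on every tag key and value before storing
it and drop the whole table once the tile is done.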


> The patch is thread safe and does not intern all strings. In my
> opinion the value of a name tag should not be interned because there
> is a high probability that this tag is used once only.

That's probably true for many or most tiles, but not for the tile I
referenced above, where on average each value occurs 120 times. That
tile is unbuildable with a 3 GB heap without my patch and buildable
with a 1 GB heap with my patch.

Shall I post an updated patch without FuzzyIntern?

Scott


