[mkgmap-dev] Patch to reduce memory usage by interning strings.

From Chris Miller chris_overseas at hotmail.com on Wed Mar 31 23:12:12 BST 2010

Note that Java's String.intern() method can be pretty slow, so while you'll 
save a fair chunk of memory, you may also take a noticeable performance hit 
if you call it a lot. Adding a barrier-free caching layer in front of the 
String.intern() calls can win back a reasonable amount of that performance. 
As an example of how this can be implemented, take a look at Lucene's 
SimpleStringInterner, which does exactly this:

http://github.com/apache/lucene/blob/1c5c409241a2b8b9e64dc8c253791b497a66c369/src/java/org/apache/lucene/util/SimpleStringInterner.java

It's thread-safe in that it guarantees just enough visibility to never return 
invalid results, yet it also avoids any blocking. It might be worth 
benchmarking something like this against plain String.intern() with mkgmap.
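
To make the idea concrete, here is a minimal sketch of such a caching layer. 
It is not Lucene's code: the class name CachedInterner and the one-slot-per-
bucket design are assumptions made for illustration, and it is much simpler 
than SimpleStringInterner. It relies on String being immutable, so a racy 
read of a slot can at worst miss the cache, never return a wrong string.

```java
// Hypothetical sketch, not taken from Lucene or mkgmap.
class CachedInterner {
    private final String[] cache; // each slot holds one canonical string, or null
    private final int mask;

    CachedInterner(int size) {
        // A power-of-two size lets us pick a slot with a cheap bit mask.
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a positive power of two");
        }
        cache = new String[size];
        mask = size - 1;
    }

    String intern(String s) {
        int slot = s.hashCode() & mask;
        String cached = cache[slot]; // unsynchronized read: may be stale, never invalid
        if (cached != null && cached.equals(s)) {
            return cached;           // hit: String.intern() is not called
        }
        String canonical = s.intern(); // miss: fall back to the JVM string pool
        cache[slot] = canonical;       // unsynchronized write: last writer wins
        return canonical;
    }

    public static void main(String[] args) {
        CachedInterner interner = new CachedInterner(1024);
        String a = interner.intern(new String("highway"));
        String b = interner.intern(new String("highway"));
        System.out.println(a == b); // same canonical instance
    }
}
```

Only strings that came back from String.intern() are ever stored in the 
cache, so a hit returns the same instance a direct call would. Two strings 
that collide on a slot simply evict each other, which costs an extra 
String.intern() call but not correctness.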

Chris


> On Wed, 31 Mar 2010 21:13:49 +0200, WanMil <wmgcnfg at web.de> writes:
> 
>>> I noticed that mkgmap does not intern any strings. In particular,
>>> this tile, generated by the splitter, fails to build with -Xmx3000m
>>> on 64-bit jdk under linux. With my patch, mkgmap generates the tile
>>> with -Xmx1000m.
>>> 
>>> <bounds minlat='55.1953125' minlon='9.4921875' maxlat='56.6015625'
>>> maxlon='11.513671875'/>
>>> 
>>> This tile has 1m nodes. Among the nodes and ways on this tile, there
>>> are 12m tags, yet only 100k distinct tag key/value pairs; on average
>>> each value occurs 120 times.
>>> 
>>> I explicitly do not use normal string interning because
>>> String.intern() strings are kept forever, and I want these strings
>>> to be GC'able after the tile is done. I trade GCability for having
>>> the occasional string duplicated in memory by flushing the interning
>>> table every 10k unique strings.
>>> 
>>> This code is not presently multithread safe; ideally there should be
>>> one string interning table for each parser/thread.
>>> 
>>> Scott
>>> 
>> Hi Scott!
>> 
>> I think that's a good idea to intern the strings.
>> As far as I know the LossyIntern class is not needed. The .intern()
>> function of a string does exactly the same.
> You are right. String intern does not intern forever at least since
> Java 1.2.
> 
>> Some time ago I sent a very similar patch to the mailing list which
>> is not yet committed. Could you please test with your use case if it
>> performs a similar memory reduction?
>> 
> You can run it if you want, but from the numbers I gave above for this
> tile, interning values as in my patch will decrease the number of
> strings in RAM from 12M to <100k values. Interning only keys would
> reduce the number of Strings in RAM from 24M to 12M.
> 
>> The patch is thread safe and does not intern all strings. In my
>> opinion the value of a name tag should not be interned because there
>> is a high probability that this tag is used once only.
>> 
> That's probably true for many or most tiles, but not for the tile I
> referenced above, where on average each value occurs 120 times. That
> tile is unbuildable with a 3gb heap without my patch and buildable
> with 1gb heap with my patch.
> 
> Shall I post an updated patch without FuzzyIntern?
> 
> Scott
> 
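
For reference, the flushing table described in the quoted message (intern 
into a private table, then throw the table away every N unique strings) 
could look roughly like the sketch below. This is not the code from Scott's 
patch: the class name FlushingInterner and the maxEntries parameter are made 
up for illustration. Like the original, it is not thread-safe and is meant 
to be used as one instance per parser/thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the class from the patch discussed in this thread.
class FlushingInterner {
    private final int maxEntries;
    private final Map<String, String> table = new HashMap<>();

    FlushingInterner(int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    String intern(String s) {
        String canonical = table.get(s);
        if (canonical != null) {
            return canonical; // seen since the last flush: reuse that instance
        }
        if (table.size() >= maxEntries) {
            // Flush: strings handed out earlier stay valid, but the table no
            // longer pins them, so they can be collected once unreferenced.
            table.clear();
        }
        table.put(s, s); // s becomes the canonical instance for its value
        return s;
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        FlushingInterner interner = new FlushingInterner(10_000);
        String a = interner.intern(new String("residential"));
        String b = interner.intern(new String("residential"));
        System.out.println(a == b); // same instance while no flush intervenes
    }
}
```

The trade-off is the one stated in the message: after a flush, a value that 
was already handed out may be stored a second time, in exchange for nothing 
being retained longer than the objects that reference it.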





