
[mkgmap-dev] splitter r254

From Gerd Petermann gpetermann_muenchen at hotmail.com on Mon Dec 10 11:12:14 GMT 2012

Hello Klaus,

here is a comparison to the trunk version r202 (I hope it is complete without being too complex).
It would be great if someone else could put this into a more readable format once the
changes are in the trunk version.

Corrections:
- Prevent overflow in node counters, reported here:
- Missing data because of rounded/trimmed bounding boxes, reported here:

Added debugging features:
+ The parameter stop-after stops execution after a given program phase has finished. This saves time
when debugging / testing the new split algorithm (see below).
+ The parameter output=simulate simulates the whole split process without writing data to tiles.
I use this to avoid writing masses of data to my SSD.
+ If no split-file is given, splitter writes a file densities-out.txt containing the density data that was used
to calculate the tile areas. When debugging, you can rename this file to densities.txt and place it in the
same directory as splitter.jar. If splitter finds such a file, it reads its content instead of parsing
the input file. This saves a lot of time (the densities.txt for the whole planet is only ~47 MB).

Other new features (less important first):
+ In addition to the areas.list file, splitter writes an area.poly in the osmosis polygon file format.
+ o5m format supported for reading and writing:
For input, the file name has to end with .o5m; for output you have to specify the parameter --output=o5m.
The o5m format requires more disk space but is faster to read. This is especially true on slower CPUs.
+ polygon file handling:
With the parameter --polygon-file you can pass a bounding polygon to splitter. This is probably only useful
when you want to use an input file that contains much more data than the map that you want to create;
for example, you may create a polygon file covering Scandinavia and use Europe or planet as input.
The polygon file is only used when splitter has to calculate the areas (no --split-file parameter given), and it is
only used to calculate the areas. With a given polygon file, a special split algorithm is used
which tries to create tiles that cover the bounding polygon completely, but do not extend too far outside of the
polygon. The parameter no-trim is ignored if --polygon-file is used.
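
For anyone who has not written an osmosis polygon file before, here is a minimal Python sketch that produces one with a single outer ring. The helper name format_poly and the rectangle coordinates are my own illustration (the numbers are rough and made up, not a real Scandinavia boundary); only the file layout (name line, section name, "lon lat" pairs, END, END) follows the osmosis polygon format.

```python
def format_poly(name, ring):
    """Return the text of an osmosis polygon file with one outer ring.

    ring is a list of (lon, lat) tuples in degrees.
    """
    points = list(ring)
    if points[0] != points[-1]:
        points.append(points[0])  # the ring must be closed
    lines = [name, "1"]  # polygon name, then the name of the first section
    for lon, lat in points:
        # the format expects longitude first, then latitude
        lines.append(f"   {lon:.6f}   {lat:.6f}")
    lines += ["END", "END"]  # first END closes the section, second the file
    return "\n".join(lines) + "\n"


# Hypothetical, very rough rectangle; replace with a real boundary.
scandinavia = [(4.0, 54.5), (32.0, 54.5), (32.0, 71.5), (4.0, 71.5)]
poly_text = format_poly("scandinavia", scandinavia)
```

Written to a file, the text can then be passed with --polygon-file=<path to file>.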

+ a new split algorithm was implemented to address two problems:
 ++ r202 may create tiles with only a few nodes, this leads to serious problems described here:
 ++ r202 with no-trim=true may create huge, almost empty tiles, this leads to problems in mkgmap. Details
were described here:
The new algorithm tries to optimize the created tiles so that
- the number of tiles is small
- the aspect ratio is near 1 (values between 0.25 and 4 are considered to be nice)
- no tile contains fewer than max-nodes/3 nodes
- no tile is larger than 90° in longitude and 85° in latitude
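
The per-tile goals listed above can be written down as a small check. This is only a sketch of the criteria, not the code used in splitter: the Tile class and check_tile are hypothetical names, and I assume the aspect ratio is simply width / height in degrees and that the tile has a non-zero height.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    nodes: int


def check_tile(tile, max_nodes):
    """Return the list of goals this tile misses (empty list = all met)."""
    width = tile.max_lon - tile.min_lon
    height = tile.max_lat - tile.min_lat
    problems = []
    # Assumption: aspect ratio = width / height, both measured in degrees.
    ratio = width / height
    if not 0.25 <= ratio <= 4:
        problems.append("aspect ratio")
    if tile.nodes < max_nodes / 3:
        problems.append("too few nodes")
    if width > 90 or height > 85:
        problems.append("too large")
    return problems


# max_nodes is just an example value here.
good = check_tile(Tile(10, 47, 12, 48, 900_000), 1_600_000)
bad = check_tile(Tile(0, 0, 100, 10, 100), 1_600_000)
```

The goal "the number of tiles is small" concerns the whole split and cannot be checked per tile, so it is left out here.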

It is not always possible to find a split that meets all these goals, especially not if you provide a bounding polygon.
A few users reported problems in mkgmap with the results of the new algorithm (higher memory needs,
smaller max-jobs parameter needed).
It is not yet clear whether these problems should be solved in splitter or in mkgmap.

+ problem-list handling:
Two new approaches have been implemented to solve the frequently reported problem of flooding:

These problems are caused by the split process. Splitter r202 simply divides multipolygon relations 
into parts that lie within one tile. Later, mkgmap has to guess how the original polygon was closed.
This guessing fails from time to time. The solution in r202 is to specify a large enough overlap value. 

Approach 1)
The new parameter --problem-file lets you specify a list of known problem relations and ways.
A list containing many problem cases can be found here:
To use such a file you have to specify --problem-file=<path to file>
A way or relation listed in this file is treated specially by splitter:
- ways:
 ++ if the way is closed (first and last node references are equal), splitter calculates
the bounding box of the way and writes the complete way to each tile that intersects
the bounding box (complete means with all referenced nodes that were found in the input file)
 ++ if the way is not closed, splitter calculates the tiles that are crossed by the way
and writes the complete data to those tiles

- relations:
A relation is written completely to all tiles that
 ++ contain one or more nodes listed as members of the relation
 ++ contain one or more nodes listed as members of the ways of the relation
 ++ are crossed by one or more ways of the relation
 ++ are enclosed by one or more ways of a type=multipolygon relation
Note that a relation with type=multipolygon is treated similarly to a single closed way: splitter calculates the
bounding box of each area enclosed by one or more ways building a closed polygon.
The complete relation is written to each tile that intersects any of the calculated bounding boxes.
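
To illustrate the bounding box rule for closed ways (and, by analogy, for multipolygon areas), here is a small Python sketch. The function names and the tile layout are invented for the example, and I assume that boxes which merely touch count as intersecting; the actual implementation in splitter may differ.

```python
def bbox(points):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of (lon, lat) points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats), max(lons), max(lats))


def intersects(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def tiles_for_closed_way(way_points, tiles):
    """Return the sorted ids of all tiles that would receive the complete way."""
    box = bbox(way_points)
    return sorted(tid for tid, tbox in tiles.items() if intersects(box, tbox))


# Four hypothetical tiles in a 2 x 2 grid.
tiles = {
    1: (0, 0, 10, 10),
    2: (10, 0, 20, 10),
    3: (0, 10, 10, 20),
    4: (10, 10, 20, 20),
}
# An L-shaped closed way that touches tiles 1, 2 and 3 only.
l_shape = [(2, 2), (12, 2), (12, 4), (4, 4), (4, 12), (2, 12), (2, 2)]
result = tiles_for_closed_way(l_shape, tiles)
```

In this example the way is also written to tile 4, although no part of it lies there, because only the bounding box is tested.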

The problem file still has some disadvantages:
- it is only up to date as long as someone maintains it
- the user has to verify the resulting map; if something is wrong, they have to find the id
of the way or relation that causes the problem, add it to the problem file and restart the
whole process of map creation. This can be very time consuming, and it is still likely that
the user will not find all broken polygons. The solution is

Approach 2)
the parameter --keep-complete,
which should be used instead of --problem-file.
With keep-complete, splitter reads the input file multiple times to detect those polygons
that are divided during the split process. Splitter thus creates the list of problem
cases itself and handles them exactly the same way as described above.
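
The basic idea of detecting divided ways can be sketched in a few lines of Python. This is a strong simplification and not the splitter code: all names are hypothetical, everything is held in memory instead of being read in several passes, and a way is only flagged when its nodes fall into more than one tile (a way crossing a tile without having a node in it is not considered).

```python
def tile_of(point, tiles):
    """Return the id of the first tile containing the point, or None."""
    lon, lat = point
    for tid, (min_lon, min_lat, max_lon, max_lat) in tiles.items():
        if min_lon <= lon < max_lon and min_lat <= lat < max_lat:
            return tid
    return None


def find_problem_ways(ways, nodes, tiles):
    """Return the sorted ids of ways whose nodes fall into more than one tile."""
    problems = []
    for way_id, node_ids in ways.items():
        # Node references missing from the input are skipped.
        hit = {tile_of(nodes[n], tiles) for n in node_ids if n in nodes}
        hit.discard(None)  # ignore nodes outside of all tiles
        if len(hit) > 1:
            problems.append(way_id)
    return sorted(problems)


# Two hypothetical tiles side by side, four nodes, two ways.
tiles = {1: (0, 0, 10, 10), 2: (10, 0, 20, 10)}
nodes = {1: (2, 2), 2: (8, 2), 3: (12, 2), 4: (8, 8)}
ways = {100: [1, 2, 4, 1], 200: [2, 3]}
problem_ways = find_problem_ways(ways, nodes, tiles)
```

Here way 100 lies completely in tile 1, while way 200 has nodes in both tiles and would end up in the generated problem list.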

Advantage of --keep-complete compared to --problem-file:
- no need to maintain a list of problem cases
Advantages of --keep-complete and --problem-file compared to a large --overlap value:
- makes sure that problem polygons are complete
 (of course only if the input file is complete)
- doesn't write a lot of "noise" like houses or roads which are in the overlap area,
but not at all related to the bounding box of the tile

Drawbacks of --keep-complete compared to --problem-file:
- Splitter is slower because it has to read the input file more often, and the
processing of all problem ways and relations requires additional heap memory.
On a 32-bit system, it is not possible to split the whole planet with --keep-complete,
because you need around 4 GB of heap to process all the problem cases.
On the other hand, on a 64-bit system with at least 8 GB you can split
the planet using e.g.
java -Xmx7000m -jar splitter.jar --max-areas=2048 --keep-complete --output=xml planet.o5m
(note that I use output=xml because both the o5m and pbf writers require too much heap
to write ~1500 areas in one pass. Each open *.o5m or *.pbf file requires more than 1 MB
for string tables and other stuff; the xml writer needs almost no fixed storage)

- For some tiles, unneeded data is written if they lie within the bounding box of a
huge multipolygon relation, but not within any of the polygons described by the relation.


> Date: Sun, 9 Dec 2012 12:49:31 -0800
> From: easyclasspage at googlemail.com
> To: mkgmap-dev at lists.mkgmap.org.uk
> Subject: Re: [mkgmap-dev] splitter r254
> Hi Gerd,
> release candidate ... sounds good after all the hard work.
> Is it possible for you to write a (short) summery concerning all changes and
> enhancements ?
> (I have to admit that I got lost in "all" the splitter threads.)
> Regards Klaus