Tile splitter for mkgmap
The format used for Garmin maps has, in effect, a maximum size, so an .osm file that contains large, well-mapped regions has to be split into a number of smaller tiles. This program does that. Processing happens in two stages. The first stage calculates what area each tile should cover, based on the distribution of nodes. The second stage writes out the nodes, ways and relations from the original .osm file into separate smaller .osm files, one for each area that was calculated in stage one.
For a full discussion see the news post[1], but the two most important features are:
- Variable sized tiles so that you don't get a large number of tiny files.
- Tiles join exactly with no overlap or gaps.
First
You will need a lot of memory on your computer if you intend to split a large area. You need about 8 to 10 bytes for every node and way. This doesn't sound like a lot, but there are about 400 million nodes in the whole planet file, so you cannot process the whole planet file on a 32 bit machine using this utility: the maximum Java heap space there is 2G. It is possible with 64 bit Java and about 4GB of heap.
The Europe extract from Cloudmade or Geofabrik can be processed within the 2G limit if you have sufficient memory. It takes about half an hour, which is three times longer than it takes mkgmap to compile the resulting files. With the default options Europe is split into about 110 tiles. The Europe extract is about a quarter the size of the complete planet file.
On the other hand a single country, even a well mapped one such as Germany or the UK, will be possible on a modest machine.
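As a rough check on the figures above, the heap requirement can be estimated from the element count. This is only a sketch based on the 8 to 10 bytes quoted here; the function name is made up for illustration, and the counts cover nodes only, so ways would add to the totals.

```python
# Rough heap estimate from the figures quoted above (8 to 10 bytes per node or way).
BYTES_PER_ELEMENT = (8, 10)  # low and high estimates taken from the text


def heap_estimate_gb(element_count):
    """Return (low, high) heap estimates in GB (10**9 bytes) for a number of nodes and ways."""
    low, high = BYTES_PER_ELEMENT
    return element_count * low / 1e9, element_count * high / 1e9


planet_nodes = 400_000_000        # node count quoted above; ways are not included
europe_nodes = planet_nodes // 4  # Europe is "about a quarter" of the planet

for name, count in (("planet", planet_nodes), ("europe", europe_nodes)):
    low, high = heap_estimate_gb(count)
    print(f"{name}: {low:.1f} to {high:.1f} GB")
```

The planet estimate comes out well above a 2G heap, while the Europe estimate fits inside it, which matches the limits described above.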
Download
Download from the splitter download directory
The source code is only available from Subversion, at http://svn.mkgmap.org.uk/splitter/trunk
Usage
Splitter requires Java 1.6. If you have less than 2G of memory on your computer, reduce the -Xmx argument. Run the following:
java -Xmx2000m -jar splitter.jar file.osm
This will produce a number of compressed .osm files that can be read by mkgmap. Two other files are also produced, called template.args and areas.list.
The template.args file can be used with the -c option of mkgmap to compile all the files. You can use it as is, or you can copy it and edit it to include your own options. For example instead of each description being "OSM Map" it could be "NW Scotland" as appropriate.
The areas.list file is the list of bounding boxes that were calculated. If you want, you can pass it to a subsequent call to the splitter with the --split-file option to use exactly the same areas as last time. This might be useful if you produce a map regularly and want to keep the tile areas the same from month to month. It also avoids the time it takes to regenerate the file each time (currently about a third of the overall time taken to perform the split). Of course, if the map grows enough that one of the tiles overflows, you will have to recalculate the areas.
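If you run the splitter from a script, a small helper can assemble the command line. The following is a minimal sketch rather than part of the splitter: the function name is hypothetical, and placing the options before the input file is an assumption. It only uses option names documented on this page.

```python
def splitter_command(input_file, heap_mb=2000, **options):
    """Build the argument list for a splitter run.

    Keyword names use underscores in place of hyphens, so
    split_file="areas.list" becomes --split-file=areas.list.
    Putting the options before the input file is an assumption.
    """
    command = ["java", f"-Xmx{heap_mb}m", "-jar", "splitter.jar"]
    for name, value in options.items():
        command.append(f"--{name.replace('_', '-')}={value}")
    command.append(input_file)
    return command


# Reuse the areas calculated by a previous run.
print(" ".join(splitter_command("file.osm", split_file="areas.list")))
```

The list form can be handed to subprocess.run unchanged, which avoids quoting problems with values such as a description containing spaces.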
You can also use a gzip or bz2 compressed .osm file as the input file. Note that this can slow down the splitter considerably (particularly for bz2) because decompressing the .osm file can take quite a lot of CPU power. If you are likely to process a file several times, you're probably better off uncompressing it separately first, or using the --cache parameter to create a disk cache that can be used in subsequent runs instead of the compressed .osm file.
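Uncompressing the file separately can be done with any standard tool. As one illustration, here is a short sketch using only the Python standard library; the function name and the file names in the usage comment are hypothetical.

```python
import bz2
import gzip
import shutil


def decompress(source, target):
    """Decompress a .gz or .bz2 file to target, streaming it in chunks
    so that the whole file is never held in memory."""
    if source.endswith(".gz"):
        opener = gzip.open
    elif source.endswith(".bz2"):
        opener = bz2.open
    else:
        raise ValueError(f"unrecognised compression suffix: {source}")
    with opener(source, "rb") as compressed, open(target, "wb") as plain:
        shutil.copyfileobj(compressed, plain)


# Example (hypothetical file names):
# decompress("europe.osm.bz2", "europe.osm")
```

The uncompressed file can then be given to the splitter as often as needed without paying the decompression cost on every run.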
Options
There are a number of options to fine tune things that you might want to try.
- --mapid=63240001
- Sets the filename of the first split file; the following files are numbered upwards from it. In the example the first file will be called 63240001.osm.gz and the next one will be 63240002.osm.gz and so on.
- "--description=OSM Map"
- Sets the description to be written into the template.args file.
- --max-nodes=1600000
- The maximum number of nodes that can be in any of the resultant files. The default is fairly conservative; I think you could increase it quite a lot before getting any 'map too big' messages, but I've not experimented much. Note also that the bigger this value, the more memory is required during the splitting stage.
- --max-areas=255
- The maximum number of areas that can be processed in a single pass during the second stage of processing. This must be a number from 1 to 255. Higher numbers mean fewer passes over the source file and hence quicker overall processing, but also require more memory. If you find you are running out of memory but don't want to reduce your --max-nodes value, try reducing this instead. Changing this will have no effect on the result of the split; it's purely to let you trade off memory for performance. Note that the first stage of the processing has a fixed memory overhead regardless of what this is set to, so if you are running out of memory before the areas.list file is generated, you need to either increase your -Xmx value or reduce the size of the .osm file you're trying to split.
- --overlap=2000
- The splitter includes nodes outside the bounding box, so that mkgmap can neatly crop exactly at the border. This parameter controls the size of that overlap. It is in map units; the default of 2000 is about 0.04 degrees of latitude or longitude.
- --split-file=areas.list
- Use the previously calculated tile areas instead of calculating them from scratch.
- --cache=<directory>
- This parameter is optional and won't affect the outcome of the split, but it can reduce the time it takes for the split to complete. Setting this parameter will cause the splitter to generate several files in the specified directory during the first stage of the split. These files contain the same information as the source .osm file(s), but in an optimised format that allows subsequent passes over the data to happen much more quickly. The more passes that happen in the second stage of the split, the greater the speedup you will see.
- Some benchmarks have shown the following speed improvements when running against uncompressed .osm files:
- 1 pass - 5% faster
- 2 passes - 25%
- 3 passes - 35%
- 4 passes - 40%
- 5 passes - 45%
- If you are using compressed .osm files (bz2 compression especially), the speed improvement should be greater still.
- Note that these figures are very approximate; the actual performance will vary depending on your disk and CPU speed, the particular map being processed, and what other disk and CPU activity is taking place on your PC at the same time. In some cases you might find that splits that only require a single pass will run faster without the disk cache enabled.
- The disk cache can be used across multiple runs of the splitter, as long as you are splitting the same .osm file(s) each time. For example, suppose you ran the splitter with --cache=. --max-nodes=1500000, but found that this max-nodes value was too high when running mkgmap. You can run the splitter again, this time with --cache=. --max-nodes=1200000, and the existing cache files will be used to perform the split. This will be much faster than reprocessing the .osm file. When you reuse a cache like this, there is no need to specify the .osm file on the command line, as all the required data will be read directly from the cache. Remember to delete the cache files if you want to rerun the splitter with a different .osm file.
- Note that the disk cache can require a lot of disk space, typically about 20-25% of the space the uncompressed .osm file takes up. For example the 27GB europe.osm file generates a cache of just over 5GB.
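To get a feel for the --overlap values described above, map units can be converted to degrees. This sketch assumes that one map unit is 360 / 2^24 degrees; the page itself does not state that, but it agrees with the "about 0.04 degrees" quoted for the default. The function name is made up for illustration.

```python
# Assumption: one map unit is 360 / 2**24 degrees (24-bit coordinates).
DEGREES_PER_MAP_UNIT = 360 / 2**24


def overlap_in_degrees(map_units):
    """Convert an --overlap value in map units to degrees of latitude or longitude."""
    return map_units * DEGREES_PER_MAP_UNIT


print(f"{overlap_in_degrees(2000):.4f}")  # prints 0.0429
```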
Notes
- There is no longer an upper limit on the number of areas that can be output (previously it was 255). More areas just mean potentially more passes being required over the .osm file, and hence the splitter will take longer to run.
- There is no longer a limit on how many areas a way or relation can belong to (previously it was 4). There IS still a limit of 4 areas per node. If you do hit this limit, a warning will be output and the node (and any associated ways or relations) will only be written to 4 of the areas. In practice it is unlikely you will hit this limit unless you choose small values for --max-nodes (300000 or so). Please let us know if this is a genuine problem for you. It is possible to change the splitter so it can handle this, though it does have an associated memory overhead.
- It is not possible to work on a complete planet file on a 32 bit OS, but working with an extract of the whole of Europe is fine within a 2G heap size. A single country is not a problem at all, even a well mapped one.
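The relationship between the number of areas, --max-areas and the number of passes can be sketched as below. This assumes the splitter simply processes up to --max-areas areas in each pass, which is how the option is described above but is not taken from the source code; the function name is hypothetical.

```python
def passes_needed(area_count, max_areas=255):
    """Estimate the number of second-stage passes over the source file.

    Assumes each pass handles up to max_areas areas.
    """
    if not 1 <= max_areas <= 255:
        raise ValueError("--max-areas must be a number from 1 to 255")
    # Integer ceiling division: round up to a whole number of passes.
    return -(-area_count // max_areas)


print(passes_needed(110))                # Europe at about 110 tiles: 1 pass
print(passes_needed(110, max_areas=50))  # less memory per pass: 3 passes
print(passes_needed(600))                # more than 255 areas: 3 passes
```

Lowering --max-areas trades memory for extra passes, and extra passes are where the disk cache gives the largest benefit.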