logo separator

[mkgmap-dev] Problem with splitter

From Gerd Petermann gpetermann_muenchen at hotmail.com on Sun Mar 18 08:59:11 GMT 2012

Hi Richard,

generally I think an OSM element is "touching" a tile when it would change 
the output of mkgmap if we simply remove it.
So, splittter should not remove something that "touches" a tile without
adding something else that allows to produce the same result.
Maybe it would be better to say "has influence on" instead of "touches".

I have to find out how the algorithms that are triggered with parameters 
like --add-pois-to-lines or --location-autofill=nearest are affected.
Imagine a point near the edge of a tile (inside), and a node X that marks a city 
which lies outside of the tiles bounding box, but nearer than any city inside the
bounding box. I'd say that node X touches the tile, but I fear that splitter will 
not be able to detect that in a reasonable processing time.

One more definition: An OSM element is a multi-tile-element if it is touching 
more than one tile.

See further comments below...
 
> Date: Sat, 17 Mar 2012 19:03:02 -0400
> From: rhansen at bbn.com
> To: mkgmap-dev at lists.mkgmap.org.uk
> Subject: Re: [mkgmap-dev] Problem with splitter
> 
> On 2012-03-17 14:31, Gerd Petermann wrote:
> >>> a) for each coord, way: find out all tiles that are "touched",
> >>> save the infoin a map
> >>
> >> What is the definition of "touched"?
> >
> > A point normally lies in exactly one tile.  Nevertheless a point can
> > be written to more tiles because of the overlap handling.  With
> > "touched" I mean the point is contained in the bounding box of the
> > tile with its overlap.
> 
> If I understand your definition correctly, in the following ASCII art 
> figure, the way is not touching the tile because none of the nodes are 
> in the tile's extended bounding box.  Correct?

Sorry, I did  not define what "touching" means for ways. I think a way 
touches the tile if it 
- has at least one point in it or
- any line of the pass passes through the tile or
- the way is closed and encloses the tile

> 
>      o-----------o
>      |  _______  |
>      | |       | |
>      | |       | |
>      | |_______| |
>      o-----------o
> 
> If the way designates a lake, wouldn't this definition result in a hole 
> in the lake?
> 
> Also, what if a way passes through a tile but none of that way's nodes 
> are in the tile?  Does the way touch the tile?  An illustration of this 
> scenario:
> 
>          _______
>         |       |
>      o--|-------|--o
>         |_______|
> 
> >>> b) for each relation, find out all tiles that are "touched", and
> >>> add this information to the way map (so that each way belonging
> >>> to a relation will be written to all tiles that is touched by the
> >>> relation)
> >>
> >> Same question: What does it mean for a relation to touch a tile?
> >

see above

> > This is something I want to find out with this discussion.
> > Splitter r200 simply does this:
> > If a way has at least one point in a tile, then the way is written
> > to that tile.
> > If a relation has at least one point or a way that is written to a
> > tile, then the relation is also written to that tile.
> > I think a correct solution should additionally write the relation to
> > all tiles that are fully enclosed.
> 
> Couldn't a similar argument be made for closed ways?  (A way should be 
> written to a tile if the tile is fully enclosed by the closed way.)

yes, sure.

> 
> >> Suppose I feed all of North America to splitter. Wouldn't this
> >> algorithm write all of the nodes, ways, and relations that compose the
> >> United States administrative boundary to every tile in the US?
> >
> > Yes, without a clever filter algorithm this could be the case.  I
> > still hope that we can find an algorithm that writes all that is
> > needed, but not more.  With needed I mean all information that mkgmap
> > needs to correctly handle the ways and relations.
> 
> For a tile in the middle of the United States, mkgmap doesn't need all 
> of the ways and nodes that compose the complete United States 
> administrative boundary.  It only needs to know that the tile is 
> completely contained within relation 148838 and therefore all of that 
> relation's tags apply to the tile's entire area.

yes, that would be sufficient. 

> 
> If relation 148838 was included in the tile but none of its members were 
> included, mkgmap would not know what part(s) of the tile were covered by 
> the relation.  That information would have to be communicated somehow.

For that case I would like to write the relation id, its OSM tags plus one 
special tag like 

mkgmap:covers_tile=y

and nothing else (assuming that no point or way or sub-relation "touches" the tile)


> 
> I see two ways to obtain complete data in a tile:
> 
>    1. Splitter simply acts as a filter, choosing which objects to 
> include in a tile.  In this case, to have complete information, every 
> tile in the middle of the United States would have to include all of the 
> objects that compose the complete United States administrative boundary.
>    2. Splitter somehow communicates extra data to mkgmap.  The extra 
> data indicates which parts of the tile are covered by an incomplete way 
> or relation.
> 
> I don't think option #1 will work; each tile would contain too many objects.

yes, my test results for a rather small file like italy did already show 50% 
more data in the output, so that doesn't seem to be a good idea.

> 
> I don't know enough about the OSM data format to know if option #2 is 
> palatable.  Creating fictitious OSM objects is not a good idea.  Perhaps 
> each tile's .pbf file can be accompanied by an .aux file indicating what 
> parts of the tile are covered by each incomplete way or relation 
> mentioned in the .pbf file.  Handling the case where part of an area's 
> border passes through a tile would be a bit tricky but doable.

Well, I already suggested one additional file containing all multi-tile-elements
(all nodes of all ways that are touching multiple tiles, plus all multi-tile-relations 
with all referenced nodes and ways). Thorsten did not like that idea, see
http://gis.19327.n5.nabble.com/Problem-with-splitter-tp5555886p5560250.html

I don't know if he would also complain about one *.aux file for each *.pbf ?


> 
> Here's my attempt at an algorithm.  It's conceptual; I don't attempt to 
> minimize memory usage for example.

Well, I hope that memory is not a problem if we allow one more pass. 
Splitter requires most of its 
heap to store the information to which files a node has to be written.
This information is no longer needed (better: can be calculated again) 
when we know which ways and relations are multi-tile-elements. 
So, when this information is calculated, we can read again the ways
(and only the ways) to collect all needed node ids, and finally read again 
the nodes (and only the nodes) to do the needed calculations with those 
nodes that are relevant for the multi-tile-elements.

> 
> 1. calculate tile areas by only looking at nodes
> 2. for each object:
>     a. if the object is a node, write the node to the appropriate tile
>     b. if the object is a way:
>        i. for each node in the way, write the way to the tile containing 
> the node (unless already written)
>        ii. for each line segment in the way:
>            A. if the segment touches more than one tile, write the way 
> and the two nodes on either end of the line segment to each tile touched 
> by the line segment (unless already written)
>        iii. if the way intersects itself one or more times, determine 
> the areas defined by the way
>        iv. for each area defined by the way:
>            A. if the area's size > 0 and the area intersects more than 
> one tile, write the way to each intersecting tile (unless already 
> written).  also, for each intersecting tile, write aux data indicating 
> which parts of the tile are covered by the area.
>     c. if the object is a relation:
>        i. if one of the relation's members was written to a tile, write 
> the relation to the same tile (unless already written)
>        ii. determine the areas defined by the relation (if any)
>        iii. for each area defined by the relation:
>            A. if the area's size > 0 and the area intersects more than 
> one tile, write the relation to each intersecting tile (unless already 
> written).  also, for each intersecting tile, write aux data indicating 
> which parts of the tile are covered by the relation.
> 

> Writing aux data indicating which parts of the tile are covered by the 
> way/relation:
> 
> 1. if the entire tile is covered by the way/relation, write something 
> that means "way/relation XYZ covers the entire tile" to the tile's aux file
> 2. otherwise:
>     a. determine the area(s) that result from calculating the 
> intersection of the tile area and the way/relation area.
>     b. for each area calculated above:
>        i. if the area does not touch the tile bounding box, skip it 
> (mkgmap will have enough information in the tile to derive it on its own)
>        ii. if the area's size = 0, skip it
>        iii. pick an arbitrary point inside the area, call it P
>        iv. the area's border is defined by the tile's bounding box and 
> one or more ways.  determine those ways (e.g., foo, bar, baz)
>        v. write something that means "way/relation XYZ covers the part 
> of the tile containing point P whose border is defined by ways foo, bar, 
> and baz and the tile bounding box" to the tile's aux file.  Or maybe 
> just write a sequence of points that define the area's border (whatever 
> is easiest for splitter and mkgmap).

> Thoughts?

I think this could work, but it seems to require a lot of coding both in 
splitter and mkgmap to create and interpret the aux file. 

My proposal for an algorithm to select required nodes for one tile looks like this:

For all closed multi-tile-polygons: calculate the bounding box of the polygon.
If the bounding box of the tile intersects with the bounding box of the polygon:
a1) mark all nodes of the polygon that lie within the tile (marking means: set a flag that connects 

the node id to the tile-writer id). If none was found, check if polygon encloses the tile
and continue with next polygon
a2) iterate over all nodes: mark each node that is next to a node which was 
already marked above. Maybe mark all remaining nodes if there are just a few 
more.
b) if all nodes are marked: we're done: continue with next multi-tile-polygon
else connect the marked nodes (in the same order as in the original polygon)
If the intersection of the reduced polygon is equal to the intersection of the 
complete polygon, we're done: continue with next multi-tile-polygon
We need an efficient algorithm for this comparison, 
we probably don't have to calculate the intersection of areas, we just have 
to make sure that none of the shortcuts crosses the tile and that the 
direction of the polygon remains the same (clockwise/counterclockwise)
c) mark further node(s):
c1) if the new connection of two marked nodes crosses the tile: 
  mark also the midpoint of the tow nodes
c2) if no additional crossings were detected, but the direction of the new polygon is different,
mark the first unmarked point (I don't see a way to select a good candidate, so the first one is 
the easiest)
continue with b)

(I have to make sure that this will not add a polygon which doesn't intersect with the tile)

When all multi-tile-polygon are processed (for all tiles) we can start writing the 
tiles.
For each node: if the node-id was marked in the above process, send it to the calculated writers
For each way or relation: if it is not a multi-tile-element, send it completely to its writer,else send a modified 
version to each writer, so that the way or relation doesn't reference members that were not send to the writer.
For the tile-enclosing polygons: add the mkgmap:covers_tile=y tag when the relation is written to 
the enclosed tile.

I see no open problem besides runtime. The memory requirent should be only a few percent higher 
compared to r200. Maybe the new algorithm will allow to significantly 
reduce the overlap, but the additional passes require at least one additional reading of the 
input file (for *.pbf I've already implemented a method that allows to skip blocks that are not needed for 
a given pass, but with *.osm format we'll require three more passes or we need some caching 
or we don't allow the new algorithm with osm format.

Gerd




> 
> -Richard
> 
> > See also Wolfgangs suggestions:
> > http://gis.19327.n5.nabble.com/Problem-with-splitter-tp5555886p5573307.html
> >
> > I personally also don't like the idea of inventing points or ways, so
> > I'd prefer a
> > solution that simply doesn't write what isn't needed.
> >
> > Gerd
> >
> >  > -Richard
> >  > _______________________________________________
> >  > mkgmap-dev mailing list
> >  > mkgmap-dev at lists.mkgmap.org.uk
> >  > http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> >
> >
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev at lists.mkgmap.org.uk
> > http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> 
> _______________________________________________
> mkgmap-dev mailing list
> mkgmap-dev at lists.mkgmap.org.uk
> http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
 		 	   		  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20120318/9a491ec5/attachment.html 


More information about the mkgmap-dev mailing list