logo separator

[mkgmap-dev] tile takes very long time to generate

From Gerd Petermann gpetermann_muenchen at hotmail.com on Thu Apr 29 09:31:13 BST 2021

Hi Ticker,

I did a few tests with a tile in Zambia that contains some complex multipolygons and nestings. No
1) disable partitioning
2) disable partitioning and skip isLineInShape() test if center of geometry of the inner is IN for inner AND outer
3) use code in branch as is (enabled partitioning)
4) enable partitioning  and skip isLineInShape() test if center of geometry of the inner is IN for inner AND outer

Times for the compilation of the tile are
39 secs for 1)
32 secs for 2)
4 secs for 3) and 4)

With 3) and 4) I see only ~2 secs for the "processing MP relation" part, so it would require more complex relations to see improvements with 4)

If you know a quick and stable and simple method to find a point that is IN please let me know and I do some more testing. The getCofG() only works well for convex polygons. For around 3 percent of the polygons in my tests getCofG() returns a point outside of the polygon, and those polygons are likely more complex.

I've also done some tests with PrecompSeaGenerator and the complex tile sea_2752512_851968.osm.pbf
Results are more or less equal for all 4 methods, the processing of the generated MP takes ~20 secs, only ~2 secs are used in createContainsMatrix() and ~18 secs are used in MultipolygonCutter

At the moment I try to decide what to do with incomplete data. I see three different situations with incomplete relations:
1) BoundaryPreprocessor is used with a country extract or maybe the combination of multiple extracts from geofabrik or other sources.
In this case I tend to think that we can either ignore incomplete data completely or we might use the *.poly file(s) that were used to cut the data to "close the boundary polygon along the cutting polygon". The latter might help to avoid situations where the LocationHook returns no info for points outside of the cutting polygon.
2) mkgmap is used with splitter data and splitter was used with --keep-complete=false
In this case lots of multipolygons are incomplete, esp. those near the tile boundary. We MUST do some guessing about the missing data, else the maps are more or less unusable because large wood or lake areas wood be missing.
3) mkgmap is used with splitter data and splitter was used with --keep-complete=true
In this case we see incomplete MP when the country extract was already incomplete or with those MP that splitter doesn't keep complete.
It's hard to say if we can ignore those incomplete MP or not. Most of them are boundaries. The default style doesn't render any boundary as polygon, but other styles do, e.g. the OpenFietsMap styles.

Maybe a simple new option "--ignore-incomplete-data" could improve performance. This would be easy to implement. A much more complex solution would be a new hook that is called directly before the processElements() call. It would use a new rule file similar to relations and could set special tags to tell mkgmap if it should calculate the areas of the multipolygon or just handle the outlines to produce the mkgmap:stylefilter=polyline ways.
Just an idea so far.

The current code only tries to use the "tile boundary" to complete data. I think it would be better to use the cutting polygon. I started to experiment with that but found out that many *.poly files from geofabrik intersect with the country borders which are expected to be fully inside. I contacted them about this and Frederick Ramm confirmed that they want to fix that.

ciao,
Gerd

________________________________________
Von: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> im Auftrag von Ticker Berkin <rwb-mkgmap at jagit.co.uk>
Gesendet: Dienstag, 27. April 2021 18:09
An: Development list for mkgmap
Betreff: Re: [mkgmap-dev] tile takes very long time to generate

Hi Gerd

Point IN and area should be adequate for determining enclosed/enclosing
polygons relationships. Overlapping / intersecting is an error anyway,
but, even with badly/arbitrarily closed boundaries, provided the
closing takes a similar route around the known mapArea, the area shouldsuffice.

Ticker


On Tue, 2021-04-27 at 14:23 +0000, Gerd Petermann wrote:
> Hi Ticker,
>
> I don't care much about invalid MP so far. It is not allowed that
> inner ways and outer ways are connected, nor is it allowed that outer
> rings share segments.
> So, at the moment that means Garbage in -> Garbage out.
> There are some special cases more than two outer ways are connected
> at the same node and there should be a back-tracking algo to combine
> them in a ways that doesn't produce self intersecting rings. No idea
> how to implement that.
>
> reg. containsMatrix: I did not find any problems with nested polygons
> so far, but yes, the partitioning is more likely to produce more edge
> cases.
>
> I don't see how a single call of isPointInShape() can be enough. It
> only helps if it returns OUT, but that is unlikely. If it returns IN
> the polygons may still be overlapping. I've not yet decided what to
> do with those MP because the real MP with complete data might be
> correct and the overlapping could be a result of artifically closed
> rings.
>
> I've just noticed that the branch is sometimes erronously connecting
> ways which trunk doesn't connect (only with BoundaryRelation).
>
> I think it will take a one or two more weeks before this branch is
> getting stable.
>
> Gerd
>
> ________________________________________
> Von: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> im Auftrag
> von Ticker Berkin <rwb-mkgmap at jagit.co.uk>
> Gesendet: Dienstag, 27. April 2021 15:37
> An: Development list for mkgmap
> Betreff: Re: [mkgmap-dev] tile takes very long time to generate
>
> Hi Gerd
>
> I'd have thought using isPointInShape() in combination with polygon
> areas to be more efficient and robust than isLineInShape() (which
> makes
> multiple calls to isPointInShape). Would need to choose a reference
> point within each polygon, ensuring it is IN but not ON itself.
>
> Probably splitShape(), isPointInShape() and
> calcAreaSize()/calcAreaSizeTestVal() all have very similar costs;
> linear on number of points.
>
> I understand some benefits of splitting a massive outer into
> multiples
> polygons, in that a point-IN question might give an quicker answer,
> but
> it must introduce a lot of other difficulties with edge cases on the
> artificial boundaries, etc, etc. Splitting inners shouldn't cause a
> probem in this respect (just a list of inners), but if there is
> another
> outer in the hole, this will become confusing.
>
> A couple of other things I meant to mention following earlier faster
> -mp
> svn updates:
>
> I found cases, as both inner and outer, where adjacent rectangles (eg
> buildings) are touched together, with either a duplicate or identical
> member being the dividing wall. I realise that some combinations of
> this violate OSM MP rules, but, keeping the duplicate, allows this
> simple "double-touching" structure to be resolved; where there are 4
> ways at a node, don't join 2 that are identical or have same other
> -end
> and no other points.
>
> Joining ways if possible in the initial element iteration and
> ignoring
> roles can lead to problems when inner and outer touch. Whereas saving
> them all in a Map of both ends, before attempting to join, will avoid
> the problem for single-touching. For double-touching, any roles can
> be
> used to join the correct parts. For ordering stability if using a
> HashMap, could just keep track of lowest element# in joinedWays and
> always keep this first.
>
> Ticker
>
>
> On Mon, 2021-04-26 at 14:17 +0000, Gerd Petermann wrote:
> > Hi Ticker,
> >
> > I think I am making progress with my divide&conquer approach. I see
> > ~20 secs instead of more than 7 minutes to process the monster
> > relation r9488835.
> > The basic approach is to use ShapeSplitter.splitShape() multiple
> > times to divide the multipolygon into smaller pieces where no ring
> > has more than 20.000 nodes and executing the block
> >
> > analyseRelationRoles()
> > ...
> > doReporting()
> > for each piece.
> >
> > The work is not yet done, the output file is a bit larger, not yet
> > sure why, and the calculation of the "largestOuterPolygon" neads a
> > bit more work.
> > There are probably edge cases where this doesn't work, esp. when
> > inners are very large.
> >
> > BTW: I found a bug in IsInUtil.isLineInShape() : It sometimes
> > returns
> > ON when it should return IN/ON
> > I think it happens when a ring has start /end node ON and also 2nd
> > or
> > 2nd-last node ON.
> > Probably a special case created by the ShapeSplitter. Attached
> > patch
> > fixes the problem in IsInUtil.
> >
> > Gerd
> >
> > ________________________________________
> > Von: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> im Auftrag
> > von Ticker Berkin <rwb-mkgmap at jagit.co.uk>
> > Gesendet: Mittwoch, 7. April 2021 13:02
> > An: Development list for mkgmap
> > Betreff: Re: [mkgmap-dev] tile takes very long time to generate
> >
> > Hi Gerd
> >
> > Problem is that IdentityHashMap is the ideal solution, but ordering
> > behaviour depends on internal java object handles. A solution to
> > this
> > stability issue would be to rotate joinedWays.originalWays so, say,
> > the
> > one with the lowest ID is first, doing this just before the full
> > point
> > -list is created.
> >
> > Maybe not worth it if existing logic is not a problem.
> >
> > Some of the other advantages my logic:
> >
> >   Clearer and more efficient.
> >
> >   Elements are scanned once; cOg and closed polygons are processed
> >   immediately. Elements from SeaGenerator are all closed.
> >
> >   Single touching polygons are handled without difficulty;
> >   just part of the way the algo is expressed.
> >
> >   Maintaining jw.inner/outerCount gives clarity to this issue.
> >   Can be used for error reporting and behavioral definition.
> >   If the roles are not to be trusted, the counts can be adjusted
> > when
> >   the geometry is determined.
> >
> > Ticker
> >
> >
> > On Wed, 2021-04-07 at 08:57 +0000, Gerd Petermann wrote:
> > > Hi Ticker,
> > >
> > > reg. mpInitialLogic.patch: I think I tried the same approach to
> > > join
> > > the ways in the past. Problem is that the patched code produces
> > > rings
> > > with an unpredictable start/end and thus random content in RGN.
> > > This
> > > makes comparisons of two mkgmap outputs much more difficult if
> > > not
> > > impossable.
> > >
> > > The existing algo works quite well if the members are sorted and
> > > most
> > > complex MP are sorted well enough. I think in the worst case
> > > (badly
> > > shuffled member list) the algo complexity is probably O(n²) , in
> > > the
> > > best case it is O(n) where n gives the number of unclosed ways in
> > > the
> > > member list.
> > >
> > > Maybe you find a way to use the HashMap to avoid the sequential
> > > search for the next member without adding too much complexity to
> > > the
> > > current code?
> > > Or a simple sort to avoid the worst case?
> > > Even with worse cases I don't see more than a few seconds for
> > > very
> > > complex MP.
> > >
> > > Gerd
> >
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev at lists.mkgmap.org.uk
> > http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> > _______________________________________________
> > mkgmap-dev mailing list
> > mkgmap-dev at lists.mkgmap.org.uk
> > https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> _______________________________________________
> mkgmap-dev mailing list
> mkgmap-dev at lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
> _______________________________________________
> mkgmap-dev mailing list
> mkgmap-dev at lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
_______________________________________________
mkgmap-dev mailing list
mkgmap-dev at lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev


More information about the mkgmap-dev mailing list