
[mkgmap-dev] Address search and index.

From WanMil wmgcnfg at web.de on Mon Feb 14 18:09:36 GMT 2011

On 14.02.2011 13:20, Steve Hosgood wrote:
> On 14/02/11 10:21, Steve Ratcliffe wrote:
>>
>>> If there are faults in the data, they should be fixed.
>> Well I get England, Great Britain, Great Britian, and United Kingdom.
>>
>> One is a spelling mistake and so, fair enough, should be fixed and there
>> is not going to be any argument from anyone about that.
>>
>> But the others are not wrong and might be fine in other situations.
>>
>> A few examples from England:
>>
>> 	k='is_in' v='Nantwich, Cheshire, England, United Kingdom'
>> 	k='is_in' v='UK, England, County Durham, Teesdale'
>> 	k='is_in' v='England, Essex'
>>
>
> It's really an issue to be debated at OSM level, not mkgmap level, but I
> have considered for quite a while now that the "is_in" tag should be
> entirely deprecated in favour of a concept of boundary polygons. "Is_in"
> is fraught with problems, some illustrated above.
>
> The country of England should have a polygon around its border with the
> tag "country=England" and therefore could also be tagged with
> "country:cy=Lloegr" (and other language specifics). Any item within that
> polygon would automatically be considered "is_in=England" (and Lloegr to
> allow for searching in Welsh). Within the England polygon would be a set
> of non-intersecting polygons each tagged "county=<wherever>" and again
> some of them might have foreign language variants - which would be fine.
> Any item within those polygons would automatically be considered
> "is_in=Cumbria" or wherever (plus foreign-language variants of county
> names where they exist). The nesting would continue right down to hamlets.
>
> Notice how quickly you could fix a mistake - no need to trawl through
> millions of "is_in" tags looking for inconsistencies, spelling mistakes
> etc. Just fix the correct enclosing polygon.
>
> There is a war memorial in England (not far from London) which is
> officially "in" the USA! I think this is called an "enclave"... but even
> an enclave situation could be handled by polygons. Why should there not
> be a polygon around that site claiming "country=United States"??
>
> The interesting question from a polygon-parsing point of view is whether
> you'd need to establish a fixed hierarchy of tags, each with an assumed
> "level", so that if you encountered "country=" within a "county="
> polygon, the "county=" would be forgotten about within the inner
> "country=" polygon. This would make sense - if that war memorial was
> "in" the USA, it can hardly also be "in" Middlesex (or wherever) which
> is a county of England. But see below for an alternative system...
>
> Back to "England" again. If England is a "country", what is the UK
> deemed to be? In some ways the "country" should be "UK", and "England",
> "Wales", "Scotland" and "Northern Ireland" are "states". But that's not
> how any resident of the UK would see it. It's just down to semantics.
>
> You could get rid of this "what's it called" argument by requiring the
> outer polygon to be tagged "political:level=1"
> "political:designation=country" "name=United Kingdom" "alt-name=UK".
> Further in you'd get a polygon tagged "political:level=5"
> "political:designation=county" "name=Kent" and eventually (inside that)
> maybe a polygon tagged "political:level=6" "political:designation=town"
> "name=Canterbury". This removes the need for a fixed known hierarchy of
> polygons (mentioned above when discussing the handling of enclaves) - if
> you encountered a polygon with a given "political:level=" it would,
> within its bounds, cancel any supposed enclosing polygon with an equal
> or higher numerical "political:level=" tag.
>
> It also allows an easy way to handle the cases where some cities in the
> UK are considered to be counties in their own right (Swansea for
> instance - it is inside a polygon of "political:level=5"
> "political:designation=county" "name=South Glamorganshire" but the city
> itself could be polygonned as "political:level=5"
> "political:designation=city and county" "name=Swansea" "name:cy=Abertawe").
>
> In order to implement anything like this, we'd need a way to work out
> quickly for any point on a map what the enclosing hierarchy of polygons
> would be. That in turn probably means having to implement knowledge of
> it in the splitter so that a small part of the planetary map still has
> intact (but truncated) nested polygons.
>
> PS: I guess "Great Britain" would be handled by a polygon around just
> the right-hand island of the British Isles (and its sub-islands)
> claiming "geographical:level=1" "name=Great Britain"
> "name:de=Großbritannien" "name:fr=Grande-Bretagne" etc.
>
> Mkgmap would probably not be interested in "geographical:*" tags.
>

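Steve's nesting rule and the "political:level=" cancellation can be
sketched as a point lookup. This is a minimal Python sketch under
several assumptions: polygons are simple rings without holes,
coordinates are plain x/y pairs, nesting order is inferred from polygon
area (a larger polygon is assumed to enclose a smaller one), and the
`Boundary` class and all function names are hypothetical, not part of
mkgmap.

```python
from dataclasses import dataclass

# Hypothetical boundary record: a simple ring (no holes) plus the
# "political:level=" value proposed above.
@dataclass
class Boundary:
    name: str
    level: int
    ring: list[tuple[float, float]]

def contains(ring, x, y):
    """Even-odd ray casting: True if (x, y) lies inside the ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x where this edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def area(ring):
    """Absolute shoelace area, used to order polygons outermost first."""
    s = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

def enclosing_hierarchy(boundaries, x, y):
    """Names of the boundaries that apply at (x, y), outermost first.

    Walking inwards, each polygon cancels every already-accepted polygon
    whose level is equal to or higher than its own.
    """
    hits = sorted((b for b in boundaries if contains(b.ring, x, y)),
                  key=lambda b: area(b.ring), reverse=True)
    result = []
    for b in hits:
        result = [a for a in result if a.level < b.level]
        result.append(b)
    return [b.name for b in result]

def square(x0, y0, size):
    return [(x0, y0), (x0 + size, y0), (x0 + size, y0 + size), (x0, y0 + size)]

# Made-up geometry: nested UK / England / Kent, plus a tiny enclave.
boundaries = [
    Boundary("United Kingdom", 1, square(0, 0, 100)),
    Boundary("England", 2, square(10, 10, 60)),
    Boundary("Kent", 5, square(20, 20, 20)),
    Boundary("United States", 1, square(25, 25, 2)),
]

print(enclosing_hierarchy(boundaries, 30, 30))  # ['United Kingdom', 'England', 'Kent']
print(enclosing_hierarchy(boundaries, 26, 26))  # ['United States']
```

The Swansea case follows from the same rule: a level-5 "city and
county" polygon inside a level-5 county cancels that county within its
bounds, with no fixed list of designations needed.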
Steve, that was my first idea and I think that's the only solution to 
the problem. But it's not easy to implement.

I propose two solutions.
1. Quick fix
Use the Locator.xml to merge different notations of the same country / 
region name. This won't be perfect but will probably fix the most 
obvious problems for a first release of the index branch.

For this purpose we need a good source from which to create an initial, 
complete version of Locator.xml. We could use osmosis to filter a planet 
dump for the boundary multipolygons (or boundary tags in general). If 
this data is too incomplete, there is also a good list in the English 
Wikipedia (http://en.wikipedia.org/wiki/ISO_3166-2). The German 
Wikipedia also contains templates with worldwide regional information 
(http://de.wikipedia.org/wiki/Vorlage:Info_ISO-3166-2:GB). 
Would anyone like to extract that data?
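
Merging notations essentially means mapping each known variant to one 
canonical name. The sketch below assumes a hypothetical alias table (in 
mkgmap this information would come from Locator.xml, whose actual format 
is not reproduced here) and assumes that country names appear as 
comma-separated parts of "is_in", as in Steve Ratcliffe's examples. 
Treating "Great Britain" as a variant of "United Kingdom" is also an 
assumption made only for illustration.

```python
# Hypothetical alias table; keys are lower-cased variants.
COUNTRY_ALIASES = {
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
    "great britain": "United Kingdom",
    "great britian": "United Kingdom",  # misspelling seen in is_in tags
}

def country_from_is_in(value):
    """Return the canonical country named in an is_in value, or None.

    The value is split on commas; the first part matching an alias wins.
    """
    for part in value.split(","):
        canonical = COUNTRY_ALIASES.get(part.strip().lower())
        if canonical is not None:
            return canonical
    return None

print(country_from_is_in("Nantwich, Cheshire, England, United Kingdom"))  # United Kingdom
print(country_from_is_in("UK, England, County Durham, Teesdale"))         # United Kingdom
print(country_from_is_in("England, Essex"))                               # None
```

The last value yields nothing because "England" is deliberately left 
unmapped: whether it should become "United Kingdom" is exactly the 
naming question discussed above.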

2. OSM boundary data (the general solution)
Using the OSM boundary data sounds great, but there are some pitfalls 
we need to work around. I'll list them here; maybe someone can find an 
easy solution.

1st problem: Splitter (as you already mentioned)
The tiles do not contain the full information for multipolygons that 
exceed the tile bounds. I don't think this will be easy to fix. You 
would need to implement complete multipolygon handling in splitter to 
decide which additional data must be added to each tile. That's a big 
job and will consume a lot of resources.
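
To illustrate the per-tile decision splitter would have to make, here 
is a deliberately simplified sketch. It assumes the boundary has already 
been assembled into a single ring (the expensive part described above) 
and uses a bounding-box overlap test, which is conservative: it may 
select tiles that the polygon itself does not touch. All names and 
coordinates are hypothetical.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other):
        """True if the two boxes overlap (touching edges count)."""
        return (self.min_x <= other.max_x and other.min_x <= self.max_x and
                self.min_y <= other.max_y and other.min_y <= self.max_y)

def bbox_of(ring):
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return BBox(min(xs), min(ys), max(xs), max(ys))

def tiles_needing(ring, tiles):
    """Ids of tiles whose area overlaps the boundary's bounding box;
    each of them would need a (possibly clipped) copy of the boundary."""
    box = bbox_of(ring)
    return [tile_id for tile_id, tile_box in tiles.items()
            if box.intersects(tile_box)]

tiles = {
    "A": BBox(0, 0, 10, 10),
    "B": BBox(10, 0, 20, 10),
    "C": BBox(0, 10, 10, 20),
}
ring = [(2, 2), (15, 2), (15, 8), (2, 8)]
print(tiles_needing(ring, tiles))  # ['A', 'B']
```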

2nd problem: Incomplete data
The boundary data has a similar structure to the coastline data. 
Coastline processing now works in mkgmap, but its failure rate is quite 
high: a single error in the OSM data can cause the whole workflow to 
fail.
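
One way to keep a single broken boundary from failing the whole run is 
to report rings that cannot be closed and carry on. The sketch below 
assumes a boundary is given as a list of ways, each a list of node ids, 
and joins them by matching endpoints. It is not taken from mkgmap's 
coastline code; the function name is hypothetical.

```python
def assemble_rings(ways):
    """Join ways (lists of node ids) into closed rings by matching endpoints.

    Returns (rings, broken): chains that cannot be closed are collected
    in `broken` instead of aborting the whole run.
    """
    pending = [list(w) for w in ways if len(w) >= 2]
    rings, broken = [], []
    while pending:
        current = pending.pop(0)
        # Extend the chain at its tail until it closes or nothing fits.
        while current[0] != current[-1]:
            for i, way in enumerate(pending):
                if way[0] == current[-1]:
                    current += way[1:]        # way continues forwards
                elif way[-1] == current[-1]:
                    current += way[-2::-1]    # way is stored reversed
                else:
                    continue
                pending.pop(i)
                break
            else:
                broken.append(current)        # dead end: report, don't fail
                break
        else:
            rings.append(current)
    return rings, broken

rings, broken = assemble_rings([[1, 2, 3], [3, 4], [1, 4], [10, 11], [11, 12]])
print(rings)   # [[1, 2, 3, 4, 1]]
print(broken)  # [[10, 11, 12]]
```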

3rd problem: Amount of data
A solution for problems 1 (and 2) could be to provide quality-checked 
extra data containing only boundary information. This is already done 
for generate-sea processing, where you can provide the coastline data 
in a separate file. But the amount of boundary data would be VERY 
large, and I don't think minimum memory requirements of several GB are 
acceptable.
So in the end we would need to throw away the tile concept and implement 
a database interface for mkgmap.
Maybe that's the solution?
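
Much of what a database interface would provide is a spatial index, so 
that only the boundaries near a point are loaded and tested. As a 
simplified stand-in (a spatial database would typically use an R-tree or 
a similar structure), the sketch below buckets bounding boxes into a 
fixed grid. The class name and cell size are assumptions.

```python
import math
from collections import defaultdict

class GridIndex:
    """Buckets bounding boxes into fixed-size cells so that a point query
    only looks at the items registered in a single cell."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, item, min_x, min_y, max_x, max_y):
        # Register the item in every cell its bounding box overlaps.
        cx0, cy0 = self._cell(min_x, min_y)
        cx1, cy1 = self._cell(max_x, max_y)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                self.cells[(cx, cy)].append(item)

    def candidates(self, x, y):
        """Items whose bounding box may contain (x, y); an exact
        point-in-polygon test is still needed afterwards."""
        return list(self.cells.get(self._cell(x, y), []))

idx = GridIndex(10)
idx.insert("United Kingdom", 0, 0, 100, 100)
idx.insert("Kent", 20, 20, 40, 40)
print(idx.candidates(25, 25))  # ['United Kingdom', 'Kent']
print(idx.candidates(90, 90))  # ['United Kingdom']
```

Smaller cells mean fewer candidates per query but more cell entries per 
large boundary, which is the same memory trade-off discussed above.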

WanMil



