
[mkgmap-dev] Address & city country name assignment.

From Dermot McNally dermotm at gmail.com on Wed Feb 16 12:09:58 GMT 2011

It's amusing and not particularly surprising how, as soon as we have
searchable maps, we discover the importance of having better
addressing information about locations. So far a lot of fundamental
principles have been mentioned:

* That using is_in information is easy, but not satisfactory, since
it's often missing, inconsistent, poorly maintained and hard to use to
infer a hierarchy of belonging (arbitrary bits of streets usually
don't have it set, so how do you make a best guess of what nearby
element should "own" it?)

* That boundary polygons are increasingly present on our map, that
they can solve most of the problems of is_in, that they are already
superseding is_in for other address-sensitive applications in OSM, but
that they are very hard to process as part of how mkgmap processes the
map.

I am convinced that is_in is never going to give us satisfactory
results, that we cannot trust the values entered in that field by
mappers and that, the more boundary polygons are used to solve other
problems, the less is_in will even be maintained. I have not been
entering is_in in my mapping for at least two years; at most I will
correct entries by others.

Mkgmap needs to, at those parts of the process where address hierarchy
information is currently inferred, be capable of querying an external
source to find the required information. Because at least some of my
ideas for a possible source are a little cumbersome, it would probably
be ideal if a number of options are permitted, rather like how drawing
the sea is managed. One of the address lookup "plugins" would probably
be the existing simple one based on is_in, for users who want to avoid
extra prerequisites.
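
A minimal sketch of how such interchangeable lookup "plugins" might
fit together, assuming nothing about mkgmap's real internals. The
AddressLookup interface, the PLUGINS registry and the "is_in" option
value are all hypothetical names chosen for illustration; only the
is_in-based plugin described above is shown.

```python
import re
from typing import Dict, List, Protocol


class AddressLookup(Protocol):
    """Hypothetical plugin interface: return the place names for an element."""

    def lookup(self, tags: Dict[str, str], lat: float, lon: float) -> List[str]:
        ...


class IsInLookup:
    """Simple plugin that relies only on the element's own is_in tag."""

    def lookup(self, tags: Dict[str, str], lat: float, lon: float) -> List[str]:
        raw = tags.get("is_in", "")
        # Mappers separate is_in values with commas or semicolons.
        return [part.strip() for part in re.split(r"[,;]", raw) if part.strip()]


# Registry keyed by a hypothetical option value selecting the plugin.
PLUGINS: Dict[str, AddressLookup] = {"is_in": IsInLookup()}


def resolve(option: str, tags: Dict[str, str], lat: float, lon: float) -> List[str]:
    """Dispatch to whichever lookup plugin the user selected."""
    return PLUGINS[option].lookup(tags, lat, lon)
```

A Nominatim-backed or cache-backed plugin would implement the same
lookup() method and be registered under its own key, so the rest of
the processing never needs to know where the names came from.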

So if that's what a simple, poorly-functioning address plugin looks
like, what would the best one look like? Right now, the ultimate OSM
geocoder is Nominatim. It is capable of consuming a place name or
co-ordinate (of a road segment, say) and deducing an address
hierarchy. It already uses the best clues available to do this -
including both boundary polygons and is_in tags. And because an entire
hierarchy is deduced, it offers us the flexibility to index locations
under more than one hierarchy element, as many commercial Garmin maps
seem to. For instance, my current location might reasonably be
searched for under any of the following names in the city field:

Dublin (city of which my location is a suburb)
Dublin (historical county where I am located)
Dublin 15 (postal district)
Blanchardstown (Historical village and focus of modern suburb)

and there are even sub-parts of Blanchardstown, typically
corresponding to old rural "townlands" that might be searched for:

Corduff, Ongar, Carpenterstown.

Only the most disciplined maintainer of is_in will capture enough
information to permit matching on all of these elements and there is
no way sufficient consistency will exist. So a Nominatim lookup is the
way to go, as we export all of the problems to an externally
maintained tool.
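
As a rough illustration of indexing one location under several
hierarchy elements, the sketch below builds a Nominatim reverse
geocoding request and collects candidate "city" names from the
address part of a decoded JSON response. The endpoint URL, the choice
of address keys and the sample dictionary are assumptions; the sample
is hand-written in the shape of a response and is not real Nominatim
output.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode

# Assumed endpoint; point this at whichever Nominatim instance you use.
NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"

# Address keys to index under, most specific first (an assumed selection).
CITY_KEYS = ["suburb", "village", "town", "city", "county", "postcode"]


def reverse_url(lat: float, lon: float) -> str:
    """Build a reverse-geocoding request URL for one coordinate."""
    query = urlencode(
        {"format": "jsonv2", "lat": lat, "lon": lon, "addressdetails": 1}
    )
    return f"{NOMINATIM_REVERSE}?{query}"


def city_candidates(response: Dict[str, Any]) -> List[str]:
    """Collect distinct names from the 'address' part of a decoded response."""
    address = response.get("address", {})
    names: List[str] = []
    for key in CITY_KEYS:
        value = address.get(key)
        # Skip missing keys and names already collected at another level.
        if value and value not in names:
            names.append(value)
    return names


# Hand-written sample in the shape of a response, not real Nominatim output.
sample = {
    "address": {
        "suburb": "Blanchardstown",
        "city": "Dublin",
        "county": "Dublin",
        "postcode": "Dublin 15",
    }
}
print(city_candidates(sample))
```

Identical names at different levels (here the city and the county)
collapse into a single search entry, since the city field only needs
each distinct string once.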

The snag: Even though Mapquest, who currently host the biggest public
Nominatim instance, are very generous with the level of API lookups
they allow, there will be trouble if every mkgmap user performs
thousands of Nominatim lookups when refreshing their Garmin maps. It
will also be slow and bandwidth-intensive. This can be solved somewhat
by having one's own instance of Nominatim, possibly containing only an
interesting subset of the map. It would very likely prove worthwhile
to define a cache file format into which to stuff those results of the
query that mkgmap will require.
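
One possible shape for such a cache, sketched under the assumption
that results are keyed by OSM way id: a tab-separated text file with
one row per way, the id first and the place names after it. The
layout, the function names and the way id used in the example are
illustrative only, not an agreed format.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_cache(entries: Dict[int, List[str]], out: TextIO) -> None:
    """Write one row per OSM way: the way id followed by its place names."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for way_id in sorted(entries):
        writer.writerow([way_id, *entries[way_id]])


def read_cache(src: TextIO) -> Dict[int, List[str]]:
    """Load the cache back into a dictionary keyed by way id."""
    reader = csv.reader(src, delimiter="\t")
    return {int(row[0]): row[1:] for row in reader if row}


# Round trip through an in-memory buffer; the way id is made up.
buffer = io.StringIO()
write_cache({1001: ["Blanchardstown", "Dublin", "Dublin 15"]}, buffer)
buffer.seek(0)
cache = read_cache(buffer)
```

A plain-text format like this keeps only the names the map build
needs, so the file stays much smaller than the raw geocoder responses
and can be inspected or patched by hand.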

If these cache files were maintained by country or bbox, they could be
calculated centrally by people with sufficient hardware or expertise,
then made available for download by normal users. This is a lot like
what Steve suggests above, but without the expectation that mappers
maintain the address file (because they just plain won't, and the
required information is already available from Nominatim, so it would
be a waste anyway).
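
If the downloadable caches were split by bounding box, choosing the
right file for a tile could be as simple as the sketch below. The
index entries, file names and bounds are rough, made-up values for
illustration; a real index would be published alongside the cache
files.

```python
from typing import List, NamedTuple


class CacheFile(NamedTuple):
    """One downloadable cache file and the bounding box it covers."""

    name: str
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Hypothetical index; names and rough bounds are illustrative only.
INDEX = [
    CacheFile("ireland.addr", 51.3, -10.7, 55.5, -5.3),
    CacheFile("estonia.addr", 57.5, 21.7, 59.8, 28.3),
]


def files_for(lat: float, lon: float, index: List[CacheFile] = INDEX) -> List[str]:
    """Return the names of all cache files whose bbox covers the point."""
    return [entry.name for entry in index if entry.contains(lat, lon)]
```

Returning a list rather than a single name allows for overlapping
boxes near borders, where more than one cache may hold useful entries.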

I'm interested in your comments on this. While doing what I describe
certainly requires some hard work, it's all front-loaded: once we
find a working framework, we never have to worry about it again. Well,
not until Nominatim is superseded by an even more awesome geocoder.

Dermot

-- 
--------------------------------------
Igaühel on siin oma laul
ja ma oma ei leiagi üles


