
[mkgmap-dev] mixed index branch merge

From Marko Mäkelä marko.makela at iki.fi on Sat Feb 14 07:50:49 GMT 2015

On Thu, Feb 12, 2015 at 01:24:29PM +0000, Steve Ratcliffe wrote:
>So finally I will merge the mixed index branch.

I believe that the database terminology for this is 'inverted index' or 
'fulltext index'.

>I think it would be best to selectively enable it per country along 
>with lists of names to avoid. This would be best done by people from or 
>familiar with the countries in question.

In fulltext search, these are called 'stopwords'.
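To illustrate the two terms, here is a minimal Python sketch of an inverted index with stopwords. It is not mkgmap's index code (mkgmap itself is written in Java). The function names, the whitespace tokenization and the English example names are assumptions made for the illustration. Each word maps to the set of names that contain it, and stopwords never become keys.

```python
from collections import defaultdict


def build_inverted_index(names, stopwords):
    """Map each lower-cased word to the set of names that contain it.

    Words listed in `stopwords` are skipped, so a generic word such as
    "street" does not become a key that matches nearly every name.
    """
    stop = {w.lower() for w in stopwords}
    index = defaultdict(set)
    for name in names:
        for word in name.lower().split():
            if word not in stop:
                index[word].add(name)
    return dict(index)


def search(index, word):
    """Return the sorted names indexed under `word` (case-insensitive)."""
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    names = ["Main Street", "High Street", "Station Road"]
    idx = build_inverted_index(names, stopwords=["Street", "Road"])
    print(search(idx, "main"))    # ['Main Street']
    print(search(idx, "street"))  # []
```

Searching for a stopword simply finds nothing, while the distinctive part of each name stays searchable.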

It might not be necessary to do anything for countries where street
names are commonly written as a single word. For example, "Main Street"
would be "Hauptstrasse" in German, "Huvudgatan" in Swedish and "Päätie"
in Finnish. Only if the first part of the street name is a proper name,
such as a person's name, might the second part be written as a separate
word, separated by a space or a dash.

That said, I guess it would still make sense to introduce some 
stopwords. Words that I can think of:

Swedish: gata, gatan, gränd, gränden, stig, stigen, (stråk, stråket)
Finnish: tie, katu, polku, kuja, (raitti, taival)
German: Straße, Strasse, Weg, Allee, Chaussee
Estonian: mnt, maantee, tn, tänav, pst, puiestee

In Estonia, it seems to be common to write the tn, mnt or pst as a 
separate word.

I could be missing some stopwords in Estonian and for German-speaking 
countries. Also, it could be that the French loan words Allee and 
Chaussee are sometimes accented.

The Finnish and Swedish words that I have put in parentheses should be
very rare; they are typically used for ways for non-motorized traffic.
I don't think that leaving them in the index would pollute it much. In
fact, you might want to search for such a name when you are looking for
a nice walking or cycling route (i.e., you expect some
random-famous-person-name-stråket to exist, but you do not know the name).
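A second hypothetical Python sketch turns the lists above into per-language stopword sets. Following the suggestion above, the rare parenthesized words are left out of the sets so they remain searchable. Names are split on spaces and dashes, so single-word compounds such as "Hauptstrasse" pass through unchanged. The `STOPWORDS` keys, the `index_words` helper and the example street names are made up for illustration.

```python
import re

# Stopwords from the lists above, keyed by language (illustrative keys only).
# The rare parenthesized words (stråk, stråket, raitti, taival) are
# deliberately omitted so that they stay in the index.
STOPWORDS = {
    "Swedish": {"gata", "gatan", "gränd", "gränden", "stig", "stigen"},
    "Finnish": {"tie", "katu", "polku", "kuja"},
    "German": {"straße", "strasse", "weg", "allee", "chaussee"},
    "Estonian": {"mnt", "maantee", "tn", "tänav", "pst", "puiestee"},
}

# A separate word may follow a space or a dash, so split on runs of either.
_SEPARATORS = re.compile(r"[\s\-]+")


def index_words(name, language):
    """Return the lower-cased words of `name` that would go into the index.

    Unknown languages get an empty stopword set, so every word is kept.
    """
    stop = STOPWORDS.get(language, set())
    words = (w for w in _SEPARATORS.split(name.lower()) if w)
    return [w for w in words if w not in stop]


if __name__ == "__main__":
    print(index_words("Hauptstrasse", "German"))        # ['hauptstrasse']
    print(index_words("Pärnu mnt", "Estonian"))         # ['pärnu']
    print(index_words("Per Albin-stråket", "Swedish"))  # ['per', 'albin', 'stråket']
```

With these sets, the separately written Estonian "mnt" is dropped, while a name ending in "-stråket" still yields "stråket" as a searchable word.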

	Marko

