
[mkgmap-dev] mixed index branch merge

From Alexandre Loss alexandre.loss at gmail.com on Sat Feb 14 11:57:49 GMT 2015

Hi guys,

Stopwords are very important for Brazilian maps, because more than
90% of our street names are prefixed with the street type. Examples:
Rua Paris, Avenida Antônio de Castro, Avenida Afonso Pena, etc.
Here Avenida (avenue) and Rua (street) are the prefixes.

Without stopwords, these prefixes will be included in the index,
increasing its size unnecessarily.
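
To make the size argument concrete, here is a rough sketch of a word-to-name
inverted index built with and without stopwords. It is only an illustration:
build_index and posting_count are hypothetical helpers, and this is not how
mkgmap actually stores its index.

```python
from collections import defaultdict


def build_index(names, stopwords=frozenset()):
    """Map each case-folded word to the set of street names containing it."""
    index = defaultdict(set)
    for name in names:
        for word in name.split():
            key = word.casefold()
            if key not in stopwords:
                index[key].add(name)
    return dict(index)


def posting_count(index):
    """Total number of (word, street name) entries held by the index."""
    return sum(len(names) for names in index.values())


streets = ["Rua Paris", "Avenida Antônio de Castro", "Avenida Afonso Pena"]

full = build_index(streets)
trimmed = build_index(streets, stopwords={"rua", "avenida"})

# Compare the number of entries with and without the street-type prefixes.
print(posting_count(full), posting_count(trimmed))
```

On a real map the entry for a word such as "avenida" would point at a large
share of all streets, which is where the unnecessary growth comes from.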

I believe that you don't need to care about the country for which maps are
compiled. Firstly, it would be very difficult to identify, understand and
apply the particular rules for every country. Moreover, you would spend too
much time creating these rules, and users would lose the flexibility to
define their own stopwords.

So, my suggestion is exactly that: allow the users to define their own
stopwords.

A feature should be added to mkgmap that allows users to pass the
stopwords through a new parameter/file, for example:
--index_stopwords=file.csv

file.csv example: "rua","avenida",  "tie", "katu", "polku", "kuja"

mkgmap must ignore case.
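
A minimal sketch of what I have in mind, written in Python only to show the
idea. The --index_stopwords option and the file layout are just my proposal;
nothing like this exists in mkgmap today, and load_stopwords and
strip_stopwords are hypothetical names. The reader tolerates spaces after
the commas and compares words case-insensitively.

```python
import csv
import io


def load_stopwords(stream):
    """Read quoted, comma-separated stopwords and normalise their case."""
    reader = csv.reader(stream, skipinitialspace=True)
    return {
        field.strip().casefold()
        for row in reader
        for field in row
        if field.strip()
    }


def strip_stopwords(name, stopwords):
    """Return the words of a street name that should go into the index."""
    return [word for word in name.split() if word.casefold() not in stopwords]


# Stand-in for the contents of the proposed file.csv.
sample = io.StringIO('"rua","avenida",  "tie", "katu", "polku", "kuja"\n')
stopwords = load_stopwords(sample)

print(strip_stopwords("AVENIDA Afonso Pena", stopwords))
```

With a real file, the same function could be called on
open("file.csv", encoding="utf-8", newline="").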

That's it.

Regards,

Alexandre

2015-02-14 5:50 GMT-02:00 Marko Mäkelä <marko.makela at iki.fi>:

> On Thu, Feb 12, 2015 at 01:24:29PM +0000, Steve Ratcliffe wrote:
>
>> So finally I will merge the mixed index branch.
>>
>
> I believe that the database terminology for this is 'inverted index' or
> 'fulltext index'.
>
>> I think it would be best to selectively enable it per country along with
>> lists of names to avoid. This would be best done by people from or familiar
>> with the countries in question.
>>
>
> In fulltext search, these are called 'stopwords'.
>
> It might not be necessary to do anything for countries where street
> names are commonly written as a single word. Example: "Main Street" would
> be "Hauptstrasse" in German, "Huvudgatan" in Swedish and "Päätie" in
> Finnish. Only if the first part of the street name is a proper name such as
> a person's name, the second part could be written as a separate word,
> separated by a space or dash.
>
> That said, I guess it would still make sense to introduce some stopwords.
> Words that I can think of:
>
> Swedish: gata, gatan, gränd, gränden, stig, stigen, (stråk, stråket)
> Finnish: tie, katu, polku, kuja, (raitti, taival)
> German: Straße, Strasse, Weg, Allee, Chaussee
> Estonian: mnt, maantee, tn, tänav, pst, puiestee
>
> In Estonia, it seems to be common to write the tn, mnt or pst as a
> separate word.
>
> I could be missing some stopwords in Estonian and for German-speaking
> countries. Also, it could be that the French loan words Allee and Chaussee
> are sometimes accented.
>
> The Finnish and Swedish words that I have put in parentheses should be
> very rare, typically used for ways for non-motorized traffic.  I don't
> think that including them would pollute the index much. You might in fact
> want to search for such a name when you are looking for a nice walking or
> cycling route (i.e., you expect there to exist some
> random-famous-person-name-stråket, but you do not know the random name).
>
>         Marko