[mkgmap-dev] Search addresses for latin countries (help on reg exp)

From Carlos Dávila cdavilam at orangecorreo.es on Tue Aug 6 07:30:01 BST 2013

On 06/08/13 00:02, Carlos Dávila wrote:
> On 05/08/13 23:09, Carlos Dávila wrote:
>> On 05/08/13 19:42, Steve Ratcliffe wrote:
>>> Hi
>>>
>>>> Folks, as you know (this comes up from time to time), address search is
>>>> impractical in most Latin countries, where the street/square name usually
>>>> starts with the type (Via, Viale, Corso, Piazza etc. [IT]; Avenida,
>>>> Calle, Plaza etc. [ES]; Avenue, Boulevard, Rue, Place etc. [FR])
>>>> followed by the full name of, usually, the person the street is named after.
>>>> On top of that, street names sometimes appear abbreviated (V.le,
>>>> Av.da, Bld. etc.), sometimes the middle name is skipped, and sometimes the
>>>> word “of” is used (Avenue de Bobigny, Corso del Popolo etc.).
>>>>
>>> The Garmin index format has a way of dealing with this problem and
>>> earlier this year I made a branch that creates an index with the extra
>>> information to show where the interesting part of the name starts.
>>>
>>> The latest version indexes every word in the name separately so you
>>> could find 'corso del popolo' by typing 'corso' , 'del' or 'popolo'.
>>>
>>> So this will always work for any language, but at the cost of a
>>> much larger index.
>>>
>>> It would be great if someone could try it out as it is; if it proves
>>> useful, it's more likely that someone would improve it by
>>> devising a suitable way to cut down the useless entries.
>>>
>>> Download it as mkgmap-mixed-index-r2662.jar at the bottom of the download
>>> page.
>>>
>>>> So what is a simple Mozartstrasse in Austria would look like “Via
>>>> Wolfgang Amadeus Mozart” in Italy or “Rue Wolfgang Amadeus Mozart” in
>>>> France but possibly also “Av.da de Mozart” etc.
>>>>
>>>> Now, everyone knows the street/square by its last name and it would be
>>>> much more practical to search by it: I’d like to have a style that just
>>>> picks the last full word of the street/square name and puts it as a prefix
>>>> followed by a comma and the original name.
>>>>
>>>> This would really boost address search for Latin countries, so it might
>>>> be a default style to add to IT, FR, ES, BR, MX etc.
>>>>
>>>> Could you help me on making that regular expression for the style?
>>>>
>>>> “str1 str2… strN” -> “strN, str1 str2… strN”
>>>>
>>>> Thanks!
>>>>
>>>> Enrico
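
The rewrite requested above (“strN, str1 str2… strN”) can be sketched with a single regular expression. This is only an illustration in Python, not mkgmap style syntax; whether the style language can express the same substitution is not established in this thread. The function name `last_word_first` is hypothetical.

```python
import re

def last_word_first(name: str) -> str:
    """Return 'strN, str1 str2 ... strN' for a multi-word name.

    Hypothetical helper for illustration only; it is not part of mkgmap.
    Single-word names are returned unchanged because the pattern needs
    at least one whitespace character before the last word.
    """
    name = name.strip()
    # Group 1 is the whole name, group 2 is the last whitespace-free word.
    return re.sub(r"^(.*\s(\S+))$", r"\2, \1", name)

print(last_word_first("Via Wolfgang Amadeus Mozart"))
print(last_word_first("Mozartstrasse"))
```

Note that this keys on the last word only, so a name such as Calle Naciones Unidas would be filed under "Unidas", not "Naciones".
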
>> First result with the mixed-index branch, processing Spain with the
>> default style:
>> Total time taken: 391216ms vs 449649ms with r2661
>> index size: 29 MB vs 21.6 MB with r2661
>> Apart from the numbers, the address search doesn't work for now.
>> Entries in the index are not unique and are not ordered (see
>> screenshot 1). When you type a letter, the search results don't change
>> accordingly (screenshot 2). This is the console output, in case it is of
>> any help:
>> === FIRST
>> t1=0, t2=55013
>> first av 96203/24, last 0/12
>> AVENIDA : 32380
>> CAMINO : 14816
>> PLAZA : 12864
>> CARRETERA : 28180
>> CALLE : 288500
>> RÚA : 9130
>> CARRER : 117140
>> AVINGUDA : 11602
>> === LAST
>> KALEA : 9682
>> AUZOA : 11604
> I have compiled the same input data with the same command and,
> strangely, it now seems to work better. Typing "C" in the search field
> selects all streets with a "C" as the first letter of their name after
> calle, avenida or whatever (see screenshot), apart from the first 3
> entries in the list.
After some more testing, it seems that the new index is able to find
streets by their second word. For example, Calle Naciones Unidas
(United Nations Street) is found by typing either "calle nac" or
"nacione", but not by typing "unidas".
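
For comparison, here is a minimal sketch of what "indexing every word in the name separately" would mean in principle. It is a toy in-memory model in Python, not the Garmin index format and not the code in the mixed-index branch; `build_word_index` and `search` are hypothetical names. Under this model every word is a search key, so "unidas" would also match, which is not what the test above shows.

```python
from collections import defaultdict

def build_word_index(names):
    """Map each lower-cased word to the set of street names containing it."""
    index = defaultdict(set)
    for name in names:
        for word in name.lower().split():
            index[word].add(name)
    return index

def search(index, prefix):
    """Return the sorted names that contain a word starting with `prefix`."""
    prefix = prefix.lower()
    hits = set()
    for word, names in index.items():
        if word.startswith(prefix):
            hits |= names
    return sorted(hits)

index = build_word_index(["Calle Naciones Unidas", "Corso del Popolo"])
print(search(index, "nacione"))
print(search(index, "unidas"))
```

The toy also shows where the extra index size comes from: a three-word name produces three entries, including low-value ones such as "del".
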

