
[mkgmap-dev] Twülpstedt, Normalisation of unicode strings

From Gerd Petermann GPetermann_muenchen at hotmail.com on Mon Nov 15 11:01:20 GMT 2021

Hi all,

See also https://forum.openstreetmap.org/viewtopic.php?id=74231
mkgmap sometimes fails to encode strings correctly for a given codepage like 1252 (latin1).
I've uploaded a file that contains an area in Germany where the u-umlaut in the name
Twülpstedt is encoded in two different ways, either as the single character ü (0xfc) or
as u + "COMBINING DIAERESIS" (0x75 + 0x308).
See umlaut.osm at https://files.mkgmap.org.uk/detail/537

With the current code the 2nd variant is displayed as Twu?lpstedt.
This 1-liner
name = Normalizer.normalize(name, Normalizer.Form.NFC);
changes the name to the usual (precomposed) encoding, which works well with the codepage translation.
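
A minimal, self-contained sketch of the effect described above. It uses the JDK's own windows-1252 charset as a stand-in for mkgmap's codepage translation, which is an assumption; the class and method names (NormalizeDemo, toNfc, roundTrip) are made up for illustration and are not part of mkgmap.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

public class NormalizeDemo {
    // Compose base letter + combining mark into one precomposed
    // character where Unicode defines one (NFC).
    static String toNfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    // Encode to windows-1252 and decode again. String.getBytes(Charset)
    // substitutes the charset's replacement byte for unmappable characters.
    // Assumption: this only approximates what mkgmap's own encoder does.
    static String roundTrip(String name) {
        Charset cp1252 = Charset.forName("windows-1252");
        return new String(name.getBytes(cp1252), cp1252);
    }

    public static void main(String[] args) {
        // 'u' followed by U+0308 COMBINING DIAERESIS (the 2nd variant)
        String decomposed = "Twu\u0308lpstedt";

        System.out.println(roundTrip(decomposed));        // combining mark is lost
        System.out.println(roundTrip(toNfc(decomposed))); // umlaut survives
    }
}
```

In this sketch the decomposed name loses its combining mark in the round trip, while the NFC form maps to the single byte 0xfc and comes back unchanged.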

So far so good. Now I wonder where exactly this call should be placed.
My first idea was the code where the string is converted to a Garmin label, but maybe
it should happen much earlier so that the style rules also "see" the normalized form.
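
To illustrate why the placement matters, here is a rough sketch of the "much earlier" option. It assumes tags are available as a plain Map<String, String>; the class EarlyNormalize and the helper normalizeValues are hypothetical and do not correspond to actual mkgmap code.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

public class EarlyNormalize {
    // Hypothetical helper: returns a copy of the tags with every value in NFC.
    static Map<String, String> normalizeValues(Map<String, String> tags) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : tags.entrySet()) {
            String v = e.getValue();
            // Only rewrite values that are not already in NFC.
            if (!Normalizer.isNormalized(v, Normalizer.Form.NFC)) {
                v = Normalizer.normalize(v, Normalizer.Form.NFC);
            }
            out.put(e.getKey(), v);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("name", "Twu\u0308lpstedt");   // decomposed variant from the data

        // A value as it would typically be typed in a rule (precomposed).
        String ruleValue = "Tw\u00fclpstedt";

        // Plain string comparison distinguishes the two encodings ...
        System.out.println(tags.get("name").equals(ruleValue));
        // ... unless the tag value was normalized before the comparison.
        System.out.println(normalizeValues(tags).get("name").equals(ruleValue));
    }
}
```

If normalization only happens at label conversion, a comparison like the first one would still see the decomposed string.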

Any thoughts?

Gerd


