
[mkgmap-dev] Twülpstedt, Normalisation of unicode strings

From Gerd Petermann gpetermann_muenchen at hotmail.com on Tue Nov 16 09:27:00 GMT 2021

Hi,

Please review my patch. I had some problems adding the Twülpstedt example to the existing unit test. I think the new code is closer to what should be tested.
Did I miss something?

Gerd

________________________________________
From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf of Gerd Petermann <gpetermann_muenchen at hotmail.com>
Sent: Monday, 15 November 2021 17:22
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] Twülpstedt, Normalisation of unicode strings

Hi Ticker,

OK, I had the same thoughts.

Gerd

________________________________________
From: mkgmap-dev <mkgmap-dev-bounces at lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap at jagit.co.uk>
Sent: Monday, 15 November 2021 16:19
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] Twülpstedt, Normalisation of unicode strings

Hi

I'd vote for normalisation when the label is generated.

If the un-normalised string can be represented in the target charset,
no need for normalisation.

I don't see that styles should be testing names like this, and, if they
really need to, clauses for alternate representations could be added.

The proportion of input tag values that never make it into the final
.img must be quite high, so doing it early could be costly.

Ticker
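
The suggestion above (normalise only when the label is generated, and only if the raw string cannot be represented in the target charset) could be sketched roughly as follows. This is a minimal illustration and not mkgmap's actual code: the class name LabelNormaliser and the method prepare are hypothetical, and only the standard java.nio.charset and java.text APIs are assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.text.Normalizer;

// Hypothetical helper, not part of mkgmap: normalises a name lazily,
// at the point where it is about to be turned into a label.
final class LabelNormaliser {
    // Dedicated encoder, used only for canEncode() checks.
    private final CharsetEncoder encoder;

    LabelNormaliser(Charset target) {
        this.encoder = target.newEncoder();
    }

    // Returns the name unchanged if the target charset can already
    // represent it; otherwise returns its NFC-normalised form.
    String prepare(String name) {
        if (encoder.canEncode(name)) {
            return name; // no normalisation cost for the common case
        }
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }
}
```

With a windows-1252 target, a name containing u followed by U+0308 fails the canEncode check and is composed to ü, while a name that already uses the precomposed ü is returned as the same object. Tag values that never reach a label are never touched, which is the cost argument made above.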

On Mon, 2021-11-15 at 11:01 +0000, Gerd Petermann wrote:
> Hi all,
>
> see also https://forum.openstreetmap.org/viewtopic.php?id=74231
> mkgmap sometimes fails to encode correct strings for a given codepage
> like 1252 (latin1).
> I've uploaded a file that contains an area in Germany where the
> u-umlaut in the name Twülpstedt is encoded in two different ways,
> either as the precomposed ü (U+00FC) or as u + "COMBINING DIAERESIS"
> (U+0075 + U+0308).
> See umlaut.osm at https://files.mkgmap.org.uk/detail/537
>
> With the current code the 2nd variant is displayed as Twu?lpstedt.
> This 1-liner
> name = Normalizer.normalize(name, Normalizer.Form.NFC);
> helps to change the name to the usual encoding which works well with
> the codepage translation.
>
> So far so good. Now I wonder where exactly this call should be
> placed.
> My first idea was the code where the string is converted to a Garmin
> label, but maybe
> it should happen much earlier so that also the style rules "see" the
> normalized form.
>
> Any thoughts?
>
> Gerd
>
> _______________________________________________
> mkgmap-dev mailing list
> mkgmap-dev at lists.mkgmap.org.uk
> https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

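The behaviour described in the quoted message can be reproduced outside mkgmap with a small stand-alone sketch. The class UmlautDemo and its roundTrip helper are made up for this illustration; the sketch assumes the JDK's "windows-1252" charset stands in for codepage 1252 and relies on String.getBytes(Charset) substituting the charset's replacement byte for unmappable characters.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

// Hypothetical demo class, not part of mkgmap.
final class UmlautDemo {
    // Encode to the charset and decode again; characters the charset
    // cannot represent come back as its replacement character.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String precomposed = "Tw\u00fclpstedt"; // u-umlaut as U+00FC
        String decomposed = "Twu\u0308lpstedt"; // u + COMBINING DIAERESIS

        // The precomposed form survives the round trip unchanged.
        System.out.println(roundTrip(precomposed, cp1252).equals(precomposed));

        // The combining mark is unmappable, so the result is Twu?lpstedt.
        System.out.println(roundTrip(decomposed, cp1252));

        // NFC composes u + U+0308 into U+00FC, which cp1252 can encode.
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(nfc.equals(precomposed));
    }
}
```
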

_______________________________________________
mkgmap-dev mailing list
mkgmap-dev at lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

