[mkgmap-dev] patched polish file charset and multipolygon handling

Mon Feb 21 22:46:37 GMT 2011

On 21/02/11 12:22, Kolesár András wrote:

Hello,

Welcome to the list.

> Finally, I have modified READING_CHARSET in
> mkgmap/reader/polish/PolishMapDataSource.java from "UTF-8" to
> "ISO-8859-2" and accented characters started to work. The used config

Yes, you are correct.

The way it was meant to work is this: since you don't know the
codepage before reading the file, you always read the file as
iso-8859-1. When you save a label, you recover the bytes from the
string that was read (its characters are wrong because the real
character set is different, but you can always recover the actual
bytes that were in the file) and decode them into unicode using the
correct charset. The recode() method does this.
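
To illustrate the idea, here is a minimal sketch. It is not the actual
mkgmap code: the class name, the readAsLatin1() helper and the
signature of recode() are made up for the example, and it assumes the
JVM provides the ISO-8859-2 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not the mkgmap recode() source.
class RecodeSketch {
    /**
     * Takes a string that was decoded as iso-8859-1 and re-decodes its
     * underlying bytes with the charset the file was really written in.
     */
    static String recode(String misread, String actualCharset) {
        // iso-8859-1 maps every byte 0x00-0xFF to the code point of the
        // same value, so this gives back the exact bytes from the file.
        byte[] raw = misread.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, Charset.forName(actualCharset));
    }

    /** Simulates reading raw file bytes with READING_CHARSET = iso-8859-1. */
    static String readAsLatin1(byte[] fileBytes) {
        return new String(fileBytes, StandardCharsets.ISO_8859_1);
    }
}
```

For example, the byte 0xF5 is U+0151 (o with double acute) in
iso-8859-2 but U+00F5 (o with tilde) in iso-8859-1. Reading the file
bytes {0x74, 0xF5} with readAsLatin1() gives the wrong second
character, and recode(misread, "ISO-8859-2") turns it back into the
intended one.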

But then READING_CHARSET was changed to utf-8 to deal with a
commonly found kind of file, and the recode() method only works
properly if READING_CHARSET is iso-8859-1 (or a similar 8-bit-only
charset).
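
The reason is that decoding as utf-8 loses information when the bytes
are not valid utf-8. A small sketch of the difference (the class and
method names are invented for the example):

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical illustration only; not part of mkgmap.
class RoundTripSketch {
    /**
     * Decodes the bytes with the given charset, encodes the resulting
     * string again, and reports whether the original bytes came back.
     */
    static boolean survivesRoundTrip(byte[] fileBytes, Charset readingCharset) {
        String decoded = new String(fileBytes, readingCharset);
        byte[] recovered = decoded.getBytes(readingCharset);
        return Arrays.equals(fileBytes, recovered);
    }
}
```

With the iso-8859-2 bytes {0x74, 0xF5}, the round trip succeeds for
iso-8859-1 but fails for utf-8: 0xF5 is not a valid utf-8 byte, so the
decoder substitutes U+FFFD and the original byte cannot be recovered.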

The change to utf-8 was made, I believe, because there are files
that do not contain a CodePage and have the strings in utf-8 (produced
by osm2mp).

I've never used cgpsmapper, so I don't know if there is a standard way
to say that the file is in utf-8 for this case.

So I guess we should change READING_CHARSET back to iso-8859-1 and
find some other way to deal with utf-8 files if it is still an
important use case.
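
One possible approach, purely as a sketch and not something mkgmap
does as far as I know: keep reading as iso-8859-1, and when the file
has no CodePage, recover the bytes and try a strict utf-8 decode,
keeping the string as read if that fails. The class and method names
are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of mkgmap.
class Utf8FallbackSketch {
    /**
     * Takes a label that was read as iso-8859-1 from a file without a
     * CodePage. Returns it decoded as utf-8 if its bytes are valid
     * utf-8, otherwise returns it unchanged.
     */
    static String decodeLabel(String misread) {
        byte[] raw = misread.getBytes(StandardCharsets.ISO_8859_1);
        // REPORT makes the decoder throw instead of silently
        // substituting U+FFFD for bad input.
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return misread; // not valid utf-8, keep the 8-bit reading
        }
    }
}
```

This is only a heuristic: a genuine 8-bit label could happen to be
valid utf-8 and would then be decoded wrongly.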

Best wishes

..Steve