[mkgmap-dev] Global index branch

From Steve Ratcliffe steve at parabola.me.uk on Sun Feb 27 23:11:29 GMT 2011


On 26/02/11 16:46, ValentinAK wrote:
> What about Russian? Many users in Russia use mkgmap with -charset:cp1251 or
> -code-page:1251. The index-branch version 1861 does not work correctly with
> the Russian character set.

To support code pages other than 1252 we need to create files
that describe how the characters are sorted.

The following will be of interest to just about everyone.

The sort order is specified in the SRT file, and I have created a text
file format that can be used to create these files.  However, I don't
know exactly how the SRT file works, so some things have to be guessed.

At the top of the file there are the following:

	# The code page
	codepage 1251

	# I have no idea what these are for, but they are important
	# and if they are not present everywhere they should be then
	# things will not work.  I don't know if they simply identify the
	# sort or have some other meaning.
	id1 7
	id2 2

	# Any descriptive text
	description "Russian Sort"
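
To make the header layout concrete, here is a minimal Python sketch that reads
those lines into a dictionary. It is only an illustration of the text format
described above, not the code mkgmap uses; the function name parse_header is
made up for this example, and treating codepage, id1 and id2 as integers is an
assumption.

```python
import shlex


def parse_header(text):
    """Collect the 'keyword value' header lines into a dict (hypothetical helper)."""
    header = {}
    for line in text.splitlines():
        # shlex drops '#' comments and removes the quotes around the description.
        words = shlex.split(line, comments=True)
        if len(words) == 2 and words[0] in ("codepage", "id1", "id2", "description"):
            key, value = words
            # Assumption: everything except the description is a plain integer.
            header[key] = value if key == "description" else int(value)
    return header


sample = """
# The code page
codepage 1251
id1 7
id2 2
# Any descriptive text
description "Russian Sort"
"""

header = parse_header(sample)
print(header)
```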

Then there are the characters and how they should sort.
The file itself is in utf-8, but characters can be represented
either by themselves (in utf-8) or by a two-character hexadecimal
representation of their value in the target character set.
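
As a rough illustration of the hexadecimal form, the following Python sketch
converts between a character and its two-digit value in the target code page,
using Python's standard cp1251 codec. The helper names to_hex and from_hex are
made up for this example, and whether the file expects upper- or lower-case
hex digits is not stated above, so the lower case here is an assumption.

```python
def to_hex(ch, codepage="cp1251"):
    """Return the two-digit hex value of one character in the target code page."""
    data = ch.encode(codepage)
    if len(data) != 1:
        raise ValueError("expected a single-byte character")
    return format(data[0], "02x")


def from_hex(hex_pair, codepage="cp1251"):
    """Return the character that a two-digit hex value stands for."""
    return bytes.fromhex(hex_pair).decode(codepage)


# Cyrillic capital A (U+0410) and the plain Latin A in code page 1251.
print(to_hex("\u0410"), to_hex("A"))
# The character behind the hex value e0 (Cyrillic small a, U+0430).
print(from_hex("e0"))
```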

Each distinct letter is on its own line, e.g.:

	code A
	code B

Characters that differ only in case are separated by a comma:

	code a, A

Letters that differ only by accent are separated by a semicolon:

	code a, A; á, Á
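
The three separators above amount to three levels of difference: a new line
for a different letter, a semicolon for an accent, a comma for case. A minimal
Python sketch of that reading follows. It is an assumption about how the lines
could be interpreted, not the way mkgmap builds the SRT file; the function
name parse_code_lines and the numbering of the levels are made up, and the
sketch does not handle the hexadecimal form or a literal comma or semicolon.

```python
def parse_code_lines(lines):
    """Map each character to a (letter, accent, case) position (hypothetical helper)."""
    keys = {}
    letter = 0
    for line in lines:
        line = line.strip()
        if not line.startswith("code "):
            continue
        # Each 'code' line introduces a new letter.
        letter += 1
        body = line[len("code "):]
        # Semicolons separate accent variants, commas separate case variants.
        for accent, group in enumerate(body.split(";"), start=1):
            for case, ch in enumerate(group.split(","), start=1):
                keys[ch.strip()] = (letter, accent, case)
    return keys


keys = parse_code_lines(["code a, A; á, Á", "code b, B"])
words = ["B", "á", "b", "A", "a", "Á"]
# Sorting by the tuples orders by letter first, then accent, then case.
print(sorted(words, key=keys.get))
```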

Today I found this site: http://www.collation-charts.org/
which is a good source of information.

