
[mkgmap-dev] Minor splitter issue [PATCH]

From Steve Ratcliffe steve at parabola.me.uk on Sat Oct 13 13:16:35 BST 2012

Hi

On 04/10/12 16:46, Chris66 wrote:
> when there is a CR in the input data (coded as "&#13;" or "&#xD;")
> splitter writes a real CR (ascii 13) to the output file.
>
> Although this is still legal XML, IMHO it is better to keep the code sequence.

After a lot of research I now think there is a bug here.

Since most of the information on the net appears to be
incorrect or incomplete, it is worth explaining.

The characters 0xA, 0xD and 0x9 are all valid in XML attribute
values; however, they are not preserved on reading and are all
replaced with a space character. So, for example:

v="hello
world"

Means the same as:

v="hello world"
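The normalization can be checked with the XML parser bundled with the
JDK (javax.xml.parsers). This is a minimal sketch, assuming a
standards-conforming parser; the class name AttrNormalizeDemo and the
one-element test document are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class AttrNormalizeDemo {
    // Parse a one-element document and return its "v" attribute exactly as
    // the parser hands it to the application, i.e. after normalization.
    static String parsedValue(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getAttribute("v");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Literal line feed and literal tab inside the value: both come back as spaces.
        System.out.println(parsedValue("<tag v=\"hello\nworld\"/>").equals("hello world"));
        System.out.println(parsedValue("<tag v=\"hello\tworld\"/>").equals("hello world"));
        // Character reference: the line feed survives as a real line feed.
        System.out.println(parsedValue("<tag v=\"hello&#xA;world\"/>").equals("hello\nworld"));
    }
}
```

Each line of output should be "true": only the character reference
gets the original character through to the application.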

So to preserve the input data, those three characters must be
encoded as character references in attribute values.
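The attached patch was scrubbed from the archive, so the following is
not its contents, only a sketch of the idea in Java: when writing an
attribute value, emit those three characters as character references
alongside the usual &amp;, &lt; and &quot; escapes. The class
XmlAttrEscaper and its escape method are hypothetical names, and the
value is assumed to be written inside double quotes:

```java
public final class XmlAttrEscaper {
    private XmlAttrEscaper() {}

    // Escape a string for use inside a double-quoted XML attribute value
    // (hypothetical helper, not taken from splitter).
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '"':  sb.append("&quot;"); break;
                // Written literally, these three would be read back as spaces.
                case '\t': sb.append("&#x9;");  break;
                case '\n': sb.append("&#xA;");  break;
                case '\r': sb.append("&#xD;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("line1\r\nline2\tend"));
    }
}
```

Running main prints line1&#xD;&#xA;line2&#x9;end. Using hex references
is an arbitrary choice here; &#13;, &#10; and &#9; are equivalent.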

This is only true of attribute values. If a "&#13;" occurs somewhere
else in the file, where it does not need to be, then we cannot
preserve that on output (and there would be no advantage in doing so
if we could).

See: http://www.w3.org/TR/xml/#AVNormalize
Also: 
http://recycledknowledge.blogspot.co.uk/2006/03/writing-out-xml.html 
(but ignore the comment, which I believe is incorrect).

Attached is a patch that implements this.

..Steve
-------------- next part --------------
A non-text attachment was scrubbed...
Name: attr_cr_lf_tab.patch
Type: text/x-patch
Size: 623 bytes
Desc: not available
Url : http://lists.mkgmap.org.uk/pipermail/mkgmap-dev/attachments/20121013/70778e0b/attachment.bin 

