Re: [JDEV] Character Encodings and Languages thread
> Sorry, but this is nonsense. We are *only* talking about CDATA here,
> that is 8 bit bytes. The XML parser simply pulls these out without
> interpretation (apart from escaped characters) and gives them to you.
> (Well, at least that's what my XML parser does!). I am free to
> interpret those bytes in any way I choose. The encoding is only relevant
> to the rendering software, it has nothing to do with the parser at
> all. Implementing it is essentially trivial.
Take a look at http://www.w3.org/TR/1998/REC-xml-19980210#sec-guessing
It sure doesn't sound or look trivial, and I couldn't expect any XML
parser to attempt that at the start of every <![CDATA[ block. Parsers
might also have a problem finding the end of the CDATA block, depending on
how the bits are laid out (a non-8-bit encoding might accidentally
be bit-identical to the 8-bit ]]>, which would be disastrous).
As others have mentioned, mixing character encodings is a path that hasn't
fared well in other places and is likely a bad idea here.
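
To make the CDATA worry concrete, here is a minimal Python sketch. The two
CJK characters and the helper name naive_find_cdata_end are my own picks for
illustration, not anything from a real parser. It shows that text encoded as
big-endian UTF-16 can contain exactly the bytes of an 8-bit ]]>, so a
byte-level scan that ignores the encoding would end the block early.

```python
# Two CJK characters chosen for illustration (hypothetical payload):
# U+5D5D followed by U+3E41.
text = "\u5d5d\u3e41"

utf16 = text.encode("utf-16-be")  # bytes 5D 5D 3E 41
utf8 = text.encode("utf-8")       # bytes E5 B5 9D E3 B9 81

CDATA_END = b"]]>"                # bytes 5D 5D 3E


def naive_find_cdata_end(raw: bytes) -> int:
    """Byte-level search that ignores the encoding of the content."""
    return raw.find(CDATA_END)


print(utf16)                        # the raw UTF-16 bytes spell "]]>A"
print(naive_find_cdata_end(utf16))  # false match at offset 0
print(naive_find_cdata_end(utf8))   # -1: no false match in UTF-8
```

The same two characters in UTF-8 produce no such match, which is one reason
a single known encoding for the whole stream is easier on the parser.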
> This doesn't help at all. What I want to be able to do is to
> communicate with my friends in Korea in Korean, my friends in Japan in
> Japanese and use English here. I want to do this over a single message
> stream. Not everyone will have UTF-8 support on their machines.
Well, if they don't have UTF-8 they don't have XML, since all XML parsers
are required to understand it.
What I'm really wondering here is, this is just character encodings,
correct? I mean, all the same characters are still there in ANY
encoding (unless it's a severely restricted one), they are just encoded
differently. If it's all UTF-8 you can still do all of the Korean,
Japanese, English, etc. characters just fine, they are just encoded in
UTF-8. It's just a language and font display issue at that point. So
what is the problem with using the XML-forced UTF-8 encoding?
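
As a rough illustration of the point above, here is a small Python sketch.
The sample message is made up for this example. It puts Korean, Japanese,
and English in one string, round-trips it through UTF-8, and checks that
every byte of a non-ASCII character is 0x80 or higher, so none of them can
be mistaken for ASCII markup such as ]]>.

```python
# Hypothetical message mixing Korean, Japanese, and English.
message = "안녕하세요 / こんにちは / Hello"

encoded = message.encode("utf-8")
decoded = encoded.decode("utf-8")

# Collect the UTF-8 bytes of every non-ASCII character in the message.
non_ascii_bytes = [
    b
    for ch in message
    if ord(ch) > 0x7F
    for b in ch.encode("utf-8")
]

# In UTF-8, bytes of multi-byte characters never fall in the ASCII range.
all_high = all(b >= 0x80 for b in non_ascii_bytes)

print(decoded == message)   # the round trip preserves all three languages
print(all_high)             # no multi-byte character produces an ASCII byte
print(b"]]>" in encoded)    # the CDATA end marker does not appear by accident
```

Whether the receiving machine can display the glyphs is a separate matter of
fonts and rendering, not of the encoding on the wire.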
Jer