XMLTextReader & encoded string

guest2013-1

guest
Hey,

So I've been struggling for a while (it's almost 4am...) and can't really get this to budge at all.

I have an XML file, ISO-8859-1 (which I ignore using XmlResolver = Nothing in code), where data like &quot; and &#9472; gets converted into the real Unicode characters automatically, due to the way .NET handles strings (UTF-16).

This clashes with the database, which is in Latin1 (and which I won't be able to convert to UTF-8... that would be easy and I would be sleeping now).

I've tried everything under the sun, and the best I can manage is converting these encoded characters back to "?"... now why the **** can't I just get the string that's in the damn XML!!!?

Frustrating to say the least. So if anyone can shed some light on this it would be awesome. Otherwise I'll just try and change **** on the database, see how much it breaks, and blame someone else for it when (if) it does.
 
@The_Assimilator, I posted sample content. I figured XmlTextReader works the same way for everyone, so posting code in a programming forum would be kind of redundant, since all I do is a simple little XmlTextReader.

@dequadin, yes I know you can set the encoding that way, but as soon as xmlReader.Value (which returns a string) hits, it decodes the &quot; and &#9472; anyway, because "String" in .NET is handled as UTF-16.

I also tried converting it to bytes and running Encoding.Convert on it, but the best I get is that it *ignores* the "already converted for me even though the ****ing data doesn't contain it explicitly" character, and you end up with "?" in your data.
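For anyone wondering where the "?" comes from: &#9472; is U+2500, which has no byte in Latin-1, so a lossy encoder swaps it for a replacement character. Below is a minimal sketch of that in Python (not .NET, just a stand-in to show the behaviour; the sample string is made up). Writing the unrepresentable characters back out as numeric references keeps the data instead.

```python
# Sketch only: a Python stand-in for the behaviour described above.
# "decoded" is a made-up sample of what an XML reader hands back once it
# has resolved the reference &#9472; (U+2500) inside Latin-1 text.
decoded = "caf\u00e9 \u2500"

# A "replace" fallback is lossy: U+2500 has no Latin-1 byte, so it becomes "?".
lossy = decoded.encode("latin-1", errors="replace")

# Re-emitting unrepresentable characters as numeric references keeps them,
# while characters that do exist in Latin-1 (the e-acute) stay single bytes.
preserved = decoded.encode("latin-1", errors="xmlcharrefreplace")

print(lossy)      # b'caf\xe9 ?'
print(preserved)  # b'caf\xe9 &#9472;'
```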

It would appear there's no way around this, because .NET insists on doing the conversion for you. I even wrote an HTML "Encoder" to replicate the values (unsuccessfully).

In the end I just changed the ****ing database to take UTF-8 values instead (which worked, but because it's not my call to change the database, it's the single point of failure in corporate bull**** policy that might just make all my work null & void).

Doing this in PHP I have no problem: it doesn't try to convert the ****ing string into anything and honors the Latin1 encoding on it. **** knows why .NET does this.
 
The answer was given on Stack Overflow. The data, even though it's Latin-1, ignores the "escape with &amp;" rule, so XmlTextReader essentially just decoded the references for me so it won't break when reading (aw, how nice of it). You'd need to read the text with a StreamReader, replace &amp; with & and then & with &amp;, write that to a MemoryStream and push that to the XmlTextReader.

Combining this with SecurityElement.Escape(), it should convert everything into nice, palatable HTML for you.
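The recipe above, roughly sketched in Python rather than .NET so it is short enough to paste. The function name and the sample document are made up for illustration; in .NET the same steps would go through StreamReader, String.Replace, MemoryStream and XmlTextReader.

```python
import xml.etree.ElementTree as ET


def parse_keeping_references(raw: bytes, encoding: str = "iso-8859-1") -> ET.Element:
    """Parse XML so character/entity references come back as literal text."""
    text = raw.decode(encoding)
    # Step 1: collapse ampersands that are already escaped, so they are
    # not escaped a second time in step 2.
    text = text.replace("&amp;", "&")
    # Step 2: escape every ampersand; "&#9472;" becomes "&amp;#9472;",
    # which the parser reads back as the literal text "&#9472;".
    text = text.replace("&", "&amp;")
    # Re-encode and parse from memory (the MemoryStream step).
    return ET.fromstring(text.encode(encoding))


# Hypothetical sample document, Latin-1 encoded.
raw = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
       b'<item>caf\xe9 &#9472; &quot;x&quot;</item>')

plain = ET.fromstring(raw).text               # references resolved by the parser
kept = parse_keeping_references(raw).text     # references left as written

print(kept)                                   # café &#9472; &quot;x&quot;
latin1_bytes = kept.encode("latin-1")         # now fits the Latin-1 column
```

Note that with this approach every reference stays literal, including &quot; and &lt;, so whatever reads the database later has to be happy with that.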
 