Incorrect text encoding with feeds containing "euc-kr" text encoding #39

Open
sylverb opened this issue Nov 16, 2011 · 1 comment

sylverb commented Nov 16, 2011

Hello,
I had problems with feeds in certain text encodings when the HTTP headers do not indicate the encoding (for example: http://www.torrentrg.com/bbs/rss.php?bo_table=torrent_variety ).
To fix this, I decided to read the encoding from the XML declaration when the HTTP headers don't provide it:

<?xml version="1.0" encoding="euc-kr"?>

If you want to check this, here is my code, in MWFeedParser.m, in - (void)startParsingData:(NSData *)data textEncodingName:(NSString *)textEncodingName:

        [...]
        // Not UTF-8 so convert
        MWLog(@"MWFeedParser: XML document was not UTF-8 so we're converting it");
        NSString *string = nil;

        // Attempt to detect encoding from response header
        NSStringEncoding nsEncoding = 0;
        [...]

becomes:

        [...]
        // Not UTF-8 so convert
        MWLog(@"MWFeedParser: XML document was not UTF-8 so we're converting it");
        NSString *string = nil;

        // If no text encoding indication was in the response header
        // then try to get encoding from the XML declaration
        if (textEncodingName == nil) {
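            // Look only at the first bytes, and never read past the end of a short document
            NSData* xmlEncodingData = [NSData dataWithBytesNoCopy:(void *)[data bytes]
                                                           length:MIN((NSUInteger)100, [data length])
                                                     freeWhenDone:NO];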
            NSString* xmlEncodingString = [[NSString alloc] initWithData:xmlEncodingData encoding:NSUTF8StringEncoding];
            if (!xmlEncodingString) xmlEncodingString = [[NSString alloc] initWithData:xmlEncodingData encoding:NSISOLatin1StringEncoding];
            if (!xmlEncodingString) xmlEncodingString = [[NSString alloc] initWithData:xmlEncodingData encoding:NSMacOSRomanStringEncoding];

            if ([xmlEncodingString hasPrefix:@"<?xml"]) {
                NSRange a = [xmlEncodingString rangeOfString:@"?>"];
                if (a.location != NSNotFound) {
                    NSString *xmlDec = [xmlEncodingString substringToIndex:a.location];
                    NSRange b = [xmlDec rangeOfString:@"encoding=\""];
                    if (b.location != NSNotFound) {
                        NSUInteger s = b.location+b.length;
                        NSRange c = [xmlDec rangeOfString:@"\"" options:0 range:NSMakeRange(s, [xmlDec length] - s)];
                        if (c.location != NSNotFound) {
                            // s is the index just after encoding=", c.location is the closing quote
                            textEncodingName = [xmlDec substringWithRange:NSMakeRange(s, c.location - s)];
                        }
                    }
                }
            }
            [xmlEncodingString release];
        }

        // Attempt to detect encoding from response header or XML declaration
        NSStringEncoding nsEncoding = 0;
        [...]
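
For anyone who would rather not decode the first bytes into an NSString at all, the same lookup can be done directly on the raw bytes. The following is only a sketch in plain C (which also compiles inside a .m file), not the code used in MWFeedParser. The function name xml_declared_encoding is hypothetical, and the 100-byte window is an assumption carried over from the code above. It accepts single or double quotes, but it does not handle whitespace around the = sign, a leading byte order mark, or UTF-16 documents.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of MWFeedParser): copies the value of the
   encoding pseudo-attribute of an XML declaration into out.
   Returns 1 on success, 0 if no declaration or no encoding is found. */
static int xml_declared_encoding(const unsigned char *bytes, size_t len,
                                 char *out, size_t out_size)
{
    static const char prefix[] = "<?xml";
    static const char key[] = "encoding=";
    const size_t prefix_len = sizeof prefix - 1;
    const size_t key_len = sizeof key - 1;
    /* Inspect at most 100 bytes, and never more than the buffer holds. */
    size_t limit = len < 100 ? len : 100;
    size_t end, i;

    if (out_size == 0 || limit < prefix_len ||
        memcmp(bytes, prefix, prefix_len) != 0)
        return 0;

    /* Find the "?>" that closes the declaration. */
    for (end = prefix_len; end + 1 < limit; end++)
        if (bytes[end] == '?' && bytes[end + 1] == '>')
            break;
    if (end + 1 >= limit)
        return 0;

    /* Look for encoding= followed by a quoted value inside the declaration. */
    for (i = prefix_len; i + key_len + 1 < end; i++) {
        if (memcmp(bytes + i, key, key_len) == 0) {
            unsigned char quote = bytes[i + key_len];
            size_t start = i + key_len + 1, n = 0;
            if (quote != '"' && quote != '\'')
                return 0;
            while (start + n < end && bytes[start + n] != quote)
                n++;
            /* Reject a missing closing quote, an empty value, or a value
               that does not fit in out together with its terminator. */
            if (start + n >= end || n == 0 || n >= out_size)
                return 0;
            memcpy(out, bytes + start, n);
            out[n] = '\0';
            return 1;
        }
    }
    return 0;
}
```

In the method above this could be called with [data bytes] and [data length], and the resulting C string wrapped in an NSString before it is assigned to textEncodingName.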

dodyw commented Apr 12, 2012

Thanks! The above code fixes the problem.
