Parser spends a lot of time in [NSString stringByRemovingNewLinesAndWhitespace] #50

Open
Colourclash opened this issue Mar 30, 2012 · 5 comments

Comments

@Colourclash

I have noticed a performance bottleneck. When parsing a large RSS feed such as the iTunes 300 new releases feed, the parser can spend 60% of the total parsing time in [NSString stringByRemovingNewLinesAndWhitespace].

The entire parsing operation (not including downloading of data from the server) can take 3900ms, and roughly 2200ms of that can be in stringByRemovingNewLinesAndWhitespace.

These figures were taken when testing a release build on an iPhone 3GS using the MWFeedParser sample code and the following RSS feed:

http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=300/rss.xml

[Note: is this stripping of white space needed? Could it be made an optional feature to save CPU time?]

@sylverb

sylverb commented Mar 30, 2012

Hello colourclash,
maybe you could do a performance comparison with my fork https://github.com/sylverb/MWFeedParser which uses a different XML parser (based on libxml2 and GDataXMLNode). But I don't know whether it's better or not in terms of performance ...

@Colourclash
Author

@sylverb

Hi, I checked out your fork and I noticed you are not calling [NSString stringByRemovingNewLinesAndWhitespace] at all, so it's not really a fair comparison, as my issue is related to the time spent stripping whitespace.

Just out of interest though, your code takes 900ms to parse the same feed as I used as a benchmark on the original MWFeedParser code, so yes, your code is faster. The NSXMLParser based code takes 1600ms if I remove the white space stripping code. Does your code offer the same functionality as the original MWFeedParser then?

Regards

@sylverb

sylverb commented Mar 30, 2012

Thanks for the test.
It should provide the same functionality (at least, I made my changes so as to keep the same interface and the same results). The main reason for my fork is that NSXMLParser is very strict and will reject any feed that is not 100% XML compliant. As my goal was to be compatible with as many RSS feeds as possible, it was important for me to make it less strict ...
Performance was not my main goal at all, but it's great if it's faster ...

@mwaterfall
Owner

The whitespace removal has two purposes. The first is to trim whitespace around fields such as dates, times and other values that need processing, which is important and has to be done. The second is simply to tidy up the pure text fields by removing extra spaces and new lines.

Thinking about it now, this could be optimised so that the more thorough whitespace cleanup only happens on the text fields, and other value fields (dates, links, etc) can have a faster routine that just trims whitespace from the beginning and end of the string. I will try and implement this optimisation when I next get around to updating the parser.
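
A minimal sketch of how that split could look, assuming a hypothetical category on NSString. The category name and both method names are made up for illustration and are not part of MWFeedParser, and neither routine has been benchmarked against the existing implementation:

```objc
#import <Foundation/Foundation.h>

// Hypothetical category: names are illustrative only, not MWFeedParser API.
@interface NSString (WhitespaceSketch)
- (NSString *)sketch_stringByTrimmingEnds;
- (NSString *)sketch_stringByCollapsingWhitespace;
@end

@implementation NSString (WhitespaceSketch)

// Fast path for value fields (dates, links, etc.):
// strips whitespace and newlines from the beginning and end only.
- (NSString *)sketch_stringByTrimmingEnds {
    return [self stringByTrimmingCharactersInSet:
            [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

// Thorough path for pure text fields: trims the ends and collapses every
// interior run of whitespace/newlines into a single space.
- (NSString *)sketch_stringByCollapsingWhitespace {
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSArray *parts = [self componentsSeparatedByCharactersInSet:ws];
    NSMutableArray *words = [NSMutableArray arrayWithCapacity:[parts count]];
    for (NSString *part in parts) {
        // Adjacent separators produce empty components; skip them.
        if ([part length] > 0) {
            [words addObject:part];
        }
    }
    return [words componentsJoinedByString:@" "];
}

@end
```

With this arrangement the parser would call the trimming method on value fields and reserve the collapsing method for titles, summaries and other text. Whether that recovers a meaningful share of the parsing time would still need to be measured on a device.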

@bluesuedesw

I optimized this function by replacing it with this one-line solution:

return [self stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
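
One thing to keep in mind with this replacement: stringByTrimmingCharactersInSet: only removes characters from the two ends of the string, so new lines and repeated spaces inside a text field are left as they are. A small sketch of that behaviour (the function name SketchTrim is hypothetical and used only for this illustration):

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper wrapping the one-line replacement above.
static NSString *SketchTrim(NSString *raw) {
    return [raw stringByTrimmingCharactersInSet:
            [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

// Example: leading and trailing whitespace is removed,
// but the interior "\n   " run is still present in the result.
static NSString *SketchTrimExample(void) {
    return SketchTrim(@"  New\n   Releases  ");
}
```

That is probably fine for dates and links, but it is a change in behaviour for titles and summaries compared with the original method.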
