-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize #at_css and #css initialization #14
base: master
Are you sure you want to change the base?
Conversation
21f99b6
to
a3257b6
Compare
a3257b6
to
44b899f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @zyc9012 - I like the look of that micro-benchmark!
Some preliminary tests with our scraper parsers shows an average improvement of about 3%, and a small improvement to memory consumption.
I'm not sure we really need these pthread_key_t
variables - they can be shared across objects, they can surely also be shared across threads provided they're protected by a mutex, which we have implicitly in the GIL. Can we drop all the pthread stuff and just have:
static lxb_html_parser_t *html_parser = NULL;
static lxb_css_parser_t *css_parser = NULL;
static lxb_selectors_t *selectors = NULL;
And allocate them on initialization or first use? Letting them leak seems less important if they're just a one-time process-wide allocation.
ext/nokolexbor/nl_node.c
Outdated
pthread_key_create(&p_key_css_parser, free_css_parser); | ||
pthread_key_create(&p_key_selectors, free_selectors); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions are fallible and can fail to allocate a key, which would lead to undefined behaviour in later calls to pthread_get/setspecific
.
ext/nokolexbor/nl_node.c
Outdated
css_parser = lxb_css_parser_create(); | ||
status = lxb_css_parser_init(css_parser, NULL, NULL); | ||
if (status != LXB_STATUS_OK) { | ||
css_parser = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless the error was specifically LXB_STATUS_ERROR_OBJECT_IS_NULL
this will leak the initial allocation plus any others that happened to succeed within the init call.
ext/nokolexbor/nl_node.c
Outdated
selectors = lxb_selectors_create(); | ||
status = lxb_selectors_init(selectors); | ||
if (status != LXB_STATUS_OK) { | ||
selectors = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, if the error was anything other than LXB_STATUS_ERROR_INCOMPLETE_OBJECT
, the initial allocation will leak, plus any of the dobject allocations that did succeed within the init function.
ext/nokolexbor/nl_node.c
Outdated
/* Destroy Selectors object. */ | ||
(void)lxb_selectors_destroy(selectors, true); | ||
|
||
/* Destroy resources for CSS Parser. */ | ||
(void)lxb_css_parser_destroy(parser, true); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be kept for the error handling cases, and also for the #ifndef HAVE_PTHREAD_H
case where the destructor call is not done for us at thread exit.
lxb_status_t status = lxb_html_parser_init(g_parser); | ||
if (html_parser == NULL) { | ||
html_parser = lxb_html_parser_create(); | ||
lxb_status_t status = lxb_html_parser_init(html_parser); | ||
if (status != LXB_STATUS_OK) { | ||
nl_raise_lexbor_error(status); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaks the initial allocation unless status was LXB_STATUS_ERROR_OBJECT_IS_NULL
. Otherwise it also leaks other allocations performed by lxb_html_parser_init
. The parser object also leaks in the non-HAVE_PTHREAD_H
case due to reliance on it calling the destructor.
Thanks @Freaky. I've removed all pthread stuff and made sure that it's thread-safe on truffleruby. |
Before
After