This repository has been archived by the owner on Feb 15, 2023. It is now read-only.
Commit
tags: Use a perfect hash for lookups
The previous version, which ran `strcasecmp` over an array, was a bottleneck in the library. This version uses a simple minimal perfect hash table (computed via `mph`) to convert tag-name strings into tag enum values.

Since we're now hashing tag names, we can pass in the length of the tag name explicitly and avoid the superfluous allocations that the tokenizer was performing in order to NUL-terminate the tag. This is implemented in the new `gumbo_tagn_enum` API. The old `gumbo_tag_enum` API has been left as a thin wrapper to keep backwards compatibility; it is not used internally by the library.

`mph` was chosen for the perfect hash function because it generates hashes that are slightly slower than GPerf's but significantly simpler, and that occupy an order of magnitude less memory (they don't need a full copy of all the strings in the set for hashing). If the tag lookup function proves to be a bottleneck again, this decision can be re-evaluated in the future.
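To make the idea concrete, here is a minimal sketch of a length-aware, case-insensitive tag lookup backed by a perfect hash. It is not the code from this commit and not what `mph` generates: the eight-tag set, the `Tag` enum, the hand-picked hash (first character + last character + length, modulo 16), and the `sketch_*` function names are all assumptions made for illustration. The hash is perfect over this small set but not minimal (16 slots for 8 keys), and the final comparison against the stored name is also an assumption about how a candidate slot would be confirmed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag enum for this sketch; not the library's real values. */
typedef enum {
  TAG_HTML, TAG_HEAD, TAG_BODY, TAG_DIV,
  TAG_SPAN, TAG_P, TAG_A, TAG_LI, TAG_UNKNOWN
} Tag;

#define TABLE_SIZE 16

typedef struct {
  const char *name; /* lowercase tag name, NULL for an empty slot */
  Tag tag;
} Slot;

/* Each entry sits at index sketch_hash(name). The hash was checked by hand
 * to be collision-free over these eight names (perfect, but not minimal). */
static const Slot kTable[TABLE_SIZE] = {
  [0]  = {"head", TAG_HEAD},
  [1]  = {"p",    TAG_P},
  [3]  = {"a",    TAG_A},
  [5]  = {"span", TAG_SPAN},
  [7]  = {"li",   TAG_LI},
  [8]  = {"html", TAG_HTML},
  [13] = {"div",  TAG_DIV},
  [15] = {"body", TAG_BODY},
};

/* Hand-picked hash: lowercased first char + lowercased last char + length.
 * Requires len > 0. */
static unsigned sketch_hash(const char *s, size_t len) {
  unsigned first = (unsigned)tolower((unsigned char)s[0]);
  unsigned last = (unsigned)tolower((unsigned char)s[len - 1]);
  return (first + last + (unsigned)len) % TABLE_SIZE;
}

/* Length-aware lookup: `name` does not need to be NUL-terminated, so the
 * caller can point straight into the input buffer without copying. */
Tag sketch_tagn_enum(const char *name, size_t len) {
  if (len == 0) return TAG_UNKNOWN;
  const Slot *slot = &kTable[sketch_hash(name, len)];
  if (slot->name == NULL || strlen(slot->name) != len) return TAG_UNKNOWN;
  /* One candidate only: confirm it with a case-insensitive comparison. */
  for (size_t i = 0; i < len; i++) {
    if (tolower((unsigned char)name[i]) != slot->name[i]) return TAG_UNKNOWN;
  }
  return slot->tag;
}

/* Thin wrapper for callers that already hold a NUL-terminated string. */
Tag sketch_tag_enum(const char *name) {
  assert(name != NULL);
  return sketch_tagn_enum(name, strlen(name));
}
```

A tokenizer holding the buffer `DIV class="x"` could call `sketch_tagn_enum(buf, 3)` directly, with no temporary copy; a string that hashes to an occupied slot but differs from the stored name (for example `hxxl`, which lands on the `html` slot) is rejected by the final comparison.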
Showing 4 changed files with 269 additions and 11 deletions.
@@ -0,0 +1,150 @@
html
head
title
base
link
meta
style
script
noscript
template
body
article
section
nav
aside
h1
h2
h3
h4
h5
h6
hgroup
header
footer
address
p
hr
pre
blockquote
ol
ul
li
dl
dt
dd
figure
figcaption
main
div
a
em
strong
small
s
cite
q
dfn
abbr
data
time
code
var
samp
kbd
sub
sup
i
b
u
mark
ruby
rt
rtc
rp
bdi
bdo
span
br
wbr
ins
del
image
img
iframe
embed
object
param
video
audio
source
track
canvas
map
area
math
mi
mo
mn
ms
mtext
mglyph
malignmark
annotation-xml
svg
foreignobject
desc
table
caption
colgroup
col
tbody
thead
tfoot
tr
td
th
form
fieldset
legend
label
input
button
select
datalist
optgroup
option
textarea
keygen
output
progress
meter
details
summary
menu
menuitem
applet
acronym
bgsound
dir
frame
frameset
noframes
isindex
listing
xmp
nextid
noembed
plaintext
rb
strike
basefont
big
blink
center
font
marquee
multicol
nobr
spacer
tt