In sample EN_B59a of the fathom-form-autofill corpus, there are two base64 strings that already existed in the page when it was frozen. These strings contain whitespace. `fathom extract` uses a regular expression to extract base64 strings. That expression does not include whitespace as an allowed character, so it assumes the string has ended as soon as it encounters whitespace. @erikrose found a link that says whitespace is allowed in quoted base64 strings but is ignored when decoding. We should add support to `fathom extract` for this quoted case. We do not expect whitespace to appear in any of the strings created by `freeze-dry` (what we use to freeze/save samples), so this should only affect some of the pages that already had base64 strings in them. Because this problem rarely occurs and shouldn't prevent feature vectors from being created for a sample, this is low priority.
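
A rough sketch of what handling the quoted case could look like, not the actual `fathom extract` code. It assumes the base64 strings appear as quoted `data:` URLs. The pattern `QUOTED_BASE64_URL` and the helper `decode_quoted_base64` are hypothetical names made up for illustration. The idea is to allow whitespace inside the payload only when it is bounded by matching quotes, then strip that whitespace before decoding.

```python
import base64
import re

# Hypothetical pattern, NOT the regular expression fathom extract uses.
# Assumption: the base64 string is a data: URL wrapped in matching quotes.
QUOTED_BASE64_URL = re.compile(
    r"""(?P<quote>["'])"""                      # opening quote
    r"""data:(?P<mime>[-\w.+]+/[-\w.+]+);base64,"""
    r"""(?P<payload>[A-Za-z0-9+/=\s]+)"""       # base64 alphabet plus whitespace
    r"""(?P=quote)"""                           # same quote closes the string
)


def decode_quoted_base64(html):
    """Yield (mime_type, decoded_bytes) for each quoted base64 data URL."""
    for match in QUOTED_BASE64_URL.finditer(html):
        # Whitespace is ignored when decoding, so drop it first.
        payload = re.sub(r"\s+", "", match.group("payload"))
        yield match.group("mime"), base64.b64decode(payload)


# The payload is split across lines, as in the affected samples.
sample = '<img src="data:image/png;base64,aGVs\n   bG8=">'
results = list(decode_quoted_base64(sample))
```

Requiring the closing quote keeps the unquoted case unchanged: without quotes there is no reliable way to tell whitespace inside the string from whitespace that ends it.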