Thank you for the reply! I understand that this is the reason, but is there any workaround?
Or is this a limitation of Hyperscan, in the sense that you cannot get exact character offsets with UTF-8?
Try adding the `HS_FLAG_UTF8` flag:
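```python
import hyperscan

expressions = ("test.+",)
db = hyperscan.Database()
# Treat each pattern as UTF-8 so "." matches a whole code point, not a byte.
db.compile(
    expressions=[e.encode("utf-8") for e in expressions],
    flags=[hyperscan.HS_FLAG_UTF8],
)
```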
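Note that `HS_FLAG_UTF8` changes how the pattern is interpreted, but Hyperscan still reports match offsets as byte offsets into the scanned buffer. For input like `"test®"` (5 characters, 6 bytes), the end offset is therefore 6. One possible workaround is to convert the byte offset into a character offset on the Python side. This is a minimal sketch that assumes the scanned data is valid UTF-8 and that you already have the byte end offset from your match handler; the helper names are hypothetical and not part of python-hyperscan.

```python
from __future__ import annotations

from bisect import bisect_left


def byte_to_char_offset(data: bytes, byte_offset: int) -> int:
    """Convert a byte offset into UTF-8 `data` to a character (code point) offset.

    Raises UnicodeDecodeError if `byte_offset` splits a multi-byte character.
    """
    return len(data[:byte_offset].decode("utf-8"))


def char_start_positions(data: bytes) -> list[int]:
    """Return the byte positions at which a UTF-8 character begins.

    Continuation bytes have the bit pattern 10xxxxxx; every other byte
    starts a new character.
    """
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]


def byte_to_char_offset_fast(starts: list[int], byte_offset: int) -> int:
    """Count the characters that begin before `byte_offset` (O(log n) per lookup)."""
    return bisect_left(starts, byte_offset)


data = "test®".encode("utf-8")  # b'test\xc2\xae', 6 bytes
print(byte_to_char_offset(data, 6))  # 5
starts = char_start_positions(data)
print(byte_to_char_offset_fast(starts, 6))  # 5
```

The first helper is the simplest and fails loudly if the offset does not fall on a character boundary. The second precomputes character start positions once, which is cheaper when converting many match offsets from the same buffer, but it does not check for character boundaries.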
Hi,

First of all, thank you for this amazing library.
While playing around with it, I stumbled upon this issue.
When matching on strings containing characters that UTF-8 encodes as more than one byte, the end offset is wrong.
See for instance this example:
The highest end offset is `6`, but `len("test®")` is `5`. Is there any workaround for this? Am I misunderstanding something?
Thank you!
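The extra byte comes from the encoding: Python's `len()` on a `str` counts code points, whereas Hyperscan scans the encoded bytes and reports offsets in bytes. `®` (U+00AE) encodes in UTF-8 as the two bytes `0xC2 0xAE`, so the string is 5 characters but 6 bytes long. A minimal check using only the standard library:

```python
text = "test®"
data = text.encode("utf-8")

print(len(text))  # 5 code points
print(len(data))  # 6 bytes
print(data[4:])   # b'\xc2\xae', the two-byte UTF-8 encoding of U+00AE
```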