I have created a few scripts to preprocess a text corpus of ~6 MB. To preserve the text formatting I need to iterate over each line and do some text manipulation on it. This produces PANIC: unprotected error in call to Lua API (not enough memory), so I decided to try tds.Hash to hold my corpus table.
Here is the code I am using:
text_arr = tokenize(text)
text_arr = tds.Hash(text_arr)
-- replace rare tokens with <unk>
-- text_arr is a {idx: {tokens arr}}
for l = 1, #text_arr do       -- iterating lines
  for t = 1, #text_arr[l] do  -- iterating tokens
    -- rare is an array of rare words
    for r = 1, #rare do
      if text_arr[l][t] == rare[r] then text_arr[l][t] = "<unk>" end
    end
  end
end
text_arr is a table of 2900 entries, and this triple-loop operation becomes really slow when using tds.Hash.
I am by no means a Lua expert, but am I doing something wrong?
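For what it's worth, the innermost loop makes this O(lines × tokens × |rare|), and every text_arr[l][t] access on a tds.Hash crosses the Lua/C boundary. A common way to cut both costs is to build the rare words into a set table once (O(1) membership test) and cache each line in a local before indexing tokens. A minimal sketch with plain Lua tables standing in for the real tokenized corpus (the sample data below is hypothetical):

```lua
-- Hypothetical sample data; in the question, text_arr comes from
-- tokenize(text) and rare is the precomputed list of rare words.
local rare = {"foo", "bar"}
local text_arr = {{"foo", "baz"}, {"qux", "bar"}}

-- Build a set once: word -> true, so membership is a single hash lookup
-- instead of a linear scan over `rare` for every token.
local rare_set = {}
for r = 1, #rare do
  rare_set[rare[r]] = true
end

for l = 1, #text_arr do
  local line = text_arr[l]       -- cache the line; one Hash access per line
  for t = 1, #line do
    if rare_set[line[t]] then
      line[t] = "<unk>"
    end
  end
end
```

This is a sketch under the assumption that rare is an ordinary Lua array of strings; the same set-lookup idea applies whether the lines themselves live in a plain table or a tds structure.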