Problem Description
I have a column where each entry is a variable-length list or set.
For example, I have a table like:
user id, purchased product id
and I group by user_id. I thus get, for each user, a list of all the products that user purchased. I want to extract features that describe that information.
One simple and potentially effective option is to use min-hash, but at the level of the entries in the list rather than character n-grams (i.e. in this case, each product id would be hashed and, for each hashing function, we would compute the min over all the product ids purchased by a user).
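To make the idea concrete, here is a minimal sketch of entry-level min-hash in plain Python. It is not how skrub implements anything: the names `entry_hash` and `minhash_entries`, the use of BLAKE2 as the seeded hash, and the sentinel returned for an empty list are all assumptions made for illustration.

```python
import hashlib

# Hypothetical sentinel for an empty list: the largest possible 64-bit hash.
EMPTY_VALUE = 2**64 - 1


def entry_hash(entry, seed):
    """Deterministic 64-bit hash of one list entry for a given seed."""
    data = f"{seed}:{entry}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_entries(entries, n_components=4):
    """Return one value per seed: the min hash over all entries of the list."""
    return [
        min((entry_hash(e, seed) for e in entries), default=EMPTY_VALUE)
        for seed in range(n_components)
    ]


# Two users who bought the same set of products get the same features,
# regardless of order or repeated purchases.
user_a = minhash_entries(["p1", "p7", "p3"])
user_b = minhash_entries(["p3", "p1", "p1", "p7"])
```

Because each component is a min over the set of entries, the result ignores order and duplicates, and two users share a given component with probability roughly equal to the Jaccard similarity of their product sets.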
Feature Description
It could be built into the MinHashEncoder: if the column values are Arrow or Python lists or arrays, the encoder would switch to that behavior. It could also be built into the AggJoiner somehow.
Alternative Solutions
No response
Additional Context
Came up during EuroSciPy 2024 discussions.
Probably the easiest approach is to have a Hash (no min) transformer that hashes the full entry with several seeds; the result can then be aggregated with the 'min' operation in the AggJoiner.
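A rough sketch of that two-step decomposition, using pandas on the long-format table (one row per purchase). The `hash_column` helper is hypothetical and stands in for the proposed Hash transformer, and a plain `groupby(...).min()` stands in for the 'min' aggregation that the AggJoiner would perform; neither is an existing skrub API.

```python
import hashlib

import pandas as pd


def hash_column(values, seed):
    """Hypothetical 'Hash' step: hash each full entry with one seed (32-bit)."""
    def _hash(value):
        data = f"{seed}:{value}".encode("utf-8")
        return int.from_bytes(hashlib.blake2b(data, digest_size=4).digest(), "big")
    return values.map(_hash)


# Long-format table: one row per (user, purchased product).
purchases = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "product_id": ["a", "b", "b", "a", "a"],
})

# Step 1: one hash column per seed, computed row by row (no min yet).
hashed = purchases[["user_id"]].copy()
for seed in range(3):
    hashed[f"hash_{seed}"] = hash_column(purchases["product_id"], seed)

# Step 2: aggregate with 'min' per user, which yields the min-hash features.
features = hashed.groupby("user_id").min()
```

Splitting the work this way keeps the hashing step stateless and row-wise, so the grouping logic does not have to know anything about hashing.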