This repository has been archived by the owner on Nov 10, 2023. It is now read-only.
user_item and item_item recommender tables #12
Comments
Interesting.. Would every cell in one table need to be computed with all others? |
I don't think so, unless every user has interacted with every item. I've started with a function:

import polars as pl

def item_item_counts(dataf, user_col="user", item_col="item"):
    """
    Computes item-item overlap counts from user-item interactions, useful for recommendations.
    This function is meant to be used in a `.pipe()`-line.
    Arguments:
    - dataf: polars dataframe
    - user_col: name of the column containing the user id
    - item_col: name of the column containing the item id
    """
    return (dataf
        # pair every item with every item the same user interacted with
        .with_columns([
            pl.col(item_col).list().over(user_col).explode().alias("item_rec"),
        ])
        # drop the pairs that match an item with itself
        .filter(pl.col(item_col) != pl.col("item_rec"))
        .with_columns([
            pl.col(user_col).count().over(item_col).alias("n_item"),
            pl.col(user_col).count().over("item_rec").alias("n_item_rec"),
            pl.col(user_col).count().over([item_col, "item_rec"]).alias("n_both"),
        ])
        .select([item_col, "item_rec", "n_item", "n_item_rec", "n_both"])
        .drop_duplicates()
    )

Something is telling me these kinds of queries are gonna benchmark reaaaal well. |
Got it. It's something like this:

result = (df
    .pipe(remove_outliers)
    .with_column(
        pl.col('item').list().over('user').explode().alias("item_rec")
    )
    .filter(pl.col("item") != pl.col("item_rec"))
    .with_columns([
        pl.col('user').count().over('item').alias("n_item"),
        pl.col('user').count().over('item_rec').alias("n_item_rec"),
        pl.col('user').count().over(['item', 'item_rec']).alias("n_both")
    ])
)

(result
    .with_column((pl.col('n_both')/pl.col('n_item')).alias('rating'))
    .filter(pl.col('n_both') > 10)
    .sort(['item', 'rating'], reverse=True)) |
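For anyone who wants to check the arithmetic by hand, here is a minimal plain-Python sketch of the same idea: count how often two items co-occur per user and divide by how often the first item occurs. The function name `item_item_ratings` and the toy log are made up for illustration. Note one assumption that differs from the query above: this sketch counts distinct users per item, while the polars query counts rows after the explode, so the `n_item` values will not match one-to-one.

```python
from collections import defaultdict
from itertools import permutations


def item_item_ratings(interactions):
    """Map (item, item_rec) -> (n_item, n_both, rating) from (user, item) pairs.

    Hypothetical helper for illustration only, not part of the library.
    """
    # Collect the distinct items each user interacted with.
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)

    n_item = defaultdict(int)  # number of users that interacted with an item
    n_both = defaultdict(int)  # number of users that interacted with both items
    for items in items_by_user.values():
        for item in items:
            n_item[item] += 1
        # Ordered pairs, so (a, b) and (b, a) are counted separately
        # and an item is never paired with itself.
        for item, item_rec in permutations(sorted(items), 2):
            n_both[(item, item_rec)] += 1

    # rating estimates p(item_rec | item) as n_both / n_item.
    return {
        (item, item_rec): (n_item[item], count, count / n_item[item])
        for (item, item_rec), count in n_both.items()
    }


# Toy interaction log (made-up data).
log = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u2", "c"), ("u3", "a")]
ratings = item_item_ratings(log)
print(ratings[("a", "b")])  # n_item=3, n_both=2, rating=2/3
```

Sorting the pairs for a given item by `rating`, after dropping pairs with a low `n_both`, mirrors the `.filter(...)` and `.sort(...)` steps in the query above.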
@ritchie46 does polars support |
Merged
Given a log of weighted user-item interactions, can we generate an item-item recommendation table and a user-item recommendation table?

Kind of! We can calculate p(item_a | item_b) and p(item_a), which can be reweighed into a table with recommendations. We can also do something similar for users. After all, a user that interacted with items a, b and c will have a score for item x defined via;