[FEAT] Add LSA encoder #1121

Open
Vincent-Maladiere opened this issue Oct 21, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@Vincent-Maladiere
Member

Problem Description

Latent Semantic Analysis (LSA) consists of a TfidfVectorizer followed by Singular Value Decomposition (SVD). Scikit-learn mentions it in TruncatedSVD, and I wonder why it hasn't been implemented in scikit-learn in the first place, @GaelVaroquaux?

Feature Description

Create the LSAEncoder, a simple pipeline chaining TfidfVectorizer and TruncatedSVD (or a PCA, both support sparse matrices).
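
As a rough illustration of the proposal, here is a minimal sketch of such a pipeline using existing scikit-learn building blocks. The helper name `make_lsa_encoder`, its parameters, and the toy corpus are assumptions made up for this example; they are not an existing skrub or scikit-learn API, and the eventual encoder may well look different.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline


def make_lsa_encoder(n_components=2, random_state=0):
    """Hypothetical helper: TF-IDF followed by truncated SVD (i.e. LSA)."""
    return make_pipeline(
        # Turn raw strings into a sparse TF-IDF matrix.
        TfidfVectorizer(),
        # Project the sparse matrix onto a few dense components.
        TruncatedSVD(n_components=n_components, random_state=random_state),
    )


# Toy corpus, made up for illustration only.
docs = [
    "the cat sat on the mat",
    "a dog chased the cat",
    "stock markets fell sharply",
    "investors sold stock today",
]

encoder = make_lsa_encoder(n_components=2)
embeddings = encoder.fit_transform(docs)
print(embeddings.shape)  # (4, 2): one dense 2-d vector per input string
```

The default settings of both estimators are used here for brevity; choices such as the tokenizer, n-gram range, and number of components would still need to be decided for a real encoder.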

Alternative Solutions

No response

Additional Context

No response

@Vincent-Maladiere Vincent-Maladiere added the enhancement New feature or request label Oct 21, 2024
@GaelVaroquaux
Member

Great!!

We need to think about a name. I think LSA is a rather technical name that might not ring a bell for non-technical users.

We brainstormed a bit about names with @jeromedockes and @rcap107. The name StringEncoder came to mind. It would be close to TextEncoder (#1077), but we feel that the difference is somewhat understandable.

That said, maybe it would be an argument to move the name TextEncoder to SentenceEncoder, which would also be (maybe) a good name because it would be more explicit (link to "SentenceTransformer")

@Vincent-Maladiere
Member Author

Very interesting! One might wonder why we don't consider the GapEncoder as a string encoder, though. WDYT?

@GaelVaroquaux
Member

GaelVaroquaux commented Oct 21, 2024 via email

@Vincent-Maladiere
Member Author

Okay, this sounds easy to explain in the doc!

@Vincent-Maladiere
Member Author

> Scikit-learn mentions LSA in TruncatedSVD, and I wonder why it hasn't been implemented in scikit-learn in the first place

Any thoughts @GaelVaroquaux? I'm curious

@GaelVaroquaux
Member

GaelVaroquaux commented Oct 21, 2024 via email

2 participants