Generating Embeddings of Code Tokens using StarCoder #141

Open
code2graph opened this issue Sep 23, 2023 · 1 comment

@code2graph

I am exploring the possibility of using StarCoder to generate embeddings for code tokens and would like to know if this is feasible with the current implementation.

Questions:

  1. Is it possible to use StarCoder to generate embeddings of code tokens?
  2. If yes, how should we configure and use StarCoder to make it usable for generating embeddings of code tokens?

@loubnabnl
Contributor

Hi, you can take the last hidden layer of the model as embeddings. However, it might be better to use an encoder for embeddings: we have trained a BERT-like code model called StarEncoder, which you can try at https://huggingface.co/bigcode/starencoder
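For reference, here is a minimal sketch of the "last hidden layer" approach using the Hugging Face `transformers` library. It is an illustration under assumptions, not an official recipe: the helper names `mean_pool` and `embed_code` are made up for this example, the checkpoint id `bigcode/starcoder` is assumed to be accessible to you (it is gated on the Hub and very large), and mean pooling is just one common way to collapse per-token vectors into a single snippet vector.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is non-zero.

    Hypothetical helper, written in plain Python so it can be checked
    without loading a model.
    """
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed_code(code, model_name="bigcode/starcoder"):
    """Return (tokens, per-token embeddings, pooled snippet embedding).

    Hypothetical helper. Assumes `torch` and `transformers` are installed
    and that you have access to `model_name` on the Hugging Face Hub.
    """
    # Imported lazily so mean_pool stays usable without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # AutoModel loads the base transformer without the language-modeling
    # head, so its output exposes `last_hidden_state` directly.
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(code, return_tensors="pt")
    with torch.no_grad():
        outputs = model(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
        )

    # Shape (seq_len, hidden_size): one vector per input token.
    hidden = outputs.last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    token_vectors = hidden.tolist()
    mask = inputs["attention_mask"][0].tolist()
    return tokens, token_vectors, mean_pool(token_vectors, mask)


if __name__ == "__main__":
    tokens, vectors, pooled = embed_code("def add(a, b):\n    return a + b")
    print(len(tokens), len(vectors[0]), len(pooled))
```

Passing `model_name="bigcode/starencoder"` may work the same way, but I have not verified that here; check the model card for any tokenizer or special-token setup it expects. Also keep in mind that StarCoder is a causal (left-to-right) model, so each token's vector only reflects the tokens before it, which is part of why an encoder can be the better fit for embeddings.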
