Code inconsistency issues #15

Open
GooLiang opened this issue Apr 26, 2024 · 4 comments

Comments

@GooLiang

In generate_pyg_data.py, line 10 reads:
from utils import knowledge_augmentation, compute_pca_with_whitening, bert_whitening
However, compute_pca_with_whitening and bert_whitening do not exist in utils.py.

Thanks for your help!

@CurryTang
Owner

Hi, these lines are not relevant to the topic discussed in the paper, so you can safely delete them. They relate to the strategies described in https://arxiv.org/abs/2103.15316.
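
Deleting the two unused names from the import line is the simplest fix. For anyone who would rather leave the file untouched, a guarded import is one possible alternative. The sketch below is only an illustration: the helper name `import_available` is hypothetical and not part of this repository, and the demo runs against the standard-library `math` module so that it works without the project checked out.

```python
import importlib


def import_available(module_name, names):
    """Return {name: attribute or None} for each requested name.

    Names the module does not define map to None instead of raising
    ImportError, so a stale import line cannot crash the script.
    (Hypothetical helper, not part of the repository.)
    """
    module = importlib.import_module(module_name)
    return {name: getattr(module, name, None) for name in names}


# Demonstrated on `math` so the snippet runs anywhere; `no_such_function`
# stands in for a helper that is no longer defined in the module.
found = import_available("math", ["sqrt", "no_such_function"])
missing = sorted(name for name, obj in found.items() if obj is None)
print(missing)
```

In generate_pyg_data.py the equivalent call would presumably be `import_available("utils", ["knowledge_augmentation", "compute_pca_with_whitening", "bert_whitening"])`, assuming utils.py is importable from the working directory. Any code path that actually calls a missing helper would still need to be removed or skipped.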

@GooLiang
Author

Thanks for your quick reply.
I have some other questions:

  1. How is cora_fixed_sbert(ada, google, …).pt generated? Is "data.raw_text" the original text attribute from Cora, or the text attribute after LLM processing?
  2. The know_inp(sep)_sb.pt files referenced in lmfinetune.py are missing. How can I generate them?
  3. I could not find the implementation of the iterative structure in the project.

Best wishes!

@CurryTang
Owner

  1. These embeddings are generated with the APIs provided by OpenAI and Google; you can find the endpoints in api.py. Personally, I don't recommend regenerating them, since they perform poorly considering the price. 'raw_text' is the original text attribute.
  2. Check generate_pyg_data.py.
  3. We follow https://github.com/AndyJZhao/GLEM and wrote a data interface for it. Since the original codebase is too complicated, we did not integrate it into this repository. If you need it, I can upload it later.

@GooLiang
Author

  1. Regarding "perform poorly considering the price": do you mean that good performance is only achievable by pre-training with abundant computing resources? If so, how many resources would be needed?
  2. In generate_pyg_data.py, lines 180 and 189 ("know_inp_ft" and "know_sep_ft") do not seem to show how to obtain "_inp_finetune_XXX.emb". Am I missing something?
  3. Please upload them; that would be very helpful to me. Thanks a lot.

2 participants