Dataset #140

Open
happy-liuzhixuan opened this issue Dec 2, 2024 · 6 comments

Comments

@happy-liuzhixuan

Does the training data need only images, or does it need images paired with corresponding text descriptions?

@happy-liuzhixuan
Author

Looking forward to your reply. Thank you very much.

@0x3f3f3f3fun
Collaborator

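Hello. Either works. If your image data has no text descriptions, you can set the text to an empty string.

A model trained with text descriptions is more controllable, and the text can improve restoration accuracy when the image is heavily degraded.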
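
A minimal sketch of the empty-text fallback described above. The function name `build_samples`, the `image_path`/`prompt` keys, and the convention of keying captions by file name are all assumptions made for illustration; they are not taken from the DiffBIR code, so adapt them to whatever format your training config expects.

```python
from pathlib import Path


def build_samples(image_paths, captions=None):
    """Pair each image with its caption, or with "" when none is available."""
    # Hypothetical convention: captions is a dict keyed by image file name.
    captions = captions or {}
    samples = []
    for path in image_paths:
        name = Path(path).name
        # Images without a caption get an empty prompt.
        prompt = captions.get(name, "").strip()
        samples.append({"image_path": str(path), "prompt": prompt})
    return samples


samples = build_samples(
    ["data/0001.png", "data/0002.png"],
    {"0001.png": "a red brick house beside a lake"},
)
```

With this layout, captioned and uncaptioned images can be mixed in one dataset, and captions can be filled in later without changing the image list.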

@happy-liuzhixuan
Author

If I want to add text descriptions to my own dataset, what can I use as a reference for creating them? Thank you for your reply.

@0x3f3f3f3fun
Collaborator

I have tried the following three:

  1. BLIP-2: https://github.com/salesforce/LAVIS/tree/main/projects/blip2
  2. LLaVA: https://github.com/haotian-liu/LLaVA
  3. Recognize Anything (RAM): https://github.com/xinyu1205/recognize-anything

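I don't recommend BLIP-2: it uses a lot of resources and its accuracy is low. LLaVA also uses a lot of resources, but it is accurate, its descriptions are richer, and it can generate long or short text descriptions depending on the question you ask. RAM uses few resources and can generate comma-separated text descriptions; I haven't tested its accuracy yet.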

For code showing how to use LLaVA and RAM, see: https://github.com/XPixelGroup/DiffBIR/blob/main/diffbir/utils/caption.py

This task is called image captioning, and there should be many methods that can do it. Neither BLIP-2 nor LLaVA is the latest method anymore.
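
If you use a tagging model that returns a comma-separated string, as described for RAM above, a small cleanup step can help before the text is stored as a caption. The following is only a sketch: `tags_to_caption` and its `max_tags` parameter are hypothetical names, and the input format is an assumption rather than the exact output of RAM or of `caption.py`.

```python
def tags_to_caption(raw_tags, max_tags=None):
    """Normalize a comma-separated tag string into a clean caption."""
    seen = set()
    tags = []
    for tag in raw_tags.split(","):
        tag = tag.strip().lower()
        # Skip empty entries and duplicates, keeping first-seen order.
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    if max_tags is not None:
        # Optionally cap the number of tags to keep captions short.
        tags = tags[:max_tags]
    return ", ".join(tags)


caption = tags_to_caption("building, Lake , tree,,building, sky")
```

An image for which the model returns no tags ends up with an empty caption.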

@happy-liuzhixuan
Author

Thank you for your reply. I will try your suggestions and hope they improve the results.

@happy-liuzhixuan
Author

Is a dataset input size of 512×512 required? Can I use 256×256 or another size instead, and would that affect performance?
