Add references #24

Open
wants to merge 2 commits into
base: main
2 changes: 2 additions & 0 deletions README.md
@@ -217,6 +217,7 @@ Related projects:
4. Evaluating Object Hallucination in Large Vision-Language Models. _Yifan Li et al._ arXiv 2023. [[paper](https://arxiv.org/abs/2305.10355)]
5. A Survey of Hallucination in Large Foundation Models. _Vipula Rawte et al._ arXiv 2023. [[paper](https://arxiv.org/abs/2309.05922)]
6. Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models. _Yue Zhang et al._ arXiv 2023. [[paper](https://arxiv.org/abs/2309.01219)]
7. Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators. _Liang Chen et al._ EMNLP 2023. [[paper](https://arxiv.org/abs/2310.07289)]

### Social science
1. How ready are pre-trained abstractive models and LLMs for legal case judgement summarization. _Aniket Deroy et al._ arXiv 2023. [[paper](https://arxiv.org/abs/2306.01248)]
@@ -358,6 +359,7 @@ of VLMs |
| Dialogue CoT [[paper](https://arxiv.org/abs/2305.11792)] [[GitHub](https://github.com/ruleGreen/Cue-CoT)] | In-depth dialogue | Specific downstream task | Helpfulness and acceptness of LLMs|
| LAMM [[paper](https://arxiv.org/abs/2306.06687)] [[GitHub](https://github.com/OpenLAMM/LAMM)] | Multi-modal point clouds | Specific downstream task | Task-specific metrics|
| GLUE-X [[paper](https://arxiv.org/abs/2211.08073)] [[GitHub](https://github.com/YangLinyi/GLUE-X)] | OOD robustness for NLU tasks | General language task | OOD robustness |
| CONNER [[paper](https://arxiv.org/abs/2310.07289)] [[GitHub](https://github.com/ChanLiang/CONNER)] | Knowledge-oriented evaluation | Knowledge-intensive task | Intrinsic and extrinsic metrics |
| KoLA [[paper](https://arxiv.org/abs/2306.09296)] | Knowledge-oriented evaluation | General language task | Self-contrast metrics |
| AGIEval [[paper](https://arxiv.org/abs/2304.06364)] | Human-centered foundational models | General language task | General |
| PromptBench [[paper](https://arxiv.org/abs/2306.04528)] [[GitHub](https://github.com/microsoft/promptbench)] | Adversarial prompt resilience | General language task | Adversarial robustness |