From b31409cd7cd7d97ce586d291cb3e18ee7d8e2a89 Mon Sep 17 00:00:00 2001
From: CHEN Liang
Date: Sat, 14 Oct 2023 19:45:23 +0800
Subject: [PATCH 1/2] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index b595486..0f939a0 100644
--- a/README.md
+++ b/README.md
@@ -217,6 +217,7 @@ Related projects:
 4. Evaluating Object Hallucination in Large Vision-Language Models. _Yifan Li et al._ arXiv 2023. [[paper](https://arxiv.org/abs/2305.10355)]
 5. A Survey of Hallucination in Large Foundation Models. _Vipula Rawte et al._ arXiv 2023. [[paper](https://arxiv.org/abs/2309.05922)]
 6. Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models. _Yue Zhang et al._ arXiv 2023. [[paper](https://arxiv.org/abs/2309.01219)]
+7. Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators. _Liang Chen et al._ EMNLP 2023. [[paper](https://arxiv.org/abs/2310.07289)]
 
 ### Social science
 1. How ready are pre-trained abstractive models and LLMs for legal case judgement summarization. _Aniket Deroy et al._ arXiv 2023. [[paper](https://arxiv.org/abs/2306.01248)]
@@ -358,6 +359,7 @@ of VLMs |
 | Dialogue CoT [[paper](https://arxiv.org/abs/2305.11792)] [[GitHub](https://github.com/ruleGreen/Cue-CoT)] | In-depth dialogue | Specific downstream task | Helpfulness and acceptness of LLMs|
 | LAMM [[paper](https://arxiv.org/abs/2306.06687)] [[GitHub](https://github.com/OpenLAMM/LAMM)] | Multi-modal point clouds | Specific downstream task | Task-specific metrics|
 | GLUE-X [[paper](https://arxiv.org/abs/2211.08073)] [[GitHub](https://github.com/YangLinyi/GLUE-X)] | OOD robustness for NLU tasks | General language task | OOD robustness |
+| CONNER [[paper](https://arxiv.org/abs/2310.07289)] | Knowledge-oriented evaluation | Knowledge-intensive task | Intrinsic and extrinsic metrics |
 | KoLA [[paper](https://arxiv.org/abs/2306.09296)] | Knowledge-oriented evaluation | General language task | Self-contrast metrics |
 | AGIEval [[paper](https://arxiv.org/abs/2304.06364)] | Human-centered foundational models | General language task | General |
 | PromptBench [[paper](https://arxiv.org/abs/2306.04528)] [[GitHub](https://github.com/microsoft/promptbench)] | Adversarial prompt resilience | General language task | Adversarial robustness |

From 50ef9f294ce33d3979987ad78e7357f68eb0c5b1 Mon Sep 17 00:00:00 2001
From: CHEN Liang
Date: Sat, 14 Oct 2023 19:53:29 +0800
Subject: [PATCH 2/2] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0f939a0..b1bd3ee 100644
--- a/README.md
+++ b/README.md
@@ -359,7 +359,7 @@ of VLMs |
 | Dialogue CoT [[paper](https://arxiv.org/abs/2305.11792)] [[GitHub](https://github.com/ruleGreen/Cue-CoT)] | In-depth dialogue | Specific downstream task | Helpfulness and acceptness of LLMs|
 | LAMM [[paper](https://arxiv.org/abs/2306.06687)] [[GitHub](https://github.com/OpenLAMM/LAMM)] | Multi-modal point clouds | Specific downstream task | Task-specific metrics|
 | GLUE-X [[paper](https://arxiv.org/abs/2211.08073)] [[GitHub](https://github.com/YangLinyi/GLUE-X)] | OOD robustness for NLU tasks | General language task | OOD robustness |
-| CONNER [[paper](https://arxiv.org/abs/2310.07289)] | Knowledge-oriented evaluation | Knowledge-intensive task | Intrinsic and extrinsic metrics |
+| CONNER [[paper](https://arxiv.org/abs/2310.07289)][[GitHub](https://github.com/ChanLiang/CONNER)] | Knowledge-oriented evaluation | Knowledge-intensive task | Intrinsic and extrinsic metrics |
 | KoLA [[paper](https://arxiv.org/abs/2306.09296)] | Knowledge-oriented evaluation | General language task | Self-contrast metrics |
 | AGIEval [[paper](https://arxiv.org/abs/2304.06364)] | Human-centered foundational models | General language task | General |
 | PromptBench [[paper](https://arxiv.org/abs/2306.04528)] [[GitHub](https://github.com/microsoft/promptbench)] | Adversarial prompt resilience | General language task | Adversarial robustness |