
Commit: chore: update confs
actions-user committed Dec 31, 2024
1 parent 58a7c8a commit f0a31d2
Showing 1 changed file with 35 additions and 0 deletions: arxiv.json

@@ -37770,5 +37770,40 @@
"pub_date": "2024-12-27",
"summary": "Large Language Models (LLMs) can correct their self-generated responses, but a decline in accuracy after self-correction is also observed. To gain a deeper understanding of self-correction, we endeavor to decompose, evaluate, and analyze the self-correction behaviors of LLMs. By enumerating and analyzing answer correctness before and after self-correction, we decompose the self-correction capability into confidence (being confident in correct answers) and critique (turning wrong answers into correct ones) capabilities, and propose two metrics from a probabilistic perspective to measure these two capabilities, along with another metric for overall self-correction capability evaluation. Based on our decomposition and evaluation metrics, we conduct extensive experiments and draw some empirical conclusions. For example, we find that different models can exhibit distinct behaviors: some models are confident while others are more critical. We also find a trade-off between the two capabilities (i.e., improving one can lead to a decline in the other) when manipulating model self-correction behavior via prompts or in-context learning. Further, we find a simple yet effective strategy to improve self-correction capability by transforming the Supervised Fine-Tuning (SFT) data format, and our strategy outperforms vanilla SFT in both capabilities and achieves much higher accuracy after self-correction. Our code will be publicly available on GitHub.",
"translated": "大型语言模型(LLMs)能够对其自我生成的响应进行修正,但我们也观察到自我修正后准确率的下降。为了更深入地理解自我修正,我们致力于分解、评估和分析LLMs的自我修正行为。通过列举和分析自我修正前后答案的正确性,我们将自我修正能力分解为自信(对修正后的答案有信心)和批判(将错误答案转为正确)两种能力,并从概率角度提出了两个指标来衡量这两种能力,同时提出了另一个指标用于整体自我修正能力的评估。基于我们的分解和评估指标,我们进行了广泛的实验并得出了一些经验性结论。例如,我们发现不同的模型可能表现出不同的行为:一些模型自信,而另一些则更具批判性。我们还发现,在通过提示或上下文学习操纵模型自我修正行为时,这两种能力之间存在权衡(即提升一种能力可能导致另一种能力的下降)。此外,我们发现通过改变监督微调(SFT)数据格式,可以简单而有效地提升自我修正能力,我们的策略在两种能力上均优于普通SFT,并在自我修正后实现了更高的准确率。我们的代码将在GitHub上公开。"
},
{
"title": "Towards Identity-Aware Cross-Modal Retrieval: a Dataset and a Baseline",
"url": "http://arxiv.org/abs/2412.21009v1",
"pub_date": "2024-12-30",
"summary": "Recent advancements in deep learning have significantly enhanced content-based retrieval methods, notably through models like CLIP that map images and texts into a shared embedding space. However, these methods often struggle with domain-specific entities and long-tail concepts absent from their training data, particularly in identifying specific individuals. In this paper, we explore the task of identity-aware cross-modal retrieval, which aims to retrieve images of persons in specific contexts based on natural language queries. This task is critical in various scenarios, such as for searching and browsing personalized video collections or large audio-visual archives maintained by national broadcasters. We introduce a novel dataset, COCO Person FaceSwap (COCO-PFS), derived from the widely used COCO dataset and enriched with deepfake-generated faces from VGGFace2. This dataset addresses the lack of large-scale datasets needed for training and evaluating models for this task. Our experiments assess the performance of different CLIP variations repurposed for this task, including our architecture, Identity-aware CLIP (Id-CLIP), which achieves competitive retrieval performance through targeted fine-tuning. Our contributions lay the groundwork for more robust cross-modal retrieval systems capable of recognizing long-tail identities and contextual nuances. Data and code are available at https://github.com/mesnico/IdCLIP.",
"translated": "深度学习的最新进展显著增强了基于内容的检索方法,尤其是通过像CLIP这样的模型,将图像和文本映射到共享的嵌入空间中。然而,这些方法在处理训练数据中未包含的特定领域实体和长尾概念时往往表现不佳,特别是在识别特定个体方面。本文探讨了身份感知的跨模态检索任务,该任务旨在根据自然语言查询检索特定情境下的人物图像。这一任务在多种场景中至关重要,例如搜索和浏览个性化视频集或由国家广播机构维护的大型音视频档案。我们引入了一个新的数据集,COCO Person FaceSwap (COCO-PFS),该数据集源自广泛使用的COCO数据集,并通过VGGFace2生成的深度伪造人脸进行了丰富。这一数据集解决了训练和评估该任务模型所需的大规模数据集的缺乏问题。我们的实验评估了针对该任务重新设计的各种CLIP变体的性能,包括我们的架构——身份感知CLIP (Id-CLIP),该架构通过有针对性的微调实现了具有竞争力的检索性能。我们的贡献为能够识别长尾身份和上下文细微差别的更强大的跨模态检索系统奠定了基础。数据和代码可在https://github.com/mesnico/IdCLIP获取。"
},
{
"title": "Rise of Generative Artificial Intelligence in Science",
"url": "http://arxiv.org/abs/2412.20960v1",
"pub_date": "2024-12-30",
"summary": "Generative Artificial Intelligence (GenAI, generative AI) has rapidly become available as a tool in scientific research. To explore the use of generative AI in science, we conduct an empirical analysis using OpenAlex. Analyzing GenAI publications and other AI publications from 2017 to 2023, we profile growth patterns, the diffusion of GenAI publications across fields of study, and the geographical spread of scientific research on generative AI. We also investigate team size and international collaborations to explore whether GenAI, as an emerging scientific research area, shows different collaboration patterns compared to other AI technologies. The results indicate that generative AI has experienced rapid growth and increasing presence in scientific publications. The use of GenAI now extends beyond computer science to other scientific research domains. Over the study period, U.S. researchers contributed nearly two-fifths of global GenAI publications. The U.S. is followed by China, with several small and medium-sized advanced economies demonstrating relatively high levels of GenAI deployment in their research publications. Although scientific research overall is becoming increasingly specialized and collaborative, our results suggest that GenAI research groups tend to have slightly smaller team sizes than found in other AI fields. Furthermore, notwithstanding recent geopolitical tensions, GenAI research continues to exhibit levels of international collaboration comparable to other AI technologies.",
"translated": "生成式人工智能(Generative Artificial Intelligence,简称GenAI或生成式AI)已迅速成为科学研究中的一种工具。为了探索生成式AI在科学领域的应用,我们利用OpenAlex进行了实证分析。通过分析2017年至2023年间生成式AI相关出版物及其他AI出版物,我们描绘了生成式AI的增长模式、其在不同研究领域的扩散情况以及生成式AI科学研究的地理分布。我们还研究了团队规模和国际合作,以探讨生成式AI作为一个新兴科学研究领域,是否与其他AI技术相比表现出不同的合作模式。研究结果表明,生成式AI在科学出版物中经历了快速增长,并逐渐占据重要地位。生成式AI的应用现已超越计算机科学,扩展到其他科学研究领域。在研究期间,美国研究人员贡献了全球近五分之二的生成式AI出版物。紧随其后的是中国,此外,多个中小型发达经济体在其研究出版物中也展示了相对较高的生成式AI应用水平。尽管整体科学研究正变得越来越专业化和协作化,但我们的研究结果表明,生成式AI研究团队的规模往往略小于其他AI领域。此外,尽管近期地缘政治局势紧张,生成式AI研究仍保持了与其他AI技术相当的国际合作水平。"
},
{
"title": "Ontology-grounded Automatic Knowledge Graph Construction by LLM under Wikidata schema",
"url": "http://arxiv.org/abs/2412.20942v1",
"pub_date": "2024-12-30",
"summary": "We propose an ontology-grounded approach to Knowledge Graph (KG) construction using Large Language Models (LLMs) on a knowledge base. An ontology is authored by generating Competency Questions (CQs) on the knowledge base to discover the knowledge scope, extracting relations from the CQs, and attempting to replace equivalent relations with their counterparts in Wikidata. To ensure consistency and interpretability in the resulting KG, we ground the generation of the KG with the authored ontology based on the extracted relations. Evaluation on benchmark datasets demonstrates competitive performance on the knowledge graph construction task. Our work presents a promising direction for a scalable KG construction pipeline with minimal human intervention that yields high-quality, human-interpretable KGs, which are interoperable with Wikidata semantics for potential knowledge base expansion.",
"translated": "我们提出了一种基于本体的知识图谱(KG)构建方法,该方法利用大型语言模型(LLMs)在知识库上进行操作。本体的构建过程包括:通过生成关于知识库的能力问题(CQ)来发现知识范围,从CQ中提取关系,并尝试用Wikidata中的对应关系替换等效关系。为了确保生成的知识图谱的一致性和可解释性,我们基于提取的关系,使用构建的本体来指导知识图谱的生成。在基准数据集上的评估表明,该方法在知识图谱构建任务中表现出色。我们的工作展示了一个有前景的方向,即通过最少的人工干预实现可扩展的知识图谱构建流程,从而生成高质量且人类可解释的知识图谱,这些图谱能够与Wikidata的语义互操作,为潜在的知识库扩展提供支持。"
},
{
"title": "Unsupervised dense retrieval with counterfactual contrastive learning",
"url": "http://arxiv.org/abs/2412.20756v1",
"pub_date": "2024-12-30",
"summary": "Efficiently retrieving a concise set of candidates from a large document corpus remains a pivotal challenge in Information Retrieval (IR). Neural retrieval models, particularly dense retrieval models built with transformers and pretrained language models, have been popular due to their superior performance. However, criticisms have also been raised on their lack of explainability and vulnerability to adversarial attacks. In response to these challenges, we propose to improve the robustness of dense retrieval models by enhancing their sensitivity to fine-grained relevance signals. A model achieving sensitivity in this context should exhibit high variances when documents' key passages determining their relevance to queries have been modified, while maintaining low variances for other changes in irrelevant passages. This sensitivity allows a dense retrieval model to produce robust results with respect to attacks that try to promote documents without actually increasing their relevance. It also makes it possible to analyze which part of a document is actually relevant to a query, and thus improves the explainability of the retrieval model. Motivated by causality and counterfactual analysis, we propose a series of counterfactual regularization methods based on game theory and unsupervised learning with counterfactual passages. Experiments show that our method can extract key passages without reliance on passage-level relevance annotations. Moreover, the regularized dense retrieval models exhibit heightened robustness against adversarial attacks, surpassing the state-of-the-art anti-attack methods.",
"translated": "从大规模文档语料库中高效检索出一组简洁的候选集仍然是信息检索(IR)领域的一个关键挑战。基于神经网络的检索模型,特别是使用Transformer和预训练语言模型构建的密集检索模型,因其优越的性能而广受欢迎。然而,这些模型也因其缺乏可解释性和对对抗性攻击的脆弱性而受到批评。针对这些挑战,我们提出通过增强密集检索模型对细粒度相关性信号的敏感性来提高其鲁棒性。在这种背景下,一个具有敏感性的模型应在文档中决定其与查询相关性的关键段落被修改时表现出高方差,而对不相关段落的其他变化保持低方差。这种敏感性使得密集检索模型在面对试图提升文档排名但实际上并未增加其相关性的攻击时能够产生鲁棒的结果。同时,这也使得分析文档中哪些部分实际与查询相关成为可能,从而提高了检索模型的可解释性。基于因果性和反事实分析的动机,我们提出了一系列基于博弈论和反事实段落的无监督学习的反事实正则化方法。实验表明,我们的方法能够在不依赖段落级相关性标注的情况下提取关键段落。此外,经过正则化的密集检索模型在面对对抗性攻击时表现出更高的鲁棒性,超越了当前最先进的抗攻击方法。"
},
{
"title": "AmalREC: A Dataset for Relation Extraction and Classification Leveraging Amalgamation of Large Language Models",
"url": "http://arxiv.org/abs/2412.20427v1",
"pub_date": "2024-12-29",
"summary": "Existing datasets for relation classification and extraction often exhibit limitations such as restricted relation types and domain-specific biases. This work presents a generic framework to generate well-structured sentences from given tuples with the help of Large Language Models (LLMs). This study has focused on the following major questions: (i) how to generate sentences from relation tuples, (ii) how to compare and rank them, (iii) can we combine the strengths of individual methods and amalgamate them to generate an even better quality of sentences, and (iv) how to evaluate the final dataset? For the first question, we employ a multifaceted 5-stage pipeline approach, leveraging LLMs in conjunction with template-guided generation. We introduce the Sentence Evaluation Index (SEI), which prioritizes factors like grammatical correctness, fluency, human-aligned sentiment, accuracy, and complexity to answer the first part of the second question. To answer the second part of the second question, this work introduces a SEI-Ranker module that leverages SEI to select top candidate generations. The top sentences are then strategically amalgamated to produce the final, high-quality sentence. Finally, we evaluate our dataset on LLM-based and SOTA baselines for relation classification. The proposed dataset features 255 relation types, with 15K sentences in the test set and around 150K in the train set, significantly enhancing relational diversity and complexity. This work not only presents a new comprehensive benchmark dataset for the RE/RC task, but also compares different LLMs for the generation of quality sentences from relational tuples.",
"translated": "现有的关系分类和抽取数据集通常存在一些局限性,例如关系类型受限和领域特定的偏差。本研究提出了一个通用框架,借助大语言模型(LLMs)从给定的元组生成结构良好的句子。本研究重点关注以下几个主要问题:(i)如何从关系元组生成句子,(ii)如何对生成的句子进行比较和排序,(iii)我们能否结合不同方法的优势并融合它们以生成更高质量的句子,以及(iv)如何评估最终的数据集?针对第一个问题,我们采用了一个多方面的五阶段流水线方法,结合LLMs和模板引导生成。我们引入了句子评估指数(Sentence Evaluation Index, SEI),该指数优先考虑语法正确性、流畅性、与人类情感的一致性、准确性和复杂性等因素,以回答第二个问题的第一部分。针对第二个问题的第二部分,本研究提出了一个SEI-Ranker模块,利用SEI选择生成的最佳候选句子。然后,通过策略性地融合这些最佳句子,生成最终的、高质量的句子。最后,我们在基于LLM和SOTA(State-of-the-Art)基线模型上评估了我们的关系分类数据集。所提出的数据集包含255种关系类型,测试集中有15K个句子,训练集中约有150K个句子,显著增强了关系的多样性和复杂性。本研究不仅为关系抽取/关系分类(RE/RC)任务提供了一个新的综合基准数据集,还比较了不同LLMs从关系元组生成高质量句子的能力。"
}
]
