Proj CDeepFuzz Paper Reading: An Extensive Study on Pre-trained Models for Program Understanding and Generation

Published 2023-08-29 20:20:29 · Author: 雪溯

Abstract

1. Intro

2. Background

2.1 Program Understanding and Generation Tasks

2.2 NL-PL Pre-Trained Models

3. The Extensive Study

3.1 Subjects and Dataset

3.2 Research Questions

RQ1: How do pre-trained models perform on program understanding and generation tasks? For this RQ, we extensively evaluate and compare all the adopted pre-trained models on the CodeXGLUE benchmark [43]. (A minimal evaluation sketch follows this list.)
RQ2: How do pre-trained models perform against non-pre-trained models? For this RQ, we evaluate and compare the performance between pre-trained models and domain-specific non-pre-trained state-of-the-art (SOTA) models on multiple tasks.
RQ3: Are the pre-trained models robust? For this RQ, we apply semantics-preserving adversarial attack techniques to investigate the robustness of the studied models.
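To make RQ1/RQ2 concrete, below is a minimal sketch of the kind of evaluation pipeline involved: loading a released NL-PL checkpoint and scoring a code snippet on a CodeXGLUE-style defect-detection task. This is my own illustrative sketch, not the paper's scripts; it assumes the HuggingFace transformers library and the public microsoft/codebert-base checkpoint, and the classification head here is randomly initialized, so it would have to be fine-tuned on the Devign training split before the prediction means anything.

```python
# Minimal sketch (not the paper's scripts): scoring one code snippet with a
# pre-trained NL-PL model on a CodeXGLUE-style defect-detection task.
# Assumes: pip install torch transformers
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "microsoft/codebert-base"  # encoder-only NL-PL model studied in the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels=2: "defective" vs "non-defective"; this head is randomly initialized
# and must be fine-tuned on the Devign training set before its output is meaningful.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

code_snippet = "int divide(int a, int b) { return a / b; }"  # toy example, no zero check

inputs = tokenizer(code_snippet, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()  # 0 or 1 after fine-tuning
print("predicted label:", prediction)
```

The cross-model comparison in RQ1 then amounts to repeating this fine-tune-and-evaluate loop with each studied checkpoint and each CodeXGLUE task.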

3.3 Results and Analysis

Finding 1: The performance of all studied models can overall be replicated on their original benchmarks. However, subtle performance fluctuations can render some findings in the original papers untenable, e.g., PLBART fails to outperform CodeBERT on defect detection and clone detection.

Finding 2: Encoder-based pre-trained models can achieve similar or even superior performance compared with encoder-decoder-based models on specific program generation tasks.

Finding 3: No single pre-trained model dominates across tasks. One main reason is that the current SOTA encoder-decoder models have largely overlooked the fact that different training objectives (for program understanding and generation) can potentially compromise each other. We call for future research on more inclusive encoder-decoder models.

Finding 4: General-purpose pre-trained models can incur dramatic performance variance.

Finding 5: Pre-trained models could be more promising than non-pre-trained models and more research efforts should be devoted to this direction.

Finding 6: We demonstrate for the first time that NL-PL pre-trained models are not robust: they are highly vulnerable to semantics-preserving adversarial samples.
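For intuition, a semantics-preserving adversarial sample keeps the program's behavior identical while perturbing surface tokens (e.g., renaming identifiers) until the model's prediction flips. The pair below is my own illustrative example of this kind of perturbation, not one taken from the paper's attack set:

```python
# Original input fed to a defect-detection model:
def transfer(balance, amount):
    if amount <= balance:
        balance -= amount
    return balance

# Semantics-preserving adversarial variant: only identifiers are renamed,
# so runtime behavior is unchanged, yet a non-robust model may now predict
# a different label for what is effectively the same program.
def transfer(qx7_state, tmp_val):
    if tmp_val <= qx7_state:
        qx7_state -= tmp_val
    return qx7_state
```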

Finding 7: Current strategies for improving the robustness of pre-trained models, e.g., considering advanced learning strategies or additional code semantics, have limited effectiveness. We call for future research on more robust pre-trained models.

Finding 8: A simple random attack approach can already be rather powerful for attacking NL-PL pre-trained models.
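Finding 8 is easy to picture: a bare-bones random attack just keeps renaming randomly chosen identifiers until the victim model changes its prediction. The sketch below is a hedged illustration, not the paper's attack implementation; `predict` stands for any wrapper around a fine-tuned NL-PL model (e.g., the defect-detection pipeline sketched above), and the identifier extraction is deliberately simplistic.

```python
import random
import re
import string

def random_identifier(length=8):
    """Generate a fresh, plausible-looking variable name."""
    return "v_" + "".join(random.choices(string.ascii_lowercase, k=length))

def random_rename_attack(code, predict, max_queries=50):
    """Randomly rename identifiers until the model's prediction flips.

    `predict` is assumed to map a code string to a discrete label
    (e.g., 0 = non-defective, 1 = defective). Returns an adversarial
    variant if one is found within the query budget, else None.
    """
    original_label = predict(code)
    # Crude identifier extraction; a real attack would parse the code
    # properly and skip language keywords.
    identifiers = sorted(set(re.findall(r"\b[a-zA-Z_][a-zA-Z0-9_]*\b", code)))
    for _ in range(max_queries):
        target = random.choice(identifiers)
        candidate = re.sub(rf"\b{re.escape(target)}\b", random_identifier(), code)
        if predict(candidate) != original_label:
            return candidate  # semantics unchanged, prediction flipped
    return None
```

Even this unguided search flips predictions surprisingly often, which is the point of Finding 8; more sophisticated attacks mainly differ in how the transformation and the search are chosen, which connects to Finding 9.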

Finding 9: For NL-PL pre-trained models, the choice of transformation method tends to have a greater impact on attack performance than the choice of search method.

4 Implications and Discussions

5 Threats to Validity

7 Conclusions