OpenAI Cookbook - debug

Published 2023-04-27 14:49:50 · Author: fxjwind

 

What should you do when GPT fails on a task?

Write a better prompt

Fine-tune

Or accept that the model simply can't do it - let it be

When GPT-3 fails on a task, what should you do?

  • Search for a better prompt that elicits more reliable answers?
  • Invest in thousands of examples to fine-tune a custom model?
  • Assume the model is incapable of the task, and move on?

There is no simple answer - it depends. However, if your task involves logical reasoning or complexity, consider trying the techniques in this article to build more reliable, high-performing prompts.

 

A person can't produce the result of a two-digit multiplication instantly, but can get there by writing out the intermediate steps - and the same holds for the model.

So don't be too quick to conclude that the model lacks the capability.

If you were asked to multiply 13 by 17, would the answer pop immediately into your mind? For most of us, probably not. Yet, that doesn't mean humans are incapable of two-digit multiplication. With a few seconds, and some pen and paper, it's not too taxing to work out that 13 x 17 = 130 + 70 + 21 = 221.

A simple bit of magic: Let's think step by step.

Experiments show this trick is remarkably effective:

On a benchmark of word math problems, the Let's think step by step trick raised GPT-3's solve rate massively, from a worthless 18% to a decent 79%!
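A minimal sketch of how the trick is applied in practice: the reasoning cue is simply appended to the prompt before it is sent to the model. The `build_cot_prompt` helper and the sample question below are illustrative, not from the cookbook; a real pipeline would pass the result to an LLM API call.

```python
# Zero-shot chain-of-thought: append a reasoning cue to the raw question.
STEP_BY_STEP_CUE = "Let's think step by step."

def build_cot_prompt(question: str) -> str:
    """Wrap a raw question with the zero-shot chain-of-thought cue."""
    return f"Q: {question}\nA: {STEP_BY_STEP_CUE}"

prompt = build_cot_prompt(
    "A juggler has 16 balls. Half of the balls are golf balls, and half "
    "of the golf balls are blue. How many blue golf balls are there?"
)
print(prompt)
```

The cue goes after the `A:` marker, so the model's completion continues the reasoning rather than jumping straight to an answer.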

More magic

The rest of this article shares techniques for improving reliability of large language models on complex tasks. Although some of the techniques are specific to certain types of problems, many of them are built upon general principles that can be applied to a wide range of tasks, e.g.:

  • Give clearer instructions
  • Split complex tasks into simpler subtasks
  • Structure the instruction to keep the model on task
  • Prompt the model to explain before answering
  • Ask for justifications of many possible answers, and then synthesize
  • Generate many outputs, and then use the model to pick the best one
  • Fine-tune custom models to maximize performance
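One of the principles above - generate many outputs, then pick the best one - can be sketched as a simple majority vote over sampled completions (the idea behind self-consistency). The hard-coded answers below stand in for multiple model samples drawn at temperature > 0.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common final answer among sampled completions."""
    return Counter(answers).most_common(1)[0][0]

# Pretend we sampled the model five times on the same math problem:
sampled = ["221", "221", "211", "221", "231"]
print(majority_vote(sampled))  # → 221
```

Voting only needs the final answers to be comparable; an alternative, also mentioned above, is to ask the model itself to judge which candidate output is best.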

Another example: the model needs you to teach it more patiently.

Trying this on GPT-3.5, it picks (b).

The fix is to break the task into steps - divide and conquer, a cure-all for humans and models alike.

Split into three steps, it gets the answer right.

In my own test on GPT-3.5, however, the answer is still wrong: step one correctly identifies clues 3 and 5 as relevant, but the inference in step two is still incorrect.

 

Zero-shot: ask directly, with no examples.

This method doesn't work for every problem. Intuitively, in scenarios where showing more work wouldn't help a human, it doesn't help the model either:

The authors found that it was most helpful for multi-step arithmetic problems, symbolic reasoning problems, strategy problems, and other reasoning problems.
It didn't help with simple math problems or common sense questions, and presumably wouldn't help with many other non-reasoning tasks either.

 

Few-shot: a small number of examples

The boxed part is the examples; this is called in-context learning.

CoT (chain of thought) means spelling out the reasoning steps - similar in spirit to the trick above.

The advantage of few-shot CoT is that you can teach the model more concretely, which helps more:

One advantage of the few-shot example-based approach relative to the Let's think step by step technique is that you can more easily specify the format, length, and style of reasoning that you want the model to perform before landing on its final answer. This can be particularly helpful in cases where the model isn't initially reasoning in the right way or depth.
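The point about controlling format, length, and style can be made concrete with a prompt builder: each worked example demonstrates exactly the reasoning pattern we want the model to imitate. The example problem and wording below are illustrative, not taken from the cookbook.

```python
# Few-shot chain-of-thought: each example shows the reasoning format we
# want the model to copy before giving its final answer.
EXAMPLES = [
    {
        "question": "Roger has 5 balls. He buys 2 cans of 3 balls each. "
                    "How many balls does he have now?",
        "reasoning": "Roger starts with 5 balls. 2 cans of 3 balls is "
                     "6 balls. 5 + 6 = 11.",
        "answer": "11",
    },
]

def build_few_shot_prompt(question: str) -> str:
    """Prepend worked examples, then leave the new question open-ended."""
    parts = [
        f"Q: {ex['question']}\nA: {ex['reasoning']} "
        f"The answer is {ex['answer']}."
        for ex in EXAMPLES
    ]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

print(build_few_shot_prompt(
    "If there are 3 cars and each car has 4 wheels, how many wheels "
    "are there in total?"
))
```

Because every example ends with the fixed phrase "The answer is ...", the final answer is also easy to parse out of the model's completion.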

The few-shot approach can also be used to have a large model generate a training set - quite intuitive:

In 2022, Eric Zelikman and Yuhuai Wu et al. published a clever procedure for using a few-shot prompt to generate a dataset of explanations that could be used to fine-tune a model.
The idea is to use a few-shot prompt to generate candidate explanations, and only keep the explanations that produce the correct answer.
Then, to get additional explanations for some of the incorrect answers, retry the few-shot prompt but with correct answers given as part of the question.
The authors called their procedure STaR (Self-taught Reasoner):
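The STaR loop described above can be sketched as a filter over generated explanations: keep the ones whose final answer is correct, and for problems the model got wrong, retry with the correct answer shown in the prompt (the "rationalization" retry). `generate_explanation` is a hypothetical stand-in for a few-shot-prompted model call, not a real API.

```python
# Sketch of the STaR (Self-taught Reasoner) filtering loop.
def generate_explanation(question, hint=None):
    """Stand-in for a few-shot-prompted LLM call returning
    (explanation, final_answer). With a hint (the correct answer shown
    in the prompt), we pretend the model always reaches that answer."""
    if hint is not None:
        return f"Reasoning toward {hint}.", hint
    return "Some candidate reasoning.", "wrong"

def star_dataset(problems):
    """problems: list of (question, gold_answer) pairs.
    Returns explanation examples whose final answer is correct."""
    kept = []
    for question, gold in problems:
        explanation, answer = generate_explanation(question)
        if answer != gold:
            # Rationalize: retry with the correct answer in the prompt.
            explanation, answer = generate_explanation(question, hint=gold)
        if answer == gold:
            kept.append({"question": question,
                         "explanation": explanation,
                         "answer": gold})
    return kept

data = star_dataset([("What is 13 x 17?", "221")])
print(len(data))  # → 1
```

The kept examples, question plus explanation plus answer, then become the fine-tuning dataset.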

Using a few-shot prompt to generate a fine-tuning dataset is an idea that can be taken further:

Using a few-shot prompt to extend or modify a fine-tuning dataset is an idea that can be generalized beyond explanation writing.
For example, if you have large quantities of unstructured text that you want to train on, you may find opportunities to use a prompt to extract a structured dataset from your unstructured text, and then fine-tune a custom model on that structured dataset.
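A minimal sketch of the extraction idea above, assuming the model is asked to return JSON: a prompt pulls structured fields out of free text, and each extracted record is reshaped into a fine-tuning example. The prompt template, field names, and the canned model reply inside `extract` are all illustrative assumptions.

```python
import json

EXTRACTION_PROMPT = (
    "Extract the person's name and role from the text below as JSON "
    'with keys "name" and "role".\n\nText: {text}\n\nJSON:'
)

def extract(text: str) -> dict:
    """Stand-in for sending EXTRACTION_PROMPT.format(text=text) to a
    model and parsing its JSON reply. Here we parse a canned reply."""
    canned_reply = '{"name": "Ada Lovelace", "role": "mathematician"}'
    return json.loads(canned_reply)

record = extract("Ada Lovelace was a mathematician who wrote what is "
                 "often called the first computer program.")
fine_tune_example = {
    "prompt": f"Who was {record['name']}?",
    "completion": f"{record['name']} was a {record['role']}.",
}
print(fine_tune_example["completion"])  # → Ada Lovelace was a mathematician.
```

In a real pipeline the model's reply should be validated (e.g. `json.loads` in a try/except, plus key checks) before a record is added to the dataset.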

 

The rest of the article lists many CoT extensions and a number of the latest prompting techniques; I'll skip those for now.