Dataset formats of benchmarks

Published: 2024-01-02 11:53:25 | Author: Daze_Lu

Note: the datasets fall into two types: generative (the answer is natural language, with no fixed length or format) and selection (the answer is picked from a fixed set of options, such as A, B, C, or D).
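To make the two types concrete, here is a minimal Python sketch of how a record of each type might be represented and scored. The class names, field names, and sample questions are assumptions made up for illustration; they are not the actual schema of any benchmark listed below.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class GenerativeItem:
    """Generative type: the reference answer is free-form text."""
    question: str
    answer: str


@dataclass
class SelectionItem:
    """Selection type: the answer is an index into a fixed list of choices."""
    question: str
    choices: List[str]
    answer: int  # 0-based index of the correct choice


Item = Union[GenerativeItem, SelectionItem]


def kind(item: Item) -> str:
    """Return which of the two dataset types an item belongs to."""
    return "selection" if isinstance(item, SelectionItem) else "generative"


def is_correct(item: Item, prediction: str) -> bool:
    """Score a prediction: option letter for selection, exact match otherwise."""
    if isinstance(item, SelectionItem):
        letter = chr(ord("A") + item.answer)  # 0 -> "A", 1 -> "B", ...
        return prediction.strip().upper() == letter
    # Case-insensitive exact match is a simplification for free-form answers.
    return prediction.strip().lower() == item.answer.strip().lower()


# Made-up sample records, not taken from any real dataset.
gen = GenerativeItem(question="What is the capital of France?", answer="Paris")
sel = SelectionItem(question="2 + 2 = ?", choices=["3", "4", "5", "6"], answer=1)
```

Selection items can be scored by comparing a single option letter, while generative items usually need a looser comparison (exact match, F1, or unit tests for code); the exact-match rule above is only the simplest stand-in.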

mmlu

image

triviaqa

image

gsm8k

image
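In gsm8k, each reference solution ends with a line of the form `#### <final answer>`, so evaluation usually compares only that final number. The helper below is a small sketch of that extraction; the function name and the sample solution text are made up for illustration and are not taken from the dataset.

```python
import re
from typing import Optional

# Matches the final-answer marker, e.g. "#### 72" or "#### 1,234".
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the number after '####' with thousands separators removed."""
    match = _FINAL_ANSWER.search(solution)
    if match is None:
        return None  # no final-answer marker in the text
    return match.group(1).replace(",", "")


# Made-up solution text in the gsm8k style.
sample = "She buys 3 boxes of 4 apples, so 3 * 4 = 12 apples.\n#### 12"
final = extract_final_answer(sample)
```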

HumanEval

image

bbh

image

hellaswag

image

NQ (Natural Questions)

image

MBPP

image

PIQA

image

SIQA

image

ARC

image

winogrande

image

OpenBookQA

image

commonsense_qa

image

squad

image

quac

image

boolq

image