526互联

dataset format of benchmarks

Published: 2024-01-02 11:53:25 | Author: Daze_Lu

Note: the datasets fall into two types. Generative datasets expect a natural-language answer whose length and content follow no fixed format. Selection datasets expect the model to pick one answer from a fixed set of options, such as A, B, C, or D.
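The distinction between the two types can be sketched in code. This is a minimal illustration, not the schema of any benchmark listed below: the class names, field names, and scoring rules (`SelectionItem`, `GenerativeItem`, `score_selection`, `score_generative`) are all assumptions made for the example, and the sample records are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SelectionItem:
    """Hypothetical record for a selection-type dataset."""
    question: str
    choices: List[str]  # option texts, in order A, B, C, D
    answer: str         # gold option label, e.g. "B"


@dataclass
class GenerativeItem:
    """Hypothetical record for a generative-type dataset."""
    question: str
    answer: str         # free-form reference answer


def score_selection(item: SelectionItem, prediction: str) -> bool:
    # A selection answer is a label, so an exact match on the label suffices.
    return prediction.strip().upper() == item.answer


def score_generative(item: GenerativeItem, prediction: str) -> bool:
    # A generative answer has no fixed format, so this sketch only checks
    # that the normalized reference appears somewhere in the prediction.
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())

    return normalize(item.answer) in normalize(prediction)


# Invented sample records, for illustration only.
selection = SelectionItem("2 + 2 = ?", ["3", "4", "5", "6"], "B")
generative = GenerativeItem("What is the capital of France?", "Paris")
```

Selection items can be graded by comparing labels, while generative items need some form of text normalization or fuzzy matching, which is why the two types are usually evaluated with different metrics.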

mmlu

triviaqa

gsm8k

HumanEval

bbh

hellaswag

NQ (Natural Questions)

MBPP

PIQA

SIQA

ARC

winogrande

openbookQA

commonsense_qa

squad

quac

boolq
