CodeBPE
Proj. CAR Paper Reading: CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
## Abstract 本文:探索LLM在source code上pretrain时的subtokenization效果。 subtokenization: split long tokens into smaller subtokens, in order to ensure the relati ......