UIE-X在医疗领域的实战

PaddleNLP全新发布UIE-X ?，除已有纯文本抽取的全部功能外，新增文档抽取能力。

UIE-X延续UIE的思路，基于跨模态布局增强预训练模型文心ERNIE-Layout重训模型，融合文本、图像、布局等信息进行联合建模，能够深度理解多模态文档。基于Prompt思想，实现开放域信息抽取，支持零样本抽取，小样本能力领先。

项目链接：https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/information_extraction

本案例为UIE-X在医疗领域的实战，通过少量标注+模型微调即可具备定制场景的端到端文档信息提取能力！

1.项目背景

目前医疗领域有大量的医学检查报告单，病历，发票，CT影像，眼科等等的医疗图片数据。现阶段，针对这些图片都是靠人工分类，结构化录入系统中，做患者的全生命周期的管理。
耗时耗力，人工成本极大。如果能靠人工智能的技术做到图片的自动分类和结构化，将大大的降低成本，提高系统录入的整体效率。

2.案例简介

本案例基于PaddleNLP最新开源的UIE-X，以医学检查单这种医疗领域常见的图片类型为例，展示从数据标注、模型训练到Taskflow一键部署的全流程解决方案

数据集来源：https://tianchi.aliyun.com/dataset/126039

数据集样例展示：

医疗场景常见图片展示：

3.环境准备

!pip install --upgrade --user paddleocr
!pip install --upgrade --user paddlenlp

我们推荐使用数据标注平台Label-Studio进行数据标注，本案例也打通了从标注到训练的通道，即Label-Studio导出数据后可通过label_studio.py脚本轻松将数据转换为输入模型时需要的形式，实现无缝衔接。为了达到这个目的，您可以参考信息抽取任务Label-Studio标注指南在Label-Studio平台上标注数据：

# 下载标注数据：
!wget https://paddlenlp.bj.bcebos.com/datasets/medical_checklist.zip
!unzip medical_checklist.zip

数据转换

!python label_studio.py \
    --label_studio_file ./medical_checklist/label_studio.json \
    --save_dir ./medical_checklist \
    --splits 0.8 0.2 0\
    --task_type ext \

5.模型微调

!python finetune.py  \
    --device gpu \
    --logging_steps 5 \
    --save_steps 25 \
    --eval_steps 25 \
    --seed 42 \
    --model_name_or_path uie-x-base \
    --output_dir ./checkpoint/model_best \
    --train_path medical_checklist/train.txt \
    --dev_path medical_checklist/dev.txt  \
    --per_device_train_batch_size  16 \
    --per_device_eval_batch_size 16 \
    --num_train_epochs 5 \
    --learning_rate 1e-5 \
    --label_names 'start_positions' 'end_positions' \
    --do_train \
    --do_eval \
    --do_export \
    --export_model_dir ./checkpoint/model_best \
    --overwrite_output_dir \
    --disable_tqdm True \
    --metric_for_best_model eval_f1 \
    --load_best_model_at_end  True \
    --save_total_limit 1

[2023-07-21 15:36:09,684] [ WARNING] - evaluation_strategy reset to IntervalStrategy.STEPS for do_eval is True. you can also set evaluation_strategy='epoch'.
[2023-07-21 15:36:09,684] [    INFO] - The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
[2023-07-21 15:36:09,684] [    INFO] - ============================================================
[2023-07-21 15:36:09,685] [    INFO] -      Model Configuration Arguments      
[2023-07-21 15:36:09,685] [    INFO] - paddle commit id              :3fa7a736e32508e797616b6344d97814c37d3ff8
[2023-07-21 15:36:09,685] [    INFO] - export_model_dir              :./checkpoint/model_best
[2023-07-21 15:36:09,685] [    INFO] - model_name_or_path            :uie-x-base
[2023-07-21 15:36:09,685] [    INFO] - 
[2023-07-21 15:36:09,685] [    INFO] - ============================================================
[2023-07-21 15:36:09,685] [    INFO] -       Data Configuration Arguments      
[2023-07-21 15:36:09,685] [    INFO] - paddle commit id              :3fa7a736e32508e797616b6344d97814c37d3ff8
[2023-07-21 15:36:09,685] [    INFO] - dev_path                      :medical_checklist/dev.txt
[2023-07-21 15:36:09,685] [    INFO] - max_seq_len                   :512
[2023-07-21 15:36:09,685] [    INFO] - train_path                    :medical_checklist/train.txt
[2023-07-21 15:36:09,685] [    INFO] - 
[2023-07-21 15:36:09,685] [ WARNING] - Process rank: -1, device: gpu, world_size: 1, distributed training: False, 16-bits training: False
[2023-07-21 15:36:09,686] [    INFO] - Model config ErnieLayoutConfig {
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "coordinate_size": 128,
  "enable_recompute": false,
  "eos_token_id": 2,
  "fuse": false,
  "gradient_checkpointing": false,
  "has_relative_attention_bias": true,
  "has_spatial_attention_bias": true,
  "has_visual_segment_embedding": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "image_feature_pool_shape": [
    7,
    7,
    256
  ],
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_2d_position_embeddings": 1024,
  "max_position_embeddings": 514,
  "max_rel_2d_pos": 256,
  "max_rel_pos": 128,
  "model_type": "ernie_layout",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 1,
  "paddlenlp_version": null,
  "pool_act": "tanh",
  "rel_2d_pos_bins": 64,
  "rel_pos_bins": 32,
  "shape_size": 128,
  "task_id": 0,
  "task_type_vocab_size": 3,
  "type_vocab_size": 100,
  "use_task_id": true,
  "vocab_size": 250002
}

[2023-07-21 15:36:09,687] [    INFO] - Configuration saved in /home/aistudio/.paddlenlp/models/uie-x-base/config.json
[2023-07-21 15:36:09,687] [    INFO] - Downloading uie_x_base.pdparams from https://bj.bcebos.com/paddlenlp/models/transformers/uie_x/uie_x_base.pdparams
100%|██████████████████████████████████████| 1.05G/1.05G [00:15<00:00, 73.4MB/s]
W0721 15:36:28.591925   856 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0721 15:36:28.595674   856 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
[2023-07-21 15:36:30,069] [    INFO] - All model checkpoint weights were used when initializing UIEX.

[2023-07-21 15:36:30,069] [    INFO] - All the weights of UIEX were initialized from the model checkpoint at uie-x-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use UIEX for predictions without further training.
[2023-07-21 15:36:30,070] [    INFO] - We are using <class 'paddlenlp.transformers.ernie_layout.tokenizer.ErnieLayoutTokenizer'> to load 'uie-x-base'.
[2023-07-21 15:36:30,071] [    INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/ernie_layout/vocab.txt and saved to /home/aistudio/.paddlenlp/models/uie-x-base
[2023-07-21 15:36:30,132] [    INFO] - Downloading vocab.txt from https://bj.bcebos.com/paddlenlp/models/transformers/ernie_layout/vocab.txt
100%|██████████████████████████████████████| 2.70M/2.70M [00:00<00:00, 48.4MB/s]
[2023-07-21 15:36:30,263] [    INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/ernie_layout/sentencepiece.bpe.model and saved to /home/aistudio/.paddlenlp/models/uie-x-base
[2023-07-21 15:36:30,325] [    INFO] - Downloading sentencepiece.bpe.model from https://bj.bcebos.com/paddlenlp/models/transformers/ernie_layout/sentencepiece.bpe.model
100%|██████████████████████████████████████| 4.83M/4.83M [00:00<00:00, 63.2MB/s]
[2023-07-21 15:36:31,214] [    INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/uie-x-base/tokenizer_config.json
[2023-07-21 15:36:31,214] [    INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/uie-x-base/special_tokens_map.json
[2023-07-21 15:36:33,843] [    INFO] - ============================================================
[2023-07-21 15:36:33,844] [    INFO] -     Training Configuration Arguments    
[2023-07-21 15:36:33,844] [    INFO] - paddle commit id              :3fa7a736e32508e797616b6344d97814c37d3ff8
[2023-07-21 15:36:33,844] [    INFO] - _no_sync_in_gradient_accumulation:True
[2023-07-21 15:36:33,844] [    INFO] - activation_quantize_type      :None
[2023-07-21 15:36:33,844] [    INFO] - adam_beta1                    :0.9
[2023-07-21 15:36:33,844] [    INFO] - adam_beta2                    :0.999
[2023-07-21 15:36:33,844] [    INFO] - adam_epsilon                  :1e-08
[2023-07-21 15:36:33,844] [    INFO] - algo_list                     :None
[2023-07-21 15:36:33,844] [    INFO] - batch_num_list                :None
[2023-07-21 15:36:33,844] [    INFO] - batch_size_list               :None
[2023-07-21 15:36:33,844] [    INFO] - bf16                          :False
[2023-07-21 15:36:33,844] [    INFO] - bf16_full_eval                :False
[2023-07-21 15:36:33,844] [    INFO] - bias_correction               :False
[2023-07-21 15:36:33,844] [    INFO] - current_device                :gpu:0
[2023-07-21 15:36:33,844] [    INFO] - dataloader_drop_last          :False
[2023-07-21 15:36:33,844] [    INFO] - dataloader_num_workers        :0
[2023-07-21 15:36:33,845] [    INFO] - device                        :gpu
[2023-07-21 15:36:33,845] [    INFO] - disable_tqdm                  :True
[2023-07-21 15:36:33,845] [    INFO] - do_compress                   :False
[2023-07-21 15:36:33,845] [    INFO] - do_eval                       :True
[2023-07-21 15:36:33,845] [    INFO] - do_export                     :True
[2023-07-21 15:36:33,845] [    INFO] - do_predict                    :False
[2023-07-21 15:36:33,845] [    INFO] - do_train                      :True
[2023-07-21 15:36:33,845] [    INFO] - eval_batch_size               :16
[2023-07-21 15:36:33,845] [    INFO] - eval_steps                    :25
[2023-07-21 15:36:33,845] [    INFO] - evaluation_strategy           :IntervalStrategy.STEPS
[2023-07-21 15:36:33,845] [    INFO] - flatten_param_grads           :False
[2023-07-21 15:36:33,845] [    INFO] - fp16                          :False
[2023-07-21 15:36:33,845] [    INFO] - fp16_full_eval                :False
[2023-07-21 15:36:33,845] [    INFO] - fp16_opt_level                :O1
[2023-07-21 15:36:33,845] [    INFO] - gradient_accumulation_steps   :1
[2023-07-21 15:36:33,845] [    INFO] - greater_is_better             :True
[2023-07-21 15:36:33,845] [    INFO] - ignore_data_skip              :False
[2023-07-21 15:36:33,845] [    INFO] - input_dtype                   :int64
[2023-07-21 15:36:33,845] [    INFO] - input_infer_model_path        :None
[2023-07-21 15:36:33,845] [    INFO] - label_names                   :['start_positions', 'end_positions']
[2023-07-21 15:36:33,845] [    INFO] - lazy_data_processing          :True
[2023-07-21 15:36:33,845] [    INFO] - learning_rate                 :1e-05
[2023-07-21 15:36:33,845] [    INFO] - load_best_model_at_end        :True
[2023-07-21 15:36:33,845] [    INFO] - local_process_index           :0
[2023-07-21 15:36:33,845] [    INFO] - local_rank                    :-1
[2023-07-21 15:36:33,845] [    INFO] - log_level                     :-1
[2023-07-21 15:36:33,845] [    INFO] - log_level_replica             :-1
[2023-07-21 15:36:33,846] [    INFO] - log_on_each_node              :True
[2023-07-21 15:36:33,846] [    INFO] - logging_dir                   :./checkpoint/model_best/runs/Jul21_15-36-09_jupyter-2631487-6518069
[2023-07-21 15:36:33,846] [    INFO] - logging_first_step            :False
[2023-07-21 15:36:33,846] [    INFO] - logging_steps                 :5
[2023-07-21 15:36:33,846] [    INFO] - logging_strategy              :IntervalStrategy.STEPS
[2023-07-21 15:36:33,846] [    INFO] - lr_scheduler_type             :SchedulerType.LINEAR
[2023-07-21 15:36:33,846] [    INFO] - max_grad_norm                 :1.0
[2023-07-21 15:36:33,846] [    INFO] - max_steps                     :-1
[2023-07-21 15:36:33,846] [    INFO] - metric_for_best_model         :eval_f1
[2023-07-21 15:36:33,846] [    INFO] - minimum_eval_times            :None
[2023-07-21 15:36:33,846] [    INFO] - moving_rate                   :0.9
[2023-07-21 15:36:33,846] [    INFO] - no_cuda                       :False
[2023-07-21 15:36:33,846] [    INFO] - num_train_epochs              :5.0
[2023-07-21 15:36:33,846] [    INFO] - onnx_format                   :True
[2023-07-21 15:36:33,846] [    INFO] - optim                         :OptimizerNames.ADAMW
[2023-07-21 15:36:33,846] [    INFO] - output_dir                    :./checkpoint/model_best
[2023-07-21 15:36:33,846] [    INFO] - overwrite_output_dir          :True
[2023-07-21 15:36:33,846] [    INFO] - past_index                    :-1
[2023-07-21 15:36:33,846] [    INFO] - per_device_eval_batch_size    :16
[2023-07-21 15:36:33,846] [    INFO] - per_device_train_batch_size   :16
[2023-07-21 15:36:33,846] [    INFO] - prediction_loss_only          :False
[2023-07-21 15:36:33,846] [    INFO] - process_index                 :0
[2023-07-21 15:36:33,846] [    INFO] - prune_embeddings              :False
[2023-07-21 15:36:33,846] [    INFO] - recompute                     :False
[2023-07-21 15:36:33,846] [    INFO] - remove_unused_columns         :True
[2023-07-21 15:36:33,846] [    INFO] - report_to                     :['visualdl']
[2023-07-21 15:36:33,846] [    INFO] - resume_from_checkpoint        :None
[2023-07-21 15:36:33,846] [    INFO] - round_type                    :round
[2023-07-21 15:36:33,847] [    INFO] - run_name                      :./checkpoint/model_best
[2023-07-21 15:36:33,847] [    INFO] - save_on_each_node             :False
[2023-07-21 15:36:33,847] [    INFO] - save_steps                    :25
[2023-07-21 15:36:33,847] [    INFO] - save_strategy                 :IntervalStrategy.STEPS
[2023-07-21 15:36:33,847] [    INFO] - save_total_limit              :1
[2023-07-21 15:36:33,847] [    INFO] - scale_loss                    :32768
[2023-07-21 15:36:33,847] [    INFO] - seed                          :42
[2023-07-21 15:36:33,847] [    INFO] - sharding                      :[]
[2023-07-21 15:36:33,847] [    INFO] - sharding_degree               :-1
[2023-07-21 15:36:33,847] [    INFO] - should_log                    :True
[2023-07-21 15:36:33,847] [    INFO] - should_save                   :True
[2023-07-21 15:36:33,847] [    INFO] - skip_memory_metrics           :True
[2023-07-21 15:36:33,847] [    INFO] - strategy                      :dynabert+ptq
[2023-07-21 15:36:33,847] [    INFO] - train_batch_size              :16
[2023-07-21 15:36:33,847] [    INFO] - use_pact                      :True
[2023-07-21 15:36:33,847] [    INFO] - warmup_ratio                  :0.1
[2023-07-21 15:36:33,847] [    INFO] - warmup_steps                  :0
[2023-07-21 15:36:33,847] [    INFO] - weight_decay                  :0.0
[2023-07-21 15:36:33,847] [    INFO] - weight_quantize_type          :channel_wise_abs_max
[2023-07-21 15:36:33,847] [    INFO] - width_mult_list               :None
[2023-07-21 15:36:33,847] [    INFO] - world_size                    :1
[2023-07-21 15:36:33,847] [    INFO] - 
[2023-07-21 15:36:33,849] [    INFO] - ***** Running training *****
[2023-07-21 15:36:33,849] [    INFO] -   Num examples = 686
[2023-07-21 15:36:33,849] [    INFO] -   Num Epochs = 5
[2023-07-21 15:36:33,849] [    INFO] -   Instantaneous batch size per device = 16
[2023-07-21 15:36:33,849] [    INFO] -   Total train batch size (w. parallel, distributed & accumulation) = 16
[2023-07-21 15:36:33,849] [    INFO] -   Gradient Accumulation steps = 1
[2023-07-21 15:36:33,849] [    INFO] -   Total optimization steps = 215.0
[2023-07-21 15:36:33,849] [    INFO] -   Total num train samples = 3430.0
[2023-07-21 15:36:33,856] [    INFO] -   Number of trainable parameters = 281693122
[2023-07-21 15:36:55,804] [    INFO] - loss: 0.00139983, learning_rate: 1e-05, global_step: 5, interval_runtime: 21.9466, interval_samples_per_second: 3.645, interval_steps_per_second: 0.228, epoch: 0.1163
[2023-07-21 15:37:17,246] [    INFO] - loss: 0.00095238, learning_rate: 1e-05, global_step: 10, interval_runtime: 21.4431, interval_samples_per_second: 3.731, interval_steps_per_second: 0.233, epoch: 0.2326
[2023-07-21 15:37:38,397] [    INFO] - loss: 0.00227169, learning_rate: 1e-05, global_step: 15, interval_runtime: 21.1288, interval_samples_per_second: 3.786, interval_steps_per_second: 0.237, epoch: 0.3488
[2023-07-21 15:37:59,719] [    INFO] - loss: 0.00058537, learning_rate: 1e-05, global_step: 20, interval_runtime: 21.3431, interval_samples_per_second: 3.748, interval_steps_per_second: 0.234, epoch: 0.4651
[2023-07-21 15:38:20,879] [    INFO] - loss: 0.00099298, learning_rate: 1e-05, global_step: 25, interval_runtime: 21.1605, interval_samples_per_second: 3.781, interval_steps_per_second: 0.236, epoch: 0.5814
[2023-07-21 15:38:20,879] [    INFO] - ***** Running Evaluation *****
[2023-07-21 15:38:20,880] [    INFO] -   Num examples = 35
[2023-07-21 15:38:20,880] [    INFO] -   Total prediction steps = 3
[2023-07-21 15:38:20,880] [    INFO] -   Pre device batch size = 16
[2023-07-21 15:38:20,880] [    INFO] -   Total Batch size = 16
[2023-07-21 15:38:31,387] [    INFO] - eval_loss: 0.0014212249079719186, eval_precision: 0.9344262295081968, eval_recall: 0.9047619047619048, eval_f1: 0.9193548387096775, eval_runtime: 10.5013, eval_samples_per_second: 3.333, eval_steps_per_second: 0.286, epoch: 0.5814
[2023-07-21 15:38:31,387] [    INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-25
[2023-07-21 15:38:31,390] [    INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-25/config.json
[2023-07-21 15:38:33,536] [    INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-25/tokenizer_config.json
[2023-07-21 15:38:33,537] [    INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-25/special_tokens_map.json
[2023-07-21 15:38:46,593] [    INFO] - loss: 0.00054665, learning_rate: 1e-05, global_step: 30, interval_runtime: 25.7138, interval_samples_per_second: 3.111, interval_steps_per_second: 0.194, epoch: 0.6977
[2023-07-21 15:39:07,860] [    INFO] - loss: 0.00042223, learning_rate: 1e-05, global_step: 35, interval_runtime: 21.2605, interval_samples_per_second: 3.763, interval_steps_per_second: 0.235, epoch: 0.814
[2023-07-21 15:39:29,450] [    INFO] - loss: 0.00070746, learning_rate: 1e-05, global_step: 40, interval_runtime: 21.5964, interval_samples_per_second: 3.704, interval_steps_per_second: 0.232, epoch: 0.9302
[2023-07-21 15:39:50,745] [    INFO] - loss: 0.00027768, learning_rate: 1e-05, global_step: 45, interval_runtime: 21.2946, interval_samples_per_second: 3.757, interval_steps_per_second: 0.235, epoch: 1.0465
[2023-07-21 15:40:12,219] [    INFO] - loss: 0.00037302, learning_rate: 1e-05, global_step: 50, interval_runtime: 21.4753, interval_samples_per_second: 3.725, interval_steps_per_second: 0.233, epoch: 1.1628
[2023-07-21 15:40:12,220] [    INFO] - ***** Running Evaluation *****
[2023-07-21 15:40:12,220] [    INFO] -   Num examples = 35
[2023-07-21 15:40:12,220] [    INFO] -   Total prediction steps = 3
[2023-07-21 15:40:12,220] [    INFO] -   Pre device batch size = 16
[2023-07-21 15:40:12,221] [    INFO] -   Total Batch size = 16
[2023-07-21 15:40:22,304] [    INFO] - eval_loss: 0.0014475114876404405, eval_precision: 0.9482758620689655, eval_recall: 0.873015873015873, eval_f1: 0.9090909090909091, eval_runtime: 10.0828, eval_samples_per_second: 3.471, eval_steps_per_second: 0.298, epoch: 1.1628
[2023-07-21 15:40:22,305] [    INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-50
[2023-07-21 15:40:22,308] [    INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-50/config.json
[2023-07-21 15:40:24,464] [    INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-50/tokenizer_config.json
[2023-07-21 15:40:24,465] [    INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-50/special_tokens_map.json
[2023-07-21 15:40:37,740] [    INFO] - loss: 0.00019248, learning_rate: 1e-05, global_step: 55, interval_runtime: 25.5206, interval_samples_per_second: 3.135, interval_steps_per_second: 0.196, epoch: 1.2791
[2023-07-21 15:40:58,905] [    INFO] - loss: 0.00021258, learning_rate: 1e-05, global_step: 60, interval_runtime: 21.1645, interval_samples_per_second: 3.78, interval_steps_per_second: 0.236, epoch: 1.3953
[2023-07-21 15:41:20,213] [    INFO] - loss: 0.00024681, learning_rate: 1e-05, global_step: 65, interval_runtime: 21.3084, interval_samples_per_second: 3.754, interval_steps_per_second: 0.235, epoch: 1.5116
[2023-07-21 15:41:41,237] [    INFO] - loss: 0.000169, learning_rate: 1e-05, global_step: 70, interval_runtime: 21.024, interval_samples_per_second: 3.805, interval_steps_per_second: 0.238, epoch: 1.6279
[2023-07-21 15:42:02,163] [    INFO] - loss: 0.00036645, learning_rate: 1e-05, global_step: 75, interval_runtime: 20.9256, interval_samples_per_second: 3.823, interval_steps_per_second: 0.239, epoch: 1.7442
[2023-07-21 15:42:02,163] [    INFO] - ***** Running Evaluation *****
[2023-07-21 15:42:02,163] [    INFO] -   Num examples = 35
[2023-07-21 15:42:02,164] [    INFO] -   Total prediction steps = 3
[2023-07-21 15:42:02,164] [    INFO] -   Pre device batch size = 16
[2023-07-21 15:42:02,164] [    INFO] -   Total Batch size = 16
[2023-07-21 15:42:12,158] [    INFO] - eval_loss: 0.001322056632488966, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 9.9708, eval_samples_per_second: 3.51, eval_steps_per_second: 0.301, epoch: 1.7442
[2023-07-21 15:42:12,159] [    INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-75
[2023-07-21 15:42:12,161] [    INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-75/config.json
[2023-07-21 15:42:14,264] [    INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-75/tokenizer_config.json
[2023-07-21 15:42:14,264] [    INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-75/special_tokens_map.json
[2023-07-21 15:42:18,485] [    INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-25] due to args.save_total_limit
[2023-07-21 15:42:27,793] [    INFO] - loss: 0.00060927, learning_rate: 1e-05, global_step: 80, interval_runtime: 25.6304, interval_samples_per_second: 3.121, interval_steps_per_second: 0.195, epoch: 1.8605
[2023-07-21 15:42:48,729] [    INFO] - loss: 0.00068383, learning_rate: 1e-05, global_step: 85, interval_runtime: 20.9361, interval_samples_per_second: 3.821, interval_steps_per_second: 0.239, epoch: 1.9767
[2023-07-21 15:43:09,835] [    INFO] - loss: 0.00042777, learning_rate: 1e-05, global_step: 90, interval_runtime: 21.1056, interval_samples_per_second: 3.79, interval_steps_per_second: 0.237, epoch: 2.093
[2023-07-21 15:43:30,942] [    INFO] - loss: 0.00013877, learning_rate: 1e-05, global_step: 95, interval_runtime: 21.1075, interval_samples_per_second: 3.79, interval_steps_per_second: 0.237, epoch: 2.2093
[2023-07-21 15:43:52,187] [    INFO] - loss: 0.00042886, learning_rate: 1e-05, global_step: 100, interval_runtime: 21.2446, interval_samples_per_second: 3.766, interval_steps_per_second: 0.235, epoch: 2.3256
[2023-07-21 15:43:52,188] [    INFO] - ***** Running Evaluation *****
[2023-07-21 15:43:52,188] [    INFO] -   Num examples = 35
[2023-07-21 15:43:52,188] [    INFO] -   Total prediction steps = 3
[2023-07-21 15:43:52,188] [    INFO] -   Pre device batch size = 16
[2023-07-21 15:43:52,188] [    INFO] -   Total Batch size = 16
[2023-07-21 15:44:02,369] [    INFO] - eval_loss: 0.001290834159590304, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 10.1799, eval_samples_per_second: 3.438, eval_steps_per_second: 0.295, epoch: 2.3256
[2023-07-21 15:44:02,369] [    INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-100
[2023-07-21 15:44:02,371] [    INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-100/config.json
[2023-07-21 15:44:04,511] [    INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-100/tokenizer_config.json
[2023-07-21 15:44:04,511] [    INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-100/special_tokens_map.json
[2023-07-21 15:44:08,763] [    INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-50] due to args.save_total_limit
[2023-07-21 15:44:17,868] [    INFO] - loss: 0.00011366, learning_rate: 1e-05, global_step: 105, interval_runtime: 25.6806, interval_samples_per_second: 3.115, interval_steps_per_second: 0.195, epoch: 2.4419
[2023-07-21 15:44:39,049] [    INFO] - loss: 4.777e-05, learning_rate: 1e-05, global_step: 110, interval_runtime: 21.1812, interval_samples_per_second: 3.777, interval_steps_per_second: 0.236, epoch: 2.5581
[2023-07-21 15:45:00,245] [    INFO] - loss: 0.00013845, learning_rate: 1e-05, global_step: 115, interval_runtime: 21.1969, interval_samples_per_second: 3.774, interval_steps_per_second: 0.236, epoch: 2.6744
[2023-07-21 15:45:21,118] [    INFO] - loss: 0.00040561, learning_rate: 1e-05, global_step: 120, interval_runtime: 20.8727, interval_samples_per_second: 3.833, interval_steps_per_second: 0.24, epoch: 2.7907
[2023-07-21 15:45:41,985] [    INFO] - loss: 0.00054928, learning_rate: 1e-05, global_step: 125, interval_runtime: 20.8671, interval_samples_per_second: 3.834, interval_steps_per_second: 0.24, epoch: 2.907
[2023-07-21 15:45:41,986] [    INFO] - ***** Running Evaluation *****
[2023-07-21 15:45:41,986] [    INFO] -   Num examples = 35
[2023-07-21 15:45:41,986] [    INFO] -   Total prediction steps = 3
[2023-07-21 15:45:41,986] [    INFO] -   Pre device batch size = 16
[2023-07-21 15:45:41,986] [    INFO] -   Total Batch size = 16
[2023-07-21 15:45:52,179] [    INFO] - eval_loss: 0.0013684021541848779, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 10.1923, eval_samples_per_second: 3.434, eval_steps_per_second: 0.294, epoch: 2.907
[2023-07-21 15:45:52,180] [    INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-125
[2023-07-21 15:45:52,182] [    INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-125/config.json
[2023-07-21 15:45:54,324] [    INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-125/tokenizer_config.json
[2023-07-21 15:45:54,324] [    INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-125/special_tokens_map.json
[2023-07-21 15:45:58,570] [    INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-100] due to args.save_total_limit
[2023-07-21 15:46:07,445] [    INFO] - loss: 5.219e-05, learning_rate: 1e-05, global_step: 130, interval_runtime: 25.4597, interval_samples_per_second: 3.142, interval_steps_per_second: 0.196, epoch: 3.0233
[2023-07-21 15:46:28,712] [    INFO] - loss: 0.00026077, learning_rate: 1e-05, global_step: 135, interval_runtime: 21.2671, interval_samples_per_second: 3.762, interval_steps_per_second: 0.235, epoch: 3.1395
[2023-07-21 15:46:49,731] [    INFO] - loss: 6.99e-05, learning_rate: 1e-05, global_step: 140, interval_runtime: 21.0185, interval_samples_per_second: 3.806, interval_steps_per_second: 0.238, epoch: 3.2558
[2023-07-21 15:47:10,751] [    INFO] - loss: 0.00023049, learning_rate: 1e-05, global_step: 145, interval_runtime: 21.0205, interval_samples_per_second: 3.806, interval_steps_per_second: 0.238, epoch: 3.3721
[2023-07-21 15:47:31,889] [    INFO] - loss: 0.00015275, learning_rate: 1e-05, global_step: 150, interval_runtime: 21.1372, interval_samples_per_second: 3.785, interval_steps_per_second: 0.237, epoch: 3.4884
[2023-07-21 15:47:31,889] [    INFO] - ***** Running Evaluation *****
[2023-07-21 15:47:31,889] [    INFO] -   Num examples = 35
[2023-07-21 15:47:31,889] [    INFO] -   Total prediction steps = 3
[2023-07-21 15:47:31,890] [    INFO] -   Pre device batch size = 16
[2023-07-21 15:47:31,890] [    INFO] -   Total Batch size = 16
[2023-07-21 15:47:42,271] [    INFO] - eval_loss: 0.0013476903550326824, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 10.3813, eval_samples_per_second: 3.371, eval_steps_per_second: 0.289, epoch: 3.4884
[2023-07-21 15:47:42,272] [    INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-150
[2023-07-21 15:47:42,274] [    INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-150/config.json
[2023-07-21 15:47:44,424] [    INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-150/tokenizer_config.json
[2023-07-21 15:47:44,424] [    INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-150/special_tokens_map.json
[2023-07-21 15:47:48,728] [    INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-125] due to args.save_total_limit
[2023-07-21 15:47:57,472] [    INFO] - loss: 0.00024907, learning_rate: 1e-05, global_step: 155, interval_runtime: 25.5832, interval_samples_per_second: 3.127, interval_steps_per_second: 0.195, epoch: 3.6047
[2023-07-21 15:48:18,254] [    INFO] - loss: 0.00027028, learning_rate: 1e-05, global_step: 160, interval_runtime: 20.7824, interval_samples_per_second: 3.849, interval_steps_per_second: 0.241, epoch: 3.7209
[2023-07-21 15:48:39,309] [    INFO] - loss: 0.0001771, learning_rate: 1e-05, global_step: 165, interval_runtime: 21.0551, interval_samples_per_second: 3.8, interval_steps_per_second: 0.237, epoch: 3.8372
[2023-07-21 15:49:00,354] [    INFO] - loss: 0.00024041, learning_rate: 1e-05, global_step: 170, interval_runtime: 21.0449, interval_samples_per_second: 3.801, interval_steps_per_second: 0.238, epoch: 3.9535
[2023-07-21 15:49:21,382] [    INFO] - loss: 4.51e-05, learning_rate: 1e-05, global_step: 175, interval_runtime: 21.0273, interval_samples_per_second: 3.805, interval_steps_per_second: 0.238, epoch: 4.0698
[2023-07-21 15:49:21,382] [    INFO] - ***** Running Evaluation *****
[2023-07-21 15:49:21,382] [    INFO] -   Num examples = 35
[2023-07-21 15:49:21,382] [    INFO] -   Total prediction steps = 3
[2023-07-21 15:49:21,382] [    INFO] -   Pre device batch size = 16
[2023-07-21 15:49:21,382] [    INFO] -   Total Batch size = 16
[2023-07-21 15:49:31,953] [    INFO] - eval_loss: 0.0013263615546748042, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 10.57, eval_samples_per_second: 3.311, eval_steps_per_second: 0.284, epoch: 4.0698
[2023-07-21 15:49:31,954] [    INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-175
[2023-07-21 15:49:31,956] [    INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-175/config.json
[2023-07-21 15:49:34,699] [    INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-175/tokenizer_config.json
[2023-07-21 15:49:34,700] [    INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-175/special_tokens_map.json
[2023-07-21 15:49:40,286] [    INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-150] due to args.save_total_limit
[2023-07-21 15:49:48,671] [    INFO] - loss: 0.0003263, learning_rate: 1e-05, global_step: 180, interval_runtime: 27.2898, interval_samples_per_second: 2.931, interval_steps_per_second: 0.183, epoch: 4.186
[2023-07-21 15:50:09,486] [    INFO] - loss: 0.00014406, learning_rate: 1e-05, global_step: 185, interval_runtime: 20.8144, interval_samples_per_second: 3.843, interval_steps_per_second: 0.24, epoch: 4.3023
[2023-07-21 15:50:31,097] [    INFO] - loss: 0.00010923, learning_rate: 1e-05, global_step: 190, interval_runtime: 21.6107, interval_samples_per_second: 3.702, interval_steps_per_second: 0.231, epoch: 4.4186
[2023-07-21 15:50:52,282] [    INFO] - loss: 8.216e-05, learning_rate: 1e-05, global_step: 195, interval_runtime: 21.1856, interval_samples_per_second: 3.776, interval_steps_per_second: 0.236, epoch: 4.5349
[2023-07-21 15:51:14,299] [    INFO] - loss: 9.251e-05, learning_rate: 1e-05, global_step: 200, interval_runtime: 22.0164, interval_samples_per_second: 3.634, interval_steps_per_second: 0.227, epoch: 4.6512
[2023-07-21 15:51:14,299] [    INFO] - ***** Running Evaluation *****
[2023-07-21 15:51:14,299] [    INFO] -   Num examples = 35
[2023-07-21 15:51:14,299] [    INFO] -   Total prediction steps = 3
[2023-07-21 15:51:14,299] [    INFO] -   Pre device batch size = 16
[2023-07-21 15:51:14,300] [    INFO] -   Total Batch size = 16
[2023-07-21 15:51:24,773] [    INFO] - eval_loss: 0.0014609990175813437, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 10.4732, eval_samples_per_second: 3.342, eval_steps_per_second: 0.286, epoch: 4.6512
[2023-07-21 15:51:24,774] [    INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-200
[2023-07-21 15:51:24,776] [    INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-200/config.json
[2023-07-21 15:51:27,228] [    INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-200/tokenizer_config.json
[2023-07-21 15:51:27,228] [    INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-200/special_tokens_map.json
[2023-07-21 15:51:32,347] [    INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-175] due to args.save_total_limit
[2023-07-21 15:51:41,379] [    INFO] - loss: 0.00016781, learning_rate: 1e-05, global_step: 205, interval_runtime: 27.0808, interval_samples_per_second: 2.954, interval_steps_per_second: 0.185, epoch: 4.7674
[2023-07-21 15:52:03,510] [    INFO] - loss: 0.00013611, learning_rate: 1e-05, global_step: 210, interval_runtime: 22.1302, interval_samples_per_second: 3.615, interval_steps_per_second: 0.226, epoch: 4.8837
[2023-07-21 15:52:23,996] [    INFO] - loss: 0.0001641, learning_rate: 1e-05, global_step: 215, interval_runtime: 20.4867, interval_samples_per_second: 3.905, interval_steps_per_second: 0.244, epoch: 5.0
[2023-07-21 15:52:23,997] [    INFO] - ***** Running Evaluation *****
[2023-07-21 15:52:23,997] [    INFO] -   Num examples = 35
[2023-07-21 15:52:23,997] [    INFO] -   Total prediction steps = 3
[2023-07-21 15:52:23,997] [    INFO] -   Pre device batch size = 16
[2023-07-21 15:52:23,997] [    INFO] -   Total Batch size = 16
[2023-07-21 15:52:33,805] [    INFO] - eval_loss: 0.0011874400079250336, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 9.8078, eval_samples_per_second: 3.569, eval_steps_per_second: 0.306, epoch: 5.0
[2023-07-21 15:52:33,806] [    INFO] - Saving model checkpoint to ./checkpoint/model_best/checkpoint-215
[2023-07-21 15:52:33,808] [    INFO] - Configuration saved in ./checkpoint/model_best/checkpoint-215/config.json
[2023-07-21 15:52:36,141] [    INFO] - tokenizer config file saved in ./checkpoint/model_best/checkpoint-215/tokenizer_config.json
[2023-07-21 15:52:36,141] [    INFO] - Special tokens file saved in ./checkpoint/model_best/checkpoint-215/special_tokens_map.json
[2023-07-21 15:52:41,717] [    INFO] - Deleting older checkpoint [checkpoint/model_best/checkpoint-200] due to args.save_total_limit
[2023-07-21 15:52:42,252] [    INFO] - 
Training completed. 

[2023-07-21 15:52:42,252] [    INFO] - Loading best model from ./checkpoint/model_best/checkpoint-75 (score: 0.9354838709677418).
[2023-07-21 15:52:43,847] [    INFO] - train_runtime: 969.9908, train_samples_per_second: 3.536, train_steps_per_second: 0.222, train_loss: 0.0003774468271267535, epoch: 5.0
[2023-07-21 15:52:43,915] [    INFO] - Saving model checkpoint to ./checkpoint/model_best
[2023-07-21 15:52:43,917] [    INFO] - Configuration saved in ./checkpoint/model_best/config.json
[2023-07-21 15:52:46,306] [    INFO] - tokenizer config file saved in ./checkpoint/model_best/tokenizer_config.json
[2023-07-21 15:52:46,306] [    INFO] - Special tokens file saved in ./checkpoint/model_best/special_tokens_map.json
[2023-07-21 15:52:46,314] [    INFO] - ***** train metrics *****
[2023-07-21 15:52:46,315] [    INFO] -   epoch                    =        5.0
[2023-07-21 15:52:46,315] [    INFO] -   train_loss               =     0.0004
[2023-07-21 15:52:46,315] [    INFO] -   train_runtime            = 0:16:09.99
[2023-07-21 15:52:46,315] [    INFO] -   train_samples_per_second =      3.536
[2023-07-21 15:52:46,315] [    INFO] -   train_steps_per_second   =      0.222
[2023-07-21 15:52:46,318] [    INFO] - ***** Running Evaluation *****
[2023-07-21 15:52:46,318] [    INFO] -   Num examples = 35
[2023-07-21 15:52:46,318] [    INFO] -   Total prediction steps = 3
[2023-07-21 15:52:46,318] [    INFO] -   Pre device batch size = 16
[2023-07-21 15:52:46,318] [    INFO] -   Total Batch size = 16
[2023-07-21 15:52:55,755] [    INFO] - eval_loss: 0.001322056632488966, eval_precision: 0.9508196721311475, eval_recall: 0.9206349206349206, eval_f1: 0.9354838709677418, eval_runtime: 9.4374, eval_samples_per_second: 3.709, eval_steps_per_second: 0.318, epoch: 5.0
[2023-07-21 15:52:55,756] [    INFO] - ***** eval metrics *****
[2023-07-21 15:52:55,756] [    INFO] -   epoch                   =        5.0
[2023-07-21 15:52:55,756] [    INFO] -   eval_f1                 =     0.9355
[2023-07-21 15:52:55,756] [    INFO] -   eval_loss               =     0.0013
[2023-07-21 15:52:55,756] [    INFO] -   eval_precision          =     0.9508
[2023-07-21 15:52:55,756] [    INFO] -   eval_recall             =     0.9206
[2023-07-21 15:52:55,756] [    INFO] -   eval_runtime            = 0:00:09.43
[2023-07-21 15:52:55,756] [    INFO] -   eval_samples_per_second =      3.709
[2023-07-21 15:52:55,756] [    INFO] -   eval_steps_per_second   =      0.318
[2023-07-21 15:52:55,759] [    INFO] - Exporting inference model to ./checkpoint/model_best/model
[2023-07-21 15:53:55,567] [    INFO] - Inference model exported.

6.模型评估

!python evaluate.py \
    --device "gpu" \
    --model_path ./checkpoint/model_best \
    --test_path ./medical_checklist/dev.txt \
    --output_dir ./checkpoint/model_best \
    --label_names 'start_positions' 'end_positions'\
    --max_seq_len 512 \
    --per_device_eval_batch_size 16

[2023-07-21 15:55:25,012] [    INFO] - The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
[2023-07-21 15:55:25,012] [    INFO] - ============================================================
[2023-07-21 15:55:25,013] [    INFO] -      Model Configuration Arguments      
[2023-07-21 15:55:25,013] [    INFO] - paddle commit id              :3fa7a736e32508e797616b6344d97814c37d3ff8
[2023-07-21 15:55:25,013] [    INFO] - model_path                    :./checkpoint/model_best
[2023-07-21 15:55:25,013] [    INFO] - 
[2023-07-21 15:55:25,013] [    INFO] - ============================================================
[2023-07-21 15:55:25,013] [    INFO] -       Data Configuration Arguments      
[2023-07-21 15:55:25,013] [    INFO] - paddle commit id              :3fa7a736e32508e797616b6344d97814c37d3ff8
[2023-07-21 15:55:25,013] [    INFO] - debug                         :False
[2023-07-21 15:55:25,013] [    INFO] - max_seq_len                   :512
[2023-07-21 15:55:25,013] [    INFO] - schema_lang                   :ch
[2023-07-21 15:55:25,013] [    INFO] - test_path                     :./medical_checklist/dev.txt
[2023-07-21 15:55:25,013] [    INFO] - 
[2023-07-21 15:55:25,014] [    INFO] - We are using <class 'paddlenlp.transformers.ernie_layout.tokenizer.ErnieLayoutTokenizer'> to load './checkpoint/model_best'.
[2023-07-21 15:55:25,693] [    INFO] - loading configuration file ./checkpoint/model_best/config.json
[2023-07-21 15:55:25,694] [    INFO] - Model config ErnieLayoutConfig {
  "architectures": [
    "UIEX"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "coordinate_size": 128,
  "dtype": "float32",
  "enable_recompute": false,
  "eos_token_id": 2,
  "fuse": false,
  "gradient_checkpointing": false,
  "has_relative_attention_bias": true,
  "has_spatial_attention_bias": true,
  "has_visual_segment_embedding": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "image_feature_pool_shape": [
    7,
    7,
    256
  ],
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_2d_position_embeddings": 1024,
  "max_position_embeddings": 514,
  "max_rel_2d_pos": 256,
  "max_rel_pos": 128,
  "model_type": "ernie_layout",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 1,
  "paddlenlp_version": null,
  "pool_act": "tanh",
  "rel_2d_pos_bins": 64,
  "rel_pos_bins": 32,
  "shape_size": 128,
  "task_id": 0,
  "task_type_vocab_size": 3,
  "type_vocab_size": 100,
  "use_task_id": true,
  "vocab_size": 250002
}

W0721 15:55:29.126700  3399 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0721 15:55:29.130168  3399 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
[2023-07-21 15:55:31,058] [    INFO] - All model checkpoint weights were used when initializing UIEX.

[2023-07-21 15:55:31,058] [    INFO] - All the weights of UIEX were initialized from the model checkpoint at ./checkpoint/model_best.
If your task is similar to the task the model of the checkpoint was trained on, you can already use UIEX for predictions without further training.
[2023-07-21 15:55:31,259] [    INFO] - ============================================================
[2023-07-21 15:55:31,259] [    INFO] -     Training Configuration Arguments    
[2023-07-21 15:55:31,259] [    INFO] - paddle commit id              :3fa7a736e32508e797616b6344d97814c37d3ff8
[2023-07-21 15:55:31,260] [    INFO] - _no_sync_in_gradient_accumulation:True
[2023-07-21 15:55:31,260] [    INFO] - adam_beta1                    :0.9
[2023-07-21 15:55:31,260] [    INFO] - adam_beta2                    :0.999
[2023-07-21 15:55:31,260] [    INFO] - adam_epsilon                  :1e-08
[2023-07-21 15:55:31,260] [    INFO] - bf16                          :False
[2023-07-21 15:55:31,260] [    INFO] - bf16_full_eval                :False
[2023-07-21 15:55:31,260] [    INFO] - current_device                :gpu:0
[2023-07-21 15:55:31,260] [    INFO] - dataloader_drop_last          :False
[2023-07-21 15:55:31,260] [    INFO] - dataloader_num_workers        :0
[2023-07-21 15:55:31,260] [    INFO] - device                        :gpu
[2023-07-21 15:55:31,260] [    INFO] - disable_tqdm                  :False
[2023-07-21 15:55:31,260] [    INFO] - do_eval                       :False
[2023-07-21 15:55:31,260] [    INFO] - do_export                     :False
[2023-07-21 15:55:31,260] [    INFO] - do_predict                    :False
[2023-07-21 15:55:31,260] [    INFO] - do_train                      :False
[2023-07-21 15:55:31,260] [    INFO] - eval_batch_size               :16
[2023-07-21 15:55:31,261] [    INFO] - eval_steps                    :None
[2023-07-21 15:55:31,261] [    INFO] - evaluation_strategy           :IntervalStrategy.NO
[2023-07-21 15:55:31,261] [    INFO] - flatten_param_grads           :False
[2023-07-21 15:55:31,261] [    INFO] - fp16                          :False
[2023-07-21 15:55:31,261] [    INFO] - fp16_full_eval                :False
[2023-07-21 15:55:31,261] [    INFO] - fp16_opt_level                :O1
[2023-07-21 15:55:31,261] [    INFO] - gradient_accumulation_steps   :1
[2023-07-21 15:55:31,261] [    INFO] - greater_is_better             :None
[2023-07-21 15:55:31,261] [    INFO] - ignore_data_skip              :False
[2023-07-21 15:55:31,261] [    INFO] - label_names                   :['start_positions', 'end_positions']
[2023-07-21 15:55:31,261] [    INFO] - lazy_data_processing          :True
[2023-07-21 15:55:31,261] [    INFO] - learning_rate                 :5e-05
[2023-07-21 15:55:31,261] [    INFO] - load_best_model_at_end        :False
[2023-07-21 15:55:31,261] [    INFO] - local_process_index           :0
[2023-07-21 15:55:31,261] [    INFO] - local_rank                    :-1
[2023-07-21 15:55:31,261] [    INFO] - log_level                     :-1
[2023-07-21 15:55:31,261] [    INFO] - log_level_replica             :-1
[2023-07-21 15:55:31,261] [    INFO] - log_on_each_node              :True
[2023-07-21 15:55:31,261] [    INFO] - logging_dir                   :./checkpoint/model_best/runs/Jul21_15-55-25_jupyter-2631487-6518069
[2023-07-21 15:55:31,262] [    INFO] - logging_first_step            :False
[2023-07-21 15:55:31,262] [    INFO] - logging_steps                 :500
[2023-07-21 15:55:31,262] [    INFO] - logging_strategy              :IntervalStrategy.STEPS
[2023-07-21 15:55:31,262] [    INFO] - lr_scheduler_type             :SchedulerType.LINEAR
[2023-07-21 15:55:31,262] [    INFO] - max_grad_norm                 :1.0
[2023-07-21 15:55:31,262] [    INFO] - max_steps                     :-1
[2023-07-21 15:55:31,262] [    INFO] - metric_for_best_model         :None
[2023-07-21 15:55:31,262] [    INFO] - minimum_eval_times            :None
[2023-07-21 15:55:31,262] [    INFO] - no_cuda                       :False
[2023-07-21 15:55:31,262] [    INFO] - num_train_epochs              :3.0
[2023-07-21 15:55:31,262] [    INFO] - optim                         :OptimizerNames.ADAMW
[2023-07-21 15:55:31,262] [    INFO] - output_dir                    :./checkpoint/model_best
[2023-07-21 15:55:31,262] [    INFO] - overwrite_output_dir          :False
[2023-07-21 15:55:31,262] [    INFO] - past_index                    :-1
[2023-07-21 15:55:31,262] [    INFO] - per_device_eval_batch_size    :16
[2023-07-21 15:55:31,262] [    INFO] - per_device_train_batch_size   :8
[2023-07-21 15:55:31,262] [    INFO] - prediction_loss_only          :False
[2023-07-21 15:55:31,262] [    INFO] - process_index                 :0
[2023-07-21 15:55:31,262] [    INFO] - recompute                     :False
[2023-07-21 15:55:31,262] [    INFO] - remove_unused_columns         :True
[2023-07-21 15:55:31,262] [    INFO] - report_to                     :['visualdl']
[2023-07-21 15:55:31,262] [    INFO] - resume_from_checkpoint        :None
[2023-07-21 15:55:31,262] [    INFO] - run_name                      :./checkpoint/model_best
[2023-07-21 15:55:31,262] [    INFO] - save_on_each_node             :False
[2023-07-21 15:55:31,262] [    INFO] - save_steps                    :500
[2023-07-21 15:55:31,263] [    INFO] - save_strategy                 :IntervalStrategy.STEPS
[2023-07-21 15:55:31,263] [    INFO] - save_total_limit              :None
[2023-07-21 15:55:31,263] [    INFO] - scale_loss                    :32768
[2023-07-21 15:55:31,263] [    INFO] - seed                          :42
[2023-07-21 15:55:31,263] [    INFO] - sharding                      :[]
[2023-07-21 15:55:31,263] [    INFO] - sharding_degree               :-1
[2023-07-21 15:55:31,263] [    INFO] - should_log                    :True
[2023-07-21 15:55:31,263] [    INFO] - should_save                   :True
[2023-07-21 15:55:31,263] [    INFO] - skip_memory_metrics           :True
[2023-07-21 15:55:31,263] [    INFO] - train_batch_size              :8
[2023-07-21 15:55:31,263] [    INFO] - warmup_ratio                  :0.0
[2023-07-21 15:55:31,263] [    INFO] - warmup_steps                  :0
[2023-07-21 15:55:31,263] [    INFO] - weight_decay                  :0.0
[2023-07-21 15:55:31,263] [    INFO] - world_size                    :1
[2023-07-21 15:55:31,263] [    INFO] - 
[2023-07-21 15:55:31,263] [    INFO] - ***** Running Evaluation *****
[2023-07-21 15:55:31,263] [    INFO] -   Num examples = 35
[2023-07-21 15:55:31,263] [    INFO] -   Total prediction steps = 3
[2023-07-21 15:55:31,263] [    INFO] -   Pre device batch size = 16
[2023-07-21 15:55:31,264] [    INFO] -   Total Batch size = 16
100%|█████████████████████████████████████████████| 3/3 [00:03<00:00,  1.31s/it]
[2023-07-21 15:55:41,222] [    INFO] - -----Evaluate model-------
[2023-07-21 15:55:41,222] [    INFO] - Class Name: ALL CLASSES
[2023-07-21 15:55:41,222] [    INFO] - Evaluation Precision: 0.95082 | Recall: 0.92063 | F1: 0.93548
[2023-07-21 15:55:41,222] [    INFO] - -----------------------------

7.Taskflow一键部署

from pprint import pprint
from paddlenlp import Taskflow
schema = {
    '项目名称': [
        '结果',
        '单位',
        '参考范围'
    ]
}
my_ie = Taskflow("information_extraction", model="uie-x-base", schema=schema, task_path='./checkpoint/model_best')

pprint(my_ie({"doc": "test.jpg"}))

[{'项目名称': [{'bbox': [[417, 598, 764, 653]],
            'end': 161,
            'probability': 0.9931185709767476,
            'relations': {'单位': [{'bbox': [[1383, 603, 1475, 653]],
                                  'end': 170,
                                  'probability': 0.9982062669088805,
                                  'start': 166,
                                  'text': 'ng/L'}],
                          '参考范围': [{'bbox': [[1603, 603, 1717, 650]],
                                    'end': 175,
                                    'probability': 0.994915152253455,
                                    'start': 170,
                                    'text': '0-0.2'}],
                          '结果': [{'bbox': [[1055, 608, 1161, 647]],
                                  'end': 166,
                                  'probability': 0.9779773840612904,
                                  'start': 161,
                                  'text': '0.000'}]},
            'start': 150,
            'text': '乙肝表面抗原HBsAg'},
           {'bbox': [[420, 803, 807, 850]],
            'end': 263,
            'probability': 0.9839514684545492,
            'relations': {'单位': [{'bbox': [[1382, 800, 1481, 856]],
                                  'end': 272,
                                  'probability': 0.9902134016753692,
                                  'start': 268,
                                  'text': 'U/mL'}],
                          '参考范围': [{'bbox': [[1609, 806, 1717, 845]],
                                    'end': 277,
                                    'probability': 0.9948578061238109,
                                    'start': 272,
                                    'text': '0-0.2'}],
                          '结果': [{'bbox': [[1055, 806, 1163, 853]],
                                  'end': 268,
                                  'probability': 0.9997722031372689,
                                  'start': 263,
                                  'text': '0.081'}]},
            'start': 248,
            'text': '乙肝e抗体Anti-HBeAB'},
           {'bbox': [[417, 671, 863, 718]],
            'end': 197,
            'probability': 0.9933030680080606,
            'relations': {'单位': [{'bbox': [[1383, 671, 1512, 717]],
                                  'end': 208,
                                  'probability': 0.993252639775573,
                                  'start': 202,
                                  'text': 'MIU/mL'}],
                          '参考范围': [{'bbox': [[1603, 671, 1697, 717]],
                                    'end': 212,
                                    'probability': 0.9968451209051636,
                                    'start': 208,
                                    'text': '0-10'}],
                          '结果': [{'bbox': [[1055, 676, 1163, 715]],
                                  'end': 202,
                                  'probability': 0.9627551951018489,
                                  'start': 197,
                                  'text': '0.000'}]},
            'start': 181,
            'text': '乙肝表面抗体Anti-HBsAB'},
           {'bbox': [[420, 735, 706, 785]],
            'end': 228,
            'probability': 0.9925530039269148,
            'relations': {'单位': [{'bbox': [[1383, 738, 1475, 785]],
                                  'end': 237,
                                  'probability': 0.9953925121749307,
                                  'start': 233,
                                  'text': 'U/mL'}],
                          '参考范围': [{'bbox': [[1606, 741, 1715, 780]],
                                    'end': 242,
                                    'probability': 0.9982005347972311,
                                    'start': 237,
                                    'text': '0-0.5'}],
                          '结果': [{'bbox': [[1057, 743, 1163, 782]],
                                  'end': 233,
                                  'probability': 0.9943726871306069,
                                  'start': 228,
                                  'text': '0.000'}]},
            'start': 218,
            'text': '乙肝e抗原HBeAg'},
           {'bbox': [[420, 871, 870, 918]],
            'end': 299,
            'probability': 0.9931226228703274,
            'relations': {'单位': [{'bbox': [[1389, 871, 1477, 918]],
                                  'end': 308,
                                  'probability': 0.9990609045893919,
                                  'start': 304,
                                  'text': 'U/mL'}],
                          '参考范围': [{'bbox': [[1611, 873, 1717, 912]],
                                    'end': 313,
                                    'probability': 0.9937555165322465,
                                    'start': 308,
                                    'text': '0-0.9'}],
                          '结果': [{'bbox': [[1054, 867, 1169, 921]],
                                  'end': 304,
                                  'probability': 0.9996564084931308,
                                  'start': 299,
                                  'text': '1.053'}]},
            'start': 283,
            'text': '乙肝核心抗体Anti-HBcAB'},
           {'bbox': [[415, 536, 794, 580]],
            'end': 130,
            'probability': 0.9905078246100985,
            'relations': {'单位': [{'bbox': [[1383, 536, 1475, 585]],
                                  'end': 139,
                                  'probability': 0.9996564019316949,
                                  'start': 135,
                                  'text': 's/co'}],
                          '参考范围': [{'bbox': [[1603, 533, 1745, 588]],
                                    'end': 144,
                                    'probability': 0.9937541085628041,
                                    'start': 139,
                                    'text': '阴性（-)'}],
                          '结果': [{'bbox': [[1055, 536, 1194, 582]],
                                  'end': 135,
                                  'probability': 0.9912728416351548,
                                  'start': 130,
                                  'text': '阴性（-）'}]},
            'start': 118,
            'text': '乙肝病毒前S1抗原HBV'}]}]

图像展示

import matplotlib.pyplot as plt
from paddlenlp.utils.doc_parser import DocParser

results = my_ie({"doc": "test.jpg"})

img_show = DocParser.write_image_with_results(
    "test.jpg",
    result=results[0], 
    return_image=True)

plt.figure(figsize=(15,15))
plt.imshow(img_show)
plt.show()

项目地址：https://aistudio.baidu.com/aistudio/projectdetail/6518069?sUid=2631487&shared=1&ts=1690163802670