Counting the Total Number of Characters in All Slide Notes of a PPT Document with Python

Published: 2023-07-09 16:28:47 | Author: 华东博客

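In some situations we need to count the characters in a PPT's speaker notes, for example when recording narration for a strictly timed project defense or award application. However, neither Microsoft PowerPoint nor WPS PPT offers a direct way to do this, and the officially documented workaround is slow and inefficient. Below is a quick way to count Chinese notes with Python.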

Method:

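Use the python-pptx library to read and analyze the PowerPoint document. The library provides an API for accessing, modifying, and creating PowerPoint .pptx files. Here we use it to read the notes of each slide and count the total number of characters.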

Here is a basic example:

from pptx import Presentation


def count_characters_in_notes(ppt_file):
    """Print the notes character count of each slide and return the total."""
    ppt = Presentation(ppt_file)
    total_characters = 0
    for k, slide in enumerate(ppt.slides):
        # Check first: accessing slide.notes_slide on a slide without notes would create one.
        if not slide.has_notes_slide:
            continue
        text_frame = slide.notes_slide.notes_text_frame
        if text_frame is None:  # the notes page has no notes placeholder
            continue
        slide_characters = 0
        for paragraph in text_frame.paragraphs:
            for run in paragraph.runs:
                # Count characters directly instead of splitting the text into words
                slide_characters += len(run.text)
        total_characters += slide_characters
        print(f'slide {k}: {slide_characters}')

    return total_characters


file = '法律智能答辩.pptx'
total = count_characters_in_notes(file)
print(f"Total word count in notes: {total}")
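Note that the script above counts every character in the notes, including punctuation, spaces, and Latin letters. If only the Chinese characters should count, a small helper along the following lines could be used in place of `len(run.text)`. This is a minimal sketch, not part of the original script: the helper names `count_cjk_characters` and `count_non_whitespace_characters` are hypothetical, and the pattern assumes that matching only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is sufficient.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block is matched;
# extension blocks and Chinese punctuation are deliberately ignored.
CJK_PATTERN = re.compile(r'[\u4e00-\u9fff]')


def count_cjk_characters(text):
    """Return the number of CJK ideographs in text (hypothetical helper)."""
    return len(CJK_PATTERN.findall(text))


def count_non_whitespace_characters(text):
    """Return the number of characters in text that are not whitespace."""
    return sum(1 for ch in text if not ch.isspace())


sample = '你好, world! 测试'
print(count_cjk_characters(sample))             # Chinese characters only
print(count_non_whitespace_characters(sample))  # everything except whitespace
```

To apply it, replace `len(run.text)` in the counting loop with `count_cjk_characters(run.text)`. Which definition of "character" is appropriate depends on how the narration length is being judged.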

  

Output (slide numbers are zero-based, so slide 0 is the first slide):

slide 0: 50
slide 1: 15
slide 2: 68
slide 3: 55
slide 4: 125
slide 5: 35
slide 6: 85
slide 7: 62
slide 8: 102
slide 9: 36
slide 10: 31
slide 11: 63
slide 12: 99
slide 13: 105
slide 14: 48
slide 15: 21
slide 16: 140
slide 17: 24
slide 18: 124
slide 19: 62
slide 20: 52
slide 21: 27
slide 22: 53
slide 23: 31
slide 24: 85
slide 25: 79
slide 26: 68
slide 27: 43
slide 28: 36
slide 29: 70
slide 30: 87
slide 31: 90
slide 32: 35
slide 33: 41
slide 34: 89
slide 35: 19
slide 36: 21
slide 37: 74
slide 38: 79
slide 39: 23
slide 40: 115
slide 41: 30
slide 42: 84
slide 43: 77
slide 44: 86
slide 45: 20
slide 46: 74
slide 47: 100
slide 48: 54
slide 49: 149
Total word count in notes: 3241