Notes on Python regular expressions

Published: 2023-03-31 17:36:30  Author: llllrj

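Today I needed regular expressions for a script. After reading up on them and asking GPT, I finally got it working, so I am writing it down here. Regex has a lot of rules, some of them fairly intricate, and becoming fluent with them takes real practice.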

Here are two regex techniques that proved useful:

1. Matching a string that has a specified preceding and following context

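Suppose we want to match the 'one' in 'On the one hand' but not the 'other' in 'On the other hand'. This calls for lookaround assertions, which are part of the "(?...)" extension notation in regex syntax: (?<=...) is a lookbehind and (?=...) is a lookahead. Unlike ordinary parentheses, these assertions do not create a group and do not consume any characters; they only state a condition that the surrounding text must satisfy.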

 

import re

text = "On the one hand, we have the option to do X. On the other hand, we have the option to do Y."
# The lookbehind is case-sensitive, so it must read "On the " to match the text.
pattern = r"(?<=On the )one(?= hand)"

matches = re.findall(pattern, text)
print(matches) # ['one']
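
To see what "only a condition, not a group" means in practice, here is a small sketch that compares the lookaround pattern with an ordinary capturing group. The variable names lookaround and grouped are made up for this illustration.

```python
import re

text = "On the one hand, we have the option to do X. On the other hand, we have the option to do Y."

# Lookaround version: the context is checked but is not part of the match.
lookaround = re.search(r"(?<=On the )one(?= hand)", text)

# Capturing-group version: the context is consumed and 'one' becomes group 1.
grouped = re.search(r"On the (one) hand", text)

print(lookaround.group())   # one
print(lookaround.span())    # (7, 10)
print(lookaround.groups())  # ()
print(grouped.group())      # On the one hand
print(grouped.group(1))     # one
```

With lookarounds the whole match is just 'one', which is why re.findall and re.sub can work on it directly without touching the surrounding words.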

 

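2. Replacing many pieces of text according to a set of rules

Suppose now that text = "On the one hand, we have the option to do X. On the two hand, we have the option to do Y.On the other hand, we have the option to do Z."

We want to replace 'one' with '1' and 'two' with '2', which can be written as follows: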

import re

project = {
        'one': '1',
        'two': '2'
    }

text = "On the one hand, we have the option to do X. On the two hand, we have the option to do Y.On the other hand, we have the option to do Z."
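# Group the alternatives with (?:...) so that both lookarounds apply to every key.
pattern = r"(?<=On the )(?:" + '|'.join(project.keys()) + r")(?= hand)"

res = re.sub(pattern,
             lambda match: project[match.group()],  # match is a Match object
             text)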
print(res)
# "On the 1 hand, we have the option to do X. On the 2 hand, we have the option to do Y.On the other hand, we have the option to do Z."
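
Joining dictionary keys with '|' works for plain words like these, but it can go wrong if a key contains regex metacharacters or if one key is a prefix of another. A possible way to guard against that, shown as a sketch only: build_alternation is a hypothetical helper and the mapping below is made-up data, not part of the original script.

```python
import re

def build_alternation(keys):
    """Return a non-capturing alternation of the keys, escaped, longest first."""
    ordered = sorted(keys, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(k) for k in ordered) + ")"

# Made-up mapping: "one.5" contains a metacharacter and starts with "one".
mapping = {"one": "1", "one.5": "1.5"}

pattern = re.compile(r"(?<=On the )" + build_alternation(mapping) + r"(?= hand)")

text = "On the one hand. On the one.5 hand. On the onex5 hand."
result = pattern.sub(lambda m: mapping[m.group()], text)
print(result)
# On the 1 hand. On the 1.5 hand. On the onex5 hand.
```

re.escape keeps the '.' in 'one.5' literal, so 'onex5' is left alone, and sorting longest first makes the engine try 'one.5' before 'one'.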

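Note that this uses re.sub(pattern, repl, string, count=0, flags=0).

The key point is that the repl argument can be either a string or a function. The string case is straightforward. If it is a function, it is called once for each place where pattern matches in string, receives the corresponding Match object as its only argument, and must return the replacement string. The matched text itself is obtained with match.group(), so a lambda such as lambda match: project[match.group()] returns the string that 'one' or 'two' maps to. For reference, here is the definition of re.sub in the standard library's re module: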

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the Match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)
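
To make the two forms of repl concrete, here is a short sketch that uses both on the same kind of text. The function to_digit and the shortened sample sentence are made up for this illustration.

```python
import re

text = "On the one hand, X. On the two hand, Y."

# String repl: backslash escapes are processed, and \g<1> refers to group 1.
as_string = re.sub(r"On the (one|two) hand", r"On the <\g<1>> hand", text)

# Function repl: called with a Match object, must return the replacement string.
def to_digit(match):
    mapping = {"one": "1", "two": "2"}  # hypothetical lookup table
    return mapping[match.group()]

as_function = re.sub(r"(?<=On the )(?:one|two)(?= hand)", to_digit, text)

print(as_string)    # On the <one> hand, X. On the <two> hand, Y.
print(as_function)  # On the 1 hand, X. On the 2 hand, Y.
```

A string repl is enough when the replacement only rearranges matched text; a function is the natural choice when the replacement has to be looked up or computed.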