在看菜鸟的pandas对格式错误清洗时,发现菜鸟提供的代码在我现在的版本跑不通。
把报错在网上找了半天都是把报错errors参数给修改的。
最后重看了下报错信息,发现把format改成mixed,告诉pandas数据格式混合就可以(汗),应该是python3版本太新的问题
报错代码:
import pandas as pd
# 第三个日期格式错误
data = {
"Date": ['2020/12/01', '2020/12/02' , '20201226'],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
df['Date'] = pd.to_datetime(df['Date'])
print(df.to_string())
错误信息:
ValueError: time data "20201226" doesn't match format "%Y/%m/%d", at position 2. You might want to try:
- passing `format` if your strings have a consistent format;
- passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
修复代码:
import pandas as pd
# 第三个日期格式错误
data = {
"Date": ['2020/12/01', '2020/12/02' , '20201226'],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
df['Date'] = pd.to_datetime(df['Date'], format='mixed')
# df['Date'] = pd.to_datetime(df['Date'],format="%Y/%m/%d",errors='ignore')
print(df.to_string())