Writing to CSV is the most convenient option: every update can simply be appended as a new row. HDF5 is very fast for bulk writes, but recording live market data means appending continuously, where it is very inefficient.
CSV's biggest weakness is reading. If you record one instrument in the same CSV file for years, the file becomes bloated, and loading it puts serious pressure on memory. With HDF5 you connect to a store object and can query, say, only its last 1,000 rows, which uses very little memory and is highly efficient.
For HDF5 to append row by row the way CSV does, every incoming tick has to be wrapped in a DataFrame first, which is very slow. The code below tests both approaches.
import numpy as np
import pandas as pd
import csv

df = pd.read_csv('a.csv')  # sample quote data to replay
%%time
csvheader = ['date', 'open', 'high', 'low', 'close']
csvfile = open('t1.csv', 'w', newline='')
csvfile_w = csv.writer(csvfile)
csvfile_w.writerow(csvheader)
i = 0
for date, o, h, l, c in zip(df['date'], df['open'], df['high'], df['low'], df['close']):
    i += 1
    if i > 100:
        break
    mdlist = [date, o, h, l, c]
    csvfile_w.writerow(mdlist)
csvfile.close()
# Wall time: 34.9 ms
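The memory problem with reading a long-lived CSV, mentioned above, can be softened with chunked reading: `pd.read_csv(chunksize=...)` returns an iterator, so only one chunk lives in memory at a time. A minimal sketch (the file name `big.csv` and its contents are made up for illustration):

```python
import pandas as pd

# Stand-in for a long-lived quote file (hypothetical data).
pd.DataFrame({'close': range(10_000)}).to_csv('big.csv', index=False)

# Iterate the file in fixed-size chunks instead of loading it whole;
# each loop iteration holds only one 1,000-row chunk in memory.
last_chunk = None
for chunk in pd.read_csv('big.csv', chunksize=1_000):
    last_chunk = chunk  # keep only the most recent chunk

# last_chunk now holds the final 1,000 rows of the file.
```

This still scans the whole file from disk, so it saves memory but not time; HDF5 avoids the full scan as well, which is the point of the next test.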
%%time
store = pd.HDFStore('t1', 'w')
i = 0
for date, o, h, l, c in zip(df['date'], df['open'], df['high'], df['low'], df['close']):
    i += 1
    if i > 100:
        break
    mdlist = [date, o, h, l, c]
    df1 = pd.DataFrame([mdlist], columns=['date', 'open', 'high', 'low', 'close'])
    store.append('df1', df1)
store.close()
# Wall time: 464 ms
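The 464 ms above is dominated by building a one-row DataFrame per tick. A common workaround is to buffer ticks and flush them to the store in batches, amortizing that overhead. The sketch below (file name `t2.h5`, the flush size of 50, and the generated sample data are all made up for illustration) also demonstrates the tail query described earlier: the `table` format written by `append()` supports `start`/`stop` slicing, so the last N rows can be read without loading the whole dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for a.csv.
df = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=100, freq='min').astype(str),
    'open': np.random.rand(100),
    'high': np.random.rand(100),
    'low': np.random.rand(100),
    'close': np.random.rand(100),
})

store = pd.HDFStore('t2.h5', 'w')
buffer = []
for row in df.itertuples(index=False):
    buffer.append(row)
    # Flush every 50 ticks instead of appending row by row.
    if len(buffer) >= 50:
        store.append('df1', pd.DataFrame(buffer, columns=df.columns))
        buffer.clear()
if buffer:  # flush any leftover ticks
    store.append('df1', pd.DataFrame(buffer, columns=df.columns))

# Read only the last 10 rows: slice the table by row position
# instead of loading the whole dataset into memory.
nrows = store.get_storer('df1').nrows
tail = store.select('df1', start=max(nrows - 10, 0))
store.close()
```

The trade-off of buffering is that a crash loses the unflushed ticks, so the flush size should match how much data you can afford to drop.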