Tick data and K-line (candlestick) data: write to CSV or HDF5?

Published 2023-08-15 19:10:43, author: C羽言

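Writing to CSV is the least hassle: every time new data arrives, you simply write it out as one more row. HDF5 is very fast when a whole block is written at once, but recording real-time market data means appending continuously, one small piece at a time, and that is very inefficient.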
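Row-by-row CSV recording can be sketched with the standard `csv` module. This is a minimal sketch under assumptions: `append_bar` is a hypothetical helper, and the column layout (`date, open, high, low, close`) simply mirrors the columns used in the test code. Opening the file in append mode means each new bar lands at the end of the file without touching existing data.

```python
import csv
import os

# Assumed column layout, matching the test code in this post
FIELDS = ['date', 'open', 'high', 'low', 'close']


def append_bar(path, row):
    """Hypothetical helper: append one bar/tick row to a CSV file.

    The header is written only when the file does not exist yet or is empty.
    """
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # 'a' appends at the end of the file; newline='' is what the csv module expects
    with open(path, 'a', newline='') as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(FIELDS)
        writer.writerow(row)
```

For example, `append_bar('rb.csv', ['2023-08-15 09:00:00', 3700, 3712, 3698, 3705])` (file name and prices are made up) adds one line per call. Reopening the file for every row keeps the sketch simple; a live recorder could instead keep one file object open and call `f.flush()` after each write.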

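The biggest problem with CSV is reading. If you keep a single CSV file per instrument for years, the file becomes bloated, and reading it (which normally means loading the whole file) puts heavy pressure on memory. With HDF5 you open the stored object and query only, say, its last 1000 rows, so memory use stays small and the query is very efficient.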
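A minimal sketch of that tail query with pandas, assuming the data was stored in HDFStore's table format (for example via `store.append` or `to_hdf(..., format='table')`) and that PyTables is installed. `read_last_rows` and the key names are hypothetical. The row count comes from the table's metadata, and `select(start=...)` reads only the requested slice instead of the whole dataset.

```python
import pandas as pd


def read_last_rows(path, key, n=1000):
    """Hypothetical helper: return only the last n rows of a table-format HDF5 dataset."""
    with pd.HDFStore(path, mode='r') as store:
        # Number of stored rows, read from the table metadata (no row data loaded)
        nrows = store.get_storer(key).nrows
        start = max(nrows - n, 0)
        # Read rows [start, nrows) only
        return store.select(key, start=start)
```

Usage: `tail = read_last_rows('rb.h5', 'bars', n=1000)` (file and key names made up).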

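To write HDF5 row by row the way you would with CSV, you have to convert every incoming record into a DataFrame first, which is very slow. The test code below compares the two approaches on 100 rows.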

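# Run in a Jupyter notebook: each %%time line below must start a new cell.
# a.csv is expected to contain the columns date, open, high, low, close.
import numpy as np
import pandas as pd
import csv
df = pd.read_csv('a.csv')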

%%time
# CSV: write the first 100 bars one row at a time
csvheader = ['date', 'open', 'high', 'low', 'close']
with open('t1.csv', 'w', newline='') as csvfile:
    csvfile_w = csv.writer(csvfile)
    csvfile_w.writerow(csvheader)
    for row in df[csvheader].head(100).itertuples(index=False):
        csvfile_w.writerow(row)
# Wall time: 34.9 ms

%%time
# HDF5: append the same 100 bars one row at a time;
# each row has to be wrapped in a one-row DataFrame before store.append()
cols = ['date', 'open', 'high', 'low', 'close']
with pd.HDFStore('t1.h5', mode='w') as store:
    for row in df[cols].head(100).itertuples(index=False):
        df1 = pd.DataFrame([list(row)], columns=cols)
        store.append('df1', df1)
# Wall time: 464 ms
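Since HDF5 is fast for whole-block writes, one way to narrow the gap is to buffer rows in memory and append them in batches. The sketch below is an assumption, not the author's method: `BatchedHDFWriter` and `record` are hypothetical names, it requires PyTables, and it uses synthetic rows instead of `a.csv`. With `batch_size=1` it behaves like the per-row test; with `batch_size=100` the same 100 rows go out in a single `store.append` call.

```python
import time

import pandas as pd

COLUMNS = ['date', 'open', 'high', 'low', 'close']
PRICE_COLS = COLUMNS[1:]


class BatchedHDFWriter:
    """Hypothetical helper: buffer rows in memory and append them to HDF5 in batches."""

    def __init__(self, path, key, batch_size=500, mode='a'):
        self.store = pd.HDFStore(path, mode=mode)
        self.key = key
        self.batch_size = batch_size
        self.buffer = []

    def write(self, row):
        """Queue one [date, open, high, low, close] row; flush when the batch is full."""
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Turn all buffered rows into ONE DataFrame and append it in a single call."""
        if not self.buffer:
            return
        batch = pd.DataFrame(self.buffer, columns=COLUMNS)
        # Force float prices so every batch matches the table layout of the first one
        batch[PRICE_COLS] = batch[PRICE_COLS].astype('float64')
        # Reserve room for the date strings (this also makes 'date' a data column)
        self.store.append(self.key, batch, min_itemsize={'date': 32})
        self.buffer.clear()

    def close(self):
        self.flush()
        self.store.close()


def record(rows, path, batch_size):
    """Write rows through the buffer into a fresh file and return elapsed seconds."""
    t0 = time.perf_counter()
    writer = BatchedHDFWriter(path, 'bars', batch_size=batch_size, mode='w')
    for row in rows:
        writer.write(row)
    writer.close()
    return time.perf_counter() - t0


if __name__ == '__main__':
    # Synthetic bars with made-up prices, standing in for a.csv
    rows = [[f'2023-08-15 09:{i // 60:02d}:{i % 60:02d}', 1.0, 2.0, 0.5, 1.5]
            for i in range(100)]
    print(f"batch_size=1:   {record(rows, 't_rows.h5', 1):.3f} s")
    print(f"batch_size=100: {record(rows, 't_block.h5', 100):.3f} s")
```

The trade-off is durability: rows still sitting in the buffer are lost if the process dies before `flush()` runs, so a larger `batch_size` means fewer, faster writes but more data at risk.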