Scraping the largest shareholder's shareholding percentage of listed companies from cfi.cn (中财网)

Published: 2023-03-22 21:13:58 Author: package_main

1. Goal

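From cfi.cn (https://www.cfi.cn/), fetch the shareholding percentage of the largest shareholder for a given listed stock and a given reporting date, as shown in the figure below: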

image-20230321211912588

  • Analyze the XHR request

image-20230321211848424

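The payload takes three parameters: stockid, contenttype and jzrq. Two of them are straightforward: contenttype is a fixed string (gdtj for this page) and jzrq is a date such as 2020-06-30 (the name is presumably short for 截止日期, the cutoff date). The odd one is stockid, which is not the familiar six-digit stock code but an id internal to the site.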

image-20230321212154468
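
To make the three parameters concrete, here is a minimal sketch of how they end up in the query string. The helper name build_quote_url is hypothetical, and the stockid value 7 is only borrowed from the Referer header of the captured request as an example; nothing is sent over the network.

```python
from urllib.parse import urlencode

BASE_URL = "https://quote.cfi.cn/quote.aspx"


def build_quote_url(stockid, jzrq, contenttype="gdtj"):
    """Return the GET URL carrying the three payload parameters (hypothetical helper)."""
    query = urlencode({
        "stockid": stockid,          # site-internal id, not the six-digit code
        "contenttype": contenttype,  # fixed to 'gdtj' for this page
        "jzrq": jzrq,                # reporting date, e.g. '2020-06-30'
    })
    return f"{BASE_URL}?{query}"


print(build_quote_url("7", "2020-06-30"))
```

Passing the same dictionary as params= to requests.get produces an equivalent URL, which is what the script further down relies on.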

The page on the site that lists the stock codes looks like this:

image-20230321212358304

The stockid that corresponds to each stock code can be found in the HTML source of that page:

image-20230321212601692
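
As a rough illustration of that lookup, the sketch below pulls every (stockid, code) pair out of a piece of markup in one pass. The sample HTML is hypothetical: it is only shaped like the onclick="stock_clickFunc(...)" pattern that the script's regular expression targets, and the ids 101 and 102 are invented. build_code_map is likewise an assumed helper name.

```python
import re

# Hypothetical markup; the ids 101 and 102 are made up for illustration.
SAMPLE_HTML = """
<a onclick="stock_clickFunc(101,'000001')">Stock A</a>
<a onclick="stock_clickFunc(102,'000002')">Stock B</a>
"""

# Group 1: the site-internal stockid. Group 2: the six-digit stock code.
PATTERN = re.compile(r"""onclick="stock_clickFunc\((\d+),'(\d{6})'\)""")


def build_code_map(html):
    """Map six-digit stock code -> stockid for every match in the page."""
    return {code: stockid for stockid, code in PATTERN.findall(html)}


code_map = build_code_map(SAMPLE_HTML)
print(code_map.get("000001"))  # stockid for a known code
print(code_map.get("999999"))  # None when the code is not listed
```

Building the whole map once means the list page only has to be downloaded a single time, however many codes are looked up afterwards.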

  • Convert the request into Python code
import re

import requests

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
    'Connection': 'keep-alive',
    'Referer': 'https://quote.cfi.cn/quote.aspx?actstockid=7&actcontenttype=gdtj&client=pc&searchcode=',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Google Chrome";v="111", "Not(A:Brand";v="8", "Chromium";v="111"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}

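def getTable(stockid, jzrq):
    """Fetch the quote page for one stock and one reporting date; return its HTML."""
    params = {
        'stockid': stockid,      # site-internal id, not the six-digit stock code
        'contenttype': 'gdtj',
        'jzrq': jzrq,            # reporting date, e.g. '2020-06-30'
    }

    # A timeout keeps the script from hanging forever on a stalled connection.
    response = requests.get('https://quote.cfi.cn/quote.aspx', params=params, headers=headers, timeout=10)
    response.raise_for_status()
    return response.text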


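def reg_find(text):
    """
    Return the first percentage found in a table cell, which the script
    takes as the largest shareholder's stake, e.g. the 23.67 in
    </td><td>23.67%</td><td>
    """
    # [\d.]+ matches one or more digits or dots.
    anss = re.findall(r'</td><td>([\d.]+)%</td><td>', text)
    if not anss:
        # Stop the script with a non-zero exit status when nothing matches.
        raise SystemExit("error: no percentage found in the page")
    return anss[0]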


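def id2stkid(uid):
    """Look up the site-internal stockid for a six-digit stock code; return None if absent."""
    params = {
        't': '12',
    }

    response = requests.get('https://quote.cfi.cn/stockList.aspx', params=params, headers=headers, timeout=10)
    # re.escape keeps the stock code literal inside the pattern.
    ans = re.findall(rf"onclick=\"stock_clickFunc\((\d+),\'{re.escape(uid)}\'\)", response.text)
    # findall returns a list; hand back the first id as a string.
    return ans[0] if ans else None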

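if __name__ == "__main__":
    codes = ['000001', '000002', '000008']
    for code in codes:
        ncode = id2stkid(code)
        if not ncode:
            # No stockid for this code: skip it rather than send a request without one.
            print(code, "stockid not found")
            continue
        text = getTable(ncode, '2020-06-30')
        ans = reg_find(text)
        print(ans)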
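
For batch runs it can be convenient to get the percentage back as a number and to keep going when a page has no match. The following is a minimal sketch of such a variant, not the script's own function: first_percentage is a hypothetical name, and the sample row is invented around the </td><td>23.67%</td><td> fragment quoted in the reg_find docstring.

```python
import re

# One or more digits, optionally followed by a decimal part, inside a table cell.
PERCENT_RE = re.compile(r"</td><td>(\d+(?:\.\d+)?)%</td><td>")


def first_percentage(html):
    """Return the first percentage cell as a float, or None if there is none."""
    match = PERCENT_RE.search(html)
    return float(match.group(1)) if match else None


# Invented sample row containing the fragment from the reg_find docstring.
sample = "<tr><td>1</td><td>Some holder</td><td>23.67%</td><td>...</td></tr>"
print(first_percentage(sample))
print(first_percentage("<tr><td>no data</td></tr>"))
```

Returning None instead of exiting lets the caller decide whether a missing value should stop the whole run or just be recorded for that stock.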



  • Screenshot of the run

image-20230321212643949

QQ group: 529528142