A C function crashes with FAULTADDR before its body runs; the root cause turned out to be a stack overflow

Published: 2024-01-06 21:45:19  Author: bonelee

While writing an algorithm in C recently, I ran into an interesting crash. The symptom:

[2024-01-03 20:34:54]  ================= Exception info (no: 1 idx: 0) =================               
[2024-01-03 20:34:54]  Exception Type:             11
[2024-01-03 20:34:54]  Exception Subtype:          2
[2024-01-03 20:34:54]  Exception Name:             SIGSEGV
[2024-01-03 20:34:54]  Exception Slot/CPU:         0/0
[2024-01-03 20:34:54]  Exception Process:          13947
[2024-01-03 20:34:54]  Exception Process Name:     asset-main
[2024-01-03 20:34:54]  Exception Time:             2024-01-03 16:27:01
[2024-01-03 20:34:54]  Exception Tick:             0x00000000 (CPU Tick High) 0x00000000 (CPU Tick Low)
[2024-01-03 20:34:54]  YunShan OS Version:         V100R023C10B001
[2024-01-03 20:34:54]  DOPRA Version:              DOPRA SSP V300R023C10SPC011B200
[2024-01-03 20:34:54]  PATCH:                      NA
[2024-01-03 20:34:55]  Exception Sender PID:       2113030912
[2024-01-03 20:34:55]  Exception Sender NAME:      
[2024-01-03 20:34:55]  Register Contents:          
[2024-01-03 20:34:55]  Reg:  X0         0x0000fffc8001f618   Reg:  X1         0x0000fffcb270fbf0
[2024-01-03 20:34:55]  Reg:  X2         0x0000fffcb26fd780   Reg:  X3         0x000000000000000d
[2024-01-03 20:34:55]  Reg:  X4         0x0000ffff80521b40   Reg:  X5         0x000000000000000e
[2024-01-03 20:34:55]  Reg:  X6         0x0000000000000000   Reg:  X7         0x0000000000000000
[2024-01-03 20:34:55]  Reg:  X8         0x0000fffc900d4028   Reg:  X9         0x00000000000249ed
[2024-01-03 20:35:27]  Reg:  X10        0x00000000000249f0   Reg:  X11        0x0000000000249f00
[2024-01-03 20:35:27]  Reg:  X12        0x0000fffc9031df08   Reg:  X13        0x0000000000249f00
[2024-01-03 20:35:27]  Reg:  X14        0x0000000000004b19   Reg:  X15        0x0000fffc900d4018
[2024-01-03 20:35:27]  Reg:  X16        0x0000fffcb27389b8   Reg:  X17        0x0000fffcb26e7630
[2024-01-03 20:35:27]  Reg:  X18        0x0000fffc9031df1c   Reg:  X19        0x0000fffcb2739000
[2024-01-03 20:35:28]  Reg:  X20        0x0000fffcb27f1b98   Reg:  X21        0x0000ffff7df4abc7
[2024-01-03 20:35:28]  Reg:  X22        0x0000fffc9bfeee70   Reg:  X23        0x0000fffc8001f618
[2024-01-03 20:35:28]  Reg:  X24        0x0000ffff7df4c5e0   Reg:  X25        0x0000ffff7df4c90a
[2024-01-03 20:35:28]  Reg:  X26        0x0000ffff80519140   Reg:  X27        0x0000fffcb28042c8
[2024-01-03 20:35:28]  Reg:  X28        0x0000ffff7df2e000   Reg:  X29        0x0000ffff7df4ab60
[2024-01-03 20:35:28]  Reg:  X30        0x0000fffcb26e7a34   Reg:  SP         0x0000ffff7df24b00
[2024-01-03 20:35:28]  Reg:  PC         0x0000fffcb26e7638   Reg:  LR         0x0000fffcb26e7a34
[2024-01-03 20:35:28]  Reg:  PSTATE     0x0000000060000000   Reg:  FAULTADDR  0x0000ffff7df24b00
[2024-01-03 20:35:28]  
[2024-01-03 20:35:28]  Show CallStack:
[2024-01-03 20:35:28]  Instruction Address: 0x0000fffcb26e7638, Func:[libasset.so+0x24638](deserialize_asset_portray+0x8)
[2024-01-03 20:35:28]  Instruction Address: 0x0000fffcb2703004, Func:[libasset.so+0x40004](asset_handle_counterfeit_detection+0x2c)
[2024-01-03 20:35:28]  Instruction Address: 0x0000fffcb27048b4, Func:[libasset.so+0x418b4](asset_detect_iot_security+0xcc)
[2024-01-03 20:35:28]  Instruction Address: 0x0000fffcb2701c2c, Func:[libasset.so+0x3ec2c](asset_qcache_shm_refresh+0x6c)
[2024-01-03 20:35:29]  Instruction Address: 0x0000fffcb2701e20, Func:[libasset.so+0x3ee20](asset_qcache_add+0x150)
[2024-01-03 20:35:29]  Instruction Address: 0x0000fffcb26e4678, Func:[libasset.so+0x21678](asset_service_msg_add+0x110)
[2024-01-03 20:35:29]  Instruction Address: 0x0000fffcb26e35d4, Func:[libasset.so+0x205d4](asset_filter_msg_send+0xcc)
[2024-01-03 20:35:29]  Instruction Address: 0x0000fffcb2803ac0, Func:[libase_asset.so+0x3ac0](asset_dc_rev+0x70)
[2024-01-03 20:35:29]  Instruction Address: 0x0000ffff7de54950, Func:[libplat_adp.so+0xc950](ase_dc_recv_handle+0x78)
[2024-01-03 20:35:29]  Instruction Address: 0x0000ffff804a334c, Func:[libase_utils.so+0x2134c](sched_proc_evts+0xac)
[2024-01-03 20:35:29]  Instruction Address: 0x0000ffff804a31ec, Func:[libase_utils.so+0x211ec](sched_evt_loop_inner+0x174)
[2024-01-03 20:35:29]  Instruction Address: 0x0000ffff804a2f9c, Func:[libase_utils.so+0x20f9c](sched_evt_loop+0x94)
[2024-01-03 20:35:29]  Instruction Address: 0x0000fffcb2804274, Func:[libase_asset.so+0x4274](0xfffcb2804274)
[2024-01-03 20:35:30]  Instruction Address: 0x0000ffff804c6d5c, Func:[libase_utils.so+0x44d5c](sec_thread_main+0xec)
[2024-01-03 20:35:30]  Instruction Address: 0x0000ffff7dc6c424, Func:[libc.so.6+0x82424](0xffff7dc6c424)
[2024-01-03 20:35:30]  Instruction Address: 0x0000ffff7dcdaf1c, Func:[libc.so.6+0xf0f1c](0xffff7dcdaf1c)

  

The relevant lines are:

[2024-01-03 20:35:28]  Reg:  X30        0x0000fffcb26e7a34   Reg:  SP         0x0000ffff7df24b00
[2024-01-03 20:35:28]  Reg:  PC         0x0000fffcb26e7638   Reg:  LR         0x0000fffcb26e7a34
[2024-01-03 20:35:28]  Reg:  PSTATE     0x0000000060000000   Reg:  FAULTADDR  0x0000ffff7df24b00

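The SP register points at an invalid address: FAULTADDR is exactly the value of SP, and the call stack puts PC at deserialize_asset_portray+0x8. Judging by my own log prints, the crash happened before the function body was ever entered!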

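Disassembling the function shows which instruction faulted. The absolute addresses below differ from the crash log, presumably because the library was loaded at a different base in this session; the offset <+8> is what matters: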

(gdb) disassemble deserialize_asset_portray
Dump of assembler code for function deserialize_asset_portray:
   0x0000fffcc8127dc8 <+0>:     sub     sp, sp, #0x60
   0x0000fffcc8127dcc <+4>:     sub     sp, sp, #0x26, lsl #12
   0x0000fffcc8127dd0 <+8>:     stp     x29, x30, [sp]
   0x0000fffcc8127dd4 <+12>:    mov     x29, sp
   0x0000fffcc8127dd8 <+16>:    add     x4, x29, #0x24, lsl #12
   0x0000fffcc8127ddc <+20>:    stp     x21, x22, [sp, #32]
   0x0000fffcc8127de0 <+24>:    adrp    x22, 0xfffcc817a000 <asset_cfg_shm_ip_range_reached_max@got.plt>
   0x0000fffcc8127de4 <+28>:    mov     x21, x0
   0x0000fffcc8127de8 <+32>:    str     x24, [sp, #56]
   0x0000fffcc8127dec <+36>:    ldr     x0, [x22, #3576]
   0x0000fffcc8127df0 <+40>:    mov     x24, x1
   0x0000fffcc8127df4 <+44>:    mov     x3, x21
   0x0000fffcc8127df8 <+48>:    adrp    x1, 0xfffcc8150000
   0x0000fffcc8127dfc <+52>:    ldr     x5, [x0]

  

Why does this error occur?

Dump of assembler code for function deserialize_asset_portray:
0x0000fffcc8127dc8 <+0>: sub sp, sp, #0x60
0x0000fffcc8127dcc <+4>: sub sp, sp, #0x26, lsl #12
0x0000fffcc8127dd0 <+8>: stp x29, x30, [sp]

Explain what this assembly code does.

 

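This is AArch64 (64-bit ARM) code, the prologue of the function deserialize_asset_portray. Instruction by instruction:

1. 0x0000fffcc8127dc8 <+0>: sub sp, sp, #0x60: subtracts 0x60 (96 bytes) from the stack pointer (sp) to reserve stack space for the function. This is the usual pattern, because local variables normally live on the stack.

2. 0x0000fffcc8127dcc <+4>: sub sp, sp, #0x26, lsl #12: subtracts from sp again, this time 0x26 shifted left by 12 bits, i.e. 0x26 × 4096 = 155648 bytes. The subtraction is split in two because the immediate of an AArch64 sub is 12 bits wide, optionally shifted left by 12. Together the two instructions reserve a frame of 155744 bytes (0x26060).

3. 0x0000fffcc8127dd0 <+8>: stp x29, x30, [sp]: stores registers x29 and x30 at the new top of the stack. On AArch64, x29 is the frame pointer (fp) and x30 is the link register (lr), which holds the return address. A function saves both on entry so that it can restore the caller's frame and return to it.

In short, this code sets up the stack frame of the new call: it reserves stack space, then saves the frame pointer and the return address. In other words, the fault happened while the frame was being set up, before any statement in the function body had run!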
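The frame-size arithmetic can be cross-checked against the register dump. The following is a minimal sketch, assuming the immediates are copied by hand from the disassembly and the register values from the exception log; the helper names `prologue_frame_size` and `sp_drop` are made up for illustration.

```c
#include <stdint.h>

/* Frame size implied by the two prologue instructions:
 *   sub sp, sp, #imm_a
 *   sub sp, sp, #imm_b, lsl #12
 * (hypothetical helper, for illustration only) */
static uint64_t prologue_frame_size(uint64_t imm_a, uint64_t imm_b)
{
    return imm_a + (imm_b << 12);
}

/* Distance between the caller's frame pointer (X29) and the faulting SP.
 * At the time of the fault X29 has not been overwritten yet, because
 * "mov x29, sp" comes after the faulting stp. */
static uint64_t sp_drop(uint64_t caller_fp, uint64_t faulting_sp)
{
    return caller_fp - faulting_sp;
}
```

With the values from this crash, `prologue_frame_size(0x60, 0x26)` gives 0x26060 (155744 bytes), and `sp_drop(0x0000ffff7df4ab60, 0x0000ffff7df24b00)` gives the same 0x26060. So when stp faulted, SP had been lowered by exactly one frame of this function below the caller's frame pointer.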

 

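If stp x29, x30, [sp] raises an illegal memory access here, what could be the cause?

If executing stp x29, x30, [sp] raises an illegal memory access, the possible causes are:

1. Stack overflow: the two preceding instructions lowered sp to reserve stack space. If the amount subtracted is larger than the space left on the thread's stack, sp ends up outside the memory mapped for that stack, and the first store through it faults.

2. Memory protection: the operating system prevents a program from touching memory that is unmapped or that it is not permitted to access. If sp points into such a region (for example a guard page below the stack), writing there faults.

3. Misaligned address: some hardware and operating systems require data to sit at aligned addresses. On AArch64, sp must be 16-byte aligned when it is used as a base address and SP alignment checking is enabled. Here sp is 0x0000ffff7df24b00, which is 16-byte aligned, so this cause can be ruled out.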
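To tell causes 1 and 2 apart from a plain bad pointer, it helps to know where the current thread's stack actually ends. The sketch below is one way to query that on Linux with glibc, using `pthread_getattr_np` (a GNU extension) and `pthread_attr_getstack`. The struct `stack_bounds` and the helpers `current_stack_bounds` and `frame_fits` are hypothetical names, and this is not something the original code did.

```c
#define _GNU_SOURCE   /* pthread_getattr_np is a glibc extension */
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Stack range of the calling thread (hypothetical helper type). */
struct stack_bounds {
    uintptr_t low;    /* lowest address of the stack area */
    uintptr_t high;   /* one past the highest address */
    size_t    guard;  /* guard size as reported by pthread_attr_getguardsize */
};

/* Fill *out with the calling thread's stack range; returns 0 on success. */
static int current_stack_bounds(struct stack_bounds *out)
{
    pthread_attr_t attr;
    void *addr = NULL;
    size_t size = 0, guard = 0;

    int rc = pthread_getattr_np(pthread_self(), &attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_attr_getstack(&attr, &addr, &size);
    if (rc == 0) {
        rc = pthread_attr_getguardsize(&attr, &guard);
    }
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        return rc;
    }
    out->low = (uintptr_t)addr;
    out->high = (uintptr_t)addr + size;
    out->guard = guard;
    return 0;
}

/* Would lowering the stack pointer by frame_size from sp_now stay inside
 * the stack range? Returns 1 if it fits, 0 otherwise. */
static int frame_fits(const struct stack_bounds *b, uintptr_t sp_now,
                      size_t frame_size)
{
    return sp_now >= b->low && sp_now - b->low >= frame_size;
}
```

Calling `current_stack_bounds(&b)` and then `frame_fits(&b, (uintptr_t)&some_local, 155744)` just before the call into the large-frame function would indicate whether a frame of that size still fits. The address of a local variable is only an approximation of sp. On older glibc versions the program may need to be linked with `-pthread`.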


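Tracing it down, I found that my function allocates 151552 bytes on the stack.

My stack allocation is 151552 bytes. Could cause 1 be the reason?

Yes, cause 1 fits. The 151552-byte local buffer accounts for almost the entire frame: the prologue subtracts 0x26 × 4096 = 155648 bytes plus another 0x60 (96) bytes, 155744 bytes in total, from sp. If the thread has less stack than that left when the function is called, sp lands below the memory mapped for the stack. stp x29, x30, [sp] is the first instruction that touches the new frame, so it is the one that triggers the illegal memory access.

The offending code: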

#define BUFFER_SIZE 512
void deserialize_asset_portray(struct ase_hash *asset_portray_map, char *filename)
{
    if (asset_portray_map == NULL) {
        return;
    }
    // ....
    // Root cause: with BUFFER_SIZE 512 this array takes 151552 bytes of stack
    struct brief_asset_info buffer[BUFFER_SIZE];
    size_t count;
    while ((count = fread(buffer, sizeof(struct brief_asset_info), BUFFER_SIZE, file)) > 0) {
        for (size_t i = 0; i < count; i++) {
            upsert_asset_portray(&buffer[i], asset_portray_map);
        }
    }
    // ....
}

  

After the change:

#define BUFFER_SIZE 32  

Problem solved!
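An alternative to shrinking the buffer is to keep 512 elements per read but take them from the heap, so the function's stack frame no longer grows with BUFFER_SIZE. The sketch below illustrates that idea and is not the fix used above. The 296-byte `struct brief_asset_info` is a placeholder inferred from 151552 / 512, since the real layout is not shown in this post. `load_records`, `record_handler` and `count_record` are hypothetical names, with the callback standing in for upsert_asset_portray.

```c
#include <stdio.h>
#include <stdlib.h>

#define BUFFER_SIZE 512

/* Placeholder layout: 296 bytes is an assumption (151552 / 512). */
struct brief_asset_info {
    unsigned char raw[296];
};

/* Hypothetical per-record callback standing in for upsert_asset_portray(). */
typedef void (*record_handler)(const struct brief_asset_info *rec, void *ctx);

/* Example handler: counts records through the ctx pointer. */
static void count_record(const struct brief_asset_info *rec, void *ctx)
{
    (void)rec;
    ++*(size_t *)ctx;
}

/* Read records from an already opened file in chunks of BUFFER_SIZE, using
 * a heap buffer so the stack frame stays small. Returns the number of
 * records processed, or -1 if the buffer cannot be allocated. */
static long load_records(FILE *file, record_handler handle, void *ctx)
{
    struct brief_asset_info *buffer = malloc(BUFFER_SIZE * sizeof *buffer);
    if (buffer == NULL) {
        return -1;
    }

    long total = 0;
    size_t count;
    while ((count = fread(buffer, sizeof *buffer, BUFFER_SIZE, file)) > 0) {
        for (size_t i = 0; i < count; i++) {
            handle(&buffer[i], ctx);
        }
        total += (long)count;
    }

    free(buffer);
    return total;
}
```

A call such as `load_records(file, count_record, &seen)` processes the whole file and leaves the record count in `seen`. To catch oversized frames at build time, GCC's `-Wframe-larger-than=<bytes>` warning is also worth considering.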