MySQL 对 bit 类型与字符判断结果不一致现象

发布时间 2023-12-21 11:06:24作者: 沧海一滴

 

结论

涉及 bit 类型的比较时,最好给常量的比较对象都加上 b 前缀,比如a = '1'变成a = b'1',来避免结果的非预期不一致现象。

复现 SQL

CREATE TABLE t1 (
  `id` varchar(36),
  `status` bit(1),
  `open` bit(1),
  `store_id` varchar(20),
  PRIMARY KEY (`id`),
  KEY `index_status` (`store_id`, `status`)
);

INSERT INTO t1 VALUES ('7B03CF04', b'1', b'1', 'h09az');

执行上述查询,我们创建了一个包含长度为 1 的 bit 类型列 status 的表,往其中插入了一条数据。

通过以下查询我们是可以看到这条数据的:

root@localhost:test 8.0.23> SELECT * FROM t1 WHERE status = '1' AND open = '1';
+----------+----------------+------------+----------+
| id       | status         | open       | store_id |
+----------+----------------+------------+----------+
| 7B03CF04 | 0x01           | 0x01       | h09az    |
+----------+----------------+------------+----------+
1 row in set (0.00 sec)

root@localhost:test 8.0.23> EXPLAIN SELECT * FROM t1 WHERE status = '1' AND open = '1';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

但是当我们给上述 SQL 加上一个 WHERE 条件store_id = 'h09az'后,按理说这条数据一定也能满足条件被选中,但是却返回了空结果集:

root@localhost:test 8.0.23> SELECT * FROM t1 WHERE status = '1' AND open = '1' AND store_id = 'h09az';
Empty set (0.00 sec)

root@localhost:test 8.0.23> EXPLAIN SELECT * FROM t1 WHERE status = '1' AND open = '1' AND store_id = 'h09az';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | no matching row in const table |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
1 row in set, 1 warning (0.00 sec)

很诡异的现象对不对,我们深入到代码层面分析原因,发现当带有store_id = 'h09az'条件时,store_id 字段和 status 字段组成了索引前缀, 导致优化器会考虑 RANGE 查询,在构建表示 RANGE 的 Min-Max Tree 时,会将条件中的常量 '1' 存入 status 的 field 中,在这个过程中优化器是将 '1' 当作字符来看的,因此存入的是其对应的 ASCII 码 49,这显然超出了 1 个 bit 所能表示的范围,返回 TYPE_WARN_OUT_OF_RANGE。优化器一看需要比较的等值常量超出了这个列的表示范围,自然一定不存在,所以直接返回空结果集。这个过程在下列栈中:

#0  Field_bit_as_char::store at sql/field.cc:9007
#1  0x0000000004a42b80 in Item::save_str_value_in_field at sql/item.cc:603
#2  0x0000000004a5648c in Item_string::save_in_field_inner at sql/item.cc:6551
#3  0x0000000004a559f5 in Item::save_in_field at sql/item.cc:6426
#4  0x0000000004a463b0 in Item::save_in_field_no_warnings at sql/item.cc:1632
#5  0x000000000437b67d in save_value_and_handle_conversion at sql/opt_range.cc:8170
#6  0x000000000437c75e in get_mm_leaf at sql/opt_range.cc:8588
#7  0x000000000437b328 in get_mm_parts at sql/opt_range.cc:8070
#8  0x0000000004379a94 in get_func_mm_tree at sql/opt_range.cc:7548
#9  0x0000000004379c98 in get_full_func_mm_tree at sql/opt_range.cc:7649
#10 0x000000000437abda in get_mm_tree at sql/opt_range.cc:7871
#11 0x0000000004379fb1 in get_mm_tree at sql/opt_range.cc:7720
#12 0x0000000004370621 in test_quick_select at sql/opt_range.cc:4127
#13 0x00000000044de9fa in get_quick_record_count at sql/sql_optimizer.cc:6280
#14 0x00000000044dde76 in JOIN::estimate_rowcount at sql/sql_optimizer.cc:6004
#15 0x00000000044dbfc7 in JOIN::make_join_plan at sql/sql_optimizer.cc:5371
#16 0x00000000044cee5c in JOIN::optimize at sql/sql_optimizer.cc:721
#17 0x00000000045934c0 in SELECT_LEX::optimize at sql/sql_select.cc:1844
#18 0x00000000045914ba in Sql_cmd_dml::execute_inner at sql/sql_select.cc:855
#19 0x0000000004590e15 in Sql_cmd_dml::execute at sql/sql_select.cc:713
#20 0x0000000004509ecf in mysql_execute_command at sql/sql_parse.cc:6579

当不带有store_id = 'h09az'条件时,优化器不再考虑上述过程,而是采用全表扫描,在 Server 层使用 WHERE 条件来过滤,在构建 bit 类型和 '1' 的比较符时,Arg_comparator::set_cmp_func() 是将两边都当做实数来处理的,因此会将 '1' 字符转化为整型数 1 后进行比较,与 b'1' 的等值比较是成立的,会返回这条数据。这个过程在:

#0  Arg_comparator::compare_real (this=0x7ffc7881d0f8) at sql/item_cmpfunc.cc:1933
#1  0x0000000004a90402 in Arg_comparator::compare (this=0x7ffc7881d0f8) at sql/item_cmpfunc.h:133
#2  0x0000000004a77536 in Item_func_eq::val_int (this=(Item_func_eq *) 0x7ffc7881cfd0) at sql/item_cmpfunc.cc:2579
#3  0x0000000004a41a5e in Item::val_bool (this=(Item_func_eq *) 0x7ffc7881cfd0) at sql/item.cc:306
#4  0x0000000004a8a111 in Item_cond_and::val_int (this=(Item_cond_and *) 0x7ffc78741fa8) at sql/item_cmpfunc.cc:7137
#5  0x000000000447c230 in evaluate_join_record (join=0x7ffc7881dfd0, qep_tab=0x7ffc78742168) at sql/sql_executor.cc:1835
#6  0x000000000447b9cd in sub_select (join=0x7ffc7881dfd0, qep_tab=0x7ffc78742168, end_of_records=false) at sql/sql_executor.cc:1636
#7  0x000000000447acf3 in do_select (join=0x7ffc7881dfd0) at sql/sql_executor.cc:1230
#8  0x0000000004477d07 in JOIN::exec (this=0x7ffc7881dfd0) at sql/sql_executor.cc:303
#9  0x0000000004591913 in Sql_cmd_dml::execute_inner (this=0x7ffc7881d550, thd=0x7ffc7880b000) at sql/sql_select.cc:1003
#10 0x0000000004590e15 in Sql_cmd_dml::execute (this=0x7ffc7881d550, thd=0x7ffc7880b000) at sql/sql_select.cc:713

类似的情况还存在于整型数与字符串的比较,在做 WHERE 条件的判断时,都会优先将字符串转换为整型数,甚至在下列 SQL 中a = '10ab' 都能过滤出 a = 10 这条数据,很神奇...

CREATE TABLE t2 (id varchar(10), a INT, b VARCHAR(40));
INSERT INTO t2 VALUES ('mm', 10, 'test');

root@localhost:test 8.0.23> SELECT * FROM t2 WHERE a = '10ab';
+------+------+------+
| id   | a    | b    |
+------+------+------+
| mm   |   10 | test |
+------+------+------+
1 row in set, 1 warning (0.00 sec)

总结

究其原因,是 MySQL 优化器在不同路径下对于字符串的标准不一样导致的,在构建 Min-Max Tree 时将 '1' 当作字符来处理;在执行层进行 WHERE 条件的比较时,将 '1' 当作整型数来处理,造成了理论上应该返回相同结果的 SQL 返回了不同的结果。这里也给社区提了 BUG:,看看社区后续的修复方案。

这里究竟应该把 '1' 当作字符还是整型数来处理,可能难以有完全统一的标准,也正是 MySQL 对于语法、类型的检查不严格,才使得 MySQL 更“好用”,不管什么样的 SQL 都能顺利执行。

但是为了避免对业务结果造成一些非预期的影响,涉及 bit 类型还是都加上 b 前缀更稳妥。

https://zhuanlan.zhihu.com/p/555484197