Oracle内连接HASH JOIN/半连接HASH JOIN SEMI不处理驱动表连接列null值

发布时间 2023-06-14 16:30:22作者: PiscesCanon

 

Oracle内连接HASH JOIN/半连接HASH JOIN SEMI不处理驱动表连接列null值

 

先说内连接,比如有sql如下:

select count(*) from t,tt where t.col=tt.col; 

 

测试表t,tt的信息如下:其中t表全是null值。

15:16:18 ZKM@zkm(37)> desc t                                        15:16:20 ZKM@zkm(37)> desc tt 
 Name    Null?    Type                                               Name    Null?    Type
 ------- -------- -------------                                      ------- -------- -------------
 COL              VARCHAR2(20)                                       COL              VARCHAR2(20)


15:17:40 ZKM@zkm(37)> select col,count(*) from t group by col;       15:17:43 ZKM@zkm(37)> select col,count(*) from tt group by col;
                                                                     
COL      COUNT(*)                                                    COL      COUNT(*)
------ ----------                                                    ------ ----------
          1000000                                                                    1
                                                                     a               1
Elapsed: 00:00:00.16                                                 
                                                                     Elapsed: 00:00:00.00

 

对于不同驱动表时,执行计划如下:(由于OMem、1Mem根据上次sql执行估算出来的,所以每个sql多执行几次,去最后一次结果)

select /*+ leading(t) */  count(*) from t,tt where t.col=tt.col                                                       select count(*) from t,tt where t.col=tt.col
Plan hash value: 3743140366                                                                                           Plan hash value: 1389617266
                                                                                                                      
-----------------------------------------------------------------------------------------------------------------     -----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |     | Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------     -----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |    1590 |       |       |          |     |   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |    1596 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |    1590 |       |       |          |     |   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |    1596 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |      1 |      0 |00:00:00.01 |    1590 |  1969K|  1969K|  464K (0)|     |*  2 |   HASH JOIN         |      |      1 |      1 |      0 |00:00:00.01 |    1596 |  2616K|  2616K|  786K (0)|
|*  3 |    TABLE ACCESS FULL| T    |      1 |      3 |      0 |00:00:00.01 |    1590 |       |       |          |     |*  3 |    TABLE ACCESS FULL| TT   |      1 |      1 |      1 |00:00:00.01 |       6 |       |       |          |
|*  4 |    TABLE ACCESS FULL| TT   |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |     |*  4 |    TABLE ACCESS FULL| T    |      1 |      3 |      0 |00:00:00.01 |    1590 |       |       |          |
-----------------------------------------------------------------------------------------------------------------     -----------------------------------------------------------------------------------------------------------------
                                                                                                                      
Predicate Information (identified by operation id):                                                                   Predicate Information (identified by operation id):
---------------------------------------------------                                                                   ---------------------------------------------------
                                                                                                                      
   2 - access("T"."COL"="TT"."COL")                                                                                      2 - access("T"."COL"="TT"."COL")
   3 - filter("T"."COL" IS NOT NULL)                                                                                     3 - filter("TT"."COL" IS NOT NULL)
   4 - filter("TT"."COL" IS NOT NULL)                                                                                    4 - filter("T"."COL" IS NOT NULL)

 

值得一提的是,由于t表全是null值,这个时候t表做驱动表会导致被驱动表tt访问次数为0,看上边第一个执行计划id=4的Starts=0。

这个现象在我另外一篇博文里有提到:Oracle Hash Join被驱动表被访问0次的另外一种情况

 

这里还看不出来表t的连接列col全是null值是否在id=2中有没有被忽略。

一开始我是直接在t表再insert进去15000000个null值,然后执行同样的sql记录下执行计划,前后比对id=2处的时间时间消耗,OMem,1Mem,Used-Mem。

如下,逻辑读已经多了20多倍,而A-Time变为0.13s几乎无增加还是秒出,多出来的0.1s是消耗在id=3出而不是id=2,由此可证。

Plan hash value: 3743140366

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.13 |   24388 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.13 |   24388 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |      1 |      0 |00:00:00.13 |   24388 |  1969K|  1969K|  501K (0)|
|*  3 |    TABLE ACCESS FULL| T    |      1 |      3 |      0 |00:00:00.13 |   24388 |       |       |          |
|*  4 |    TABLE ACCESS FULL| TT   |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T"."COL"="TT"."COL")
   3 - filter("T"."COL" IS NOT NULL)
   4 - filter("TT"."COL" IS NOT NULL)

 

后来想到另外一个的办法,直接创建索引即可。

create index idx_col on t(col);

 

再次执行SQL如下:

可以看出来,执行计划直接连表t都不访问了,而是访问IDX_COL。防小偷。

我们知道,单列普通索引是不存储null值的,也就是在id=3就已经过滤了null,综合对比前边的A-Time,OMem,1Mem,Used-Mem可以得知确实在HASH JOIN步骤,对于null值是不会在pga中计算hash值来做比对的。

select /*+ leading(t) */  count(*) from t,tt where t.col=tt.col

Plan hash value: 497415060

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |      1 |        |      1 |00:00:00.01 |       1 |       |       |          |
|   1 |  SORT AGGREGATE     |         |      1 |      1 |      1 |00:00:00.01 |       1 |       |       |          |
|*  2 |   HASH JOIN         |         |      1 |      1 |      0 |00:00:00.01 |       1 |  1969K|  1969K|  460K (0)|
|*  3 |    INDEX FULL SCAN  | IDX_COL |      1 |      3 |      0 |00:00:00.01 |       1 |       |       |          |
|*  4 |    TABLE ACCESS FULL| TT      |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T"."COL"="TT"."COL")
   3 - filter("T"."COL" IS NOT NULL)
   4 - filter("TT"."COL" IS NOT NULL)

 

再说半连接,比如有sql如下:

select /*+ leading(t) */ count(*) from t where t.col in (select tt.col from tt);

 

同样的道理,有索引和无索引的情况如下:

select /*+ leading(t) */ count(*) from t where t.col in (select tt.col from tt)                                        select /*+ leading(t) */ count(*) from t where t.col in (select tt.col from tt)
                                                                                                                       
Plan hash value: 1176692913                                                                                            Plan hash value: 1600321607
                                                                                                                       
-----------------------------------------------------------------------------------------------------------------      --------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |      | Id  | Operation           | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------      --------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.14 |   24388 |       |       |          |      |   0 | SELECT STATEMENT    |         |      1 |        |      1 |00:00:00.01 |       1 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.14 |   24388 |       |       |          |      |   1 |  SORT AGGREGATE     |         |      1 |      1 |      1 |00:00:00.01 |       1 |       |       |          |
|*  2 |   HASH JOIN SEMI    |      |      1 |      1 |      0 |00:00:00.14 |   24388 |  1969K|  1969K|  501K (0)|      |*  2 |   HASH JOIN SEMI    |         |      1 |      1 |      0 |00:00:00.01 |       1 |  1969K|  1969K|  467K (0)|
|*  3 |    TABLE ACCESS FULL| T    |      1 |      3 |      0 |00:00:00.14 |   24388 |       |       |          |      |*  3 |    INDEX FULL SCAN  | IDX_COL |      1 |      3 |      0 |00:00:00.01 |       1 |       |       |          |
|*  4 |    TABLE ACCESS FULL| TT   |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |      |*  4 |    TABLE ACCESS FULL| TT      |      0 |      1 |      0 |00:00:00.01 |       0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------      --------------------------------------------------------------------------------------------------------------------
                                                                                                                       
Predicate Information (identified by operation id):                                                                    Predicate Information (identified by operation id):
---------------------------------------------------                                                                    ---------------------------------------------------
                                                                                                                       
   2 - access("T"."COL"="TT"."COL")                                                                                       2 - access("T"."COL"="TT"."COL")
   3 - filter("T"."COL" IS NOT NULL)                                                                                      3 - filter("T"."COL" IS NOT NULL)
   4 - filter("TT"."COL" IS NOT NULL)                                                                                     4 - filter("TT"."COL" IS NOT NULL)

 

一样的道理,不再赘述。