[PG] Function Candidates Selection Algorithm

发布时间 2023-10-31 23:43:08作者: winter-loo

Function Candidates Selection Algorithm

environment setup

In lightdb orafce extension, execute sql below,

CREATE DOMAIN oracle.clob AS TEXT;

-- version 1
CREATE FUNCTION oracle.btrim(text, text)
RETURNS text
AS 'btrim'
LANGUAGE internal STRICT IMMUTABLE;

-- version 2
CREATE FUNCTION oracle.btrim(text, char)
RETURNS text
AS 'btrim'
LANGUAGE internal STRICT IMMUTABLE;

-- version 3
create function oracle.btrim(oracle.clob, varchar)
RETURNS oracle.clob
AS 'MODULE_PATHNAME', 'oracle_btrim'
LANGUAGE C STRICT;

Here's function candidates,
image

In lightdb oracle mode, execute sql below, tips: use psql -P 'null=(null)'(for lightdb, ltsql)

select trim('x' from 'x'); -- should be null in oracle
create table foo(a clob);
insert into foo values ('x');
select trim('x' from a) from foo; -- should not be null in oracle
-- result as expected, but '::varchar' syntax not supported by oracle
select trim('x'::varchar from a) from foo;

Hence, the btrim function btrim(clob, varchar) -> clob is not got executed in sql select trim('x' from a) from foo;. That's a problem!

FCSA in postgres 13.3

The Function Candidates Selection(FCS) algorithm is executed in postgres func_get_detail function. For LightDB, its' lt_func_get_detail. Here's the code path to this function,

parse_analyze
transformFuncCall
ParseFuncOrColumn
func_get_detail

func_get_detail

  1. use FuncnameGetCandidates to get list of possible candidates from system catalog pg_proc.
  2. try to find an exact match from the candidates
    After transforming the function arguments, the actual function arguments types are, for example,
    the 1st argument type oid: 76716, which is the oid of type `clob`
    the 2nd argument type oid: 705, which is the oid of type `unkown`
    
    Why does the second argument type oid is 705? That's another story.
    For better description, the arguments types will be noted as (76716, 705).
    However, the possible candidate argument types are,
    (76716, 1043) -- 1043 is oid of type `varchar`.
    (25, 1042) -- 25 is oid of type `text`, 1042 is oid of type `char`.
    (25, 25)
    
    At this step, there is no exact match. Let's rephrase the problem,
    target: T = (76716, 705)
    candidates: C = [(76716, 1043), (25, 1042), (25, 25)]
    
    We need to find target T in candidates C.

When examining the func_get_detail source code, PostgreSQL appears to handle the problem in a complex manner.

In postgres, the second try is to implicitly convert the input function arguments to candidate function arguments. This process is handled by the function func_match_argtypes.

Following the rules in function can_coerce_type,
T = (76716, 705), C0 = (76716, 1043),

705 (unkown) can be implicitly converted to 1043(varchar).

T = (76716, 705), C1 = (25, 1042),

76716(clob) can be implicitly converted to 25(text), 705(unknown) can be implicitly converted to 1042(char).

T = (76716, 705), C2 = (25, 25),

76716(clob) and 705(unkown) both can be converted implicitly to 25(text).

So, the second try fails!

The third try enters func_select_candidate which is a more complex function. Let's break down the candidate selection algorithm step by step. As a side effect of function func_match_argtypes,
the candidates in C are now in reverse order,

C = [(25, 25), (25, 1042), (76716, 1043)]

First, get the base type oid of the actual function arguments.

T = (76716, 705)   ->    B = (25, 705)

Then, comparing C0 = (25, 25) with B = (25, 705) with score S, initially 0,

C0(0) = 25 is equal to B(0) = 25, increment score, S = 1
C0(1) = 25 is not comparable to B(1) = 705, score not incremented

Thus, C0 gets score 1. Next, C1 = (25, 1042) vs B = (25, 705), with S = 0 initially,

C0(0) = 25 is equal to B(0) = 25, increment score, S = 1
C0(1) = 1042 is not comparable to B(1) = 705, score not incremented

Thus, C1 gets score 1. Finally, C2 = (76716, 1043) vs B = (25, 705), with S = 0 initially,

C2(0) = 76716 is not equal to B(0) = 25, score not incremented
C2(1) = 1043 is not comparable to B(1) = 705, score not incremented

Thus, C2 gets score 0. C2 is filtered out. Candidates now are,

C = [(25, 25), (25, 1042)]

After the steps above, we are destined to get a bad cadidate. Let's continue to find what's going on in postgres.

First, get the type category of the actual function arguments.

B = (25, 705)    ->    K = ('S', 'X')

Then, comparing C0 = (25, 25) with B = (25, 705) and K = ('S', 'X'). The score S, initially 0,

C0(0) = 25 is equal to B(0) = 25, need not use K, increment score, S = 1
C0(1) = 25 is not comparable to B(1) = 705, score not incremented

Thus, C0 gets score 1. Next, C1 = (25, 1042) with B = (25, 705) and K = ('S', 'X'). As C0, C1 gets score 1.
For this example, the steps above are redundant.

Next, postgres tries to assign a "type category" for the unknown type argument. For this example, the second argument of all cadidates agree with is a STRING('S') category. In postgres, the preferred string type is text. Therefore, after iterating current candidates, we have one more information,

P2 = ('S', True)

It can be read as "the type category of the second actual argument is 'S' and has the preferred type in cadidates".

With P2, postgres can now filter cadidates again. Now, we have following information,

B = (25, 705), P2 = ('S', True), C = [(25, 25), (25, 1042)]

Use P2 to filter C as follows(TC = Type Category, CC = Current Type):,

TC of C0(1) is 'S' which equals to P2(0) and CC of C0(1) is preferred in this TC, keep C0;

TC of C1(1) is 'S' which equals to P2(0) but CC of C1(1) is not preferred, throw C1;

For this example, the final cadidate is C0, i.e., btrim(text, text) -> text.

If there are still no unique candidate, postgres will make one last attempt.
TODO: analyze the last method in function func_select_candidate.