python中function使用class调用和使用对象调用的区别

发布时间 2023-08-14 19:19:26作者: tsecer

问题

在python中,class中函数的定义中有一个特殊的self指针,如果一个函数有一个self参数,通常意味着这是一个非静态函数,也就是调用的时候第一个参数是对象指针,只是这个指针是调用这个函数时由python来自动填充。

tsecer@harry: cat cls_mth.py 
class tsecer():
    def harry(self):
        print(self)

t = tsecer()

t.harry()

tsecer.harry(t)


tsecer@harry: python3 cls_mth.py
<__main__.tsecer object at 0x7fced59ce080>
<__main__.tsecer object at 0x7fced59ce080>
tsecer@harry: python3 -m dis cls_mth.py
  1           0 LOAD_BUILD_CLASS
              2 LOAD_CONST               0 (<code object tsecer at 0x7f5e1f831c90, file "cls_mth.py", line 1>)
              4 LOAD_CONST               1 ('tsecer')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               1 ('tsecer')
             10 CALL_FUNCTION            2
             12 STORE_NAME               0 (tsecer)

  5          14 LOAD_NAME                0 (tsecer)
             16 CALL_FUNCTION            0
             18 STORE_NAME               1 (t)

  7          20 LOAD_NAME                1 (t)
             22 LOAD_ATTR                2 (harry)
             24 CALL_FUNCTION            0
             26 POP_TOP

  9          28 LOAD_NAME                0 (tsecer)
             30 LOAD_ATTR                2 (harry)
             32 LOAD_NAME                1 (t)
             34 CALL_FUNCTION            1
             36 POP_TOP
             38 LOAD_CONST               2 (None)
             40 RETURN_VALUE
tsecer@harry:  

在上面的例子中,可以通过变量加方法名调用,也可以通过类型名显式提供对象来调用,并且结果相同。

attribute

在type结构中保存的是一个descr(描述符),也就是一个通用的,对于这个类型的所有适用的方法,而如果到object的dict中查找则是一个对象特有的属性,因为是特有的,所以这里就是一个属性的具体内容。

我们再打一个(可能不太恰当的)最为熟悉的C语言的比方:

struct A
{
	int a;
	int b;
	virtual int foo(int, int);
	virtual int bar(int, int);
} obja, objc;

在这个例子中,作为一个结构的A,它需要描述的是b这个字段的信息:它在基地址开始偏移4个字节位置处,而bar函数在在虚函数表的第二个槽位。这个信息对于A类型的所有对象都是成立的。

另外,和C语言不同,python是一个动态语言,对象还可以在运行时添加属于自己的特有字段,这些字段不用通过所属类型中查找,而是直接在对象的字典中就可以找到。这又是和C语言不一样的地方。

所以在属性查找的时候,优先级最高的是通用的类型数据字段,然后是对象特有字段,然后是类型非数据字段。python文档说明

If an object defines __set__() or __delete__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors (they are often used for methods but other uses are possible).

Data and non-data descriptors differ in how overrides are calculated with respect to entries in an instance’s dictionary. If an instance’s dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence. If an instance’s dictionary has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence.

那么python为什么要这么实现呢?这篇文章对该问题有一个解释。

举例说明的是一个object对象的__dict__字段,这个应该是作为类型的一个属性而不能被对象覆盖。

The natural question to ask is: Why does it implement this particular order? More specifically, why do data descriptors take precedence over instance variables but non-data descriptors don't? First of all, note that some descriptors must take precedence over instance variables in order for attributes to work as expected. An example of such a descriptor is the __dict__ attribute of an object. You won't find it in the object's dictionary, because it's a data descriptor stored in the type's dictionary:

>>> a.__dict__['__dict__']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: '__dict__'
>>> A.__dict__['__dict__']
<attribute '__dict__' of 'A' objects>
>>> a.__dict__ is A.__dict__['__dict__'].__get__(a)
True
The tp_descr_get slot of this descriptor returns the object's dictionary located at tp_dictoffset. Now suppose that data descriptors don't take precedence over instance variables. What would happened then if we put '__dict__' in the object's dictionary and assigned it some other dictionary:

>>> a.__dict__['__dict__'] = {}
The a.__dict__ attribute would return not the object's dictionary but the dictionary we assigned! That would be totally unexpected for someone who relies on __dict__. Fortunately, data descriptors do take precedence over instance variables, so we get the object's dictionary:

>>> a.__dict__
{'x': 'instance attribute', 'g': <function <lambda> at 0x108a4cc10>, '__dict__': {}}
Non-data descriptors don't take precedence over instance variables, so that most of the time instance variables have a priority over type variables. Of course, the existing order of precedence is one of many design choices. Guido van Rossum explains the reasoning behind it in PEP 252:

In the more complicated case, there's a conflict between names stored in the instance dict and names stored in the type dict. If both dicts have an entry with the same key, which one should we return? Looking at classic Python for guidance, I find conflicting rules: for class instances, the instance dict overrides the class dict, except for the special attributes (like __dict__ and __class__), which have priority over the instance dict.

I resolved this with the following set of rules, implemented in PyObject_GenericGetAttr(): ...

Why is the __dict__ attribute implemented as a descriptor in the first place? Making it an instance variable would lead to the same problem. It would be possible to override the __dict__ attribute and hardly anyone wants to have this possibility.

We've learned how attributes of an ordinary object work. Let's see now how attributes of a type work.

下面是获得一个对象指定属性的代码

PyObject *
PyObject_GenericGetAttr(PyObject *obj, PyObject *name)
{
    return _PyObject_GenericGetAttrWithDict(obj, name, NULL);
}

int
_PyObject_GenericSetAttrWithDict(PyObject *obj, PyObject *name,
                                 PyObject *value, PyObject *dict)
{
    PyTypeObject *tp = Py_TYPE(obj);
    PyObject *descr;
    descrsetfunc f;
    PyObject **dictptr;
    int res = -1;

    if (!PyUnicode_Check(name)){
        PyErr_Format(PyExc_TypeError,
                     "attribute name must be string, not '%.200s'",
                     name->ob_type->tp_name);
        return -1;
    }

    if (tp->tp_dict == NULL && PyType_Ready(tp) < 0)
        return -1;

    Py_INCREF(name);

    descr = _PyType_Lookup(tp, name);

    if (descr != NULL) {
        Py_INCREF(descr);
        f = descr->ob_type->tp_descr_set;
        if (f != NULL) {
            res = f(descr, obj, value);
            goto done;
        }
    }

    if (dict == NULL) {
        dictptr = _PyObject_GetDictPtr(obj);
        if (dictptr == NULL) {
            if (descr == NULL) {
                PyErr_Format(PyExc_AttributeError,
                             "'%.100s' object has no attribute '%U'",
                             tp->tp_name, name);
            }
            else {
                PyErr_Format(PyExc_AttributeError,
                             "'%.50s' object attribute '%U' is read-only",
                             tp->tp_name, name);
            }
            goto done;
        }
        res = _PyObjectDict_SetItem(tp, dictptr, name, value);
    }
    else {
        Py_INCREF(dict);
        if (value == NULL)
            res = PyDict_DelItem(dict, name);
        else
            res = PyDict_SetItem(dict, name, value);
        Py_DECREF(dict);
    }
    if (res < 0 && PyErr_ExceptionMatches(PyExc_KeyError))
        PyErr_SetObject(PyExc_AttributeError, name);

  done:
    Py_XDECREF(descr);
    Py_DECREF(name);
    return res;
}

int
PyObject_GenericSetAttr(PyObject *obj, PyObject *name, PyObject *value)
{
    return _PyObject_GenericSetAttrWithDict(obj, name, value, NULL);
}

int
PyObject_GenericSetDict(PyObject *obj, PyObject *value, void *context)
{
    PyObject **dictptr = _PyObject_GetDictPtr(obj);
    if (dictptr == NULL) {
        PyErr_SetString(PyExc_AttributeError,
                        "This object has no __dict__");
        return -1;
    }
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "cannot delete __dict__");
        return -1;
    }
    if (!PyDict_Check(value)) {
        PyErr_Format(PyExc_TypeError,
                     "__dict__ must be set to a dictionary, "
                     "not a '%.200s'", Py_TYPE(value)->tp_name);
        return -1;
    }
    Py_INCREF(value);
    Py_XSETREF(*dictptr, value);
    return 0;
}

object vs type

class创建

创建一个class对象的时候,调用的其实是type_call和type_new函数。在type_new函数中,重新设置了这个新创建类型的PyObject_GenericGetAttr和PyObject_GenericSetAttr字段
注意: 这个设置是区分type和object的一个核心关键步骤,这会导致type和object来获得属性的时候走的是不同路径。

static PyObject *
type_call(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    PyObject *obj;

    if (type->tp_new == NULL) {
        PyErr_Format(PyExc_TypeError,
                     "cannot create '%.100s' instances",
                     type->tp_name);
        return NULL;
    }

#ifdef Py_DEBUG
    /* type_call() must not be called with an exception set,
       because it may clear it (directly or indirectly) and so the
       caller loses its exception */
    assert(!PyErr_Occurred());
#endif

    obj = type->tp_new(type, args, kwds);
    obj = _Py_CheckFunctionResult((PyObject*)type, obj, NULL);
    if (obj == NULL)
        return NULL;

    /* Ugly exception: when the call was type(something),
       don't call tp_init on the result. */
    if (type == &PyType_Type &&
        PyTuple_Check(args) && PyTuple_GET_SIZE(args) == 1 &&
        (kwds == NULL ||
         (PyDict_Check(kwds) && PyDict_Size(kwds) == 0)))
        return obj;

    /* If the returned object is not an instance of type,
       it won't be initialized. */
    if (!PyType_IsSubtype(Py_TYPE(obj), type))
        return obj;

    type = Py_TYPE(obj);
    if (type->tp_init != NULL) {
        int res = type->tp_init(obj, args, kwds);
        if (res < 0) {
            assert(PyErr_Occurred());
            Py_DECREF(obj);
            obj = NULL;
        }
        else {
            assert(!PyErr_Occurred());
        }
    }
    return obj;
}

static PyObject *
type_new(PyTypeObject *metatype, PyObject *args, PyObject *kwds)
{
///...
    /* Check arguments: (name, bases, dict) */
    if (!PyArg_ParseTuple(args, "UO!O!:type.__new__", &name, &PyTuple_Type,
                          &bases, &PyDict_Type, &orig_dict))
        return NULL;
///...
    /* Adjust for empty tuple bases */
    nbases = PyTuple_GET_SIZE(bases);
    if (nbases == 0) {
        bases = PyTuple_Pack(1, &PyBaseObject_Type);
        if (bases == NULL)
            goto error;
        nbases = 1;
    }
    else
        Py_INCREF(bases);
        ///...
    /* Special case some slots */
    if (type->tp_dictoffset != 0 || nslots > 0) {
        if (base->tp_getattr == NULL && base->tp_getattro == NULL)
            type->tp_getattro = PyObject_GenericGetAttr;
        if (base->tp_setattr == NULL && base->tp_setattro == NULL)
            type->tp_setattro = PyObject_GenericSetAttr;
    }
///...
    /* Always override allocation strategy to use regular heap */
    type->tp_alloc = PyType_GenericAlloc;
///...
    /* Initialize tp_dict from passed-in dict */
    Py_INCREF(dict);
    type->tp_dict = dict;
///...

type属性

在type类型对象获得属性的接口中,会先从所属类型获得;如果类型中没有获得,会从对象字典中获得;对于这两种情况,传递给local_get的参数不同:

        if (local_get != NULL) {
            /* NULL 2nd argument indicates the descriptor was
             * found on the target object itself (or a base)  */
            return local_get(attribute, (PyObject *)NULL,
                             (PyObject *)type);
        }

如果是从对象中获得的,那么传入给local_get函数的对象参数(第二个参数)为空,这个也是判断函数时静态函数还是对象函数的关键。

/* This is similar to PyObject_GenericGetAttr(),
   but uses _PyType_Lookup() instead of just looking in type->tp_dict. */
static PyObject *
type_getattro(PyTypeObject *type, PyObject *name)
{
    PyTypeObject *metatype = Py_TYPE(type);
    PyObject *meta_attribute, *attribute;
    descrgetfunc meta_get;

    if (!PyUnicode_Check(name)) {
        PyErr_Format(PyExc_TypeError,
                     "attribute name must be string, not '%.200s'",
                     name->ob_type->tp_name);
        return NULL;
    }

    /* Initialize this type (we'll assume the metatype is initialized) */
    if (type->tp_dict == NULL) {
        if (PyType_Ready(type) < 0)
            return NULL;
    }

    /* No readable descriptor found yet */
    meta_get = NULL;

    /* Look for the attribute in the metatype */
    meta_attribute = _PyType_Lookup(metatype, name);

    if (meta_attribute != NULL) {
        meta_get = Py_TYPE(meta_attribute)->tp_descr_get;

        if (meta_get != NULL && PyDescr_IsData(meta_attribute)) {
            /* Data descriptors implement tp_descr_set to intercept
             * writes. Assume the attribute is not overridden in
             * type's tp_dict (and bases): call the descriptor now.
             */
            return meta_get(meta_attribute, (PyObject *)type,
                            (PyObject *)metatype);
        }
        Py_INCREF(meta_attribute);
    }

    /* No data descriptor found on metatype. Look in tp_dict of this
     * type and its bases */
    attribute = _PyType_Lookup(type, name);
    if (attribute != NULL) {
        /* Implement descriptor functionality, if any */
        descrgetfunc local_get = Py_TYPE(attribute)->tp_descr_get;

        Py_XDECREF(meta_attribute);

        if (local_get != NULL) {
            /* NULL 2nd argument indicates the descriptor was
             * found on the target object itself (or a base)  */
            return local_get(attribute, (PyObject *)NULL,
                             (PyObject *)type);
        }

        Py_INCREF(attribute);
        return attribute;
    }

    /* No attribute found in local __dict__ (or bases): use the
     * descriptor from the metatype, if any */
    if (meta_get != NULL) {
        PyObject *res;
        res = meta_get(meta_attribute, (PyObject *)type,
                       (PyObject *)metatype);
        Py_DECREF(meta_attribute);
        return res;
    }

    /* If an ordinary attribute was found on the metatype, return it now */
    if (meta_attribute != NULL) {
        return meta_attribute;
    }

    /* Give up */
    PyErr_Format(PyExc_AttributeError,
                 "type object '%.50s' has no attribute '%U'",
                 type->tp_name, name);
    return NULL;
}

object属性

这个代码在前面的PyObject_GenericGetAttr函数,可以看到并不会传入空对象指针。也就是只要是通过对象调用的属性查找就不会传入空指针。

function

生成

如果传入参数中有个self指针,会在生成的PyMethod_Type对象中记录self指针的地址,并且在调用函数(执行function_call时)将保存的这个参数作为函数的第一个参数处理。

/* Bind a function to an object */
static PyObject *
func_descr_get(PyObject *func, PyObject *obj, PyObject *type)
{
    if (obj == Py_None || obj == NULL) {
        Py_INCREF(func);
        return func;
    }
    return PyMethod_New(func, obj);
}

/* Method objects are used for bound instance methods returned by
   instancename.methodname. ClassName.methodname returns an ordinary
   function.
*/

PyObject *
PyMethod_New(PyObject *func, PyObject *self)
{
    PyMethodObject *im;
    if (self == NULL) {
        PyErr_BadInternalCall();
        return NULL;
    }
    im = free_list;
    if (im != NULL) {
        free_list = (PyMethodObject *)(im->im_self);
        (void)PyObject_INIT(im, &PyMethod_Type);
        numfree--;
    }
    else {
        im = PyObject_GC_New(PyMethodObject, &PyMethod_Type);
        if (im == NULL)
            return NULL;
    }
    im->im_weakreflist = NULL;
    Py_INCREF(func);
    im->im_func = func;
    Py_XINCREF(self);
    im->im_self = self;
    _PyObject_GC_TRACK(im);
    return (PyObject *)im;
}

调用

在调用(执行method_call函数)时,将保存的self指针作为函数调用的第一个参数处理。

static PyObject *
method_call(PyObject *method, PyObject *args, PyObject *kwargs)
{
    PyObject *self, *func;

    self = PyMethod_GET_SELF(method);
    if (self == NULL) {
        PyErr_BadInternalCall();
        return NULL;
    }

    func = PyMethod_GET_FUNCTION(method);

    return _PyObject_Call_Prepend(func, self, args, kwargs);
}


/* Positional arguments are obj followed args. */
PyObject *
_PyObject_Call_Prepend(PyObject *func,
                       PyObject *obj, PyObject *args, PyObject *kwargs)
{
    PyObject *small_stack[8];
    PyObject **stack;
    Py_ssize_t argcount;
    PyObject *result;

    assert(PyTuple_Check(args));

    argcount = PyTuple_GET_SIZE(args);
    if (argcount + 1 <= (Py_ssize_t)Py_ARRAY_LENGTH(small_stack)) {
        stack = small_stack;
    }
    else {
        stack = PyMem_Malloc((argcount + 1) * sizeof(PyObject *));
        if (stack == NULL) {
            PyErr_NoMemory();
            return NULL;
        }
    }

    /* use borrowed references */
    stack[0] = obj;
    memcpy(&stack[1],
              &PyTuple_GET_ITEM(args, 0),
              argcount * sizeof(PyObject *));

    result = _PyObject_FastCallDict(func,
                                    stack, argcount + 1,
                                    kwargs);
    if (stack != small_stack) {
        PyMem_Free(stack);
    }
    return result;
}

栗子

属性

回到最开始的问题,当通过

t.harry()

调用时,执行的时PyObject_GenericGetAttr函数,这个函数从class类型中找到harry属性的函数描述,并且传入对象指针,所以调用时会自动把调用时参数t传递给函数调用。
而当通过tsecer.harry调用时,执行的时type_getattro函数,tsecer虽然是一个类型,但是它所属的类型是type(PyType_Type),在所属类型中没有找到这个字段,反而在class本身找到这个字段,所以执行到

        if (local_get != NULL) {
            /* NULL 2nd argument indicates the descriptor was
             * found on the target object itself (or a base)  */
            return local_get(attribute, (PyObject *)NULL,
                             (PyObject *)type);
        }

分支,传入的第二个参数为NULL,进而导致作为一个静态方法调用。

type

在下面的例子中,type(int)是一个内置的PyType_Type类型,所以它获取属性的接口调用的是type_getattro接口,

>>> type(1).mro()  
[<class 'int'>, <class 'object'>]
>>> type(int).mro()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor 'mro' of 'type' object needs an argument
>>> type(int).mro(type(int))
[<class 'type'>, <class 'object'>]
>>> int.mro()        
[<class 'int'>, <class 'object'>]
>>> 

对于PyType_Type类型的mro成员来说,它是一个PyMethodDescr_Type类型,它的tp_descr_get是method_get函数。

PyTypeObject PyMethodDescr_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "method_descriptor",
    sizeof(PyMethodDescrObject),
    0,
    (destructor)descr_dealloc,                  /* tp_dealloc */
    0,                                          /* tp_print */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_reserved */
    (reprfunc)method_repr,                      /* tp_repr */
    0,                                          /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    0,                                          /* tp_hash */
    (ternaryfunc)methoddescr_call,              /* tp_call */
    0,                                          /* tp_str */
    PyObject_GenericGetAttr,                    /* tp_getattro */
    0,                                          /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC, /* tp_flags */
    0,                                          /* tp_doc */
    descr_traverse,                             /* tp_traverse */
    0,                                          /* tp_clear */
    0,                                          /* tp_richcompare */
    0,                                          /* tp_weaklistoffset */
    0,                                          /* tp_iter */
    0,                                          /* tp_iternext */
    descr_methods,                              /* tp_methods */
    descr_members,                              /* tp_members */
    method_getset,                              /* tp_getset */
    0,                                          /* tp_base */
    0,                                          /* tp_dict */
    (descrgetfunc)method_get,                   /* tp_descr_get */
    0,                                          /* tp_descr_set */
};

也是判断如果obj为空则返回描述符本身(没有self指针)。

static PyObject *
method_get(PyMethodDescrObject *descr, PyObject *obj, PyObject *type)
{
    PyObject *res;

    if (descr_check((PyDescrObject *)descr, obj, &res))
        return res;
    return PyCFunction_NewEx(descr->d_method, obj, NULL);
}

static int
descr_check(PyDescrObject *descr, PyObject *obj, PyObject **pres)
{
    if (obj == NULL) {
        Py_INCREF(descr);
        *pres = (PyObject *)descr;
        return 1;
    }
    if (!PyObject_TypeCheck(obj, descr->d_type)) {
        PyErr_Format(PyExc_TypeError,
                     "descriptor '%V' for '%s' objects "
                     "doesn't apply to '%s' object",
                     descr_name((PyDescrObject *)descr), "?",
                     descr->d_type->tp_name,
                     obj->ob_type->tp_name);
        *pres = NULL;
        return 1;
    }
    return 0;
}

在对应的函数调用中,会判断必须有至少有一个参数

static PyObject *
classmethoddescr_call(PyMethodDescrObject *descr, PyObject *args,
                      PyObject *kwds)
{
    Py_ssize_t argc;
    PyObject *self, *func, *result, **stack;

    /* Make sure that the first argument is acceptable as 'self' */
    assert(PyTuple_Check(args));
    argc = PyTuple_GET_SIZE(args);
    if (argc < 1) {
        PyErr_Format(PyExc_TypeError,
                     "descriptor '%V' of '%.100s' "
                     "object needs an argument",
                     descr_name((PyDescrObject *)descr), "?",
                     PyDescr_TYPE(descr)->tp_name);
        return NULL;
    }
    self = PyTuple_GET_ITEM(args, 0);
    if (!PyType_Check(self)) {
        PyErr_Format(PyExc_TypeError,
                     "descriptor '%V' requires a type "
                     "but received a '%.100s'",
                     descr_name((PyDescrObject *)descr), "?",
                     PyDescr_TYPE(descr)->tp_name,
                     self->ob_type->tp_name);
        return NULL;
    }
    if (!PyType_IsSubtype((PyTypeObject *)self, PyDescr_TYPE(descr))) {
        PyErr_Format(PyExc_TypeError,
                     "descriptor '%V' "
                     "requires a subtype of '%.100s' "
                     "but received '%.100s",
                     descr_name((PyDescrObject *)descr), "?",
                     PyDescr_TYPE(descr)->tp_name,
                     self->ob_type->tp_name);
        return NULL;
    }

    func = PyCFunction_NewEx(descr->d_method, self, NULL);
    if (func == NULL)
        return NULL;
    stack = &PyTuple_GET_ITEM(args, 1);
    result = _PyObject_FastCallDict(func, stack, argc - 1, kwds);
    Py_DECREF(func);
    return result;
}