Inheritance in SQLAlchemy

Inheritance in SQLAlchemy

SQLAlchemy is one of the commonly used ORMs in Python. Compared to other ORMs like Django ORM or Peewee, it might be a bit more challenging for beginners. However, it provides a rich set of advanced and flexible features. This article focuses on the topic of model inheritance.

The code in this article is based on Flask-SQLAlchemy for demonstration. The differences with native SQLAlchemy are minor, primarily:

  • db.Model is roughly equivalent to sqlalchemy.ext.declarative.declarative_base
  • Model.query is similar to db.session.query(Model) {: .note}

Single Table Inheritance

The simplest form of table association is single table inheritance. In this inheritance relationship, regardless of how many subclasses are derived, there is always only one table at the database level, distinguished by a polymorphic_identity column. Evidently, this is the most efficient query method among inheritance types. However, it doesn’t allow for the flexibility of adding unique fields to each subclass. It’s suitable for scenarios where the data structure of subclasses is the same, but their business logic differs.

For example, consider a user table where ordinary users require permission checks, but administrator users have all permissions. Of course, one could use if statements for condition checks, but this can significantly reduce code readability and maintainability in complex conditions. A more object-oriented approach can address this problem, for instance:

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    user_type = db.Column(db.String(10))

    __mapper_args__ = {
        'polymorphic_identity': 'normal',
        'polymorphic_on': user_type
    }

    def has_permission(self, permission):
        return some_permission_check_logic()


class AdminUser(User):
    __mapper_args__ = {
        'polymorphic_identity': 'admin',
    }

    def has_permission(self, permission):
        return True

In this manner, when we perform database queries, the ORM will automatically use different class mapping objects based on the polymorphic_identity, for example:

>>> User.query.all()
[<User: xxx@xxx.com>, <User: xxx2@xxx.com>, <AdminUser: xxx3@xxx.com>]

This approach also brings an additional benefit: when we directly create subclass objects, the corresponding polymorphic_identity is automatically filled in for us:

>>> admin_user = AdminUser()
>>> print(admin_user.user_type)
admin

Automatic Initialization of Subclass Objects

Since creating a subclass automatically fills in the polymorphic_identity, one might wonder if passing the corresponding value to the parent class could automatically create a subclass object.

Unfortunately, the answer is no:

>>> user = User(user_type='admin', email='xxx@xxx.com') 
>>> print(user)
<User: xxx@xxx.com>
>>> # 但是当我们重新从数据库取出对象的时候,可以正确的使用子类
>>> db.session.add(user)
>>> db.session.commit()
>>> user_id = user.id
>>> del user # 这里需要删除已经生成的对象, 否则sqlalchemy会自动复用已经生成的对象
>>> user = User.query.get(user_id)
>>> print(user)
<AdminUser: xxx@xxx.com>

To achieve this effect, one could use metaclasses (see reference #2). However, I personally advise against using this kind of “black magic” for a couple of reasons. First, modifying the deeply customized metaclasses in SQLAlchemy might introduce some unpredictable issues. Second, initializing one class and ending up with an instance of another class is somewhat unconventional. Therefore, I prefer to implement this using a more straightforward approach with class methods:

class User(db.Model):
    # ...

    @classmethod
    def init_for(cls, user_type, **kwargs):
        """根据user_type初始化不同子类的实例"""
        children_classes = {
            x.polymorphic_identity: x.class_
            for x in cls.__mapper__.self_and_descendants
            }
        return children_classes[user_type](**kwargs)

Note: Single table inheritance should not define __tablename__ for subclasses

Joined Table Inheritance

Many times, we need to deal with more flexible situations where subclasses inherit some attributes from their parent class but also need to define some of their own attributes. Joined table inheritance can help us handle these situations, but it comes with some performance overhead.

As the name implies, joined table inheritance performs a JOIN query when hitting the database. The data structure definition is actually the subclass’s primary key acts as a foreign key, one-to-one associating to the parent class table’s primary key, and the subclass’s unique data is stored in a separate table.

See the code:

class User(db.Model):
    __tablename__ = 'user'
    id = db.Column(db.Integer, primary_key=True)
    user_type = db.Column(db.String(10))

    __mapper_args__ = {
        'polymorphic_identity': 'normal',
        'polymorphic_on': user_type
    }

class WechatUser(User):
    __tablename__ = 'user_wechat'

    # Define a foreign key to parent
    id = db.Column(db.Integer, db.ForeignKey('user.id', primary_key=True))
    open_id = db.Column(db.String(40))
    __mapper_args__ = {
        'polymorphic_identity': 'wechat',
    }

Query Behaviour

Let’s take a look at underlying SQL

>>> print(User.query)
SELECT "user".id AS user_id, "user".user_type AS user_user_type
FROM "user"
>>> print(WechatUser.query)
SELECT user_wechat.id AS user_wechat_id, "user".id AS user_id, "user".user_type AS user_user_type, user_wechat.open_id AS user_wechat_open_id
FROM "user" JOIN user_wechat ON "user".id = user_wechat.id

As you can see, when only querying the parent class, the subclass’s table will not be queried; conversely, it will compose a JOIN query, and by default, it is an INNER JOIN, meaning when only querying the subclass, the parent class’s related data will not be listed.

>>> db.session.add(WechatUser())
>>> db.session.add(User())
>>> db.session.commit()
>>> 
>>> print(User.query.all())
>>> print(WechatUser.query.all())
[<__main__.WechatUser object at 0x1016a3e10>, <__main__.User object at 0x1016a3ed0>]
[<__main__.WechatUser object at 0x10170b510>]

Using `with_polymorphic`` to control queries

By default, if a subclass object is queried from the parent class, an additional query will be initiated when accessing subclass attributes, similar to the behavior when lazy=True is set in a relationship we define normally. For example:

Code: (The first object in the User table in the test data is a WechatUser)

user = User.query.first()
print(user.open_id)

query:

SELECT user.id AS user_id, user.user_type AS user_user_type
FROM user
 LIMIT 1 OFFSET 0;
SELECT user_wechat.open_id AS user_wechat_open_id;

When the number of times we access subclass properties is low, such queries are completely acceptable. However, when we anticipate frequent access to subclass properties, doing so will result in a large number of queries, and joining all tables at that time would be a better choice.

Fortunately, SQLAlchemy also provides this kind of support:

>>> mixed = db.with_polymorphic(User, [User, WechatUser])
>>> # 也可以写作 mixed = db.with_polymorphic(User, '*') join所有的子表
>>> query = db.session.query(mixed)
>>> print(query)
SELECT "user".id AS user_id, "user".user_type AS user_user_type, user_wechat.id AS user_wechat_id, user_wechat.open_id AS user_wechat_open_id
FROM "user" LEFT OUTER JOIN user_wechat ON "user".id = user_wechat.id
>>> query.all()
[<__main__.WechatUser object at 0x1016a3e10>, <__main__.User object at 0x1016a3ed0>]

with_polymorphic also supports query,

>>> print(db.session.query(mixed).filter(mixed.WechatUser.open_id.is_(None)))
SELECT "user".id AS user_id, "user".user_type AS user_user_type, user_wechat.id AS user_wechat_id, user_wechat.open_id AS user_wechat_open_id
FROM "user" LEFT OUTER JOIN user_wechat ON "user".id = user_wechat.id
WHERE user_wechat.open_id IS NULL

The documentation also provides some more refined means of controlling queries, such as of_type, custom Join, and the use of .options() with the joinedload strategy. This article will not elaborate on these here; please refer to the documentation if needed.

Concrete Inheritance

SQLAlchemy also offers another method of inheritance, called concrete inheritance, which we might translate as concrete inheritance.

In this inheritance relationship, the parent class and each subclass have their own complete table, and there can be no relationship between the tables at the database level. Moreover, in Python, each subclass must completely define every field, which will not be inherited from the parent class.

Because each subclass has its own independent table, unlike the previous two types of inheritance relationships, there is no need for a polymorphic_identity to distinguish each subclass. This also avoids the need for JOINs and the issue of needing an additional request when encountering query fields.

However, the cost is that since each class has its own independent table, querying subclasses from the parent class requires UNIONing all the tables, so subclasses should not be too numerous, otherwise, the performance cost can be quite significant.

(So honestly, I don’t quite understand the purpose of this type of inheritance relationship.)

Class definitions are as follows:

from sqlalchemy.ext.declarative import ConcreteBase

class User(ConcreteBase, db.Model):  # 注意: 此处应继承ConcreteBase
    __tablename__ = 'user'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(50))

class WechatUser(User):
    __tablename__ = 'user_wechat'

    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(50))  # 注意: 子类必须重新定义字段
    open_id = db.Column(db.String(40))

    __mapper_args__ = {
        'concrete': True
    }

query:

print(User.query)
SELECT pjoin.id AS pjoin_id, pjoin.name AS pjoin_name, pjoin.type AS pjoin_type, pjoin.open_id AS pjoin_open_id
FROM (SELECT "user".id AS id, "user".name AS name, CAST(NULL AS VARCHAR(40)) AS open_id, 'normal' AS type
FROM "user" UNION ALL SELECT user_wechat.id AS id, user_wechat.name AS name, user_wechat.open_id AS open_id, 'wechat' AS type
FROM user_wechat) AS pjoin
print(WechatUser.query)
SELECT pjoin.id AS pjoin_id, pjoin.name AS pjoin_name, pjoin.open_id AS pjoin_open_id, pjoin.type AS pjoin_type
FROM (SELECT user_wechat.id AS id, user_wechat.name AS name, user_wechat.open_id AS open_id, 'wechat' AS type
FROM user_wechat) AS pjoin

As you can see, this query involves UNION and CAST, forming quite a complex query. Moreover, as the number of subclasses increases, the complexity of the query will rapidly increase. I have not used this type of inheritance in a production environment (too risky, and I feel it’s unnecessary).

If anyone has a suitable scenario for this type of inheritance relationship, I hope they can point it out so we can learn together.

Code

The code related to this article can be referred to in this gist.

References

  1. Mapping Class Inheritance Hierarchies
  2. Add child classes in SQLAlchemy session using parent constructor
comments powered by Disqus