Advanced SQLAlchemy Features You Need To Start Using
If you are a Python developer working with SQL databases, then SQLAlchemy is most likely a library you are familiar with. It’s a powerful yet flexible toolkit for working with SQL in Python, with lots of features. Some of these features, like the ORM and basic queries, are common knowledge, but there are quite a few you might not know about and should definitely be taking advantage of. So, let’s see how to leverage things like hybrid properties, nested queries, table metadata, dialects and more!
Column Properties
Let’s start simple. It’s pretty common to want a mapped attribute that is based on other columns, essentially a computed column. The simplest example is string concatenation:
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import column_property

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    firstname = Column(String(50))
    lastname = Column(String(50))
    fullname = column_property(firstname + " " + lastname)
This is nice, but it’s much more useful when we use SQL expressions to create such attribute:
class CreditCard(Base):
    __tablename__ = 'card'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('user.id'), nullable=True)

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    firstname = Column(String(50))
    lastname = Column(String(50))
    fullname = column_property(firstname + " " + lastname)
    credit_card = relationship(CreditCard, backref='report')
    has_credit_card = column_property(
        exists().where(CreditCard.user_id == id)
    )
john = User(id=1, firstname='John', lastname='Doe')
session.add(john)
session.commit()
print(john.has_credit_card)
# False
johns_card = CreditCard(user_id=1)
session.add(johns_card)
session.commit()
print(john.has_credit_card)
# True
For the example above we added a little bit more code. We created a CreditCard class that has a many-to-one relationship with User. This user, on top of the columns and attributes from the first example, also has a column property named has_credit_card, which is computed by checking whether a credit card with the user's ID exists.
One thing to be mindful of when using this feature is that column properties won’t be populated before you commit the session, which might be unexpected when working with a freshly created record:
john = User(firstname='John', lastname='Doe')
print(john.fullname)
# None
session.add(john)
session.commit()
print(john.fullname)
# John Doe
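If you need the computed value before committing, one workaround (a sketch using the same model, with an in-memory SQLite database standing in) is to flush the pending INSERT and then refresh the object so the column property gets loaded:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import column_property, declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    firstname = Column(String(50))
    lastname = Column(String(50))
    fullname = column_property(firstname + " " + lastname)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

john = User(firstname='John', lastname='Doe')
session.add(john)
print(john.fullname)   # None, nothing has been loaded from the database yet
session.flush()        # emits the INSERT without committing
session.refresh(john)  # re-SELECTs the row, populating column properties
print(john.fullname)   # John Doe
```

The refresh costs an extra SELECT, so it only makes sense when you genuinely need the value mid-transaction.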
Hybrid Properties
To follow up on the previous tip, let me also show you hybrid properties. They are similar to column properties in the sense that they produce computed attributes. Hybrid properties, however, produce the value from a Python expression on the instance level and from a SQL expression on the class level. A little confusing? Alright, let’s see an example:
from sqlalchemy.ext.hybrid import hybrid_property

class Order(Base):
    __tablename__ = 'order'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('user.id'))
    state = Column(String(20))  # Pending/Complete

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    orders = relationship("Order")
    name = Column(String(50))

    @hybrid_property
    def has_pending_orders(self):
        return any(order.state == "Pending" for order in self.orders)  # -> produces value

    @has_pending_orders.expression
    def has_pending_orders(cls):
        return (
            select([
                case(
                    [(exists().where(and_(
                        Order.user_id == cls.id,
                        Order.state == "Pending",
                    )).correlate(cls), True)],
                    else_=False,
                ).label("has_pending_order")
            ]).label("number_of_pending_orders")
        )  # -> produces SQL expression
user = User(
    name="John",
    orders=[
        Order(state="Complete"),
        Order(state="Pending"),
    ]
)
session.add(user)
session.commit()
print(user.has_pending_orders) # evaluate as Python expression
# True
user = session.query(User).filter(User.has_pending_orders).scalar() # evaluate as SQL expression (Filter)
# SELECT * FROM user
# WHERE (
# SELECT CASE WHEN (EXISTS (
# SELECT *
# FROM order
# WHERE order.user_id = user.id AND order.state = 'Pending'
# )) THEN 1 ELSE 0 END AS has_pending_order)
To show off the capabilities of hybrid_property, we implement a simple relationship between User and Order, where each user has a list of orders, each with a .state, in this case either Pending or Complete. Now, if we want to find out whether a user has any Pending orders, we need to consider 2 cases. If we are working with rows that were already loaded into Python objects, then we can just use a Python expression and produce a Python value: has_pending_orders(self). If, on the other hand, we are querying this information directly from the database, we can't use a Python expression, as the database engine won't understand it. Therefore, for this case, has_pending_orders(cls) writes a SQL expression that can be run against the database.
As a side note, if your expression is the same for both Python and SQL evaluation, then you can omit the second function decorated with .expression and SQLAlchemy will use the first one for both cases.
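For instance, plain column concatenation is valid both as a Python expression and as SQL, so a single function suffices. A minimal, self-contained sketch (SQLite stands in for the database):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    firstname = Column(String(50))
    lastname = Column(String(50))

    @hybrid_property
    def fullname(self):
        # Works as plain Python on an instance and as SQL concatenation on the class
        return self.firstname + " " + self.lastname

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(User(firstname="John", lastname="Doe"))
session.commit()

john = session.query(User).filter(User.fullname == "John Doe").first()  # SQL level
print(john.fullname)  # Python level -> John Doe
```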
Mixins
One of my favourite features is Mixin classes. Mixins aren’t specific to SQLAlchemy, but they are especially useful in conjunction with ORM models. Quite often you might run into a situation where multiple classes (models) require the same attribute or the same classmethod. One such example is the User model below:
class MixinAsDict:
    def as_dict(self):
        return {c.name: getattr(self, c.name) for c in self.__table__.columns}

class MixinGetByUsername:
    username = Column(String(200), unique=True, nullable=True)

    @classmethod
    def get_by_username(cls, username):
        # assumes a module-level `session` is available
        return session.query(cls).filter(cls.username == username).first()

class User(MixinAsDict, MixinGetByUsername, Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
user = User(id=1, username="John")
print(user.as_dict())
# {'username': 'John', 'id': 1}
session.add(user)
session.commit()
john = User.get_by_username("John")
print(f"User: {john.username} with ID: {john.id}")
# User: John with ID: 1
In this example, we have 2 Mixin classes from which the User model inherits. The first of them, MixinAsDict, provides the method as_dict(self), which can be used to get a dict representation of the model. The other one, MixinGetByUsername, provides both the username column and a classmethod for querying users by their username.
Defining these functions as Mixins makes them reusable, so we can add them to other models without copy-pasting the same code everywhere.
If you don’t want to write all the Mixins yourself, then you can take a look at https://github.com/absent1706/sqlalchemy-mixins which is a collection of common SQLAlchemy Mixins.
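Another handy pattern in this area, sketched below, is combining a Mixin with declared_attr so each mapped class computes its own value for an attribute, here deriving __tablename__ from the class name:

```python
from sqlalchemy import Column, Integer
from sqlalchemy.orm import declarative_base, declared_attr

Base = declarative_base()

class MixinTablename:
    # declared_attr is evaluated once per mapped class, so every
    # model derives its own table name from its class name
    @declared_attr
    def __tablename__(cls):
        return cls.__name__.lower()

class User(MixinTablename, Base):
    id = Column(Integer, primary_key=True)

class Order(MixinTablename, Base):
    id = Column(Integer, primary_key=True)

print(User.__tablename__, Order.__tablename__)  # user order
```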
Working with Metadata
Sometimes you might need to access table column names, check the constraints on a table, or find out whether a column is nullable. All of this can be done through the MetaData() object:
class Address(Base):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('user.id'), nullable=True)
    street = Column(String(50))

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    firstname = Column(String(50))
    lastname = Column(String(50))
    address = relationship(Address, backref='report')
Base.metadata.create_all(engine)
meta = Base.metadata # Metadata()
for t in meta.sorted_tables:
    print(t.name)
# user
# address
print(meta.tables["user"].c)
# ['user.id', 'user.firstname', 'user.lastname']
print(meta.tables["user"].c["lastname"].type)
# VARCHAR(50)
print(meta.tables["user"].c["lastname"].nullable)
# True
print(meta.tables["address"].foreign_keys)
# {ForeignKey('user.id')}
print(meta.tables["address"].primary_key)
# PrimaryKeyConstraint(Column('id', Integer(), table=<address>, primary_key=True, nullable=False))
The important parts here are the print statements at the bottom of the code snippet. Each of them demonstrates something you can access through the metadata object: table names, column names, column types, foreign and primary keys, as well as other constraints.
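Metadata also works in the other direction: if the tables already exist in the database, you can reflect them into a fresh MetaData object instead of declaring them by hand. A minimal sketch, with an in-memory SQLite database standing in for a pre-existing one:

```python
from sqlalchemy import Column, Integer, MetaData, String, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    firstname = Column(String(50))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)  # pretend this database already existed

meta = MetaData()
meta.reflect(bind=engine)  # load table definitions straight from the database
user_table = meta.tables['user']
print(sorted(c.name for c in user_table.c))  # ['firstname', 'id']
```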
Configuring Tables
Some of your database tables might require a bit more extensive initial setup. For example, you might want to include a few check constraints, indexes, or specify a different schema:
class Card(Base):
    __tablename__ = 'card'
    __table_args__ = (
        CheckConstraint("created < valid_until", name="validity_check"),
        CheckConstraint("card_type ~* '^(debit|credit)$'", name="type_check"),
        Index("index", "id"),
        ForeignKeyConstraint(['id'], ['remote_table.id']),
        {'extend_existing': True, "schema": "default"},
    )
    id = Column(Integer, primary_key=True)
    created = Column(Date)
    valid_until = Column(Date)
    card_type = Column(String(50))
All of these things can be configured using the __table_args__ class attribute. Here we set up 2 check constraints, an index on the ID column, and a foreign key constraint. We also set extend_existing, which means that if a table with this name is already present in the metadata, the new arguments are applied to the existing table instead of raising an error. Lastly, we also specify which schema this table belongs to.
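To make the extend_existing behavior concrete, here is a minimal Core-level sketch: redefining a table that is already present in the metadata merges the new arguments into the existing Table object rather than raising an error:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table

meta = MetaData()
card = Table('card', meta, Column('id', Integer, primary_key=True))

# Without extend_existing=True, redefining 'card' would raise InvalidRequestError
card2 = Table('card', meta,
              Column('note', String(50)),
              extend_existing=True)

print(card is card2)             # True, same Table object
print([c.name for c in card.c])  # ['id', 'note']
```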
Using Custom Dialects
Every database engine has some custom features, which you might want to make use of. For me — as a PostgreSQL user — I would like to use some of the custom column types that PostgreSQL has. So how would one use those with SQLAlchemy?
from uuid import uuid4
from sqlalchemy.dialects.postgresql import UUID, INT4RANGE, NUMRANGE, JSON, ARRAY

engine = create_engine('postgresql+psycopg2://postgres:postgres@localhost/testdb', echo=True)

class Example(Base):
    __tablename__ = 'example'
    id = Column(Integer, primary_key=True)
    uuid = Column(UUID(as_uuid=True), unique=True, nullable=False, default=uuid4)
    int_range = Column(INT4RANGE)
    num_range = Column(NUMRANGE)
    pg_json = Column(JSON)
    pg_array = Column(ARRAY(Integer), server_default='{}')
from psycopg2.extras import NumericRange

example = Example(
    uuid=uuid4(),
    int_range=NumericRange(1, 3),
    num_range=NumericRange(1, 3),
    pg_json={"key": "value"},
    pg_array=[1, 5, 7, 24, 74, 8],
)
session.add(example)
session.commit()
print(session.query(Example).filter(Example.pg_array.contains([5])).scalar())
# SELECT * FROM example WHERE example.pg_array @> [5]
# <__main__.Example object at 0x7f2d600a4070> # Object we previously inserted
print(session.query(Example).filter(Example.pg_json["key"].astext == "value").scalar())
# SELECT *
# FROM example
# WHERE example.pg_json ->> 'key' = 'value'
# <__main__.Example object at 0x7f04dee05070> # Object we previously inserted
The code above shows one Example table that has PostgreSQL UUID, INT4RANGE, NUMRANGE, JSON and ARRAY columns. All of these and more can be imported from sqlalchemy.dialects.postgresql.
Creating rows that include values of these types is pretty self-explanatory. When it comes to querying them though, you will need to use the dialect and type specific comparators as shown above with PostgreSQL ARRAY type and .contains comparator.
For other types like JSON you might be able to get away with just comparing them as text using .astext.
To make your life easier when creating these queries, I recommend setting echo=True when creating the engine, which will make SQLAlchemy print all SQL queries to the console, so that you can check whether your code actually produces the correct queries.
All of the dialects, their types and comparators are documented at https://docs.sqlalchemy.org/en/13/dialects/
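Another dialect-specific feature worth knowing is PostgreSQL's ON CONFLICT clause (upsert), exposed through the dialect's own insert() construct. The sketch below (the users table is illustrative) only compiles the statement, so no database is needed to try it:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql

meta = MetaData()
users = Table('users', meta,
              Column('id', Integer, primary_key=True),
              Column('name', String(50)))

stmt = postgresql.insert(users).values(id=1, name='John')
# On a primary-key collision, update the name instead of failing
stmt = stmt.on_conflict_do_update(
    index_elements=['id'],
    set_={'name': stmt.excluded.name},
)
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)  # INSERT ... ON CONFLICT (id) DO UPDATE SET name = excluded.name
```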
Full-text Search with PostgreSQL
While on the topic of PostgreSQL features, what about full-text search with tsquery and tsvector? We can do that with SQLAlchemy too:
class MixinSearch:
    @classmethod
    def fulltext_search(cls, session, search_string, field):
        return session.query(cls).\
            filter(func.to_tsvector('english', getattr(cls, field))
                   .match(search_string, postgresql_regconfig='english')).all()
class Book(MixinSearch, Base):
    __tablename__ = 'book'
    id = Column(Integer, primary_key=True)
    title = Column(String(100))
    body = Column(Text)

book = Book(
    title="The Catcher in the Rye",
    body="""First page of the book..."""
)
success = Book.fulltext_search(session, "David & Copperfield", "body")
# SELECT *
# FROM book
# WHERE to_tsvector(english, book.body) @@ to_tsquery('english','David & Copperfield')
print(success)
# [<__main__.Book object at 0x7fdac5e44520>]
fail = Book.fulltext_search(session, "cat & dog", "body")
# SELECT *
# FROM book
# WHERE to_tsvector(english, book.body) @@ to_tsquery('english', 'cat & dog')
print(fail)
# []
Once again we create a Mixin class for full-text search, as this is something that a lot of models can use. This Mixin has a single classmethod, which takes the search string and the column (field) to search in. To do the actual search we use func.to_tsvector, to which we pass a language and a reference to the table's column. On that, we chain a call to the .match function, which really is a call to to_tsquery in PostgreSQL, and we give it the search string and search configuration as arguments.
From the generated SQL we can see that the Python code really produces correct SQL queries.
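Since queries like this re-compute to_tsvector for every row, larger tables are typically backed by a functional GIN index. A sketch of the DDL SQLAlchemy generates for one (compiled without a live database; the index name is illustrative):

```python
from sqlalchemy import Column, Index, Integer, MetaData, Table, Text, func
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateIndex

meta = MetaData()
book = Table('book', meta,
             Column('id', Integer, primary_key=True),
             Column('body', Text))

# Index the computed tsvector so the full-text match can use it
ix = Index('ix_book_body_tsv',
           func.to_tsvector('english', book.c.body),
           postgresql_using='gin')

ddl = str(CreateIndex(ix).compile(dialect=postgresql.dialect()))
print(ddl)  # CREATE INDEX ix_book_body_tsv ON book USING gin (...)
```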
Tracking Last Update on Rows
Creating a created_at or updated_at column is a pretty common practice. This can be done very simply with SQLAlchemy:
class Example(Base):
    __tablename__ = 'example'
    id = Column(Integer, primary_key=True)
    updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())
    data = Column(String(100))

example = Example(
    data="Some data..."
)
session.add(example)
session.commit()
row = session.query(Example).scalar()
print(row.updated_at)
# 10:13:14.001813+00:00
time.sleep(...)
row.data = "Some new data..."
session.add(row)
session.commit()
row = session.query(Example).scalar()
print(row.updated_at)
# 10:13:16.590945+00:00
For updated_at you just need to set onupdate to func.now(), which makes it so that every time the row is updated this column is set to the current timestamp. As for the created_at column, you can omit the onupdate argument and keep only server_default, which sets the value when the row is created.
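Putting both columns together, a self-contained sketch (SQLite stands in for the database here; func.now() compiles to CURRENT_TIMESTAMP on it):

```python
from sqlalchemy import Column, DateTime, Integer, String, create_engine, func
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Example(Base):
    __tablename__ = 'example'
    id = Column(Integer, primary_key=True)
    created_at = Column(DateTime(timezone=True), server_default=func.now())
    updated_at = Column(DateTime(timezone=True), server_default=func.now(), onupdate=func.now())
    data = Column(String(100))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(Example(data="Some data..."))
session.commit()

row = session.query(Example).scalar()
print(row.created_at, row.updated_at)  # both set by the server default
```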
Self-Referencing Tables
It’s not uncommon to have recursive/self-referential relations in a database, whether it’s a manager -> employee relationship, a tree structure, or some materialized path. This tip shows how you can set up this kind of relationship using SQLAlchemy:
class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('node.id'))
    data = Column(String(50))
    children = relationship("Node",
        backref=backref('parent', remote_side=[id])
    )

    def __str__(self, level=0):
        ret = f"{' ' * level} {repr(self.data)} \n"
        for child in self.children:
            ret += child.__str__(level + 1)
        return ret

    def __repr__(self):
        return self.data
node = Node(
    data="Node 1",
    children=[
        Node(data="Node 2"),
        Node(
            data="Node 3",
            children=[
                Node(data="Node 5")
            ]
        ),
        Node(data="Node 4"),
    ]
)
session.add(node)
session.commit()
rows = session.query(Node).all()
print(rows[0])
# 'Node 1'
# 'Node 2'
# 'Node 3'
# 'Node 5'
# 'Node 4'
print(rows[2])
# 'Node 3'
# 'Node 5'
For this example we use a tree structure created from Node records. Each node has some data, a reference to its parent, and a list of its children. As a convenience we also include __str__ and __repr__ methods to help us visualize the tree a little better.
If you are fine with a normal one-to-many relationship, then you can set it up the same way as any non-self-referential relationship. To make it work for bi-directional relationships, however, you also need to include the backref with remote_side=[id] as shown above.
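Loading a deep tree through relationships issues one query per level. If you need a whole subtree at once, one option is a recursive CTE; the sketch below uses the newer select() style available in SQLAlchemy 1.4+ and SQLite, which supports WITH RECURSIVE:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import backref, declarative_base, relationship, sessionmaker

Base = declarative_base()

class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('node.id'))
    data = Column(String(50))
    children = relationship("Node", backref=backref('parent', remote_side=[id]))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(Node(data="Node 1", children=[
    Node(data="Node 2"),
    Node(data="Node 3", children=[Node(data="Node 5")]),
    Node(data="Node 4"),
]))
session.commit()

# Anchor member: the subtree root; recursive member: rows whose parent is already in the CTE
subtree = select(Node.id).where(Node.data == "Node 3").cte(name="subtree", recursive=True)
subtree = subtree.union_all(select(Node.id).where(Node.parent_id == subtree.c.id))

nodes = session.query(Node).filter(Node.id.in_(select(subtree.c.id))).all()
print(sorted(n.data for n in nodes))  # ['Node 3', 'Node 5']
```

This fetches "Node 3" and everything below it in a single round trip, however deep the tree is.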
Binding Multiple Databases with Flask
The last one is for all the Flask users. If you ever need to connect to multiple databases — for example, because of multiple geographies or multiple data sources — then you can use SQLALCHEMY_BINDS to specify extra database binds:
# Config
SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://localhost/emea'  # default database
SQLALCHEMY_BINDS = {
    'emea': 'postgresql+psycopg2://localhost/emea',  # Europe, the Middle East and Africa
    'ap': 'mysql+mysqldb://localhost/ap',            # Asia Pacific
    'la': 'postgresql+psycopg2://localhost/la',      # Latin America
}
# Models
class User(db.Model):
    __bind_key__ = 'emea'  # Declare to which database the model belongs
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True)
In the code snippet above we configure the default database by setting SQLALCHEMY_DATABASE_URI and the alternative binds in SQLALCHEMY_BINDS. With this configuration, all of the above databases are available to us. Next, we set the __bind_key__ of a table to refer to one of the binds, so that whenever we interact with this particular table, SQLAlchemy knows which database to connect to.
If you, however, need to connect to multiple databases with the same tables/schema, you can use multiple engines and sessions, one for each database, and switch between them as you wish, like so:
engine_emea = create_engine(...)
engine_ap = create_engine(...)
engine_la = create_engine(...)
session_emea = sessionmaker(bind=engine_emea)
session_ap = sessionmaker(bind=engine_ap)
session_la = sessionmaker(bind=engine_la)
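Outside of Flask, the multiple-engine approach can be sketched end to end like this (two throwaway in-memory SQLite databases stand in for the regional ones; note that sessionmaker returns a factory, which you call to get an actual session):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    username = Column(String(80), unique=True)

# One engine and one session factory per database, same schema everywhere
engine_emea = create_engine('sqlite://')
engine_ap = create_engine('sqlite://')
Base.metadata.create_all(engine_emea)
Base.metadata.create_all(engine_ap)

session_emea = sessionmaker(bind=engine_emea)()
session_ap = sessionmaker(bind=engine_ap)()

session_emea.add(User(username="John"))
session_emea.commit()

print(session_emea.query(User).count())  # 1
print(session_ap.query(User).count())    # 0, a completely separate database
```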