How to Validate Object Attributes in Python
Generally speaking, Python handles type checking and value checking in a flexible and implicit way. The typing module, added in Python 3.5, provides runtime support for type hints. For value checking, however, there is no unified approach, because the possible rules are so varied.
One of the scenarios where we need value checking is when we initialize a class instance. We want to ensure valid input attributes in the first stage, for example, an email address should have the correct format xxx@xxxxx.com, an age should not be negative, the surname should not exceed 20 characters, etc.
In this article, I want to demonstrate 7 of the many options for validating class attributes, using either Python built-in modules or third-party libraries. I’m curious which option you prefer, so please tell me in the comments. If you know other good options, you are welcome to share them as well.
Create validation functions
We start with the most straightforward solution: creating a validation function for each requirement. Here we have 3 methods that validate name, email, and age individually. The attributes are validated in sequence, and the first failed validation raises a ValueError and stops the program.
import re

class Citizen:
    def __init__(self, id, name, email, age):
        self.id = id
        self.name = self._is_valid_name(name)
        self.email = self._is_valid_email(email)
        self.age = self._is_valid_age(age)

    def _is_valid_name(self, name):
        if len(name) > 20:
            raise ValueError("Name cannot exceed 20 characters.")
        return name

    def _is_valid_email(self, email):
        # Raw string so that backslashes reach the regex engine unchanged.
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, email):
            raise ValueError("It's not an email address.")
        return email

    def _is_valid_age(self, age):
        if age < 0:
            raise ValueError("Age cannot be negative.")
        return age

citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)
citizen_err = Citizen("id1", "John Smith1234567890123456789", "john_smith@gmail.com", 27)
# ValueError: Name cannot exceed 20 characters.
citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.c", 27)
# ValueError: It's not an email address.
citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.com", -27)
# ValueError: Age cannot be negative.
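This option is simple, but it's probably not the most Pythonic solution you've ever seen, and many people prefer to keep __init__ as clean as possible. Another drawback is that the validation runs only during initialization: once the object exists, an attribute can be updated to an invalid value without any error.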
citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)
citizen_ok.email = "john_smith@gmail.c"
citizen_ok.email
'john_smith@gmail.c'
# This email is not valid, but still accepted by the code
Python @property
The second option uses a built-in function: @property. It works as a decorator that is applied to a method to turn it into a managed attribute. According to the Python documentation:
A property object has getter, setter, and deleter methods usable as decorators that create a copy of the property with the corresponding accessor function set to the decorated function.
At first glance, it creates more code than the first option, but on the other hand, it relieves __init__ of the validation responsibility. Each attribute has 2 methods (except for id): one decorated with @property, the other with the setter. Whenever an attribute is retrieved, as in citizen.name, the method with @property is called. When an attribute value is set, during initialization or on update, as in citizen.name = "John Smith", the setter method is called.
import re

class Citizen:
    def __init__(self, id, name, email, age):
        self._id = id
        # These assignments go through the setters below, so they are validated.
        self.name = name
        self.email = email
        self.age = age

    @property
    def id(self):
        return self._id

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if len(value) > 20:
            raise ValueError("Name cannot exceed 20 characters.")
        self._name = value

    @property
    def email(self):
        return self._email

    @email.setter
    def email(self, value):
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, value):
            raise ValueError("It's not an email address.")
        self._email = value

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        if value < 0:
            raise ValueError("Age cannot be negative.")
        self._age = value

citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)
citizen_err = Citizen("id1", "John Smith1234567890123456789", "john_smith@gmail.com", 27)
# ValueError: Name cannot exceed 20 characters.
citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.c", 27)
# ValueError: It's not an email address.
citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.com", -27)
# ValueError: Age cannot be negative.
This option moves the validation logic to the setter method of each attribute and therefore keeps __init__ very clean. Besides, the validation also applies to every update of each attribute after initialization, so the code in the previous example is no longer accepted.
citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)
citizen_ok.email = "john_smith@gmail.c"
# ValueError: It's not an email address.
citizen_ok.email
'john_smith@gmail.com'
Attribute id is an exception here because it doesn’t have a setter method. This is because I want to tell the client that this attribute is not supposed to be updated after initialization. If you try to do that, you will get an AttributeError exception.
citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)
citizen_ok.id
'id1'
citizen_ok.id = 'id2'
# AttributeError: can't set attribute
citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)
citizen_ok.age = -4
# Traceback (most recent call last):
# ValueError: Age cannot be negative.
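The getter/setter pairs above repeat the same pattern for every attribute. As a minimal sketch of one way to cut that repetition, the helper below builds a property from a name and a check function. The names validated_property, check_name and check_age are hypothetical and not part of the original example.

```python
def validated_property(name, check):
    """Build a property whose setter runs `check` before storing the value."""
    private_name = "_" + name

    def getter(self):
        return getattr(self, private_name)

    def setter(self, value):
        check(value)  # expected to raise ValueError on bad input
        setattr(self, private_name, value)

    return property(getter, setter)


def check_name(value):
    if len(value) > 20:
        raise ValueError("Name cannot exceed 20 characters.")


def check_age(value):
    if value < 0:
        raise ValueError("Age cannot be negative.")


class Citizen:
    name = validated_property("name", check_name)
    age = validated_property("age", check_age)

    def __init__(self, name, age):
        self.name = name
        self.age = age


citizen = Citizen("John Smith", 27)
try:
    citizen.age = -4
except ValueError as exc:
    print(exc)       # Age cannot be negative.
print(citizen.age)   # 27
```

The rejected assignment leaves the previous value in place, just as with the hand-written setters.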
Use Python Descriptors
The third option makes use of Python descriptors, a powerful but often overlooked feature. Perhaps the community has recognized this problem: since Python 3.9, the documentation has included examples of using descriptors to validate attributes.
Here is the code using descriptors. Every attribute becomes a descriptor, which is a class with the methods __get__ and __set__. When the attribute value is set, as in self.name = name, __set__ is called. When the attribute is retrieved, as in print(self.name), __get__ is called.
import re

class Name:
    def __get__(self, obj, objtype=None):
        # Read from the instance: the descriptor object itself is shared
        # by all instances of the class, so it must not hold the value.
        return obj._name

    def __set__(self, obj, value):
        if len(value) > 20:
            raise ValueError("Name cannot exceed 20 characters.")
        obj._name = value

class Email:
    def __get__(self, obj, objtype=None):
        return obj._email

    def __set__(self, obj, value):
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, value):
            raise ValueError("It's not an email address.")
        obj._email = value

class Age:
    def __get__(self, obj, objtype=None):
        return obj._age

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Age cannot be negative.")
        obj._age = value

class Citizen:
    name = Name()
    email = Email()
    age = Age()

    def __init__(self, id, name, email, age):
        self.id = id
        self.name = name
        self.email = email
        self.age = age

citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)
citizen_err = Citizen("id1", "John Smith1234567890123456789", "john_smith@gmail.com", 27)
# ValueError: Name cannot exceed 20 characters.
citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.c", 27)
# ValueError: It's not an email address.
citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.com", -27)
# ValueError: Age cannot be negative.
This solution is comparable to @property. It works better when the descriptors can be reused in multiple classes. For example, in an Employee class, we can simply reuse the previous descriptors without writing much boilerplate code.
class Salary:
    def __get__(self, obj, objtype=None):
        # Return the value kept on the instance.
        return obj._salary

    def __set__(self, obj, value):
        if value < 1000:
            raise ValueError("Salary cannot be lower than 1000.")
        obj._salary = value

class Employee:
    name = Name()
    email = Email()
    age = Age()
    salary = Salary()

    def __init__(self, id, name, email, age, salary):
        self.id = id
        self.name = name
        self.email = email
        self.age = age
        self.salary = salary

emp = Employee("id1", "John Smith", "john_smith@gmail.com", 27, 1000)
emp = Employee("id1", "John Smith", "john_smith@gmail.com", 27, 900)
# ValueError: Salary cannot be lower than 1000.
citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)
citizen_ok.age = -4
# ValueError: Age cannot be negative.
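Writing one descriptor class per attribute still duplicates the __get__ and __set__ plumbing. As a rough sketch, in the spirit of the validator examples in the Python descriptor guide, a small base class can use __set_name__ to learn the attribute name and keep each value on the instance. The names Validated, MaxLength, NonNegative and Person are hypothetical and only serve this illustration.

```python
class Validated:
    """Base descriptor: stores the value on the instance under a private name."""

    def __set_name__(self, owner, name):
        # Called once at class creation with the attribute name, e.g. "age".
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not on an instance
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        self.validate(value)
        setattr(obj, self.private_name, value)

    def validate(self, value):
        raise NotImplementedError


class MaxLength(Validated):
    def __init__(self, limit):
        self.limit = limit

    def validate(self, value):
        if len(value) > self.limit:
            raise ValueError(f"Value cannot exceed {self.limit} characters.")


class NonNegative(Validated):
    def validate(self, value):
        if value < 0:
            raise ValueError("Value cannot be negative.")


class Person:
    name = MaxLength(20)
    age = NonNegative()

    def __init__(self, name, age):
        self.name = name
        self.age = age


alice = Person("Alice", 30)
bob = Person("Bob", 41)
print(alice.age, bob.age)  # 30 41: each instance keeps its own value

try:
    bob.age = -1
except ValueError as exc:
    print(exc)  # Value cannot be negative.
```

Because the rule parameters live in the descriptor and the values live on the instance, the same descriptor class can serve several attributes and several classes.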
Combine Decorator and Descriptor
A variant of option 3 is to combine decorators and descriptors. The end result looks like the following, where the rules are encapsulated in the decorators.
def email(attr):
    def decorator(cls):
        setattr(cls, attr, Email())
        return cls
    return decorator

def age(attr):
    def decorator(cls):
        setattr(cls, attr, Age())
        return cls
    return decorator

def name(attr):
    def decorator(cls):
        setattr(cls, attr, Name())
        return cls
    return decorator

@email("email")
@age("age")
@name("name")
class Citizen:
    def __init__(self, id, name, email, age):
        self.id = id
        self.name = name
        self.email = email
        self.age = age
These decorators can be extended quite easily. For example, you can have more generic rules applied to multiple attributes, such as @positive_number(attr1, attr2).
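A decorator covering several attributes at once could look roughly like the sketch below. It is only an illustration: the PositiveNumber descriptor, the positive_number decorator body and the Rectangle class are assumptions, not code from the original example.

```python
class PositiveNumber:
    """Hypothetical descriptor that accepts only numbers greater than zero."""

    def __init__(self, attr):
        # The attribute name is passed in explicitly, because __set_name__
        # is not called for descriptors attached with setattr().
        self.attr = attr
        self.private_name = "_" + attr

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self.attr} must be a positive number.")
        setattr(obj, self.private_name, value)


def positive_number(*attrs):
    """Class decorator that attaches a PositiveNumber to each named attribute."""
    def decorator(cls):
        for attr in attrs:
            setattr(cls, attr, PositiveNumber(attr))
        return cls
    return decorator


@positive_number("width", "height")
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height


rect = Rectangle(3, 4)
print(rect.width * rect.height)  # 12

try:
    Rectangle(3, -1)
except ValueError as exc:
    print(exc)  # height must be a positive number.
```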
Until now, we have gone through 4 options using only built-in functions. In my opinion, Python built-in functions are already powerful enough to cover what we often need for data validation. But let’s also look around and see some third-party libraries.
Object Validation in Python @dataclass
Another way to create a class in Python is using @dataclass. The dataclasses module provides a decorator that automatically generates the __init__() method.
Besides, @dataclass also introduces a special method called __post_init__(), which is invoked from the generated __init__(). __post_init__ is the place to initialize a field based on other fields or to include validation rules.
from dataclasses import dataclass
import re

@dataclass
class Citizen:
    id: str
    name: str
    email: str
    age: int

    def __post_init__(self):
        # Runs once, right after the generated __init__ has set the fields.
        if self.age < 0:
            raise ValueError("Age cannot be negative.")
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, self.email):
            raise ValueError("It's not an email address.")
        if len(self.name) > 20:
            raise ValueError("Name cannot exceed 20 characters.")

citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)
citizen_err = Citizen("id1", "John Smith1234567890123456789", "john_smith@gmail.com", 27)
# ValueError: Name cannot exceed 20 characters.
citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.c", 27)
# ValueError: It's not an email address.
citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.com", -27)
# ValueError: Age cannot be negative.
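This option has the same effect as option 1, but in @dataclass style. If you prefer using @dataclass rather than the traditional class, then this option could be something for you. As with option 1, __post_init__ runs only at initialization, so later updates to an attribute are not validated.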
citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)
citizen_ok.age = -4
citizen_ok
# Citizen(id='id1', name='John Smith', email='john_smith@gmail.com', age=-4)
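If updates should be validated too, one possible approach is to hook __setattr__, which the generated __init__ also goes through when it assigns the fields. The following is a minimal sketch under that assumption; the _validate_<field> naming convention is hypothetical and not something the dataclasses module provides.

```python
from dataclasses import dataclass


@dataclass
class Citizen:
    id: str
    name: str
    age: int

    def __setattr__(self, key, value):
        # Look up an optional _validate_<field> method and run it first.
        validator = getattr(self, f"_validate_{key}", None)
        if validator is not None:
            validator(value)
        super().__setattr__(key, value)

    def _validate_name(self, value):
        if len(value) > 20:
            raise ValueError("Name cannot exceed 20 characters.")

    def _validate_age(self, value):
        if value < 0:
            raise ValueError("Age cannot be negative.")


citizen = Citizen("id1", "John Smith", 27)
try:
    citizen.age = -4
except ValueError as exc:
    print(exc)       # Age cannot be negative.
print(citizen.age)   # 27
```

Fields without a matching _validate_ method, such as id here, are stored without any check.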
Use the third-party library — Pydantic
Pydantic is a library similar to Marshmallow. It also follows the idea of creating a schema or model for the object, and it provides many ready-made validation types such as PositiveInt and EmailStr. Compared to Marshmallow, Pydantic integrates validation rules into the object class rather than creating a separate schema class.
Here is how we can achieve the same goal using Pydantic. The example uses the Pydantic 1.x API (the validator decorator and a nested Config class). ValidationError collects all 3 errors found in the object.
import re
from datetime import datetime
from pydantic import BaseModel, ValidationError, validator, PositiveInt, EmailStr

class HomeAddress(BaseModel):
    postcode: str
    city: str
    country: str

    class Config:
        # Strip leading and trailing whitespace from all string fields.
        anystr_strip_whitespace = True

    @validator('postcode')
    def dutch_postcode(cls, v):
        if not re.match(r"^\d{4}\s?\w{2}$", v):
            raise ValueError(r"must follow regex ^\d{4}\s?\w{2}$")
        return v

class Citizen(BaseModel):
    id: str
    name: str
    birthday: str
    email: EmailStr
    age: PositiveInt
    address: HomeAddress

    @validator('birthday')
    def valid_date(cls, v):
        try:
            datetime.strptime(v, "%Y-%m-%d")
            return v
        except ValueError:
            raise ValueError("date must be in YYYY-MM-DD format.")

try:
    citizen = Citizen(
        id="1234",
        name="john_smith_1234567889901234567890",
        birthday="1998-01-32",
        email="john_smith@gmail.",
        age=0,
        address=HomeAddress(
            postcode="1095AB", city=" Amsterdam", country="NL"
        ),
    )
    print(citizen)
except ValidationError as e:
    print(e)
# 3 validation errors for Citizen
# birthday
#   date must be in YYYY-MM-DD format. (type=value_error)
# email
#   value is not a valid email address (type=value_error.email)
# age
#   ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)
Actually, Pydantic can do much more than that. It can also export a schema via the schema_json method.
print(Citizen.schema_json(indent=2))
{
  "title": "Citizen",
  "type": "object",
  "properties": {
    "id": {
      "title": "Id",
      "type": "string"
    },
    "name": {
      "title": "Name",
      "type": "string"
    },
    "birthday": {
      "title": "Birthday",
      "type": "string"
    },
    "email": {
      "title": "Email",
      "type": "string",
      "format": "email"
    },
    "age": {
      "title": "Age",
      "exclusiveMinimum": 0,
      "type": "integer"
    },
    "address": {
      "$ref": "#/definitions/HomeAddress"
    }
  },
  "required": [
    "id",
    "name",
    "birthday",
    "email",
    "age",
    "address"
  ],
  "definitions": {
    "HomeAddress": {
      "title": "HomeAddress",
      "type": "object",
      "properties": {
        "postcode": {
          "title": "Postcode",
          "type": "string"
        },
        "city": {
          "title": "City",
          "type": "string"
        },
        "country": {
          "title": "Country",
          "type": "string"
        }
      },
      "required": [
        "postcode",
        "city",
        "country"
      ]
    }
  }
}
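The schema is compliant with JSON Schema Core, JSON Schema Validation, and OpenAPI.
By default, Pydantic validates only when the object is created, so a later assignment such as citizen.age = -4 is accepted. In Pydantic 1.x, setting validate_assignment = True in the model's Config turns on validation for assignments as well.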
citizen = Citizen(
    id="1",
    name="John Smith",
    birthday="1990-01-01",
    email="john_smith@gmail.com",
    age=28,
    address=HomeAddress(
        postcode="109505", city=" Amsterdam", country="NL"
    ),
)
citizen
# Citizen(id='1', name='John Smith', birthday='1990-01-01', email='john_smith@gmail.com', age=28, address=HomeAddress(postcode='109505', city='Amsterdam', country='NL'))
citizen.age = -4
citizen
# Citizen(id='1', name='John Smith', birthday='1990-01-01', email='john_smith@gmail.com', age=-4, address=HomeAddress(postcode='109505', city='Amsterdam', country='NL'))