How to Create Fake Data with Faker
Faker is an open-source Python library that lets you create your own dataset, i.e., generate random data with attributes such as name, age, and location. It supports a wide range of locales (language and country combinations), which is useful for generating data that fits a particular locality.
Let's say you want to create data with certain data types (bool, float, text, integers) and special characteristics (names, addresses, colors, emails, phone numbers, locations) to test some Python libraries or a specific implementation. Finding data of exactly that kind takes time. You wonder: is there a quick way to create your own data?
Faker does exactly that. It generates fake data for you, letting you choose the data type, its characteristics, and the origin or language of the data. Let’s discover how we can use Faker to create fake data.
Basics of Faker
Start by installing the package:
pip install Faker
Some basic methods of Faker:
>>> from faker import Faker
>>> fake = Faker()
>>> fake.color_name()
'Red'
>>> fake.name()
'Kyle Johnson'
>>> fake.address()
'0891 Chloe Manors Apt. 227\nSavagechester, MI 27550'
>>> fake.date_of_birth(minimum_age=25)
datetime.date(1951, 9, 16)
>>> fake.job()
'Call centre manager'
>>> fake.city()
'Lake Jim'
But What if I Need the Information to Be Specific to One Location?
Luckily, we can also specify the locale of the data we want to fake. Maybe the character you want to create is from Italy, and you also want to create her friends. Since you are from the US, it is hard for you to come up with information that fits that location. This is easily taken care of by passing a locale code such as 'it_IT' to the Faker class.
>>> fake = Faker('it_IT')
>>> for _ in range(10):
...     print(fake.name())
...
Gianmarco Falloppio
Goffredo Toscani
Dott. Filippa Musatti
Adelasia Pontecorvo
Eleanora Giannotti-Solari
Dina Tremonti
Dott. Gastone Poerio
Flavia Moschino
Pompeo Guglielmi
Rosa Cafarchia
Or draw values from multiple locales:
>>> fake = Faker(['it_IT', 'en_US', 'es_ES'])
>>> for _ in range(10):
...     print(fake.city())
...
Lake Mary
Quarto Fernanda
Filippini umbro
Navarra
Cassandramouth
Cáceres
Falier sardo
Robertfort
León
East Gwendolyn
Create Random Text
We can create random text with:
>>> fake.text()
'Sport southern with per support mouth. Girl real resource product. Character make record think rich charge could. Computer special employee allow body director action.\nBoy like behind environmental.'
Try with the Japanese language:
>>> fake = Faker('ja')
>>> fake.text()
'賞賛する月花嫁タワー協力犯罪者器官。ヒット索引今緩む意図。\n明らかにする教会保持する販売装置バーゲンリフト大統領。トス運リハビリ。楽しんで主人ささやき鉱山。\n参加する編組リンク追放する。パーセント教授意図合計。\n残る本質的な柔らかいトス賞賛するコミュニティ。持ってるバナーブランチないストレージ必要。\n創傷野球は埋め込む緩む主人。あなた自身スペルじぶんの合計。尊敬するピック器官職人ささやき催眠術。'
Create Text from Selected Words
We can also create sentences from our own word list: Faker will then build fake sentences using only the words we supply.
>>> from faker import Faker
>>> fake = Faker()
>>> my_words = ['My', 'dog', 'is', '3', 'years', 'old', 'and', 'his', 'name', 'is', 'Jessie']
>>> fake.sentence(ext_word_list=my_words)
'Is years name My.'
>>> fake.sentence(ext_word_list=my_words)
'Name old his My 3.'
Create Quick Profile Data
We can quickly create a profile with:
>>> fake = Faker()
>>> fake.profile()
{'job': 'Learning mentor', 'company': 'Alvarez, Scott and Martinez', 'ssn': '006-95-9713', 'residence': '579 Joshua Glens Suite 372\nLeahland, IA 88987', 'current_location': (Decimal('56.059606'), Decimal('177.275739')), 'blood_group': 'O+', 'website': ['http://www.solis-weiss.org/'], 'username': 'joseph44', 'name': 'Richard Brady', 'sex': 'M', 'address': '646 Dawson Common Apt. 159\nPort Rachel, VT 57481', 'mail': 'harrisonmaria@hotmail.com', 'birthdate': datetime.date(1960, 4, 9)}
Or restrict the profile to specific fields:
>>> fake.profile(fields=['name', 'job', 'mail'])
{'job': 'Hospital doctor', 'name': 'Alexander Keller', 'mail': 'nvasquez@gmail.com'}
Create a Fake Dataset Using Faker
Now we will use Faker's methods to generate a dataset containing profiles of 100 fake people. Email addresses should contain only ASCII characters, so we also need the unidecode package to transliterate names with accents:
pip install unidecode
We will also use pandas to store these profiles in a DataFrame.
from faker import Faker
import pandas as pd
import unidecode

class User:
    # One Faker instance shared by every User
    f = Faker()

    def __init__(self):
        self.first_name = User.f.first_name()
        self.last_name = User.f.last_name()
        self.name = f"{self.first_name} {self.last_name}"
        self.age = User.f.pyint(min_value=18, max_value=65)
        # unidecode turns accented characters into plain ASCII
        self.private_email = unidecode.unidecode(
            f"{self.first_name}.{self.last_name}@{User.f.free_email_domain()}".lower()
        )

# Build 100 users and turn their attributes into DataFrame rows
data = [User().__dict__ for _ in range(100)]
df = pd.DataFrame(data)
print(df.head())
# OUTPUT:
   first_name  last_name               name  age                private_email
0        Eric      James         Eric James   53         eric.james@yahoo.com
1    Michelle    Gardner   Michelle Gardner   47   michelle.gardner@yahoo.com
2       Julie     Oliver       Julie Oliver   53     julie.oliver@hotmail.com
3      Olivia   Delacruz    Olivia Delacruz   62    olivia.delacruz@gmail.com
4    Kimberly   Williams  Kimberly Williams   57  kimberly.williams@gmail.com
Instead of accessing x.__dict__ directly, it's more Pythonic to use the built-in vars(x) function. So we can write:
data = [vars(User()) for _ in range(100)]