Whenever we're building something, a lot of questions come to mind. One that comes to my mind quite often is: 'What if my database gets a huge number of entries? What would happen to my APIs that fetch data from it? Would they take a lot of time to filter and return the data? How do I handle all this?' I'm sure plenty of developers out there run into these questions too. So, today, let's discuss how to handle all of this. We'll cover some basic strategies we can leverage to optimise the performance of our Django applications.
Performance optimisation is the process of improving the speed, responsiveness, efficiency, and overall resource utilisation of a software application, so that it can handle a large number of requests and provide a responsive user experience. It can include reducing the number of database queries, using efficient query techniques, caching, and so on. So, let's see how we can bring this not-so-complex definition to life in our favourite framework. Let's go Django!
Let's quickly start a Django project.
python -m venv env  # This is a good practice, peeps
source env/bin/activate  # On Windows: env\Scripts\activate
pip install django Pillow  # Pillow is needed to handle image files
django-admin startproject djangooptimisation
cd djangooptimisation
python manage.py startapp home
Now, we'll add our new and shiny app named 'home' to the INSTALLED_APPS in the settings.py file.
INSTALLED_APPS = [
    ...
    'home',
]
Now, let's quickly create some models in the home app's models.py file.
# home/models.py
from django.db import models


class Artist(models.Model):
    name = models.CharField(max_length=100)
    bio = models.TextField()

    def __str__(self):
        return self.name


class Album(models.Model):
    title = models.CharField(max_length=200)
    artist = models.ForeignKey(Artist, on_delete=models.CASCADE)
    genre = models.CharField(max_length=50)
    release_date = models.DateField()
    cover_image = models.ImageField(upload_to='album_covers/')

    def __str__(self):
        return self.title
Let's migrate the changes to the database.
python manage.py makemigrations
python manage.py migrate
So, we have two models, Artist and Album, where Album has a foreign key relation to Artist. Now, let's say we have 50,000 artists in our database, which indeed is a huge number. How much time is it going to take to retrieve some of them? Ummm, I can't guess, can you? Why not ingest 50,000 artists into the database and see how it goes? We'll just write a simple test case to do so and measure the time taken.
# home/tests.py
from django.test import TestCase
from .models import Artist
import datetime


class ArtistTestCase(TestCase):
    def setUp(self):
        start_time = datetime.datetime.now()
        artists = []
        for i in range(50000):
            artists.append(Artist(name=f'Artist {i}', bio=f'Artist {i} bio goes like this'))
        Artist.objects.bulk_create(artists)
        end_time = datetime.datetime.now()
        print(f'Ingesting 50000 Artists took us: {end_time - start_time} without any optimisation')

    def test_get_artists(self):
        start_time = datetime.datetime.now()
        for i in range(12000, 14500):
            artist = Artist.objects.get(name=f'Artist {i}')
        end_time = datetime.datetime.now()
        print(f'Fetching 2500 Artists took us: {end_time - start_time} without any optimisation')
Now, we'll run the tests
python manage.py test
So, here's what I got.
So, it took us 1.065 seconds to ingest 50,000 artists into the database, and fetching 2,500 of them took 9.008 seconds. The whole test took 10.079 seconds to run. The ingestion time is kind of fine, but the filtering and retrieval time is not good at all. Imagine we had 1,000,000 entries; our APIs would go crazy, man. It'd feel like waiting in a government office to get some documents. We need to optimise this, right? So, the first technique we'll be using is Database Indexing.
Database Indexing: Imagine you have a large book with many pages, and you need to find information about a specific topic. Without an index, you would have to manually flip through each page, reading every line until you find the relevant information. This process can be time-consuming and inefficient, especially if the book is extensive.
Now, consider the book's index, which lists keywords along with the page numbers where those keywords appear. The index allows you to quickly locate the relevant pages related to a particular topic without scanning the entire book. Similarly, in a database, an index works as a structured reference to the data, enabling the database engine to locate specific rows efficiently.
So, database indexing is a technique used to optimize the performance of database queries by providing a faster way to look up and retrieve data from database tables. It involves creating an index data structure, which is a separate structure that holds a subset of the data from the main table, organized in a way that allows for efficient search and retrieval operations. This index structure serves as a roadmap to quickly locate specific rows in the table, thereby reducing the time it takes to execute queries. Without an index, a database has to perform a full table scan, examining every row, to find the requested data. With an index, the database engine can use the index structure to locate the relevant rows directly, significantly speeding up the data retrieval process.
Time to bring this theory to code. Django has made database indexing very simple: one small change in the code makes a huge impact. We'll just add db_index=True to the name field of the Artist model and examine the performance again.
from django.db import models


class Artist(models.Model):
    name = models.CharField(max_length=100, db_index=True)
    bio = models.TextField()

    def __str__(self):
        return self.name
python manage.py makemigrations
python manage.py migrate
Now, let's run our tests again and see if there's any change
python manage.py test
Damn, I'm surprised. The fetch time has been reduced to 0.79 seconds. Really! This is super-awesome. We can calculate the performance improvement with the formula (|New time - Old time| / Old time) * 100%, which, in our case, gives roughly 91.2%, which is damn awesome, but that's not all.
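The formula is simple enough to sanity-check in a few lines of plain Python, plugging in the fetch times measured above:

```python
def improvement_pct(old_time: float, new_time: float) -> float:
    """Percentage improvement: (|new - old| / old) * 100."""
    return abs(new_time - old_time) / old_time * 100


# Fetch time before indexing (9.008 s) vs. after indexing (0.79 s).
print(round(improvement_pct(9.008, 0.79), 1))  # ~91.2
```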
The second strategy that we'll use is Query Optimisation.
Query Optimisation: So, here, we'll simply try to optimise the database queries we write. How? Let's see the methods.
Using select_related: This is primarily used when we're dealing with models connected by foreign keys. In our case, Album has a many-to-one relationship with Artist. So, imagine we want to fetch some Albums and also fetch their respective Artist names. We have multiple ways to do that, but select_related is the optimal one. Now, we'll see how to use it and how it improves performance. For this, we'll write some more test cases and examine the performance of each.

# home/tests.py
from django.test import TestCase
from .models import Artist, Album
import datetime


class AlbumTestCase(TestCase):
    def setUp(self):
        start_time = datetime.datetime.now()
        albums = []
        for i in range(50000):
            artist = Artist.objects.create(name=f'Artist {i}', bio=f'Artist {i} bio')
            albums.append(Album(title=f'Album {i}', artist=artist, genre=f'Genre {i}',
                                release_date=datetime.date.today(),
                                cover_image=f'album_covers/album_{i}.jpg'))
        Album.objects.bulk_create(albums)
        end_time = datetime.datetime.now()
        print(f'Ingesting 50000 Albums took us: {end_time - start_time}')

    def test_get_albums_without_select_related(self):
        start_time = datetime.datetime.now()
        for i in range(12000, 14500):
            album = Album.objects.get(title=f'Album {i}')
            album_artist = album.artist.name
        end_time = datetime.datetime.now()
        print(f'Fetching 2500 Albums and their Artist name without using select_related took us: {end_time - start_time}')

    def test_get_albums_with_select_related(self):
        start_time = datetime.datetime.now()
        for i in range(12000, 14500):
            album = Album.objects.select_related('artist').get(title=f'Album {i}')
            album_artist = album.artist.name
        end_time = datetime.datetime.now()
        print(f'Fetching 2500 Albums and their Artist name using select_related took us: {end_time - start_time}')
Now, let's run the tests and see the result.
Clearly, we can see that the query using select_related took less time than the one that didn't. Now, what's the logic behind it? It's pretty simple. The query that uses select_related joins the Artist table in the same SQL query, so it does not hit the database again when executing album_artist = album.artist.name. The fewer the database hits, the shorter the execution time. A 4.8% optimisation might not feel big here, but as the database grows, it starts looking like a significant change. Let's move ahead.

Using prefetch_related: While select_related() works for single-valued relations such as foreign keys and one-to-one fields, prefetch_related() is meant for multi-valued ones such as many-to-many fields and reverse foreign keys. Assuming our Album model has a many-to-many field called songs, prefetch_related() would be used like this:

albums = Album.objects.prefetch_related('songs').all()

This way, we don't have to hit the database again for the songs of each individual Album; a single extra query fetches the songs for all the Albums at once.
Retrieve only what you need: The simplest yet effective technique. Say I need to fetch Albums, but the requirement is just the title and the artist. The other fields aren't required by this specific query, so why fetch them? There's no complex logic behind it: the less data we move, the less time it takes to send it across services, right? But how do we do it? Simple.

albums_data = Album.objects.only("title", "artist")

Just keep in mind that accessing a deferred field on one of these objects later triggers an extra query, so only() pays off when we genuinely won't touch the other fields.
There are other simple techniques too, like using QuerySet.contains() (available since Django 4.0) instead of the in operator, or exists() instead of loading rows just to check a condition.
The next most important strategy that we'll be using is something you'd have heard about a lot, but not used that much. You might have studied this in Operating Systems or System Design, but now is the time to put it in code. Yes, I'm talking about Caching. So, let's discuss some theoretical aspects to understand it or maybe use some simple flow charts as we always do.
Caching:
Let's discuss how exactly it works. We all know that a cache is a small yet very fast store used to save and retrieve data more quickly than fetching it from its original source, in our case, the database. Django provides its own cache framework, which allows us to store various types of data, including database query results, HTML fragments, or the results of any other computationally expensive or time-consuming operation. We can use different kinds of cache: the database cache, which creates a cache table in the database itself; the in-memory cache, which stores cached data in the server's memory; or the file-based cache, where cached data is stored in files on the filesystem. The last one is useful when you want to cache data without relying on a separate server or external service. We'll be using an in-memory cache. Let's see how to set it up.
In our settings.py file, we'll add the following code.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'musaibs-cache',  # Use any unique name, my name has a copyright!
    }
}
Now, we'll write a simple view to get our Albums, assuming we have a lot of them stored in the database.
from django.core.cache import cache

from .models import Album


def get_albums():
    cached_data = cache.get('albums')
    if cached_data is not None:
        return cached_data
    data = Album.objects.all()
    cache.set('albums', data, 3600)
    return data
Let's discuss how it actually works.

- cached_data = cache.get('albums'): This line attempts to retrieve data from the cache using the key 'albums'. If the data is found in the cache, it is assigned to the variable cached_data; if not, cached_data will be None.
- if cached_data is not None: This condition checks whether data was successfully retrieved from the cache. If cached_data is not None, it means the data was found in the cache, and the function returns the cached data.
- data = Album.objects.all(): If the data is not found in the cache, this line fetches all records from the Album model using Album.objects.all().
- cache.set('albums', data, 3600): After fetching the data from the database, the function stores it in the cache under the key 'albums' with a timeout of 3600 seconds (1 hour). This means the data will be kept in the cache for 1 hour before it is considered stale.
- return data: Finally, the function returns the fetched data, whether it was retrieved from the cache or fetched from the database.
This way, we can reduce our database hits by caching our data for some time. If the client requests the same data again, we can serve it without a database lookup, which significantly improves performance.
In conclusion, optimizing a Django application is a multifaceted journey that involves thoughtful design, efficient querying, and strategic use of caching mechanisms. By delving into the intricacies of database indexing and harnessing the power of caching, we've explored ways to significantly enhance the performance of our application.
Remember, performance optimization is an ongoing process, and the strategies discussed here are just the tip of the iceberg. Continuously monitoring and analysing the application's performance, staying updated on Django best practices, and adapting to evolving requirements will ensure a robust and efficient web application.
Keep Learning, Keep Sharing, Keep Djangoing!
Go Django!
Peace, Dot!