Indexing on MongoDB Collection - Part 1

Indexing on MongoDB Collection - Part 1

Indexing makes the read queries faster. We can apply indexing on any of the fields in the document including the embedded fields. When documents are indexed mongo DB will search with filtered offset first rather than doing COLLSCAN ( column-span) on all documents and then applying filter criteria to it.

Let us understand this by creating an index to the existing collection and comparing the query execution stats before and after applying the indexing.

By the way, I am using Mongo DB Atlas Free cluster for this walk-through. Mongo Altas is a cloud-based solution to host and manage NoSQL data and it also provides a free cluster for learning purposes with one click option to load sample data set into your free cluster. I liked it :)

To connect to the Atlas cluster I am using VS Code extension “MongoDB VS Code”. There are other ways as well such as using mongo shell, mongo Atlas UI. Please refer to the link below https://www.mongodb.com/docs/atlas/getting-started/

I am using the “sample_airbnb” data set loaded into my Atlas account and querying them using MongoDB for VS code extension.

Please note, In case the image screenshot are not clearly visible. kindly open them in a new tab or click on the image description.

Fetch query on property type without having any index on “listingAndReview” collection

In above query, I am executing the find query on collection: “listingAndReviews” where the property_type is house. I am also using regex to fetch result ignoring case sensitive and the explain(‘executionStats’) function to get the query plan.

In the presence of indexing, the winningPlan section will have inputStage.stage as IXSCAN

execution stats for “listingandReview” with Indexing.

I remove all pre-created indexes from the “listingAndReviews” collection and re-execute the same query. We can see the winningPlan.stage as “COLLSPAN” and the executionStats.executionTimeMillis as 118. The totalDocsExamined as 5555 to find the 606 records matching our search criteria.

execution stats without applying indexing to “listingAndReviews” collection

Let us create an index on the field “property_ type” to fetch results faster.

Create a Single index on property type field

execution stats for fetch query after applying indexing on “property_type”

To define an index, We need to understand the queries required and the volume of read and write operations that going to be performed on the collection. Indexing will have some amount of taxation on the write operation that’s the reason we shouldn’t apply indexing to all the document fields.

Note\ Creating indices on the fields where the fetch query will return close to all documents from the collection will not help and in fact will slow the query execution. Create an index based on the query use case such that the query result should return only the limited fetch result.*

Compound Index

The above examples are on the single index field. There is another type of index called “Compound Index” where two or more fields are considered for indexing in the same order they were added.

If we have two fields added as compound indexes the precedence will start from left to right. For example, let us create a compound index on “listingAndReviews” collection using “beds and address.country_code ” (embedded document) fields.

Compound index using no of beds and address country code

The order of the fields is important here and precedence is from left to right. i.e. the field beds get first priority and followed by address.country_code.

In the above query, Even though we use two filters in the find query to fetch results having beds count as 1 and can accommodate 4 people. here “accommodates” field is not part of our index keys. Mongo DB performs IXSCAN i.e. index scan and not the COLLSPAN column spanning all the documents within the collection because we have the “beds” added as part of our key index and the precedence is from left to right.

In case we query only with accommodates fields, Mongo DB performs COLLSPAN and examines all the documents of a total count of 5555

MongoDB Read Queries executed so far

// Select the database to use.
use('sample_airbnb');db.listingsAndReviews.find({}).count()db.listingsAndReviews.find({"property_type":{$eq:"House"}})// { '$regex':/^yourValue$/i}db.listingsAndReviews.find({"property_type":{$regex:/^house$/i}}).explain('executionStats')//Create index on field property_type 1 denotes ascending and -1 descending
db.listingsAndReviews.createIndex({"property_type":1})//Create compound index on beds and address.country_code
db.listingsAndReviews.createIndex({'beds':1 ,'address.country_code':1}, {'name':'ixd_bed_addr.country'})db.listingsAndReviews.find({'beds':1,'accommodates': {$eq:4}}).explain('executionStats')db.listingsAndReviews.find({'accommodates': {$eq:4}}).explain('executionStats')db.listingsAndReviews.find({'accommodates': {$eq:4}}).sort({'maximum_nights':1}).explain('executionStats')

Rules for defining an index

Use the ESR (Equality, Sort, Range) Rule

Create Indexes to Support Your Queries

Use Indexes to Sort Query Results

We need index not only for fetching the results quickly but also to use sorting. In case of non indexed document, Mongo DB has limit of 32 MB in memory and this will time out in case sorting (Default in- memory sorting) of huge millions of document without indexing.
  1. Ensure Indexes Fit in RAM

  2. Create Queries that Ensure Selectivity

In the next part of this series, we will drill down to other types of indexes and the best practices to follow while creating indexes.

To be continued.

Did you find this article valuable?

Support Vivekananthan Pasupathi by becoming a sponsor. Any amount is appreciated!