IBM Cloud Docs
Explore time series collections in Databases for MongoDB

Time series data is a sequence of data points in which insights are gained by analyzing changes over time. Typical use cases include monitoring systems and sensor devices that report readings, such as temperature or pressure, at regular intervals.

MongoDB offers the ability to define a collection as a time series collection, which allows the data to be stored more efficiently and retrieved faster.

If your use case involves time series data, make use of time series collections to benefit from faster query performance and efficient storage.
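As a rough illustration of what defining a time series collection involves, the sketch below builds the options that MongoDB's createCollection accepts for one. It borrows the timestamp and metadata field names from the sample data used later in this tutorial. The collection name sensorReadings, the helper timeSeriesOptions, and the "seconds" granularity are assumptions made for illustration and are not part of the tutorial project.

```javascript
// Build the options document for a time series collection.
// timeField must name a field that holds a date; metaField is optional
// and names the field that identifies the data source (here, the sensor).
function timeSeriesOptions(timeField, metaField, granularity) {
  return { timeseries: { timeField, metaField, granularity } };
}

// Field names follow this tutorial's sample data; "seconds" is an assumed granularity.
const options = timeSeriesOptions("timestamp", "metadata", "seconds");

// mongosh exposes a global `db`; anywhere else, just print the options.
if (typeof db !== "undefined") {
  // "sensorReadings" is a hypothetical collection name.
  db.createCollection("sensorReadings", options);
} else {
  console.log(JSON.stringify(options, null, 2));
}
```

You do not need to run this for the tutorial, because the collections are created for you when the data is restored.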

In this tutorial you will create a Databases for MongoDB instance and load the same pre-generated data into two collections, one of which is a time series collection. You will then run the same query against both collections to see how a time series collection affects query execution.

Databases for MongoDB is a paid-for service, so following this tutorial will incur charges.

Before you begin

Before you begin, ensure you have the following:

Obtain an API key to deploy infrastructure to your account

Follow these steps to create an IBM Cloud API key that enables Terraform to provision infrastructure into your account. You can create up to 20 API keys.

For security reasons, the API key is only available to be copied or downloaded at the time of creation. If the API key is lost, you must create a new API key.

Clone the project

Get a local copy of the code you will need by cloning the public GitHub repository. Use the following command:

git clone https://github.com/IBM/ibm-mongodb-timeseries-collections.git

Deploy an instance of Databases for MongoDB to your account

In this step you will deploy an instance of Databases for MongoDB and obtain key information to connect to it.

  1. Navigate into the terraform folder of the cloned project.

    cd ibm-mongodb-timeseries-collections/terraform
    
  2. In that folder, create a file named terraform.tfvars with the following fields:

     ibmcloud_api_key = "<your_api_key_from_step_1>"
     region = "<the IBM region where you will deploy the MongoDB database>"
     admin_password = "<the password of your mongodb admin user>"
    

    The terraform.tfvars file contains values that you should keep secret, such as your API key and admin password.

  3. Install the infrastructure with the following command:

    terraform init 
    terraform apply --auto-approve
    

Create access credentials

In this step you will save the deployment's certificate and create environment variables for accessing the MongoDB deployment. Run the following commands from inside the terraform folder:

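# Save the deployment's certificate in the project root.
terraform output --json | jq -r .cert.value | base64 --decode > ../mongo.cert
# Keep only the host list: drop everything up to "@" and everything from the first "/".
export MONGO_URL=$(terraform output --json | jq -r .url.value | sed 's/^.*@//' | sed 's/\/.*$//')
# Prefix the replica set name.
export MONGO_URL="replset/$MONGO_URL"
export MONGO_PASSWORD=$(terraform output --json | jq -r .password.value)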
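To make the two sed expressions above easier to follow, here is a small JavaScript sketch of the same string handling. The function name replicaSetHost and the example URL are made up for illustration. The real URL comes from terraform output and may look different.

```javascript
// Mirror of the shell pipeline: keep only the host list from a MongoDB URL,
// then prefix the replica set name used in this tutorial ("replset").
function replicaSetHost(url, replicaSet = "replset") {
  const hosts = url
    .replace(/^.*@/, "")   // like sed 's/^.*@//': drop the scheme and credentials
    .replace(/\/.*$/, ""); // like sed 's/\/.*$//': drop the database name and options
  return `${replicaSet}/${hosts}`;
}

// Hypothetical URL, not taken from a real deployment.
const exampleUrl =
  "mongodb://admin:secret@host-0.example.com:30000,host-1.example.com:30000/ibmclouddb?authSource=admin";

console.log(replicaSetHost(exampleUrl));
// prints "replset/host-0.example.com:30000,host-1.example.com:30000"
```

The resulting value has the form replica set name, slash, comma-separated hosts, which is the form later passed to the --host option.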

Upload data

You are now ready to upload data. Your project contains a zip file (dump.zip) with more than a million timestamped records. Each record is a reading from a sensor located in London:

{
   "timestamp":"2024-05-20T09:25:47.645Z",
   "metadata":{
      "location":"London",
      "sensor_id":"3406"
   },
   "temp":18.756,
   "pressure":105.0461,
   "speed":89.4478
}

This data is ready to be imported into MongoDB with a tool called mongorestore. It was exported from another MongoDB instance, so it is already in BSON, the binary format that MongoDB uses.

From the terraform directory of your project, run the following commands:

cd ..
unzip dump.zip
mongorestore -u admin -p "$MONGO_PASSWORD" --ssl --sslCAFile mongo.cert --authenticationDatabase admin --host "$MONGO_URL" dump/

Query your data

You are now ready to see the difference that a time series collection can make to the efficiency of your queries.

You can log into your Databases for MongoDB instance with the following command:

mongosh -u admin -p "$MONGO_PASSWORD" --tls --tlsCAFile mongo.cert --authenticationDatabase admin --host "$MONGO_URL"

You are now inside the Mongo shell. To use the uploaded data, type:

use timeseriesdb

Now you will run the same query on both the time series and non-time series collections. It is a simple query that retrieves the data from a single sensor between two dates. You will add an "explain" function that will show you how the query will be executed by the database. In the Mongo shell type:

db.nontsCollection.explain("executionStats").find({"timestamp":{ $gt: new Date("2024-10-23T00:00:00.000Z"), $lt: new Date("2024-10-23T01:10:00.000Z")   }, "metadata.sensor_id": { $eq: "3405"} })

db.tsCollection.explain("executionStats").find({"timestamp":{ $gt: new Date("2024-10-23T00:00:00.000Z"), $lt: new Date("2024-10-23T01:10:00.000Z")   }, "metadata.sensor_id": { $eq: "3405"} })

Now compare the output of both executions. Interpreting most of it is beyond the scope of this tutorial, but if you look at the docsExamined field, you will see that for the time series collection the database only had to look at 22 documents to find enough matching documents to return the first page of results. Meanwhile, the same query against the non-time series collection had to examine all 1,253,999 documents in the collection (a full collection scan) to retrieve the same result set.
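Because the explain output is long, it can help to pull out only the document count. The sketch below is one possible way to do that, not part of the tutorial project. It reads totalDocsExamined, the summary counter inside executionStats, and searches for executionStats recursively on the assumption that its nesting can differ between collection types and server versions. The helper names and the two stand-in explain objects are hypothetical. Only the two counts are taken from the figures quoted above.

```javascript
// Find the first `executionStats` object anywhere in an explain result.
function findExecutionStats(node) {
  if (node === null || typeof node !== "object") return null;
  if (node.executionStats !== null && typeof node.executionStats === "object") {
    return node.executionStats;
  }
  // Object.values also walks arrays, so nested stage lists are covered.
  for (const value of Object.values(node)) {
    const found = findExecutionStats(value);
    if (found !== null) return found;
  }
  return null;
}

// Return the number of documents examined, or null if it is not present.
function totalDocsExamined(explainOutput) {
  const stats = findExecutionStats(explainOutput);
  return stats !== null && typeof stats.totalDocsExamined === "number"
    ? stats.totalDocsExamined
    : null;
}

// Hand-written stand-ins for explain output. The nesting is illustrative;
// the counts are the ones quoted in this tutorial.
const nonTsExplain = { executionStats: { totalDocsExamined: 1253999 } };
const tsExplain = {
  stages: [{ $cursor: { executionStats: { totalDocsExamined: 22 } } }],
};

console.log(totalDocsExamined(nonTsExplain)); // 1253999
console.log(totalDocsExamined(tsExplain));    // 22
```

In the Mongo shell, you could pass the result of db.tsCollection.find(filter).explain("executionStats") to totalDocsExamined, where filter is the same filter document used in the two queries.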

Clearly, the more documents you have in a non-time series collection, the worse your query performance will get.

A time series collection is automatically indexed by time, so queries on time ranges are very efficient. In addition, time series collections store their metadata values (such as the sensor ID in this case) in a columnar format, which makes queries on one of those items combined with time even more efficient.

Conclusion and Next Steps

In this tutorial you used a Databases for MongoDB instance to store time series data, that is, data where time is a critical factor. You learned that by storing such data in a time series collection you can run much faster queries, because MongoDB stores this type of data more efficiently and can therefore retrieve it much more quickly. You don't even have to create indexes on these fields yourself because the database does it for you.

You can learn more about time series collections in the MongoDB documentation.

Tear down your infrastructure

Your Databases for MongoDB instance incurs charges. After you finish this tutorial, you can remove all the infrastructure by going to the terraform directory of the project and running the following command:

terraform destroy