As developers, we often export data from one source and need to load it into another. I ran into exactly this situation: I had exported data from Notion and wanted to import it into MongoDB Atlas.
To insert data into MongoDB, we will use the pymongo Python package. To install pymongo, use the following command:
pip install pymongo==3.12.3
Note: There are some issues with newer versions of pymongo, so I strongly suggest using version 3.12.3.
I will also use the pandas library. I find it easier to work with pandas than with Python's built-in csv module. You can install pandas the same way we installed pymongo.
pip install pandas
How to read CSV data in Python
To read CSV files, as I mentioned earlier, we will use the pandas library. To read a CSV file we need only 2 lines of code.
import pandas as pd

df = pd.read_csv("CsvFile.csv")  # returns a DataFrame
print(df.head())  # df.head() returns the first 5 records
Now, after reading the CSV file, we need to convert it into a JSON-like format. Why? MongoDB stores data as JSON-like documents, and pymongo accepts Python dicts as documents. So if we convert every row into a dict, we can insert the data into MongoDB directly.
How to convert a pandas DataFrame to a dict
pandas provides a method called to_dict() which, with the "records" argument, converts a DataFrame into a list of dictionaries. We have multiple rows, so we need a list with one dictionary per row.
data_list = df.to_dict("records")
Here, the "records" argument is important. It is what makes to_dict() return a proper list of dicts, one per row, with the column names as keys. The official pandas documentation for DataFrame.to_dict describes the other options.
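To make the shape of the result concrete, here is a minimal sketch that runs the same conversion on a tiny in-memory CSV. The column names and values are made up for illustration; they are not from my Notion export.

```python
import io

import pandas as pd

# Hypothetical two-row CSV standing in for the exported file.
csv_text = "name,category\nNotion,notes\nAtlas,database\n"

df = pd.read_csv(io.StringIO(csv_text))
records = df.to_dict("records")  # one dict per row, keyed by column name

print(records)
```

Each element of the list is a plain dict, which is the shape pymongo expects for a document.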
How to insert data to MongoDB
As I mentioned earlier, we will use pymongo to connect to MongoDB. You can do that using the following code:
from pymongo import MongoClient

with MongoClient(URL) as client:
    db = client.prod  # prod is a database name
    tools = db.tools  # tools is a collection name
    ...
Here, we have used a context manager (the with statement), so we don't need to close the connection manually once the operations are done. We will need the URL of our database, regardless of whether it is local or hosted somewhere.
Now that we have a collection, we have two options for uploading data into it: insert the rows one at a time, or insert all of them at once. First, I will show you how to insert all the data at once.
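One possible way to supply URL without hard-coding credentials is to read it from an environment variable. This is only a sketch: the variable name MONGODB_URL and the helper get_mongo_url are my own hypothetical choices, and the fallback assumes a MongoDB server on the default local port. A hosted Atlas cluster typically gives you a connection string that starts with mongodb+srv:// instead.

```python
import os

# Assumed names: MONGODB_URL is a hypothetical environment variable,
# and the fallback points at a MongoDB server on the default local port.
DEFAULT_LOCAL_URL = "mongodb://localhost:27017"


def get_mongo_url():
    """Return the connection string from the environment, or the local default."""
    return os.environ.get("MONGODB_URL", DEFAULT_LOCAL_URL)


URL = get_mongo_url()
```

The resulting URL can then be passed to MongoClient exactly as in the snippet above.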
How to insert multiple rows at once in MongoDB
To insert multiple rows at once, we can use the insert_many() method:
with MongoClient(URL, connect=False) as client:
    db = client.prod
    tools = db.tools
    result = tools.insert_many(data_list)
If you do not need to do anything else during insertion and have already prepared a list of dicts as we did earlier, this is the way to go.
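If you want to confirm how many rows actually went in, the object returned by insert_many() exposes an inserted_ids list. Below is a small sketch of a wrapper around that idea. The function name insert_all is hypothetical, and the guard is there because insert_many() raises an error when it is given an empty list.

```python
def insert_all(collection, docs):
    """Insert every dict in docs with one insert_many call.

    Returns the number of inserted documents.
    """
    docs = list(docs)
    if not docs:
        # insert_many rejects an empty list, so skip the call entirely.
        return 0
    result = collection.insert_many(docs)
    return len(result.inserted_ids)


# Hypothetical usage inside the with-block shown above:
#     count = insert_all(tools, data_list)
#     print(f"Inserted {count} rows")
```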
How to insert data row by row in MongoDB
To insert data row by row, we can use the insert_one() method. This is useful when we want to change each row before inserting it. For example, suppose we want to add an id to every row.
with MongoClient(URL, connect=False) as client:
    db = client.prod
    tools = db.tools
    for index, data in enumerate(data_list):
        data['id'] = index  # id added to existing data
        tools.insert_one(data)
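The loop above changes the dicts in data_list in place. If you would rather leave the original list untouched, one option is to copy each row before adding the id. The sketch below assumes that preference; insert_with_ids and its start parameter are hypothetical names of my own.

```python
def insert_with_ids(collection, docs, start=0):
    """Insert docs one at a time, adding a sequential "id" field to each.

    Returns the number of documents inserted.
    """
    inserted = 0
    for index, doc in enumerate(docs, start=start):
        # Copy so the caller's dicts are left untouched;
        # insert_one itself adds an "_id" key to the dict it receives.
        row = {**doc, "id": index}
        collection.insert_one(row)
        inserted += 1
    return inserted


# Hypothetical usage inside the with-block shown above:
#     insert_with_ids(tools, data_list)
```

Note that this id field is separate from the _id field that MongoDB assigns to every document automatically.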
There is a lot more we can do with Python and MongoDB; this was just one example. If you want to learn about other MongoDB operations besides insertion, let me know on Twitter or LinkedIn.