
Data Manipulation with MongoDB Aggregation Framework in Python
MongoDB Aggregation Framework is a powerful tool that allows for data manipulation and analysis within MongoDB collections. It provides a flexible and efficient way to process and transform data, enabling users to perform complex operations such as grouping, sorting, filtering, and computing aggregate values. In this lab tutorial, we will introduce the concepts of MongoDB Aggregation Framework, provide a detailed explanation of the code, and walk through each line to understand its functionality. Visit the detailed tutorial here.
Code
import pymongo
from pymongo import MongoClient
# Connect to MongoDB
client = pymongo.MongoClient("mongodb+srv://user:pass@cluster0.ergtejf.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0")
# Switch to the desired database
db = client.afzal
collection = db.test
# Insert sample data into a collection
db.collection.insert_many([
{ 'name': 'Afzal', 'age': 25, 'city': 'Islamabad' },
{ 'name': 'Jalal', 'age': 28, 'city': 'Mianwali' }
{ 'name': 'Yousaf', 'age': 30, 'city': 'Quetta' },
{ 'name': 'Ibrahim', 'age': 35, 'city': 'Karachi' },
])
print("Total documents in collection:", db.collection.count_documents({}))
# Calculate average age
pipeline_avg_age = [{ '$group': { '_id': None, 'avgAge': { '$avg': '$age' } } }]
avg_age_result = list(db.collection.aggregate(pipeline_avg_age))
print("Average age:", avg_age_result[0]['avgAge'])
# Group by city and count
pipeline_city_count = [{ '$group': { '_id': '$city', 'count': { '$sum': 1 } } }]
city_count_result = list(db.collection.aggregate(pipeline_city_count))
print("City count:", city_count_result)
# Group by city and find max age
pipeline_max_age = [{ '$group': { '_id': '$city', 'maxAge': { '$max': '$age' } } }]
max_age_result = list(db.collection.aggregate(pipeline_max_age))
print("Max age by city:", max_age_result)
# Filter documents where age is greater than 25
pipeline_filtered = [{ '$match': { 'age': { '$gt': 25 } } }]
filtered_result = list(db.collection.aggregate(pipeline_filtered))
print("Filtered documents:", filtered_result)
# Sort documents by age in descending order
pipeline_sorted = [{ '$sort': { 'age': -1 } }]
sorted_result = list(db.collection.aggregate(pipeline_sorted))
print("Sorted documents:", sorted_result)Connection and Database Selection
Contents
- We import the necessary libraries
pymongoandMongoClient. - We establish a connection to the MongoDB server using the connection string (replace with your actual connection details).
- We specify the desired database (
afzal) and collection (test) within the connected client.
import pymongo
from pymongo import MongoClient
# Connect to MongoDB (replace with your connection details)
client = pymongo.MongoClient("mongodb+srv://user:pass@cluster0.ergtejf.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0")
# Switch to the desired database and collection (replace with your database and collection names)
db = client.afzal
collection = db.testSample Data and Counting Documents
- We insert sample data containing documents with names, ages, and cities.
- We use
db.collection.insert_manyto insert multiple documents at once. - We then count the total number of documents in the collection using
db.collection.count_documents({}). The empty dictionary{}specifies matching all documents.
# Insert sample data into a collection
db.collection.insert_many([
{ 'name': 'Afzal', 'age': 25, 'city': 'Islamabad' },
{ 'name': 'Jalal', 'age': 28, 'city': 'Mianwali' }
{ 'name': 'Yousaf', 'age': 30, 'city': 'Quetta' },
{ 'name': 'Ibrahim', 'age': 35, 'city': 'Karachi' },
])
print("Total documents in collection:", db.collection.count_documents({}))
Calculating Average Age
- We define an aggregation pipeline named
pipeline_avg_age. - The pipeline consists of a single stage using the
$groupoperator._id: Nonediscards the original document’s_idfield in the output.$avg: This operator calculates the average value of theagefield and stores it in theavgAgefield of the output document.
- We use
db.collection.aggregate(pipeline_avg_age)to execute the pipeline and retrieve the results. - We convert the results to a list using
listand access the first element (since there’s only one document in the output) to get the average age stored in theavgAgefield.
# Calculate average age
pipeline_avg_age = [{ '$group': { '_id': None, 'avgAge': { '$avg': '$age' } } }]
avg_age_result = list(db.collection.aggregate(pipeline_avg_age))
print("Average age:", avg_age_result[0]['avgAge'])Grouping by City and Counting Documents
- The pipeline groups documents by their
cityusing$group. _id: '$city'sets the group identifier to thecityfield value.$sum: 1increments a counter for each document in the group, resulting in a count of documents for each city.- We process and print the results similar to the previous step.
# Group by city and find max age
pipeline_max_age = [{ '$group': { '_id': '$city', 'maxAge': { '$max': '$age' } } }]
max_age_result = list(db.collection.aggregate(pipeline_max_age))
print("Max age by city:", max_age_result)
# Group by city and count
pipeline_city_count = [{ '$group': { '_id': '$city', 'count': { '$sum': 1 } } }]
city_count_result = list(db.collection.aggregate(pipeline_city_count))
print("City count:", city_count_result)
Grouping by City and Finding Maximum Age
- The pipeline groups documents by city using
$groupwith_id: '$city'. $max: '$age'calculates the maximum value of theagefield within each group, storing it in themaxAgefield of the output document.- We process and print the results as in previous steps.
pipeline_max_age = [{ '$group': { '_id': '$city', 'maxAge': { '$max': '$age' } } }]
max_age_result = list(db.collection.aggregate(pipeline_max_age))
print("Max age by city:", max_age_result)Filtering Documents
- We define a pipeline with a single stage using the
$matchoperator. $matchfilters documents based on a criteria. Here, it selects documents where theageis greater than ($gt) 25.- We process and print the results, showcasing the filtered documents.
# Filter documents where age is greater than 25
pipeline_filtered = [{ '$match': { 'age': { '$gt': 25 } } }]
filtered_result = list(db.collection.aggregate(pipeline_filtered))
print("Filtered documents:", filtered_result)Sorting Documents
- The pipeline uses the
$sortoperator to arrange documents. - Here,
age: -1sorts documents by theagefield in descending order (highest age first). - We process and print the results, displaying the sorted documents.
# Sort documents by age in descending order
pipeline_sorted = [{ '$sort': { 'age': -1 } }]
sorted_result = list(db.collection.aggregate(pipeline_sorted))
print("Sorted documents:", sorted_result)
6 thoughts on “Data Manipulation with MongoDB Aggregation Framework in Python”
어제 친구들과 회식 자리로강남가라오케추천다녀왔는데, 분위기도 좋고 시설도 깨끗해서 추천할 만했어요.
요즘 회식 장소 찾는 분들 많던데, 저는 지난주에강남가라오케추천코스로 엘리트 가라오케 다녀와봤습니다.
분위기 있는 술자리 찾을 땐 역시강남하퍼추천확인하고 예약하면 실패가 없더라고요.
회사 동료들이랑강남엘리트가라오케방문했는데, VIP룸 덕분에 프라이빗하게 즐길 수 있었어요.
신논현역 근처에서 찾다가강남룸살롱를 예약했는데, 접근성이 좋아서 만족했습니다.
술자리도 좋지만 요즘은강남셔츠룸가라오케이라고 불릴 만큼 서비스가 좋은 곳이 많더군요.