{"id":3069,"date":"2024-04-03T11:47:47","date_gmt":"2024-04-03T06:47:47","guid":{"rendered":"https:\/\/afzalbadshah.com\/?p=3069"},"modified":"2024-04-03T11:47:49","modified_gmt":"2024-04-03T06:47:49","slug":"data-manipulation-with-mongodb-aggregation-framework-in-python","status":"publish","type":"post","link":"https:\/\/afzalbadshah.com\/index.php\/2024\/04\/03\/data-manipulation-with-mongodb-aggregation-framework-in-python\/","title":{"rendered":"Data Manipulation with MongoDB Aggregation Framework in Python"},"content":{"rendered":"\n<p>The MongoDB Aggregation Framework is a tool for manipulating and analyzing data within MongoDB collections. It processes documents through a pipeline of stages, which lets users perform operations such as grouping, sorting, filtering, and computing aggregate values. In this lab tutorial, we introduce the concepts of the Aggregation Framework and walk through the code section by section to explain what each part does. <a href=\"https:\/\/afzalbadshah.com\/index.php\/category\/courses\/data-driven-applications-using-mongodb-python-and-google-colab\/\" target=\"_blank\" rel=\"noopener\" title=\"\">Visit the detailed tutorial here<\/a>. 
<\/p>\n\n\n\n<p><strong>Code<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pymongo\nfrom pymongo import MongoClient\n\n# Connect to MongoDB (replace with your connection details)\nclient = pymongo.MongoClient(\"mongodb+srv:\/\/user:pass@cluster0.ergtejf.mongodb.net\/?retryWrites=true&amp;w=majority&amp;appName=Cluster0\")\n\n# Switch to the desired database and collection\ndb = client.afzal\ncollection = db.test\n\n# Insert sample data into a collection\ndb.collection.insert_many(&#91;\n    { 'name': 'Afzal', 'age': 25, 'city': 'Islamabad' },\n    { 'name': 'Jalal', 'age': 28, 'city': 'Mianwali' },\n    { 'name': 'Yousaf', 'age': 30, 'city': 'Quetta' },\n    { 'name': 'Ibrahim', 'age': 35, 'city': 'Karachi' },\n])\n\nprint(\"Total documents in collection:\", db.collection.count_documents({}))\n\n# Calculate average age\npipeline_avg_age = &#91;{ '$group': { '_id': None, 'avgAge': { '$avg': '$age' } } }]\navg_age_result = list(db.collection.aggregate(pipeline_avg_age))\nprint(\"Average age:\", avg_age_result&#91;0]&#91;'avgAge'])\n\n# Group by city and count\npipeline_city_count = &#91;{ '$group': { '_id': '$city', 'count': { '$sum': 1 } } }]\ncity_count_result = list(db.collection.aggregate(pipeline_city_count))\nprint(\"City count:\", city_count_result)\n\n# Group by city and find max age\npipeline_max_age = &#91;{ '$group': { '_id': '$city', 'maxAge': { '$max': '$age' } } }]\nmax_age_result = list(db.collection.aggregate(pipeline_max_age))\nprint(\"Max age by city:\", max_age_result)\n\n# Filter documents where age is greater than 25\npipeline_filtered = &#91;{ '$match': { 'age': { '$gt': 25 } } }]\nfiltered_result = list(db.collection.aggregate(pipeline_filtered))\nprint(\"Filtered documents:\", filtered_result)\n\n# Sort documents by age in descending order\npipeline_sorted = &#91;{ '$sort': { 'age': -1 } }]\nsorted_result = list(db.collection.aggregate(pipeline_sorted))\nprint(\"Sorted documents:\", sorted_result)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Connection and 
Database Selection<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We import the <code>pymongo<\/code> package and its <code>MongoClient<\/code> class.<\/li>\n\n\n\n<li>We establish a connection to the MongoDB server using the connection string (replace with your actual connection details).<\/li>\n\n\n\n<li>We specify the desired database (<code>afzal<\/code>) and collection (<code>test<\/code>) within the connected client.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>import pymongo\nfrom pymongo import MongoClient\n\n# Connect to MongoDB (replace with your connection details)\nclient = pymongo.MongoClient(\"mongodb+srv:\/\/user:pass@cluster0.ergtejf.mongodb.net\/?retryWrites=true&amp;w=majority&amp;appName=Cluster0\")\n\n# Switch to the desired database and collection (replace with your database and collection names)\ndb = client.afzal\ncollection = db.test<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Sample Data and Counting Documents<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We insert sample data containing documents with names, ages, and cities.<\/li>\n\n\n\n<li>We use <code>db.collection.insert_many<\/code> to insert multiple documents at once. In PyMongo, attribute access such as <code>db.collection<\/code> addresses a collection literally named <code>collection<\/code>, so these calls do not use the <code>collection<\/code> variable (<code>db.test<\/code>) defined earlier.<\/li>\n\n\n\n<li>We then count the total number of documents in the collection using <code>db.collection.count_documents({})<\/code>. 
The empty dictionary <code>{}<\/code> is a filter that matches all documents.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Insert sample data into a collection\ndb.collection.insert_many(&#91;\n    { 'name': 'Afzal', 'age': 25, 'city': 'Islamabad' },\n    { 'name': 'Jalal', 'age': 28, 'city': 'Mianwali' },\n    { 'name': 'Yousaf', 'age': 30, 'city': 'Quetta' },\n    { 'name': 'Ibrahim', 'age': 35, 'city': 'Karachi' },\n])\n\nprint(\"Total documents in collection:\", db.collection.count_documents({}))\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Calculating Average Age<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We define an aggregation pipeline named <code>pipeline_avg_age<\/code>.<\/li>\n\n\n\n<li>The pipeline consists of a single <code>$group<\/code> stage.\n<ul class=\"wp-block-list\">\n<li><code>_id: None<\/code> places every document in a single group, so the average is computed over the whole collection and the output document&#8217;s <code>_id<\/code> is <code>None<\/code>.<\/li>\n\n\n\n<li><code>$avg<\/code>: This operator calculates the average value of the <code>age<\/code> field and stores it in the <code>avgAge<\/code> field of the output document.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>We use <code>db.collection.aggregate(pipeline_avg_age)<\/code> to execute the pipeline and retrieve the results.<\/li>\n\n\n\n<li>We convert the results to a list using <code>list<\/code> and access the first element (since there&#8217;s only one document in the output) to get the average age stored in the <code>avgAge<\/code> field.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Calculate average age\npipeline_avg_age = &#91;{ '$group': { '_id': None, 'avgAge': { '$avg': '$age' } } }]\navg_age_result = list(db.collection.aggregate(pipeline_avg_age))\nprint(\"Average age:\", avg_age_result&#91;0]&#91;'avgAge'])<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Grouping by City and Counting Documents<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The pipeline groups documents by 
their <code>city<\/code> using <code>$group<\/code>.<\/li>\n\n\n\n<li><code>_id: '$city'<\/code> sets the group identifier to the <code>city<\/code> field value.<\/li>\n\n\n\n<li><code>$sum: 1<\/code> adds 1 for each document in the group, resulting in a count of documents for each city.<\/li>\n\n\n\n<li>We process and print the results similar to the previous step.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Group by city and count\npipeline_city_count = &#91;{ '$group': { '_id': '$city', 'count': { '$sum': 1 } } }]\ncity_count_result = list(db.collection.aggregate(pipeline_city_count))\nprint(\"City count:\", city_count_result)\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Grouping by City and Finding Maximum Age<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The pipeline groups documents by city using <code>$group<\/code> with <code>_id: '$city'<\/code>.<\/li>\n\n\n\n<li><code>$max: '$age'<\/code> calculates the maximum value of the <code>age<\/code> field within each group, storing it in the <code>maxAge<\/code> field of the output document.<\/li>\n\n\n\n<li>We process and print the results as in previous steps.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Group by city and find max age\npipeline_max_age = &#91;{ '$group': { '_id': '$city', 'maxAge': { '$max': '$age' } } }]\nmax_age_result = list(db.collection.aggregate(pipeline_max_age))\nprint(\"Max age by city:\", max_age_result)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Filtering Documents<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We define a pipeline with a single stage using the <code>$match<\/code> operator.<\/li>\n\n\n\n<li><code>$match<\/code> 
filters documents based on a condition. Here, it selects documents where the <code>age<\/code> is greater than (<code>$gt<\/code>) 25.<\/li>\n\n\n\n<li>We process and print the results, showcasing the filtered documents.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Filter documents where age is greater than 25\npipeline_filtered = &#91;{ '$match': { 'age': { '$gt': 25 } } }]\nfiltered_result = list(db.collection.aggregate(pipeline_filtered))\nprint(\"Filtered documents:\", filtered_result)<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Sorting Documents<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The pipeline uses the <code>$sort<\/code> operator to arrange documents.<\/li>\n\n\n\n<li>Here, <code>age: -1<\/code> sorts documents by the <code>age<\/code> field in descending order (highest age first).<\/li>\n\n\n\n<li>We process and print the results, displaying the sorted documents.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code># Sort documents by age in descending order\npipeline_sorted = &#91;{ '$sort': { 'age': -1 } }]\nsorted_result = list(db.collection.aggregate(pipeline_sorted))\nprint(\"Sorted documents:\", sorted_result)\n<\/code><\/pre>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>MongoDB Aggregation Framework is a powerful tool that allows for data manipulation and analysis within MongoDB collections. It provides a flexible and efficient way to process and transform data, enabling users to perform complex operations such as grouping, sorting, filtering, and computing aggregate values. In this lab tutorial, we will introduce the concepts of MongoDB Aggregation Framework, provide a detailed explanation of the code, and walk through each line to understand its functionality. Visit the detailed tutorial here. 
Code Connection&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/afzalbadshah.com\/index.php\/2024\/04\/03\/data-manipulation-with-mongodb-aggregation-framework-in-python\/\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":3075,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[489],"tags":[522,562,502],"class_list":["post-3069","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-driven-applications-using-mongodb-python-and-google-colab","tag-mongodb","tag-mongodb-aggregation-framework","tag-python"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2024\/04\/MongoDB-jpg.webp?fit=1280%2C720&ssl=1","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pf3emP-Nv","jetpack-related-posts":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/posts\/3069","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],
"replies":[{"embeddable":true,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/comments?post=3069"}],"version-history":[{"count":8,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/posts\/3069\/revisions"}],"predecessor-version":[{"id":3078,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/posts\/3069\/revisions\/3078"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/media\/3075"}],"wp:attachment":[{"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/media?parent=3069"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/categories?post=3069"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/tags?post=3069"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}