# Matrix Multiplication on Multi-Processors: MPI4PY

*afzalbadshah.com · May 9, 2024*

In this scenario, each processor handles a portion of the matrices, performs its computation independently, and the partial results are then combined to obtain the final product. This parallelization technique leverages the capabilities of multiple processors to reduce the overall computation time.

**Code:**

```python
from mpi4py import MPI
import numpy as np

# Naive triple-loop matrix multiplication
def matrix_multiply(A, B):
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            for k in range(A.shape[1]):
                C[i][j] += A[i][k] * B[k][j]
    return C

# Initialize MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Master process
if rank == 0:
    # Generate matrices A and B
    # (the row count of A must be divisible by the number of processes)
    A = np.random.rand(2, 2)
    B = np.random.rand(2, 2)

    # Split A row-wise into one chunk per process
    chunk_size = A.shape[0] // size
    A_chunks = [A[i:i+chunk_size] for i in range(0, A.shape[0], chunk_size)]

    # Send one chunk of A, plus the whole of B, to each worker
    for i in range(1, size):
        comm.send(A_chunks[i], dest=i, tag=1)
        comm.send(B, dest=i, tag=2)

    # Compute the master's own chunk
    C_parts = [matrix_multiply(A_chunks[0], B)]

    # Collect the remaining row blocks from the workers, in rank order
    for i in range(1, size):
        C_parts.append(comm.recv(source=i, tag=3))

    # Stack the row blocks to form the full product
    C = np.vstack(C_parts)
    print("Resulting matrix C:")
    print(C)
# Worker processes
else:
    # Receive a chunk of A and the whole of B from the master
    A_chunk = comm.recv(source=0, tag=1)
    B = comm.recv(source=0, tag=2)
    # Multiply and send the resulting row block back to the master
    C_partial = matrix_multiply(A_chunk, B)
    comm.send(C_partial, dest=0, tag=3)
```

## Explanation

**Import the MPI Module and Initialize the MPI Environment**

```python
from mpi4py import MPI
```

This line imports the MPI module from the mpi4py package, enabling the use of MPI functionality.

```python
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
```

These lines initialize the MPI environment. `MPI.COMM_WORLD` is a communicator object representing all processes in the MPI world. `comm.Get_rank()` returns the rank of the current process in the communicator, and `comm.Get_size()` returns the total number of processes in the communicator.

**Function to Perform Matrix Multiplication**

```python
def matrix_multiply(A, B):
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            for k in range(A.shape[1]):
                C[i][j] += A[i][k] * B[k][j]
    return C
```

The function `matrix_multiply` takes two matrices `A` and `B` as input and returns their product `C`. It first allocates a zero matrix `C` whose shape follows from the inputs (rows of `A` by columns of `B`), then fills in each element using the standard triple nested loop over rows, columns, and the shared inner dimension.
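As a quick sanity check (runnable without MPI, and not part of the original program), the hand-written kernel can be compared against NumPy's built-in matrix product:

```python
import numpy as np

def matrix_multiply(A, B):
    # Same naive triple-loop kernel as above
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            for k in range(A.shape[1]):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = np.random.rand(3, 4)
B = np.random.rand(4, 2)
# The result should agree with NumPy's optimized implementation
assert np.allclose(matrix_multiply(A, B), A @ B)
print(matrix_multiply(A, B).shape)  # (3, 2)
```

The kernel is O(n³) and exists here for illustration; in practice `A @ B` (or `np.dot`) would be used for the per-chunk multiplication.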
**Master Process**

```python
if rank == 0:
    A = np.random.rand(2, 2)
    B = np.random.rand(2, 2)
```

In the master process (rank 0), random matrices `A` and `B` of size 2×2 are generated. Note that the row count of `A` must be divisible by the number of processes for the split below to cover every row.

```python
chunk_size = A.shape[0] // size
A_chunks = [A[i:i+chunk_size] for i in range(0, A.shape[0], chunk_size)]
```

Matrix `A` is split row-wise into chunks based on the total number of processes (`size`). Each chunk holds `chunk_size` rows, and the list `A_chunks` contains these chunks.

```python
for i in range(1, size):
    comm.send(A_chunks[i], dest=i, tag=1)
    comm.send(B, dest=i, tag=2)
```

Each worker process `i` is sent its own chunk of `A`, along with the entire matrix `B`, using `comm.send()`; the master keeps `A_chunks[0]` for itself.
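To see exactly how the list comprehension partitions the rows, here is a small standalone illustration (a hypothetical 4×4 matrix and `size = 2`, separate from the MPI program):

```python
import numpy as np

size = 2                          # pretend we have two processes
A = np.arange(16).reshape(4, 4)

chunk_size = A.shape[0] // size   # 2 rows per chunk
A_chunks = [A[i:i+chunk_size] for i in range(0, A.shape[0], chunk_size)]

print(len(A_chunks))     # 2
print(A_chunks[0])       # rows 0-1 of A
print(A_chunks[1])       # rows 2-3 of A
```

If `size` does not evenly divide the number of rows, the comprehension produces more chunks than processes and some rows would never be assigned, which is why the divisibility assumption matters.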
```python
C_parts = [matrix_multiply(A_chunks[0], B)]
```

The master process computes its own partial result of the multiplication using the first chunk of matrix `A`.

```python
for i in range(1, size):
    C_parts.append(comm.recv(source=i, tag=3))
C = np.vstack(C_parts)
```

The master process receives a partial result from each worker using `comm.recv()` and stacks the row blocks in rank order with `np.vstack()` to assemble the final matrix `C`. The blocks are concatenated rather than added, because each process computes a disjoint set of rows of `C`.

```python
print("Resulting matrix C:")
print(C)
```

Finally, the resulting matrix `C` is printed.

**Worker Processes**

```python
else:
    A_chunk = comm.recv(source=0, tag=1)
    B = comm.recv(source=0, tag=2)
```

In the worker processes (rank != 0), a chunk of matrix `A` and the entire matrix `B` are received from the master process using `comm.recv()`.

```python
C_partial = matrix_multiply(A_chunk, B)
```

Each worker process multiplies its received chunk of `A` by `B`, producing one block of rows of `C`.

```python
comm.send(C_partial, dest=0, tag=3)
```

The resulting partial matrix `C_partial` is sent back to the master process using `comm.send()`.
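Why the master must stack rather than sum the partial results can be demonstrated without MPI: each "worker" below multiplies a disjoint row block of `A` by all of `B`, and only concatenation reproduces the full product (standalone sketch, not part of the original post's code):

```python
import numpy as np

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)

# Two "workers": each multiplies its own row block of A by all of B
top    = A[:2] @ B   # rows 0-1 of C
bottom = A[2:] @ B   # rows 2-3 of C

# Concatenating the row blocks reproduces the full product
C = np.vstack([top, bottom])
assert np.allclose(C, A @ B)
```

Summing `top + bottom` would instead collapse the two row blocks onto each other and give a 2×5 matrix with the wrong values.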
## Material

[Download the programs (code) covering MPI4Py.](https://drive.google.com/drive/folders/1vKaZIsBGLhzew2DKmmYfMCSrPAo4jXe-?usp=sharing)

Video: [MPI (MPI4PY) on Jupyter Tutorial (YouTube playlist)](https://www.youtube.com/playlist?list=PLGiqyN7d0mypGRrq2Bv9mXLGoeZPXTcZA)