lkmokadam/TFIDF_MPI_OpenMP

// lmokada - Laxmikant Kishor Mokadam


Q: Describe your implementation step-by-step. This should include descriptions of what MPI messages get sent/received by which rank, and in what order.

A:
1) The implementation uses one root and many workers.
2) A process is either the root or a worker, never both at the same time.
3) First, the work is distributed by building an assignment array.
4) The assignment array looks like [0, 3, 6]: worker 1 works on doc0, doc1, and doc2; worker 2 works on doc3, doc4, and doc5; worker 3 works on doc6 and doc7.
5) Each worker reads its allocated files, fills the obj and u_w objects, and sends them to the root node.
6) All workers send the data with nonblocking MPI_Isend calls; this lets them keep working on the data instead of waiting for the communication.
7) In parallel, on the root side, the root receives the obj and u_w objects. The root uses blocking receive calls to make sure it has all the data before proceeding with the TFIDF calculation.
8) The number of u_w entries to receive is known by adding up the uw_idx values of all processes, which is obtained with a reduce operation on these variables.
9) The root then gathers all u_w objects with a gather operation, computes numDocsWithWord for each word, updates the u_w objects, and broadcasts the data back to the nodes with a broadcast operation. This gives the nodes all the data required to calculate TFIDF.
10) Each node then calculates the TFIDF values for its words and sends them back to the root.
11) Finally, the root simply opens a file and stores all the TFIDF values in it. (A minimal sketch of this message flow is given after this list.)
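
The skeleton below is a minimal sketch of this message flow. The struct layouts u_w_t and obj_t, the buffer sizes, the message tags, and the fixed document count of 8 are illustrative assumptions rather than the exact definitions from TFIDF.c; file parsing, the merge at the root, and the final gather of TFIDF values are only indicated by comments.

#include <mpi.h>
#include <stdio.h>

#define MAX_WORDS 1024                 /* assumed per-rank capacity */
#define TAG_UW    1
#define TAG_OBJ   2

typedef struct { char word[32]; int numDocsWithWord; } u_w_t;     /* assumed layout */
typedef struct { char word[32]; int docID; double TF; } obj_t;    /* assumed layout */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int numDocs = 8;                   /* e.g. doc0 .. doc7; normally counted from the input dir */
    int workers = size - 1;            /* rank 0 is the root, the rest are workers */
    if (workers < 1) { MPI_Finalize(); return 1; }

    int assignment[workers + 1];       /* steps 3-4: e.g. [0,3,6,8] for 3 workers */
    for (int w = 0; w <= workers; w++)
        assignment[w] = w * numDocs / workers;

    u_w_t u_w[MAX_WORDS];
    obj_t obj[MAX_WORDS];
    int uw_idx = 0, tf_idx = 0;

    MPI_Request req[2] = { MPI_REQUEST_NULL, MPI_REQUEST_NULL };
    if (rank != 0) {
        /* step 5: read docs assignment[rank-1] .. assignment[rank]-1,
           filling obj[0..tf_idx) and u_w[0..uw_idx)  (parsing elided)     */

        /* step 6: nonblocking sends let the worker keep working meanwhile */
        MPI_Isend(u_w, (int)(uw_idx * sizeof(u_w_t)), MPI_BYTE, 0, TAG_UW,  MPI_COMM_WORLD, &req[0]);
        MPI_Isend(obj, (int)(tf_idx * sizeof(obj_t)), MPI_BYTE, 0, TAG_OBJ, MPI_COMM_WORLD, &req[1]);
    }

    /* step 8: the total number of u_w entries is the sum of every rank's uw_idx */
    int total_uw = 0;
    MPI_Reduce(&uw_idx, &total_uw, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        /* step 7: blocking receives guarantee all data has arrived */
        for (int w = 1; w < size; w++) {
            MPI_Status st;
            MPI_Recv(u_w, (int)(MAX_WORDS * sizeof(u_w_t)), MPI_BYTE, w, TAG_UW,  MPI_COMM_WORLD, &st);
            MPI_Recv(obj, (int)(MAX_WORDS * sizeof(obj_t)), MPI_BYTE, w, TAG_OBJ, MPI_COMM_WORLD, &st);
            /* step 9 (first half): merge into a global table, counting numDocsWithWord */
        }
    } else {
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    }

    /* step 9 (second half): give every rank the completed u_w table */
    MPI_Bcast(&total_uw, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(u_w, (int)(total_uw * sizeof(u_w_t)), MPI_BYTE, 0, MPI_COMM_WORLD);

    /* step 10: each worker computes TFIDF = TF * log(numDocs / numDocsWithWord)
       for its obj[] entries and sends the results back to the root,
       which writes them to the output file (step 11).                        */

    MPI_Finalize();
    return 0;
}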

Directions to make and run the extra credit program

Make:  make -f Makefile_extra
RUN:   mpirun -np 4 ./TFIDF_extra
Note: the output is written to output_extra.txt


Q: Describe how you could add more parallelism to your code so that all of the processors on each MPI node are used instead of only one processor per MPI node.
A: 
At first, each node reads individual files. This was a tough decision and took most of my time in the assignment. I tried all three types of file sharing, and this one gave the best results both theoretically and practically. The reason is that documents such as news articles, messages, and tweets are generally very small, perhaps a few hundred words. If we divide a single file among the processors, each process gets very little work, while the overhead caused by message passing (especially latency) and sharing of the data becomes very large. (The choice therefore depends on the application area. I have assumed texts of fewer than 1000 words, such as social-network posts and news articles, since the input files given were very small.)
If we give each thread a whole file, it has enough work to hide the overhead of directory contention. We can also reduce directory contention by using replicas of the files. Thus, in the extra credit program I have used OpenMP threads to parallelize the work across the processors of a single node, and MPI to parallelize the work across different nodes.
Then, in the second part of the program, we calculate the TFIDF value of each word; here we can simply use OpenMP to parallelize the code with a parallel for loop, as sketched below.
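
As an illustration, here is a minimal sketch of that second phase, assuming the TF values and numDocsWithWord counts are already available in flat arrays; the function and array names are placeholders, not the exact code of TFIDF_extra.c.

#include <math.h>
#include <omp.h>

/* Every TFIDF entry depends only on its own TF and the two document counts,
   so the loop iterations are independent and a parallel for is sufficient. */
void compute_tfidf(double *TFIDF, const double *TF,
                   const int *numDocsWithWord, int numDocs, int numEntries)
{
    #pragma omp parallel for
    for (int i = 0; i < numEntries; i++)
        TFIDF[i] = TF[i] * log((double)numDocs / numDocsWithWord[i]);
}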

Q: Implement the additional parallelism you just described. Submit this as TFIDF_extra.c (submitted). Compare this implementation to your MPI implementation.
A: 
This extra credit implementation is able to use the multiple cores available on each node to parallelize the code further. Since we have an Opteron processor with 16 logical threads, I have increased MAX_WORDS_IN_CORPUS by a factor of 16, because the same word can be duplicated when multiple threads read files concurrently; the deduplication is handled when we aggregate the results at the root. Also, to make sure the program works correctly, a few variables are explicitly made shared or private: shared(TFIDF), shared(TF_idx), and shared(uw_idx), as these are the data collectors of the worker program, while int contains and docSize are made private by declaring them inside the OpenMP parallel region. Additionally, as described above, each thread opens its own file to read the data, for the reasons explained above. A hedged sketch of this data-sharing scheme follows.
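
For concreteness, the sketch below shows that data-sharing scheme. The parser stub, the array size, and the critical section that protects the shared index are my assumptions for illustration; they are not copied from TFIDF_extra.c.

#include <omp.h>

#define MAX_WORDS_IN_CORPUS (16 * 1024)          /* assumed size, scaled for the 16-thread case */

double TFIDF[MAX_WORDS_IN_CORPUS];               /* shared collector, as in shared(TFIDF) */
int TF_idx = 0, uw_idx = 0;                      /* shared indices, as in shared(TF_idx) shared(uw_idx) */

static int read_doc(int docID) { return 100 + docID; }   /* stand-in for the real file parser */

void read_assigned_docs(int first_doc, int last_doc)
{
    #pragma omp parallel for shared(TFIDF) shared(TF_idx) shared(uw_idx)
    for (int d = first_doc; d < last_doc; d++) {
        int contains = 0;                /* private: declared inside the parallel region   */
        int docSize  = read_doc(d);      /* each thread opens and reads its own document   */
        int slot;
        #pragma omp critical             /* two threads must not claim the same table slot */
        { slot = TF_idx++; }
        TFIDF[slot] = (double)contains / docSize;   /* placeholder TF value; duplicates merged at the root */
    }
}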


Q: Compare your MPI implementation to the previous MapReduce and Spark implementations of TFIDF. 
A: 
MPI Implementation:     0m0.297s
Spark Implementation:   0m8.825s
Hadoop Implementation:  0m4.833s

Note: time is measured as the total wall-clock time taken to execute, so it includes all the overhead that comes with each framework; for example, the overhead of mpirun is included in the MPI timing.
Here we have implemented the code only for TFIDF; we did not use any higher-level framework like MapReduce or Spark, just the MPI library, and no API design or portability was considered. MapReduce and Spark, in contrast, aim to provide an API for many different application areas, so implementing TFIDF on them goes through the abstraction layer they provide for facilities such as portability and resilience. This extra abstraction increases the execution time.
Comparing with the Hadoop MapReduce implementation, our implementation can itself be viewed as a map-reduce pipeline: reading the files and creating the u_w and obj objects on each node is a map-like operation; filling the u_w objects with the correct numDocsWithWord after the root collects them is a reduce-like operation; calculating the TFIDF values is another map-like operation; and gathering the data on the root and saving it to the file is another reduce-like operation. Since all of these operations are performed in memory, this behaves like an in-memory map-reduce. Compared to the Hadoop MapReduce implementation, this will surely give higher performance because it has no disk I/O: each disk I/O operation comes with a lot of overhead that increases the execution time, so running entirely in memory provides a significant performance advantage.
Comparing this to Spark, which also uses in-memory operations, our code is still much faster. This is because we could not implement such an algorithm in Spark: the limitations of its API and execution model restrict how the implementation can be expressed. In other words, with MPI we do not have to follow the strict rules of a predefined mechanism (such as producing output only as key-value pairs), so we were free to implement and optimize whatever we wanted. This freedom gives us the raw computational performance of MPI, which cannot be achieved through the frameworks.
However, our code does not have any of the resilience that comes built in when using Hadoop MapReduce or Apache Spark, and adding some kind of resilience technique would slow it down. Do we really need it? Yes, if we want to work on very large data, such as the tweets of all Twitter users in a day; at that point we would need to find a compromise between the performance and the reliability of the system. In other words, the various facilities that come with Hadoop/Spark (which we did not use, but which have shaped the internal implementation of the frameworks' APIs) are great for real-world problems; since we are not using them, they are just overhead from our perspective, at least if we compare only the correctness of the programs and not reliability, resource utilization, and so on.
