Per Thread Fast Forwarding #138
The easiest way is to change your code. Normally I would imagine the main thread spawns a bunch of worker threads and then waits idle until all workers finish. You can reorganize your code to move the main thread's work before or after the parallel section; fast-forwarding that part then has no effect on the worker threads (see the sketch below). If the work in the main thread has to happen in parallel with the workers (due to communication, synchronization, etc.), then you probably should not fast-forward it, since that would affect the performance of your region of interest.
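A minimal sketch of that reorganization, assuming the region of interest is marked with the zsim_roi_begin()/zsim_roi_end() hooks from misc/hooks/zsim_hooks.h; the worker function, thread count, and the hoisted setup/teardown work are placeholders.

```cpp
#include <pthread.h>
#include "zsim_hooks.h"   // misc/hooks/zsim_hooks.h: zsim_roi_begin()/zsim_roi_end()

static const int NTHREADS = 8;                    // placeholder worker count

static void* worker(void*) { /* parallel work */ return nullptr; }

int main() {
    pthread_t tids[NTHREADS];

    // Serial main-thread work hoisted before the ROI: it is fast-forwarded and
    // does not perturb the workers.

    zsim_roi_begin();                             // detailed simulation starts here
    for (int i = 0; i < NTHREADS; i++) pthread_create(&tids[i], nullptr, worker, nullptr);
    for (int i = 0; i < NTHREADS; i++) pthread_join(tids[i], nullptr);
    zsim_roi_end();                               // detailed simulation ends here

    // Serial main-thread work hoisted after the ROI: also fast-forwarded.
    return 0;
}
```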
Define a new magic op that, when called by each thread individually, reads out the per-thread cycle count so it can be subtracted later (see the sketch below).
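A minimal application-side sketch, modeled on the conventions in misc/hooks/zsim_hooks.h: the op codes 1040/1041 and the wrapper names are made up for illustration and must not collide with the existing ZSIM_MAGIC_OP_* values. The ZSim side would decode the new ops in the magic-op handler in zsim.cpp, snapshotting the calling thread's cycle count at BEGIN and subtracting the elapsed cycles at END.

```cpp
// Application-side sketch in the style of misc/hooks/zsim_hooks.h (x86-64 only).
// Op codes are hypothetical; pick values that do not clash with existing ones.
#include <stdint.h>

#define ZSIM_MAGIC_OP_IGNORE_BEGIN (1040)
#define ZSIM_MAGIC_OP_IGNORE_END   (1041)

static inline void zsim_magic_op(uint64_t op) {
    // ZSim's Pin tool traps this xchg pattern and reads the op code from rcx.
    __asm__ __volatile__("xchg %%rcx, %%rcx;" : : "c"(op) : "memory");
}

static inline void zsim_ignore_begin() { zsim_magic_op(ZSIM_MAGIC_OP_IGNORE_BEGIN); }
static inline void zsim_ignore_end()   { zsim_magic_op(ZSIM_MAGIC_OP_IGNORE_END); }

// Usage: each thread brackets the calls whose cycles should not be counted.
//   zsim_ignore_begin();
//   some_api_call();
//   zsim_ignore_end();
```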
I am developing a simulator for a new architecture with a new programming model. First, I don't want to impose too many constraints on how the programs should be written. Second, there is a specific reason why I want to ignore cycles spent inside specific functions: cycles spent inside the current implementation of these functions have no real-world deployment significance.
As Heiner suggested above, you can define a new magic op. The current fast-forward is deferred to the end of the phase so it can be synced. Take a look at the logic in
Thanks hlitz and gaomy. I implemented this by counting the total number of cycles spent inside the API calls for each thread, and then simply writing out that count as a per-core statistic, like the regular cycle count. It required changes to simple_core.cpp, ooo_core.cpp, etc., in addition to zsim.cpp and pin_cmd.cpp. I did not add a new magic call. The simulator seems to be working properly with this change. It was a somewhat intrusive change to the simulator, but I guess that is fine as long as it works without slowing down simulation.
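For reference, exposing such a count through the stats tree can follow the ProxyStat pattern the cores already use in their initStats() methods. This is only a sketch: ignoredCycles is a hypothetical uint64_t member accumulated while a thread is inside the marked API calls, and coreStat is the AggregateStat each core already builds for itself.

```cpp
// Inside the core's initStats(AggregateStat* parentStat), next to the existing stats:
ProxyStat* ignoredStat = new ProxyStat();
ignoredStat->init("ignoredCycles", "Cycles spent inside ignored API calls", &ignoredCycles);
coreStat->append(ignoredStat);
```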
If you do that, remember that the whole memory hierarchy still sees what happens during your magic functions. If those functions are very short and/or not memory intensive, then I think it's fine. If that's not the case, I would suggest you leverage the NullCore, a perfect IPC=1 core, to better model what you want. Basically, add one NullCore to your system. When a thread encounters the magic functions, schedule it onto the NullCore; once it finishes those functions, schedule it back onto the OOO core.
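For illustration, the extra core group could be declared in the system config roughly as below. The group names, counts, and cache names are placeholders from a typical ZSim config, and whether the Null group needs the icache/dcache fields should be checked against the core-construction code in init.cpp.

```
sys = {
    cores = {
        ooo    = { type = "OOO";    cores = 1;  icache = "l1i"; dcache = "l1d"; };
        simple = { type = "Simple"; cores = 63; icache = "l1i"; dcache = "l1d"; };
        # Extra IPC=1 core to park a thread on while it executes the magic functions
        null   = { type = "Null";   cores = 1;  icache = "l1i"; dcache = "l1d"; };
    };
};
```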
@benpatsai Thanks, that's a very good point; I had overlooked it. I had tried something similar to your suggestion: I fast-forwarded the thread (the main thread in the code) on entering each magic function and exited fast-forward on leaving each magic function. But the problem I faced was that a different thread got scheduled on the core on which thread 0 was initially running. I want thread 0 to ALWAYS run on the first configured core (which I configure as OoO) and the rest of the threads to run on the other cores, which are configured as simple cores. This happens because thread 1 is created (using pthread_create) after thread 0 encounters the magic call and goes into fast-forward, leaving core 0 free for the taking by thread 1. I think I will face the same issue if I follow your NullCore suggestion, won't I?
So this boils down to how to schedule a particular thread from one core to another. One example implementation can be:
@benpatsai Thanks a lot for your suggestion. However, I think the approach of just measuring and subtracting the cycle count spent in certain functions should work fine for my use case. The code in those functions should not alter caching behavior by much.
@benpatsai I am trying to implement your suggestion of having two core masks and scheduling the main thread between the two cores in its mask. I am not quite sure how to implement point 4 in your answer above. Can you direct me to the relevant code in the simulator that will help me figure it out?
@vijay4454, you can look at process_tree.{cpp,h} to see how the mask is parsed from the config file. And by tracing down ProcessTreeNode.mask, you should be able to learn how/when the scheduler uses it to schedule threads to a set of cores. |
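For reference, the per-process mask is given in the config under the process node, roughly as below; the exact mask-string syntax is best confirmed against the parsing code in process_tree.cpp.

```
process0 = {
    command = "./my_app";   # placeholder command line
    # Cores this process's threads may run on; ends up in ProcessTreeNode.mask.
    # The range syntax shown here is illustrative.
    mask = "0-63";
};
```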
@vijay4454, you may want to take a look at my pending pull request #114, which implements such affinity scheduling. I did it through the standard sched_get/setaffinity syscalls. You can reuse the internal logic with whatever interface you want to use. |
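For completeness, once those affinity syscalls are virtualized as in that pull request, pinning the main thread from inside the application is the standard Linux call. Core 0 below is a placeholder for whichever simulated core the OoO core maps to.

```cpp
// Pin the calling thread to one simulated core via sched_setaffinity(), which the
// referenced pull request virtualizes inside ZSim.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <stdio.h>

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {  // 0 = calling thread
        perror("sched_setaffinity");
    }
}

int main(void) {
    pin_to_core(0);   // keep the main thread on core 0 (the OoO core in this setup)
    // ... create workers, run the region of interest ...
    return 0;
}
```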
Thanks benpatsai & gaomy3832! I have been able to implement this and get it working. |
Hello
I am wondering if there is an easy way to do fast-forwarding for a specific thread (thread 0) in a single-process, multi-threaded simulation. I have a pthread program that I need to simulate on a system with a large core count. One of the cores is OoO while the others are simple in-order cores. I need ZSim to ignore (not count) the cycles spent by the main thread (which runs on the OoO core) in specific functions.
I tried implementing this feature in ZSim by instrumenting the binary and placing handlers before and after those specific functions (whose names I pass to the Pin tool through pin_cmd.cpp). Inside the handler code (which takes the thread ID as an argument), I invoke EnterFastForward() or ExitFastForward() as appropriate. However, I realized that this fast-forwards the entire process, which means it also fast-forwards the other threads besides thread 0.
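For context, the routine-level instrumentation described above looks roughly like this on the Pin side; the handler bodies and the function-name list are simplified placeholders, and in practice the names arrive as a tool option set up in pin_cmd.cpp.

```cpp
// Sketch of per-function instrumentation with Pin's RTN API.
#include "pin.H"
#include <set>
#include <string>

static std::set<std::string> targetFuncs = {"some_api_call"};  // placeholder names

static VOID FuncBefore(THREADID tid) { /* EnterFastForward() is invoked here */ }
static VOID FuncAfter(THREADID tid)  { /* ExitFastForward() is invoked here */ }

static VOID InstrumentRoutine(RTN rtn, VOID* /*v*/) {
    if (targetFuncs.count(RTN_Name(rtn))) {
        RTN_Open(rtn);
        RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)FuncBefore, IARG_THREAD_ID, IARG_END);
        RTN_InsertCall(rtn, IPOINT_AFTER,  (AFUNPTR)FuncAfter,  IARG_THREAD_ID, IARG_END);
        RTN_Close(rtn);
    }
}

// Registered from the tool's main(): RTN_AddInstrumentFunction(InstrumentRoutine, nullptr);
```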
Is there an easy way to get around this problem and fast-forward just thread 0? If not, what would you recommend as the least intrusive/easiest way to do this?
Thanks