
Per Thread Fast Forwarding #138

Open
vijay4454 opened this issue Sep 27, 2016 · 13 comments

Comments

@vijay4454

Hello

I am wondering if there is an easy way to do fast-forwarding for a specific thread (thread 0) in a single-process, multi-threaded simulation. I have a pthread program that I need to simulate on a large core-count system. One of the cores is OoO while the others are simple in-order cores. I need ZSim to ignore (not count) the cycles spent by the main thread (which runs on the OoO core) in specific functions.

I tried implementing this feature in ZSim by instrumenting the binary and placing handlers before and after those specific functions (whose names I pass to the Pin tool through pin_cmd.cpp). Inside the handler code (which takes the thread ID as an argument), I invoke EnterFastForward() or ExitFastForward() as appropriate. However, I realized that this fast-forwards the entire process, which means the other threads besides thread 0 are fast-forwarded as well.
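For reference, a minimal sketch of that kind of routine-level instrumentation using Pin's RTN API; the handler names and the function name "my_api_call" are placeholders, not the actual handlers or names passed through pin_cmd.cpp:

```cpp
#include "pin.H"

// Placeholder analysis routines; in the setup described above they would call
// EnterFastForward()/ExitFastForward(), which (as noted) act on the whole process.
VOID OnFuncEntry(THREADID tid) { /* EnterFastForward(); */ }
VOID OnFuncExit(THREADID tid)  { /* ExitFastForward();  */ }

VOID InstrumentImage(IMG img, VOID* v) {
    // "my_api_call" stands in for one of the function names passed via pin_cmd.cpp.
    RTN rtn = RTN_FindByName(img, "my_api_call");
    if (RTN_Valid(rtn)) {
        RTN_Open(rtn);
        RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)OnFuncEntry, IARG_THREAD_ID, IARG_END);
        // IPOINT_AFTER is only reached on a normal return path.
        RTN_InsertCall(rtn, IPOINT_AFTER,  (AFUNPTR)OnFuncExit,  IARG_THREAD_ID, IARG_END);
        RTN_Close(rtn);
    }
}
// Registered in the tool's main() with IMG_AddInstrumentFunction(InstrumentImage, 0).
```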

Is there an easy way to get around this problem and fast-forward just thread 0? If not, what would you recommend as the least intrusive/easiest way to do this?

Thanks

@gaomy3832
Contributor

The easiest way is to change your code. Normally I would imagine the main thread spawns a bunch of worker threads and then waits idle until all workers finish. You can reorganize your code to move the main thread's work before or after the parallel section; fast-forwarding that part then has no effect on the worker threads. If the work in the main thread has to happen in parallel with the workers (due to communication, synchronization, etc.), then you probably should not fast-forward it, since doing so would affect the performance of your region of interest.

@hlitz

hlitz commented Sep 27, 2016

Define a new magic op that, when called (by each thread individually), reads out the per-thread cycle count so it can be subtracted later.
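For context, a minimal sketch of what the application side of such a magic op could look like, following the xchg %rcx, %rcx convention used by ZSim's hooks in misc/hooks/zsim_hooks.h; the op codes and the described handler behavior are hypothetical additions, not existing ZSim ops:

```cpp
#include <stdint.h>

// Hypothetical op codes; the existing ops (ROI begin/end, etc.) are defined in
// misc/hooks/zsim_hooks.h and dispatched by the magic-op handler in zsim.cpp.
#define MAGIC_OP_SKIP_BEGIN 1100
#define MAGIC_OP_SKIP_END   1101

// ZSim recognizes "xchg %rcx, %rcx" with the op code in rcx as a magic op.
static inline void zsim_magic_op(uint64_t op) {
    __asm__ __volatile__("xchg %%rcx, %%rcx;" : : "c"(op));
}

// Usage in the application, per thread:
//   zsim_magic_op(MAGIC_OP_SKIP_BEGIN);  // handler records this thread's current cycle count
//   uninteresting_api_call();
//   zsim_magic_op(MAGIC_OP_SKIP_END);    // handler accumulates the delta for later subtraction
```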


@vijay4454
Author

vijay4454 commented Sep 28, 2016

I am developing a simulator for a new architecture with a new programming model. First, I don't want to impose too many constraints on how the programs should be written. Second, there is a specific reason why I want to ignore cycles spent inside specific functions: cycles spent inside the current implementation of these functions have no real-world deployment significance.

@gaomy3832
Contributor

As Heiner suggested above, you can define a new magic op. The current fast-forward is deferred to the end of the phase so it can be synced. Take a look at the logic in Join() and TakeBarrier() in zsim.cpp to see how to do an immediate join and leave for a thread (set cids, call sched->join()/leave(), set fPtrs, etc.).
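A rough sketch of that immediate leave/join, assuming the bookkeeping in zsim.cpp (the cids and fPtrs arrays, the nopPtrs/joinPtrs pointer sets) and the Scheduler leave()/join() signatures; check the exact names against your tree, since this is only meant to show the shape of the change:

```cpp
// Hypothetical per-thread fast-forward toggles, modeled on Join()/TakeBarrier() in zsim.cpp.
void EnterPerThreadFF(uint32_t tid) {
    zinfo->sched->leave(procIdx, tid, cids[tid]);  // free the core immediately
    cids[tid] = UNINITIALIZED_CID;                 // or whatever "not scheduled" sentinel zsim.cpp uses
    fPtrs[tid] = nopPtrs;                          // stop feeding loads/stores/bbls for this thread
}

void ExitPerThreadFF(uint32_t tid) {
    cids[tid] = zinfo->sched->join(procIdx, tid);  // get a core back
    fPtrs[tid] = joinPtrs;                         // resume normal instrumentation, as in Join()
}
```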

@vijay4454
Author

Thanks hlitz and gaomy. I implemented this by counting the total number of cycles spent inside the API calls for each thread, and then writing out that count as a per-core statistic alongside the regular cycle count. It required changes to simple_core.cpp, ooo_core.cpp, etc., in addition to zsim.cpp and pin_cmd.cpp. I did not add a new magic call. The simulator seems to be working properly with this change.

It was a somewhat intrusive change to the simulator, but I guess that is OK as long as it works without slowing down simulation.
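For anyone replicating this, a purely illustrative sketch of the bookkeeping; getCoreCycles() and the array bounds are hypothetical stand-ins rather than ZSim APIs, and the real change touched simple_core.cpp, ooo_core.cpp, zsim.cpp, and pin_cmd.cpp:

```cpp
#include <cstdint>

static const uint32_t MAX_THREADS = 1024;  // illustrative bounds
static const uint32_t MAX_CORES   = 1024;

// getCoreCycles(cid) stands in for however the core's current cycle count is read.
extern uint64_t getCoreCycles(uint32_t cid);

static uint64_t apiEntryCycles[MAX_THREADS];  // cycle count at API-call entry, per thread
static uint64_t skippedCycles[MAX_CORES];     // total cycles inside API calls, per core

void OnApiEntry(uint32_t tid, uint32_t cid) {
    apiEntryCycles[tid] = getCoreCycles(cid);
}

void OnApiExit(uint32_t tid, uint32_t cid) {
    // Accumulated here and dumped next to the regular per-core cycle stat,
    // so it can be subtracted in post-processing.
    skippedCycles[cid] += getCoreCycles(cid) - apiEntryCycles[tid];
}
```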

@benpatsai

If you do that, remember that the whole memory hierarchy still sees what happens during your magic functions. If those functions are very short and/or not memory intensive, then I think it's fine.

If that's not the case, I would suggest you leverage the NullCore, a perfect IPC=1 core, to better model what you want. Basically, add one NullCore to your system. When a thread encounters a magic function, schedule it onto the NullCore; once it finishes those functions, schedule it back onto the OOO core.
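For reference, a core setup along these lines might look roughly like the following config fragment (a sketch only: the group names and core counts are illustrative, and the "Null" core type should be checked against init.cpp in your ZSim version):

```
sys = {
    cores = {
        ooo    = { type = "OOO";    cores = 1;  };  // main thread's normal core
        null   = { type = "Null";   cores = 1;  };  // IPC=1 core used while inside the magic functions
        simple = { type = "Simple"; cores = 14; };  // in-order cores for the worker threads
    };
};
```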

@vijay4454
Author

vijay4454 commented Sep 28, 2016

@benpatsai. Thanks, that's a very good point; I had overlooked it.

I had tried something similar to your suggestion: I fast-forwarded the thread (the main thread in the code) on entering each magic function and exited fast-forward on leaving it. The problem I faced was that a different thread then gets scheduled on the core on which thread 0 was initially running. I want thread 0 to ALWAYS run on the first configured core (which I configure as OoO) and the rest of the threads to run on the other cores, which are configured as SimpleCores. This happens because thread 1 is created (using pthread_create) after thread 0 encounters the magic call and goes into fast-forward, leaving core 0 free for thread 1 to take.

Won't I face the same issue if I follow your NullCore suggestion?

@benpatsai

So this boils down to how to move a particular thread from one core to another. One example implementation could be (a rough sketch of step 4 follows below):

  1. Implement a magic op to distinguish the main thread from the other threads (like register thread).
  2. Give the process multiple core masks: one for the OOO core + the NullCore, and one for the in-order cores.
  3. For non-main threads, use the in-order core mask; for the main thread, use the other mask. You can achieve this by setting the mask vector in the ThreadInfo for a thread.
  4. When running into the magic functions, schedule the main thread between those two cores within its mask.
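A very rough sketch of what step 4 could look like as a magic-op handler; setThreadMask() and the mask constants are hypothetical glue around the existing Scheduler leave()/join() calls, not code that exists in ZSim:

```cpp
// Hypothetical handler: move the registered main thread between the OOO core and
// the NullCore, both of which are in its per-thread mask (ThreadInfo.mask).
void SwitchMainThreadCore(uint32_t tid, bool enteringMagicFunction) {
    zinfo->sched->leave(procIdx, tid, cids[tid]);   // leave the core it currently holds
    // Restrict the per-thread mask to the target core (hypothetical helper that
    // rewrites ThreadInfo.mask for this thread).
    setThreadMask(tid, enteringMagicFunction ? NULL_CORE_ONLY : OOO_CORE_ONLY);
    cids[tid] = zinfo->sched->join(procIdx, tid);   // re-join so it lands on the other core
}
```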

@vijay4454
Author

@benpatsai. Thanks a lot for your suggestion. However, I think the approach of just measuring and subtracting the cycle count in certain functions should work fine for my use case. The functions' code should not alter caching effects by much.

@vijay4454
Author

vijay4454 commented Oct 21, 2016

@benpatsai. I am trying to implement your suggestion of having two core masks and scheduling the main thread between the two cores in its mask. I am not quite sure how to implement point 4 in your answer above. Can you direct me to the relevant code in the simulator that would help me figure it out?

@benpatsai

@vijay4454, you can look at process_tree.{cpp,h} to see how the mask is parsed from the config file. By tracing down ProcessTreeNode.mask, you should be able to learn how and when the scheduler uses it to schedule threads onto a set of cores.

@gaomy3832
Contributor

gaomy3832 commented Nov 4, 2016

@vijay4454, you may want to take a look at my pending pull request #114, which implements this kind of affinity scheduling. I did it through the standard sched_getaffinity/sched_setaffinity syscalls. You can reuse the internal logic with whatever interface you want to use.
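With that pull request applied, the placement can be expressed from the application with the standard Linux affinity API; a small example (core indices are illustrative):

```cpp
#include <sched.h>
#include <cstdio>

// Pin the calling thread to one simulated core. With affinity scheduling in ZSim,
// the sched_setaffinity syscall is intercepted and the scheduler restricts the
// thread to that core.
static void pinToCore(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {  // pid 0 == calling thread
        perror("sched_setaffinity");
    }
}

int main() {
    pinToCore(0);  // keep the main thread on the first (OoO) core
    // ... spawn worker threads, each pinning itself to one of the in-order cores ...
    return 0;
}
```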

@vijay4454
Author

Thanks benpatsai & gaomy3832! I have been able to implement this and get it working.
