
CPU resources

Zarquan edited this page Nov 6, 2020 · 3 revisions

The following calculations assume that the example analysis provided by Roger Mor is entirely CPU-bound. In other words, its run time depends only on the available CPU cores and is not limited by disc I/O bandwidth, network bandwidth, or any of the myriad other resource limits and metrics.

That is probably not a valid assumption, but until we have run multiple tests with multiple configurations, we have to treat it as true.

If it takes H cpu hours to complete Roger's analysis, then the number of cores we need depends on how fast we want it to finish: cores = H / time. The examples below use H = 10^5 cpu hours. If we want to complete the whole thing in 1 hour, then we need H vCPU cores.

cpuhr = 10^5
time  = 1hr
cores = 10^5 cores

If we are happy for it to take 20 days, then

cpuhr = 10^5
time  =  20 days
      = 480 hrs
cores = 10^5/480
      ≈ 208 cores (roughly 200)

If we are happy to allocate the whole system to that one analysis, then we need about 200 cores. If we want to be able to support 10 people on the system at the same time, then we need 10 times the resources, about 2000 cores.
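The same arithmetic as a small Python sketch. The function name `cores_needed` and its `users` parameter are illustrative only, not part of any existing tool, and the sketch assumes, as above, that the work is purely CPU-bound and scales perfectly across cores. Rounding up to whole cores gives 209 for the 20-day case, slightly more than the round figure of 200.

```python
import math

# Total cpu hours for Roger's analysis, as used in the examples on this page.
CPU_HOURS = 10**5


def cores_needed(cpu_hours: float, wall_hours: float, users: int = 1) -> int:
    """Cores required to finish `cpu_hours` of work within `wall_hours`,
    for `users` people each running an equivalent workload at once.

    Hypothetical helper: assumes purely CPU-bound work with perfect scaling.
    """
    if wall_hours <= 0:
        raise ValueError("wall_hours must be positive")
    # Round up, because a fractional core still means one more core.
    return math.ceil(users * cpu_hours / wall_hours)


print(cores_needed(CPU_HOURS, 1))                  # finish in 1 hour
print(cores_needed(CPU_HOURS, 20 * 24))            # finish in 20 days, one user
print(cores_needed(CPU_HOURS, 20 * 24, users=10))  # finish in 20 days, ten users
```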

The live system we were using last month had 80 cores and 250G of memory. In theory, a system like that would be able to complete Roger's analysis in about 52 days.

cores = 80
time  = 10^5/80
      = 1250 hours
      = 52 days

The live system deployed a few days ago has 36 cores and 138G of memory.

cores = 36
time  = 10^5/36
      ≈ 2778 hours
      ≈ 116 days
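A minimal sketch of the reverse calculation: wall-clock time for a fixed number of cores, under the same perfect-scaling, CPU-bound assumption. `wall_time` is a hypothetical helper written for illustration, not an existing tool.

```python
# Total cpu hours for Roger's analysis, as used in the examples on this page.
CPU_HOURS = 10**5


def wall_time(cpu_hours: float, cores: int) -> tuple[float, float]:
    """Return (hours, days) to finish `cpu_hours` of work on `cores` cores,
    assuming perfect scaling and no I/O, network or memory bottlenecks."""
    hours = cpu_hours / cores
    return hours, hours / 24


for label, cores in [("previous live system", 80), ("current live system", 36)]:
    hours, days = wall_time(CPU_HOURS, cores)
    print(f"{label}: {cores} cores -> {hours:.0f} hours, {days:.1f} days")
```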

This is what we were planning to offer as a standard user account, leaving enough capacity to support 10 concurrent users. We can change that ratio to allow fewer, larger accounts, or give the whole system to one huge account if we want to.


As of September, we have a total allocation of 1200 vCPU cores (*) and 2.25 Tbytes of memory, spread over three projects.

(*) I think one vCPU core is half of a physical pCPU core.

Divided between the three projects (dev, test and live), each project has 400 vCPU cores and 768 Gbytes of memory. If we allocated all of one project's cores and memory to a single user running Spark, then in theory that user could complete Roger's analysis in about 10 days.

cores = 400
time  = 10^5/400
      = 250 hours
      ≈ 10 days
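A sketch of how the September allocation splits across the three projects, and what one project gives a single user. It assumes an even split and takes 1 Tbyte as 1024 Gbytes, which is what makes 2.25 Tbytes come out at 768 Gbytes per project. The physical-core line relies on the unconfirmed assumption that one vCPU is half a physical core.

```python
# Totals from the September allocation.
TOTAL_VCPU = 1200
TOTAL_MEMORY_GB = 2.25 * 1024  # assumption: 1 Tbyte = 1024 Gbytes
PROJECTS = ["dev", "test", "live"]
CPU_HOURS = 10**5  # cpu hours for Roger's analysis

# Even split across projects (assumption: no project gets a larger share).
vcpu_per_project = TOTAL_VCPU // len(PROJECTS)
memory_per_project = TOTAL_MEMORY_GB / len(PROJECTS)

# One user given a whole project, assuming perfect CPU-bound scaling.
hours = CPU_HOURS / vcpu_per_project

print(f"{vcpu_per_project} vCPU, {memory_per_project:.0f} Gbytes per project")
print(f"one user with a whole project: {hours:.0f} hours, {hours / 24:.1f} days")

# Unconfirmed: if one vCPU is roughly half a physical core.
print(f"~{vcpu_per_project // 2} physical cores per project")
```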

TL;DR: 1200 vCPU cores and 2.25 Tbytes of memory is enough for:

  • 3 projects each capable of completing one instance of Roger's analysis in 10 days
  • 3 projects each (in theory) capable of supporting 10 users running the current Kounkel & Covey example in ~7 min