Will we see an expected speedup in Chapel if running "inside" VMs?


I'm teaching with Chapel next semester, and we are considering having students program on a VM instead of a physical machine. As part of the class, I want students to be able to see a speedup when using multiple threads. I fear that they won't, because the VM may only emulate multiple hardware threads; a program using many threads would then run no faster than one using a single thread.

Does anyone have any experience with this? Is there any chance I can use a VM instead of a physical device?


BEST ANSWER

We had success with a Virtual Machine! The VM we used for the whole class has:

  • 16 CPUs
  • a 60 GB hard disk
  • 4 GB RAM
  • 3 ESXi hosts

The system also has unlimited IOPS ( input/output operations per second ).
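On such a VM, students can verify the speedup themselves with a small timing test. A minimal sketch ( `stopwatch` is the Chapel 1.32+ name for the older `Timer` type, and `n` is an arbitrary, tunable problem size ):

```chapel
use Time;

config const n = 50_000_000;   // problem size, overridable from the command line
var A: [1..n] real;
var t: stopwatch;              // named 'Timer' in Chapel versions before 1.32

t.start();
for i in 1..n do A[i] = sqrt(i:real);      // serial baseline
t.stop();
const serialTime = t.elapsed();

t.clear();
t.start();
forall i in 1..n do A[i] = sqrt(i:real);   // parallel: one task per available core
t.stop();

writeln("tasks available: ", here.maxTaskPar);
writeln("speedup        : ", serialTime / t.elapsed());
```

Compiling with `chpl speedup.chpl` and running `./speedup --n=100000000` lets students vary the problem size; on a VM whose vCPUs map onto distinct physical cores, the reported speedup for this embarrassingly parallel loop should grow toward `here.maxTaskPar`.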

I recommend this setup to other teachers.


Yes, but any speedup is far less a matter of a syntax constructor than of the problem's achievable ( [SEQ], [PAR] ) re-formulation:


With all due respect, professor, Amdahl's Law works against most naive, merely syntax-decorated parallelization efforts.

Contemporary criticism and re-formulation of Dr. Gene Amdahl's original argument have brought two major extensions into account:

  • an overhead-strict formulation ( not forgetting that going from [SEQ] to [PAR] code-execution always comes at a cost: add-on overhead costs that count heavily against any speedup predicted by an overhead-agnostic model )

  • a principal limit on the granularity of any [PAR]-execution, down at a finite, atomic-transaction level, where any further available resource, even in infinite capacity, cannot improve the overall speed any further, precisely due to the indivisible scheduling "atomicity" of the smallest unit of work
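These two extensions can be written down explicitly ( a sketch with assumed notation: $p$ the parallelizable fraction, $N$ the processor count, $o_N$ the add-on overhead term ):

```latex
% classic Amdahl speedup for a parallelizable fraction p on N processors:
S_{classic}(N) = \frac{1}{ (1-p) + \frac{p}{N} }

% overhead-strict re-formulation: the add-on costs o_N of the
% [SEQ] -> [PAR] transition sit in the denominator and cap
% ( or, for large o_N, even invert ) any achievable speedup:
S_{overhead}(N) = \frac{1}{ (1-p) + \frac{p}{N} + o_N }
```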

Both of these issues will dominate your education efforts far more than the VM abstraction itself. It would indeed be great to discuss in more detail all the impacts of scheduling-"blocking" resources, not just the CPU core(s) and hardware threads ( onto which the O/S schedules work ), be they physical or abstracted by the VM hypervisor.
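In Chapel one can at least ask the runtime what it believes it is scheduling onto ( a sketch using the standard locale queries `numPUs` and `maxTaskPar` ):

```chapel
// What the Chapel runtime detects on this (possibly virtual) machine:
writeln("locale name  : ", here.name);
writeln("physical PUs : ", here.numPUs(logical=false));
writeln("logical PUs  : ", here.numPUs(logical=true));
writeln("max task par : ", here.maxTaskPar);
```

Comparing these numbers against the host's real core count is a quick way to spot a hypervisor that over-reports vCPUs.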

As the great CRAY Chapel team members have already noted many times, real-hardware NUMA issues have a great impact on the final add-on overheads that a high-level syntax will actually inject into the real-platform processing, so the landscape is even wilder.


Virtual Machines:

Better inspect the VM-hypervisor-generated VM-NUMA topology ( hwloc / lstopo ) to decode the VM-CPU-cache architecture that your VM sandboxes will present to any hardware-directed low-level { C | assembly } code. One may imagine many "fooling" effects: for example, the VM may claim to have 8 independent vCPU sockets, each having 4 independent vCPU cores, each of which has a fully separate & autonomous hierarchy of non-shared vCPU caches, no level of which is shared ( in spite of the fact that the host's physical CPU(s) principally operate shared L3 cache(s) ).

All this misdirects any hardware-focused resource-optimiser's decisions ( and performance never goes up when the virtualisation layer has masked the physical properties of the host ).
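A hedged sketch of such an inspection from inside the guest ( assumes the hwloc package, which provides lstopo / hwloc-ls, is installed in the VM ):

```shell
# Print the topology the hypervisor exposes to this guest:
# sockets, cores, hardware threads and the cache hierarchy.
command -v hwloc-ls >/dev/null 2>&1 && hwloc-ls --no-io || true

# Cross-check with what the guest OS itself reports:
nproc                                              # logical CPUs visible to the guest
lscpu | grep -E 'Socket|Core|Thread|L3' || true    # sockets/cores/threads and L3 cache
```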


( One may also use the live platform at https://tio.run for tweaking and prototyping. )