Will we see an expected speedup in Chapel if running "inside" VMs?


I'm teaching with Chapel next semester, and we are considering having students program on a VM instead of a physical machine. As part of the class, I want students to be able to see a speedup when using multiple threads. I fear that they won't, because the VM may only emulate multiple hardware threads; a program using many threads would then run no faster than one using a single thread.

Does anyone have any experience with this? Is there any chance I can use a VM instead of a physical device?


BEST ANSWER

We had success with a Virtual Machine! The VM we used for the whole class has:

  • 16 CPUs
  • a 60 GB hard disk
  • 4 GB RAM
  • 3 ESXi hosts

The system also has unlimited IOPS ( input/output operations per second ).
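On such a VM, students can verify the speedup themselves with a small timing test. A minimal sketch ( `stopwatch` is the Chapel 1.32+ name for the older `Timer` type, and `n` is an arbitrary, tunable problem size ):

```chapel
use Time;

config const n = 50_000_000;   // problem size, overridable from the command line
var A: [1..n] real;
var t: stopwatch;              // named 'Timer' in Chapel versions before 1.32

t.start();
for i in 1..n do A[i] = sqrt(i:real);      // serial baseline
t.stop();
const serialTime = t.elapsed();

t.clear();
t.start();
forall i in 1..n do A[i] = sqrt(i:real);   // parallel: one task per available core
t.stop();

writeln("tasks available: ", here.maxTaskPar);
writeln("speedup        : ", serialTime / t.elapsed());
```

Compiling with `chpl speedup.chpl` and running `./speedup --n=100000000` lets students vary the problem size; on a VM whose vCPUs map onto distinct physical cores, the reported speedup for this embarrassingly parallel loop should grow toward `here.maxTaskPar`.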

I recommend this setup to other teachers.


Yes, but any speedup is far less a matter of a syntax constructor than of the problem's achievable ( [SEQ], [PAR] ) re-formulation:


With all due respect, professor, Amdahl's Law works against most naive, merely syntax-decorated parallelization efforts.

Contemporary criticism and re-formulation of Dr. Gene Amdahl's original argument have brought two major extensions into account:

  • an overhead-strict formulation ( not forgetting that going from [SEQ] to [PAR] code-execution always comes at a cost: add-on overhead costs that count heavily against any speedup predicted by an overhead-agnostic model )

  • a principal limit on the granularity of any [PAR]-execution, down at a finite, atomic-transaction level, where any further available resource, even in infinite capacity, cannot improve the overall speed any further, precisely due to the indivisible scheduling "atomicity" of the smallest unit of work
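These two extensions can be written down explicitly ( a sketch with assumed notation: $p$ the parallelizable fraction, $N$ the processor count, $o_N$ the add-on overhead term ):

```latex
% classic Amdahl speedup for a parallelizable fraction p on N processors:
S_{classic}(N) = \frac{1}{ (1-p) + \frac{p}{N} }

% overhead-strict re-formulation: the add-on costs o_N of the
% [SEQ] -> [PAR] transition sit in the denominator and cap
% ( or, for large o_N, even invert ) any achievable speedup:
S_{overhead}(N) = \frac{1}{ (1-p) + \frac{p}{N} + o_N }
```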

Both of these issues will dominate your education efforts far more than the VM abstraction itself. It would indeed be great to discuss in more detail all the impacts of scheduling-"blocking" resources, not just the CPU core(s) and hardware threads ( onto which the O/S schedules work ), be they physical or abstracted by the VM hypervisor.
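In Chapel one can at least ask the runtime what it believes it is scheduling onto ( a sketch using the standard locale queries `numPUs` and `maxTaskPar` ):

```chapel
// What the Chapel runtime detects on this (possibly virtual) machine:
writeln("locale name  : ", here.name);
writeln("physical PUs : ", here.numPUs(logical=false));
writeln("logical PUs  : ", here.numPUs(logical=true));
writeln("max task par : ", here.maxTaskPar);
```

Comparing these numbers against the host's real core count is a quick way to spot a hypervisor that over-reports vCPUs.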

As the great CRAY Chapel team members have already noted many times, real-hardware NUMA issues have a great impact on the final add-on overheads that a high-level syntax will actually inject into the real-platform processing, so the landscape is even wilder.


Virtual Machines:

Better inspect the VM-hypervisor-generated VM-NUMA topology ( hwloc / lstopo ) to decode the VM-CPU-cache architecture that your VM sandboxes will present to any hardware-directed low-level { C | assembly } code. One may imagine many "fooling" effects: for example, the VM may claim to have 8 independent vCPU sockets, each having 4 independent vCPU cores, each of which has a fully separate & autonomous hierarchy of non-shared vCPU caches, no level of which is shared ( in spite of the fact that the host's physical CPU(s) principally operate shared L3 cache(s) ).

All this misdirects any hardware-focused resource-optimiser's decisions ( and performance never goes up when the virtualisation layer has masked the physical properties of the host ).
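A hedged sketch of such an inspection from inside the guest ( assumes the hwloc package, which provides lstopo / hwloc-ls, is installed in the VM ):

```shell
# Print the topology the hypervisor exposes to this guest:
# sockets, cores, hardware threads and the cache hierarchy.
command -v hwloc-ls >/dev/null 2>&1 && hwloc-ls --no-io || true

# Cross-check with what the guest OS itself reports:
nproc                                              # logical CPUs visible to the guest
lscpu | grep -E 'Socket|Core|Thread|L3' || true    # sockets/cores/threads and L3 cache
```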


( One may also use the live platform at https://tio.run for tweaking and prototyping. )