I have been taking a brief look at the Forth programming language. Is it possible to do multithreading with synchronization primitives in Forth?
For example, is it possible to do n-by-n matrix multiplication with multiple threads in Forth? If so, what are the basic mechanisms or programming patterns?
For now, the Forth standard doesn't specify any multithreading or multitasking words. However, many historic Forth implementations have such primitives, or allow you to define them using the Forth assembler or the API of the underlying system.
For example, synchronization primitives and multithreading in SP-Forth/4 are mostly just generic wrappers over the Windows and Linux (pthreads) APIs.
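To give an idea of what such wrappers sit on top of, here is a minimal C sketch of the underlying pthreads calls: several threads increment a shared counter, and a mutex keeps the updates from colliding. The names `bump_counter`, `run_counter_demo`, `NUM_THREADS` and `INCREMENTS` are made up for illustration; a Forth system would expose the same operations under its own words, which are not shown here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Illustrative constants, not taken from any Forth system. */
enum { NUM_THREADS = 4, INCREMENTS = 100000 };

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: bump the shared counter INCREMENTS times,
   taking the mutex around every single update. */
static void *bump_counter(void *unused)
{
    (void)unused;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Start NUM_THREADS threads, wait for all of them,
   and return the final value of the counter. */
long run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];

    counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, bump_counter, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

With the lock in place, `run_counter_demo()` should always return `NUM_THREADS * INCREMENTS`; without it, some increments would usually be lost.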
Note that a thread pool should be used to get better performance for small operations, since creating and destroying a thread can be time-consuming.
Also, an implementation of n-by-n matrix multiplication may gain more from using SSE operations, or even a GPU (see gpu.js for example).
In any case, the solution depends on the particular Forth system.
Example (conceptual model)
Using matrices and thread-pool libraries, matrix multiplication could look like the following:
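The exact Forth words depend on which libraries are used, so as a language-neutral sketch of the same pattern, here it is in C with plain pthreads. A fixed set of worker threads stands in for the thread pool (a real pool would keep its threads alive between calls), and each worker computes its own stripe of rows of the result. The names `stripe`, `multiply_stripe`, `parallel_multiply` and `NUM_WORKERS` are hypothetical, and matrices are assumed to be stored row-major in flat arrays.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { NUM_WORKERS = 4 };   /* illustrative worker count */

/* Work description for one worker: rows [first_row, last_row) of c = a * b. */
typedef struct {
    const double *a, *b;    /* n-by-n inputs, row-major */
    double *c;              /* n-by-n output, row-major */
    int n;
    int first_row, last_row;
} stripe;

/* Worker body: compute the assigned rows of the product.
   No lock is needed, because the stripes write to disjoint rows. */
static void *multiply_stripe(void *arg)
{
    const stripe *s = arg;
    for (int i = s->first_row; i < s->last_row; i++)
        for (int j = 0; j < s->n; j++) {
            double sum = 0.0;
            for (int k = 0; k < s->n; k++)
                sum += s->a[i * s->n + k] * s->b[k * s->n + j];
            s->c[i * s->n + j] = sum;
        }
    return NULL;
}

/* Split the rows among NUM_WORKERS threads and wait for all of them.
   pthread_join is the only synchronization point. */
void parallel_multiply(const double *a, const double *b, double *c, int n)
{
    pthread_t workers[NUM_WORKERS];
    stripe stripes[NUM_WORKERS];

    for (int w = 0; w < NUM_WORKERS; w++) {
        stripes[w] = (stripe){ a, b, c, n,
                               w * n / NUM_WORKERS,
                               (w + 1) * n / NUM_WORKERS };
        int rc = pthread_create(&workers[w], NULL, multiply_stripe, &stripes[w]);
        assert(rc == 0);
        (void)rc;
    }
    for (int w = 0; w < NUM_WORKERS; w++)
        pthread_join(workers[w], NULL);
}
```

Splitting by rows keeps the workers independent, so the mutex from the general case is not needed here. In a Forth system the same structure would apply: hand each task a row range, start the tasks, then wait for all of them to finish.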