Can anyone highlight ways in which inter-core communication can be reduced in a NUMA multicore architecture? Case study: the Intel Nehalem microarchitecture.
Minimizing inter-core Communication in a NUMA architecture
Asked by Oyinlade Olumide
The Nehalem processor uses QuickPath Interconnect (QPI) for communication between processors (NUMA nodes, or packages). In a NUMA system each node has its own local memory, which is also accessible to the other nodes in the system. When the working set of a program fits in the L1 cache and is read-only, it doesn't matter much which NUMA node owns the memory.

Communication between NUMA nodes is necessary when a core takes a cache miss and the memory is owned by another node. This doesn't necessarily mean the access is slower than a local one, though: it depends on whether the owning node has the line cached in the cache that sits in front of its local memory, which Intel calls the Last Level Cache (LLC). Access to memory that is local to the core's own node is faster than access to memory owned by another node only if the access misses in the LLC on both nodes. An access that hits in the LLC on another node is faster than going to memory on the local node, because memory is so much slower than the CPU and QPI is optimized for this sort of communication.

Most systems don't bother trying to reduce inter-processor communication because, as you can imagine, it is not an easy problem: it requires setting the affinity of each thread to a core, and setting the affinity of that thread's memory working set to the local memory of the same node.

You can read more about this in Ulrich Drepper's paper "What Every Programmer Should Know About Memory"; search for NUMA. In this paper Drepper refers to QPI as Common System Interface (CSI), which was Intel's name for QPI before it was announced.
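To make the thread-affinity and memory-affinity idea concrete, here is a minimal sketch in Python, assuming a Linux system. It is not tuned for Nehalem specifically. The helper names (`parse_cpulist`, `cpus_of_node`, `pin_to_node`, `touch_pages`) are made up for illustration. The sketch reads a node's CPU list from sysfs, pins the calling thread to those CPUs with `os.sched_setaffinity`, and then touches a buffer so that, under Linux's default local-allocation ("first touch") policy, its pages are likely to be placed on that node. If the sysfs entry is missing (for example in a container), it falls back to the currently allowed CPUs.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_of_node(node):
    """Return the CPUs that belong to a NUMA node, or an empty set if unknown."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    try:
        with open(path) as f:
            return parse_cpulist(f.read())
    except OSError:
        return set()  # no NUMA information exposed on this system


def pin_to_node(node):
    """Restrict the calling thread to the CPUs of one NUMA node (Linux only)."""
    allowed = os.sched_getaffinity(0)
    target = cpus_of_node(node) & allowed
    if not target:
        target = allowed  # assumption: fall back rather than fail
    os.sched_setaffinity(0, target)
    return target


def touch_pages(buf, page_size=4096):
    """Write one byte per page so the pages are faulted in by this thread."""
    for offset in range(0, len(buf), page_size):
        buf[offset] = 1


if __name__ == "__main__":
    cpus = pin_to_node(0)
    working_set = bytearray(16 * 1024 * 1024)  # allocated after pinning
    touch_pages(working_set)
    print("pinned to CPUs:", sorted(cpus))
```

The order matters: pin first, then allocate and touch, so the thread that uses the data is also the one that faults it in. In C the same idea is usually expressed with `sched_setaffinity` plus a NUMA-aware allocator, or with the `numactl` command, which gives explicit control over memory placement instead of relying on first touch.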