I am referring to this question and its top-voted answer:
Why are elementwise additions much faster in separate loops than in a combined loop?
My question is: is there an easy way to determine the number of bits (call it N) that a specific CPU uses for address aliasing of loads and stores?
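To make N concrete, here is a minimal sketch of what I mean. It assumes the CPU compares only the low N bits of a load address and a store address, so two addresses look aliased when those bits match. The function name `low_bits_match` is made up for illustration, and N = 12 is only an assumption (the 4 KiB figure often cited for Intel CPUs), not something I have measured.

```python
def low_bits_match(addr_a: int, addr_b: int, n_bits: int) -> bool:
    """Return True if the two addresses agree in their low n_bits bits."""
    mask = (1 << n_bits) - 1  # n_bits ones in the low positions
    return (addr_a & mask) == (addr_b & mask)


N = 12  # assumption: the 4 KiB case often cited for Intel CPUs

# Addresses exactly 4096 bytes apart share their low 12 bits.
print(low_bits_match(0x7F001040, 0x7F002040, N))  # True
# Addresses 64 bytes apart differ within the low 12 bits.
print(low_bits_match(0x7F001040, 0x7F001080, N))  # False
```

What I want to find out is the real value of N for a given CPU, rather than guessing it as above.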
At the OS level: no. I'm not aware of any standard OS API (including anything in Linux or Win32) that gives user-space code visibility into the CPU cache.
However, Intel provides some great tools for low-level performance analysis and optimization. For example,