I'm using mmap to read a large file (50+ GB). The access pattern is mostly random reads, and I know mmap won't load the whole file into memory, only the pages I actually access.
The problem is that, even though the reads are random, I only read each part of the file once, so I'd like the pages I've already accessed to be swapped back to disk so that I can save some memory.
For example, I'd like the OS to swap the pages I've read back to disk once the RSS used by the mapping reaches 1 GB. Is that possible?
Well, don't think of it that way. Let me explain what actually happens.
mmap just maps the file contents into your virtual address space. It looks as if the file has been loaded into memory, because wherever you go in the file, the data appears there as if it had already been read. In reality, the first time you access some part of the mmapped memory, the access traps into the kernel (a page fault: the kernel initially treats the whole mapped segment as a forbidden place), and the kernel reads only the page you are accessing from the file, possibly plus some readahead. It is as if the kernel had read that one page and put it there for you, and you get the effect that the whole file has been read.
But beware: if you are just picking bytes here and there, the kernel will be reading one page at a time, each time you touch a new one, to give you the impression that the whole file is at your disposal.
The mechanism is exactly the same one the kernel uses to swap your process's memory back in. Indeed, this is what happens when your program loads a dynamic shared object, or when the program code is first loaded during the exec(2) system call: only the page immediately needed to start execution is actually read from the file, because the kernel doesn't know whether your program will quickly call code in other places (which will trigger new pages to be brought in) or will remain in that page forever (which makes the swapper happy, because there is no work to be done).
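To make the mechanism above concrete, here is a minimal sketch in C for Linux. The names `mapped_file`, `map_file`, `read_byte`, `release_range` and `unmap_file` are made up for illustration. The two `madvise` hints are my addition rather than something the explanation above depends on: `MADV_RANDOM` asks the kernel not to bother with readahead, and `MADV_DONTNEED` on a read-only file mapping lets it drop the given pages from your process's resident set (they are simply faulted in again from the file if you touch them later).

```c
#define _DEFAULT_SOURCE   /* exposes madvise() and MADV_* on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type: a read-only view of a whole file. */
struct mapped_file {
    unsigned char *base;   /* mapped PROT_READ, so never write through it */
    size_t len;
};

/* Map the file read-only. No file data is read here; the kernel only
 * sets up the address range. Returns 0 on success, -1 on failure. */
int map_file(const char *path, struct mapped_file *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* Hint only: accesses will be random, so readahead is not useful. */
    madvise(p, (size_t)st.st_size, MADV_RANDOM);

    out->base = p;
    out->len = (size_t)st.st_size;
    return 0;
}

/* The first read from a page triggers the page fault described above. */
unsigned char read_byte(const struct mapped_file *m, size_t off)
{
    assert(off < m->len);
    return m->base[off];
}

/* Tell the kernel we are done with [off, off + len). madvise() needs a
 * page-aligned start, so whole pages are dropped; that is harmless for a
 * read-only file mapping because they can be re-read from the file. */
int release_range(const struct mapped_file *m, size_t off, size_t len)
{
    if (off >= m->len)
        return -1;
    if (len > m->len - off)
        len = m->len - off;          /* clamp to the end of the file */

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = off - (off % page);
    return madvise(m->base + start, len + (off - start), MADV_DONTNEED);
}

void unmap_file(struct mapped_file *m)
{
    munmap(m->base, m->len);
    m->base = NULL;
    m->len = 0;
}
```

In your case you would call `map_file` once, read whatever record you need, and call `release_range` on that record's range when you are done with it. There is no need to track RSS yourself or wait for it to hit 1 GB.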