In my program I create 5 vectors, each with 1 million elements. When I compile my program with -O3 optimization, it takes around 2 GB while running. However, if I compile with -O3 optimization and link with the tcmalloc library provided by google-perftools, the maximum resident set size is only 1.5 GB. Can someone please explain why this happens? Is linking against tcmalloc always better than linking against glibc malloc?
Why does linking with tcmalloc reduce my memory usage by 500MB?
Asked by Shubham
1 Answer
`tcmalloc` is page-oriented, meaning that the internal unit of measure is usually pages rather than bytes. This makes it easier to reduce fragmentation and to increase locality in various ways. `tcmalloc` defines a page as 8192 bytes, which is actually 2 pages on most Linux systems (where the OS page size is 4096 bytes).
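As a rough illustration of what counting in pages means, here is a minimal sketch that rounds a byte count up to whole 8192-byte pages. The constant and helper names are my own for illustration and are not copied from the tcmalloc source; the 8192-byte page size is the only value taken from the description above.

```cpp
#include <cstddef>

// Assumption: 8192-byte allocator pages, as described above (2^13 bytes).
constexpr std::size_t kPageShift = 13;
constexpr std::size_t kPageSize = std::size_t{1} << kPageShift;

// Hypothetical helper: number of allocator pages needed to hold `bytes`,
// rounding up to the next whole page.
constexpr std::size_t pages_for(std::size_t bytes) {
    return (bytes + kPageSize - 1) >> kPageShift;
}

// Hypothetical helper: how many OS pages one allocator page spans,
// given the OS page size in bytes (4096 on most Linux systems).
constexpr std::size_t os_pages_per_allocator_page(std::size_t os_page_size) {
    return kPageSize / os_page_size;
}
```

So a request of 8193 bytes occupies two allocator pages, and each allocator page covers two 4096-byte OS pages.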
Chunks can be thought of as divided into two top-level categories. "Small" chunks are smaller than `kMaxPages` pages (defaults to 128) and are further divided into size classes and satisfied by the thread caches or the central per-size-class caches. "Large" chunks are >= `kMaxPages` pages and are always satisfied by the central `PageHeap`.
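To make the split concrete, here is a small sketch that applies the rule exactly as stated above: round the request up to 8192-byte pages and compare against `kMaxPages` = 128. The `classify` function and `ChunkKind` enum are hypothetical names for illustration, and the real allocator's cutoffs may be more nuanced than this.

```cpp
#include <cstddef>

// Values taken from the description above; names other than kMaxPages are illustrative.
constexpr std::size_t kPageSize = 8192;
constexpr std::size_t kMaxPages = 128;

enum class ChunkKind { Small, Large };

// Hypothetical helper: classify a request by its size in whole pages.
constexpr ChunkKind classify(std::size_t bytes) {
    const std::size_t pages = (bytes + kPageSize - 1) / kPageSize;
    return pages < kMaxPages ? ChunkKind::Small : ChunkKind::Large;
}
```

Under this rule, the boundary sits at 128 * 8192 bytes = 1 MiB. If your vectors hold 8-byte elements (an assumption, since the question does not say), one million elements is 8,000,000 bytes, or 977 pages, so each vector's buffer would count as a "large" chunk and come from the central `PageHeap`.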
More here: http://jamesgolick.com/2013/5/19/how-tcmalloc-works.html
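If you want to compare the two allocators on your own workload, one option is to read the peak resident set size from inside the process with `getrusage`. The sketch below is Linux-specific (there `ru_maxrss` is reported in kilobytes), and `touch_vectors` is a hypothetical stand-in for your program that assumes `double` elements.

```cpp
#include <sys/resource.h>

#include <cstddef>
#include <vector>

// Peak resident set size of this process; kilobytes on Linux.
// Returns -1 if getrusage fails.
long peak_rss_kb() {
    struct rusage usage {};
    if (getrusage(RUSAGE_SELF, &usage) != 0) return -1;
    return usage.ru_maxrss;
}

// Hypothetical workload: `count` vectors of `n` doubles, each element
// written so the memory is actually touched. Returns the element total.
std::size_t touch_vectors(std::size_t count, std::size_t n) {
    std::vector<std::vector<double>> vs(count, std::vector<double>(n, 1.0));
    std::size_t total = 0;
    for (const auto& v : vs) total += v.size();
    return total;
}
```

Call `touch_vectors(5, 1000000)` and then print `peak_rss_kb()`, once in a plain build and once in a build linked with `-ltcmalloc`, and compare the two numbers.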