I am implementing a cache in Go. Let's say the cache could be implemented as a `sync.Map` with an integer key and a struct value:
```go
type value struct {
	fileName     string
	functionName string
}
```
A huge number of records share the same `fileName` and `functionName`. To save memory I want to use a string pool. Go strings are immutable, and my idea looks like this:
```go
var (
	cache      sync.Map
	stringPool sync.Map
)

type value struct {
	fileName     string
	functionName string
}

func addRecord(key int64, val value) {
	fileName, _ := stringPool.LoadOrStore(val.fileName, val.fileName)
	val.fileName = fileName.(string)
	functionName, _ := stringPool.LoadOrStore(val.functionName, val.functionName)
	val.functionName = functionName.(string)
	cache.Store(key, val)
}
```
My idea is to keep every unique string (`fileName` and `functionName`) in memory only once. Will it work?
The cache implementation must be concurrency-safe. The cache holds about 10^8 records; the string pool holds about 10^6 records.
I have some logic that removes records from the cache, so the size of the main cache itself is not a problem.
Could you please suggest how to manage the string pool size?
I am thinking about storing a reference count for every record in the string pool, but that would require additional synchronization, or perhaps global locks, to maintain. I would like the implementation to be as simple as possible; as you can see in my snippet, I don't use any additional mutexes.
Or maybe I need a completely different approach to minimize the memory usage of my cache?
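For reference, the reference-counting idea described above could be sketched roughly as follows. This is a hypothetical illustration, not a recommendation: it uses a single mutex guarding a plain map, and all type and method names (`refPool`, `Acquire`, `Release`) are made up for the example:

```go
package main

import (
	"fmt"
	"sync"
)

// refPool is a hypothetical reference-counted string pool.
// One mutex guards both the map and the counts, which keeps the
// logic simple at the cost of some contention.
type refPool struct {
	mu sync.Mutex
	m  map[string]*refEntry
}

type refEntry struct {
	s    string
	refs int
}

func newRefPool() *refPool {
	return &refPool{m: make(map[string]*refEntry)}
}

// Acquire returns the canonical copy of s and increments its count.
func (p *refPool) Acquire(s string) string {
	p.mu.Lock()
	defer p.mu.Unlock()
	e, ok := p.m[s]
	if !ok {
		e = &refEntry{s: s}
		p.m[s] = e
	}
	e.refs++
	return e.s
}

// Release decrements the count and evicts the entry when it reaches zero.
func (p *refPool) Release(s string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if e, ok := p.m[s]; ok {
		e.refs--
		if e.refs <= 0 {
			delete(p.m, s)
		}
	}
}

func main() {
	p := newRefPool()
	a := p.Acquire("main.go")
	b := p.Acquire("main.go")
	fmt.Println(a == b) // both callers get the canonical copy
	p.Release(a)
	p.Release(b)
	fmt.Println(len(p.m)) // entry evicted after the final release
}
```

The cost of this design is exactly the extra synchronization the question wants to avoid: every record removal must also call `Release` for each interned string.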
What you are trying to do with `stringPool` is commonly known as string interning. There are libraries like github.com/josharian/intern that provide "good enough" solutions to that kind of problem, and that do not require you to manually maintain the `stringPool` map. Note that no solution (including yours, assuming you eventually remove some elements from `stringPool`) can reliably deduplicate 100% of strings without incurring impractical levels of CPU overhead.

As a side note, it's worth pointing out that `sync.Map` is not really designed for update-heavy workloads. Depending on the keys used, you may actually experience significant contention when calling `cache.Store`. Furthermore, since `sync.Map` relies on `interface{}` for both keys and values, it normally incurs many more allocations than a plain `map`. Make sure to benchmark with realistic workloads to ensure that you picked the right approach.
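If `cache.Store` contention does turn out to be a problem, one common alternative to `sync.Map` for update-heavy workloads is a set of plain maps sharded by key, each guarded by its own mutex. A rough sketch, where `shardCount` and the type names are arbitrary choices for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// shardCount is an assumed tuning knob; benchmark to pick a real value.
const shardCount = 16

type value struct {
	fileName     string
	functionName string
}

type shard struct {
	mu sync.Mutex
	m  map[int64]value
}

// shardedCache spreads keys over independently locked plain maps,
// so writers to different shards never contend with each other.
type shardedCache struct {
	shards [shardCount]shard
}

func newShardedCache() *shardedCache {
	c := &shardedCache{}
	for i := range c.shards {
		c.shards[i].m = make(map[int64]value)
	}
	return c
}

func (c *shardedCache) shardFor(key int64) *shard {
	return &c.shards[uint64(key)%shardCount]
}

func (c *shardedCache) Store(key int64, v value) {
	s := c.shardFor(key)
	s.mu.Lock()
	s.m[key] = v
	s.mu.Unlock()
}

func (c *shardedCache) Load(key int64) (value, bool) {
	s := c.shardFor(key)
	s.mu.Lock()
	v, ok := s.m[key]
	s.mu.Unlock()
	return v, ok
}

func main() {
	c := newShardedCache()
	c.Store(42, value{fileName: "main.go", functionName: "main"})
	v, ok := c.Load(42)
	fmt.Println(ok, v.fileName)
}
```

A plain `map[int64]value` also avoids the `interface{}` boxing that `sync.Map` forces on every key and value, which matters at 10^8 records; as above, only benchmarks on your workload can confirm which variant wins.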