Why redis instance gave Out Of Memory issue if server had plenty of memory available?

1.6k Views Asked by At

Redis setting and twemproxy:
We have six instances of Redis(32 bit) version 3.2.3 running on the same server (AWS EC2 instance r3.xlarge) along with Twemproxy. It was running smoothly until few days. Individual instances have started giving OOM issues. Not able to identify exact issue of Out of memory.

Redis conf:

  1. BG save is triggered every half an hour through a script for each instance in sequence. It is turned off in conf file.
  2. Kept maxmemory at 3.5 GB for all six redis instances.
  3. Eviction policy to volatile TTL
  4. Till now the peak usage of each redis instance is around 2.2GB so it has still not touched the 3.5 maxmemory limit.

Things I searched and fixed:

  1. Increased max connections to 65535
  2. vm.overcommit_memory to 1 (Can this be the reason for OOM?)
  3. THP disabled
  4. Increased ulimit

From what is understand if Redis requires memory it should use it from buffer memory. Output of free command:

             total       used       free     shared    buffers    cached
Mem:         30679      27430       3249          0        288      11481
-/+ buffers/cache:      15660      15019
Swap:            0          0          0

Output of top command: top command output

Error log of Redis:

1491:M 05 Jul 21:01:55.090 * Background saving terminated with success
1491:M 05 Jul 21:19:32.121 # Out Of Memory allocating 32774 bytes!


=== REDIS BUG REPORT START: Cut & paste starting from here ===
1491:M 05 Jul 21:19:32.121 # ------------------------------------------------
1491:M 05 Jul 21:19:32.121 # !!! Software Failure. Press left mouse button to continue
1491:M 05 Jul 21:19:32.122 # Guru Meditation: "Redis aborting for OUT OF MEMORY" #server.c:3817
1491:M 05 Jul 21:19:32.122 # (forcing SIGSEGV in order to print the stack trace)
1491:M 05 Jul 21:19:32.122 # ------------------------------------------------
1491:M 05 Jul 21:19:32.123 # Redis 3.2.3 crashed by signal: 11
1491:M 05 Jul 21:19:32.123 # Crashed running the instuction at: 0x80a5c4d
1491:M 05 Jul 21:19:32.123 # Accessing address: 0xffffffff
1491:M 05 Jul 21:19:32.123 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/usr/local/bin/redis-server *:6381(_serverPanic+0x8d)[0x80a5c4d]

Backtrace:
/usr/local/bin/redis-server *:6381(logStackTrace+0x35)[0x80a7675]
/usr/local/bin/redis-server *:6381(sigsegvHandler+0xc2)[0x80a7bf2]
[0xf77dc420]
/usr/local/bin/redis-server *:6381(_serverPanic+0x8d)[0x80a5c4d]
/usr/local/bin/redis-server *:6381(redisOutOfMemoryHandler+0x3b)[0x806807b]
/usr/local/bin/redis-server *:6381(zmalloc+0x61)[0x8072621]
/usr/local/bin/redis-server *:6381(sdsMakeRoomFor+0x185)[0x8070245]
/usr/local/bin/redis-server *:6381(readQueryFromClient+0x87)[0x807b447]
/usr/local/bin/redis-server *:6381(aeProcessEvents+0x291)[0x80644f1]
/usr/local/bin/redis-server *:6381(aeMain+0x2c)[0x806477c]
/usr/local/bin/redis-server *:6381(main+0x454)[0x80614a4]
/lib32/libc.so.6(__libc_start_main+0xf3)[0xf75ddad3]
/usr/local/bin/redis-server *:6381[0x806175c]

------ INFO OUTPUT ------
# Server
redis_version:3.2.3
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:bcabd22adb39b278
redis_mode:standalone
os:Linux 3.13.0-74-generic x86_64
arch_bits:32
multiplexing_api:epoll
gcc_version:4.8.4
process_id:1491
run_id:653209594a744f112fb3c7915d3c349a2c65b912
tcp_port:6381
uptime_in_seconds:26753226
uptime_in_days:309
hz:10
lru_clock:6117348
executable:/usr/local/bin/redis-server
config_file:/etc/redis/redis_6381.conf

# Clients
connected_clients:20
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:2326410216
used_memory_human:2.17G
used_memory_rss:2691825664
used_memory_rss_human:2.51G
used_memory_peak:2340860240
used_memory_peak_human:2.18G
total_system_memory:2105458688
total_system_memory_human:1.96G
used_memory_lua:25600
used_memory_lua_human:25.00K
maxmemory:3758096384
maxmemory_human:3.50G
maxmemory_policy:volatile-ttl
mem_fragmentation_ratio:1.16
mem_allocator:jemalloc-4.0.3

# Persistence
loading:0
rdb_changes_since_last_save:3132
rdb_bgsave_in_progress:0
rdb_last_save_time:1499288515
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:34
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

# Stats
total_connections_received:171113
total_commands_processed:82044062
instantaneous_ops_per_sec:47
total_net_input_bytes:127272847135
total_net_output_bytes:565881208424
instantaneous_input_kbps:20.80
instantaneous_output_kbps:1176.51
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:7334558
evicted_keys:0
keyspace_hits:35375621
keyspace_misses:8547031
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:44027
migrate_cached_sockets:0

# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:11970.49
used_cpu_user:8139.96
used_cpu_sys_children:12063.56
used_cpu_user_children:106498.20

# Commandstats
cmdstat_get:calls=41565702,usec=241468654,usec_per_call=5.81
cmdstat_set:calls=8389482,usec=67020049,usec_per_call=7.99
cmdstat_del:calls=591225,usec=2263300,usec_per_call=3.83
cmdstat_exists:calls=287486,usec=828495,usec_per_call=2.88
cmdstat_sadd:calls=10618626,usec=25885617,usec_per_call=2.44
cmdstat_sismember:calls=771569,usec=897763,usec_per_call=1.16
cmdstat_smembers:calls=174,usec=5667,usec_per_call=32.57
cmdstat_expire:calls=16306131,usec=37164332,usec_per_call=2.28
cmdstat_scan:calls=243377,usec=3670788,usec_per_call=15.08
cmdstat_auth:calls=14946,usec=32970,usec_per_call=2.21
cmdstat_bgsave:calls=14805,usec=295936307,usec_per_call=19988.94
cmdstat_lastsave:calls=75581,usec=188956,usec_per_call=2.50
cmdstat_flushall:calls=6,usec=406220,usec_per_call=67703.33
cmdstat_info:calls=118318,usec=5922315,usec_per_call=50.05
cmdstat_monitor:calls=17,usec=19,usec_per_call=1.12
cmdstat_ttl:calls=3046578,usec=7189430,usec_per_call=2.36
cmdstat_command:calls=39,usec=19330,usec_per_call=495.64

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=394418,expires=353679,avg_ttl=5553166456
hash_init_value: 1473181388

------ CLIENT LIST OUTPUT ------
id=12 addr=127.0.0.1:44210 fd=16 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=13 addr=127.0.0.1:44211 fd=17 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=14 addr=127.0.0.1:44212 fd=18 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=15 addr=127.0.0.1:44213 fd=19 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=16 addr=127.0.0.1:44214 fd=20 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=17 addr=127.0.0.1:44215 fd=21 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=18 addr=127.0.0.1:44219 fd=22 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=set
id=19 addr=127.0.0.1:44220 fd=23 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=20 addr=127.0.0.1:44221 fd=24 name= age=26752669 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=get
id=21 addr=127.0.0.1:44222 fd=25 name= age=26752669 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=2 addr=127.0.0.1:44142 fd=6 name= age=26752852 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=3 addr=127.0.0.1:44199 fd=7 name= age=26752669 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=4 addr=127.0.0.1:44201 fd=8 name= age=26752669 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=5 addr=127.0.0.1:44202 fd=9 name= age=26752669 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=6 addr=127.0.0.1:44203 fd=10 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=7 addr=127.0.0.1:44204 fd=11 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=8 addr=127.0.0.1:44205 fd=12 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=9 addr=127.0.0.1:44207 fd=13 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=10 addr=127.0.0.1:44208 fd=14 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get
id=11 addr=127.0.0.1:44209 fd=15 name= age=26752669 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=get

------ REGISTERS ------
1491:M 05 Jul 21:19:32.131 #
EAX:00000000 EBX:00000000 ECX:f776b420 EDX:00000000
EDI:00000002 ESI:00008006 EBP:00000005 ESP:ff881300
SS :0000002b EFL:ff881300 EIP:080a5c4d CS:00000023
DS :0000002b ES :0000002b FS :00000000 GS:00000063
1491:M 05 Jul 21:19:32.131 # (ff88130f) -> f7010c09
1491:M 05 Jul 21:19:32.131 # (ff88130e) -> 00000ee9
1491:M 05 Jul 21:19:32.131 # (ff88130d) -> 0812a6a0
1491:M 05 Jul 21:19:32.131 # (ff88130c) -> 081280e0
1491:M 05 Jul 21:19:32.131 # (ff88130b) -> 0806807b
1491:M 05 Jul 21:19:32.131 # (ff88130a) -> 00000000
1491:M 05 Jul 21:19:32.131 # (ff881309) -> 00000002
1491:M 05 Jul 21:19:32.131 # (ff881308) -> f7010c20
1491:M 05 Jul 21:19:32.131 # (ff881307) -> e3769a00
1491:M 05 Jul 21:19:32.131 # (ff881306) -> 00000000
1491:M 05 Jul 21:19:32.131 # (ff881305) -> 00008006
1491:M 05 Jul 21:19:32.131 # (ff881304) -> 00000ee9
1491:M 05 Jul 21:19:32.131 # (ff881303) -> 0812a6a0
1491:M 05 Jul 21:19:32.131 # (ff881302) -> 081280e0
1491:M 05 Jul 21:19:32.131 # (ff881301) -> 08131cac
1491:M 05 Jul 21:19:32.131 # (ff881300) -> 00000003

------ FAST MEMORY TEST ------
1491:M 05 Jul 21:19:32.133 # Bio thread for job type #0 terminated
1491:M 05 Jul 21:19:32.134 # Bio thread for job type #1 terminated
*** Preparing to test memory region 200000 (132120576 bytes)
*** Preparing to test memory region 816b000 (49152 bytes)
*** Preparing to test memory region 8200000 (14680064 bytes)
*** Preparing to test memory region 91a5000 (135168 bytes)
*** Preparing to test memory region 9200000 (3972005888 bytes)
*** Preparing to test memory region f5fff000 (8388608 bytes)
*** Preparing to test memory region f6800000 (8388608 bytes)
*** Preparing to test memory region f7000000 (4194304 bytes)
*** Preparing to test memory region f75c3000 (4096 bytes)
*** Preparing to test memory region f776c000 (12288 bytes)
*** Preparing to test memory region f7789000 (8192 bytes)
*** Preparing to test memory region f77d1000 (4096 bytes)
*** Preparing to test memory region f77da000 (4096 bytes)
*** Preparing to test memory region f77db000 (4096 bytes)
*** Preparing to test memory region f7800000 (134217728 bytes)
*** Preparing to test memory region ffa00000 (4194304 bytes)
1

There are 1 best solutions below

0
On

Few learnings -

  • Redis 32-bit instance can use upto 4G of maximum memory. It don't use the swap space or vm overcommit memory if 4G is utilized.

  • We should use Redis 32-bit instance when the actual usage is way less than 4G i.e 2-2.5 G. The space left is utilised by other things like input buffer etc.

  • Recommended max memory is 3G but it still gives OOM near 2.8-3 G of memory in our case. So it should be lesser than this.

  • To monitor usage of redis we should always look at RSS (Resident Set Size).