Redis Performance Triage Handbook

Intro

I do a lot of work with growth-stage startups, and many of them use Redis for all sorts of things. Sometimes as a key/value store for caching, sometimes as a message queue, sometimes as a pub/sub message broker, etc. Redis is a great tool, with great performance, when used properly. However, I’ve often seen cases where it is not used with good performance in mind, often to the detriment of system uptime and customer satisfaction. High-performance, customer-facing products literally go down for hours per day due to relatively subtle or seemingly innocent misuses of Redis.

Redis Overview

I won’t go into too much detail about the particulars of Redis commands and capabilities here, since I assume you already know those things and either want to make sure you use them properly or have a performance problem that needs correcting.

From a performance perspective, it’s important to note some things about Redis. The foremost of these is that it is single-threaded.

Redis executes commands on a single thread, by design (with a small exception for background I/O). If you run it on a machine with more than one hyperthread, the additional CPUs/cores/hyperthreads will be wasted: one Redis instance runs commands on one hyperthread (AKA AWS vCPU), no matter how many are present in the machine. If you need more performance, you will have to run multiple Redis instances, one per hyperthread, on different ports. Commands also run strictly one at a time, so while a slow command is executing, every other command waits behind it.

If you are using Redis on Amazon Web Services ElastiCache, this single-threaded nature means that using any AWS ElastiCache instances which have multiple vCPUs makes no sense from a CPU perspective. Your Redis will not use them (though Memcached will). Those larger ElastiCache instances are only useful for Redis if you need the additional memory space.

Performance triage

Look at `SLOWLOG`.

This will tell you which commands are taking a long time to run, and therefore blocking all the other commands. Remember, Redis only does one thing at a time. If you find slow-running commands here, track them down in your code and see if there are ways to speed them up. Also note any appearance of EVALSHA, which denotes a Lua script. While using Lua inside Redis is fine in theory, and gives some nice functionality akin to stored procedures, it can be a performance problem. Consider the code below from the kue.js JavaScript library:

var script =
        'local msg = redis.call( "keys", "' + prefix + ':jobs:*:inactive" )\n\
        local need_fix = 0\n\
        for i,v in ipairs(msg) do\n\
          local queue = redis.call( "zcard", v )\n\
          local jt = string.match(v, "' + prefix + ':jobs:(.*):inactive")\n\
          local pending = redis.call( "LLEN", "' + prefix + ':" .. jt .. ":jobs" )\n\
          if queue > pending then\n\
            need_fix = need_fix + 1\n\
            for j=1,(queue-pending) do\n\
              redis.call( "lpush", "' + prefix + ':"..jt..":jobs", 1 )\n\
            end\n\
          end\n\
        end\n\
        return need_fix';

This code can potentially be very slow, since it uses the KEYS command (among other things). Which leads us to…

No `KEYS` command. Ever.

Even the Redis site says this is for debugging only. This command stops the server while the keyspace is scanned for matches, which can take a really long time.

Warning: consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. Don’t use KEYS in your regular application code. If you’re looking for a way to find keys in a subset of your keyspace, consider using SCAN or sets.

Instead of KEYS, use SCAN, which walks the keyspace in small batches using a cursor and, like all the Redis commands, has good documentation.
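To make the cursor loop concrete, here is a minimal Python sketch. It assumes a client whose scan() method takes cursor, match, and count arguments and returns a (next_cursor, keys) pair, which is the shape redis-py uses. FakeRedis and iter_matching_keys are hypothetical names invented for this illustration, and the stand-in only imitates paging so the example runs without a server.

```python
from fnmatch import fnmatchcase


class FakeRedis:
    """Hypothetical in-memory stand-in exposing only a redis-py style scan()."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def scan(self, cursor=0, match=None, count=10):
        # Look at one page of `count` keys, then apply the pattern to that page.
        page = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._keys):
            next_cursor = 0  # a returned cursor of 0 means the iteration is complete
        if match is not None:
            page = [key for key in page if fnmatchcase(key, match)]
        return next_cursor, page


def iter_matching_keys(client, pattern, count=100):
    """Yield keys matching `pattern`, one small SCAN call at a time."""
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        if cursor == 0:
            break


client = FakeRedis([
    "q:jobs:email:inactive",
    "q:jobs:sms:inactive",
    "q:jobs:email:active",
    "other",
])
found = sorted(iter_matching_keys(client, "q:jobs:*:inactive", count=2))
```

Each call does a bounded amount of work, so other clients get a turn between pages. Against a real server a page can come back empty and a key can be returned more than once, so the caller should tolerate both.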

Don’t open a new connection for every command.

In Redis, establishing a new connection and tearing it down again is expensive. If you are opening a new connection for every command then you might be wasting 95% of your performance doing so. Keeping the connection alive and issuing commands over a persistent connection, in my local tests, is about 25 times faster than opening a new connection for each request. If you want to check for this on your current Redis instance, check the total_connections_received value in your INFO output. If it is high, especially relative to total_commands_processed, then your application(s) might be opening a new connection for every request, and thus wasting a ton of capacity.
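One way to turn that check into a number is to divide commands processed by connections received. The sketch below parses raw INFO text for the total_connections_received and total_commands_processed fields. The function names are hypothetical and the sample values are made up for illustration.

```python
def parse_info(text):
    """Parse the 'field:value' lines of INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as '# Stats'
        name, _, value = line.partition(":")
        fields[name] = value
    return fields


def commands_per_connection(info_text):
    """Average number of commands processed per accepted connection."""
    info = parse_info(info_text)
    connections = int(info["total_connections_received"])
    commands = int(info["total_commands_processed"])
    return commands / connections if connections else float("inf")


# Made-up sample values, for illustration only.
sample = """# Stats
total_connections_received:50000
total_commands_processed:100000
"""
ratio = commands_per_connection(sample)
```

A ratio in the low single digits suggests that clients connect, send a command or two, and disconnect. A healthy pooled client should show a far larger figure.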

In the tests below, the -k 0 flag means do not keep the connection alive (i.e., start a new connection for each command).

$ redis-benchmark -k 0 -t get,set -q
WARNING: keepalive disabled, you probably need 'echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse' for Linux and 'sudo sysctl -w net.inet.tcp.msl=1000' for Mac OS X in order to use a lot of clients/requests
SET: 8142.00 requests per second
GET: 7230.14 requests per second

We can see the huge performance increase if we change the benchmark and reuse the connections.

$ redis-benchmark -k 1 -t get,set -q
SET: 198807.16 requests per second
GET: 193423.59 requests per second

P.S. - If you haven’t used redis-benchmark before, now is a good time to check it out.

Use variadic versions of commands instead of using `SET` in a loop, for example.

If you have a list of key/value pairs you need to set, or some other operation you need to do repeatedly, use the variadic version of the command if it exists. Looping over a list and issuing 100 SET commands is a lot slower than using a single MSET with 100 key/value pairs. This slowness can compound if you are opening and closing a new connection for every command, which should not be the case if you are reusing connections as in the previous step. Check out the example with SET versus MSET.

$ redis-benchmark -q -t set,mset
SET: 158730.16 requests per second
MSET (10 keys): 178253.12 requests per second

This gets even more impressive if we use pipelining…

Pipeline your commands.

In cases where you want to further improve performance, or cannot use variadic commands, Redis supports pipelining, which lets you send a big batch of commands all at once. Doing this can dramatically speed up the number of commands your Redis instance can execute every second. Check out the below examples comparing zero pipelining to pipelining 75 commands.

$ redis-benchmark -k 1 -t get,set -q -P 75
SET: 1587301.50 requests per second
GET: 2127659.75 requests per second

This is a speedup of roughly 200 times for SET, and almost 300 times for GET, over the first benchmark with no connection keepalive. Pipelining can make a massive difference in throughput on your Redis if it is appropriate for your application.
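On the wire, pipelining just means writing several commands back to back before reading any replies. The sketch below encodes commands in the Redis serialization protocol (RESP) and joins 75 of them into one buffer that a client could send with a single write. The function names are hypothetical, and a real client library would handle this encoding for you.

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def encode_pipeline(commands):
    """Concatenate many encoded commands into one buffer for a single write."""
    return b"".join(encode_command(*command) for command in commands)


# 75 SET commands, matching the -P 75 benchmark setting.
batch = [("SET", f"key:{i}", i) for i in range(75)]
payload = encode_pipeline(batch)
```

After sending the payload, the client reads 75 replies in order. The saving comes from paying the network round trip once per batch instead of once per command.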

Now check out the performance when we use connection keepalive, and use a variadic command for setting key/value pairs, and use pipelining.

$ redis-benchmark -q -P 75 -t set,mset
SET: 1754386.00 requests per second
MSET (10 keys): 387596.91 requests per second

Since MSET is sending 10 keys per request, and doing about 390,000 requests per second, we’re setting about 3.9 million keys per second, compared to about 1.75 million for the non-variadic version. That is nearly 500 times faster than the initial version, which set about 8,000 keys per second.
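The arithmetic behind these comparisons can be checked directly. This short Python snippet uses only the SET and MSET figures from the benchmark runs above; the variable names are mine.

```python
# Requests per second copied from the redis-benchmark runs above.
set_no_keepalive = 8142.00    # -k 0
set_keepalive = 198807.16     # -k 1
set_pipelined = 1754386.00    # -P 75
mset_pipelined = 387596.91    # -P 75, 10 keys per request
keys_per_mset = 10

# Each MSET request writes 10 keys, so convert requests/s into keys/s.
mset_keys_per_second = mset_pipelined * keys_per_mset

keepalive_speedup = set_keepalive / set_no_keepalive
pipelined_speedup = set_pipelined / set_no_keepalive
mset_speedup = mset_keys_per_second / set_no_keepalive

print(f"keepalive alone:  {keepalive_speedup:.0f}x")
print(f"plus pipelining:  {pipelined_speedup:.0f}x")
print(f"plus MSET:        {mset_speedup:.0f}x ({mset_keys_per_second:,.0f} keys/s)")
```

These ratios come from one local machine, so treat them as an indication of scale and not as numbers to expect in production.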

Provide expiration for a key along with the `SET` command, not as a separate `EXPIRE` command.

Because Redis is single-threaded, every unnecessary command is a waste and blocks other commands. SET accepts an expiration time/TTL directly (for example, SET key value EX 60 expires the key after 60 seconds), so issuing a SET followed immediately by an EXPIRE is a waste. Avoid using separate EXPIRE commands if at all possible.

Use blocking push/pop commands, where appropriate.

Redis can work well as a basic queuing system, but you don’t want to tax the system for no reason by repeatedly checking the queue to see if there are items present. Instead, on the client, use the blocking versions of the pop commands, such as BLPOP and BRPOP (there is no blocking push, since a push never has to wait). This way the client waits for an item to arrive instead of issuing the standard command over and over again. Polling is a common mistake when building a worker to consume queue items, and results in high CPU usage on the client system in addition to a lot of commands issued on the Redis instance for no reason. These commands take a timeout in seconds; pass a non-zero value unless you want to block forever.
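A worker loop built on a blocking pop might look like the sketch below. It assumes a client whose blpop(keys, timeout) returns a (key, value) pair, or None on timeout, which is the shape redis-py uses. FakeQueueClient and drain are hypothetical names, and the stand-in returns immediately instead of really blocking so the example runs without a server.

```python
class FakeQueueClient:
    """Hypothetical stand-in exposing only a redis-py style blpop()."""

    def __init__(self, items):
        self._items = list(items)

    def blpop(self, keys, timeout=0):
        # A real server would park the client until an item arrives or the
        # timeout passes; this stand-in simply returns None when empty.
        if self._items:
            return keys, self._items.pop(0)
        return None


def drain(client, queue, handle, timeout=5, max_idle_polls=1):
    """Process items until the queue stays empty for `max_idle_polls` timeouts."""
    idle = 0
    processed = 0
    while idle < max_idle_polls:
        reply = client.blpop(queue, timeout=timeout)
        if reply is None:
            idle += 1  # timed out with nothing to do
            continue
        idle = 0
        _key, item = reply
        handle(item)
        processed += 1
    return processed


seen = []
count = drain(FakeQueueClient(["a", "b", "c"]), "jobs", seen.append)
```

With a real client the loop issues one command per item, plus one per timeout while idle, instead of one per polling interval.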

Don’t use multiple databases.

Remember that due to the single-threaded nature of Redis, if one command is running, the next one is blocked. If you use multiple databases inside the same Redis instance, your clients have to issue SELECT commands to choose which database they want, and each of those takes its turn on that single thread. There are other issues surrounding using multiple databases in Redis, and even Salvatore Sanfilippo, the creator of Redis, has said it isn’t a good idea and is a feature he wishes he could remove. If you’re using multiple databases, you’re probably just better off having a prefix in your keyspace to represent the different sets of data.

Don’t send an `AUTH` command with an empty string.

As with SELECT, taking up CPU cycles with an unnecessary AUTH command is not a good idea. If you don’t have any password on your Redis instance, make sure your code and/or the Redis library you are using is not sending an AUTH "" to the Redis server.

If memory usage is getting consistently high, `SCAN` your keyspace.

Redis generally handles key expiration very well, but sometimes it can be necessary to do a manual SCAN through the keyspace in order to expire keys. I wouldn’t do this as a matter of course, since expiration seems to work well in current versions of Redis, but I do know of some customers using ElastiCache Redis instances on AWS that have to do this in order to keep their memory usage under control.

How Redis expires keys:

Redis keys are expired in two ways: a passive way, and an active way. A key is passively expired simply when some client tries to access it, and the key is found to be timed out. Of course, this is not a complete solution as there are expired keys that will never be accessed again. To address this, Redis periodically tests a few keys at random among keys with an expire set. All the keys that are expired are deleted from the keyspace. Specifically, this is what Redis does 10 times per second:
1. Test 20 random keys from the set of keys with an associated expire.
2. Delete all the keys found expired.
3. If more than 25% of keys were expired, start again from step 1.
This is a trivial probabilistic algorithm. Basically, the assumption is that our sample is representative of the whole key space. We continue to expire keys until the percentage of keys that are likely to be expired is under 25%. This means that at any given moment the maximum amount of keys already expired that are using memory is at most equal to the maximum amount of write operations per second divided by 4.
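The three steps quoted above can be simulated in a few lines of Python. This is a sketch of the described algorithm only, not the server’s actual implementation. The function name, the dict of key to expiry timestamp, and the parameter names are assumptions made for illustration.

```python
import random


def active_expire_cycle(expires, now, sample_size=20, threshold=0.25, rng=random):
    """Run one active-expiry cycle over `expires` (key -> expiry timestamp).

    Returns the number of keys deleted. Mutates `expires` in place.
    """
    deleted = 0
    while expires:
        # Step 1: test up to 20 random keys that have an expire set.
        sample = rng.sample(list(expires), min(sample_size, len(expires)))
        # Step 2: delete every sampled key that has already timed out.
        expired = [key for key in sample if expires[key] <= now]
        for key in expired:
            del expires[key]
        deleted += len(expired)
        # Step 3: repeat only while more than 25% of the sample was expired.
        if len(expired) <= threshold * len(sample):
            break
    return deleted


# Every key is already past its expiry, so the cycle keeps looping until
# the whole set has been cleared.
all_stale = {f"k{i}": 0 for i in range(1000)}
removed = active_expire_cycle(all_stale, now=1)
```

When only a small fraction of keys is stale, a cycle usually stops after a single sample, which is why expired keys that are never accessed can linger in memory for a while.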

Fin

Above are some areas to examine if your Redis instance isn’t performing well. In general, you should be getting many thousands - if not millions - of commands per second from your Redis instance. Chances are that your application/site isn’t that busy, so you should be able to use Redis as your data structure server throughout your growth cycle. If you do happen to need additional Redis instances with some kind of load balancing or sharding, consider setting up twemproxy.