Performance

The main goals for kitty performance are low user-perceived latency while typing, “smoothness” while scrolling, and low CPU usage. kitty tries hard to find an optimum balance among these. To that end, it keeps a cache of each rendered glyph in video RAM so that font rendering is not a bottleneck. Interaction with child programs takes place in a separate thread from rendering, to improve smoothness. Parsing of the byte stream is done using vector CPU instructions for maximum performance. Updates to the screen typically require sending just a few bytes to the GPU.
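To illustrate why a glyph cache takes font rendering off the hot path, here is a minimal Python sketch. It is not kitty's implementation, which keeps rendered glyphs in video RAM. The GlyphCache class, the rasterize callback, and the (char, bold, italic) cache key are hypothetical names chosen only for this example.

```python
class GlyphCache:
    """Illustrative cache: rasterize each distinct glyph once, then reuse it."""

    def __init__(self, rasterize):
        # rasterize is a stand-in for the expensive font rendering step:
        # a callable (char, bold, italic) -> bitmap (hypothetical signature).
        self._rasterize = rasterize
        self._slots = {}   # cache key -> previously rendered bitmap
        self.misses = 0    # number of times the expensive path was taken

    def get(self, char, bold=False, italic=False):
        key = (char, bold, italic)
        if key not in self._slots:
            self.misses += 1
            self._slots[key] = self._rasterize(char, bold, italic)
        return self._slots[key]


if __name__ == "__main__":
    # A fake rasterizer that just returns a label instead of pixels.
    cache = GlyphCache(lambda ch, b, i: f"bitmap({ch!r}, bold={b}, italic={i})")
    for ch in "aaab":
        cache.get(ch)
    print("expensive renders:", cache.misses)
```

Since terminal text reuses a small set of glyphs over and over, almost every lookup in such a scheme is a cache hit.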

There are two config options you can tune to adjust performance: repaint_delay and input_delay. These control the artificial delays introduced into the render loop to reduce CPU usage. See Performance tuning for details. See also the sync_to_monitor option, which can be turned off to further decrease latency at the cost of some screen tearing while scrolling.
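The trade-off behind a repaint delay can be sketched with a simplified model. This is an assumption-laden illustration, not kitty's actual render loop: the count_repaints function is hypothetical, and it assumes that repaints are spaced at least repaint_delay_ms apart and that any update arriving before an already scheduled repaint is folded into it.

```python
def count_repaints(update_times_ms, repaint_delay_ms):
    """Count the frames drawn for a series of screen updates (simplified model)."""
    frames = 0
    last_frame = float("-inf")  # time at which the most recent frame is drawn
    scheduled = None            # time of the frame that is already scheduled
    for t in sorted(update_times_ms):
        if scheduled is not None and t <= scheduled:
            continue  # coalesced into the frame that is already scheduled
        # The next frame cannot be drawn earlier than last_frame + delay.
        frame_time = max(t, last_frame + repaint_delay_ms)
        frames += 1
        last_frame = frame_time
        scheduled = frame_time
    return frames


if __name__ == "__main__":
    updates = range(100)  # one screen update per millisecond for 100 ms
    for delay in (0, 2, 10):
        print(f"delay {delay:>2} ms -> {count_repaints(updates, delay)} repaints")
```

In this model a larger delay means fewer repaints, and hence less work, for the same stream of updates, but each update may wait up to the full delay before it becomes visible.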

Benchmarks

Measuring terminal emulator performance is fairly subtle. There are three main axes on which performance is measured: energy usage for typical tasks, keyboard to screen latency, and throughput (processing large amounts of data).

Keyboard to screen latency

This is measured either with dedicated hardware, or software such as Typometer. Third party measurements comparing kitty with other terminal emulators on various systems show kitty has best in class keyboard to screen latency.

To minimize latency at the expense of more energy usage, use the following settings in kitty.conf:

input_delay 0
repaint_delay 2
sync_to_monitor no
wayland_enable_ime no

Hardware based measurements on macOS show that kitty and Apple’s Terminal.app share the crown for best latency. These measurements were done with input_delay at its default value of 3 ms, which means kitty’s actual numbers would be even lower with the settings above.

Typometer based measurements on Linux show that kitty has far and away the best latency of the terminals tested.

Throughput

kitty has a builtin kitten to measure throughput. It works by dumping large amounts of data of different types into the tty device and measuring how fast the terminal parses and responds to it. The measurements below were taken with the same font, font size and window size for all terminals, with default settings, on the same computer. They clearly show kitty has the fastest throughput. To run the tests yourself, run kitten __benchmark__ in the terminal emulator you want to test. The kitten binary is part of the kitty install.
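The basic idea of a throughput measurement can be sketched in Python as follows. This is not the benchmark kitten's code: measure_throughput is a hypothetical helper that only times how long the writes take, and it assumes 1 MB means 10**6 bytes. A real benchmark must also wait until the terminal has responded, since a write can return as soon as the kernel has buffered the data.

```python
import os
import time


def measure_throughput(fd, payload, repeats=10):
    """Write payload to fd `repeats` times; return MB/s (1 MB = 10**6 bytes)."""
    total = 0
    start = time.monotonic()
    for _ in range(repeats):
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
            total += written
    elapsed = time.monotonic() - start
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Writing to the null device keeps this demo from flooding the terminal.
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        rate = measure_throughput(fd, b"a" * 65536, repeats=100)
        print(f"{rate:.1f} MB/s written to {os.devnull}")
    finally:
        os.close(fd)
```

Passing the file descriptor of a tty instead of the null device would time writes into a terminal, subject to the caveat above.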

The numbers are megabytes per second of data that the terminal processes. Measurements were taken under Linux/X11 with an AMD Ryzen 7 PRO 5850U. Entries are in order of decreasing average performance. On average, kitty is more than twice as fast as the next best terminal.

Terminal           ASCII   Unicode   CSI    Images   Average
kitty 0.33         121.8   105.0     59.8   251.6    134.55
gnometerm 3.50.1   33.4    55.0      16.1   142.8    61.83
alacritty 0.13.1   43.1    46.5      32.5   94.1     54.05
wezterm 20230712   16.4    26.0      11.1   140.5    48.5
xterm 389          47.7    18.3      0.6    56.3     30.72
konsole 23.08.04   25.2    37.7      23.6   23.4     27.48
alacritty+tmux     30.3    7.8       14.7   46.1     24.73

In this table, each column represents a different type of data. The CSI column is for data consisting of a mix of typical formatting escape codes and some ASCII only text.
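The Average column and the ranking can be reproduced from the four per-type columns. The short Python sketch below does so, assuming Average is the plain arithmetic mean of the ASCII, Unicode, CSI and Images values. The numbers are copied from the table and the variable names are illustrative.

```python
# MB/s per data type, in the order: ASCII, Unicode, CSI, Images.
RESULTS = {
    "kitty 0.33": (121.8, 105.0, 59.8, 251.6),
    "gnometerm 3.50.1": (33.4, 55.0, 16.1, 142.8),
    "alacritty 0.13.1": (43.1, 46.5, 32.5, 94.1),
    "wezterm 20230712": (16.4, 26.0, 11.1, 140.5),
    "xterm 389": (47.7, 18.3, 0.6, 56.3),
    "konsole 23.08.04": (25.2, 37.7, 23.6, 23.4),
    "alacritty+tmux": (30.3, 7.8, 14.7, 46.1),
}


def averages(results):
    """Return the arithmetic mean of each terminal's per-type throughput."""
    return {name: sum(values) / len(values) for name, values in results.items()}


avg = averages(RESULTS)
ranked = sorted(avg, key=avg.get, reverse=True)  # fastest first
speedup = avg[ranked[0]] / avg[ranked[1]]        # fastest vs. runner-up

for name in ranked:
    print(f"{name:<18} {avg[name]:7.2f}")
print(f"{ranked[0]} is {speedup:.2f}x faster than {ranked[1]} on average")
```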

Note

By default, the benchmark kitten suppresses actual rendering, to better focus on parser speed. You can pass it the --render flag to not suppress rendering. However, modern terminals typically render asynchronously, so the numbers with rendering enabled are not really useful for comparison: they mostly reflect how much input each terminal batches before rendering the next frame. Even with rendering enabled, kitty is still faster than all the rest. For brevity those numbers are not included.

Note

foot, iTerm2 and Terminal.app are left out as they do not run under X11. Alacritty+tmux is included just to show the effect of putting a terminal multiplexer into the mix (roughly halving throughput) and because alacritty isn't remotely comparable to any of the other terminals feature-wise without tmux.

Note

konsole, gnome-terminal and xterm do not support the Synchronized update escape code used to suppress rendering. If and when they gain support for it, their numbers are likely to improve by 20 - 50%, depending on how well they implement it.
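For readers unfamiliar with synchronized updates, the sketch below shows the general shape of the technique. It assumes the escape code in question is the DEC private mode 2026 pair (set to begin a synchronized update, reset to end it). That assumption is not stated on this page, and wrap_synchronized is a hypothetical helper.

```python
# Assumed escape sequences for synchronized updates (DEC private mode 2026).
BEGIN_SYNC = b"\x1b[?2026h"  # ask the terminal to hold off rendering
END_SYNC = b"\x1b[?2026l"    # allow the terminal to render the final state


def wrap_synchronized(payload: bytes) -> bytes:
    """Wrap payload so a supporting terminal renders it as a single update."""
    return BEGIN_SYNC + payload + END_SYNC


if __name__ == "__main__":
    # Show the wrapped bytes instead of sending them to the terminal.
    print(wrap_synchronized(b"hello\r\n"))
```

A terminal that ignores the mode still parses the payload but renders as it goes, which would explain why its benchmark numbers include rendering cost.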

Energy usage

Sadly, I do not have the infrastructure to measure actual energy usage, so CPU usage will have to stand in for it. Here are some CPU usage numbers for the task of scrolling a file continuously in less. The CPU usage is for the terminal process and X together and is measured using htop. The measurements are taken at the same font and window size for all terminals on an Intel(R) Core(TM) i7-4820K CPU @ 3.70GHz with an Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde XT [Radeon HD 7770/8760 / R7 250X] GPU.

Terminal         CPU usage (X + terminal)
kitty            6 - 8%
xterm            5 - 7% (but scrolling was extremely janky)
termite          10 - 13%
urxvt            12 - 14%
gnome-terminal   15 - 17%
konsole          29 - 31%

As you can see, kitty uses much less CPU than all the other terminals except xterm, but its scrolling “smoothness” is much better than that of xterm (at least to my, admittedly biased, eyes).
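The numbers above were read off htop. For anyone wanting to script a similar measurement on Linux, here is a rough Python sketch that derives a CPU percentage from two samples of /proc/<pid>/stat. It is an alternative to the htop method, not the one used for the table, and the function names are illustrative.

```python
import os


def cpu_seconds_from_stat(stat_line, ticks_per_second):
    """Extract user + system CPU time, in seconds, from a /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may contain spaces, so split after
    # the last ')'. The remainder then starts at field 3 (state).
    rest = stat_line.rsplit(")", 1)[1].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15, in ticks
    return (utime + stime) / ticks_per_second


def read_cpu_seconds(pid):
    """Read the CPU time consumed so far by a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_seconds_from_stat(f.read(), os.sysconf("SC_CLK_TCK"))


def cpu_percent(cpu_before, cpu_after, wall_before, wall_after):
    """CPU usage over an interval, as a percentage of one core."""
    return 100.0 * (cpu_after - cpu_before) / (wall_after - wall_before)
```

To approximate the "X + terminal" figure, one would sample both processes at the start and end of the scrolling run and add the two percentages.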

Instrumenting kitty

You can generate detailed per-function performance data using gperftools. Build kitty with make profile. Run kitty and perform the task you want to analyse, for example, scrolling a large file with less. After you quit, function call statistics will be displayed in KCachegrind. Hence, profiling is best done on Linux, which has these tools easily available.