The terminal graphics protocol

The goal of this specification is to create a flexible and performant protocol that allows the program running in the terminal, hereafter called the client, to render arbitrary pixel (raster) graphics to the screen of the terminal emulator. The major design goals are:

  • Should not require terminal emulators to understand image formats.

  • Should allow specifying graphics to be drawn at individual pixel positions.

  • The graphics should integrate with the text, in particular it should be possible to draw graphics below as well as above the text, with alpha blending. The graphics should also scroll with the text, automatically.

  • Should use optimizations when the client is running on the same computer as the terminal emulator.

For some discussion regarding the design choices, see #33.

To see a quick demo, inside a kitty terminal run:

kitty +kitten icat path/to/some/image.png

You can also see a screenshot with more sophisticated features such as alpha-blending and text over graphics.

Demo of graphics rendering in kitty

Some programs and libraries that use the kitty graphics protocol:

  • - a terminal PDF/DJVU/CBR viewer

  • ranger - a terminal file manager, with image previews, see this PR

  • kitty-diff - a side-by-side terminal diff program with support for images

  • pixcat - a third party CLI and python library that wraps the graphics protocol

  • neofetch - A command line system information tool

  • viu - a terminal image viewer

  • glkitty - C library to draw OpenGL shaders in the terminal with a glgears demo

  • - Library for drawing graphics

  • timg - a terminal image and video viewer

  • notcurses - C library for terminal graphics with bindings for C++, Rust and Python

  • rasterm - Go library to display images in the terminal

Getting the window size

In order to know what size of images to display and how to position them, the client must be able to get the window size in pixels and the number of cells per row and column. This can be done by using the TIOCGWINSZ ioctl. Some code to demonstrate its use:

In C:

#include <stdio.h>
#include <sys/ioctl.h>

int main(int argc, char **argv) {
    struct winsize sz;
    ioctl(0, TIOCGWINSZ, &sz);
    printf("number of rows: %i, number of columns: %i, screen width: %i, screen height: %i\n", sz.ws_row, sz.ws_col, sz.ws_xpixel, sz.ws_ypixel);
    return 0;
}

In Python:

import array, fcntl, sys, termios
buf = array.array('H', [0, 0, 0, 0])
fcntl.ioctl(sys.stdout, termios.TIOCGWINSZ, buf)
print('number of rows: {}, number of columns: {}, screen width: {}, screen height: {}'.format(*buf))

Note that some terminals return 0 for the width and height values. Such terminals should be modified to return the correct values. Examples of terminals that return correct values: kitty, xterm
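
As a rough sketch, the four values returned by the ioctl can be combined into the size of a single cell in pixels, which is what a client typically needs when deciding how many cells an image will cover. The function name cell_size is hypothetical, and the use of integer division is an assumption; terminals that report 0 for the pixel dimensions are treated as "unknown":

```python
def cell_size(rows, cols, width_px, height_px):
    """Return (cell width, cell height) in pixels, or None if unknown.

    The argument order matches struct winsize: ws_row, ws_col,
    ws_xpixel, ws_ypixel.
    """
    if rows <= 0 or cols <= 0 or width_px <= 0 or height_px <= 0:
        # Some terminals report 0 for the pixel dimensions
        return None
    return width_px // cols, height_px // rows
```

With the Python snippet above, this could be called as cell_size(*buf).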

You can also use the CSI t escape code to get the screen size. Send <ESC>[14t to STDOUT and kitty will reply on STDIN with <ESC>[4;<height>;<width>t where height and width are the window size in pixels. This escape code is supported in many terminals, not just kitty.
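
The reply to this escape code can be picked apart with a small parser. The following is only a sketch: the function name parse_window_size_reply is made up for illustration, and reading the reply from the terminal (which usually requires putting the tty into raw mode) is left out:

```python
import re

# Matches the reply to <ESC>[14t, which has the form <ESC>[4;<height>;<width>t
_REPLY = re.compile(r'\x1b\[4;(\d+);(\d+)t')


def parse_window_size_reply(reply):
    """Return (width, height) in pixels, or None if no reply is present."""
    m = _REPLY.search(reply)
    if m is None:
        return None
    # Note that the height comes first in the reply
    height, width = int(m.group(1)), int(m.group(2))
    return width, height
```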

A minimal example

Some minimal python code to display PNG images in kitty, using the most basic features of the graphics protocol:

import sys
from base64 import standard_b64encode

def serialize_gr_command(cmd, payload=None):
   # Join the control data into a comma-separated list of key=value pairs
   cmd = ','.join('{}={}'.format(k, v) for k, v in cmd.items())
   ans = []
   w = ans.append
   w(b'\033_G'), w(cmd.encode('ascii'))
   if payload:
      w(b';')
      w(payload)
   # Terminate the escape code with <ESC>\
   w(b'\033\\')
   return b''.join(ans)

def write_chunked(cmd, data):
   data = standard_b64encode(data)
   while data:
      chunk, data = data[:4096], data[4096:]
      m = 1 if data else 0
      cmd['m'] = m
      sys.stdout.buffer.write(serialize_gr_command(cmd, chunk))
      sys.stdout.flush()
      # Only the first chunk carries the full control data
      cmd.clear()

with open(sys.argv[-1], 'rb') as f:
   write_chunked({'a': 'T', 'f': 100}, f.read())

Save this script to a file, then you can use it to display any PNG file in kitty as:

python <script file> file.png

The graphics escape code

All graphics escape codes are of the form:

<ESC>_G<control data>;<payload><ESC>\

This is a so-called Application Programming Command (APC). Most terminal emulators ignore APC codes, making it safe to use.

The control data is a comma-separated list of key=value pairs. The payload is arbitrary binary data, base64-encoded to prevent interoperation problems with legacy terminals that get confused by control codes within an APC code. The meaning of the payload is interpreted based on the control data.

The first step is to transmit the actual image data.

Transferring pixel data

The first consideration when transferring data between the client and the terminal emulator is the format in which to do so. Since there is a vast and growing number of image formats in existence, it does not make sense to have every terminal emulator implement support for them. Instead, the client should send simple pixel data to the terminal emulator. The obvious downside to this is performance, especially when the client is running on a remote machine. Techniques for remedying this limitation are discussed later.

The terminal emulator must understand pixel data in three formats: 24-bit RGB, 32-bit RGBA and PNG. The format is specified using the f key in the control data: f=32 (which is the default) indicates 32-bit RGBA data, f=24 indicates 24-bit RGB data and f=100 indicates PNG data. The PNG format is supported for convenience and as a compact way of transmitting paletted images.

RGB and RGBA data

In these formats the pixel data is stored directly as 3 or 4 bytes per pixel, respectively. The colors in the data must be in the sRGB color space. When specifying images in this format, the image dimensions must be sent in the control data. For example:

<ESC>_Gf=24,s=10,v=20;<payload><ESC>\

Here the width and height are specified using the s and v keys respectively. Since f=24 there are three bytes per pixel and therefore the pixel data must be 3 * 10 * 20 = 600 bytes.
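
To make the size arithmetic concrete, here is a minimal sketch that builds such an escape code for a solid red 10x20 image. The helper name rgb_command is hypothetical. The base64 payload for 600 bytes is 800 characters, so it fits in a single escape code without chunking:

```python
from base64 import standard_b64encode


def rgb_command(width, height, pixels):
    """Build a graphics escape code carrying raw 24-bit RGB data."""
    expected = 3 * width * height  # three bytes per pixel for f=24
    if len(pixels) != expected:
        raise ValueError('expected {} bytes, got {}'.format(expected, len(pixels)))
    control = 'f=24,s={},v={}'.format(width, height)
    return (b'\033_G' + control.encode('ascii') + b';' +
            standard_b64encode(pixels) + b'\033\\')


# A 10x20 image in which every pixel is pure red
red = bytes([255, 0, 0]) * (10 * 20)
command = rgb_command(10, 20, red)
```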

PNG data

In this format any PNG image can be transmitted directly. For example:

<ESC>_Gf=100;<payload><ESC>\

The PNG format is specified using the f=100 key. The width and height of the image will be read from the PNG data itself. Note that if you use both PNG and compression, then you must provide the S key with the size of the PNG data.


Compression

The client can send compressed image data to the terminal emulator, by specifying the o key. Currently, only RFC 1950 ZLIB based deflate compression is supported, which is specified using o=z. For example:

<ESC>_Gf=24,s=10,v=20,o=z;<payload><ESC>\

This is the same as the example from the RGB data section, except that the payload is now compressed using deflate (this occurs prior to base64-encoding). You can specify compression for any format. The terminal emulator will decompress the data before interpreting the pixel data.
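
The order of operations (compress first, then base64-encode) can be sketched as follows. Python's zlib.compress produces an RFC 1950 stream; the helper name compressed_rgb_command is hypothetical:

```python
import zlib
from base64 import standard_b64decode, standard_b64encode


def compressed_rgb_command(width, height, pixels):
    """Build an f=24 escape code whose payload is zlib-compressed (o=z)."""
    # Compression happens before base64-encoding
    payload = standard_b64encode(zlib.compress(pixels))
    control = 'f=24,s={},v={},o=z'.format(width, height)
    return b'\033_G' + control.encode('ascii') + b';' + payload + b'\033\\'


pixels = bytes([0, 128, 255]) * (10 * 20)
command = compressed_rgb_command(10, 20, pixels)
# What the receiving side would do: strip the framing, decode, decompress
payload = command[len(b'\033_G'):-2].split(b';', 1)[1]
roundtrip = zlib.decompress(standard_b64decode(payload))
```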

The transmission medium

The transmission medium is specified using the t key. The t key defaults to d and can take the values:

Value of t

d

Direct (the data is transmitted within the escape code itself)

f

A simple file

t

A temporary file. The terminal emulator will delete the file after reading the pixel data. For security reasons the terminal emulator should only delete the file if it is in a known temporary directory, such as /tmp, /dev/shm, the TMPDIR env var if present and any platform specific temporary directories.

s

A POSIX shared memory object. The terminal emulator will delete it after reading the pixel data

Local client

First let us consider the local client techniques (files and shared memory). Some examples:

<ESC>_Gf=100,t=f;<encoded /path/to/file.png><ESC>\

Here we tell the terminal emulator to read PNG data from the specified file.

<ESC>_Gs=10,v=2,t=s,o=z;<encoded /some-shared-memory-name><ESC>\

Here we tell the terminal emulator to read compressed image data from the specified shared memory object.

The client can also specify a size and offset to tell the terminal emulator to only read a part of the specified file. This is done using the S and O keys respectively. For example:

<ESC>_Gs=10,v=2,t=s,S=80,O=10;<encoded /some-shared-memory-name><ESC>\

This tells the terminal emulator to read 80 bytes starting from the offset 10 inside the specified shared memory buffer.
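
A sketch of how a local client might build these escape codes is shown below. The helper name file_command and its parameters are hypothetical, and encoding the path as UTF-8 before base64-encoding it is an assumption:

```python
from base64 import standard_b64encode


def file_command(path, fmt=100, medium='f', size=None, offset=None):
    """Ask the terminal to read image data from a file or shared memory."""
    keys = ['f={}'.format(fmt), 't={}'.format(medium)]
    if size is not None:
        keys.append('S={}'.format(size))    # number of bytes to read
    if offset is not None:
        keys.append('O={}'.format(offset))  # where to start reading
    # The payload is the (base64-encoded) path, not the image data itself
    payload = standard_b64encode(path.encode('utf-8'))
    return b'\033_G' + ','.join(keys).encode('ascii') + b';' + payload + b'\033\\'
```

For instance, file_command('/some-shared-memory-name', fmt=32, medium='s', size=80, offset=10) reads 80 bytes starting at offset 10 from a shared memory object.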

Remote client

Remote clients, those that are unable to use the filesystem/shared memory to transmit data, must send the pixel data directly using escape codes. Since escape codes are of limited maximum length, the data will need to be chunked up for transfer. This is done using the m key. The pixel data must first be base64 encoded then chunked up into chunks no larger than 4096 bytes. The client then sends the graphics escape code as usual, with the addition of an m key that must have the value 1 for all but the last chunk, where it must be 0. For example, if the data is split into three chunks, the client would send the following sequence of escape codes to the terminal emulator:

<ESC>_Gs=100,v=30,m=1;<encoded pixel data first chunk><ESC>\
<ESC>_Gm=1;<encoded pixel data second chunk><ESC>\
<ESC>_Gm=0;<encoded pixel data last chunk><ESC>\

Note that only the first escape code needs to have the full set of control keys such as width, height, format etc. Subsequent chunks must have only the m key. The client must finish sending all chunks for a single image before sending any other graphics related escape codes. Note that the cursor position used to display the image must be the position when the final chunk is received. Finally, terminals must not display anything until the entire sequence is received and validated.
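
The chunking rules above can be sketched as a generator that yields one escape code per chunk. The name chunked_commands is hypothetical; control is the comma-separated control data for the first chunk, without the m key:

```python
from base64 import standard_b64encode


def chunked_commands(control, data, chunk_size=4096):
    """Yield escape codes that transmit data in base64 chunks."""
    encoded = standard_b64encode(data)
    first = True
    while encoded:
        chunk, encoded = encoded[:chunk_size], encoded[chunk_size:]
        m = 1 if encoded else 0  # 1 for all but the last chunk
        # Only the first chunk carries the full control data
        keys = '{},m={}'.format(control, m) if first else 'm={}'.format(m)
        first = False
        yield b'\033_G' + keys.encode('ascii') + b';' + chunk + b'\033\\'


# A 100x30 RGBA image is 12000 bytes, i.e. 16000 base64 characters
commands = list(chunked_commands('s=100,v=30', bytes(100 * 30 * 4)))
```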

Querying support and available transmission mediums

Since a client has no a-priori knowledge of whether it shares a filesystem/shared memory with the terminal emulator, it can send an id with the control data, using the i key (which can be an arbitrary positive integer up to 4294967295, it must not be zero). If it does so, the terminal emulator will reply after trying to load the image, saying whether loading was successful or not. For example:

<ESC>_Gi=31,s=10,v=2,t=s;<encoded /some-shared-memory-name><ESC>\

to which the terminal emulator will reply (after trying to load the data):

<ESC>_Gi=31;error message or OK<ESC>\

Here the i value will be the same as was sent by the client in the original request. The message data will be an ASCII encoded string containing only printable characters and spaces. The string will be OK if reading the pixel data succeeded, or an error message otherwise.
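
A client could split such a reply into its control data and message with something like the following sketch. The function name parse_response is hypothetical, and the reply is assumed to have already been read from the terminal as a single complete escape code:

```python
def parse_response(raw):
    """Split a graphics reply into (control keys, message), or return None."""
    prefix, suffix = b'\033_G', b'\033\\'
    if not (raw.startswith(prefix) and raw.endswith(suffix)):
        return None
    body = raw[len(prefix):-len(suffix)].decode('ascii')
    control, _, message = body.partition(';')
    keys = dict(item.split('=', 1) for item in control.split(',') if '=' in item)
    return keys, message
```

The message is 'OK' on success; anything else is an error message.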

Sometimes, using an id is not appropriate, for example, if you do not want to replace a previously sent image with the same id, or if you are sending a dummy image and do not want it stored by the terminal emulator. In that case, you can use the query action, set a=q. Then the terminal emulator will try to load the image and respond with either OK or an error, as above, but it will not replace an existing image with the same id, nor will it store the image.

While as of May 2020, kitty is the only terminal emulator to support this graphics protocol, we intend that any terminal emulator that wishes to support it can. The best way to check whether a terminal emulator supports the graphics protocol is to send the above query action followed by a request for the primary device attributes. If you get back an answer for the device attributes without getting back an answer for the query action, the terminal emulator does not support the graphics protocol.

This means that terminal emulators that support the graphics protocol, must reply to query actions immediately without processing other input. Most terminal emulators handle input in a FIFO manner, anyway.

So for example, you could send:

<ESC>_Gi=31,s=1,v=1,a=q,t=d,f=24;AAAA<ESC>\<ESC>[c

If you get back a response to the graphics query, the terminal emulator supports the protocol. If you get back a response to the device attributes query without a response to the graphics query, it does not.
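
The detection logic can be sketched as below. The query transmits a dummy 1x1 RGB image with an arbitrarily chosen id of 31, followed by the standard primary device attributes request <ESC>[c. The names QUERY and classify_reply are hypothetical, and the pattern used to recognise the device attributes reply (<ESC>[?...c) is an assumption about its usual form:

```python
import re
from base64 import standard_b64encode

# One black RGB pixel; three zero bytes encode to b'AAAA'
PIXEL = standard_b64encode(bytes(3))
QUERY = b'\033_Gi=31,s=1,v=1,a=q,t=d,f=24;' + PIXEL + b'\033\\' + b'\033[c'

# Assumed shape of a primary device attributes reply
_DEVICE_ATTRIBUTES = re.compile(rb'\x1b\[\?[0-9;]*c')


def classify_reply(reply):
    """Return True/False for supported/unsupported, None if undecided."""
    if b'\033_Gi=31;' in reply:
        return True
    if _DEVICE_ATTRIBUTES.search(reply):
        # Device attributes answered, but the graphics query was not
        return False
    return None  # keep reading
```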

Display images on screen

Every transmitted image can be displayed an arbitrary number of times on the screen, in different locations, using different parts of the source image, as needed. Each such display of an image is called a placement. You can either simultaneously transmit and display an image using the action a=T, or first transmit the image with an id, such as i=10 and then display it with a=p,i=10 which will display the previously transmitted image at the current cursor position. When specifying an image id, the terminal emulator will reply to the placement request with an acknowledgement code, which will be either:

<ESC>_Gi=<id>;OK<ESC>\

when the image referred to by id was found, or:

<ESC>_Gi=<id>;ENOENT:<some detailed error msg><ESC>\

when the image with the specified id was not found. This is similar to the scheme described above for querying available transmission media, except that here we are querying if the image with the specified id is available or needs to be re-transmitted.

Since there can be many placements per image, you can also give placements an id. To do so add the p key with a number between 1 and 4294967295. When you specify a placement id, it will be added to the acknowledgement code above. Every placement is uniquely identified by the pair of the image id and the placement id. If you specify a placement id for an image that does not have an id, it will be ignored. An example response:

<ESC>_Gi=<image id>,p=<placement id>;OK<ESC>\

If you send two placements with the same image id and placement id the second one will replace the first. This can be used to resize or move placements around the screen, without flicker.
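
A sketch of building placement commands is shown below. The helper name placement_command is hypothetical; any extra keyword arguments are passed through as additional keys, such as the layout keys c and r:

```python
def placement_command(image_id, placement_id=None, **extra_keys):
    """Build an a=p escape code for a previously transmitted image."""
    keys = ['a=p', 'i={}'.format(image_id)]
    if placement_id is not None:
        keys.append('p={}'.format(placement_id))
    keys.extend('{}={}'.format(k, v) for k, v in extra_keys.items())
    return ('\033_G' + ','.join(keys) + '\033\\').encode('ascii')


# Sending the same (image id, placement id) pair again replaces the
# first placement, here resizing it from 20x5 cells to 40x10 cells
first = placement_command(10, 7, c=20, r=5)
second = placement_command(10, 7, c=40, r=10)
```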

New in version 0.19.3: Support for specifying placement ids (see Query terminal to query kitty version)

Controlling displayed image layout

The image is rendered at the current cursor position, from the upper left corner of the current cell. You can also specify extra X=3 and Y=4 pixel offsets to display from a different origin within the cell. Note that the offsets must be smaller than the size of the cell.

By default, the entire image will be displayed (images wider than the available width will be truncated on the right edge). You can choose a source rectangle (in pixels) as the part of the image to display. This is done with the keys: x, y, w, h which specify the top-left corner, width and height of the source rectangle.

You can also ask the terminal emulator to display the image in a specified rectangle (number of columns / number of lines), using the c and r keys. c is the number of columns and r the number of rows. The image will be scaled (enlarged/shrunk) as needed to fit the specified area. Note that if you specify a start cell offset via the X,Y keys, it is not added to the number of rows/columns.

Finally, you can specify the image z-index, i.e. the vertical stacking order. Images placed in the same location with different z-index values will be blended if they are semi-transparent. You can specify z-index values using the z key. Negative z-index values mean that the images will be drawn under the text. This allows rendering of text on top of images. Negative z-index values below INT32_MIN/2 (-1,073,741,824) will be drawn under cells with non-default background colors.
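
The thresholds in this paragraph can be summarised in a few lines. This is only an illustration: the function name z_layer and the layer descriptions are made up, and treating non-negative values as "over text" is an inference from the paragraph above rather than something it states outright:

```python
INT32_MIN = -2 ** 31


def z_layer(z):
    """Describe where an image with the given z-index is drawn."""
    if z < INT32_MIN // 2:  # below -1,073,741,824
        return 'under text and under non-default cell backgrounds'
    if z < 0:
        return 'under text'
    return 'over text'  # assumption for z >= 0
```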


After placing an image on the screen the cursor must be moved to the right by the number of cols in the image placement rectangle and down by the number of rows in the image placement rectangle. If either of these cause the cursor to leave either the screen or the scroll area, the exact positioning of the cursor is undefined, and up to implementations. The client can ask the terminal emulator to not move the cursor at all by specifying C=1 in the command, which sets the cursor movement policy to no movement for placing the current image.

New in version 0.20.0: Support for the C=1 cursor movement policy

Deleting images

Images can be deleted by using the delete action a=d. If specified without any other keys, it will delete all images visible on screen. To delete specific images, use the d key as described in the table below. Note that each value of d has both a lowercase and an uppercase variant. The lowercase variant only deletes the images without necessarily freeing up the stored image data, so that the images can be re-displayed without needing to resend the data. The uppercase variants will delete the image data as well, provided that the image is not referenced elsewhere, such as in the scrollback buffer. The values of the x and y keys are the same as cursor positions (i.e. x=1, y=1 is the top left cell).

Value of d


a or A

Delete all placements visible on screen

i or I

Delete all images with the specified id, specified using the i key. If you specify a p key for the placement id as well, then only the placement with the specified image id and placement id will be deleted.

n or N

Delete newest image with the specified number, specified using the I key. If you specify a p key for the placement id as well, then only the placement with the specified number and placement id will be deleted.

c or C

Delete all placements that intersect with the current cursor position.

f or F

Delete animation frames.

p or P

Delete all placements that intersect a specific cell, the cell is specified using the x and y keys

q or Q

Delete all placements that intersect a specific cell having a specific z-index. The cell and z-index are specified using the x, y and z keys.

x or X

Delete all placements that intersect the specified column, specified using the x key.

y or Y

Delete all placements that intersect the specified row, specified using the y key.

z or Z

Delete all placements that have the specified z-index, specified using the z key.

Note that when all placements for an image have been deleted, the image is also deleted, if the capital letter form above is specified. Also, when the terminal is running out of quota space for new images, existing images without placements will be preferentially deleted.

Some examples:

<ESC>_Ga=d<ESC>\              # delete all visible placements
<ESC>_Ga=d,d=i,i=10<ESC>\     # delete the image with id=10, without freeing data
<ESC>_Ga=d,d=i,i=10,p=7<ESC>\ # delete the image with id=10 and placement id=7, without freeing data
<ESC>_Ga=d,d=Z,z=-1<ESC>\     # delete the placements with z-index -1, also freeing up image data
<ESC>_Ga=d,d=p,x=3,y=4<ESC>\  # delete all placements that intersect the cell at (3, 4), without freeing data
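
These examples could be generated with a small helper along the following lines. The name delete_command and the free_data parameter are hypothetical; the helper simply picks the lowercase or uppercase variant of the d value:

```python
def delete_command(what=None, free_data=False, **keys):
    """Build an a=d escape code.

    what is one of the d values from the table above (case is ignored);
    free_data selects the uppercase variant, which also frees image data.
    """
    parts = ['a=d']
    if what is not None:
        parts.append('d=' + (what.upper() if free_data else what.lower()))
    parts.extend('{}={}'.format(k, v) for k, v in keys.items())
    return ('\033_G' + ','.join(parts) + '\033\\').encode('ascii')
```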

Suppressing responses from the terminal

If you are using the graphics protocol from a limited client, such as a shell script, it might be useful to avoid having to process responses from the terminal. For this, you can use the q key. Set it to 1 to suppress OK responses and to 2 to suppress failure responses.

New in version 0.19.3: The ability to suppress responses (see Query terminal to query kitty version)

Requesting image ids from the terminal

If you are writing a program that is going to share the screen with other programs and you still want to use image ids, it is not possible to know what image ids are free to use. In this case, instead of using the i key to specify an image id, use the I key to specify an image number. These numbers are not unique. When creating a new image, even if an existing image has the same number, a new one is created, and the terminal will reply with the id of the newly created image. For example, when creating an image with I=13, the terminal will send the response:

<ESC>_Gi=<image id>,I=13;OK<ESC>\

Here, the value of i is the id for the newly created image and the value of I is the same as was sent in the creation command.

All future commands that refer to images using the image number, such as creating placements or deleting images, will act on only the newest image with that number. This allows the client program to send a bunch of commands dealing with an image by image number without waiting for a response from the terminal with the image id. Once such a response is received, the client program should use the i key with the image id for all future communication.


Specifying both i and I keys in any command is an error. The terminal must reply with an EINVAL error message, unless silenced.

New in version 0.19.3: The ability to use image numbers (see Query terminal to query kitty version)


Animation

New in version 0.20.0: Animation support (see Query terminal to query kitty version)

When designing support for animation, the two main considerations were:

  1. There should be a way for both client and terminal driven animations. Since there is unknown and variable latency between client and terminal, especially over SSH, client driven animations are not sufficient.

  2. Animations often consist of small changes from one frame to the next, the protocol should thus allow transmitting these deltas for efficiency and performance reasons.

Animation support is added to the protocol by adding two new modes for the a (action) key: an f mode for transmitting frame data and an a mode for controlling the animation of an image. Animation proceeds in two steps: first a normal image is created as described earlier, then animation frames are added to the image to make it into an animation. Since every animation is associated with a single image, all animation escape codes must specify either the i or I keys to identify the image being operated on.

Transferring animation frame data

Transferring animation frame data is very similar to Transferring pixel data above. The main difference is that the image the frame belongs to must be specified and it is possible to transmit data for only part of a frame, declaring the rest of the frame to be filled in by data from a previous frame, or left blank. To transfer frame data the a=f key must be used in all escape codes.

First, to transfer a simple frame that has data for the full image area, the escape codes used are exactly the same as for transferring image data, with the addition of: a=f,i=<image id> or a=f,I=<image number>.

If the frame has data for only a part of the image, you can specify the rectangle for it using the x, y, s, v keys, for example:

x=10,y=5,s=100,v=200  # A 100x200 rectangle with its top left corner at (10, 5)

Frames are created by composing the transmitted data onto a background canvas. This canvas can be either a single color, or the pixels from a previous frame. The composition can be of two types, either a simple replacement (the X=1 key) or a full alpha blend (the default).

To use a background color for the canvas, specify the Y key as a 32-bit RGBA color. For example:

Y=4278190335 # 0xff0000ff opaque red
Y=16711816   # 0x00ff0088 translucent green (alpha=0.53)

The default background color when none is specified is 0 i.e. a black, transparent pixel.
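
The integer values above follow from packing the four 8-bit channels into one 32-bit number with red in the most significant byte. A minimal sketch (the function name rgba_to_int is hypothetical):

```python
def rgba_to_int(r, g, b, a):
    """Pack four 0-255 channel values into a 32-bit RGBA integer."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 255:
            raise ValueError('channel values must be in the range 0-255')
    return (r << 24) | (g << 16) | (b << 8) | a


opaque_red = rgba_to_int(255, 0, 0, 255)         # 0xff0000ff
translucent_green = rgba_to_int(0, 255, 0, 0x88)  # 0x00ff0088
```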

To use the data from a previous frame, specify the c key which is a 1-based frame number. Thus c=1 refers to the root frame (the base image data), c=2 refers to the second frame and so on.

If the frame is composed of multiple rectangular blocks, these can be expressed by using the r key. When specifying the r key the data for an existing frame is edited. The same composition operation as above happens, but now the background canvas is the existing frame itself. r is a 1-based index, so r=1 is the root frame (base image data), r=2 is the second frame and so on.

Finally, while transferring frame data, the frame gap can also be specified using the z key. The gap is the number of milliseconds to wait before displaying the next frame when the animation is running. A value of z=0 is ignored, z=positive number sets the gap to the specified number of milliseconds and z=negative number creates a gapless frame. Gapless frames are not displayed to the user since they are instantly skipped over, however they can be useful as the base data for subsequent frames. For example, for an animation where the background remains the same and a small object or two move.
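
Putting the frame keys together, a client might build a frame transmission roughly as follows. This is a sketch under several assumptions: the helper name frame_command is hypothetical, the data is 32-bit RGBA (the default format, so no f key is sent), and the payload is small enough to fit in one escape code without chunking:

```python
from base64 import standard_b64encode


def frame_command(image_id, rgba, width, height, x=0, y=0,
                  base_frame=None, gap=None):
    """Build an a=f escape code for a rectangle of RGBA frame data."""
    if len(rgba) != 4 * width * height:
        raise ValueError('expected 4 bytes per pixel')
    keys = ['a=f', 'i={}'.format(image_id),
            'x={}'.format(x), 'y={}'.format(y),
            's={}'.format(width), 'v={}'.format(height)]
    if base_frame is not None:
        keys.append('c={}'.format(base_frame))  # 1-based frame to build on
    if gap is not None:
        keys.append('z={}'.format(gap))         # gap in milliseconds
    payload = standard_b64encode(rgba)
    if len(payload) > 4096:
        raise ValueError('payload too large; split it using the m key')
    return b'\033_G' + ','.join(keys).encode('ascii') + b';' + payload + b'\033\\'


# A 2x2 patch at (10, 5) drawn on top of the root frame, shown for 40ms
patch = bytes(2 * 2 * 4)
command = frame_command(3, patch, 2, 2, x=10, y=5, base_frame=1, gap=40)
```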

Controlling animations

Clients can control animations by using the a=a key in the escape code sent to the terminal.

The simplest is client driven animations, where the client transmits the frame data and then also instructs the terminal to make a particular frame the current frame. To change the current frame, use the c key:

<ESC>_Ga=a,i=3,c=7<ESC>\

This will make the seventh frame in the image with id 3 the current frame.

However, client driven animations can be sub-optimal, since the latency between the client and terminal is unknown and variable, especially over the network. Also they require the client to remain running for the lifetime of the animation, which is not desirable for cat-like utilities.

Terminal driven animations are achieved by the client specifying gaps (time in milliseconds) between frames and instructing the terminal to stop or start the animation.

The animation state is controlled by the s key. s=1 stops the animation. s=2 runs the animation, but in loading mode, in this mode when reaching the last frame, instead of looping, the terminal will wait for the arrival of more frames. s=3 runs the animation normally, after the last frame, the terminal loops back to the first frame. The number of loops can be controlled by the v key. v=0 is ignored, v=1 is loop infinitely, and any other positive number is loop number - 1 times. Note that stopping the animation resets the loop counter.
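
Because the v key is offset by one, it is easy to get the loop count wrong. The following sketch converts a desired number of loops into the value of v and builds the control escape code; the names loops_to_v and animation_command are hypothetical:

```python
def loops_to_v(loops=None):
    """Return the v value for a loop count; None means loop forever."""
    if loops is None:
        return 1  # v=1 is loop infinitely
    if loops < 1:
        raise ValueError('loops must be at least 1')
    return loops + 1  # any other v means v - 1 loops


def animation_command(image_id, state, v=None):
    """Build an a=a escape code that sets the animation state."""
    keys = ['a=a', 'i={}'.format(image_id), 's={}'.format(state)]
    if v is not None:
        keys.append('v={}'.format(v))
    return ('\033_G' + ','.join(keys) + '\033\\').encode('ascii')


# Run the animation of image 7 normally (s=3), looping twice
command = animation_command(7, 3, v=loops_to_v(2))
```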

Finally, the gap for frames can be set using the z key. This can be specified either when the frame is created as part of the transmit escape code or separately using the animation control escape code. The gap is the time in milliseconds to wait before displaying the next frame in the animation. For example:

<ESC>_Ga=a,i=7,r=3,z=48<ESC>\

This sets the gap for the third frame of the image with id 7 to 48 milliseconds. Note that gapless frames are not displayed to the user since the next frame comes immediately, however they can be useful to store base data for subsequent frames, such as in an animation with an object moving against a static background.

In particular, the first frame or root frame is created with the base image data and has no gap, so its gap must be set using this control code.

Image persistence and storage quotas

In order to avoid Denial-of-Service attacks, terminal emulators should have a maximum storage quota for image data. It should allow at least a few full screen images. For example the quota in kitty is 320MB per buffer. When adding a new image, if the total size exceeds the quota, the terminal emulator should delete older images to make space for the new one. In kitty, for animations, the additional frame data is stored on disk and has a separate, larger quota of five times the base quota.
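
On the terminal emulator side, the eviction rule could look something like this sketch. It is purely illustrative and is not how kitty implements its quota: the class name ImageStore is made up, images are evicted strictly oldest-first, and the newest image is always kept even if it alone exceeds the quota:

```python
from collections import OrderedDict


class ImageStore:
    """Hypothetical image store with a storage quota."""

    def __init__(self, quota_bytes=320 * 1024 * 1024):
        self.quota = quota_bytes
        self.images = OrderedDict()  # image id -> data, oldest first
        self.used = 0

    def add(self, image_id, data):
        # Replacing an existing id releases its old storage first
        old = self.images.pop(image_id, None)
        if old is not None:
            self.used -= len(old)
        self.images[image_id] = data
        self.used += len(data)
        # Delete older images until the total fits in the quota
        while self.used > self.quota and len(self.images) > 1:
            _, evicted = self.images.popitem(last=False)
            self.used -= len(evicted)


store = ImageStore(quota_bytes=10)
store.add(1, b'a' * 6)
store.add(2, b'b' * 6)  # exceeds the quota, so image 1 is evicted
```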

Control data reference

The table below shows all the control data keys as well as what values they can take, and the default value they take when missing. All integers are 32-bit.

Key | Value | Default | Description

a | Single character. (t, T, q, p, d, f, a) | t | The overall action this graphics command is performing. t - transmit data, T - transmit data and display image, q - query terminal, p - put (display) previously transmitted image, d - delete image, f - transmit data for animation frames, a - control animation
q | 0, 1, 2 | 0 | Suppress responses from the terminal to this graphics command.

Keys for image transmission

f | Positive integer. (24, 32, 100) | 32 | The format in which the image data is sent.
t | Single character. (d, f, t, s) | d | The transmission medium used.
s | Positive integer. | 0 | The width of the image being sent.
v | Positive integer. | 0 | The height of the image being sent.
S | Positive integer. | 0 | The size of data to read from a file.
O | Positive integer. | 0 | The offset from which to read data from a file.
i | Positive integer. (0 - 4294967295) | 0 | The image id
I | Positive integer. (0 - 4294967295) | 0 | The image number
p | Positive integer. (0 - 4294967295) | 0 | The placement id
o | Single character. only z | null | The type of data compression.
m | zero or one | 0 | Whether there is more chunked data available.

Keys for image display

x | Positive integer | 0 | The left edge (in pixels) of the image area to display
y | Positive integer | 0 | The top edge (in pixels) of the image area to display
w | Positive integer | 0 | The width (in pixels) of the image area to display. By default, the entire width is used
h | Positive integer | 0 | The height (in pixels) of the image area to display. By default, the entire height is used
X | Positive integer | 0 | The x-offset within the first cell at which to start displaying the image
Y | Positive integer | 0 | The y-offset within the first cell at which to start displaying the image
c | Positive integer | 0 | The number of columns to display the image over
r | Positive integer | 0 | The number of rows to display the image over
C | Positive integer | 0 | Cursor movement policy. 0 is the default, to move the cursor to after the image. 1 is to not move the cursor at all when placing the image.
z | 32-bit integer | 0 | The z-index vertical stacking order of the image

Keys for animation frame loading

x | Positive integer | 0 | The left edge (in pixels) of where the frame data should be updated
y | Positive integer | 0 | The top edge (in pixels) of where the frame data should be updated
c | Positive integer | 0 | The 1-based frame number of the frame whose image data serves as the base data when creating a new frame, by default the base data is black, fully transparent pixels
r | Positive integer | 0 | The 1-based frame number of the frame that is being edited. By default, a new frame is created
z | 32-bit integer | 0 | The gap (in milliseconds) of this frame from the next one. A value of zero is ignored. Negative values create a gapless frame. If not specified, frames have a default gap of 40ms. The root frame defaults to zero gap.
X | Positive integer | 0 | The composition mode for blending pixels when creating a new frame or editing a frame's data. The default is full alpha blending. 1 means a simple overwrite.
Y | Positive integer | 0 | The background color for pixels not specified in the frame data. Must be in 32-bit RGBA format

Keys for animation control

s | Positive integer | 0 | 1 - stop animation, 2 - run animation, but wait for new frames, 3 - run animation
r | Positive integer | 0 | The 1-based frame number of the frame that is being affected
z | 32-bit integer | 0 | The gap (in milliseconds) of this frame from the next one. A value of zero is ignored. Negative values create a gapless frame.
c | Positive integer | 0 | The 1-based frame number of the frame that should be made the current frame
v | Positive integer | 0 | The number of loops to play. 0 is ignored, 1 is play infinite and is the default and larger number means play that number -1 loops

Keys for deleting images

d | Single character. (a, A, c, C, f, F, n, N, i, I, p, P, q, Q, x, X, y, Y, z, Z) | a | What to delete.

Interaction with other terminal actions

When resetting the terminal, all images that are visible on the screen must be cleared. When switching from the main screen to the alternate screen buffer (1049 private mode) all images in the alternate screen must be cleared, just as all text is cleared. The clear screen escape code (usually <ESC>[2J) should also clear all images. This is so that the clear command works.

The other commands to erase text must have no effect on graphics. The dedicated delete graphics commands must be used for those.

When scrolling the screen (such as when using index cursor movement commands, or scrolling through the history buffer), images must be scrolled along with text. When page margins are defined and the index commands are used, only images that are entirely within the page area (between the margins) must be scrolled. When scrolling them would cause them to extend outside the page area, they must be clipped.