Adding New REPL Kernels

Architecture Overview

Inside containers, each kernel is a simple daemon process that accepts user code snippets and replies with its execution results via TCP-based ZeroMQ connections. The rationale for using ZeroMQ is: 1) it is message-based, so we do not have to worry about message boundaries and encodings; 2) it automatically reconnects when the connection is lost due to network failures or packet losses; 3) it is one of the most universally supported networking libraries across programming languages.

A kernel should offer the query mode, the PTY mode, or both. TCP port 2001 is reserved for the query mode, whereas ports 2002 and 2003 are reserved for the PTY mode (stdin and stdout combined with stderr, respectively).
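
For instance, a minimal sketch of the socket setup implied by these port conventions might look like the following (Python with pyzmq; the PTY-mode socket types shown here are assumptions and are discussed further in the PTY mode section below):

# Minimal sketch of the reserved-port conventions (illustrative only).
import zmq

ctx = zmq.Context()

# Query mode: a REP socket answering code execution requests on port 2001.
query_sock = ctx.socket(zmq.REP)
query_sock.bind('tcp://*:2001')

# PTY mode: port 2002 for stdin, port 2003 for stdout combined with stderr.
# (SUB/PUB here is an assumption; see the PTY mode section below.)
stdin_sock = ctx.socket(zmq.SUB)
stdin_sock.setsockopt(zmq.SUBSCRIBE, b'')
stdin_sock.bind('tcp://*:2002')

stdout_sock = ctx.socket(zmq.PUB)
stdout_sock.bind('tcp://*:2003')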

Ingredients of Kernel Images

A kernel is a Docker image with the following format:

  • Dockerfile

    • WORKDIR /home/work: this path is used to mount an external directory so that the agent can access files generated by user code.

    • CMD must be set to the main program.

    • Required Labels

      • ai.backend.maxcores: N (the number of CPU cores recommended for this kernel)

      • ai.backend.maxmem: M (the recommended memory size for this kernel in human-readable bytes, e.g., 128m for 128 MBytes)

      • ai.backend.timeout: T (the maximum seconds allowed to execute a single query)

      • The above limits are used as default settings by Backend.AI Agent, but agents may enforce lower limits due to service policies. Backend.AI Gateway may also refer to this information for load balancing and scheduling.

      • ai.backend.mode: query, pty, or query+pty

    • Optional Labels

      • ai.backend.envs.corecount: a comma-separated string of environment variable names which the agent will set to the number of assigned CPU cores (e.g., JULIA_CPU_CORES, OPENBLAS_NUM_THREADS)

      • ai.backend.nvidia.enabled: yes or no (if yes, Backend.AI Agent attaches an NVIDIA CUDA GPU device with a driver volume. You must use an nvidia-docker image as the base of your Dockerfile.)

      • ai.backend.extra_volumes: a comma-separated string of extra volume mounts (the volume name and the path inside the container separated by a colon), such as deep learning sample datasets (e.g., sample-data:/home/work/samples,extra-data:/home/work/extra). Note that only read-only mounts are allowed. The available extra volumes depend on your Backend.AI Agent setup; there are no standard or predefined ones. To add a new one, use the docker volume commands. When a designated volume does not exist on the agent’s host, the agent silently skips mounting it.

      • ai.backend.features: a comma-separated string of keywords indicating the available features of this kernel.

        Keyword         Feature
        media.images    Generates images (PNG, JPG, and SVG) without uploading them to AWS S3.
        media.svgplot   Generates plots in SVG.
        media.drawing   Generates animated vector graphics that can be rendered by the sorna-media JavaScript library.
        media.audio     Generates audio signal streams. (not implemented)

  • The main program that implements the query mode and/or the PTY mode (see below).

    • We strongly recommend creating a normal user account instead of running the main program as root.

    • The main program should be wrapped with jail, like:

      #! /bin/bash
      exec /home/backend.ai/jail default `which lua` /home/backend.ai/run.lua
      

      The first argument to jail is the policy name; the second and later arguments are the absolute path of the main program and its arguments. To customize the jail policy, see below.

    • jail and intra-jail must be copied into the kernel image.

  • Other auxiliary files used in the Dockerfile or by the main program (e.g., Python and package installation scripts).
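
Putting these ingredients together, a kernel image’s Dockerfile might look roughly like the following sketch; the base image, file names, and label values are illustrative assumptions, not a reference implementation.

# Illustrative sketch only: the base image, file names, and label values
# below are assumptions, not a reference implementation.
FROM ubuntu:16.04

# (Install your language runtime and dependencies here.)

# Copy the main program, its wrapper script, and the jail binaries.
COPY run.sh run.lua jail intra-jail /home/backend.ai/

# Run as a normal user instead of root, as recommended above.
RUN useradd -m -s /bin/bash work
USER work

# The agent mounts an external directory at this path to access files
# generated by user code.
WORKDIR /home/work

# Required labels (resource hints and operation mode) plus an optional one.
LABEL ai.backend.maxcores="1" \
      ai.backend.maxmem="128m" \
      ai.backend.timeout="10" \
      ai.backend.mode="query" \
      ai.backend.features="media.images,media.svgplot"

CMD ["/home/backend.ai/run.sh"]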

Writing Query Mode Kernels

Most kernels fall into this category. You just write a simple blocking loop that receives an input code message and sends an output result message via a ZeroMQ REP socket listening on port 2001. All the complicated parts, such as multiplexing multiple user requests and container management, are handled by Backend.AI Agent.

The input is a ZeroMQ multipart message with two payloads. The first payload should contain a unique identifier for the code snippet (usually a hash of it), but it is currently ignored (reserved for future caching implementations). The second payload should contain a UTF-8 encoded source code string.

The reply is a ZeroMQ multipart message with a single payload, containing a UTF-8 encoded string of the following JSON object:

{
    "stdout": "hello world!",
    "stderr": "oops!",
    "exceptions": [
        ["exception-name", ["arg1", "arg2"], false, null]
    ],
    "media": [
        ["image/png", "data:image/base64,...."]
    ],
    "options": {
        "upload_output_files": true
    }
}

Each item in exceptions is an array of four items: the exception name, the exception arguments (optional), a boolean indicating whether the exception was raised outside the user code (mostly false), and a traceback string (optional).

Each item in media is an array of two items: MIME-type and the data string. Specific formats are defined and handled by the Backend.AI Media module.

The options field is optional. If upload_output_files is true (the default), the agent uploads the files generated by the user code in the working directory (/home/work) to an AWS S3 bucket and makes their URLs available in the front-end.
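
As a concrete illustration, a minimal query-mode main program could be sketched in Python with pyzmq as follows. This is only a sketch: a real kernel must run under jail, capture media outputs, and enforce limits, and the exact execution mechanism depends on the target language.

# A minimal query-mode kernel sketch (Python, pyzmq). Illustrative only:
# a real kernel must run under jail, capture media outputs, and enforce limits.
import io
import json
import traceback
from contextlib import redirect_stdout, redirect_stderr

import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind('tcp://*:2001')   # the reserved query-mode port

while True:
    # First payload: code identifier (currently ignored); second: UTF-8 source.
    code_id, code = sock.recv_multipart()
    stdout, stderr = io.StringIO(), io.StringIO()
    exceptions = []
    try:
        with redirect_stdout(stdout), redirect_stderr(stderr):
            exec(code.decode('utf8'), {'__name__': '__main__'})
    except Exception as e:
        # [name, args, raised outside user code?, traceback string]
        exceptions.append([type(e).__name__, [str(a) for a in e.args],
                           False, traceback.format_exc()])
    result = {
        'stdout': stdout.getvalue(),
        'stderr': stderr.getvalue(),
        'exceptions': exceptions,
        'media': [],
        # 'options' is optional and omitted here.
    }
    sock.send_multipart([json.dumps(result).encode('utf8')])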

Writing PTY Mode Kernels

If you want to allow users to have real-time interactions with your kernel using web-based terminals, you should implement the PTY mode as well. A good example is our “git” kernel runner.

The key concept is the separation of the “outer” daemon and the “inner” target program (e.g., a shell). The outer daemon should wrap the inner program inside a pseudo-tty. As the outer daemon is completely hidden from end-users during terminal interaction, it may be written in a different programming language than the inner program. The challenge is that you need to implement piping between the ZeroMQ sockets and the pseudo-tty file descriptors. It is up to you how to implement the outer daemon, but if you choose Python, we recommend using asyncio or similar event loop libraries such as tornado and Twisted to multiplex the sockets and file descriptors in both directions. When piping the messages, the outer daemon should not apply any transformation; it should send and receive all raw data and control byte sequences transparently, because the front-end (e.g., terminal.js) is responsible for interpreting them. Currently we use PUB/SUB ZeroMQ socket types, but this may change later.
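
For illustration, an outer daemon written in Python with asyncio and pyzmq might pipe data roughly as follows. The socket types and bind directions (SUB on port 2002 for input, PUB on port 2003 for output) and the choice of /bin/bash as the inner program are assumptions for this sketch; respawning and clean shutdown are omitted (see the note at the end of this section).

# A rough sketch of a PTY-mode outer daemon (Python, asyncio, pyzmq).
# The socket types/directions and the inner program are assumptions.
import asyncio
import os
import pty

import zmq
import zmq.asyncio

async def main():
    ctx = zmq.asyncio.Context()
    stdin_sock = ctx.socket(zmq.SUB)     # keystrokes coming from the agent
    stdin_sock.setsockopt(zmq.SUBSCRIBE, b'')
    stdin_sock.bind('tcp://*:2002')
    stdout_sock = ctx.socket(zmq.PUB)    # terminal output going to the agent
    stdout_sock.bind('tcp://*:2003')

    # Spawn the inner program (here: a shell) under a pseudo-tty.
    pid, master_fd = pty.fork()
    if pid == 0:
        os.execv('/bin/bash', ['/bin/bash'])

    loop = asyncio.get_running_loop()

    async def pipe_input():
        # ZeroMQ -> pty master: forward raw bytes without any transformation.
        while True:
            data = await stdin_sock.recv()
            os.write(master_fd, data)

    async def pipe_output():
        # pty master -> ZeroMQ: forward raw bytes without any transformation.
        while True:
            try:
                data = await loop.run_in_executor(None, os.read, master_fd, 4096)
            except OSError:   # the inner program has exited
                break
            if not data:
                break
            await stdout_sock.send(data)

    # NOTE: respawning the inner program and clean shutdown are omitted here.
    await asyncio.gather(pipe_input(), pipe_output())

asyncio.run(main())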

Optionally, you may run the query-mode loop side by side. For example, our git kernel supports terminal resizing and pinging commands as query-mode inputs. There is no fixed specification for such commands yet, but the current CodeOnWeb uses the following:

  • %resize <rows> <cols>: resize the pseudo-tty’s terminal to fit the web terminal element in user browsers.

  • %ping: just a no-op command to prevent kernel idle timeouts while the web terminal is open in user browsers.
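
One possible way to handle these commands in the outer daemon’s query-mode loop is sketched below; the command strings follow the list above, while the handler itself is an assumption for illustration.

# A hypothetical handler for the control commands above (Python).
import fcntl
import struct
import termios

def handle_control_command(line: str, master_fd: int) -> None:
    if line.startswith('%resize'):
        _, rows, cols = line.split()
        # Apply the new window size to the pseudo-tty master.
        winsize = struct.pack('HHHH', int(rows), int(cols), 0, 0)
        fcntl.ioctl(master_fd, termios.TIOCSWINSZ, winsize)
    elif line.strip() == '%ping':
        pass  # no-op: receiving it already resets the kernel idle timeout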

A best practice (not mandatory but recommended) for PTY mode kernels is to automatically respawn the inner program if it terminates (e.g., when the user exits the shell) so that users are not stuck with a “blank screen” terminal.
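
Continuing the earlier sketch, one way to do this is to wrap the spawn-and-pipe logic in a loop; pipe_until_exit below is a hypothetical helper that pumps data until the inner program terminates.

# A hypothetical respawn loop (pipe_until_exit is a placeholder for the
# piping logic sketched earlier).
import os
import pty

def run_inner_forever(argv):
    while True:
        pid, master_fd = pty.fork()
        if pid == 0:
            os.execv(argv[0], argv)      # child: become the inner program
        pipe_until_exit(master_fd)       # parent: pump I/O until the child exits
        os.close(master_fd)
        os.waitpid(pid, 0)               # reap the child, then respawn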

Writing Custom Jail Policies

Implement the jail policy interface in Go and embed it inside your jail build. Take a look at the existing jail policies as references.