Install from Source
For production deployments, we recommend creating separate virtualenvs for individual services and installing the pre-built wheel distributions, following Install from Packages.
Setting Up Manager and Agent (single node, all-in-one)
Check out Development Setup.
Setting Up Additional Agents (multi-node)
Updating manager configuration for multi-nodes
scripts/install-dev.sh assumes a single-node all-in-one setup and configures the etcd and Redis addresses to be
Update the Redis address stored in etcd so that additional agent nodes can connect to the Redis server using the address advertised via etcd:
$ ./backend.ai mgr etcd get config/redis/addr
127.0.0.1:xxxx
$ ./backend.ai mgr etcd put config/redis/addr MANAGER_IP:xxxx  # use the port number read above
MANAGER_IP is an IP address of the manager node accessible from other agent nodes.
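The two commands above can be combined so that the port number does not have to be copied by hand. The following is a minimal sketch only: rewrite_redis_host is a hypothetical helper (not part of Backend.AI), and it assumes that the etcd get command prints a plain host:port value as shown above.

```shell
# Hypothetical helper: replace the host part of a "host:port" address
# while keeping the port unchanged.
#   $1: current address, e.g. the value read from config/redis/addr
#   $2: new host, e.g. MANAGER_IP
rewrite_redis_host() {
  # ${1##*:} strips everything up to and including the last colon,
  # leaving only the port number.
  printf '%s:%s\n' "$2" "${1##*:}"
}

# Possible usage on the manager node (MANAGER_IP is a placeholder):
#   current="$(./backend.ai mgr etcd get config/redis/addr)"
#   ./backend.ai mgr etcd put config/redis/addr "$(rewrite_redis_host "$current" MANAGER_IP)"
```

Reading the value back with the etcd get command afterwards is a quick way to confirm that the advertised address changed.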
Installing additional agents in different nodes
First, you need to initialize a working copy of the core repository for each additional agent node.
scripts/install-dev.sh does not yet provide an “agent-only” installation mode, so you need to manually perform the same repository cloning and the same pyenv, Python, and Pants setup procedures that the script does.
Since the core packages live in a mono-repo, the agent sources cannot be cloned separately. Clone the entire repository and configure and run only the agent. Make sure that you also pull the LFS files and submodules when you clone it manually.
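A manual clone that also fetches submodules and LFS files might look like the sketch below. It uses only stock git and git-lfs commands. The function name clone_core_repo, the REPO_URL placeholder, and the target directory are assumptions for illustration, and git-lfs is assumed to be installed already.

```shell
# Hypothetical helper: clone the whole mono-repo with its submodules,
# then make sure the LFS-tracked files are downloaded.
#   $1: repository URL (placeholder, substitute the core repository URL)
#   $2: target directory for the working copy
clone_core_repo() {
  git clone --recurse-submodules "$1" "$2" &&
    git -C "$2" lfs pull
}

# Possible usage on an additional agent node:
#   clone_core_repo REPO_URL backend.ai
```

If the LFS filters are already set up for your user, the clone itself downloads the LFS files and the second command simply confirms that nothing is missing.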
Once pants is up and working, run pants export to populate virtualenvs and install dependencies.
Then create agent.toml by copying configs/agent/halfstack.toml and editing it as follows:
[etcd].addr.host: Replace with
[agent].rpc-listen-addr.host: Replace with
[container].bind-host: Replace with
[watcher].service-addr.host: Replace with
AGENT_IP is an IP address of this agent node accessible from the manager and
MANAGER_IP is an IP address of the manager node accessible from this agent node.
Run ./backend.ai ag start-server to connect this agent node to an existing manager.
We assume that the agent and manager nodes reside in the same local network, where all TCP ports are open to each other.
If this is not the case, you should configure firewalls to open all the port numbers appearing in
There are more complicated setup scenarios such as splitting network planes for control and container-to-container communications, but we provide assistance with them for enterprise customers only.
Setting Up Accelerators
Ensure that your accelerator is properly set up using vendor-specific installation methods.
Clone the accelerator plugin package into the plugins directory if necessary, or just use one of the plugins that already exist in the mono-repo.
You also need to configure
[agent].allow-compute-plugins with the full package path (e.g.,
ai.backend.accelerator.cuda_open) to activate them.
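As an illustration, the corresponding agent.toml fragment might look like the following. This is a sketch that assumes the option takes a list of package paths. Check the comments in your copy of the sample configuration for the exact value format.

```toml
[agent]
# Assumed format: a list of full package paths of the plugins to activate.
allow-compute-plugins = ["ai.backend.accelerator.cuda_open"]
```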
Configuring Overlay Networks for Multi-node Training (Optional)
All other features of Backend.AI except multi-node training work without this configuration. The Docker Swarm mode is used to configure overlay networks to ensure privacy between cluster sessions, while container monitoring and configuration are handled by Backend.AI itself.
Currently, cross-node inter-container overlay routing is controlled via Docker Swarm’s overlay networks. On the manager node, create a Swarm. On each agent node, join that Swarm. Then restart all manager and agent daemons for the change to take effect.
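The Swarm steps can be sketched with the stock Docker CLI as follows. This is a minimal sketch, not Backend.AI tooling: the function names are hypothetical, MANAGER_IP and TOKEN are placeholders, agents are assumed to join as workers, and 2377 is Docker's default Swarm management port, assumed to be unchanged and reachable from the agent nodes.

```shell
# Run on the manager node: create the Swarm and advertise an address
# that the agent nodes can reach.
#   $1: manager IP
swarm_create() {
  docker swarm init --advertise-addr "$1"
}

# Run on the manager node: print only the token that workers use to join.
swarm_worker_token() {
  docker swarm join-token -q worker
}

# Run on each agent node: join the Swarm created on the manager.
#   $1: worker token, $2: manager IP
swarm_join() {
  docker swarm join --token "$1" "$2:2377"
}

# Possible usage (placeholders, not real values):
#   manager$ swarm_create MANAGER_IP && swarm_worker_token
#   agent$   swarm_join TOKEN MANAGER_IP
```

Running docker node ls on the manager afterwards should list every joined node, which is a convenient check before restarting the daemons.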