Setup OS Environment
Backend.AI and its associated components share common requirements and configurations for proper operation. This section explains how to configure the OS environment.
Note
This section assumes installation on Ubuntu 20.04 LTS.
Create a user account for operation
We will create a user account bai to install and operate Backend.AI services. Set the UID and GID to 1100 to prevent conflicts with other users or groups. sudo privilege is required, so add bai to the sudo group.
$ username="bai"
$ password="secure-password"
$ sudo adduser --disabled-password --uid 1100 --gecos "" $username
$ echo "$username:$password" | sudo chpasswd
$ sudo usermod -aG sudo bai
If you do not want to expose your password in the shell history, remove the --disabled-password option and enter the password interactively when prompted. The chpasswd step is then unnecessary.
Log in as the bai user and continue the installation.
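Before continuing, it can help to confirm that the account really received the intended IDs. The following is a minimal sketch, not part of the official procedure; check_ids is a hypothetical helper name introduced here for illustration, and it relies only on the standard id command.

```shell
# Compare an account's numeric UID and primary GID with expected values.
# check_ids is a hypothetical helper, not a Backend.AI tool.
# Usage: check_ids USER EXPECTED_UID EXPECTED_GID
check_ids() {
    actual_uid=$(id -u "$1") || return 2   # id fails if the user does not exist
    actual_gid=$(id -g "$1") || return 2
    if [ "$actual_uid" = "$2" ] && [ "$actual_gid" = "$3" ]; then
        echo "ok: $1 has uid=$actual_uid gid=$actual_gid"
    else
        echo "mismatch: $1 has uid=$actual_uid gid=$actual_gid (expected $2/$3)"
        return 1
    fi
}

# On the target host, after creating the account:
#   check_ids bai 1100 1100
```

If the check reports a mismatch, the requested UID or GID was probably already in use when the account was created.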
Install Docker engine
Backend.AI requires Docker Engine to create compute sessions with the Docker container backend, and some service components are also deployed as containers, so installing Docker Engine is required. Ensure that docker-compose-plugin is installed as well so that the docker compose command is available.
After the installation, add the bai user to the docker group so that you do not need to prefix every Docker command with sudo.
$ sudo usermod -aG docker bai
Log out and log in again to apply the group membership change.
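After logging back in, the group change and the Compose plugin can be checked quickly. This is a sketch under the assumption that the standard id command is available; in_group is a hypothetical helper name, not something Docker or Backend.AI ships.

```shell
# Succeed if USER belongs to GROUP according to the account database.
# in_group is a hypothetical helper introduced for illustration.
# Usage: in_group USER GROUP
in_group() {
    for g in $(id -nG "$1"); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# On the target host:
#   in_group bai docker && echo "bai is in the docker group"
#   docker compose version   # prints the plugin version if it is installed
```

Note that id reads the account database, so it shows the new group immediately, while an already running login session only picks the group up after logging in again.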
Optimize sysctl/ulimit parameters
This step is not essential, but it is recommended to optimize the performance and stability of Backend.AI. Refer to the guide in the Manager repository for details of the kernel parameters and the ulimit settings. The optimal values may vary depending on the Backend.AI services you install. Each service installation section provides the values, if needed.
Note
Modern systems may have already set the optimal parameters. In that case, you can skip this step.
To cleanly separate the configurations, you may follow the steps below.
Save the resource limit parameters in /etc/security/limits.d/99-backendai.conf.
root hard nofile 512000
root soft nofile 512000
root hard nproc 65536
root soft nproc 65536
bai hard nofile 512000
bai soft nofile 512000
bai hard nproc 65536
bai soft nproc 65536
Log out and log in again to apply the resource limit changes.
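To see whether the new open-file limit is in effect for the current login, something like the following sketch can be used. limit_ok is a hypothetical helper name, and 512000 is the nofile value from the limits file above. Only nofile is checked because the ulimit option letter for the process limit differs between shells (bash uses ulimit -u).

```shell
# Succeed if a limit reported by ulimit meets a required minimum.
# "unlimited" always satisfies the requirement.
# limit_ok is a hypothetical helper introduced for illustration.
# Usage: limit_ok CURRENT REQUIRED
limit_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Report the soft (S) and hard (H) open-file limits of the current shell.
for kind in S H; do
    value=$(ulimit -"$kind"n)
    if limit_ok "$value" 512000; then status="ok"; else status="below 512000"; fi
    echo "nofile ($kind): $value ($status)"
done
```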
Save the kernel parameters in /etc/sysctl.d/99-backendai.conf.
fs.file-max=2048000
net.core.somaxconn=1024
net.ipv4.tcp_max_syn_backlog=1024
net.ipv4.tcp_slow_start_after_idle=0
net.ipv4.tcp_fin_timeout=10
net.ipv4.tcp_window_scaling=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_early_retrans=1
net.ipv4.ip_local_port_range="10000 65000"
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 12582912 16777216
net.ipv4.tcp_wmem=4096 12582912 16777216
vm.overcommit_memory=1
Apply the kernel parameters with sudo sysctl -p /etc/sysctl.d/99-backendai.conf.
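The live values can be read back from /proc/sys to confirm that they were applied. The sketch below assumes a Linux host; sysctl_path is a hypothetical helper name, and the three keys are only a sample of those listed above.

```shell
# Map a sysctl key to its file under /proc/sys (dots become slashes).
# sysctl_path is a hypothetical helper introduced for illustration.
# Usage: sysctl_path KEY
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Print the live value of a few of the keys set above.
for key in fs.file-max net.core.somaxconn vm.overcommit_memory; do
    path=$(sysctl_path "$key")
    if [ -r "$path" ]; then
        echo "$key = $(cat "$path")"
    else
        echo "$key is not readable on this system"
    fi
done
```

Running sysctl with a key name (for example sysctl net.core.somaxconn) gives the same information when the sysctl tool is installed.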
Prepare required Python versions and virtual environments
Prepare a Python distribution whose version meets the requirements of the target package. Backend.AI 22.09, for example, requires Python 3.10. The latest information on Python version compatibility can be found here.
There are several ways to prepare a specific Python version. Here, we use pyenv and pyenv-virtualenv.
Use pyenv to manually build and select a specific Python version
Install pyenv and pyenv-virtualenv. Then, install the Python version that you need:
$ pyenv install "${YOUR_PYTHON_VERSION}"
Note
You may need to install the suggested build environment to build Python with pyenv.
Then, you can create multiple virtual environments per service. To create a virtual environment for Backend.AI Manager 22.09.x and automatically activate it, for example, you may run:
$ mkdir "${HOME}/manager"
$ cd "${HOME}/manager"
$ pyenv virtualenv "${YOUR_PYTHON_VERSION}" bai-22.09-manager
$ pyenv local bai-22.09-manager
$ pip install -U pip setuptools wheel
You also need to make pip available to the Python installation with the latest wheel and setuptools packages, so that any non-binary extension packages can be compiled and installed on your system.
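Once the virtual environment is active, it is worth confirming that python resolves to the expected release. The following is a sketch only; version_matches is a hypothetical helper name, and 3.10 is the example requirement mentioned above for Backend.AI 22.09.

```shell
# Succeed if a full version string equals, or starts with, the required
# major.minor prefix (so 3.10.8 matches 3.10, but 3.1.0 and 3.100 do not).
# version_matches is a hypothetical helper introduced for illustration.
# Usage: version_matches FULL_VERSION REQUIRED_PREFIX
version_matches() {
    case "$1" in
        "$2"|"$2".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Inside the service directory (python resolves to the pyenv virtualenv):
#   current=$(python -c 'import platform; print(platform.python_version())')
#   version_matches "$current" 3.10 && echo "Python $current is compatible"
```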
Use a standalone statically built Python
We can use a standalone statically built Python.
Warning
Details will be added later.
Configure network aliases
Although not required, using network aliases instead of IP addresses can make setup and operation easier. Edit the /etc/hosts file on each node and append contents like the example below to access each server by its network alias.
##### BEGIN for Backend.AI services #####
10.20.30.10 bai-m1 # management node 01
10.20.30.20 bai-a01 # agent node 01 (GPU 01)
10.20.30.22 bai-a02 # agent node 02 (GPU 02)
##### END for Backend.AI services #####
Note that the IP addresses should be accessible from other nodes, if you are installing on multiple servers.
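To check which address an alias maps to in a hosts-format file, a small lookup such as the following sketch may be useful. lookup_alias is a hypothetical helper name; it reads only the file it is given, whereas getent hosts queries the system resolver.

```shell
# Print the IP address mapped to ALIAS in a hosts-format file.
# Everything after "#" on a line is treated as a comment and ignored.
# Exits non-zero if the alias is not found.
# lookup_alias is a hypothetical helper introduced for illustration.
# Usage: lookup_alias HOSTS_FILE ALIAS
lookup_alias() {
    awk -v name="$2" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++)
              if ($i == name) { print $1; found = 1; exit } }
        END { exit !found }
    ' "$1"
}

# On each node:
#   lookup_alias /etc/hosts bai-m1   # address recorded in the file
#   getent hosts bai-m1              # address the resolver actually returns
```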
Setup accelerators
If there are accelerators (e.g., GPU) on the server, you have to install the vendor-specific drivers and libraries to make sure the accelerators are properly set up and working. Please refer to the vendor documentation for the details.
To integrate NVIDIA GPUs:
Install the NVIDIA driver and CUDA toolkit.
Install the NVIDIA container toolkit (nvidia-docker2).
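A rough way to see which parts of the GPU stack are visible is sketched below. It assumes that the driver installs the nvidia-smi tool and that the container toolkit registers a Docker runtime named nvidia, as nvidia-docker2 does; other setups may register it differently. have_cmd is a hypothetical helper name.

```shell
# Succeed if a command is available on PATH.
# have_cmd is a hypothetical helper introduced for illustration.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report which pieces of the GPU stack are visible from this shell.
if have_cmd nvidia-smi; then
    echo "nvidia-smi found (driver tools installed)"
else
    echo "nvidia-smi not found"
fi

# Assumption: the toolkit registers a Docker runtime whose name contains "nvidia".
if have_cmd docker && docker info --format '{{json .Runtimes}}' 2>/dev/null | grep -q nvidia; then
    echo "Docker lists an nvidia runtime"
else
    echo "Docker does not list an nvidia runtime"
fi
```

This only checks visibility. The vendor documentation remains the reference for confirming that the GPUs actually work inside containers.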
Pull container images
For compute nodes, you need to pull some container images that are required for creating a compute session. Lablup provides a set of open container images and you may pull the following starter images:
$ docker pull cr.backend.ai/stable/filebrowser:21.02-ubuntu20.04
$ docker pull cr.backend.ai/stable/python:3.9-ubuntu20.04
$ docker pull cr.backend.ai/stable/python-pytorch:1.11-py38-cuda11.3
$ docker pull cr.backend.ai/stable/python-tensorflow:2.7-py38-cuda11.3
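When the same images have to be pulled on several compute nodes, a small loop that counts failures can be convenient. This is a sketch, not an official script: pull_images and the DOCKER override variable are hypothetical names introduced here, and the example call is a dry run that only prints the commands.

```shell
# Pull every image reference given as an argument and count failures.
# DOCKER may be overridden (for example DOCKER=echo) for a dry run.
# pull_images and DOCKER are hypothetical names introduced for illustration.
# Usage: pull_images IMAGE...
pull_images() {
    failed=0
    for image in "$@"; do
        "${DOCKER:-docker}" pull "$image" || failed=$((failed + 1))
    done
    echo "failed pulls: $failed"
    [ "$failed" -eq 0 ]
}

# Dry run that only prints the commands it would execute:
DOCKER=echo pull_images \
    cr.backend.ai/stable/filebrowser:21.02-ubuntu20.04 \
    cr.backend.ai/stable/python:3.9-ubuntu20.04
```

Dropping the DOCKER=echo prefix runs the real docker pull for each image. The function returns a non-zero status if any pull fails.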