Install Backend.AI Manager

Refer to Prepare required Python versions and virtual environments to set up Python and a virtual environment for the service.
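
If you are setting this up from scratch, a minimal sketch using the standard venv module looks like the following (this assumes a compatible Python 3 interpreter is already installed; follow the referenced guide for pyenv-based setups):

$ mkdir -p "${HOME}/manager"
$ cd "${HOME}/manager"
$ python3 -m venv .venv
$ source .venv/bin/activate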

Install the latest version of Backend.AI Manager for the current Python version:

$ cd "${HOME}/manager"
$ # Activate a virtual environment if needed.
$ pip install -U backend.ai-manager

If you want to install a specific version:

$ pip install -U backend.ai-manager==${BACKEND_PKG_VERSION}
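
For example, to pin a specific release (the version number below is only illustrative; substitute the release you actually target):

$ BACKEND_PKG_VERSION="24.03.1"
$ pip install -U "backend.ai-manager==${BACKEND_PKG_VERSION}"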

For the first-time setup, you must generate an RPC keypair. This keypair is used for authentication and encryption of the manager-to-agent RPC connections. Even if you do not want to encrypt the RPC channel, you still need to have a keypair.

$ cd "${HOME}/manager"
$ mkdir fixtures
$ backend.ai mgr generate-rpc-keypair fixtures 'manager'
2024-04-23 12:06:31.913 INFO ai.backend.manager.cli [21024] Generating a RPC keypair...
Public Key: >B-mF}N{WygT92d&=Kceix$7cWzg!dT])rIc39=S (stored at fixtures/manager.key)
Secret Key: g.4&?*b&0#oRRC9?DMO[SUXikjKZ7nYj!bzFJN92 (stored at fixtures/manager.key_secret)

You now have an RPC keypair for the manager: the public key is stored in fixtures/manager.key and the secret key in fixtures/manager.key_secret.
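
Since the secret key authenticates the Manager's RPC connections, you may want to restrict its file permissions (an optional hardening step, not required by the installer):

$ chmod 600 fixtures/manager.key_secret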

Local configuration

Backend.AI Manager uses a TOML file (manager.toml) for its local service configuration. Refer to the manager.toml sample file for a detailed description of each section and item. An example configuration:

[etcd]
namespace = "local"
addr = { host = "bai-m1", port = 8120 }
user = ""
password = ""

[db]
type = "postgresql"
addr = { host = "bai-m1", port = 8100 }
name = "backend"
user = "postgres"
password = "develove"

[manager]
num-proc = 2
service-addr = { host = "0.0.0.0", port = 8081 }
# user = "bai"
# group = "bai"
ssl-enabled = false

heartbeat-timeout = 40.0
rpc-auth-manager-keypair = "fixtures/manager.key_secret"
pid-file = "manager.pid"
disabled-plugins = []
hide-agents = true
# event-loop = "asyncio"
# importer-image = "lablup/importer:manylinux2010"
distributed-lock = "filelock"

[docker-registry]
ssl-verify = false

[logging]
level = "INFO"
drivers = ["console", "file"]

[logging.pkg-ns]
"" = "WARNING"
"aiotools" = "INFO"
"aiopg" = "WARNING"
"aiohttp" = "INFO"
"ai.backend" = "INFO"
"alembic" = "INFO"

[logging.console]
colored = true
format = "verbose"

[logging.file]
path = "./logs"
filename = "manager.log"
backup-count = 10
rotation-size = "10M"

[debug]
enabled = false
enhanced-aiomonitor-task-info = true

Save the contents to ${HOME}/.config/backend.ai/manager.toml; Backend.AI automatically recognizes this location. Adjust each field to match your system.
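
For example, assuming you edited the file as ${HOME}/manager/manager.toml, you can place it at the recognized location like this:

$ mkdir -p "${HOME}/.config/backend.ai"
$ cp "${HOME}/manager/manager.toml" "${HOME}/.config/backend.ai/manager.toml"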

Global configuration

Etcd (cluster) stores configurations that are shared globally across all nodes. Some of them must be populated before starting the service.

Note

It might be a good idea to create a backup of the current Etcd configuration before modifying the values. You can do so by simply executing:

$ backend.ai mgr etcd get --prefix "" > ./etcd-config-backup.json

To restore the backup:

$ backend.ai mgr etcd delete --prefix ""
$ backend.ai mgr etcd put-json "" ./etcd-config-backup.json

The commands below should be executed in the ${HOME}/manager directory.

To list the keys under a specific prefix from Etcd, for example the config prefix:

$ backend.ai mgr etcd get --prefix config

Now, configure the Redis access information. The Redis service should be accessible from all nodes.

$ backend.ai mgr etcd put config/redis/addr "bai-m1:8110"
$ backend.ai mgr etcd put config/redis/password "develove"
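
You can read the values back to confirm they are stored as intended:

$ backend.ai mgr etcd get --prefix config/redis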

Set the container registry. The following uses Lablup's open registry (cr.backend.ai). You can set your own registry with a username and password if needed. This can also be configured via the GUI.

$ backend.ai mgr etcd put config/docker/image/auto_pull "tag"
$ backend.ai mgr etcd put config/docker/registry/cr.backend.ai "https://cr.backend.ai"
$ backend.ai mgr etcd put config/docker/registry/cr.backend.ai/type "harbor2"
$ backend.ai mgr etcd put config/docker/registry/cr.backend.ai/project "stable"
$ # backend.ai mgr etcd put config/docker/registry/cr.backend.ai/username "bai"
$ # backend.ai mgr etcd put config/docker/registry/cr.backend.ai/password "secure-password"
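
Likewise, you can verify the registry configuration you just stored:

$ backend.ai mgr etcd get --prefix config/docker/registry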

Also, populate the Storage Proxy configuration into Etcd:

$ # Allow project (group) folders.
$ backend.ai mgr etcd put volumes/_types/group ""
$ # Allow user folders.
$ backend.ai mgr etcd put volumes/_types/user ""
$ # Default volume host. The proxy name here is "bai-m1" and the volume name is "local".
$ backend.ai mgr etcd put volumes/default_host "bai-m1:local"
$ # Set the "bai-m1" proxy information.
$ # User (browser) facing API endpoint of the Storage Proxy.
$ # Do not use a host alias here; it must be a URL reachable from users' browsers.
$ backend.ai mgr etcd put volumes/proxies/bai-m1/client_api "http://127.0.0.1:6021"
$ # Manager-facing internal API endpoint of the Storage Proxy.
$ backend.ai mgr etcd put volumes/proxies/bai-m1/manager_api "http://bai-m1:6022"
$ # Random secret string used by the Manager to authenticate its requests to the Storage Proxy.
$ backend.ai mgr etcd put volumes/proxies/bai-m1/secret "secure-token-to-authenticate-manager-request"
$ # Option to disable SSL verification for the Storage Proxy.
$ backend.ai mgr etcd put volumes/proxies/bai-m1/ssl_verify "false"

Check if the configuration is properly populated:

$ backend.ai mgr etcd get --prefix volumes

Note that you have to change the secret to a unique random string for secure communication between the Manager and the Storage Proxy. The most recent set of parameters can be found in sample.etcd.volumes.json.
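
Any sufficiently long random string works as the secret. For example, one way to generate one (assuming openssl is available on the host):

$ openssl rand -hex 32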

To enable access to the volumes defined by the Storage Proxy for every user, you need to update the allowed_vfolder_hosts column of the domains table to hold the storage volume reference (e.g., bai-m1:local). You can do this by issuing an SQL statement directly inside the PostgreSQL container:

$ vfolder_host_val='{"bai-m1:local": ["create-vfolder", "modify-vfolder", "delete-vfolder", "mount-in-session", "upload-file", "download-file", "invite-others", "set-user-specific-permission"]}'
$ docker compose -f "$HOME/halfstack/postgres-cluster-default" exec -it backendai-half-db psql -U postgres -d backend \
      -c "UPDATE domains SET allowed_vfolder_hosts = '${vfolder_host_val}' WHERE name = 'default';"

Populate the database with initial fixtures

You need to prepare an alembic.ini file under ${HOME}/manager to manage the database schema. Copy the sample halfstack.alembic.ini and save it as ${HOME}/manager/alembic.ini. Adjust the sqlalchemy.url field if the database connection information differs from the default. You may need to change localhost to bai-m1.
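
For reference, with the [db] settings shown above, the sqlalchemy.url line would look something like the following (adjust the driver, credentials, and host to your environment):

sqlalchemy.url = postgresql://postgres:develove@bai-m1:8100/backend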

Populate the database schema and initial fixtures. Copy the example JSON fixture files (such as example-keypairs.json and example-resource-presets.json) to their corresponding names (keypairs.json, resource-presets.json, and so on) and save them under ${HOME}/manager/. Customize them so that your initial superadmin and sample user accounts have unique keypairs and passwords for security.

$ backend.ai mgr schema oneshot
$ backend.ai mgr fixture populate ./users.json
$ backend.ai mgr fixture populate ./keypairs.json
$ backend.ai mgr fixture populate ./resource-presets.json
$ backend.ai mgr fixture populate ./set-user-main-access-keys.json

Sync the container registry information

You need to scan the image catalog and metadata from the container registry into the Manager. This is required to display the list of compute environments in the user web GUI (Web UI). Run the following command to sync with Lablup's public container registry:

$ backend.ai mgr image rescan cr.backend.ai

Run Backend.AI Manager service

You can run the service:

$ cd "${HOME}/manager"
$ python -m ai.backend.manager.server

Check if the service is running. The default Manager API port is 8081, but it can be configured in manager.toml:

$ curl bai-m1:8081
{"version": "v6.20220615", "manager": "22.09.6"}

Press Ctrl-C to stop the service.

Register systemd service

The service can be registered as a systemd daemon. This is recommended so that the service starts automatically after the host machine reboots, although it is entirely optional.

First, create a runner script at ${HOME}/bin/run-manager.sh:

#! /bin/bash
set -e

if [ -z "$HOME" ]; then
   export HOME="/home/bai"
fi

# -- If you have installed using static python --
# (Keep only the activation block that matches your installation method and remove the other.)
source .venv/bin/activate

# -- If you have installed using pyenv --
if [ -z "$PYENV_ROOT" ]; then
   export PYENV_ROOT="$HOME/.pyenv"
   export PATH="$PYENV_ROOT/bin:$PATH"
fi
eval "$(pyenv init --path)"
eval "$(pyenv virtualenv-init -)"

if [ "$#" -eq 0 ]; then
   exec python -m ai.backend.manager.server
else
   exec "$@"
fi

Make the script executable:

$ chmod +x "${HOME}/bin/run-manager.sh"
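
You can optionally test the runner script in the foreground before registering the systemd unit (stop it with Ctrl-C):

$ cd "${HOME}/manager"
$ "${HOME}/bin/run-manager.sh"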

Then, create a systemd service file at /etc/systemd/system/backendai-manager.service:

[Unit]
Description=Backend.AI Manager
Requires=network.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/home/bai/bin/run-manager.sh
PIDFile=/home/bai/manager/manager.pid
User=1100
Group=1100
WorkingDirectory=/home/bai/manager
TimeoutStopSec=5
KillMode=process
KillSignal=SIGTERM
PrivateTmp=false
Restart=on-failure
RestartSec=10
LimitNOFILE=5242880
LimitNPROC=131072

[Install]
WantedBy=multi-user.target

Finally, enable and start the service:

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now backendai-manager

$ # To check the service status
$ sudo systemctl status backendai-manager
$ # To restart the service
$ sudo systemctl restart backendai-manager
$ # To stop the service
$ sudo systemctl stop backendai-manager
$ # To check the service log and follow
$ sudo journalctl --output cat -u backendai-manager -f