# leylines

this repo enables managing a [dask](https://dask.org) cluster using [wireguard](https://www.wireguard.com/) to link nodes which may be separated by WAN[^1]

it includes an opinionated mini wireguard manager (on the server side; workers use wg-quick) that doubles as an [ansible](https://www.ansible.com/) inventory plugin. finally, ansible playbooks can run setup and deployment for dask nodes

## how to

### install the server

```bash
(cd leylines-monocypher && pip3 install --user .)
(cd leylines && pip3 install --user .)
mkdir -p ~/.config/leylines
```

ok now take a moment to edit `leylines-support/leylines-daemon.service` so it runs as your user (change `User=` and `Group=`). put that into your `/etc/systemd/system` and then do

```bash
sudo systemctl enable --now leylines-daemon
```

congrats, wireguard should be up.

next, edit `leylines-support/nginx.conf` (change the listen address and the SSL certificate paths -- point those towards the letsencrypt directories for a domain you already provisioned and that your nginx is serving). put that block into your `/etc/nginx/nginx.conf`. to expose your dask dashboard publicly, also adjust `leylines-support/nginx-http.conf` to your needs and include it in an http server block. it may be advantageous to do that first, then run `certbot` on the domain to get the certs provisioned, and then set up the `stream` block using the same certs that certbot inserted for https

then run

```bash
sudo nginx -s reload
```

### install client

now that the server is running, you may choose to access it remotely. make a note of `leylines print-token` -- this is the auth token you will need. on your client (local laptop, or something)

```bash
(cd leylines && pip3 install --user .)
mkdir -p ~/.config/leylines
echo "auth token here" > ~/.config/leylines/token
echo "mycluster.domain.lgbt" > ~/.config/leylines/host
```

now you can access your server using the CLI. initialize it and add some nodes. in the `init` command, provide the server's externally-facing public IP and an SSH key that can be used to access it for ansible. then, to add workers, provide a name and an SSH key for each one

```bash
leylines init -n myserver -i 1.2.3.4 -k path/to/ssh-key
leylines add -n worker-0 -k path/to/ssh-key
...
leylines add -n worker-n -k path/to/ssh-key
```

sync wireguard settings (this applies the configuration to the server's wireguard interface)

```bash
leylines sync
```

get status

```bash
leylines status
```

### connect a worker

get config for a node

```bash
leylines get-conf
```

manually copy that config to your worker node as `/etc/wireguard/leyline-wg.conf` and then `systemctl enable --now wg-quick@leyline-wg`

currently the wireguard topology is a star. this doesn't actually work optimally for my config, where some nodes are colocated and should have direct connections to each other while others should go over WAN to reach distant nodes. this will be changed in a later version

### provision workers

run the ansible playbook. this will provision the needed components for dask on the server and all workers

```bash
cd leylines-ansible
ansible-playbook -i leylines_inv.py playbook-setup.yml
```
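if the playbook can't find your hosts, it can help to check that the inventory plugin is working first. this is plain ansible tooling, nothing leylines-specific -- a minimal sanity check, assuming you are still in `leylines-ansible`:

```bash
# optional: dump the inventory the leylines plugin generates
# (the node names and wireguard addresses should line up with `leylines status`)
ansible-inventory -i leylines_inv.py --list
```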
the first run will take a while. it builds python 3.9.5 and installs it, then builds a virtualenv with the python dependencies in it, and then installs and starts systemd user services

now you can open `:31336` to view the dask dashboard (or if you are proxying it with nginx, it should be available there too)

use the cluster with

```python
from dask.distributed import Client

client = Client(":31337")
```

or more easily

```python
from leylines.dask import init_dask

client = init_dask()
```

or

```python
from leylines.dask import init_dask_async

client = await init_dask_async()
```

`leylines.dask` also provides `tqdmprogress`, which can be used in place of `distributed.diagnostics.progress` for a task monitor using `tqdm`, and `tqdm_await`, which can be used with an iterable of dask futures to display progress as they complete (but only for async clients)

```python
futures = [...]  # some list of dask futures
async for fut in tqdm_await(futures):
    print(fut.result())
```

### time for magic

copy `leylines-support/02-dask.py` into `~/.ipython/profile_default/startup`

this provides 2 new spells: `%dask` connects to your cluster, and `%daskworker` splits off a new ipython console on a worker selected for having free RAM available and not being busy. this is useful for ad-hoc code testing on a real worker

`%dask` also installs `client`, a reference to the client, `tqdmprogress` from `leylines.dask`, and `upload`, which uploads a file and returns a delayed function that will fetch the filename on a worker

### resources

there is an abstract idea of nodes having resources, which can be controlled by `leylines add-resource` and `leylines del-resource` (and `leylines status` shows you the resources). currently these are assigned with quantity 1 when starting the workers. due to a limitation of dask, every worker process inherits the same quantity of resources

you can assign resources in a more ad-hoc way by opening an ipython session to a worker and calling `await distributed.get_worker().set_resources(someresource=1)`, which will _temporarily_ assign that resource to the worker

if you modify resources through leylines you will need to run the ansible playbook again to apply the changes. you can use `--start-at-task "install systemd task"` to save some time
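for completeness, here is a minimal sketch of how a resource is consumed on the dask side once workers advertise it. the resource name `gpu` and the scheduler address are placeholders for this example, not something leylines sets up for you:

```python
from dask.distributed import Client

# placeholder address; in practice init_dask() from leylines.dask connects for you
client = Client("tcp://10.11.0.1:31337")

def needs_gpu(x):
    # stand-in for work that should only run on a worker holding the "gpu" resource
    return x * 2

# dask's standard per-task resource request; use whatever name you registered
# with `leylines add-resource`
fut = client.submit(needs_gpu, 21, resources={"gpu": 1})
print(fut.result())
```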