# leylines

this repo lets you manage a [dask](https://dask.org) cluster that uses [wireguard](https://www.wireguard.com/) to link nodes which may be separated by WAN[^1]. it includes an opinionated mini wireguard manager (server side only; workers use wg-quick) that doubles as an [ansible](https://www.ansible.com/) inventory plugin. finally, ansible playbooks handle setup and deployment of dask nodes.
## how to

### install the server
```bash
(cd leylines-monocypher && pip3 install --user .)
(cd leylines && pip3 install --user .)
mkdir -p ~/.config/leylines
```
ok, now take a moment to edit `leylines-support/leylines-daemon.service` so that it runs as your user (change `User=` and `Group=`). copy it into `/etc/systemd/system` and then run:
```bash
sudo systemctl enable --now leylines-daemon
```
congrats, wireguard should be up. next, edit `leylines-support/nginx.conf`: change the listen address and the SSL certificate paths, pointing the latter at the letsencrypt directories for a domain you have already provisioned and that your nginx is serving. put that block into your `/etc/nginx/nginx.conf`.

to export your dask dashboard publicly, also adjust `leylines-support/nginx-http.conf` to your needs and include it in an http server block. it may be easiest to do that first, then run `certbot` on the domain to provision the certs, and then set up the `stream` block using the same certs that certbot inserted for https.

then run:
```bash
# check the config for errors first, then reload
sudo nginx -t && sudo nginx -s reload
```
### install client

now that the server is running, you may want to access it remotely. run `leylines print-token` on the server and make a note of its output: this is the auth token you will need. then, on your client (your local laptop, for example):
```bash
(cd leylines && pip3 install --user .)
mkdir -p ~/.config/leylines
echo "auth token here" > ~/.config/leylines/token
echo "mycluster.domain.lgbt" > ~/.config/leylines/host
```
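the `echo` commands above leave the token file readable by other local users. as a hedged alternative, here is a small python sketch that writes the same two files but creates `token` with mode 0600. `write_client_config` is a hypothetical helper, not part of leylines, and it assumes the client only needs these two files (which is what the commands above suggest).

```python
import os
from pathlib import Path
from typing import Optional


def write_client_config(token: str, host: str, config_dir: Optional[Path] = None) -> Path:
    """Write the leylines `token` and `host` files, keeping the token private (hypothetical helper)."""
    if config_dir is None:
        config_dir = Path.home() / ".config" / "leylines"
    config_dir.mkdir(parents=True, exist_ok=True)

    token_path = config_dir / "token"
    # create the file as 0600 so a newly created token is never readable by others
    fd = os.open(token_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token.strip() + "\n")  # trailing newline, like `echo`
    # the mode passed to os.open only applies on creation, so also fix an existing file
    os.chmod(token_path, 0o600)

    (config_dir / "host").write_text(host.strip() + "\n")
    return config_dir
```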
now you can access your server using the CLI. initialize it and add some nodes. in the `init` command, provide the server's externally-facing public IP and an SSH key that ansible can use to access it. then, to add workers, provide a name and an SSH key for each one:
```bash
leylines init -n myserver -i 1.2.3.4 -k path/to/ssh-key
leylines add -n worker-0 -k path/to/ssh-key
# ...
leylines add -n worker-n -k path/to/ssh-key
```
sync wireguard settings (this applies the configuration to the server's wireguard interface):
```bash
leylines sync
```
get the status:
```bash
leylines status
```
### connect a worker

get the config for a node:
```bash
leylines get-conf <id>
```
manually copy that config to your worker node as `/etc/wireguard/leyline-wg.conf`, and then (as root) run `systemctl enable --now wg-quick@leyline-wg`.
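for reference, a worker config in a star topology typically has a single `[Peer]` (the server) whose `AllowedIPs` spans the whole tunnel subnet. the sketch below renders such a file in wg-quick format; `WorkerConf`, `render_wg_quick`, the addresses, and the port are illustrative assumptions, and the real output of `leylines get-conf` may differ.

```python
from dataclasses import dataclass


@dataclass
class WorkerConf:
    private_key: str        # the worker's own wireguard private key
    address: str            # the worker's tunnel address, e.g. "10.0.0.2/32" (hypothetical)
    server_public_key: str  # the server's wireguard public key
    server_endpoint: str    # "<server public ip>:<port>"
    network: str            # the whole tunnel subnet, e.g. "10.0.0.0/24" (hypothetical)


def render_wg_quick(conf: WorkerConf, keepalive: int = 25) -> str:
    """Render a wg-quick config whose only peer is the server (star topology)."""
    return "\n".join([
        "[Interface]",
        f"PrivateKey = {conf.private_key}",
        f"Address = {conf.address}",
        "",
        "[Peer]",
        f"PublicKey = {conf.server_public_key}",
        f"Endpoint = {conf.server_endpoint}",
        # route the entire tunnel subnet via the server, so traffic to other
        # workers is relayed through it (assuming the server forwards packets)
        f"AllowedIPs = {conf.network}",
        # keep NAT mappings alive for workers behind routers or firewalls
        f"PersistentKeepalive = {keepalive}",
        "",
    ])
```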

currently the wireguard topology is a star: every worker connects only to the server. this isn't optimal for my setup, where some nodes are colocated and should connect to each other directly, while others need to go over WAN to reach distant nodes. this will change in a later version.
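to make the trade-off concrete, here is a toy model (hypothetical names, not how leylines or wireguard picks routes) that finds the fewest-hop path between nodes. in a pure star, traffic between two colocated workers detours through the server; adding a direct link between them removes that hop.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_links(server: str, workers: List[str],
                direct: Iterable[Tuple[str, str]] = ()) -> Dict[str, Set[str]]:
    """Star around `server`, plus optional direct links between workers."""
    links: Dict[str, Set[str]] = {server: set(workers)}
    for w in workers:
        links[w] = {server}
    for a, b in direct:  # e.g. two colocated workers
        links[a].add(b)
        links[b].add(a)
    return links


def route(links: Dict[str, Set[str]], src: str, dst: str) -> Optional[List[str]]:
    """Fewest-hop path from src to dst using breadth-first search, or None."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # walk the predecessor chain back to src, then reverse it
            path = [node]
            while prev[path[-1]] is not None:
                path.append(prev[path[-1]])
            return path[::-1]
        for nxt in sorted(links[node]):  # sorted for deterministic output
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None
```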
### provision workers

run the ansible playbook. this provisions the components dask needs on the server and on all workers:
```bash
cd leylines-ansible
ansible-playbook -i leylines_inv.py playbook-setup.yml
```
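ansible runs executable files passed with `-i` as dynamic inventory scripts, which print JSON when called with `--list`. the sketch below shows roughly the shape such output takes; the group names, host variables, and addresses are assumptions for illustration, not necessarily what `leylines_inv.py` emits.

```python
import json
import sys
from typing import Dict, List


def build_inventory(server: Dict[str, str], workers: List[Dict[str, str]]) -> dict:
    """Build ansible dynamic-inventory JSON in the shape expected from `--list`."""
    hostvars = {}
    for node in [server, *workers]:
        hostvars[node["name"]] = {
            "ansible_host": node["ip"],  # which address leylines uses is an assumption
            "ansible_ssh_private_key_file": node["key"],
        }
    return {
        "server": {"hosts": [server["name"]]},               # hypothetical group name
        "workers": {"hosts": [w["name"] for w in workers]},  # hypothetical group name
        # _meta.hostvars lets ansible skip calling the script with --host per node
        "_meta": {"hostvars": hostvars},
    }


if __name__ == "__main__":
    if "--list" in sys.argv:
        inv = build_inventory(
            {"name": "myserver", "ip": "10.0.0.1", "key": "path/to/ssh-key"},
            [{"name": "worker-0", "ip": "10.0.0.2", "key": "path/to/ssh-key"}],
        )
        print(json.dumps(inv, indent=2))
    elif "--host" in sys.argv:
        print("{}")  # host variables are already provided under _meta
```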
the first run will take a while: it builds and installs python 3.9.5, builds a virtualenv with the python dependencies, and then installs and starts systemd user services.

now you can open `<your server's wireguard ip>:31336` to view the dask dashboard (or, if you are proxying it with nginx, through your nginx domain as well).

use the cluster with:
```python
from dask.distributed import Client

client = Client("<your server's wireguard ip>:31337")
```
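if `Client(...)` hangs or times out, a quick way to tell a wireguard or firewall problem from a dask problem is to check whether the scheduler (31337) and dashboard (31336) ports accept TCP connections at all. here is a minimal standard-library sketch; `port_open` is a hypothetical helper, called e.g. as `port_open("<your server's wireguard ip>", 31337)`.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
```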
or, more easily:
```python
from leylines.dask import init_dask

client = init_dask()
```
or, from async code:
```python
from leylines.dask import init_dask_async

# must run inside a coroutine (or an ipython session with top-level await)
client = await init_dask_async()
```
`leylines.dask` also provides `tqdmprogress`, which can be used in place of `distributed.diagnostics.progress` for a `tqdm`-based task monitor, and `tqdm_await`, which takes an iterable of dask futures and displays progress as they complete (async clients only):
```python
from leylines.dask import tqdm_await

futures = ...  # some list of dask futures from an async client
# pbar is optional: pass pbar=<tqdm instance> to reuse an existing progress bar
# (async for must run inside a coroutine or ipython with top-level await)
async for fut in tqdm_await(futures):
    print(fut.result())
```
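the idea behind `tqdm_await` can be sketched with plain asyncio: iterate over awaitables in completion order and tick a progress bar for each one. this is not the leylines implementation. `progress_as_completed` is a hypothetical name; it uses `asyncio.as_completed`, yields results rather than futures, and accepts any `pbar` object with an `update(n)` method (such as a `tqdm` instance). dask futures from an async client are awaitable, so the pattern should carry over, but that is an untested assumption; dask's own `distributed.as_completed` is another option.

```python
import asyncio
from typing import Any, AsyncIterator, Iterable, Optional


async def progress_as_completed(aws: Iterable[Any], pbar: Optional[Any] = None) -> AsyncIterator[Any]:
    """Yield each awaitable's result as it finishes, calling pbar.update(1) for each."""
    for next_done in asyncio.as_completed(list(aws)):
        result = await next_done
        if pbar is not None:
            pbar.update(1)
        yield result
```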
### time for magic

copy `leylines-support/02-dask.py` into `~/.ipython/profile_default/startup`.

this provides 2 new spells: `%dask` connects to your cluster, and `%daskworker` opens a new ipython console on a worker chosen because it has free RAM and is not busy. this is useful for ad-hoc code testing on a real worker.

`%dask` also installs `client` (a reference to the client), `tqdmprogress` from `leylines.dask`, and `upload`, which uploads a file and returns a delayed function that fetches the file's name on a worker.
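a sketch of the kind of selection `%daskworker` describes: skip busy workers, then take the one with the most free memory. it works on pre-digested dicts with hypothetical field names; in practice per-worker details come from `client.scheduler_info()["workers"]`, but the exact keys vary between dask versions, and this is not necessarily the rule `02-dask.py` uses.

```python
from typing import Dict, List, Optional


def pick_worker(workers: List[Dict], min_free_bytes: int = 0) -> Optional[str]:
    """Pick an idle worker with the most free memory, or None if none qualify.

    Each entry is a hypothetical summary:
    {"address": str, "memory_limit": int, "memory_used": int, "executing": int}
    """
    candidates = []
    for w in workers:
        free = w["memory_limit"] - w["memory_used"]
        if w["executing"] == 0 and free >= min_free_bytes:
            candidates.append((free, w["address"]))
    if not candidates:
        return None
    # max() on (free, address) tuples: most free memory wins, address breaks ties
    return max(candidates)[1]
```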
### resources

there is an abstract notion of nodes having resources, managed with `leylines add-resource` and `leylines del-resource` (`leylines status` shows you the resources). currently these are assigned with quantity 1 when the workers start. due to a limitation of dask, every worker process inherits the same quantity of resources. you can assign resources in a more ad-hoc way by opening an ipython session on a worker and calling `await distributed.get_worker().set_resources(someresource=1)`, which _temporarily_ assigns that resource to the worker. if you modify resources through leylines, you will need to run the ansible playbook again to apply the changes; you can pass `--start-at-task "install systemd task"` to save some time.
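to see why the per-process inheritance matters, here is a toy calculation (hypothetical helper, ignoring thread limits) of how many tasks requesting a resource can run at once on one node. a node that advertises `gpu=1` (a hypothetical resource name) but runs 4 worker processes ends up accepting 4 concurrent `gpu` tasks.

```python
from typing import Dict


def concurrent_capacity(node_resources: Dict[str, int], nprocs: int,
                        task_needs: Dict[str, int]) -> int:
    """How many tasks needing `task_needs` can run at once on one node.

    Models the limitation above: each of the node's `nprocs` worker processes
    advertises the node's full resource quantities.
    """
    if not task_needs:
        raise ValueError("task_needs must name at least one resource")
    # the scarcest requested resource bounds how many tasks fit in one process
    per_process = min(node_resources.get(name, 0) // amount
                      for name, amount in task_needs.items())
    return per_process * nprocs
```

tasks request resources through dask's standard API, e.g. `client.submit(fn, resources={"someresource": 1})`.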