# leylines
this repo enables managing a [dask](https://dask.org) cluster using
[wireguard](https://www.wireguard.com/) to link nodes which may be separated by WAN[^1] and includes
an opinionated mini wireguard manager (on the server side, workers use wg-quick) that doubles as an
[ansible](https://www.ansible.com/) inventory plugin. finally, ansible playbooks can run setup and
deployment for dask nodes
## how to
### install the server
```bash
(cd leylines-monocypher && pip3 install --user .)
(cd leylines && pip3 install --user .)
mkdir -p ~/.config/leylines
```
ok now take a moment to edit `leylines-support/leylines-daemon.service` to be running as your user
(change `User=` and `Group=`). put that into your `/etc/systemd/system` and then do
```bash
sudo systemctl enable --now leylines-daemon
```
congrats, wireguard should be up. next, edit `leylines-support/nginx.conf` (change the listen address
and the SSL certificate paths -- point those towards letsencrypt directories for a domain you
already provisioned that your nginx is serving). put that block into your `/etc/nginx/nginx.conf`.
to export your dask dashboard publicly, also adjust `leylines-support/nginx-http.conf` to your needs
and include it in an http server block. it may be advantageous to do that first, then run `certbot`
on the domain to get the certs provisioned, and then set up the `stream` block using the same certs
as certbot inserted for https
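for reference, a `stream` block of the kind described above might look roughly like the following. the listen address, certificate paths, and the upstream port are all placeholders -- adjust them to your domain and to wherever the daemon actually listens:

```nginx
stream {
    server {
        # TLS-terminate incoming connections on the public address and
        # forward them to the local daemon (port here is a placeholder)
        listen 1.2.3.4:443 ssl;
        ssl_certificate     /etc/letsencrypt/live/mycluster.domain.lgbt/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/mycluster.domain.lgbt/privkey.pem;
        proxy_pass 127.0.0.1:31338;
    }
}
```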
then run
```bash
sudo nginx -s reload
```
### install client
now that the server is running, you may choose to access it remotely. make a note of the output of
`leylines print-token` -- this is the auth token you will need. on your client (local laptop, or something)
```bash
(cd leylines && pip3 install --user .)
mkdir -p ~/.config/leylines
echo "auth token here" > ~/.config/leylines/token
echo "mycluster.domain.lgbt" > ~/.config/leylines/host
```
now you can access your server using the CLI. initialize it and add some nodes. in the `init`
command provide the server's externally-facing public IP, and provide an SSH key that can be used to
access it for ansible. then, to add workers provide a name for each one and an SSH key
```bash
leylines init -n myserver -i 1.2.3.4 -k path/to/ssh-key
leylines add -n worker-0 -k path/to/ssh-key
...
leylines add -n worker-n -k path/to/ssh-key
```
sync wireguard settings (this applies the configuration to the server's wireguard interface)
```bash
leylines sync
```
get status
```bash
leylines status
```
### connect a worker
get config for a node
```bash
leylines get-conf <id>
```
manually copy that config to `/etc/wireguard/leyline-wg.conf` on your worker node, then run
`systemctl enable --now wg-quick@leyline-wg`
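for reference, the fetched config is a standard wg-quick file and will look roughly like the following -- the keys, addresses, and ports here are placeholders, so use exactly what `get-conf` prints:

```ini
[Interface]
# the worker's private key and its address inside the leylines network
PrivateKey = <worker-private-key>
Address = 10.0.0.2/32

[Peer]
# the server; traffic for the wireguard subnet is routed through it
PublicKey = <server-public-key>
Endpoint = 1.2.3.4:51820
AllowedIPs = 10.0.0.0/24
PersistentKeepalive = 25
```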
currently the wireguard topology is a star. this doesn't actually work optimally for my config,
where some nodes are colocated and should have direct connections to each other and others should go
over WAN to reach distant nodes. this will be changed in a later version
### provision workers
run the ansible playbook. this will provision the needed components for dask on the server and all
workers
```bash
cd leylines-ansible
ansible-playbook -i leylines_inv.py playbook-setup.yml
```
the first run will take a while. it builds python 3.9.5 and installs it, then builds a virtualenv
with python dependencies in it, and then installs and starts systemd user services
now you can open `<your server's wireguard ip>:31336` to view the dask dashboard (or if you are
proxying it with nginx, it should be available there too)
use the cluster with
```python
from dask.distributed import Client
client = Client("<your server's wireguard ip>:31337")
```
or more easily
```python
from leylines.dask import init_dask
client = init_dask()
```
or
```python
from leylines.dask import init_dask_async
client = await init_dask_async()
```
`leylines.dask` also provides `tqdmprogress`, which can be used in place of
`distributed.diagnostics.progress` as a task monitor using `tqdm`, and `tqdm_await`, which can be
used with an iterable of dask futures to display progress as they complete (but only for async clients)
```python
futures = [ some list of futures ... ]
async for fut in tqdm_await(futures, pbar=<optional tqdm instance to use>):
    print(fut.result())
```
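conceptually, `tqdm_await` is just "yield futures as they complete, ticking a progress bar". a rough stdlib-only sketch of that pattern using `asyncio.as_completed` -- the real helper wraps dask futures and a `tqdm` bar, so here plain coroutines and a printed counter stand in:

```python
import asyncio

async def progress_await(aws):
    """yield awaitables as they complete, reporting progress.
    toy stand-in for tqdm_await: a real version would tick a tqdm bar."""
    done, total = 0, len(aws)
    for fut in asyncio.as_completed(aws):
        result = await fut
        done += 1
        print(f"{done}/{total} done")
        yield result

async def main():
    async def work(n):
        await asyncio.sleep(0.01 * n)
        return n * n
    results = []
    async for r in progress_await([work(i) for i in range(3)]):
        results.append(r)
    return results

print(sorted(asyncio.run(main())))  # → [0, 1, 4]
```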
### time for magic
copy `leylines-support/02-dask.py` into `~/.ipython/profile_default/startup`
this provides 2 new spells: `%dask` connects to your cluster, and `%daskworker` splits off a new
ipython console on a worker selected by having free RAM available and not being busy. this is useful
for ad-hoc code testing on a real worker
`%dask` also installs `client`, a reference to the client, and `tqdmprogress` from `leylines.dask`,
and `upload`, which uploads a file and returns a delayed function that will fetch the file on a
worker
### resources
there is an abstract idea of nodes having resources, which can be controlled by `leylines
add-resource` and `leylines del-resource` (`leylines status` shows the resources). currently
these are assigned with quantity 1 when starting the workers. due to a limitation of dask, every
worker process inherits the same quantity of resources. you can assign resources in a more ad-hoc
way by opening an ipython session to a worker and calling `await
distributed.get_worker().set_resources(someresource=1)`, which will _temporarily_ assign that to
the worker. if you modify resources through leylines you will need to run the ansible playbook again
to apply the changes. you can use `--start-at-task "install systemd task"` to save some time
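to illustrate the matching rule dask applies, here is a toy model (this sketches the behavior, not dask's implementation): a task that requires resources may only run on a worker whose declared resources cover every requirement

```python
# toy model of dask-style resource matching (not dask internals):
# a worker is eligible for a task iff it declares at least the required
# quantity of every resource the task asks for
def eligible_workers(task_requirements, workers):
    return [
        name
        for name, resources in workers.items()
        if all(resources.get(res, 0) >= qty
               for res, qty in task_requirements.items())
    ]

workers = {
    "worker-0": {"gpu": 1},
    "worker-1": {},                       # no special resources
    "worker-2": {"gpu": 1, "bigmem": 1},
}
print(eligible_workers({"gpu": 1}, workers))  # → ['worker-0', 'worker-2']
```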