# leylines
this repo manages a [dask](https://dask.org) cluster whose nodes are linked over
[wireguard](https://www.wireguard.com/), so they may be separated by WAN[^1]. it includes an
opinionated mini wireguard manager (server side only; workers use wg-quick) that doubles as an
[ansible](https://www.ansible.com/) inventory plugin, plus ansible playbooks that run setup and
deployment for the dask nodes
## how to
```bash
(cd leylines-monocypher && pip3 install --user .)
(cd leylines && pip3 install --user .)
mkdir -p ~/.config/leylines
leylines init -n myserver -i 1.2.3.4 -k path/to/ssh-key
leylines add -n worker-0 -k path/to/ssh-key
...
leylines add -n worker-n -k path/to/ssh-key
```
optionally copy the database to your laptop so you can run ansible locally (there will be some
actual API soon but not right now)
start a privileged shell (there is no service for the wireguard stuff yet -- coming soon)
```bash
systemd-run -tS --uid $(id -u) --gid $(id -g) -pAmbientCapabilities=CAP_NET_ADMIN
```
sync wireguard settings
```bash
leylines sync
```
get status
```bash
leylines status
```
get config for a node
```bash
leylines get-conf <id>
```
manually copy that config to `/etc/wireguard/leyline-wg.conf` on your worker node, then run
`systemctl enable --now wg-quick@leyline-wg`
currently the wireguard topology is a star. this is not optimal for my setup, where some nodes are
colocated and should connect to each other directly, while others have to go over WAN to reach
distant nodes. this will be changed (you may be sensing a pattern with the amount of TODOs)
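to make the difference concrete, here is a minimal sketch (not leylines code) comparing peer lists
under a star layout with a site-aware layout where colocated nodes also peer directly. the node
names, the `site` labels, and both function names are made up for illustration

```python
from collections import defaultdict


def star_peers(server, workers):
    """Star: every worker peers only with the server."""
    peers = {w: [server] for w in workers}
    peers[server] = sorted(workers)
    return peers


def site_aware_peers(server, sites):
    """Hypothetical layout: workers also peer with others at the same site.

    `sites` maps a worker name to a site label (an assumption for this sketch).
    """
    by_site = defaultdict(list)
    for worker, site in sites.items():
        by_site[site].append(worker)

    peers = {server: sorted(sites)}
    for worker, site in sites.items():
        # colocated workers talk directly, everything else goes via the server
        local = sorted(w for w in by_site[site] if w != worker)
        peers[worker] = [server] + local
    return peers


sites = {"worker-0": "rack-a", "worker-1": "rack-a", "worker-2": "remote"}
print(star_peers("myserver", list(sites)))
print(site_aware_peers("myserver", sites))
```

in the second layout `worker-0` and `worker-1` reach each other without a round trip through the
server, while `worker-2` still only talks to the server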
run the ansible playbook
```bash
cd leylines-ansible
ansible-playbook -i leylines_inv.py playbook-setup.yml
```
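for orientation, this is roughly the JSON shape ansible expects from an inventory script called
with `--list`. it is a sketch of the general contract, not the actual `leylines_inv.py`; the group
names `scheduler` and `workers`, the node records, and the IPs are assumptions

```python
import json


def build_inventory(nodes):
    """Build the structure an ansible inventory script prints for --list.

    `nodes` is a list of dicts with hypothetical keys: name, ip, role.
    """
    inventory = {
        "scheduler": {"hosts": []},
        "workers": {"hosts": []},
        "_meta": {"hostvars": {}},
    }
    for node in nodes:
        group = "scheduler" if node["role"] == "server" else "workers"
        inventory[group]["hosts"].append(node["name"])
        # ansible_host tells ansible which address to connect to
        inventory["_meta"]["hostvars"][node["name"]] = {"ansible_host": node["ip"]}
    return inventory


nodes = [
    {"name": "myserver", "ip": "10.0.0.1", "role": "server"},
    {"name": "worker-0", "ip": "10.0.0.2", "role": "worker"},
]
print(json.dumps(build_inventory(nodes), indent=2))
```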
the first run will take a while: it builds and installs python 3.9.5, creates a virtualenv with the
python dependencies, and then installs and starts systemd user services for the scheduler and
workers
now you can open `<your server's wireguard ip>:31336` to view the dask dashboard
use the cluster with
```python
from dask.distributed import Client
client = Client("<your server's wireguard ip>:31337")
```
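a slightly longer sketch, assuming the ports above and a placeholder wireguard ip that you pass in
yourself. the helper names are made up for this example; `Client.map` and `Client.submit` are
regular dask.distributed calls

```python
DASHBOARD_PORT = 31336  # dashboard port from the text above
SCHEDULER_PORT = 31337  # scheduler port from the text above


def dashboard_url(wg_ip):
    """URL of the dask dashboard for a given server wireguard ip."""
    return f"http://{wg_ip}:{DASHBOARD_PORT}"


def scheduler_address(wg_ip):
    """Address to hand to dask.distributed.Client."""
    return f"{wg_ip}:{SCHEDULER_PORT}"


def square_sum(wg_ip, numbers):
    """Square each number on the cluster and sum the results there too."""
    # imported here so the helpers above work without dask installed
    from dask.distributed import Client

    with Client(scheduler_address(wg_ip)) as client:
        futures = client.map(lambda x: x * x, numbers)
        return client.submit(sum, futures).result()
```

calling `square_sum("<your server's wireguard ip>", range(10))` needs a reachable scheduler; the two
address helpers work offline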
### time for magic
copy `leylines-support/02-dask.py` into `~/.ipython/profile_default/startup`
this provides 2 new spells: `%dask` connects to your cluster, and `%daskworker` splits off a new
ipython console on a worker selected by having free RAM available and not being busy. this is useful
for ad-hoc code testing on a real worker
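the selection rule could look something like the sketch below. this is not the code in
`02-dask.py`; it assumes worker info shaped like `Client.scheduler_info()["workers"]`, and the keys
it reads (`memory_limit`, `metrics["memory"]`, `metrics["executing"]`) are assumptions about that
shape

```python
def pick_worker(workers):
    """Return the address of an idle worker with the most free memory.

    `workers` maps address -> info dict (shape assumed, see above).
    Returns None if there are no workers at all.
    """

    def free_memory(info):
        return info["memory_limit"] - info["metrics"]["memory"]

    # prefer workers that are not executing anything right now
    idle = {a: i for a, i in workers.items() if i["metrics"]["executing"] == 0}
    candidates = idle or workers  # fall back to all workers if none is idle
    if not candidates:
        return None
    return max(candidates, key=lambda addr: free_memory(candidates[addr]))


# made-up sample data for illustration
sample = {
    "tcp://10.0.0.2:40000": {
        "memory_limit": 8_000, "metrics": {"memory": 1_000, "executing": 2}},
    "tcp://10.0.0.3:40000": {
        "memory_limit": 8_000, "metrics": {"memory": 5_000, "executing": 0}},
    "tcp://10.0.0.4:40000": {
        "memory_limit": 8_000, "metrics": {"memory": 3_000, "executing": 0}},
}
print(pick_worker(sample))
```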
`%dask` also installs three names: `client`, a reference to the client; `tqdmprogress`, which can
be used in place of `distributed.diagnostics.progress` for a task monitor using `tqdm`; and
`upload`, which uploads a file and returns a delayed function which will fetch the filename on a
worker