# packet capture from scratch

we're going to write a basic packet capture program for linux from scratch, in python, using the direct kernel interface instead of a library like libpcap. it's also going to attach to `asyncio`, allowing for async packet capture that can be integrated into other I/O without blocking

(for the sake of simplicity, the file I/O is not going to be `asyncio`-based, but you can use `aiofiles` instead of the standard file interface if you want)

`libpcap`, the standard packet capture library on linux, is absolutely massive and contains a lot of code, but we can get a basic capture system working by simplifying the scope

- no legacy compatibility. we're only targeting the latest linux kernel
- no BPF (for now). BPF (Berkeley Packet Filter) is a VM that allows filtering captured packets in the kernel before they get delivered to our capture application. unfortunately BPF is Complicated so we're skipping it and just receiving every packet with no filter

## how does packet capture work in linux

a lot of linux kernel interfaces actually aren't magic (unless it's netlink, or weird device specific ioctls, or DRI, or..... ignore all that stuff for now)

for example, besides being allowed to make standard TCP and UDP (ie, layer 4) sockets using kernel syscalls like `socket`, `setsockopt`, `bind`, etc, linux actually allows you to directly create layer 2 sockets, which rather than being filtered by port number, are filtered by ethertype

in order to do this, you need to possess the `CAP_NET_RAW` capability (and let's add `CAP_NET_ADMIN` too, because we'll eventually need to set promiscuous mode)

```bash
systemd-run -tS --uid=$UID -pAmbientCapabilities="CAP_NET_RAW CAP_NET_ADMIN"
```

(this starts a shell as your normal user, but with the additional capabilities available.
it's useful to avoid running things as root needlessly)

ok now, in python,

```python
import socket

# the ethertype for ipv4
ETH_P_IP = 0x0800
# the protocol argument is in network byte order here, hence htons
sock = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_IP))
```

by default, `PF_PACKET` sockets are set up to capture from all interfaces, but we can set a specific interface using the `bind` syscall. note that `setsockopt` with `SO_BINDTODEVICE` will *not* work -- see [socket(7)](https://linux.die.net/man/7/socket)

```python
devname = "eth0"
# bind takes (interface name, protocol) for packet sockets
sock.bind((devname, ETH_P_IP))
```

also note that in this call, `htons` is not required (even though it was for `socket`). python is just weird. don't ask too many questions

## promiscuous mode

now we want to set promiscuous mode, so we can capture all packets we get instead of just the ones addressed to us

side note: normal ethernet switches make this kind of nonfunctional by default. you'll either want a much more sophisticated switch (ie managed, where you can explicitly set your port to mirror all other traffic), *or* a much *less* sophisticated switch (ie, not a switch, one of those old school 10/100 hubs, which are basic enough to just mirror all traffic on all ports anyway). and i'm pretty sure if you're using an internal bridge interface rather than a physical connection promiscuous mode doesn't actually matter, but i haven't tested this

so for the record

```python
import ctypes

# only the part of struct ifreq we need: interface name + flags
class ifreq(ctypes.Structure):
    _fields_ = [("ifr_ifrn", ctypes.c_char * 16),
                ("ifr_flags", ctypes.c_short)]

IFF_PROMISC = 0x100
SIOCGIFFLAGS = 0x8913  # get interface flags
SIOCSIFFLAGS = 0x8914  # set interface flags
```

we love a little tiny bit of boilerplate because this isn't in the python stdlib. now we can just get the cool flags, add the one we want, and set it back

```python
import fcntl

ifr = ifreq()
ifr.ifr_ifrn = devname.encode()
# read the current flags, OR in promiscuous mode, write them back
fcntl.ioctl(sock.fileno(), SIOCGIFFLAGS, ifr)
ifr.ifr_flags |= IFF_PROMISC
fcntl.ioctl(sock.fileno(), SIOCSIFFLAGS, ifr)
```

## no blocking allowed

so at this point we're ready to capture. but there's a bit of an issue......
which is that blocking I/O is kinda for losers. we want the cool cooperative multitasking stuff, so it would be nice if there were a way to lift `PF_PACKET` sockets into `asyncio`

now `asyncio` has facilities for packet-based sockets already -- normal UDP stuff. the issue is, the UDP stuff expects the socket to be of type `AF_INET` / `SOCK_DGRAM`, and it checks for this specifically. luckily after digging through cpython i quickly identified the right cool internal function to call that bypasses the checks. here's how you do that

first we need a protocol, just like if we were doing UDP

```python
import asyncio

class PcapRecvProtocol:
    def __init__(self, sock):
        self.sock = sock

    def connection_made(self, transport):
        self.transport = transport

    def datagram_received(self, data, addr):
        print("got packet", data)

async def main():
    loop = asyncio.get_running_loop()
    if not isinstance(loop, asyncio.selector_events.BaseSelectorEventLoop):
        # windows is unsupported, and correspondingly non-selector event loops
        # don't have the cool internal function we need :(
        raise Exception("you gotta run it on linux")

    sock.setblocking(False)
    protocol = PcapRecvProtocol(sock)
    waiter = loop.create_future()
    # private cpython API: wraps the socket without the AF_INET / SOCK_DGRAM checks
    transport = loop._make_datagram_transport(sock, protocol, waiter=waiter)
    await waiter
    # keep the loop alive so datagram_received keeps getting called
    await asyncio.Event().wait()

asyncio.run(main())
```

and..... that's pretty much it. this should print out (layer 2 level) packets to stdout

## writing a pcap file

ok so you might be thinking, writing packets to stdout is cool and all but really it would be nice to put them in like a normal pcap file

it turns out pcap files (not pcapng, i have no idea how those work) are really simple actually

they consist of a file header, and then a sequence of captured packet headers and contents

here's the file header

```
magic_number: u32
major_version: u16
minor_version: u16
reserved1: u32
reserved2: u32
snaplen: u32
linktype: u32
```

there are two magic numbers, one for if the file timestamps are in microseconds and one for nanoseconds. microseconds are fine for us, so we use magic number `0xA1B2C3D4`.
the current version is major 2, minor 4. "snaplen" is the maximum length of a packet: if packets are larger they get truncated. 2048 is more than enough to cover a standard 1500-byte ethernet MTU. finally linktype (and some other stuff in a bitfield we also don't really care about -- if you want the full details you can read the [actual spec](https://datatracker.ietf.org/doc/id/draft-gharris-opsawg-pcap-00.html) [or well like this is a draft of it but whatever]) which we set to 1 for ethernet

```python
import struct

PCAP_MAGIC_MICRO = 0xA1B2C3D4
PCAP_MAJ = 2
PCAP_MIN = 4
PCAP_SNAPLEN = 2048
LINKTYPE_ETHERNET = 1

pcapname = "capture.pcap"
outfile = open(pcapname, "wb")
# format string follows the header layout above (u32, u16, u16, then four u32);
# little-endian ("<") is an assumption, the reserved fields are written as 0
outfile.write(struct.pack("<IHHIIII", PCAP_MAGIC_MICRO, PCAP_MAJ, PCAP_MIN,
                          0, 0, PCAP_SNAPLEN, LINKTYPE_ETHERNET))
```
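
the other half of the format, the "captured packet headers and contents" part, can be sketched like this. in the classic pcap format each packet gets a 16-byte record header (seconds, microseconds, captured length, original length) followed by the (possibly truncated) bytes. this is a rough sketch and not necessarily how the final program does it: `write_packet_record` and its `timestamp_ns` parameter are made-up names, it assumes the file header was written little-endian, and it stamps packets with the userspace clock rather than a kernel capture timestamp

```python
import io
import struct
import time

PCAP_SNAPLEN = 2048  # assumed to match the snaplen in the file header


def write_packet_record(outfile, data, timestamp_ns=None):
    """Append one pcap record (16-byte header + packet bytes) to outfile.

    hypothetical helper; the timestamp is the userspace clock, not the
    time the kernel actually saw the packet.
    """
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    ts_sec, rem_ns = divmod(timestamp_ns, 1_000_000_000)
    ts_usec = rem_ns // 1000  # microsecond magic number, so microsecond units
    captured = data[:PCAP_SNAPLEN]  # truncate to snaplen
    # ts_sec, ts_usec, captured length, original length (all u32)
    header = struct.pack("<IIII", ts_sec, ts_usec, len(captured), len(data))
    outfile.write(header)
    outfile.write(captured)


# small demo against an in-memory file
buf = io.BytesIO()
write_packet_record(buf, b"\x00" * 60, timestamp_ns=1_700_000_000_250_000_000)
write_packet_record(buf, b"\xff" * 3000, timestamp_ns=1_700_000_001_000_000_000)
records = buf.getvalue()
```

something like this could be called from `datagram_received` in place of the `print`, passing the open pcap file along with `data`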