This refactors all handling of audio-only and last N to 2 features in preparation
for "low bandwidth mode".
The main motivation to do this is that lastN is a "global" setting so it helps
to have all processing for it in a single place.
Simplify the code by using a bitfied instead of a couple of boolean flags. This
allows us to mute the video from multiple places and only make the unmute
effective once they have all unmuted.
Alas, this cannot be applied to the web without a massive refactor, because it
uses the track muted state as the source of truth instead of the media state.