This refactors all handling of audio-only and last N to 2 features in preparation
for "low bandwidth mode".
The main motivation to do this is that lastN is a "global" setting so it helps
to have all processing for it in a single place.
On Android we go into "immersive mode" when in a conference, this is our way of
being full-creen. There are occasions, however, in which Android takes us out of
immerfive mode without us (the application / SDK) knowing: when a child activity
is started, a modal window shown, etc.
In order to be resilient to any possible change in the immersive mode, register
a listener which will be called when Android changes it, so we can re-eavluate
if we need it and thus re-enable it.
Simplify the code by using a bitfied instead of a couple of boolean flags. This
allows us to mute the video from multiple places and only make the unmute
effective once they have all unmuted.
Alas, this cannot be applied to the web without a massive refactor, because it
uses the track muted state as the source of truth instead of the media state.