Add a config option, defaulting to 2, that caps the max received video quality at SD when the conference has more participants than that value while in tile view mode.
Move the logic from the various places it lived into a single state listener that combines all inputs into a single output.
Also, use the configured resolution as the max received frame size.
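
For reviewers, a minimal sketch of the intended behaviour follows. It is illustrative only and is not the code in this change: every name in it (QualityInputs, computeMaxReceiverFrameHeight, createQualityListener, maxFullResolutionParticipants) is hypothetical, and it assumes that "SD" means a frame height of 360 and that the configured resolution is expressed as a frame height in pixels.

```typescript
// Hypothetical names throughout; not the project's actual API.
// Assumption: "SD" corresponds to a frame height of 360 pixels.
const SD_HEIGHT = 360;

interface QualityInputs {
    participantCount: number;              // participants currently in the conference
    tileViewEnabled: boolean;              // whether tile view is active
    configuredResolution: number;          // configured resolution (frame height, px)
    maxFullResolutionParticipants: number; // the new option; default 2
}

// Combine all inputs into a single output: the max received frame height.
function computeMaxReceiverFrameHeight(inputs: QualityInputs): number {
    const {
        participantCount,
        tileViewEnabled,
        configuredResolution,
        maxFullResolutionParticipants
    } = inputs;

    // Cap only in tile view and only above the configured participant count.
    const overLimit = tileViewEnabled
        && participantCount > maxFullResolutionParticipants;

    return overLimit
        ? Math.min(configuredResolution, SD_HEIGHT)
        : configuredResolution;
}

// Minimal state listener: calls `apply` only when the computed output changes.
function createQualityListener(apply: (height: number) => void) {
    let last: number | undefined;

    return (inputs: QualityInputs): void => {
        const next = computeMaxReceiverFrameHeight(inputs);

        if (next !== last) {
            last = next;
            apply(next);
        }
    };
}
```

Keeping the decision in one pure function means each input (participant count, tile view, config) can change independently while the receiver constraint is only updated when the combined result actually differs.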