The problem is the deactivate/activate calls, which introduce a period
of silence roughly as long as it takes the server to send an SM_PLAY
in response to SM_FINISHED.
The solution is for the speaker to lie to the server and pretend that
tracks have finished 0.5 to 1 s before they really have. This requires
mild ugliness scattered through the speaker process to maintain the
illusion, but keeps ugliness out of your ears l-)