In setting the playout schedule at the receiver, we have to deal with the tradeoffs among delay, loss, and voice SNR degradation. Here we refer to the event that both descriptions are lost or discarded as packet loss, while we use the term SNR degradation for the situation in which only one description is lost or discarded.
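The event taxonomy above can be written as a small classifier. This is a minimal sketch: `Outcome` and `classify` are hypothetical names, and a description is assumed to count as received only if it arrives before its playout deadline (a late description is discarded and treated like a lost one).

```python
from enum import Enum


class Outcome(Enum):
    FULL_QUALITY = "both descriptions received"
    SNR_DEGRADATION = "only one description received"
    PACKET_LOSS = "both descriptions lost or discarded"


def classify(received_1: bool, received_2: bool) -> Outcome:
    """Map the on-time status of the two MDC descriptions of one packet to an outcome.

    Assumption: a description arriving after the playout deadline is discarded,
    so it is passed in as received=False, exactly like a description lost in the network.
    """
    n = int(received_1) + int(received_2)
    if n == 2:
        return Outcome.FULL_QUALITY
    if n == 1:
        return Outcome.SNR_DEGRADATION
    return Outcome.PACKET_LOSS


print(classify(True, False).value)  # only one description received
```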
A playout deadline must be set before the arrival of each packet $i$. With no knowledge of the future network delay of packet $i$, we set the playout deadline according to the most recently recorded delays. We denote the playout deadline for packet $i$ by $d_i$, which is the time from when the packet is delivered to the network until it has to be played out. It is the total end-to-end delay of packet $i$ (excluding the packetization time at the sender), and it characterizes the latency of transmission and playout.
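Setting the deadline "according to the most recent delays" presupposes some record of past delays per stream. A minimal sketch, assuming a fixed-length sliding window and a purely empirical estimate: the probability of missing a candidate deadline is taken as the fraction of recent packets whose delay exceeded it. The class `DelayHistory`, the window length, and this estimator are hypothetical illustrations, not the estimator of Subsection 2.4; packets lost outright in the network are ignored here.

```python
from collections import deque


class DelayHistory:
    """Sliding window of the most recent network delays (ms) of one stream.

    Hypothetical helper: window length and estimator are illustrative assumptions.
    """

    def __init__(self, window: int = 100):
        # deque with maxlen discards the oldest sample automatically
        self.delays = deque(maxlen=window)

    def record(self, delay_ms: float) -> None:
        self.delays.append(delay_ms)

    def loss_prob(self, deadline_ms: float) -> float:
        """Fraction of recent packets that would have arrived after `deadline_ms`."""
        if not self.delays:
            return 1.0  # no history yet: assume the worst
        late = sum(1 for d in self.delays if d > deadline_ms)
        return late / len(self.delays)


hist = DelayHistory(window=4)
for d in [40, 45, 60, 50, 42]:  # the first sample (40) drops out of the window
    hist.record(d)
print(hist.loss_prob(48))  # window is [45, 60, 50, 42]; 60 and 50 exceed 48 -> 0.5
```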
In order to determine $d_i$, we define a Lagrange cost function for packet $i$ as follows:

$$C_i(d_i) = d_i + \lambda\, P_1(d_i)\, P_2(d_i) + \mu\, D \left[ P_1(d_i)\bigl(1 - P_2(d_i)\bigr) + \bigl(1 - P_1(d_i)\bigr) P_2(d_i) \right], \tag{1}$$

where $P_1(d_i)$ and $P_2(d_i)$ are the estimated loss probabilities of the packet from streams 1 and 2, respectively, given a particular $d_i$. The estimates of $P_1(d_i)$ and $P_2(d_i)$ are based on the past delays recorded on the two streams, as discussed in detail in Subsection 2.4. The Lagrange multipliers $\lambda$ and $\mu$ are predefined parameters that balance the tradeoffs. $D$ is the SNR degradation of packet $i$, i.e., the noise power introduced by receiving only one description; it enters (1) as a constant that depends on the codecs used. Since our concern here is transmission, the received SNR is compared to that of the quantized (full-resolution) signal at the sender.
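A direct transcription of the cost may help fix the roles of the terms. This sketch assumes (1) has the form $d_i + \lambda P_1 P_2 + \mu D [P_1(1-P_2) + (1-P_1)P_2]$, i.e., that the two paths lose descriptions independently, so $P_1 P_2$ is the probability of packet loss and the bracket is the probability of receiving exactly one description. The function and argument names are hypothetical.

```python
def lagrange_cost(d, p1, p2, lam, mu, D):
    """Cost of playout deadline d (ms) for one packet, in the assumed form of (1).

    p1, p2 : estimated probabilities that the description on stream 1 / 2 is
             lost or misses deadline d (assumed independent across paths)
    lam    : multiplier on packet loss (both descriptions missing)
    mu     : multiplier on SNR degradation (exactly one description missing)
    D      : SNR degradation (noise power) when only one description is used
    """
    p_both = p1 * p2                        # packet loss
    p_one = p1 * (1 - p2) + (1 - p1) * p2   # only one description received
    return d + lam * p_both + mu * D * p_one


print(lagrange_cost(60, 0.1, 0.2, lam=1000, mu=10, D=5))  # about 93 = 60 + 20 + 13
```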
The playout deadline is obtained by searching for the $d_i$ that minimizes the cost function. Perceptually, the quality degradations resulting from high latency and from a high loss rate are "orthogonal". The multiplier $\lambda$ trades off total delay against loss probability: a greater $\lambda$ puts more penalty on a high loss rate, and the optimization then yields a lower loss rate at the cost of higher latency.
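The search for the optimal deadline can be sketched as an exhaustive scan over a grid of candidate deadlines. Everything model-specific here is an assumption for illustration: each path's probability of missing deadline $d$ follows a hypothetical shifted-exponential law with made-up base delay and jitter scale, the grid is 1 ms wide, and $\mu = 0$ so that only the delay/loss tradeoff governed by $\lambda$ is visible.

```python
import math


def loss_prob(d, base, scale):
    """Hypothetical model: P(description misses deadline d) for one path (ms)."""
    return 1.0 if d <= base else math.exp(-(d - base) / scale)


def cost(d, lam, mu, D, paths):
    # Assumed form of (1), with independent losses on the two paths
    p1, p2 = (loss_prob(d, b, s) for b, s in paths)
    return d + lam * p1 * p2 + mu * D * (p1 * (1 - p2) + (1 - p1) * p2)


def optimal_deadline(lam, mu, D, paths, candidates):
    """Exhaustive search: the candidate deadline with the smallest cost."""
    return min(candidates, key=lambda d: cost(d, lam, mu, D, paths))


paths = [(40.0, 10.0), (50.0, 15.0)]  # (base delay, jitter scale) per path, ms
grid = range(0, 301)                  # candidate deadlines, 1 ms apart
for lam in (100, 1000, 10000):
    print(lam, optimal_deadline(lam, 0.0, 1.0, paths, grid))
```

With $\mu = 0$ the penalty $\lambda P_1 P_2$ cannot increase with $d$, so raising $\lambda$ can only move the minimizer to a later deadline; with these invented parameters the chosen deadline rises from roughly 61 ms to roughly 89 ms.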
For multiple streams, we are also concerned about the voice quality when we do not receive all the MDC descriptions. The third term in (1), with multiplier $\mu$, is introduced to penalize the SNR degradation that results from receiving only one description. The greater $\mu$ is, the better the SNR of the reconstructed signal, at the cost of higher delay. Note, however, that packet loss through losing both descriptions (the second term in (1)) and SNR degradation (the third term in (1)) are not orthogonal perceptual experiences, since packet loss also impairs SNR greatly. From (1), it can be observed that a greater $\mu$ also leads to a lower loss rate: raising $d_i$ to reduce the single-description term also lowers both $P_1(d_i)$ and $P_2(d_i)$, and hence their product. This would seem to make $\lambda$ redundant. However, with a very small $\mu$, only packet loss is given emphasis. In this case good reconstruction quality is not a priority and latency is given more concern, with the tradeoff between loss rate and delay determined mainly by $\lambda$. In practice, this is usually desirable, since human perceptual experience is impaired most by high latency, whereas degraded voice quality can largely be tolerated [9].
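The effect of $\mu$ can be checked numerically under the same kind of made-up setting. This sketch again assumes (1) with independent path losses and a hypothetical shifted-exponential model of each path's probability of missing deadline $d$; $\lambda$ is held fixed while $\mu$ varies, and the reported loss and single-description probabilities are evaluated at the chosen deadline.

```python
import math


def loss_prob(d, base, scale):
    """Hypothetical model: P(description misses deadline d) for one path (ms)."""
    return 1.0 if d <= base else math.exp(-(d - base) / scale)


def probs(d, paths):
    """Return (P(both descriptions missing), P(exactly one missing)) at deadline d."""
    p1, p2 = (loss_prob(d, b, s) for b, s in paths)
    return p1 * p2, p1 * (1 - p2) + (1 - p1) * p2


def best_deadline(lam, mu, D, paths, grid=range(0, 301)):
    def cost(d):
        p_both, p_one = probs(d, paths)
        return d + lam * p_both + mu * D * p_one  # assumed form of (1)
    return min(grid, key=cost)


paths = [(40.0, 10.0), (50.0, 15.0)]  # (base delay, jitter scale) per path, ms
for mu in (0.0, 0.01, 100.0):
    d = best_deadline(1000, mu, 1.0, paths)
    p_both, p_one = probs(d, paths)
    print(f"mu={mu:>6}: deadline {d} ms, P(loss)={p_both:.4f}, P(one only)={p_one:.3f}")
```

In this toy setting a very small $\mu$ leaves the deadline essentially where $\lambda$ alone puts it, while a large $\mu$ pushes the deadline later and, as a side effect, also lowers the probability of losing both descriptions.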
Figure 2: Playout scheduling of multiple streams.
Fig. 2 illustrates the scheduling process when $\mu$ is small and low latency is given more emphasis. The source stream is coded and sent as two streams, p and q. The playout deadline is kept at the minimum level and adjusted dynamically according to the varying delay jitter of the two paths. At the receiver, the first two packets played are taken from stream p, since they have lower delays. As the delay of stream p increases, the playout switches to stream q and adjusts the schedule accordingly, so as to avoid any late loss while keeping the buffering delay low. The playout switches back to stream p from the 5th packet on, when the turbulence in path p is over and its network delay returns to normal.
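The switching behaviour in Fig. 2 can be reproduced in hindsight. If the delays were known exactly, each $P_k(d)$ would be a step function (1 before the description arrives, 0 after), so with $\mu \approx 0$ and $\lambda$ larger than the delays involved, the cost in (1) is minimized by setting the deadline to the arrival of whichever description comes first. The sketch below assumes exactly that oracle view; the delay values are invented to mimic the figure, and the real scheduler must instead predict the deadline from past delays.

```python
def oracle_schedule(delays_p, delays_q):
    """Hindsight low-latency schedule over two paths (illustration only).

    With known delays, mu ~ 0, and lam larger than the delays, the cheapest
    deadline with no late loss is the earlier of the two arrivals; the packet
    is played from that stream and the later description is discarded.
    """
    plan = []
    for dp, dq in zip(delays_p, delays_q):
        stream = "p" if dp <= dq else "q"
        plan.append((stream, min(dp, dq)))
    return plan


p = [40, 42, 60, 70, 44, 41]  # hypothetical delays (ms); path p congested for packets 3-4
q = [52, 54, 53, 55, 54, 52]
for i, (stream, deadline) in enumerate(oracle_schedule(p, q), start=1):
    print(f"packet {i}: play from {stream}, deadline {deadline} ms")
```

Packets 1 and 2 are played from p, packets 3 and 4 from q, and playout returns to p from the 5th packet, matching the pattern described for Fig. 2.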
In adaptive playout, proper reconstruction of continuous output speech is
achieved by scaling individual voice packets using a time-scale modification
technique which modifies the rate of speech [10].