This post concerns the use of queries or fragments in the URI specification for accessing segments of media over HTTP. We outline the user-visible differences between the two approaches, including the form of the URIs seen by users in each scenario and the consequent user interface activity, and then explain the HTTP request and response mechanisms that result. The purpose of this analysis is to better understand the trade-offs in usability and the impact on network performance, with reference to existing implementations rather than hypothetical scenarios.
I will make the case that the user-visible differences between the two syntaxes are immaterial, and that a more important distinction is that they induce different protocols. I will also claim that the use of the fragment syntax introduces unnecessary complexity in that it lacks a discovery mechanism and has no useful fallback to existing HTTP.
User-visible differences
We are constructing a URI syntax for addressing segments of media data. Taking the simple case of addressing some video content beginning at an offset of 10 seconds, we consider the two forms:
- Query syntax: http://www.example.com/media.ogv?t=10
- Fragment syntax: http://www.example.com/media.ogv#t=10
For simplicity here we are using a shortened segment identifier t=10; I touched on the topic of segment identifiers in a recent article about pretty printing durations.
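To make the difference concrete, here is a small Python sketch using the standard urllib.parse module. It reports which URI component carries the segment identifier in each form. Treating t=10 as a key=value pair is an assumption that suits this shortened identifier, and segment_location is a hypothetical helper name.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit, parse_qs

QUERY_URI = "http://www.example.com/media.ogv?t=10"
FRAGMENT_URI = "http://www.example.com/media.ogv#t=10"

def segment_location(uri: str) -> Tuple[str, Dict[str, List[str]]]:
    """Report which URI component (query or fragment) carries the segment identifier."""
    parts = urlsplit(uri)
    if parts.query:
        return "query", parse_qs(parts.query)
    if parts.fragment:
        return "fragment", parse_qs(parts.fragment)
    return "none", {}

print(segment_location(QUERY_URI))     # ('query', {'t': ['10']})
print(segment_location(FRAGMENT_URI))  # ('fragment', {'t': ['10']})
```

Both forms share the same path, /media.ogv; only the component holding t=10 differs, and that is what decides how the request is made.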
Regarding the direct HTTP semantics of these two forms, if the user is already viewing the specified media.ogv, the query syntax reloads the portion from 10 seconds as a new resource, whereas the fragment syntax modifies the view of the current resource.
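The resource-versus-view distinction can be checked mechanically. Removing the fragment from two fragment URIs leaves the same resource, while two query URIs remain distinct resources. A minimal Python sketch, with resource_of as a hypothetical helper built on urllib.parse.urldefrag:

```python
from urllib.parse import urldefrag

def resource_of(uri: str) -> str:
    """Return the URI of the resource itself, dropping any fragment (which only selects a view)."""
    return urldefrag(uri).url

a = resource_of("http://www.example.com/media.ogv#t=10")
b = resource_of("http://www.example.com/media.ogv#t=20")
c = resource_of("http://www.example.com/media.ogv?t=10")
d = resource_of("http://www.example.com/media.ogv?t=20")

print(a == b)  # True: one resource, two views
print(c == d)  # False: two distinct resources
```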
Although developers are rightly wary of a page refresh because of the time required to render complex HTML, in practice no visible change occurs when a video is reloaded. The query syntax has been used to control video seeking from JavaScript (using the Java Cortado video player plugin, or an earlier Oggplay plugin), and also natively in the current Firefox 3.5 implementation.
In any case, this distinction is only user-visible if the video is the top-level resource. In the common case of a web page that embeds a video, the user-visible resource is the HTML page, and video playback is controlled by the embedding page via JavaScript.
For example, URIs to YouTube pages allow a time segment to be appended using fragment syntax. However, this fragment is read by JavaScript to control the embedded Flash video player; retrieving the video data is then managed by the Flash player itself. Similarly, in HTML5 Ogg <video> implementations, a fragment identifier appended to the HTML page may be interpreted by JavaScript to control seeking in the <video> source using a non-fragment mechanism, such as query syntax.
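In a browser this translation would be done by the page's JavaScript. The Python sketch below shows only the logic: it reads a t value from the embedding page's fragment and produces a query-syntax URI for the media source. The page URI watch.html and the helper name media_uri_for_page are hypothetical, as is the choice to reuse the same t parameter name on both sides.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

def media_uri_for_page(page_uri: str, media_uri: str) -> str:
    """Map '#t=N' on the embedding page to '?t=N' on the media URI (hypothetical convention)."""
    params = parse_qs(urlsplit(page_uri).fragment)
    if "t" not in params:
        return media_uri  # no offset requested: play from the start
    media = urlsplit(media_uri)
    query = urlencode({"t": params["t"][0]})
    # Note: this replaces any query already present on the media URI.
    return urlunsplit((media.scheme, media.netloc, media.path, query, ""))

print(media_uri_for_page("http://www.example.com/watch.html#t=10",
                         "http://www.example.com/media.ogv"))
# http://www.example.com/media.ogv?t=10
```

The fragment never leaves the browser; only the derived query URI is ever sent to the media server.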
Differences in request mechanisms
Either way, we are introducing a new behaviour that user agents can use to retrieve media segments over HTTP.
When handling a media segment specified by a query, the user agent issues a standard HTTP request: it connects to port 80 on the specified host and sends the entire path, including the query specifier, in the GET request. The server then begins transferring the data representing that segment of the media.
To retrieve the URI http://www.example.com/media.ogv?t=10:
GET /media.ogv?t=10 HTTP/1.1
Host: www.example.com
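A minimal Python sketch of how a client forms that request from the URI: the query stays part of the request-target and the host becomes the Host header. The function name is hypothetical, and it assumes a URI with no userinfo component.

```python
from urllib.parse import urlsplit

def build_get_request(uri: str) -> str:
    """Serialize an HTTP/1.1 GET request for a URI (assumes no userinfo in the authority)."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query  # the query is sent to the server as part of the path
    lines = [f"GET {target} HTTP/1.1", f"Host: {parts.netloc}"]
    return "\r\n".join(lines) + "\r\n\r\n"

print(build_get_request("http://www.example.com/media.ogv?t=10"))
```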
However, the proposed request mechanism for handling a segment specified by a fragment is not standard HTTP. In conventional HTTP, the fragment specifier is stripped by the user agent and never sent to the server; the server sends the requested response (representing the entire resource), and after retrieval the user agent uses the fragment specifier to select the view shown to the user.
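The conventional split can be sketched in Python: everything before the fragment determines the Host header and request-target, and the fragment is kept locally for view selection. The helper name split_for_request is hypothetical, and a URI without userinfo is assumed.

```python
from typing import Tuple
from urllib.parse import urlsplit

def split_for_request(uri: str) -> Tuple[str, str, str]:
    """Return (host, request_target, fragment); the fragment is never put on the wire."""
    parts = urlsplit(uri)  # assumes no userinfo in the authority
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, target, parts.fragment

host, target, fragment = split_for_request("http://www.example.com/media.ogv#t=10")
print(host, target, fragment)  # www.example.com /media.ogv t=10
```

The server sees only /media.ogv, so it has no way of knowing that the user asked for the view starting at 10 seconds.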
A recently proposed behaviour for handling media segments involves placing the segment specifier into the HTTP Range request header, using a new range unit of seconds.
To retrieve the URI http://www.example.com/media.ogv#t=10:
GET /media.ogv HTTP/1.1
Host: www.example.com
Range: seconds=10-
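Under that proposal the user agent itself must translate the fragment into a header. A Python sketch of the translation follows; it assumes the shortened t=N identifier means "from N seconds to the end" and a URI without userinfo, and build_fragment_request is a hypothetical name.

```python
from urllib.parse import urlsplit, parse_qs

def build_fragment_request(uri: str) -> str:
    """Serialize a GET request, moving a '#t=N' fragment into 'Range: seconds=N-'."""
    parts = urlsplit(uri)  # assumes no userinfo in the authority
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [f"GET {target} HTTP/1.1", f"Host: {parts.netloc}"]
    params = parse_qs(parts.fragment)
    if "t" in params:
        # Open-ended range in the proposed "seconds" unit: from t to the end of the media.
        lines.append(f"Range: seconds={params['t'][0]}-")
    return "\r\n".join(lines) + "\r\n\r\n"

print(build_fragment_request("http://www.example.com/media.ogv#t=10"))
```

A user agent that lacks this step sends a plain GET for /media.ogv, which is exactly the failure case discussed below.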
Response mechanism: byte-range redirection
The byte-range redirection response mechanism involves identifying the parts of the segment view that are byte-wise identical to ranges of the original resource, and responding with redirections to those byte ranges.
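The post does not fix a wire format for these redirections, so here is only a data-model sketch in Python, under a simplifying assumption: the segment consists of some freshly generated bytes (for example new headers) followed by one unchanged run of the original file, whose offsets the server already knows. The names Inline, RangeRedirect, plan_segment and assemble are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class Inline:
    """Bytes that exist only in the segment view (e.g. regenerated headers)."""
    data: bytes

@dataclass(frozen=True)
class RangeRedirect:
    """A pointer to bytes first..last (inclusive, as in HTTP byte ranges) of the original."""
    first: int
    last: int

Part = Union[Inline, RangeRedirect]

def plan_segment(new_headers: bytes, data_start: int, data_length: int) -> List[Part]:
    """Send the new bytes inline and redirect for the run that is identical to the original."""
    plan: List[Part] = []
    if new_headers:
        plan.append(Inline(new_headers))
    if data_length > 0:
        plan.append(RangeRedirect(data_start, data_start + data_length - 1))
    return plan

def assemble(plan: List[Part], original: bytes) -> bytes:
    """The bytes a client ends up with after following every redirection in the plan."""
    out = bytearray()
    for part in plan:
        if isinstance(part, Inline):
            out += part.data
        else:
            out += original[part.first:part.last + 1]
    return bytes(out)
```

Because each redirected part is an ordinary byte range of media.ogv, a cache that already holds those bytes can serve it, even when the request came from a different, overlapping segment.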
How discovery works
A user agent will only receive a byte-range redirection response if it has indicated that it can interpret one, by including an extra HTTP request header. For example, using a media segment URI specified with a query parameter:
GET /media.ogv?t=10 HTTP/1.1
Host: www.example.com
X-Accept-Range-Redirect: bytes
If the server is capable of handling the byte-range redirection mechanism, it will do so and indicate that it has done so explicitly in its response headers.
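The server side of this negotiation can be sketched in Python. X-Accept-Range-Redirect comes from the request above. The post does not name the acknowledging response header, so X-Range-Redirect below is a hypothetical placeholder, as is the choose_response function.

```python
from typing import Dict, Tuple

REQUEST_HEADER = "X-Accept-Range-Redirect"  # the client's opt-in header
ACK_HEADER = "X-Range-Redirect"             # hypothetical name for the acknowledgement

def choose_response(request_headers: Dict[str, str],
                    server_supports_redirect: bool) -> Tuple[str, Dict[str, str]]:
    """Decide between a byte-range redirection response and a plain full-segment response."""
    # HTTP header names are case-insensitive, so compare them lowercased.
    headers = {name.lower(): value for name, value in request_headers.items()}
    offered = headers.get(REQUEST_HEADER.lower(), "")
    client_accepts = "bytes" in [unit.strip() for unit in offered.split(",")]
    if client_accepts and server_supports_redirect:
        return "range-redirect", {ACK_HEADER: "bytes"}
    return "full-segment", {}  # standard HTTP: just send the segment
```

Only when both sides opt in does the response change shape; every other combination gives an ordinary response.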
Query syntax has a sensible fallback to standard HTTP
However, if the extra request header is not present, the server simply sends an entire response corresponding to the requested segment. Similarly, if the header is present but the server does not support the new mechanism, it continues with a standard HTTP response. The client can tell whether the response is a byte-range redirection response by the presence of an acknowledging response header.
If either client or server does not understand the byte-range redirection protocol, the request falls back to standard HTTP and the required segment is correctly returned. The cost of this fallback, compared to the case where both client and server understand the new request/response headers, is a loss of cacheability for subsequent overlapping segment requests.
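On the client, the fallback comes down to a single check. A Python sketch, again treating X-Range-Redirect as a hypothetical name for the acknowledging header:

```python
from typing import Dict

ACK_HEADER = "x-range-redirect"  # hypothetical acknowledging header, compared lowercased

def is_range_redirect(response_headers: Dict[str, str]) -> bool:
    """True if the server acknowledged byte-range redirection; otherwise the body is the segment."""
    return ACK_HEADER in {name.lower() for name in response_headers}

print(is_range_redirect({"X-Range-Redirect": "bytes"}))  # True: follow the redirections
print(is_range_redirect({"Content-Type": "video/ogg"}))  # False: play the body as-is
```

Both branches yield the requested segment; the only difference is whether its bytes can later be reused from a cache.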
Fragment syntax has a high cost of failure
The mechanism involving the fragment specifier has no fallback to standard HTTP: if the client does not understand that it should add the Range header with the newly defined unit, it will simply request the entire resource. Similarly, if the server does not understand the new header, it will simply respond with the entire resource. If the cost of failure is downloading several hours of extra video, as it would be for MetaVid's recordings of congressional proceedings, that cost is prohibitive.
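A rough sense of scale can be computed. The figures below are hypothetical and are not MetaVid's: a 4-hour recording at a constant 500 kbit/s, of which the user wants one minute starting at the 2-hour mark. Real Ogg video is not constant-bitrate, so the result is an estimate only.

```python
def fallback_overhead_bytes(total_s: int, seg_start_s: int, seg_end_s: int, bitrate_bps: int) -> int:
    """Bytes fetched beyond the wanted segment when the whole resource is returned instead.

    Assumes a constant bitrate in bits per second.
    """
    whole = total_s * bitrate_bps // 8
    segment = (seg_end_s - seg_start_s) * bitrate_bps // 8
    return whole - segment

extra = fallback_overhead_bytes(4 * 3600, 2 * 3600, 2 * 3600 + 60, 500_000)
print(extra)  # 896250000
```

Under these assumptions, a failed negotiation costs about 896 MB of transfer to deliver 3.75 MB of wanted video.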
Summary
- The distinction is one of protocol mechanism
- For the common case of video displayed in HTML, the distinction is not user-visible
- Fragment specifiers do not have a fallback to standard HTTP
- The cost of discovery failure for fragments is high (retrieval of entire resource)
Next steps
- To clarify within the Media Fragments WG how queries can be used effectively, for both of the user scenarios considered above.
- To consider how the byte-range redirection mechanism can be generalized to other segment specifiers, such as spatial regions.