Connecting Broadcast TV and the Web using a resolver

This blog post describes a service we have created in NoTube in collaboration with Project Baird: a resolver that goes from broadcast TV to a webpage describing what’s on. It explains how we used the BBC RDF /programmes data and MythTV to create this service, and why we need it.

Background

NoTube is about TV and the Web, and one aspect of this is connecting broadcast TV and the Web. This connection need not (and, we often argue, should not) manifest itself on the TV screen. It’s much more useful in most cases for audiences to have access to the Web on a second screen, such as a laptop, phone or tablet, because this avoids problems of on-screen clutter, co-watcher irritation, and laborious text entry.

For this second screen to be able to do interesting things such as provide more information about the programme, provide interactive applications, or interesting EPG navigation, it needs to be able to receive information about what is happening on the TV device.

For example, if the second screen is running a voting application for ‘Kittens do the cutest things’, where you can vote for your favourite clip, then the second screen needs to know what programme is playing, and which clip is showing.

Or suppose you want to know who one of the actors is on a programme: then you need at least the title (and perhaps the episode and series number as well) in order to start your search.

Or perhaps you want to recommend the programme or see if any of your friends have watched it: then you need to know its title, series and episode number, and ideally also have a link to something that will tell them more about it.

For interesting second screen applications to work, we need ways to get at the information about what’s playing (part of a TV API), but we also need to be able to uniquely identify programmes, including programmes as they are broadcast.

Boxee has done some very interesting things in this area, as it allows you to connect a Twitter account to your Boxee account, and have it automatically tweet about what you are watching. When you do so, it provides a link to the programme you watched, and it tries to resolve it to something useful, for example an IMDB link or a BBC iPlayer link.

In one part of NoTube we are currently thinking about broadcast TV in particular, and how we might provide useful URLs for programmes as they are broadcast, such that they can function both as unique identifiers for programmes and also as resolvable sources of more information. Resolving is particularly important in the social web TV use case, because when you share a link you want it both to uniquely identify the thing you want to talk about (so people can share it), and also to provide more information about it (so people can find out more about it).

The BBC has URLs like these, created by the /programmes project, which aims to provide a unique URL for every BBC programme broadcast. Resolving one of these /programmes URLs gets you data about the programme available in various formats (html, RDF, YAML, json) such as when it’s on next, whether it’s available on iPlayer, a description of it, whether it was part of a series or not, what version it was (signed, shortened), and sometimes who was in it, what role they played, who the director and writer were, and so on.

However, in the world of broadcast (DVB is the case considered), these URLs are not broadcast along with the content, so there’s nothing explicitly connecting the URL and the programme being broadcast. There are other pieces of information though, and we can use these to determine the /programmes URL using a resolver.

Hopefully it’s clear why we want these URLs and why they are useful. Next I’ll go into some technical details about what we did to create the resolver and why we took this approach.


Screenshot of an iPad second screen demo using the resolver service to get more information about a broadcast by Mo McRoberts

Crids, dvb URIs, TV Anytime, specs and resolvable URIs

I am far, far from being an expert on DVB, but here are a few things I’ve found out, with the help of people who are much more expert and/or have much more patience with specifications:

  • Crids are URIs, which can be used to identify various things, including series, and versions of programmes, which are the ones we are interested in. They are sent along with the broadcast stream. They consist of an authority part identifying the organisation that assigned the identifier (e.g. fp.bbc.co.uk), and an opaque string.
  • Dvb URIs also identify various things, from a single network or broadcast platform, all the way down to a single resource carried as part of a carousel for interactive programming. The ones we are mainly interested in combine identifiers for various aspects of the service with an event id and a datetime, which together enable us to identify broadcast versions of programmes.
  • Crids are resolvable in the TV Anytime (TVA) world, but this is principally about finding alternative sources for the same item of content and for delivering TVA metadata. The biggest limitation is that to resolve a crid, a special resolver service must be available, and in the UK at least, no broadcaster provides a publicly-accessible resolver (although they may be used internally as part of production workflows). Also, crids are not used universally – many broadcasters do not provide crids at all, or only do so for some programmes.
  • Dvb URIs are resolvable, but only in the context of a dvb receiver which has access to the relevant streams, and only for the content that the URIs refer to – there isn’t a standard way to relate a dvb event to a web page describing that event, for example. In other words, it is resolvable in the same way that an ftp: or nntp: URL typically is.
  • The TV Anytime specification ETSI TS 102 822 says how they might be resolved to each other or to external URLs via the DNS system, but it appears that neither of these commonly happens in the UK.

Here’s an example of a crid (for Diagnosis Murder):

crid://fp.bbc.co.uk/4j5cxm

and the associated dvb URI for that broadcast event is:

dvb://233a.1041.10bf;a551~20100728T140000Z--PT00H45M00S

So what that means is:

  • Enough information is being sent with every broadcast to uniquely identify each channel
  • Enough information is being sent with every broadcast to uniquely identify each programme version
  • There is no obvious way of getting from either of these pieces of information to resolvable identifiers for either channels or programme versions
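As a minimal sketch of how a client might pull these identifiers apart, the Python below splits the example crid into its authority and opaque parts, and the example dvb URI into its service identifiers, event id, start time and duration. The reading of the three dotted fields as original_network_id, transport_stream_id and service_id is an assumption based on the usual DVB locator layout, and the function names are purely illustrative.

```python
import re
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit

# Assumed layout: dvb://<onid>.<tsid>.<sid>;<event_id>~<start>--<duration>
DVB_EVENT_URI = re.compile(
    r"^dvb://(?P<onid>[0-9a-f]+)\.(?P<tsid>[0-9a-f]*)\.(?P<sid>[0-9a-f]+)"
    r";(?P<event_id>[0-9a-f]+)"
    r"~(?P<start>\d{8}T\d{6}Z)"
    r"--PT(?P<hours>\d+)H(?P<minutes>\d+)M(?P<seconds>\d+)S$",
    re.IGNORECASE,
)


def parse_crid(crid):
    """Split a crid into (authority, opaque identifier)."""
    parts = urlsplit(crid)
    if parts.scheme != "crid" or not parts.netloc:
        raise ValueError(f"not a crid: {crid!r}")
    return parts.netloc, parts.path.lstrip("/")


def parse_dvb_event_uri(uri):
    """Split a dvb event URI into its identifiers, start time and duration."""
    match = DVB_EVENT_URI.match(uri)
    if match is None:
        raise ValueError(f"not a dvb event URI: {uri!r}")
    # The trailing Z means the start time is expressed in UTC.
    start = datetime.strptime(match["start"], "%Y%m%dT%H%M%SZ")
    return {
        "original_network_id": match["onid"],
        "transport_stream_id": match["tsid"],
        "service_id": match["sid"],
        "event_id": match["event_id"],
        "start": start.replace(tzinfo=timezone.utc),
        "duration": timedelta(
            hours=int(match["hours"]),
            minutes=int(match["minutes"]),
            seconds=int(match["seconds"]),
        ),
    }


authority, opaque = parse_crid("crid://fp.bbc.co.uk/4j5cxm")
event = parse_dvb_event_uri(
    "dvb://233a.1041.10bf;a551~20100728T140000Z--PT00H45M00S"
)
```

Keeping the identifiers as hex strings, rather than converting them to integers, makes it easy to compare them with what appears in the broadcast metadata.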

Collaboration with Project Baird: TVDNS and RadioDNS

Project Baird have a proposal to extend RadioDNS to TV. RadioDNS is a way to use the DNS system to resolve some of the pieces of metadata that are already sent over FM and DAB to a URL where more information can be found about the service. For example, for BBC Radio 4, the BBC could have a DNS entry for 09250.c204.ce1.fm.radiodns.org that lets you get to the Radio 4 homepage, or that provides you with the information to go to a different page depending on what’s playing. This means that smart radios can show (for example) pictures or more information about what’s playing by using DNS and http, without anything changing in the way that programmes are broadcast over the air. If you want to know more I would suggest taking a look at the excellent video on the RadioDNS homepage.

TVDNS would essentially do the same thing but for information sent over DVB for TV. There are some slight complications because of local variations, but effectively using the TVDNS technique we can uniquely identify a channel based on network_id, service_id, transport_stream_id and original_network_id, and so we then have various options for finding the resolvable /programmes URL. We can use a combination of pieces of information:

  • A hostname (included in the request to the resolver) which includes information to identify the individual channel
  • Any number of URIs, including crids and DVB event URIs
  • Information (from the over-the-air EPG) about the start time and duration of the programme
  • Any time within the programme (including, for example, the current time to indicate “whatever is airing right now”)

A client can send as little or as much information as it has. At a minimum, it might just be the hostname (identifying the channel, and used to locate the resolver in the first place) and a time, so this system can work even when EPG data is incomplete or limited to a time period that does not cover the programme being queried for.
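A rough sketch of that minimal case in Python follows. It assumes the hostname labels are lower-case hex in the order the identifiers are listed above (network_id, service_id, transport_stream_id, original_network_id) and that the suffix is the `dvb.tvdns.net` seen in the example request later in this post. The helper names and the dictionary keys are hypothetical.

```python
from datetime import datetime, timezone

# Assumption: suffix taken from the example hostname used later in this post.
TVDNS_SUFFIX = "dvb.tvdns.net"


def tvdns_hostname(network_id, service_id, transport_stream_id,
                   original_network_id):
    """Build a TVDNS-style hostname from DVB identifiers given as integers.

    Assumption: each label is unpadded lower-case hex, in the order
    network_id.service_id.transport_stream_id.original_network_id.
    """
    labels = (network_id, service_id, transport_stream_id, original_network_id)
    return ".".join(f"{value:x}" for value in labels) + "." + TVDNS_SUFFIX


def minimal_query(hostname, when=None):
    """Smallest useful request: the channel hostname plus a time.

    `when` must be timezone-aware; it defaults to the current time,
    meaning "whatever is airing right now".
    """
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {"host": hostname, "time": when.strftime("%Y-%m-%dT%H:%M:%SZ")}


host = tvdns_hostname(0x3098, 0x1B00, 0x1041, 0x233A)
query = minimal_query(host, datetime(2010, 7, 5, 10, 45, tzinfo=timezone.utc))
```

A client with fuller EPG data could add crids or dvb event URIs to the same request.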

Because the BBC already publishes URLs in various formats for all programmes and versions of programmes, plus a schedule that links to programmes and from those to versions, we can do this resolution, and so we can create a new link for a programme between its crid, its programme url and its dvb uri.

Why bother with the TVDNS version of the channel when we can already use the dvb URI?

  • Because the TVDNS form is used to perform service location and find the appropriate resolver for the channel
  • Because RadioDNS/TVDNS can be used for other broadcast systems beyond DVB, many of which don’t have their own URI schemes (or if they do, they don’t encode the same kind of service information as DVB URIs, or simply don’t allow identification of individual events)

Basically the two techniques use the same information. At the level of the programme resolver it makes next to no difference; it’s just that in some cases the information is more readily available in one format than another. TVDNS addresses a prior step: it allows a broadcaster to advertise its own authoritative services relating to a channel (although this doesn’t preclude others being used). Mo McRoberts has implemented this prior step within Project Baird, and you can see it working here.

Technology

This implementation has seven parts:

  • Daily schedule crawler
  • MythTV + patch
  • Uploader
  • Matcher
  • Query and post servlet with Jena backend store
  • SameAs service
  • Query interface

The code is here.

How it works

Daily schedule crawler

A cron job (scheduled task) harvests schedule data from the BBC site early every morning for the current day. From that it can get all the programmes for that day, and from those, the version that was actually played. We delete everything from the previous day to keep queries fast for the use case we have. Yves Raimond runs a full RDF version of /programmes if you are interested in a fuller dataset.

MythTV + patch

A slightly patched version of MythTV (currently 0.23) is on a box connected to a DVB-T card. The patch is needed because by default MythTV doesn’t store the event id, although it does store the crid and all the other pieces of information we need to construct the dvb uri or the channel id.

We could have used (and will probably move to) Kamaelia rather than MythTV; it’s just that we had MythTV in use already.

Uploader

A simple Ruby script queries MythTV’s MySQL database and creates a list of dvb URIs, crids, schedule datetimes, channel callsigns, and titles, like this:

crid://fp.bbc.co.uk/1fh68i|dvb://233a.1041.1041;abb5~20100727T190000Z--PT01H00M00S|2010-07-27T20:00:00+01:00|2010-07-27T21:00:00+01:00|BBC ONE|Holby City

The callsigns and titles are not needed but are useful for cross checking.

This list is uploaded to the matcher.
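To illustrate what the matcher receives, here is a small Python sketch (not the Ruby uploader itself) that splits one such line into named fields. The field names are just labels chosen here for the six pipe-separated columns, and `parse_upload_line` is a hypothetical helper.

```python
from collections import namedtuple
from datetime import datetime

# Labels for the six columns; these names are illustrative only.
UploadRecord = namedtuple(
    "UploadRecord", "crid dvb_uri start end callsign title"
)


def parse_upload_line(line):
    """Parse one pipe-delimited line produced by the uploader."""
    # Split at most five times so a '|' inside a title stays in the title.
    fields = line.rstrip("\n").split("|", 5)
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    crid, dvb_uri, start, end, callsign, title = fields
    return UploadRecord(
        crid=crid,
        dvb_uri=dvb_uri,
        # Scheduled times carry a UTC offset, e.g. +01:00 for BST.
        start=datetime.fromisoformat(start),
        end=datetime.fromisoformat(end),
        callsign=callsign,
        title=title,
    )


record = parse_upload_line(
    "crid://fp.bbc.co.uk/1fh68i|"
    "dvb://233a.1041.1041;abb5~20100727T190000Z--PT01H00M00S|"
    "2010-07-27T20:00:00+01:00|2010-07-27T21:00:00+01:00|"
    "BBC ONE|Holby City"
)
```

Note that the scheduled start of 20:00 at +01:00 agrees with the 19:00 UTC start embedded in the dvb URI, which is a handy sanity check.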

Matcher

The matcher takes the channel and start time and identifies the programme from the existing crawled schedule, using a series of SPARQL queries, e.g.

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX po: <http://purl.org/ontology/po/>
PREFIX ev: <http://purl.org/NET/c4dm/event.owl#>
PREFIX tl: <http://purl.org/NET/c4dm/timeline.owl#>
select distinct ?pid ?prog ?title where {
?p po:schedule_date "2010-07-28"^^xsd:date .
?p po:broadcast_on ?service .
?service po:parent_service .
?p ev:time ?t .
?t tl:start ?s .
?t tl:end ?e .
?p po:broadcast_of ?pid .
?prog po:version ?pid .
?prog po:short_synopsis ?title .
FILTER( (?s = "2010-07-28T22:00+01:00"^^xsd:dateTime ) )
}

Which basically says “get me the details for the version of the programme starting at a particular time on a particular channel”. Note that for some channels with regional variations, we have to look for the parent channel rather than the channel itself. The start times used come from MythTV’s database and match up to the schedule that comes from the BBC site. In both cases they are not the actual transmission times but the scheduled times.

The matcher can then find a URI for the version of the programme and create ‘sameAs’ links between the dvb uri, the programmes url and the crid. Because the backend store only stores information about the current day, we also upload the sameas links to a separate store – in this case an instance of http://sameas.org.
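A sketch of that final linking step in Python, assuming the links are expressed as `owl:sameAs` statements serialised as N-Triples. The post only says ‘sameAs’, so the exact predicate and serialisation are assumptions, and the programme URL in the usage example is a made-up placeholder rather than a real /programmes identifier.

```python
from itertools import combinations

# Assumption: the 'sameAs' links use the standard OWL predicate.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(*uris):
    """Yield one N-Triples line for each unordered pair of URIs."""
    for left, right in combinations(uris, 2):
        yield f"<{left}> <{OWL_SAME_AS}> <{right}> ."


links = list(
    same_as_triples(
        "crid://fp.bbc.co.uk/4j5cxm",
        "dvb://233a.1041.10bf;a551~20100728T140000Z--PT00H45M00S",
        # Hypothetical placeholder for the matched programme version URL.
        "http://example.org/programmes/PLACEHOLDER#programme",
    )
)
```

Three identifiers give three pairwise statements, so a lookup on any one of them can reach the other two.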

Query and post servlet with Jena backend store

This servlet is a very simple interface to a Jena native Java store (TDB): it accepts GET requests and interprets them as SPARQL queries, and accepts POST requests as new data. This was the simplest way of interfacing with Jena TDB from JRuby, which I used because it was quick to prototype with. The one slightly fancier thing is that when POSTing data you can supply a query to run against the data you just put in, which is useful for getting the version data when you post a programme RDF URL.

SameAs service

This is very kindly provided by Ian Millard and Hugh Glaser at Southampton University and consists of a NoTube-specific RDF store with a ‘sameas’ interface. The reason to use it is that the main store only has data for the current day to keep queries fast and maintenance easy.

Query interface

The web interface is designed to be accessed by various tools rather than with a browser. The simplest way is to use curl on the command line.

It provides as many options for resolution as possible, and we tried to stay as close to the Project Baird Ontology resolution specification as possible.

You can give it:

  • A crid, a dvb uri, or start datetime + host, or a transmission time + host, or Eventid + Serviceid and it will resolve (301) to the /programmes URL
  • A programmes url or pips uri and it’ll give you the crid and dvb uri
  • You can also tell it explicitly not to redirect

e.g.

curl -H "Host: 3098.1b00.1041.233a.dvb.tvdns.net" "http://services.notu.be/resolve?start=2010-07-05T10:45:00Z"

tells you what’s on on Radio4 at 10:45 on 5th July 2010 (as long as it is the 5th of July – you need to change that example to the current day to use it now).
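For clients that would rather not shell out to curl, a rough Python equivalent using only the standard library might look like the following. It sends the same Host header and `start` parameter as the curl example and reads the redirect target from the `Location` header instead of following it. The function names are illustrative, and the sketch assumes the service is reachable at the address used in the curl example.

```python
import http.client
from urllib.parse import urlencode

RESOLVER_HOST = "services.notu.be"  # taken from the curl example above


def build_request(tvdns_host, start):
    """Return the (path, headers) pair equivalent to the curl example."""
    # Keep ':' unescaped so the query string matches the curl form.
    path = "/resolve?" + urlencode({"start": start}, safe=":")
    headers = {"Host": tvdns_host}
    return path, headers


def resolve(tvdns_host, start, timeout=10):
    """Ask the resolver what is on and return the redirect target, if any."""
    path, headers = build_request(tvdns_host, start)
    # http.client does not follow redirects, so the 301 can be inspected.
    conn = http.client.HTTPConnection(RESOLVER_HOST, timeout=timeout)
    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        if response.status == 301:
            return response.getheader("Location")
        return None
    finally:
        conn.close()


path, headers = build_request(
    "3098.1b00.1041.233a.dvb.tvdns.net", "2010-07-05T10:45:00Z"
)
```

Calling `resolve` with the same two arguments performs the actual request; as with the curl example, the date needs to be the current day.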

Limitations:

  • Only today’s data is available
  • Regional variations are for the West only
  • BBC-related channels only
  • Resolves to programmes versions not episodes

The code is available here.

How it fits with TVDNS

Mo’s TVDNS example service shows how it fits. The client that wants to find out more about a channel (in this case a web service) resolves a given Host (like 09250.c204.ce1.fm.radiodns.org) using the DNS system. The SRV and TXT DNS records for that Host tell the client where to look to resolve that channel (more info here, and here).

The client then connects to that service (a service like the programmes resolver described above) and gets back the result: a unique resolvable URI for the programme.

RadioDNS (and TVDNS) is primarily aimed at channel resolution (plus, sometimes, transmission time, so you can get relevant content at the time); Project Baird goes further, specifying additional services, such as the resolver, which go beyond the three (RadioEPG, RadioTAG, RadioVIS) which the RadioDNS project has specified.

Working with the resolver

You can also use the resolver directly, without using DNS at all: as an example, see this very simple ‘nowplaying’ implementation for MythTV. Mo has an implementation of a schedule using the DNS aspect.

Conclusions and thanks

I think that this is a small but important step for broadcast TV. It’s still very common to see the assumption that the Web will somehow be on the TV, controlled by a traditional remote control, but this kind of approach as part of a TV API takes a step towards much more interesting applications.

I’d like to thank Mo McRoberts for comments on an earlier draft of this post, and I recommend that if you are interested in this area that you get involved with Project Baird.

Thanks also to Michael Sparks, who supplied answers to some of the trickier questions, and Matt Hammond for helping me debug the service, and to Damian Steer for advice on using Jena with JRuby.
