N-Screen backend: XMPP/Jabber and group chats

The idea of N-Screen (demo) is to have real-time small-group non-text communication – so for example, sharing a programme (or perhaps a specific point in a programme) with a person, with a TV, or with a group, using drag and drop.


N-Screen related content screenshot

We had a number of very specific requirements:

  • Real time communication
  • Different types of receivers (people, TV/video players, others)
  • Structured data transfer
  • Anonymous usage

We also needed good, open tools and libraries to be available, because we had only a limited amount of time for the implementation.

Like several other groups, we’ve been using XMPP (Jabber) for the backend because it works in real time and has plenty of tools and libraries. Others have been using the PubSub framework to broadcast synchronised content to connected devices, but it was integral to our plan that anyone watching should also be able to share. I had a surprising amount of success with a central negotiator that allowed ad-hoc groups to be formed from anonymous users, populating each user’s roster with the other people it knew about. However, a much less error-prone approach has been to use ad-hoc XMPP group chats, and this has enabled us to make a pure HTML/JavaScript implementation with no backend dependencies apart from an XMPP server and some simple APIs to the database of content.

I’ll talk a little about the requirements in more detail, mentioning some implementation issues as we go.

Requirements

Real-time communication

This is essential for drag and drop between devices to be ‘realistic’ – i.e. for a good user experience. Network issues can always be tricky here, particularly under demonstration (rather than real-life) conditions.

Different types of receivers

A ‘TV’ listens for ‘play’ and ‘pause’ messages and does something with them. A ‘person’ listens for ‘drop’ messages and displays them appropriately. There might also be other kinds of listeners – loggers perhaps, or bots that enhance or modify content dropped to them. All types need to take account of who is joining the group and the kind of thing that they are so that they can do the right thing and display them appropriately.
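As a rough illustration of the idea, each kind of receiver can be thought of as a table of handlers keyed by message type. This is only a sketch: the message types ‘play’, ‘pause’ and ‘drop’ come from the description above, but makeReceiver and the field names (type, from, item) are hypothetical and not taken from the N-Screen code.

```javascript
// Hypothetical message shape: { type: "play" | "pause" | "drop", from, item }.
// The type names come from the post; everything else is an assumption.
function makeReceiver(kind, handlers) {
  return {
    kind, // e.g. "tv" or "person"
    receive(message) {
      // Only look at handlers this receiver registered itself.
      if (!Object.prototype.hasOwnProperty.call(handlers, message.type)) {
        return false; // not a message this kind of receiver cares about
      }
      handlers[message.type](message);
      return true;
    },
  };
}

// A 'TV' reacts to play and pause.
const tvLog = [];
const tv = makeReceiver("tv", {
  play: (m) => tvLog.push("playing " + m.item.title),
  pause: () => tvLog.push("paused"),
});

// A 'person' reacts to drops and shows who shared the item.
const inbox = [];
const person = makeReceiver("person", {
  drop: (m) => inbox.push(m.item.title + " from " + m.from),
});
```

A logger or bot would just be another receiver with its own handler table, which is one way of keeping the different kinds of listener from needing to know about each other.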

Structured data transfer

For user experience reasons a fair bit of data needs to be sent on most interactions. A shared item needs to have basic metadata (identifier, URL, title, description, image) and also a record of who shared it. Other kinds of message include announcements about the kind of thing you are. We chose JSON as the body of the XML XMPP message, though XML would also have been fine, or better. One issue is that ‘IQ’ (hidden data) messages cannot be sent to group chats, so all group messages are visible in a standard chat room if you connect to it with, say, PSI.
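To make that concrete, here is a minimal sketch of what building and reading such a message body might look like. The list of metadata fields follows the paragraph above, but the exact wire format is not given here, so the key names (id, sharedBy, type and so on) and both function names are assumptions.

```javascript
// Build the JSON text that would go inside the <body> of a group chat message.
// Key names are hypothetical; only the list of metadata comes from the post.
function buildDropBody(item, sharedBy) {
  const required = ["id", "url", "title", "description", "image"];
  for (const key of required) {
    if (typeof item[key] !== "string") {
      throw new Error("missing item field: " + key);
    }
  }
  return JSON.stringify({ type: "drop", sharedBy, item });
}

// Parse a message body, returning null for anything that is not our JSON.
// Group chats are visible to ordinary chat clients, so plain text typed by
// someone in a normal client has to be ignored rather than treated as an error.
function parseBody(text) {
  try {
    const data = JSON.parse(text);
    return data && typeof data.type === "string" ? data : null;
  } catch (e) {
    return null;
  }
}
```

Returning null for unrecognised bodies is a design choice for the sketch: it lets the same room be shared by N-Screen clients and by a person watching from a standard chat client.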

Anonymous usage

Although there is plenty of potential for connecting N-Screen with Twitter and / or Facebook, we didn’t want to require it. In N-Screen you need to give a name so that other people using the application can refer to you, but that’s the limit of the requirement for identification. For scalability and maintainability reasons we didn’t want to create a lot of named users on the XMPP server. Fortunately, XMPP allows you to create group chats with anonymous users, which is perfect for our needs.

The setup

We’ve been through many iterations to get here but I’m now pretty happy with the setup we have.

Ejabberd server with BOSH and group chat enabled

Ejabberd is not particularly simple to set up, but once it is up, it seems pretty stable. I’ve put some tips on troubleshooting here (scroll to the end). PSI is a great tool for debugging as you can set it to log the XML messages going past.


PSI view of a groupchat created behind the scenes of N-Screen

PSI XML view of a groupchat in N-Screen

One thing to note is that, for ejabberd at least, the group chat address (the room plus your nickname in it) is

[room_name]@conference.[server]/[nick]

e.g.

default@conference.localhost/libby
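A small sketch of putting that address together and taking it apart again. The ‘conference.’ prefix follows the pattern shown above for ejabberd; other servers may name the group chat service differently, and the function names are made up for illustration.

```javascript
// Build [room_name]@conference.[server]/[nick], as in the pattern above.
function roomJid(room, server, nick) {
  return room + "@conference." + server + "/" + nick;
}

// Split an address of that shape back into its parts, or return null
// if it does not have both an '@' and a '/' after it.
function parseRoomJid(jid) {
  const at = jid.indexOf("@");
  const slash = jid.indexOf("/", at + 1);
  if (at < 1 || slash < 0) {
    return null;
  }
  return {
    room: jid.slice(0, at),
    service: jid.slice(at + 1, slash),
    nick: jid.slice(slash + 1),
  };
}
```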

APIs to the content

I used a simple Ruby server and MySQL backend to generate JSON search and random APIs. For content-to-content recommendations for TED we have used TF-IDF analysis of the transcripts using this code by my BBC colleague Chris Lowis.
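For anyone unfamiliar with TF-IDF, the general idea can be sketched in a few lines. To be clear, this is not Chris Lowis’s code and not what runs behind N-Screen; it is a generic textbook version (term frequency times log inverse document frequency, compared with cosine similarity), with a deliberately crude tokenizer, and all function names are invented for the example.

```javascript
// Crude tokenizer: lower-case words only (an assumption for the sketch).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Turn each document into a Map of term -> TF-IDF weight.
function tfidfVectors(docs) {
  const tokenized = docs.map(tokenize);
  const docFreq = new Map(); // term -> number of documents containing it
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  return tokenized.map((tokens) => {
    const vec = new Map();
    for (const term of tokens) {
      vec.set(term, (vec.get(term) || 0) + 1); // raw counts first
    }
    for (const [term, count] of vec) {
      const idf = Math.log(docs.length / docFreq.get(term));
      vec.set(term, (count / tokens.length) * idf);
    }
    return vec;
  });
}

// Cosine similarity between two sparse vectors; 0 if either is all zeros.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, weight] of a) {
    dot += weight * (b.get(term) || 0);
    normA += weight * weight;
  }
  for (const weight of b.values()) {
    normB += weight * weight;
  }
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}
```

Words that appear in every transcript get a weight of zero, so two transcripts only look related if they share terms that are comparatively rare across the collection.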

The workflow is as follows:

  • The user goes to a webpage, and gets an alert requesting their name
  • Based on the window hash (the bit after ‘#’), the JavaScript chooses which group chat to join or create, using Strophe over BOSH to make the connection. It then announces itself to the room using a presence message with the name provided by the user
  • The ejabberd server then automatically tells the user about the other participants in the room, and the JavaScript renders them either as users or as TVs
  • The ‘TV’ is also a piece of JavaScript / Strophe that additionally announces itself to all joiners of the room as a particular type of thing (a ‘TV’). Multiple TVs are allowed in the room.
  • All user pages keep a list of all TVs, and dropping a programme onto the TV sends it to all of them
  • On leaving the page the user is disconnected from the ejabberd server – this can take a few seconds to percolate to the user interface.
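The bookkeeping in the steps above can be sketched without any XMPP library at all. In this sketch the send argument stands in for whatever actually puts a message on the wire (in N-Screen that would be Strophe over BOSH); the function names, the "default" fallback room and the message shape are all assumptions made for illustration.

```javascript
// Choose the room from the window hash; fall back to a hypothetical
// "default" room when there is no hash.
function roomFromHash(hash) {
  const name = (hash || "").replace(/^#/, "").trim();
  return name || "default";
}

// Keep track of who is in the room and what kind of thing they are.
function createRoster() {
  const members = new Map(); // nick -> kind ("person" or "tv")
  return {
    join(nick, kind) {
      members.set(nick, kind || "person");
    },
    leave(nick) {
      members.delete(nick);
    },
    tvs() {
      return [...members]
        .filter(([, kind]) => kind === "tv")
        .map(([nick]) => nick);
    },
  };
}

// Dropping onto "the TV" fans the programme out to every TV in the room.
function dropOnTvs(roster, item, send) {
  for (const nick of roster.tvs()) {
    send(nick, { type: "play", item });
  }
}
```

In a browser the first step would be something like roomFromHash(window.location.hash), with join and leave called as presence messages arrive from the server.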

The rest is client-side, which I’ll talk about further in another post. Feel free to try out N-Screen here.
