Today we'll look at how to use sighttpd for multi-camera H.264 video encoding and streaming.
This post is the last in a series about using the hardware video encoding and image conversion features of Renesas SH-Mobile on Linux. In earlier posts, we described how we do resource management in userspace (libuiomux); how we use the hardware image manipulation features for colorspace conversion and rescaling (libshveu); hardware encoding with libshcodecs; and simple HTTP streaming from standard input with sighttpd:
- Driving the VEU from userspace (libuiomux 1.1.0 and libshveu 1.2.0)
- Multi-camera, multi-resolution hardware encoding (libshcodecs 1.1.0)
- A new HTTP streaming server (sighttpd 1.0.0)
Today's post ties all these together, showing how to use sighttpd's support for integrated capture, video encoding and streaming. We'll also look at the performance of the server under some light load, rather than the performance of raw encoding to /dev/null that was done in the earlier libshcodecs article.
(Apologies to people reading this from Planet Haskell, I'll have to whip up something with Happstack and Hogg to make up for the disruption ;-)
Configuration
The sighttpd.conf setup is fairly straightforward: we put the options for each stream that we want to serve into an <SHRecord> block, including the desired URL path and the location of the control file to use. The same control files that are used for shcodecs-record can be used here (the output filename is ignored by sighttpd).
Listen 3000

<SHRecord>
        Path "/video0/vga.264"
        CtlFile "/usr/share/shcodecs-record/k264-v4l2-vga-stream.ctl"
        Preview off
</SHRecord>

<SHRecord>
        Path "/video0/cif.264"
        CtlFile "/usr/share/shcodecs-record/k264-v4l2-cif-stream.ctl"
        Preview off
</SHRecord>

<SHRecord>
        Path "/video1/vga.264"
        CtlFile "/usr/share/shcodecs-record/k264-v4l2-vga-stream2.ctl"
        Preview off
</SHRecord>

<SHRecord>
        Path "/video1/cif.264"
        CtlFile "/usr/share/shcodecs-record/k264-v4l2-cif-stream2.ctl"
        Preview off
</SHRecord>
I turn the on-screen Preview off because the Ecovec board I'm using has no LCD panel and is instead plugged directly into an HDMI display, which introduces a lot of bus contention. Disabling the on-screen preview improves performance markedly.
This configuration on the host ecovec will make four H.264 streams appear at: http://ecovec:3000/video0/vga.264, http://ecovec:3000/video0/cif.264, http://ecovec:3000/video1/vga.264, and http://ecovec:3000/video1/cif.264. These streams are derived from two camera sources, which here happen to be /dev/video0 and /dev/video2 (sic) as specified in the control files.
Performance
Before any clients connect, sighttpd is continuously running the cameras, colorspace conversion, rescaling and encoding for all four streams. The CPU usage is similar to that of shcodecs-record encoding four streams, i.e. a little under 2% of this 500MHz SH7724 CPU:
top - 06:47:47 up 3:35, 2 users, load average: 0.17, 0.13, 0.24
Tasks:  50 total,   1 running,  49 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.9%us,  0.6%sy,  0.0%ni, 95.8%id,  1.6%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    248332k total,   211220k used,    37112k free,        0k buffers
Swap:        0k total,        0k used,        0k free,   143752k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27787 root      20   0 93284 9.8m 1396 S  1.6  4.0   0:01.72 sighttpd
27821 root      20   0  2976 1204  988 R  1.0  0.5   0:00.19 top
    1 root      20   0  2372  708  620 S  0.0  0.3   0:01.46 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
I hacked up the following quick script on a locally connected Linux PC to create 400 stream connections (100 to each of the 4 video streams) and fire them off one per second. The -m option to curl provides a maximum timeout for each connection, which we use here to fetch 20s of video during each connection. (If you know a similar option for httperf to tell it to receive only a specified duration of a continuous HTTP stream with each connection, please leave a note in the comments!)
#!/bin/sh
for i in `seq 1 100`; do
    curl http://ecovec:3000/video0/vga.264 -o /dev/null -s -m 20 \
        -w "$i vga0: HTTP %{http_code} , %{time_total}s %{size_download} bytes\n" >> benchmark.log &
    sleep 1
    curl http://ecovec:3000/video1/vga.264 -o /dev/null -s -m 20 \
        -w "$i vga1: HTTP %{http_code} , %{time_total}s %{size_download} bytes\n" >> benchmark.log &
    sleep 1
    curl http://ecovec:3000/video0/cif.264 -o /dev/null -s -m 20 \
        -w "$i cif0: HTTP %{http_code} , %{time_total}s %{size_download} bytes\n" >> benchmark.log &
    sleep 1
    curl http://ecovec:3000/video1/cif.264 -o /dev/null -s -m 20 \
        -w "$i cif1: HTTP %{http_code} , %{time_total}s %{size_download} bytes\n" >> benchmark.log &
    sleep 1
done
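As an aside, the four nearly identical curl invocations could be factored into a loop over the stream paths. A minimal sketch, using the same host, paths and log format as above; the sed expression mapping a path like video0/vga to the label vga0 is my own addition:

```shell
#!/bin/sh
# Same benchmark driver, with the per-stream curl calls factored
# into a loop. Host, stream paths and -w log format come from the
# script above.
for i in `seq 1 100`; do
    for path in video0/vga video1/vga video0/cif video1/cif; do
        # Derive the log label from the path, e.g. video0/vga -> vga0
        name=$(echo "$path" | sed 's|video\([01]\)/\(.*\)|\2\1|')
        curl "http://ecovec:3000/$path.264" -o /dev/null -s -m 20 \
            -w "$i $name: HTTP %{http_code} , %{time_total}s %{size_download} bytes\n" \
            >> benchmark.log &
        sleep 1
    done
done
```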
The middle section of the benchmark.log file produced (while there are 20 parallel connections) looks like this:
52 vga1: HTTP 200 , 20.001s 475165 bytes
52 cif0: HTTP 200 , 20.004s 211838 bytes
52 cif1: HTTP 200 , 20.608s 353310 bytes
53 vga0: HTTP 200 , 20.024s 963123 bytes
53 vga1: HTTP 200 , 20.015s 568863 bytes
53 cif0: HTTP 200 , 20.032s 1172898 bytes
53 cif1: HTTP 200 , 20.012s 1004619 bytes
54 vga0: HTTP 200 , 20.039s 1269070 bytes
54 vga1: HTTP 200 , 20.068s 951508 bytes
54 cif0: HTTP 200 , 20.059s 1088203 bytes
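As a rough sanity check, the effective bitrate of each connection can be derived from the logged duration and byte count. A small awk sketch (the field positions assume the -w format used by the benchmark script above):

```shell
# Compute the approximate bitrate of each logged connection.
# Log line format: "<i> <label>: HTTP <code> , <time>s <bytes> bytes"
# awk's numeric conversion of "20.024s" yields 20.024, so $6 + 0 gives
# the duration in seconds; $7 is the byte count.
awk '{ t = $6 + 0; b = $7; if (t > 0) printf "%s %s %.0f kbit/s\n", $1, $2, b * 8 / t / 1000 }' benchmark.log
```

For example, the 963123-byte VGA fetch in the excerpt above works out to roughly 385 kbit/s.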
and while that is running, top looks like this:
top - 08:30:54 up 5:18, 2 users, load average: 0.30, 1.28, 0.79
Tasks:  49 total,   1 running,  48 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.0%us,  1.2%sy,  0.0%ni, 81.0%id,  3.6%wa,  1.2%hi,  0.9%si,  0.0%st
Mem:    248332k total,   210472k used,    37860k free,        0k buffers
Swap:        0k total,        0k used,        0k free,   144124k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29162 root      20   0  130m 9.8m 1348 S 12.7  4.0   0:01.42 sighttpd
29169 conrad    20   0  2976 1204  988 R  1.3  0.5   0:00.15 top
    1 root      20   0  2372  708  620 S  0.0  0.3   0:01.46 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
I'm not claiming that it can handle thousands of connections, but at least we can be sure that an embedded camera system based on this will reliably provide all the streams it has been asked to capture and encode, without dropouts. The usual use-case for this is as an input to an HTTP stream repeater on a larger server with a faster upstream connection, designed to handle a much higher load.
The bigger picture
Stepping back, the point of this series of articles has been to demonstrate that it is very easy to use hardware acceleration with Linux: we can export complex driver functionality to userspace, we can quickly develop layered applications, and we can do this while still leaving enough CPU available for other (perhaps unrelated) tasks.