I've just released the first new version of Spook in almost six months.
Embarrassing that I've let things go for so long, but at least all the
features that have been accumulating in private trees since then have
mostly been added. The network-side code has been overhauled, which not
only made things more efficient but also paved the way for multiple media
streams in the same RTSP session. Audio. w00t.
Since I started adding the audio support, the limitations and flaws in
the current stream implementation have been bothering me. It kind of made
sense at first to declare "the output from this grabber is named 'foo', and
the input to this encoder is named 'foo' and the output will be named
'bar'," etc., but this is not only confusing, it also prevents me from
implementing a number of more useful features. It would be nice, for
example, to have Spook automatically open all the video and audio capture
devices it could get its hands on, compress them to some default reasonable
format, and put up a template webpage with a list of everything it found so
that the user could test out Spook without configuring anything at all.
Not possible if all capture and transformation actions have to be declared
explicitly.
There are two major changes I'd like to make to the way Spook creates
media streams. The first is to change the media compression and format
specification from an imperative form to a declarative form. There's no
good reason (other than Unix tradition, I suppose) to force the user to
explicitly list the various filters and encoders and the order they should
be used, when all that's really necessary is for the user to specify the
desired end result. "I want a 320x240 384kbps MPEG4 stream" is enough
information to set up the entire module pipeline automatically. The main
problem that arises is getting all the nuances of each type of stream
correct. For example, providing two different streams, one at 30 fps and
one at 10 fps, can be done by dropping frames after compression with JPEG,
where every frame stands alone. With MPEG4, most frames are encoded as
differences from earlier frames, so the frames must be dropped before they
are encoded.
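
To make the declarative idea a bit more concrete, here's a rough sketch in Python (Spook itself is C, and none of this is its actual code). The `StreamSpec` type, the `plan_pipeline` function, and the stage names are all made up for illustration, and the 30 fps source rate is an assumption. The point is only that the frame-drop placement falls out of the codec in the spec, so the user never has to state it.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption for this sketch: codecs whose frames depend on earlier frames
# must have frames dropped before encoding.
INTER_FRAME_CODECS = {"mpeg4"}


@dataclass(frozen=True)
class StreamSpec:
    """What the user asks for; says nothing about how to get there."""
    codec: str                          # e.g. "mpeg4" or "jpeg"
    width: int
    height: int
    fps: int
    bitrate_kbps: Optional[int] = None  # None for codecs without a rate target


def plan_pipeline(spec: StreamSpec, source_fps: int = 30) -> List[str]:
    """Return an ordered list of (hypothetical) stage descriptions."""
    stages = [f"scale({spec.width}x{spec.height})"]

    if spec.bitrate_kbps is None:
        encode = f"encode({spec.codec})"
    else:
        encode = f"encode({spec.codec}, {spec.bitrate_kbps}kbps)"

    if spec.fps >= source_fps:
        # Nothing to drop, just encode.
        stages.append(encode)
    elif spec.codec in INTER_FRAME_CODECS:
        # Frames reference each other, so drop before encoding.
        stages += [f"drop_frames({source_fps}->{spec.fps})", encode]
    else:
        # Frames are independent, so one encoder can feed several rates.
        stages += [encode, f"drop_frames({source_fps}->{spec.fps})"]
    return stages


print(plan_pipeline(StreamSpec("mpeg4", 320, 240, 10, 384)))
print(plan_pipeline(StreamSpec("jpeg", 320, 240, 10)))
```

A real planner would also have to notice when two requested streams can share stages (one JPEG encoder feeding both the 30 fps and 10 fps outputs, say), which this sketch doesn't attempt.
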
The other change is to convert the stream namespace into a hierarchy
instead of using free-form strings. This sounds minor, but it allows
modules to create new streams on the fly without causing confusion.
Through the magic of sysfs, the V4L module can automatically discover all
your USB webcams, configure streams for them, and create meaningful, static
names for them. The two cams plugged into the hub on the lower USB port on
the front of your system will always be named something like
"Device::Video::USB::2-1:1.0" and "Device::Video::USB::2-2:1.0" no matter
when they were connected, because their name is based on their position on
the bus. The same devices can also be reached by their device path, like
"Device::Video::/dev/video1" or whatever, if you prefer that.
The real advantage of putting the streams into a tree is being able to
perform transformations over an entire subtree rather than on one stream at
a time. This would allow one part of the tree to "mirror" the streams in
another part, after passing them through one or more modules.
For example, the "Low Quality::" tree could contain the above streams as
"Low Quality::Video::USB::2-1:1.0", etc, compressed as 100 kbps 10 fps
video, rather than the native 30 fps uncompressed YUV format received from
the hardware. These mappings from an existing subtree to a new one would
be specified in the configuration file, of course, so the user could
control which subtrees were available and the exact parameters used.
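
Here's a small Python sketch of the naming and mirroring ideas together. It isn't Spook code: `usb_stream_name` and `mirror_subtree` are hypothetical helpers, the flat dict standing in for the stream tree is just the simplest thing that works, and the strings used as "streams" are placeholders. I'm also assuming the module has already found the sysfs directory for each camera's USB interface, whose last path component is the bus position.

```python
import posixpath

SEP = "::"


def usb_stream_name(sysfs_interface_path):
    """Name a stream after the USB interface directory it hangs off.

    The last path component (e.g. "2-1:1.0") encodes the bus position,
    so the name stays the same no matter when the camera was plugged in.
    """
    interface = posixpath.basename(posixpath.normpath(sysfs_interface_path))
    return SEP.join(["Device", "Video", "USB", interface])


def mirror_subtree(streams, src_root, dst_root, transform):
    """Re-root every stream below src_root at dst_root, applying transform."""
    prefix = src_root + SEP
    mirrored = {}
    for name, stream in streams.items():
        if name.startswith(prefix):
            mirrored[dst_root + SEP + name[len(prefix):]] = transform(stream)
    return mirrored


# Placeholder stream descriptions; real entries would be stream objects.
streams = {
    usb_stream_name("/sys/bus/usb/devices/2-1:1.0"): "raw YUV, 30 fps",
    usb_stream_name("/sys/bus/usb/devices/2-2:1.0"): "raw YUV, 30 fps",
}

low = mirror_subtree(streams, "Device", "Low Quality",
                     lambda s: s + " -> 100 kbps, 10 fps")
print(sorted(low))
# ['Low Quality::Video::USB::2-1:1.0', 'Low Quality::Video::USB::2-2:1.0']
```

Because the mapping is computed from whatever happens to be under the source subtree, a camera discovered later would show up in the mirror without any extra configuration.
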
At the end of the whole chain, there still needs to be some sort of
connection to make the stream available to clients across the network.
This could be done through explicit pairings ("rtsp://*/webcam" will serve
video from "Low Quality::Video::USB::2-1:1.0") or by exporting entire
subtrees ("rtsp://*/cams/" will serve anything below "Low Quality::Video",
such as "rtsp://*/cams/USB::2-1:1.0") as the user sees fit. As I
mentioned, template webpages could optionally enumerate the streams in
exported subtrees to simplify the initial configuration.
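
The two export styles could be resolved along these lines. Again, this is only a Python sketch under my own assumptions: the `resolve` function and the two tables are hypothetical, only the path part of the RTSP URL is handled, and nothing checks that the resulting stream actually exists.

```python
SEP = "::"

# Explicit pairings: one URL path maps to exactly one stream.
PAIRINGS = {
    "/webcam": "Low Quality::Video::USB::2-1:1.0",
}

# Subtree exports: everything below a URL prefix maps into a subtree.
SUBTREE_EXPORTS = {
    "/cams/": "Low Quality::Video",
}


def resolve(path, pairings=PAIRINGS, subtree_exports=SUBTREE_EXPORTS):
    """Map the path part of an RTSP URL to a stream name, or None."""
    if path in pairings:
        return pairings[path]
    for mount, root in subtree_exports.items():
        # Require something after the mount point, since the mount
        # itself names a subtree rather than a single stream.
        if path.startswith(mount) and len(path) > len(mount):
            return root + SEP + path[len(mount):]
    return None


print(resolve("/webcam"))
# Low Quality::Video::USB::2-1:1.0
print(resolve("/cams/USB::2-2:1.0"))
# Low Quality::Video::USB::2-2:1.0
```

Explicit pairings are checked first here so that a hand-picked URL always wins over a subtree export; that ordering is a design guess on my part.
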
By now, you're probably wondering why I'm putting so much effort into
automated support for large numbers of streams. It's not like most people
have so many webcams that they can't configure them manually, right? Well,
the ultimate goal is to turn Spook into more of a generic network media
mixer, capable of importing video from any video source local or remote,
performing some set of transforms on it, and exporting it over the network
using a variety of formats and protocols. Re-encode video from the DCS-900
watching your koi pond into MPEG4 and relay it through another, more
well-connected server
via RTSP? No problem. Webcast press conferences from the room's AV system
while simultaneously recording them for later retrieval over streaming RTSP
or AVI download over HTTP? Makes sense to me. "Tune in" to the physics
class lecture you're skipping by dialing a number from your mobile phone?
Why not?
Putting as many media sources as possible into a common namespace
doesn't get us much closer to convergence by itself, but it's a necessary
step.