Live From WARM ’09: The World’s Best Winter Augmented Reality Event

Welcome to WARM 2009, where augmented reality eggheads from both sides of the Danube meet for 2 days to share ideas and collaborate.

It’s the 4th year WARM is taking place – always at Graz university, and always in February – to provide an excuse for a skiing event once the big ideas are taken in. Hence the cunning logo:

This year 54 attendees from 16 different organizations in 5 countries are expected (Austria, Germany, Switzerland, England and the US). The agenda is jam-packed with XX sessions, lab demos and a keynote by Oliver Bimber. I have the unenviable pleasure of speaking last.

It’s 10 am. Lights are off. Spotlight on Dieter Schmalstieg, the master host, taking the stage to welcome everybody.
He admits the event started as a Graz meeting and turned into a workshop simply because guests kept coming.

Daniel Wagner, the eternal master of ceremonies of WARM, introduces Simon Hay from Cambridge (Tom Drummond’s group), the first speaker in the Computer Vision session. Simon will talk about “Repeatability experiments for interest point location and orientation assignment” – an improvement in feature based matching for the rest of us…

The basic idea: detect interest regions in canonical parameters.
Use the known parameters that come out of Ferns, PhonySift, SIFT, MOPS and MSER searches,
and accelerate and improve the search with location detectors and orientation assignment.

After a very convincing set of graphs, Simon concludes by confirming Harris and FAST give reasonable performance and gradient orientation assignment works better than expected.
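For readers wondering what a repeatability experiment measures, here is a minimal sketch in Python. It assumes the simplest common definition: the fraction of interest points from image A that, once mapped into image B by a known homography, land within a few pixels of a point detected in B. The function names, the tolerance and the toy data are my own assumptions for illustration, not Simon's actual protocol.

```python
import math

def apply_homography(H, pt):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def repeatability(points_a, points_b, H, tol=2.0):
    """Fraction of points in A re-detected in B within `tol` pixels.

    Simplified: a full evaluation would also discard points that fall
    outside the region visible in both images.
    """
    if not points_a:
        return 0.0
    repeated = 0
    for p in points_a:
        q = apply_homography(H, p)
        if any(math.dist(q, r) <= tol for r in points_b):
            repeated += 1
    return repeated / len(points_a)

# Toy example: image B is image A shifted 5 pixels to the right.
H_shift = [[1, 0, 5], [0, 1, 0], [0, 0, 1]]
detected_a = [(0, 0), (10, 10), (20, 5), (3, 3)]
detected_b = [(5, 0), (15.5, 10), (40, 40)]
score = repeatability(detected_a, detected_b, H_shift)
```

In this toy case two of the four points are found again, so the score is 0.5. A detector such as Harris or FAST would be judged by how this number holds up as the viewpoint change grows.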

Next talk is by Qi Pan (from the same Cambridge group) about “Real time interactive 3D reconstruction.”

From the abstract:
“High quality 3D reconstruction algorithms currently require an input sequence of images or video which is then processed offline for a lengthy time. After the process is complete, the reconstruction is viewed by the user to confirm the algorithm has modelled the input sequence successfully. Often certain parts of the reconstructed model may be inaccurate or sections may be missing due to insufficient coverage or occlusion in the input sequence. In these cases, a new input sequence needs to be obtained and the whole process repeated.
The aim of the project is to produce a real-time modelling system using the key frame approach which provides immediate feedback about the quality of the input sequence. This enables the system to guide the user to provide additional views for reconstruction, yielding a complete model without having to collect a new input sequence.”

I couldn’t resist pointing out (along with my ignorance) the psychological-sounding algorithms Qi uses, such as epipolar geometry and PROSAC, and reconstruction by Delaunay triangulation followed by probabilistic tetrahedral carving. You’ve got to love these terms.
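To take some of the mystery out of one of those terms: epipolar geometry boils down to a single constraint that every correct point match between two views must satisfy, x2ᵀ E x1 = 0, where E is the essential matrix. The sketch below checks that constraint in Python for a made-up setup (a second camera displaced sideways with no rotation, normalized image coordinates). All names and numbers are assumptions for illustration; this is not Qi's code.

```python
def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) applied to v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def project(X):
    """Pinhole projection of a 3D point to normalized homogeneous coordinates."""
    return (X[0] / X[2], X[1] / X[2], 1.0)

def epipolar_residual(E, x1, x2):
    """Return x2^T E x1; this is zero for a geometrically consistent match."""
    Ex1 = [sum(E[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * Ex1[i] for i in range(3))

# Assumed setup: points in camera 2 coordinates are X2 = X1 + t (no rotation),
# so the essential matrix is simply E = [t]_x.
t = (1.0, 0.0, 0.0)
E = skew(t)

X1 = (0.5, -0.2, 4.0)                      # a 3D point seen by camera 1
X2 = tuple(a + b for a, b in zip(X1, t))   # the same point seen by camera 2

good = epipolar_residual(E, project(X1), project(X2))
bad = epipolar_residual(E, project(X1), (0.375, 0.3, 1.0))  # a wrong match
```

The correct match gives a residual of zero, while the wrong one does not. Robust estimators such as PROSAC use this kind of residual to separate good matches from bad ones.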

The result is pretty good, though still noisy – so stay tuned for future results of Qi’s research.

The third talk is by Vincent Lepetit from the Computer Vision Lab (CVLab) at EPFL in Switzerland.
Vincent starts with a recap of keypoint recognition: train the system to recognize keypoints of an object.
Vincent then demonstrates work leveraging this technique: an award-winning work by Camille Scherrer, “Le monde des montagnes”, a beautiful augmented book, and a demo by Total Immersion targeted at advertising.

Now, on to the new research dubbed Generic Trees. The motivation is to speed up the training phase and to scale.
A comparison shows it’s 35% faster. To prove it, he shows a video of a SLAM application.
The Generic Trees method is used for autonomous robotics by Willow Garage, which is also behind OpenCV development.

Next, he shows recognizing camera pose with 6 degrees of freedom (DOF) based on a single feature point (selected by the user). Impressive.

That’s a wrap of the brainy Computer Vision session. Next is Oliver Bimber’s keynote.

Live from ISMAR ’08: Tracking – Latest and Greatest in Augmented Reality

After a quick liquid adjustment, and a coffee fix – we are back with the next session of ISMAR ’08, tackling a major topic in augmented reality: Tracking.

Youngmin Park is first on stage with Multiple 3D Object Tracking. His first demonstration is mind blowing. He shows an application that tracks multiple 3D objects, which has never been done before and is quite essential for an AR application.

The approach combines the benefits of multiple approaches while avoiding their drawbacks:

  • Match input image against only a subset of keyframes
  • Track features lying on the visible objects over consecutive frames
  • The two sets of matches are combined to estimate the objects’ 3D poses by propagating errors

Conclusion: Multiple objects are tracked at interactive frame rates, and performance is not affected by the number of objects.

Don’t miss the demo.

~~~

The next two talks are by Daniel Wagner from Graz university. The first is about his favorite topic: Robust and Unobtrusive Marker Tracking on Mobile Phones.

Why AR on cell phones? There are more than a billion phones out there and everyone knows how to use them (which is unusual for new hardware).

A key argument Daniel is making: marker tracking and natural feature tracking are complementary. But we need more robust tracking for phones, and we need to create less obtrusive markers.

The goal: Less obtrusive markers. Here are 3 new marker designs:

The frame marker: the frame provides the marker, while the inner area is used to present human readable information.

The split marker (somewhat inspired by Sony’s The Eye of Judgment): the barcode is split in two, with similar thinking to the frame marker.

The third is the dot marker. It covers only 1% of the overall area (assuming the surface is uniquely textured – such as a map).

Incremental tracking using optical flow:

These requirements are driven by industrial needs: “more beautiful markers” and of course making them more robust.

~~~

Daniel continues with the next discussion about Natural feature tracking on mobile phones.

Compared with marker tracking, natural feature tracking is less robust and requires more knowledge about the scene, more memory, better cameras and more computation…

To make things worse, mobile phones have less memory, less processing power (and no floating point computation) and low camera resolution…

The result is that a high end cell phone runs 10x slower than a PC, and it’s not going to improve soon, because battery power limits the advancement of these capabilities.
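A side note on the “no floating point computation” point above: the usual workaround on such hardware is fixed-point arithmetic, where fractional values are stored as scaled integers. Below is a minimal Python sketch of a 16.16 format. The format choice and the helper names are my assumptions for illustration; Daniel did not say which representation his code uses.

```python
FRAC_BITS = 16            # assumed 16.16 format: 16 integer bits, 16 fraction bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0

def to_fixed(x):
    """Convert a float to its nearest fixed-point integer representation."""
    return int(round(x * ONE))

def to_float(a):
    """Convert a fixed-point integer back to a float (for inspection only)."""
    return a / ONE

def fixed_mul(a, b):
    """Multiply two fixed-point values using integer operations only.

    The raw product carries 32 fraction bits, so shift 16 of them away.
    On a 32-bit CPU this intermediate product needs a 64-bit integer.
    """
    return (a * b) >> FRAC_BITS

def fixed_div(a, b):
    """Divide two fixed-point values; pre-shift to keep the fraction bits."""
    return (a << FRAC_BITS) // b

product = to_float(fixed_mul(to_fixed(1.5), to_fixed(2.25)))
quotient = to_float(fixed_div(to_fixed(3.0), to_fixed(2.0)))
```

Here the product comes out as 3.375 and the quotient as 1.5, both computed with integer multiplies, shifts and divides. The price is limited range and precision, which is part of why porting vision algorithms to phones takes real effort.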

So what to do?

We looked at two approaches:

  • SIFT (one of the best object recognition engines, though slow), and
  • Ferns (state of the art for fast pose tracking, but very memory intensive)

So neither approach will work on cell phones as is…

The solution: combine the best of both worlds into what they call PhonySift (a modified SIFT for phones), and then complement it with PhonyFern, which detects the dominant orientation and predicts where the feature will be in the next frame.
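Detecting a dominant orientation is typically done with a histogram of gradient directions around the feature, weighted by gradient strength, with the winning bin taken as the patch orientation. The Python sketch below shows that generic idea on a small grayscale patch. The function name, the 36-bin resolution and the toy patches are my assumptions; this is not the actual PhonySift or PhonyFern code.

```python
import math

def dominant_orientation(patch, num_bins=36):
    """Return the dominant gradient direction of a patch, in degrees.

    `patch` is a list of rows of grayscale values. Gradients come from
    central differences; each pixel votes with its gradient magnitude.
    Returns 0.0 for a completely flat patch.
    """
    height, width = len(patch), len(patch[0])
    hist = [0.0] * num_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            # Vote for the nearest bin, wrapping around at 360 degrees.
            bin_index = int(round(angle / (2 * math.pi) * num_bins)) % num_bins
            hist[bin_index] += magnitude
    best = max(range(num_bins), key=hist.__getitem__)
    return best * 360.0 / num_bins

# Toy patches: brightness rising left-to-right, and rising top-to-bottom.
ramp_x = [[x for x in range(5)] for _ in range(5)]
ramp_y = [[y for _ in range(5)] for y in range(5)]
angle_x = dominant_orientation(ramp_x)
angle_y = dominant_orientation(ramp_y)
```

The horizontal ramp yields 0 degrees and the vertical ramp 90 degrees (with image rows counted downward). Rotating the patch by this angle before describing it is what makes a descriptor rotation invariant.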

Conclusion: both approaches did eventually work on mobile phones in an acceptable fashion. The combined strengths made it work, and now both Ferns and SIFT run at similar speeds with similar memory usage.

================

From ISMAR ’08 Program:

  • Multiple 3D Object Tracking for Augmented Reality
    Youngmin Park, Vincent Lepetit, Woontack Woo
  • Robust and Unobtrusive Marker Tracking on Mobile Phones
    Daniel Wagner, Tobias Langlotz, Dieter Schmalstieg
  • Pose Tracking from Natural Features on Mobile Phones
    Daniel Wagner, Gerhard Reitmayr, Alessandro Mulloni, Tom Drummond, Dieter Schmalstieg