Tuesday, July 29, 2008

Centres are Circular Things

If the receptive field centre covers a large area, it will give a really big output when it's centred on a round object that fits within the centre.

Centre/surround cells give their biggest output when the centre is stimulated and the surround is not, like bright points of light in the dark, such as stars at night. Colour versions ought to give their biggest outputs on round things that contrast with their background, like berries or fruits. Or maybe larger round things like flowers or rocks. Or maybe almost-round things like insects or fish. Or maybe distant almost-round things like birds in the sky. Then, also, large receptive fields could trigger on partially round things, like rounded corners or fingertips.

So maybe there isn't really a curve detector in the brain at all? Maybe curves are detected through partial outputs from the centres of large receptive fields. If it contrasts enough with the background, a semicircle in the centre of a receptive field will produce a positive, but weaker, signal. If the brain knows 'that positive signal came from a big receptive field', it can assume that a curved thing is in the field, with a size in proportion to the receptive field size.
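
To make that concrete, here's a rough sketch of an on-centre/off-surround operator applied to a toy greyscale patch. None of this comes from a real model; the sizes, weights and names are my own, but it shows the ordering the argument needs: a full disc drives the centre hardest, and a semicircle still gives a partial positive output.

    // Rough illustration of an on-centre/off-surround operator on a tiny
    // greyscale patch. Sizes and weights are arbitrary; the point is only
    // that a full disc scores higher than a semicircle, which scores higher
    // than an empty field.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    const int N = 21;                  // patch is N x N pixels
    typedef std::vector<float> Patch;  // row-major N*N greyscale values, 0..1

    // On-centre response: average of the centre disc (radius rc) minus the
    // average of the surrounding annulus (rc..rs), so a uniform field gives
    // roughly zero output.
    float CentreSurround(const Patch& p, float rc, float rs)
    {
        float centre = 0, surround = 0, nc = 0, ns = 0;
        for (int y = 0; y < N; ++y)
            for (int x = 0; x < N; ++x) {
                float r = std::sqrt(float((x - N/2)*(x - N/2) + (y - N/2)*(y - N/2)));
                if (r <= rc)      { centre   += p[y*N + x]; nc += 1; }
                else if (r <= rs) { surround += p[y*N + x]; ns += 1; }
            }
        return centre / nc - surround / ns;
    }

    // A bright disc (or semicircle) of radius 5 on a dark background.
    Patch MakeDisc(bool halfOnly)
    {
        Patch p(N*N, 0.0f);
        for (int y = 0; y < N; ++y)
            for (int x = 0; x < N; ++x) {
                float r = std::sqrt(float((x - N/2)*(x - N/2) + (y - N/2)*(y - N/2)));
                if (r <= 5 && (!halfOnly || x <= N/2))
                    p[y*N + x] = 1.0f;
            }
        return p;
    }

    int main()
    {
        std::printf("full disc : %.3f\n", CentreSurround(MakeDisc(false), 5, 10));
        std::printf("semicircle: %.3f\n", CentreSurround(MakeDisc(true),  5, 10));
        return 0;
    }

The full disc gives the maximum response and the semicircle a bit over half of it, which is the kind of partial output a later stage could read as a curve.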

Centres Can Be Corners

Centre/surround receptive fields can detect corners!


Look at the diagrams: corners and angles produce stronger outputs than edges, but weaker than centre dots. If the next layer evaluates its input from the receptive fields carefully (i.e. looks for values between the edge level and the dot level), it can inform its superiors that a corner is in its field. If that next layer is also getting input from edge detectors, an agreeing edge input can strengthen its corner detection.

Of course these mid-range values can also indicate other, messier patterns, but that kind of thing can be neutralized by expectation.
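
As a throwaway way to picture the 'values between edges and dots' idea, here is a toy banding function. The thresholds and names are completely made up; the only point is the mid-range band, with the edge detector's agreement standing in for the expectation check.

    // Toy banding of a centre/surround unit's output. Thresholds are invented;
    // a real layer would tune or learn them.
    #include <cstdio>

    enum Guess { Nothing, Edge, Corner, Dot };

    // response : one centre/surround unit's output, normalised to 0..1
    // edgeHint : a co-located edge detector is also firing
    Guess Classify(float response, bool edgeHint)
    {
        if (response < 0.15f) return Nothing;
        if (response < 0.35f) return Edge;       // weak: a straight contour crossing the field
        if (response < 0.75f)                    // mid-range: corner, angle, or some messier pattern
            return edgeHint ? Corner : Nothing;  // the agreeing edge detector acts as the expectation check
        return Dot;                              // strong: a blob filling the centre
    }

    int main()
    {
        std::printf("%d %d %d\n",
            Classify(0.55f, true),    // mid-range with an agreeing edge detector -> Corner
            Classify(0.55f, false),   // mid-range alone -> rejected as noise here
            Classify(0.90f, false));  // strong -> Dot
        return 0;
    }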

Friday, July 18, 2008

MonoSLAM Port

At the Robot Vision conference, I was impressed by the presentation 'Realtime visualization of monocular data for 3D reconstruction'. It identified interesting features in a scene and used SLAM to track and map them in real time. I thought this would be useful for my object and scene measurement system. Such a system requires identifying objects, comparing them to others and building a map of their relationships. SLAM is great for mapping spatial relationships.
I jumped into this and found that Andrew Davison has made his monocular camera SLAM software available as open source. So now I am adapting it to my machine: 1. building it on Windows with VS9, 2. creating camera input using DirectShow, 3. rewriting the UI for Windows using WTL.
The source includes Davison's MonoSLAMGlow application code, his SceneLib, which implements the SLAM algorithm, and VW34, the Oxford Active Vision Lab libraries. My plan was to use OpenCV for input and DirectX for output in my port, but SceneLib is so centred on OpenGL that it's a lot less trouble to use OpenGL instead of DirectX.
The OpenCV demos would not work with my cameras (I have a cheapo USB webcam and a Canon DV Camcorder that connects by FireWire). I tried their suggested alternatives. Only videoInput worked for both my cameras, but I found it uses a lot of CPU, waaay more than DirectShow, so why not just use DirectShow, which works fine? So that's what I'm doing.
VW34 comes with project files for VS7.1, so it wasn't too much trouble getting those built in VS9. The compiler found a few problems that I still need to report back to Oxford. VW34 has a lot of parts; I only used VNL, VW and VWGL. Most of the other parts are Linux UI adapters. VWGL is an OpenGL adapter that, on Windows, requires GLUT for Windows.
SceneLib was built on Linux, so I made VS project files for it. While porting SceneLib I found something interesting: it appears that in GCC you can declare an array with a size specified by a variable, like this: int Size = fn(X); char Array[Size]; Microsoft C++ won't allow that! I had to change to malloc(). There's also something weird in GeomObjects/calibrationtable.h that causes an inexplicable list of errors, so I removed all references to it.
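
For reference, the construct and the change look roughly like this; fn is just a stand-in for whatever computes the size:

    #include <cstdlib>
    #include <vector>

    int fn(int x) { return 2 * x; }              // stand-in for whatever computes the size

    void GccVersion(int X)
    {
        int Size = fn(X);
        // char Array[Size];                     // variable-length array: GCC accepts it, MSVC does not
        (void)Size;
    }

    void PortedVersion(int X)
    {
        int Size = fn(X);
        char* Array = (char*)std::malloc(Size);  // the change I made in the port
        // ... use Array ...
        std::free(Array);

        std::vector<char> Buffer(Size);          // the tidier standard C++ alternative
        // ... use &Buffer[0] ...
    }
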
I built my own camera grabber using DirectShow; samples are available in the Windows SDK. I made the grabber run in callback mode: it stores the bitmap, then posts a message to the application's main thread. It uses 13% of my Core 2 Duo 6400 (2.13 GHz).
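
The callback end of the grabber looks roughly like this. This is a stripped-down sketch rather than my actual code: it assumes the SampleGrabber interfaces from qedit.h in the Windows SDK (link with strmiids.lib), WM_NEWFRAME and g_MainWnd are placeholder names, and locking and error handling are left out.

    // Sketch of a SampleGrabber callback that copies each frame and notifies
    // the UI thread by posting a message.
    #include <windows.h>
    #include <dshow.h>
    #include <qedit.h>                 // ISampleGrabberCB (older Windows SDKs)
    #include <vector>

    #define WM_NEWFRAME (WM_APP + 1)   // placeholder message id
    HWND g_MainWnd = NULL;             // placeholder: set to the application's main window

    class FrameGrabberCB : public ISampleGrabberCB
    {
        std::vector<BYTE> m_Frame;     // latest frame, copied out of the graph's buffer
    public:
        // IUnknown: the object lives for the life of the graph, so ref-counting is a no-op
        STDMETHODIMP_(ULONG) AddRef()  { return 2; }
        STDMETHODIMP_(ULONG) Release() { return 1; }
        STDMETHODIMP QueryInterface(REFIID riid, void** ppv)
        {
            if (riid == IID_IUnknown || riid == IID_ISampleGrabberCB) { *ppv = this; return S_OK; }
            return E_NOINTERFACE;
        }

        // Called by DirectShow on its own thread for every sample (buffer callback mode)
        STDMETHODIMP BufferCB(double /*SampleTime*/, BYTE* pBuffer, long BufferLen)
        {
            m_Frame.assign(pBuffer, pBuffer + BufferLen);   // store the bitmap
            ::PostMessage(g_MainWnd, WM_NEWFRAME, 0, 0);    // wake the application's main thread
            return S_OK;
        }

        STDMETHODIMP SampleCB(double, IMediaSample*) { return E_NOTIMPL; }
    };
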
So now I've built a WTL application, linked in the libraries and I'm working on getting the OpenGL visualizations working.

Tuesday, July 1, 2008

No Math Required

I don't think the vision problem is mathematical (except for proving that a solution is accurate or correct). Mathematics can be good for optimizations, specializations and extended functionality. I think simple methods can be used for early vision, and the basis should be the findings of neuroscience. I think the solution at higher stages is extracting the right information from the early stages and representing it in a way that lets simple pattern matching (using highly interconnected neural networks) do the job. I think the right information and representation can also come from neuroscience. So that should be my focus. What does neuroscience know?