Wednesday, November 12, 2008
Who Gets the Money?
I'll post a question just for fun, because everyone loves politics!
My question is: The oil industry makes massive profits - is it better for that money to be controlled by the shareholders, or would it be better if consumer prices were reduced so there were no profits and the money was left in control of the consumers? Who should get the money: investors/shareholders or people who did not participate in the development of the resource, but only consume the product?
It's not like the money does not get spent. Investors do spend the money on other ventures. Maybe they build the Burj Dubai. When consumers spend, most of it goes to the local community for food, shelter and services.
The question came to me by a twisted path. I've been a contributor to Greenpeace for 20 years and I get their quarterly magazine. This month's issue pleads for controls on the Alberta oil sands projects, which are using and polluting water, stripping the land and killing birds. I don't agree with them entirely. Oil is important and we have to balance the good with the bad.
My view on water (tailings ponds) is that it always evaporates and recycles. You cannot run out of water, and in the long run it always cleans itself. So using water in any amount is not bad.
My view on the land (surface mining) is that nature will reclaim it. It was not clean in the first place - it was full of oil. What they return after extraction is dirt with less oil in it. I don't see a problem there. Over time nature will blow in sand and soil and it will be the same as it was. So using land is OK (but only up to the point where the loss of vegetation threatens the support of all life on the planet).
Burning gas to provide heat for the extraction process raises the greenhouse gas emissions and I think that is bad.
Anyway, the project is huge. Being a tech nerd, I wanted to see just how huge, so I went to look at it on Google Maps (I entered 'Fort McMurray Alberta' - just look for the huge grey patch north of there). You can see the tailings ponds; you can see the huge dump trucks.
I thought I could get better pictures on Wikipedia. I read the Wikipedia articles on oil sands and the Athabasca oil sands to learn the history, technology and economics (I love Wikipedia!). The articles say that more mines are being built out there right now, so there will be construction work there for at least ten years. So I'm thinking, if I ever go broke, there is work out there.
While describing the economics, the articles explain that it costs about $25 to produce a barrel of oil, which can be sold for $60 to $100. I'm thinking: that's an outrageous profit! Is it moral? If they just dropped the selling price to $30, reducing their profit, a huge amount of money would be left for consumers to spend on other stuff, which would provide jobs for people. As it is, the investors use the profits to capitalize other ventures, which also provide jobs for people.
So which is better? That the consumer or the producer gets to control the money… Oh, I won't give an answer. It's a really big question.
I might have posted this because I just read 'Trinity' by Leon Uris, about the English exploitation of Ireland, or maybe not. Who knows how minds work?
Tuesday, July 29, 2008
Centres are Circular Things
If the receptive field centre covers a large area, it will provide a really big output when it's focused on a round object (that fits within the centre).
Centre / surrounds make their biggest output when the centre is stimulated, and the surround is inhibited, like on bright points of light in the dark, like stars at night. Colour versions ought to provide their biggest outputs on round things that contrast with their background, like berries or fruits. Or maybe larger round things like flowers or rocks. Or maybe almost round things like insects or fish. Or maybe distant almost round things like birds in the sky. Then, also, large receptive fields could trigger on partially round things, like rounded corners or fingertips.
So maybe there's not really a curve detector in the brain at all? Maybe curves are detected by partial outputs from the centres of large receptive fields. If it contrasted enough, a semicircle in the centre of a receptive field would cause a positive, but partial, signal. If the brain knows, 'hey, that partial signal came from a big receptive field', then it could infer that a curved thing is in the field, with a size related to the receptive field size.
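To convince myself of the 'partial output' idea, here's a toy sketch: a single balanced ON-centre/OFF-surround unit summed over a pixel grid, with radii I made up (not taken from any real neural data). A bright disc that just fills the centre gives the maximum response; the same disc cut in half gives about half of it.

#include <cmath>
#include <cstdio>
#include <functional>

// Response of one ON-centre / OFF-surround unit at the origin:
// average brightness inside the centre radius Rc minus average
// brightness in the surround annulus Rc..Rs.  The stimulus is a
// function returning brightness 0..1 at pixel (x, y).
double response(const std::function<double(int, int)> &stim, int Rc, int Rs) {
    double centre = 0, surround = 0;
    int nc = 0, ns = 0;
    for (int y = -Rs; y <= Rs; ++y)
        for (int x = -Rs; x <= Rs; ++x) {
            double r = std::sqrt(double(x * x + y * y));
            if (r <= Rc)      { centre   += stim(x, y); ++nc; }
            else if (r <= Rs) { surround += stim(x, y); ++ns; }
        }
    return centre / nc - surround / ns;
}

int main() {
    const int Rc = 10, Rs = 20;     // made-up centre and surround radii
    // A bright disc that just fills the centre: the ideal stimulus.
    auto disc = [&](int x, int y) { return x * x + y * y <= Rc * Rc ? 1.0 : 0.0; };
    // The same disc cut in half: the partial output a curved edge would give.
    auto half = [&](int x, int y) { return x >= 0 && x * x + y * y <= Rc * Rc ? 1.0 : 0.0; };
    std::printf("full disc : %.2f\n", response(disc, Rc, Rs));   // ~1.0
    std::printf("semicircle: %.2f\n", response(half, Rc, Rs));   // ~0.5
    return 0;
}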
Centres Can Be Corners
Centre/surround receptive fields can detect corners!
Look at the diagrams: corners and angles produce stronger outputs than edges, but weaker than centre dots. If the next layer evaluates its input from these receptive fields carefully (i.e. looks for values between those for edges and those for dots), it can inform its superiors that a corner is in its field. If that next layer also gets input from edge detectors, that input can strengthen its corner detection.
Of course these mid-range values can also indicate some other messy patterns, but that kind of stuff can be neutralized by expectation.
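Here's the same kind of toy sketch as in the post above, with the stimuli placed so the centre is always fully lit (my own made-up geometry, not taken from the diagrams): a dot filling the centre, the corner of a bright square just enclosing the centre, and a straight bright edge just touching it. The corner covers less of the inhibitory surround than the edge does, so the outputs come out in the order the diagrams show: dot > corner > edge.

#include <cstdio>

// One balanced ON-centre/OFF-surround unit (centre radius Rc, surround
// annulus Rc..Rs) applied to three stimuli that all cover the whole centre.
int main() {
    const int Rc = 10, Rs = 20;
    const char *names[] = { "dot   ", "corner", "edge  " };
    for (int s = 0; s < 3; ++s) {
        double centre = 0, surround = 0;
        int nc = 0, ns = 0;
        for (int y = -Rs; y <= Rs; ++y)
            for (int x = -Rs; x <= Rs; ++x) {
                int r2 = x * x + y * y;
                if (r2 > Rs * Rs) continue;
                bool lit;
                if (s == 0)      lit = r2 <= Rc * Rc;         // dot that just fills the centre
                else if (s == 1) lit = x >= -Rc && y >= -Rc;  // corner of a bright square
                else             lit = y >= -Rc;              // straight bright edge
                if (r2 <= Rc * Rc) { centre   += lit; ++nc; }
                else               { surround += lit; ++ns; }
            }
        // Roughly 1.0 for the dot, ~0.5 for the corner, ~0.25 for the edge:
        // a band of mid-range values the next layer could read as 'corner'.
        std::printf("%s response = %.2f\n", names[s], centre / nc - surround / ns);
    }
    return 0;
}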
Friday, July 18, 2008
MonoSLAM Port
At the Robot Vision conference, I was impressed by the presentation 'Realtime visualization of monocular data for 3D reconstruction'. The system identifies interesting features in a scene and uses SLAM technology to track and map them in real time. I thought this would be useful for my object and scene measurement system. Such a system requires identifying objects, comparing them to others and building a map of their relationships. SLAM is great for mapping spatial relationships.
I jumped into this and found that Andrew Davison has made his monocular camera SLAM software available as open source. So now I am adapting it to my machine: 1. building it on Windows with VS9, 2. creating camera input using DirectShow, 3. redoing the UI for Windows using WTL.
The source includes Davison's MonoSLAMGlow application code, his SceneLib, which implements the SLAM algorithm, and VW34, the Oxford Active Vision Lab libraries. My plan was to use OpenCV and DirectX as the input and output for my port, but SceneLib is so centred around OpenGL that it's a lot less trouble to use that instead of DirectX.
The OpenCV demos would not work with my cameras (I have a cheapo USB webcam and a Canon DV Camcorder that connects by FireWire). I tried their suggested alternatives. Only videoInput worked for both my cameras, but I found it uses a lot of CPU, waaay more than DirectShow, so why not just use DirectShow, which works fine? So that's what I'm doing.
VW34 comes with project files for VS7.1, so it wasn't too much trouble getting those built in VS9. The compiler found a few problems that I still need to report back to Oxford. VW34 has a lot of parts; I only used VNL, VW and VWGL. Most of the other parts are Linux UI adapters. VWGL is an OpenGL adapter that on Windows requires GLUT for Windows.
SceneLib was built on Linux, so I made VS project files for it. While porting SceneLib I found something interesting: it appears that in GCC you can declare an array with a size specified by a variable, like this: int Size = fn(X); char Array[Size]; Microsoft C++ won't allow that! I had to change it to malloc(). There's also something weird in GeomObjects/calibrationtable.h that causes an inexplicable list of errors, so I removed all references to it.
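For the record, here's roughly what the construct and the two portable replacements look like (fn and X are just stand-ins for whatever SceneLib actually computes the size from):

#include <cstdlib>
#include <vector>

int fn(int x) { return 3 * x; }   // stand-in for the real size calculation

void example(int X) {
    int Size = fn(X);

    // GCC accepts this C99-style variable-length array as an extension;
    // Microsoft C++ (VS9) rejects it:
    // char Array[Size];

    // What I changed it to: plain C allocation.
    char *Array = (char *)std::malloc(Size);
    // ... use Array ...
    std::free(Array);

    // The more idiomatic C++ alternative:
    std::vector<char> Array2(Size);
    // ... use &Array2[0] ...
}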
I built my own camera grabber using DirectShow (samples are available in the Windows SDK). The grabber runs in callback mode, stores the bitmap and posts a message to the application's main thread. This uses 13% of my Core 2 Duo 6400 (2.13 GHz).
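In case it helps anyone else doing this, here's a bare sketch of the callback side. It assumes a SampleGrabber filter is already wired into the capture graph and registered with SetCallback(&cb, 1) (buffer mode); the window handle, the custom message and the buffer members are my own placeholders, not anything from SceneLib or the SDK samples.

#include <windows.h>
#include <dshow.h>
#include <qedit.h>      // ISampleGrabberCB (DirectShow, Windows SDK)
#include <cstring>

#define WM_NEW_FRAME (WM_APP + 1)   // placeholder message posted to the main thread

// Receives each frame on a DirectShow worker thread, copies the bits,
// then notifies the application's main thread.
class FrameGrabberCB : public ISampleGrabberCB {
public:
    explicit FrameGrabberCB(HWND hwndMain)
        : m_hwndMain(hwndMain), m_buf(NULL), m_size(0) { InitializeCriticalSection(&m_lock); }
    ~FrameGrabberCB() { DeleteCriticalSection(&m_lock); delete[] m_buf; }

    // IUnknown boilerplate; the application owns this object, so reference
    // counting is a no-op here.
    STDMETHODIMP QueryInterface(REFIID riid, void **ppv) {
        if (riid == IID_IUnknown || riid == IID_ISampleGrabberCB) {
            *ppv = static_cast<ISampleGrabberCB *>(this);
            return S_OK;
        }
        *ppv = NULL;
        return E_NOINTERFACE;
    }
    STDMETHODIMP_(ULONG) AddRef()  { return 2; }
    STDMETHODIMP_(ULONG) Release() { return 1; }

    // Buffer-mode callback: DirectShow hands us the raw bitmap bits.
    STDMETHODIMP BufferCB(double /*sampleTime*/, BYTE *pBuffer, long bufferLen) {
        EnterCriticalSection(&m_lock);
        if (bufferLen != m_size) { delete[] m_buf; m_buf = new BYTE[bufferLen]; m_size = bufferLen; }
        std::memcpy(m_buf, pBuffer, bufferLen);
        LeaveCriticalSection(&m_lock);
        PostMessage(m_hwndMain, WM_NEW_FRAME, 0, 0);   // wake the main thread
        return S_OK;
    }
    // Sample-mode callback; unused when buffer mode is selected.
    STDMETHODIMP SampleCB(double, IMediaSample *) { return E_NOTIMPL; }

private:
    HWND m_hwndMain;
    CRITICAL_SECTION m_lock;
    BYTE *m_buf;
    long m_size;
};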
So now I've built a WTL application, linked in the libraries and I'm working on getting the OpenGL visualizations working.
Tuesday, July 1, 2008
No Math Required
I don't think the vision problem is mathematical (except to prove that it's accurate or correct). Mathematics can be good for optimizations and specializations and extended functionality. I think simple methods can be used for early vision. The basis should be in the findings of neuroscience. I think the solution at higher stages is extracting the right information from the early stages and representing it in a way that simple pattern matching (using highly interconnected neural networks) will do the job. I think the right information and representation can also come from neuroscience. So that should be my focus. What does neuroscience know?
Tuesday, June 24, 2008
Cloud Robot
I was reading about cloud computing.
So here's an idea:
A robot relying on vision can upload its real-time sensor input to the cloud.
The cloud can extract whatever it likes and present it to humans.
But the cloud will also do all the image processing and download the result (?) back to the robot.
The result could be motor instructions or descriptors for the robot to work with.
So that the robot itself doesn't need a high-powered processor.
All that's needed is a really fast wireless link:
640*480*4*60=73,728,000 bytes/sec!
(640x480 pixels, each 4 bytes (RGBI) @ 60 frames/sec)
You could strip that down to 3 bytes/pixel @ 30 frames/sec 640*480*3*30=27,648,000 bytes/sec
OK, 320x240 image = 320*240*3*30=6,912,000 bytes/sec
OK, 256-shade grey scale = 320*240*30=2,304,000 bytes/sec
That's possible!
And that leaves about 500 KB/s to send stuff back.
Probably been done. But maybe not so real time?
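For the record, the arithmetic above in one throwaway snippet (same resolutions, pixel depths and frame rates as listed):

#include <cstdio>

// Raw video uplink rate: width * height * bytes-per-pixel * frames-per-second.
long long bytesPerSec(int w, int h, int bpp, int fps) {
    return 1LL * w * h * bpp * fps;
}

int main() {
    std::printf("%lld\n", bytesPerSec(640, 480, 4, 60));  // 73,728,000
    std::printf("%lld\n", bytesPerSec(640, 480, 3, 30));  // 27,648,000
    std::printf("%lld\n", bytesPerSec(320, 240, 3, 30));  //  6,912,000
    std::printf("%lld\n", bytesPerSec(320, 240, 1, 30));  //  2,304,000 (8-bit grey)
    return 0;
}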
Sunday, June 22, 2008
2008 Canadian Robot Vision Conference
This is my report on the Canadian Intelligent Systems Collaborative (AI/GI/CRV/IS) 2008 Conference. This was five conferences in one event. My interest was in the CRV (Computer and Robot Vision) conference held by the Canadian Image Processing and Pattern Recognition Society (CIPPRS).
The buildings in which the conference was held were under construction and the noise was distracting. But with that inconvenience tolerated, the conference was very enlightening for a beginner like me.
The conference was essentially three days, with a keynote each morning and papers or talks throughout each day.
The only keynote I heard was the first, given by Peter Carbone of Nortel. It was interesting to me (given my history in telecommunications), but off-target for the AI people who made up the majority of conference attendees. Mr. Carbone made several predictions that I think will be significant in telecom: widespread broadband wireless penetration by 2010, telephony using SOA with mashup potential, and 100 MB/s real-time encryption capability.
Following the keynote the CAIAC Precarn Intelligent Systems Challenge was announced. This offers a $10K prize to the student submitting the best method of detecting ships meeting at sea using satellite and radar tracking data. I think the poor data makes this a challenging problem.
The highlight of the conference for me was a talk by Dr. Steven Zucker from Yale. He was the only presenter who seemed interested in doing AI and computer vision to emulate biology, as I am. I think artificial systems should understand what they're working on; be a part of their world, as biological systems are. A biological goal Zucker identified is guiding animal movement, such as monkeys jumping to tree branches. This objective is the same as Arathorn's example of goats jumping to rocky ledges. Zucker's talk was mainly on stereo vision. He confirmed my assessment that Canny edge detectors suck. He implemented a nice curve detector based on tangents. He showed how he used spatial and orientation disparity to get a better matching of the image pair. A nice point was about self-referential calibration: a system that can move can identify its own parts (e.g. in a mirror) as the ones that move when it moves them.
I missed the talk by Dr. James Crowley, to my regret. I gathered that his points included that intelligence requires embodiment and autonomy. This confirms my subscription to the philosophy of Spinoza, who states that the mind is the entire body. Any organism's mental reality would not be what it is without all of the sensory input and motor feedback provided by the body.
The talk by Dr. Greg Dudek about his AQUA robot was interesting because of the focus and completeness of the project. It's another very specialized machine, although you can program its actions by a visual language. Apparently they discarded a visual system that recognized human hand gestures.
I attended all of the CRV paper presentations. These seemed to be arranged in ascending order of complexity and accomplishment. I was surprised that I could understand much of the work. Some of the papers were not amazing to me at all. Some were incremental improvements on previous work. Most were applications of existing work. This may be a survey of the state of the art, or it may just be a sampling of people who are trying to get attention (who didn't go to other conferences). I'm not going to summarize all of the papers - just give criticisms of the ones I found useful.
'An Efficient Region-Based Background Subtraction Technique' and 'Ray-based Color Image Segmentation' presented image segmentation optimizations based on iterative deduction. This is a good and intuitive idea and I think I can implement it using layers of neural networks. The ray-based segmentation idea was clever, but had problems finding all segments and was slow. I still don't know if colour-based segmentation is natural.
The methods used in 'A Cue to Shading - Elongations Near Intensity Maxima' to differentiate shadows from textures got me confused, but Gipsman's point that knowledge of shading detection is still primitive surprised me - I think analysis of shading would be fundamental to determining shape and orientation of 3D objects. I agree with her that feedback from higher layers will be essential. But I think the feedback will loop: the shape of the shadow will help in recognizing the object and the shape of the object will help in recognizing the shadow.
'Fast Normal Map Acquisition Using an LCD Screen Emitting Gradient Patterns' presents an innovative method for lighting objects to get 3D information. An interesting point is their use of the polarized LCD light and a filter to remove specular reflection. I found out later that the human eye can differentiate linearly from non-linearly polarized light (see Haidinger's Brush). Perhaps the brain can use this information in determining where the light is really coming from?
'Realtime visualization of monocular data for 3D reconstruction' was a treat for me because it relates so well to my planned measurement-with-a-camera project. To me, this paper is like an instruction book on how to model 3D space from a single camera. I must look into its Simultaneous Localization and Mapping (SLAM) methods and other tricks. Monocular is cheap; stereo is more accurate? Again, the system doesn't have a clue what it's looking at, but it may be a good start for a more complex system. I must analyze it in more depth.
'Object Class Recognition using Quadrangles' is a general-purpose implementation of edge-based object recognition which also considers regions of uniform colour. On top of this the authors implemented a structural descriptor (the paper describes quadrangles only, but the speaker described the use of ellipse descriptors in their newer work) and a template-based spatial relationship matching system much simpler than that used by Sinisa Todorovic's self-learning segment-based system (but less capable). 'Geometrical Primitives for the Classification of Images Containing Structural Cartographic Objects' is another system based on edge/region/structural descriptors, but focused on the single problem of finding roads, bridges and such in satellite imagery. The software seems to be more capable, handling higher-level primitives such as blobs, polygons, arcs and junctions. It uses AdaBoost binary classification. Results were good except for detecting bridges. I suggest looking for the bridge shadows.
Most of the motion tracking papers used very simple recognition techniques or did not describe them. '3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis' presents a highly mathematical approach that seems to be another form of template matching. It works well but it will take quite an effort for me to understand it. 'Visual-Model Based Spatial Tracking in the Presence of Occlusions' presents a pre-processing trick to mask occlusions from a template/visual-model based 3D tracking system. While the system is highly performant, using the GPU, it is highly specialized to a single object. 'Automatically Detecting and Tracking People Walking Through Transparent Door with Vision' tracks Harris corners through time. It can be taught to subtract expected movements from new ones, by simple geometric trajectory comparison. This can serve many applications, but the use of just corners means it can't tell you what is moving through the scene. But could specializations like this be used as keys to brain behaviour? e.g. does the brain just use the moving corners of a door to perceive it? 'Invariant Classification of Gait Types' classifies body movements by comparison to a database of shape contexts derived from template silhouettes. This is an efficient and accurate method used in handwriting recognition, and I think I'll look into it more, because the bin concept applied to pattern matching lends itself to implementation using neural networks.
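Just to pin down what I mean by the bin concept: a shape context is a coarse log-polar histogram of where the other contour points lie relative to a reference point. Here's a toy sketch of that binning (my own crude version; the real method normalizes distances by the mean point spacing, and none of this is taken from the gait paper's code):

#include <cmath>
#include <cstdio>
#include <vector>

struct Pt { double x, y; Pt(double x_, double y_) : x(x_), y(y_) {} };

// Coarse log-polar histogram of the other points' positions relative to
// one reference point: nr log-distance bins x na angle bins.
std::vector<int> shapeContext(const Pt &ref, const std::vector<Pt> &pts,
                              int nr = 5, int na = 12) {
    const double PI = 3.14159265358979;
    std::vector<int> hist(nr * na, 0);
    for (size_t i = 0; i < pts.size(); ++i) {
        double dx = pts[i].x - ref.x, dy = pts[i].y - ref.y;
        double d = std::sqrt(dx * dx + dy * dy);
        if (d < 1e-9) continue;                                   // skip the reference point itself
        int rbin = (int)std::floor(std::log(d) / std::log(2.0));  // crude log-distance bin
        if (rbin < 0) rbin = 0;
        if (rbin >= nr) rbin = nr - 1;
        int abin = (int)((std::atan2(dy, dx) + PI) / (2 * PI) * na);
        if (abin >= na) abin = na - 1;
        ++hist[rbin * na + abin];
    }
    return hist;
}

int main() {
    // Points sampled on a small square silhouette, just to exercise the binning.
    std::vector<Pt> contour;
    for (int i = 0; i < 20; ++i) {
        contour.push_back(Pt((double)i, 0.0));
        contour.push_back(Pt((double)i, 19.0));
        contour.push_back(Pt(0.0, (double)i));
        contour.push_back(Pt(19.0, (double)i));
    }
    std::vector<int> h = shapeContext(contour[0], contour);
    for (size_t i = 0; i < h.size(); ++i)
        std::printf("%d%c", h[i], (i % 12 == 11) ? '\n' : ' ');   // one row per distance bin
    return 0;
}

Matching then amounts to comparing these histograms (the original shape context papers use a chi-squared cost), which is the part that looks neural-network friendly to me.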
'Active Vision for Door Localization and Door Opening using Playbot' is another specialization - for doorframes and handles. Its advance is active vision. The robot solves its position geometrically after it detects the door by using a pre-programmed door size - meaning it will only work with one size of door. The active vision part is that the robot takes pictures at multiple angles and multiple positions, solves its position using the camera angles and the detected door edges and corners, then calculates a move to a new position. 'Automatic Pyramidal Intensity-based Laser Scan Matcher for 3D Modeling of Large Scale Unstructured Environments' tackles the incredibly hard problem of mosaicing adjacent spherical laser images without feature, location or rotation information by matching their overlapping depth values. This is useful for other mosaicing problems, but I don't think its complicated methods will be required in most computer vision applications, which will have less interval between images and can rely on feature detection and a sense of place. '6D Vision Goes Fisheye for Intersection Assistance' shows that fisheye lenses provide a wider angle of view with only small penalties in processing time and accuracy, which their real-time stereo mobile object tracking application can tolerate. 'Challenges of Vision for Real-Time Sensor Based Control' explains how additional sensor input to an extended Kalman filter can be used to supplement poor video data caused by bad camera angles.
One thing I brought away from this conference is that although there is a large amount of existing work and many new efforts in the computer vision field, the presented applications are not trying to understand or duplicate biology. They're using mathematical methods to solve specific problems. Well perhaps the concepts can be implemented in neural networks. And the solutions are so specific! I guess it'll be a long time until there is general purpose vision. And not surprisingly so, because that will require general purpose concept representation. Too bad I didn't hear the AI papers too.
Since this was my first academic conference, I learned what to look for in papers: what is new, or what can be adapted to my purpose. Attending has motivated me to get an IEEE membership so I can access more research papers. Poster presentations seem pretty valueless to me: either they don't present enough information, or I am forced to stand while reading an entire paper.
Another thing is that there is a lot of existing technology out there that can be used to solve problems. As a counterpoint (and a kind of semi-corollary to the first point), a lot of that existing technology is highly focused, inaccurate and slow, so there is still a lot of research and development needed.
From a business point of view, I got no leads on paying work. Some people at the conference believe that contracting in this field can be viable, but I think I'll have to prove I'm capable by example before anyone will hire me. Since most researchers only solve special cases, another opportunity is to take a promising project and complete it so that it's useful in lots of situations.