Wednesday, November 12, 2008

Who Gets the Money?


I'll post a question just for fun, because everyone loves politics!
My question is: The oil industry makes massive profits - is it better for that money to be controlled by the shareholders, or would it be better if consumer prices were reduced so there were no profits and the money was left in control of the consumers? Who should get the money: investors/shareholders or people who did not participate in the development of the resource, but only consume the product?
It's not like the money does not get spent. Investors do spend the money on other ventures. Maybe they build the Burj Dubai. When consumers spend, most of it goes to the local community for food, shelter and services.
The question came from a twisted path. I've been a contributor to Greenpeace for 20 years and I get a quarterly magazine. This month's issue pleaded for controls on the Alberta oil sands projects, which are using and polluting water, stripping the land and killing birds. I don't agree with them entirely. Oil is important and we have to balance the good with the bad.
My view on water (tailings ponds) is that it always evaporates and recycles. You cannot run out of water, and it always cleans itself in the long run. So using water in any amount is not bad.
My view on the land (surface mining) is that nature will reclaim it. It was not clean in the first place - it was full of oil. What they return after extraction is dirt with less oil. I don't see a problem there. Over time nature will blow in sand and soil and it'll be the same as it was. So using land is OK (but only up to the point where the reduction of vegetation threatens the support of all life on the planet).
Burning gas to provide heat for the extraction process raises the greenhouse gas emissions and I think that is bad.
Anyway, the project is huge. Being a tech nerd, I wanted to see how huge it is, so I went to look at it on Google Maps (I entered 'Fort McMurray Alberta' - just look for the huge grey patch north of there). You can see the tailings ponds, you can see the huge dump trucks.
I thought I could get better pictures in Wikipedia. I read the Wikipedia articles on oil sands and Athabasca oil sands to learn the history, technology and economics (I love Wikipedia!). The articles say that they are building more mines out there right now, so there will be construction work out there for at least ten years. So I'm thinking, if I really go broke, there is work out there.
While describing the economics, they go into how it costs like $25 to produce a barrel of oil, which they can sell for $60 to $100. I'm thinking: that's an outrageous profit! Is it moral? If they just dropped the selling price to $30, reducing their profit, a huge amount of money would be left for the consumers to spend on other stuff, which would provide jobs for people. As it is, the investors use the profits to capitalize other ventures, which provide jobs for people.
So which is better? That the consumer or the producer gets to control the money… Oh, I won't give an answer. It's a really big question.
I might have posted this because I just read 'Trinity' by Leon Uris, about the English exploitation of Ireland, or maybe not. Who knows how minds work?

Tuesday, July 29, 2008

Centres are Circular Things

If the receptive field centre covers a large area, it will provide a really big output when it's focused on a round object (that fits within the centre).

Centre/surrounds make their biggest output when the centre is stimulated and the surround is inhibited, as with bright points of light in the dark, like stars at night. Colour versions ought to provide their biggest outputs on round things that contrast with their background, like berries or fruits. Or maybe larger round things like flowers or rocks. Or maybe almost round things like insects or fish. Or maybe distant almost round things like birds in the sky. Then, also, large receptive fields could trigger on partially round things, like rounded corners or fingertips.
So maybe there's not really a curve detector in a brain at all? Maybe curves are detected by partial outputs from the centres of large receptive fields. If it contrasted enough, a semicircle in the centre of a receptive field would cause a positive signal. If the brain knows, 'hey, that positive signal came from a big receptive field', then it could assume that a curved thing is in the field, with its size relative to the receptive field size.
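To make that concrete, here's a minimal sketch of a centre/surround response over a greyscale image stored as a flat array. The function name and parameters are mine, not from any neuroscience model; it's just the 'centre minus surround' idea in code.

    #include <cmath>
    #include <vector>

    // Minimal centre/surround response: mean brightness inside the centre disc
    // minus mean brightness in the surrounding annulus. A bright blob that fills
    // the centre gives the biggest value; a semicircle or corner that only
    // partly fills it gives a partial value.
    double CentreSurroundResponse(const std::vector<double>& image, int width, int height,
                                  int cx, int cy, int centreRadius, int surroundRadius)
    {
        double centreSum = 0, surroundSum = 0;
        int centreCount = 0, surroundCount = 0;
        for (int y = cy - surroundRadius; y <= cy + surroundRadius; ++y)
        {
            for (int x = cx - surroundRadius; x <= cx + surroundRadius; ++x)
            {
                if (x < 0 || y < 0 || x >= width || y >= height)
                    continue;
                double d = std::sqrt(double((x - cx) * (x - cx) + (y - cy) * (y - cy)));
                double value = image[y * width + x];
                if (d <= centreRadius)        { centreSum += value;   ++centreCount; }
                else if (d <= surroundRadius) { surroundSum += value; ++surroundCount; }
            }
        }
        if (centreCount == 0 || surroundCount == 0)
            return 0.0;
        return centreSum / centreCount - surroundSum / surroundCount;
    }

A large receptive field is just a large centreRadius: a disc that fills the centre maximizes the output, and a semicircle in the centre gives roughly half of it, which is the partial output I mean above.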

Centres Can Be Corners

Centre/surround receptive fields can detect corners!


Look at the diagrams: corners and angles produce stronger outputs than edges, but weaker than centre dots. If the next layer evaluates its input from receptive fields carefully (i.e. looks for values between edges and dots), it can inform its superiors that a corner is in its field. If that next layer is also getting input from edge detectors, one of these inputs can strengthen its corner detection.
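As a toy sketch of 'looks for values between edges and dots': band the response by magnitude. The thresholds here are arbitrary placeholders, not measured values.

    // Classify a centre/surround response by magnitude band. The ordering
    // follows the diagrams: an edge drives the centre weakest of the three,
    // a corner or angle is in between, and a centred dot drives it hardest.
    enum Feature { FEATURE_NONE, FEATURE_EDGE, FEATURE_CORNER, FEATURE_DOT };

    Feature ClassifyResponse(double response,
                             double edgeLevel, double cornerLevel, double dotLevel)
    {
        if (response >= dotLevel)    return FEATURE_DOT;
        if (response >= cornerLevel) return FEATURE_CORNER;
        if (response >= edgeLevel)   return FEATURE_EDGE;
        return FEATURE_NONE;
    }

An edge-detector input arriving at the same layer could then confirm or veto the FEATURE_CORNER guess, as described above.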

Of course these mid-range values can also indicate some other messy patterns, but that kind of stuff can be neutralized by expectation.

Friday, July 18, 2008

MonoSLAM Port

At the Robot Vision conference, I was impressed by the presentation 'Realtime visualization of monocular data for 3D reconstruction'. This identified interesting features in a scene and used SLAM technology to track and map them in real time. I thought this would be useful for my object and scene measurement system. Such a system requires identifying objects, comparing them to others and building a map of their relationships. SLAM is great for mapping spatial relationships.
I jumped into this and found that Andrew Davison has made his monocular camera SLAM software available as open source. So now I am adapting it to my machine: 1. building it on Windows with VS9, 2. creating camera input using DirectShow, 3. revising the UI for Windows using WTL.
The source includes Davison's MonoSLAMGlow application code, his SceneLib, which implements the SLAM algorithm, and VW34, the Oxford Active Vision Lab libraries. My plan was to use OpenCV and DirectX as the input and output for my port, but SceneLib is so centred around OpenGL that it's a lot less trouble to use that instead of DirectX.
The OpenCV demos would not work with my cameras (I have a cheapo USB webcam and a Canon DV Camcorder that connects by FireWire). I tried their suggested alternatives. Only videoInput worked for both my cameras, but I found it uses a lot of CPU, waaay more than DirectShow, so why not just use DirectShow, which works fine? So that's what I'm doing.
VW34 comes with project files for VS7.1, so it wasn't too much trouble getting those built in VS9. The compiler found a few problems that I have yet to report back to Oxford. VW34 has a lot of parts. I only used VNL, VW and VWGL. Most of the other parts are Linux UI adapters. VWGL is an OpenGL adapter that on Windows requires GLUT for Windows.
SceneLib was built on Linux, so I made VS project files for it. While porting SceneLib I found something interesting. It appears that GCC lets you declare an array whose size is set at run time by a variable (a C99-style variable-length array), like this: int Size = fn(X); char Array[Size]; Microsoft C++ won't allow that! I had to change to malloc(). There's something weird in GeomObjects/calibrationtable.h that causes an inexplicable list of errors, so I removed all references to it.
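For reference, here's the pattern and the portable alternatives as a sketch (fn is just a stand-in name, and the std::vector version is an option I didn't use in the port):

    #include <cstdlib>
    #include <vector>

    int fn(int x);  // stand-in for whatever computes the size

    void Example(int X)
    {
        // GCC accepts this C99-style variable-length array as an extension:
        //   int Size = fn(X);
        //   char Array[Size];   // Visual C++ rejects it: the size must be a constant
        int Size = fn(X);

        // What I did: heap allocation with malloc().
        char* Array = static_cast<char*>(std::malloc(Size));
        // ... use Array ...
        std::free(Array);

        // Alternative: std::vector manages the memory and the size.
        std::vector<char> Buffer(Size);
        // ... use &Buffer[0] where a char* is expected ...
    }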
I built my own camera grabber using DirectShow. DirectShow samples are available in the Windows SDK. I made this grabber run in callback mode, store the bitmap and post a message to the application's main thread. This uses 13% of my Core 2 Duo 6400 (2.13 GHz).
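The callback half looks roughly like the sketch below. Setting up the filter graph and connecting the Sample Grabber filter are omitted, and WM_NEW_FRAME is my own message name, not part of DirectShow.

    #include <windows.h>
    #include <dshow.h>
    #include <qedit.h>   // ISampleGrabberCB
    #include <vector>

    const UINT WM_NEW_FRAME = WM_APP + 1;

    class FrameGrabberCB : public ISampleGrabberCB
    {
    public:
        explicit FrameGrabberCB(HWND notifyWindow) : m_notifyWindow(notifyWindow)
        {
            InitializeCriticalSection(&m_lock);
        }
        ~FrameGrabberCB() { DeleteCriticalSection(&m_lock); }

        // DirectShow calls this on its own thread with a pointer to the frame bits.
        STDMETHODIMP BufferCB(double /*sampleTime*/, BYTE* buffer, long length)
        {
            EnterCriticalSection(&m_lock);
            m_frame.assign(buffer, buffer + length);          // store the bitmap
            LeaveCriticalSection(&m_lock);
            PostMessage(m_notifyWindow, WM_NEW_FRAME, 0, 0);  // wake the UI thread
            return S_OK;
        }
        STDMETHODIMP SampleCB(double, IMediaSample*) { return E_NOTIMPL; }

        // Minimal IUnknown: the object lives for the life of the graph.
        STDMETHODIMP QueryInterface(REFIID riid, void** ppv)
        {
            if (riid == IID_IUnknown || riid == IID_ISampleGrabberCB)
            {
                *ppv = static_cast<ISampleGrabberCB*>(this);
                return S_OK;
            }
            *ppv = 0;
            return E_NOINTERFACE;
        }
        STDMETHODIMP_(ULONG) AddRef()  { return 2; }
        STDMETHODIMP_(ULONG) Release() { return 1; }

    private:
        HWND m_notifyWindow;
        CRITICAL_SECTION m_lock;
        std::vector<BYTE> m_frame;
    };

The main thread would then handle WM_NEW_FRAME and copy the stored bitmap out under the same lock.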
So now I've built a WTL application, linked in the libraries and I'm working on getting the OpenGL visualizations working.

Tuesday, July 1, 2008

No Math Required

I don't think the vision problem is mathematical (except to prove that it's accurate or correct). Mathematics can be good for optimizations and specializations and extended functionality. I think simple methods can be used for early vision. The basis should be in the findings of neuroscience. I think the solution at higher stages is extracting the right information from the early stages and representing it in a way that simple pattern matching (using highly interconnected neural networks) will do the job. I think the right information and representation can also come from neuroscience. So that should be my focus. What does neuroscience know?

Tuesday, June 24, 2008

Cloud Robot

I was reading about cloud computing.
So here's an idea:
A robot relying on vision can upload its real-time sensor input to the cloud.
The cloud can extract whatever it likes and present it to humans.
But the cloud will also do all the image processing and download the result (?) back to the robot.
The result could be motor instructions or descriptors for the robot to work with.
So that the robot itself doesn't need a high-powered processor.
All that's needed is a really fast wireless link:
640*480*4*60=73,728,000 bytes/sec!
(640x480 pixels, each 4 bytes (RGBI) @ 60 frames/sec)
You could strip that down to 3 bytes/pixel @ 30 frames/sec: 640*480*3*30=27,648,000 bytes/sec
OK, 320x240 image = 320*240*3*30=6,912,000 bytes/sec
OK, 256-shade grey scale = 320*240*30=2,304,000 bytes/sec
That's possible!
And that leaves 500 KBs to send stuff back.
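The arithmetic boils down to a trivial helper, just to keep the units straight:

    // Uncompressed video bandwidth in bytes per second.
    // e.g. 320 x 240, 1 byte/pixel (greyscale), 30 fps = 2,304,000 bytes/sec.
    unsigned long VideoBytesPerSecond(unsigned width, unsigned height,
                                      unsigned bytesPerPixel, unsigned framesPerSecond)
    {
        return static_cast<unsigned long>(width) * height * bytesPerPixel * framesPerSecond;
    }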
Probably been done. But maybe not so real-time?

Sunday, June 22, 2008

2008 Canadian Robot Vision Conference

This is my report on the Canadian Intelligent Systems Collaborative (AI/GI/CRV/IS) 2008 Conference. This was five conferences in one event. My interest was in the CRV (Computer and Robot Vision) conference held by the Canadian Image Processing and Pattern Recognition Society (CIPPRS).
The buildings in which the conference was held were under reconstruction and the noise was distracting. But with that inconvenience tolerated, the conference was very enlightening for a beginner like me.
The conference was essentially three days, with a keynote each morning and papers or talks throughout each day.
The only keynote I heard was the first, which was by Peter Carbone from Nortel. This was interesting to me (with my history in telecommunications), but off-target for the AI people who were the majority of conference attendees. Mr. Carbone made several predictions that I think will be significant in telecom: widespread broadband wireless penetration by 2010, telephony using SOA with mashup potential, and 100 MBs real-time encryption capability.
Following the keynote the CAIAC Precarn Intelligent Systems Challenge was announced. This offers a $10K prize to the student submitting the best method of detecting ships meeting at sea using satellite and radar tracking data. I think the poor data makes this a challenging problem.

The highlight of the conference for me was a talk by Dr. Steven Zucker from Yale. He was the only presenter who seemed interested in doing AI and computer vision to emulate biology, as I am. I think artificial systems should understand what they're working on; be a part of their world, as biological systems are. A biological goal Zucker identified is guiding animal movement, such as monkeys jumping to tree branches. This objective is the same as Arathorn's example of goats jumping to rocky ledges. Zucker's talk was mainly on stereo vision. He confirmed my assessment that Canny edge detectors suck. He implemented a nice curve detector based on tangents. He showed how he used spatial and orientation disparity to get better matching of the image pair. A nice point was about self-referential calibration: a system that can move can identify its own parts (e.g. in a mirror) as the ones that move when it moves them.
I missed the talk by Dr. James Crowley, to my regret. I gathered that his points included that intelligence requires embodiment and autonomy. This confirms my subscription to the philosophy of Spinoza, who states that the mind is the entire body. Any organism's mental reality would not be what it is without all of the sensory input and motor feedback provided by the body.
The talk by Dr. Greg Dudek about his AQUA robot was interesting because of the focus and completeness of the project. It's another very specialized machine, although you can program its actions by a visual language. Apparently they discarded a visual system that recognized human hand gestures.

I attended all of the CRV paper presentations. These seemed to be arranged in ascending order of complexity and accomplishment. I was surprised that I could understand much of the work. Some of the papers were not amazing to me at all. Some were incremental improvements on previous work. Most were applications of existing work. This may be a survey of the state of the art, or it may just be a sampling of people who are trying to get attention (who didn't go to other conferences). I'm not going to summarize all of the papers - just give criticisms of the ones I found useful.
'An Efficient Region-Based Background Subtraction Technique' and 'Ray-based Color Image Segmentation' presented image segmentation optimizations based on iterative deduction. This is a good and intuitive idea and I think I can implement it using layers of neural networks. The ray-based segmentation idea was clever, but had problems finding all segments and was slow. I still don't know if colour-based segmentation is natural.
The methods used in 'A Cue to Shading - Elongations Near Intensity Maxima' to differentiate shadows from textures got me confused, but Gipsman's point that knowledge of shading detection is still primitive surprised me - I think analysis of shading would be fundamental to determining shape and orientation of 3D objects. I agree with her that feedback from higher layers will be essential. But I think the feedback will loop: the shape of the shadow will help in recognizing the object and the shape of the object will help in recognizing the shadow.
'Fast Normal Map Acquisition Using an LCD Screen Emitting Gradient Patterns' presents an innovative method for lighting objects to get 3D information. An interesting point is their use of the polarized LCD light and a filter to remove specular reflection. I found later that the human eye can differentiate linear from non-linearly polarized light (see Haidinger's Brush). Perhaps the brain can use this information in determining where the light is really coming from?
'Realtime visualization of monocular data for 3D reconstruction' was a treat for me because it relates so well to my planned measurement-with-a-camera project. To me, this paper is like an instruction book on how to model 3D space from a single camera. I must look into its Simultaneous Localization and Mapping (SLAM) methods and other tricks. Monocular is cheap, stereo is more accurate? Again, the system doesn't have a clue what it's looking at, but it may be a good start for a more complex system. I must analyze it in more depth.
'Object Class Recognition using Quadrangles' is a general-purpose implementation of edge-based object recognition which also considers uniform colour regions. On top of this the authors implemented a structural descriptor (the paper describes quadrangles only, but the speaker described the use of ellipse descriptors in their newer work) and a template-based spatial relationship matching system much simpler than that used by Sinisa Todorovic's self-learning segment-based system (but less capable). 'Geometrical Primitives for the Classification of Images Containing Structural Cartographic Objects' is another system based on edge/region/structural descriptors, but focused on the single problem of finding roads, bridges and such in satellite imagery. The software seems to be more capable, handling higher level primitives such as blobs, polygons, arcs and junctions. It uses AdaBoost binary classification. Results were good except for detecting bridges. I suggest looking for the bridge shadows.
Most of the motion tracking papers used very simple recognition techniques or did not describe them. '3D Human Motion Tracking Using Dynamic Probabilistic Latent Semantic Analysis' presents a highly mathematical approach that seems to be another form of template matching. It works well but it will take quite an effort for me to understand it. 'Visual-Model Based Spatial Tracking in the Presence of Occlusions' presents a pre-processing trick to mask occlusions from a template/visual-model based 3D tracking system. While the system is highly performant, using the GPU, it is highly specialized to a single object. 'Automatically Detecting and Tracking People Walking Through Transparent Door with Vision' tracks Harris corners through time. It can be taught to subtract expected movements from new ones, by simple geometric trajectory comparison. This can serve many applications, but the use of just corners means it can't tell you what is moving through the scene. But could specializations like this be used as keys to brain behaviour? e.g. does the brain just use the moving corners of a door to perceive it? 'Invariant Classification of Gait Types' classifies body movements by comparison to a database of shape contexts derived from template silhouettes. This is an efficient and accurate method used in handwriting recognition, and I think I'll look into it more, because the bin concept applied to pattern matching lends itself to implementation using neural networks.
'Active Vision for Door Localization and Door Opening using Playbot' is another specialization - for doorframes and handles. Its advance is active vision. The robot solves its position geometrically after it detects the door by using a pre-programmed door size - meaning it will only work with one size of door. The active vision part is that the robot takes pictures at multiple angles and multiple positions and solves its position using the camera angles and the detected door edges and corners, then calculates a move to a new position. 'Automatic Pyramidal Intensity-based Laser Scan Matcher for 3D Modeling of Large Scale Unstructured Environments' tackles the incredibly hard problem of mosaicing adjacent spherical laser images without feature, location or rotation information by matching their overlapping depth values. This is useful for other mosaicing problems, but I don't think its complicated methods will be required in most computer vision applications, which have shorter intervals between images and can rely on feature detection and a sense of place. '6D Vision Goes Fisheye for Intersection Assistance' shows that fisheye lenses provide a wider angle of view with only small costs to the low processing-time and relatively loose accuracy requirements of a real-time stereo mobile object tracking application. 'Challenges of Vision for Real-Time Sensor Based Control' explains how additional sensor input to an extended Kalman filter can be used to supplement poor video data caused by bad camera angles.

One thing I brought away from this conference is that although there is a large amount of existing work and many new efforts in the computer vision field, the presented applications are not trying to understand or duplicate biology. They're using mathematical methods to solve specific problems. Well perhaps the concepts can be implemented in neural networks. And the solutions are so specific! I guess it'll be a long time until there is general purpose vision. And not surprisingly so, because that will require general purpose concept representation. Too bad I didn't hear the AI papers too.
This was my first academic conference, and I learned that what to look for in papers is what's new, or what can be adapted to my purpose. Attending has motivated me to get an IEEE membership so I can access more research papers. Poster presentations seem pretty valueless to me. Either they don't present enough information or I am forced to stand while reading an entire paper.
Another thing is that there is a lot of existing technology out there that can be used to solve problems. A counterpoint to this, and a kind of corollary to the first point, is that a lot of the existing technology is highly focused, inaccurate and slow, so there is still a lot of research and development needed.
From a business point of view, I got no leads on paying work. Some people at the conference believe that contracting in this field can be viable. But I think I'll have to prove I'm capable by example before anyone will hire me. Since most researchers only solve special cases, another opportunity is to complete a project to make it useful in lots of situations.

Sunday, April 13, 2008

Web Site Update

Nimajin website
I've been busy. Not making any money, but busy. When the jobs aren't pouring in it's time to concentrate on marketing. So I'm prospecting and networking. And I've updated the Nimajin website. It looks much more professional now. I've added a resume and portfolio so people can learn more about me. I'm not real happy with the site yet - it concentrates on my past and not my future. So it'll change again.
The future is not real clear to me now. I like to be associated with science and communications. My intention to concentrate on media processing and content recognition is still attractive to me, but I'm not finding much interest from others. I think it's mostly because of my lack of communication. I'll work on that. Some friends say I've got to jump on the web services bandwagon to make a living. Improving people's advertising or bookkeeping is very useful, but not as incredibly fascinating to me as making a machine able to draw a floor plan of my house from photographs or making it able to fly me through town from surveillance camera input. I know what it'll take to do these and I'll keep working on them, but it'd be great to have some sponsorship so I could afford some more time to work on them.
I think this happens to a lot of people. There's not enough commercial value to our ambitions of making machines more intelligent, so over time we have to abandon our efforts and go where the money is. I'm still holding on!
That isn't really where I wanted to go with this post, but heck I'll post anything. My goal is to perkily point out how great the new web site is and how ready I am to go an extra mile, learn more new technology and build anything you can think of. What I really love is making people happy.

Monday, January 28, 2008

Tune Searching

You know when you remember a bit of melody, but can't place the song? I've got an idea to create a website that accepts input of musical notation, finds out what the song is, and returns information, MP3s and sheet music. So all we need is a simple notation input, a database of notation for all songs in history, rights to use the notation, a fast and smart matching algorithm, a web front end, and we've got a music search site!

Input:
  • Maybe make the PC's qwerty keyboard act like a musical keyboard
  • Capture from a MIDI device
  • Put a musical keyboard on the screen & let them click on it
  • Accept input from microphone and decode the essentials to notation
  • Show their input on screen in some form of notation
  • Allow playback so they can interactively adjust it until it sounds right to them
  • Include some way to adjust the tempo (slider)
  • Include some way to adjust the spacing (timing of each note), like stretching it or a slider
  • Include something to adjust the timbre / instrument they hear on playback

Database:
  • Once the input is in, convert it to some kind of text or binary
  • Encode the database in the same format and search
  • We'll get snippets as input, and they'll be timed wrong, out of key, and some notes will be wrong
  • We'll want closest matches, not exact ones (see the matching sketch below)
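A minimal sketch of how the closest-match part might work, assuming tunes are stored as MIDI note numbers: compare interval sequences (which makes the match key-independent) using plain edit distance, so wrong or missing notes just raise the score instead of breaking the match. The names here are mine, not from any existing system.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Convert a melody (MIDI note numbers) into the intervals between successive
    // notes, in semitones. Transposing the melody leaves the intervals unchanged,
    // so matching on intervals ignores what key the user hummed in.
    std::vector<int> ToIntervals(const std::vector<int>& midiNotes)
    {
        std::vector<int> intervals;
        for (size_t i = 1; i < midiNotes.size(); ++i)
            intervals.push_back(midiNotes[i] - midiNotes[i - 1]);
        return intervals;
    }

    // Classic edit distance between two interval sequences: the number of
    // insertions, deletions and substitutions needed to turn one into the other.
    // Lower means a closer match.
    int EditDistance(const std::vector<int>& a, const std::vector<int>& b)
    {
        std::vector<std::vector<int> > d(a.size() + 1, std::vector<int>(b.size() + 1));
        for (size_t i = 0; i <= a.size(); ++i) d[i][0] = int(i);
        for (size_t j = 0; j <= b.size(); ++j) d[0][j] = int(j);
        for (size_t i = 1; i <= a.size(); ++i)
            for (size_t j = 1; j <= b.size(); ++j)
            {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                d[i][j] = std::min(std::min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.size()][b.size()];
    }

The database search would then score the snippet against every stored tune (or better, against every window of each tune) and return the lowest scores. Rhythm would need its own comparison on top of this.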

A brief search for 'music search' turns up nothing like this. All music search is by artist, song name etc., even for sheet music. It seems somebody tried something like what I want years ago at ThemeFinder - I don't understand what input they're asking for. I found out about ABC notation, which stores notation as text, and has software that can play MIDI and generate musical notation. More looking. 'tunesearch' gives Richard Robinson's Tunebook Search and JC's ABC Tune Match at trillian.mit.edu. They all seem related to ThemeFinder.

Then I found Musipedia. Musipedia is almost what I was planning. It lets you input by keyboard, by drawing notes, by tapping in the rhythm, or by humming, singing or whistling. All the stuff I thought of except the editing. And ugly and kludgy. I tried it. I suck at keyboards and didn't take the time to put in Pachelbel's Canon well enough that I recognized it on playback - it was almost there. Of course it didn't match that to anything I knew. When I whistled it, the closest thing it found that I knew was The Doors' 'People Are Strange'! It didn't find Elvis' 'Hound Dog' by me tapping either. I think its database is limited. The database is like a wiki though - anybody can add more tunes. This is a great idea. Maybe its search is the problem. I'm disappointed. Musipedia is based on a prize-winning retrieval mechanism by Rainer Typke. I bought his book.

Musipedia included a Google ad from Midomi. At Midomi you sing into your microphone and it finds a match for what you sing. In contrast to Musipedia, this site is slick, and it works. It helped me get my microphone level correct. When I sang 'I'm Leaving On A Jet Plane' (after doing some talking first) it knew it. It uses more than one rendition of a song to make a match. It encourages people to sing songs that they know and it uses these in its database. It seems to be getting lots of karaoke people.

I don't know if these guys are making money, because they're both covered with Google ads. Musipedia could use some polish and Midomi needs instrumental input. So should I pursue my idea?

Thursday, January 17, 2008

The Future is Coming!

Software UIs are coming to look and act more like Star Trek control consoles.

The iPhone's UI looks like Star Trek graphics. Check out the black background and brightly coloured icons, all behind a glossy glass surface.
Office 2007 is starting to look like that and all of the WPF examples too.


But the really big deal is that Star Trek consoles are touch screens. They are operated by tapping on them. The iPhone operates by tapping, pinching, dragging, flicking, etc. The Microsoft Surface is so like a Star Trek console. It's a big, bright touch console. We are there!


I'm real happy about this. I've always loved the way the consoles looked in Star Trek. I want to be flicking chunks of code around in one of those one day. It makes me wonder whether all of the computer developers love Star Trek consoles too, or if the Star Trek set designers just correctly imagined the future.

Tuesday, January 15, 2008

Vista Spooler.xml

I was working away on my less-than-a-year-old Vista PC with a 250 GB hard drive when Windows told me the disk was getting full! Windows is all like 'disk cleanup' and 'remove some programs'. Serious! Was it the VS2008 I just installed? No.
I used Silurian DiskSpaceChart to find out where all my disk was going. That's a typical disk usage pie chart thing that puts itself in the right-click menu. You can drill down through the large folders to find the large files. It's OK. It was the first thing I found for Vista.
So what I found was something writing continuously to C:/Windows/system32/spool/spooler.xml, at like 300 MB per minute! I tried to find out who was writing the file. Resource Monitor helpfully identified 'system'. Thanks. I didn't have SysInternals Process Monitor installed, and I didn't want to try while the machine was so sick, so I couldn't get any details on who was doing all that writing.
I rebooted into safe mode and deleted the file.
When I booted back up, the file was 33 KB and stayed that way. I think the file is a printer log and the problem is either my HP printer drivers for the 3390 or MS XPS. I recall running Wireshark and seeing every machine with those 3390 drivers polling the printer status over the LAN every millisecond.