Via Geeking with Greg, where he points to a paper by four Googlers (it's a PDF link: “The Happy Searcher: Challenges in Web Information Retrieval”) on the challenges facing image, video and music search.
Read the challenges about image search, sound search and video search. Here are my thoughts on improving image search.
Let us say content recognition systems are not ready for real time. But surely image matching is ready for real time? So why are images still indexed only on the basis of their file names and the surrounding text?
Take this example. The image below is result number 11 in Google’s image search for “cars”. It's a car all right, but not what I was searching for.
(image credit : www.turtleduck.org)
And this one? It's No. 7 in the results. Beats me.
(image credit : www.smh.com.au)
Page ranking works well for text search, but it is irrelevant for images. What you need is an image ranking algorithm that ranks images on the basis of the following:
1. Occurrence of an image: How many times does the same image occur on different websites? Here we need to be able to tell whether two images are one and the same, i.e. the same photograph with no alterations. Though the image might be called DSC001.jpeg on one website and cars.jpeg on another, the search engine can now rank it by how popular the image is. We have not yet decided whether the image is a car or whether it is called “DSC001.jpeg”. We are just counting the occurrences of this image on various websites.
2. Occurrence of resized versions of the full image: The image might be 800×600 px on one website and 1024×768 px on another, and I am assuming image recognition software is mature enough to identify that the two images are the same. We are not talking about content identification, just image identification, to find out whether two images are one and the same. This again counts towards the ranking of the image.
3. Occurrence of cropped versions of that image: Sometimes only a section of an image is posted, so one copy might show just the car and not the surroundings, while another shows the entire image. The image recognition algorithm should be able to recognize that these come from the same image. This is error prone, so it should carry less weight in the image ranking algorithm.
4. Now use what is currently being used for image search: the surrounding text, the name of the image and the text of the link leading to the image, as evidence of what the image shows. Between two images that have the same number of occurrences, the one that is referred to as “cars” more often by the surrounding text takes precedence in the results.
5. Based on the above four, derive the ranking of images for a particular keyword.
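To make the five points above a little more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions, not how any search engine actually does it. The `average_hash` function is one simple, well-known way to check whether two differently sized copies are the same picture (point 2); I have swapped it in because the post does not name a technique. The weights in `WEIGHTS`, the `sightings` records and their field names are all hypothetical, and cropped-copy detection (point 3) is assumed to have been done elsewhere and simply arrives as a label.

```python
from collections import defaultdict

# Hypothetical weights: exact copies count most, cropped matches least
# because they are the most error prone (point 3).
WEIGHTS = {"exact": 1.0, "resized": 0.8, "cropped": 0.3}


def average_hash(pixels, size=8):
    """Fingerprint a grayscale image given as a 2D list of numbers.

    The image is reduced to a size x size grid of block averages, and each
    cell becomes one bit: 1 if it is brighter than the grid's mean.
    Assumes the image is at least size pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a, b):
    """Number of differing bits; a small distance suggests the same picture."""
    return bin(a ^ b).count("1")


def rank_images(sightings, keyword):
    """Rank image ids for a keyword.

    Each sighting is a dict with the (hypothetical) keys:
      image_id   - id shared by all copies judged to be the same image
      match_type - "exact", "resized" or "cropped"
      text       - file name, link text and surrounding text on that page
    Weighted occurrence count comes first (points 1 to 3); the number of
    pages whose text mentions the keyword breaks ties (point 4).
    """
    occurrence = defaultdict(float)
    mentions = defaultdict(int)
    for s in sightings:
        occurrence[s["image_id"]] += WEIGHTS[s["match_type"]]
        if keyword.lower() in s["text"].lower():
            mentions[s["image_id"]] += 1
    return sorted(occurrence,
                  key=lambda image_id: (occurrence[image_id], mentions[image_id]),
                  reverse=True)
```

In practice you would compare the hashes of two images with `hamming` and treat anything under a small threshold as the same picture; the threshold, like the weights, would need tuning against real data.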
Update: Image, video and multimedia search are the new frontiers for growth. As it is, the quality of these searches is not up to the mark, and this and this are going to put further pressure on the evolution of image search.