{"id":87405,"date":"2013-10-05T13:57:46","date_gmt":"2013-10-05T11:57:46","guid":{"rendered":"http:\/\/mybroadband.co.za\/news\/?p=87405"},"modified":"2013-10-05T13:58:44","modified_gmt":"2013-10-05T11:58:44","slug":"teaching-computers-to-see","status":"publish","type":"post","link":"https:\/\/mybroadband.co.za\/news\/software\/87405-teaching-computers-to-see.html","title":{"rendered":"Teaching computers to see"},"content":{"rendered":"<p>Object-recognition systems \u2014 software that tries to identify objects in digital images \u2014 typically rely on machine learning. They comb through databases of previously labeled images and look for combinations of visual features that seem to correlate with particular objects. Then, when presented with a new image, they try to determine whether it contains one of the previously identified combinations of features.<\/p>\n<p>Even the best object-recognition systems, however, succeed only around 30 or 40 percent of the time \u2014 and their failures can be totally mystifying. Researchers are divided in their explanations: Are the learning algorithms themselves to blame? Or are they being applied to the wrong types of features? Or \u2014 the \u201cbig-data\u201d explanation \u2014 do the systems just need more training data?<\/p>\n<p>To attempt to answer these and related questions, researchers at MIT\u2019s Computer Science and Artificial Intelligence Laboratory have created a system that, in effect, allows humans to see the world the way an object-recognition system does. 
The system takes an ordinary image, translates it into the mathematical representation used by an object-recognition system and then, using inventive new algorithms, translates it back into a conventional image.<\/p>\n<p>In a paper to be presented at the upcoming International Conference on Computer Vision, the researchers report that, when presented with the retranslation of a translation, human volunteers make classification errors that are very similar to those made by computers. That suggests that the learning algorithms are just fine, and throwing more data at the problem won\u2019t help; it\u2019s the choice of features that\u2019s the culprit. The researchers are hopeful that, in addition to identifying the problem, their system will also help solve it, by letting their colleagues reason more intuitively about the consequences of particular feature decisions.<\/p>\n<p><strong>Whole HOG<\/strong><\/p>\n<p>Today, the feature set most widely used in object-detection research is called the histogram of oriented gradients, or HOG (hence the name of the MIT researchers\u2019 system: HOGgles). HOG first breaks an image into square chunks, usually eight pixels by eight pixels. Then, for each square, it identifies a \u201cgradient,\u201d or change in color or shade from one region to another. It characterizes the gradient according to 32 distinct variables, such as its orientation \u2014 vertical, horizontal or diagonal, for example \u2014 and the sharpness of the transition \u2014 whether it changes color suddenly or gradually.<\/p>\n<p>Thirty-two variables for each square translates to thousands of variables for a single image, which define a space with thousands of dimensions. 
Any conceivable image can be characterized as a single point in that space, and most object-recognition systems try to identify patterns in the collections of points that correspond with particular objects.<\/p>\n<p>\u201cThis feature space, HOG, is very complex,\u201d says Carl Vondrick, an MIT graduate student in electrical engineering and computer science and first author on the new paper. \u201cA bunch of researchers sat down and tried to engineer, \u2018What\u2019s the best feature space we can have?\u2019 It\u2019s very high-dimensional. It\u2019s almost impossible for a human to comprehend intuitively what\u2019s going on. So what we\u2019ve done is built a way to visualize this space.\u201d<\/p>\n<p>Vondrick; his advisor, Antonio Torralba, an associate professor of electrical engineering and computer science; and two other researchers in Torralba\u2019s group, graduate student Aditya Khosla and postdoc Tomasz Malisiewicz, experimented with several different algorithms for converting points in HOG space back into ordinary images. One of those algorithms, which didn\u2019t turn out to be the most reliable, nonetheless offers a fairly intuitive understanding of the process.<\/p>\n<p>The algorithm first produces a HOG for an image and then scours a database for images that match it \u2014 on a very weak understanding of the word \u201cmatch.\u201d<\/p>\n<p>\u201cBecause it\u2019s a weak detector, you won\u2019t find very good matches,\u201d Vondrick explains. \u201cBut if you average all the top ones together, you actually get a fairly good reconstruction. Even though each detection is wrong, each one still captures the statistics of the original image patch.\u201d<\/p>\n<p><strong>Dictionary definition<\/strong><\/p>\n<p>The reconstruction algorithm that ended up proving the most reliable is more complex. 
It uses a so-called \u201cdictionary,\u201d a technique that\u2019s\u00a0<a href=\"http:\/\/web.mit.edu\/newsoffice\/2013\/multiview-3d-photography-made-simple-0619.html\" target=\"_self\">increasingly popular\u00a0<\/a>in computer-vision research. The dictionary consists of a large group of HOGs with fairly regular properties: One, for instance, might have a top half that\u2019s all diagonal gradients running bottom left to upper right, while the bottom half is all horizontal gradients; another might have gradients that rotate slowly as you move from left to right across each row of squares. But any given HOG can be represented as a weighted combination of these dictionary \u201catoms.\u201d<\/p>\n<p>The researchers\u2019 algorithm assembled the dictionary by analyzing thousands of images downloaded from the Internet and settled on the dictionary that allowed it to reconstruct the HOG for each of them with, on average, the fewest atoms. The trick is that, for each atom in the dictionary, the algorithm also learned the ordinary image that corresponds to it. So for an arbitrary HOG, it can apply the same weights to the ordinary images that it does to the dictionary atoms, producing a composite image.<\/p>\n<p>Those composites are quite striking. What appears to be a blurry image of a woman sitting at a vanity mirror, for instance, turns out to be a reconstruction of the HOG produced by a photo of an airplane sailing over a forest canopy. And, indeed, a standard object-recognition system will, erroneously, identify a person in the image of the plane. It\u2019s a mistake that\u2019s baffling without the elucidation offered by the HOGgles.<\/p>\n<p>To quantify the intuition that, given the representations of images in HOG space, object detectors\u2019 false positives are not as bizarre as they initially seem, the MIT researchers presented collections of their HOG reconstructions to volunteers recruited through Amazon\u2019s Mechanical Turk crowdsourcing service. 
The volunteers were slightly better than machine-learning algorithms at identifying the objects depicted in the reconstructions, but only slightly \u2014 nowhere near the disparity of 60 or 70 percent when object detectors and humans are asked to identify objects in the raw images. And the dropoff in accuracy as the volunteers moved from the easiest cases to the more difficult ones mirrored that of the object detectors.<\/p>\n<p><strong>Building intuitions<\/strong><\/p>\n<p>\u201cOne of the beauties of our field is that, unlike something like statistics or some kind of financial data, you can see what you\u2019re working on,\u201d says Alexei Efros, an associate professor of computer science and electrical engineering at the University of California at Berkeley who works on computer vision. \u201cI think having large-scale data in computer vision is a very important phenomenon, but a negative side product of this has been that the new students, the new researchers \u2026 don\u2019t look at the pixels anymore. They\u2019re so overwhelmed with the data, there are so many images, that they\u2019re just treating it as if it were stock market data, or biosequence data, or any kind of other data. They\u2019re just looking at graphs and curves and spreadsheets and tables.\u201d<\/p>\n<p>The MIT researchers\u2019 work could be a corrective to that trend, Efros says. \u201cI think that is what appeals to me,\u201d he says. \u201cIt\u2019s breaking the tide of students not looking at images.\u201d<\/p>\n<p>Efros adds that, in a more direct way, HOGgles could be a useful research tool. \u201cIf you\u2019re looking to do some task, and you\u2019re using this [HOG] descriptor, and it doesn\u2019t work, then before, you basically just stared at your code and you stared at the numbers and you thought, \u2018I have no idea,\u2019\u201d he says. 
\u201cNow you can really just invert the data and at least look to see whether the computer even had any chance.\u201d<\/p>\n<p>\u201cBut it\u2019s not just a tool for getting better descriptors,\u201d he adds. \u201cIt\u2019s a tool for building up intuitions.\u201d<\/p>\n<p><em>Reprinted with permission of\u00a0<a title=\"MIT\" href=\"http:\/\/web.mit.edu\/newsoffice\/\" target=\"_blank\">MIT News<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>By translating images into the language spoken by object-recognition systems, then translating them back, researchers hope to explain the systems\u2019 failures, writes Larry Hardesty from 
MIT<\/p>\n","protected":false},"author":340941,"featured_media":87409,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_sma_x_autopost_status":"idle","_sma_x_autopost_error":"","_sma_x_post_id":"","_sma_x_attempts":0,"footnotes":""},"categories":[16],"tags":[21545,35,21543,7699],"class_list":["post-87405","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-software","tag-carl-vondrick","tag-headline","tag-larry-hardesty","tag-mit"],"_links":{"self":[{"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/posts\/87405"}],"collection":[{"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/users\/340941"}],"replies":[{"embeddable":true,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/comments?post=87405"}],"version-history":[{"count":1,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/posts\/87405\/revisions"}],"predecessor-version":[{"id":87407,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/posts\/87405\/revisions\/87407"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/media\/87409"}],"wp:attachment":[{"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/media?parent=87405"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/categories?post=87405"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/tags?post=87405"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}