{"id":242926,"date":"2017-12-23T13:01:06","date_gmt":"2017-12-23T11:01:06","guid":{"rendered":"https:\/\/mybroadband.co.za\/news\/?p=242926"},"modified":"2017-12-23T13:02:56","modified_gmt":"2017-12-23T11:02:56","slug":"computer-systems-predict-objects-responses-to-physical-forces","status":"publish","type":"post","link":"https:\/\/mybroadband.co.za\/news\/science\/242926-computer-systems-predict-objects-responses-to-physical-forces.html","title":{"rendered":"Computer systems predict objects\u2019 responses to physical forces"},"content":{"rendered":"<p>Josh Tenenbaum, a professor of brain and cognitive sciences at MIT, directs research on the development of intelligence at the Center for Brains, Minds, and Machines, a multiuniversity, multidisciplinary project based at MIT that seeks to explain and replicate human intelligence.<\/p>\n<p>Presenting their work at this year\u2019s Conference on Neural Information Processing Systems, Tenenbaum and one of his students, Jiajun Wu, are co-authors on four papers that examine the fundamental cognitive abilities that an intelligent agent requires to navigate the world: discerning distinct objects and inferring how they respond to physical forces.<\/p>\n<p>By building computer systems that begin to approximate these capacities, the researchers believe they can help answer questions about what information-processing resources human beings use at what stages of development. Along the way, the researchers might also generate some insights useful for robotic vision systems.<\/p>\n<p>\u201cThe common theme here is really learning to perceive physics,\u201d Tenenbaum says. \u201cThat starts with seeing the full 3-D shapes of objects, and multiple objects in a scene, along with their physical properties, like mass and friction, then reasoning about how these objects will move over time. Jiajun\u2019s four papers address this whole space. Taken together, we\u2019re starting to be able to build machines that capture more and more of people\u2019s basic understanding of the physical world.\u201d<\/p>\n<p>Three of the papers deal with inferring information about the physical structure of objects, from both visual and aural data. The fourth deals with predicting how objects will behave on the basis of that data.<\/p>\n<h3 class=\"my-4\">Two-way street<\/h3>\n<p>Something else that unites all four papers is their unusual approach to machine learning, a technique in which computers learn to perform computational tasks by analyzing huge sets of training data. In a typical machine-learning system, the training data are labeled: Human analysts will have, say, identified the objects in a visual scene or transcribed the words of a spoken sentence. The system attempts to learn what features of the data correlate with what labels, and it\u2019s judged on how well it labels previously unseen data.<\/p>\n<p>In Wu and Tenenbaum\u2019s new papers, the system is trained to infer a physical model of the world \u2014 the 3-D shapes of objects that are mostly hidden from view, for instance. But then it works backward, using the model to resynthesize the input data, and its performance is judged on how well the reconstructed data matches the original data.<\/p>\n<p>For instance, using visual images to build a 3-D model of an object in a scene requires stripping away any occluding objects; filtering out confounding visual textures, reflections, and shadows; and inferring the shape of unseen surfaces. Once Wu and Tenenbaum\u2019s system has built such a model, however, it rotates it in space and adds visual textures back in until it can approximate the input data.<\/p>\n<p>Indeed, two of the researchers\u2019 four papers address the complex problem of inferring 3-D models from visual data. On those papers, they\u2019re joined by four other MIT researchers, including William Freeman, the Perkins Professor of Electrical Engineering and Computer Science, and by colleagues at DeepMind, ShanghaiTech University, and Shanghai Jiao Tong University.<\/p>\n<h3 class=\"my-4\">Divide and conquer<\/h3>\n<p>The researchers\u2019 system is based on the influential theories of the MIT neuroscientist\u00a0<a href=\"https:\/\/mitpress.mit.edu\/books\/vision-0\">David Marr<\/a>, who died in 1980 at the tragically young age of 35. Marr hypothesized that in interpreting a visual scene, the brain first creates what he called a 2.5-D sketch of the objects it contained \u2014 a representation of just those surfaces of the objects facing the viewer. Then, on the basis of the 2.5-D sketch \u2014 not the raw visual information about the scene \u2014 the brain infers the full, three-dimensional shapes of the objects.<\/p>\n<p>\u201cBoth problems are very hard, but there\u2019s a nice way to disentangle them,\u201d Wu says. \u201cYou can do them one at a time, so you don\u2019t have to deal with both of them at the same time, which is even harder.\u201d<\/p>\n<p>Wu and his colleagues\u2019 system needs to be trained on data that include both visual images and 3-D models of the objects the images depict. Constructing accurate 3-D models of the objects depicted in real photographs would be prohibitively time consuming, so initially, the researchers train their system using synthetic data, in which the visual image is generated from the 3-D model, rather than vice versa. The process of creating the data is like that of creating a computer-animated film.<\/p>\n<p>Once the system has been trained on synthetic data, however, it can be fine-tuned using real data. That\u2019s because its ultimate performance criterion is the accuracy with which it reconstructs the input data. It\u2019s still building 3-D models, but they don\u2019t need to be compared to human-constructed models for performance assessment.<\/p>\n<p>In evaluating their system, the researchers used a measure called intersection over union, which is common in the field. On that measure, their system outperforms its predecessors. But a given intersection-over-union score leaves a lot of room for local variation in the smoothness and shape of a 3-D model. So Wu and his colleagues also conducted a qualitative study of the models\u2019 fidelity to the source images. Of the study\u2019s participants, 74 percent preferred the new system\u2019s reconstructions to those of its predecessors.<\/p>\n<h3 class=\"my-4\">All that fall<\/h3>\n<p>In another of Wu and Tenenbaum\u2019s papers, on which they\u2019re joined again by Freeman and by researchers at MIT, Cambridge University, and ShanghaiTech University, they train a system to analyze audio recordings of an object being dropped, to infer properties such as the object\u2019s shape, its composition, and the height from which it fell. Again, the system is trained to produce an abstract representation of the object, which, in turn, it uses to synthesize the sound the object would make when dropped from a particular height. The system\u2019s performance is judged on the similarity between the synthesized sound and the source sound.<\/p>\n<p>Finally, in their fourth paper, Wu, Tenenbaum, Freeman, and colleagues at DeepMind and Oxford University describe a system that begins to model humans\u2019 intuitive understanding of the physical forces acting on objects in the world. This paper picks up where the previous papers leave off: It assumes that the system has already deduced objects\u2019 3-D shapes.<\/p>\n<p>Those shapes are simple: balls and cubes. The researchers trained their system to perform two tasks. The first is to estimate the velocities of balls traveling on a billiard table and, on that basis, to predict how they will behave after a collision. The second is to analyze a static image of stacked cubes and determine whether they will fall and, if so, where the cubes will land.<\/p>\n<p>Wu developed a representational language he calls scene XML that can quantitatively characterize the relative positions of objects in a visual scene. The system first learns to describe input data in that language. It then feeds that description to something called a physics engine, which models the physical forces acting on the represented objects. Physics engines are a staple of both computer animation, where they generate the movement of clothing, falling objects, and the like, and of scientific computing, where they\u2019re used for large-scale physical simulations.<\/p>\n<p>After the physics engine has predicted the motions of the balls and boxes, that information is fed to a graphics engine, whose output is, again, compared with the source images. As with the work on visual discrimination, the researchers train their system on synthetic data before refining it with real data.<\/p>\n<p>In tests, the researchers\u2019 system again outperformed its predecessors. In fact, in some of the tests involving billiard balls, it frequently outperformed human observers as well.<\/p>\n<p>&#8220;The key insight behind their work is utilizing forward physical tools \u2014 a renderer, a simulation engine, trained models, sometimes \u2014 to train generative models,&#8221; says Joseph Lim, an assistant professor of computer science at the University of Southern California. &#8220;This simple yet elegant idea combined with recent state-of-the-art deep-learning techniques showed great results on multiple tasks related to interpreting the physical world.&#8221;<\/p>\n<p><a href=\"http:\/\/news.mit.edu\/2017\/computer-systems-predict-objects-responses-physical-forces-1214\" target=\"_blank\" rel=\"noopener\">MIT News<\/a><\/p>\n<h3 class=\"my-4\">Now read:\u00a0<a href=\"https:\/\/mybroadband.co.za\/news\/science\/241500-the-needle-free-injection.html\" rel=\"bookmark\">The needle-free injection<\/a><\/h3>\n","protected":false},"excerpt":{"rendered":"<p>MIT seeks to explain and replicate human intelligence.<\/p>\n","protected":false},"author":340957,"featured_media":242932,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[31750],"tags":[27887,35],"class_list":["post-242926","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-science","tag-ai","tag-headline"],"_links":{"self":[{"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/posts\/242926"}],"collection":[{"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/users\/340957"}],"replies":[{"embeddable":true,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/comments?post=242926"}],"version-history":[{"count":1,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/posts\/242926\/revisions"}],"predecessor-version":[{"id":242934,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/posts\/242926\/revisions\/242934"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/media\/242932"}],"wp:attachment":[{"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/media?parent=242926"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/categories?post=242926"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mybroadband.co.za\/news\/wp-json\/wp\/v2\/tags?post=242926"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}