Facebook lets AI agents guide each other in natural language, with a higher success rate than humans

The Facebook Artificial Intelligence Research (FAIR) team has published a study called Talk the Walk, in which an AI guide uses natural language to direct an AI tourist to a specific location. Facebook uses an attention mechanism called MASC (Masked Attention for Spatial Convolution) to help the AI guide understand natural language and map the information it receives onto a 2D top-down map marked with landmarks. The guide's overall success rate is 87.08%, which beats human performance.

According to Facebook, applications of artificial intelligence should not be limited to virtual assistants that offer voice or text functions. AI should not only understand human language but also interact with its environment, so that it can help people in their daily lives. The FAIR team used a 360-degree camera to capture street-level imagery of five New York neighborhoods, including Manhattan's Hell's Kitchen and East Village. These neighborhoods have a regular layout, a typical grid of four-way street intersections. The AI agents then simulate a situation in which one person looks at a map and directs another person by message.

The goal of the task is to guide the AI tourist to a specific location. The AI tourist sees street images taken with the 360-degree camera, while the AI guide sees a 2D overhead map labeled with restaurants, hotels and other landmarks. Neither side can see the other's view, so the guide directs the tourist in natural language. The experiment ends when the tourist arrives at a destination: arriving at the right place counts as a success, and arriving at the wrong place counts as a failure. There is no limit on the number of messages exchanged or on the number of moves the tourist makes.
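The task rules described here can be sketched in a few lines of Python. This is a minimal illustration, not the Talk the Walk codebase: the `Episode` class, its fields, the 4x4 grid size and the coordinate convention are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical names throughout; y grows downward, x grows to the right.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class Episode:
    target: tuple      # (x, y) corner the guide must lead the tourist to
    position: tuple    # tourist's current (x, y) corner
    size: int = 4      # assumed 4x4 grid of street corners
    messages: int = 0  # messages exchanged so far (never capped)
    moves: int = 0     # tourist moves so far (never capped)

    def say(self, text: str) -> None:
        """Count one message; the task places no limit on how many are sent."""
        self.messages += 1

    def move(self, direction: str) -> None:
        """Move the tourist one corner, clamped to the edges of the grid."""
        dx, dy = MOVES[direction]
        x = min(max(self.position[0] + dx, 0), self.size - 1)
        y = min(max(self.position[1] + dy, 0), self.size - 1)
        self.position = (x, y)
        self.moves += 1

    def finish(self) -> bool:
        """End the episode: success only if the tourist is at the target."""
        return self.position == self.target


# Usage: the guide sends one instruction and the tourist follows it.
ep = Episode(target=(2, 1), position=(0, 0))
ep.say("go right twice, then down one block")
for step in ["right", "right", "down"]:
    ep.move(step)
print(ep.finish())  # True
```

In this sketch success is decided only at the end of the episode, which mirrors the rule that the wrong destination counts as a failure regardless of how many messages or moves were used.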

The research team had the AI learn from the way human players communicated, so the dialogue lacks the carefully structured instructions of Google Maps navigation, such as "Go to the next block, then turn right at the restaurant." The team also chose to base the experiment on the real world. FAIR noted that simulated environments are usually less chaotic and more predictable than actual city blocks, which makes it hard for them to capture real application contexts.

The ultimate goal of Talk the Walk is to help computers communicate clearly with humans. The FAIR team also adopted a new attention mechanism, MASC, which lets the AI guide translate the AI tourist's messages onto the 2D overhead map and predict the tourist's location. Attention mechanisms are commonly used in deep learning to mimic the way humans focus on what is relevant. MASC translates the tourist's reported movements, such as moving to the left or to the right, into positions relative to the landmarks on the map, linking the semantic understanding of the messages to the navigation map.
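The idea of turning a reported movement into a shift over the map can be illustrated with a small pure-Python sketch. It is a simplified illustration rather than FAIR's implementation: the one-hot 3x3 masks are hard-coded here, whereas a learned model would predict the mask weights from the tourist's message, and the names `MASKS`, `masked_shift` and `locate` are hypothetical.

```python
# Hypothetical one-hot 3x3 masks, indexed as mask[dy + 1][dx + 1].
# Assumption: y grows downward and x grows to the right.
MASKS = {
    "left":  [[0, 0, 0], [0, 0, 1], [0, 0, 0]],  # new cell reads from x + 1
    "right": [[0, 0, 0], [1, 0, 0], [0, 0, 0]],  # new cell reads from x - 1
    "up":    [[0, 0, 0], [0, 0, 0], [0, 1, 0]],  # new cell reads from y + 1
    "down":  [[0, 1, 0], [0, 0, 0], [0, 0, 0]],  # new cell reads from y - 1
}


def masked_shift(belief, mask):
    """Apply a 3x3 mask to a 2D belief grid with zero padding at the edges."""
    h, w = len(belief), len(belief[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    sy, sx = y + dy, x + dx
                    if 0 <= sy < h and 0 <= sx < w:
                        out[y][x] += mask[dy + 1][dx + 1] * belief[sy][sx]
    return out


def locate(belief):
    """Return the (x, y) cell with the highest belief."""
    cells = [(x, y) for y in range(len(belief)) for x in range(len(belief[0]))]
    return max(cells, key=lambda c: belief[c[1]][c[0]])


# Usage: the guide believes the tourist is at (2, 1); the tourist then
# reports moving left and down, so the belief should end up at (1, 2).
belief = [[0.0] * 4 for _ in range(4)]
belief[1][2] = 1.0
for action in ["left", "down"]:
    belief = masked_shift(belief, MASKS[action])
print(locate(belief))  # (1, 2)
```

With soft mask weights instead of one-hot values, the same operation would spread the belief over several neighboring cells, which is one way to represent an ambiguous message. Belief that is shifted past the edge of the grid is simply dropped in this sketch.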

Talk the Walk produces more concrete natural language communication. For example, the AI tourist does not merely describe the restaurant in front of it but provides more information about the way ahead. The AI guide's success rate in directing the AI tourist to the correct location is 87.07%, while humans reach only 76.74%. FAIR said this result is to be expected: natural language has shortcomings, and its vagueness and uncertainty reduce the efficiency of communication. After a period of training and fine-tuning, by contrast, the AI guide and AI tourist generate only words related to the task, which makes their communication more efficient.
