Accuracy is not enough: AI machine vision fights for 'efficiency'

Although numerous artificial intelligence processors are competing for the market, each claiming to be a "breakthrough", today's AI community is still beset by countless problems, including energy consumption, speed, AI hardware size, and AI algorithms. None of these has been shown to have improved in robustness and performance.

In computer vision, according to Rogerio Feris, manager of computer vision and multimedia research at IBM Research, the biggest challenge is how to 'make visual analysis more efficient'. In particular, AI is still in its early stages of development and needs new ideas, long-term vision, and more investment in research and development from academia and research institutions.

IBM Research is presenting two papers on AI software and hardware technology at the 2018 Conference on Computer Vision and Pattern Recognition (CVPR), held in Salt Lake City this week. CVPR is sponsored by the Computer Vision Foundation and the IEEE Computer Society and is regarded as one of the most competitive computer vision conferences.

On the AI hardware side, IBM Research is promoting a stereo-vision system that applies brain-inspired spiking neural-network technology to both data acquisition (sensors) and data processing. The design leverages IBM's own TrueNorth chip, a non-von-Neumann architecture processor, and event-driven cameras developed by the Swiss company iniLabs.

IBM's TrueNorth architecture (Source: IBM)

On the AI software side, IBM Research's paper is about 'BlockDrop', which is considered a key step toward reducing the total amount of computation required by deep residual networks. Feris explained that the two papers address the same problem, the efficiency of visual analysis, from two different perspectives.

Feris said that when someone is about to cross the road, a self-driving vehicle is expected to make an 'instant inference'. Although the accuracy of image recognition is very important, how much time the vehicle takes to reach a conclusion and identify what it is seeing is the ultimate test in real-world applications.

What is 'BlockDrop'?

The residual network that won ImageNet 2015 took the computer vision community by storm. The technique has proved that it can deliver excellent recognition results because it makes it possible to train neural networks with hundreds or even thousands of layers. However, Feris points out: 'Applying the same one-size-fits-all computation required by the residual network to every image is too inefficient.' He explained that a dog in front of a white background is easier to identify than one in a busy urban street scene.

For this reason, IBM Research developed BlockDrop, a method that learns which blocks (each comprising multiple layers) of the residual network to execute dynamically during inference. Feris pointed out: 'The goal of this method is to reduce the overall computation as far as possible without loss of prediction accuracy.'
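The idea of skipping residual blocks per input can be illustrated with a minimal sketch. This is not IBM's implementation: the article only says that the method learns which blocks to run, so the `toy_policy` below is a hypothetical hand-written stand-in for that learned decision, and `make_block` is a toy residual branch invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_block(scale: float) -> Block:
    """Hypothetical residual branch F(x): a scaled ReLU, for illustration only."""
    def branch(x: Vector) -> Vector:
        return [scale * max(v, 0.0) for v in x]
    return branch


def forward(x: Vector, blocks: Sequence[Block],
            keep: Sequence[bool]) -> Tuple[Vector, int]:
    """Run a residual stack, skipping every block whose keep flag is False.

    A kept block computes x + F(x). A dropped block falls back to the
    identity shortcut, so its branch F is never evaluated.
    """
    executed = 0
    for block, keep_block in zip(blocks, keep):
        if keep_block:
            fx = block(x)
            x = [a + b for a, b in zip(x, fx)]
            executed += 1
    return x, executed


def toy_policy(x: Vector, n_blocks: int) -> List[bool]:
    """Hypothetical stand-in for the learned policy.

    Inputs with low variance are treated as 'easy' and keep only half
    of the blocks; all other inputs keep every block.
    """
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    n_keep = n_blocks if variance > 1.0 else max(1, n_blocks // 2)
    return [i < n_keep for i in range(n_blocks)]


blocks = [make_block(0.5) for _ in range(4)]
easy_input = [1.0, 1.0, 1.0]    # uniform, like a plain background
hard_input = [0.0, 3.0, -3.0]   # high variance, like a busy scene

easy_out, easy_cost = forward(easy_input, blocks, toy_policy(easy_input, 4))
hard_out, hard_cost = forward(hard_input, blocks, toy_policy(hard_input, 4))
print(easy_cost, hard_cost)
```

In this sketch the easy input runs two of the four blocks and the hard input runs all four, which mirrors Feris's point that simple images should not pay for the full network.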

BlockDrop illustration (Source: IBM)

IBM claims that in testing BlockDrop increases recognition speed by 20% on average, and sometimes by as much as 36%, without sacrificing the accuracy the residual network achieves on the ImageNet dataset. Feris said that IBM carried out the study in the summer of 2017 together with the University of Texas and the University of Maryland, and that the company will release BlockDrop to the open source community.

Neuromorphic techniques for stereo vision applications

On the hardware side, IBM Research is targeting a stereo vision system that uses spiking neural networks. The company stated that the industry currently uses two conventional (frame-based) cameras to generate stereo vision, but that no one has yet tried neuromorphic technology. Although conventional cameras can provide stereoscopic images, they require high-definition video signal processing, such as high dynamic range (HDR) imaging, ultra-high-resolution processing, and automatic calibration.

According to IBM researcher Alexander Andreopoulos's description in the paper, the system uses two event-based cameras developed by iniLabs (also known as dynamic vision sensors, or DVS) and, after capturing the scene, uses a cluster of IBM TrueNorth chips to extract the depth of fast-moving objects.

IBM's goal is to significantly reduce the power consumption and latency required to obtain stereoscopic images. After receiving the live spike input (which already greatly reduces the amount of data), the system uses IBM's neuromorphic hardware to reconstruct 3D images: it estimates the disparity between the images from the two DVS cameras and locates objects in 3D space by triangulation.
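The triangulation step can be sketched with the standard rectified pinhole stereo relation, depth = focal length × baseline / disparity. This is a generic textbook model, not the method from IBM's paper. The function names, the focal length, the baseline, and the principal point used below are assumptions chosen purely for illustration.

```python
from typing import Tuple


def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> float:
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def triangulate(x_left_px: float, y_px: float, disparity_px: float,
                focal_px: float, baseline_m: float,
                cx: float, cy: float) -> Tuple[float, float, float]:
    """Back-project a left-image pixel and its disparity to a 3D point.

    (cx, cy) is the principal point in pixels. The returned point is
    expressed in the left camera's coordinate frame, in meters.
    """
    z = depth_from_disparity(disparity_px, focal_px, baseline_m)
    x = (x_left_px - cx) * z / focal_px
    y = (y_px - cy) * z / focal_px
    return x, y, z


# Hypothetical camera parameters, for illustration only.
point = triangulate(x_left_px=150.0, y_px=50.0, disparity_px=10.0,
                    focal_px=200.0, baseline_m=0.5, cx=100.0, cy=50.0)
print(point)
```

Because depth is inversely proportional to disparity, nearby objects produce large disparities and distant ones small disparities, which is why the quality of the disparity map determines the quality of the 3D reconstruction.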

Neuromorphic stereo images (Source: IBM)

Data acquisition and processing

A French start-up, Prophesee, uses neuromorphic technology to capture data and reduce the amount of data collected by sensors. The company's sensor technology is not frame-based; instead, its design goal is to simplify the data and produce data suited to machine use. In an earlier interview with EE Times, Prophesee said that this can drastically reduce the data burden and should allow a car to make almost instant decisions.
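Why event-based sensing reduces data can be shown with a small simulation. This is a sketch of the general event-camera principle (a pixel reports only when its log intensity changes by more than a threshold), not Prophesee's or iniLabs' actual sensor logic. The function name `events_between`, the threshold value, and the 4×4 frames are hypothetical.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]
Event = Tuple[int, int, int]  # (row, column, polarity)


def events_between(prev: Frame, curr: Frame,
                   threshold: float = 0.2) -> List[Event]:
    """Emit an event for each pixel whose log intensity changed enough.

    Polarity is +1 for a brightening pixel and -1 for a darkening one.
    Intensities must be strictly positive. Unchanged pixels emit nothing.
    """
    events: List[Event] = []
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            delta = math.log(q) - math.log(p)
            if delta >= threshold:
                events.append((r, c, 1))
            elif delta <= -threshold:
                events.append((r, c, -1))
    return events


# A static 4x4 scene in which only two pixels change between samples.
prev_frame = [[1.0] * 4 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[1][2] = 2.0   # one pixel gets brighter
curr_frame[3][0] = 0.5   # one pixel gets darker

events = events_between(prev_frame, curr_frame)
total_pixels = sum(len(row) for row in curr_frame)
print(len(events), "events versus", total_pixels, "pixels in a full frame")
```

A frame-based camera would transmit all 16 pixel values here, while the event representation carries only the two pixels that changed.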

However, IBM's new-generation stereo vision system uses brain-inspired technology not only for data acquisition but also for data processing, in order to reconstruct stereoscopic images. Andreopoulos said that one of the system's biggest achievements is programming TrueNorth to efficiently implement the various 'sub-routines' required for 'spiking neural network stereo vision'. IBM added that the TrueNorth architecture consumes less power than traditional systems, which would benefit the design of automated driving systems.

Similarly, using a pair of DVS cameras (rather than frame-based ones) also reduces the amount of data and power consumption, increases speed, lowers latency, and provides better dynamic range. IBM said these are key elements of real-time system design. When asked about the advantages of the new TrueNorth system, Andreopoulos said that it delivers a two-hundred-fold improvement in power per pixel of disparity map over state-of-the-art systems using traditional CPU/GPU processors or FPGAs.

Using event-based input, the live image data fed into the IBM system is processed by 9 TrueNorth chips, which can compute 400 disparity maps per second with a latency of only 11 milliseconds (ms). IBM pointed out in the paper that, with specific trade-offs, the system can further increase the rate to 2,000 disparity maps per second.
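The throughput and latency figures can be related with some simple arithmetic. The sketch below assumes steady-state operation and reads the 11 ms as the end-to-end delay of a single disparity map. That interpretation, and the conclusion that several maps are processed in a pipelined fashion, are assumptions rather than statements from the article.

```python
def map_period_ms(maps_per_second: float) -> float:
    """Time between consecutive disparity maps, in milliseconds."""
    return 1000.0 / maps_per_second


def maps_in_flight(maps_per_second: float, latency_ms: float) -> float:
    """Average number of maps being processed at once (rate x latency)."""
    return maps_per_second * latency_ms / 1000.0


base_period = map_period_ms(400)        # 2.5 ms between maps
fast_period = map_period_ms(2000)       # 0.5 ms between maps
in_flight = maps_in_flight(400, 11.0)   # about 4.4 maps in progress

print(f"400 maps/s: one map every {base_period} ms")
print(f"2,000 maps/s: one map every {fast_period} ms")
print(f"at 400 maps/s and 11 ms latency: {in_flight:.1f} maps in flight")
```

Under these assumptions, a new map is produced every 2.5 ms while each one takes 11 ms to complete, so roughly four maps overlap in the pipeline at any moment.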

When can stereo vision systems based on TrueNorth chips be commercialized? Andreopoulos said: “We can't disclose a timeline yet. We can only say that we have tested and successfully programmed the chip to handle disparity maps efficiently. At this stage, it is a proof of concept.”

Compilation: Judith Cheng