From the Chinese Academy of Sciences to Silang: The birth of China's ultra-high-performance chips

The arrival of 5G is accelerating.

5G is currently at a critical stage of standards setting. In June of this year, the international standards organization 3GPP is expected to complete the first version of the 5G international standard. At the same time, policy continues to be favorable: on April 24, the National Development and Reform Commission and the Ministry of Finance issued a notice lowering the frequency occupancy fee standards for 5G public mobile communication systems...

5G technology can not only support smooth interoperation among all kinds of robots, including automobiles, but will also be a foundational technology for qualitative upgrades in smartphones, smart homes, artificial intelligence, big data, and cloud computing.

Faced with this incoming 5G wave, is China's chip industry ready? Although the road ahead is long, Chinese chips, represented by Huawei HiSilicon, are still worth looking forward to. In the baseband field, Huawei HiSilicon is currently the only Chinese company that can compare with Qualcomm. This is the result of more than 30 years of accumulation by Huawei across many fronts; it did not happen overnight.

At the start-up level, one company stands out: Silang Technology, which grew out of the Institute of Automation of the Chinese Academy of Sciences and the former National ASIC Design Engineering Technology Research Center (established in 1992). Led by Dr. Wang Donglin, former director of the center and former director of the Institute of Automation, the team developed MaPU, a high-performance microprocessor. According to the team, MaPU is the first processor to achieve global optimization at the level of algebraic algorithms, and it is highly programmable. It was successfully taped out in 2015.

MaPU not only matches the performance of the international giants' programmable processors, but also has power consumption comparable to that of an ASIC. Based on MaPU, Silang Technology has developed processors for three major fields: UCP for 5G communications, UMP for multimedia applications, and HPP for supercomputing. It also offers an AI-domain processor: the deep neural network engine NNE.

Recently, at the Institute of Automation of the Chinese Academy of Sciences, Investment Community interviewed Wang Donglin, founder and chief scientist of Silang Technology. Wang Donglin was the first to propose an architecture for 'global optimization computing' at the algebraic-algorithm level, and the computing power and performance/power ratio of the MaPU designed on this architecture can be compared with the international advanced level.

Silang Technology founder and chief scientist Wang Donglin

High Performance Microprocessor MaPU

Wang Donglin explained that MaPU's greatest feature is its high computing power and low power consumption.

Several types of processors are in common use today. One type is the programmable processor, such as those from Intel and TI. It is programmable and adaptable, but when executing a mathematical algorithm its compute-resource utilization is generally around 15%, and 20% at best. Even TI's processors reach only 40-50%. In other words, these processors run at high frequencies and have abundant resources, yet their execution efficiency is low.

The other type is the ASIC, which requires no programming. It implements the algorithm's flow and control directly in hardware. It is in effect an accelerator for one algorithm, so its execution efficiency can be very high, approaching 100%.

Obviously, there is a huge difference in power consumption between programmable processors and ASICs. The problem with ASICs is that, although they are highly efficient, their algorithms are fixed. If the algorithm changes even slightly, the chip can no longer be used.

MaPU can achieve near-ASIC efficiency (computational resource utilization can exceed 90%) while remaining highly programmable, combining the advantages of both.
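To make the utilization figures above concrete, here is a minimal Python sketch. It assumes a purely hypothetical peak throughput of 1,000 GFLOPS shared by every chip and simply applies the utilization percentages quoted in the text; the function name and the peak figure are illustrative and do not come from the interview.

```python
# Utilization figures quoted in the article, as fractions of peak compute.
UTILIZATION = {
    "general programmable processor": 0.15,
    "TI processor (upper bound)": 0.50,
    "MaPU (claimed)": 0.90,
    "ASIC (near ideal)": 1.00,
}

def effective_gflops(peak_gflops: float, utilization: float) -> float:
    """Throughput actually delivered when only part of the peak is used."""
    return peak_gflops * utilization

# Hypothetical peak, identical for all chips, chosen only for illustration.
PEAK_GFLOPS = 1000.0

effective = {
    name: effective_gflops(PEAK_GFLOPS, u) for name, u in UTILIZATION.items()
}

for name, gflops in effective.items():
    print(f"{name}: {gflops:.0f} GFLOPS effective")
```

Under these assumptions, a chip at 90% utilization delivers six times the useful work of one at 15% from the same silicon, which is the gap the article is describing.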

Taking the supercomputing chip as an example: MaPU's performance/power ratio is the highest in the world

In Wang Donglin's view, the core problem of today's mainstream programmable processors is that they use a traditional architecture with low-level instructions and try to achieve local parallel execution at runtime through techniques such as out-of-order multiple issue. As a result, utilization of the chip's computing resources is low, data I/O volume is large, dynamic power consumption is high, and the overall performance/power ratio is poor. This approach no longer suits today's twin demands on microprocessors: enormous computing power and extremely low power consumption. If the parallelism of an application across different dimensions, such as time and space, is considered at the level of the whole algorithm and exploited for overall optimization, the utilization of the compute units in the core improves greatly. After rigorous calculation and experimentation, Wang Donglin and his team therefore proposed a scheme for global optimization at the algebraic-algorithm level.

'A single instruction can implement an algebraic algorithm, so it is called an algebraic instruction. The instruction sets of traditional architectures all consist of arithmetic-level instructions,' Wang Donglin said. MaPU raises these to algebra-level instructions: 'MaPU uses algebraic instructions and a hardware architecture whose pipeline can be dynamically reconfigured with zero delay (to adapt to the algorithm). It achieves essentially the same algorithm structure as an ASIC and carries out globally optimized execution of the entire algorithm.'

In short, MaPU supports both application-level global optimization and, at the software level, a highly reconfigurable computing and storage architecture that can be flexibly adapted to the various algorithms within a field (5G communications, multimedia, supercomputing, or artificial intelligence). MaPU can be said to combine the advantages of the ASIC, FPGA, and CPU: it is a 'soft ASIC' whose performance/power ratio is almost comparable to that of an ASIC.

'MaPU, the algebraic-operation microprocessor, makes major original innovations in parallel algebraic operations, parallel storage systems, the instruction system, and hardware architecture. It raises microprocessor hardware support from scalar/superscalar operations to algebraic operations, improving the energy efficiency of microprocessors in computation-intensive fields by orders of magnitude,' Wang Donglin summarized.

On the specific performance/power ratio, Wang Donglin then gave a set of intuitive comparison figures:

Taking the Aurora H1.0 supercomputing chip as an example, the chip integrates 32 HPP processing cores, and its double-precision floating-point capability will reach 4,659 GFLOPS@64 (64-bit). The estimated power consumption is only about 40 W, giving a performance/power ratio of 116 GFLOPS@64/W, which Wang Donglin says ranks first in the world.
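The ratio quoted above follows directly from the two figures given. The short Python sketch below reproduces the arithmetic; the per-core figure is a derived illustration that assumes the 32 cores contribute equally, which the article does not state.

```python
# Figures quoted in the article for the Aurora H1.0 chip.
AURORA_PEAK_GFLOPS = 4659.0  # double-precision (64-bit) peak
AURORA_POWER_WATTS = 40.0    # estimated power consumption
AURORA_CORES = 32            # HPP processing cores

def gflops_per_watt(gflops: float, watts: float) -> float:
    """Performance/power ratio in GFLOPS per watt."""
    return gflops / watts

ratio = gflops_per_watt(AURORA_PEAK_GFLOPS, AURORA_POWER_WATTS)

# Assumption: peak throughput is split evenly across the cores.
per_core_gflops = AURORA_PEAK_GFLOPS / AURORA_CORES

print(f"performance/power: about {int(ratio)} GFLOPS/W")
print(f"per core (assuming an even split): {per_core_gflops:.1f} GFLOPS")
```

The quotient is a little above 116, which matches the rounded figure in the text.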

Once MaPU enters mass production, its original architectural advantages are therefore expected to help China achieve a major breakthrough in microprocessor architecture, deliver far more computing power for the same energy consumption, and lead the independent innovation and development of China's electronics industry.

On this point, Bai Chunli, president of the Chinese Academy of Sciences, told the media in March this year: 'In the research and development of high-tech products, the Chinese Academy of Sciences will soon release a microprocessor with completely independent intellectual property rights, the MaPU algebraic processor, which reaches the international advanced level. We believe that once the MaPU series processors are launched, they will shine in computing, communications, and other areas of the consumer electronics market.'

'MaPU's three children'

On the basis of MaPU, Silang Technology has further developed three powerful domain processors: UCP for 5G communications, UMP for multimedia, and HPP for supercomputing.

UCP: The world's first full implementation of software-defined radio.

UCP is a MaPU-based enhanced general-purpose communications processor for mobile communications, and it is the chip core of the baseband processor for 5G macro base stations. A single UCP core can complete 5.8G of fixed-point complex FFT per second, along with 55 Gbps of LDPC encoding and 2.5 Gbps of LDPC decoding. According to preliminary calculations, a baseband processor with twenty UCP cores can satisfy all baseband processing requirements of a 64-antenna 5G macro base station.
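As a rough illustration of what a twenty-core baseband processor could deliver, the Python sketch below multiplies the per-core figures quoted above by the core count. Linear scaling across cores is an assumption made here for illustration; the article gives only the per-core figures and the twenty-core conclusion, and the dictionary keys are hypothetical names.

```python
# Per-core throughput figures quoted in the article for one UCP core.
PER_CORE = {
    "fft_g_per_s": 5.8,        # fixed-point complex FFT, in G per second
    "ldpc_encode_gbps": 55.0,  # LDPC encoding, Gbps
    "ldpc_decode_gbps": 2.5,   # LDPC decoding, Gbps
}

CORES = 20  # core count of the baseband processor described in the text

def aggregate(per_core: dict, cores: int) -> dict:
    """Scale per-core throughput by the core count.

    Assumption: perfect linear scaling, with no interconnect or memory
    bottleneck. Real chips usually fall short of this upper bound.
    """
    return {name: value * cores for name, value in per_core.items()}

total = aggregate(PER_CORE, CORES)

for name, value in total.items():
    print(f"{name}: {value:.1f}")
```

These totals are upper bounds under the stated assumption, not measured figures.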

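'If you use FPGAs to build a 5G system, you need several chips interconnected to form a system solution. FPGA-based circuits generally run below 400-600 MHz, the inter-chip interconnect bus bandwidth is limited, and computing capability is limited. This will become the bottleneck in implementing a 5G system,' Wang Donglin said.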

At present, the UCP core is an internationally leading processor core that can realize a fully software-defined 5G wireless baseband processing system at an acceptable cost. Wang Donglin describes UCP as a complete realization of software-defined radio for baseband processing in mobile communications.

In addition to base station equipment manufacturers, UCP cores can also be supplied to 5G terminal manufacturers. Because of the 5G standard, every terminal must embed a new baseband core that can handle the 5G algorithms (the original baseband or DSP cores cannot cope with the large volume of computation in 5G downlink reception and uplink transmission), which is also an opportunity for the UCP core. Integrated wireless communication equipment and broadband ad hoc network terminals in various fields are further areas where UCP can show its capabilities through fully software-defined radio technology.

UMP: Lets smartphones and smart TVs upgrade their audiovisual experience online, and provides a super engine for high-definition photography, video, and other applications.

UMP is MaPU's second 'kid', a multimedia microprocessor core for smartphones and smart TVs.

Building on the basic MaPU architecture, UMP introduces a more efficient parallel processing architecture, making its performance/power ratio for all kinds of video processing operations comparable to that of ASICs, and in some respects even better, while remaining highly programmable.

'This feature lets a home TV improve its picture and sound online as the TV manufacturer updates its video and audio processing algorithms, something ASIC-based TV chips cannot match,' Wang Donglin said. 'Manufacturers can also launch new products quickly through algorithm and software improvements.'

A 4K ultra-high-definition TV engine chip (up to 8 watts) consisting of 4 UMP cores and 1 ARM core can meet all the video and audio processing requirements of a 4K ultra-high-definition TV, as well as Android-based TV system management. Its audio and visual quality can be comparable to that of the highest-end TVs from Sony and Samsung. A super TV engine chip with 14 UMP cores and 1 ARM core can meet all the processing and computing needs of an 8K ultra-high-definition TV.

Together with NNE, the deep neural network engine core Silang developed for the AI field, both smart TVs and smartphones can achieve high performance in high-definition video, machine vision, and human-computer interaction, lowering the cost of domestically made consumer electronics and greatly improving the user experience. The first super TV engine chip is to be completed in 2018, after which smart TV manufacturers can begin applying and promoting it.

HPP: Compared with Intel's latest processor, the HPP-core supercomputing processor is nearly an order of magnitude better in performance/power ratio.

At present, MaPU's third 'kid', the HPP core, is already a mature product.

'MaPU's core capability, obtained through architectural innovation, is still high-intensity computation.' Wang Donglin has therefore long hoped to build a supercomputing microprocessor on HPP (high-performance processing), MaPU's enhanced core for general-purpose computing, to meet the needs of high-end server applications.

This led to Aurora H1.0, a MaPU-based supercomputing microprocessor that is expected to be completed at the end of the year. Its designed performance/power ratio is far superior to that of other microprocessors in the world, and it can serve as the core processor of supercomputing systems and super servers.

Wang Donglin gave a set of data:

Aurora H1.0, the HPP-based supercomputing chip, delivers the same performance as Intel's latest Xeon Phi but can be reconfigured for 16/32/64/128-bit operation, and its performance/power ratio is nearly an order of magnitude better: when delivering equivalent 64-bit floating-point computing power, Aurora H1.0 (2x16 HPP cores) consumes 40 W, while the Intel Xeon Phi processor consumes 300 W.
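The 'nearly an order of magnitude' claim can be checked from the two power figures alone: at equal performance, the efficiency gain is simply the ratio of the power draws. A minimal Python sketch using only the numbers quoted above (the function name is illustrative):

```python
# Power figures quoted in the article at equivalent 64-bit performance.
XEON_PHI_WATTS = 300.0
AURORA_WATTS = 40.0

def efficiency_gain(baseline_watts: float, candidate_watts: float) -> float:
    """Performance-per-watt gain when both chips deliver equal performance.

    With equal performance P, (P / candidate) / (P / baseline)
    simplifies to baseline / candidate.
    """
    return baseline_watts / candidate_watts

gain = efficiency_gain(XEON_PHI_WATTS, AURORA_WATTS)
print(f"efficiency gain: {gain:.1f}x")  # prints "efficiency gain: 7.5x"
```

A factor of 7.5 is somewhat short of a full tenfold improvement, which is consistent with the article's wording of 'nearly' an order of magnitude.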

NNE: Top Deep Neural Network Processing

Another Silang product, the neural network acceleration engine NNE, inherits and extends MaPU's 'concentric circle storage system optimization model' and is optimized for deep neural networks. Its advantage is that, for mainstream neural networks, whole-network throughput is high while memory access requirements and power consumption are low. It is comparable to Nvidia's deep neural network cores but more efficient. In its storage system and deep neural network configuration, NNE adopts MaPU's global optimization concept.

'NNE can support deep-learning training and especially supports inference. It has been specifically optimized for video and image recognition, and has outstanding advantages in target detection, recognition, and video image structuring,' Wang Donglin said.

One application scenario that can be expected is smart driving. In this scenario, UMP processes multi-camera images at high speed and extracts the objects to be identified. NNE is responsible for understanding road conditions and vehicle conditions and for providing the key information needed for decision-making and driving control. UCP is responsible for providing vehicle networking communications with extremely low latency.

A research team willing to sit on the cold bench, persisting in R&D for nearly 10 years

Silang and its predecessor team have been researching and planning a new instruction set architecture since 2009, and have developed a fully independent and innovative microprocessor architecture. MaPU has come through nine years of hard work. The R&D team consists of more than 70 core researchers from the former National ASIC Design Engineering Technology Research Center.

In the second quarter of 2017, the team began operating as a company.

Wang Donglin is a person of technical conviction, as are his teammates. The integrated circuit field has long been under-resourced and short of talent. Because work such as chip-making is hard and not well paid, many outstanding students prefer to go into finance or the Internet industry after graduation. Making chips requires craftsmanship: top-notch technical people who are willing to concentrate on R&D and can withstand its pressure over the long term. The R&D team of Silang Technology has done it.

The MaPU-based enhanced processors each have their own advantages in their respective fields, and they can also be combined to serve a variety of practical scenarios: 5G communications, smartphones, smart homes, supercomputers, smart driving, smart cities, robots, drones, and more.

'Moore's Law cannot hold forever, and chip performance upgrades have hit a worldwide bottleneck. This is exactly the best opportunity for us to catch up,' Wang Donglin said. 'Chips are an industry that requires patience. We have been at it for nine years. The next important step is to adapt and optimize for real applications.'
