IBM demonstrates 10x faster GPU machine learning, processing 30GB of training data in just one minute

The research team developed a way to mark the importance of individual training data points so that training uses only the most important data; most of the less useful data never needs to be fed into the GPU, saving a great deal of data-transfer time.

IBM Research and EPFL presented a big-data machine learning scheme at the 2017 NIPS conference: a method that uses a GPU to process 30GB of training data in less than one minute, up to 10 times faster than existing limited-memory training approaches.

The team said that the challenge machine learning faces in the big-data era is that training on terabyte-scale datasets is a common but tricky problem. A server with enough memory capacity could, in principle, load all of the training data into memory, but training could still take hours or even weeks.

They note that current specialized computing hardware, such as GPUs, can genuinely accelerate computation, but it is best suited to compute-intensive rather than data-intensive tasks. To capitalize on the GPU's computing power, the data must first be loaded into GPU memory, and the maximum capacity of GPU memory today is only about 16GB, which is not ample for machine-learning workloads.

Batch processing seems like a feasible workaround: split the training data into chunks and load them into the GPU in sequence to train the model. But the experiments found that the cost of transferring data from the CPU to the GPU completely swamps the benefit of running the computation on the fast GPU. IBM researcher Celestine Dünner said the biggest challenge of machine learning on the GPU is the inability to fit all of the data into its memory.
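The transfer-versus-compute imbalance is easy to see with a rough measurement. The following is a minimal sketch, assuming PyTorch and a CUDA-capable GPU are available; the chunk size and the dot-product workload are illustrative stand-ins, not the team's benchmark.

```python
# Rough timing of one "batched" round: copy a chunk of data from host RAM to
# GPU memory, then run a simple computation on it. On typical hardware the
# PCIe transfer time dominates the compute time for data-intensive workloads.
import time
import torch

def time_chunk(chunk_rows=1_000_000, n_features=128):
    x_host = torch.randn(chunk_rows, n_features)   # one chunk staged in CPU RAM
    w = torch.randn(n_features, device="cuda")

    torch.cuda.synchronize()
    t0 = time.perf_counter()
    x_gpu = x_host.to("cuda")                      # host -> device transfer
    torch.cuda.synchronize()
    t1 = time.perf_counter()
    _ = x_gpu @ w                                  # the actual GPU compute
    torch.cuda.synchronize()
    t2 = time.perf_counter()

    print(f"transfer: {t1 - t0:.3f}s  compute: {t2 - t1:.3f}s")

if __name__ == "__main__":
    time_chunk()
```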

To solve this problem, the research team developed a technique that marks the importance of individual training data points, so that training uses only the most important data and most of the unimportant data never needs to be sent to the GPU, saving a great deal of training time. For example, when training a model to distinguish pictures of cats from pictures of dogs, once the model has learned that one difference is that a cat's ears are smaller than a dog's, the system retains that information and does not re-process the data points that merely repeat it in later training rounds. As a result, the model trains faster and faster, said IBM researcher Thomas Parnell, which makes it practical to retrain models more frequently and tune them more promptly.
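To illustrate the working-set idea, the sketch below keeps only the highest-scoring examples in the limited accelerator memory, trains on that subset, and refreshes the subset between rounds. The scoring rule (the current hinge loss of each example) and the SGD step are simple stand-ins chosen for clarity, not IBM's implementation.

```python
# Illustrative working-set training loop: keep only the most "important" samples
# in accelerator memory and swap the set between rounds.
import numpy as np

def sgd_step(X, y, w, lr=0.1, lam=0.01):
    """One pass of SGD on the L2-regularized hinge loss over the given subset."""
    for i in range(X.shape[0]):
        margin = y[i] * X[i].dot(w)
        grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
        w = w - lr * grad
    return w

def importance(X, y, w):
    """Stand-in importance score: how badly each example is currently fit."""
    return np.maximum(0.0, 1.0 - y * (X @ w))

def train_with_working_set(X, y, budget=256, n_rounds=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    working = rng.choice(n, size=budget, replace=False)     # initial random working set
    for _ in range(n_rounds):
        w = sgd_step(X[working], y[working], w)              # train on the "in-GPU" subset
        working = np.argsort(importance(X, y, w))[-budget:]  # keep the most informative points
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(5000, 20))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=5000))
    w = train_with_working_set(X, y)
    print("train accuracy:", np.mean(np.sign(X @ w) == y))
```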

The technique measures how much each data point contributes to the learning algorithm, relying mainly on the concept of the duality gap, and adjusts the training on the fly. Putting the method into practice on heterogeneous compute platforms, the research team developed DuHL, a new, reusable component for training machine-learning models based on duality gap-driven heterogeneous learning.
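For a concrete sense of what a duality gap-based importance measure can look like, here is a sketch for an L2-regularized hinge-loss (SVM) objective: each example's contribution to the duality gap serves as its importance score, and examples with near-zero gap are the ones that can safely stay out of GPU memory. The parameterization and names are illustrative and are not DuHL's code; such a score could replace the simple hinge-loss proxy in the sketch above.

```python
# Per-example duality gap for an L2-regularized SVM (hinge loss), usable as an
# importance score in a working-set scheme.
# Parameterization: dual variables beta_i in [0, 1], w = X.T @ (beta * y) / (lam * n).
import numpy as np

def per_example_duality_gap(X, y, beta, lam):
    n = X.shape[0]
    w = X.T @ (beta * y) / (lam * n)              # primal model induced by the dual variables
    margins = y * (X @ w)                         # y_i * <w, x_i>
    primal_part = np.maximum(0.0, 1.0 - margins)  # hinge loss per example
    dual_part = beta * (margins - 1.0)            # conjugate and coupling terms
    return primal_part + dual_part                # >= 0; zero means the example is "settled"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=1000))
    beta = rng.uniform(0.0, 1.0, size=1000)       # any feasible dual point
    gaps = per_example_duality_gap(X, y, beta, lam=0.01)
    print("average duality gap:", gaps.mean())
    print("10 most important examples:", np.argsort(gaps)[-10:])
```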

IBM said their next goal is to offer DuHL in the cloud. Since cloud GPU services are currently billed by the hour, the cost savings would be staggering if model training time dropped from ten hours to one hour.

(Top) The chart compares the training time of three approaches on a large-scale SVM, including DuHL, using a 30GB ImageNet dataset on an 8GB NVIDIA Quadro M4000 GPU. It shows that sequential batching on the GPU is even less efficient than the CPU-only method, while DuHL is more than 10 times faster than the other two approaches.
