“Neural nets are the new apps,” said Raja M. Koduri, senior vice president and general manager of Intel’s Accelerated Computing Systems and Graphics Group.
Koduri said that by accelerating the matrix multiplications at the core of neural networks, Intel will have the fastest chips for machine learning and deep learning, as well as for all other forms of artificial intelligence processing.
As part of its Architecture Day, Intel presented a demonstration in which its upcoming standalone GPU, Ponte Vecchio, beat Nvidia’s A100 GPU in a neural network benchmark task: running the ResNet-50 neural network to categorize images from the ImageNet image library.
In the demonstration, Intel claimed that Ponte Vecchio, in pre-production silicon, is capable of processing over 3,400 images per second in neural network training, surpassing the previous record of 3,000 images per second. In inference, where a trained neural network makes predictions, Ponte Vecchio is able to classify over 43,000 images per second, surpassing the current peak of 40,000 images per second.
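To put those figures in proportion, the short Python sketch below works out the relative gain and the average time per image implied by each throughput number. The function names are illustrative only, and because Intel quoted its results as "over" 3,400 and 43,000 images per second, the outputs should be read as rough lower bounds rather than measured values.

```python
def relative_gain(new_rate, old_rate):
    """Fractional improvement of new_rate over old_rate (0.10 means 10%)."""
    return (new_rate - old_rate) / old_rate


def ms_per_image(images_per_second):
    """Average time per image, in milliseconds, at a given throughput."""
    return 1000.0 / images_per_second


if __name__ == "__main__":
    # Throughput figures quoted in the article, in images per second.
    training_gain = relative_gain(3400, 3000)
    inference_gain = relative_gain(43000, 40000)
    print(f"training gain:  {training_gain:.1%}")
    print(f"inference gain: {inference_gain:.1%}")
    print(f"training time per image:  {ms_per_image(3400):.3f} ms")
    print(f"inference time per image: {ms_per_image(43000):.4f} ms")
```

On these numbers, the claimed lead is roughly 13 percent in training and 7.5 percent in inference.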
One of today’s major revelations is that the new CPUs will use a hardware structure called “Thread Director,” which controls how threads of execution are scheduled on the processor. It adapts to important factors such as power consumption and relieves the operating system of some of that scheduling work.
According to Intel, the Thread Director, “provides low-level telemetry on the state of the core and the instruction mix of the thread, empowering the operating system to place the right thread on the right core at the right time.”
Another innovation is how the chips will make use of memory bandwidth technologies. For example, Alder Lake has been announced with support for PCIe Gen 5 and DDR5 memory interfaces. Intel’s future data center processor, Sapphire Rapids, offers ways to distribute data across both DDR5 main memory and high-bandwidth memory (HBM) without the application software needing to know anything about either type of memory. This allows both memory capacity and memory bandwidth to keep growing for AI workloads that are demanding in both memory and I/O.
Intel also unveiled for the first time various features of Sapphire Rapids, which will form the next generation of its Xeon family of server chips. For example, the chip will perform 2,048 operations per clock cycle on 8-bit integer data types, using Intel’s so-called AMX, or “advanced matrix extensions.” The focus here is on the kinds of operations that neural networks rely on.
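The kind of operation in question is a matrix multiplication over 8-bit integers, with the products summed into a wider accumulator. The Python sketch below illustrates the arithmetic only; it does not use AMX or model how the hardware works. The function names are hypothetical, and the cycle estimate rests on an assumption not stated in the article: that each multiply-accumulate counts as two of the 2,048 operations per cycle.

```python
def int8_matmul(a, b):
    """Multiply two matrices of 8-bit integers, summing products in a wide accumulator."""
    rows, inner, cols = len(a), len(b), len(b[0])
    for matrix in (a, b):
        for row in matrix:
            for value in row:
                if not -128 <= value <= 127:
                    raise ValueError("inputs must fit in a signed 8-bit integer")
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0  # the sum can exceed the 8-bit range, hence the wider accumulator
            for k in range(inner):
                acc += a[i][k] * b[k][j]
            out[i][j] = acc
    return out


def mac_count(rows, inner, cols):
    """Number of multiply-accumulate steps in a (rows x inner) by (inner x cols) product."""
    return rows * inner * cols


def min_cycles(rows, inner, cols, ops_per_cycle=2048, ops_per_mac=2):
    """Lower bound on clock cycles at peak rate (ops_per_mac=2 is an assumption)."""
    total_ops = mac_count(rows, inner, cols) * ops_per_mac
    return -(-total_ops // ops_per_cycle)  # ceiling division


if __name__ == "__main__":
    print(int8_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    print(min_cycles(64, 64, 64))
```

Under that assumption, a 64-by-64 matrix product needs 262,144 multiply-accumulates and could finish in no fewer than 256 cycles at the quoted peak rate. Real throughput would depend on memory traffic and other factors the sketch ignores.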
When asked how Intel’s recent innovations could transform the way neural networks are built, Koduri said that the different processor types now proliferating at Intel and elsewhere need to cooperate far more closely, sharing tasks rather than functioning separately.
For more information, read the original story in ZDNet.