Tuning the software doubles the computer’s processing speed and halves its energy consumption
February 23, 2024
At the 56th Annual IEEE/ACM International Symposium on Microarchitecture, researchers at the University of California, Riverside (UCR) demonstrated an approach in which all of a platform's computing components work simultaneously. This makes it possible to roughly double computation speed and halve energy consumption. The technology can run on any processor and accelerator, from smartphones to data center servers, but it still needs refinement.
“You do not have to add new processors [to speed up computation] because they already exist,” said Hung-Wei Tseng, assistant professor in the Department of Electrical and Computer Engineering at UCR and co-author of the study. It is only necessary to make competent use of existing hardware resources instead of sorting them.
The framework developed by the researchers, called simultaneous and heterogeneous multithreading (SHMT), differs from traditional programming models. Rather than handing a piece of code to only one of the system's computing components (a central, graphics, or tensor processor, or another accelerator), SHMT parallelizes execution of that code across all components at once.
SHMT uses a quality-aware work-stealing (QAWS) multithreaded scheduling policy, which is not resource-intensive yet helps maintain quality control and workload balance. The runtime system creates a set of virtual operations (VOPs) and divides each of them into one or more high-level operations (HLOPs) so that multiple hardware resources can be used simultaneously. The SHMT runtime then places these HLOPs into task queues for execution on the target hardware. Since HLOPs are hardware-independent, the runtime can direct a task to one component of the computing platform or another as needed.
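The scheduling idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the researchers' implementation: the `Device` and `HLOP` classes, the `exact` and `critical` flags, the `speed` values, and the rule that a lower-precision device may not take a quality-critical HLOP are all hypothetical stand-ins for the real quality-aware policy.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    exact: bool   # hypothetical: True if the device computes at full precision
    speed: int    # hypothetical: HLOPs the device can finish per simulated tick


@dataclass(frozen=True)
class HLOP:
    index: int      # position of this chunk inside the parent VOP
    data: tuple     # the slice of input this chunk works on
    critical: bool  # hypothetical: must run on a full-precision device


def split_vop(values, chunk_size, is_critical):
    """Split one VOP's input into hardware-independent HLOPs."""
    chunks = [tuple(values[i:i + chunk_size])
              for i in range(0, len(values), chunk_size)]
    return [HLOP(i, c, is_critical(c)) for i, c in enumerate(chunks)]


def take(device, queues):
    """Pop from the device's own queue, or steal an allowed HLOP elsewhere."""
    own = queues[device.name]
    if own:
        return own.popleft()
    victims = [q for name, q in queues.items() if name != device.name]
    for victim in sorted(victims, key=len, reverse=True):  # busiest first
        for hlop in reversed(victim):                      # steal from the tail
            if device.exact or not hlop.critical:
                victim.remove(hlop)
                return hlop
    return None


def run(hlops, devices, kernel):
    """Simulate execution; returns (ordered results, HLOP index -> device)."""
    exact = [d for d in devices if d.exact]
    if not exact and any(h.critical for h in hlops):
        raise ValueError("critical HLOPs need at least one exact device")
    queues = {d.name: deque() for d in devices}
    for k, hlop in enumerate(hlops):  # initial round-robin placement
        pool = exact if hlop.critical else devices
        queues[pool[k % len(pool)].name].append(hlop)

    results, executed_on = {}, {}
    while any(queues.values()):
        for device in devices:
            for _ in range(device.speed):
                hlop = take(device, queues)
                if hlop is None:
                    break
                results[hlop.index] = kernel(hlop.data)
                executed_on[hlop.index] = device.name
    ordered = [x for i in sorted(results) for x in results[i]]
    return ordered, executed_on


# Example: square twelve numbers on three simulated devices.
values = list(range(12))
devices = [Device("cpu", True, 1), Device("gpu", True, 2), Device("tpu", False, 3)]
hlops = split_vop(values, 2, is_critical=lambda chunk: max(chunk) >= 10)
ordered, executed_on = run(hlops, devices, lambda chunk: [x * x for x in chunk])
print(ordered)
print(executed_on)
```

In this toy run a device that empties its own queue takes work from the tail of the busiest other queue, which is the essence of work stealing. The `critical` check stands in for the quality-aware part: the simulated lower-precision device never receives the chunk marked as critical.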
Comparison of traditional, modern heterogeneous and SHMT parallelization methods
What is especially valuable is that the researchers demonstrated the effectiveness of the new software libraries on a test platform they built themselves. The result is a hybrid that can be regarded as a smartphone, a computer, and even a server. Built around a board with a PCIe connector, the “computer” combines an NVIDIA Jetson Nano module, which has a quad-core ARM Cortex-A57 processor (CPU) and a 128-core Maxwell-architecture graphics processor (GPU), with a Google Edge TPU accelerator connected via the board's M.2 Key E slot.
SHMT speedup depending on the selected policy
The system's main memory, where shared data is stored, is 4 GB of LPDDR4 running at 1600 MHz with a bandwidth of 25.6 GB/s. The Edge TPU module also has 8 MB of its own memory. Ubuntu Linux 18.04 was used as the operating system.
Running a benchmark suite on this improvised heterogeneous platform using standard testing practices showed that SHMT with the most efficient QAWS policy delivered a 1.95x increase in computation speed and a 51% reduction in energy consumption compared with the baseline method of allocating computation. If the approach is scaled up for use in a data center, the gains promise to be huge, and all the hardware will remain the same, with nothing needing to be replaced. The proposed solution is not yet ready for deployment, but it will probably find interested parties easily.
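As a quick check on what the two reported figures imply together, the arithmetic can be worked through as below. This is a rough sketch that assumes both numbers refer to the same workloads and the same baseline; the variable names are illustrative only.

```python
speedup = 1.95           # reported speedup over the baseline
energy_reduction = 0.51  # reported reduction in energy consumption

time_ratio = 1 / speedup                 # runtime relative to the baseline
energy_ratio = 1 - energy_reduction      # energy relative to the baseline
power_ratio = energy_ratio / time_ratio  # average power relative to the baseline

print(f"runtime: {time_ratio:.3f} of baseline")
print(f"energy:  {energy_ratio:.3f} of baseline")
print(f"power:   {power_ratio:.3f} of baseline")
```

Under that assumption, average power draw stays close to the baseline level, so most of the energy saving would come from finishing the work in about half the time rather than from drawing less power while running.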