The remarkable improvement in computing performance over the last ten years has dramatically accelerated Neural Network (NN) training, opening the door to applying deep learning techniques to real-life situations. The push to improve existing artificial intelligence applications has turned machine learning practitioners' attention toward custom hardware.
Artificial Intelligence and Data Science are widely regarded as among the most significant advances in technology. A recent IDC report estimates that almost $37.5 billion is being funneled into the financial sector alone, a figure expected to rise to $100 billion by 2023.
Despite these significant investments, most companies in this niche visibly struggle to achieve proportional value in terms of output. The process is time-intensive, especially the work of processing and organizing high-quality data for artificial intelligence learning. Selecting, organizing, and reviewing data is vital to the learning phase of AI, and it is arguably the single most important step if exceptional results are to be obtained.
Before exploring the greatest hindrances to the learning phase of AI from a technical perspective, we must first establish what AI learning is.
AI learning is the “mathematical” process by which a system learns from experience (data), adapting the structure of a Neural Network in order to mimic the history, extract fundamental patterns, and predict future behaviours by extrapolating those patterns.
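This learning-from-experience loop can be sketched in a few lines. The example below is a minimal illustration, not any specific system: a single linear neuron adjusts its parameters by gradient descent so its predictions mimic a hidden pattern in the data (here, y = 2x + 1).

```python
import numpy as np

# Toy "experience": inputs x and targets y drawn from a hidden pattern y = 2x + 1.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(100, 1))
y = 2.0 * x + 1.0

# A single linear "neuron": prediction = x @ w + b.
w, b = np.zeros((1, 1)), 0.0
lr = 0.1  # learning rate

for _ in range(200):
    pred = x @ w + b
    err = pred - y
    # Gradient of the mean squared error with respect to w and b.
    grad_w = 2.0 * x.T @ err / len(x)
    grad_b = 2.0 * err.mean()
    # Adapt the parameters to better mimic the history (the data).
    w -= lr * grad_w
    b -= lr * grad_b

print(w.item(), b)  # approaches 2.0 and 1.0
```

The same adapt-from-error loop, scaled up to millions of parameters and examples, is what makes deep neural network training so hungry for hardware.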
It is important to note the difference between AI learning and AI machine learning.
AI machine learning refers to the branch of artificial intelligence that makes systems capable of learning and improving automatically from experience, without explicit programming. It focuses on developing computer programs that can access data and use it to learn for themselves.
Greatest Hindrances in Artificial Intelligence Learning
Improvements in machine learning, the branch of artificial intelligence responsible for self-driving vehicles and other technological applications, have led to a new computing era referred to as the data-centric era. Engineers are being forced to revisit conventional computing concepts that have been widely accepted for the better part of 75 years, largely because older computing models use resources inefficiently.
According to Yingyan Lin, an assistant professor of electrical and computer engineering, large-scale deep neural networks, considered the state of the art in conventional machine learning, suffer from a problem in which up to 90% of the system's electrical power is consumed moving data back and forth between memory and the processor.
She and her colleagues propose two complementary methods for optimizing data-centric processing. Both were presented on June 3 at ISCA, a major conference for novel ideas and research in computer architecture.
The desire for data-centric architecture traces its origins to a challenge known as the von Neumann bottleneck, which results from the separation of processing from memory in the computer architecture that has been dominant since John von Neumann introduced it in 1945. By storing data and programs in memory, apart from the processor, von Neumann's architecture gives a single computer incredible versatility: a laptop can place video calls, run simulations of the Martian weather, or prepare a spreadsheet, depending on the program recalled from its memory.
This separation, however, also means that even the most basic addition requires the computer's processor to access memory numerous times. The memory bottleneck is made worse by the massive operations in deep neural networks, where systems teach themselves to make human-like decisions by studying vast numbers of pre-existing examples.
Larger networks can master more difficult tasks, and networks shown more examples tend to perform better, but training a deep neural network can require banks of specialized processors running continuously for more than a week. And if tasks based on the learned networks were performed on a smartphone, its battery would be exhausted in under an hour.
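A back-of-envelope calculation shows why data movement, not arithmetic, dominates. The sketch below is illustrative, assuming 4-byte values and no on-chip reuse: for a fully connected layer, it counts multiply-accumulate operations against the bytes that must be fetched from memory.

```python
# Back-of-envelope model of the von Neumann bottleneck for one
# fully-connected layer: every multiply-accumulate needs operands
# fetched from memory, so data movement scales with the weight count.
def layer_traffic(n_in, n_out, bytes_per_value=4):
    macs = n_in * n_out                            # multiply-accumulates
    weight_bytes = n_in * n_out * bytes_per_value  # weights fetched once
    act_bytes = (n_in + n_out) * bytes_per_value   # activations in and out
    return macs, weight_bytes + act_bytes

macs, bytes_moved = layer_traffic(4096, 4096)
# Arithmetic intensity: useful operations per byte fetched.
print(macs / bytes_moved)  # well below 1 op/byte: memory-bound
```

With less than one operation per byte fetched, the processor spends most of its time, and energy, waiting on memory, which is consistent with the 90% figure cited above.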
The director of Rice's Efficient and Intelligent Computing Lab believes that the era of machine learning demands innovative, data-centric hardware architecture. However, there is no single optimal hardware architecture for machine learning, because different applications require different machine learning algorithms, each with its own structure and complexity.
At ISCA 2020, Lin and her students presented TIMELY (Time-domain, In-Memory Execution, LocalitY), a processing in-memory (PIM) approach. It breaks with the von Neumann model by integrating processing into memory arrays. Resistive random access memory (ReRAM) is a promising PIM platform; it is a non-volatile memory, not unlike flash.
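The appeal of a ReRAM crossbar is that the memory itself performs the dominant operation of a neural network, the matrix-vector product. The following is a digital stand-in for that analog computation, a sketch rather than a hardware model: weights are stored as cell conductances, inputs arrive as row voltages, and each column wire sums the per-cell currents.

```python
import numpy as np

# Digital stand-in for an analog ReRAM crossbar: weights are stored as
# cell conductances G, inputs arrive as row voltages v, and each column
# wire sums the per-cell currents i = v * g (Ohm's and Kirchhoff's laws),
# so the column currents ARE the matrix-vector product, computed in
# place with no per-operand trip back to main memory.
rng = np.random.default_rng(1)
G = rng.uniform(0.0, 1.0, size=(4, 3))   # conductances (the weight matrix)
v = rng.uniform(0.0, 1.0, size=4)        # input voltages

column_currents = v @ G                  # what the crossbar reads out

# Sanity check against an explicit sum over individual cells.
expected = [sum(v[r] * G[r, c] for r in range(4)) for c in range(3)]
print(np.allclose(column_currents, expected))  # True
```

In real hardware the sum happens in the analog domain in a single step, which is where PIM gets its efficiency, and also where the conversion overheads discussed next come from.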
Experiments carried out on more than ten deep neural network models showed TIMELY to be 18 times more energy-efficient than the best prior ReRAM PIM accelerator, while exceeding its computational density by a factor of more than 30. TIMELY meets these remarkable performance standards by eliminating the main sources of inefficiency: recurrent accesses to main memory to handle intermediate inputs and outputs, and the interfacing between local and main memories.
Data is stored in digital format in main memory but must be converted to analog when it enters local memory for in-memory processing. In previous ReRAM PIM accelerators, the resulting values are then converted from analog back to digital and returned to main memory.
TIMELY eliminates the overhead of these unnecessary main-memory accesses and interface data conversions by using analog buffers within the local memory. Keeping the needed data in local memory arrays improves efficiency significantly.
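The saving can be illustrated by simply counting conversion events. This toy model is an assumption-laden sketch of the idea, not a measurement: a conventional PIM pipeline converts each layer's result to digital and back to analog for the next layer, while an analog-buffered design converts only at the boundaries of the whole chain.

```python
# Toy count of analog/digital conversion events for a chain of
# crossbar layers. Conventional ReRAM PIM converts around every layer;
# an analog-buffered design (the TIMELY idea) converts only at the
# chain's boundaries. These are per-vector event counts, not measured
# hardware costs.
def conversions(num_layers, analog_buffers):
    if analog_buffers:
        return 2                  # one DAC going in, one ADC coming out
    return 2 * num_layers         # DAC + ADC around every layer

print(conversions(10, False), conversions(10, True))  # 20 2
```

For a ten-layer chain, the buffered scheme performs one tenth as many conversions, and each avoided conversion is also an avoided round trip to main memory.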
The group's second proposal at the same event was SmartExchange, a design that combines algorithmic and accelerator-hardware innovations to save energy.
Lin notes that accessing main memory (DRAM) requires about 200 times more energy than performing a computation. The main idea behind SmartExchange is to enforce structure in the algorithm that lets it trade costly memory accesses for much cheaper computation.
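One way to realize such a trade, sketched below as a toy and not as SmartExchange's actual decomposition, is to store a large weight matrix W as a small shared basis B and a sparse coefficient matrix C with W = C @ B. The accelerator then fetches far fewer values and spends a few extra multiplies rebuilding the product on the fly.

```python
import numpy as np

# Illustrative sketch of trading memory for compute: instead of storing
# a large dense weight matrix W, store a small basis B and a sparse
# coefficient matrix C with W = C @ B, and apply them on the fly.
# (This decomposition is a toy; SmartExchange's actual scheme and
# sparsity structure are more elaborate.)
rng = np.random.default_rng(2)
B = rng.normal(size=(8, 64))                  # small shared basis
C = rng.normal(size=(256, 8))
C[np.abs(C) < 1.0] = 0.0                      # sparsify the coefficients

dense_values = 256 * 64                       # values fetched storing W directly
stored_values = B.size + np.count_nonzero(C)  # basis + nonzero coefficients
print(stored_values < dense_values)           # True: far fewer values to fetch

x = rng.normal(size=64)
# Compute W @ x without ever materializing W: extra multiplies, fewer fetches.
y = C @ (B @ x)
assert np.allclose(y, (C @ B) @ x)
```

Because a DRAM access costs roughly 200 times as much energy as a computation, spending a handful of extra multiplies to avoid thousands of fetches is a profitable exchange.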
The researchers evaluated the SmartExchange algorithm together with their dedicated hardware accelerator on seven benchmark deep neural network models and three benchmark datasets. The result was up to a 19-fold latency reduction compared with state-of-the-art deep neural network accelerators.