Floating-Point Arithmetic for AI Inference

by Anna Commander
April 11, 2023

NORTHAMPTON, MA / ACCESSWIRE / April 11, 2023 / Qualcomm:

OnQ Blog

Artificial intelligence (AI) has become pervasive in our lives, improving our phones, cars, homes, medical centers, and more. As currently structured, these models primarily run in power-hungry, network-dependent data centers. Running AI on edge devices such as smartphones and PCs would improve reliability, latency, privacy, network bandwidth usage, and overall cost.

To move AI workloads to devices, we need to make neural networks considerably more efficient. Qualcomm has been investing heavily in the tools to do so, most recently showcasing the world’s first Stable Diffusion model on an Android phone. Bringing models like GPT, with its hundreds of billions of parameters, to devices will require even more work.

The Qualcomm AI Research team has been advancing deep learning model efficiency for years, with state-of-the-art results in neural architecture search, compilation, conditional compute, and quantization. Quantization, which reduces the number of bits needed to represent information, is particularly important because it offers the largest effective reduction of weights and activations, improving power efficiency and performance while maintaining accuracy. It also helps enable use cases that run multiple AI models concurrently, which matters for industries such as mobile, XR, automotive, and more.
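
To make the idea concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in Python; it illustrates the general technique, not Qualcomm's implementation. A float tensor is mapped onto 8-bit integers through a single scale factor, shrinking storage and compute while approximately preserving the values:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats onto [-127, 127]."""
    scale = np.abs(x).max() / 127.0                  # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale              # recover approximate floats

x = np.random.randn(5).astype(np.float32)
q, scale = quantize_int8(x)
print(x)
print(dequantize(q, scale))                          # close to x, up to rounding error
```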

Recently, a new 8-bit floating-point format (FP8) has been proposed for efficient deep-learning training. Because some layers of a neural network can be trained in FP8 rather than the incumbent FP16 and FP32 formats, FP8 could improve training efficiency tremendously. For inference, however, integer formats such as INT4 and INT8 have traditionally been used, as they offer a strong trade-off between network accuracy and efficiency.

We investigate the differences between the FP8 and INT8 formats for efficient inference and conclude that the integer format is superior from a cost and performance perspective. We have also open-sourced the code for our investigation for transparency.

Differences between floating-point and integer quantization

Our whitepaper compares the efficiency of floating-point and integer quantization. For training, the floating-point formats FP16 and FP32 are commonly used: they are accurate enough, have no hyper-parameters, and mostly work out of the box, making them easy to use.

Reducing the number of bits greatly improves the efficiency of networks, but the ease-of-use advantage disappears. For formats like INT8 and FP8, you have to set hyper-parameters for the representable range of the distributions. To recover the original network accuracy, you also have to spend extra time quantizing these networks, either with a few simple steps known as post-training quantization (PTQ), or by training the network in quantized form altogether, known as quantization-aware training (QAT).
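
To illustrate what such a range hyper-parameter looks like, a simple PTQ-style calibration might derive the INT8 scale from a percentile of observed activation magnitudes rather than the raw maximum, trading a little clipping error on outliers for finer resolution elsewhere. This is a sketch with assumed names and synthetic data, not the whitepaper's method:

```python
import numpy as np

def calibrate_int8_scale(calibration_batches, percentile=99.9):
    """Choose the representable range for symmetric INT8 from calibration data."""
    magnitudes = np.abs(np.concatenate([b.ravel() for b in calibration_batches]))
    bound = np.percentile(magnitudes, percentile)    # clip rare outliers
    return bound / 127.0

# Hypothetical calibration activations:
batches = [np.random.randn(1024).astype(np.float32) for _ in range(8)]
print(f"INT8 scale: {calibrate_int8_scale(batches):.5f}")
```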

Given that most training in the industry is currently conducted with entire networks in FP32, or sometimes FP16 with mixed precision, the step to having some parts of a network run in FP8 is an appealing potential speed-up for the costly and time-intensive training procedures in deep learning. This topic has gained quite some traction lately, so we set out to find out what this development means for efficient inference on edge devices. Specifically, we look at both the hardware considerations for the formats and the effect of the chosen formats on neural network accuracy.

Our whitepaper shows that a hardware implementation of the FP8 format is between 50% and 180% less efficient than INT8 in terms of chip area and energy usage, because of the additional logic needed to accumulate FP formats compared with integer formats. The range is broad because actual efficiency depends on many hardware design choices that vary greatly. Microsoft and Meta recently reached a similar conclusion: floating-point arithmetic is simply much less efficient than integer arithmetic.

This means that FP8 will have to be significantly more accurate than INT8 to be worthwhile from a hardware-efficiency perspective.

Quantization-aware training (QAT) results


Quantization-aware training is the quantization scenario closest to how a format like FP8 would be used in practice: you train with the format while optimizing your neural network. We show the QAT results below for the different tested formats. All quantized networks come close to their original floating-point performance, and in most cases we even see an improvement over the FP32 baseline. The reason is simply that training these models for longer generally improves results, an effect we would also see by training longer in FP32.
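
The core trick that makes QAT work is to simulate quantization in the forward pass while letting gradients flow through the non-differentiable rounding step, commonly called the straight-through estimator. Below is a minimal PyTorch sketch of this mechanism; it is illustrative, not the training setup behind the results above:

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Simulate INT8 quantization in the forward pass; pass gradients
    straight through the rounding in the backward pass."""

    @staticmethod
    def forward(ctx, x, scale):
        return torch.clamp(torch.round(x / scale), -127, 127) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None   # identity gradient for x, none for scale

x = torch.randn(8, requires_grad=True)
y = FakeQuantSTE.apply(x, torch.tensor(0.05))
y.sum().backward()
print(x.grad)                      # all ones: the rounding did not block gradients
```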

The results are quite clear: INT8 tends to perform better than the other formats for most types of networks. Only for transformers does FP8 perform better, and in the paper we delve deeper into transformers and show that this difference is easily mitigated. The conclusion is simple: there is no a priori reason to believe that the FP8 format is more accurate for neural networks. In some cases, even going as low as 4-bit weights with the W4A8 format (the rightmost column of Figure 2) yields accuracy comparable to FP8.

Can we convert FP8 to INT8 with good accuracy?

Since the FP8 data format has some benefits for training, we also investigated what happens when networks trained in FP8-E4 (an FP8 format with 4 exponent bits) are converted naively to INT8 for inference. We found that INT8 can represent roughly 90% of the range covered by the FP8-E4 format without any quantization error; only the remaining 10% of the range, close to zero, incurs a small error.
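
This kind of claim can be checked by brute force: enumerate every positive FP8-E4M3 value (1 sign, 4 exponent, 3 mantissa bits) and test which ones an INT8 grid reproduces exactly. The scale of 4 below is an assumed power-of-two choice that maps the maximum FP8 value, 448, to the integer 112:

```python
import numpy as np

def fp8_e4m3_values():
    """All positive finite FP8-E4M3 values (4 exponent bits, 3 mantissa bits)."""
    vals = [2.0**-6 * (m / 8) for m in range(1, 8)]    # subnormals
    for e in range(1, 16):                              # normal exponents
        for m in range(8):
            v = 2.0**(e - 7) * (1 + m / 8)
            if v <= 448:                                # 448 is the largest finite value
                vals.append(v)
    return np.array(sorted(vals))

vals = fp8_e4m3_values()
scale = 4.0                                             # assumed INT8 scale: 448 -> 112
q = np.round(vals / scale)
exact = q * scale == vals                               # powers of two, so == is safe
print(f"largest value with rounding error: {vals[~exact].max():g} (max = {vals.max():g})")
# The inexact values all sit near zero; the upper part of the range converts exactly.
```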

The general conclusion is that for networks that were originally easy to quantize from FP32 to INT8, the conversion is expected to be smooth and can in several cases be done directly.

For networks that were already problematic to convert from FP32 to INT8 with simple PTQ techniques, mostly networks with significant outliers, similar issues arise when converting from FP8 to INT8. However, because these networks were trained to cope with the reduced precision of the FP8 format, their conversion to INT8 yields better results than a direct INT8 conversion from FP32. Moreover, INT8 QAT can be employed to recover even more accuracy in such cases.

The path towards better AI inference on device

Overall, integer quantization remains the way to do efficient AI inference. With varying levels of effort, you can achieve significant efficiency benefits without sacrificing much accuracy.

For optimizing networks even further, QAT can push networks into the W4A8 (4-bit weight, 8-bit activation) regime, which is achievable for a wide range of networks. Transformer-based large language models such as GPT, Bloom, and Llama benefit greatly from the jump from 8-bit to 4-bit weights, as they are weight-bound. Several works have shown that 4-bit weights are not only feasible for large language models but also optimal, and achievable even in the PTQ setting. This is an efficiency boost that currently does not exist in the floating-point world.
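
As a rough illustration of the weight half of W4A8, a per-output-channel symmetric 4-bit quantizer can be sketched as follows; the shapes and function names are assumptions for the example, not AIMET's API:

```python
import numpy as np

def quantize_w4(weights: np.ndarray):
    """Per-output-channel symmetric 4-bit weight quantization: levels in [-7, 7]."""
    scale = np.abs(weights).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)  # packed 2-per-byte in real kernels
    return q, scale

W = np.random.randn(4, 16).astype(np.float32)   # hypothetical (out_channels, in_features) weights
q, scale = quantize_w4(W)
print(f"max reconstruction error: {np.abs(q * scale - W).max():.3f}")
```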

To sum it all up, the floating-point format FP8-E4 is not a replacement for INT8 in terms of performance and accuracy; in most cases it performs worse. Only in some very specific scenarios, where layers have significant outliers, can the floating-point format be more accurate. We are confident that our proposed solutions will lead to a better and more seamless implementation of large AI models on edge devices. To this end, the Qualcomm Innovation Center has open-sourced the AI Model Efficiency Toolkit (AIMET), which allows developers to quantize their models more easily and implement AI on device more efficiently.

View additional multimedia and more ESG storytelling from Qualcomm on 3blmedia.com.

Contact Info:

Spokesperson: Qualcomm
Website: https://www.3blmedia.com/profiles/qualcomm
Email: [email protected]

SOURCE: Qualcomm
