What is "i8mm"?
The i8mm feature, short for "Integer 8-bit Matrix Multiplication," is an extension to the Arm CPU instruction set designed to accelerate machine learning operations. It has become increasingly important as smartphones and tablets incorporate more AI-driven features, from image processing to natural language understanding.
At its core, i8mm allows mobile CPUs to perform matrix multiplication on 8-bit integer values more efficiently. Matrix multiplication is a fundamental operation in many machine learning algorithms, especially in neural networks. An 8-bit integer takes a quarter of the space of the more common 32-bit floating-point number, so running these calculations on 8-bit integers can reduce power consumption and increase processing speed for AI tasks.
This feature is a great fit for LLMs, because running an LLM is mostly matrix multiplication at heart!
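To make the idea concrete, here is a minimal sketch of 8-bit integer matrix multiplication in plain Python. It only illustrates the arithmetic, not the hardware instructions themselves: floats are mapped to 8-bit integers with a scale factor, the products are summed as integers, and the result is scaled back at the end. The matrices, the scale values, and the function names are made up for illustration.

```python
def quantize(values, scale):
    """Map floats to signed 8-bit integers using a single scale factor."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_matmul(a, b):
    """Multiply two integer matrices, summing each dot product as an integer."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 2x2 example values and scale factors.
a_float = [[0.5, -1.0], [0.25, 0.75]]
b_float = [[1.0, 0.5], [-0.5, 0.25]]
scale_a, scale_b = 0.01, 0.01

a_q = [quantize(row, scale_a) for row in a_float]
b_q = [quantize(row, scale_b) for row in b_float]

# All multiply-adds happen on integers; floats only reappear when rescaling.
acc = int8_matmul(a_q, b_q)
result = [[v * scale_a * scale_b for v in row] for row in acc]
print(result)
```

The integer sums can grow well beyond the 8-bit range, which is why the accumulator is kept in a wider integer type than the inputs.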
Layla's support for i8mm
There is a special model quant that takes advantage of i8mm support in your hardware. It is called Q4_0_4_8.
By using this quant, you may get a 2x or more speed-up in your character responses! On the latest hardware, it could be even more.
All models quanted in the l3utterfly repository on HuggingFace provide Q4_0_4_8 quants: https://huggingface.co/l3utterfly
You are looking for the file with Q4_0_4_8 in its name.
(A quick note: the other two "special" quants Q4_0_4_4 and Q4_0_8_8 are for other hardware architectures and are not used currently).
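If you have a list of the files in a model repository, picking out the right one can be sketched as below. This assumes the quant name appears in the file name and that the files use the .gguf extension; the file names and the helper `find_i8mm_quant` are hypothetical and only for illustration.

```python
def find_i8mm_quant(file_names, tag="Q4_0_4_8"):
    """Return the GGUF file names that contain the given quant tag."""
    matches = []
    for name in file_names:
        lowered = name.lower()
        if lowered.endswith(".gguf") and tag.lower() in lowered:
            matches.append(name)
    return matches


# Hypothetical file listing; real repositories will use different names.
files = [
    "example-model.Q4_0_4_4.gguf",
    "example-model.Q4_0_4_8.gguf",
    "example-model.Q4_0_8_8.gguf",
    "example-model.Q4_K_M.gguf",
]
print(find_i8mm_quant(files))
```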
Once you've downloaded the special quant, you can load it as a custom model in Layla: https://www.layla-network.ai/post/what-are-gguf-models-what-are-model-quants
Does my phone support i8mm?
The next question is whether your hardware supports it. Modern flagship phones (such as the S24 Ultra or the latest Pixel Pro) should all support it.
To check if your phone supports it, you first need to find out what your chipset is. You can look up your phone on a website called GSMArena. For example: https://www.gsmarena.com/samsung_galaxy_s23_ultra-12024.php
Scroll down to the Platform section and note the chipset listed there.
Next, you need to check if your chipset supports the i8mm instructions. You can look it up here: https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html
Look for your chipset name in the left column, and then look to see if the "i8mm" column shows YES or NO.
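As an alternative to the table lookup, the check can sometimes be done on the device itself. The sketch below is not something Layla does or requires. It assumes you can read the output of /proc/cpuinfo (for example from a terminal app or over adb) and that the kernel lists an "i8mm" flag on its "Features" line; some devices may not report it this way. The sample text and function names are hypothetical.

```python
def cpu_features(cpuinfo_text):
    """Collect the flags listed on every 'Features' line of a cpuinfo dump."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "features":
            features.update(value.split())
    return features


def supports_i8mm(cpuinfo_text):
    """Return True if the 'i8mm' flag appears among the reported CPU features."""
    return "i8mm" in cpu_features(cpuinfo_text)


# Hypothetical, shortened sample of what /proc/cpuinfo might contain.
sample = """processor : 0
Features : fp asimd aes crc32 atomics asimddp i8mm bf16
"""
print(supports_i8mm(sample))
```

If the flag is missing from the output, treat the phone as not supporting i8mm unless the chipset table says otherwise.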
IMPORTANT: do not try to load the Q4_0_4_8 quant if your phone does not support i8mm. Layla will crash.