Layla

What are GGUF models? What are model "quants"?

The inference engine behind Layla is "llama.cpp" (https://github.com/ggerganov/llama.cpp), a very popular open-source inference engine that makes it possible to run LLMs on mobile devices.


This inference engine supports a specific file format called GGUF. A GGUF file contains the AI model that powers all features in Layla.
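If you want to confirm that a file you downloaded really is a GGUF model, the short sketch below reads the start of the file. It assumes GGUF version 2 or later in little-endian byte order, where the file begins with the four bytes "GGUF" followed by a version number, a tensor count, and a metadata entry count. The function name `read_gguf_header` is made up for this example and is not part of Layla or llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_count), or None if not GGUF.

    Assumption: GGUF v2+ layout, little-endian.
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        return None
    # uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, tensor_count, metadata_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, metadata_count
```

Calling `read_gguf_header("my-model.gguf")` (a placeholder path) returns `None` for anything that does not start with the GGUF magic bytes, which is a quick way to spot a broken or mislabelled download.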


The pre-built models that you downloaded when you first opened the app are good, but the true power of Layla comes from being able to load any AI model you wish. This can include uncensored models, professional models, roleplay models, or any others created by the local AI community.


You can find all GGUF models that Layla supports here: https://huggingface.co/l3utterfly


How to load a custom GGUF into Layla

  1. Choose a model that you like. In this example, we will use the popular Stheno-Mahou: https://huggingface.co/l3utterfly/llama-3-Stheno-Mahou-8B-gguf

  2. Click the "Files and versions" tab

file and versions tab in huggingface

3. You will see a list of files (models) that you can download

list of model files in huggingface repo

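Each filename is tagged with a "quant" label, for example "Q2_K". "Quant" is short for quantization: the model's weights are stored at reduced precision to shrink the file, and the number after "Q" is roughly the number of bits used per weight.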


You will notice that the higher the quant (the bigger the number after "Q"), the larger the file. A larger file gives higher-quality responses from the AI, but it also needs more memory and faster hardware to run.


As a general rule of thumb, I suggest starting with the Q4_K model. If it feels fast enough, you can try going up to Q6 or Q8. If it feels too slow, go down to Q2.


There are three "special" quants: "Q4_0_4_4", "Q4_0_4_8", and "Q4_0_8_8". These are special models for the latest hardware. To learn more about them and whether you can use them, read here: https://www.layla-network.ai/post/layla-supports-i8mm-hardware-for-running-llm-models


4. Download the model by clicking the little download arrow

download button in huggingface repo

5. Once the model has been downloaded to your phone, open Layla and go to the Inference Settings

LLM settings in Layla

6. Choose "Add a custom model"

add custom model in Layla

7. Choose a "Local file"

pick a local file when adding custom models in Layla

8. In the file picker, choose the model file that you just downloaded.

9. Lastly, make sure your prompt format matches the model that you chose!

switch prompt formats in Layla
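To make the quant naming from step 3 more concrete, here is a small sketch that pulls the quant label out of a filename and gives a rough lower bound on the file size. It rests on several assumptions: the filename used in the usage note is made up for illustration, the digit after "Q" is treated as the number of bits per weight, and the model is assumed to have about 8 billion parameters, as the "8B" in its name suggests. Real files are somewhat larger than this estimate, because quants also store extra scaling data.

```python
import re

# Matches labels such as Q2_K, Q4_K_M, Q6_K, Q8_0 or Q4_0_4_4 just before ".gguf".
QUANT_RE = re.compile(r"(Q(\d)(?:_[0-9A-Z]+)*)\.gguf$", re.IGNORECASE)


def parse_quant(filename):
    """Return (quant_label, nominal_bits) from a filename, or None if absent."""
    match = QUANT_RE.search(filename)
    if match is None:
        return None
    return match.group(1).upper(), int(match.group(2))


def rough_size_gb(n_params, nominal_bits):
    """Lower-bound estimate: parameters times bits per weight, in gigabytes."""
    return n_params * nominal_bits / 8 / 1e9
```

For a hypothetical file named "llama-3-Stheno-Mahou-8B-Q4_K_M.gguf", `parse_quant` returns `("Q4_K_M", 4)`, and `rough_size_gb(8e9, 4)` gives about 4 GB. The same model at Q8 comes out at about 8 GB, which is why higher quants need more storage and memory.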
