The inference engine behind Layla is "llama.cpp" (https://github.com/ggerganov/llama.cpp), a very popular open-source inference engine that allows running LLMs on mobile devices.
This inference engine supports a specific file format called GGUF. A GGUF file contains the AI model that powers all features in Layla.
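To get a feel for what the engine does with a GGUF file, here is a minimal desktop sketch using the llama-cpp-python bindings (a separate community project that wraps llama.cpp; not part of Layla itself). The model filename is hypothetical; substitute whichever GGUF you downloaded.

```python
# Minimal sketch with the llama-cpp-python bindings (assumed installed
# via `pip install llama-cpp-python`); the filename is hypothetical.
from llama_cpp import Llama

# Load a GGUF model from disk; n_ctx sets the context window size.
llm = Llama(model_path="llama-3-Stheno-Mahou-8B-Q4_K_M.gguf", n_ctx=2048)

# Run a single completion against the loaded model.
out = llm("Hello! Introduce yourself in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```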
The pre-built models in Layla that you downloaded when you first opened the app are good, but the true power of Layla comes from being able to load any AI model you wish. This can include uncensored ones, professional ones, roleplay ones, or any others created by the local AI community.
You can find all GGUF models that Layla supports here: https://huggingface.co/l3utterfly
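If you prefer to browse programmatically, a hedged sketch using the huggingface_hub library (assumed installed) lists the model repositories under that account:

```python
# List the model repositories under the l3utterfly Hugging Face account.
# Assumes `pip install huggingface_hub`.
from huggingface_hub import list_models

for model in list_models(author="l3utterfly"):
    print(model.id)
```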
How to load a custom GGUF into Layla
1. Choose a model that you like. In this example, we will use the popular Stheno-Mahou: https://huggingface.co/l3utterfly/llama-3-Stheno-Mahou-8B-gguf
2. Click the "Files and versions" tab
3. You will see a list of files (models) that you can download
Each filename is annotated with a QXX "quant" label, for example "Q2_K". These indicate the quantisation level of the model.
You will notice that the higher the quant (the bigger the number after the "Q"), the larger the file size. The larger the file, the higher quality the responses from the AI will be; however, you will also need better hardware to run it.
As a general rule of thumb, I suggest starting with the Q4_K model. If it feels fast enough, you can try going up to Q6 or Q8; if it feels too slow, go down to Q2. (The sketch below gives a rough idea of how quant level maps to file size.)
There are three "special" quants: "Q4_0_4_4", "Q4_0_4_8", and "Q4_0_8_8". These are special quants optimised for the latest mobile hardware. To learn more about them and whether your device can use them, read here: https://www.layla-network.ai/post/layla-supports-i8mm-hardware-for-running-llm-models
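To make the size/quality trade-off concrete, this short sketch estimates file sizes for an 8B-parameter model using the arithmetic size ≈ parameters × bits-per-weight / 8. The bits-per-weight figures are rough approximations (real GGUF files vary by quant scheme), not exact values.

```python
# Rough file-size arithmetic: size in bytes ~= parameters * bits-per-weight / 8.
# The bits-per-weight values below are approximations, not exact figures.
N_PARAMS = 8e9  # an 8B-parameter model, like the example above

APPROX_BPW = {"Q2_K": 2.6, "Q4_K": 4.8, "Q6_K": 6.6, "Q8_0": 8.5}

for quant, bpw in APPROX_BPW.items():
    size_gb = N_PARAMS * bpw / 8 / 1e9
    print(f"{quant}: ~{size_gb:.1f} GB")
```

As the numbers suggest, Q8 of an 8B model is roughly twice the size of Q4, which is why the lower quants are the practical choice on most phones.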
4. Download the model by clicking the little download arrow
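If you are downloading on a computer first, a hedged alternative to clicking the download arrow is the huggingface_hub library; the .gguf filename below is hypothetical, so copy the real one from the "Files and versions" tab.

```python
# Download one GGUF file from the example repository.
# Assumes `pip install huggingface_hub`; the filename is hypothetical.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="l3utterfly/llama-3-Stheno-Mahou-8B-gguf",
    filename="llama-3-Stheno-Mahou-8B-Q4_K_M.gguf",  # copy the real filename
    local_dir=".",
)
print(f"Saved to: {path}")
```

You can then transfer the file to your phone before continuing with the steps below.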
5. Once your model is downloaded somewhere on your phone, load up Layla and go to the Inference Settings
6. Choose "Add a custom model"
7. Choose "Local file"
8. In the file picker, choose the model file you just downloaded.
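If Layla refuses to load the file, it may have been corrupted during download or transfer. One quick sanity check, sketched below assuming you can run Python wherever the file lives: every GGUF file starts with the 4-byte magic "GGUF" followed by a little-endian uint32 format version.

```python
# Verify a file has a valid GGUF header (magic bytes + format version).
import struct
import sys

def check_gguf(path: str) -> None:
    with open(path, "rb") as f:
        magic = f.read(4)  # first 4 bytes must be b"GGUF"
        if magic != b"GGUF":
            print("Not a GGUF file (bad magic)")
            return
        (version,) = struct.unpack("<I", f.read(4))  # little-endian uint32
        print(f"Valid GGUF header, format version {version}")

check_gguf(sys.argv[1])  # usage: python check_gguf.py model.gguf
```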
9. Lastly, make sure your prompt format matches the model that you chose!
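For reference, the example model in this guide is Llama-3 based, and Llama 3 instruct models expect the template sketched below; other model families use different templates, so always check the model card for the one you downloaded.

```python
# Sketch of the Llama 3 instruct prompt template. Other families
# (ChatML, Alpaca, etc.) use different special tokens.
def llama3_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(llama3_prompt("You are a helpful assistant.", "Hello!"))
```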