Layla v6.9.0 has been published
- Layla
- 2 hours ago
- 2 min read
Introducing "Companion Mode" in Layla. (Note: this feature is Android-only.)
Layla's companion mode allows you to select a character to run alongside you as you use your phone! Your companion will appear on top of other apps.
Your companion runs your selected LLM and voice option, together with a vision model capable of looking at your phone screen. They can even play games with you!
Tapping your companion triggers a screenshot; your companion then listens to your voice and looks at the screenshot to give you tips, ideas, or simply to chit-chat! Companions support animated Live2D models, so they react to what they see.
Layla's Companion Mode supports all Inference Settings in Layla: GGUF, LiteRT, connecting to your PC, or Cloud.
Enabling Companion Mode
To enable companion mode, you can install the "Companion" mini-app in Layla:

Once the mini-app is installed, go into it and you will be able to choose a companion!
A companion can be any character you create, or one of the presets that come with the app.

Note: it is recommended that you create a dedicated character with a custom system prompt telling the LLM it is "viewing the phone with you". This gives a better experience.
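For illustration, a companion's custom system prompt might look something like the sketch below. The wording is entirely hypothetical, not a built-in preset; adapt it to your character:

```
You are Aria, a friendly companion who is viewing the phone screen
together with the user. When you receive a screenshot, describe or
comment on what you see, offer tips or ideas when helpful, and
otherwise keep the conversation light and casual.
```

Keeping the prompt focused on the "viewing the screen with you" framing helps the vision model's screenshot descriptions blend naturally into the conversation.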
When prompted, grant Layla permission to record your screen and audio. The companion works by sending a screenshot of what you see each time you tap your companion to chat.
Full Changelog
Improvements:
- advanced settings now allow adjusting LLM and VLM hardware acceleration separately, so you can run the LLM on the CPU and the VLM on the GPU
- CPU/GPU selection can now be adjusted for LiteRT models
- context length settings are now applied to LiteRT models
- supports Gemma-4 tool-calling agents in both GGUF and LiteRT
- additional llama.cpp command-line arguments can be set in Advanced Settings
- additional llama.cpp command-line arguments can be saved as Advanced Settings presets
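As an example, the extra arguments field could hold standard llama.cpp flags like the ones below. The flag names are regular llama.cpp options; the specific values are illustrative only, and which flags Layla actually forwards is up to the app:

```shell
# Hypothetical extra llama.cpp arguments (standard llama.cpp flags):
--ctx-size 8192       # request a larger context window
--n-gpu-layers 20     # offload 20 model layers to the GPU
--threads 6           # limit CPU inference to 6 threads
```

Saving a set of flags like this as an Advanced Settings preset lets you switch quickly between, say, a battery-friendly CPU profile and a GPU-offload profile.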
Bug fixes:
- fixed a few "Failed to eval" bugs with long-context usage
- fixed a few UI issues that occur when the screen is in landscape orientation
- fixed a bug where viewing an overly long memory would soft-lock the app with no way to exit the chat
- fixed a bug where OpenCL crashed on older GPUs
- long-term memory (LTM) now removes think tags from Gemma 4 output when it is used as the summariser