Layla v6.6.0 has been published
- Layla

- Feb 25
A new update for Layla has just been published! This release brings an exciting new experimental feature for users with MediaTek or Google's Tensor chips, alongside numerous quality-of-life improvements and bug fixes to enhance your offline AI experience.

Experimental Support for LiteRT-LM models
We are testing out a brand new engine! LiteRT-LM models run on LiteRT, an inference engine by Google: https://github.com/google-ai-edge/LiteRT-LM
LiteRT-LM (.litertlm) models can be added in the Inference Settings exactly the same way as a GGUF model. Layla is smart enough to detect the format and will use the LiteRT-LM engine automatically.
Please note: LiteRT support is currently experimental. Most advanced features are not implemented yet, but we are actively working on it!
Gemma 3n E2B: https://huggingface.co/google/gemma-3n-E2B-it-litert-lm/resolve/main/gemma-3n-E2B-it-int4.litertlm
Gemma 3n E4B: https://huggingface.co/google/gemma-3n-E4B-it-litert-lm/resolve/main/gemma-3n-E4B-it-int4.litertlm?download=true
Qwen 2.5 1.5B: https://huggingface.co/litert-community/Qwen2.5-1.5B-Instruct/resolve/main/Qwen2.5-1.5B-Instruct_multi-prefill-seq_q8_ekv4096.litertlm
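For the curious, here is a rough idea of what format detection like this can look like. This is only a minimal sketch, not Layla's actual code: the function name `pick_engine` and the engine labels are made up for illustration, and it assumes the file extension is the main signal, with the GGUF header bytes as a fallback.

```python
from pathlib import Path

# GGUF files begin with these four ASCII bytes.
GGUF_MAGIC = b"GGUF"


def pick_engine(model_path: str) -> str:
    """Return a (hypothetical) engine label for the given model file."""
    path = Path(model_path)
    suffix = path.suffix.lower()

    # Assumption: the extension is enough to tell the two formats apart.
    if suffix == ".litertlm":
        return "litert-lm"
    if suffix == ".gguf":
        return "gguf"

    # Fallback: sniff the header when the extension is missing or unusual.
    if path.is_file():
        with path.open("rb") as f:
            if f.read(4) == GGUF_MAGIC:
                return "gguf"

    raise ValueError(f"Unrecognised model format: {model_path}")
```

With this sketch, `pick_engine("gemma-3n-E2B-it-int4.litertlm")` picks the LiteRT-LM label, while any `.gguf` file goes to the GGUF label.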
This update also includes numerous other improvements and bug fixes. The full changelog is below.
Full Changelog
New features:
adds experimental support for LiteRT-LM models
LiteRT-LM models run on LiteRT, an inference engine by Google that is better optimised for MediaTek CPUs
LiteRT-LM (.litertlm) models can be added in the Inference Settings the same way as a GGUF model; Layla will use the LiteRT-LM engine automatically
LiteRT support is experimental; most features are not implemented yet
Improvements:
improved the stability of new model downloads in the welcome screen and of stable diffusion model downloads
added TTS setting for a global default voice
added the ability to give your character/LLM a custom instruction during chatting in Chat Actions
added UI setting to control text area expansion in voice chat
<think> content is now removed when the chat is reloaded (a compromise: removing <think> content on every message would cause a reload each time, which is too slow for mobile)
added a "copy" character button for user-created characters
TTS now skips speaking unpronounceable characters (such as ASCII art, tables etc.)
added a button in Diagnostics popup that force restarts Layla
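To illustrate the <think> change above, here is a minimal sketch of stripping reasoning blocks only when a chat is reloaded. It is not Layla's implementation: the names `strip_think` and `reload_chat` are hypothetical, and it assumes messages are plain strings with well-formed <think>...</think> pairs.

```python
import re

# Matches a complete <think>...</think> block plus any whitespace after it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_think(message: str) -> str:
    """Remove every closed <think> block from a single message."""
    return THINK_BLOCK.sub("", message).strip()


def reload_chat(messages: list[str]) -> list[str]:
    """Clean the whole history once, at reload time, not on every message."""
    return [strip_think(m) for m in messages]
```

Doing the cleanup once per reload keeps the per-message path cheap, which is the trade-off the changelog entry describes.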
Bug fixes:
fixed a bug where Dreams kept scheduling messages for the same character more than once
fixed a bug where blockquotes were not being rendered correctly
fixed a bug where the Layla character could not be duplicated
fixed a bug where different model options in Inference Settings sometimes could not be scrolled
fixed a bug where "Speak Responses" chat actions were not working
fixed a bug where searching chat history did not search through all chat histories
fixed a bug where GPT-OSS model was not generating responses
fixed a bug where cancelling voice chat before it initialised made the UI dark cover disappear
fixed a bug where edited memories were still using old embeddings, causing recalled content to differ from queried content
fixed a bug where reloading chat sometimes did not read from the cached session and instead loaded everything from scratch
fixed prompt format for GPT-OSS
fixed a bug where Layla as your phone's default assistant was not working properly
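As a footnote to the memory embeddings fix above, the general idea is that a memory's text and its embedding have to be updated together. The sketch below is illustrative only and is not taken from Layla: `toy_embed` is a bag-of-words stand-in for a real embedding model, and `Memory` and `score` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count."""
    return Counter(text.lower().split())


@dataclass
class Memory:
    text: str
    embedding: Counter = field(init=False)

    def __post_init__(self) -> None:
        self.embedding = toy_embed(self.text)

    def edit(self, new_text: str) -> None:
        # Update text and embedding together so recall never uses a stale vector.
        self.text = new_text
        self.embedding = toy_embed(new_text)


def score(query: str, memory: Memory) -> int:
    """Count the words shared by the query and the stored embedding."""
    return sum((toy_embed(query) & memory.embedding).values())
```

If `edit` changed only `text`, a query for the old wording would still match the memory, which is the kind of mismatch the fix describes.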


