Llama 2 on YunoHost

Hi there,

I was wondering, is there anyone smarter than me who has figured out how to make Llama 2 into an app on YunoHost?

I have no idea how well it could be expected to perform on a typical entry-level SSD VPS from OVH.

Thanks for the help!

Hmm, that will depend a lot on your VPS. I have tried to run the smallest model on my local machine (an old iMac), but inference time is just unusable. On the other hand, I’ve been playing with RWKV. It’s an RNN-based architecture inspired by transformers that runs really fast on CPU. I’ve started working on a Flask API server so it can run on a VPS and serve inference to web projects, but there’s still a lot of work to do. In the meantime, I’m learning to use Weaviate to store document embeddings to enhance the responses. All of that should run with LangChain.
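
To give an idea of what that Flask wrapper could look like, here is a minimal sketch: a single /generate endpoint that takes a prompt and returns a completion. The endpoint name, JSON payload shape, and the generate() stub are my own assumptions; you would plug the actual RWKV (or other CPU) inference call in there.

```python
# Minimal sketch of the kind of Flask inference wrapper described above.
# The /generate endpoint and payload shape are illustrative assumptions;
# replace the generate() placeholder with your real CPU inference call (RWKV, etc.).
from flask import Flask, request, jsonify

app = Flask(__name__)

def generate(prompt: str) -> str:
    # Placeholder: swap in the actual model call here.
    return f"(placeholder completion for: {prompt[:60]})"

@app.route("/generate", methods=["POST"])
def generate_endpoint():
    data = request.get_json(force=True)
    completion = generate(data.get("prompt", ""))
    return jsonify({"completion": completion})

if __name__ == "__main__":
    # Keep it on localhost and let the YunoHost reverse proxy sit in front of it.
    app.run(host="127.0.0.1", port=5000)
```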

In short, I believe that in the near future we could have open-source, transparent large language model applications that are self-hosted and run on CPU only. But there’s still a lot of work ahead if you don’t want to use cloud SaaS.

(Also for context, I’ve tried Llama 2 with q4_1 quantization. You could try a lower-bit quantization to get faster inference and lower RAM and CPU usage, but the lower you quantize, the worse the responses get. Also, so far the only non-cloud/non-GPU demos I’ve seen with fast response times were running on M2 Macs.)
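
For reference, this is roughly what CPU inference on a quantized model looks like with llama-cpp-python. This is a sketch, not from the original post: the model file name, thread count, and prompt are placeholders to adjust to your setup.

```python
# Sketch of CPU inference on a quantized model with llama-cpp-python.
# Picking a lower-bit quantization trades answer quality for speed and RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-7b-chat.q4_1.gguf",  # hypothetical local file
    n_ctx=2048,    # context window
    n_threads=4,   # match your VPS core count
)

out = llm("Q: What is YunoHost? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```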

Okay, so I’m back with good news: Mistral 7B Instruct runs great on CPU (with GGUF quantization). I’m currently working on a basic demo app with Streamlit to provide self-hosted LLM processing.
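
A minimal version of that kind of Streamlit demo could look like the sketch below, assuming llama-cpp-python loads the GGUF file. The model path, parameters, and prompt template are my assumptions, not the actual demo code.

```python
# Sketch of a Streamlit front-end over a local GGUF model via llama-cpp-python.
import streamlit as st
from llama_cpp import Llama

@st.cache_resource
def load_model():
    # Hypothetical local path to a quantized Mistral 7B Instruct GGUF file.
    return Llama(model_path="./mistral-7b-instruct.Q4_K_M.gguf", n_ctx=2048, n_threads=4)

llm = load_model()

st.title("Self-hosted LLM demo")
prompt = st.text_area("Prompt")

if st.button("Generate") and prompt:
    # Mistral Instruct expects the prompt wrapped in [INST] ... [/INST].
    out = llm(f"[INST] {prompt} [/INST]", max_tokens=256)
    st.write(out["choices"][0]["text"])
```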

That would be awesome :heart_eyes: