Ollama

crypticgoose · Sep 17, 2024

@Willie Trombone Using it on desktop at home and in AWS. I have a RTX 3090 I got second hand with and AMD Ryzen 6800X.

Use it for embedding & inference. Embedding: https://ollama.com/library/nomic-embed-text. Inference: mistral-7b-instruct, llama3.1-7b, etc.

I build RAG demos with a focus on spinning it out into a startup some day. Basically ingest a lot of data, summarize and RAG it, then build chat experiences.

But Ive also used more gpt4-o-mini recently as its cheap and fast. A single rtx3090 can still be slow depending on the model.

crypticgoose · Sep 17, 2024

You can also hook it up to: https://www.litellm.ai.

Then you can load balance in case ollama fails to gpt-4 or another service like groq.

Join the MyBroadband community

Get started

Ollama

JerryMungo

Honorary Master

crypticgoose

New Member

crypticgoose

New Member