Ollama

@Willie Trombone Using it on desktop at home and in AWS. I have an RTX 3090 I got second hand, with an AMD Ryzen 6800X.

Use it for embedding & inference. Embedding: https://ollama.com/library/nomic-embed-text. Inference: mistral-7b-instruct, llama3.1-7b, etc.
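For anyone wanting to try the same split, here is a minimal sketch of calling a local Ollama server for both jobs over its REST API. It assumes Ollama is running on its default address (http://localhost:11434) and that the models have already been pulled. The helper names (build_request, post_json, embed, generate) and the "mistral" model tag are illustrative choices, not something from my actual setup.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for an Ollama REST endpoint."""
    return urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Return the embedding vector for one piece of text."""
    reply = post_json("/api/embeddings", {"model": model, "prompt": text})
    return reply["embedding"]


def generate(prompt: str, model: str = "mistral") -> str:
    """Return a single non-streamed completion.

    The default model tag is an assumption; use whichever model you pulled.
    """
    reply = post_json(
        "/api/generate",
        {"model": model, "prompt": prompt, "stream": False},
    )
    return reply["response"]
```

With the server up, embed("some text") should give back a list of floats and generate("some question") a string. Setting "stream" to False keeps the reply as one JSON object instead of a stream of chunks.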

I build RAG demos with a focus on spinning it out into a startup some day. Basically ingest a lot of data, summarize and RAG it, then build chat experiences.
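The retrieval half of that is simpler than it sounds. Below is a rough sketch of the idea, not my actual code: embed each chunk once, embed the question, rank chunks by cosine similarity, and paste the best ones into the prompt. All the function names are made up for illustration, and toy_embed is a keyword-based stand-in so the example runs without a model; in practice you would pass in a real embedding function and use a vector store rather than a Python list.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
EmbedFn = Callable[[str], Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: Sequence[str], embed_fn: EmbedFn) -> List[Tuple[str, Vector]]:
    """Ingest step: embed every chunk once and keep it next to its text."""
    return [(chunk, embed_fn(chunk)) for chunk in chunks]


def top_k(question: str, index: List[Tuple[str, Vector]],
          embed_fn: EmbedFn, k: int = 2) -> List[str]:
    """Retrieval step: return the k chunks most similar to the question."""
    q = embed_fn(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: Sequence[str]) -> str:
    """Augmentation step: put the retrieved chunks in front of the question."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def toy_embed(text: str) -> List[float]:
    """Stand-in embedding (hypothetical): one dimension per keyword."""
    words = text.lower().split()
    return [float(w in words) for w in ("gpu", "cloud", "cheap")]
```

The prompt that comes out of build_prompt is what gets sent to the inference model, local or hosted.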

But I've also been using gpt-4o-mini more recently, as it's cheap and fast. A single RTX 3090 can still be slow depending on the model.
 