Ollama

Run large language models locally with a simple CLI

Local & Self-Hosted | OSS
#local-ai #cli #self-hosted #inference #open-source

Getting Started

  1. Install Ollama from ollama.com: download the installer for macOS or Windows, or use the one-line install script on Linux.
  2. Run your first model with a single command: ollama run llama3.2 downloads the model and starts an interactive chat.
  3. Browse available models at ollama.com/library and pull any of them with ollama pull model-name.
  4. Use the OpenAI-compatible API served at localhost:11434 to integrate local models into your applications (see the example after this list).
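
Once the server is running you can hit the OpenAI-compatible chat endpoint directly; a minimal sketch with curl, assuming llama3.2 has already been pulled (swap in any model you have):

    # Chat with a local model through Ollama's OpenAI-compatible endpoint.
    # No API key is needed for the local server.
    curl http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'

Existing OpenAI client libraries work the same way: point their base URL at http://localhost:11434/v1 and pass any placeholder API key.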

Key Features

  • One-command setup downloads and runs any supported model with a single ollama run command.
  • Extensive model library hosts hundreds of models including Llama, Mistral, Gemma, Phi, Qwen, and DeepSeek.
  • OpenAI-compatible API serves models locally through endpoints that work as a drop-in replacement for the OpenAI API.
  • Quantized model builds run efficiently on consumer GPUs and Apple Silicon, with GPU offloading handled automatically based on available memory.
  • Modelfile customization lets you create custom model configurations with system prompts, parameters, and templates (see the sketch after this list).
  • Cross-platform support runs natively on macOS (Apple Silicon), Linux (NVIDIA/AMD), and Windows.
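
A minimal sketch of Modelfile customization, assuming the llama3.2 base model is already pulled (the model name concise-helper and the system prompt are illustrative). Save the following as a file named Modelfile:

    # Modelfile: layer a system prompt and a sampling parameter on a base model
    FROM llama3.2
    PARAMETER temperature 0.7
    SYSTEM """You are a concise assistant that answers in plain language."""

Then build and run the custom model with ollama create concise-helper -f Modelfile followed by ollama run concise-helper; it appears in ollama list and is served through the same local API as pulled models.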
