DeepSeek R1 Distill Llama 3.3 70B
DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, trained using outputs from DeepSeek R1.
DeepSeek R1 671B
Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It has 671B total parameters, with 37B active per inference pass.
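Models in this catalog are typically reached through an OpenAI-compatible chat API. A minimal sketch follows, assuming such an endpoint; the base URL, environment variable, and model identifier are placeholders, not values confirmed by this listing.

```python
# Minimal sketch: querying DeepSeek R1 through an OpenAI-compatible endpoint.
# The base_url and model id below are assumptions; check your provider's docs.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # hypothetical endpoint
    api_key=os.environ["API_KEY"],          # assumes the key is set in the environment
)

response = client.chat.completions.create(
    model="deepseek-r1-671b",  # assumed model id; the listing does not specify one
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)

# R1 exposes its reasoning tokens openly; depending on the provider they may
# arrive in a separate field or inline before the final answer.
print(response.choices[0].message.content)
```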
DeepSeek V3-0324
State-of-the-art Mixture-of-Experts model with coding performance on par with Claude 3.7 Sonnet while remaining fast and cost-effective.
DeepSeek R1 0528
DeepSeek R1 0528 is an FP8-quantized model based on the deepseek_v3 architecture, designed for complex computations and demanding workloads. It has demonstrated outstanding performance across benchmark evaluations in mathematics, programming, and general logic, with overall performance approaching that of leading models such as o3 and Gemini 2.5 Pro.
Apriel 5B Instruct
The Apriel model family offers high efficiency across diverse tasks through a multi-stage training process that includes continual pretraining, supervised finetuning, and alignment techniques.
Hermes 3 Llama 3.1 405B FP8
Hermes 3 is a generalist finetune of the Llama-3.1 405B foundation model, adding advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long-context coherence, and improvements across the board.
Llama 3.1 Nemotron 70B Instruct
Nvidia's fine-tune of Llama 3.1, topping alignment benchmarks and optimized for instruction following.
Liquid AI 40B
LFM-40B offers a new balance between model size and output quality. Its performance is comparable to that of larger models, while its MoE architecture enables higher throughput and deployment on more cost-effective hardware.
Llama 3.3 70B
The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many available open-source and closed chat models on common industry benchmarks.
Llama 4 Maverick 17B
Meta's most intelligent multimodal open-source model in its class. Features 17B active parameters with 128 experts (400B total), beating GPT-4o and Gemini 2.0 Flash on coding, reasoning, and image understanding benchmarks. First Llama model with native multimodality and a mixture-of-experts architecture.
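As a quick illustration of what "active parameters" means in a mixture-of-experts model, the toy sketch below routes each token to a single expert, so only that expert's weights do work even though every expert counts toward total size. It is illustrative only, not Llama 4's actual routing.

```python
# Toy sketch of mixture-of-experts routing (illustrative only; not Llama 4's
# actual implementation). A router picks one expert per token, so only that
# expert's weights are "active" even though all experts count toward total size.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 8, 4

# Each expert is a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    # Router scores -> softmax -> pick the top-1 expert for this token.
    scores = x @ router
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    k = int(np.argmax(probs))
    # Only expert k's parameters are touched for this token ("active parameters").
    return probs[k] * (x @ experts[k])

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (8,) -- one expert did the work
```

Scaled up, the same idea is why Maverick stores 400B parameters but spends compute on only 17B per token.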
Llama 4 Scout 17B
Meta's efficient multimodal model with an industry-leading 10M token context window. Features 17B active parameters with 16 experts (109B total), optimized for long-context tasks like multi-document analysis, extensive codebase reasoning, and personalized AI experiences.
Qwen 2.5 Coder 32B Instruct
Qwen2.5-Coder-32B-Instruct is Qwen's flagship code-focused model, delivering state-of-the-art open-source performance in code generation, code reasoning, and code repair across a wide range of programming languages.
Qwen3-32B-FP8
Qwen3-32B-FP8 is part of Alibaba's latest generation of hybrid reasoning models, featuring seamless switching between thinking and non-thinking modes. With 32.8B parameters and FP8 quantization, it offers strong reasoning capabilities for mathematics, coding, and logical tasks while maintaining cost efficiency. The model supports 119 languages and dialects, with MCP support for advanced agent workflows.
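A short sketch of switching Qwen3 between thinking and non-thinking modes via Hugging Face transformers follows. The enable_thinking flag mirrors Qwen3's published model-card usage; the repository id is an assumption to verify before use.

```python
# Sketch of toggling Qwen3's hybrid reasoning modes via transformers.
# The enable_thinking flag follows Qwen3's model-card usage; model id assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-32B-FP8"  # assumed repo name; verify before use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "How many primes are below 100?"}]

# enable_thinking=True lets the model emit its <think>...</think> reasoning
# first; set it to False for direct answers without the reasoning trace.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```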