Qwen 3 14B
Qwen 3 model with 14B parameters offering excellent performance-to-size efficiency
MoE Qwen 3 model with 30B total parameters, activating 3B for efficient inference
Qwen3 235B Thinking is a MoE model (235B total, 22B active) optimized for complex reasoning. It generates thinking traces for deep problem solving.
Qwen API model with 1M token context support for extensive document processing
Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass
Qwen3 30B Instruct is a MoE model (30.5B total, 3.3B active). It offers strong instruction following and multilingual capabilities.
Qwen3 Next 80B Thinking is a reasoning-first MoE model (80B total). It specializes in hard multi-step problems and agentic planning.
Qwen3.5 35B A3B is an efficient hybrid Gated DeltaNet + MoE transformer activating 3B of its 35B parameters. It delivers massive multimodal capabilities and 201-language support under an Apache 2.0 license.
Qwen3.6 Plus is Alibaba's proprietary flagship featuring a 1M token context. It provides a superior "vibe coding" experience through highly stable hybrid thinking modes and repository-level problem solving.
Qwen3.5 122B A10B balances high performance with efficiency, activating 10B of its 122B parameters. It achieves 72.4% on SWE-bench Verified, making it a premier open-weight model for agentic workflows.
Amazon Nova Lite 1.0 is a low-cost multimodal model with 300k context. It is optimized for speed and processing image/video inputs.
Amazon Nova Pro 1.0 is a balanced multimodal model offering accuracy and speed. It handles extensive context and is suitable for general tasks.
Amazon Nova Premier is the most capable of Amazon’s multimodal models for complex reasoning tasks and for use as the best teacher for distilling custom models.
Amazon Nova 2 Lite is a cost-efficient multimodal engine with a 1M token context. It seamlessly processes text, code, images, and video, natively supporting Python interpreter tools for data analysis workflows.
Claude 3.5 Haiku is Anthropic's fastest and most cost-effective model, featuring 200k context. It excels in coding, data extraction, and real-time tasks, matching Claude 3 Opus in many benchmarks.
Latest generation Sonnet with best-in-class performance for complex agents and coding tasks
Exceptional reasoning model for specialized complex tasks requiring advanced analytical capabilities
Hybrid reasoning model with extended thinking mode for complex problem-solving and quick responses
Updated Haiku model from October 2024 with enhanced accuracy and performance
Claude Haiku 4.5 is Anthropic's fastest and most cost-effective model, featuring a 200K context window. It delivers near-frontier reasoning and coding at speeds suitable for real-time agentic applications.
Claude Sonnet 4.5 is Anthropic's most advanced model for real-world agents and coding. It features a 1M token context, state-of-the-art coding performance, and enhanced agentic capabilities.
Claude Opus 4.5 is Anthropic's frontier reasoning model, optimized for complex software engineering and long-horizon tasks. It supports extended thinking and multimodal capabilities.
Claude Opus 4.6 features a 1M token context and leading scores on Terminal-Bench 2.0. It leverages Context Compaction to sustain infinitely long agentic coding workflows.
Claude Sonnet 4.6 represents a total upgrade in knowledge work and design. It achieves unprecedented computer-use reliability, executing complex UI automation and software engineering across a 1M token context window.
Claude Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on complex, multi-step tasks and more reliable agentic execution across extended workflows.
DeepSeek V3 0324 is a cost-effective MoE model with 671B parameters. It excels in coding and problem-solving, offering a budget-friendly alternative with strong performance.
DeepSeek R1 0528 is an open-source model with 671B parameters (37B active). It offers performance on par with proprietary reasoning models, featuring fully open reasoning tokens.
Advanced DeepSeek reasoning model with RL training, comparable to OpenAI o1 in performance
Efficient MoE model with 671B parameters trained with FP8, achieving strong benchmark results
DeepSeek V3.2 Exp is an experimental model featuring DeepSeek Sparse Attention (DSA) for high efficiency. It delivers long-context handling up to 128k tokens with reduced inference costs.
DeepSeek-V3.1 is a hybrid reasoning model (671B params, 37B active) supporting thinking and non-thinking modes. It improves on V3 with better tool use, code generation, and reasoning efficiency.
DeepSeek V3.2 is a 685B parameter MoE model leveraging DeepSeek Sparse Attention (DSA). It excels in complex mathematical reasoning and programming competitions, featuring integrated tool-use thinking modes.
DeepSeek-V3.2-Speciale is a high-compute variant of DeepSeek-V3.2 optimized for maximum reasoning and agentic performance
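A rough way to see why MoE entries like these (671B total, 37B active) are cheap to serve: per-token inference compute scales with the active parameter count, not the total. A minimal sketch, assuming the common ~2 FLOPs-per-active-parameter rule of thumb (an approximation that ignores attention and routing overhead):

```python
def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token.

    Uses the ~2 FLOPs-per-active-parameter rule of thumb;
    attention cost and expert-routing overhead are ignored.
    """
    return 2 * active_params

# Figures from the DeepSeek descriptions above: 671B total, 37B active.
total, active = 671e9, 37e9

dense_cost = flops_per_token(total)   # if every parameter fired per token
moe_cost = flops_per_token(active)    # what the MoE actually pays

print(f"MoE cost per token: {moe_cost:.2e} FLOPs")
print(f"Compute saving vs. a dense 671B model: {dense_cost / moe_cost:.1f}x")
```

The same back-of-the-envelope ratio applies to the other total/active pairs quoted throughout this list.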
Gemini 2.5 Flash is Google's workhorse model for high-frequency tasks. It features a 1M context window, optimized for speed and efficiency in reasoning and multimodal processing.
Gemini 2.5 Flash-Lite is a lightweight reasoning model optimized for ultra-low latency. It offers a 1M context window and is designed for cost-effective, high-throughput applications.
Gemini 2.5 Pro is Google's best reasoning model, featuring a 1M token context window. It uses a sparse MoE architecture to excel in complex reasoning, coding, and multimodal tasks.
Gemma 3 27B is an open-source multimodal model by Google. It supports vision-language inputs, 128k context, and offers improved math and reasoning capabilities.
Specific Gemini 2.0 Flash version with stable performance and consistent behavior
Preview version of Gemini 2.5 Pro with advanced reasoning capabilities released in March 2025
Gemini 3 Pro Preview is Google's flagship frontier model. It offers high-precision multimodal reasoning across text, audio, video, and code, with a 1M token context.
Gemini 3 Flash Preview is a highly efficient model delivering Gemini 3 Pro-level reasoning and near real-time agentic tool orchestration with significantly lower latency and cost.
Gemini 3.1 Flash Lite Preview is an ultra-efficient, high-volume workhorse model featuring a 1M context window and 2.5x faster time-to-first-token than previous generations.
Gemma 4 31B IT is an open-weights dense multimodal model from Google DeepMind featuring a 256K context window, native video/audio processing, and advanced configurable reasoning capabilities under an Apache 2.0 license.
Gemma 4 26B A4B IT is a latency-optimized open-weights MoE model activating only 3.8B parameters per token. It delivers near-31B dense quality while preserving hardware constraints for edge and enterprise deployments.
Gemini 3.1 Pro Preview is Google's flagship reasoning model featuring a 1M token context, three-tier adjustable reasoning depth controls, and unparalleled complex problem-solving capabilities across multimodal inputs.
Llama 4 Scout variant with 17B parameters and mixture-of-experts architecture for efficiency
Llama 3.3 model with 70B parameters offering improved performance over 3.1 version
FP8-quantized 17B Llama 4 Maverick model optimized for deployment efficiency and speed
Llama 4 Scout is a 109B parameter (17B active) MoE model. It is designed for efficiency and visual reasoning with a 328k context.
Phi-4 is a 14B parameter model by Microsoft. It excels in complex reasoning and limited memory environments, trained on high-quality synthetic data.
Phi-3 Mini is a 3.8B lightweight model with 128k context. It offers state-of-the-art performance for its size, suitable for edge devices.
MiniMax M2 is a 230B (10B active) MoE model. It is highly efficient, designed for coding and agentic workflows with low latency.
MiniMax M2.7 is a 230B parameter MoE model (10B active) utilizing RoPE and QK RMSNorm. It features recursive self-optimization, updating its own memory to execute highly complex software engineering tasks.
MiniMax M2.5 is a hyper-efficient 230B MoE model (10B active) trained via large-scale RL in 200,000+ environments. It excels in office productivity, outputting at 100 tokens/sec at unprecedented cost efficiency.
Updated Mistral Large from November 2024 with improved performance and capabilities
Magistral Small 2506 is a 24B parameter model by Mistral AI. It is optimized for multilingual reasoning and instruction following.
Compact 24B parameter Mistral model optimized for cost-effective instruction following
Mistral Large 3 is a massive open-weight granular MoE model featuring 675B total parameters (41B active). It offers top-tier reliability for production-grade assistants and long-context code comprehension.
A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.
Mistral Medium 3.1 is an updated version of Mistral Medium 3, a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost.
Mistral Small 4 unifies Instruct, Magistral, and Devstral capabilities into a single 119B MoE architecture activating just 6.5B parameters. It offers configurable reasoning effort and native multimodality.
Kimi K2 Instruct is a large open-weight model (1T params, 32B active) by Moonshot AI. It offers strong performance in instruction following and general tasks.
Kimi K2 Thinking is a reasoning variant capable of autonomous long-horizon tasks. It can execute hundreds of sequential tool calls.
Kimi K2.5 is a 1-trillion parameter open-weight MoE (32B active). It features a native MoonViT encoder and self-directed Agent Swarm technology capable of orchestrating 100 sub-agents in parallel.
NVIDIA-tuned 253B Llama 3.1 model optimized for enterprise applications and instruction following
NVIDIA optimized 49B Llama 3.3 model providing excellent performance-to-size ratio
Llama 4 Maverick is Meta's natively multimodal 400B MoE model (17B active). It utilizes early fusion of text and vision tokens and was codistilled using online RL to master complex visual-reasoning tasks.
NVIDIA tuned 70B Llama 3.1 model with enhanced instruction following and helpfulness
Llama 3.3 Nemotron Super 49B is a reasoning model derived from Llama 3.3 70B. It is post-trained for agentic workflows, RAG, and tool calling.
Nemotron Nano 9B v2 is a compact 9B model by NVIDIA. It is a unified model for reasoning and non-reasoning tasks, trained from scratch.
Llama 3.1 Nemotron Ultra 253B is a derivative of Llama 3.1 405B, optimized for reasoning and chat. It offers a balance of accuracy and efficiency.
Nemotron 3 Nano 30B A3B is a highly efficient 31.6B total parameter MoE model activating only 3.2B parameters. It offers a 1M token context window and up to 3.3x higher throughput for agentic systems.
Nemotron 3 Super is a 120B parameter hybrid Mamba-Transformer model (12B active). It utilizes LatentMoE and Multi-Token Prediction (MTP) to maximize compute efficiency for complex RAG and IT ticket automation.
A large-scale, 32-billion-parameter model designed for deep reasoning, complex multi-step logic, and advanced instruction following.
Smaller, faster, and more affordable version of GPT-4o, ideal for high-volume applications requiring good intelligence
Enhanced iteration of GPT-4 with improved reasoning, coding, and multimodal capabilities
Most advanced OpenAI reasoning model with multimodal capabilities and agentic tool use for complex analysis
Lightweight reasoning model balancing speed and intelligence for everyday complex tasks
GPT-5 is OpenAI's latest flagship model, designed as an adaptive system. It features dynamic reasoning depth, 400k context, and improvements in accuracy and multimodal integration.
GPT-5-Nano is the smallest and fastest variant in the GPT-5 system, optimized for developer tools, rapid interactions, and ultra-low latency environments.
GPT-5 Mini is a compact version of GPT-5 for lightweight reasoning. It offers low latency and cost, suitable for high-frequency tasks.
GPT-OSS-120B is an open-weight MoE model from OpenAI containing 116.8B total parameters (5.1B active). Licensed under Apache 2.0, it is post-trained with MXFP4 quantization to run inference efficiently on a single 80GB GPU.
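The claim that a ~117B-parameter model fits on a single 80 GB GPU follows directly from the 4-bit weight format. A back-of-the-envelope sketch, assuming roughly 4.25 bits per parameter for MXFP4 (4-bit values plus shared per-block scales; the exact overhead depends on block size):

```python
def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

# Total parameter count from the GPT-OSS-120B description above.
params = 116.8e9

fp16 = weight_memory_gb(params, 16)      # unquantized half precision
mxfp4 = weight_memory_gb(params, 4.25)   # assumed MXFP4: 4-bit values + scales

print(f"FP16 weights:  ~{fp16:.0f} GB (well over one 80 GB GPU)")
print(f"MXFP4 weights: ~{mxfp4:.0f} GB (fits, with headroom for KV cache)")
```

The estimate covers weights only; activation and KV-cache memory come on top, which is why the remaining ~18 GB of headroom matters.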
Compact GPT-4.1 variant optimized for efficiency while maintaining strong performance
January 2025 release of o3-mini with enhanced STEM capabilities and developer features
April 2025 o4-mini release with improved reasoning efficiency and balanced performance
GPT-5.1 offers stronger general-purpose reasoning and instruction adherence than GPT-5. It features adaptive computation and a natural conversational style.
GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases.
GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically.
GPT-OSS-20B is a compact open-weight MoE model from OpenAI containing 21B parameters (3.6B active). It uses grouped multi-query attention for low-latency inference on consumer hardware under an Apache 2.0 license.
GPT-5.4 Mini is OpenAI's highly efficient small model offering 2x faster execution than GPT-5 Mini. It supports a 400K context window and achieves 54.4% on SWE-Bench Pro, ideal for responsive coding assistants.
GPT-5.4 Nano is OpenAI's most cost-effective tier, optimized for massive-scale classification and supporting subagents. It features a 400K context window at just $0.20 per million input tokens.
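Taking the quoted $0.20 per million input tokens at face value, the input-side cost of filling the full 400K context works out as a short calculation (output-token pricing, not given above, is ignored):

```python
def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Input-token cost in USD at a flat per-million-token rate."""
    return tokens / 1_000_000 * price_per_million

# Figures from the description above: 400K context, $0.20 / 1M input tokens.
full_context = input_cost_usd(400_000, 0.20)
print(f"One full 400K-token prompt: ${full_context:.2f}")
```

At this rate, a full-context prompt costs about eight cents on the input side, which is what makes the tier viable for massive-scale classification.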
GPT-5.4 is OpenAI's flagship frontier model, natively integrating frontier coding (57.7% SWE-bench Pro), state-of-the-art computer-use abilities, and deep agentic workflows over a 1.05M token context window.
Grok 3 Mini is a lightweight, fast reasoning model from xAI. It is designed for logic-based tasks and offers accessible thinking traces.
Grok 4 is xAI's general-purpose reasoning model with 314B parameters (MoE). It features real-time data integration and strong performance in general tasks.
Grok 2 version from December 2024 with incremental improvements and optimizations
Beta version of Grok 3 with extended reasoning for complex problem-solving tasks
Grok 4.1 Fast provides an immense 2M token context window. It is specifically optimized for high-speed document retrieval, customer support automation, and processing massive data pipelines.
Grok 4.1 Fast Thinking is the reasoning-enabled variant of Grok 4.1 Fast. It provides extended thought processes for complex problem-solving within a 2M context.
Grok 4.20 is a revolutionary ~6-trillion parameter MoE model that runs four specialized agents simultaneously on a shared backbone. It utilizes persona adapters to coordinate multi-agent workflows within a 2M token context.
MiMo V2 Pro is Xiaomi's flagship ~1-Trillion parameter MoE (42B active) agentic engine. Achieving an Elo of 1426 on GDPval-AA, it is designed for extreme reliability in long-horizon autonomous task execution.
GLM-4.5 is an open-weight model with 355B parameters (32B active). It uses MoE architecture to deliver state-of-the-art performance in reasoning, coding, and multimodal tasks.
GLM-4.5 Air is an efficient MoE model with 106B parameters (12B active). It is optimized for agentic applications, tool use, and speed.
GLM-4.6 is an open-weight model with 355B parameters (32B active). It uses MoE architecture to deliver state-of-the-art performance in reasoning, coding, and multimodal tasks.
GLM-5.1 is an open-weight 744B MoE model (40B active) released under the MIT license. Integrating DeepSeek Sparse Attention, it matches proprietary frontier models on SWE-Bench Pro (58.4%).
GLM-4.7 is a highly stable 358B parameter model optimized for coding and UI generation. It utilizes Interleaved Thinking and Turn-level Thinking for reliable execution of complex mathematical tasks.
Read more→