SiliconCloud · AI Models · LobeChat

Ctrl K

Model List

53

DeepSeek R1

deepseek-ai/DeepSeek-R1

DeepSeek-R1 is a reinforcement learning (RL) driven inference model that addresses issues of repetitiveness and readability within the model. Prior to RL, DeepSeek-R1 introduced cold start data to further optimize inference performance. It performs comparably to OpenAI-o1 in mathematical, coding, and reasoning tasks, and enhances overall effectiveness through meticulously designed training methods.

DeepSeek V3

deepseek-ai/DeepSeek-V3

DeepSeek-V3 is a mixture of experts (MoE) language model with 671 billion parameters, utilizing multi-head latent attention (MLA) and the DeepSeekMoE architecture, combined with a load balancing strategy that does not rely on auxiliary loss, optimizing inference and training efficiency. Pre-trained on 14.8 trillion high-quality tokens and fine-tuned with supervision and reinforcement learning, DeepSeek-V3 outperforms other open-source models and approaches leading closed-source models in performance.

DeepSeek R1 (Pro)

Pro/deepseek-ai/DeepSeek-R1

DeepSeek-R1 is a reinforcement learning (RL) driven inference model that addresses issues of repetitiveness and readability in models. Prior to RL, DeepSeek-R1 introduced cold start data to further optimize inference performance. It performs comparably to OpenAI-o1 in mathematical, coding, and reasoning tasks, and enhances overall effectiveness through carefully designed training methods.

DeepSeek V3 (Pro)

Pro/deepseek-ai/DeepSeek-V3

DeepSeek-V3 is a mixed expert (MoE) language model with 671 billion parameters, utilizing multi-head latent attention (MLA) and the DeepSeekMoE architecture, combined with a load balancing strategy without auxiliary loss to optimize inference and training efficiency. Pre-trained on 14.8 trillion high-quality tokens and fine-tuned with supervision and reinforcement learning, DeepSeek-V3 outperforms other open-source models and approaches leading closed-source models.

DeepSeek R1 Distill Llama 70B

deepseek-ai/DeepSeek-R1-Distill-Llama-70B

The DeepSeek-R1 distillation model optimizes inference performance through reinforcement learning and cold-start data, refreshing the benchmark for open-source models across multiple tasks.

DeepSeek R1 Distill Qwen 32B

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

DeepSeek-R1-Distill-Qwen-32B is a model obtained through knowledge distillation based on Qwen2.5-32B. This model is fine-tuned using 800,000 selected samples generated by DeepSeek-R1, demonstrating exceptional performance in mathematics, programming, and reasoning across multiple domains. It has achieved excellent results in various benchmark tests, including a 94.3% accuracy rate on MATH-500, showcasing strong mathematical reasoning capabilities.

DeepSeek R1 Distill Qwen 14B

deepseek-ai/DeepSeek-R1-Distill-Qwen-14B

The DeepSeek-R1 distillation model optimizes inference performance through reinforcement learning and cold-start data, refreshing the benchmark for open-source models across multiple tasks.

DeepSeek R1 Distill Llama 8B (Free)

deepseek-ai/DeepSeek-R1-Distill-Llama-8B

deepseek-ai/DeepSeek-R1-Distill-Llama-8B.description

DeepSeek R1 Distill Qwen 7B (Free)

deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

DeepSeek-R1-Distill-Qwen-7B is a model obtained through knowledge distillation based on Qwen2.5-Math-7B. This model is fine-tuned using 800,000 selected samples generated by DeepSeek-R1, demonstrating excellent reasoning capabilities. It has performed outstandingly in multiple benchmark tests, achieving a 92.8% accuracy rate on MATH-500, a 55.5% pass rate on AIME 2024, and a score of 1189 on CodeForces, showcasing strong mathematical and programming abilities as a 7B scale model.

DeepSeek-R1-Distill-Qwen-1.5B (Free)

deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

The DeepSeek-R1 distillation model optimizes inference performance through reinforcement learning and cold-start data, refreshing the benchmark for open-source models across multiple tasks.

DeepSeek V2.5

deepseek-ai/DeepSeek-V2.5

DeepSeek V2.5 combines the excellent features of previous versions, enhancing general and coding capabilities.

DeepSeek VL2

deepseek-ai/deepseek-vl2

DeepSeek-VL2 is a mixture of experts (MoE) visual language model developed based on DeepSeekMoE-27B, employing a sparsely activated MoE architecture that achieves outstanding performance while activating only 4.5 billion parameters. This model excels in various tasks, including visual question answering, optical character recognition, document/table/chart understanding, and visual localization.

QVQ 72B Preview

Qwen/QVQ-72B-Preview

QVQ-72B-Preview is a research-oriented model developed by the Qwen team, focusing on visual reasoning capabilities, with unique advantages in understanding complex scenes and solving visually related mathematical problems.

QwQ 32B Preview

Qwen/QwQ-32B-Preview

QwQ-32B-Preview is Qwen's latest experimental research model, focusing on enhancing AI reasoning capabilities. By exploring complex mechanisms such as language mixing and recursive reasoning, its main advantages include strong analytical reasoning, mathematical, and programming abilities. However, it also faces challenges such as language switching issues, reasoning loops, safety considerations, and differences in other capabilities.

Qwen2.5 7B Instruct (Free)

Qwen/Qwen2.5-7B-Instruct

Qwen2.5 is a brand new series of large language models designed to optimize the handling of instruction-based tasks.

Qwen2.5 7B Instruct (LoRA)

LoRA/Qwen/Qwen2.5-7B-Instruct

LoRA/Qwen/Qwen2.5-7B-Instruct.description

Qwen2.5 7B Instruct (Pro)

Pro/Qwen/Qwen2.5-7B-Instruct

Qwen2.5-7B-Instruct is one of the latest large language models released by Alibaba Cloud. This 7B model shows significant improvements in coding and mathematics. It also provides multilingual support, covering over 29 languages, including Chinese and English. The model has made notable advancements in instruction following, understanding structured data, and generating structured outputs, especially JSON.

Qwen2.5 14B Instruct

Qwen/Qwen2.5-14B-Instruct

Qwen2.5 is a brand new series of large language models designed to optimize the handling of instruction-based tasks.

Qwen2.5 32B Instruct

Qwen/Qwen2.5-32B-Instruct

Qwen2.5 is a brand new series of large language models designed to optimize the handling of instruction-based tasks.

Qwen2.5 72B Instruct

Qwen/Qwen2.5-72B-Instruct

A large language model developed by the Alibaba Cloud Tongyi Qianwen team

Qwen2.5 72B Instruct (LoRA)

LoRA/Qwen/Qwen2.5-72B-Instruct

LoRA/Qwen/Qwen2.5-72B-Instruct.description

Qwen2.5 72B Instruct (Vendor-A)

Vendor-A/Qwen/Qwen2.5-72B-Instruct

Qwen2.5-72B-Instruct is one of the latest large language models released by Alibaba Cloud. This 72B model shows significant improvements in coding and mathematics. It also provides multilingual support, covering over 29 languages, including Chinese and English. The model has made notable advancements in instruction following, understanding structured data, and generating structured outputs, especially JSON.

Qwen2.5 72B Instruct 128K

Qwen/Qwen2.5-72B-Instruct-128K

Qwen2.5 is a new large language model series with enhanced understanding and generation capabilities.

Qwen2.5 Coder 7B Instruct (Free)

Qwen/Qwen2.5-Coder-7B-Instruct

Qwen2.5-Coder-7B-Instruct is the latest version in Alibaba Cloud's series of code-specific large language models. This model significantly enhances code generation, reasoning, and repair capabilities based on Qwen2.5, trained on 55 trillion tokens. It not only improves coding abilities but also maintains advantages in mathematics and general capabilities, providing a more comprehensive foundation for practical applications such as code agents.