TechFeed

Archive · January 2025

January 2025 · 5 posts / 4 days

Tuesday, January 28 · 2025-01-28 · 1 post

Qwen
Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model

It is widely recognized that continuously scaling both data size and model size can lead to significant improvements in model intelligence. However, the research and industry community has limited experience in effectively scaling extremely large models…

Sunday, January 26 · 2025-01-26 · 2 posts

Qwen
Qwen2.5-1M: Deploy Your Own Qwen with Context Length up to 1M Tokens

Introduction: Two months after upgrading Qwen2.5-Turbo to support context length up to one million tokens, we are back with the open-source Qwen2.5-1M models and the corresponding inference frame…

Qwen
Qwen2.5 VL! Qwen2.5 VL! Qwen2.5 VL!

We release Qwen2.5-VL, the new flagship vision-language model of Qwen and also a significant leap from the previous Qwen2-VL. To try the latest model, feel free to visit Qwen Chat and choose Qwen2.5-VL-72B-Instruct. Also, we open b…

Monday, January 20 · 2025-01-20 · 1 post

Qwen
Global-batch load balance almost free lunch to improve your MoE LLM training

Background: The Mixture-of-Experts (MoEs) architecture has become a popular model-parameter-scale-up technique. Typically, one MoE layer consists of a router (often parameterized as one single Linear layer) and a group of experts (for transfo…

Monday, January 13 · 2025-01-13 · 1 post

Qwen
Towards Effective Process Supervision in Mathematical Reasoning

Introduction: In recent years, Large Language Models (LLMs) have made remarkable advances in mathematical reasoning, yet they can make mistakes, such as miscalculations or logical errors, leading to wrong conclusions. Moreover, even when achi…
