May 2026 · TechFeed Archive

Yesterday · Wednesday, May 20 · 7 posts

AWS ML · 9 hours ago

Announcing OpenAI-compatible API support for Amazon SageMaker AI endpoints

Today, Amazon SageMaker AI introduces OpenAI-compatible API support for real-time inference endpoints. If you use the OpenAI SDK, LangChain, or Strands Agents, you can now invoke models on SageMaker AI by changing only your endpoint URL. You don’t need a custom client, a SigV4 wr…

GitHub · 12 hours ago

Investigating unauthorized access to GitHub-owned repositories

If any impact is discovered, customers will be notified via established incident response and notification channels. The post Investigating unauthorized access to GitHub-owned repositories appeared first on The GitHub Blog.

AWS ML · 15 hours ago

Multimodal evaluators: MLLM-as-a-judge for image-to-text tasks in Strands Evals

If you’re building visual shopping, image or document understanding, or chart analysis, you need a way to verify whether your model’s response is actually grounded in the source image. A text-only evaluator cannot tell you whether a caption faithfully describes an image, whether…

AWS ML · 16 hours ago

Build real-time voice applications with Amazon SageMaker AI and vLLM

Voice agents, live captioning, contact center analytics, and accessibility tools all depend on real-time speech-to-text, where your application streams audio in and receives transcription back simultaneously over a single persistent connection. Traditional request-response infere…

OpenAI · 1 day ago

An OpenAI model has disproved a central conjecture in discrete geometry

An OpenAI model solved the 80-year-old unit distance problem, disproving a major conjecture in discrete geometry and marking a milestone in AI-driven mathematics.
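For context, the unit distance problem (posed by Paul Erdős in 1946) asks for the largest number of point pairs at distance exactly 1 that a set of n points in the plane can contain. The brute-force sketch below only illustrates the quantity being maximized; it is not the model's method, and the function name is illustrative.

```python
from itertools import combinations
from math import dist, isclose


def count_unit_distances(points, tol=1e-9):
    """Count unordered pairs of points that are 1 apart (up to float tolerance)."""
    return sum(
        1
        for p, q in combinations(points, 2)
        if isclose(dist(p, q), 1.0, abs_tol=tol)
    )


# In a 3x3 integer grid only horizontal and vertical neighbours are 1 apart:
# 3 rows x 2 pairs + 3 columns x 2 pairs = 12 pairs.
grid = [(x, y) for x in range(3) for y in range(3)]
print(count_unit_distances(grid))  # 12
```

Counting the pairs for a given set is easy (O(n^2) distance checks here); the hard question is how fast the maximum can grow as n increases.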

OpenAI · 1 day ago

How Ramp engineers accelerate code review with Codex

How Ramp engineers use Codex with GPT-5.5 to review code and ship improvements, allowing them to get substantive feedback in minutes instead of hours.

OpenAI · 1 day ago

The next phase of OpenAI’s Education for Countries

OpenAI advances Education for Countries, expanding AI adoption in schools with new partnerships, teacher training, and tools to improve global learning outcomes.

Tuesday, May 19 · 2026-05-19 · 11 posts

OpenAI · 1 day ago

Introducing OpenAI for Singapore

OpenAI for Singapore launches a multi-year AI partnership to expand deployment, build local talent, and support businesses and public services with AI.

Hugging Face · 1 day ago

OlmoEarth v1.1: A more efficient family of Earth observation models

Google Research · 1 day ago

Empirical Research Assistance (ERA): From Nature publication to catalyzing Computational Discovery

General Science

Airbnb · 1 day ago

Scaling Airbnb’s identity graph with a unified knowledge graph infrastructure

How Airbnb shifts from PaaS to an internal knowledge graph infrastructure at scale. By: Lucen Zhao, Shukun Yang, Ashish Jain. Knowledge graphs offer a natural and powerful way to represent relationships between entities. Many real-world systems are fundamentally about connection…

AWS ML · 1 day ago

Scalable voice agent design with Amazon Nova Sonic: multi-agent, tools, and session segmentation

In this post, you’ll learn how to use Amazon Nova Sonic, Amazon Bedrock AgentCore, and Strands BidiAgent to build scalable, maintainable voice agents that handle these challenges efficiently, resulting in more responsive and intelligent customer interactions. We’ll explore three…

AWS ML · 1 day ago

Extending conversational memory in Kiro CLI using Amazon Bedrock AgentCore Memory

In this post, we demonstrate how you can extend the conversational memory of Kiro CLI by implementing a custom Model Context Protocol (MCP) server that integrates with Amazon Bedrock AgentCore Memory. You can use Kiro CLI to interact with Kiro's AI agents directly from your term…

AWS ML · 1 day ago

Accelerate ML feature pipelines with new capabilities in Amazon SageMaker Feature Store

Today, we’re announcing three new capabilities available in SageMaker Python SDK v3.8.0. In this post, we walk through each capability with code examples you can use to get started. For complete end-to-end walkthroughs, see the accompanying notebooks for Lake Formation governance…

AWS ML · 1 day ago

Implementing programmatic tool calling on Amazon Bedrock

In this post, we show three ways to implement programmatic tool calling (PTC) on Amazon Bedrock: a self-hosted Docker sandbox on ECS for maximum control, a managed solution using Amazon Bedrock AgentCore Code Interpreter, and an Anthropic SDK-compatible path through a proxy for t…

Cloudflare · 1 day ago

Announcing Claude Managed Agents on Cloudflare

Cloudflare has integrated with Anthropic's Claude Managed Agents to provide a fast, isolated execution environment for autonomous code delivery. This means builders can scale agent workflows globally while strictly controlling access to private backends and easily customizing the…

OpenAI · 1 day ago

Advancing content provenance for a safer, more transparent AI ecosystem

OpenAI advances AI content provenance with Content Credentials, SynthID, and a verification tool to help people identify and trust AI-generated media.

Hugging Face · 2 days ago

Introducing the Ettin Reranker Family

Monday, May 18 · 2026-05-18 · 14 posts

Netflix · 2 days ago

The Evolution of Cassandra Data Movement at Netflix

By Guil Pires, Jennifer Prince, Jose Camacho, Ken Kurzweil, Phanindra Chunduru. Background: In a previous post, we introduced Data Bridge, a unified management plane for batch Data Movement at Netflix. Historically, several bespoke Data Movement connectors were developed acros…

AWS · 2 days ago

AWS Weekly Roundup: AWS Transform at 1 year, Claude Platform on AWS, EC2 M3 Ultra Mac instances, and more (May 18, 2026)

Just a year ago, we launched AWS Transform for .NET, Mainframe and VMware workloads, the first agentic AI service purpose-built for modernizing enterprise applications at scale. At re:Invent 2025, we introduced AWS Transform custom, which enables organizations to modernize and tr…

AWS ML · 2 days ago

Prompting Amazon Nova 2 for content moderation

In this post, you learn how to prompt Amazon Nova 2 Lite for content moderation using structured and free-form approaches, grounded in the MLCommons AILuminate Assessment Standard. The prompting techniques use the AILuminate taxonomy as an example, but they work equally well with…

DeepMind · 2 days ago

Fast-tracking genetic leads to reverse cellular aging

Biologists use Co-Scientist to find novel factors that successfully rejuvenate human cells.

AWS ML · 2 days ago

Aderant transforms cloud operations with Amazon Quick

In this post, we share how Aderant used the AI-powered capabilities of Amazon Quick to unify search across six vendor systems and automate documentation workflows, achieving 90 percent faster search times and 75 percent documentation acceleration, and how others can apply these a…

GitHub · 2 days ago

Take your local GitHub sessions anywhere

Kick off work in VS Code or the CLI, finish it from your phone. Remote control for GitHub Copilot sessions is now generally available on github.com and GitHub Mobile. The post Take your local GitHub sessions anywhere appeared first on The GitHub Blog.

Hugging Face · 2 days ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

AWS ML · 2 days ago

Integrate Atlassian Confluence Cloud with Amazon Quick

In this post, you will learn how to set up the Confluence Cloud integration with Quick. This includes creating a knowledge base for semantic search, setting up Actions to query and manage Confluence pages, and organizing resources in Quick Spaces. Quick integrates with your curre…

Hugging Face · 2 days ago

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

AWS ML · 2 days ago

Build custom code-based evaluators in Amazon Bedrock AgentCore

In this post, you will implement four Lambda-based custom code evaluators for a financial market-intelligence agent, register each with AgentCore, and run them in on-demand and online modes. You will also see how to combine custom code-based evaluators with built-in evaluators an…

Hugging Face · 2 days ago

Spotify · 2 days ago

Better Experiments with LLM Evals — A funnel, not a fork

TL;DR LLM evals, automated judges that assess relevance, coherence, and quality at scale, are a powerful new... The post Better Experiments with LLM Evals — A funnel, not a fork appeared first on Spotify Engineering.

OpenAI · 2 days ago

OpenAI and Dell partner to bring Codex to hybrid and on-premise enterprise environments

OpenAI and Dell partner to bring Codex to hybrid and on-premise environments, helping enterprises deploy AI coding agents securely across data and workflows.

Cloudflare · 3 days ago

Project Glasswing: what Mythos showed us

In recent weeks, we pointed Mythos and other security-focused LLMs at live code across critical parts of our infrastructure. We share what we observed, the models’ strengths and weaknesses, and what the work around them needs to look like before any of it can scale.

Sunday, May 17 · 2026-05-17 · 5 posts

DeepMind · 3 days ago

Simulate real-world places with Project Genie and Street View

We’re expanding access to Google AI Ultra subscribers globally and introducing a new capability powered by Street View.

DeepMind · 3 days ago

Introducing Gemini Omni

DeepMind · 3 days ago

Introducing Google Antigravity 2.0

DeepMind · 3 days ago

Gemini for Science: AI experiments and tools for a new era of discovery

A collection of science tools and experiments to expand the scale and precision of scientific exploration.

DeepMind · 3 days ago

Making it easier to understand how content was created and edited

We're expanding our tools to help you understand how content was created and edited across the web.

Saturday, May 16 · 2026-05-16 · 8 posts

DeepMind · 5 days ago

Strengthening Singapore’s AI Future: A New National Partnership

Google DeepMind and Singapore partner to apply frontier AI to complex challenges across health, education, sustainability, and more.

DeepMind · 5 days ago

Finding the molecular switches behind new infectious diseases

Clare Bryant uses Co-Scientist to identify genetic triggers in emerging infectious diseases.

DeepMind · 5 days ago

Opening new paths in aging research

Calico Life Sciences uses Co-Scientist to connect scattered findings and generate new leads in aging research.

DeepMind · 5 days ago

Accelerating discovery of liver disease mechanisms

Filippo Menolascina uses Co-Scientist to identify new liver disease treatments and explain why existing drugs only help certain patients.

DeepMind · 5 days ago

Uniting biological toolkits for a new approach to ALS

Co-Scientist unites Boston Children’s Hospital and MIT’s labs to explore new RNA-based treatments for ALS.

DeepMind · 5 days ago

Uncovering repurposed medicines to fight liver fibrosis

A Stanford geneticist uses Co-Scientist to help find new treatments for chronic liver disease and liver fibrosis.

DeepMind · 5 days ago

How WeatherNext helped the National Hurricane Center better predict Hurricane Melissa’s historic landfall in Jamaica

Learn how our WeatherNext AI model helps forecasters give communities unprecedented time to prepare ahead of the historic Hurricane Melissa.

OpenAI · 5 days ago

OpenAI and Malta partner to bring ChatGPT Plus to all citizens

OpenAI and Malta partner to expand AI access, offering ChatGPT Plus and training to help citizens build practical AI skills and use AI responsibly.

Friday, May 15 · 2026-05-15 · 12 posts

DeepMind · 5 days ago

Gemini 3.5: frontier intelligence with action

Gemini 3.5 is built to help you execute complex, agentic workflows.

Microsoft Research · 5 days ago

Further Notes on Our Recent Research on AI Delegation and Long-Horizon Reliability

Our recent paper, “LLMs Corrupt Your Documents When You Delegate”, has generated discussion about the reliability of AI systems in delegated workflows. We appreciate the interest in this work and want to clarify several important points about what the paper does—and does not—clai…

GitHub · 5 days ago

Building a general-purpose accessibility agent—and what we learned in the process

Learn about the experimental general-purpose accessibility agent that GitHub is piloting. The post Building a general-purpose accessibility agent—and what we learned in the process appeared first on The GitHub Blog.

AWS ML · 5 days ago

Restrict access to sensitive documents in your Amazon Quick knowledge bases for Amazon S3

In this post, we walk through how to configure document-level ACLs for your S3 knowledge base in Amazon Quick. You will learn how to set up and verify an ACL configuration that enforces document-level permissions across chat and automated workflows.

GitHub · 5 days ago

Raising the bar: Quality, shared responsibility, and the future of GitHub’s bug bounty program

We're updating our bug bounty program standards to prioritize quality submissions, clarify shared responsibility boundaries, and evolve how we reward low-risk findings. The post Raising the bar: Quality, shared responsibility, and the future of GitHub’s bug bounty program appeare…

Amazon Science · 5 days ago

Making LLMs faster without sacrificing accuracy

A new scaling law that relates particular architectural choices to loss helps identify models that improve throughput by up to 47% with no loss of accuracy.

OpenAI · 6 days ago

Databricks brings GPT-5.5 to enterprise agent workflows

Databricks uses GPT-5.5 for enterprise agent workflows after the model set a new state of the art on the OfficeQA Pro benchmark.

OpenAI · 6 days ago

How sales teams use Codex

See how sales teams can use Codex to create pipeline briefs, meeting prep packets, forecast reviews, account plans, and stalled-deal diagnoses from real work inputs.

OpenAI · 6 days ago

How data science teams use Codex

See how data science teams can use Codex to build root-cause briefs, impact readouts, KPI memos, scoped analyses, and dashboard specs from real work inputs.

OpenAI · 6 days ago

How business operations teams use Codex

See how business operations teams can use Codex to create initiative briefs, strategy updates, leadership decision packets, progress updates, and more from real work inputs.

OpenAI · 6 days ago

A new personal finance experience in ChatGPT

Preview a new personal finance experience in ChatGPT for Pro users in the U.S. Securely connect your financial accounts and get AI-powered insights and guidance grounded in your financial context, goals, and priorities.

Meituan · 6 days ago

Meituan LongCat open-sources General 365: Setting a new benchmark for reasoning evaluation

The Meituan LongCat team officially released General 365. We found that in testing 26 mainstream models, the current top-performing Gemini 3 Pro achieved an accuracy of only 62.8%, while the vast majority of models did not even reach the 60‑point passing line.

Thursday, May 14 · 2026-05-14 · 14 posts

AWS · 6 days ago

Amazon Bedrock introduces new advanced prompt optimization and migration tool

Amazon Bedrock Advanced Prompt Optimization enables customers to optimize their prompts for their current model or migrate prompts to new models faster than before with built-in evaluation feedback loops. Optimize your prompts and compare results for up to 5 models simultaneously…

GitHub · 6 days ago

GitHub availability report: April 2026

In April, we experienced 10 incidents that resulted in degraded performance across GitHub services. The post GitHub availability report: April 2026 appeared first on The GitHub Blog.

OpenAI · 6 days ago

Sea's View on the Future of Agentic Software Development with Codex

Sea Limited's CPO explains why the company is deploying Codex across engineering teams to accelerate AI-native software development in Asia.

Hugging Face · 6 days ago

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

AWS ML · 6 days ago

Improve bot accuracy with Amazon Lex Assisted NLU

In this post, you will learn how to implement Assisted NLU effectively. You will learn how to improve your bot design with effective intent and slot descriptions, validate your implementation using Test Workbench, and plan your transition from traditional NLU to Assisted NLU for…

AWS ML · 6 days ago

Real-time voice agents with Stream Vision Agents and Amazon Nova 2 Sonic

In this post, you learn how to combine Stream's Vision Agents open-source framework with Amazon Bedrock and Amazon Nova 2 Sonic to build real-time voice agents that can be production-ready in minutes. You'll learn how the integration works under the hood, walk through code exampl…

AWS ML · 6 days ago

From siloed data to unified insights: Cross-account Athena Access for Amazon Quick

Today, we're announcing cross-account Athena access for Amazon Quick. With this feature, customers can query Athena data in other AWS accounts using AWS Identity and Access Management (IAM) role chaining, with query costs billed to the account where the data resides.

AWS ML · 6 days ago

Control where your AI agents can browse with Chrome enterprise policies on Amazon Bedrock AgentCore

In this post, you will configure Chrome enterprise policies to restrict a browser agent to a specific website, observe the policy enforcement through session recording, and demonstrate custom root CA certificates using a public test site. The walkthrough produces a working soluti…

GitHub · 6 days ago

From latency to instant: Modernizing GitHub Issues navigation performance

How the GitHub Issues team used client-side caching, smart prefetching, and service workers to make navigation feel instant. The post From latency to instant: Modernizing GitHub Issues navigation performance appeared first on The GitHub Blog.
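The pairing of client-side caching with prefetching in the GitHub Issues post can be illustrated with a small, language-agnostic sketch, written in Python even though a browser implementation would be JavaScript. The class name, the `fetch` callback, and the LRU eviction policy are assumptions for illustration, not GitHub's implementation, and the service-worker layer is not modelled.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class PrefetchingLRUCache:
    """Hypothetical client-side cache: repeat visits are served from memory,
    and entries can be warmed ahead of time (e.g. when a user hovers a link)."""

    def __init__(self, fetch: Callable[[Hashable], Any], capacity: int = 100):
        self._fetch = fetch            # slow path, e.g. a network request
        self._capacity = capacity
        self._entries: OrderedDict = OrderedDict()

    def _store(self, key: Hashable, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def get(self, key: Hashable) -> Any:
        if key in self._entries:
            self._entries.move_to_end(key)     # hit: mark as recently used
            return self._entries[key]
        value = self._fetch(key)               # miss: take the slow path
        self._store(key, value)
        return value

    def prefetch(self, key: Hashable) -> None:
        """Fetch in advance so that a later get() is a cache hit."""
        if key not in self._entries:
            self._store(key, self._fetch(key))


calls = []


def fake_fetch(issue_number):
    calls.append(issue_number)                 # record every slow-path fetch
    return {"number": issue_number}


cache = PrefetchingLRUCache(fake_fetch, capacity=2)
cache.prefetch(1)   # e.g. the user hovers over issue #1
cache.get(1)        # the click is served from the cache
print(calls)        # [1]
```

Prefetching moves the network wait to before the user navigates; the bounded LRU keeps memory use predictable at the cost of re-fetching evicted pages.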

Amazon Science · 6 days ago

Promptimus: Improving already good LLM prompts with zero manual engineering

By focusing on specific failure points and suggesting targeted solutions, a new automated prompt-engineering framework improves prompt performance without compromising existing functionality.

OpenAI · 6 days ago

Work with Codex from anywhere

Use Codex anywhere with the ChatGPT mobile app. Monitor, steer, and approve coding tasks in real time across devices and remote environments.

Cloudflare · 6 days ago

Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse

When a partitioning change to our petabyte-scale ClickHouse cluster caused critical billing jobs to stall, standard metrics showed no obvious errors. This post explores how we identified severe lock contention in ClickHouse's query planner and built upstream patches to fix it.

OpenAI · 7 days ago

Helping ChatGPT better recognize context in sensitive conversations

Learn how new ChatGPT safety updates improve context awareness in sensitive conversations, helping detect risk over time and respond more safely.

Hugging Face · 7 days ago

Unlocking asynchronicity in continuous batching

Wednesday, May 13 · 2026-05-13 · 12 posts

AWS ML · 7 days ago

Build financial document processing with Pulse AI and Amazon Bedrock

This post demonstrates how to build a document extraction and model fine-tuning pipeline that addresses the challenges of processing complex financial documents. By combining Pulse AI's advanced document understanding capabilities with the powerful AI services of Amazon Be…

AWS ML · 7 days ago

Build real-time voice streaming applications with Amazon Nova Sonic and WebRTC

Building end-to-end live streaming applications with real-time voice interaction presents several challenges. This post introduces a solution based on Amazon Nova 2 Sonic (Nova Sonic) and Amazon Kinesis Video Streams WebRTC (WebRTC) that addresses these challenges. In this post,…

AWS ML · 7 days ago

Securing AI agents: How AWS and Cisco AI Defense scale MCP and A2A deployments

The Cisco and AWS partnership addresses three challenges enterprises face when scaling AI agents: visibility gaps, security bottlenecks, and compliance risks. In this post, we explore how you can overcome AI security challenges through automated scanning and unified governance.

AWS ML · 7 days ago

Fine-tune LLM with Databricks Unity Catalog and Amazon SageMaker AI

In this post, we demonstrate how to build a secure, complete LLM fine-tuning workflow that integrates Unity Catalog with Amazon SageMaker AI using Amazon EMR Serverless for preprocessing. The solution shows how to securely access governed data, maintain lineage across services, f…

Microsoft Research · 7 days ago

mimalloc: A new, high-performance, scalable memory allocator for the modern era

mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projects. It provides bounded worst-case alloca…

Airbnb · 7 days ago

Viaduct 1.0 and the future of Airbnb’s data mesh

Moving from an internal tool to a community-driven, production-ready data mesh. By: Ryan Tanner, Raymie Stata, Adam Miskiewicz. Introduction: We’re excited to announce the 1.0 release of Viaduct. This release marks a shift from Viaduct being an Airbnb-internal tool that happ…

Microsoft Research · 7 days ago

GridSFM: A new, small foundation model for the electric grid

Introducing GridSFM, a small foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings. Learn how GridSFM gives grid operators direct visibility into congestion, stability, and system health. The post GridSFM: A new, s…

GitHub · 7 days ago

Dungeons & Desktops: 10 roguelikes that never die (because their communities won’t let them)

Roguelikes don’t die. They fork, mutate, get argued over, rewritten, abandoned, and revived again. Sometimes all at once. The post Dungeons & Desktops: 10 roguelikes that never die (because their communities won’t let them) appeared first on The GitHub Blog.

Meta · 7 days ago

Reel Friends: Building Social Discovery that Scales to Billions

On its face the new Friend Bubbles feature looks simple enough. It highlights Reels your friends have watched and reacted to. But sometimes the features that seem the most straightforward require the deepest engineering work. On this episode of the Meta Tech Podcast, Pascal Harti…

Cloudflare · 7 days ago

Browser Run: now running on Cloudflare Containers, it’s faster and more scalable

We’ve enabled higher usage limits, faster performance, better reliability, and increased shipping velocity for our Browser Run product by rebuilding on top of Cloudflare’s Containers. Here’s how.

OpenAI · 7 days ago

Building a safe, effective sandbox to enable Codex on Windows

Learn how OpenAI built a safe, effective sandbox to enable Codex on Windows with controlled file access and network limits.

OpenAI · 8 days ago

Our response to the TanStack npm supply chain attack

OpenAI details its response to the TanStack “Mini Shai-Hulud” supply chain attack, outlines protections taken to secure systems and signing certificates, and explains why macOS users must update OpenAI apps by June 12, 2026. Learn what happened, what was affected, and how OpenAI…

Tuesday, May 12 · 2026-05-12 · 12 posts

GitHub · 8 days ago

GitHub Copilot individual plans: Introducing flex allotments in Pro and Pro+, and a new Max plan

Starting June 1, our lineup of individual plans will update based on your feedback. The post GitHub Copilot individual plans: Introducing flex allotments in Pro and Pro+, and a new Max plan appeared first on The GitHub Blog.

AWS8 天前

Amazon Redshift 推出基于 AWS Graviton 的 RG 实例，内置数据湖查询引擎Amazon Redshift introduces AWS Graviton-based RG instances with an integrated data lake query engine

Amazon Redshift RG 实例由 AWS Graviton 提供动力，运行数据仓库和数据湖工作负载的速度比 RA3 实例快最高 2.4 倍，且每 vCPU 成本降低 30%。其集成的数据湖查询引擎支持 Apache Iceberg 等开放表格式。

Amazon Redshift RG instances, powered by AWS Graviton, run data warehouse and data lake workloads up to 2.4x as fast as RA3 instances at 30% lower price per vCPU. Their integrated data lake query engine supports open table formats such as Apache Iceberg.

Pinterest8 天前

工程师提升 AI 技能指南：实现测试流程以优化代理…An Engineer’s Guide to Better AI Skills: Implementing a Testing Process to Optimize Agent…

工程师提升 AI 技能指南：在任何仓库或技能中实现测试流程以优化代理性能。作者：Daniel Reed。技术行业正经历工作方式的巨大变革，许多人正享受 AI 代理带来的好处，特别…

An Engineer’s Guide to Better AI Skills: Implementing a Testing Process to Optimize Agent Performance in Any Repository or Skill. Author: Daniel Reed. The tech industry is currently seeing a massive overhaul in the way we work and many are enjoying the benefits of AI agents, partic…

Meta8 天前

在 Meta 规模下迁移数据摄取系统Migrating Data Ingestion Systems at Meta Scale

Meta 的数据摄取系统被我们的工程团队用于获取社交图谱的最新快照，最近进行了重大改版以提升大规模下的可靠性。从旧系统迁移到新架构需要一次大规模迁移…

Meta’s data ingestion system, which our engineering teams leverage for up-to-date snapshots of the social graph, has recently undergone a significant revamp to enhance its reliability at scale. Moving from our legacy system to our new architecture required a large-scale migration…

OpenAI8 天前

财务团队如何使用 CodexHow finance teams use Codex

了解财务团队如何利用 Codex 从真实工作输入构建 MBR、报告包、差异桥、模型检查和规划情景。

See how finance teams can use Codex to build MBRs, reporting packs, variance bridges, model checks, and planning scenarios from real work inputs.

GitHub8 天前

地下城与桌面：使用 GitHub Copilot CLI 构建程序化生成的 RoguelikeDungeons & Desktops: Building a procedurally generated roguelike with GitHub Copilot CLI

了解一位 Hubber 如何使用 GitHub Copilot CLI 构建扩展，将任意代码库转化为独特的 Roguelike 地牢。文章《Dungeons & Desktops：使用 GitHub Copilot CLI 构建程序化生成的 Roguelike》首次发布于 The GitHub Blog。

Learn how one Hubber used GitHub Copilot CLI to build an extension that turns any codebase into a unique, roguelike dungeon. The post Dungeons & Desktops: Building a procedurally generated roguelike with GitHub Copilot CLI appeared first on The GitHub Blog.

DeepMind8 天前

Co-Scientist：加速研究的多代理 AI 合作伙伴Co-Scientist: A multi-agent AI partner to accelerate research

推出 Co-Scientist，这是一款基于 Gemini 构建的协作式 AI 合作伙伴，帮助研究人员加速科学突破。

Introducing Co-Scientist, a collaborative AI partner built with Gemini to help researchers accelerate scientific breakthroughs.

Microsoft Research8 天前

利用 MatterSim 推进材料领域的 AI：实验合成、更快的模拟以及多任务模型Advancing AI for materials with MatterSim: experimental synthesis, faster simulation, and multi-task models

MatterSim 正在扩展 AI 在材料科学中的应用范围——从更快的大规模模拟到 MatterSim-MT，这一新型多任务模型能够模拟除势能面之外的属性。文章《Advancing AI for materials with MatterSim: experimental synthesis, fa…》

MatterSim is expanding what AI can do for materials science—from faster large-scale simulations to MatterSim-MT, a new multi-task model for simulating properties beyond potential energy surfaces alone. The post Advancing AI for materials with MatterSim: experimental synthesis, fa…

Cloudflare8 天前

当 “idle” 并非空闲时：一次 Linux 内核优化如何演变成 QUIC 漏洞When "idle" isn't idle: how a Linux kernel optimization became a QUIC bug

我们调查了一个 bug：CUBIC 的拥塞窗口被固定在最小值，导致性能急剧下降。解决方案是正确测量空闲期间，以区分 RTT 等待时间和实际的应用空闲状态。

We investigated a bug where CUBIC's congestion window became pinned at its minimum floor, causing performance to plummet. The fix involved correctly measuring idle periods to distinguish RTT wait times from actual application idleness.
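To make the idle-versus-RTT-wait distinction concrete, here is a minimal Python sketch using the textbook CUBIC window curve W(t) = C(t - K)^3 + W_max with the RFC 9438 constants C = 0.4 and β = 0.7. The `CubicFlow` class, its fields, and the rule "a gap counts as idle only when nothing is in flight" are illustrative assumptions, not Cloudflare's code or the Linux kernel's. The sketch shows one way misclassifying RTT waits as idle can freeze growth: each epoch shift pushes t back toward zero. In this toy model the window stalls at β·W_max rather than at a minimum floor; the point is only that subtracting RTT waits stops the window from growing.

```python
from dataclasses import dataclass

C = 0.4      # CUBIC scaling constant (RFC 9438)
BETA = 0.7   # multiplicative decrease factor (RFC 9438)


@dataclass
class CubicFlow:
    """Hypothetical per-connection state, for illustration only."""
    w_max: float             # window (segments) just before the last reduction
    epoch_start: float       # time (s) the current growth epoch began
    last_send: float         # time (s) of the most recent transmission
    bytes_in_flight: int = 0

    def k(self) -> float:
        # Time the cubic curve needs to climb back to w_max.
        return (self.w_max * (1 - BETA) / C) ** (1 / 3)

    def window(self, now: float) -> float:
        # Standard CUBIC curve: W(t) = C * (t - K)^3 + W_max.
        t = now - self.epoch_start
        return C * (t - self.k()) ** 3 + self.w_max

    def on_send(self, now: float, treat_every_gap_as_idle: bool = False) -> None:
        gap = now - self.last_send
        app_idle = self.bytes_in_flight == 0
        if gap > 0 and (app_idle or treat_every_gap_as_idle):
            # Genuine application idleness: slide the epoch forward so the
            # quiet period does not count as growth time.
            self.epoch_start += gap
        # Otherwise the gap was spent waiting on ACKs (an RTT wait), which is
        # normal growth time and must not be subtracted.
        self.last_send = now


def simulate(misclassify: bool) -> float:
    # A sender that always has data in flight, transmitting every 100 ms for 5 s.
    flow = CubicFlow(w_max=100.0, epoch_start=0.0, last_send=0.0, bytes_in_flight=1500)
    now = 0.0
    for _ in range(50):
        now += 0.1
        flow.on_send(now, treat_every_gap_as_idle=misclassify)
    return flow.window(now)
```

With correct classification the window climbs back past W_max after about K ≈ 4.2 s; when every gap is treated as idle, t never advances and the window stays at 0.7 × 100 = 70 segments.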

OpenAI9 天前

NVIDIA 工程师和研究人员如何使用 CodexHow NVIDIA engineers and researchers build with Codex

团队使用 Codex 与 GPT-5.5 来交付生产系统，并将研究想法转化为可运行的实验。

Teams use Codex with GPT-5.5 to ship production systems and turn research ideas into runnable experiments.

OpenAI9 天前

AutoScout24 通过 AI 驱动的工作流实现工程规模化AutoScout24 scales engineering with AI-powered workflows

了解 AutoScout24 Group 如何使用 Codex 和 ChatGPT 加速开发周期、提升代码质量并扩大 AI 采纳。

Learn how AutoScout24 Group uses Codex and ChatGPT to speed development cycles, improve code quality, and expand AI adoption.

OpenAI9 天前

Parameter Golf 教会我们关于 AI 辅助研究的内容What Parameter Golf taught us about AI-assisted research

Parameter Golf 汇聚了 1,000 多名参与者和 2,000 多份提交，旨在在严格约束下探索 AI 辅助的机器学习研究、编码代理、量化以及新颖模型设计。

Parameter Golf brought together 1,000+ participants and 2,000+ submissions to explore AI-assisted machine learning research, coding agents, quantization, and novel model design under strict constraints.

5月11日周一 · 2026-05-11 · 12 篇

Netflix9 天前

数据项目：在 Netflix 规模下管理数据资产Data Projects: Managing Data Assets at Netflix Scale

作者：Amer Hesson、Marcelo Mayworm、James Mulcahy 和 Brittany Truong。问题：在 Netflix 规模下管理资产。Netflix 的数据平台规模庞大。我们在数据仓库中拥有数百万张表，数万条计划任务在我们的编排系统中运行……

By Amer Hesson, Marcelo Mayworm, James Mulcahy, and Brittany Truong. The Problem: Managing Assets at Netflix Scale. Netflix’s Data Platform is vast. We have millions of tables in our data warehouse and tens of thousands of scheduled workloads running across our orchestration sys…

Hugging Face9 天前

在 AWS 上进行基础模型训练与推理的构建模块Building Blocks for Foundation Model Training and Inference on AWS

Microsoft Research9 天前

SocialReasoning-Bench：衡量 AI 代理是否为用户的最佳利益行事SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests

使用 SocialReasoning Bench，我们观察到模型之间出现了稳定的模式——代理执行得很熟练，但即使在明确指示优化用户利益的情况下，也未能始终提升用户的处境。文章 SocialReasoning-Bench：衡量 AI 代理是否为用户的最佳利益行事…

Using SocialReasoning Bench, we observed a stable pattern across models—agents execute competently, but fail to consistently improve the user’s position, even with explicit instructions to optimize for user interest. The post SocialReasoning-Bench: Measuring whether AI agents act…

AWS9 天前

AWS 每周汇总：Amazon Bedrock AgentCore 支付、AWS 代理工具包等（2026 年 5 月 11 日）AWS Weekly Roundup: Amazon Bedrock AgentCore payments, Agent Toolkit for AWS, and more (May 11, 2026)

我上周最激动的消息：Amazon Bedrock AgentCore 预览了首个托管支付功能，使 AI 代理能够自主访问并支付 API、MCP 服务器、网页内容以及其他代理。该功能与 Coinbase 和 Stripe 合作构建，消除了未…

My most exciting news of last week: Amazon Bedrock AgentCore previewed the first managed payment capabilities enabling AI agents to autonomously access and pay for APIs, MCP servers, web content, and other agents. Built in partnership with Coinbase and Stripe, it removes the undi…

Meta9 天前

Labyrinth 1.1：让端到端加密备份更可靠Labyrinth 1.1: Making End-to-End Encrypted Backups Even More Reliable

我们正在推出 Labyrinth 1.1 版，这是一套加密存储系统和协议，用于保护 Messenger 上的消息和历史记录。Labyrinth 1.1 通过全新子协议提升了端到端加密备份的可靠性，使消息在设备丢失或…

We’re rolling out version 1.1 of Labyrinth, the encrypted storage system and protocol that secures messages and history on Messenger. Labyrinth 1.1 enhances the reliability of end-to-end encrypted backups with a new sub-protocol that helps messages survive the loss of a device, a…

GitHub9 天前

GitHub 入门指南：开始参与开源贡献GitHub for Beginners: Getting started with OSS contributions

了解如何寻找为开源社区做出贡献的机会。文章 GitHub 入门指南：开始参与开源贡献首发于 The GitHub Blog。

Learn how to find opportunities to contribute to the open source community. The post GitHub for Beginners: Getting started with OSS contributions appeared first on The GitHub Blog.

OpenAI9 天前

ChatGPT 采用在 2026 年初的扩展How ChatGPT adoption broadened in early 2026

ChatGPT 采用在 2026 年第一季度激增，35 岁以上用户增长最快，性别使用更趋平衡，显示出更广泛的主流 AI 采用。

ChatGPT adoption surged in Q1 2026, with fastest growth among users over 35 and more balanced gender usage, signaling broader mainstream AI adoption.

OpenAI9 天前

OpenAI 校园网络：学生社团兴趣表单OpenAI Campus Network: Student club interest form

加入 OpenAI 校园网络——连接全球学生社团，获取 AI 工具，举办活动，打造 AI 驱动的校园社区。

Join the OpenAI Campus Network—connect student clubs worldwide, access AI tools, host events, and build an AI-powered campus community.

OpenAI9 天前

企业如何扩展 AIHow enterprises are scaling AI

企业扩展 AI 的路径：从早期实验到通过信任、治理、工作流设计以及规模化质量实现复合影响。

How enterprises scale AI: from early experiments to compounding impact through trust, governance, workflow design, and quality at scale.

OpenAI10 天前

OpenAI 推出 DeployCo，帮助企业围绕智能构建OpenAI launches DeployCo to help businesses build around intelligence

OpenAI 推出 DeployCo，这是一家新企业部署公司，旨在帮助组织将前沿 AI 投入生产并转化为可衡量的业务影响。

OpenAI launches DeployCo, a new enterprise deployment company built to help organizations bring frontier AI into production and turn it into measurable business impact.

Stripe10 天前

来自 Sessions 2026 的五条垂直 SaaS 洞见Five vertical SaaS insights from Sessions 2026

AI 正在迫使平台超越纯软件范畴。了解垂直 SaaS 平台如何利用支付、金融服务和代理商务构建更持久的业务。

AI is forcing platforms to expand beyond pure software. See how vertical SaaS platforms are using payments, financial services, and agentic commerce to build more durable businesses.

Apple ML10 天前

BalCapRL：一种用于基于 RL 的 MLLM 图像字幕的平衡框架BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning

图像字幕是计算机视觉最基础的任务之一。由于其开放式特性，在多模态大语言模型（MLLM）时代受到广泛关注。为了追求更细致、准确的字幕，近期工作已经…

Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate captions, recent work has increasi…

5月8日周五 · 2026-05-08 · 8 篇

Microsoft Research12 天前

大规模构建真实电力输电网数据集：来自开放数据集的流水线Building realistic electric transmission grid dataset at scale: a pipeline from open dataset

微软研究院欣然发布一套基于公开数据的美国电网近似输电拓扑开放数据集。研究输电层级电网行为的能力对现代电力系统研究至关重要。对…的分析

Microsoft Research is excited to release an open dataset of approximate transmission topology of the U.S. power grid derived from publicly available data. The ability to study transmission-level power grid behavior is essential for modern power systems research. Analyses of conge…

Pinterest12 天前

Huiqin Xin | 机器学习工程师 II，Ads Vertical Modeling；Lakshmi Manoharan | 高级机器学习工程师，Ads Vertical Modeling；Karthik Jayasurya | 资深机器学习工程师，Ads Signals；Ziwei Guo | 高级机器学习工程师，Ads Vertical Modeling；Al…

Huiqin Xin | Machine Learning Engineer II, Ads Vertical Modeling; Lakshmi Manoharan | Senior Machine Learning Engineer, Ads Vertical Modeling; Karthik Jayasurya | Staff Machine Learning Engineer, Ads Signals; Ziwei Guo | Senior Machine Learning Engineer, Ads Vertical Modeling; Al…

Netflix12 天前

使用 Nebula ArchRules 扩展 ArchUnitScaling ArchUnit with Nebula ArchRules

作者：John Burns 与 Emily Yuan。引言：在 Netflix，我们采用 polyrepo 策略管理数万个 Java 仓库。这意味着我们需要在这些仓库之间共享通用构建逻辑。在 Java Platform 的 JVM Ecosystem 团队中，我们…

By John Burns and Emily Yuan. Introduction: At Netflix, we operate using a polyrepo strategy with tens of thousands of Java repositories. This means that we need to have ways of sharing common build logic across these repositories. On the JVM Ecosystem team within Java Platform, we…

OpenAI12 天前

在 OpenAI 安全运行 CodexRunning Codex safely at OpenAI

OpenAI 如何通过沙箱、审批、网络策略和原生代理遥测安全运行 Codex，以支持安全合规的编码代理采纳。

How OpenAI runs Codex securely with sandboxing, approvals, network policies, and agent-native telemetry to support safe and compliant coding agent adoption.

Apple ML13 天前

基于多视角捕获的大规模高质量 3D 高斯头部重建Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures

我们提出 HeadsUp，这是一种可扩展的前馈方法，用于从大规模多摄像头设置中重建高质量 3D 高斯头部。我们的方法采用高效的编码器‑解码器架构，将输入视图压缩为紧凑的潜在表示。该潜在…

We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation. This latent re…

Apple ML13 天前

Apple 2026 隐私保护机器学习与 AI 研讨会Apple Workshop on Privacy-Preserving Machine Learning & AI 2026

在 Apple，我们认为隐私是基本人权。随着 AI 能力的提升并日益融入人们的日常生活，推进隐私保护技术的研究变得愈发重要，以确保在用户享受创新 AI 的同时，隐私得到保障。该…

At Apple, we believe privacy is a fundamental human right. As AI capabilities increase and become more integrated into people’s daily lives, advancing research in privacy-preserving techniques is increasingly important to ensure privacy is protected while users enjoy innovative A…

Apple ML13 天前

Velox：学习4维几何与外观的表征Velox: Learning Representations of 4D Geometry and Appearance

我们提出一个用于学习4维对象潜在表征的框架，该框架具备描述性，能够忠实捕捉对象的几何形状和外观；压缩性，有助于下游效率；以及可访问性，仅需最少输入，即非结构化的动态点云，…

We introduce a framework for learning latent representations of 4D objects which are descriptive, faithfully capturing object geometry and appearance; compressive, aiding in downstream efficiency; and accessible, requiring minimal input, i.e., an unstructured dynamic point cloud,…

Apple ML13 天前

RVPO：通过方差正则化实现风险敏感对齐RVPO: Risk-Sensitive Alignment via Variance Regularization

当前无critic的RLHF方法通过算术平均聚合多目标奖励，使其容易出现约束忽视：某一目标的高幅度成功可能在数值上抵消其他目标的关键失败（例如安全或格式），掩盖低表现…

Current critic-less RLHF methods aggregate multi-objective rewards via an arithmetic mean, leaving them vulnerable to constraint neglect: high-magnitude success in one objective can numerically offset critical failures in others (e.g., safety or formatting), masking low-performin…
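The constraint-neglect failure is easy to reproduce with three numbers. The sketch below scores two hypothetical responses on three objectives (the labels helpfulness, style, and safety are illustrative) using a plain arithmetic mean and a mean-minus-variance score. The penalty form mean(r) - λ·Var(r) and the value λ = 0.5 are assumptions chosen for illustration; the summary does not give RVPO's exact objective.

```python
from statistics import fmean, pvariance


def mean_reward(rewards: list[float]) -> float:
    # Critic-less baseline: every objective averaged with equal weight.
    return fmean(rewards)


def variance_regularized_reward(rewards: list[float], lam: float = 0.5) -> float:
    # Risk-sensitive aggregation: penalize dispersion across objectives so a
    # large success on one objective cannot silently offset a failure on another.
    return fmean(rewards) - lam * pvariance(rewards)


# Per-objective rewards: (helpfulness, style, safety)
lopsided = [1.0, 1.0, -0.8]   # strong on two objectives, fails safety
balanced = [0.4, 0.4, 0.4]    # moderate everywhere

print(mean_reward(lopsided), mean_reward(balanced))
print(variance_regularized_reward(lopsided), variance_regularized_reward(balanced))
```

Both responses average 0.4, so the mean cannot tell them apart. The lopsided vector has a population variance of 0.72, so the penalized score drops it to about 0.04 while the balanced response keeps 0.4.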

5月7日周四 · 2026-05-07 · 10 篇

Cloudflare13 天前

为未来而建Building for the future

今天下午，我们向全球团队发送了以下邮件。Cloudflare 的核心价值之一是透明，我们认为让你直接听到我们的说明很重要，因为这是 Cloudflare 的重要时刻。

This afternoon, we sent the following email to our global team. One of our core values at Cloudflare is transparency, and we believe it's important that you hear this directly from us because it’s a major moment at Cloudflare.

OpenAI13 天前

使用 GPT-5.5 和 GPT-5.5-Cyber 扩展网络安全可信访问Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber

OpenAI 使用 GPT-5.5 和 GPT-5.5-Cyber 扩展网络安全可信访问，帮助已验证的防御者加速漏洞研究并保护关键基础设施。

OpenAI expands Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber, helping verified defenders accelerate vulnerability research and protect critical infrastructure.

Cloudflare13 天前

Cloudflare 如何应对 “Copy Fail” Linux 漏洞How Cloudflare responded to the “Copy Fail” Linux vulnerability

当一项关键的 Linux 内核提权漏洞被公开披露时，Cloudflare 的安全和工程团队在全球网络中检测、调查并缓解了该威胁，确认未对客户造成任何影响，也未出现恶意利用。

When a critical Linux kernel privilege escalation was publicly disclosed, Cloudflare's security and engineering teams detected, investigated, and mitigated the threat across our global fleet, confirming zero customer impact and no malicious exploitation.

OpenAI13 天前

Parloa 打造客户想要对话的服务代理Parloa builds service agents customers want to talk to

Parloa 利用 OpenAI 模型为可扩展的语音驱动 AI 客服代理提供动力，使企业能够设计、模拟并部署可靠的实时交互。

Parloa leverages OpenAI models to power scalable, voice-driven AI customer service agents, enabling enterprises to design, simulate, and deploy reliable, real-time interactions.

OpenAI13 天前

通过 API 中的新模型推进语音智能Advancing voice intelligence with new models in the API

探索 OpenAI API 中能够推理、翻译和转录语音的实时新模型，打造更自然、更智能的语音体验。

Explore new realtime voice models in the OpenAI API that can reason, translate, and transcribe speech, enabling more natural and intelligent voice experiences.

OpenAI14 天前

在 ChatGPT 中推出 Trusted ContactIntroducing Trusted Contact in ChatGPT

在 ChatGPT 中推出 Trusted Contact，这是一项可选的安全功能，当检测到严重自残风险时会通知您信任的联系人。

Introducing Trusted Contact in ChatGPT, an optional safety feature that notifies someone you trust if serious self-harm concerns are detected.

美团14 天前

用Agent评测思路管理AI Coding —— 31万行代码AI重构的实践Managing AI Coding with an Agent Evaluation Approach: Refactoring 310,000 Lines of Code with AI in Practice

当 90% 以上代码由 AI 生成，决定系统走向的不是谁写得更快，而是约束 AI 的能力。没有统一规范，AI 只会成倍放大混乱。本文基于 31 万行代码重构实践，分享我们如何用 Agent 评测思路管理 AI Coding——通过技术债梳理、建设Rule、重构 SOP 和 Pre-PR 机制，把重构从高成本专项变成随迭代持续推进的日常动作。

When more than 90% of the code is generated by AI, the direction of the system is determined not by who writes faster but by the ability to constrain the AI. Without unified standards, AI only multiplies the chaos. Drawing on the refactoring of 310,000 lines of code, this article shares how we manage AI coding with an Agent evaluation approach: cataloguing technical debt, building Rules, defining a refactoring SOP, and adding a pre-PR mechanism. Together these turn refactoring from a costly one-off project into a routine activity that advances with every iteration.

Apple ML14 天前

实用学习图像压缩的关键要素What Matters in Practical Learned Image Compression

相较于硬编码的传统编解码器，学习型编解码器的主要区别之一在于它们能够直接针对人类视觉系统进行优化。尽管具备这种潜力，仍未出现兼具感知效果和实用性的图像编解码器……

One of the major differentiators unlocked by learned codecs relative to their hard-coded traditional counterparts is their ability to be optimized directly to appeal to the human visual system. Despite this potential, a perceptual yet practical image codec is yet to be proposed.…

Apple ML14 天前

文本条件化 JEPA 用于学习语义丰富的视觉表征Text-Conditional JEPA for Learning Semantically Rich Visual Representations

基于图像的联合嵌入预测架构（I‑JEPA）通过掩码特征预测提供了一种有前景的视觉自监督学习方法。然而，由于掩码位置固有的视觉不确定性，特征预测仍具挑战性，且可能会失败……

Image-based Joint-Embedding Predictive Architecture (I-JEPA) offers a promising approach to visual self-supervised learning through masked feature prediction. However with the inherent visual uncertainty at masked positions, feature prediction remains challenging and may fail to…

OpenAI14 天前

在 ChatGPT 中测试广告Testing ads in ChatGPT

OpenAI 开始在 ChatGPT 中测试广告，以支持免费使用，采用明确标识、答案独立、强隐私保护和用户控制等措施。

OpenAI begins testing ads in ChatGPT to support free access, with clear labeling, answer independence, strong privacy protections, and user control.

5月6日周三 · 2026-05-06 · 9 篇

Hugging Face14 天前

vLLM V0 到 V1：强化学习中先确保正确性再进行修正vLLM V0 to V1: Correctness Before Corrections in RL

Cloudflare14 天前

当 DNSSEC 出错时：我们如何应对 .de 顶级域中断When DNSSEC goes wrong: how we responded to the .de TLD outage

2026年5月5日，DENIC 发布了损坏的 .de 顶级域的 DNSSEC 签名，导致数百万域名无法访问。以下是 1.1.1.1 观察到的情况、serve stale 如何缓冲冲击以及我们如何恢复解析。

On May 5, 2026, DENIC published broken DNSSEC signatures for the .de TLD, making millions of domains unreachable. Here's what 1.1.1.1 saw, how serve stale cushioned the impact, and how we restored resolution.
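Serve-stale is standardized in RFC 8767. When a resolver cannot refresh an expired record, it may keep answering from cache for a bounded time. Below is a minimal Python sketch of that decision. It assumes an in-memory cache and a caller-supplied `fetch` function that raises `ResolutionError` when upstream resolution fails, such as on a DNSSEC validation failure. It also assumes a 30-second TTL on stale answers, the value RFC 8767 suggests, and a one-day stale window, which is an assumption. All names are hypothetical; this is not how 1.1.1.1 is implemented.

```python
from dataclasses import dataclass
from typing import Callable


class ResolutionError(Exception):
    """Raised by fetch() when upstream resolution or validation fails."""


@dataclass
class CacheEntry:
    answer: str
    expires_at: float  # absolute time (s) when the record's TTL runs out


STALE_ANSWER_TTL = 30        # TTL handed to clients with a stale answer
MAX_STALE_SECONDS = 86_400   # assumed limit on how long past expiry we serve


class StaleServingCache:
    def __init__(self) -> None:
        self.entries: dict[str, CacheEntry] = {}

    def resolve(self, name: str, now: float,
                fetch: Callable[[str], tuple[str, int]]) -> tuple[str, int]:
        entry = self.entries.get(name)
        if entry and now < entry.expires_at:
            # Fresh cache hit: return the remaining TTL.
            return entry.answer, int(entry.expires_at - now)
        try:
            answer, ttl = fetch(name)
        except ResolutionError:
            if entry and now - entry.expires_at <= MAX_STALE_SECONDS:
                # Upstream is broken: fall back to the expired record.
                return entry.answer, STALE_ANSWER_TTL
            raise  # nothing usable cached: surface the failure (SERVFAIL)
        self.entries[name] = CacheEntry(answer, now + ttl)
        return answer, ttl
```

The short stale TTL is the key design choice. Clients keep working during the outage but come back soon, so they pick up correct answers quickly once the upstream zone is fixed.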

AWS14 天前

AWS MCP Server 现已正式发布The AWS MCP Server is now generally available

AWS 宣布 AWS MCP Server 正式上线，这是一款托管的远程模型上下文协议（MCP）服务器，为 AI 代理和编码助手提供对所有 AWS 服务的安全、认证访问。AWS MCP Server 是 AWS 代理工具包的一部分，工具套件…

AWS announces the general availability of the AWS MCP Server, a managed remote Model Context Protocol (MCP) server that gives AI agents and coding assistants secure, authenticated access to all AWS services. The AWS MCP Server is part of the Agent Toolkit for AWS, a suite of tool…

Amazon Science14 天前

在亚马逊的中程网络中应对不确定性Navigating uncertainty in Amazon's middle-mile network

亚马逊的工程师和科学家开发了新工具，在不确定性下优化配送网络，并保持其持续适应，毫不间断。

Amazon engineers and scientists have created new tools to optimize delivery networks under uncertainty — and keep them adapting without missing a beat.

DeepMind14 天前

AlphaEvolve：我们的 Gemini 驱动编码代理如何在各领域扩大影响AlphaEvolve: How our Gemini-powered coding agent is scaling impact across fields

了解 AlphaEvolve 的 Gemini 驱动算法如何在商业、基础设施和科学领域产生影响。

Explore how AlphaEvolve's Gemini-powered algorithms are driving impact across business, infrastructure, and science.

Apple ML15 天前

SpecMD：关于投机专家预取的综合研究SpecMD: A Comprehensive Study on Speculative Expert Prefetching

Mixture-of-Experts (MoE) 模型实现稀疏专家激活，即在每次推理时仅使用模型参数的一个子集。然而，要将这种稀疏性转化为实际性能，需要专家缓存机制。此前的工作已提出…

Mixture-of-Experts (MoE) models enable sparse expert activation, meaning that only a subset of the model’s parameters is used during each inference. However, to translate this sparsity into practical performance, an expert caching mechanism is required. Previous works have propos…
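As background on what an "expert caching mechanism" means, here is a minimal Python sketch. It is a fixed-capacity LRU cache of expert weights with a speculative `prefetch` hook, and it assumes a hypothetical `load_expert` callable standing in for a host-to-GPU weight copy. It shows the general mechanism only. It is not SpecMD's design, and LRU is just one possible eviction policy.

```python
from collections import OrderedDict
from typing import Callable, Iterable


class ExpertCache:
    """Keeps at most `capacity` experts resident; evicts the least recently used."""

    def __init__(self, capacity: int, load_expert: Callable[[int], object]) -> None:
        self.capacity = capacity
        self.load_expert = load_expert  # hypothetical: copies weights to fast memory
        self.resident: OrderedDict[int, object] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _insert(self, expert_id: int) -> None:
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)   # evict the LRU expert
        self.resident[expert_id] = self.load_expert(expert_id)

    def get(self, expert_id: int) -> object:
        # Called when the router actually selects this expert for a token.
        if expert_id in self.resident:
            self.hits += 1
            self.resident.move_to_end(expert_id)  # mark as most recently used
        else:
            self.misses += 1                      # stall: weights not resident
            self._insert(expert_id)
        return self.resident[expert_id]

    def prefetch(self, predicted: Iterable[int]) -> None:
        # Speculatively load experts a predictor expects to be routed to next,
        # so the later get() is a hit instead of a stall.
        for expert_id in predicted:
            if expert_id not in self.resident:
                self._insert(expert_id)
```

A good predictor turns misses into hits. A bad one evicts experts that were about to be reused, which is why prefetch accuracy and eviction policy plausibly need to be evaluated together.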

Apple ML15 天前

从事物所在到事物用途：多模态大语言模型空间‑功能智能基准测试From Where Things Are to What They’re For: Benchmarking Spatial–Functional Intelligence for Multimodal LLMs

真正的空间智能超越了低层次的几何感知，从了解事物所在位置演进到理解其用途。虽然现有基准，如 VSI-Bench，能够有效评估这一基础几何阶段，但它们仍然…

True spatial intelligence for multimodal agents transcends low-level geometric perception, evolving from knowing where things are to understanding what they are for. While existing benchmarks, such as VSI-Bench, effectively evaluate this foundational geometric stage, they fall sh…

Apple ML15 天前

迭代去噪的归一化流Normalizing Flows with Iterative Denoising

归一化流（NFs）是一类经典的基于似然的方法，近期重新受到关注。诸如 TARFlow 的最新工作表明，NFs 能在图像建模任务中取得令人满意的性能，成为其他元…的可行替代方案

Normalizing Flows (NFs) are a classical family of likelihood-based methods that have received revived attention. Recent efforts such as TARFlow have shown that NFs are capable of achieving promising performance on image modeling tasks, making them viable alternatives to other met…
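For reference, the "likelihood-based" property of normalizing flows comes from the change-of-variables formula. Suppose an invertible, differentiable map f sends a data point x to a base variable z = f(x) with a simple density p_Z, typically a standard Gaussian. The exact log-likelihood is then given by the first identity below. When f is a composition of K invertible layers, the log-determinant terms add. This is the standard identity, not a formula specific to TARFlow or to this paper.

```latex
\log p_X(x) = \log p_Z\bigl(f(x)\bigr) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|,
\qquad
f = f_K \circ \cdots \circ f_1 \;\Rightarrow\;
\log\left|\det \frac{\partial f(x)}{\partial x}\right|
  = \sum_{k=1}^{K} \log\left|\det \frac{\partial h_k}{\partial h_{k-1}}\right|,
\quad h_0 = x,\; h_k = f_k(h_{k-1}).
```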

Hugging Face15 天前

将 Benchmaxxer Repellant 添加到 Open ASR 排行榜Adding Benchmaxxer Repellant to the Open ASR Leaderboard

5月5日周二 · 2026-05-05 · 5 篇

AWS15 天前

现代化工作流：Amazon WorkSpaces 现为 AI 代理提供专属桌面（预览）Modernize your workflows: Amazon WorkSpaces now gives AI agents their own desktop (preview)

Amazon WorkSpaces 现在允许 AI 代理在现有安全框架下，通过 IAM 认证、MCP 支持和计算机视觉，安全运行传统桌面应用——无需 API 或现代化改造

Amazon WorkSpaces now lets AI agents securely operate legacy desktop applications—without APIs or modernization—using IAM authentication, MCP support, and computer vision within existing security frameworks.

Airbnb15 天前

在规模化环境中可靠监控Monitoring reliably at scale

在其他一切失效时仍能工作的监控设计。作者：Abdurrahman J. Allawala。引言：当事故发生时，团队依赖可观测性来回答唯一重要的问题：哪里出了问题，为什么？监控系统旨在帮助你回答这些问题…

Designing monitoring that works when everything else doesn’t. By: Abdurrahman J. Allawala. Introduction: When an incident hits, teams lean on observability to answer the only questions that matter: what’s broken, and why? Monitoring systems are designed to help you answer these qu…

Microsoft Research15 天前

Microsoft参加2026年NSDI：大规模网络系统的进展Microsoft at NSDI 2026: Advances in large-scale networked systems

Microsoft研究人员在NSDI ’26期间分享了构建和运营大规模分布式系统的最新进展，涵盖数据中心、网络以及与AI日益交叉的领域。文章《Microsoft参加2026年NSDI：大规模网络系统的进展》首次发表于Mic…

Microsoft researchers share advances in building and operating large-scale distributed systems, spanning datacenters, networking, and the growing intersection with AI during NSDI ’26. The post Microsoft at NSDI 2026: Advances in large-scale networked systems appeared first on Mic…

Slack15 天前

从SSH到REST：以安全为驱动的Slack EMR数据管道现代化From SSH to REST: A Security-Driven Modernization of Slack’s EMR Data Pipelines

截至2024年，Slack的数据平台已累计超过700个基于SSH的operator，编排关键数据管道。我们指的是每日搜索索引，处理数TB数据，驱动商业智能的分析作业，整个体系。每一个这些…

By 2024, Slack’s data platform had accumulated 700+ SSH-based operators orchestrating critical data pipelines. We’re talking daily search indexing that processed terabytes of data, analytics jobs powering business intelligence, the whole shebang. Every single one of these…

Amazon Science15 天前

机制设计理论如何帮助优化亚马逊与供应商的协作How mechanism design theory helps optimize Amazon-vendor collaboration

代理机制使亚马逊和供应商能够在不泄露私人信息的前提下优化供应链管理

Agentic mechanism enables Amazon and vendors to optimize supply chain management without disclosing private information.

5月4日周一 · 2026-05-04 · 3 篇

AWS16 天前

AWS 每周汇总：AWS 2026 的下一步、Amazon Quick、OpenAI 合作等（2026年5月4日）AWS Weekly Roundup: What’s Next with AWS 2026, Amazon Quick, OpenAI partnership, and more (May 4, 2026)

上周，我在英格兰约克短暂休息，这座城市常被称为全国最闹鬼的城市。我漫步于已有近千年历史的修道院遗址，沿着中世纪城墙行走，并在一次鬼魂之旅的晚上聆听流传的故事…

Last week, I took some time off in York, England, often described as the most haunted city in the country. I wandered through the ruins of abbeys that have stood for nearly a thousand years, walked along medieval walls, and spent an evening on a ghost tour hearing stories passed…

Netflix16 天前

在 Netflix 实现机器学习民主化：构建模型生命周期图Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph

Saish Sali, Nipun Kumar, Sura Elamurugu。引言：随着 Netflix 的发展，机器学习持续支持我们为会员提供价值并在业务多个领域推动卓越。当 Netflix 在十多年前开始投资机器学习时…

Saish Sali, Nipun Kumar, Sura Elamurugu. Introduction: As Netflix has grown, machine learning continues to support our ability to deliver value to members and drive excellence across multiple areas of our business. When Netflix began investing in machine learning over a decade ag…

Amazon Science16 天前

在 AI 中构建信任Building trust into AI

亚马逊的科学家和政策专家讨论公司的负责任 AI 流程如何在整个 AI 开发生命周期中嵌入安全性和价值观。

Amazon scientists and policy experts discuss how the company’s responsible-AI pipeline embeds safety and values throughout the AI development lifecycle.

5月1日周五 · 2026-05-01 · 7 篇

Cloudflare19 天前

Code Orange：Fail Small 已完成。结果是更强大的 Cloudflare 网络Code Orange: Fail Small is complete. The result is a stronger Cloudflare network

我们已完成一次大规模的工程工作，使我们的基础设施更具弹性。通过 Snapstone 和 Engineering Codex 等新工具，我们实现了更安全的配置更改并自动化最佳实践，以防止未来的事故。

We have completed a massive engineering effort to make our infrastructure more resilient. Through new tools like Snapstone and the Engineering Codex, we've implemented safer configuration changes and automated best practices to prevent future incidents.

Netflix19 天前

模型服务中的路由现状State of Routing in Model Serving

作者：Nipun Kumar、Rajat Shah、Peter Chng。引言：这是多篇系列博客的第一篇，分享我们在机器学习模型服务基础设施方面的技术洞见，说明该基础设施如何在多个领域（例如标题推荐…）大规模支持个性化体验

By Nipun Kumar, Rajat Shah, Peter Chng. Introduction: This is the first blog post in a multi-part series that shares technical insights into how our ML model serving infrastructure powers several personalized experiences at scale across various domains (e.g., title recommendation…

Google Research19 天前

通过全球合作伙伴关系和开放资源加速科学影响Catalyzing scientific impact through global partnerships and open resources

数据挖掘与建模

Data Mining & Modeling

Pinterest19 天前

优化机器学习工作负载网络效率（第一部分）：特征裁剪器Optimizing ML Workload Network Efficiency (Part I): Feature Trimmer

Guangtong Bai | Staff Software Engineer, Product ML Infrastructure*; Shantam Shorewala | Software Engineer II, Product ML Infrastructure*; Chi Zhang | Staff Software Engineer, AI Platform*; Neha Upadhyay | Software Engineer II, AI Platform*; Haoyang Li | Director, Product ML Infr…

Meta19 天前

Meta 如何加强端到端加密备份How Meta Is Strengthening End-to-End Encrypted Backups

基于 HSM 的备份密钥库。Meta 的基于 HSM 的备份密钥库为 WhatsApp 和 Messenger 的端到端加密备份提供基础。该系统允许用户使用恢复码保护其备份的消息历史，确保恢复码被存…

The HSM-based Backup Key Vault. Meta’s HSM-based Backup Key Vault provides the foundation for end-to-end encrypted backups for WhatsApp and Messenger. The system allows people to protect their backed-up message history with a recovery code, ensuring that the recovery code is store…

Spotify19 天前

使用 Claude Code 插件构建 Spotify Ads API 的自然语言接口Building a Natural Language Interface to the Spotify Ads API with Claude Code Plugins

将 OpenAPI 规范和 Markdown 文件转化为对话式广告管理工具——无需编译代码。文章《使用 Claude Code 插件构建 Spotify Ads API 的自然语言接口》首次发表于 Spotify Engineering。

Turning an OpenAPI spec and Markdown files into a conversational ads management tool, with no compiled code required. The post Building a Natural Language Interface to the Spotify Ads API with Claude Code Plugins appeared first on Spotify Engineering.

Cloudflare19 天前

推出 Dynamic Workflows：随租户而动的持久执行Introducing Dynamic Workflows: durable execution that follows the tenant

Dynamic Workflows 是一个库，可让您即时将持久执行路由到租户提供的代码。基于 Dynamic Workers 构建，它使平台能够以几乎为零的空闲成本服务数百万个独特工作流。

Dynamic Workflows is a library that lets you route durable execution to tenant-provided code on the fly. Built on Dynamic Workers, it enables platforms to serve millions of unique workflows at near-zero idle cost.
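The two ideas in that summary, routing execution to tenant-provided code and making that execution durable, can be sketched together in a few lines of Python. Everything here is hypothetical, including the `TENANT_WORKFLOWS` registry, the `Step` journal, and the workflow signature; none of it is Cloudflare's Dynamic Workflows or Workers API. Durability is modeled as a per-instance journal of completed step results. When a run is retried after a crash, finished steps are replayed from the journal rather than re-executed.

```python
from typing import Any, Callable

Workflow = Callable[["Step"], Any]


class Step:
    """Runs each named step once; replays recorded results on later attempts."""

    def __init__(self, journal: dict[str, Any]) -> None:
        self.journal = journal  # would be persisted per workflow instance in a real system

    def do(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.journal:
            return self.journal[name]  # completed earlier: replay, don't re-run
        result = fn()
        self.journal[name] = result    # checkpoint before moving on
        return result


# Hypothetical registry mapping each tenant to the workflow code it supplied.
TENANT_WORKFLOWS: dict[str, Workflow] = {}


def register(tenant: str, workflow: Workflow) -> None:
    TENANT_WORKFLOWS[tenant] = workflow


def run(tenant: str, journal: dict[str, Any]) -> Any:
    # Route this durable execution to whichever workflow the tenant provided.
    return TENANT_WORKFLOWS[tenant](Step(journal))
```

Replay of this kind assumes the workflow code is deterministic, meaning it reaches the same step names in the same order on every attempt. Any real durable-execution engine has to impose some version of that constraint on tenant code.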