
shenjinti posted on 2026-01-13 13:21

Tags:rust,voice-agent,webrtc,sip

https://github.com/restsend/active-call (stars are welcome)

First, a quick look at it in action: Playbook demo

The most important improvement in this release is that Silero VAD no longer depends on onnxruntime.

Silero VAD inference has been rewritten by hand in pure Rust. It is roughly 2.5x faster than the ONNX Runtime version, and its memory footprint is very low:

| VAD Engine | Implementation | Time (60s audio) | RTF (Ratio) | Note |
| --- | --- | --- | --- | --- |
| TinySilero | Rust (Optimized) | ~60.0 ms | 0.0010 | >2.5x faster than ONNX |
| ONNX Silero | ONNX Runtime | ~158.3 ms | 0.0026 | Standard baseline |
| WebRTC VAD | C/C++ (Bind) | ~3.1 ms | 0.00005 | Legacy, less accurate |
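
For readers who want to check the table, the RTF column is simply processing time divided by audio duration. The sketch below reproduces that arithmetic; the function names are made up for illustration and are not part of active-call, and the per-stream CPU estimate assumes VAD cost scales linearly with the number of streams, which is an assumption rather than a measured result.

```rust
/// Real-time factor: seconds of compute spent per second of audio (lower is better).
fn real_time_factor(processing_ms: f64, audio_secs: f64) -> f64 {
    (processing_ms / 1000.0) / audio_secs
}

/// How many times faster the candidate is than the baseline.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}

/// Rough CPU cores consumed by VAD alone for `streams` concurrent calls.
/// Assumption: cost scales linearly and nothing else (ASR, TTS, codecs) is counted.
fn vad_cpu_cores(rtf: f64, streams: u32) -> f64 {
    rtf * f64::from(streams)
}
```

With the table's numbers, `real_time_factor(60.0, 60.0)` gives 0.0010 and `speedup(158.3, 60.0)` gives about 2.6, which matches the ">2.5x" note.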

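This release also introduces a brand-new playbook. The old WebSocket API was too low-level, which made it cumbersome to try out a complete voice agent flow, so we now provide a reference setup that you can extend with Markdown: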

---
asr:
  provider: "aliyun"
llm:
  provider: "aliyun"
  model: "qwen-turbo"
tts:
  provider: "aliyun"
vad:
  provider: "silero"
denoise: true
greeting: "您好,我是您的AI助理,请问有什么可以帮您?"
interruption: "both"
recorder:
  recorderFile: "hello_{id}.wav"
---
# Role and Purpose
You are an intelligent, polite AI assistant. Your goal is to help users with their inquiries efficiently.

# Tool Usage
- When the user expresses a desire to end the conversation (e.g., "goodbye", "hang up", "I'm done"), you MUST provide a polite closing statement AND call the `hangup` tool.
- Always include your response text in the `text` field and any tool calls in the `tools` array.

# Example Response for Hanging Up:
```json
{
  "text": "很高兴能为您服务,如果您还有其他问题,欢迎随时联系。再见!",
  "tools": [{"name": "hangup"}]
}
```

---
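
To make the playbook layout concrete: the file is a front matter block between two `---` lines, followed by a Markdown prompt. The sketch below shows one way to separate the two using only the standard library. It is not active-call's actual loader; `split_playbook` and `top_level_value` are hypothetical names, and the key lookup is deliberately naive (a real implementation would use a proper YAML parser).

```rust
/// Splits a playbook into (front matter, prompt body).
/// Returns None if the text does not start with a `---` line
/// or if no closing `---` line follows a non-empty front matter.
fn split_playbook(text: &str) -> Option<(&str, &str)> {
    let rest = text.strip_prefix("---\n")?;
    // The closing delimiter is the first later line that is exactly `---`.
    let delimiter = "\n---\n";
    let end = rest.find(delimiter)?;
    Some((&rest[..end], &rest[end + delimiter.len()..]))
}

/// Looks up a top-level `key: value` scalar in the front matter.
/// Naive by design: indented (nested) lines are skipped, quotes are stripped.
fn top_level_value<'a>(front: &'a str, key: &str) -> Option<&'a str> {
    front.lines().find_map(|line| {
        if line.starts_with(' ') {
            return None; // belongs to a nested map such as `vad:`
        }
        let (k, v) = line.split_once(':')?;
        if k == key {
            Some(v.trim().trim_matches('"'))
        } else {
            None
        }
    })
}
```

Applied to the playbook above, `top_level_value(front, "interruption")` would yield `both`, while the body starting at `# Role and Purpose` is what gets sent to the LLM as the system prompt (again, assuming the loader works roughly like this).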

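We have also kept our biggest technical differentiator: active-call is the only Voice-Agent SDK with built-in SIP support, so it can connect directly to a SIP gateway and bridge AI with phone calls.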

LLM streaming output is now implemented as well, so the agent can start speaking while the model is still generating.
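
One common way to get "speak while generating" is to buffer the streamed text and hand each complete sentence to TTS as soon as it closes. The sketch below illustrates that idea only; `SentenceChunker` is a hypothetical type, not active-call's implementation, and splitting on punctuation is a simplification (it would also split inside a number like "3.14").

```rust
/// Accumulates streamed LLM text deltas and releases complete sentences.
struct SentenceChunker {
    buf: String,
}

impl SentenceChunker {
    fn new() -> Self {
        Self { buf: String::new() }
    }

    /// Feeds one delta and returns any sentences it completed.
    fn push(&mut self, delta: &str) -> Vec<String> {
        let mut out = Vec::new();
        for ch in delta.chars() {
            self.buf.push(ch);
            // Treat ASCII and full-width CJK terminators as sentence ends.
            if matches!(ch, '.' | '!' | '?' | '。' | '!' | '?') {
                let sentence = self.buf.trim().to_string();
                if !sentence.is_empty() {
                    out.push(sentence);
                }
                self.buf.clear();
            }
        }
        out
    }

    /// Flushes the trailing fragment once the stream has ended.
    fn finish(&mut self) -> Option<String> {
        let rest = self.buf.trim().to_string();
        self.buf.clear();
        if rest.is_empty() {
            None
        } else {
            Some(rest)
        }
    }
}
```

Each string returned by `push` could be sent to the TTS provider immediately, so audio for the first sentence starts playing while later tokens are still arriving.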

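Compared with frameworks such as Pipecat and LiveKit, active-call focuses more narrowly on voice calls. Its core strength is performance: a 2-core, 4 GB machine comfortably handles 200 concurrent calls and runs stably for about a month without trouble.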

The basic WebSocket API is of course still there, so you can control a call over WebSocket. It is a lower-level interface, and also the one developers tend to like best.

Judging from the call performance data, chatting with the AI is now quite smooth (latency is mostly under 800 ms).


Ext Link: https://github.com/restsend/active-call/
