Code

cookbook/models/vllm/basic_stream.py
from agno.agent import Agent
from agno.models.vllm import vLLM

agent = Agent(
    model=vLLM(id="Qwen/Qwen2.5-7B-Instruct", top_k=20, enable_thinking=False),
    markdown=True,
)
agent.print_response("Share a 2 sentence horror story", stream=True)

Usage

1

创建虚拟环境

打开 Terminal 并创建一个 python 虚拟环境。

python3 -m venv .venv
source .venv/bin/activate
2

安装依赖库

pip install -U agno openai vllm
3

启动 vLLM 服务器

vllm serve Qwen/Qwen2.5-7B-Instruct \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --dtype float16 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.9
4

运行 Agent

python cookbook/models/vllm/basic_stream.py