Arcee Trinity Mini Inference Benchmarks on Nvidia H200 (geodd.io)
1 point by malith 5 hours ago | 1 comment
- malith 5 hours ago: We ran inference benchmarks for arcee-ai/trinity-mini on an Nvidia H200 using our DeployPad inference stack and published the full results.
Key results:
Mean tokens per second: ~114.5
Mean time to first token: 0.74 s
Under batch load, P99 tokens per second reached ~134.8.
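The report doesn't spell out how the summary statistics are aggregated, but metrics like these are typically computed over per-request samples. A minimal sketch (the function name and inputs are hypothetical, not from the benchmark code):

```python
import statistics

def summarize(tps_samples, ttft_samples):
    """Aggregate per-request benchmark samples into summary metrics.

    tps_samples: tokens-per-second measured for each request
    ttft_samples: time-to-first-token (seconds) for each request
    """
    mean_tps = statistics.mean(tps_samples)
    mean_ttft = statistics.mean(ttft_samples)
    # 99th percentile of tokens/sec, interpolated between sample points
    p99_tps = statistics.quantiles(tps_samples, n=100)[98]
    return mean_tps, mean_ttft, p99_tps
```

Note that for latency metrics P99 usually reports the worst tail, whereas a P99 of tokens per second (a throughput metric) reports the best-case tail, which is why it can exceed the mean.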
The full benchmark report, raw statistics, and methodology are available here: https://github.com/geoddllc/large-llm-inference-benchmarks/b...
Support for larger models (400B class) is planned for next week. If you want to try it yourself, you can deploy via the console at https://console.geodd.io/