1 TensorRT Multi-threaded Concurrent Inference Approach

TensorRT already speeds up model inference considerably. If, on top of that, we could run inference concurrently with TensorRT, we could both cut inference latency and raise service throughput. Wouldn't that be great?

So, can TensorRT be used for multi-threaded concurrent inference?

Let's look at what the official TensorRT developer documentation has to say:

In general, TensorRT objects are not thread safe; accesses to an object from different threads must be serialized by the client.

The expected runtime concurrency model is that different threads will operate on different execution contexts. The context contains the state of the network (activation values, and so on) during execution, so using a context concurrently in different threads results in undefined behavior.

To support this model, the following operations are thread safe:

  • Nonmodifying operations on a runtime or engine.
  • Deserializing an engine from a TensorRT runtime.
  • Creating an execution context from an engine.
  • Registering and deregistering plug-ins.

There are no thread-safety issues with using multiple builders in different threads; however, the builder uses timing to determine the fastest kernel for the parameters provided, and using multiple builders with the same GPU will perturb the timing and TensorRT’s ability to construct optimal engines. There are no such issues using multiple threads to build with different GPUs.
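A side note on the build-time point in that last paragraph: building in parallel is only advisable when every builder thread targets its own GPU, since concurrent builds on one GPU perturb the kernel timing TensorRT relies on. Below is a minimal sketch of that per-GPU build pattern, not something taken from this article: it assumes the TensorRT 8.x C++ API with the ONNX parser, and the model path, output file names, and network flag are placeholders chosen for illustration.

```cpp
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <functional>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Simple logger; when shared by several builders it may be called from
// multiple threads, so it only performs a stream insertion.
class BuildLogger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

// Build one engine on one GPU. Each thread owns its own builder, network,
// parser, and config; only the GPU index differs between threads.
static void buildOnGpu(int gpuId, BuildLogger& logger)
{
    cudaSetDevice(gpuId);  // pin this thread's build to its own GPU

    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
    const auto flags = 1U << static_cast<uint32_t>(
        nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    nvinfer1::INetworkDefinition* network = builder->createNetworkV2(flags);

    nvonnxparser::IParser* parser = nvonnxparser::createParser(*network, logger);
    parser->parseFromFile("model.onnx",  // placeholder model path
                          static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
    nvinfer1::IHostMemory* plan = builder->buildSerializedNetwork(*network, *config);

    std::ofstream out("model_gpu" + std::to_string(gpuId) + ".engine",
                      std::ios::binary);
    out.write(static_cast<const char*>(plan->data()), plan->size());

    delete plan;    // TensorRT 8.x allows plain delete; older versions use destroy()
    delete config;
    delete parser;
    delete network;
    delete builder;
}

int main()
{
    BuildLogger logger;
    int gpuCount = 0;
    cudaGetDeviceCount(&gpuCount);

    // One build thread per GPU; builds on different GPUs do not disturb
    // each other's kernel timing.
    std::vector<std::thread> builders;
    for (int gpu = 0; gpu < gpuCount; ++gpu)
        builders.emplace_back(buildOnGpu, gpu, std::ref(logger));
    for (auto& t : builders) t.join();
    return 0;
}
```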

As for the runtime side, the first sentence of the quoted documentation states it plainly: TensorRT objects are not thread safe, and it is up to us, the client, to serialize access to a given object from different threads.
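The runtime pattern the documentation expects is therefore: deserialize the engine once (deserialization, context creation, and non-modifying engine calls are all thread safe), share that engine across threads, and give every worker thread its own IExecutionContext plus its own CUDA stream. Here is a minimal sketch of that layout, not the article's full solution; it assumes the TensorRT 8.x C++ API, a pre-built "model.engine" file, and placeholder binding sizes.

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <iterator>
#include <thread>
#include <vector>

class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

int main()
{
    Logger logger;

    // Load a pre-built serialized engine from disk (placeholder path).
    std::ifstream file("model.engine", std::ios::binary);
    std::vector<char> engineData((std::istreambuf_iterator<char>(file)),
                                 std::istreambuf_iterator<char>());

    // Deserialize once and share the engine; afterwards it is only read.
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(engineData.data(), engineData.size());

    const int numThreads = 4;
    std::vector<std::thread> workers;
    for (int i = 0; i < numThreads; ++i)
    {
        workers.emplace_back([engine]() {
            // Each thread owns its context and stream; a context holds the
            // activation state and must never be shared between threads.
            nvinfer1::IExecutionContext* context = engine->createExecutionContext();
            cudaStream_t stream;
            cudaStreamCreate(&stream);

            // Device buffers for the bindings; the sizes below are placeholders,
            // real code would query the engine for binding dimensions.
            void* bindings[2];
            cudaMalloc(&bindings[0], 1 * 3 * 224 * 224 * sizeof(float));
            cudaMalloc(&bindings[1], 1 * 1000 * sizeof(float));

            // Enqueue inference on this thread's private stream and wait.
            // enqueueV2 takes the binding array; newer releases offer enqueueV3.
            context->enqueueV2(bindings, stream, nullptr);
            cudaStreamSynchronize(stream);

            cudaFree(bindings[0]);
            cudaFree(bindings[1]);
            cudaStreamDestroy(stream);
            delete context;  // TensorRT 8.x; older versions use context->destroy()
        });
    }
    for (auto& w : workers) w.join();

    delete engine;
    delete runtime;
    return 0;
}
```

Whether contexts are created per request or kept in a pool and reused is a design choice; the one hard rule from the documentation is that a single execution context is never used by two threads at the same time.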
