PyTorch – Resuming training from a checkpoint: optimizer.step() raises "RuntimeError: Expected all tensors to be on the same device, but found cuda:0" (StubbornHuang Blog)

StubbornHuang · PyTorch · Published 2023-05-08

1 Resuming training from a checkpoint: optimizer.step() raises "RuntimeError: Expected all tensors to be on the same device, but found cuda:0"

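When implementing checkpoint resumption in PyTorch, the saved checkpoint file should contain the state_dict of the model, the optimizer, and the lr_scheduler together, for example: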

torch.save({
    'epoch': epoch,
    'model_state_dict': self.model.state_dict(),
    'optimizer_state_dict': self.optimizer.state_dict(),
    'scheduler_state_dict': self.lr_scheduler.state_dict(),
}, model_save_path)

Then, when loading the checkpoint, the optimizer and lr_scheduler state must be restored in addition to the model weights, for example:

model_weights = modified_weights(check_point_state_dict['model_state_dict'])
optimizer.load_state_dict(check_point_state_dict["optimizer_state_dict"])
lr_scheduler.load_state_dict(check_point_state_dict["scheduler_state_dict"])
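The save and load fragments above can be combined into one round trip. The following is a minimal sketch, not the project's actual code: the tiny `nn.Linear` model, the `Adam` optimizer, the `StepLR` scheduler, the temporary file path, and the `resumed_*` names are all assumptions made for illustration. It moves the model to the target device before the optimizer state is restored, which is one ordering that usually keeps parameters and optimizer state on the same device.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical tiny model and training objects, used only for illustration.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

# Run one training step so the optimizer actually has state to save.
model(torch.randn(8, 4)).sum().backward()
optimizer.step()
lr_scheduler.step()

# Save everything needed to resume (same keys as in the article).
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save({
    "epoch": 1,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "scheduler_state_dict": lr_scheduler.state_dict(),
}, checkpoint_path)

# Resume: use the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
check_point_state_dict = torch.load(checkpoint_path, map_location="cpu")

resumed_model = nn.Linear(4, 2)
resumed_model.load_state_dict(check_point_state_dict["model_state_dict"])
resumed_model.to(device)  # move the model before restoring the optimizer

resumed_optimizer = torch.optim.Adam(resumed_model.parameters(), lr=1e-3)
resumed_optimizer.load_state_dict(check_point_state_dict["optimizer_state_dict"])

resumed_scheduler = torch.optim.lr_scheduler.StepLR(
    resumed_optimizer, step_size=10, gamma=0.5
)
resumed_scheduler.load_state_dict(check_point_state_dict["scheduler_state_dict"])

start_epoch = check_point_state_dict["epoch"] + 1
print("resuming from epoch", start_epoch)
```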

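A common mistake at this point is that the optimizer state ends up on the CPU (typically when the state is restored while the model parameters are still on the CPU), whereas the rest of the training continues on the GPU. If the optimizer state is left unchanged, optimizer.step() then fails with: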

RuntimeError: Expected all tensors to be on the same device, but found cuda:0
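To confirm that the optimizer state is the culprit, it can help to print which devices hold the parameters and which hold the optimizer state. The sketch below is one way to do that; the helper name `optimizer_state_devices` and the small SGD example are assumptions for illustration and are not part of PyTorch or of the original project.

```python
import torch
from torch import nn


def optimizer_state_devices(optimizer):
    """Return the set of device names that hold the optimizer's state tensors."""
    devices = set()
    for state in optimizer.state.values():
        for value in state.values():
            if isinstance(value, torch.Tensor):
                devices.add(str(value.device))
    return devices


# Hypothetical model and optimizer, kept on the CPU for this demonstration.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One step so that the momentum buffers exist.
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

param_devices = {
    str(p.device) for group in optimizer.param_groups for p in group["params"]
}
print("parameters on:", param_devices)
print("optimizer state on:", optimizer_state_devices(optimizer))
```

If the two sets differ after resuming, the state tensors need to be moved. Note that some optimizers keep a scalar `step` counter on the CPU by design, so a lone `cpu` entry is not always a sign of a problem.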

In practice this means that the optimizer state tensors are not on the GPU, so the fix is to move the optimizer state to the GPU after loading it. Example code:

optimizer.load_state_dict(check_point_state_dict["optimizer_state_dict"])
for state in optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.to(self.output_device)

Here self.output_device is the GPU index used in the project.
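The same loop can be wrapped in a small reusable function. This is a sketch under assumptions: the function name `move_optimizer_state`, the SGD-with-momentum setup, and the CPU fallback when no GPU is present are illustrative choices, not the project's code.

```python
import torch
from torch import nn


def move_optimizer_state(optimizer, device):
    """Move every tensor stored in optimizer.state to `device`, in place."""
    for state in optimizer.state.values():
        for key, value in state.items():
            if isinstance(value, torch.Tensor):
                state[key] = value.to(device)


# Use the first GPU if available; otherwise fall back to the CPU so the
# example still runs (in that case the move is a no-op).
output_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical model and optimizer whose state is first created on the CPU.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.randn(8, 4)).sum().backward()
optimizer.step()  # creates the momentum buffers on the CPU

model.to(output_device)  # parameters move, optimizer state does not
move_optimizer_state(optimizer, output_device)

# Training can now continue on the target device.
optimizer.zero_grad()
model(torch.randn(8, 4, device=output_device)).sum().backward()
optimizer.step()
```

Calling the helper right after `optimizer.load_state_dict(...)` reproduces the fix shown above.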

After this change, the error no longer occurs.
