PyTorch - Resuming training from a checkpoint: optimizer.step() raises RuntimeError Expected all tensors to be on the same device, but found cuda:0


StubbornHuang, Pytorch, published 2023-05-08

1 Resuming training from a checkpoint: optimizer.step() raises RuntimeError Expected all tensors to be on the same device, but found cuda:0

To support resuming training in PyTorch, the checkpoint file has to store the state_dict of the model, the optimizer, and the lr_scheduler together, for example:

torch.save({
    'epoch': epoch,
    'model_state_dict': self.model.state_dict(),
    'optimizer_state_dict': self.optimizer.state_dict(),
    'scheduler_state_dict': self.lr_scheduler.state_dict(),
}, model_save_path)

Then, when loading the checkpoint, you need to restore the optimizer and lr_scheduler state in addition to the model weights, for example:

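# Assumption: the checkpoint dictionary is read back with torch.load
check_point_state_dict = torch.load(model_save_path)
# modified_weights is the project's own helper (not shown in this post)
model_weights = modified_weights(check_point_state_dict['model_state_dict'])
optimizer.load_state_dict(check_point_state_dict["optimizer_state_dict"])
lr_scheduler.load_state_dict(check_point_state_dict["scheduler_state_dict"])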
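To make the save and load steps above concrete, here is a minimal end-to-end sketch. The tiny `nn.Linear` model, the `Adam` optimizer, the `StepLR` scheduler and the temporary file path are all placeholders chosen for illustration, not the setup from the original project. The sketch also moves the model to the target device before restoring the optimizer state, which, as far as I know, lets `load_state_dict` place the optimizer's buffers next to the parameters they belong to.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical stand-ins for the project's model, optimizer and scheduler.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

# Run one training step so the optimizer actually has state to save.
loss = model(torch.randn(8, 4, device=device)).sum()
loss.backward()
optimizer.step()
lr_scheduler.step()

# Save everything needed to resume (the path is a throwaway temp file).
model_save_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save({
    "epoch": 1,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "scheduler_state_dict": lr_scheduler.state_dict(),
}, model_save_path)

# Resume: load onto the CPU, rebuild the objects, move the model first,
# and only then restore the optimizer and scheduler state.
check_point_state_dict = torch.load(model_save_path, map_location="cpu")
resumed_model = nn.Linear(4, 2)
resumed_model.load_state_dict(check_point_state_dict["model_state_dict"])
resumed_model.to(device)
resumed_optimizer = torch.optim.Adam(resumed_model.parameters(), lr=1e-3)
resumed_scheduler = torch.optim.lr_scheduler.StepLR(
    resumed_optimizer, step_size=10, gamma=0.5
)
resumed_optimizer.load_state_dict(check_point_state_dict["optimizer_state_dict"])
resumed_scheduler.load_state_dict(check_point_state_dict["scheduler_state_dict"])
start_epoch = check_point_state_dict["epoch"] + 1
```

If the order is reversed in your project (optimizer state restored first, model moved to the GPU afterwards), the optimizer buffers can stay on the CPU, which is the situation discussed next.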

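An easy mistake to make here: after loading, the tensors in the optimizer state can end up on the CPU (for example, when the checkpoint is loaded onto the CPU and the state is restored before the model is moved to the GPU), while the rest of training runs on the GPU. If the optimizer is left as it is, optimizer.step() then fails with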

RuntimeError: Expected all tensors to be on the same device, but found cuda:0
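Before changing anything, it can help to confirm where the optimizer state actually lives. The two helpers below, `optimizer_state_devices` and `parameter_devices`, are names made up for this sketch and are not part of PyTorch; they only walk `optimizer.state` and `optimizer.param_groups`. The small SGD example is likewise just a placeholder to have something to inspect.

```python
import torch
from torch import nn


def optimizer_state_devices(optimizer):
    """Return the set of device strings used by tensors in optimizer.state."""
    devices = set()
    for state in optimizer.state.values():
        for value in state.values():
            if isinstance(value, torch.Tensor):
                devices.add(str(value.device))
    return devices


def parameter_devices(optimizer):
    """Return the set of device strings used by the optimized parameters."""
    return {
        str(p.device)
        for group in optimizer.param_groups
        for p in group["params"]
    }


# Placeholder model and optimizer, kept on the CPU so this runs anywhere.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.randn(8, 4)).sum().backward()
optimizer.step()  # creates the momentum buffers

print("parameters:", parameter_devices(optimizer))
print("optimizer state:", optimizer_state_devices(optimizer))
```

If the two sets differ in your resumed run, for instance parameters on cuda:0 and momentum buffers on cpu, you are most likely looking at the cause of the error. One caveat: in recent PyTorch versions some optimizers, such as Adam, may keep a scalar step counter on the CPU on purpose, so a lone CPU entry is not always a problem.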

In practice this means the optimizer state is not on the GPU, so the fix is to move the optimizer state tensors to the GPU. Example code:

optimizer.load_state_dict(check_point_state_dict["optimizer_state_dict"])
for state in optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.to(self.output_device)

Here self.output_device is the index of the GPU used in the project.
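The same loop can be wrapped in a small reusable function. This is only a sketch: `move_optimizer_state` is a name invented here, and the model, optimizer and `output_device` below are placeholders rather than the objects from the original project. It falls back to the CPU when no GPU is available so it can be run anywhere.

```python
import torch
from torch import nn


def move_optimizer_state(optimizer, device):
    """Move every tensor stored in optimizer.state to `device`, in place."""
    for state in optimizer.state.values():
        for key, value in state.items():
            if isinstance(value, torch.Tensor):
                state[key] = value.to(device)


# Placeholder setup: the optimizer state is first created on the CPU.
output_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.randn(8, 4)).sum().backward()
optimizer.step()  # momentum buffers now exist on the CPU

model.to(output_device)  # parameters move, the optimizer state does not
move_optimizer_state(optimizer, output_device)  # bring the state along
```

Call it once, right after optimizer.load_state_dict(...), and before the first optimizer.step() of the resumed run.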

With this change in place, the error is resolved.
