1 Mediapipe HolisticTracking

Mediapipe Holistic Tracking tracks a total of 543 keypoints across the body, face, and hands in real time: 468 face landmarks, 33 pose (body) landmarks, and 21 landmarks for each hand.


The storage order of all keypoints and the annotated landmark index diagrams are covered in my other article: https://www.stubbornhuang.com/1916/

A test program for the wrapped dll is described in:

Mediapipe – Wrapping Mediapipe HolisticTracking into a dynamic library (dll/so) to embed whole-body keypoint recognition, gesture recognition, and arm raise/lower detection in desktop applications – StubbornHuang Blog

1.1 Why wrap Mediapipe Holistic Tracking

I previously wrapped Mediapipe's HandTracking feature into a dynamic library, essentially to build gesture recognition and arm raise/lower detection on top of it. Gesture recognition works fine with HandTracking, but the arm raise/lower detection mostly came down to comparing the wrist landmark's position against the frame height, which made it rather inaccurate.

Given that problem, and the experience gained from wrapping HandTracking, wrapping Mediapipe's HolisticTracking feature to detect whole-body landmarks directly means arm raise/lower detection only has to compare the y coordinates of the wrist and the elbow, which is clearly the more accurate approach.

2 Setting up the Mediapipe C++ build environment on Windows

You can refer to my article:

on building the Mediapipe C++ environment on Windows.

If you can run through that build process successfully, the build environment is set up correctly and you can move on to the steps below.

3 Wrapping Mediapipe HolisticTracking

The dll source code, bazel build files, and the test program are open-sourced on GitHub:

Stars are welcome!

The bazel build files and source code for the dll live in the repository's dll/holistic_tracking_dll folder, and the test project for the wrapped dll is the MediapipeTest.sln solution under dll_use_example.

3.1 dll interface design

The dll interface is designed as follows:

#ifndef HOLISTIC_TRACKING_API_H
#define HOLISTIC_TRACKING_API_H

#define EXPORT

/* dll export/import declarations */
#include <malloc.h>
#ifdef _WIN32
#ifdef EXPORT
#define EXPORT_API __declspec(dllexport)
#else
#define EXPORT_API __declspec(dllimport)
#endif
#else
#include <stdlib.h>

#ifdef EXPORT
#define EXPORT_API __attribute__((visibility ("default")))
#else
#endif

#endif


#ifdef __cplusplus
extern "C" {
#endif 

#ifndef EXPORT_API
#define EXPORT_API
#endif

    /*
    @brief Initialize Google Mediapipe
    @param[in] model_path Path of the holistic graph config file (pbtxt) to load
    @return Operation status
        0 failure
        1 success
    */
    EXPORT_API int MediapipeHolisticTrackingInit(const char* model_path);

    /*
    @brief Detect a single video frame
    @param[in] image_width Width of the frame
    @param[in] image_height Height of the frame
    @param[in] image_data Frame pixel data (8UC3, BGR, as produced by OpenCV)
    @param[out] detect_result Detection results: [0] left arm up/down, [1] right arm up/down,
                [2] left hand gesture, [3] right hand gesture
    @param[in] show_result_image Whether to display the rendered result image
    @return Operation status
        0 failure
        1 success
    */
    EXPORT_API int MediapipeHolisticTrackingDetectFrameDirect(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image = false);

    /*
    @brief Run detection on the local camera
    @param[in] show_image Whether to display the rendered result image
    @return Operation status
        0 failure
        1 success
    */
    EXPORT_API int MediapipeHolisticTrackingDetectCamera(bool show_image = false);

    /*
    @brief Release Google Mediapipe resources
    @return Operation status
        0 failure
        1 success
    */
    EXPORT_API int MediapipeHolisticTrackingRelease();


#ifdef __cplusplus
}
#endif 

#endif // !HOLISTIC_TRACKING_API_H

3.2 dll calling sequence

First, call MediapipeHolisticTrackingInit with the path of the holistic graph config file to initialize the holistic graph.

Then call MediapipeHolisticTrackingDetectFrameDirect for each OpenCV video frame that needs to be detected; the detection results are returned through the int array pointer you pass in. If you need the landmark coordinates of the individual body parts, you can add new detection interfaces that return coordinate data by following the project code.

Alternatively, MediapipeHolisticTrackingDetectCamera opens a camera with OpenCV inside the dll and runs holistic tracking on it directly; this function is only meant for testing.

Finally, once all video frames have been processed, call MediapipeHolisticTrackingRelease to release the graph and its resources.
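
The following is a minimal caller-side sketch, not the MediapipeTest.sln test project from the repository: it loads the dll at runtime, feeds it camera frames captured with OpenCV, and prints the four detection results. The dll name, the pbtxt path, and the function-pointer typedefs are assumptions made here for illustration.

#include <iostream>
#include <windows.h>
#include <opencv2/opencv.hpp>

// Function-pointer types matching the exported C interface above (assumed typedef names).
typedef int (*InitFunc)(const char*);
typedef int (*DetectFrameFunc)(int, int, void*, int*, bool);
typedef int (*ReleaseFunc)();

int main()
{
    // Hypothetical locations; adjust to wherever the dll and the graph pbtxt actually live.
    HMODULE dll = LoadLibraryA("MediapipeHolisticTracking.dll");
    if (!dll) { std::cout << "failed to load dll" << std::endl; return -1; }

    InitFunc init = (InitFunc)GetProcAddress(dll, "MediapipeHolisticTrackingInit");
    DetectFrameFunc detect = (DetectFrameFunc)GetProcAddress(dll, "MediapipeHolisticTrackingDetectFrameDirect");
    ReleaseFunc release = (ReleaseFunc)GetProcAddress(dll, "MediapipeHolisticTrackingRelease");
    if (!init || !detect || !release) { std::cout << "missing export" << std::endl; return -1; }

    if (init("holistic_tracking_cpu.pbtxt") != 1) { std::cout << "init failed" << std::endl; return -1; }

    cv::VideoCapture capture(0);
    cv::Mat frame;
    int detect_result[4] = { 0 };  // left arm, right arm, left hand gesture, right hand gesture
    while (capture.read(frame))
    {
        // The dll expects 8UC3 BGR data, which is exactly what cv::VideoCapture delivers.
        if (detect(frame.cols, frame.rows, frame.data, detect_result, true) == 1)
        {
            std::cout << "left arm: " << detect_result[0]
                      << "  right arm: " << detect_result[1]
                      << "  left hand: " << detect_result[2]
                      << "  right hand: " << detect_result[3] << std::endl;
        }
        if (cv::waitKey(1) == 27) break;  // Esc to quit
    }

    release();
    FreeLibrary(dll);
    return 0;
}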

3.3 Wrapping workflow

3.3.1 Determining the graph's input and output streams

For CPU-based holistic tracking, the graph file holistic_tracking_cpu.pbtxt can be found in the official Mediapipe repository under mediapipe\mediapipe\graphs\holistic_tracking.

Its content is as follows:

# Tracks and renders pose + hands + face landmarks.

# CPU image. (ImageFrame)
input_stream: "input_video"

# CPU image with rendered results. (ImageFrame)
output_stream: "output_video"

# Throttles the images flowing downstream for flow control. It passes through
# the very first incoming image unaltered, and waits for downstream nodes
# (calculators and subgraphs) in the graph to finish their tasks before it
# passes through another image. All images that come in while waiting are
# dropped, limiting the number of in-flight images in most part of the graph to
# 1. This prevents the downstream nodes from queuing up incoming images and data
# excessively, which leads to increased latency and memory usage, unwanted in
# real-time mobile applications. It also eliminates unnecessarily computation,
# e.g., the output produced by a node may get dropped downstream if the
# subsequent nodes are still busy processing previous inputs.
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "input_video"
  input_stream: "FINISHED:output_video"
  input_stream_info: {
    tag_index: "FINISHED"
    back_edge: true
  }
  output_stream: "throttled_input_video"
  node_options: {
    [type.googleapis.com/mediapipe.FlowLimiterCalculatorOptions] {
      max_in_flight: 1
      max_in_queue: 1
      # Timeout is disabled (set to 0) as first frame processing can take more
      # than 1 second.
      in_flight_timeout: 0
    }
  }
}

node {
  calculator: "HolisticLandmarkCpu"
  input_stream: "IMAGE:throttled_input_video"
  output_stream: "POSE_LANDMARKS:pose_landmarks"
  output_stream: "POSE_ROI:pose_roi"
  output_stream: "POSE_DETECTION:pose_detection"
  output_stream: "FACE_LANDMARKS:face_landmarks"
  output_stream: "LEFT_HAND_LANDMARKS:left_hand_landmarks"
  output_stream: "RIGHT_HAND_LANDMARKS:right_hand_landmarks"
}

# Gets image size.
node {
  calculator: "ImagePropertiesCalculator"
  input_stream: "IMAGE:throttled_input_video"
  output_stream: "SIZE:image_size"
}

# Converts pose, hands and face landmarks to a render data vector.
node {
  calculator: "HolisticTrackingToRenderData"
  input_stream: "IMAGE_SIZE:image_size"
  input_stream: "POSE_LANDMARKS:pose_landmarks"
  input_stream: "POSE_ROI:pose_roi"
  input_stream: "LEFT_HAND_LANDMARKS:left_hand_landmarks"
  input_stream: "RIGHT_HAND_LANDMARKS:right_hand_landmarks"
  input_stream: "FACE_LANDMARKS:face_landmarks"
  output_stream: "RENDER_DATA_VECTOR:render_data_vector"
}

# Draws annotations and overlays them on top of the input images.
node {
  calculator: "AnnotationOverlayCalculator"
  input_stream: "IMAGE:throttled_input_video"
  input_stream: "VECTOR:render_data_vector"
  output_stream: "IMAGE:output_video"
}

Lines 3-7 of holistic_tracking_cpu.pbtxt show that the graph's original input stream is input_video and its final output stream is output_video. Lines 39-48 of the same file contain the following node:

node {
  calculator: "HolisticLandmarkCpu"
  input_stream: "IMAGE:throttled_input_video"
  output_stream: "POSE_LANDMARKS:pose_landmarks"
  output_stream: "POSE_ROI:pose_roi"
  output_stream: "POSE_DETECTION:pose_detection"
  output_stream: "FACE_LANDMARKS:face_landmarks"
  output_stream: "LEFT_HAND_LANDMARKS:left_hand_landmarks"
  output_stream: "RIGHT_HAND_LANDMARKS:right_hand_landmarks"
}

This node exposes the POSE_LANDMARKS, FACE_LANDMARKS, LEFT_HAND_LANDMARKS, and RIGHT_HAND_LANDMARKS output streams, which carry the body, face, and hand keypoints we need. With this information we can attach the output streams we care about to the graph when initializing the holistic graph.
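
As a preview of the detection class in Section 3.3.2, the sketch below shows the core pattern in isolation, assuming config already holds the parsed contents of holistic_tracking_cpu.pbtxt. A poller is attached for every output stream we care about before the graph starts running; in the real class the pollers are stored as member variables rather than locals.

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/port/status.h"

// Minimal sketch: attach pollers for the streams named in holistic_tracking_cpu.pbtxt.
// The call order is Initialize -> AddOutputStreamPoller -> StartRun.
absl::Status AttachHolisticPollers(mediapipe::CalculatorGraph& graph,
                                   const mediapipe::CalculatorGraphConfig& config)
{
    MP_RETURN_IF_ERROR(graph.Initialize(config));

    auto video_poller = graph.AddOutputStreamPoller("output_video");
    auto pose_poller = graph.AddOutputStreamPoller("pose_landmarks");
    auto left_hand_poller = graph.AddOutputStreamPoller("left_hand_landmarks");
    auto right_hand_poller = graph.AddOutputStreamPoller("right_hand_landmarks");
    auto face_poller = graph.AddOutputStreamPoller("face_landmarks");

    // Frames are later pushed into the graph's "input_video" stream with AddPacketToInputStream.
    return graph.StartRun({});
}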

3.3.2 Code implementation

3.3.2.1 Holistic tracking keypoint detection class

HolisticTrackingDetect.h

#ifndef HOLISTIC_TRACKING_DETECT_H
#define HOLISTIC_TRACKING_DETECT_H

#include <cstdlib>
#include "absl/flags/flag.h"
#include "absl/flags/parse.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/port/file_helpers.h"
#include "mediapipe/framework/port/opencv_highgui_inc.h"
#include "mediapipe/framework/port/opencv_imgproc_inc.h"
#include "mediapipe/framework/port/opencv_video_inc.h"
#include "mediapipe/framework/port/parse_text_proto.h"
#include "mediapipe/framework/port/status.h"

#include "mediapipe/framework/formats/detection.pb.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/formats/rect.pb.h"

namespace GoogleMediapipeDetect {

    class HolisticTrackingDetect
    {
    public:
        HolisticTrackingDetect();
        virtual ~HolisticTrackingDetect();

    public:
        int InitModel(const char* model_path);
        int DetectImageDirect(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image = false);
        int DetectCamera(bool show_image = false);
        int Release();

    private:
        absl::Status Mediapipe_InitGraph(const char* model_path);
        absl::Status Mediapipe_RunMPPGraph_Direct(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image = false);
        absl::Status Mediapipe_RunMPPGraph_Camera(bool show_image = false);
        absl::Status Mediapipe_ReleaseGraph();

    private:
        bool m_bIsInit;
        bool m_bIsRelease;

        mediapipe::CalculatorGraph m_Graph;

        const char* m_Video_InputStreamName;

        const char* m_Video_OutputStreamName;
        const char* m_PoseLandmarks_OutputStreamName;
        const char* m_LeftHandLandmarks_OutputStreamName;
        const char* m_RightHandLandmarks_OutputStreamName;
        const char* m_FaceLandmarks_OutputStreamName;


        std::unique_ptr<mediapipe::OutputStreamPoller> m_pVideoPoller;
        std::unique_ptr<mediapipe::OutputStreamPoller> m_pPoseLandmarksPoller;
        std::unique_ptr<mediapipe::OutputStreamPoller> m_pLeftHandLandmarksPoller;
        std::unique_ptr<mediapipe::OutputStreamPoller> m_pRightHandLandmarksPoller;
        std::unique_ptr<mediapipe::OutputStreamPoller> m_pFaceLandmarksPoller;
    };
}

#endif // !HOLISTIC_TRACKING_DETECT_H

HolisticTrackingDetect.cpp

#include <vector>

#include "HolisticTrackingDetect.h"
#include "GestureRecognition.h"
#include "ArmUpAndDownRecognition.h"

GoogleMediapipeDetect::HolisticTrackingDetect::HolisticTrackingDetect()
{
    m_bIsInit = false;
    m_bIsRelease = false;

    m_Video_InputStreamName = "input_video";

    m_Video_OutputStreamName = "output_video";
    m_PoseLandmarks_OutputStreamName = "pose_landmarks";
    m_LeftHandLandmarks_OutputStreamName = "left_hand_landmarks";
    m_RightHandLandmarks_OutputStreamName = "right_hand_landmarks";
    m_FaceLandmarks_OutputStreamName = "face_landmarks";

    m_pVideoPoller = nullptr;
    m_pPoseLandmarksPoller = nullptr;
    m_pLeftHandLandmarksPoller = nullptr;
    m_pRightHandLandmarksPoller = nullptr;
    m_pFaceLandmarksPoller = nullptr;
}

GoogleMediapipeDetect::HolisticTrackingDetect::~HolisticTrackingDetect()
{
    if (m_bIsInit && !m_bIsRelease)
    {
        Release();
    }
}

int GoogleMediapipeDetect::HolisticTrackingDetect::InitModel(const char* model_path)
{
    absl::Status run_status = Mediapipe_InitGraph(model_path);
    if (!run_status.ok())
        return 0;
    m_bIsInit = true;
    return  1;
}

int GoogleMediapipeDetect::HolisticTrackingDetect::DetectImageDirect(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image)
{
    if (!m_bIsInit)
        return 0;

    absl::Status run_status = Mediapipe_RunMPPGraph_Direct(image_width, image_height, image_data, detect_result,show_result_image);
    if (!run_status.ok()) {
        return 0;
    }
    return 1;
}

int GoogleMediapipeDetect::HolisticTrackingDetect::DetectCamera(bool show_image)
{
    if (!m_bIsInit)
        return 0;
    absl::Status run_status = Mediapipe_RunMPPGraph_Camera(show_image);
    if (!run_status.ok()) {
        return 0;
    }
    return 1;

}

int GoogleMediapipeDetect::HolisticTrackingDetect::Release()
{
    absl::Status run_status = Mediapipe_ReleaseGraph();
    if (!run_status.ok()) {
        return 0;
    }
    m_bIsRelease = true;
    return 1;
}

absl::Status GoogleMediapipeDetect::HolisticTrackingDetect::Mediapipe_InitGraph(const char* model_path)
{
    std::string calculator_graph_config_contents;
    MP_RETURN_IF_ERROR(mediapipe::file::GetContents(model_path, &calculator_graph_config_contents));
    std::cout << "mediapipe::file::GetContents success" << std::endl;

    mediapipe::CalculatorGraphConfig config =
        mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>(
            calculator_graph_config_contents);

    MP_RETURN_IF_ERROR(m_Graph.Initialize(config));
    std::cout << "m_Graph.Initialize(config) success" << std::endl;

    // 1 Rendered video output stream
    auto videoOutputStream = m_Graph.AddOutputStreamPoller(m_Video_OutputStreamName);
    assert(videoOutputStream.ok());
    m_pVideoPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(videoOutputStream.value()));

    // 2 Pose landmarks output stream
    mediapipe::StatusOrPoller poseLandmarks = m_Graph.AddOutputStreamPoller(m_PoseLandmarks_OutputStreamName);
    assert(poseLandmarks.ok());
    m_pPoseLandmarksPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(poseLandmarks.value()));

    // 3 Left hand landmarks output stream
    mediapipe::StatusOrPoller leftHandLandmarks = m_Graph.AddOutputStreamPoller(m_LeftHandLandmarks_OutputStreamName);
    assert(leftHandLandmarks.ok());
    m_pLeftHandLandmarksPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(leftHandLandmarks.value()));

    // 4 Right hand landmarks output stream
    mediapipe::StatusOrPoller rightHandLandmarks = m_Graph.AddOutputStreamPoller(m_RightHandLandmarks_OutputStreamName);
    assert(rightHandLandmarks.ok());
    m_pRightHandLandmarksPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(rightHandLandmarks.value()));

    // 5 Face landmarks output stream
    mediapipe::StatusOrPoller faceLandmarks = m_Graph.AddOutputStreamPoller(m_FaceLandmarks_OutputStreamName);
    assert(faceLandmarks.ok());
    m_pFaceLandmarksPoller = std::make_unique<mediapipe::OutputStreamPoller>(std::move(faceLandmarks.value()));

    MP_RETURN_IF_ERROR(m_Graph.StartRun({}));
    std::cout << "----------------Graph StartRun Success---------------------" << std::endl;
    return absl::OkStatus();
}

absl::Status GoogleMediapipeDetect::HolisticTrackingDetect::Mediapipe_RunMPPGraph_Direct(int image_width, int image_height, void* image_data, int* detect_result, bool show_result_image)
{
    /*----- 1 Build a cv::Mat from the raw frame data -----*/
    cv::Mat camera_frame(cv::Size(image_width, image_height), CV_8UC3, (uchar*)image_data);
    cv::cvtColor(camera_frame, camera_frame, cv::COLOR_BGR2RGB);
    // Flip the input image horizontally
    cv::flip(camera_frame, camera_frame, 1);
    //std::cout << "cv::Mat对象构建完成" << std::endl;

    /*----- 2 Convert the OpenCV Mat to a mediapipe::ImageFrame -----*/
    auto input_frame = absl::make_unique<mediapipe::ImageFrame>(
        mediapipe::ImageFormat::SRGB, camera_frame.cols, camera_frame.rows,
        mediapipe::ImageFrame::kDefaultAlignmentBoundary);
    cv::Mat input_frame_mat = mediapipe::formats::MatView(input_frame.get());
    camera_frame.copyTo(input_frame_mat);
    //std::cout << "将OpenCV Mat转换为ImageFrame完成" << std::endl;

    /*----- 3 Send the frame into the graph for inference -----*/
    size_t frame_timestamp_us =
        (double)cv::getTickCount() / (double)cv::getTickFrequency() * 1e6;

    MP_RETURN_IF_ERROR(m_Graph.AddPacketToInputStream(
        m_Video_InputStreamName, mediapipe::Adopt(input_frame.release())
        .At(mediapipe::Timestamp(frame_timestamp_us))));
    //std::cout << "发送图片到图中推理完成" << std::endl;

    /*----- 4 Collect the results -----*/

    // 1 Rendered output video frame
    mediapipe::Packet packet;
    if (!m_pVideoPoller->Next(&packet))
    {
        return absl::InvalidArgumentError("no next packet");
    }
    if (show_result_image)
    {
        // Get the mediapipe::ImageFrame from the video output stream
        auto& output_frame = packet.Get<mediapipe::ImageFrame>();

        // Convert the mediapipe::ImageFrame to a cv::Mat
        cv::Mat output_frame_mat = mediapipe::formats::MatView(&output_frame);

        // Display the cv::Mat result
        cv::cvtColor(output_frame_mat, output_frame_mat, cv::COLOR_RGB2BGR);
        cv::Mat dst;
        cv::resize(output_frame_mat, dst, cv::Size(output_frame_mat.cols, output_frame_mat.rows));
        cv::imshow("MediapipeHolistic", dst);
        cv::waitKey(1);
    }

    // 2 PoseLandmarks
    mediapipe::Packet poseLandmarksPacket;
    int left_arm_result = (int)ArmUpDown::NoResult;
    int right_arm_result = (int)ArmUpDown::NoResult;
    if (m_pPoseLandmarksPoller->QueueSize() != 0)
    {
        if (m_pPoseLandmarksPoller->Next(&poseLandmarksPacket))
        {
            auto& output_landmarks = poseLandmarksPacket.Get<mediapipe::NormalizedLandmarkList>();
            //std::cout << "PoseLandmarks size:" << output_landmarks.landmark_size() << std::endl;

            std::vector<Point2D> posePoints;
            posePoints.clear();

            for (int i = 0; i < output_landmarks.landmark_size(); ++i)
            {
                Point2D tempPoint2D;
                const mediapipe::NormalizedLandmark landmark = output_landmarks.landmark(i);
                float x = landmark.x() * camera_frame.cols;
                float y = landmark.y() * camera_frame.rows;
                tempPoint2D.x = x;
                tempPoint2D.y = y;

                posePoints.emplace_back(tempPoint2D);
            }

            ArmUpAndDownRecognition armUpAndDownRecognition;
            armUpAndDownRecognition.RecognizeProcess(posePoints,left_arm_result,right_arm_result);
            //std::cout << "手臂抬手放手识别结果:" << poseDetectResult << std::endl;
        }
    }
    detect_result[0] = left_arm_result;
    detect_result[1] = right_arm_result;

    // 3 LeftHandLandmarks
    mediapipe::Packet leftHandLandmarksPacket;
    int leftHandDetectResult = Gesture::NoGesture;
    if (m_pLeftHandLandmarksPoller->QueueSize() > 0)
    {
        if (m_pLeftHandLandmarksPoller->Next(&leftHandLandmarksPacket))
        {
            auto& output_landmarks = leftHandLandmarksPacket.Get<mediapipe::NormalizedLandmarkList>();
            //std::cout << "LeftHandLandmarks size:" << output_landmarks.landmark_size() << std::endl;

            std::vector<Point2D> singleGesturePoints;
            singleGesturePoints.clear();

            for (int i = 0; i < output_landmarks.landmark_size(); ++i)
            {
                Point2D tempPoint2D;
                const mediapipe::NormalizedLandmark landmark = output_landmarks.landmark(i);
                float x = landmark.x() * camera_frame.cols;
                float y = landmark.y() * camera_frame.rows;
                tempPoint2D.x = x;
                tempPoint2D.y = y;

                singleGesturePoints.emplace_back(tempPoint2D);
            }

            GestureRecognition gestureRecognition;
            leftHandDetectResult = gestureRecognition.RecognizeProcess(singleGesturePoints);
            //std::cout << "左手手势识别结果:" << leftHandDetectResult << std::endl;
        }
    }
    detect_result[2] = leftHandDetectResult;

    // 4 RightHandLandmarks
    mediapipe::Packet rightHandLandmarksPacket;
    int rightHandDetectResult = Gesture::NoGesture;
    if (m_pRightHandLandmarksPoller->QueueSize() > 0)
    {
        if (m_pRightHandLandmarksPoller->Next(&rightHandLandmarksPacket))
        {
            auto& output_landmarks = rightHandLandmarksPacket.Get<mediapipe::NormalizedLandmarkList>();
            //std::cout << "RightHandLandmarks size:" << output_landmarks.landmark_size() << std::endl;

            std::vector<Point2D> singleGesturePoints;
            singleGesturePoints.clear();

            for (int i = 0; i < output_landmarks.landmark_size(); ++i)
            {
                Point2D tempPoint2D;
                const mediapipe::NormalizedLandmark landmark = output_landmarks.landmark(i);
                float x = landmark.x() * camera_frame.cols;
                float y = landmark.y() * camera_frame.rows;
                tempPoint2D.x = x;
                tempPoint2D.y = y;

                singleGesturePoints.emplace_back(tempPoint2D);
            }

            GestureRecognition gestureRecognition;
            rightHandDetectResult = gestureRecognition.RecognizeProcess(singleGesturePoints);
            //std::cout << "右手手势识别结果:" << rightHandDetectResult << std::endl;

        }
    }
    detect_result[3] = rightHandDetectResult;

    // 5 FaceLandmarks (commented out; enable this block if face landmark coordinates are needed)
    //mediapipe::Packet faceLandmarksPacket;
    //if (m_pFaceLandmarksPoller->QueueSize() > 0)
    //{
    //  if (m_pFaceLandmarksPoller->Next(&faceLandmarksPacket))
    //  {
    //      auto& output_landmarks = faceLandmarksPacket.Get<mediapipe::NormalizedLandmarkList>();
    //      std::cout << "FaceLandmarks size:" << output_landmarks.landmark_size() << std::endl;

    //      for (int i = 0; i < output_landmarks.landmark_size(); ++i)
    //      {
    //          const mediapipe::NormalizedLandmark landmark = output_landmarks.landmark(i);
    //          float x = landmark.x() * camera_frame.cols;
    //          float y = landmark.y() * camera_frame.rows;
    //      }
    //  }
    //}

    return absl::OkStatus();
}


absl::Status GoogleMediapipeDetect::HolisticTrackingDetect::Mediapipe_RunMPPGraph_Camera(bool show_image)
{
    std::string cvWindowName = "MediapipeHolistic";

    // Open the camera with OpenCV
    cv::VideoCapture capture(0);
    if (!capture.isOpened())
    {
        return absl::InvalidArgumentError("cv camera is not open");
    }

    bool grab_frames = true;
    while (grab_frames) {

        // Grab a frame from the camera
        cv::Mat camera_frame_raw;
        capture >> camera_frame_raw;
        if (camera_frame_raw.empty())
            break;

        cv::Mat camera_frame;
        cv::cvtColor(camera_frame_raw, camera_frame, cv::COLOR_BGR2RGB);
        cv::flip(camera_frame, camera_frame, 1);

        // Convert the OpenCV Mat to a mediapipe::ImageFrame
        auto input_frame = absl::make_unique<mediapipe::ImageFrame>(
            mediapipe::ImageFormat::SRGB, camera_frame.cols, camera_frame.rows,
            mediapipe::ImageFrame::kDefaultAlignmentBoundary);
        cv::Mat input_frame_mat = mediapipe::formats::MatView(input_frame.get());
        camera_frame.copyTo(input_frame_mat);

        // Send the frame into the graph for inference
        size_t frame_timestamp_us =
            (double)cv::getTickCount() / (double)cv::getTickFrequency() * 1e6;

        MP_RETURN_IF_ERROR(m_Graph.AddPacketToInputStream(
            m_Video_InputStreamName, mediapipe::Adopt(input_frame.release())
            .At(mediapipe::Timestamp(frame_timestamp_us))));

        // Fetch the inference result
        mediapipe::Packet packet;
        if (!m_pVideoPoller->Next(&packet)) break;

        if (show_image)
        {
            // Get the mediapipe::ImageFrame from the video output stream
            auto& output_frame = packet.Get<mediapipe::ImageFrame>();

            // Convert the mediapipe::ImageFrame to a cv::Mat
            cv::Mat output_frame_mat = mediapipe::formats::MatView(&output_frame);

            // Display the cv::Mat result
            cv::cvtColor(output_frame_mat, output_frame_mat, cv::COLOR_RGB2BGR);
            cv::Mat dst;
            cv::resize(output_frame_mat, dst, cv::Size(output_frame_mat.cols / 2, output_frame_mat.rows / 2));
            cv::imshow(cvWindowName, dst);
            cv::waitKey(1);
        }
    }
    if (show_image)
        cv::destroyWindow(cvWindowName);

    return absl::OkStatus();
}

absl::Status GoogleMediapipeDetect::HolisticTrackingDetect::Mediapipe_ReleaseGraph()
{
    // Only the graph's input stream can be closed here; the remaining streams are
    // output streams and are drained once the graph finishes running.
    MP_RETURN_IF_ERROR(m_Graph.CloseInputStream(m_Video_InputStreamName));

    return m_Graph.WaitUntilDone();
}

Looking closely at the detection class above: following the pbtxt content from Section 3.3.1, Mediapipe_InitGraph attaches pollers for the output streams we need while initializing the graph, and the detection functions Mediapipe_RunMPPGraph_Direct and Mediapipe_RunMPPGraph_Camera then read and process the data from those streams.
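
For reference, Mediapipe_RunMPPGraph_Direct writes its results into the caller-supplied int array in a fixed order (see the assignments to detect_result above). The enum below is only a hypothetical caller-side view of that layout, not a type defined in the project:

// Hypothetical caller-side names for the indices of the detect_result array
// filled by MediapipeHolisticTrackingDetectFrameDirect.
enum DetectResultIndex
{
    kLeftArmUpDown    = 0,  // ArmUpDown value for the left arm
    kRightArmUpDown   = 1,  // ArmUpDown value for the right arm
    kLeftHandGesture  = 2,  // Gesture value for the left hand
    kRightHandGesture = 3   // Gesture value for the right hand
};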

3.3.2.2 Gesture recognition class

GestureRecognition.h

#ifndef GESTURE_RECOGNITION_H
#define GESTURE_RECOGNITION_H

#include <vector>
#include "GestureRecognition.h"
#include "TrackingDataStructure.h"

namespace GoogleMediapipeDetect {

    class GestureRecognition
    {
    public:
        GestureRecognition();
        virtual ~GestureRecognition();

    public:
        int RecognizeProcess(const std::vector<Point2D>& single_hand_joint_points);

    private:
        float Vector2DAngle(const Vector2D& vec1, const Vector2D& vec2);
    };
}

#endif // !GESTURE_RECOGNITION_H

GestureRecognition.cpp

#include "GestureRecognition.h"

GoogleMediapipeDetect::GestureRecognition::GestureRecognition()
{

}

GoogleMediapipeDetect::GestureRecognition::~GestureRecognition()
{

}

int GoogleMediapipeDetect::GestureRecognition::RecognizeProcess(const std::vector<Point2D>& single_hand_joint_points)
{
    if (single_hand_joint_points.size() != 21)
        return Gesture::NoGesture;

    // Thumb angle
    Vector2D thumb_vec1;
    thumb_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[2].x;
    thumb_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[2].y;

    Vector2D thumb_vec2;
    thumb_vec2.x = single_hand_joint_points[3].x - single_hand_joint_points[4].x;
    thumb_vec2.y = single_hand_joint_points[3].y - single_hand_joint_points[4].y;

    float thumb_angle = Vector2DAngle(thumb_vec1, thumb_vec2);
    //std::cout << "thumb_angle = " << thumb_angle << std::endl;
    //std::cout << "thumb.y = " << single_hand_joint_vector[0].y << std::endl;


    // Index finger angle
    Vector2D index_vec1;
    index_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[6].x;
    index_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[6].y;

    Vector2D index_vec2;
    index_vec2.x = single_hand_joint_points[7].x - single_hand_joint_points[8].x;
    index_vec2.y = single_hand_joint_points[7].y - single_hand_joint_points[8].y;

    float index_angle = Vector2DAngle(index_vec1, index_vec2);
    //std::cout << "index_angle = " << index_angle << std::endl;


    // Middle finger angle
    Vector2D middle_vec1;
    middle_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[10].x;
    middle_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[10].y;

    Vector2D middle_vec2;
    middle_vec2.x = single_hand_joint_points[11].x - single_hand_joint_points[12].x;
    middle_vec2.y = single_hand_joint_points[11].y - single_hand_joint_points[12].y;

    float middle_angle = Vector2DAngle(middle_vec1, middle_vec2);
    //std::cout << "middle_angle = " << middle_angle << std::endl;


    // Ring finger angle
    Vector2D ring_vec1;
    ring_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[14].x;
    ring_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[14].y;

    Vector2D ring_vec2;
    ring_vec2.x = single_hand_joint_points[15].x - single_hand_joint_points[16].x;
    ring_vec2.y = single_hand_joint_points[15].y - single_hand_joint_points[16].y;

    float ring_angle = Vector2DAngle(ring_vec1, ring_vec2);
    //std::cout << "ring_angle = " << ring_angle << std::endl;

    // Pinky angle
    Vector2D pink_vec1;
    pink_vec1.x = single_hand_joint_points[0].x - single_hand_joint_points[18].x;
    pink_vec1.y = single_hand_joint_points[0].y - single_hand_joint_points[18].y;

    Vector2D pink_vec2;
    pink_vec2.x = single_hand_joint_points[19].x - single_hand_joint_points[20].x;
    pink_vec2.y = single_hand_joint_points[19].y - single_hand_joint_points[20].y;

    float pink_angle = Vector2DAngle(pink_vec1, pink_vec2);
    //std::cout << "pink_angle = " << pink_angle << std::endl;


    // Determine the gesture from the finger angles
    float angle_threshold = 65;
    float thumb_angle_threshold = 40;

    int result = -1;
    if ((thumb_angle > thumb_angle_threshold) && (index_angle > angle_threshold) && (middle_angle > angle_threshold) && (ring_angle > angle_threshold) && (pink_angle > angle_threshold))
        result = Gesture::Fist;
    else if ((thumb_angle > 5) && (index_angle < angle_threshold) && (middle_angle > angle_threshold) && (ring_angle > angle_threshold) && (pink_angle > angle_threshold))
        result = Gesture::One;
    else if ((thumb_angle > thumb_angle_threshold) && (index_angle < angle_threshold) && (middle_angle < angle_threshold) && (ring_angle > angle_threshold) && (pink_angle > angle_threshold))
        result = Gesture::Two;
    else if ((thumb_angle > thumb_angle_threshold) && (index_angle < angle_threshold) && (middle_angle < angle_threshold) && (ring_angle < angle_threshold) && (pink_angle > angle_threshold))
        result = Gesture::Three;
    else if ((thumb_angle > thumb_angle_threshold) && (index_angle < angle_threshold) && (middle_angle < angle_threshold) && (ring_angle < angle_threshold) && (pink_angle < angle_threshold))
        result = Gesture::Four;
    else if ((thumb_angle < thumb_angle_threshold) && (index_angle < angle_threshold) && (middle_angle < angle_threshold) && (ring_angle < angle_threshold) && (pink_angle < angle_threshold))
        result = Gesture::Five;
    else if ((thumb_angle < thumb_angle_threshold) && (index_angle > angle_threshold) && (middle_angle > angle_threshold) && (ring_angle > angle_threshold) && (pink_angle < angle_threshold))
        result = Gesture::Six;
    else if ((thumb_angle < thumb_angle_threshold) && (index_angle > angle_threshold) && (middle_angle > angle_threshold) && (ring_angle > angle_threshold) && (pink_angle > angle_threshold))
        result = Gesture::ThumbUp;
    else if ((thumb_angle > 5) && (index_angle > angle_threshold) && (middle_angle < angle_threshold) && (ring_angle < angle_threshold) && (pink_angle < angle_threshold))
        result = Gesture::Ok;
    else
        result = Gesture::NoGesture;

    return result;
}

float GoogleMediapipeDetect::GestureRecognition::Vector2DAngle(const Vector2D& vec1, const Vector2D& vec2)
{
    double PI = 3.141592653;
    float t = (vec1.x * vec2.x + vec1.y * vec2.y) / (sqrt(pow(vec1.x, 2) + pow(vec1.y, 2)) * sqrt(pow(vec2.x, 2) + pow(vec2.y, 2)));
    float angle = acos(t) * (180 / PI);
    return angle;
}
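
The recognition above hinges on one geometric quantity: for each finger, Vector2DAngle computes the angle between a vector pointing from the finger's base region back toward the wrist (landmark 0) and a vector pointing from the fingertip back toward the preceding joint, using angle = acos(v1·v2 / (|v1|·|v2|)). When a finger is extended the two vectors point in roughly the same direction and the angle stays small; when the finger is curled the angle becomes large, which is exactly what the thresholds above test. The snippet below is a standalone check of that behaviour; it simply re-implements the same formula for two hand-picked vector pairs and is not part of the project.

#include <cmath>
#include <iostream>

// Standalone re-implementation of the Vector2DAngle formula, for illustration only.
static float AngleBetween(float x1, float y1, float x2, float y2)
{
    const double PI = 3.141592653;
    float t = (x1 * x2 + y1 * y2) /
              (std::sqrt(x1 * x1 + y1 * y1) * std::sqrt(x2 * x2 + y2 * y2));
    return std::acos(t) * (180.0 / PI);
}

int main()
{
    // Extended finger: both vectors point the same way -> angle near 0 degrees.
    std::cout << AngleBetween(0.0f, 1.0f, 0.0f, 1.0f) << std::endl;
    // Curled finger: the fingertip vector points back the other way -> angle near 180 degrees.
    std::cout << AngleBetween(0.0f, 1.0f, 0.0f, -1.0f) << std::endl;
    return 0;
}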

3.3.2.3 Arm raise/lower detection class

ArmUpAndDownRecognition.h

#ifndef ARM_UP_AND_DOWN_RECOGNITION_H
#define ARM_UP_AND_DOWN_RECOGNITION_H

#include <vector>

#include "TrackingDataStructure.h"

namespace GoogleMediapipeDetect {
    class ArmUpAndDownRecognition 
    {
    public:
        ArmUpAndDownRecognition();
        virtual ~ArmUpAndDownRecognition();

    public:
        bool RecognizeProcess(const std::vector<Point2D>& pose_joint_points, int& left_arm_result, int& right_arm_result);
    };
}

#endif // !ARM_UP_AND_DOWN_RECOGNITION_H

ArmUpAndDownRecognition.cpp

#include "ArmUpAndDownRecognition.h"

GoogleMediapipeDetect::ArmUpAndDownRecognition::ArmUpAndDownRecognition()
{

}

GoogleMediapipeDetect::ArmUpAndDownRecognition::~ArmUpAndDownRecognition()
{

}

bool GoogleMediapipeDetect::ArmUpAndDownRecognition::RecognizeProcess(const std::vector<Point2D>& pose_joint_points, int& left_arm_result, int& right_arm_result)
{
    if (pose_joint_points.size() != 33)
        return false;

    Point2D left_elbow = pose_joint_points[13];
    Point2D right_elbow = pose_joint_points[14];

    Point2D left_wrist = pose_joint_points[15];
    Point2D right_wrist = pose_joint_points[16];

    // Left arm: compare the wrist and elbow y coordinates
    if (left_wrist.y > left_elbow.y)
    {
        left_arm_result = (int)ArmUpDown::ArmDown;
    }
    else if (left_wrist.y < left_elbow.y)
    {
        left_arm_result = (int)ArmUpDown::ArmUp;
    }
    else
    {
        left_arm_result = (int)ArmUpDown::NoResult;
    }

    // Right arm: compare the wrist and elbow y coordinates
    if (right_wrist.y > right_elbow.y)
    {
        right_arm_result = (int)ArmUpDown::ArmDown;
    }
    else if (right_wrist.y < right_elbow.y)
    {
        right_arm_result = (int)ArmUpDown::ArmUp;
    }
    else
    {
        right_arm_result = (int)ArmUpDown::NoResult;
    }

    return true;
}
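
A minimal usage sketch for this class, assuming Point2D and the ArmUpDown enum are provided by TrackingDataStructure.h (not reproduced in this post) and that Point2D is a plain struct with public x and y members, as the code above suggests. Indices 13/14 are the elbows and 15/16 the wrists in MediaPipe's pose landmark order.

#include <iostream>
#include <vector>

#include "ArmUpAndDownRecognition.h"
#include "TrackingDataStructure.h"  // assumed to define Point2D and ArmUpDown

int main()
{
    using namespace GoogleMediapipeDetect;

    // 33 pose landmarks; only the elbows (13, 14) and wrists (15, 16) matter here.
    std::vector<Point2D> pose(33);
    pose[13].x = 100; pose[13].y = 200;  // left elbow
    pose[15].x = 100; pose[15].y = 120;  // left wrist above the elbow (smaller y)
    pose[14].x = 300; pose[14].y = 200;  // right elbow
    pose[16].x = 300; pose[16].y = 260;  // right wrist below the elbow (larger y)

    int left_arm = (int)ArmUpDown::NoResult;
    int right_arm = (int)ArmUpDown::NoResult;

    ArmUpAndDownRecognition recognition;
    recognition.RecognizeProcess(pose, left_arm, right_arm);

    // Expected output: the ArmUp value for the left arm and the ArmDown value for the right arm.
    std::cout << "left arm: " << left_arm << "  right arm: " << right_arm << std::endl;
    return 0;
}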

The other classes in the project can be found in the GitHub repository.

4 Building and debugging the project with bazel

4.1 Building

4.1.1 Location of the source files and the bazel BUILD file

Create a new folder named holistic_tracking_dll under mediapipe\mediapipe\examples\desktop in the Mediapipe repository.

Copy all files from the GitHub project's dll/holistic_tracking_dll folder into mediapipe\mediapipe\examples\desktop\holistic_tracking_dll.

The bazel BUILD file used to build the project is as follows:

# Copyright 2020 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

licenses(["notice"])

package(default_visibility = ["//mediapipe/examples:__subpackages__"])

cc_binary(
    name = "holistic_tracking_cpu",
    deps = [
        "//mediapipe/examples/desktop:demo_run_graph_main",
        "//mediapipe/graphs/holistic_tracking:holistic_tracking_cpu_graph_deps",
    ],
)

cc_binary(
    name = "MediapipeHolisticTracking",
    srcs = ["HolisticTrackingApi.h","HolisticTrackingApi.cpp","HolisticTrackingDetect.h","HolisticTrackingDetect.cpp","GestureRecognition.h","GestureRecognition.cpp","TrackingDataStructure.h","ArmUpAndDownRecognition.h","ArmUpAndDownRecognition.cpp"],
    linkshared=True,
    deps = [
        "//mediapipe/graphs/holistic_tracking:holistic_tracking_cpu_graph_deps",
    ],
)


# Linux only
cc_binary(
    name = "holistic_tracking_gpu",
    deps = [
        "//mediapipe/examples/desktop:demo_run_graph_main_gpu",
        "//mediapipe/graphs/holistic_tracking:holistic_tracking_gpu_deps",
    ],
)

Of these targets,

cc_binary(
    name = "MediapipeHolisticTracking",
    srcs = ["HolisticTrackingApi.h","HolisticTrackingApi.cpp","HolisticTrackingDetect.h","HolisticTrackingDetect.cpp","GestureRecognition.h","GestureRecognition.cpp","TrackingDataStructure.h","ArmUpAndDownRecognition.h","ArmUpAndDownRecognition.cpp"],
    linkshared=True,
    deps = [
        "//mediapipe/graphs/holistic_tracking:holistic_tracking_cpu_graph_deps",
    ],
)

is the build configuration for the dll in this project.

4.1.2 Building

4.1.2.1 Release mode

Example command:

bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 --action_env PYTHON_BIN_PATH="D:\\Anaconda\\python.exe" mediapipe/examples/desktop/holistic_tracking_dll:MediapipeHolisticTracking --verbose_failures

4.1.2.2 Debug mode

Example command:

bazel build -c dbg --define MEDIAPIPE_DISABLE_GPU=1 --action_env PYTHON_BIN_PATH="D:\\Anaconda\\python.exe" mediapipe/examples/desktop/holistic_tracking_dll:MediapipeHolisticTracking --verbose_failures

A dll built in Debug mode contains debug symbols, so you can set breakpoints and debug it directly in Visual Studio.

4.1.2.3 Build output directory

If the build goes smoothly, MediapipeHolisticTracking.dll will be generated in the mediapipe\bazel-bin\mediapipe\examples\desktop\holistic_tracking_dll directory.
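
If you want to confirm that the four interface functions were actually exported, one quick check (assuming a Visual Studio developer command prompt) is to dump the dll's export table:

dumpbin /EXPORTS MediapipeHolisticTracking.dll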

4.2 Debugging

This section shows how to debug the dynamic library.

First build the dll in Debug mode as described in Section 4.1.2.2. The build produces both MediapipeHolisticTracking.dll and MediapipeHolisticTracking.pdb under mediapipe\bazel-bin\mediapipe\examples\desktop\holistic_tracking_dll; seeing a PDB file again should feel reassuringly familiar.

Then copy MediapipeHolisticTracking.dll and MediapipeHolisticTracking.pdb into the directory of the program that calls the dll, set breakpoints in the source files, and run the caller in debug mode; the breakpoints will be hit and you can debug to your heart's content.