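3D Virtual Human Lip Sync: TTS Speech Synthesis APIs That Can Output Phonemes
1 Why 3D virtual human lip sync needs phoneme information
A phoneme is the smallest distinctive unit of speech. A viseme is the visual counterpart of a phoneme: it defines the position of the mouth and face while a person is speaking, and each viseme describes the facial pose and lip shape shared by a particular group of phonemes. Visemes and phonemes do not correspond one-to-one. The mapping is many-to-one: several phonemes usually map to the same viseme, because some phonemes produce similar face and lip positions when pronounced.
In a conventional 3D virtual human lip-sync pipeline, the phoneme timestamps produced during speech synthesis are used to compute the start and end time of each phoneme, and from those the start and end time of each viseme. Each viseme can in turn be represented with blendshapes. A TTS engine that outputs phoneme timestamps alongside the audio is therefore very useful for downstream 3D virtual human lip sync.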
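The pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PHONEME_TO_VISEME` table, the viseme names, and the `Interval` type are all hypothetical and do not come from any particular TTS engine or avatar rig. The sketch maps each timed phoneme to its viseme and merges adjacent phonemes that share a viseme into one longer interval, which is where the many-to-one relationship shows up.

```python
from dataclasses import dataclass

# Hypothetical many-to-one phoneme-to-viseme table (illustrative only;
# a real table depends on the TTS phoneme set and the avatar rig).
PHONEME_TO_VISEME = {
    "p": "PP", "b": "PP", "m": "PP",  # lips closed
    "f": "FF", "v": "FF",             # lower lip against upper teeth
    "ah": "AA",
    "ow": "OH",
    "sil": "SIL",
}


@dataclass
class Interval:
    label: str
    start_ms: float
    end_ms: float


def phonemes_to_visemes(phonemes, table=PHONEME_TO_VISEME, default="SIL"):
    """Map timed phonemes to timed visemes, merging adjacent duplicates."""
    visemes = []
    for ph in phonemes:
        label = table.get(ph.label, default)
        last = visemes[-1] if visemes else None
        # Back-to-back phonemes with the same viseme become one interval.
        if last and last.label == label and last.end_ms == ph.start_ms:
            last.end_ms = ph.end_ms
        else:
            visemes.append(Interval(label, ph.start_ms, ph.end_ms))
    return visemes


if __name__ == "__main__":
    timeline = [
        Interval("m", 0, 60),
        Interval("b", 60, 120),
        Interval("ah", 120, 200),
    ]
    for v in phonemes_to_visemes(timeline):
        print(v.label, v.start_ms, v.end_ms)
```

Here "m" and "b" collapse into a single "PP" interval, so three phonemes yield two viseme intervals. Each viseme interval would then be turned into blendshape keyframes by the rendering side.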
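2 TTS APIs that can output phoneme timestamps
2.1 Microsoft speech synthesis
The neural TTS in Microsoft's text-to-speech service converts input text or SSML (Speech Synthesis Markup Language) into speech, and can additionally output viseme IDs, 2D Scalable Vector Graphics (SVG) animation data, or 3D blendshape weights.
Documentation: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-speech-synthesis-viseme?tabs=visemeid&pivots=programming-language-cpp
The viseme IDs and blendshape weights returned with the audio can be used directly to drive the virtual human's lip sync. The blendshape output has the following shape: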
{
"FrameIndex":0,
"BlendShapes":[
[0.021,0.321,...,0.258],
[0.045,0.234,...,0.288],
...
]
}
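To use a payload like the one above, each frame needs a timestamp. The following is a minimal sketch, assuming (as the linked Azure documentation describes) that frames are produced at 60 FPS and that `FrameIndex` is the index of the first frame in the chunk. The function name `frames_with_time` and the shortened sample payload are made up for illustration; real frames carry many more weights per frame.

```python
import json

FPS = 60  # assumption: frame rate given in the linked Azure viseme documentation


def frames_with_time(animation_json, fps=FPS):
    """Yield (time_in_seconds, weights) for every frame in one animation chunk."""
    chunk = json.loads(animation_json)
    first = chunk["FrameIndex"]  # index of the first frame in this chunk
    for offset, weights in enumerate(chunk["BlendShapes"]):
        yield (first + offset) / fps, weights


if __name__ == "__main__":
    # Shortened, made-up payload with only three weights per frame.
    sample = (
        '{"FrameIndex": 120, '
        '"BlendShapes": [[0.021, 0.321, 0.258], [0.045, 0.234, 0.288]]}'
    )
    for t, weights in frames_with_time(sample):
        print(f"{t:.4f}s", weights)
```

The renderer can then look up the frame whose time is closest to the current audio playback position and apply its weights to the avatar's blendshapes.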
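2.2 Alibaba speech synthesis
Alibaba's speech synthesis can also output, together with the audio stream, the time position of each Chinese character or English word in the audio, that is, a timestamp. The timestamp feature is also called the character-level phoneme boundary interface. This timing information can be used to drive a virtual human's mouth shapes, to generate subtitles for dubbed video, and so on.
Documentation: https://help.aliyun.com/zh/isi/developer-reference/timestamp-feature?spm=a2c4g.11186623.0.i3
Sample output: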
//今天
{
"subtitles":[
{"begin_index":0,"begin_time":0,"end_index":1,"end_time":100,
"phoneme_list":[
{"index":0, "begin_time":0, "end_time": 20, "phoneme":"hh", "tone":0},
{"index":1, "begin_time":20, "end_time": 40, "phoneme":"ah", "tone":0},
{"index":1, "begin_time":40, "end_time": 60, "phoneme":"l", "tone":1},
{"index":1, "begin_time":60, "end_time": 100, "phoneme":"ow", "tone":1}
],
"text":"hello", "phoneme":"hh ah l ow"},
{"begin_index":1,"begin_time":100,"end_index":2,"end_time":200,
"phoneme_list":[
{"index":0, "begin_time":100, "end_time": 150, "phoneme":"j_c", "tone":1},
{"index":1, "begin_time":150, "end_time": 200, "phoneme":"in_c", "tone":1}
],
"text":"今", "phoneme":"j_c in_c"},
{"begin_index":1,"begin_time":200,"end_index":2,"end_time":400,
"phoneme_list":[
{"index":0, "begin_time":200, "end_time": 300, "phoneme":"t_c", "tone":1},
{"index":1, "begin_time":300, "end_time": 400, "phoneme":"ian_c", "tone":1}
],"text":"天", "phoneme":"t_c ian_c"}
]
}
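A response shaped like the sample above can be flattened into a single phoneme timeline. This is a sketch rather than the official client: `flatten_phonemes` is a hypothetical helper, the field names are taken from the sample, and the times are assumed to be in milliseconds. The entries are ordered by `begin_time` so the result does not depend on the `index` values.

```python
import json


def flatten_phonemes(response):
    """Return [(phoneme, begin_ms, end_ms, text), ...] in playback order."""
    timeline = []
    for item in response.get("subtitles", []):
        for ph in item.get("phoneme_list", []):
            timeline.append(
                (ph["phoneme"], ph["begin_time"], ph["end_time"], item["text"])
            )
    # Order by start time instead of trusting the per-entry "index" field.
    timeline.sort(key=lambda entry: entry[1])
    return timeline


if __name__ == "__main__":
    # Trimmed copy of the sample above (two characters only).
    sample = json.loads("""
    {"subtitles": [
      {"begin_time": 100, "end_time": 200, "text": "今",
       "phoneme_list": [
         {"index": 0, "begin_time": 100, "end_time": 150, "phoneme": "j_c", "tone": 1},
         {"index": 1, "begin_time": 150, "end_time": 200, "phoneme": "in_c", "tone": 1}]},
      {"begin_time": 200, "end_time": 400, "text": "天",
       "phoneme_list": [
         {"index": 0, "begin_time": 200, "end_time": 300, "phoneme": "t_c", "tone": 1},
         {"index": 1, "begin_time": 300, "end_time": 400, "phoneme": "ian_c", "tone": 1}]}
    ]}
    """)
    for entry in flatten_phonemes(sample):
        print(entry)
```

The flattened list is a convenient input for a phoneme-to-viseme mapping step, since every entry already carries its own start and end time.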
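2.3 Tencent speech synthesis
Tencent's speech synthesis also provides phoneme timestamps. Documentation: https://cloud.tencent.com/document/product/1073/57374
2.4 Volcengine speech synthesis
Documentation: https://www.volcengine.com/docs/6561/79823
Sample output: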
{
"reqid": "reqid",
"code": 3000,
"operation": "query",
"message": "Success",
"sequence": -1,
"data": "base64 encoded binary data",
"addition": {
"description": "...",
"duration": "1960",
"frontend": "{
"words": [{
"word": "字",
"start_time": 0.025,
"end_time": 0.185
},
...
{
"word": "。",
"start_time": 1.85,
"end_time": 1.955
}],
"phonemes": [{
"phone": "C0z",
"start_time": 0.025,
"end_time": 0.105
},
...
{
"phone": "。",
"start_time": 1.85,
"end_time": 1.955
}]
}"
}
}
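One detail of the sample above is easy to miss: `frontend` is a quoted string that itself contains JSON, so it has to be decoded a second time. The sketch below shows that step. `extract_phonemes` is a hypothetical helper, and the conversion assumes `start_time` and `end_time` are in seconds, which is an inference from the sample (the last `end_time` of 1.955 sits just under the `duration` of "1960") rather than something confirmed here.

```python
import json


def extract_phonemes(response):
    """Return [(phone, start_ms, end_ms), ...] from a response shaped like the sample."""
    frontend = response["addition"]["frontend"]
    # "frontend" arrives as a JSON-encoded string, so decode it again.
    if isinstance(frontend, str):
        frontend = json.loads(frontend)
    return [
        # Assumed unit conversion: seconds to whole milliseconds.
        (p["phone"], round(p["start_time"] * 1000), round(p["end_time"] * 1000))
        for p in frontend.get("phonemes", [])
    ]


if __name__ == "__main__":
    # Trimmed, hand-built response that mirrors the sample structure.
    sample = {
        "addition": {
            "duration": "1960",
            "frontend": json.dumps({
                "phonemes": [
                    {"phone": "C0z", "start_time": 0.025, "end_time": 0.105},
                    {"phone": "。", "start_time": 1.85, "end_time": 1.955},
                ]
            }),
        }
    }
    print(extract_phonemes(sample))
```

Converting to milliseconds up front keeps the timeline in the same unit as the other providers' samples in this article, which simplifies a shared lip-sync stage.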
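2.5 Baidu speech synthesis
Baidu's speech synthesis only supports text-level timestamps (per sentence and per character in the sample below), not phoneme-level timestamps. Documentation: https://ai.baidu.com/ai-doc/SPEECH/ulbxh8rbu
Sample output: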
{
"log_id": 16739423288701914,
"tasks_info": [
{
"task_status": "Success",
"task_result": {
"speech_url": "http://bj.bcebos.com/aipe-speech/text_to_speech/2023-01-17/63c6550e52064d000104da0d/speech/0.mp3?authorization=bce-auth-v1%2F8a6ca9b78c124d89bb6bca18c6fc5944%2F2023-01-17T07%3A58%3A12Z%2F259200%2F%2Fbb3f38b53425ced397a107aebe21d2e951ed0e27a964f39c2a350249ba07b47c",
"speech_timestamp": {
"sentences": [
{ "paragraph_index": 0,
"sentence_texts": "今年上半年我国工业经济面临的内外部环境还是比较严峻复杂的",
"begin_time": 104,
"end_time": 5970,
"characters": [
{
"character_text": "今",
"begin_time": 106,
"end_time": 313
},
{
"character_text": "年",
"begin_time": 316,
"end_time": 522
},
{
"character_text": "上",
"begin_time": 525,
"end_time": 732
},
{
"character_text": "半",
"begin_time": 735,
"end_time": 941
},
{
"character_text": "年",
"begin_time": 944,
"end_time": 1151
},
{
"character_text": "我",
"begin_time": 1154,
"end_time": 1360
},
{
"character_text": "国",
"begin_time": 1363,
"end_time": 1570
},
{
"character_text": "工",
"begin_time": 1573,
"end_time": 1779
},
{
"character_text": "业",
"begin_time": 1782,
"end_time": 1989
},
{
"character_text": "经",
"begin_time": 1992,
"end_time": 2198
},
{
"character_text": "济",
"begin_time": 2201,
"end_time": 2408
},
{
"character_text": "面",
"begin_time": 2411,
"end_time": 2617
},
{
"character_text": "临",
"begin_time": 2620,
"end_time": 2827
},
{
"character_text": "的",
"begin_time": 2830,
"end_time": 3036
},
{
"character_text": "内",
"begin_time": 3039,
"end_time": 3246
},
{
"character_text": "外",
"begin_time": 3249,
"end_time": 3455
},
{
"character_text": "部",
"begin_time": 3458,
"end_time": 3664
},
{
"character_text": "环",
"begin_time": 3667,
"end_time": 3874
},
{
"character_text": "境",
"begin_time": 3877,
"end_time": 4083
},
{
"character_text": "还",
"begin_time": 4086,
"end_time": 4293
},
{
"character_text": "是",
"begin_time": 4296,
"end_time": 4502
},
{
"character_text": "比",
"begin_time": 4505,
"end_time": 4712
},
{
"character_text": "较",
"begin_time": 4715,
"end_time": 4921
},
{
"character_text": "严",
"begin_time": 4924,
"end_time": 5131
},
{
"character_text": "峻",
"begin_time": 5134,
"end_time": 5340
},
{
"character_text": "复",
"begin_time": 5343,
"end_time": 5550
},
{
"character_text": "杂",
"begin_time": 5553,
"end_time": 5759
},
{
"character_text": "的",
"begin_time": 5762,
"end_time": 5969
}
]
}
]
}
},
"task_id": "63c6550e52064d000104da0d"
}
]
}
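The character timings in the sample above sit several levels deep in the response. A small sketch of pulling them out follows; `character_timeline` is a hypothetical helper, the field names come from the sample, and the times are assumed to be in milliseconds. Because this output has no phonemes, a lip-sync pipeline would still need its own text-to-phoneme step on top of these timings.

```python
def character_timeline(response):
    """Return [(character, begin_ms, end_ms), ...] from a response shaped like the sample."""
    timeline = []
    for task in response.get("tasks_info", []):
        stamps = task.get("task_result", {}).get("speech_timestamp", {})
        for sentence in stamps.get("sentences", []):
            for ch in sentence.get("characters", []):
                timeline.append(
                    (ch["character_text"], ch["begin_time"], ch["end_time"])
                )
    return timeline


if __name__ == "__main__":
    # Trimmed copy of the sample above (first two characters only).
    sample = {
        "tasks_info": [{
            "task_status": "Success",
            "task_result": {
                "speech_timestamp": {
                    "sentences": [{
                        "paragraph_index": 0,
                        "begin_time": 104,
                        "end_time": 5970,
                        "characters": [
                            {"character_text": "今", "begin_time": 106, "end_time": 313},
                            {"character_text": "年", "begin_time": 316, "end_time": 522},
                        ],
                    }]
                }
            },
        }]
    }
    for entry in character_timeline(sample):
        print(entry)
```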
2.6 Unisound speech synthesis
Documentation: https://ai.unisound.com/doc/ttslong/WebAPI.html
Sample output:
[{
"sentence": "你好",
"phoneme": [
{
"end": 208.98,
"phone": "sil",
"start": 0,
"type": 8
},
{
"end": 278.639,
"phone": "n",
"start": 208.98,
"type": 1
},
{
"end": 417.959,
"phone": "i",
"start": 278.639,
"type": 4
},
{
"end": 557.279,
"phone": "h",
"start": 417.959,
"type": 1
},
{
"end": 801.088,
"phone": "ao",
"start": 557.279,
"type": 4
},
{
"end": 1001.361,
"phone": "sil",
"start": 801.088,
"type": 8
}
]
}]
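For lip sync, the leading and trailing `sil` segments in the sample above usually map to a closed or neutral mouth, so it can help to separate them from the spoken phones. The sketch below does that and computes each phone's duration. `spoken_phones` is a hypothetical helper, the `start` and `end` values are assumed to be in milliseconds, and the meaning of the `type` field is not interpreted here.

```python
def spoken_phones(sentences, silence_label="sil"):
    """Return [(phone, start, duration), ...] with silence segments removed."""
    result = []
    for sentence in sentences:
        for ph in sentence.get("phoneme", []):
            if ph["phone"] == silence_label:
                continue  # skip silence; handle it as a neutral mouth pose elsewhere
            duration = round(ph["end"] - ph["start"], 3)
            result.append((ph["phone"], ph["start"], duration))
    return result


if __name__ == "__main__":
    # Same values as the sample output above.
    sample = [{
        "sentence": "你好",
        "phoneme": [
            {"end": 208.98, "phone": "sil", "start": 0, "type": 8},
            {"end": 278.639, "phone": "n", "start": 208.98, "type": 1},
            {"end": 417.959, "phone": "i", "start": 278.639, "type": 4},
            {"end": 557.279, "phone": "h", "start": 417.959, "type": 1},
            {"end": 801.088, "phone": "ao", "start": 557.279, "type": 4},
            {"end": 1001.361, "phone": "sil", "start": 801.088, "type": 8},
        ],
    }]
    for phone, start, duration in spoken_phones(sample):
        print(phone, start, duration)
```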
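Author: StubbornHuang
Copyright notice: This is an original article by the site owner. Please cite the original link when reposting.
Original title: 3D Virtual Human Lip Sync: TTS Speech Synthesis APIs That Can Output Phonemes
Original link: https://www.stubbornhuang.com/3068/
Published: 2024-08-30 18:02:48
Last modified: 2024-08-30 18:18:44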