UniMRCP Custom Development Q&A

Since my future work is unrelated to call centers, this may be my last article about UniMRCP. I ran into several problems while developing UniMRCP plugins; here are the solutions.

How to Configure UniMRCP Server Startup Options?

Sometimes we need to pass custom parameters to the UniMRCP server, such as the address of the ASR server or an output directory. Where should these parameters go, and how can the plugin read them?

Modify The Config File

Open unimrcpserver.xml in the conf directory, find the engine element for your plugin, and add a param element with name and value attributes.

<engine id="CO-Recog-1" name="corecog" enable="true">
    <param name="asr_ip_address" value="wss://cotin.tech/ws/"/>
</engine>

Get The Parameters

You can call mrcp_engine_param_get to get the parameters you set in the config file.

static apt_bool_t co_recog_engine_open(mrcp_engine_t *engine)
{
    /* Read the value configured in unimrcpserver.xml */
    const char *asr_ip_addr = mrcp_engine_param_get(engine, "asr_ip_address");
    ...
}
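A parameter that is missing from the config file needs a fallback. The helper below is a minimal sketch, assuming that mrcp_engine_param_get returns NULL when the parameter is not configured; `param_or_default` is a hypothetical name, not part of UniMRCP.

```c
#include <stddef.h>

/* Return `value` when it is a non-empty string, otherwise `fallback`.
 * Hypothetical helper: it only wraps the result of a parameter lookup. */
static const char *param_or_default(const char *value, const char *fallback)
{
    return (value != NULL && value[0] != '\0') ? value : fallback;
}
```

In the plugin it could wrap the lookup, for example `param_or_default(mrcp_engine_param_get(engine, "asr_ip_address"), "ws://127.0.0.1/ws/")`, where the default URL is only a placeholder.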

How to Send Custom ASR Parameters to UniMRCP Server?

Vendor-Specific Parameters

vendor-specific          =    "Vendor-Specific-Parameters" ":"
                              [vendor-specific-av-pair
                              *(";" vendor-specific-av-pair)] CRLF
vendor-specific-av-pair  =    vendor-av-pair-name "=" value
vendor-av-pair-name      =    1*UTFCHAR

This grammar comes from the MRCPv2 specification (RFC 6787). Header fields of this form MAY be sent in any method (request) and are used to manage implementation-specific parameters on the server side.
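To make the grammar concrete, here is a minimal sketch of splitting a header value such as `com.example.companyA.paramxyz=256;com.example.companyA.paramabc=High` into name/value pairs. UniMRCP already does this parsing for you; `av_pair_t` and `parse_vendor_params` are hypothetical names used only for illustration, and the sketch does not trim whitespace or handle quoted values that contain ";".

```c
#include <stddef.h>
#include <string.h>

#define MAX_PAIRS 8
#define MAX_LEN   64

typedef struct {
    char name[MAX_LEN];
    char value[MAX_LEN];
} av_pair_t;

/* Split "name1=value1;name2=value2" into pairs.
 * Returns the number of pairs stored. Fragments without '=' or with an
 * empty name, and fragments too long for the buffers, are skipped. */
static int parse_vendor_params(const char *header_value,
                               av_pair_t *pairs, int max_pairs)
{
    int count = 0;
    const char *p = header_value;

    while (*p != '\0' && count < max_pairs) {
        const char *end = strchr(p, ';');               /* end of this pair */
        size_t len = end ? (size_t)(end - p) : strlen(p);
        const char *eq = memchr(p, '=', len);           /* name/value split */

        if (eq != NULL && eq > p) {
            size_t nlen = (size_t)(eq - p);
            size_t vlen = len - nlen - 1;
            if (nlen < MAX_LEN && vlen < MAX_LEN) {
                memcpy(pairs[count].name, p, nlen);
                pairs[count].name[nlen] = '\0';
                memcpy(pairs[count].value, eq + 1, vlen);
                pairs[count].value[vlen] = '\0';
                count++;
            }
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    return count;
}
```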

Client

We use FreeSWITCH as the client to illustrate. Here we add a vendor-specific parameter “com.example.companyA.paramxyz”:

play_and_detect_speech(silence.wav detect:unimrcp:unimrcpserver-cotest-8000 
{start-input-timers=false, save-waveform=true, no-input-timeout=8000, com.example.companyA.paramxyz=256}
builtin:speech/transcribe)

Server

Read the vendor-specific parameters in the plugin:

static apt_bool_t co_recog_channel_recognize(mrcp_engine_channel_t *channel, mrcp_message_t *request, mrcp_message_t *response)
{
    ...
    /* Copy the vendor-specific pairs only if the request carries the header */
    if(mrcp_generic_header_property_check(request, GENERIC_HEADER_VENDOR_SPECIFIC_PARAMS) == TRUE) {
        mrcp_generic_header_t *generic_header = mrcp_generic_header_get(request);
        if(generic_header && generic_header->vendor_specific_params) {
            recog_channel->vendor_params = apt_pair_array_copy(generic_header->vendor_specific_params, request->pool);
        }
    }
    ...
}

Then we can use apt_pair_array_find to look up individual parameters in the copied pair array.

/* Look up a vendor-specific parameter by name; returns "" when it is absent */
static char const* vendor_param_find(apt_pair_arr_t* vendor_specific_params, char const* name)
{
    apt_str_t sname;
    apt_pair_t const* p;
    if (!name) return "";
    if (!vendor_specific_params) return "";
    apt_string_set(&sname, name);
    p = apt_pair_array_find(vendor_specific_params, &sname);
    if (!p) return "";
    /* Guard against a pair that has a name but no value buffer */
    return p->value.buf ? p->value.buf : "";
}
...
xyz_value = vendor_param_find(vendor_params, "com.example.companyA.paramxyz");
...
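The value comes back as a string, and vendor_param_find returns "" when the parameter is absent, so a numeric parameter such as paramxyz=256 still has to be converted and validated. The following is a minimal sketch using the standard strtol; `parse_int_or` is a hypothetical helper name, not a UniMRCP function.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse `text` as a base-10 int. Returns `fallback` when the text is
 * NULL, empty, not a number, followed by extra characters, or out of
 * range for int. */
static int parse_int_or(const char *text, int fallback)
{
    char *end = NULL;
    long v;

    if (text == NULL || *text == '\0')
        return fallback;

    errno = 0;
    v = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return fallback;
    if (v < INT_MIN || v > INT_MAX)
        return fallback;
    return (int)v;
}
```

With this helper, `parse_int_or(xyz_value, 0)` yields the configured number, or 0 when the client did not send the parameter.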

FreeSWITCH SIP Client Audio Is 16 kHz: Why Does Recognition Fail?

Honestly, I never found the root cause of this problem.

My X-Lite softphone uses G.722, a 16 kHz codec, while my UniMRCP client and server were configured with 8 kHz codecs only:

<codecs own-preference="false">PCMU PCMA L16/96/8000 telephone-event/101/8000</codecs>

In the end, the UniMRCP client converts the audio to L16 at 8 kHz and sends it to the UniMRCP server. The audio that reached the server was distorted, so the ASR server failed to recognize it. I could not figure out why; if anyone knows the reason, please enlighten me.

Although I never found the cause, I did find a workaround.

Configure Both UniMRCP Client and Server to Support 16 kHz

Modify the codecs setting for both the UniMRCP client and the server. The 16 kHz codecs must be listed before the 8 kHz codecs.

<codecs own-preference="false">PCMU/97/16000 PCMA/98/16000 L16/99/16000 PCMU PCMA L16/96/8000 telephone-event/101/8000</codecs>

Codecs are matched in the order they are listed. When the codecs offered by the FreeSWITCH SIP client appear in the lists of both mod_unimrcp (the UniMRCP client) and the UniMRCP server, the first match wins, so with this configuration the 16 kHz codecs are tried before the 8 kHz ones.

The UniMRCP client sends its supported codecs to the server:

v=0
o=FreeSWITCH 8660554130009454654 6491466976311633172 IN IP4 172.16.169.159
s=-
c=IN IP4 127.0.0.1
t=0 0
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:new
a=resource:speechrecog
a=cmid:1
m=audio 4018 RTP/AVP 97 98 99 0 8 96
a=rtpmap:97 PCMU/16000
a=rtpmap:98 PCMA/16000
a=rtpmap:99 L16/16000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:96 L16/8000
a=sendonly
a=mid:1

The UniMRCP server sends the matched codec back to the client:

v=0
o=UniMRCPServer 0 0 IN IP4 127.0.0.1
s=-
c=IN IP4 127.0.0.1
t=0 0
m=application 1544 TCP/MRCPv2 1
a=setup:passive
a=connection:new
a=channel:1cf7cd42c6b211e8@speechrecog
a=cmid:1
m=audio 5024 RTP/AVP 97
a=rtpmap:97 PCMU/16000
a=recvonly
a=mid:1

If the 8 kHz codecs are listed first, as in the configuration below, the UniMRCP client and server settle on the first matched codec, which is PCMU at 8 kHz. The audio from FreeSWITCH is then still converted to L16 at 8 kHz, so this ordering does not solve the problem:

<codecs own-preference="false">PCMU PCMA L16/96/8000 PCMU/97/16000 PCMA/98/16000 L16/99/16000 telephone-event/101/8000</codecs>
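The effect of the ordering can be illustrated with a small model of first-match negotiation. This is a simplified sketch, not UniMRCP's actual matching code: `codec_t` and `first_match` are hypothetical names, and it assumes the answering side walks the offer in order and accepts the first codec it also supports.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;   /* encoding name, e.g. "PCMU" */
    int         rate;   /* sampling rate in Hz */
} codec_t;

/* Return the index in `offer` of the first codec that also appears in
 * `supported`, or -1 when nothing matches. Only the order of `offer`
 * decides which codec wins. */
static int first_match(const codec_t *offer, size_t n_offer,
                       const codec_t *supported, size_t n_supported)
{
    size_t i, j;

    for (i = 0; i < n_offer; i++) {
        for (j = 0; j < n_supported; j++) {
            if (offer[i].rate == supported[j].rate &&
                strcmp(offer[i].name, supported[j].name) == 0) {
                return (int)i;
            }
        }
    }
    return -1;
}
```

With an offer that lists PCMU/16000 first, the model selects a 16 kHz codec; with an offer that lists PCMU/8000 first, it selects the 8 kHz one even though both sides support 16 kHz.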

Why Won’t the Recognition Finish?

UniMRCP provides a default VAD (voice activity detection) implementation that uses a simple power threshold to decide whether voice is present. When the background noise level is above the threshold, the detector stays in the active state, so the end of speech is never detected and the session does not end until the no-input timeout. To solve the problem, replace the default detector with a more effective third-party VAD library.
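The failure mode can be reproduced with a toy threshold detector. This is a minimal sketch of the general technique, not UniMRCP's detector: `vad_t`, `frame_level`, and `vad_process` are hypothetical names, and the level is taken as the mean absolute amplitude of a frame of 16-bit PCM samples.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    long threshold;       /* level above which a frame counts as speech */
    int  silence_needed;  /* consecutive quiet frames that end an utterance */
    int  silence_run;     /* quiet frames seen in a row so far */
    int  in_speech;       /* 1 once speech has started */
} vad_t;

/* Mean absolute amplitude of one frame of 16-bit PCM samples. */
static long frame_level(const int16_t *samples, size_t count)
{
    long sum = 0;
    size_t i;

    if (count == 0)
        return 0;
    for (i = 0; i < count; i++)
        sum += labs((long)samples[i]);
    return sum / (long)count;
}

/* Feed one frame. Returns 1 when the end of speech is detected,
 * 0 otherwise. A frame above the threshold resets the silence counter,
 * so steady noise above the threshold never lets the utterance end. */
static int vad_process(vad_t *vad, const int16_t *samples, size_t count)
{
    if (frame_level(samples, count) > vad->threshold) {
        vad->in_speech = 1;
        vad->silence_run = 0;
        return 0;
    }
    if (vad->in_speech && ++vad->silence_run >= vad->silence_needed) {
        vad->in_speech = 0;
        vad->silence_run = 0;
        return 1;
    }
    return 0;
}
```

Feeding this detector frames of constant noise louder than the threshold never produces an end-of-speech result, which mirrors the behavior described above.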

Cotin Yang