Audio Control Service


2025-05-29

Provides the voice service controller for the robot system. Through `AudioController`, you can send audio commands to the robot and query audio status over RPC.

Interface Definition

`AudioController` is a C++ class that encapsulates the audio control functions: TTS playback and stop control, volume setting and query, voice configuration, and subscription to raw voice data.

AudioController

Item	Description
Function Name	AudioController
Declaration	`AudioController();`
Overview	Constructs the audio controller object: sets up internal state and allocates resources.
Note	Constructs internal state.

~AudioController

Item	Description
Function Name	~AudioController
Declaration	`~AudioController();`
Overview	Releases audio controller resources, ensures playback is stopped and underlying resources are cleaned up.
Note	Ensures resources are safely released.

Initialize

Item	Description
Function Name	Initialize
Declaration	`bool Initialize();`
Overview	Initializes the audio control module, prepares playback resources and devices.
Return Value	`true` for success, `false` for failure.
Note	Pair each call with `Shutdown()`.

Shutdown

Item	Description
Function Name	Shutdown
Declaration	`void Shutdown();`
Overview	Shuts down the audio controller and releases resources.
Note	Be sure to call before destruction.
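
Because `Initialize()` and `Shutdown()` are meant to be paired, one option is to wrap the pair in a small scope guard. The sketch below is illustrative only. The `AudioController` shown is a stand-in stub so the example compiles without the SDK, and `AudioSession`, `initialized()`, and `RunSessionDemo` are hypothetical names that are not part of the documented interface.

```cpp
#include <cassert>

// Stand-in stub for the SDK class (assumption: the real class is provided by the SDK).
class AudioController {
 public:
  bool Initialize() { initialized_ = true; return true; }
  void Shutdown() { initialized_ = false; }
  bool initialized() const { return initialized_; }  // illustration only, not documented
 private:
  bool initialized_ = false;
};

// Hypothetical scope guard: Initialize() on construction, Shutdown() on destruction.
class AudioSession {
 public:
  explicit AudioSession(AudioController& controller)
      : controller_(controller), ok_(controller.Initialize()) {}
  ~AudioSession() {
    if (ok_) controller_.Shutdown();  // only shut down what was initialized
  }
  AudioSession(const AudioSession&) = delete;
  AudioSession& operator=(const AudioSession&) = delete;
  bool ok() const { return ok_; }
 private:
  AudioController& controller_;
  bool ok_;
};

// Returns true if the controller is shut down again once the guard leaves scope.
bool RunSessionDemo() {
  AudioController controller;
  {
    AudioSession session(controller);
    assert(session.ok());              // the stub never fails to initialize
    assert(controller.initialized());
  }                                    // Shutdown() runs here, before destruction
  return !controller.initialized();
}
```

With a guard like this, `Shutdown()` still runs when a function returns early or throws, which keeps the call ahead of the controller's destruction as long as the guard is declared after the controller.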

GetVoiceConfig

Item	Description
Function Name	GetVoiceConfig
Declaration	`Status GetVoiceConfig(GetSpeechConfig& config);`
Overview	Gets the complete configuration information of the voice system.
Parameter	`config`: Returns the voice system configuration by reference.
Return Value	`Status::OK` for success, others for failure.
Note	Blocking interface. The configuration includes all sub-configs such as speaker, bot, wakeup, dialog, and the current TTS model type.

SwitchTtsVoiceModel

Item	Description
Function Name	SwitchTtsVoiceModel
Declaration	`Status SwitchTtsVoiceModel(TtsType tts_type, GetSpeechConfig& config);`
Overview	Switches the TTS model and gets the updated configuration.
Parameter	`tts_type`: TTS model type to switch to. `config`: Returns the updated voice system configuration by reference.
Return Value	`Status::OK` for success, others for failure.
Note	Blocking interface. The configuration is automatically updated after switching the TTS model.

SetVoiceConfig

Item	Description
Function Name	SetVoiceConfig
Declaration	`Status SetVoiceConfig(const SetSpeechConfig& config);`
Overview	Sets the complete configuration information of the voice system.
Parameter	`config`: The voice system configuration to set.
Return Value	`Status::OK` for success, others for failure.
Note	Blocking interface. The set configuration will completely overwrite all current voice-related configurations.
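
Since `SetVoiceConfig` overwrites every voice-related setting, a caller that wants to change one field may prefer to start from the values returned by `GetVoiceConfig`. The following is a minimal sketch of that read-modify-write idea. The structs are trimmed copies of the documented ones, `ToSetConfig` is a hypothetical helper, and the field-to-field mapping is an assumption based on matching names, not something the SDK documents.

```cpp
#include <cassert>
#include <map>
#include <string>

// Trimmed copies of the documented structs; only fields used by the conversion are kept.
struct CustomBotInfo { std::string name, workflow, token; };
struct SpeakerConfigSelected { std::string region, speaker_id; };
struct SpeakerConfig {
  SpeakerConfigSelected selected;
  float speaker_speed = 1.0f;
};
struct BotConfigSelected { std::string bot_id; };
struct BotConfig {
  std::map<std::string, CustomBotInfo> custom_data;
  BotConfigSelected selected;
};
struct WakeupConfig { std::string name; };
struct DialogConfig {
  bool is_front_doa = false;
  bool is_fullduplex_enable = false;
  bool is_enable = false;
  bool is_doa_enable = false;
};
struct GetSpeechConfig {
  SpeakerConfig speaker_config;
  BotConfig bot_config;
  WakeupConfig wakeup_config;
  DialogConfig dialog_config;
};
struct SetSpeechConfig {
  std::string speaker_id, region, bot_id;
  bool is_front_doa = false;
  bool is_fullduplex_enable = false;
  bool is_enable = false;
  bool is_doa_enable = false;
  float speaker_speed = 1.0f;
  std::string wakeup_name;
  std::map<std::string, CustomBotInfo> custom_bot;
};

// Hypothetical helper: copies the current settings into a SetSpeechConfig so that
// fields the caller does not touch keep their current values.
SetSpeechConfig ToSetConfig(const GetSpeechConfig& current) {
  SetSpeechConfig out;
  out.speaker_id = current.speaker_config.selected.speaker_id;
  out.region = current.speaker_config.selected.region;
  out.speaker_speed = current.speaker_config.speaker_speed;
  out.bot_id = current.bot_config.selected.bot_id;
  out.custom_bot = current.bot_config.custom_data;
  out.wakeup_name = current.wakeup_config.name;
  out.is_front_doa = current.dialog_config.is_front_doa;
  out.is_fullduplex_enable = current.dialog_config.is_fullduplex_enable;
  out.is_enable = current.dialog_config.is_enable;
  out.is_doa_enable = current.dialog_config.is_doa_enable;
  return out;
}
```

A caller would then change only the field of interest, for example `next.speaker_speed = 1.5f;`, before passing the result to `SetVoiceConfig`.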

Play

Item	Description
Function Name	Play
Declaration	`Status Play(const TtsCommand& cmd);`
Overview	Plays a TTS (Text-to-Speech) command.
Parameter	`cmd`: TTS command, including text, speed, pitch, etc.
Return Value	`Status::OK` for success, others for failure.
Note	Blocking interface. Make sure the module is initialized before calling.
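
A call to `Play` might look like the sketch below, which checks the returned status instead of assuming success. The `Status` enum, the stub `AudioController`, and the `Speak` helper are assumptions made so the example is self-contained. In particular, the stub's choice to reject calls made before `Initialize()` is illustrative and not documented SDK behavior.

```cpp
#include <cassert>
#include <string>

enum class Status { OK, ERROR };  // hypothetical; the source only names Status::OK
enum class TtsPriority { HIGH = 0, MIDDLE = 1, LOW = 2 };
enum class TtsMode { CLEARTOP = 0, ADD = 1, CLEARBUFFER = 2 };

struct TtsCommand {
  std::string id;
  std::string content;
  TtsPriority priority;
  TtsMode mode;
};

// Stand-in stub for the SDK class so the sketch compiles on its own.
class AudioController {
 public:
  bool Initialize() { initialized_ = true; return true; }
  void Shutdown() { initialized_ = false; }
  Status Play(const TtsCommand& cmd) {
    // Assumed behavior: fail when not initialized or when there is nothing to say.
    if (!initialized_ || cmd.content.empty()) return Status::ERROR;
    return Status::OK;
  }
 private:
  bool initialized_ = false;
};

// Hypothetical helper: queues ordinary dialogue speech and reports whether Play succeeded.
bool Speak(AudioController& controller, const std::string& id, const std::string& text) {
  const TtsCommand cmd{id, text, TtsPriority::LOW, TtsMode::ADD};
  return controller.Play(cmd) == Status::OK;
}
```

Because `Play` is documented as blocking, a real application may want to call a helper like `Speak` from a worker thread to avoid blocking time-critical code.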

Stop

Item	Description
Function Name	Stop
Declaration	`Status Stop();`
Overview	Stops the current audio playback.
Return Value	`Status::OK` for success, others for failure.
Note	Blocking interface. Usually used to interrupt current speech.

SetVolume

Item	Description
Function Name	SetVolume
Declaration	`Status SetVolume(int volume);`
Overview	Sets the audio output volume.
Parameter	`volume`: Volume value, usually in the range 0~100.
Return Value	`Status::OK` for success, others for failure.
Note	Blocking interface. Takes effect immediately after setting.

GetVolume

Item	Description
Function Name	GetVolume
Declaration	`Status GetVolume(int& volume);`
Overview	Gets the current audio output volume.
Parameter	`volume`: Returns the current volume value by reference.
Return Value	`Status::OK` for success, others for failure.
Note	Blocking interface. Check the return value before using `volume`.
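
The two volume calls can be combined into a relative adjustment that checks each status and keeps the result within 0 to 100. This is a sketch under stated assumptions: `Status` is modeled as a simple enum, the stub `AudioController` stands in for the SDK class, and `AdjustVolume` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>

enum class Status { OK, ERROR };  // hypothetical; the source only names Status::OK

// Stand-in stub for the SDK class so the sketch compiles on its own.
class AudioController {
 public:
  Status SetVolume(int volume) {
    if (volume < 0 || volume > 100) return Status::ERROR;  // assumed validation
    volume_ = volume;
    return Status::OK;
  }
  Status GetVolume(int& volume) {
    volume = volume_;
    return Status::OK;
  }
 private:
  int volume_ = 50;
};

// Hypothetical helper: changes the volume by `delta`, clamped to 0..100.
// `new_volume` is written only when both calls succeed.
bool AdjustVolume(AudioController& controller, int delta, int& new_volume) {
  int current = 0;
  if (controller.GetVolume(current) != Status::OK) return false;  // do not trust `current`
  const int target = std::min(100, std::max(0, current + delta));
  if (controller.SetVolume(target) != Status::OK) return false;
  new_volume = target;
  return true;
}
```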

ControlVoiceStream

Item	Description
Function Name	ControlVoiceStream
Declaration	`Status ControlVoiceStream(bool raw_data, bool bf_data);`
Overview	Controls the voice data stream.
Parameter	`raw_data`: Whether to send raw data. `bf_data`: Whether to send BF data.
Return Value	`Status::OK` for success, others for failure.
Note	Blocking interface. Used to control the opening and closing of the voice data stream.

SubscribeOriginVoiceData

Item	Description
Function Name	SubscribeOriginVoiceData
Declaration	`void SubscribeOriginVoiceData(const RawVoiceDataCallback callback);`
Overview	Subscribes to raw voice data.
Parameter	`callback`: Callback to handle received raw voice data.
Note	Non-blocking interface. The callback will be called when data is updated.

SubscribeBfVoiceData

Item	Description
Function Name	SubscribeBfVoiceData
Declaration	`void SubscribeBfVoiceData(const BfVoiceDataCallback callback);`
Overview	Subscribes to BF voice data.
Parameter	`callback`: Callback to handle received BF voice data.
Note	Non-blocking interface. The callback will be called when data is updated.
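
The stream control and subscription calls are typically used together: register a callback, then enable the stream. The sketch below shows one way this could look. The exact signature of `RawVoiceDataCallback` is not given above, so the `std::function` alias here is an assumption, as are the stub `AudioController`, its `SimulateRawFrame` hook, the `CountRawBytes` helper, and the order of the two calls.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

enum class Status { OK, ERROR };                        // hypothetical
struct ByteMultiArray { std::vector<uint8_t> data; };   // layout field omitted here

// Assumed callback signature; check the SDK header for the real one.
using RawVoiceDataCallback = std::function<void(const std::shared_ptr<ByteMultiArray>)>;

// Stand-in stub for the SDK class so the sketch compiles on its own.
class AudioController {
 public:
  Status ControlVoiceStream(bool raw_data, bool /*bf_data*/) {
    raw_enabled_ = raw_data;
    return Status::OK;
  }
  void SubscribeOriginVoiceData(const RawVoiceDataCallback callback) { raw_cb_ = callback; }
  // Simulation hook (not part of the documented API): delivers one frame if enabled.
  void SimulateRawFrame(std::size_t bytes) {
    if (!raw_enabled_ || !raw_cb_) return;
    auto frame = std::make_shared<ByteMultiArray>();
    frame->data.assign(bytes, 0);
    raw_cb_(frame);
  }
 private:
  bool raw_enabled_ = false;
  RawVoiceDataCallback raw_cb_;
};

// Hypothetical helper: counts the bytes received while the raw stream is open.
std::size_t CountRawBytes(AudioController& controller, std::size_t frames,
                          std::size_t frame_bytes) {
  // Shared state is captured by value so the callback stays valid after this returns.
  auto total = std::make_shared<std::atomic<std::size_t>>(0);
  controller.SubscribeOriginVoiceData(
      [total](const std::shared_ptr<ByteMultiArray> msg) {
        if (msg) total->fetch_add(msg->data.size());  // keep the callback short
      });
  if (controller.ControlVoiceStream(true, false) != Status::OK) return 0;
  for (std::size_t i = 0; i < frames; ++i) controller.SimulateRawFrame(frame_bytes);
  controller.ControlVoiceStream(false, false);  // close the stream
  controller.SimulateRawFrame(frame_bytes);     // ignored: the stream is closed
  return total->load();
}
```

Since the subscription is non-blocking and the callback may run on another thread, the sketch keeps the callback short and uses an atomic counter instead of unsynchronized shared state.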

Type Definitions

`TtsPriority` — TTS Playback Priority Level

Used to control interruption behavior between different TTS tasks. Higher priority tasks will interrupt the playback of current lower priority tasks.

Enum Value	Value	Description
`TtsPriority::HIGH`	0	Highest priority, e.g., low battery alert, emergency reminder
`TtsPriority::MIDDLE`	1	Medium priority, e.g., system prompt, status broadcast
`TtsPriority::LOW`	2	Lowest priority, e.g., daily voice dialogue, background broadcast
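
Note that the numeric values run opposite to the priority: `HIGH` is 0 and `LOW` is 2. A comparison helper makes that explicit. This is a minimal sketch, and `Interrupts` is a hypothetical name that is not part of the documented interface.

```cpp
#include <cassert>

enum class TtsPriority { HIGH = 0, MIDDLE = 1, LOW = 2 };

// Hypothetical helper: true when `incoming` has strictly higher priority than
// `playing`, which corresponds to a smaller numeric value.
bool Interrupts(TtsPriority incoming, TtsPriority playing) {
  return static_cast<int>(incoming) < static_cast<int>(playing);
}
```

Equal priorities return `false` here because, per the descriptions in this document, ordering within one priority is decided by `TtsMode` instead.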

`TtsMode` — Task Scheduling Policy at the Same Priority

Used to refine the playback order and clearing logic of multiple TTS tasks under the same priority.

Enum Value	Value	Description
`TtsMode::CLEARTOP`	0	Clears all tasks at the current priority (both the one playing and those waiting in the queue) and plays this request immediately
`TtsMode::ADD`	1	Appends this request to the end of the current priority queue and plays it in order, without interrupting the current playback
`TtsMode::CLEARBUFFER`	2	Clears the requests in the queue that have not started playing, keeps the current playback, then plays this request
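
The three modes can be pictured with a toy model of a single priority level. The code below is not the SDK's scheduler. `PriorityLane` and `Submit` are hypothetical names, and the model only mirrors the table's wording: one request playing plus a waiting queue.

```cpp
#include <cassert>
#include <deque>
#include <string>

enum class TtsMode { CLEARTOP = 0, ADD = 1, CLEARBUFFER = 2 };

// Toy model of one priority level: `playing` is the active request ID
// (empty when idle) and `waiting` holds the queued request IDs in order.
struct PriorityLane {
  std::string playing;
  std::deque<std::string> waiting;
};

void Submit(PriorityLane& lane, const std::string& id, TtsMode mode) {
  switch (mode) {
    case TtsMode::CLEARTOP:
      // Drop both the queue and the active request, then play immediately.
      lane.waiting.clear();
      lane.playing = id;
      break;
    case TtsMode::ADD:
      // Keep everything; play now only if nothing is active.
      if (lane.playing.empty()) lane.playing = id;
      else lane.waiting.push_back(id);
      break;
    case TtsMode::CLEARBUFFER:
      // Drop the queue but let the active request finish first.
      lane.waiting.clear();
      if (lane.playing.empty()) lane.playing = id;
      else lane.waiting.push_back(id);
      break;
  }
}
```

For example, submitting `a`, `b`, `c` with `ADD` leaves `a` playing and `b`, `c` waiting. A following `d` with `CLEARBUFFER` leaves `a` playing with only `d` waiting, and a following `e` with `CLEARTOP` leaves `e` playing with an empty queue.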

Struct Definitions

`TtsCommand` — TTS Playback Command Structure

Describes the complete information of a TTS playback request, supporting unique ID, text content, priority control, and scheduling mode under the same priority.

Field Name	Type	Description
`id`	`std::string`	Unique TTS task ID, e.g., `"id_01"`, used to track playback status
`content`	`std::string`	Text content to play, e.g., `"Hello, welcome to the intelligent voice system."`
`priority`	`TtsPriority`	Playback priority, controls whether to interrupt lower priority speech
`mode`	`TtsMode`	Scheduling policy under the same priority, controls whether to append, overwrite, etc.

`CustomBotInfo` — Custom Bot Configuration Structure

Describes the basic configuration information of a custom bot.

Field Name	Type	Description
`name`	`std::string`	Bot name
`workflow`	`std::string`	Workflow ID
`token`	`std::string`	User authorization token

`CustomBotMap` — Custom Bot Mapping Table

Item	Description
Type	`std::map<std::string, CustomBotInfo>`
Description	Custom bot mapping table, key is bot ID, value is bot configuration info

`SetSpeechConfig` — Speech Configuration Parameter Structure

Used to set various configuration parameters of the voice system.

Field Name	Type	Description
`speaker_id`	`std::string`	Speaker ID
`region`	`std::string`	Speaker region
`bot_id`	`std::string`	ID of the bot (mode) to select
`is_front_doa`	`bool`	Force recognition from the front
`is_fullduplex_enable`	`bool`	Natural conversation switch
`is_enable`	`bool`	Voice switch
`is_doa_enable`	`bool`	Enable wakeup direction turning
`speaker_speed`	`float`	TTS playback speed, range [1,2]
`wakeup_name`	`std::string`	Wakeup name
`custom_bot`	`CustomBotMap`	Custom bot configuration
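
As a rough illustration of filling in this structure, the sketch below builds a configuration with one custom bot and checks `speaker_speed` against the documented range [1, 2]. The struct definitions repeat the table above so the example is self-contained. The default member values, all string values, and the helpers `IsSpeakerSpeedValid` and `MakeExampleConfig` are placeholders and assumptions, not SDK-provided names.

```cpp
#include <cassert>
#include <map>
#include <string>

struct CustomBotInfo {
  std::string name;
  std::string workflow;
  std::string token;
};
using CustomBotMap = std::map<std::string, CustomBotInfo>;

struct SetSpeechConfig {
  std::string speaker_id;
  std::string region;
  std::string bot_id;
  bool is_front_doa = false;          // defaults here are assumptions
  bool is_fullduplex_enable = false;
  bool is_enable = false;
  bool is_doa_enable = false;
  float speaker_speed = 1.0f;
  std::string wakeup_name;
  CustomBotMap custom_bot;
};

// Hypothetical check for the documented speaker_speed range [1, 2].
bool IsSpeakerSpeedValid(float speed) { return speed >= 1.0f && speed <= 2.0f; }

// Builds an example configuration; every value below is a placeholder.
SetSpeechConfig MakeExampleConfig() {
  SetSpeechConfig config;
  config.speaker_id = "speaker_1";
  config.region = "region_a";
  config.bot_id = "custom_bot_1";
  config.is_enable = true;
  config.speaker_speed = 1.2f;
  config.wakeup_name = "robot";
  // Key is the bot ID, value is the bot's name, workflow ID, and token.
  config.custom_bot["custom_bot_1"] =
      CustomBotInfo{"Guide", "workflow_1", "token_placeholder"};
  return config;
}
```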

`SpeakerConfigSelected` — Selected Speaker Configuration Structure

Describes the currently selected speaker.

Field Name	Type	Description
`region`	`std::string`	Selected region
`speaker_id`	`std::string`	Selected speaker ID

`SpeakerConfig` — Speaker Configuration Structure

Describes the available speakers, the current selection, and the speech speed.

Field Name	Type	Description
`data`	`std::map<std::string, std::vector<std::array<std::string, 2>>>`	Speaker data: maps each region to a list of two-element entries (speaker ID, speaker name)
`selected`	`SpeakerConfigSelected`	Currently selected speaker configuration
`speaker_speed`	`float`	Speech speed
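
The nested `data` type can be awkward to read, so here is a small lookup sketch. It assumes that each two-element array holds the speaker ID at index 0 and the speaker name at index 1, which is inferred from the field description and not confirmed by it. `SpeakerData` and `FindSpeakerName` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Same type as SpeakerConfig::data.
using SpeakerData = std::map<std::string, std::vector<std::array<std::string, 2>>>;

// Hypothetical helper: returns the speaker name for (region, speaker_id),
// or an empty string when the region or speaker is not listed.
// Assumes entry[0] is the speaker ID and entry[1] is the speaker name.
std::string FindSpeakerName(const SpeakerData& data, const std::string& region,
                            const std::string& speaker_id) {
  const auto it = data.find(region);
  if (it == data.end()) return "";
  for (const auto& entry : it->second) {
    if (entry[0] == speaker_id) return entry[1];
  }
  return "";
}
```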

`BotInfo` — Bot Configuration Information Structure

Describes the basic information of a standard bot.

Field Name	Type	Description
`name`	`std::string`	Work scenario name
`workflow`	`std::string`	Workflow ID

`BotConfigSelected` — Selected Bot Structure

Field Name	Type	Description
`bot_id`	`std::string`	Selected bot ID

`BotConfig` — Bot Configuration Structure

Describes bot-related configuration information, including standard bots and custom bots.

Field Name	Type	Description
`data`	`std::map<std::string, BotInfo>`	Standard bot data: bot ID->bot info
`custom_data`	`std::map<std::string, CustomBotInfo>`	Custom bot data: bot ID->custom bot info
`selected`	`BotConfigSelected`	Currently selected bot configuration

`WakeupConfig` — Wakeup Configuration Structure

Describes wakeup-related configuration information.

Field Name	Type	Description
`name`	`std::string`	Wakeup name
`data`	`std::map<std::string, std::string>`	Wakeup word data: wakeup word->pinyin

`DialogConfig` — Dialog Configuration Structure

Describes dialog-related configuration parameters.

Field Name	Type	Description
`is_front_doa`	`bool`	Force enhanced pickup from the front
`is_fullduplex_enable`	`bool`	Enable full-duplex dialog
`is_enable`	`bool`	Enable voice
`is_doa_enable`	`bool`	Enable wakeup direction turning

`TtsType` — TTS Model Enum

Describes the available TTS model types.

Enum Value	Value	Description
`TtsType::NONE`	0	No TTS model
`TtsType::DOUBAO`	1	Doubao TTS model
`TtsType::GOOGLE`	2	Google TTS model

`GetSpeechConfig` — Complete Voice System Configuration Structure

Describes the complete configuration information of the voice system, including all sub-configuration modules.

Field Name	Type	Description
`speaker_config`	`SpeakerConfig`	Speaker configuration
`bot_config`	`BotConfig`	Bot configuration
`wakeup_config`	`WakeupConfig`	Wakeup configuration
`dialog_config`	`DialogConfig`	Dialog configuration
`tts_type`	`TtsType`	TTS model, default is `TtsType::NONE`

`MultiArrayDimension` — Multi-dimensional Array Dimension Description

Field Name	Type	Description
`label`	`std::string`	Dimension label
`size`	`int32_t`	Dimension size
`stride`	`int32_t`	Stride

`MultiArrayLayout` — Multi-dimensional Array Layout Description

Field Name	Type	Description
`dim_size`	`int32_t`	Number of dimensions
`dim`	`std::vector<MultiArrayDimension>`	Dimension array
`data_offset`	`int32_t`	Data offset

`ByteMultiArray` — Byte Array Data Structure

Field Name	Type	Description
`layout`	`MultiArrayLayout`	Array layout information
`data`	`std::vector<uint8_t>`	Byte data array
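
Before reading a received `ByteMultiArray`, it can be useful to check that the layout is consistent with the amount of data. The sketch below assumes that the element count is the product of the dimension sizes and that `data_offset` counts bytes skipped at the start of `data`. Neither is confirmed by the tables above, and `ElementCount` and `LayoutFitsData` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct MultiArrayDimension {
  std::string label;
  int32_t size;
  int32_t stride;
};
struct MultiArrayLayout {
  int32_t dim_size;
  std::vector<MultiArrayDimension> dim;
  int32_t data_offset;
};
struct ByteMultiArray {
  MultiArrayLayout layout;
  std::vector<uint8_t> data;
};

// Assumed semantics: total elements = product of all dimension sizes.
// Returns 0 when no dimensions are described or any size is not positive.
std::size_t ElementCount(const MultiArrayLayout& layout) {
  if (layout.dim.empty()) return 0;
  std::size_t count = 1;
  for (const auto& d : layout.dim) {
    count *= d.size > 0 ? static_cast<std::size_t>(d.size) : 0;
  }
  return count;
}

// True when `data` holds at least ElementCount() bytes after `data_offset`.
bool LayoutFitsData(const ByteMultiArray& msg) {
  if (msg.layout.data_offset < 0) return false;
  const auto offset = static_cast<std::size_t>(msg.layout.data_offset);
  if (offset > msg.data.size()) return false;
  return ElementCount(msg.layout) <= msg.data.size() - offset;
}
```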