Display live subtitles
When interacting with conversational AI in real time, you can enable real-time subtitles to display the conversation content. This page explains how to implement real-time subtitles in your app.
Understand the tech
To simplify subtitle integration, Agora provides an open-source subtitle processing module. By integrating this module into your project and calling its APIs, you can quickly enable real-time subtitles. The following figure illustrates how the subtitle module interacts with your app and Agora SD-RTN™.
Subtitles module workflow
Prerequisites
Before you begin, make sure you have implemented the Conversational AI Engine REST quickstart.
Implementation
This section describes how to receive subtitle content from the subtitle processing module and display it on your app UI.
Integrate the subtitle processing module
Copy the ConversationSubtitleController.kt and MessageParser.kt files to your project and import the module before calling the module API.
Implement subtitle UI rendering logic
Inherit your subtitle UI module from the IConversationSubtitleCallback interface and implement the onSubtitleUpdated method to handle the message rendering logic.
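The rendering step could look roughly like the sketch below. It is an illustration only, not the module's own code: the type declarations at the top are stand-ins so the snippet compiles by itself (in a real project they come from the copied files, and their exact field types are an assumption here), and SubtitleListView is a hypothetical UI model. It keeps one line per turn and speaker and overwrites that line each time a newer update arrives.

```kotlin
// Stand-in declarations so this sketch compiles on its own. In a real project
// these come from the copied module files; the field types are assumptions.
enum class SubtitleStatus { Progress, End, Interrupted }

data class SubtitleMessage(
    val turnId: Long,
    val userId: Int,
    val text: String,
    val status: SubtitleStatus
)

interface IConversationSubtitleCallback {
    fun onSubtitleUpdated(subtitle: SubtitleMessage)
}

// Hypothetical UI model: one subtitle line per (turnId, userId) pair.
class SubtitleListView : IConversationSubtitleCallback {
    // LinkedHashMap keeps the position of a key when its value is replaced.
    private val lines = LinkedHashMap<Pair<Long, Int>, SubtitleMessage>()

    override fun onSubtitleUpdated(subtitle: SubtitleMessage) {
        // A later callback for the same turn and speaker replaces the earlier text.
        lines[subtitle.turnId to subtitle.userId] = subtitle
    }

    // Lines to display, in the order each line first appeared.
    fun render(): List<String> = lines.values.map { msg ->
        val speaker = if (msg.userId == 0) "You" else "Agent"
        "$speaker: ${msg.text}"
    }
}
```

In an Android app, render() would typically feed a list adapter on the main thread; that wiring is left out here.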
Create a subtitle processing module instance
When entering the call page, create a ConversationSubtitleController instance. It listens for subtitle messages internally and passes the subtitle information to your UI through the onSubtitleUpdated callback of IConversationSubtitleCallback.
Release resources
Call the reset method at the end of each call to clear the cache. When leaving the call page, call release to release resources.
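The create, reset, and release steps can be pictured as a small lifecycle, sketched below. Everything declared at the top is a stand-in so the snippet compiles without the Agora SDK or the module files: the names follow the API reference on this page, but the constructor shapes and method bodies are assumptions, and the log list exists only to make the call order visible. CallPage and its onEnter, onCallEnded, and onLeave hooks are hypothetical.

```kotlin
// Stand-ins only: the real classes come from the Agora SDK and the copied module.
class RtcEngine // placeholder for the Agora RTC engine instance
enum class SubtitleRenderMode { Text, Word }
data class SubtitleMessage(val turnId: Long, val userId: Int, val text: String)

interface IConversationSubtitleCallback {
    fun onSubtitleUpdated(subtitle: SubtitleMessage)
}

data class SubtitleRenderConfig(
    val rtcEngine: RtcEngine,
    val renderMode: SubtitleRenderMode,
    val callback: IConversationSubtitleCallback
)

class ConversationSubtitleController(val config: SubtitleRenderConfig) {
    val log = mutableListOf<String>() // records calls, for illustration only
    fun reset() { log += "reset" }
    fun release() { log += "release" }
}

// Hypothetical call page that owns the controller while it is on screen.
class CallPage(private val rtcEngine: RtcEngine) : IConversationSubtitleCallback {
    var controller: ConversationSubtitleController? = null
        private set

    // Entering the call page: create the controller once.
    fun onEnter() {
        controller = ConversationSubtitleController(
            SubtitleRenderConfig(rtcEngine, SubtitleRenderMode.Word, this)
        )
    }

    // End of each call: clear cached subtitles but keep the controller.
    fun onCallEnded() {
        controller?.reset()
    }

    // Leaving the call page: release resources and drop the reference.
    fun onLeave() {
        controller?.release()
        controller = null
    }

    override fun onSubtitleUpdated(subtitle: SubtitleMessage) {
        // Update the subtitle UI here.
    }
}
```

Keeping one controller per page visit and calling reset between calls avoids recreating the controller for every call on the same page.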
Reference
This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.
API Reference
This section provides API reference documentation for the subtitles module.
ConversationSubtitleController
- config: Subtitle rendering configuration. See SubtitleRenderConfig for details.
SubtitleRenderConfig
- rtcEngine: Agora RtcEngine instance.
- renderMode: Subtitle rendering mode. See SubtitleRenderMode for details.
- callback: The callback interface for receiving subtitle content updates. See IConversationSubtitleCallback for details.
SubtitleRenderMode
- Text: Sentence-by-sentence rendering mode. The subtitle content received by the callback is fully rendered on the UI.
- Word: Word-by-word rendering mode. The subtitle content received by the callback is rendered word by word on the UI.
The word-by-word rendering mode (Word) requires a TTS vendor that supports word-by-word output. Otherwise, the module automatically falls back to sentence-by-sentence rendering mode (Text).
IConversationSubtitleCallback
The callback interface for subtitle content update events.
- onSubtitleUpdated: Subtitle update callback.
  - subtitle: Updated subtitle message. See SubtitleMessage for details.
SubtitleMessage
- turnId: The identifier of the conversation turn. One conversation turn between the user and the agent corresponds to one turnId, according to the following rules:
  - turnId = 0: The agent's welcome message. There is no user subtitle for this turn.
  - turnId ≥ 1: The subtitles for the user or agent in that turn. Use the userId to display the user's subtitles before the agent's subtitles, then repeat the process for the next turn (turnId + 1).
  Caution: There is no guarantee that callbacks arrive in strictly increasing turnId order. If you encounter out-of-order callbacks, implement the sorting logic yourself.
- userId: The user ID associated with this subtitle message. In the current version, 0 represents the user and a non-zero value represents the agent ID.
- text: Subtitle text content.
- status: The current status of the subtitles. See SubtitleStatus for details.
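Because callbacks may arrive out of turnId order, one possible way to implement the sorting yourself is sketched below. The SubtitleMessage and SubtitleStatus declarations are stand-ins with assumed field types, and SubtitleTimeline is a hypothetical helper, not part of the module. It keeps the latest message per turn and speaker and sorts by turnId, placing the user (userId 0) before the agent within each turn.

```kotlin
// Stand-ins with assumed field types; the real classes come from the module.
enum class SubtitleStatus { Progress, End, Interrupted }

data class SubtitleMessage(
    val turnId: Long,
    val userId: Int,
    val text: String,
    val status: SubtitleStatus
)

// Hypothetical helper: stores the latest message per (turnId, userId) and
// returns messages in display order, whatever order the callbacks arrived in.
class SubtitleTimeline {
    private val latest = HashMap<Pair<Long, Int>, SubtitleMessage>()

    fun update(message: SubtitleMessage) {
        latest[message.turnId to message.userId] = message
    }

    fun ordered(): List<SubtitleMessage> =
        latest.values.sortedWith(
            compareBy<SubtitleMessage> { it.turnId }
                // userId 0 is the user: false (user) sorts before true (agent).
                .thenBy { it.userId != 0 }
                // Tie-breaker so the order stays deterministic.
                .thenBy { it.userId }
        )
}
```

Sorting on every read is simple and adequate for the short lists a call screen shows; a sorted map would avoid the repeated sort if the list grows long.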
SubtitleStatus
Use SubtitleStatus for special UI processing based on the status, such as displaying an interruption mark at the end of the subtitle.
- Progress: The subtitles are still being generated; the user or agent has not finished speaking.
- End: The subtitle generation is complete; the user or agent has finished speaking.
- Interrupted: The subtitles were interrupted before completion; the user actively stopped the agent’s response.
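As an illustration of status-based UI processing, the sketch below maps each status to a display string. The enum is a stand-in mirroring the values listed above, decorate is a hypothetical helper, and the markers it appends are arbitrary UI choices rather than anything the module prescribes.

```kotlin
// Stand-in mirroring the statuses described above.
enum class SubtitleStatus { Progress, End, Interrupted }

// Hypothetical formatter: picks a visual marker for each subtitle status.
fun decorate(text: String, status: SubtitleStatus): String = when (status) {
    SubtitleStatus.Progress -> "$text..."              // speaker is still talking
    SubtitleStatus.End -> text                          // final text, shown as is
    SubtitleStatus.Interrupted -> "$text [interrupted]" // agent was cut off
}
```

Using an exhaustive when without an else branch means the compiler flags this function if a new status value is added later.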