Display live subtitles
When interacting with conversational AI in real time, you can enable real-time subtitles to display the conversation content. This page explains how to implement real-time subtitles in your app.
Understand the tech
To simplify subtitle integration, Agora provides an open-source subtitle processing module. By integrating this module into your project and calling its APIs, you can quickly enable real-time subtitles. The following figure illustrates how the subtitle module interacts with your app and Agora SD-RTN™.
Subtitles module workflow
Prerequisites
Before you begin, make sure you have implemented the Conversational AI Engine REST quickstart.
Implementation
This section describes how to receive subtitle content from the subtitle processing module and display it on your app UI.
- Android
- iOS/macOS
- Web
- Integrate the subtitle processing module
Copy the ConversationSubtitleController.kt and MessageParser.kt files to your project and import the module before calling its API.
- Implement subtitle UI rendering logic
Inherit your subtitle UI module from the IConversationSubtitleCallback interface and implement the onSubtitleUpdated method to handle the message rendering logic:
```kotlin
class CovMessageListView @JvmOverloads constructor(
    context: Context,
    attrs: AttributeSet? = null,
    defStyleAttr: Int = 0
) : LinearLayout(context, attrs, defStyleAttr), IConversationSubtitleCallback {

    override fun onSubtitleUpdated(subtitle: SubtitleMessage) {
        // Implement your UI rendering logic here
    }
}
```
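One way to flesh out onSubtitleUpdated is to keep the latest message per (turnId, userId) pair and re-render the list on each update. The following is a minimal sketch; the messages list and the renderMessages() helper are illustrative assumptions, not part of the subtitle module.

```kotlin
// Sketch of one possible implementation inside CovMessageListView.
// The messages list and renderMessages() are illustrative, not part of the module.
private val messages = mutableListOf<SubtitleMessage>()

override fun onSubtitleUpdated(subtitle: SubtitleMessage) {
    // The callback may arrive off the main thread; post the update to the view's UI thread.
    post {
        val index = messages.indexOfFirst {
            it.turnId == subtitle.turnId && it.userId == subtitle.userId
        }
        if (index >= 0) messages[index] = subtitle else messages.add(subtitle)
        renderMessages(messages)
    }
}

private fun renderMessages(list: List<SubtitleMessage>) {
    // Update your RecyclerView adapter or child views here.
}
```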
- Create a subtitle processing module instance
When entering the call page, create a ConversationSubtitleController instance. It monitors the subtitle message callback internally and passes the subtitle information to your UI through the onSubtitleUpdated callback of IConversationSubtitleCallback.

```kotlin
private lateinit var subRenderController: ConversationSubtitleController

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    subRenderController = ConversationSubtitleController(
        SubtitleRenderConfig(
            rtcEngine = rtcEngine,
            renderMode = SubtitleRenderMode.Word,
            callback = mBinding?.messageListView
        )
    )
}
```
- Release resources
Call the reset method at the end of each call to clean up the cache. When leaving the call page, call release to release resources.

```kotlin
subRenderController.reset()
subRenderController.release()
```
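For example, assuming subRenderController is a property of your call activity and onCallEnded() is your own hook, the two calls might be placed as follows (sketch only):

```kotlin
// Sketch: clear the cache when a call ends, release resources when leaving the page.
private fun onCallEnded() {
    subRenderController.reset()
}

override fun onDestroy() {
    subRenderController.release()
    super.onDestroy()
}
```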
- Integrate the subtitle processing module
Copy the ConversationSubtitleController.swift and MessageParser.swift files to your project and import the module before calling its API.
- Implement subtitle UI rendering logic
To render subtitles in your UI, implement the ConversationSubtitleDelegate protocol in your subtitle UI module. Then, define the onSubtitleUpdated method to handle subtitle message rendering:

```swift
extension ChatViewController: ConversationSubtitleDelegate {
    func onSubtitleUpdated(subtitle: SubtitleMessage) {
        // Implement your UI rendering logic here
    }
}
```
- Create a subtitle processing module instance
When entering the call page, create a ConversationSubtitleController instance. This instance monitors subtitle message callbacks internally and passes the subtitle information to your UI using the onSubtitleUpdated callback of ConversationSubtitleDelegate.

```swift
// subRenderController is your ConversationSubtitleController instance
let subRenderConfig = SubtitleRenderConfig(rtcEngine: rtcEngine, renderMode: .words, delegate: self)
subRenderController.setupWithConfig(subRenderConfig)
```
- Release resources
At the end of each call, use the reset method to clean up the cache.

```swift
subRenderController.reset()
```
- Integrate the subtitle processing module
Copy the message.ts file to your project and import the module before calling its API. The required dependencies are available in the lib folder.
- Implement subtitle UI rendering logic
The subtitle UI module you implement processes the MessageEngine subtitle message list. The following simple component displays these messages:

```tsx
import { useEffect, useState } from "react";
// IMessageListItem is exported by the subtitle processing module (message.ts)

const ChatHistory = () => {
  const [chatHistory, setChatHistory] = useState<IMessageListItem[]>([]);

  useEffect(() => {
    const getChatHistoryFromEvent = (event: MessageEvent) => {
      const { data } = event;
      if (data?.type === "message") {
        // Matches the payload shape posted by the MessageEngine callback below
        setChatHistory(data?.chatHistory || []);
      }
    };
    window.addEventListener("message", getChatHistoryFromEvent);
    return () => {
      window.removeEventListener("message", getChatHistoryFromEvent);
    };
  }, []);

  return (
    <>
      {chatHistory.map((message) => (
        <div key={`${message.uid}-${message.turn_id}`}>
          {message.uid}: {message.text}
        </div>
      ))}
    </>
  );
};
```

Info: The sample code uses window.addEventListener("message") to listen for subtitle data sent by MessageEngine using window.postMessage. For complex applications, Agora recommends using Redux or another state management tool to manage these messages more efficiently.
- Create a subtitle processing module instance
Before joining an RTC channel, create a MessageEngine instance and pass in the AgoraRTC client, mode, and callback function.

```typescript
import AgoraRTC, { IAgoraRTCClient } from "agora-rtc-sdk-ng";

class RtcEngine {
  private client: IAgoraRTCClient;
  private messageEngine: MessageEngine | null = null;

  constructor() {
    // Create an AgoraRTC client with RTC mode and VP8 codec
    this.client = AgoraRTC.createClient({ mode: "rtc", codec: "vp8" });
  }

  public joinChannel() {
    // Create a MessageEngine instance, passing in the AgoraRTC client, mode, and callback function
    this.messageEngine = new MessageEngine(
      this.client,
      EMessageEngineMode.AUTO,
      (chatHistory) => {
        // Log chatHistory to the console
        console.log("chatHistory", chatHistory);
        // Send chatHistory to the web page; using Redux or another state management tool is recommended.
        // Here, window.postMessage is used as an example.
        window.postMessage({
          type: "message",
          chatHistory,
        });
      }
    );
    this.client.join("***", "****", "****", "****");
  }
}
```
- Release resources
When leaving the call page or ending the conversation, call the cleanup method to release resources.

```typescript
this.messageEngine?.cleanup();
```
Reference
This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.
API Reference
This section provides API reference documentation for the subtitles module.
- Android
- iOS/macOS
- Web
ConversationSubtitleController
```kotlin
class ConversationSubtitleController(
    private val config: SubtitleRenderConfig
)
```

- config: Subtitle rendering configuration. See SubtitleRenderConfig for details.
SubtitleRenderConfig
```kotlin
data class SubtitleRenderConfig(
    val rtcEngine: RtcEngine,
    val renderMode: SubtitleRenderMode?,
    val callback: IConversationSubtitleCallback?
)
```

- rtcEngine: Agora RtcEngine instance.
- renderMode: Subtitle rendering mode. See SubtitleRenderMode for details.
- callback: Callback interface for receiving subtitle content updates. See IConversationSubtitleCallback for details.
SubtitleRenderMode
```kotlin
enum class SubtitleRenderMode {
    Text,
    Word
}
```

- Text: Sentence-by-sentence rendering mode. The subtitle content received by the callback is fully rendered on the UI.
- Word: Word-by-word rendering mode. The subtitle content received by the callback is rendered word by word on the UI.
Using the word-by-word rendering mode (Word) requires that your chosen TTS vendor supports word-by-word output; otherwise, the module automatically falls back to sentence-by-sentence rendering mode (Text).
IConversationSubtitleCallback
The callback interface for subtitle content update events.
```kotlin
interface IConversationSubtitleCallback {
    fun onSubtitleUpdated(subtitle: SubtitleMessage)
}
```

- onSubtitleUpdated: Subtitle update callback.
  - subtitle: Updated subtitle message. See SubtitleMessage for details.
SubtitleMessage
```kotlin
data class SubtitleMessage(
    val turnId: Long,
    val userId: Int,
    val text: String,
    var status: SubtitleStatus
)
```
- turnId: The identifier of the conversation turn. One conversation turn between the user and the agent corresponds to one turnId, following these rules:
  - turnId = 0: The agent's welcome message; there is no user subtitle for this turn.
  - turnId ≥ 1: The subtitles for the user or the agent in that turn. Use userId to display the user's subtitles before the agent's subtitles, then repeat the process for the next turn.
  Caution: Callbacks are not guaranteed to arrive in strictly increasing turnId order. If you encounter out-of-order callbacks, implement the sorting logic yourself (see the sketch after this list).
- userId: The user ID associated with this subtitle message. In the current version, 0 represents the user; a non-zero value represents the agent ID.
- text: Subtitle text content.
- status: The current status of the subtitle. See SubtitleStatus for details.
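Because turnId ordering is not guaranteed, one option is to collect received messages locally and sort them before rendering. The following is a minimal sketch; the SubtitleStore class is illustrative and not part of the module:

```kotlin
// Sketch: keep the latest message per (turnId, userId) and return them sorted
// by turnId, with the user's subtitle (userId == 0) before the agent's.
class SubtitleStore {
    private val latest = mutableMapOf<Pair<Long, Int>, SubtitleMessage>()

    fun upsert(subtitle: SubtitleMessage): List<SubtitleMessage> {
        latest[subtitle.turnId to subtitle.userId] = subtitle
        return latest.values.sortedWith(
            compareBy({ it.turnId }, { if (it.userId == 0) 0 else 1 })
        )
    }
}
```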
SubtitleStatus
Use SubtitleStatus for special UI processing based on the status, such as displaying an interruption mark at the end of the subtitle.
```kotlin
enum class SubtitleStatus {
    Progress,
    End,
    Interrupted
}
```

- Progress: The subtitle is still being generated; the user or agent has not finished speaking.
- End: Subtitle generation is complete; the user or agent has finished speaking.
- Interrupted: The subtitle was interrupted before completion; the user actively stopped the agent's response.
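For example, you might append a marker to interrupted subtitles before displaying them. A minimal sketch; the marker text and helper function are illustrative, not part of the module:

```kotlin
// Sketch: choose the display text based on the subtitle status.
fun displayText(subtitle: SubtitleMessage): String = when (subtitle.status) {
    SubtitleStatus.Progress -> subtitle.text               // still streaming
    SubtitleStatus.End -> subtitle.text                    // final text
    SubtitleStatus.Interrupted -> subtitle.text + " [interrupted]"
}
```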
ConversationSubtitleController
```swift
class ConversationSubtitleController {
    func setupWithConfig(_ config: SubtitleRenderConfig)
    func reset()
}
```

- setupWithConfig(_ config:): Sets the subtitle rendering configuration.
  - config: Subtitle rendering configuration. See SubtitleRenderConfig for details.
- reset(): Clears the cache.
SubtitleRenderConfig
```swift
struct SubtitleRenderConfig {
    let rtcEngine: AgoraRtcEngineKit
    let renderMode: SubtitleRenderMode
    let delegate: ConversationSubtitleDelegate?
}
```

- rtcEngine: AgoraRtcEngineKit instance.
- renderMode: Subtitle rendering mode. See SubtitleRenderMode for details.
- delegate: Callback protocol for receiving subtitle content update events. See ConversationSubtitleDelegate for details.
SubtitleRenderMode
```swift
enum SubtitleRenderMode {
    case words
    case text
}
```

- words: Word-by-word rendering mode. The subtitle content received by the callback is rendered word by word on the UI.
- text: Sentence-by-sentence rendering mode. The subtitle content received by the callback is fully rendered on the UI.
Using the word-by-word rendering mode (words) requires that your chosen TTS vendor supports word-by-word output; otherwise, the module automatically falls back to sentence-by-sentence rendering mode (text).
ConversationSubtitleDelegate
Callback protocol for subtitle content update events.
```swift
protocol ConversationSubtitleDelegate: AnyObject {
    func onSubtitleUpdated(subtitle: SubtitleMessage)
}
```

- onSubtitleUpdated: Subtitle update callback.
  - subtitle: Updated subtitle message. See SubtitleMessage for details.
SubtitleMessage
```swift
struct SubtitleMessage {
    let turnId: Int
    let userId: UInt
    let text: String
    var status: SubtitleStatus
}
```
- turnId: The identifier of the conversation turn. One conversation turn between the user and the agent corresponds to one turnId, following these rules:
  - turnId = 0: The agent's welcome message; there is no user subtitle for this turn.
  - turnId ≥ 1: The subtitles for the user or the agent in that turn. Use userId to display the user's subtitles before the agent's subtitles, then repeat the process for the next turn.
  Caution: Callbacks are not guaranteed to arrive in strictly increasing turnId order. If you encounter out-of-order callbacks, implement the sorting logic yourself.
- userId: The user ID associated with this subtitle message. In the current version, 0 represents the user; a non-zero value represents the agent ID.
- text: Subtitle text content.
- status: The current status of the subtitle. See SubtitleStatus for details.
SubtitleStatus
```swift
enum SubtitleStatus: Int {
    case inprogress = 0
    case end = 1
    case interrupt = 2
}
```

- inprogress: The subtitle is still being generated; the user or agent has not finished speaking.
- end: Subtitle generation is complete; the user or agent has finished speaking.
- interrupt: The subtitle was interrupted before completion; the user actively stopped the agent's response.
MessageEngine
Subtitle processing engine.
```typescript
class MessageEngine {
  constructor(
    rtcEngine: IAgoraRTCClient,
    renderMode?: EMessageEngineMode,
    callback?: (messageList: IMessageListItem[]) => void
  )
}
```

- rtcEngine: The AgoraRTC client instance.
- renderMode: Subtitle rendering mode. See EMessageEngineMode for details. The default is EMessageEngineMode.AUTO.
- callback: Callback function for receiving subtitle content updates. IMessageListItem[] is the list of messages. See IMessageListItem for details.
EMessageEngineMode
```typescript
enum EMessageEngineMode {
  TEXT = 'text',
  WORD = 'word',
  AUTO = 'auto',
}
```

- TEXT: Sentence-by-sentence rendering mode. The subtitle content received by the callback is fully rendered on the UI.
- WORD: Word-by-word rendering mode. The subtitle content received by the callback is rendered word by word on the UI.
- AUTO: Automatic mode. The rendering mode is selected automatically according to the mode supported by the TTS provider.
Using the word-by-word rendering mode (WORD) requires that your chosen TTS vendor supports word-by-word output; otherwise, the module automatically falls back to sentence-by-sentence rendering mode (TEXT).
IMessageListItem
```typescript
interface IMessageListItem {
  uid: number
  turn_id: number
  text: string
  status: EMessageStatus
}
```
- uid: The user ID associated with this subtitle message. In the current version, 0 represents the user; a non-zero value represents the agent ID.
- turn_id: The identifier of the conversation turn. One conversation turn between the user and the agent corresponds to one turn_id, following these rules:
  - turn_id = 0: The agent's welcome message; there is no user subtitle for this turn.
  - turn_id ≥ 1: The subtitles for the user or the agent in that turn. Use uid to display the user's subtitles before the agent's subtitles, then repeat the process for the next turn.
  Caution: Callbacks are not guaranteed to arrive in strictly increasing turn_id order. If you encounter out-of-order callbacks, implement the sorting logic yourself.
- text: Subtitle text content.
- status: The current status of the subtitle. See EMessageStatus for details.
EMessageStatus
```typescript
enum EMessageStatus {
  IN_PROGRESS = 0,
  END = 1,
  INTERRUPTED = 2,
}
```

- IN_PROGRESS: The subtitle is still being generated; the user or agent has not finished speaking.
- END: Subtitle generation is complete; the user or agent has finished speaking.
- INTERRUPTED: The subtitle was interrupted before completion; the user actively stopped the agent's response.