Social chatbots, or intelligent dialogue systems, can engage in conversations with humans. Mimicking human conversation and passing the Turing test has been one of the longest-running goals of Artificial Intelligence (AI). Even in cinema, movies like Her and Iron Man have showcased how chatbot technologies could change the way humans and machines interact with each other.
In the early 1970s, there were many attempts to create intelligent dialogue systems, but these systems were built on hand-crafted rules. In 2016, chatbots were considered “The Next Big Thing”. Major IT companies including Google, Microsoft, Facebook, and Amazon released their own chatbot platforms. These chatbots fared poorly because of performance problems, design issues, and a bad user experience.
But thanks to recent advancements in Natural Language Processing (NLP) such as BERT, ELMo, ULMFiT, and the OpenAI transformer, Google has been able to improve document retrieval performance by 50–100%. We can therefore assume that NLP’s ImageNet moment is almost here, and chatbots will most likely become even more widely usable in the years to come.
If designed properly, chatbots can transform many fields, such as teaching, e-commerce, legal practice, personal assistance, and management.
XiaoIce is a great example of a social chatbot. It is developed by Microsoft and is considered the most popular chatbot in the world. It has the personality of an 18-year-old girl who is funny, reliable, sympathetic, and affectionate.
Did you know there are more than 5 AI legal assistants and 195 personal assistants on messenger? Let’s briefly discuss one way these social chatbots can be designed. This post is meant to give a basic introduction to the design of XiaoIce, as described in the paper.
Social chatbot systems need both a high emotional quotient (EQ) and a high intelligence quotient (IQ), because they are expected to help users complete specific tasks and to provide emotional support as well. A unique personality makes a chatbot even more user-friendly. A social chatbot must be able to personalize its responses, making them encouraging, motivating, and suited to the user’s interests.
XiaoIce has developed over 230 different skills, including movie recommendation, comforting, and storytelling. It also demonstrates EQ by generating socially attractive responses and changing the topic of conversation according to the situation. It is designed as an 18-year-old girl who is creative, funny, reliable, and sympathetic.
Social Chatbot Evaluation Metric: CPS
Measuring the performance of a social chatbot is difficult. In the past, the Turing test was used for evaluation, but it cannot measure emotional engagement with users. XiaoIce is therefore evaluated using Conversation-turns Per Session (CPS), the average number of conversation turns between the chatbot and the user in a session. The larger the expected CPS, the better the long-term engagement.
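As a rough illustration of the metric, CPS can be computed by averaging the number of turns over a set of sessions. The function name and the sample turn counts below are made up for this example and do not come from the XiaoIce paper.

```python
from typing import List


def conversation_turns_per_session(turns_per_session: List[int]) -> float:
    """Return the average number of conversation turns across sessions."""
    if not turns_per_session:
        raise ValueError("need at least one session to compute CPS")
    return sum(turns_per_session) / len(turns_per_session)


# Hypothetical turn counts for three chat sessions (illustrative only).
sample_sessions = [4, 10, 7]
cps = conversation_turns_per_session(sample_sessions)
print(cps)  # 7.0
```

Here three sessions with 4, 10, and 7 turns give a CPS of 7.0. How a "turn" and a "session" are delimited in practice is a design decision this sketch leaves out.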
Social Chat as a Hierarchical Decision-Making Process
To fulfill these design objectives, human-machine interaction is treated as a decision-making process, and the chatbot is optimized for long-term engagement.
XiaoIce maintains users’ interest by diversifying the conversation modes through its various skills. Each conversation mode is managed by a skill. A top-level process handles the overall conversation mode by switching between skills. A low-level process, controlled by the currently selected skill, chooses the responses that generate a chat segment or complete a task. For example, the top-level process can switch from the core chat skill to song recommendation. The song recommendation skill can then recommend a song, or take an action such as switching to a concert ticket booking skill to book an upcoming concert by the user’s favorite band.
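The two-level structure can be sketched in a few lines. This is a toy illustration only: the keyword trigger, the skill names, and the canned replies are assumptions made for the example, not how XiaoIce actually selects skills.

```python
from typing import Callable, Dict, Tuple


# Low-level processes: each (hypothetical) skill produces its own response.
def core_chat(message: str) -> str:
    return "Tell me more about that."


def song_recommendation(message: str) -> str:
    return "You might enjoy a song by your favorite band."


SKILLS: Dict[str, Callable[[str], str]] = {
    "core_chat": core_chat,
    "song_recommendation": song_recommendation,
}


def select_skill(message: str) -> str:
    """Top-level process: pick a skill (a simple keyword rule, assumed here)."""
    text = message.lower()
    if "song" in text or "music" in text:
        return "song_recommendation"
    return "core_chat"


def respond(message: str) -> Tuple[str, str]:
    """Run the top-level choice, then let the chosen skill generate the reply."""
    skill = select_skill(message)
    return skill, SKILLS[skill](message)


print(respond("Can you suggest a song?"))
print(respond("I had a long day."))
```

The first message is routed to the song recommendation skill and the second falls back to core chat. A real system would replace the keyword rule with a learned policy.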
Such decisions can be cast in the mathematical framework of Markov Decision Processes (MDPs). The chatbot navigates an MDP while interacting with users. At each discrete dialogue turn, the chatbot observes the current chat state and chooses either a skill or a response to send. It then receives a reward from the user. The chatbot learns policies for selecting skills and responses that maximize the expected CPS.
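To make the observe, act, and reward loop concrete, here is a minimal sketch that uses tabular Q-learning with epsilon-greedy action selection. Q-learning is swapped in purely for illustration and is not claimed to be how XiaoIce is trained. The state labels, the action names, and the reward of 1.0 for a turn where the user keeps chatting are all assumptions.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Tuple

# Hypothetical actions: which skill to use at this dialogue turn.
ACTIONS = ["core_chat", "song_recommendation"]

QTable = DefaultDict[Tuple[str, str], float]


def choose_action(q: QTable, state: str, epsilon: float, rng: random.Random) -> str:
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def update(q: QTable, state: str, action: str, reward: float,
           next_state: str, alpha: float = 0.5, gamma: float = 0.9) -> None:
    """One Q-learning step toward reward + discounted best next value."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


q: QTable = defaultdict(float)
rng = random.Random(0)

# One simulated turn: the user keeps chatting, so the assumed reward is 1.0.
update(q, "greeting", "core_chat", reward=1.0, next_state="small_talk")
print(q[("greeting", "core_chat")])                       # 0.5
print(choose_action(q, "greeting", epsilon=0.0, rng=rng))  # core_chat
```

Rewarding every turn in which the user stays in the conversation ties the accumulated reward to the number of turns, which is the quantity CPS averages.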
User Experience Layer: This layer connects XiaoIce to all major messaging platforms and communicates in two modes. Full-duplex mode is a stream-based conversation mode that supports voice calls. In message-based conversation mode, the user and the chatbot take turns sending messages. This layer also includes speech recognition, image understanding, and text normalization.
Conversation Engine Layer: This layer is composed of a dialogue manager, an empathetic computing module, and dialogue skills. The dialogue manager keeps track of the dialogue state and selects either a dialogue skill or core chat. The empathetic computing module is designed to understand human emotions and interests from the dialogue. It helps XiaoIce acquire social skills and generate personalized responses based on XiaoIce’s personality.
Data Layer: This layer consists of various databases that store conversational data as text-text and text-image pairs, along with non-conversational data such as knowledge graphs, user profiles, and the chatbot profile.