Automated Testing for Self-Learning Systems

In the recent past, companies in insurance and banking such as Ottonova, Lemonade and Osca have invested time and money in developing conversational agents to communicate with their customers at a higher quality. According to Grand View Research, the global chatbot market is expected to reach $1.23 billion by 2025, with a compound annual growth rate (CAGR) of 24.3%. Chatbots reduce operating costs for enterprises and can work in segments such as marketing, payments and processing, and service. Within the global chatbot market, approximately 45% of end users prefer chatbots as the primary mode of communication for customer service inquiries.

The new generation of chatbots, or conversational agents, operates with the help of machine learning models. Users tend to have misconceptions about the intelligence and abilities of such agents. Conversations may end in a dialog breakdown, which ultimately leads to unhappy users. To overcome this problem, companies need to develop conversational agents with better language understanding and language generation. With every new version of an agent, testers need to manually check many possible dialog paths for unwanted dialog breakdowns. Because natural language is dynamic and the same meaning can be expressed in many different ways, manually testing a conversational agent is difficult.


As usage and demand grow, chatbot systems will be used in more and more domains where they can cause damage to companies, users and third parties, for example in health information, money transactions or insurance contracts. Chatbots therefore need to be tested carefully. Even for very small domains, the possible test scenarios are countless. Manual testing can cover only a small part of the surface of each release of a chat system.

An intelligent test system may automate the tedious task of testing a dynamic chat system. Using predefined test scenarios, the system can plan the interaction between the bot under test and the test system itself. After the conversation has ended, the test system can evaluate whether the conversation followed the path that was planned beforehand.
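As a minimal sketch of this idea, the following Python code compares a planned dialog path with the path a bot actually took. It is an illustration under assumptions, not the proposed system: the names `Step`, `ScenarioResult`, `run_scenario` and `toy_bot` are hypothetical, and the sketch assumes that the bot under test exposes a label for the dialog state it reached after each utterance.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One planned turn of a test scenario (hypothetical structure)."""
    user_utterance: str   # text the test system sends to the bot
    expected_state: str   # dialog state the bot is expected to reach


@dataclass
class ScenarioResult:
    planned_path: List[str]
    actual_path: List[str]

    @property
    def passed(self) -> bool:
        # The scenario passes only if both paths are identical.
        return self.planned_path == self.actual_path

    @property
    def first_divergence(self) -> Optional[int]:
        # Index of the first turn where the paths differ, or None.
        for i, (planned, actual) in enumerate(
            zip(self.planned_path, self.actual_path)
        ):
            if planned != actual:
                return i
        if len(self.planned_path) != len(self.actual_path):
            return min(len(self.planned_path), len(self.actual_path))
        return None


def run_scenario(steps: List[Step], bot: Callable[[str], str]) -> ScenarioResult:
    """Send each planned utterance to the bot and record the states it reaches."""
    actual = [bot(step.user_utterance) for step in steps]
    planned = [step.expected_state for step in steps]
    return ScenarioResult(planned_path=planned, actual_path=actual)


def toy_bot(utterance: str) -> str:
    """Stand-in for the bot under test (assumption: keyword matching only)."""
    text = utterance.lower()
    if "claim" in text:
        return "open_claim"
    if "hello" in text:
        return "greeting"
    return "fallback"


scenario = [
    Step("Hello there", "greeting"),
    Step("I want to file a claim", "open_claim"),
    Step("My bike got stolen", "collect_claim_details"),
]
result = run_scenario(scenario, toy_bot)
print(result.passed, result.first_divergence)
```

In this toy run the third turn ends in the fallback state instead of the planned one, which is the kind of dialog breakdown the test system is meant to detect.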

The test system plans and generates the text it sends to the chatbot, based on its training and a given task set. After each full run, it generates a report that can be used for further steps such as quality assurance or continuous integration.
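One possible shape for such a report is sketched below in Python. The field names, the scenario names and the functions `build_report` and `exit_code` are assumptions made for illustration. The sketch only shows how per-scenario outcomes could be summarized and turned into a pass or fail signal for a continuous integration job.

```python
import json
from typing import Dict


def build_report(results: Dict[str, bool]) -> dict:
    """Summarize scenario outcomes (scenario name -> passed?) for one full run."""
    failed = sorted(name for name, ok in results.items() if not ok)
    total = len(results)
    passed = total - len(failed)
    return {
        "total": total,
        "passed": passed,
        "failed": failed,
        "pass_rate": passed / total if total else 0.0,
    }


def exit_code(report: dict) -> int:
    """Non-zero exit code if any scenario failed, so a CI job can stop."""
    return 1 if report["failed"] else 0


# Hypothetical outcomes of one full test run.
results = {"greeting": True, "file_claim": False, "cancel_contract": True}
report = build_report(results)
print(json.dumps(report, indent=2))
```

A continuous integration job could call `sys.exit(exit_code(report))` at the end of a run, while the JSON output stays available for manual quality assurance.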

I am confident that this approach can be adapted to other, related domains in the field of testing self-learning systems, where success cannot be determined by game rules or similar criteria. In addition, the approach could allow us to understand what a chatbot has learned and which data is needed. This can have a big impact on working with personal data¹.

Research Question

My work focuses on the automatic generation, execution and evaluation of test plans for conversational agents. In addition, I will try to transfer a possible solution to other domains of self-learning systems and investigate its applicability.