Machine learning in chatbot and voicebot deployments: how can it help us?

The media carry more and more reports about the use of machine learning to support the implementation of IT systems. Machine learning is often presented as a cure for all the ills of traditional implementations: thanks to it, implementations can supposedly be accelerated significantly and made more effective without engaging large amounts of internal resources. The truth is slightly different. Depending on the application, machine learning yields better or worse results.

No solution or technology suits every objective, and ML is no exception. However, there are areas where machine learning excels. One example is pattern detection: repeated customer behaviors, face recognition, fingerprint recognition, speech recognition, analysis of frequently repeated activities performed by employees, etc. This area, however, hides more nuances than it would seem.

A not so obvious process: how does a bot learn?

Many of the inquiries about chatbots or voicebots sent to Stanusch Technologies have included questions about using machine learning to create a knowledge base. The customer assumes that they will provide us with records of conversations (as text or audio) between call center workers and callers, and that algorithms will “learn” from these records how to respond to individual customer questions. This scenario sounds very interesting, but it is not what the process of implementing a voicebot or chatbot in an organization looks like.

Machine learning: fundamental requirements

Like any technology, machine learning has its strengths, challenges and limitations. Let us start by explaining what is needed to make machine learning work well.

First of all, it is necessary to prepare appropriate sets of information:

  • a set of training information,
  • a set of test information.

The system “learns” from the first set. The second is used to verify the correctness of the solution and to customize it further.
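The division into the two sets can be illustrated with a minimal Python sketch. The function name `split_records`, the 80/20 ratio and the placeholder conversation names are assumptions made for illustration; the article does not prescribe any particular split.

```python
import random

def split_records(records, test_fraction=0.2, seed=42):
    """Shuffle records reproducibly and return (training_set, test_set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one record for testing, even for tiny inputs.
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical placeholder data standing in for conversation records.
conversations = [f"conversation-{i}" for i in range(10)]
training_set, test_set = split_records(conversations)
print(len(training_set), len(test_set))  # 8 2
```

The fixed seed only makes the split repeatable; the important property is that no record appears in both sets.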

What should such sets contain? For creating a knowledge base, at least three groups of information should be included:

  • users’ statements and answers given by the worker,
  • time,
  • interactions performed by a call center worker in external information systems.

Machine learning: can a knowledge base be prepared in such a way?

We already know what we need to prepare a knowledge base. Let us start with the first element, i.e. the statements made by the users and the answers given by the worker. Such a collection should be prepared as written text. In the case of hotline recordings, transcripts of the conversations must be provided in addition to the voice recordings. This is necessary when the conversations contain industry jargon: specific names, phrases and personal names. Personal data is a contentious issue, in both its legal and practical aspects: the system “must” know which elements are a first name and surname, a town name or address data. If these elements are not marked appropriately, the “learning” may produce incorrect results.

Time is also important, because certain figures change over time (prices, commissions, terms of service, etc.). Failing to take these changes into account may lead to incorrect results. For example, let us assume that the training file states 30 times that the commission for granting a mortgage loan is 3% and only twice that it is 2%. Without taking time into account, machine learning will teach the system that the commission is 3% (the statistically more frequent answer). Meanwhile, the correct answer is of course 2% (because the change was made, for example, yesterday and is valid from today).
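The commission example can be reproduced in a few lines of Python. This is only a sketch: the dates are invented, and "pick the most recent answer" is a deliberately simple stand-in for whatever time-aware logic a real system would use.

```python
from collections import Counter
from datetime import date

# Hypothetical training records: (date the answer was given, commission stated).
records = [(date(2024, 1, day), "3%") for day in range(1, 31)]   # 30 older answers
records += [(date(2024, 2, 1), "2%"), (date(2024, 2, 2), "2%")]  # 2 newest answers

def most_frequent(records):
    """Ignore time: return the answer that occurs most often."""
    return Counter(answer for _, answer in records).most_common(1)[0][0]

def most_recent(records):
    """Respect time: return the answer from the latest record."""
    return max(records, key=lambda record: record[0])[1]

print(most_frequent(records))  # 3%
print(most_recent(records))    # 2%
```

Counting alone returns the outdated 3%, while the time-aware choice returns the currently valid 2%.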

Most importantly: what does the worker do during customer service? Our analyses show that as many as 85% of user inquiries require the worker to interact with external IT systems, such as billing systems, e-banking systems, ERP/CRM systems, etc. In practice, nobody collects data showing what the worker did during the conversation, which systems they queried, and which function was used to answer the customer’s question. It is therefore necessary to prepare an appropriate template:

user’s statement -> worker’s answer -> activities in IT systems -> time
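One way to picture a single filled-in template is as a small record type. The Python sketch below is an illustration only: the class names, field names and the sample CRM function `lookup_product_terms` are hypothetical and not taken from any real system.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SystemAction:
    """One step the worker performed in an external IT system."""
    system: str    # e.g. "CRM" or "billing"
    function: str  # the function used to answer the question

@dataclass
class TrainingRecord:
    """One row of the template: statement, answer, system activities, time."""
    user_statement: str
    worker_answer: str
    timestamp: datetime
    system_actions: list[SystemAction] = field(default_factory=list)

record = TrainingRecord(
    user_statement="What is the commission on a mortgage loan?",
    worker_answer="The commission is 2%.",
    timestamp=datetime(2024, 2, 1, 10, 15),
    system_actions=[SystemAction(system="CRM", function="lookup_product_terms")],
)
print(record.system_actions[0].system)  # CRM
```

The `system_actions` field is the part that, as noted above, is rarely collected in practice.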

How can the template be prepared for training? One solution is to use software that tracks the worker’s activities (the question, however, is who will allow it to be installed in, for example, a bank). A manual description of hundreds of thousands of interactions is also pointless, because the resulting chatbot or voicebot would handle at most 15% of interactions correctly. Creating such a solution seems pointless, especially since the remaining 85% of the “automatically” created knowledge base will have to be cleaned manually. In effect, the knowledge base will be built “from scratch”.

Alternative: rule-based knowledge base

Our experience from over 130 implementations shows that cleaning a knowledge base created with machine learning is as time-consuming as (and often more time-consuming than) preparing a rule-based knowledge base from scratch. In addition, a rule-based knowledge base is fully detailed and gives 100% certainty about which answer will be given once a business rule is met. It is also important to remember that solutions based on machine learning never give 100% certainty as to the answers given. At the initial stage, their operation is somewhat unpredictable. Of course, after sufficient training the solution works well, even in 99% of cases. Until then, however, the answers will be a bit chaotic, as if they were given by an untrained worker.

Can machine learning, therefore, be well adapted to creating and implementing a knowledge base? In our opinion, the rule-based approach is much more efficient (in terms of time and quality).

Machine learning – optimal applications

If not for building a knowledge base, what can machine learning be used for in chatbot and voicebot technology? There are at least a few possibilities, and all of them are worth recommending.

1. Speech recognition

The first area is, of course, speech recognition. Without machine learning it is practically impossible to create a good speech recognition system. What is more, even in the existing speech recognition systems there are always areas that can be improved:

  • Adapting the system to industry-specific requirements,
  • Adapting the system to the industry nomenclature, and in particular to the proper names used,
  • Adapting the system to the way the speakers talk (e.g. adding phrases in dialects).

In the above cases it is worth using call center recordings. Of course, a recording alone is not enough: a text transcription is needed so that the system can “learn” new vocabulary. Such a transcription can be produced partly automatically, but it must be verified by a human.

2. Testing of chatbots and voicebots

Machine learning is ideal for automatic testing of a knowledge base. Editing and configuring the knowledge base means parameterizing the system so that it can react to any question, however it is formulated, taking the context into account. A user asking about the cost of a mortgage might just as well ask “how much will the mortgage cost me?” as ask about “APRC for home loans”. When expanding and updating the knowledge base, we risk introducing configuration errors and conflicts. Finding errors manually in a knowledge base with thousands of facts and tens of thousands of rules is almost impossible. The ideal solution is to run an automatic tester which, on the basis of an appropriate test set, verifies the correctness of the solution and indicates where corrections are needed.

When machine learning is used to test automatic systems, the files to be prepared are limited to test sets and may be simpler in structure (just sample questions and the correct answers, possibly with a context).
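A test set of this simple shape can drive a tester like the Python sketch below. It shows only the test-set-driven check, not any learning component, and everything in it is assumed for illustration: the toy keyword rules, the answer identifiers such as `MORTGAGE_COST`, and the function names `answer` and `run_tests`.

```python
import re

# Hypothetical toy rule base: a rule fires when all its keywords occur in the question.
RULES = [
    ({"mortgage", "cost"}, "MORTGAGE_COST"),
    ({"aprc", "home", "loans"}, "MORTGAGE_COST"),
    ({"complaint"}, "COMPLAINT_FORM"),
]

def answer(question):
    """Return the answer id of the first matching rule, or None if no rule matches."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    for keywords, answer_id in RULES:
        if keywords <= words:
            return answer_id
    return None

# Test set: sample questions paired with the expected (correct) answer.
TEST_SET = [
    ("How much will the mortgage cost me?", "MORTGAGE_COST"),
    ("APRC for home loans", "MORTGAGE_COST"),
    ("I want to file a complaint", "COMPLAINT_FORM"),
    ("What is the price of a mortgage?", "MORTGAGE_COST"),  # phrasing no rule covers
]

def run_tests(test_set):
    """Return (question, expected, actual) for every test case that fails."""
    return [
        (question, expected, answer(question))
        for question, expected in test_set
        if answer(question) != expected
    ]

failures = run_tests(TEST_SET)
for question, expected, actual in failures:
    print(f"FAIL: {question!r} expected {expected}, got {actual}")
```

Here the last question is reported as a failure, which points to the place in the rule base where a correction is needed.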

3. Detection of trends

Another area where machine learning can be used is statistics and reports. Solutions that predict trends and detect anomalies are ideal here. For example, suppose we have an extensive knowledge base divided into categories (e.g. products, service, complaints, online, etc.). Machine learning can “detect” trends (and, most importantly, deviations from them) in the number of user queries in specific categories. Imagine that we are a supplier of certain products and that one of the categories in our knowledge base is complaints. Suppose the system has “discovered” that 10 complaints about our products arrive each weekend. One Saturday, however, the number of complaints exceeds 100, far above the trend of about 10 per weekend. In such a case, machine learning can detect the deviation from the norm and alert the relevant services that something bad (unusual) is happening. Even more interesting analyses become possible when we combine the results of conversations conducted by the automated system with data from the company’s other systems.
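The complaints scenario can be sketched in Python with a plain statistical check. Note the assumptions: the weekend counts are invented, and a z-score against the historical mean is used here as a simple stand-in for a trained model; the threshold of 3 standard deviations is an arbitrary illustrative choice.

```python
from statistics import mean, stdev

def is_anomaly(history, new_value, threshold=3.0):
    """Flag new_value if it lies more than `threshold` standard deviations from the mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly constant history: any different value is a deviation.
        return new_value != mu
    return abs(new_value - mu) / sigma > threshold

# Hypothetical numbers of complaints on past weekends (about 10 each).
weekend_complaints = [9, 11, 10, 12, 8, 10, 10, 11]

print(is_anomaly(weekend_complaints, 11))   # False
print(is_anomaly(weekend_complaints, 100))  # True
```

A weekend with 11 complaints stays within the usual range, while 100 complaints triggers the alert.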

Summary: the best areas for machine learning

Machine learning is useful, but preparing its input files requires work. One has to consider carefully whether the workload of preparing training and test sets will exceed the time needed to configure the solution manually. For voicebots and chatbots, the biggest benefits seem to lie in improving speech recognition and automating knowledge base testing. Let us not forget the area that can enrich the organization with unique knowledge: trend detection.

Maciej Stanusch
Stanusch Technologies CEO
