Ask Questions, get Answers. The future of Enterprise Search with Amazon Kendra

Filotheos Bezerianos
Jun 15, 2020

Organisations today store large amounts of data that can help the business increase productivity and improve decision-making. However, integrating large quantities of structured and unstructured data, scattered across silos in various formats, into an enterprise search tool poses many challenges. In many cases, searching for information requires exact keywords, the returned results are not relevant, and finding the answer you need remains a difficult and time-consuming task.

When Amazon Kendra was announced at last year’s re:Invent, it was exciting news. With Amazon Kendra you can index structured and unstructured data and let end users search for information using natural language questions. When it was made generally available in May, I wanted to try its search capabilities and test whether it could answer some questions. For this quick test, I chose two recent topics, Brexit and COVID-19, and I used a variety of document styles: web pages, legal documents and FAQs. For the Brexit test case I used the FAQ and Readiness documents from the European Commission website, “The Future Relationship with the EU” document from gov.uk and the main Brexit article from Wikipedia. For the COVID-19 test case I used several articles from HBR which cover a variety of topics.

Indexes Configuration

The Amazon Kendra interface is very intuitive and I was able to load my data and create the indexes with a few clicks.

1. Prepare the Data Sources

For these tests I uploaded the documents to S3; however, other data sources are also supported (Amazon RDS, OneDrive, Salesforce, ServiceNow and SharePoint).

2. Create the Indexes

After I uploaded the documents to S3, I created one index for Brexit and one for COVID-19. Please note that an index might take up to 30 minutes to be created.

An IAM role is required to access CloudWatch logs. If you want to create a new role, remember that you must have permission to create IAM roles.

Amazon Kendra encrypts data in transit and at rest. When you create the index, you can choose to use AWS Key Management Service to manage the key that Amazon Kendra uses to encrypt the information in the index.
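The same step can also be scripted rather than clicked through. The following is a minimal sketch, assuming a boto3 Kendra client is passed in; the helper names, index name and role ARN are hypothetical and not taken from the tests above. It creates an index (optionally with a KMS key) and polls until the index is ready, since creation can take a while.

```python
import time


def build_index_request(name, role_arn, kms_key_id=None):
    """Assemble keyword arguments for the Kendra CreateIndex call."""
    request = {"Name": name, "RoleArn": role_arn}
    if kms_key_id:
        # Optional: a customer-managed KMS key used to encrypt the index.
        request["ServerSideEncryptionConfiguration"] = {"KmsKeyId": kms_key_id}
    return request


def create_index_and_wait(kendra, request, poll_seconds=60, timeout_seconds=3600):
    """Create the index, then poll DescribeIndex until it is ACTIVE."""
    index_id = kendra.create_index(**request)["Id"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = kendra.describe_index(Id=index_id)["Status"]
        if status == "ACTIVE":
            return index_id
        if status == "FAILED":
            raise RuntimeError(f"Index {index_id} failed to create")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Index {index_id} still {status} after timeout")
        time.sleep(poll_seconds)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   kendra = boto3.client("kendra")
#   request = build_index_request(
#       "brexit-index", "arn:aws:iam::123456789012:role/KendraIndexRole")
#   index_id = create_index_and_wait(kendra, request)
```

The request builder is kept separate from the API call so the parameters can be inspected before anything is created.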

3. Connect the Data Sources

Once the index was ready, I created an S3 connector pointing to the S3 bucket created earlier. If the data in S3 is encrypted, you need to provide the decryption key. In addition, an IAM role is required for accessing S3.

You can also specify patterns to include or exclude folders, file types or specific files. The last configuration step is to select a sync schedule. For this POC I chose On Demand, but multiple options are available, including cron expressions.
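For readers who prefer code to the console, the connector setup might look roughly like the sketch below. It assumes a boto3 Kendra client is passed in; the helper names, bucket name and patterns are hypothetical. Leaving the schedule out corresponds to the On Demand option, so the sketch starts a sync job explicitly once the data source is active.

```python
import time


def build_s3_data_source_request(index_id, name, bucket, role_arn,
                                 inclusion_prefixes=None,
                                 exclusion_patterns=None,
                                 schedule=None):
    """Assemble keyword arguments for the Kendra CreateDataSource call."""
    s3_config = {"BucketName": bucket}
    if inclusion_prefixes:
        # Only index objects under these key prefixes (folders).
        s3_config["InclusionPrefixes"] = list(inclusion_prefixes)
    if exclusion_patterns:
        # Skip objects whose keys match these glob patterns.
        s3_config["ExclusionPatterns"] = list(exclusion_patterns)
    request = {
        "IndexId": index_id,
        "Name": name,
        "Type": "S3",
        "RoleArn": role_arn,
        "Configuration": {"S3Configuration": s3_config},
    }
    if schedule:
        # A schedule expression; omit it for on-demand syncs.
        request["Schedule"] = schedule
    return request


def create_and_sync(kendra, request, poll_seconds=10, max_polls=60):
    """Create the data source, wait until it is ACTIVE, then start a sync."""
    index_id = request["IndexId"]
    data_source_id = kendra.create_data_source(**request)["Id"]
    for _ in range(max_polls):
        status = kendra.describe_data_source(
            Id=data_source_id, IndexId=index_id)["Status"]
        if status == "ACTIVE":
            break
        if status == "FAILED":
            raise RuntimeError(f"Data source {data_source_id} failed")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Data source {data_source_id} never became ACTIVE")
    kendra.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    return data_source_id


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   kendra = boto3.client("kendra")
#   request = build_s3_data_source_request(
#       index_id, "brexit-docs", "my-brexit-bucket",
#       "arn:aws:iam::123456789012:role/KendraS3Role",
#       exclusion_patterns=["*.tmp"])
#   create_and_sync(kendra, request)
```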

4. Test your Search results

After your data has synced, you are ready to check the search results. Visit the Search Console and type your questions.
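The Search Console is the quickest way to try questions, but the same queries can be sent through the API. Below is a minimal sketch, assuming a boto3 Kendra client is passed in; the helper names are hypothetical. It flattens each result into its type, title, excerpt and the highlighted spans of the excerpt.

```python
def summarize_results(response, limit=5):
    """Flatten a Kendra Query response into simple dictionaries."""
    rows = []
    for item in response.get("ResultItems", [])[:limit]:
        excerpt = item.get("DocumentExcerpt", {})
        text = excerpt.get("Text", "")
        # Each highlight is a pair of character offsets into the excerpt text.
        highlights = [
            text[h["BeginOffset"]:h["EndOffset"]]
            for h in excerpt.get("Highlights", [])
        ]
        rows.append({
            "type": item.get("Type"),
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": text.strip(),
            "highlights": highlights,
        })
    return rows


def ask(kendra, index_id, question, limit=5):
    """Send a natural language question to the index and summarize the results."""
    response = kendra.query(IndexId=index_id, QueryText=question)
    return summarize_results(response, limit)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   kendra = boto3.client("kendra")
#   for row in ask(kendra, index_id, "What is Brexit?"):
#       print(row["type"], "|", row["title"], "|", row["excerpt"])
```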

Asking Questions, getting Answers

For this test I asked questions whose answers I knew were somewhere in the data sources. As I was also aware of the context of the documents, some bias in the way the questions were asked is to be expected.

The suggested answers were correct in most cases and the relevant text was highlighted, for example in the answers to the questions “What is Brexit?”, “When does the transition period end?” and “Can I stay in the UK if I work?”.

In other cases, the way the question is asked can change the relevance and the order of the results, so some tuning might be required. For example, the order of the results (including the top answer) for “Can I keep my pet?” and “Can I keep my pet in the UK?” was different. On the other hand, “How to manage remote workers?” and “How to manage remotely?” returned very similar results and the same suggested answer.
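One way to make this kind of comparison less anecdotal is to compare the ranked document IDs returned for two phrasings. The sketch below is an illustration only and was not part of the tests described here; the function names and the choice of overlap measure (Jaccard over the top results) are assumptions.

```python
def ranked_document_ids(response):
    """Return document IDs from a Kendra Query response, in ranked order."""
    return [
        item["DocumentId"]
        for item in response.get("ResultItems", [])
        if "DocumentId" in item
    ]


def compare_rankings(first, second, top_n=5):
    """Compare the top results returned for two phrasings of a question."""
    a, b = first[:top_n], second[:top_n]
    union = set(a) | set(b)
    return {
        # Did both phrasings put the same document first?
        "same_top_result": bool(a) and bool(b) and a[0] == b[0],
        # Did both phrasings return the same documents in the same order?
        "same_order": a == b,
        # Jaccard overlap of the two result sets, ignoring order.
        "overlap": len(set(a) & set(b)) / len(union) if union else 1.0,
    }


# Hypothetical usage with two responses from kendra.query(...):
#   report = compare_rankings(
#       ranked_document_ids(response_short),
#       ranked_document_ids(response_long))
```

A low overlap or a different top result for near-identical questions would be a hint that relevance tuning is worth a look.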

For reference, I have pasted some screenshots of the questions asked and answers returned at the end of this article.

Additional configuration and tuning

Amazon Kendra is optimized to return answers for specific domains, e.g. healthcare, legal and financial services. In addition, you can manually tune the relevance of specific fields, like document title and last updated time. In these tests, I did not boost any specific fields.
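No fields were boosted in these tests, but for illustration, a boost on title and recency could be expressed roughly as follows. This is a sketch under assumptions: it takes a boto3 Kendra client as an argument, uses the reserved field names `_document_title` and `_last_updated_at`, and assumes importance values between 1 and 10; the helper names are hypothetical and the values are placeholders rather than recommendations.

```python
def build_relevance_updates(title_importance=None, freshness_importance=None):
    """Build field-level relevance settings for the Kendra UpdateIndex call."""
    for value in (title_importance, freshness_importance):
        if value is not None and not 1 <= value <= 10:
            raise ValueError("importance must be between 1 and 10")
    updates = []
    if title_importance is not None:
        # Give matches in the document title more weight.
        updates.append({
            "Name": "_document_title",
            "Type": "STRING_VALUE",
            "Relevance": {"Importance": title_importance},
        })
    if freshness_importance is not None:
        # Prefer documents that were updated more recently.
        updates.append({
            "Name": "_last_updated_at",
            "Type": "DATE_VALUE",
            "Relevance": {"Freshness": True, "Importance": freshness_importance},
        })
    return updates


def apply_relevance_updates(kendra, index_id, updates):
    """Send the relevance settings to an existing index."""
    kendra.update_index(
        Id=index_id, DocumentMetadataConfigurationUpdates=updates)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   kendra = boto3.client("kendra")
#   apply_relevance_updates(
#       kendra, index_id, build_relevance_updates(title_importance=5))
```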

Amazon Kendra also supports uploading FAQ documents in CSV format to provide curated answers to commonly asked questions. In this POC I did not convert any of the FAQ documents to CSV and used the PDF documents instead, but some of the observations might be different if the FAQ documents were uploaded as FAQs.
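To give an idea of what that conversion could involve, here is a small sketch that writes question and answer pairs to CSV and builds the request for registering the file as an FAQ. It assumes the basic layout of one row per question with the columns question, answer and source URI, and no header row; check the current Kendra documentation before relying on it. The helper names, bucket, key and the placeholder answer text are hypothetical.

```python
import csv
import io


def faq_rows_to_csv(rows):
    """Serialize (question, answer, source_uri) tuples as CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    for question, answer, source_uri in rows:
        # The csv module quotes fields that contain commas or newlines.
        writer.writerow([question, answer, source_uri])
    return buffer.getvalue()


def build_faq_request(index_id, name, bucket, key, role_arn):
    """Assemble keyword arguments for the Kendra CreateFaq call."""
    return {
        "IndexId": index_id,
        "Name": name,
        "S3Path": {"Bucket": bucket, "Key": key},
        "RoleArn": role_arn,
    }


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   text = faq_rows_to_csv([
#       ("What is Brexit?", "<curated answer text>", "<source URL>"),
#   ])
#   boto3.client("s3").put_object(
#       Bucket="my-brexit-bucket", Key="faq/brexit.csv", Body=text.encode("utf-8"))
#   boto3.client("kendra").create_faq(**build_faq_request(
#       index_id, "brexit-faq", "my-brexit-bucket", "faq/brexit.csv",
#       "arn:aws:iam::123456789012:role/KendraFaqRole"))
```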

Wrapping up

Enterprise search can be challenging, but integrating machine learning and AI into your search tool can help organisations overcome these challenges and meet their business goals. I look forward to no longer having to remember specific keywords and to getting the right answer just by asking questions.

[Screenshots: search results for the questions below]
What is Brexit?
When does the transition period end?
What is the economic impact of COVID-19?
