...

It is possible to extract the text content of PDFs and images using Optical Character Recognition (OCR). The content can then be included in searches.

This page describes how to enable search in asset content. The setup consists of two steps:

...

Please be aware of the important information at the bottom of this page.

...

Content extraction with Microsoft Azure Cognitive Services

The content extraction of PDFs and images relies on Microsoft Azure Cognitive Services, so a Computer Vision resource in Azure is required.

...

The Computer Vision resource has a key and a server URL, which will be needed shortly. These can be found by navigating to the Computer Vision resource and locating “Keys and Endpoints”.
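Before entering the key and URL in the configuration, it can be useful to check that they work. The following is a minimal Python sketch of such a check, not a description of how the product itself calls Azure. It assumes the Computer Vision Read REST API version 3.2 (the path `/vision/v3.2/read/analyze` and the `Ocp-Apim-Subscription-Key` header); the function names, the example host name, and the placeholder key are hypothetical.

```python
import urllib.request


def build_read_request(endpoint: str, key: str, image_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a Read API request for one image or PDF.

    Assumption: the resource exposes the Read API at version 3.2.
    """
    url = endpoint.rstrip("/") + "/vision/v3.2/read/analyze"
    headers = {
        # The key from "Keys and Endpoints" is sent in this header.
        "Ocp-Apim-Subscription-Key": key,
        # Raw file bytes are sent in the request body.
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


def extract_text(result: dict) -> str:
    """Join the recognized lines of a finished Read result into one string."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    lines = [line["text"] for page in pages for line in page.get("lines", [])]
    return "\n".join(lines)


# Hypothetical values; replace them with the URL and key of your own resource.
request = build_read_request("https://example.cognitiveservices.azure.com/", "<your-key>", b"")
print(request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` should return HTTP 202 with an `Operation-Location` header if the key and URL are valid. The result is then fetched with a GET request to that location, using the same key header, and can be passed to `extract_text` once its status is `succeeded`.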

...

Info

The metafield “Asset content” is predefined and should not be modified manually if asset content is to be searchable.

...

is automatically created when installing or upgrading to 5.5. The field is created in the metagroup “Content”. It is very important that this exact field is used in the configuration, because the system depends on the GUID of the metadata field.

Including asset content in searches

The extracted content of assets can be included in freetext searches by adding the metafield “Asset content” to the search “DigiZuite_System_Framework_Search” as a freetext input parameter. This is done as follows:

...

Info

The contents of existing assets can be extracted by republishing the assets.

Important Information

Please be aware of the following when using the Computer Vision resource:

...