...

This page describes how to enable search in asset content. The setup consists of two steps:

  1. Content extraction with Microsoft Azure Cognitive Services.

  2. Including asset content in searches.

Please be aware of the important information at the bottom of the page.

...

A new Computer Vision client can be created with the following steps:

  1. Log in to the Azure portal (https://portal.azure.com/).

  2. Search for ‘Cognitive Services’.

  3. Click ‘Add’ to add a new Cognitive Service.

  4. Search for ‘Computer Vision’ and create a new client.

...

The Computer Vision resource has a key and a server URL (endpoint), both of which will be needed shortly. To find them, navigate to the Computer Vision resource and open ‘Keys and Endpoint’.
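
As a rough illustration of how the key and the endpoint fit together, the sketch below builds (but does not send) a request to the Read operation of the Computer Vision REST API. The v3.2 path, the placeholder endpoint, key and file URL, and the helper name build_read_request are assumptions made for illustration; this page does not state which API version the Cognitive Service uses.

```python
import json
import urllib.request


def build_read_request(endpoint: str, key: str, file_url: str) -> urllib.request.Request:
    """Build a POST request asking Computer Vision to read the text in file_url."""
    # Assumption: the v3.2 Read API path; other API versions use a different path.
    url = endpoint.rstrip("/") + "/vision/v3.2/read/analyze"
    body = json.dumps({"url": file_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # one of the KEY entries of the resource
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    request = build_read_request(
        "https://example-resource.cognitiveservices.azure.com/",  # placeholder endpoint
        "<your-key>",                                             # placeholder key
        "https://example.com/sample.pdf",                         # placeholder file URL
    )
    print(request.get_method(), request.full_url)
```

Sending such a request with urllib.request.urlopen is one way to check the key and endpoint by hand. With valid values, the service should accept the request and return an Operation-Location header that points at the extraction result.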

...

Content extraction can now be enabled in the Cognitive Service, which is part of Digizuite Core.

  1. On the server where the DAM Center is installed, navigate to the Cognitive Service directory (typically “Webs/<yourDAM>/DigizuiteCore/cognitiveservice”).

  2. Edit the “appsettings.json” file. The following parameters in the “ComputerVisionDetails” section are relevant:

| Parameter | Description |
| --- | --- |
| OcrKey | The key from the Computer Vision resource created above (one of the ‘KEY’ entries in the image above). |
| OcrServerUri | The URI from the Computer Vision resource created above (the ‘Endpoint’ entry in the image above). |
| OcrExtractFromPdf | If true, the text content of PDF files is extracted when new PDF files are uploaded to the DAM. |
| OcrExtractFromImage | If true, the text content of images is extracted when new images are uploaded to the DAM. |
| OcrLetAzureRequestFiles | If false, files are explicitly uploaded to the Computer Vision client. If true, Azure requests the files from the DAM Center. Setting this to true is expected to be more efficient, but it requires that Azure can reach the DAM Center, so ensure that the DAM Center is not behind a strict firewall if this is set to true. |
| OcrTaskDelayLength | The time interval between status checks of ongoing content extractions in the Computer Vision client. The larger the interval, the fewer requests are made to Azure, but the longer it takes for the extracted content to become available in the ‘Asset content’ metafield. You most likely don’t have to change this. |
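
The parameters above can also be set with a script instead of by hand. The sketch below is only an illustration under several assumptions: that “appsettings.json” is plain JSON without comments, that “ComputerVisionDetails” is a top-level section, and that the boolean parameters are stored as JSON booleans. The helper name update_computer_vision_details and all values are placeholders. OcrTaskDelayLength is left untouched because its value format is not described here.

```python
import json

# Placeholder values for illustration only; the value types are assumptions.
EXAMPLE_OCR_SETTINGS = {
    "OcrKey": "<key from the Computer Vision resource>",
    "OcrServerUri": "https://example-resource.cognitiveservices.azure.com/",
    "OcrExtractFromPdf": True,
    "OcrExtractFromImage": True,
    "OcrLetAzureRequestFiles": False,
}


def update_computer_vision_details(settings_text: str, values: dict) -> str:
    """Return appsettings.json text with the given ComputerVisionDetails values set."""
    settings = json.loads(settings_text)
    # Create the section if it is missing; keys that are not passed in stay unchanged.
    section = settings.setdefault("ComputerVisionDetails", {})
    section.update(values)
    return json.dumps(settings, indent=2)


if __name__ == "__main__":
    # Stand-in for the real file content; the existing delay value is a dummy string.
    original = '{"ComputerVisionDetails": {"OcrTaskDelayLength": "<existing value>"}}'
    print(update_computer_vision_details(original, EXAMPLE_OCR_SETTINGS))
```

Working on the file text rather than the file itself keeps the sketch easy to try out. In practice, take a backup of “appsettings.json” before writing any changes back.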

...
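
The trade-off behind OcrTaskDelayLength can be illustrated with a generic polling loop. This is a minimal sketch and not the Cognitive Service's actual implementation: the function poll_until_done, its parameters, and the fake status sequence are hypothetical, and the status strings are assumed to match those reported by the Computer Vision Read API.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    check_status: Callable[[], str],
    delay_seconds: float,
    max_checks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Check the status until it is final; return the number of checks made.

    Returns None if the extraction is still not finished after max_checks checks.
    """
    for attempt in range(1, max_checks + 1):
        status = check_status()
        if status in ("succeeded", "failed"):  # assumed final states
            return attempt
        # A longer delay means fewer requests, but results arrive later.
        sleep(delay_seconds)
    return None


if __name__ == "__main__":
    # Fake status source standing in for requests to Azure.
    fake_statuses = iter(["notStarted", "running", "running", "succeeded"])
    checks = poll_until_done(lambda: next(fake_statuses), delay_seconds=0.1, max_checks=10)
    print("status checks made:", checks)
```

With a fixed extraction time, doubling the delay roughly halves the number of status requests, at the cost of noticing the finished result up to one delay interval later.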

Info

The metafield ‘Asset content’ is automatically created when installing or upgrading to 5.5. The field is created in the metagroup ‘Content’. It is very important that this exact field is used in the configuration, because the system depends on the GUID of the metadata field.

GUID: 4A8ED71B-574A-43BB-A35E-8826598CF36F
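
When checking by hand or in a script that a configuration refers to this exact field, note that GUIDs are often written in different casing or with surrounding braces. The sketch below compares a candidate value against the GUID above in a format-tolerant way; the helper name is_asset_content_field is hypothetical and not part of the product.

```python
import uuid

# GUID of the 'Asset content' metafield, as stated on this page.
ASSET_CONTENT_GUID = uuid.UUID("4A8ED71B-574A-43BB-A35E-8826598CF36F")


def is_asset_content_field(candidate: str) -> bool:
    """Return True if candidate is the 'Asset content' GUID, ignoring case and braces."""
    try:
        return uuid.UUID(candidate) == ASSET_CONTENT_GUID
    except ValueError:
        # The candidate is not a well-formed GUID at all.
        return False


if __name__ == "__main__":
    print(is_asset_content_field("{4a8ed71b-574a-43bb-a35e-8826598cf36f}"))
```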

Including asset content in searches (when using Solr)

The extracted content of assets can be included in freetext searches by adding the metafield ‘Asset content’ to the search DigiZuite_System_Framework_Search as a freetext input parameter. To add it, follow these steps:

  1. Find DigiZuite_System_Framework_Search in the ConfigManager for the product version in which the feature should be enabled.

...

  2. Add a new input parameter.

    1. Locate and choose the metafield group ‘Content’.

    2. Choose the metafield ‘Asset content’ and the ‘FreeText’ comparison type, then create the input parameter.

  3. Save the modified search and populate the search cache.

...

The content of the asset types you selected in Step 1 should now be included in freetext searches when new assets are uploaded.

...