Tagged: Coveo index

Coveo Computed Field for Retrieving Page Content and Renderings in Sitecore

I needed to index all text from the page content and its associated renderings, including single-line, multi-line, and rich text fields, into a single Coveo computed field.

Let’s get started.

Text Extraction:

  • I created a TextExtraction class that inherits from BaseComputedField.
  • GetComputedFieldValue: Computes the concatenated text from the item and its renderings.

  • GetRenderingSource: Retrieves the source item for a given rendering reference.

  • GetDatasourceItem: Resolves and retrieves the data source item using the pipeline manager.

  • GetAllReferencedText: Extracts text from fields and adds them to a result list.

  • GetReferenceFieldData: Handles reference fields and extracts text from referenced items.
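The flow of the methods listed above can be sketched outside Sitecore. The snippet below is a minimal Python illustration, not the TextExtraction class itself: plain dictionaries stand in for Sitecore items, and the names `strip_html`, `collect_text`, and the `fields` / `renderings` / `datasource` keys are hypothetical. It only shows the idea of walking the page fields, then each rendering's data source, and concatenating the cleaned text.

```python
import re
from html import unescape

# Naive tag matcher; good enough for a sketch, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(value):
    """Remove tags from rich text, decode entities, collapse whitespace."""
    text = unescape(TAG_RE.sub(" ", value))
    return " ".join(text.split())


def collect_text(page):
    """Concatenate text from the page fields and its rendering data sources."""
    parts = []
    # Text fields on the page item itself.
    for value in page.get("fields", {}).values():
        cleaned = strip_html(value)
        if cleaned:
            parts.append(cleaned)
    # Text fields on each rendering's data source item, if it has one.
    for rendering in page.get("renderings", []):
        datasource = rendering.get("datasource")
        if not datasource:
            continue
        for value in datasource.get("fields", {}).values():
            cleaned = strip_html(value)
            if cleaned:
                parts.append(cleaned)
    return " ".join(parts)


# Hypothetical page with one rendering that has a data source and one without.
page = {
    "fields": {"Title": "Home", "Body": "<p>Welcome &amp; enjoy</p>"},
    "renderings": [
        {"datasource": {"fields": {"Heading": "Hero", "Text": "<b>Big</b> news"}}},
        {"datasource": None},
    ],
}
print(collect_text(page))
```

Renderings without a data source are skipped, so a missing source does not break the concatenation.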

 

 

Configuration:

Let’s add the TextExtraction computed field into the config file.

 

I published all the files, and it’s time to check.

I selected a page that has many renderings and hit Rebuild Tree. (I set the indexing strategy to SyncMaster. If you use intervalAsyncMaster or onPublishEndSyncSingleInstance, publish the item to see the record in the index.)

If you run into any issues, set a breakpoint in Visual Studio; Rebuild Tree will hit the breakpoint so you can debug.

 

Let’s check the Coveo index. Yay! The page content and its associated rendering content were extracted successfully.

Hope this helps.

Happy Sitecoring!


Coveo Computed Field for Extracting PDFs with Apache Tika in Sitecore

I needed to index the PDF file content in the Media Library, but when I tried to index it, the PDFSharp library couldn’t extract it.

Sitecore recommends using one of the following for indexing media content: IFilter, Apache Tika, or SolrCell.

I wrote a detailed blog post on installing and integrating Apache Tika into the project:

https://madhuanbalagan.com/sitecore-apache-tika-integration-for-secure-media-file-indexing

Now that Tika is integrated, let’s get started on creating a computed field to extract PDF content using the Apache Tika service.

Media Extraction:

  • I created a MediaExtraction class, which inherits from BaseComputedField.
  • The GetComputedField method calls the Apache Tika service and extracts the text asynchronously.
  • It returns the extracted document text.

 

 

Apache Tika Service:

  • The Tika Service class implements the IContentExtractionService interface.
  • The main method, ReadJsonObject, sends the document to the Tika server and extracts the text content from the parsed JSON response.
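To make the request/response shape concrete, here is a minimal Python sketch of that round trip. It is an illustration under assumptions, not the service class described above: it assumes the Tika server's `/rmeta/text` endpoint, which returns a JSON array of metadata objects with the extracted text under the `X-TIKA:content` key. The function names `extract_content` and `read_json_object` are hypothetical.

```python
import json
import urllib.request


def extract_content(rmeta_json):
    """Pull the extracted text out of a Tika /rmeta JSON response.

    The response is a JSON array: one object for the container document
    and one per embedded document, each with an optional X-TIKA:content.
    """
    docs = json.loads(rmeta_json)
    parts = [doc.get("X-TIKA:content") or "" for doc in docs]
    return "\n".join(p.strip() for p in parts if p.strip())


def read_json_object(tika_url, file_bytes, timeout=30):
    """PUT the raw file bytes to Tika and return the extracted text.

    Assumption: tika_url is the base URL from the connection string,
    e.g. http://localhost:9998.
    """
    request = urllib.request.Request(
        tika_url.rstrip("/") + "/rmeta/text",
        data=file_bytes,
        method="PUT",
        headers={
            "Accept": "application/json",
            # Generic type so Tika detects the real format itself.
            "Content-Type": "application/octet-stream",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_content(response.read().decode("utf-8"))
```

Keeping the JSON parsing separate from the HTTP call makes the parsing easy to check without a running Tika server.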

 

 

Tika ConnectionString:

Please make sure that Tika is up and running.

<add name="tika" connectionString="http://localhost:9998" />

When I checked, it wasn’t running for some reason.

 

Run the following PowerShell commands to restart Tika.

cd c:\tika

java -jar tika-server-1.22.jar -s

Let’s check – Tika is now up and running.
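If you would rather script this check than open the URL by hand, a small probe works too. This is a minimal Python sketch, assuming the Tika server's `/version` endpoint and the connection string URL shown earlier; the function name `tika_is_up` is hypothetical.

```python
import urllib.request


def tika_is_up(base_url="http://localhost:9998", timeout=3):
    """Return True if the Tika server answers on its /version endpoint."""
    try:
        url = base_url.rstrip("/") + "/version"
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


# Example usage:
#   if not tika_is_up():
#       print("Tika is not reachable; restart it before rebuilding the index.")
```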

 

Configuration:

Let’s add the MediaExtraction computed field into the config file.

I published all the files and it’s time to check.

I selected the PDF document in the Media Library and hit Rebuild Tree. (I set the indexing strategy to SyncMaster. If you use intervalAsyncMaster or onPublishEndSyncSingleInstance, publish the item to see the record in the index.)

 

 

 

Let’s check the Coveo index. Yay! The PDF content was extracted successfully.

Sitecore-Media-Computed-Field

 

The same computed field would work for Word and PowerPoint documents as well.

Hope this helps.

Happy Sitecoring!


Integrating External Website into Coveo Index for Seamless Search in Sitecore

I recently faced a scenario where I needed to integrate an external website into the Coveo Index and utilize it along with Sitecore Items on the website.

Let’s take my blog as an external data source and integrate it into the Coveo Index.

Trial Account

Feel free to create a trial account and explore on your own. No credit card is needed; it’s free and valid for 14 days.

https://www.coveo.com/en/free-trial

Sitecore-Coveo-Extenal-Data-Source-Index-1.png

Note: Please make sure to use your business email.

Sources

After signing up, navigate to the Sources section.

Sitecore-Coveo-Extenal-Data-Source-Index-2.png

 

There are many sources available, such as Sitecore, Web, Sitemap, and many more.


Sitecore-Coveo-Extenal-Data-Source-Index-3.png

Web Source

Let’s focus on Web sources since we want to add my blog.

There are two Web sources available: an on-prem crawler and a cloud-based crawler. Let’s choose the cloud-based crawler, the one with the cloud icon on the right.

Sitecore-Coveo-Extenal-Data-Source-Index-4.png

When I started filling in my blog URL, it automatically detected the sitemap for the website, so I switched to the Sitemap URL for better indexing performance.

Sitecore-Coveo-Extenal-Data-Source-Index-5.png

After switching, the source automatically updated to a Sitemap source with the appropriate sitemap URL.

Sitecore-Coveo-Extenal-Data-Source-Index-6.png

Content Security

The next step in the setup is Content Security.

We can grant access to:

  • Same users and groups as in your content system (Grayed out due to trial account)
  • Everyone – Anonymous can access
  • Specific users and groups

By default, the Everyone option is selected, which is best for public-facing content.

Let’s change it to Specific users and groups for the demo.

Sitecore-Coveo-Extenal-Data-Source-Index-8.png

Add Source

Once added, the source becomes available after a few minutes. You can review the other settings if further tuning is needed.

Sitecore-Coveo-Extenal-Data-Source-Index-9.png

 

Rebuild Source

Hit Save and rebuild source, which initiates the rebuild of the index.

The Download logs option provides more in-depth information, which is super helpful when you face any issues.

Sitecore-Coveo-Extenal-Data-Source-Index-10.png

 

Content Browser

Once the rebuild is complete, open the Content Browser from the left navigation under Content to see the items in the index.


Sitecore-Coveo-Extenal-Data-Source-Index-11.png

 

It took only a couple of minutes to rebuild the entire blog; the time depends on the amount of content on the site.

Yay! My entire blog is indexed and ready to be consumed.

Sitecore-Coveo-Extenal-Data-Source-Index-12.png

 

Now, in the Sitecore Coveo Search interface, I could include this as an external source and use the items in the index. We could also set up a blog template and display the results with images. The source type can also be used as a facet.

Hope this helps.

Happy Searching!


Coveo for Sitecore: Index Error Troubleshooting and Resolution

Coveo-Sitecore-p_ApiKey-Error-Title-Image.png

I installed Coveo 5.0.1039.1 on a local Sitecore 10.1 instance.

After Coveo activation, the indexes weren’t loading and threw an error.

Coveo-Sitecore-p_ApiKey-Error.png

The logs showed the error "The parameter 'p_ApiKey' must not be an empty string".

Exception: System.ArgumentException
Message: Precondition failed: The parameter 'p_ApiKey' must not be an empty string
Parameter name: p_ApiKey
Source: Coveo.Framework
   at Coveo.Framework.CNL.Precondition.RaiseArgumentException(String p_Message, String p_ParameterName)
   at Coveo.Framework.CNL.Precondition.NotEmpty(String p_Parameter, String p_ParameterName)
   at Coveo.CloudPlatformClientBase.Communication.CloudPlatformHttpClientFactory.CreateAuthorizedJsonHttpClient(String p_ApiKey)
   at Coveo.CloudPlatformClientBase.CloudPlatformClient..ctor(CloudPlatformConfiguration p_Configuration, ICloudPlatformHttpClientFactory p_CloudPlatformHttpClientFactory, IPipelineRunnerHandler p_PipelineRunnerHandler, ISerializer p_Serializer, ICoveoSettings p_CoveoSettings, IStaticTTLCacheFactory`2 p_StaticTTLCacheFactory, ICriticalExceptionHandler p_CriticalExceptionHandler)
   at Coveo.CloudPlatformClientBase.CloudPlatformClient..ctor(CloudPlatformConfiguration p_Configuration)
   at Coveo.CloudPlatformClientBase.Communication.CloudPlatformClientFactory.GetCloudPlatformClient(CloudPlatformConfiguration p_Configuration)
   at Coveo.SearchProvider.Licensing.CloudLicenseRetriever.GetCloudLicense()
   at Coveo.SearchProvider.Licensing.CloudLicenseRetriever.GetLicense(Boolean p_ForceRetrieve)
   at Coveo.SearchProvider.Licensing.Cloud.LicenseRetriever.GetLicense(Boolean p_ForceRetrieve)
   at Coveo.SearchProvider.Licensing.LicenseManager.RetrieveLicense(Boolean p_ForceUpdate)
   at Coveo.SearchProvider.Licensing.LicenseManager.EnsureValidLicense()
   at Coveo.SearchProvider.Licensing.LicenseManager.GetLicenseInformation()
   at Coveo.SearchProvider.Rest.SitecoreRestHttpHandler.InitializeLicenseSettings()
   at Coveo.SearchProvider.Rest.SitecoreRestHttpHandler.OnInitializeSettings()
   at Coveo.Search.Api.Proxy.ProxyHttpHandler.OnInitialize()
   at Coveo.Search.Api.Proxy.ProxyHttpHandler.EnsureInitialized()
   at Coveo.Search.Api.Proxy.ProxyHttpHandler.ProcessRequest(IHttpContext p_Context)
   at Coveo.SearchProvider.Rest.SitecoreRestHttpHandlerDispatcher.ProcessRequest(HttpContext p_Context)
   at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

After some research, I learned that two API keys in Coveo.CloudPlatformClient.Custom.config need to match the ones in the Coveo Platform:

1. apiKey
2. searchApiKey

Coveo-Sitecore-p_ApiKey-Error-4.png

When I logged into the platform, the keys were not visible because they are kept secure (and I wasn’t sure where I had saved them!), so I decided to create new keys.

Coveo-Sitecore-p_ApiKey-Error-2.png
1. SearchApiKey

To create the Search API Key, we must ensure the correct permissions are in place.

Ensure that Impersonate -> Allowed is selected from the drop-down list; this limits the scope of the API key.

Coveo-Sitecore-p_ApiKey-Error-3.png
2. ApiKey

To create the ApiKey, we need to set multiple privileges.

Content Tab:

    1. Fields -> Edit
    2. Security Identities -> Edit
    3. Security Identity Providers -> Edit
    4. Sources -> Edit all

Coveo-Sitecore-p_ApiKey-Error-5.png

Organization Tab:

    1. Organization -> Edit

Coveo-Sitecore-p_ApiKey-Error-6.png

Search Tab:

    1. Search Page -> View all

Coveo-Sitecore-p_ApiKey-Error-6.png

When the keys are created, make sure to save them in a secure place!

Coveo-Sitecore-p_ApiKey-Error-8.png
It is time to update the config with the new keys.

Modify the apiKey and searchApiKey values in Coveo.CloudPlatformClient.Custom.config under the App_Config/Include/Coveo folder.
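Since the original error came from an empty key, a quick scripted sanity check on the config can catch the same problem before reloading. This is a minimal Python sketch under assumptions: it assumes the two keys are stored as `<apiKey>` and `<searchApiKey>` elements somewhere in the file, and the sample XML structure and the function name `find_empty_keys` are hypothetical, not copied from a real Coveo config.

```python
import xml.etree.ElementTree as ET

# Element names assumed from the two keys listed in this post.
KEY_ELEMENTS = ("apiKey", "searchApiKey")


def find_empty_keys(config_xml):
    """Return the names of key elements that are missing or blank."""
    root = ET.fromstring(config_xml)
    empty = []
    for name in KEY_ELEMENTS:
        # First matching element anywhere in the tree, or None.
        element = next(root.iter(name), None)
        if element is None or not (element.text or "").strip():
            empty.append(name)
    return empty


# Hypothetical config fragment with one key left blank.
sample = """
<configuration>
  <sitecore>
    <coveo>
      <cloudPlatformConfiguration>
        <apiKey>xx-sample-key</apiKey>
        <searchApiKey></searchApiKey>
      </cloudPlatformConfiguration>
    </coveo>
  </sitecore>
</configuration>
"""
print(find_empty_keys(sample))
```

An empty list means both keys have values; it does not verify that the values match what the Coveo Platform expects.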

Coveo-Sitecore-p_ApiKey-Error-4.png

Let’s reload the Coveo Index Manager: no more errors.

Indexes are loaded and rebuilt successfully. Yay!

Coveo-Sitecore-p_ApiKey-Error-9.png

Hope this helps.

Happy Sitecoring!

References:
https://docs.coveo.com/en/2484/coveo-for-sitecore-v5/activate-silently#creating-the-api-keys
