Using the document status webhook API
You can use the document status webhook feature to send a webhook event to your external application when the status of ingested documents becomes available or failed. With the webhook event, you can take the next action on indexed documents without first getting the document status through the Get document details API.
IBM Cloud Pak for Data, IBM Software Hub: When you run Discovery in an air-gapped environment, you must connect to the external application through an HTTP proxy. For more information, see Setting up HTTP proxy in air-gapped environments.
To use the document status webhook feature, do the following:
- Set up the external application that can receive webhook notifications from Discovery.
  To do so, you must register your external application as a webhook endpoint on a collection by using the create collection or update collection API methods. For more information, see Create collection or update collection in the API reference.
  The external application receives a webhook ping event, which notifies it that the webhook is successfully created. The external application must be accessible from IBM Cloud.
- Ingest the documents to the collection. When the status of the ingested documents becomes available or failed, the external application receives the document.status webhook event.
  You can verify the status of the ingested documents in the data object of the document.status webhook event. The document_ids and status parameters show the IDs of the ingested documents and their status. For more information, see Data model of the ping event and Data model of the document.status event.
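As a rough illustration of how an external application might act on the two event types, the following sketch maps a parsed webhook payload to a next action. The function name routeWebhookEvent and the action labels are hypothetical and are not part of the Discovery API. Only the event, data.document_ids, and data.status fields come from the event data models.

```javascript
// Hypothetical dispatcher: decides what to do with a parsed webhook payload.
// The action labels ('acknowledge', 'process', 'investigate', 'ignore') are
// illustrative names, not values defined by Discovery.
function routeWebhookEvent(payload) {
  switch (payload.event) {
    case 'ping':
      // Sent when the webhook is created on the collection.
      return { action: 'acknowledge', documentIds: [] };
    case 'document.status': {
      const { document_ids: documentIds = [], status } = payload.data || {};
      if (status === 'available') {
        // The documents are indexed and ready for the next step.
        return { action: 'process', documentIds };
      }
      if (status === 'failed') {
        // Ingestion failed for these documents.
        return { action: 'investigate', documentIds };
      }
      return { action: 'ignore', documentIds };
    }
    default:
      // Unknown event types are skipped rather than treated as errors.
      return { action: 'ignore', documentIds: [] };
  }
}
```

A receiving endpoint could call routeWebhookEvent(JSON.parse(body)) after it authenticates the request, and then queue the returned document IDs for whatever follow-up work the application needs.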
The following image shows the webhook configuration flow.
The following image shows the document status webhook feature process flow.
For more information about the query API, see Query a project API method in the API reference.
You can also refer to the webhook-doc-status-sample application for the document status webhook API feature. To view the sample application, you must have access to the Discovery doc-tutorial-downloads repository.
Authenticating the request for webhook security
To authenticate the webhook request, verify the JSON Web Token (JWT) that is sent with the request. The webhook microservice automatically generates a JWT and sends it in the Authorization header with each webhook call. It is your
responsibility to add code to the external service that verifies the JWT.
The webhook microservice generates the JWT from the secret that you specify and passes it to the external application in the Authorization header. If you specify your own value for the Authorization header, the webhook microservice sends that value to the external application instead of the JWT.
For example, if you specify sample secret in the Secret field of the Webhooks object in the Create collection or update collection APIs, you might add sample code such as the following in Node.js:
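const jwt = require('jsonwebtoken');
...
// Read the JWT from the Authorization header (Node.js lowercases header names)
const token = request.headers.authorization;
try {
  // Throws if the token is malformed or was not signed with the secret
  const decoded = jwt.verify(token, 'sample secret');
} catch (err) {
  // The token is missing or invalid; reject the request
}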
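If adding a dependency is not practical, a similar check can be sketched with the built-in Node.js crypto module. This sketch assumes that the token is signed with HS256 (HMAC with SHA-256) by using the shared secret. The signing algorithm is not stated here, so confirm it before you rely on this approach. The function name verifyHs256 is hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical helper: verifies an HS256-signed JWT and returns its claims.
// ASSUMPTION: the token uses HS256; other algorithms are rejected.
function verifyHs256(token, secret) {
  const parts = String(token).split('.');
  if (parts.length !== 3) {
    throw new Error('malformed token');
  }
  const [headerB64, payloadB64, signatureB64] = parts;

  const header = JSON.parse(Buffer.from(headerB64, 'base64url').toString('utf8'));
  if (header.alg !== 'HS256') {
    throw new Error('unexpected algorithm');
  }

  // Recompute the signature over "<header>.<payload>" with the shared secret.
  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${headerB64}.${payloadB64}`)
    .digest();
  const actual = Buffer.from(signatureB64, 'base64url');

  // Compare in constant time; the lengths must match before timingSafeEqual.
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    throw new Error('invalid signature');
  }

  const claims = JSON.parse(Buffer.from(payloadB64, 'base64url').toString('utf8'));
  // Reject the token if it carries an expiry time that has already passed.
  if (typeof claims.exp === 'number' && claims.exp * 1000 <= Date.now()) {
    throw new Error('token expired');
  }
  return claims;
}
```

The base64url encoding name requires a recent Node.js release. A maintained JWT library remains the safer choice for production use because it also handles claims such as nbf and aud.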
Data model of the ping event
Following are the ping event parameters:
| Parameter | Description |
|---|---|
| event | The event name is ping. |
| instance_id | The Discovery instance ID. |
| version | The Discovery API version in the format yyyy-mm-dd. |
| data | An object with the event information: url, events, and metadata. |
| created_at | The date and time the event was created. |
For example, following is a ping event that is sent to a webhook:
POST https://example.com/webhook
Authorization: Basic YWxhZGRpbjpvcGVuc2VzYW1l
X-Global-Transaction-ID: 5144bb45-dc81-402c-a045-249fd1318515
Content-Type: application/json

{
  "event": "ping",
  "version": "2023-03-31",
  "instance_id": "1a5d4916-6097-4150-977a-ca897226565c",
  "data": {
    "url": "https://example.com/webhook",
    "events": [
      "document.status"
    ],
    "metadata": {
      "project_id": "02a803f9-c814-4fcb-a764-e01e3d4dd002",
      "collection_id": "f41ae858-0ca9-d0ed-0000-01890118cc5b"
    }
  },
  "created_at": "2023-08-16T08:34:46.000Z"
}
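When the ping event arrives, the external application might confirm that it describes the registration it expects. The following sketch checks the url and events fields and returns the project and collection IDs from metadata. The helper name describePing and the shape of its return value are hypothetical.

```javascript
// Hypothetical helper: checks that a ping payload matches the expected
// webhook registration and extracts the IDs from data.metadata.
function describePing(payload, expectedUrl) {
  if (payload.event !== 'ping') {
    return { ok: false, reason: 'not a ping event' };
  }
  const { url, events = [], metadata = {} } = payload.data || {};
  if (url !== expectedUrl) {
    return { ok: false, reason: 'unexpected url' };
  }
  if (!events.includes('document.status')) {
    return { ok: false, reason: 'document.status not subscribed' };
  }
  return {
    ok: true,
    projectId: metadata.project_id,
    collectionId: metadata.collection_id,
  };
}
```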
Data model of the document.status event
Following are the document.status event parameters:
| Parameter | Description |
|---|---|
| event | The event name is document.status. |
| instance_id | The Discovery instance ID. |
| version | The Discovery API version in the format yyyy-mm-dd. |
| data | An object with the event-specific information: project_id, collection_id, and document_ids. |
| status | The status of the documents: available or failed. This parameter is nested in the data object. |
| created_at | The date and time the event was created. |
For example, following is a document.status event that is sent to a webhook:
POST https://example.com/webhook
Authorization: Basic YWxhZGRpbjpvcGVuc2VzYW1l
X-Global-Transaction-ID: 5144bb45-dc81-402c-a045-249fd1318515
Content-Type: application/json

{
  "event": "document.status",
  "version": "2023-03-31",
  "instance_id": "1a5d4916-6097-4150-977a-ca897226565c",
  "data": {
    "project_id": "02a803f9-c814-4fcb-a764-e01e3d4dd002",
    "collection_id": "f41ae858-0ca9-d0ed-0000-01890118cc5b",
    "document_ids": [
      "1a5d4916-6097-4150-977a-ca897226565b",
      "2a5d4916-6097-4150-977a-ca897226565b"
    ],
    "status": "available"
  },
  "created_at": "2023-08-16T08:34:46.000Z"
}
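Because each document.status event carries a list of document IDs with a single status, an external application can keep its own record of document states instead of polling for them. The following sketch stores the latest status per document ID in a Map. The function names applyStatusEvent and idsWithStatus are hypothetical, and the in-memory Map stands in for whatever storage the application uses.

```javascript
// Hypothetical tracker: records the latest status for each document ID
// that is named in a document.status event. Other events are ignored.
function applyStatusEvent(statusById, payload) {
  if (payload.event !== 'document.status') {
    return statusById;
  }
  const { document_ids: ids = [], status } = payload.data || {};
  for (const id of ids) {
    statusById.set(id, status);
  }
  return statusById;
}

// Returns the IDs whose latest recorded status matches the given status.
function idsWithStatus(statusById, status) {
  return [...statusById]
    .filter(([, recorded]) => recorded === status)
    .map(([id]) => id);
}
```

For example, after the tracker applies several events, idsWithStatus(statusById, 'failed') lists the documents that might need to be ingested again.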