Livy batch APIs
The Livy batches API is a REST interface for submitting Spark batch jobs. It is very similar to the open source Livy REST interface (see Livy), except for a few limitations, which are described in this topic.
The open source Livy batches log API for retrieving log lines from a batch job is not supported. Instead, the logs are added to the IBM Cloud Object Storage bucket that is referenced as the service instance "instance_home". At a later time during the beta release, the logs can be forwarded to IBM Log Analysis.
Submitting Spark batch jobs
To submit a Spark batch job by using the Livy batches API, enter:
curl \
-H 'Authorization: Bearer <TOKEN>' \
-H 'Content-Type: application/json' \
-d '{ "file": "/ cos://<application-bucket-name>.<cos-reference-name>/my_spark_application.py"
", \
"conf": { \
"spark.hadoop.fs.cos.<cos-reference-name>.endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud", \
"spark.hadoop.fs.cos.<cos-reference-name>.access.key": "<access_key>", \
"spark.hadoop.fs.cos.<cos-reference-name>.secret.key": "<secret_key>", \
"spark.app.name": "MySparkApp" \
} \
}' \
-X POST https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance-id>/livy/batches
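If you prefer to script the call, the same submission can be made from Python. The following is a minimal sketch, not an official client: it assumes the requests library, an IBM Cloud API key exported as IBMCLOUD_API_KEY (exchanged for a bearer token at the IBM Cloud IAM token endpoint), and the same placeholder values for the instance ID and Cloud Object Storage details as in the curl example.

import os
import requests

# Exchange an IBM Cloud API key for an IAM bearer token.
iam_response = requests.post(
    "https://iam.cloud.ibm.com/identity/token",
    data={
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": os.environ["IBMCLOUD_API_KEY"],  # assumed environment variable
    },
)
iam_response.raise_for_status()
token = iam_response.json()["access_token"]

instance_id = "<instance-id>"
payload = {
    "file": "cos://<application-bucket-name>.<cos-reference-name>/my_spark_application.py",
    "conf": {
        "spark.hadoop.fs.cos.<cos-reference-name>.endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
        "spark.hadoop.fs.cos.<cos-reference-name>.access.key": "<access_key>",
        "spark.hadoop.fs.cos.<cos-reference-name>.secret.key": "<secret_key>",
        "spark.app.name": "MySparkApp",
    },
}

# Submit the batch job; the response carries the batch ID and initial state.
response = requests.post(
    f"https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/{instance_id}/livy/batches",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
response.raise_for_status()
print(response.json())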
Request body for a submitted batch job using the Livy batches API:
Name | Description | Type |
---|---|---|
file | File containing the application to execute | string (required) |
className | Application Java/Spark main class | string |
args | Command line arguments for the application | list of string |
jars | Jars to be used in this session | list of string |
pyFiles | Python files to be used in this session | list of string |
files | Files to be used in this session | list of string |
driverMemory | Amount of memory to use for the driver process | string |
driverCores | Number of cores to use for the driver process | int |
executorMemory | Amount of memory to use per executor process | string |
executorCores | Number of cores to use for each executor | int |
numExecutors | Number of executors to launch for this session | int |
name | The name of this session | string |
conf | Spark configuration properties | map of key=val |
The proxyUser, archives, and queue properties are not supported in the request body, although they are supported in the open source Livy REST interface.
Response body of a submitted batch job using the Livy batches API:
Name | Description | Type |
---|---|---|
id | The batch ID | int |
appId | The Spark application ID | string |
appInfo | Detailed application information | map of key=val |
state | State of the submitted batch job | string |
Examples using the Livy API
The following sections show you how to use the Livy batches API.
Submit a batch job with job file in IBM Cloud Object Storage
To submit a batch job where the job file is located in an IBM Cloud Object Storage bucket, enter:
curl -i -X POST https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance-id>/livy/batches -H 'content-type: application/json' -H "Authorization: Bearer $TOKEN" -d @livypayload.json
The endpoint to your IBM Cloud Object Storage instance in the payload JSON file should be the public endpoint.
Sample payload:
{
"file": "cos://<bucket>.mycos/wordcount.py",
"className": "org.apache.spark.deploy.SparkSubmit",
"args": ["/opt/ibm/spark/examples/src/main/resources/people.txt"],
"conf": {
"spark.hadoop.fs.cos.mycos.endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud",
"spark.hadoop.fs.cos.mycos.access.key": "XXXX",
"spark.hadoop.fs.cos.mycos.secret.key": "XXXX",
"spark.app.name": "MySparkApp"
}
}
Sample response:
{"id":13,"app_info":{},"state":"not_started"}
Submit a batch job with the job file on local disk
To submit a batch job where the job file is located on a local disk, enter:
curl -i -X POST https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance-id>/livy/batches -H 'content-type: application/json' -H "Authorization: Bearer $TOKEN" -d @livypayload.json
Sample payload:
{
"file": "/opt/ibm/spark/examples/src/main/python/wordcount.py",
"args": ["/opt/ibm/spark/examples/src/main/resources/people.txt"],
"className": "org.apache.spark.deploy.SparkSubmit"
}
Sample response:
{"id":15,"app_info":{},"state":"not_started"}
The sparkUiUrl property in the response has a non-null value when the Spark UI is available for the serverless Spark instance.
List the details of a job
To list the job details for a particular Spark batch job, enter:
curl \
-H 'Authorization: Bearer <TOKEN>' \
-H 'Content-Type: application/json' \
-X GET https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance-id>/livy/batches/<batch-id>
The response body for listing the job details:
Name | Description | Type |
---|---|---|
id | The batch ID | int |
appId | The Spark application ID | string |
appInfo | Detailed application information | map of key=val |
state | State of the submitted batch job | string |
An example:
curl -i -X GET https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/43f79a18-768c-44c9-b9c2-b19ec78771bf/livy/batches/14 -H 'content-type: application/json' -H "Authorization: Bearer $TOKEN"
Sample response:
{
"id": 14,
"appId": "app-20201213175030-0000",
"appInfo": {
"sparkUiUrl": null
},
"state": "success"
}
The sparkUiUrl property in the response has a non-null value when the Spark UI is available for the serverless Spark instance.
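To work with that response programmatically, for example to open the Spark UI when it becomes available, a small helper can be used. A minimal sketch, assuming the requests library and a valid bearer token; get_batch_details is an illustrative name, not part of the API:

import requests

def get_batch_details(base_url: str, instance_id: str, batch_id: int, token: str) -> dict:
    """Fetch the details of a single Livy batch job."""
    response = requests.get(
        f"{base_url}/v3/analytics_engines/{instance_id}/livy/batches/{batch_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()
    return response.json()

details = get_batch_details("https://api.us-south.ae.cloud.ibm.com", "<instance-id>", 14, "<TOKEN>")
ui_url = details.get("appInfo", {}).get("sparkUiUrl")
print(f"state={details['state']}, sparkUiUrl={ui_url or 'not yet available'}")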
Get job state
To get the state of your submitted job, enter:
curl \
-H 'Authorization: Bearer <TOKEN>' \
-H 'Content-Type: application/json' \
-X GET https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance-id>/livy/batches/<batch-id>/state
The response body for getting the state of the batch job:
Name | Description | Type |
---|---|---|
id | The batch ID | int |
state | State of the submitted batch job | string |
For example:
curl -i -X GET https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/43f79a18-768c-44c9-b9c2-b19ec78771bf/livy/batches/14/state -H 'content-type: application/json' -H "Authorization: Bearer $TOKEN"
Sample response:
{
"id": 14,
"state": "success"
}
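Because a submitted job moves through states such as not_started and running before it finishes, a client typically polls this endpoint until a terminal state is reached. The following is a minimal polling sketch, assuming the requests library, a valid bearer token, and that success, dead, and killed are the terminal states, as in open source Livy:

import time
import requests

TERMINAL_STATES = {"success", "dead", "killed"}  # terminal states in open source Livy

def wait_for_batch(base_url: str, instance_id: str, batch_id: int, token: str,
                   poll_seconds: int = 10) -> str:
    """Poll the batch state endpoint until the job reaches a terminal state."""
    url = f"{base_url}/v3/analytics_engines/{instance_id}/livy/batches/{batch_id}/state"
    while True:
        response = requests.get(url, headers={"Authorization": f"Bearer {token}"})
        response.raise_for_status()
        state = response.json()["state"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)

final_state = wait_for_batch("https://api.us-south.ae.cloud.ibm.com", "<instance-id>", 14, "<TOKEN>")
print(f"batch finished with state: {final_state}")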
List all submitted jobs
To list all of the submitted Spark batch jobs, enter:
curl \
-H 'Authorization: Bearer <TOKEN>' \
-H 'Content-Type: application/json' \
-X GET https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance-id>/livy/batches
The from and size request parameters are not supported, although they are supported in the open source Livy REST interface.
The response body for listing all submitted Spark batch jobs:
Name | Description | Type |
---|---|---|
from | The start index of the Spark batch jobs that are retrieved | int |
total | The total number of batch jobs that are retrieved | int |
sessions | The details for each batch job in a session | list |
For example:
curl -i -X GET https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/43f79a18-768c-44c9-b9c2-b19ec78771bf/livy/batches -H 'content-type: application/json' -H "Authorization: Bearer $TOKEN"
Sample response:
{
"from": 0,
"sessions": [{
"id": 13,
"appId": "app-20201203115111-0000",
"appInfo": {
"sparkUiUrl": null
},
"state": "success"
},
{
"id": 14,
"appId": "app-20201213175030-0000",
"appInfo": {
"sparkUiUrl": null
},
"state": "success"
}],
"total": 2
}
The sparkUiUrl property in the response has a non-null value when the Spark UI is available for the serverless Spark instance.
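When scripting against this endpoint, the sessions list can be summarized per job. A minimal sketch, assuming the requests library and a valid bearer token; list_batches is an illustrative helper name:

import requests

def list_batches(base_url: str, instance_id: str, token: str) -> list:
    """Return the sessions list from the Livy batches endpoint."""
    response = requests.get(
        f"{base_url}/v3/analytics_engines/{instance_id}/livy/batches",
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()
    return response.json().get("sessions", [])

for batch in list_batches("https://api.us-south.ae.cloud.ibm.com", "<instance-id>", "<TOKEN>"):
    print(f"id={batch['id']} appId={batch.get('appId')} state={batch['state']}")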
Delete a job
To delete a submitted batch job, enter:
curl \
-H 'Authorization: Bearer <TOKEN>' \
-H 'Content-Type: application/json' \
-X DELETE https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/<instance-id>/livy/batches/<batch-id>
For example:
curl -i -X DELETE https://api.us-south.ae.cloud.ibm.com/v3/analytics_engines/43f79a18-768c-44c9-b9c2-b19ec78771bf/livy/batches/14 -H 'content-type: application/json' -H "Authorization: Bearer $TOKEN"
Sample response:
{
"msg": "deleted"
}
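The delete call can be scripted in the same way. A minimal sketch, assuming the requests library and a valid bearer token; delete_batch is an illustrative helper name:

import requests

def delete_batch(base_url: str, instance_id: str, batch_id: int, token: str) -> dict:
    """Delete a submitted batch job and return the confirmation body."""
    response = requests.delete(
        f"{base_url}/v3/analytics_engines/{instance_id}/livy/batches/{batch_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()
    return response.json()

print(delete_batch("https://api.us-south.ae.cloud.ibm.com", "<instance-id>", 14, "<TOKEN>"))
# expected output, for example: {'msg': 'deleted'}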