Explore creating serverless instances and submitting applications using the CLI
Learn how to use the IBM Analytics Engine CLI to set up the services that you need to create and manage a serverless instance, and to submit and monitor your Spark applications.
You create a serverless instance by selecting the IBM Analytics Engine Standard serverless plan. When a serverless instance is provisioned, an Apache Spark cluster is created, which you can customize with library packages of your choice and on which you run your Spark applications.
Objectives
You will learn how to install and set up the following services and components that you will need to use the CLI:
- An IBM Cloud Object Storage instance in which your IBM Analytics Engine instance stores custom application libraries and Spark history events.
- An Object Storage bucket for application files and data files.
- An IBM Analytics Engine serverless instance. This instance is allocated compute and memory resources on demand whenever Spark workloads are deployed. When no application is running, no compute resources are allocated to the instance. Pricing is based on the actual resources consumed by the instance, billed on a per-second basis.
- A logging service to help you troubleshoot issues that might occur in the IBM Analytics Engine instance and submitted applications, and to view any output generated by your application. When you run applications with logging enabled, logs are forwarded to an IBM Log Analysis service where they are indexed, enabling full-text search through all generated messages and convenient querying based on specific fields.
Before you begin
To start using the Analytics Engine V3 CLI you need:
- An IBM Cloud® account.
- To install the IBM Cloud CLI. See Getting started with the IBM Cloud CLI for the instructions to download and install the CLI.
Now you can start using the Analytics Engine V3 CLI. You must follow the instructions in steps 1 and 2 to install the required services before you start with step 3 to upload and submit Spark applications. Step 4 shows you how to create a logging instance and enable logging. Step 5 shows you how to delete an Analytics Engine instance, although this step is optional.
Create a Cloud Object Storage instance and retrieve credentials
Create an IBM Cloud Object Storage instance and retrieve the Cloud Object Storage credentials (service keys) by using the Analytics Engine Serverless CLI.
The Cloud Object Storage instance that you create is required by the IBM Analytics Engine instance as its instance home storage for Spark history events and any custom libraries or packages that you want to use in your applications. See Instance home.
For more information on how you can create a library set with custom packages that is stored in Cloud Object Storage and referenced from your application, see Using a library set.
-
Log in to IBM Cloud® using your IBM Cloud® account.
Action :Enter:
ibmcloud api <URL>
ibmcloud login
Example :Enter:
ibmcloud api https://cloud.ibm.com
ibmcloud login
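If you script these steps or your account uses federated single sign-on, you can log in non-interactively with an API key instead. A minimal sketch, assuming your API key is stored in the IBMCLOUD_API_KEY environment variable:
```sh
# Non-interactive login with an IBM Cloud API key and a target region
ibmcloud login --apikey "$IBMCLOUD_API_KEY" -r us-south
```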
-
Select the resource group. Get the list of the resource groups for your account and select one in which to create the IBM Analytics Engine serverless instance:
Action :Enter:
ibmcloud target -g RESOURCE_GROUP_NAME
Parameter values:
- RESOURCE_GROUP_NAME: The name of the resource group in which the serverless instance is to reside
Example :Enter:
ibmcloud resource groups
ibmcloud target -g default
-
Install the IBM Cloud Object Storage CLI plug-in and then the Analytics Engine V3 CLI plug-in:
Action :Enter:
ibmcloud plugin install cloud-object-storage
Action :Enter:
ibmcloud plugin install analytics-engine-v3
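To verify that both plug-ins were installed, list the installed plug-ins; cloud-object-storage and analytics-engine-v3 should appear in the output:
```sh
# Show all installed IBM Cloud CLI plug-ins and their versions
ibmcloud plugin list
```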
-
Create a Cloud Object Storage instance:
Action :Enter:
ibmcloud resource service-instance-create INSTANCE_NAME cloud-object-storage PLAN global
Parameter values:
- INSTANCE_NAME: Any name of your choice
- PLAN: The Cloud Object Storage plan to use when creating the instance
Example :Enter:
ibmcloud resource service-instance-create test-cos-object cloud-object-storage standard global
Response :The example returns:
Service instance test-cos-object was created.
Name:            test-cos-object
ID:              crn:v1:bluemix:public:cloud-object-storage:global:a/867d444f64594fd68c7ebf4baf8f6c90:ebad3176-8a1a-41f2-a803-217621bf6309::
GUID:            ebad3176-8a1a-41f2-a803-217621bf6309
Location:        global
State:           active
Type:            service_instance
Sub Type:
Allow Cleanup:   false
Locked:          false
Created at:      2021-12-27T07:57:56Z
Updated at:      2021-12-27T07:57:58Z
Last Operation:
                 Status    create succeeded
                 Message   Completed create instance operation
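If you need the ID or GUID again later, you can retrieve them without rerunning the creation call. A minimal sketch, assuming jq is installed and that --output json returns the array format used by the resource commands:
```sh
# Look up the Cloud Object Storage instance and print its CRN (ID) and GUID
ibmcloud resource service-instance test-cos-object --output json \
  | jq -r '.[0].id, .[0].guid'
```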
-
Configure the CRN. Enter the command, and when you are prompted with Resource Instance ID CRN:, paste the value of ID from the response of the Cloud Object Storage creation call in the previous step:
Action :Enter:
ibmcloud cos config crn
Resource Instance ID CRN: ID
Parameter values:
- ID: The value of ID from the response of the Cloud Object Storage creation call
Example :Enter:
ibmcloud cos config crn
Resource Instance ID CRN: crn:v1:bluemix:public:cloud-object-storage:global:a/867d444f64594fd68c7ebf4baf8f6c90:ebad3176-8a1a-41f2-a803-217621bf6309::
-
Create a Cloud Object Storage bucket:
Action :Enter:
ibmcloud cos bucket-create --bucket BUCKET_NAME [--class CLASS_NAME] [--ibm-service-instance-id ID] [--region REGION] [--output FORMAT]
Parameter values:
- BUCKET_NAME: Any name of your choice
- ID: The value of GUID from the response of the Cloud Object Storage creation call
- REGION: The IBM Cloud region in which the Cloud Object Storage instance was created
- FORMAT: Output format can be JSON or text.
Example :Enter:
ibmcloud cos bucket-create --bucket test-cos-storage-bucket --region us-south --ibm-service-instance-id ebad3176-8a1a-41f2-a803-217621bf6309 --output json
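To confirm that the bucket was created, you can list the buckets in the instance. Note that the subcommand name has varied across plug-in versions; older versions use list-buckets instead of buckets:
```sh
# List the buckets in the configured Cloud Object Storage instance
ibmcloud cos buckets
```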
-
Create Cloud Object Storage service keys:
Action :Enter:
ibmcloud resource service-key-create NAME [ROLE_NAME] ( --instance-id SERVICE_INSTANCE_ID | --instance-name SERVICE_INSTANCE_NAME | --alias-id SERVICE_ALIAS_ID | --alias-name SERVICE_ALIAS_NAME) [--service-id SERVICE_ID] [-p, --parameters @JSON_FILE|JSON_TEXT] [-g RESOURCE_GROUP] [--service-endpoint SERVICE_ENDPOINT_TYPE] [--output FORMAT] [-f, --force] [-q, --quiet]
Parameter values:
- NAME: Any name of your choice
- ROLE_NAME: Optional. The access role, for example, `Writer` or `Reader`
- SERVICE_INSTANCE_ID: The value of GUID from the response of the Cloud Object Storage creation call
- SERVICE_INSTANCE_NAME: The value of NAME from the response of the Cloud Object Storage creation call
- JSON_TEXT: The authentication to access Cloud Object Storage. Currently, only HMAC keys are supported.
Example :Enter:
ibmcloud resource service-key-create test-service-key-cos-bucket Writer --instance-name test-cos-object --parameters '{"HMAC":true}'
Response :The example returns:
Creating service key of service instance test-cos-object under account Test
OK
Service key crn:v1:bluemix:public:cloud-object-storage:global:a/183**93b485e:9ee135f9-4667-4797-8478-b20**ce-key:21a310e1-bbd6-**bf1f4 was created.
Name:        test-service-key-cos-bucket
ID:          crn:v1:bluemix:public:cloud-object-**
Created At:  Mon Dec 27 12:52:49 UTC 2021
State:       active
Credentials:
  apikey:                 3a4Ncm**o-WJGFaEzwfY
  cos_hmac_keys:
    access_key_id:        21a31**f1f4
    secret_access_key:    c5a23**b6792d3e0a6c
  endpoints:              https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints
  iam_apikey_description: Auto-generated for key crn:v1:bluemix:public:cloud-object-storage:global:a/1836f778**c93b485e:9ee**8478-b2019a4b4e20:resource-key:21a3**05a9bf1f4
  iam_apikey_name:        test-service-key-cos-bucket
  iam_role_crn:           crn:v1:bluemix:public:iam::::serviceRole:Writer
  iam_serviceid_crn:      crn:v1:bluemix:public:iam-identity::a/1836f77885e521c5ab2523aac93b485e::serviceid:ServiceId-702ca222-3615-464c-92d3-1849c03170cc
  resource_instance_id:   crn:v1:bluemix:public:cloud-object-storage:global:a/1836f7**3b485e:9ee135f9-4667-479**4e20::
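You will need the access_key_id and secret_access_key values again when you provision the Analytics Engine instance. A minimal sketch for extracting them later, assuming jq is installed and the array output format of the resource commands:
```sh
# Retrieve the service key as JSON and print the HMAC access and secret keys
ibmcloud resource service-key test-service-key-cos-bucket --output json \
  | jq -r '.[0].credentials.cos_hmac_keys | .access_key_id, .secret_access_key'
```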
Create an Analytics Engine serverless instance
Create a serverless Analytics Engine instance by using the CLI.
-
Create the Analytics Engine service instance:
Action :Enter:
ibmcloud resource service-instance-create INSTANCE_NAME ibmanalyticsengine standard-serverless-spark us-south -p @provision.json
Parameter values:
- INSTANCE_NAME: Any name of your choice
- @provision.json: Structure the JSON file as shown in the following example. Use the access and secret key from the response of the Cloud Object Storage service key creation call.
Example of the provision.json file :Sample JSON file:
{ "default_runtime": { "spark_version": "3.4" }, "instance_home": { "region": "us-south", "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud", "hmac_access_key": "<your-hmac-access-key>", "hmac_secret_key": "<your-hmac-secret-key>"} }
Example :Enter:
ibmcloud resource service-instance-create test-ae-service ibmanalyticsengine standard-serverless-spark us-south -p @ae_provision.json
Response :The example returns:
Creating service instance test-ae-service in resource group ... of account ... as ...
OK
Service instance test-ae-service was created.
Name:                test-ae-service
ID:                  crn:v1:bluemix:public:ibmanalyticsengine:us-south:a/183**aac93b485e:181ea**be1-70978**1b::
GUID:                181ea**9ee01b
Location:            us-south
State:               provisioning
Type:                service_instance
Sub Type:
Service Endpoints:   public
Allow Cleanup:       false
Locked:              false
Created at:          2022-01-03T08:40:25Z
Updated at:          2022-01-03T08:40:26Z
Last Operation:
                     Status    create in progress
                     Message   Started create instance operation
-
Check the status of the Analytics Engine service:
Action :Enter:
ibmcloud ae-v3 instance show --id INSTANCE_ID
Parameter values:
- INSTANCE_ID: The value of GUID from the response of the Analytics Engine instance creation call
Example :Enter:
ibmcloud ae-v3 instance show --id 181ea**9ee01b
Response :The example returns:
{ "default_runtime": { "spark_version": "3.4" }, "id": "181ea**9ee01b ", "instance_home": { "bucket": "do-not-delete-ae-bucket-e96**5d-b7**a82", "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud", "hmac_access_key": "**", "hmac_secret_key": "**", "provider": "ibm-cos", "region": "us-south", "type": "objectstore" }, "state": "active", "state_change_time": "**" }
Only submit your Spark application when the state of the Analytics Engine service is active.
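If you script the provisioning, you can poll until the instance becomes active before continuing. A minimal sketch, assuming jq is installed and that the instance show command prints only the JSON document shown above:
```sh
# Poll the instance every 30 seconds until its state is "active"
INSTANCE_ID="181ea**9ee01b"   # GUID from the instance creation response
while true; do
  STATE=$(ibmcloud ae-v3 instance show --id "$INSTANCE_ID" | jq -r '.state')
  echo "Instance state: $STATE"
  [ "$STATE" = "active" ] && break
  sleep 30
done
```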
Upload and submit a Spark application
Upload an application file to Cloud Object Storage and submit a Spark application.
This tutorial shows you how to add the Spark application to the Cloud Object Storage bucket that is used as instance home by the Analytics Engine instance. If you want to separate the instance-related files from the files that you use to run your applications, for example the application files themselves, data files, and any results of your analysis, you can use a different bucket in the same Cloud Object Storage instance or a different Cloud Object Storage instance.
-
Upload the Spark application file:
Action :Enter:
ibmcloud cos upload --bucket BUCKET_NAME --key KEY --file PATH [--concurrency VALUE] [--max-upload-parts PARTS] [--part-size SIZE] [--leave-parts-on-errors] [--cache-control CACHING_DIRECTIVES] [--content-disposition DIRECTIVES] [--content-encoding CONTENT_ENCODING] [--content-language LANGUAGE] [--content-length SIZE] [--content-md5 MD5] [--content-type MIME] [--metadata STRUCTURE] [--region REGION] [--output FORMAT] [--json]
Parameter values:
- BUCKET_NAME: The name of the bucket that you created earlier
- KEY: The name under which the application file is stored in the bucket
- PATH: The file name and path of the local Spark application file
Example :Enter:
ibmcloud cos upload --bucket test-cos-storage-bucket --key test-math.py --file test-math.py
Sample of test-math.py:
```python
from pyspark.sql import SparkSession
import time
import random
import cmath

def init_spark():
    spark = SparkSession.builder.appName("test-math").getOrCreate()
    sc = spark.sparkContext
    return spark, sc

def transformFunc(x):
    return cmath.sqrt(x) + cmath.log(x) + cmath.log10(x)

def main():
    spark, sc = init_spark()
    partitions = [10, 5]
    for i in range(0, 2):
        data = range(1, 20000000)
        v0 = sc.parallelize(data, partitions[i])
        v1 = v0.map(transformFunc)
        print(f"v1.count is {v1.count()}. Done")
        time.sleep(60)

if __name__ == '__main__':
    main()
```
-
Check the status of the Analytics Engine service:
Action :Enter:
ibmcloud ae-v3 instance show --id INSTANCE_ID
Parameter values:
- INSTANCE_ID: The value of GUID from the response of the Analytics Engine instance creation call
Example :Enter:
ibmcloud ae-v3 instance show --id 181ea**9ee01b
Response :The example returns:
{ "default_runtime": { "spark_version": "3.4" }, "id": "181ea**9ee01b ", "instance_home": { "bucket": "do-not-delete-ae-bucket-e96**5d-b7**a82", "endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud", "hmac_access_key": "**", "hmac_secret_key": "**", "provider": "ibm-cos", "region": "us-south", "type": "objectstore" }, "state": "active", "state_change_time": "**" }
Only submit your Spark application when the state of the Analytics Engine service is active.
-
Submit the Spark application:
Action :Enter:
ibmcloud ae-v3 spark-app submit --instance-id INSTANCE_ID --app APPLICATION_PATH
Parameter values:
- INSTANCE_ID: The value of GUID from the response of the Analytics Engine instance creation call
- APPLICATION_PATH: The file name and path to the Spark application file
Example for macOS and Linux :Enter:
ibmcloud ae-v3 spark-app submit --instance-id 181ea**9ee01b --app "cos://test-cos-storage-bucket.mycos/test-math.py" --conf '{"spark.hadoop.fs.cos.mycos.endpoint": "https://s3.direct.us-south.cloud-object-storage.appdomain.cloud", "spark.hadoop.fs.cos.mycos.access.key": "21**bf1f4", "spark.hadoop.fs.cos.mycos.secret.key": "c5a**d3e0a6c"}'
Example for Windows (not PowerShell). Note that on Windows, the quotes need to be escaped. :Enter:
ibmcloud ae-v3 spark-app submit --instance-id myinstanceid --app "cos://matrix.mycos/test-math.py" --conf "{\"spark.hadoop.fs.cos.mycos.endpoint\": \"https://s3.direct.us-south.cloud-object-storage.appdomain.cloud\", \"spark.hadoop.fs.cos.mycos.access.key\": \"mykey\", \"spark.hadoop.fs.cos.mycos.secret.key\": \"mysecret\"}"
Response :The example returns:
id      7f7096d2-5c44-4d9a-ac01-b904c7611b7b
state   accepted
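In scripts, you can capture the application ID from the submit response for later status checks. A minimal sketch that parses the two-column text output shown above, with INSTANCE_ID set as in the earlier sketch; add the same --conf argument as in the examples so the application can read from Cloud Object Storage:
```sh
# Submit the application and capture the ID from the "id <value>" line
APP_ID=$(ibmcloud ae-v3 spark-app submit --instance-id "$INSTANCE_ID" \
  --app "cos://test-cos-storage-bucket.mycos/test-math.py" \
  | awk '$1 == "id" {print $2}')
echo "Submitted application: $APP_ID"
```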
-
Check the details or status of the application that you submitted:
Action :Enter:
ibmcloud ae-v3 spark-app show --instance-id INSTANCE_ID --app-id APPLICATION_ID
Parameter values:
- INSTANCE_ID: The value of GUID from the response of the Analytics Engine instance creation call
- APPLICATION_ID: The value of id from the response of the spark-app submit call
Example :Enter:
ibmcloud ae-v3 spark-app show --instance-id 181ea**9ee01b --app-id 7f7096d2-5c44-4d9a-ac01-b904c7611b7b
Response :The example returns:
application_details   <Nested Object>
id                    7f7096d2-5c44-4d9a-ac01-b904c7611b7b
state                 finished
start_time            2022-03-01T12:58:54.000Z
finish_time           2022-03-01T13:09:14.000Z
The application might take between 2 and 5 minutes to complete.
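Because the application runs asynchronously, you can poll its state until it reaches a terminal state. A minimal sketch that parses the text output shown above, reusing INSTANCE_ID and APP_ID from the earlier sketches; the terminal state names are an assumption based on the states shown in this tutorial:
```sh
# Poll the application every 30 seconds until it reaches a terminal state
while true; do
  APP_STATE=$(ibmcloud ae-v3 spark-app show --instance-id "$INSTANCE_ID" \
    --app-id "$APP_ID" | awk '$1 == "state" {print $2}')
  echo "Application state: $APP_STATE"
  case "$APP_STATE" in
    finished|failed|stopped) break ;;
  esac
  sleep 30
done
```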
Create logging service to see logs
You can use the Analytics Engine CLI to enable logging to help you troubleshoot issues in IBM Analytics Engine. Before you can enable logging, you need to create an IBM Log Analysis service instance to which the logs are forwarded.
-
Create a logging instance:
Action :Enter:
ibmcloud resource service-instance-create NAME logdna SERVICE_PLAN_NAME LOCATION
Parameter values:
- NAME: Any name of your choice for the IBM Log Analysis service instance
- SERVICE_PLAN_NAME: The name of the service plan. For valid values, see Service plans.
- LOCATION: Locations where Analytics Engine is enabled to send logs to IBM Log Analysis. For valid locations, see Compute serverless services.
Example :Enter:
ibmcloud resource service-instance-create my-log-instance logdna 7-day us-south
Once the logging service is created, you can log in to IBM Cloud®, search for the logging service instance, and open its dashboard. There you can view the driver and executor logs, as well as all application logs for your Spark application. Search using the application_id or instance_id.
-
Enable platform logging:
To view IBM Analytics Engine platform logs, you must use the Observability dashboard in IBM Cloud to configure platform logging. See Configuring platform logs through the Observability dashboard for the steps you need to follow to enable logging through the Observability dashboard.
-
Enable logging for Analytics Engine:
Action :Enter:
ibmcloud analytics-engine-v3 log-config COMMAND [arguments...] [command options]
Parameter values:
- analytics-engine-v3: You can use ae-v3 as a short form of the v3 CLI command name
- COMMAND: Use the update command to enable logging
Example :Enter:
ibmcloud ae-v3 log-config update --instance-id 181ea**9ee01b --enable --output json
Delete Analytics Engine instance
You can use the CLI to delete an instance, for example if you need an instance with a completely different configuration to handle greater workloads.
You can retain an Analytics Engine instance as long as you want and submit your Spark applications against the same instance on an as-needed basis.
If you want to delete an Analytics Engine instance:
Action :Enter:
ibmcloud resource service-instance-delete NAME|ID [-g RESOURCE_GROUP] -f
Parameter values:
- NAME|ID: The value of Name or GUID from the response of the Analytics Engine instance creation call
- RESOURCE_GROUP: Optional parameter. The name of the resource group in which the serverless instance resides
Example :Enter:
ibmcloud resource service-instance-delete MyServiceInstance -g default -f
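To confirm that the instance was deleted, you can list the remaining service instances in the resource group; the deleted instance should no longer appear:
```sh
# List the service instances in the default resource group
ibmcloud resource service-instances -g default
```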