Architecture and concepts in serverless instances
This topic shows you the architecture of IBM Analytics Engine serverless instances and describes some key concepts and definitions.
Instance architecture
The IBM Analytics Engine service is managed by using IBM Cloud® Identity and Access Management (IAM). As an IBM Cloud account owner, you are assigned the account administrator role.
With an IBM Cloud account, you can provision and manage your serverless Analytics Engine instance by using the:
- IBM Cloud console
- CLI
- REST API
The Analytics Engine microservices in the control plane, accessed through an API gateway, handle instance creation, capacity provisioning, customization, and runtime management, while your Spark applications run in isolated namespaces in the data plane. Each Spark application that you submit runs in its own Spark cluster, which is a combination of Spark master and executor nodes. See Isolation and network access.
Each Analytics Engine instance is associated with an IBM Cloud Object Storage instance for instance-related data that is accessible to all applications that run in the instance. Currently, all Spark events are also stored in this instance. Spark application logs are aggregated to a Log Analysis log server.
Key concepts
With IBM Analytics Engine serverless instances, you can spin up Apache Spark clusters as needed and customize the Spark runtime and default Spark configuration options.
The following sections describe key concepts when provisioning serverless instances.
IBM Analytics Engine service instance
An IBM Cloud® service is a cloud extension that provides ready-for-use functionality, such as database, messaging, and web software for running code, or application management or monitoring capabilities. Services usually do not require installation or maintenance and can be combined to create applications. An instance of a service is an entity that consists of resources that are reserved for a particular application or service.
When you create an IBM Analytics Engine instance from the catalog, you give the service instance a name of your choice, select the default Spark runtime to associate with the instance, and provide the default Spark configuration to use with the instance. Additionally, you must specify the instance home, which is the storage attached to the instance for instance-related data only.
Note:
- When you create an IBM Analytics Engine service instance, no costs are incurred unless Spark applications are running or the Spark history server is accessed.
- Costs are incurred if IBM Cloud Object Storage is accessed through public endpoints, and when you enable forwarding of IBM Analytics Engine logs to IBM Log Analysis.
- There is a default limit on the number of service instances permitted per IBM Cloud® account and on the amount of CPU and memory that can be used in any given IBM Analytics Engine service instance. See Limits and quotas for Analytics Engine instances. If you need to adjust these limits, open an IBM Support ticket.
- There is no limit on the number of Spark applications that can be run in an IBM Analytics Engine service instance.
Default Spark runtime
At the time of instance provisioning, you can select the Spark version to be used. Spark 3.4 is the default version.
The runtime includes the open source Spark binaries and a default configuration that helps you quickly proceed with instance creation and run Spark applications in the instance. In addition to the Spark binaries, the runtime also includes the geospatial, data skipping, and Parquet modular encryption libraries.
Across all Spark runtime versions, you can submit Spark applications written in the following languages:
- Scala
- Python
- R
The following table shows the Spark runtime version and runtime language version.
| Spark version | Apache Spark release | Status | Supported languages |
|---|---|---|---|
| 3.1 | 3.1.2 | Removed (not supported) | Java 8, Scala 2.12, Python 3.10, and R 4.2 |
| 3.3 | 3.3.2 | Removed (not supported) | Java 11, Scala 2.12, Python 3.10, and R 4.2 |
| 3.4 | 3.4.1 | Latest | Java 11, Scala 2.12, Python 3.10, and R 4.2 |
The language versions are upgraded periodically to keep the runtime free from any security vulnerabilities. You can always override the Spark runtime version when you submit an application. For details on what to add to the payload, see Passing the runtime Spark version when submitting an application.
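As a sketch of what such an override might look like, the payload fragment below shows a `runtime` section alongside the application path. The field names and the Object Storage path are assumptions for illustration; check the IBM Analytics Engine API reference for the exact schema.

```python
# Hypothetical application-submission payload that overrides the
# instance's default Spark runtime version. Field names are assumed
# for illustration, not confirmed against the API schema.
payload = {
    "application_details": {
        "application": "cos://my-bucket.my-cos/my_app.py",  # assumed application path
        "runtime": {
            "spark_version": "3.4"  # overrides the instance default runtime
        },
    }
}
```

The payload would then be sent with an authenticated POST to the instance's applications endpoint.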
Instance home
Instance home is the storage attached to the instance for instance-related data only, such as custom application libraries and Spark history events. Currently, only IBM Cloud Object Storage is supported as instance home. This can be an instance in your IBM Cloud® account or an instance from a different account.
When you provision an instance by using the IBM Cloud console, the IBM Cloud Object Storage instances in your IBM Cloud® account are automatically discovered and displayed in a list for you to select from. If no IBM Cloud Object Storage instances are found in your account, you can use the REST APIs to update instance home after instance creation.
You can't change instance home after instance creation. You can only edit the access keys.
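Because instance home itself is fixed after creation, the only update is rotating the Object Storage access keys. The sketch below builds a hypothetical request body for such a rotation; the field names and the endpoint shown in the comment are assumptions for illustration, not the confirmed API schema.

```python
import json

# Hypothetical request body for rotating the HMAC keys of the instance
# home storage. Field names are assumed for illustration; consult the
# IBM Analytics Engine API reference for the actual schema.
new_keys = {
    "hmac_access_key": "<new-access-key>",
    "hmac_secret_key": "<new-secret-key>",
}

# The update itself would be an authenticated HTTP request against the
# instance, for example (endpoint path is an assumption):
#   PATCH https://api.<region>.ae.cloud.ibm.com/v3/analytics_engines/<instance-id>/instance_home
body = json.dumps(new_keys)
```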
Default Spark configuration
You can specify default Spark configurations when you provision an Analytics Engine instance (see Provisioning an IBM Analytics Engine serverless instance). The configurations are automatically applied to the Spark applications that are submitted on the instance. You can also update the configurations after you create the instance, by editing the Configuration section on the Analytics Engine instance details page, through the Analytics Engine REST APIs, or through the IBM Analytics Engine CLI. Values specified as instance-level defaults can be overridden when you submit Spark applications.
To learn more about the various Apache Spark configurations, see Spark Configuration.
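The precedence between instance-level defaults and per-application settings can be pictured as a simple merge in which application-level values win. A minimal sketch, with ordinary Spark configuration properties as example values; the merge illustrates the precedence only, not the service's actual implementation:

```python
# Instance-level default Spark configuration, set at provisioning time
# (example values, not service defaults).
instance_defaults = {
    "spark.driver.memory": "4g",
    "spark.eventLog.enabled": "true",
}

# Configuration passed with a specific application submission.
application_conf = {
    "spark.driver.memory": "8g",  # overrides the instance default
}

# Application-level values take precedence over instance defaults;
# unset keys fall back to the instance-level value.
effective_conf = {**instance_defaults, **application_conf}
```

Here the application runs with `spark.driver.memory` of `8g`, while `spark.eventLog.enabled` falls back to the instance default.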
Serverless instance features and execution methods
The following table shows the supported serverless instance features by access role and execution methods.
| Operation | Access roles | IBM Console | API | CLI |
|---|---|---|---|---|
| Provision instances | Administrator | Yes | Yes | Yes |
| Delete instances | Administrator | Yes | Yes | Yes |
| Grant users permission | Administrator | Yes | Yes | Yes |
| Manage instance home storage | Administrator | Yes | Yes | Yes |
| Configure logging | Administrator, Developer, DevOps | Not available | Yes | Yes |
| Submit Spark applications | Administrator, Developer | Not available | Yes | Yes |
| View list of submitted Spark applications | Administrator, Developer, DevOps | Not available | Yes | Yes |
| Stop submitted Spark applications | Administrator, Developer, DevOps | Not available | Yes | Yes |
| Customize libraries | Administrator, Developer | Not available | Yes | Not available |
| Access job logs | Administrator, Developer, DevOps | From the Log Analysis console | Not applicable | Not applicable |
| View instance details (shown details might vary depending on access role) | Administrator, Developer, DevOps | Yes | Yes | Yes |
| Manage Spark history server | Administrator, Developer | Yes | Yes | Yes |
| Access Spark history | Administrator, Developer, DevOps | Yes | Yes | Yes |