This section describes how to create Data Engineering clusters.
- In the left navigation pane, select Data Science & Engineering > Compute.
- In the All-purpose compute (1) tab, select Create with Siemens_AllPurpose_ClusterPolicy (2) from the drop-down list.
- A new cluster with a default cluster name appears.
- Change the name (1) of the cluster according to your requirement.
- Select Multi node (2).
Note: Using Single node is not recommended.
- For Access mode (3), Shared is selected by default and is not editable.
- In the Performance section, select the following options:
- Select a Databricks runtime version (1). Choose between Standard and ML as required. The latest Databricks runtime version is selected by default.
- For Worker type (2), select the required value from the drop-down list.
- For Min workers (3) and Max workers (4), enter the required values. The minimum value for Min workers is 1, and the maximum value for Max workers is 20.
- For the Driver type (5), either choose Same as worker or pick from the available types.
- Select or deselect Enable autoscaling local storage (6) as required. The options Enable autoscaling and Terminate after 30 minutes of inactivity are preselected and cannot be modified.
- In the Instance profile section, the meta-iam-role extension is preselected. It allows connectivity to the data lake.
- In the Tags section, add tags as required by providing values in the Key and Value fields. See the table below for the default tags.
- In the Advanced options section, the Enable credential passthrough for user-level data access option is preselected and cannot be edited. Keep the default configuration for all other settings under Advanced options.
- Select Create Cluster to create the cluster.
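
The choices made in the steps above can be summarized as a single cluster specification. The following is a minimal Python sketch, not the configuration the UI actually submits: the function name `build_cluster_spec` is hypothetical, the field names are assumed to follow the Databricks Clusters API, and the runtime and node type values are placeholders. Only the worker limits (1 to 20) and the 30-minute auto-termination come from this section.

```python
# Limits stated in this section; everything else below is illustrative.
MIN_WORKERS_FLOOR = 1
MAX_WORKERS_CEILING = 20
AUTO_TERMINATION_MINUTES = 30


def build_cluster_spec(name, runtime_version, worker_type, min_workers,
                       max_workers, driver_type=None,
                       autoscale_local_storage=True, tags=None):
    """Return a dict mirroring the form fields (hypothetical helper).

    Raises ValueError when the worker counts fall outside the stated limits.
    """
    if not name:
        raise ValueError("cluster name must not be empty")
    if min_workers < MIN_WORKERS_FLOOR:
        raise ValueError(f"Min workers must be at least {MIN_WORKERS_FLOOR}")
    if max_workers > MAX_WORKERS_CEILING:
        raise ValueError(f"Max workers must be at most {MAX_WORKERS_CEILING}")
    if min_workers > max_workers:
        raise ValueError("Min workers must not exceed Max workers")

    return {
        "cluster_name": name,
        "spark_version": runtime_version,
        "node_type_id": worker_type,
        # "Same as worker" is modeled by falling back to the worker type.
        "driver_node_type_id": driver_type or worker_type,
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": AUTO_TERMINATION_MINUTES,
        "enable_elastic_disk": autoscale_local_storage,
        "custom_tags": dict(tags or {}),
    }


# Example call with placeholder values (not real runtime or node type names).
example_spec = build_cluster_spec(
    name="my-data-engineering-cluster",
    runtime_version="<runtime-version>",
    worker_type="<worker-node-type>",
    min_workers=1,
    max_workers=4,
)
```

Validating the worker counts before building the specification mirrors what the form does when a value outside the allowed range is entered.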
About using cluster policies
Using a cluster policy has the following advantages:
- It avoids unnecessary costs: an inactive cluster shuts down automatically after 30 minutes.
- It prevents misconfiguration that could lead to high costs, such as selecting a large number of workers.
- It enables cost tracking through the tags applied to each cluster.
- It enforces the meta instance profile configuration, which is required for connectivity to the data lake.
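
To make these constraints concrete, the sketch below shows how a policy of this kind might be written and checked locally. It is an illustration under assumptions, not the actual definition of Siemens_AllPurpose_ClusterPolicy: the attribute paths and rule types (`fixed`, `range`) are assumed to follow the Databricks cluster policy definition format on AWS, the instance profile value is a placeholder, and the helpers `lookup` and `violations` are hypothetical. Databricks enforces policies itself; the checker exists only to show the logic.

```python
# Assumed policy rules derived from the limits in this section.
policy_definition = {
    "autotermination_minutes": {"type": "fixed", "value": 30},
    "autoscale.min_workers": {"type": "range", "minValue": 1},
    "autoscale.max_workers": {"type": "range", "maxValue": 20},
    # Placeholder: the real instance profile ARN is not given in this section.
    "aws_attributes.instance_profile_arn": {
        "type": "fixed",
        "value": "<instance-profile-arn>",
    },
}


def lookup(spec, dotted_path):
    """Return the value at a dotted path in a nested dict, or None if absent."""
    node = spec
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def violations(spec, policy):
    """List human-readable rule violations for a cluster spec (local check only)."""
    problems = []
    for path, rule in policy.items():
        value = lookup(spec, path)
        if rule["type"] == "fixed":
            if value != rule["value"]:
                problems.append(f"{path} must be {rule['value']!r}")
        elif rule["type"] == "range":
            if value is None:
                problems.append(f"{path} is required")
                continue
            if "minValue" in rule and value < rule["minValue"]:
                problems.append(f"{path} must be >= {rule['minValue']}")
            if "maxValue" in rule and value > rule["maxValue"]:
                problems.append(f"{path} must be <= {rule['maxValue']}")
    return problems


# A spec requesting too many workers yields exactly one violation.
too_large = {
    "autotermination_minutes": 30,
    "autoscale": {"min_workers": 2, "max_workers": 50},
    "aws_attributes": {"instance_profile_arn": "<instance-profile-arn>"},
}
problems_found = violations(too_large, policy_definition)
```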
Default tags
Key | Value |
---|---|
env | Prodint |
teams | Dataengineering |
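
When tags are assembled programmatically, the default tags from the table can be combined with user-supplied ones. The following Python sketch assumes that a default tag takes precedence over a user tag with the same key; this section does not state which one wins, and the helper name `merge_tags` is hypothetical.

```python
# Default tags from the table above.
DEFAULT_TAGS = {"env": "Prodint", "teams": "Dataengineering"}


def merge_tags(user_tags):
    """Combine user tags with the defaults (assumption: defaults win on conflict)."""
    merged = dict(user_tags)
    merged.update(DEFAULT_TAGS)
    return merged


# "project" is an illustrative user tag, not one defined in this section.
tags = merge_tags({"project": "demo"})
```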