# TOSCA Modelling and execution

 This document is under active development and discussion! If you find errors or omissions in this document, please don’t hesitate to submit an issue or open a pull request with a fix. We also encourage you to ask questions and discuss any aspects of the project on the Gitter forum. New contributors are always welcome!

This section describes how to model a MADF or Pilot and then run the execution in the {uri-cli}[MSO4SC Orchestrator CLI].

 Working examples be found at {uri-blueprint-examples}[the resources repository].

## 1. Modelling

 The aim of this section is not to document all the possibilities available using {uri-tosca}[TOSCA], nor the modelling capabilities of Cloudify, but the common modelling of a {uri-mso4sc}[MSO4SC] application in the MSOOrchestrator. To know more about TOSCA and Cloudify modelling, visit {uri-cloudify-blueprint}.

An application is modelled using a {uri-tosca}[TOSCA] blueprint, an {uri-oasis}[OASIS] standard language to describe a topology of cloud based web services, their components, relationships, and the processes that manage them, written in yaml.

A blueprint is typically composed by a header, an inputs section, a node_templates section, and an outputs section. Optinally can have a node_types section.

The header include the TOSCA version used and other {uri-tosca}[TOSCA] imports. In {uri-mso4sc}[MSO4SC] the Cloudify 1.1.3 tosca version, built-in types and the cloudify-hcp-plugin are mandatory:

``````tosca_definitions_version: cloudify_dsl_1_3

imports:
# Cloudify built-in types.
- http://www.getcloudify.org/spec/cloudify/4.1/types.yaml
# HPC pluging
- http://raw.githubusercontent.com/MSO4SC/cloudify-hpc-plugin/master/plugin.yaml``````

Other TOSCA files can also be imported in the `inports` list to compose a blueprint made of more than one file. See Advanced: Node Types for more info.

### 1.2. Inputs

In this section is where all the inputs that the blueprint need to know to execute properly are defined. These then can be passed as an argument list in the CLI, or prefereably by an inputs file. An input can define a default value. (See the {uri-cli}[CLI docs] and the file local-blueprint-inputs-example.yaml in the {uri-blueprint-examples}[blueprint examples]).

``````inputs:
# Monitor
monitor_entrypoint:
description: Monitor entrypoint IP
default: "127.0.0.1"
type: string

# Job prefix name
job_prefix:
description: Job name prefix in HPCs
default: "mso4sc"
type: string

# CESGA FTII parameters
ft2_config:
description: FTII connection credentials
default: {}``````

In the example above, three inputs are defined:

• `monitor_entrypoint` as the Monitor IP, localhost by default.

• `job_prefix` as the prefix that will be used to prepend to the job names in the HPC. mso4sc by default.

• `ft2_config` as a Finis Terrae II configuration. See the {uri-cloudify-hpc-config}[Cloudify HPC plugin configuraton] and for more information.

 Check the file local-blueprint-inputs-example.yaml in the {uri-blueprint-examples}[blueprint examples] as an input file example.

#### 1.2.1. MSOPortal rendering & composition

To be properly presented in the portal, the inputs must be defined using the `INPUT` tag as default value. Later on, the portal will replace this tag with the real value of the input, defined by the user.

``````  job_prefix:
default:
INPUT:
name: Job's prefix
description: Job names prefix
default: "mso"
null: false
order: 150
type: string``````

In the example above, the job_prefix input will be rendered as a string box in the advanced inputs section. Its default value is `mso`, and can not be null. The `order` tab is used to position the input among the others.

All types available has a similar structure. Specifically, types `string`, `int`, `float`, `bool` and `file` are defined the same. However there are three types based on render a list, that changes a little:

``````  app_singuarity_image:
default:
INPUT:
name: Singularity Image
description: Select the singularity image to be used
order: 30
type: list
choices:
- name: OPM Flow generic
description: OPM Flow generic
uri: "shub://sregistry.srv.cesga.es/mso4sc/opm-release:latest"
file: mso4sc-opm-image.simg
default: true

ckan_input_resource:
default:
INPUT:
name: Input dataset
description: Dataset resource that will be used as input
type: resource_list
order: 100
storage:
type: ckan
config:
entrypoint: http://...
key: 1234

ckan_outputs_dataset:
default:
INPUT:
name: Outputs dataset
description: Dataset in which outputs will be published
type: dataset_list
order: 100
storage:
type: ckan
config:
entrypoint: http://...
key: 1234``````

The `list` type uses the tag choices instead of default, to define a list of possible items. The item with `default: true` will be the one selected as default. The portal will replace the INPUT tag with the entire value of the selected item, with all its hierarchy.

Similarly, the `resource_list` and `dataset_list` shows a list of items regarding datasets. The tag that replaces default is in this case the storage config. In the case of `resource_list`, a secondary list of resources will be presented to select, depending on the dataset selected in the first list.

Finally, there is another tag `REPLACE` that it is used to compose and reuse inputs values. It won’t be rendered in the portal.

``````  singularity_image_uri:
default:
REPLACE: app_singuarity_image.uri
singularity_image_filename:
default:
REPLACE: app_singuarity_image.file``````

There are three objects that the portal directly provides to the application developer, that hold information about the user settings and current deployment:

• `INFRA_CONFIG` holds a dictionary of lists, one for each type of infrastructure (`hpc_list`, `openstack_list`, `eosc_list`)

• `USER_CONFIG` holds the storage list of the user

• `INSTANCE_CONFIG` defines the id of the current deployment

`````` primary_hpc:
default:
INPUT:
name: Primary HPC
description: Select the HPC to be used
order: 10
type: list
choices:
REPLACE: INFRA_CONFIG.hpc_list

ckan_outputs_dataset:
default:
INPUT:
name: Outputs dataset
description: Dataset in which outputs will be published
type: dataset_list
order: 100
storage:
REPLACE: USER_CONFIG.storage_list.0

instance_id:
default:
REPLACE: INSTANCE_CONFIG.id``````

### 1.3. Node Templates

In the `node_templates` section is where your application is actually defined, by stablishing nodes and relations between them.

To begin with, every node is identified by its name (`ft2_node` in the example below), and a type is assigned to it. In the case of {uri-mso4sc}[MSO4SC] each node represents either a computation infrastructure (e.g a HPC), or a job. See {uri-cloudify-hpc-types}[Cloudify HPC plugin types] for more information and examples.

HPC node example
``````node_templates:
ft2_node:
type: hpc.nodes.Compute
properties:
config: { get_input: ft2_config }
monitor_entrypoint: { get_input: monitor_entrypoint }
monitor_orchestrator_available: True
job_prefix: { get_input: job_prefix }``````

The example above represents the Finis Terrae II HPC, of type `hpc.nodes.Compute`. All HPC infrastructures must be of this type. Then the HPC is configured using the inputs (using fuction `get_input`). Detailed information about how to configure the HPCs is in the {uri-cloudify-hpc-config}[Cloudify HPC plugin configuraton].

The following code uses `ft2_node` to describe four jobs that should run in the Finis Terrae II. Two of them are of type `hpc.nodes.singularity_job` which means that the job will run using a {uri-singularity}[Singularity] container, while the other two of type `hpc.nodes.job` describe jobs that are going to run directly in the HPC using Slurm commands. Navigate to {uri-cloudify-hpc-types}[Cloudify HPC plugin types] to know more about each parameter.

Four jobs example
``````    first_job:
type: hpc.nodes.singularity_job
properties:
job_options:
modules:
- gcc/5.3.0
- openmpi/1.10.2
- singularity/2.3.1
partition: 'thin-shared'
image: '$LUSTRE/openmpi_1.10.7_ring.img' volumes: - '/scratch' command: 'ring > fourth_example_1.test' nodes: 1 tasks: 1 tasks_per_node: 1 max_time: '00:01' deployment: bootstrap: 'scripts/singularity_bootstrap_example.sh' revert: 'scripts/singularity_revert_example.sh' inputs: - 'first_job' relationships: - type: job_contained_in_hpc target: ft2_node second_parallel_job: type: hpc.nodes.job properties: job_options: type: 'SRUN' modules: - gcc/5.3.0 partition: 'thin-shared' command: 'touch fourth_example_2.test' nodes: 1 tasks: 1 tasks_per_node: 1 max_time: '00:01' deployment: bootstrap: 'scripts/bootstrap_example.sh' revert: 'scripts/revert_example.sh' inputs: - 'second_job' relationships: - type: job_contained_in_hpc target: ft2_node - type: job_depends_on target: first_job third_parallel_job: type: hpc.nodes.singularity_job properties: job_options: modules: - gcc/5.3.0 - openmpi/1.10.2 - singularity/2.3.1 partition: 'thin-shared' image: '$LUSTRE/openmpi_1.10.7_ring.img'
volumes:
- '/scratch'
command: 'ring > fourth_example_3.test'
nodes: 1
max_time: '00:01'
deployment:
bootstrap: 'scripts/singularity_bootstrap_example.sh'
revert: 'scripts/singularity_revert_example.sh'
inputs:
- 'third_job'
relationships:
- type: job_contained_in_hpc
target: ft2_node
- type: job_depends_on
target: first_job

fourth_job:
type: hpc.nodes.job
properties:
job_options:
type: 'SBATCH'
command: "touch.script fourth_example_4.test"
deployment:
bootstrap: 'scripts/bootstrap_sbatch_example.sh'
revert: 'scripts/revert_sbatch_example.sh'
inputs:
- 'fourth_job'
relationships:
- type: job_contained_in_hpc
target: ft2_node
- type: job_depends_on
target: second_parallel_job
- type: job_depends_on
target: third_parallel_job``````

Finally, jobs have two types of relationships: job_contained_in_hpc, to stablish in which HPC the job is going to run, and job_depend_on, to describe the dependency between jobs. In the example above, `fourth_job` depends on `three_parallel_job` and `second_parallel_job`, so it will not execute until the other two have finished. In the same way, `three_parallel_job` and `second_parallel_job` depends on `first_job`, so they will run in parallel once the first job is finished. All jobs are contained in `ft2_node`, so they will run on the Finis Terrae II using the credentials provided. See {uri-cloudify-hpc-relationships}[Cloudify HPC plugin relationships] for more information.

### 1.4. Outputs

The last section, `outputs`, helps to publish different attributes of each node that can be retrieved after the install workflow of the blueprint has finished (See Execution). In the MSO4SC case the attributes are not being used right now, and only the name of each job can be publish. However this can change in the future.

Each output has a name, a description, and value.

``````outputs:
first_job_name:
description: first job name
value: { get_attribute: [first_job, job_name] }
second_job_name:
description: second job name
value: { get_attribute: [second_parallel_job, job_name] }
third_job_name:
description: third job name
value: { get_attribute: [third_parallel_job, job_name] }
fourth_job_name:
description: fourth job name
value: { get_attribute: [fourth_job, job_name] }``````

Similarly to how `node_templates` are defined, new node types can be defined to be used as types. Usually these types are going to be defined in a separate file and imported in the blueprint through the `import` keyword in the Header section, although they can be in the same file. In {uri-mso4sc}[MSO4SC] MADFs are going to be described in this way.

``````node_types:
hpc.nodes.fenics_iter:
derived_from: hpc.nodes.job
properties:
iter_number:
description: Iteration index (two digits string)
job_options:
default:
type: 'SBATCH'
modules:
- 'gcc/5.3.0'
- 'impi'
- 'petsc'
- 'parmetis'
- 'zlib'
command: { concat: ['/mnt/lustre/scratch/home/otras/ari/jci/wing_minimal/fenics-hpc_hpfem/unicorn-minimal/nautilus/fenics_iter.script ', ' ', { get_property: [SELF, iter_number] }] }

hpc.nodes.fenics_post:
derived_from: hpc.nodes.job
properties:
iter_number:
description: Iteration index (two digits string)
file:
description: Input file for dolfin-post postprocessing
job_options:
default:
type: 'SBATCH'
modules:
- 'gcc/5.3.0'
- 'impi'
- 'petsc'
- 'parmetis'
- 'zlib'
command: { concat: ['/mnt/lustre/scratch/home/otras/ari/jci/wing_minimal/fenics-hpc_hpfem/unicorn-minimal/nautilus/post.script ', { get_property: [SELF, iter_number] }, ' ', { get_property: [SELF, file] }] }``````

Above there is dummy example of two new types of a MADF, derived from `hpc.nodes.job`.

The first type, `hpc.nodes.fenics_iter`, simulates an iteration of the FEniCS MADF. A new property has been defined, `iter_number`, with a description and no default value (so it is mandatory). Besides the `job_options` property default value has been overriden with a concrete list of modules, job type, and a command.

The second type, `hpc.nodes.fenics_post`, described a simulated postprocessing operation of FEniCS, defining again the `iter_number` property and another one `file`. Finally the job options default value has been overriden with a list of modules, a SBATCH type, and a command.

 The commands are built using the functions `concat` and `get_property`. This allows the orchestrator to compose the command based on other properties. See the {uri-cloudify-functions}[intrinsic functions] available for more information.

## 2. Execution

Execution of an application is performed through the {uri-cli}[MSOOrchestrator-CLI] in your local machine or a host of your own.

### 2.1. Requirements

• MSOOrchestrator CLI with connection to the MSOOrchestrator. See {uri-cli-installation}[the docs] to prepare the CLI in your local machine or remote host.

• TOSCA Blueprint. See Modelling to create your blueprint.

### 2.2. Services Information

• Finis Terrae II

• Host: ft2.cesga.es

• SZE Cluster

• Host: mso.tilb.sze.hu

• Country TimeZone: Europe/Budapest

• MSOOrchestrator: 193.144.35.131

• MSOMonitor

• Host: 193.144.35.146

• Port: 9090

• Type: PROMETHEUS

• Orchestrator Port: 8079

### 2.3. Steps

Before doing anything, the blueprint we want to execute needs to be uploaded in the orchestrator with an assigned name.

`cfy blueprints upload -b [BLUEPRINT-NAME] [BLUEPRINT-FILE].yaml`

2. Create a deployment

Once we have a blueprint installed, we create a deployment, which is a blueprint with an input file attached. This is usefull to have the same blueprint that represents the application, with different configurations (deployments). A name has to be assigned to it as well.

`cfy deployments create -b [BLUEPRINT-NAME] -i [INPUTS-FILE].yaml --skip-plugins-validation [DEPLOYMENT-NAME]`

 `--skip-plugins-validation` is mandatory as we want that the orchestrator download the plugin from a source location (GitHub in our case). This is for testing purposes, and will be removed in future releases of MSO4SC.
3. Install a deployment

Install workflow puts everything in place to run the application. Usual tasks in this workflow are data movements, binary downloads, HPC configuration, etc.

`cfy executions start -d [DEPLOYMENT-NAME] install`

4. Run the application

Finally to start the execution we run the `run_jobs` workflow to start sending jobs to the different infrastructures. The execution can be followed in the output.

`cfy executions start -d [DEPLOYMENT-NAME] run_jobs`

 The CLI has a timeout of 900 seconds, which normally is not enough time for an application to finish. However, if the CLI timeout, the execution will still be running on the MSOOrchestrator. To follow the execution just follow the instructions in the output.

#### 2.3.1. Revert previous Steps

The following revert the steps above, in order to uninstall the application, recreate the deployment with new inputs, or remove the blueprint (and possibly upload an updated one), follow the following steps.

1. Uninstall a deployment

On the contraty of the install workflow, in this case the orchestrator is tipically goint to perform the revert operation of install, by deleting execution files or moving data to an external location.

`cfy executions start -d [DEPLOYMENT-NAME] uninstall -p ignore_failure=true`

 The `ignore_failure` parameter is optional, to perform the uninstall even if an error occurs.
2. Remove a deployment

`cfy deployments delete [DEPLOYMENT-NAME]`

3. Remove a blueprint

`cfy blueprints delete [BLUEPRINT-NAME]`

#### 2.3.2. Troubleshooting

If an error occurs the revert steps can be followed revert the last steps made. However there are sometimes when the execution is stucked, or you want simply to cancel a runnin execution, or clear a blueprint or deployment that can be uninstall for whatever the reason. The following commands help you resolve these kind of situations.

1. See executions list and status

`cfy executions list`

2. Check one execution status

`cfy executions get [EXECUTION-ID]`

3. Cancel a running (started) execution

`cfy executions cancel [EXECUTION-ID]`

4. Hard remove a deployment with all its executions and living nodes

`cfy deployments delete [DEPLOYMENT-NAME] -f`

 A complete list of CLI commands can be found at {uri-cloudify-cli}