Getting Started
As discussed throughout this guide, workflow inputs can be specified however the developer wishes. However, when an input is data read from a file, rather than a simple string or array value, that data can come from a variety of locations, both public and private.
The simplest example is to pass a public URL pointing to a STAC item as a Directory input. Provided the URL is public, the Workflow Runner will call it and download the data as a STAC Catalog, ready for processing. The URL can, of course, point to data taken directly from the DataHub Resource Catalog; for example, this Sentinel2_ARD item could be used as an input.
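As a rough illustration of this pattern, the sketch below builds an inputs payload carrying a public STAC item URL as a Directory input. The payload shape, key names (`inputs`, `input_data`), and the example URL are assumptions for illustration only, not the Workflow Runner's actual API schema.

```python
def build_execution_inputs(stac_item_url: str) -> dict:
    """Build a hypothetical execution payload passing a public STAC item
    URL as a Directory input; the Workflow Runner would fetch the URL and
    download the data as a STAC Catalog. Key names are illustrative."""
    return {
        "inputs": {
            # "input_data" is a made-up input name for this sketch
            "input_data": {"class": "Directory", "url": stac_item_url},
        }
    }

# Hypothetical Sentinel2_ARD item URL, for illustration only
payload = build_execution_inputs(
    "https://example.com/collections/sentinel2_ard/items/some-item"
)
print(payload["inputs"]["input_data"]["class"])  # Directory
```

The payload would then be sent to the execution endpoint using the workspace you are authenticated as.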
However, if you wish to pass in data from your own workspace, for example when processing private data, the Workflow Runner (WR) supports passing in data files from both S3 Object Stores and AWS Block Stores.
Note: in the following guidance, "your workspace" means the workspace you are using to execute the workflow, i.e. the one you are authenticated as when calling the execution endpoint. If you are calling a public workflow, this depends only on who is executing the workflow, not on who deployed it.
To input data from an S3 Object Store within your workspace you can follow the example here. To upload data to your S3 Object Store, you can use an S3 client with credentials generated from the DataHub here.
There are two ways to access S3 data within your workflows.
The first allows you to access S3 Objects directly within your workflow steps:
arn:aws:s3:<region>:<account-id>:accesspoint/<access-point-name>
The second will use the STAGEIN step to load the data from S3:
s3://arn:aws:s3:<region>:<account-id>:accesspoint/<access-point-name>/path/to/file
here "path/to/file" is the full file key for the item in this bucket.
To manage datasets in your workspace Block Store, you can use the AppHub (JupyterLab) application on the Hub. This allows you to upload data and create directories to organise your data as you wish. Once the files you want to use are in your Block Store, you can construct a workflow that again uses the STAGEIN step to load your input data.
/workspace/pv-<workspace-name>-workspace

Ensure your input path starts with this prefix, followed by the path to your file within your Block Store.