Config Objects

A Config object defines and implements how Parsl connects to compute resources used to execute tasks.

The main part of the Config is a list of executors, each of which defines a type of worker.

Consider the following example, a configuration to run 7168 parallel workers on the Frontera supercomputer at TACC:

from parsl.config import Config
from parsl.providers import SlurmProvider
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
from parsl.addresses import address_by_interface

config = Config(
    executors=[
        HighThroughputExecutor(
            label="frontera_htex",
            address=address_by_interface('ib0'),
            max_workers_per_node=56,
            provider=SlurmProvider(
                partition='normal',
                nodes_per_block=128,
                init_blocks=1,
                launcher=SrunLauncher(),
            ),
        )
    ],
)

The worker options are those for HighThroughputExecutor, the baseline Executor available in Parsl. The options include a name for the workers of this type (label), how the worker connects to the main Parsl process (address), and how many workers to place on each compute node (max_workers_per_node).

The provider options define the queue used for submission (partition) and tell Parsl to request one job (init_blocks) of 128 nodes (nodes_per_block).

The launcher uses the default mechanism for starting programs on Frontera compute nodes, srun.
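The figure of 7168 parallel workers follows directly from these settings. As a minimal sketch of the arithmetic (the variable names simply mirror the configuration options and total_workers is an illustrative name, not a Parsl attribute):

```python
# Values copied from the Frontera configuration above.
max_workers_per_node = 56   # workers placed on each compute node
nodes_per_block = 128       # nodes requested per job (block)
init_blocks = 1             # jobs requested at startup

# Each node hosts 56 workers, and one block of 128 nodes is requested.
total_workers = max_workers_per_node * nodes_per_block * init_blocks
print(total_workers)  # 7168
```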

Using a Config Object

Use the Config object to start Parsl’s data flow kernel with the parsl.load method:

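import parsl
from parsl import python_app
from parsl.configs.htex_local import config

@python_app
def app(x):
    return x * 2

with parsl.load(config) as dfk:
    future = app(1)
    print(future.result())  # result() blocks until the task finishes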

The parsl.load() function creates a DataFlowKernel (“DFK”) object that maintains the workflow’s state. The DFK acquires the resources used by Parsl and should be closed to release them. While the DFK can be closed manually (dfk.cleanup()), the preferred, Pythonic route is to use it as a context manager (the with statement). Using a context manager avoids unnecessary code and ensures cleanup occurs even if the workflow fails.

The load statement can happen after Apps are defined but must occur before tasks are started. Loading the Config object within a context manager (a with statement) is recommended because the DFK is then cleaned up implicitly when the block exits.
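The guarantee that cleanup runs even when the workflow fails comes from Python's context manager protocol rather than from anything specific to Parsl. The sketch below uses a hypothetical stand-in class (StandInKernel is not part of Parsl and does not reflect how the DFK is implemented) to show that the cleanup step still executes when the body of the with block raises:

```python
class StandInKernel:
    """Hypothetical stand-in for a DFK; not part of Parsl."""

    def __init__(self):
        self.cleaned_up = False

    def cleanup(self):
        # A real kernel would release compute resources here.
        self.cleaned_up = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on normal exit and when the with block raises.
        self.cleanup()
        return False  # do not suppress the exception


kernel = StandInKernel()
try:
    with kernel:
        raise RuntimeError("workflow failed")
except RuntimeError:
    pass

print(kernel.cleaned_up)  # True
```

Calling cleanup() manually would need an explicit try/finally to give the same guarantee, which is the extra code the with statement avoids.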

A Config object cannot be used again once it has been loaded. If the application will shut down and re-launch the DFK, consider writing a function that builds a new Config each time:

from parsl.config import Config
import parsl

def make_config() -> Config:
    return Config(...)

with parsl.load(make_config()):
    ...  # Your workflow here

# Section which does not require Parsl

with parsl.load(make_config()):
    ...  # Another workflow here

Config Options

Options for the Config object apply to Parsl’s general behavior and affect all executors. Common options include:

  • run_dir for setting where Parsl writes log files

  • retries to restart failed tasks

  • usage_tracking to help Parsl by reporting how you use it

Consult the API documentation for Config or the advanced documentation to learn about the other available options.
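As a rough illustration of where these options go, the sketch below sets run_dir and retries on a small local configuration. The executor label, directory name, and retry count are arbitrary values chosen for the example, and ThreadPoolExecutor is used only so the fragment does not depend on a cluster. usage_tracking is left at its default here.

```python
from parsl.config import Config
from parsl.executors import ThreadPoolExecutor

config = Config(
    # Illustrative local executor; any executor list works here.
    executors=[ThreadPoolExecutor(label="local_threads")],
    # Directory where Parsl writes its log files (example value).
    run_dir="my_runinfo",
    # Retry each failed task up to two more times (example value).
    retries=2,
)
```

These options sit alongside executors at the top level of Config, which is why they affect every executor in the list.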

Multiple Executors

A single application can configure multiple executors.

All executors define a label field that is used to route tasks to specific workers. All types of apps include an executors option which takes a list of executor labels. For example, tasks from the following App will only run on an executor labelled “frontera_htex”.

from parsl import python_app

@python_app(executors=['frontera_htex'])
def single_threaded_task(x: int):
    return x * 2 + 1

Consider using multiple executors in the following cases:

  • Different resource requirements between tasks, such as a workflow with a simulation stage that runs on the CPU nodes of an HPC system followed by an analysis and visualization stage that runs on GPU nodes.

  • Different scales between workflow stages, such as a workflow with a “fan-out” stage of many long-running tasks running on a cluster and quick “fan-in” computations which can run on fewer nodes.