The configuration module

This module configures a virtual_ecosystem simulation run. It reads in a set of configuration files written in TOML. It is set up to allow a reduced set of modules to be configured (e.g. just plants and soil), and to allow specific module implementations to be configured (e.g. plants_with_hydro instead of plants). The resulting combined model configuration is validated against a set of JSON Schema. If validation passes, a combined output file can be saved as a permanent record of the model configuration. The configuration is also stored as a dictionary accessible to other modules and scripts.

Configuration files

We decided to use TOML as our configuration file format because it is easily human readable (unlike JSON), allows nesting (unlike ini), is not overly complex (unlike yaml), and is well supported in the Python ecosystem (unlike strict_yaml). An example of a TOML configuration is shown below:

[core]
[core.grid]
cell_nx = 10
cell_ny = 10

Here, the first tag indicates the module in question (e.g. core), and subsequent tags indicate (potentially nested) module-level configuration details (e.g. the horizontal grid size cell_nx).

The configuration system does not require a single input config file. Instead, the configuration can be split across a set of config files. This allows configuration files to be re-used in a modular way, so that a library of configuration options can be set up.

When a simulation is run, users can identify a set of specific configuration files or specific folders containing a set of files that should all be used. This set of files will be loaded and assembled into a complete configuration. Optionally, the configuration can include instructions to export the assembled configuration as a single file that provides a useful record of the setup for a particular simulation.

[core.data_output_options]
save_merged_config = true
output = "/output/directory"
out_merge_file_name = "merged_configuration.toml"
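The step of turning a user-supplied mix of files and folders into one list of files to load could look roughly like the sketch below. The function name collect_config_files is hypothetical, and the choices to search folders non-recursively and to accept only the .toml suffix are assumptions made for illustration.

```python
from pathlib import Path


def collect_config_files(paths):
    """Expand a mix of file and folder paths into a sorted list of TOML files."""
    files = set()
    for entry in map(Path, paths):
        if entry.is_dir():
            # Assumption: only files directly inside the folder are used.
            files.update(p.resolve() for p in entry.glob("*.toml"))
        elif entry.is_file() and entry.suffix == ".toml":
            files.add(entry.resolve())
        else:
            raise FileNotFoundError(f"Not a TOML file or folder: {entry}")
    # A set removes files named twice; sorting gives a stable load order.
    return sorted(files)
```

Resolving each path before adding it to the set means a file listed both directly and through its folder is only loaded once.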

Note that a configuration setting cannot be repeated between files, as there is no way to establish which of two values (of e.g. core.grid.cell_nx) the user intended to provide. When settings are repeated, the configuration process will report a critical error and the simulation will terminate.
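The rule against repeated settings can be sketched as a recursive merge over the dictionaries loaded from each file. The function merge_configs below is a hypothetical illustration, and raising ValueError is an assumed stand-in for the critical error reported by the actual configuration process.

```python
import copy


def merge_configs(configs):
    """Merge nested config dictionaries, rejecting any repeated setting."""
    merged = {}
    conflicts = []

    def _merge(target, source, prefix=""):
        for key, value in source.items():
            dotted = f"{prefix}{key}"
            if key not in target:
                target[key] = copy.deepcopy(value)
            elif isinstance(target[key], dict) and isinstance(value, dict):
                # Both files define the same table: merge their contents.
                _merge(target[key], value, prefix=f"{dotted}.")
            else:
                # The same setting appears twice, so the intent is ambiguous.
                conflicts.append(dotted)

    for config in configs:
        _merge(merged, config)

    if conflicts:
        raise ValueError(f"Repeated configuration settings: {', '.join(conflicts)}")
    return merged
```

Collecting every conflict before raising lets the user fix all repeated settings in one pass rather than one per run.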

Optional module loading

The config system allows for different module implementations and combinations to be configured. The choice of models to be configured is indicated by including the required model names as top-level entries in the model configuration. Note that the model name is required, even if the configuration uses all of the default settings. For example, this configuration specifies that four models are to be used alongside the core module, all with their default settings:

[core]  # optional
[soil]
[hydrology]
[plants]
[abiotic_simple]

The [core] element is optional as the Virtual Ecosystem core module is always required and the default core settings will be used if it is omitted. It can be useful to include it as a reminder that a particular configuration is intentionally using the default settings. Each module configuration section can of course be expanded to change defaults.
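As a minimal sketch of this behaviour, the requested modules can be read from the top-level keys of the assembled configuration, with core added when it has been omitted. The function name requested_modules is hypothetical and is not taken from the package.

```python
def requested_modules(config):
    """Return the module names to configure, always including core."""
    modules = list(config)  # top-level keys, in the order they were given
    if "core" not in modules:
        # The core module is always required, so add it when omitted.
        modules.insert(0, "core")
    return modules


config = {"soil": {}, "hydrology": {}, "plants": {}, "abiotic_simple": {}}
print(requested_modules(config))
```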

Warning

Note that there is no guarantee that a particular set of configured models will work in combination. You will need to look at the details of each model to understand which other modules might be required.

JSON schema

The contents of the config files are validated using JSON Schema; this is performed using the Python package jsonschema. We use these schema to validate the most basic properties of the input data (e.g. that the path to a file is a string), with more complex validation being left to downstream functions. We check for missing expected tags, unexpected tags, that tags are of the correct type, and where relevant that input values are strictly positive. Additionally, we use these schema to populate default values when tags are not provided. The schema is saved as a JSON file, which follows the pattern below:

{
   "type": "object",
   "properties": {
      "core": {
         "description": "Configuration settings for the core module",
         "type": "object",
         "properties": {
            "grid": {
               "description": "Details of the grid to configure",
               "type": "object",
               "properties": {
                  "nx": {
                     "description": "Number of grid cells in x direction",
                     "type": "integer",
                     "exclusiveMinimum": 0,
                     "default": 100
                  },
                  "ny": {
                     "description": "Number of grid cells in y direction",
                     "type": "integer",
                     "exclusiveMinimum": 0,
                     "default": 100
                  }
               },
               "default": {},
               "required": [
                  "nx",
                  "ny"
               ]
            }
         },
         "default": {},
         "required": [
            "grid"
         ]
      }
   },
   "required": [
      "core"
   ]
}

The type of every tag should be specified, with object as the type for tags that are containers for further nested tags (e.g. core). Where strictly positive values are required, this is achieved by setting exclusiveMinimum to zero. For each object, the required key specifies the tags that must be included for validation to pass.
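The schema pattern above also carries default entries, which are used to populate tags the user has not provided. The sketch below shows one way this could work in plain Python. It is a simplified stand-in written for illustration, not the mechanism the config module uses with jsonschema, and fill_defaults is a hypothetical name.

```python
import copy

# Trimmed copy of the schema pattern shown above.
schema = {
    "type": "object",
    "properties": {
        "core": {
            "type": "object",
            "properties": {
                "grid": {
                    "type": "object",
                    "properties": {
                        "nx": {"type": "integer", "exclusiveMinimum": 0, "default": 100},
                        "ny": {"type": "integer", "exclusiveMinimum": 0, "default": 100},
                    },
                    "default": {},
                }
            },
            "default": {},
        }
    },
}


def fill_defaults(schema, instance):
    """Return a copy of instance with schema defaults inserted recursively."""
    result = dict(instance)
    for name, subschema in schema.get("properties", {}).items():
        if name not in result and "default" in subschema:
            result[name] = copy.deepcopy(subschema["default"])
        # An empty {} default gives the recursion something to descend into.
        if subschema.get("type") == "object" and isinstance(result.get(name), dict):
            result[name] = fill_defaults(subschema, result[name])
    return result


print(fill_defaults(schema, {"core": {"grid": {"nx": 10}}}))
```

In this sketch an entirely empty configuration still ends up with both grid values set, because each object's empty default is inserted first and then filled from its own properties.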

We do not permit configuration tags that are not included within a schema, so the config module automatically sets additionalProperties to false for every object in the schema. The default key is used to specify the default value that should be inserted if the tag in question is not provided by the user. The default value for all objects should be set as {} to ensure that nested defaults can be found and populated. In general, we use default keys to specify relatively simple defaults (e.g. lists or single values). More complex defaults (e.g. tables of plant functional types, climate time series) are not currently supported.

The individual module schema are saved as JSON files within their respective module folders, then loaded by the module __init__.py scripts and written to the schema registry using a decorator. The config module extracts the relevant schema from the registry and combines them into a single schema in order to carry out final validation. If any of these schema are incorrectly formatted, the configuration process will fail with a critical error.
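The registry and combination steps described above might be sketched as follows. All names here (SCHEMA_REGISTRY, register_schema, combine_schemas, forbid_additional_properties) are hypothetical, and the schema are written inline rather than loaded from JSON files to keep the example self-contained.

```python
import copy

SCHEMA_REGISTRY = {}  # hypothetical registry: module name -> schema


def register_schema(module_name):
    """Decorator that stores the schema returned by a loader function."""
    def decorator(loader):
        schema = loader()
        # Reject badly formatted schema early, before any validation is run.
        if module_name not in schema.get("properties", {}):
            raise ValueError(f"Schema for {module_name} lacks a matching root tag")
        SCHEMA_REGISTRY[module_name] = schema
        return loader
    return decorator


def forbid_additional_properties(schema):
    """Recursively set additionalProperties to False on every object."""
    if schema.get("type") == "object":
        schema["additionalProperties"] = False
        for subschema in schema.get("properties", {}).values():
            forbid_additional_properties(subschema)
    return schema


def combine_schemas(module_names):
    """Combine the registered schema of the requested modules into one."""
    combined = {"type": "object", "properties": {}, "required": []}
    for name in module_names:
        module_schema = copy.deepcopy(SCHEMA_REGISTRY[name])
        combined["properties"][name] = module_schema["properties"][name]
        combined["required"].append(name)
    return forbid_additional_properties(combined)


@register_schema("core")
def core_schema():
    grid = {"type": "object", "properties": {"nx": {"type": "integer"}}, "default": {}}
    core = {"type": "object", "properties": {"grid": grid}, "default": {}}
    return {"type": "object", "properties": {"core": core}, "required": ["core"]}


@register_schema("soil")
def soil_schema():
    soil = {"type": "object", "properties": {}, "default": {}}
    return {"type": "object", "properties": {"soil": soil}, "required": ["soil"]}
```

Copying each registered schema before combining keeps the registry entries unchanged, so the same registry can serve several different module combinations.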

Final output

In addition to being saved as an output file, the configuration is returned as a simple nested dictionary so that downstream functions can make use of it.