Defining new pipelines
A pipeline is a sequence of xia2/DIALS commands used to process a
Singla dataset. The result of that processing is saved in a separate
directory named after the pipeline (e.g., the processed/my_pipeline
directory). All existing pipelines are defined in the AutoED default
configuration file. Look into the defined_pipelines field in the
configuration file to understand how to define a new pipeline. When
AutoED runs a pipeline, it creates a pipeline directory and a script
file (a bash script for local processing or a JSON file for SLURM
processing). The pipeline script is generated by processing the
script field in the pipeline definition. For conventions of how to
write this field, see below. You can directly view the JSON or bash script to
check that your pipeline is generated correctly. AutoED assumes that steps
such as finding the beam position, plotting spot figures, and converting to
NeXus are done automatically, so they are not considered part of a pipeline. A
pipeline is executed only after these steps are completed.
To understand how a pipeline is defined, let us look at the definition of the
default pipeline in the configuration file.
{
    "pipeline_name": "default",
    "type": "xia2",
    "run_condition": true,
    "script": [
        "xia2 image={nexus_file}",
        "goniometer.axis=0,-1,0 dials.fix_distance=True",
        "dials.masking.d_max=9",
        "xia2.settings.remove_blanks=True",
        "input.gain={g.gain};"
    ]
},
Here, we have a field pipeline_name that defines the name of the pipeline
and, at the same time, the name of the pipeline output directory. Do not use
space or tab characters when naming a pipeline. Use the underscore character
_ instead. The field type specifies what kind of
pipeline we are defining. Currently, a pipeline can be either a dials
or an xia2 pipeline. This field is mainly used when generating reports, because
AutoED needs to know what kind of output to expect. The field run_condition
allows the pipeline to run only when certain conditions are met (for example,
some parameter is set in the global configuration file or in the local JSON
metadata file). If set to true, the pipeline will always run (assuming
it is set to run in the run_pipelines field in the global configuration
file). For more details on setting conditional pipelines, see below.
Finally, the script field defines a bash script template that is
executed when AutoED runs the pipeline.
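The field rules above can be checked before a new definition is added to the configuration file. The following is a minimal sketch, not part of AutoED: the function check_pipeline and the pipeline name my_pipeline are hypothetical, and the checks only mirror the conventions described in this section (no whitespace in the name, a type of dials or xia2, a boolean or string run condition, and a script given as a list of strings).

```python
import json

# Pipeline types named in this section.
ALLOWED_TYPES = {"dials", "xia2"}


def check_pipeline(definition):
    """Return a list of problems found in one pipeline definition (hypothetical helper)."""
    problems = []

    name = definition.get("pipeline_name")
    if not isinstance(name, str) or not name or any(ch.isspace() for ch in name):
        problems.append("pipeline_name must be a non-empty string without spaces or tabs")

    if definition.get("type") not in ALLOWED_TYPES:
        problems.append("type must be 'dials' or 'xia2'")

    # run_condition is either true/false or a condition written as a string.
    if not isinstance(definition.get("run_condition"), (bool, str)):
        problems.append("run_condition must be a boolean or a string")

    script = definition.get("script")
    if not (isinstance(script, list) and script and all(isinstance(s, str) for s in script)):
        problems.append("script must be a non-empty list of strings")

    return problems


# A hypothetical new pipeline that reuses options from the default pipeline.
pipeline_text = """
{
    "pipeline_name": "my_pipeline",
    "type": "xia2",
    "run_condition": true,
    "script": [
        "xia2 image={nexus_file}",
        "xia2.settings.remove_blanks=True",
        "input.gain={g.gain};"
    ]
}
"""
pipeline = json.loads(pipeline_text)
print(check_pipeline(pipeline))
```

An empty list means that the definition follows the conventions checked here. It does not guarantee that the xia2/DIALS command itself is valid.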
Writing the script field in a pipeline definition
The script field is a bash command template you want to run during the
pipeline execution. The field is just a list of strings. We used a
list instead of a single string to allow the user to split a long
sequence of commands into multiple lines (for better readability).
There are a few conventions you should be aware of when writing the script
field.
- All strings in the list are concatenated into a single string with spaces between them. If you define a script as
      "script": ["command1", "option_1=abc", "option_2=123"]
  it will be concatenated into
      command1 option_1=abc option_2=123
  If you have a long list of options (e.g., for a DIALS command), splitting those into separate lines is a good idea.
- If you do not want to insert a space when concatenating two strings, you can end the first string with %%. For example,
      "script": ["command1", "option_1=%%", "abc"]
  will concatenate to
      command1 option_1=abc
- Since all lines in the script list are concatenated into one, you should use the semicolon ; to explicitly terminate all bash commands.
- There is a list of variables you can use in curly brackets (just like in Python f-strings). After the script concatenation, AutoED will treat the generated string as an f-string and replace the variables in curly brackets {} with their actual values (which depend on the dataset). The available variables are the following:
  - {nexus_file} - The full path to the generated NeXus file. You would use this to import into DIALS or as an xia2 image parameter.
  - {processed_dir} - The full output path for the given pipeline (e.g., /path/to/watched/dir/processed/pipeline_name). You can use {processed_dir}/imported.expt to get the imported.expt file, etc.
  - {imported_file} - Equivalent to {processed_dir}/imported.expt mentioned above.
  - {refl_file} - Equivalent to {processed_dir}/strong.refl.
  - {m.some_field} - Access some_field in the dataset metadata JSON file. For example, if the metadata file has a field space_group, you can access it with {m.space_group}.
  - {g.some_field} - Access values in the AutoED global configuration file. For example, if the field gain is defined in the global configuration file, use {g.gain}.
  - {unit_cell} - A shortcut for {m.unit_cell[0],m.unit_cell[1],..m.unit_cell[5]}. In other words, if the field unit_cell is defined in the dataset JSON metadata file, then {unit_cell} will make a string of this field with comma-separated values (the way this parameter is provided to xia2/DIALS).
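The conventions above can be sketched in a few lines of Python. This is an illustration only and not AutoED's actual implementation: the helpers join_script and render_script are hypothetical, the substitution uses str.format (which supports the same {name} and {name.attribute} syntax), and the file path and values are made up.

```python
from types import SimpleNamespace


def join_script(lines):
    """Join script strings with spaces; a trailing %% suppresses the space."""
    out = ""
    for i, line in enumerate(lines):
        if line.endswith("%%"):
            out += line[:-2]          # drop the marker, add no space
        elif i == len(lines) - 1:
            out += line               # last string: nothing follows it
        else:
            out += line + " "
    return out


def render_script(lines, variables):
    """Concatenate the script list, then fill in the curly-bracket variables."""
    return join_script(lines).format(**variables)


# Hypothetical dataset metadata (m) and global configuration (g).
m = SimpleNamespace(unit_cell=[10, 20, 30, 90, 90, 90], space_group="P1")
g = SimpleNamespace(gain=1.0)

variables = {
    "nexus_file": "/data/sample_01/sample_01.nxs",   # made-up path
    "m": m,
    "g": g,
    # {unit_cell} is the metadata field written as comma-separated values.
    "unit_cell": ",".join(str(v) for v in m.unit_cell),
}

script = [
    "xia2 image={nexus_file}",
    "input.gain={g.gain};",
    "echo cell=%%",        # %% joins the next string without a space
    "{unit_cell};",
]

command = render_script(script, variables)
print(command)
# xia2 image=/data/sample_01/sample_01.nxs input.gain=1.0; echo cell=10,20,30,90,90,90;
```

The echo command is there only to show the %% marker and the {unit_cell} shortcut without assuming any particular xia2/DIALS option name.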
Conditional pipelines
To define a conditional pipeline, you can use the m and g variables
described above (without curly brackets) in Python conditional
expressions. You write these conditions as strings, and they should
evaluate to a boolean. For example, the condition for the user
pipeline (defined in the default configuration file) checks whether
a unit_cell or space_group field is defined in the metadata JSON
file.
"run_condition": "(m.unit_cell is not None) or (m.space_group is not None)"
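To see how such a condition behaves for different metadata files, it can be evaluated by hand. The sketch below is an assumption about the mechanism, not AutoED's code: the class Fields and the function should_run are hypothetical, and the sketch assumes that a field missing from the metadata or configuration reads as None, which is what the "is not None" test above suggests.

```python
class Fields:
    """Attribute-style access to a dict; missing fields read as None (assumption)."""

    def __init__(self, values):
        self._values = dict(values)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        return self._values.get(name)


def should_run(run_condition, metadata, global_config):
    """Evaluate a run_condition given as true/false or as a Python expression string."""
    if isinstance(run_condition, bool):
        return run_condition
    namespace = {"m": Fields(metadata), "g": Fields(global_config)}
    # eval is acceptable here only because the configuration file is trusted input.
    return bool(eval(run_condition, {"__builtins__": {}}, namespace))


condition = "(m.unit_cell is not None) or (m.space_group is not None)"

print(should_run(condition, {"space_group": "P21"}, {}))  # True
print(should_run(condition, {}, {}))                      # False
print(should_run(True, {}, {}))                           # True
```

With this condition, the pipeline runs as soon as either field is present in the metadata. Replacing or with and would require both fields.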