Task Reference
Documentation for tasks implementing the workflow_manager EachItemTask or AllItemTask interfaces, along with the workflow_manager internal tasks.
Workflow Manager Tasks
Internal workflow_manager tasks.
- lung_modelling.workflow_manager.initialize(dataset_root, task_config, show_progress=True)
Initialization task for processing a dataset directory structure. This is run first when WorkflowManager.run_workflow is called. It loads the dataset config and runs gather_directories, which gathers the list of source directories that tasks will act on.
- Parameters:
dataset_root (Path) – Root directory of the dataset
task_config (DictConfig) –
Configuration dict for this task
- params
- dataset_config_filename
Filename for the dataset configuration file. This should be directly inside the dataset_root directory.
- use_directory_index
Option to use a pre-built index of the source directory instead of iterating through it with os.walk.
- skip_dirs:
List of glob strings to match directories to skip. The whole path relative to the dataset_root is tested, so slashes can be included to specify the depth to match. This takes precedence over select_dirs.
- select_dirs:
List of glob strings to match directories to select. If empty, all valid source directories are selected. If not, only valid source directories that match one of these are selected.
show_progress – Option to show progress
- Returns:
dataloc – DatasetLocator object
dataset_config – DatasetConfig object
dirs_list – Record of data folders with list of files
- Return type:
Tuple[DatasetLocator, DictConfig, list]
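The interaction between skip_dirs and select_dirs can be sketched as follows. This is a minimal illustration of the documented precedence only, not the gather_directories implementation: the helper name select_directories and the directory names are hypothetical, and fnmatch-style matching (where * also matches slashes) is an assumption.

```python
from fnmatch import fnmatchcase


def select_directories(candidates, skip_dirs, select_dirs):
    """Filter relative directory paths using skip and select glob lists.

    Hypothetical helper: it only illustrates the documented precedence rules.
    """
    selected = []
    for rel_path in candidates:
        # skip_dirs is tested first, so it takes precedence over select_dirs.
        if any(fnmatchcase(rel_path, pattern) for pattern in skip_dirs):
            continue
        # An empty select_dirs list means every non-skipped directory is kept.
        if select_dirs and not any(fnmatchcase(rel_path, p) for p in select_dirs):
            continue
        selected.append(rel_path)
    return selected


# Hypothetical paths, relative to dataset_root.
dirs = ["subject_01/scan_a", "subject_02/scan_a", "subject_02/scan_b"]
print(select_directories(dirs, skip_dirs=["*/scan_b"], select_dirs=["subject_02/*"]))
# ['subject_02/scan_a']
```

Because the whole relative path is tested, a pattern such as */scan_b targets a specific depth, while a bare scan_b would match nothing in this sketch.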
- lung_modelling.workflow_manager.log_workflow(dataloc, cfg, task_config, task_name, results)
Workflow task for logging the result of a workflow. Runs after all tasks are complete in WorkflowManager.run_workflow.
- Parameters:
dataloc (DatasetLocator) – Dataset locator
cfg (DictConfig) – DatasetConfig object
task_config (DictConfig) – Configuration dict for this task. No parameters are currently in use for this task_config.
task_name – Name of this task
results – Results of workflow tasks to log
Standard Tasks
Tasks not using the shapeworks library as a dependency.
Shapeworks Tasks
Tasks using the shapeworks library as a dependency.
- class lung_modelling.app.shapeworks_tasks.ExtractLungLobesSW(name, config)
- static initialize(dataloc, dataset_config, task_config)
- Parameters:
dataloc (DatasetLocator) –
dataset_config (DictConfig) –
task_config (DictConfig) –
- static work(source_directory_primary, source_directory_derivative, output_directory, dataset_config, task_config, initialize_result=None)
Pre-process lung lobe images by applying antialiasing using Shapeworks libraries.
- Parameters:
source_directory_primary (Path) – Absolute path of the source directory in the primary folder of the dataset
source_directory_derivative (Path) – Absolute path of the source directory in the derivative folder of the dataset
output_directory (Path) – Absolute path of the directory in which to save results of the work function
dataset_config (DictConfig) – Config relating to the entire dataset
task_config (DictConfig) –
results_directory: Name of the results folder (Stem of output_directory)
output_filenames: dict providing a mapping from lobe mapping (in dataset config) to output filenames
- params: (Dict)
- maximumRMSError, numberOfIterations:
Parameters to apply to antialias
- Returns:
smoothed_lobes – list of Path objects representing the files created.
- Return type:
list[Path]
- class lung_modelling.app.shapeworks_tasks.CreateMeshesSW(name, config)
- static initialize(dataloc, dataset_config, task_config)
- Parameters:
dataloc (DatasetLocator) –
dataset_config (DictConfig) –
task_config (DictConfig) –
- static work(source_directory_primary, source_directory_derivative, output_directory, dataset_config, task_config, initialize_result=None)
Convert medical image files to meshes and apply smoothing using Shapeworks libraries.
To meet the shapeworks requirement that all groomed files have unique names, the top level parent folder (which should be the subject number) is appended to the file names, delimited by a dash character (-).
- Parameters:
source_directory_primary (Path) – Absolute path of the source directory in the primary folder of the dataset
source_directory_derivative (Path) – Absolute path of the source directory in the derivative folder of the dataset
output_directory (Path) – Absolute path of the directory in which to save results of the work function
dataset_config (DictConfig) – Config relating to the entire dataset
task_config (DictConfig) –
source_directory: subdirectory within derivative source folder to find source files
results_directory: Name of the results folder (Stem of output_directory)
- params: (Dict)
- step_size
Step size to use for marching cubes. Higher values result in coarser geometry but can prevent meshes from taking up too much RAM
- decimate, decimate_target_faces, volume_preservation, subdivide_passes
Option to decimate and parameters for pyvista mesh decimate. If subdivide_passes is greater than 0, the mesh is decimated first, then subdivided. The initial decimation is calculated such that the result after subdivision is the target number of faces.
- remesh, remesh_target_points, adaptivity:
Option to remesh and parameters for shapeworks remesh
- smooth, smooth_iterations, relaxation:
Option to smooth and parameters for shapeworks smooth
- fill_holes, hole_size:
Option to fill holes and parameters for shapeworks fill_holes
- remove_shared_faces:
Option to remove duplicate faces in the mesh
- isolate_mesh:
Option to remove islands, leaving only the largest connected region in the mesh. If use_geodesic_distance is selected in the optimizer, it is essential that the mesh does not have islands.
- Returns:
mesh_files – list of Path objects representing the files created.
- Return type:
list[Path]
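Two of the rules above (the unique file naming and the decimation target used before subdivision) can be sketched in plain Python. Both helpers are hypothetical and are not taken from the task's source. The sketch assumes the subject folder is appended to the file stem before the extension, and that each subdivision pass quadruples the number of triangular faces, which holds for the usual triangle subdivision schemes but is not confirmed for this task.

```python
from pathlib import PurePosixPath


def unique_mesh_name(relative_source_dir, filename):
    """Append the top-level parent folder to the file stem with a dash."""
    # The first path component is assumed to be the subject number.
    subject = PurePosixPath(relative_source_dir).parts[0]
    name = PurePosixPath(filename)
    return f"{name.stem}-{subject}{name.suffix}"


def initial_decimation_target(target_faces, subdivide_passes, faces_per_pass=4):
    """Face count to decimate to so that subdivision ends near target_faces.

    Assumes every pass multiplies the face count by faces_per_pass.
    """
    return max(1, round(target_faces / faces_per_pass ** subdivide_passes))


# Hypothetical directory and file names.
print(unique_mesh_name("subject_01/scan_a", "left_lung.vtk"))  # left_lung-subject_01.vtk
print(initial_decimation_target(100_000, subdivide_passes=2))  # 6250
```

With subdivide_passes set to 0 the second helper returns the target unchanged, matching the case where the mesh is only decimated.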
- class lung_modelling.app.shapeworks_tasks.ExtractWholeLungsSW(name, config)
- static initialize(dataloc, dataset_config, task_config)
- Parameters:
dataloc (DatasetLocator) –
dataset_config (DictConfig) –
task_config (DictConfig) –
- static work(source_directory_primary, source_directory_derivative, output_directory, dataset_config, task_config, initialize_result=None)
Pre-process lung segmentation by extracting whole lung and applying antialiasing using Shapeworks libraries.
- Parameters:
source_directory_primary (Path) – Absolute path of the source directory in the primary folder of the dataset
source_directory_derivative (Path) – Absolute path of the source directory in the derivative folder of the dataset
output_directory (Path) – Absolute path of the directory in which to save results of the work function
dataset_config (DictConfig) – Config relating to the entire dataset
task_config (DictConfig) –
results_directory: Name of the results folder (Stem of output_directory)
output_filenames: dict providing a mapping from lobe mapping (in dataset config) to output filenames
- params: (Dict)
- maximumRMSError, numberOfIterations:
Parameters to apply to antialias
- Returns:
smoothed_lobes – list of Path objects representing the files created.
- Return type:
list[Path]
- class lung_modelling.app.shapeworks_tasks.ReferenceSelectionMeshSW(name, config)
- static work(dataloc, dirs_list, output_directory, dataset_config, task_config)
Load all meshes at once so the shape closest to the mean can be found and selected as the reference.
The subject that was used as the reference is indicated in the output filename, enclosed in parentheses.
- Parameters:
dataloc (DatasetLocator) – Dataset locator for the dataset
dirs_list (list) – List of relative paths to the source directories
output_directory (Path) – Absolute path of the directory in which to save results of the work function
dataset_config (DictConfig) – Config relating to the entire dataset
task_config (DictConfig) –
source_directories: subdirectories within derivative source folder to find source files
results_directory: Name of the results folder (Stem of output_directory)
- Returns:
List of reference mesh filenames. The first element is the combined reference mesh, and the following elements are the domain reference meshes.
- Return type:
list[Path]
- class lung_modelling.app.shapeworks_tasks.MeshTransformSW(name, config)
- static initialize(dataloc, dataset_config, task_config)
Load reference meshes and convert them to points and faces so they can be pickled and sent to the work function.
- Parameters:
dataloc (DatasetLocator) – Dataset Locator
dataset_config (DictConfig) – Dataset config
task_config (DictConfig) –
- source_directory_initialize
directory from which to load reference meshes
- Returns:
Dict of reference meshes – {reference_meshes:{points:[points], faces:[faces]}}
- Return type:
dict
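The reason for converting meshes to points and faces is that plain nested lists survive pickling, whereas wrapped mesh objects may not. The following sketch mirrors the documented return layout using only the standard library. The helper name to_picklable and the input format (a list of (points, faces) pairs) are assumptions made for illustration.

```python
import pickle


def to_picklable(meshes):
    """Convert a list of (points, faces) pairs into nested plain lists."""
    return {
        "reference_meshes": {
            "points": [[list(p) for p in points] for points, _ in meshes],
            "faces": [[list(f) for f in faces] for _, faces in meshes],
        }
    }


# One hypothetical reference mesh: a single triangle.
triangle = ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(0, 1, 2)])
payload = to_picklable([triangle])

# A pickle round trip stands in for sending the result to a worker process.
restored = pickle.loads(pickle.dumps(payload))
print(restored == payload)  # True
```

The work function would then rebuild mesh objects from initialize_result before use.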
- static work(source_directory_primary, source_directory_derivative, output_directory, dataset_config, task_config, initialize_result=None)
Calculate alignment transforms for global alignment and per domain alignment. Uses shapeworks rigid alignment for both.
- Parameters:
source_directory_primary (Path) – Absolute path of the source directory in the primary folder of the dataset
source_directory_derivative (Path) – Absolute path of the source directory in the derivative folder of the dataset
output_directory (Path) – Absolute path of the directory in which to save results of the work function
dataset_config (DictConfig) – Config relating to the entire dataset
task_config (DictConfig) –
source_directories: subdirectories within derivative source folder to find source files
results_directory: Name of the results folder (Stem of output_directory)
- params: (Dict)
- iterations
Iterations for shapeworks alignment function
initialize_result – Return dict from the initialize function
- Returns:
List of transform filenames. The first is for the global alignment, and the following are for domain alignments.
- Return type:
list[Path]
- class lung_modelling.app.shapeworks_tasks.OptimizeMeshesSW(name, config)
- static work(dataloc, dirs_list, output_directory, dataset_config, task_config)
Run the shapeworks optimize command.
Note: the path to the project file cannot contain spaces.
Todo: This should return the files it creates (at least the shapeworks project…)
- Parameters:
dataloc (DatasetLocator) – Dataset Locator
dirs_list – List of relative paths to the source directories
output_directory (Path) – Absolute path of the directory in which to save results of the work function
dataset_config (DictConfig) – Dataset config
task_config (DictConfig) –
- source_directory_transform
directory for transform files
- source_directories_mesh
directories for mesh files
- source_directories_original
directories for original (pre-grooming) files
- source_directory_subject_data
optional source directory to add subject groups to shapeworks project
- image_globs
glob to find original files
- results_directory:
Name of the results folder (Stem of output_directory)
- params
shapeworks optimization params
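Since the path to the project file cannot contain spaces, it can help to fail early with a clear message before the optimizer is launched. The guard below is a hypothetical sketch and not part of the task. The function name and example paths are illustrative only.

```python
from pathlib import Path


def check_project_path(project_file):
    """Return the path unchanged, or raise if it contains whitespace."""
    path = Path(project_file)
    if any(ch.isspace() for ch in str(path)):
        raise ValueError(f"Project path must not contain spaces: {path}")
    return path


# Hypothetical project locations.
print(check_project_path("results/optimize/project.swproj"))
try:
    check_project_path("my results/optimize/project.swproj")
except ValueError as err:
    print(err)
```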
- class lung_modelling.app.shapeworks_tasks.ComputePCASW(name, config)
- static work(dataloc, dirs_list, output_directory, dataset_config, task_config)
Generate PCA model from the optimized shapeworks particle system. The complete PCA model is written to the results directory (Eigenvalues, eigenvectors, mean coordinates, and subject scores). The mean mesh is also computed by warping a subject mesh to the PCA mean points.
- Parameters:
dataloc (DatasetLocator) – Dataset Locator
dirs_list (list) – List of relative paths to the source directories
output_directory (Path) – Absolute path of the directory in which to save results of the work function
dataset_config (DictConfig) – Dataset config
task_config (DictConfig) –
- source_directory
source directory of the shapeworks particle system
- source_directory_reference
source directory for the reference shape
- results_directory:
Name of the results folder (Stem of output_directory)
- mesh_file_domain_name_regex:
Regex pattern to match domain name within mesh file name
- params
- warp_mesh_subject_id
Index of subject mesh to use as the warp mesh
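As an illustration of how mesh_file_domain_name_regex might be applied, the sketch below pulls a domain name out of a mesh file name with the standard re module. The file names, the pattern, and the choice to use the whole match as the domain name are assumptions. The task itself may interpret the pattern differently.

```python
import re


def domain_name(mesh_filename, pattern):
    """Return the part of the file name matched by the domain regex."""
    match = re.search(pattern, mesh_filename)
    if match is None:
        raise ValueError(f"No domain name found in {mesh_filename!r}")
    return match.group(0)


# Hypothetical file names of the form <domain>-<subject>.vtk.
pattern = r"^[^-]+"
files = ["left_lung-subject_01.vtk", "right_lung-subject_01.vtk"]
print([domain_name(f, pattern) for f in files])
# ['left_lung', 'right_lung']
```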
- class lung_modelling.app.shapeworks_tasks.SubjectDataPCACorrelationSW(name, config)
- static work(dataloc, dirs_list, output_directory, dataset_config, task_config)
Find correlations between subject data and PCA components. Each mode is analysed separately. First, RFECV is used to find the highest-scoring set of subject data parameters for each mode. Then an F-test is run to determine the statistical significance of the model. Significant models are written as pickles, along with the regression analysis statistics.
- Parameters:
dataloc (DatasetLocator) – Dataset Locator
dirs_list (list) – List of relative paths to the source directories
output_directory (Path) – Absolute path of the directory in which to save results of the work function
dataset_config (DictConfig) – Dataset config
task_config (DictConfig) –
- source_directory_pca
source directory of the PCA model
- source_directory_subject_data
source directory in which to find subject data
- results_directory:
Name of the results folder (Stem of output_directory)
- subject_data_filename
Filename of the subject data file
- subject_data_keys
List of keys in the subject data to test for correlation with PCA modes
- study_phase
Phase of the COPDGene study to draw subject data from
- params
- percent_variability
Threshold for cumulative proportion of variability. The lowest N modes that meet this threshold are selected for inclusion
- ftest_significance
Significance threshold for the F-test (the test of statistical significance for each linear regression model).
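The percent_variability threshold can be read as "keep the fewest leading modes whose cumulative share of variance reaches the threshold". The sketch below shows that arithmetic under two assumptions not stated in the reference: eigenvalues are sorted in descending order, and the threshold is expressed as a fraction between 0 and 1. The function name is hypothetical.

```python
from itertools import accumulate


def modes_for_variability(eigenvalues, percent_variability):
    """Smallest number of leading modes whose cumulative variance share
    reaches the threshold (eigenvalues assumed sorted in descending order)."""
    total = sum(eigenvalues)
    for n, cumulative in enumerate(accumulate(eigenvalues), start=1):
        if cumulative / total >= percent_variability:
            return n
    return len(eigenvalues)


# Hypothetical eigenvalues: cumulative shares are 0.5, 0.8, 0.95, 1.0.
eigenvalues = [5.0, 3.0, 1.5, 0.5]
print(modes_for_variability(eigenvalues, 0.9))  # 3
```

Only the selected modes would then be passed on to the per-mode regression analysis.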
- class lung_modelling.app.shapeworks_tasks.GenerateMeshesMatchingSubjectsSW(name, config)
- static work(dataloc, dirs_list, output_directory, dataset_config, task_config)
Use pre-built PCA and linear regression models to generate meshes that estimate the current subjects in dirs_list using the specified subject data. Then calculate the RMS error between reference and predicted meshes, and between reference and mean meshes, for each domain.
Note: this could be an EachItemTask with an initialize step if initialize received the dirs list.
- Parameters:
dataloc (DatasetLocator) – Dataset Locator
dirs_list (list) – List of relative paths to the source directories
output_directory (Path) – Absolute path of the directory in which to save results of the work function
dataset_config (DictConfig) – Dataset config
task_config (DictConfig) –
- source_directory_pca
source directory of the PCA model
- source_directory_subject_data
Source directory in which to find subject data
- source_directory_linear_model
Source directory in which to find linear regression model pickles
- source_directories_reference_mesh
List of directory names in which to search for reference meshes
- mesh_file_domain_name_regex:
Regex pattern to match domain name within mesh file name
- results_directory:
Name of the results folder (Stem of output_directory)
- subject_data_filename
Filename of the subject data file
- subject_data_keys
List of keys in the subject data to test for correlation with PCA modes
- study_phase
Phase of the COPDGene study to draw subject data from
- params
- alignment_iterations
Iterations for rigid alignment between reference meshes and predicted and mean meshes
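The RMS error comparison can be sketched as follows, assuming the two meshes are already aligned and share point-to-point correspondence. That is an assumption: the task may instead measure distances to the closest surface point. The function name and the sample coordinates are illustrative.

```python
import math


def rms_error(reference_points, predicted_points):
    """Root mean square of point-to-point distances between two meshes."""
    if not reference_points or len(reference_points) != len(predicted_points):
        raise ValueError("Point lists must be non-empty and of equal length")
    squared = [
        math.dist(ref, pred) ** 2
        for ref, pred in zip(reference_points, predicted_points)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical corresponding points: one point is off by 1 unit in z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
error = rms_error(reference, predicted)
print(round(error, 4))  # 0.7071
```

Computing the same quantity between the reference and the mean mesh gives a baseline against which the predicted mesh can be judged.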
- class lung_modelling.app.shapeworks_tasks.GenerateMeshesWithSubjectDataSW(name, config)
- static work(dataloc, dirs_list, output_directory, dataset_config, task_config)
Use subject data regression model and PCA model to generate meshes based on subject data.
# TODO: finish
- Parameters:
dataloc (DatasetLocator) – Dataset Locator
dirs_list (list) – List of relative paths to the source directories
output_directory (Path) – Absolute path of the directory in which to save results of the work function
dataset_config (DictConfig) – Dataset config
task_config (DictConfig) –
- source_directory
source directory of Todo:finish this
- results_directory:
Name of the results folder (Stem of output_directory)
- params
None implemented