Welcome to IDIS Core’s documentation!

IDIS core de-identifies DICOM datasets. It does this by removing or replacing DICOM elements where needed. All DICOM processing is based on pydicom. It processes in accordance with the DICOM deidentification profile and options.

It works like this:

import pydicom
from idiscore.defaults import create_default_core

core = create_default_core()      # create an idiscore instance

ds = pydicom.dcmread("my_file.dcm")  # load a DICOM dataset
ds = core.deidentify(ds)          # remove patient information
ds.save_as("deidentified.dcm")    # save to disk

See Getting started to start using idiscore.

Goals

  • Deidentify DICOM datasets in conformance with the DICOM standard

  • Configuration is an extra, not a requirement. With minimal configuration IDIS core should deidentify a dataset ‘well’. This means IDIS core will include opinions on deidentification

  • Python all the way. No custom configuration languages, no installers, just scripts and pip. IDIS core assumes you can write Python and leverages class inheritance, docstrings, pytest and variable annotations. This keeps things clean, testable and unambiguous

Non-Goals

  • No deidentification pipeline. IDIS core deidentifies DICOM datasets. It does not want to know where a dataset comes from or where it is going. It does not offer an installable application or a server to send files to. It could be used to create such a server, but this is outside this project’s scope

  • Reading and writing DICOM files. Internally IDIS core only works with pydicom datasets. Reading and writing of DICOM datasets is left to pydicom

Alternatives

Alternative methods of de-identification

CTP

MIRC CTP is a widely used, extensive, Java-based framework for deidentification and data aggregation. It has many plugins and can be configured using several scripting languages. All in all it is a very good choice for many people. As a programmer developing mostly Python-based software, however, I struggled with certain aspects:

  • It is difficult to integrate into a test suite properly. First, it is file-based, requiring an actual file on disk for each type of DICOM file whose deidentification you want to verify. Second, the pipeline is configured with several different file-based custom scripts, which makes it difficult to set up the correct context for tests.

  • I found it tricky to integrate into my Python-based infrastructure. Again, because the pipeline is Java-based and file-based there is no easy way to access the state of files in the pipeline. Is a file done? Has something gone wrong? Getting this information would require checking all possible output, stage and quarantine folders. I was really missing exceptions I could catch.

  • Because it is an installable pipeline, I found it difficult to integrate into smaller, non-server based applications like a command line tool that locally deidentifies some data for a user.
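To illustrate what I mean by exceptions I could catch, here is a minimal sketch of in-process batch handling. The helper name deidentify_all is hypothetical and not part of idiscore, and the sketch assumes that a dataset which cannot be deidentified surfaces as a Python exception rather than as a file in a quarantine folder.

```python
from typing import Any, Callable, Iterable, List, Tuple


def deidentify_all(
    datasets: Iterable[Any],
    deidentify: Callable[[Any], Any],
) -> Tuple[List[Any], List[Tuple[Any, Exception]]]:
    """Run `deidentify` on each dataset and keep track of what happened.

    Returns (done, failed): the deidentified datasets, and each input that
    raised together with its exception, so the caller decides what to do.
    """
    done: List[Any] = []
    failed: List[Tuple[Any, Exception]] = []
    for ds in datasets:
        try:
            done.append(deidentify(ds))
        except Exception as exc:  # assumption: failures surface as exceptions
            failed.append((ds, exc))
    return done, failed
```

With the core object from the example at the top of this page, this could be called as done, failed = deidentify_all(datasets, core.deidentify). The state of every dataset is then available as ordinary Python objects, with no folders to inspect.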

deid

pydicom deid is a pydicom-based best-effort anonymizer for medical image data. It is part of the pydicom family. It has extensive and friendly documentation and gets several concepts right. Reasons for not expanding on this library and instead starting a new one:

  • There seems to have been little development since the library’s start in 2017

  • Seems to be quite file-based in places, often requiring input and output folders for initializing objects

  • No test coverage monitoring. It uses unittest for testing, which I find hard to maintain and expand on

  • Uses a custom scripting language for configuring the anonymization. This is useful for non-coding end-users, but adds a layer of indirection to automated testing.
