Welcome to IDIS Core's documentation!
======================================
IDIS core de-identifies DICOM datasets. It does this by removing or replacing DICOM elements when needed. All DICOM
processing is based on `pydicom `_. It processes in accordance to the
`DICOM deidentification profile and options `_.
It works like this:
.. code-block:: python
import pydicom
from idiscore.defaults import create_default_core
core = create_default_core() # create an idiscore instance
ds = pydicom.dcmread("my_file.dcm") # load a DICOM dataset
ds = core.deidentify(ds) # remove patient information
ds.save_as("deidentified.dcm") # save to disk
See :ref:`getting_started` to start using idiscore
Goals
=====
* Deidentify DICOM datasets in conformance to the DICOM standard
* Configuration is an extra, not a requirement. With minimal configuration IDIS core should deidentify a dataset 'well'
This means IDIS core will include opinions on deidentification
* Python all the way. No custom configuration languages, no installers, just scripts and pip. IDIS core assumes you can
write python and leverages class inheritance, docstrings, pytest, variable annotations. This keeps things clean,
testable and unambiguous
Non-Goals
=========
* No deidentification pipeline. IDIS core deidentifies DICOM datasets. It does not want to know where this dataset comes
from or where it is going to. It does not offer any installable or server to send files to. It could be used to create
such a server, but this is out of this project's scope
* Reading and Writing DICOM files. Internally IDIS core only works with `pydicom `_ datasets. Reading and writing of DICOM datasets is
to pydicom
Alternatives
============
Alternative methods of de-identification
CTP
`MIRC CTP `_ is a widely used, extensive, java-based framework
for deidentification and data aggragation. It has many plugins and can be configured using several scripting languages.
All in all it is a very good choice for many people. For me as a programmer developing mostly python-based software,
I struggled with certain aspects however:
* It is difficult to integrate into a test suite properly. This is first of all because it is file-based, requiring an
actual file on disk for each type of DICOM you might want to verify the deidentification of. Second, because the pipeline
is configured with several different file-based custom scripts it is difficult to set up the correct context for tests.
* I found it tricky to integrate into my python-based infrastructure. Again, because the pipeline is java-based and
file-based there is no easy way to access the state of files in the pipeline. Is a file done? Has something gone wrong?
Getting this information would require either checking all possible output, stage and quarantine folders. I was really
missing exceptions I could catch.
* Because it is an installable pipeline, I found it difficult to integrate into smaller, non-server based applications like
a command line tool that locally deidentifies some data for a user.
deid
`pydicom deid `_ is a pydicom based best-effort anonymizer for medical image data.
It is part of the pydicom family. It has `extensive and friendly documentation `_
and get several concepts right. Reasons for not expanding on this library and instead starting a new one:
* There seems to have been little development since the libraries start in 2017
* Seems to be quite file-based in places, often requiring input and output folders for initializing objects
* No test coverage monitoring, uses unittest for testing which is hard to maintain and expand on
* Uses `custom scripting `_ language for configuring the
anonymization. This is useful for non-coding end-users, but adds a layer of indirectness to automated testing.
.. toctree::
:maxdepth: 2
:caption: Contents:
readme
getting_started
advanced
concepts
modules
contributing
history
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`