17 | 17 | # |
18 | 18 | # ========================================================================= |
19 | 19 |
20 | | -# |
21 | | -# Run the script directly from GitHub without downloading it using uv (https://github.com/astral-sh/uv): |
22 | | -# uv run https://raw.githubusercontent.com/InsightSoftwareConsortium/SimpleITK-Notebooks/refs/heads/main/Python/scripts/characterize_data.py -h |
23 | | -# |
24 | | - |
25 | 20 | # |
26 | 21 | # Provide inline script metadata per PEP 723 (https://peps.python.org/pep-0723/) |
27 | 22 | # /// script |
@@ -775,7 +770,7 @@ def characterize_data(argv=None): |
775 | 770 | ------- |
776 | 771 | To run the script one has to specify: |
777 | 772 | 1. Root of the data directory. |
778 | | - 2. Filename of csv output, can include relative or absolute path. |
| 773 | + 2. Filename of csv output. |
780 | 775 | 3. The analysis type to perform, per_file or per_series. The latter indicates
780 | 775 | we are only interested in DICOM files. |
781 | 776 |
@@ -830,6 +825,11 @@ def characterize_data(argv=None): |
830 | 825 | python characterize_data.py ../../Data/ Output/generic_image_data_report.csv per_file \ |
831 | 826 | --configuration_file ../../Data/characterize_data_user_defaults.json 2> errors.txt |
832 | 827 |
| 828 | + You can also use the uv Python package and project manager (https://github.com/astral-sh/uv)
| 829 | + to run the script directly from GitHub, without downloading it or explicitly creating
| 830 | + a virtual Python environment:
| 831 | + uv run https://raw.githubusercontent.com/InsightSoftwareConsortium/SimpleITK-Notebooks/refs/heads/main/Python/scripts/characterize_data.py -h |
| 832 | +
833 | 833 | Output: |
834 | 834 | ------ |
835 | 835 | The output from the script includes: |
@@ -893,16 +893,24 @@ def xyz_to_index(x, y, z, thumbnail_size, tile_size): |
893 | 893 | tile_size = |
894 | 894 | print(df["files"].iloc[xyz_to_index(x, y, z, thumbnail_size, tile_size)]) |
895 | 895 |
896 | | - Caveat: |
897 | | - ------ |
| 896 | + Caveats: |
| 897 | + -------- |
898 | 898 | When characterizing a set of DICOM images, start by running the script in per_file |
899 | | - mode. This will identify duplicate image files. Remove them before running using the per_series |
900 | | - mode. If run in per_series mode on the original data the duplicate files will not be identified |
901 | | - as such, they will be identified as belonging to the same series. In this situation we end up |
902 | | - with multiple images in the same spatial location (repeated 2D slice in a 3D volume). This will |
903 | | - result in incorrect values reported for the spacing, image size etc. |
| 899 | + mode. This will identify duplicate images at the file level. Remove them before running
| 900 | + in per_series mode. If run in per_series mode on data with duplicate files, the duplicates
| 901 | + may not be flagged as such; instead they may be treated as belonging to the same series.
| 902 | + In this situation we end up with multiple images in the same spatial location
| 903 | + (a repeated 2D slice in a 3D volume). This will result in incorrect values reported for
| 904 | + the spacing, image size, etc.
904 | 905 | When this happens you will see a WARNING printed to the terminal output, along the lines of |
905 | 906 | "ImageSeriesReader : Non uniform sampling or missing slices detected...". |
| 907 | +
| 908 | + When file paths are very long and the number of files in a series is large, the total
| 909 | + character count of a single cell in the "files" column may exceed the per-cell limit of
| 910 | + some spreadsheet applications. The limit for Microsoft Excel is 32,767 characters and for
| 911 | + Google Sheets it is 50,000 characters. When the file is opened with Excel, the contents of
| 912 | + such a cell are truncated, which corrupts the column layout. The data itself is valid and
| 913 | + can be read correctly using Python or R.
906 | 914 | """ |
907 | 915 | # Maximal number of points for which scatterplots are saved in pdf format, |
908 | 916 | # otherwise png. Threshold was determined empirically based on rendering
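The spreadsheet caveat notes that the report itself stays valid even when Excel truncates the "files" column. A minimal sketch (not part of characterize_data.py) of how one might locate the affected rows with Python's standard csv module is shown below. It assumes the report is UTF-8 encoded and has a header row containing a "files" column; the function name find_oversized_cells and the two constants are hypothetical, chosen only for illustration.

```python
import csv

# Per-cell character limits quoted in the spreadsheet caveat.
EXCEL_CELL_LIMIT = 32_767
GOOGLE_SHEETS_CELL_LIMIT = 50_000


def find_oversized_cells(lines, column="files", limit=EXCEL_CELL_LIMIT):
    """Report cells in `column` whose length exceeds `limit` characters.

    `lines` is an open text file (opened with newline="") or any iterable of
    CSV lines whose first row is the header. Returns a list of
    (data_row_index, length) tuples; index 0 is the first row after the header.
    """
    # By default the csv module rejects fields longer than 131,072 characters.
    # Raise this process-wide limit so very long "files" entries are read intact.
    # 2**31 - 1 is used instead of sys.maxsize because it fits in a C long on
    # every platform, including Windows.
    csv.field_size_limit(2**31 - 1)
    oversized = []
    for index, row in enumerate(csv.DictReader(lines)):
        # Missing trailing fields come back as None from DictReader.
        value = row.get(column) or ""
        if len(value) > limit:
            oversized.append((index, len(value)))
    return oversized
```

For example, for the report produced by the usage example earlier in the docstring, one could open "Output/generic_image_data_report.csv" with `open(path, newline="", encoding="utf-8")` and pass the file object to `find_oversized_cells`. Because `csv.field_size_limit` changes module-wide state, this sketch affects any other csv reading done in the same process.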