Skip to content

Commit c114cd9

Browse files
committed
Better support for binary inputs
Turns out that nushell tries to auto-convert binary data to strings. This can be avoided for scalar values by not using SimplePluginCommand and switching to PluginCommand that has access to the input binary stream. Unfortunately nushell doesn't support lists of streams and it converts them to list of strings.
1 parent ad74e48 commit c114cd9

4 files changed

Lines changed: 152 additions & 36 deletions

File tree

README.md

Lines changed: 25 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -2,18 +2,18 @@
22

33
*Note that this plugin works with nu 0.105. If you want to use nu 0.60, use version 0.1.8 of this plugin.*
44

5-
A [nushell](https://www.nushell.sh/) plugin to parse [Dicom](https://en.wikipedia.org/wiki/DICOM) objects.
5+
A [nushell](https://www.nushell.sh/) plugin to parse [DICOM](https://en.wikipedia.org/wiki/DICOM) objects.
66

77
This plugin is in the early stage of the development. It is usable but it might not be able to cope
8-
with all Dicom objects. One notable limitation is that all Dicom objects are expected to have a preamble.
8+
with all DICOM objects. One notable limitation is that all DICOM objects are expected to have a preamble.
99

1010
I'm still trying to figure out what is the most useful way of using this plugin. Please feel free to try it out,
1111
send feedback in [Discussions](https://github.com/realcundo/nu_plugin_dcm/discussions) or report problems in [Issues](https://github.com/realcundo/nu_plugin_dcm/issues).
1212

1313
## Usage
1414
`dcm` plugin reads its input from single values or from specific columns:
15-
- `dcm`: expects a string/filename or binary Dicom data
16-
- `dcm $column_name`: reads a string/filename or binary Dicom data from `$column`. This is
15+
- `dcm`: expects a string/filename or binary DICOM data
16+
- `dcm $column_name`: reads a string/filename or binary DICOM data from `$column`. This is
1717
equivalent to `get $column | dcm`.
1818

1919
## Error handling
@@ -24,28 +24,36 @@ send feedback in [Discussions](https://github.com/realcundo/nu_plugin_dcm/discus
2424

2525
## Known Limitations
2626

27-
- Dicom objects without a preamble and DCIM header will fail to load.
28-
- PixelData is always skipped. For now I'm considering this to be a feature that speeds up Dicom parsing.
27+
- DICOM objects without a preamble and DCIM header will fail to load.
28+
- PixelData is always skipped. For now I'm considering this to be a feature that speeds up DICOM parsing.
29+
- `dcm` can process binary data. You can pass it directly to `dcm` as `open --raw file.dcm | dcm`. However, when passing a list of binary streams,
30+
`nushell` will try to convert it to a list of strings. To work around this, use `into binary`, e.g.:
31+
```sh
32+
[(open --raw file1.dcm | into binary), (open --raw file2.dcm | into binary)] | dcm
33+
```
34+
35+
Without `into binary`, `dcm` would see a list of strings, assuming it's a list of filenames.
36+
2937

3038

3139
## Examples
3240

33-
### Output Dicom file as a table
41+
### Output DICOM file as a table
3442
```sh
35-
echo file.dcm | dcm # uses filename/string to specify which file to open
36-
open file.dcm | dcm # pass binary data to `dcm`
37-
ls file.dcm | dcm name # use `name` column as the filename
38-
echo file.dcm | wrap foo | dcm foo # use `foo` column as the filename
39-
open file.dcm | wrap foo | dcm foo # use `foo` column as binary data
43+
echo file.dcm | dcm # uses filename/string to specify which file to open
44+
open --raw file.dcm | dcm # pass binary data to `dcm`
45+
ls file.dcm | dcm name # use `name` column as the filename (equivalent of `ls file.dcm | select name | dcm`)
46+
echo file.dcm | wrap foo | dcm foo # use `foo` column as the filename
47+
open -r file.dcm | wrap foo | dcm foo # use `foo` column as binary data
4048
```
4149

42-
### Dump Dicom file as a JSON/YAML document
50+
### Dump DICOM file as a JSON/YAML document
4351
```sh
44-
open file.dcm | dcm | to json --indent 2
45-
open file.dcm | dcm | to yaml
52+
open -r file.dcm | dcm | to json --indent 2
53+
open -r file.dcm | dcm | to yaml
4654
```
4755

48-
### Dump all Dicom files in the current directory to a JSON/YAML document
56+
### Dump all DICOM files in the current directory to a JSON/YAML document
4957
```sh
5058
ls *.dcm | dcm name | to json --indent 2
5159
ls *.dcm | dcm name | to yaml
@@ -78,7 +86,7 @@ would fail since selected columns are missing. Another option would be using `de
7886
for missing columns.)
7987
8088
81-
### For each file in all subdirectories, show filename, file size, SHA256 hash of the file, SOP Instance UID and a Dicom parsing error, if any
89+
### For each file in all subdirectories, show filename, file size, SHA256 hash of the file, SOP Instance UID and a DICOM parsing error, if any
8290
Use `par-each` to process files in parallel:
8391
```sh
8492
ls **/* | where type == file |

src/plugin.rs

Lines changed: 43 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -8,10 +8,11 @@ use crate::dcm;
88
use dicom::object::DefaultDicomObject;
99
use dicom::object::StandardDataDictionary;
1010
use indexmap::IndexMap;
11-
use nu_plugin::{EngineInterface, Plugin, SimplePluginCommand};
11+
use nu_plugin::{EngineInterface, Plugin, PluginCommand};
1212
use nu_protocol::ast::CellPath;
1313
use nu_protocol::{
14-
Category, Example, LabeledError, Record, ShellError, Signature, Span, SyntaxShape, Type, Value,
14+
Category, Example, LabeledError, PipelineData, Record, ShellError, Signature, Span,
15+
SyntaxShape, Type, Value,
1516
};
1617

1718
#[derive(Default)]
@@ -32,7 +33,7 @@ impl Plugin for DcmPlugin {
3233
#[derive(Default)]
3334
pub struct DcmPluginCommand;
3435

35-
impl SimplePluginCommand for DcmPluginCommand {
36+
impl PluginCommand for DcmPluginCommand {
3637
type Plugin = DcmPlugin;
3738

3839
fn name(&self) -> &str {
@@ -134,10 +135,25 @@ impl SimplePluginCommand for DcmPluginCommand {
134135
plugin: &DcmPlugin,
135136
engine: &EngineInterface,
136137
call: &nu_plugin::EvaluatedCall,
137-
input: &Value,
138-
) -> Result<Value, LabeledError> {
138+
input: PipelineData,
139+
) -> Result<PipelineData, LabeledError> {
139140
let current_dir = engine.get_current_dir().map(PathBuf::from);
140141

142+
let span = input.span().unwrap_or(call.head);
143+
let input_metadata = input.metadata();
144+
145+
// Convert input PipelineData to Value. The default behaviour for BinaryStream is to detect the type of the stream and if unknown, try to read it as a UTF-8 string.
146+
// This behaviour is wrong for us, since we want to read any binary input stream as binary data.
147+
// Unfortunately, a list of `[(`open file.dcm`), ...]` gets converted to a list of strings by nushell before we can access it.
148+
let input = if let PipelineData::ByteStream(byte_stream, ..) = input {
149+
// TODO eventually support streaming?
150+
let bytes = byte_stream.into_bytes()?;
151+
Value::binary(bytes, span)
152+
} else {
153+
// convert PipelineData to Value the usual way
154+
input.into_value(span)?
155+
};
156+
141157
// parse args, nu makes sure the num of args and type is correct
142158
let source_column = call.nth(0);
143159
let source_column = if let Some(Value::CellPath { val, .. }) = source_column {
@@ -154,13 +170,18 @@ impl SimplePluginCommand for DcmPluginCommand {
154170
};
155171

156172
// run
157-
self.run_filter(
173+
let output = self.run_filter(
158174
plugin,
159175
current_dir.as_deref(),
160-
input,
176+
&input,
161177
source_column,
162178
error_column,
163-
)
179+
)?;
180+
181+
// Forward DataSource metadata from input to output, but clear any content type. This keeps the source.
182+
let output_metadata = input_metadata.map(|m| m.with_content_type(None));
183+
184+
Ok(PipelineData::Value(output, output_metadata))
164185
}
165186
}
166187

@@ -223,13 +244,21 @@ impl DcmPluginCommand {
223244
val, internal_span, ..
224245
} => {
225246
// make absolute if needed
226-
let val = resolve_path(val, current_dir, value.span())?;
247+
let file = resolve_path(val, current_dir, value.span())?;
248+
249+
// TODO add some heuristics to determine if the input string is filename or DICOM binary data connverted to utf-8 by nu?
250+
// (see `ByteStream::into_value()` which does the conversion to string.)
251+
252+
let obj = read_dcm_file(&file).map_err(|e| {
253+
// Report a better error if the input string looks like DICOM binary data with preamble.
254+
// TODO this is messy. In fact the whole error reporting is messy.
255+
let text = if val.get(128..132) == Some("DICM") {
256+
"Input string looks like DICOM binary data. Either pass binary data, or a filename.".to_string()
257+
} else {
258+
format!("{} [file {}]", e, file.to_string_lossy())
259+
};
227260

228-
let obj = read_dcm_file(&val).map_err(|e| {
229-
LabeledError::new("`dcm` expects valid DICOM binary data").with_label(
230-
format!("{} [file {}]", e, val.to_string_lossy()),
231-
*internal_span,
232-
)
261+
LabeledError::new("`dcm` expects valid DICOM binary data").with_label( text, *internal_span)
233262
})?;
234263

235264
self.process_dicom_object(plugin, internal_span, obj, error_column)

tests/assets/file.dcm

276 Bytes
Binary file not shown.

tests/integration_tests.rs

Lines changed: 84 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
use nu_plugin_dcm::plugin::DcmPlugin;
1+
use nu_plugin_dcm::plugin::{DcmPlugin, DcmPluginCommand};
22
use nu_plugin_test_support::PluginTest;
33
use nu_protocol::{record, IntoPipelineData, Span, Value};
44
use std::path::PathBuf;
@@ -13,11 +13,14 @@ macro_rules! assert_dicom_field {
1313
};
1414
}
1515

16-
fn get_asset_path(filename: &str) -> PathBuf {
16+
fn get_asset_base_path() -> PathBuf {
1717
PathBuf::from(env!("CARGO_MANIFEST_DIR"))
1818
.join("tests")
1919
.join("assets")
20-
.join(filename)
20+
}
21+
22+
fn get_asset_path(filename: &str) -> PathBuf {
23+
get_asset_base_path().join(filename)
2124
}
2225

2326
fn create_binary_value(filename: &str) -> Result<Value, Box<dyn std::error::Error>> {
@@ -38,6 +41,82 @@ fn create_file_record_value<S: AsRef<str>>(filename: S) -> Value {
3841
)
3942
}
4043

44+
#[test]
45+
#[ignore]
46+
fn test_examples() -> Result<(), nu_protocol::ShellError> {
47+
let mut plugin_test = PluginTest::new("dcm", DcmPlugin::default().into())?;
48+
49+
plugin_test.add_decl(Box::new(nu_command::Open))?;
50+
plugin_test.add_decl(Box::new(nu_command::Ls))?;
51+
52+
plugin_test.engine_state_mut().add_env_var(
53+
"PWD".to_string(),
54+
Value::string(get_asset_base_path().to_string_lossy(), Span::test_data()),
55+
);
56+
57+
plugin_test.test_command_examples(&DcmPluginCommand)
58+
}
59+
60+
#[test]
61+
fn test_command_scalar_open() -> Result<(), nu_protocol::ShellError> {
62+
let mut plugin_test = PluginTest::new("dcm", DcmPlugin::default().into())?;
63+
64+
plugin_test.add_decl(Box::new(nu_command::Open))?;
65+
66+
plugin_test.engine_state_mut().add_env_var(
67+
"PWD".to_string(),
68+
Value::string(get_asset_base_path().to_string_lossy(), Span::test_data()),
69+
);
70+
71+
let result = plugin_test.eval("open --raw file.dcm | dcm")?;
72+
let result = result.into_value(Span::test_data())?;
73+
74+
// TODO actually test the result
75+
assert!(result.as_record().is_ok());
76+
Ok(())
77+
}
78+
79+
#[test]
80+
fn test_command_vector_open() -> Result<(), nu_protocol::ShellError> {
81+
let mut plugin_test = PluginTest::new("dcm", DcmPlugin::default().into())?;
82+
83+
plugin_test.add_decl(Box::new(nu_command::Open))?;
84+
plugin_test.add_decl(Box::new(nu_command::IntoBinary))?;
85+
86+
plugin_test.engine_state_mut().add_env_var(
87+
"PWD".to_string(),
88+
Value::string(get_asset_base_path().to_string_lossy(), Span::test_data()),
89+
);
90+
91+
let result = plugin_test
92+
.eval("[(open --raw file.dcm | into binary), (open --raw file.dcm | into binary)] | dcm")?;
93+
94+
let result = result.into_value(Span::test_data())?;
95+
96+
// TODO actually test the result
97+
assert!(result.as_list().is_ok());
98+
Ok(())
99+
}
100+
101+
#[test]
102+
fn test_command_ls() -> Result<(), nu_protocol::ShellError> {
103+
let mut plugin_test = PluginTest::new("dcm", DcmPlugin::default().into())?;
104+
105+
plugin_test.add_decl(Box::new(nu_command::Ls))?;
106+
107+
plugin_test.engine_state_mut().add_env_var(
108+
"PWD".to_string(),
109+
Value::string(get_asset_base_path().to_string_lossy(), Span::test_data()),
110+
);
111+
112+
let result = plugin_test.eval("ls *.dcm | dcm name")?;
113+
let result = result.into_value(Span::test_data())?;
114+
115+
// TODO actually test the result
116+
assert!(result.as_list().is_ok());
117+
Ok(())
118+
}
119+
41120
/// Simulate `open file | dcm`
42121
#[test]
43122
fn test_scalar_binary_input() -> Result<(), Box<dyn std::error::Error>> {
@@ -138,7 +217,7 @@ fn test_vector_record_input() -> Result<(), Box<dyn std::error::Error>> {
138217
let mut plugin_test = PluginTest::new("dcm", DcmPlugin::default().into())?;
139218

140219
// Create multiple records that simulate what 'ls *.dcm' would produce
141-
let test_files = vec![
220+
let test_files = [
142221
"ExplicitVRLittleEndian-Preamble.dcm",
143222
"ExplicitVRBigEndian-Preamble.dcm",
144223
"ImplicitVRLittleEndian-Preamble.dcm",
@@ -173,7 +252,7 @@ fn test_vector_record_input() -> Result<(), Box<dyn std::error::Error>> {
173252
}
174253

175254
// Verify specific patient names match the filenames
176-
let expected_names = vec![
255+
let expected_names = [
177256
"ExplicitVRLittleEndian-Preamble",
178257
"ExplicitVRBigEndian-Preamble",
179258
"ImplicitVRLittleEndian-Preamble",

0 commit comments

Comments
 (0)