
augmentedManifestFile + PipeModeDataset example #63

Description

@vlordier

It would really help to have a full end-to-end example of, say, image classification with an augmented manifest file (AugmentedManifestFile) and PipeModeDataset,

as I keep getting format errors such as
tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Could not parse example input, value: '����

I build a JSON Lines augmented manifest with

{"image-ref": "s3://path/to/image", "label": 3}
{"image-ref": "s3://path/to/image", "label": 1}
{"image-ref": "s3://path/to/image", "label": 2}
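
For reference, a manifest in valid JSON Lines form can be produced with the standard `json` module. This is only a minimal sketch: the helper name `write_manifest`, the output file name, and the S3 URIs are hypothetical placeholders, not anything prescribed by SageMaker.

```python
import json
import os
import tempfile


def write_manifest(rows, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for image_ref, label in rows:
            # json.dumps emits double-quoted keys and strings, as JSON requires.
            f.write(json.dumps({"image-ref": image_ref, "label": label}) + "\n")


# Placeholder S3 URIs (hypothetical), mirroring the three rows above.
rows = [
    ("s3://path/to/image", 3),
    ("s3://path/to/image", 1),
    ("s3://path/to/image", 2),
]

# Hypothetical local path; the file would still need to be uploaded to S3.
manifest_path = os.path.join(tempfile.gettempdir(), "train.manifest")
write_manifest(rows, manifest_path)
```

Each line of the resulting file parses on its own with `json.loads`, which is a quick way to check the manifest before uploading it.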

then prepare the training channel as

train_data = sagemaker.session.s3_input(augmented_manifest_file_on_s3,
                                        distribution='FullyReplicated',
                                        content_type='image/jpeg',
                                        s3_data_type='AugmentedManifestFile',
                                        attribute_names=['image-ref', 'label'],
                                        input_mode='Pipe',
                                        record_wrapping='RecordIO')

and launch .fit as

data_channels = {'train': train_data}

# Train a model.
tf_estimator.fit(inputs=data_channels, logs=True)

In my entry script, I have

dataset = PipeModeDataset(channel=channel)
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
dataset = dataset.batch(2)
dataset = dataset.map(combine)
dataset = dataset.map(example_parser, num_parallel_calls=batch_size)
dataset = dataset.repeat(epochs)
dataset = dataset.batch(batch_size, drop_remainder=True)
image_batch, label_batch = next(iter(dataset))

and as a modified example parser, I have

def example_parser(exemple1, exemple2):
    feat1 = tf.io.parse_single_example(
        exemple1,
        features={
            'image-ref': tf.io.FixedLenFeature([], tf.string),
        })

    feat2 = tf.io.parse_single_example(
        exemple2,
        features={
            'label': tf.io.FixedLenFeature([], tf.int64),
        })

    image = feat1['image-ref']
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    label = tf.cast(feat2['label'], tf.int32)
    return image, label

What am I doing wrong?
The documentation here is not clear about using augmented manifest files.
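
One hypothesis worth checking (an assumption, not a confirmed diagnosis): with an augmented manifest in Pipe mode, the channel may carry the raw bytes of each attribute in attribute_names order, that is, the JPEG bytes followed by the label serialized as text, rather than serialized tf.train.Example protos. If so, tf.io.parse_single_example would fail on both records. The stdlib-only sketch below illustrates that pairing and decoding logic on fake records; `pair_records` is a hypothetical helper, and in a real tf.data pipeline the rough equivalents would be tf.image.decode_jpeg on the first record and tf.strings.to_number on the second.

```python
def pair_records(records):
    """Group an alternating [image, label, image, label, ...] stream into pairs.

    Assumption: each label record is the label value serialized as UTF-8 text.
    """
    it = iter(records)
    # zip(it, it) pulls two consecutive items per step;
    # a trailing unpaired record is silently dropped.
    for image_bytes, label_bytes in zip(it, it):
        yield image_bytes, int(label_bytes.decode("utf-8"))


# Fake stream standing in for what the channel might deliver (hypothetical data).
fake_stream = [b"\xff\xd8fake-jpeg-1", b"3", b"\xff\xd8fake-jpeg-2", b"1"]
pairs = list(pair_records(fake_stream))
```

Under this assumption the image record is decoded directly as JPEG and the label is parsed from text, with no Example parsing at all.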
