
Question about the SubTransformers sampling process. #17

@Kevinpsk

Hi,

Thanks a lot for releasing this great project.
I have a question about the SubTransformer sampling process in a distributed training environment. I see that you sample a random SubTransformer before each train step with the call below. In the multi-GPU scenario, does every GPU get the same random SubTransformer, or does each GPU get a different one? Does reset_rand_seed force all GPUs to sample the same SubTransformer from the SuperNet? And is trainer.get_num_updates() the same on all GPUs at each train step?

configs = [utils.sample_configs(utils.get_all_choices(args), reset_rand_seed=True, rand_seed=trainer.get_num_updates(), super_decoder_num_layer=args.decoder_layers)]
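
To make the question concrete, here is a minimal sketch of the behaviour I am asking about. It is not the project's code: `sample_config_sketch`, `simulate_ranks`, and the `CHOICES` search space are hypothetical stand-ins, and I am assuming that reset_rand_seed simply reseeds Python's `random` module with rand_seed before drawing. Under that assumption, every process that passes the same update count would draw the same SubTransformer, whatever its RNG state was beforehand.

```python
import random

# Hypothetical search space for illustration only; the real one is
# whatever utils.get_all_choices(args) returns.
CHOICES = {
    "encoder_embed_dim": [512, 640],
    "decoder_embed_dim": [512, 640],
    "decoder_layer_num": [1, 2, 3, 4, 5, 6],
}


def sample_config_sketch(choices, reset_rand_seed=False, rand_seed=0):
    """Stand-in for utils.sample_configs (assumed behaviour, not the real code)."""
    if reset_rand_seed:
        # Assumption: the real function reseeds the global RNG like this.
        random.seed(rand_seed)
    return {name: random.choice(options) for name, options in choices.items()}


def simulate_ranks(num_updates, world_size=4):
    """Simulate one train step on `world_size` processes in a single process."""
    configs = []
    for rank in range(world_size):
        # Give each simulated rank a different RNG state before sampling.
        random.seed(1000 + rank)
        configs.append(
            sample_config_sketch(CHOICES, reset_rand_seed=True, rand_seed=num_updates)
        )
    return configs


if __name__ == "__main__":
    for step in range(3):
        per_rank = simulate_ranks(num_updates=step)
        print(step, all(cfg == per_rank[0] for cfg in per_rank))
```

If this matches what the real sample_configs does, the GPUs only agree as long as trainer.get_num_updates() returns the same value on every GPU at the moment of sampling, which is why I am asking about both.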

Thanks a lot for your help.
