Clarification on deployment configuration #235

@tonyf

Description

It would be helpful to have further documentation on deployment recommendations in a production setting.

For example:

  • Should the parameter server / lighthouse server be colocated for performance? Is a high-speed interconnect between the lighthouse server and the worker nodes necessary?
  • Can a single lighthouse server be shared amongst multiple training jobs? If so, how are the instances/jobs distinguished from each other?
  • What kind of minimum specs are recommended for the lighthouse / parameter servers? How does this relate to model size?
