I tried running the job several times, and it fails the same way each time. The error suggests that some data is missing from the model download:
[rank0]: FileNotFoundError: [Errno 2] No such file or directory: '/home/smci/MLC/repos/local/cache/download-file_ml-model-dlrm-t_4f9d5fc1/model_weights/.snapshot_metadata'
Traceback (most recent call last):
  File "/home/smci/mlc/bin/mlcr", line 8, in <module>
    sys.exit(mlcr())
  File "/home/smci/mlc/lib/python3.10/site-packages/mlc/main.py", line 91, in mlcr
    mlc_expand_short("run")
  File "/home/smci/mlc/lib/python3.10/site-packages/mlc/main.py", line 88, in mlc_expand_short
    main()
  File "/home/smci/mlc/lib/python3.10/site-packages/mlc/main.py", line 380, in main
    res = method(run_args)
  File "/home/smci/mlc/lib/python3.10/site-packages/mlc/script_action.py", line 386, in run
    return self.call_script_module_function("run", run_args)
  File "/home/smci/mlc/lib/python3.10/site-packages/mlc/script_action.py", line 262, in call_script_module_function
    result = automation_instance.run(run_args) # Pass args to the run method
  File "/home/smci/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 287, in run
    r = self._run(i)
  File "/home/smci/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py", line 1458, in _run
    r = customize_code.preprocess(ii)
  File "/home/smci/MLC/repos/mlcommons@mlperf-automations/script/run-mlperf-inference-app/customize.py", line 285, in preprocess
    r = mlc.access(ii)
  File "/home/smci/mlc/lib/python3.10/site-packages/mlc/action.py", line 58, in access
    result = method(options)
  File "/home/smci/mlc/lib/python3.10/site-packages/mlc/script_action.py", line 386, in run
    return self.call_script_module_function("run", run_args)
  File "/home/smci/mlc/lib/python3.10/site-packages/mlc/script_action.py", line 282, in call_script_module_function
    raise ScriptExecutionError(f"Script {function_name} execution failed in {module_path}. \nError : {error}")
mlc.script_action.ScriptExecutionError: Script run execution failed in /home/smci/MLC/repos/mlcommons@mlperf-automations/automation/script/module.py.
Error : Native run script failed inside MLC script (name = benchmark-program, return code = 256)
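To confirm that the cached download really is incomplete (rather than the path being wrong), a quick check of the cache directory may help before re-running. This is a minimal sketch, not part of MLC: the helper `inspect_weights_dir` is hypothetical, the default path is copied from the error above, and it assumes the DLRM weights are a torchsnapshot-style checkpoint whose index file is `.snapshot_metadata` (the file the benchmark failed to open).

```python
import os
import sys
from pathlib import Path

# Path copied from the error message above; pass a different one as argv[1] if needed.
DEFAULT_WEIGHTS_DIR = (
    "/home/smci/MLC/repos/local/cache/"
    "download-file_ml-model-dlrm-t_4f9d5fc1/model_weights"
)

# Assumption: the checkpoint's index file is named ".snapshot_metadata",
# as in the FileNotFoundError above.
METADATA_NAME = ".snapshot_metadata"


def inspect_weights_dir(weights_dir: str) -> dict:
    """Hypothetical helper: summarize what is actually present in a weights directory."""
    root = Path(weights_dir)
    if not root.is_dir():
        return {"exists": False, "has_metadata": False, "num_files": 0, "total_bytes": 0}
    # os.walk also lists hidden files such as ".snapshot_metadata".
    files = [Path(dirpath) / name for dirpath, _, names in os.walk(root) for name in names]
    return {
        "exists": True,
        "has_metadata": (root / METADATA_NAME).is_file(),
        "num_files": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
    }


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_WEIGHTS_DIR
    info = inspect_weights_dir(target)
    for key, value in info.items():
        print(f"{key}: {value}")
    if not info["has_metadata"]:
        print(f"{METADATA_NAME} is missing: the download looks incomplete.")
```

If `exists` is true but `has_metadata` is false and `total_bytes` is far below the expected checkpoint size, the download was most likely interrupted, and clearing that cache entry before re-running would be a reasonable next step.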