Image generation benchmark #3082
base: main
Changes from all commits
84c0ba6
250c3be
674f544
45f1963
40e768b
ebe0d59
9ce4828
84c5478
66ed7a9
3b81cda
9ab2ef3
805c757
10794ab
81f80e0
bd9f4d4
b964b1d
4bd05f2
3bd307a
566385e
9f28c81
07ef6f9
46f2355
9c6ca45
f4250bf
b757b62
a3517b8
Member (Author): Same as for MetaMath benchmark.
@@ -0,0 +1,96 @@
# Makefile for listing and running the image generation experiments.

# --- Configuration ---
PYTHON := python
Member: Honestly, a Makefile for launching a couple of experiments seems more complicated than it needs to be. But I am sure I am missing out on something. What's the advantage of using Makefiles for this?

Member (Author): First, it's the same as MetaMath, so we keep it for consistency. Second, it checks which experiments have already run and only runs the missing ones. I'd say that's pretty much the main use case for Make.
RUN_SCRIPT := run.py
EXPERIMENTS_DIR := experiments
RESULTS_DIR := results

OPTIONAL_FLAGS =

ifdef UPLOAD_BUCKET
OPTIONAL_FLAGS += --bucket_name "${UPLOAD_BUCKET}"
endif

# --- Automatic Experiment and Result Discovery ---

# 1. Find all experiment directories by looking for adapter_config.json files.
# This gives us a list like: experiments/lora/llama-3.2-3B-rank32 ...
EXPERIMENT_PATHS := $(shell find $(EXPERIMENTS_DIR) \
	-name "adapter_config.json" -or \
	-name "training_params.json" | xargs dirname | sort -u)
# 2. Define a function to replace all occurrences of a character in a string.
# This is needed to replicate the result naming logic from run.py (e.g., "lora/foo" -> "lora--foo").
# Usage: $(call replace-all, string, char_to_replace, replacement_char)
replace-all = $(if $(findstring $(2),$(1)),$(call replace-all,$(subst $(2),$(3),$(1)),$(2),$(3)),$(1))

# 3. Define a function to convert an experiment path to its flat result file path.
# e.g., "experiments/lora/llama-3.2-3B-rank32" -> "results/lora--llama-3.2-3B-rank32.json"
exp_to_res = $(RESULTS_DIR)/$(call replace-all,$(patsubst $(EXPERIMENTS_DIR)/%,%,$(1)),/,--).json

# 4. Generate the list of all target result files we want to build.
RESULT_FILES := $(foreach exp,$(EXPERIMENT_PATHS),$(call exp_to_res,$(exp)))


# --- Main Rules ---

# The default 'all' target depends on all possible result files.
# Running `make` or `make all` will check and run any outdated or missing experiments.
all: $(RESULT_FILES)


# --- Dynamic Rule Generation ---

# This is the core logic. We dynamically generate a specific Makefile rule for each experiment found.
# This avoids a complex pattern rule and makes the logic clearer.
define EXPERIMENT_template
# Input $1: The full experiment path (e.g., experiments/lora/llama-3.2-3B-rank32)

# Define the rule:
# The target is the result file (e.g., results/lora--llama-3.2-3B-rank32.json).
# The dependencies are its config files; code changes need to be audited manually since they can
# vary in degree of importance. Note that we explicitly ignore when the script fails to run
# so that the other experiments still have a chance to run.
$(call exp_to_res,$(1)): $(wildcard $(1)/adapter_config.json) $(wildcard $(1)/training_params.json)
	@echo "---"
	@echo "Running experiment: $(1)"
	-$(PYTHON) $(RUN_SCRIPT) $(OPTIONAL_FLAGS) -v $(1)
	@echo "Finished: $$@"
	@echo "---"

endef

# This command iterates through every found experiment path and evaluates the template,
# effectively stamping out a unique, explicit rule for each one.
$(foreach exp_path,$(EXPERIMENT_PATHS),$(eval $(call EXPERIMENT_template,$(exp_path))))
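# For illustration only, using the example path from the comments above and assuming only
# adapter_config.json exists there: for experiments/lora/llama-3.2-3B-rank32 the $(eval ...)
# above stamps out a rule roughly equivalent to:
#
#   results/lora--llama-3.2-3B-rank32.json: experiments/lora/llama-3.2-3B-rank32/adapter_config.json
#   	@echo "---"
#   	@echo "Running experiment: experiments/lora/llama-3.2-3B-rank32"
#   	-$(PYTHON) $(RUN_SCRIPT) $(OPTIONAL_FLAGS) -v experiments/lora/llama-3.2-3B-rank32
#   	@echo "Finished: $@"
#   	@echo "---"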
# --- Utility Rules ---

.PHONY: all clean list dump_rules

# The 'clean' rule removes all generated results.
clean:
	@echo "Cleaning results directory..."
	@([ -n "$(wildcard $(RESULTS_DIR)/*.json)" ] && rm $(RESULTS_DIR)/*.json) || exit 0
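# (The guard above is just so `clean` still succeeds when no .json results exist yet.)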
# The 'list' rule is for debugging. It shows the discovered experiments
# and the result files the Makefile expects to create for them.
list:
	@echo "Discovered experiment configurations:"
	@$(foreach exp,$(EXPERIMENT_PATHS),echo " - $(exp)/adapter_config.json";)
	@echo "\nTarget result files:"
	@$(foreach res,$(RESULT_FILES),echo " - $(res)";)

# The 'dump_rules' rule is for debugging. It dumps all dynamically defined rules.
define newline


endef
define DUMPED_RULES
$(foreach exp_path,$(EXPERIMENT_PATHS),$(call EXPERIMENT_template,$(exp_path)))
endef

dump_rules:
	@echo -e "$(subst $(newline),\n,${DUMPED_RULES})"
In the filter-error path, the second return value is a raw pandas DataFrame, but the DataFrame component is otherwise fed a styled dataframe via
format_df(...). Returning the raw DF here can break rendering or make the table formatting inconsistent; return the same formatted value as the non-error path.
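A minimal sketch of that suggestion, assuming a Gradio-style callback that returns a status message plus a table; the function name filter_table and the body of format_df are placeholders, not the actual run.py/app code:

import pandas as pd

def format_df(df: pd.DataFrame):
    # Placeholder for the existing styling helper referenced above.
    return df.style.format(precision=3)

def filter_table(df: pd.DataFrame, query: str):
    # Hypothetical filter callback; only the return convention matters here.
    try:
        filtered = df.query(query) if query else df
    except Exception as exc:
        # Error path: return the *formatted* frame rather than the raw DataFrame,
        # so the DataFrame component always receives the same kind of value.
        return f"Invalid filter: {exc}", format_df(df)
    return "", format_df(filtered)

The key point is only that both the success and error paths hand the component the same format_df(...) output, so rendering stays consistent.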