Commit 9f50c1f

Llama 4 component upgrade (#4150)
* Llama 4 component upgrade
* Adding new line at eof
* Llama 4 component upgrade
* Adding new line at eof
* Failing gate fixes
* Failing gate fix
* Failing gate fixes
1 parent f9bce63 commit 9f50c1f

70 files changed

Lines changed: 516 additions & 411 deletions

assets/training/finetune_acft_hf_nlp/components/finetune/chat_completion/README.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Training parameters
 | per_device_train_batch_size | Per gpu batch size used for training. The effective training batch size is _per_device_train_batch_size_ * _num_gpus_ * _num_nodes_. | integer | 1 | True | NA |
 | per_device_eval_batch_size | Per gpu batch size used for validation. The default value is 1. The effective validation batch size is _per_device_eval_batch_size_ * _num_gpus_ * _num_nodes_. | integer | 1 | True | NA |
 | auto_find_batch_size | If set to "true" and if the provided 'per_device_train_batch_size' goes into Out Of Memory (OOM) auto_find_batch_size will find the correct batch size by iteratively reducing batch size by a factor of 2 till the OOM is fixed | string | false | True | ['true', 'false'] |
-| optim | Optimizer to be used while training | string | adamw_hf | True | ['adamw_hf', 'adamw_torch', 'adafactor'] |
+| optim | Optimizer to be used while training | string | adamw_torch | True | ['adamw_torch', 'adafactor'] |
 | learning_rate | Start learning rate used for training. | number | 2e-05 | True | NA |
 | warmup_steps | Number of steps for the learning rate scheduler warmup phase. | integer | 0 | True | NA |
 | weight_decay | Weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer | number | 0.0 | True | NA |
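
The batch-size rows in the table above describe two calculations: the effective batch size is the per-device size multiplied by the GPU and node counts, and `auto_find_batch_size` halves the batch size until the out-of-memory error goes away. The following is a minimal sketch of that arithmetic only, not the component's actual implementation. The function names and the `fits_in_memory` probe are hypothetical stand-ins introduced for illustration.

```python
from typing import Callable


def effective_batch_size(per_device_batch_size: int, num_gpus: int, num_nodes: int) -> int:
    """Effective batch size as the README table defines it."""
    return per_device_batch_size * num_gpus * num_nodes


def find_batch_size(start: int, fits_in_memory: Callable[[int], bool]) -> int:
    """Halve the batch size until the (hypothetical) memory probe succeeds."""
    size = start
    while size >= 1:
        if fits_in_memory(size):
            return size
        size //= 2  # reduce by a factor of 2, as the README describes
    raise RuntimeError("no batch size fits in memory, not even 1")


if __name__ == "__main__":
    # 2 samples per GPU on 8 GPUs across 2 nodes.
    print(effective_batch_size(2, num_gpus=8, num_nodes=2))  # 32
    # Pretend anything above 4 samples per device runs out of memory.
    print(find_batch_size(16, lambda size: size <= 4))  # 4
```

In a real run the probe would be an actual training step that may raise an OOM error; the lambda here only simulates one.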

assets/training/finetune_acft_hf_nlp/components/finetune/chat_completion/spec.yaml

Lines changed: 4 additions & 6 deletions
@@ -1,14 +1,14 @@
 $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
 name: chat_completion_finetune
-version: 0.0.73
+version: 0.0.74
 type: command

 is_deterministic: true

 display_name: Chat Completion Finetune
 description: Component to finetune Hugging Face pretrained models for chat completion task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See [docs](https://aka.ms/azureml/components/chat_completion_finetune) to learn more.

-environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/87
+environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/94

 code: ../../../src/finetune

@@ -93,12 +93,10 @@ inputs:

   optim:
     type: string
-    default: adamw_hf
+    default: adamw_torch
     optional: true
     enum:
-    - adamw_hf
-    - adamw_torch
-    # - adamw_apex_fused
+    - adamw_torch # - adamw_apex_fused
     - adafactor
     description: Optimizer to be used while training
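
This hunk removes `adamw_hf` from the `optim` enum and makes `adamw_torch` the default, so a pipeline that still passes `optim: adamw_hf` explicitly would presumably fail input validation against version 0.0.74. The sketch below shows one way a caller might normalize the value before submitting a job. The helper and the mapping of `adamw_hf` to `adamw_torch` are assumptions made for illustration and are not part of this commit; the two optimizers are different implementations, so results may not match exactly.

```python
from typing import Optional

# Values taken from the 0.0.74 spec.yaml in this diff.
ALLOWED_OPTIMIZERS = ("adamw_torch", "adafactor")
DEFAULT_OPTIMIZER = "adamw_torch"

# Assumption: the closest remaining choice for the removed value.
REPLACEMENTS = {"adamw_hf": "adamw_torch"}


def normalize_optim(value: Optional[str] = None) -> str:
    """Return an optimizer name that the 0.0.74 enum accepts (hypothetical helper)."""
    if value is None:
        return DEFAULT_OPTIMIZER
    value = REPLACEMENTS.get(value, value)
    if value not in ALLOWED_OPTIMIZERS:
        raise ValueError(
            f"unsupported optimizer {value!r}; expected one of {ALLOWED_OPTIMIZERS}"
        )
    return value


if __name__ == "__main__":
    print(normalize_optim("adamw_hf"))   # adamw_torch
    print(normalize_optim("adafactor"))  # adafactor
```

Failing loudly on unknown values, rather than silently falling back to the default, keeps a typo from changing the optimizer unnoticed.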

assets/training/finetune_acft_hf_nlp/components/finetune/question_answering/README.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Training parameters
 | per_device_train_batch_size | Per gpu batch size used for training. The effective training batch size is _per_device_train_batch_size_ * _num_gpus_ * _num_nodes_. | integer | 1 | True | NA |
 | per_device_eval_batch_size | Per gpu batch size used for validation. The default value is 1. The effective validation batch size is _per_device_eval_batch_size_ * _num_gpus_ * _num_nodes_. | integer | 1 | True | NA |
 | auto_find_batch_size | If set to "true" and if the provided 'per_device_train_batch_size' goes into Out Of Memory (OOM) auto_find_batch_size will find the correct batch size by iteratively reducing batch size by a factor of 2 till the OOM is fixed | string | false | True | ['true', 'false'] |
-| optim | Optimizer to be used while training | string | adamw_hf | True | ['adamw_hf', 'adamw_torch', 'adafactor'] |
+| optim | Optimizer to be used while training | string | adamw_torch | True | ['adamw_torch', 'adafactor'] |
 | learning_rate | Start learning rate used for training. | number | 2e-05 | True | NA |
 | warmup_steps | Number of steps for the learning rate scheduler warmup phase. | integer | 0 | True | NA |
 | weight_decay | Weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer | number | 0.0 | True | NA |

assets/training/finetune_acft_hf_nlp/components/finetune/question_answering/spec.yaml

Lines changed: 4 additions & 6 deletions
@@ -1,14 +1,14 @@
 $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
 name: question_answering_finetune
-version: 0.0.73
+version: 0.0.74
 type: command

 is_deterministic: true

 display_name: Question Answering Finetune
 description: Component to finetune Hugging Face pretrained models for extractive question answering task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See [docs](https://aka.ms/azureml/components/question_answering_finetune) to learn more.

-environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/87
+environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/94

 code: ../../../src/finetune

@@ -93,12 +93,10 @@ inputs:

   optim:
     type: string
-    default: adamw_hf
+    default: adamw_torch
     optional: true
     enum:
-    - adamw_hf
-    - adamw_torch
-    # - adamw_apex_fused
+    - adamw_torch # - adamw_apex_fused
     - adafactor
     description: Optimizer to be used while training

assets/training/finetune_acft_hf_nlp/components/finetune/summarization/README.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Training parameters
 | per_device_train_batch_size | Per gpu batch size used for training. The effective training batch size is _per_device_train_batch_size_ * _num_gpus_ * _num_nodes_. | integer | 1 | True | NA |
 | per_device_eval_batch_size | Per gpu batch size used for validation. The default value is 1. The effective validation batch size is _per_device_eval_batch_size_ * _num_gpus_ * _num_nodes_. | integer | 1 | True | NA |
 | auto_find_batch_size | If set to "true" and if the provided 'per_device_train_batch_size' goes into Out Of Memory (OOM) auto_find_batch_size will find the correct batch size by iteratively reducing batch size by a factor of 2 till the OOM is fixed | string | false | True | ['true', 'false'] |
-| optim | Optimizer to be used while training | string | adamw_hf | True | ['adamw_hf', 'adamw_torch', 'adafactor'] |
+| optim | Optimizer to be used while training | string | adamw_torch | True | ['adamw_torch', 'adafactor'] |
 | learning_rate | Start learning rate used for training. | number | 2e-05 | True | NA |
 | warmup_steps | Number of steps for the learning rate scheduler warmup phase. | integer | 0 | True | NA |
 | weight_decay | Weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer | number | 0.0 | True | NA |

assets/training/finetune_acft_hf_nlp/components/finetune/summarization/spec.yaml

Lines changed: 4 additions & 6 deletions
@@ -1,14 +1,14 @@
 $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
 name: summarization_finetune
-version: 0.0.73
+version: 0.0.74
 type: command

 is_deterministic: true

 display_name: Summarization Finetune
 description: Component to finetune Hugging Face pretrained models for summarization task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See [docs](https://aka.ms/azureml/components/summarization_finetune) to learn more.

-environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/87
+environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/94

 code: ../../../src/finetune

@@ -93,12 +93,10 @@ inputs:

   optim:
     type: string
-    default: adamw_hf
+    default: adamw_torch
     optional: true
     enum:
-    - adamw_hf
-    - adamw_torch
-    # - adamw_apex_fused
+    - adamw_torch # - adamw_apex_fused
     - adafactor
     description: Optimizer to be used while training
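
Each `spec.yaml` in this commit pins its environment through a registry URI whose last segment is the version (87 before, 94 after). As a rough sketch, a reviewer could parse those URIs to confirm that a change only moves the version forward. The regular expression and both function names below are assumptions made for illustration, inferred from the URIs shown in this diff rather than from any Azure ML SDK API.

```python
import re
from typing import Tuple

# Pattern inferred from the URIs in this diff; not an official grammar.
ENV_URI = re.compile(
    r"^azureml://registries/(?P<registry>[^/]+)"
    r"/environments/(?P<name>[^/]+)"
    r"/versions/(?P<version>\d+)$"
)


def parse_environment(uri: str) -> Tuple[str, str, int]:
    """Split an environment URI into (registry, name, version)."""
    match = ENV_URI.match(uri)
    if match is None:
        raise ValueError(f"unrecognized environment URI: {uri!r}")
    return match["registry"], match["name"], int(match["version"])


def is_version_bump(old_uri: str, new_uri: str) -> bool:
    """True when only the version changed and it increased."""
    old_registry, old_name, old_version = parse_environment(old_uri)
    new_registry, new_name, new_version = parse_environment(new_uri)
    same_target = (old_registry, old_name) == (new_registry, new_name)
    return same_target and new_version > old_version


if __name__ == "__main__":
    base = "azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/"
    print(parse_environment(base + "94"))
    print(is_version_bump(base + "87", base + "94"))
```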

assets/training/finetune_acft_hf_nlp/components/finetune/text_classification/README.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Training parameters
 | per_device_train_batch_size | Per gpu batch size used for training. The effective training batch size is _per_device_train_batch_size_ * _num_gpus_ * _num_nodes_. | integer | 1 | True | NA |
 | per_device_eval_batch_size | Per gpu batch size used for validation. The default value is 1. The effective validation batch size is _per_device_eval_batch_size_ * _num_gpus_ * _num_nodes_. | integer | 1 | True | NA |
 | auto_find_batch_size | If set to "true" and if the provided 'per_device_train_batch_size' goes into Out Of Memory (OOM) auto_find_batch_size will find the correct batch size by iteratively reducing batch size by a factor of 2 till the OOM is fixed | string | false | True | ['true', 'false'] |
-| optim | Optimizer to be used while training | string | adamw_hf | True | ['adamw_hf', 'adamw_torch', 'adafactor'] |
+| optim | Optimizer to be used while training | string | adamw_torch | True | ['adamw_torch', 'adafactor'] |
 | learning_rate | Start learning rate used for training. | number | 2e-05 | True | NA |
 | warmup_steps | Number of steps for the learning rate scheduler warmup phase | integer | 0 | True | NA |
 | weight_decay | Weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer | number | 0.0 | True | NA |

assets/training/finetune_acft_hf_nlp/components/finetune/text_classification/spec.yaml

Lines changed: 4 additions & 6 deletions
@@ -1,14 +1,14 @@
 $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
 name: text_classification_finetune
-version: 0.0.73
+version: 0.0.74
 type: command

 is_deterministic: false

 display_name: Text Classification Finetune
 description: Component to finetune Hugging Face pretrained models for text classification task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See [docs](https://aka.ms/azureml/components/text_classification_finetune) to learn more.

-environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/87
+environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/94

 code: ../../../src/finetune

@@ -93,12 +93,10 @@ inputs:

   optim:
     type: string
-    default: adamw_hf
+    default: adamw_torch
     optional: true
     enum:
-    - adamw_hf
-    - adamw_torch
-    # - adamw_apex_fused
+    - adamw_torch # - adamw_apex_fused
     - adafactor
     description: Optimizer to be used while training
