Azure · visahan-24 · May 7, 2025
@@ -41,7 +41,7 @@ Training parameters
 | per_device_train_batch_size | Per gpu batch size used for training. The effective training batch size is _per_device_train_batch_size_ * _num_gpus_ * _num_nodes_.                                                                                                                                                  | integer | 1        | True     | NA                                                                                             |
 | per_device_eval_batch_size  | Per gpu batch size used for validation. The default value is 1. The effective validation batch size is _per_device_eval_batch_size_ * _num_gpus_ * _num_nodes_.                                                                                                                       | integer | 1        | True     | NA                                                                                             |
 | auto_find_batch_size        | If set to "true" and if the provided 'per_device_train_batch_size' goes into Out Of Memory (OOM) auto_find_batch_size will find the correct batch size by iteratively reducing batch size by a factor of 2 till the OOM is fixed                                                      | string  | false    | True     | ['true', 'false']                                                                              |
-| optim                       | Optimizer to be used while training                                                                                                                                                                                                                                                   | string  | adamw_hf | True     | ['adamw_hf', 'adamw_torch', 'adafactor']                                                       |
+| optim                       | Optimizer to be used while training                                                                                                                                                                                                                                                   | string  | adamw_torch | True     | ['adamw_torch', 'adafactor']                                                       |
 | learning_rate               | Start learning rate used for training.                                                                                                                                                                                                                                                | number  | 2e-05    | True     | NA                                                                                             |
 | warmup_steps                | Number of steps for the learning rate scheduler warmup phase.                                                                                                                                                                                                                         | integer | 0        | True     | NA                                                                                             |
 | weight_decay                | Weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer                                                                                                                                                                            | number  | 0.0      | True     | NA                                                                                             |

@@ -1,14 +1,14 @@
 $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
 name: chat_completion_finetune
-version: 0.0.73
+version: 0.0.74
 type: command
 
 is_deterministic: true
 
 display_name: Chat Completion Finetune
 description: Component to finetune Hugging Face pretrained models for chat completion task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See [docs](https://aka.ms/azureml/components/chat_completion_finetune) to learn more.
 
-environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/87
+environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/94
 
 code: ../../../src/finetune
 
@@ -93,12 +93,10 @@ inputs:
 
   optim:
     type: string
-    default: adamw_hf
+    default: adamw_torch
     optional: true
     enum:
-    - adamw_hf
-    - adamw_torch
-      # - adamw_apex_fused
+    - adamw_torch      # - adamw_apex_fused
     - adafactor
     description: Optimizer to be used while training
 

@@ -41,7 +41,7 @@ Training parameters
 | per_device_train_batch_size | Per gpu batch size used for training. The effective training batch size is _per_device_train_batch_size_ * _num_gpus_ * _num_nodes_.                                                                                                                                                  | integer | 1        | True     | NA                                                                                             |
 | per_device_eval_batch_size  | Per gpu batch size used for validation. The default value is 1. The effective validation batch size is _per_device_eval_batch_size_ * _num_gpus_ * _num_nodes_.                                                                                                                       | integer | 1        | True     | NA                                                                                             |
 | auto_find_batch_size        | If set to "true" and if the provided 'per_device_train_batch_size' goes into Out Of Memory (OOM) auto_find_batch_size will find the correct batch size by iteratively reducing batch size by a factor of 2 till the OOM is fixed                                                      | string  | false    | True     | ['true', 'false']                                                                              |
-| optim                       | Optimizer to be used while training                                                                                                                                                                                                                                                   | string  | adamw_hf | True     | ['adamw_hf', 'adamw_torch', 'adafactor']                                                       |
+| optim                       | Optimizer to be used while training                                                                                                                                                                                                                                                   | string  | adamw_torch | True     | ['adamw_torch', 'adafactor']                                                       |
 | learning_rate               | Start learning rate used for training.                                                                                                                                                                                                                                                | number  | 2e-05    | True     | NA                                                                                             |
 | warmup_steps                | Number of steps for the learning rate scheduler warmup phase.                                                                                                                                                                                                                         | integer | 0        | True     | NA                                                                                             |
 | weight_decay                | Weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer                                                                                                                                                                            | number  | 0.0      | True     | NA                                                                                             |

@@ -1,14 +1,14 @@
 $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
 name: question_answering_finetune
-version: 0.0.73
+version: 0.0.74
 type: command
 
 is_deterministic: true
 
 display_name: Question Answering Finetune
 description: Component to finetune Hugging Face pretrained models for extractive question answering task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See [docs](https://aka.ms/azureml/components/question_answering_finetune) to learn more.
 
-environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/87
+environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/94
 
 code: ../../../src/finetune
 
@@ -93,12 +93,10 @@ inputs:
 
   optim:
     type: string
-    default: adamw_hf
+    default: adamw_torch
     optional: true
     enum:
-    - adamw_hf
-    - adamw_torch
-      # - adamw_apex_fused
+    - adamw_torch      # - adamw_apex_fused
     - adafactor
     description: Optimizer to be used while training
 

@@ -41,7 +41,7 @@ Training parameters
 | per_device_train_batch_size | Per gpu batch size used for training. The effective training batch size is _per_device_train_batch_size_ * _num_gpus_ * _num_nodes_.                                                                                                                                                  | integer | 1        | True     | NA                                                                                             |
 | per_device_eval_batch_size  | Per gpu batch size used for validation. The default value is 1. The effective validation batch size is _per_device_eval_batch_size_ * _num_gpus_ * _num_nodes_.                                                                                                                       | integer | 1        | True     | NA                                                                                             |
 | auto_find_batch_size        | If set to "true" and if the provided 'per_device_train_batch_size' goes into Out Of Memory (OOM) auto_find_batch_size will find the correct batch size by iteratively reducing batch size by a factor of 2 till the OOM is fixed                                                      | string  | false    | True     | ['true', 'false']                                                                              |
-| optim                       | Optimizer to be used while training                                                                                                                                                                                                                                                   | string  | adamw_hf | True     | ['adamw_hf', 'adamw_torch', 'adafactor']                                                       |
+| optim                       | Optimizer to be used while training                                                                                                                                                                                                                                                   | string  | adamw_torch | True     | ['adamw_torch', 'adafactor']                                                       |
 | learning_rate               | Start learning rate used for training.                                                                                                                                                                                                                                                | number  | 2e-05    | True     | NA                                                                                             |
 | warmup_steps                | Number of steps for the learning rate scheduler warmup phase.                                                                                                                                                                                                                         | integer | 0        | True     | NA                                                                                             |
 | weight_decay                | Weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer                                                                                                                                                                            | number  | 0.0      | True     | NA                                                                                             |

@@ -1,14 +1,14 @@
 $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
 name: summarization_finetune
-version: 0.0.73
+version: 0.0.74
 type: command
 
 is_deterministic: true
 
 display_name: Summarization Finetune
 description: Component to finetune Hugging Face pretrained models for summarization task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See [docs](https://aka.ms/azureml/components/summarization_finetune) to learn more.
 
-environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/87
+environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/94
 
 code: ../../../src/finetune
 
@@ -93,12 +93,10 @@ inputs:
 
   optim:
     type: string
-    default: adamw_hf
+    default: adamw_torch
     optional: true
     enum:
-    - adamw_hf
-    - adamw_torch
-      # - adamw_apex_fused
+    - adamw_torch      # - adamw_apex_fused
     - adafactor
     description: Optimizer to be used while training
 

@@ -41,7 +41,7 @@ Training parameters
 | per_device_train_batch_size | Per gpu batch size used for training. The effective training batch size is _per_device_train_batch_size_ * _num_gpus_ * _num_nodes_.                                                                                                                                                  | integer | 1        | True     | NA                                                                                             |
 | per_device_eval_batch_size  | Per gpu batch size used for validation. The default value is 1. The effective validation batch size is _per_device_eval_batch_size_ * _num_gpus_ * _num_nodes_.                                                                                                                       | integer | 1        | True     | NA                                                                                             |
 | auto_find_batch_size        | If set to "true" and if the provided 'per_device_train_batch_size' goes into Out Of Memory (OOM) auto_find_batch_size will find the correct batch size by iteratively reducing batch size by a factor of 2 till the OOM is fixed                                                      | string  | false    | True     | ['true', 'false']                                                                              |
-| optim                       | Optimizer to be used while training                                                                                                                                                                                                                                                   | string  | adamw_hf | True     | ['adamw_hf', 'adamw_torch', 'adafactor']                                                       |
+| optim                       | Optimizer to be used while training                                                                                                                                                                                                                                                   | string  | adamw_torch | True     | ['adamw_torch', 'adafactor']                                                       |
 | learning_rate               | Start learning rate used for training.                                                                                                                                                                                                                                                | number  | 2e-05    | True     | NA                                                                                             |
 | warmup_steps                | Number of steps for the learning rate scheduler warmup phase                                                                                                                                                                                                                          | integer | 0        | True     | NA                                                                                             |
 | weight_decay                | Weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in AdamW optimizer                                                                                                                                                                            | number  | 0.0      | True     | NA                                                                                             |

@@ -1,14 +1,14 @@
 $schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
 name: text_classification_finetune
-version: 0.0.73
+version: 0.0.74
 type: command
 
 is_deterministic: false
 
 display_name: Text Classification Finetune
 description: Component to finetune Hugging Face pretrained models for text classification task. The component supports optimizations such as LoRA, Deepspeed and ONNXRuntime for performance enhancement. See [docs](https://aka.ms/azureml/components/text_classification_finetune) to learn more.
 
-environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/87
+environment: azureml://registries/azureml/environments/acft-hf-nlp-gpu/versions/94
 
 code: ../../../src/finetune
 
@@ -93,12 +93,10 @@ inputs:
 
   optim:
     type: string
-    default: adamw_hf
+    default: adamw_torch
     optional: true
     enum:
-    - adamw_hf
-    - adamw_torch
-      # - adamw_apex_fused
+    - adamw_torch      # - adamw_apex_fused
     - adafactor
     description: Optimizer to be used while training
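
Pipelines that still pass `optim: adamw_hf` will no longer match the enum after this change. As a rough sketch only, a caller could normalize the value before submitting a job. The helper name `normalize_optim` and the choice to map `adamw_hf` to `adamw_torch` are assumptions made for illustration and are not part of these components. The two AdamW implementations are not guaranteed to behave identically, so results may differ slightly.

```python
# Hypothetical helper, not part of the finetune components.
# Values accepted by the updated `optim` enum in this diff.
ALLOWED_OPTIM = ("adamw_torch", "adafactor")

# Assumed replacement for the removed value; adjust if another optimizer is preferred.
LEGACY_OPTIM = {"adamw_hf": "adamw_torch"}


def normalize_optim(value=None):
    """Return an `optim` value accepted by the updated enum."""
    if value is None:
        # New default from the diff.
        return "adamw_torch"
    value = LEGACY_OPTIM.get(value, value)
    if value not in ALLOWED_OPTIM:
        raise ValueError(
            f"Unsupported optim {value!r}; expected one of {ALLOWED_OPTIM}"
        )
    return value
```

Calling `normalize_optim("adamw_hf")` returns `"adamw_torch"`, while a value outside the enum raises `ValueError` before the job is submitted.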