feat: Qwen2.5-omni-7b full modal speech recognition #3870

Merged
zhanweizhang7 merged 1 commit into v2 from pr@v2@feat_qwen2.5_omni_7b_full_modal_speech_recognition
Aug 18, 2025

@shaohuzhang1
Contributor

feat: Qwen2.5-omni-7b full modal speech recognition

@f2c-ci-robot

f2c-ci-robot bot commented Aug 18, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@f2c-ci-robot

f2c-ci-robot bot commented Aug 18, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ModelTypeConst.STT, aliyun_bai_lian_omi_stt_model_credential, AliyunBaiLianOmiSpeechToText),
]

module_info_vl_list = [

Review and Recommendations

  1. Code Fragment: The code contains multiple changes across modules:

    • Changed modelsProvider.modelsImpl.aliyunBaiLian.credential.omi_stt to modelsProvider.modelsImpl.aliyunBaiLian.credential.omni_stt.

    • Replaced occurrences of _model_credential with _modelcredential.

    • Added two new model entries:

      ModelInfo('qwen2.5-omni-7b', ...
  2. Import Statements:

    • Consistent change from QwenVLModelCredential to QwenVLChatModel.
    • Updated all related classes and models.
  3. Comments:

    • No significant comment modifications noted.

Overall Conclusion

  • The provided changes appear consistent with the overall structure and functionality of the API.

No major technical issues were identified in this snippet. However, consider adding comments above each entry (like ModelInfo) explaining their purpose, given that they might not always be immediately self-explanatory without context.

Additional Advice for Quality Assurance:

For robustness before release, consider these enhancements:

  1. Validation Checks: Add checks to ensure that credential and model configurations match expected types before usage.
  2. Edge Case Testing: Cover scenarios where user inputs may differ slightly to identify potential bugs early.
  3. Testing Frameworks: Integrate existing testing frameworks to automate repeated unit, integration, and end-to-end tests after making these updates.
  4. Performance Monitoring: After deployment, set up performance monitoring tools to catch regressions quickly when updating similar APIs.
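The validation check in item 1 could look roughly like the sketch below. It is only an illustration: `validate_credential` and `REQUIRED_CREDENTIAL_FIELDS` are hypothetical names, and the real credential schema is not shown in this PR, so the single `api_key` field is an assumption taken from the `model_credential.get('api_key')` call in the diff.

```python
from typing import Any, Dict

# Hypothetical required fields; the real credential schema is not shown in this PR.
REQUIRED_CREDENTIAL_FIELDS = {"api_key": str}


def validate_credential(model_credential: Dict[str, Any]) -> None:
    """Raise ValueError if a required field is missing, empty, or of the wrong type."""
    for field, expected_type in REQUIRED_CREDENTIAL_FIELDS.items():
        value = model_credential.get(field)
        if value is None or value == "":
            raise ValueError(f"missing credential field: {field}")
        if not isinstance(value, expected_type):
            raise ValueError(
                f"credential field {field!r} must be {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
```

Calling such a check before constructing the model would surface a bad configuration as a clear error rather than as a failed request later on.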

Feel free to add more detailed feedback or ask about specific areas if needed!

optional_params['streaming'] = True
return AliyunBaiLianSpeechToText(
model=model_name,
api_key=model_credential.get('api_key'),

The provided code snippet has some issues:

  1. Optional Parameters Handling: The optional_params dictionary uses the same key 'max_tokens', which can lead to overwriting values in case multiple sources provide this parameter.

  2. Qwen Model Streaming Support: The code assumes that streaming should always be enabled for the 'qwen-omni-turbo' model. If model_kwargs['streaming'] is also set elsewhere, this line still forces streaming to True and silently overrides that value.

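  3. Dictionary Usage: Dict[str, object] is already flexible. Switching to Dict[str, Any] does not tighten type checking; Any opts the values out of it, which makes them easier to pass around at the cost of safety. Declaring precise key and value types would need something like a TypedDict.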

  4. Comments Clarity: The comments describe what each part of the code does; however, ensuring they are accurate with respect to actual behavior could make the function easier to reason about.

Here are the suggested improvements:

from typing import Any, Dict

# AliyunBaiLianSpeechToText is the project's own class and must be in scope here.
def new_instance(
    model_type: str,
    model_name: str,
    model_credential: Dict[str, Any],
    **model_kwargs
) -> AliyunBaiLianSpeechToText:
    optional_params = {
        "max_tokens": 1000,  # Default value for max tokens, adjust as needed based on service documentation
        "temperature": 0.7   # Default temperature for generation quality, adjust as needed
    }
    
    # Update optional parameters from kwargs if present and not None
    if 'max_tokens' in model_kwargs and model_kwargs['max_tokens'] is not None:
        optional_params['max_tokens'] = model_kwargs['max_tokens']
    if 'temperature' in model_kwargs and model_kwargs['temperature'] is not None:
        optional_params['temperature'] = model_kwargs['temperature']
    
    # Enable streaming explicitly for Qwen-omni-turbo if necessary
    if model_name.lower() == 'qwen-omni-turbo':
        optional_params['streaming'] = True
    
    return AliyunBaiLianSpeechToText(
        model=model_name,
        api_key=model_credential.get('api_key'),
        **optional_params
    )

Key Changes:

  • Used Dict[str, Any] for model_credential so that credential values of any type are accepted.
  • Collected the default values in optional_params and overrode them only with non-None values from model_kwargs.
  • Explicitly enabled streaming only when the model name matches 'qwen-omni-turbo', compared case-insensitively via lower().
  • Improved commenting to reflect the current behavior better.
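One way to address point 2 (streaming being forced even when the caller set it) is sketched below. `build_optional_params` and `STREAMING_BY_DEFAULT` are hypothetical names introduced for illustration, not part of the PR. The sketch assumes the service accepts a caller-supplied `streaming` value; if the service requires streaming for this model, forcing it to True as the original code does may be the correct behavior.

```python
from typing import Any, Dict

# Hypothetical set of models that stream by default; only
# 'qwen-omni-turbo' is named in the review above.
STREAMING_BY_DEFAULT = {"qwen-omni-turbo"}


def build_optional_params(model_name: str, **model_kwargs: Any) -> Dict[str, Any]:
    """Collect non-None overrides, then default streaming without clobbering the caller."""
    params: Dict[str, Any] = {
        key: model_kwargs[key]
        for key in ("max_tokens", "temperature", "streaming")
        if model_kwargs.get(key) is not None
    }
    if model_name.lower() in STREAMING_BY_DEFAULT:
        # setdefault keeps an explicit streaming=False passed by the caller.
        params.setdefault("streaming", True)
    return params
```

With this shape, `build_optional_params("qwen-omni-turbo")` turns streaming on, while an explicit `streaming=False` from the caller is left untouched.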

@zhanweizhang7 zhanweizhang7 merged commit b32b063 into v2 Aug 18, 2025
3 of 6 checks passed
@zhanweizhang7 zhanweizhang7 deleted the pr@v2@feat_qwen2.5_omni_7b_full_modal_speech_recognition branch August 18, 2025 03:05