Skip to content

SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming"#4320

Draft
vyatkinv wants to merge 1 commit intoapache:mainfrom
vyatkinv:SOLR-18130-solrj-streaming-solr-cloud-parameter
Draft

SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming"#4320
vyatkinv wants to merge 1 commit intoapache:mainfrom
vyatkinv:SOLR-18130-solrj-streaming-solr-cloud-parameter

Conversation

@vyatkinv
Copy link
Copy Markdown
Contributor

@vyatkinv vyatkinv commented Apr 22, 2026

https://issues.apache.org/jira/browse/SOLR-18130

Description

This PR updates the solrj-streaming module by replacing all usages of zkHost with solrCloud to enable support for HTTP-based quorum configurations.

Solution

Parameters, fields and variables in solrj-streaming, named as zkHost renamed to solrCloud
For backward compatibility, specifying zkHost is still supported.
A shared method has been introduced in an abstract class to resolve the effective solrCloud value using a priority-based approach (e.g., explicit parameter → legacy zkHost → default Zookeeper host).

Tests

To Be Done

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide
  • I have added a changelog entry for my change

Planned follow-ups:

  • Perform more thorough testing of solrj-streaming with both Zookeeper-based and HTTP-based quorum configurations
  • Add unit tests for HTTP quorum support
  • Review and improve error messages and java-doc, where zookeeper still mentioned
  • Add documentation and changelog entry

Open Questions / Discussion Points

I would appreciate feedback on the following topics (also noted as TODOs in the code):

1. Mandatory solrCloud parameter

I made solrCloud required in all stream handlers (an error is thrown if it cannot be resolved from parameters or the default Zookeeper host).
This caused two tests to fail:

  • StreamExpressionTest#testCloudSolrStream

  • StreamExpressionTest#testStatsStream

These tests did not provide any valid host.
I fixed them by adding .withDefaultZkHost(cluster.getZkServer().getZkAddress())

Question:

  • Is this a test issue or a flaw in the implementation?

  • Can such a situation occur in real-world usage?

2. toExpression behavior and backward compatibility

Currently, toExpression reconstructs the expression and always includes solrCloud, even if the user originally provided the legacy zkHost parameter.
As a result, a user who runs an expression with zkHost and then calls explain() will see solrCloud in the output.

Question:

  • Should the original parameter name (zkHost) be preserved?
  • Or is it acceptable to normalize everything to solrCloud?

3. Potential issue with gather parameter
In:
org/apache/solr/client/solrj/io/graph/GatherNodesStream.java:392
there is the following line:
expression.addParameter(new StreamExpressionNamedParameter("gather", solrCloud));

Question:

  • Is this a bug?
  • Or is solrCloud (previously zkHost) actually intended to be passed as gather?

UPDATE: I looked into the context more deeply and yes, it looks like this is a defect that affects explain(). I fixed it as part of my pull request.

4. Inconsistent parameter handling
Some stream handlers use Map<String, String> for parameters, while others use ModifiableSolrParams.

Question:

Would it make sense to standardize on a single approach?

5. ScoreNodesStream limitation
ScoreNodesStream can only obtain zkHost / solrCloud via: factory.getDefaultZkHost();

Question:

  • Should this be improved as part of this PR?
  • Or would it be better to handle it in a separate issue?

@vyatkinv vyatkinv marked this pull request as draft April 22, 2026 11:15
@vyatkinv vyatkinv changed the title [WIP] SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming" WIP: SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming" Apr 22, 2026
@vyatkinv vyatkinv changed the title WIP: SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming" SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming" Apr 22, 2026
@vyatkinv vyatkinv force-pushed the SOLR-18130-solrj-streaming-solr-cloud-parameter branch from 27f4ae5 to c877f2a Compare April 22, 2026 13:36
@epugh
Copy link
Copy Markdown
Contributor

epugh commented Apr 22, 2026

Shouldn't solrCloud be connectionString? I was hoping down the road we could put more into the connection string like username password or other details that are needed....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants