fix: no such method error in lance arrow util due to transitive json4s usage #465
Merged
hamersaw merged 4 commits into lance-format:main on Apr 27, 2026
Conversation
hamersaw (Collaborator) approved these changes on Apr 27, 2026:

Looks great, thanks for the fix!
Problem
Three call sites in `lance-spark` use `org.json4s` in ways that break when the library is included in a shaded fat-JAR with a json4s relocation rule.

1. `LanceArrowUtils.toArrowField` — `Metadata.jsonValue()` (write path)

`Metadata.jsonValue()` is a public method on `spark-sql-api`'s `Metadata`; its return type is `org.json4s.JsonAST.JValue`. The original code called this and immediately chained json4s operations on the result:
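A minimal sketch of that pattern (illustrative names and logic, not the actual lance-spark source):

```scala
import org.apache.spark.sql.types.StructField
import org.json4s.JsonAST.{JObject, JString}

// Illustrative sketch: the JValue returned by the Spark API (unshaded
// spark-sql-api.jar) is immediately consumed by json4s pattern matches
// compiled into the shaded fat-JAR. After relocation the two halves of
// the expression disagree on the json4s package name.
def arrowMetadata(field: StructField): Map[String, String] =
  field.metadata.jsonValue match {
    case JObject(entries) => entries.collect { case (k, JString(v)) => k -> v }.toMap
    case _                => Map.empty
  }
```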
2. `LanceArrowUtils.fromArrowSchema` — `Metadata.fromJObject(JObject(...))` (read path)

`Metadata.fromJObject` accepts `org.json4s.JsonAST.JObject`. The original code constructed a `JObject` and passed it directly.

Root cause (problems 1 & 2): the Shade Plugin rewrites every bytecode descriptor in every class it processes, including those from `lance-spark-bundle`, replacing `org/json4s/` with the target namespace (e.g. `shaded/org/json4s/`). It cannot rewrite `spark-sql-api.jar`, because that JAR is on the cluster classpath and is not processed by the plugin. After shading, call site 1 becomes:
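Sketched in descriptor terms (the `shaded/` prefix is an example relocation target, not the consumer's actual configuration):

```
// Caller bytecode inside the shaded fat-JAR, after relocation:
invokevirtual org/apache/spark/sql/types/Metadata.jsonValue
    :()Lshaded/org/json4s/JsonAST$JValue;

// Method actually present in spark-sql-api.jar on the cluster classpath:
jsonValue:()Lorg/json4s/JsonAST$JValue;
```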
At runtime the JVM resolves `Metadata.jsonValue()` from `spark-sql-api.jar`, whose actual descriptor is `()Lorg/json4s/JsonAST$JValue;`. The descriptors do not match → `NoSuchMethodError`.

Call site 2 fails identically in the other direction: the Shade Plugin rewrites the `JObject` argument descriptor to `shaded/org/json4s/JsonAST$JObject`, while the Spark API method still expects `org/json4s/JsonAST$JObject`.

Both failures are unconditional once the conditions are met.
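A sketch of the read-path call site from problem 2 (illustrative names, assuming Arrow's `Field.getMetadata` map; not the actual source):

```scala
import scala.jdk.CollectionConverters._
import org.apache.arrow.vector.types.pojo.Field
import org.apache.spark.sql.types.Metadata
import org.json4s.JsonAST.{JObject, JString}

// Illustrative sketch: the JObject is constructed by shaded (relocated)
// json4s classes, but Metadata.fromJObject in the unshaded
// spark-sql-api.jar expects org.json4s.JsonAST.JObject, so the argument
// descriptor no longer matches after relocation.
def sparkMetadata(arrowField: Field): Metadata = {
  val entries = arrowField.getMetadata.asScala.toList.map {
    case (k, v) => k -> JString(v)
  }
  Metadata.fromJObject(JObject(entries))
}
```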
`field.metadata` in Spark defaults to `Metadata.empty` (never null), so both code paths are always executed.

Conditions required to reproduce (problems 1 & 2):

- Spark 3.4+ (`spark-sql-api` as a standalone JAR was introduced in 3.4; before 3.4, `Metadata` lived in `spark-sql` and the relevant methods were not part of a public API JAR)
- the consumer shades `lance-spark-bundle` (or includes it in a fat-JAR execution) and relocates `org.json4s`

3. `IndexUtils.toJson` — `json4s-jackson` not available on OSS Spark

`IndexUtils.toJson` (called from `FragmentBasedIndexJob.run` to serialize index-creation arguments) imports `org.json4s.jackson.JsonMethods.{compact, render}`. `json4s-jackson` is not a declared dependency of `lance-spark`, and it is absent from OSS Spark's classpath. When a consumer shades `lance-spark-bundle` and includes `org.json4s:*`, they typically pull in `json4s-core` and/or `json4s-native` but not `json4s-jackson`, leaving the shaded references to `shaded/org/json4s/jackson/JsonMethods$` unresolvable at runtime.

The failure is intermittent: `toJson` short-circuits to `"{}"` when `args` is empty (e.g. `CREATE INDEX` without options), so it is only triggered by index-creation calls that pass options (e.g. FTS indexes with tokenizer configuration).

Why the Fix Must Be in the Library
The only consumer-side approach for problems 1 & 2 is to exclude `LanceArrowUtils` from the json4s relocation rule so that its descriptors remain unshaded. This relies on knowledge of the library's internal implementation, breaks silently if the class is moved or renamed, and would need to be re-evaluated for every library release. It is not a supportable expectation for library users.

More fundamentally, the root cause of problems 1 & 2 is a design choice within `LanceArrowUtils`: it calls Spark API methods that carry json4s types in their signatures and uses the results with internal json4s operations in the same expression. No external configuration can safely separate these two requirements.

For problem 3, the fix must be in the library because `json4s-jackson` is not a guaranteed transitive dependency in all Spark distributions and is absent from OSS Spark.

Fix
All three replacements use `com.fasterxml.jackson.databind.ObjectMapper`, which is available on every Spark cluster classpath and carries no json4s type in any method descriptor.

Problem 1 (`toArrowField`): `mapper.writeValueAsString(field.getMetadata)` serialises a `java.util.Map[String, String]` to a JSON object string; `Metadata.fromJson` parses it back, the same round-trip as the json4s version.

Problem 2 (`fromArrowSchema`): `Metadata.json` returns the same JSON string that `Metadata.jsonValue` serialises to internally; `mapper.readValue` deserialises it to a map, the same result.

Problem 3 (`IndexUtils.toJson`): an `ObjectNode` is built field by field and serialised with `mapper.writeValueAsString`, producing semantically identical output. The `ObjectMapper` instance is held as a private singleton on the `IndexUtils` object; `ObjectMapper` is thread-safe for serialisation once constructed.

No json4s type appears in any descriptor after these changes. The fixes are independent of the consuming application's shading configuration.
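Taken together, the replacement shapes look roughly like the following sketch (the `JsonHelpers` object name and method signatures are illustrative, not the exact patch):

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.databind.node.ObjectNode
import org.apache.spark.sql.types.Metadata

object JsonHelpers {
  // Single shared instance; ObjectMapper is thread-safe for
  // serialisation once constructed.
  private val mapper = new ObjectMapper()

  // Problem 1: serialise the Arrow metadata map with Jackson and let
  // Spark parse the JSON string. No json4s type in any descriptor.
  def toSparkMetadata(arrowMeta: java.util.Map[String, String]): Metadata =
    Metadata.fromJson(mapper.writeValueAsString(arrowMeta))

  // Problem 2: Metadata.json yields the JSON string that jsonValue
  // serialises to internally; Jackson turns it back into a plain map.
  def toArrowMetadata(meta: Metadata): java.util.Map[String, String] =
    mapper.readValue(meta.json, classOf[java.util.Map[String, String]])

  // Problem 3: build the index-args payload with Jackson's tree model
  // instead of json4s-jackson's compact/render.
  def toJson(args: Map[String, String]): String =
    if (args.isEmpty) "{}"
    else {
      val node: ObjectNode = mapper.createObjectNode()
      args.foreach { case (k, v) => node.put(k, v) }
      mapper.writeValueAsString(node)
    }
}
```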