[SPARK-57627][CONNECT] Support primary key, foreign key, and SQL type metadata in DatabaseMetaData

j1wonpark · pan3793 · commit e28edaf0480e · 2026-06-30T11:34:05.000+08:00
### What changes were proposed in this pull request?

This PR implements five `DatabaseMetaData` methods in the Spark Connect JDBC driver (`SparkConnectDatabaseMetaData`) that previously threw `SQLFeatureNotSupportedException`:

- `getPrimaryKeys`: returns an empty `ResultSet` with the JDBC-defined schema. Spark supports informational `PRIMARY KEY` constraints on DSv2 tables, but the Spark Connect JDBC client cannot yet retrieve them in a structured way, so "no primary keys" is represented as an empty result rather than an error.
- `getImportedKeys` / `getExportedKeys` / `getCrossReference`: return an empty `ResultSet` with the JDBC foreign-key schema, for the same reason. All three share a private `emptyForeignKeys` helper.
- `getTypeInfo`: returns a static catalog of the Spark SQL atomic types (12 rows), ordered by `DATA_TYPE`, mirroring the type-code/precision mapping already used by `JdbcTypeUtils`. `TIMESTAMP_NTZ` is omitted because it maps to the same JDBC type code (`Types.TIMESTAMP`) as `TIMESTAMP`; `TIME` is omitted for now because its maximum `PRECISION`/scale representation in `getTypeInfo` is not yet settled.

The result-set schemas (column names, order, and types) match the canonical definitions already used by Spark's Thrift server operations (`GetPrimaryKeysOperation`, `GetCrossReferenceOperation`, `GetTypeInfoOperation`), with one intentional correction: the `KEY_SEQ` column uses the JDBC-spec name `KEY_SEQ` rather than the `KEQ_SEQ` typo that the Thrift operations inherited from Hive.

`getFunctions` intentionally still throws and is out of scope for this PR.

### Why are the changes needed?

Returning an empty `ResultSet` (rather than throwing) for metadata that a driver does not support is the conventional JDBC behavior, and it matches what other engines in this ecosystem do: Trino returns an empty result set for `getPrimaryKeys`/`getImportedKeys`, and Hive does so for `getImportedKeys`.
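From the client's side, the two behaviors differ only in whether the caller has to anticipate an exception. A minimal sketch, assuming a client that probes key metadata through a `() => ResultSet` thunk: a driver that throws `SQLFeatureNotSupportedException` and one that returns an empty `ResultSet` both end up as "no keys", while any other `SQLException` still propagates. `KeyProbe`, `countKeys`, and `emptyResultSet` are hypothetical names for illustration only; they are not part of Spark or of any BI tool, and `emptyResultSet` is just a dynamic-proxy stand-in for a real driver result set.

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}
import java.sql.{ResultSet, SQLFeatureNotSupportedException}

// Hypothetical client-side helper, not part of Spark or any real BI tool.
object KeyProbe {
  // Counts the rows of a key-metadata result set. A driver that throws
  // SQLFeatureNotSupportedException is treated like one that returns an
  // empty result set ("no keys"); any other SQLException propagates.
  def countKeys(probe: () => ResultSet): Int =
    try {
      val rs = probe()
      try {
        var n = 0
        while (rs.next()) n += 1
        n
      } finally rs.close()
    } catch {
      case _: SQLFeatureNotSupportedException => 0
    }

  // Stand-in for a driver's empty result set: next() always returns false and
  // every other call returns null. That is enough for next() and close(), but
  // would fail for methods that return primitives (e.g. getInt).
  def emptyResultSet(): ResultSet =
    Proxy
      .newProxyInstance(
        classOf[ResultSet].getClassLoader,
        Array[Class[_]](classOf[ResultSet]),
        new InvocationHandler {
          def invoke(proxy: AnyRef, m: Method, args: Array[AnyRef]): AnyRef =
            if (m.getName == "next") java.lang.Boolean.FALSE else null
        })
      .asInstanceOf[ResultSet]
}
```

Against a live connection the probe would be called as `KeyProbe.countKeys(() => conn.getMetaData.getPrimaryKeys(null, null, "orders"))`, where `conn` and the table name `orders` are placeholders. With this PR, Spark Connect takes the empty-result path instead of the exception path, so clients that do not catch the exception no longer abort.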
Throwing `SQLFeatureNotSupportedException` breaks otherwise-recoverable client introspection. For example, BI tools that probe primary/foreign keys to infer table relationships abort the metadata step instead of degrading to "no keys."

`getTypeInfo` is the one method here for which Spark can return real data: its atomic types are statically known. Hive, Trino, and the Databricks JDBC driver all implement `getTypeInfo`; Spark Connect was the outlier in throwing.

### Does this PR introduce _any_ user-facing change?

Yes. Previously these five methods threw `SQLFeatureNotSupportedException`. After this change:

- `getPrimaryKeys`, `getImportedKeys`, `getExportedKeys`, and `getCrossReference` return an empty `ResultSet` with the JDBC-defined columns.
- `getTypeInfo` returns the catalog of Spark SQL atomic types.

This change affects only the unreleased branch; the Spark Connect JDBC driver has not been released yet.

### How was this patch tested?

Added in-process tests to `SparkConnectDatabaseMetaDataSuite`:

- `getPrimaryKeys` and `getImportedKeys`/`getExportedKeys`/`getCrossReference` assert the result-set column schema and that the result is empty.
- `getTypeInfo` asserts the column schema, the rows ordered by `DATA_TYPE`, that every type is nullable and searchable, that only `STRING` is case-sensitive, the per-type `LITERAL_PREFIX`/`LITERAL_SUFFIX` (including the `X'...'` hex syntax for `BINARY`), the per-type `PRECISION`, `CREATE_PARAMS`, and `MINIMUM_SCALE`/`MAXIMUM_SCALE` (including `DECIMAL`'s precision of 38 and scale range of 0 to 38), and the `NUM_PREC_RADIX` (10 for numeric types, NULL otherwise).

```
build/sbt 'connect-client-jdbc/testOnly *SparkConnectDatabaseMetaDataSuite'
```

All 10 tests pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

Closes #56688 from j1wonpark/SPARK-57627.

Authored-by: Jiwon Park <jpark92@outlook.kr>
Signed-off-by: Cheng Pan <chengpan@apache.org>
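The `DATA_TYPE` ordering asserted by the `getTypeInfo` test, and the reason `TIMESTAMP_NTZ` gets no row of its own, both follow from the numeric values of the `java.sql.Types` constants. The sketch below assumes only the JDK's `java.sql.Types`: it re-sorts the twelve (type name, type code) pairs from the PR's `TYPE_INFO` table and reports type codes shared by more than one name. `TypeInfoOrder` and its members are hypothetical names, not part of the driver.

```scala
import java.sql.Types

// Hypothetical illustration, not part of the Spark Connect JDBC driver.
object TypeInfoOrder {
  // (Spark type name, JDBC type code) pairs, as listed in the PR's TYPE_INFO.
  val rows: Seq[(String, Int)] = Seq(
    "BOOLEAN" -> Types.BOOLEAN,
    "TINYINT" -> Types.TINYINT,
    "SMALLINT" -> Types.SMALLINT,
    "INT" -> Types.INTEGER,
    "BIGINT" -> Types.BIGINT,
    "FLOAT" -> Types.FLOAT,
    "DOUBLE" -> Types.DOUBLE,
    "DECIMAL" -> Types.DECIMAL,
    "STRING" -> Types.VARCHAR,
    "BINARY" -> Types.VARBINARY,
    "DATE" -> Types.DATE,
    "TIMESTAMP" -> Types.TIMESTAMP)

  // Sorting by the numeric java.sql.Types value reproduces the row order of
  // getTypeInfo, whose DataFrame is ordered by DATA_TYPE.
  def sortedByDataType: Seq[String] = rows.sortBy(_._2).map(_._1)

  // JDBC type codes claimed by more than one type name once `extra` rows are
  // added; e.g. adding TIMESTAMP_NTZ (also Types.TIMESTAMP) collides.
  def collisions(extra: Seq[(String, Int)]): Map[Int, Seq[String]] =
    (rows ++ extra)
      .groupBy(_._2)
      .collect { case (code, named) if named.size > 1 => code -> named.map(_._1) }
      .toMap
}
```

For example, `TypeInfoOrder.sortedByDataType` starts with `TINYINT` (`Types.TINYINT` is -6) and `BIGINT` (-5), and `BINARY` (`Types.VARBINARY` is -3) sorts before `DECIMAL` (3), which is why the expected order in the test is not grouped by type family.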
diff --git a/sql/connect/client/jdbc/src/main/scala/org/apache/spark/sql/connect/client/jdbc/SparkConnectDatabaseMetaData.scala b/sql/connect/client/jdbc/src/main/scala/org/apache/spark/sql/connect/client/jdbc/SparkConnectDatabaseMetaData.scala
@@ -598,26 +598,94 @@ class SparkConnectDatabaseMetaData(conn: SparkConnectConnection) extends Databas
       catalog: String, schema: String, table: String): ResultSet =
     throw new SQLFeatureNotSupportedException
 
-  override def getPrimaryKeys(catalog: String, schema: String, table: String): ResultSet =
-    throw new SQLFeatureNotSupportedException
+  // Spark supports informational PRIMARY KEY constraints on DSv2 tables (SPARK-51207), but the
+  // Spark Connect JDBC client cannot retrieve them in a structured way yet, so return an empty
+  // result set instead of throwing. JDBC clients call this during schema introspection.
+  override def getPrimaryKeys(catalog: String, schema: String, table: String): ResultSet = {
+    conn.checkOpen()
 
-  override def getImportedKeys(catalog: String, schema: String, table: String): ResultSet =
-    throw new SQLFeatureNotSupportedException
+    val df = conn.spark.emptyDataFrame
+      .withColumn("TABLE_CAT", lit(""))
+      .withColumn("TABLE_SCHEM", lit(""))
+      .withColumn("TABLE_NAME", lit(""))
+      .withColumn("COLUMN_NAME", lit(""))
+      .withColumn("KEY_SEQ", lit(0.toShort))
+      .withColumn("PK_NAME", lit(""))
+    new SparkConnectResultSet(df.collectResult())
+  }
 
-  override def getExportedKeys(catalog: String, schema: String, table: String): ResultSet =
-    throw new SQLFeatureNotSupportedException
+  // getImportedKeys, getExportedKeys and getCrossReference share the JDBC foreign-key result
+  // schema. Spark supports informational FOREIGN KEY constraints on DSv2 tables (SPARK-51207),
+  // but the Spark Connect JDBC client cannot retrieve them in a structured way yet, so they all
+  // return an empty result set instead of throwing.
+  private def emptyForeignKeys: ResultSet = {
+    val df = conn.spark.emptyDataFrame
+      .withColumn("PKTABLE_CAT", lit(""))
+      .withColumn("PKTABLE_SCHEM", lit(""))
+      .withColumn("PKTABLE_NAME", lit(""))
+      .withColumn("PKCOLUMN_NAME", lit(""))
+      .withColumn("FKTABLE_CAT", lit(""))
+      .withColumn("FKTABLE_SCHEM", lit(""))
+      .withColumn("FKTABLE_NAME", lit(""))
+      .withColumn("FKCOLUMN_NAME", lit(""))
+      .withColumn("KEY_SEQ", lit(0.toShort))
+      .withColumn("UPDATE_RULE", lit(0.toShort))
+      .withColumn("DELETE_RULE", lit(0.toShort))
+      .withColumn("FK_NAME", lit(""))
+      .withColumn("PK_NAME", lit(""))
+      .withColumn("DEFERRABILITY", lit(0.toShort))
+    new SparkConnectResultSet(df.collectResult())
+  }
+
+  override def getImportedKeys(catalog: String, schema: String, table: String): ResultSet = {
+    conn.checkOpen()
+    emptyForeignKeys
+  }
+
+  override def getExportedKeys(catalog: String, schema: String, table: String): ResultSet = {
+    conn.checkOpen()
+    emptyForeignKeys
+  }
 
   override def getCrossReference(
       parentCatalog: String,
       parentSchema: String,
       parentTable: String,
       foreignCatalog: String,
       foreignSchema: String,
-      foreignTable: String): ResultSet =
-    throw new SQLFeatureNotSupportedException
+      foreignTable: String): ResultSet = {
+    conn.checkOpen()
+    emptyForeignKeys
+  }
 
-  override def getTypeInfo: ResultSet =
-    throw new SQLFeatureNotSupportedException
+  // Static catalog of the Spark SQL atomic types, so JDBC clients can resolve type
+  // information instead of failing on a thrown exception.
+  override def getTypeInfo: ResultSet = {
+    conn.checkOpen()
+
+    val df = TYPE_INFO
+      .toDF(
+        "TYPE_NAME",
+        "DATA_TYPE",
+        "PRECISION",
+        "LITERAL_PREFIX",
+        "LITERAL_SUFFIX",
+        "CREATE_PARAMS",
+        "NULLABLE",
+        "CASE_SENSITIVE",
+        "SEARCHABLE",
+        "UNSIGNED_ATTRIBUTE",
+        "FIXED_PREC_SCALE",
+        "AUTO_INCREMENT",
+        "LOCAL_TYPE_NAME",
+        "MINIMUM_SCALE",
+        "MAXIMUM_SCALE",
+        "SQL_DATA_TYPE",
+        "SQL_DATETIME_SUB",
+        "NUM_PREC_RADIX")
+      .orderBy("DATA_TYPE")
+    new SparkConnectResultSet(df.collectResult())
+  }
 
   override def getIndexInfo(
       catalog: String,
@@ -816,4 +884,83 @@ object SparkConnectDatabaseMetaData {
   )
 
   private[jdbc] val TABLE_TYPES = Seq("TABLE", "VIEW")
+
+  // One row of the java.sql.DatabaseMetaData.getTypeInfo result.
+  private[jdbc] type TypeInfoRow =
+    (
+        String,  // TYPE_NAME
+        Int,     // DATA_TYPE
+        Int,     // PRECISION
+        String,  // LITERAL_PREFIX
+        String,  // LITERAL_SUFFIX
+        String,  // CREATE_PARAMS
+        Short,   // NULLABLE
+        Boolean, // CASE_SENSITIVE
+        Short,   // SEARCHABLE
+        Boolean, // UNSIGNED_ATTRIBUTE
+        Boolean, // FIXED_PREC_SCALE
+        Boolean, // AUTO_INCREMENT
+        String,  // LOCAL_TYPE_NAME
+        Short,   // MINIMUM_SCALE
+        Short,   // MAXIMUM_SCALE
+        Int,     // SQL_DATA_TYPE
+        Int,     // SQL_DATETIME_SUB
+        Integer  // NUM_PREC_RADIX (null for non-numeric types)
+    )
+
+  // Fills the columns that are constant across all Spark atomic types: every type is
+  // nullable and searchable, and none are unsigned, fixed-prec-scale, or auto-increment.
+  // `literalPrefix` is also used as the literal suffix unless `literalSuffix` is given;
+  // BINARY is the exception, whose literals use the hex syntax X'...'.
+  private[jdbc] def typeRow(
+      typeName: String,
+      dataType: Int,
+      precision: Int,
+      literalPrefix: String,
+      createParams: String,
+      caseSensitive: Boolean,
+      minScale: Short,
+      maxScale: Short,
+      numPrecRadix: Integer,
+      literalSuffix: String = null): TypeInfoRow =
+    (
+      typeName,
+      dataType,
+      precision,
+      literalPrefix,
+      if (literalSuffix != null) literalSuffix else literalPrefix,
+      createParams,
+      typeNullable.toShort,
+      caseSensitive,
+      typeSearchable.toShort,
+      false,
+      false,
+      false,
+      null,
+      minScale,
+      maxScale,
+      0,
+      0,
+      numPrecRadix)
+
+  // Static JDBC type metadata for the Spark SQL atomic types, mirroring the
+  // JdbcTypeUtils type-code/precision mapping. Only STRING is case-sensitive.
+  // TIMESTAMP_NTZ is omitted because it maps to the same JDBC type code
+  // (Types.TIMESTAMP) as TIMESTAMP, so the TIMESTAMP row already covers it.
+  // TIME is omitted for now because its maximum PRECISION/scale representation
+  // in getTypeInfo is not yet settled.
+  private[jdbc] val TYPE_INFO: Seq[TypeInfoRow] = Seq(
+    typeRow("BOOLEAN", Types.BOOLEAN, 1, null, null, false, 0, 0, null),
+    typeRow("TINYINT", Types.TINYINT, 3, null, null, false, 0, 0, 10),
+    typeRow("SMALLINT", Types.SMALLINT, 5, null, null, false, 0, 0, 10),
+    typeRow("INT", Types.INTEGER, 10, null, null, false, 0, 0, 10),
+    typeRow("BIGINT", Types.BIGINT, 19, null, null, false, 0, 0, 10),
+    typeRow("FLOAT", Types.FLOAT, 7, null, null, false, 0, 0, 10),
+    typeRow("DOUBLE", Types.DOUBLE, 15, null, null, false, 0, 0, 10),
+    typeRow("DECIMAL", Types.DECIMAL, 38, null, "precision,scale", false, 0, 38, 10),
+    typeRow("STRING", Types.VARCHAR, Int.MaxValue, "'", null, true, 0, 0, null),
+    typeRow("BINARY", Types.VARBINARY, Int.MaxValue, "X'", null, false, 0, 0, null,
+      literalSuffix = "'"),
+    typeRow("DATE", Types.DATE, 10, "'", null, false, 0, 0, null),
+    typeRow("TIMESTAMP", Types.TIMESTAMP, 29, "'", null, false, 0, 6, null))
 }
diff --git a/sql/connect/client/jdbc/src/test/scala/org/apache/spark/sql/connect/client/jdbc/SparkConnectDatabaseMetaDataSuite.scala b/sql/connect/client/jdbc/src/test/scala/org/apache/spark/sql/connect/client/jdbc/SparkConnectDatabaseMetaDataSuite.scala
@@ -808,4 +808,150 @@ class SparkConnectDatabaseMetaDataSuite extends ConnectFunSuite with RemoteSpark
       }
     }
   }
+
+  test("SparkConnectDatabaseMetaData getPrimaryKeys") {
+    withConnection { conn =>
+      val metadata = conn.getMetaData
+      // Spark has no primary keys, so an empty result set with the JDBC schema is returned.
+      Using.resource(metadata.getPrimaryKeys(null, null, null)) { rs =>
+        val rsmd = rs.getMetaData
+        assert((1 to rsmd.getColumnCount).map(rsmd.getColumnName) === Seq(
+          "TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME", "COLUMN_NAME", "KEY_SEQ", "PK_NAME"))
+        assert(!rs.next())
+      }
+    }
+  }
+
+  test("SparkConnectDatabaseMetaData getImportedKeys, getExportedKeys and getCrossReference") {
+    withConnection { conn =>
+      val metadata = conn.getMetaData
+      val foreignKeySchema = Seq(
+        "PKTABLE_CAT", "PKTABLE_SCHEM", "PKTABLE_NAME", "PKCOLUMN_NAME",
+        "FKTABLE_CAT", "FKTABLE_SCHEM", "FKTABLE_NAME", "FKCOLUMN_NAME",
+        "KEY_SEQ", "UPDATE_RULE", "DELETE_RULE", "FK_NAME", "PK_NAME", "DEFERRABILITY")
+      // Spark has no foreign keys, so all three return an empty result set with the JDBC schema.
+      Seq(
+        () => metadata.getImportedKeys(null, null, null),
+        () => metadata.getExportedKeys(null, null, null),
+        () => metadata.getCrossReference(null, null, null, null, null, null))
+        .foreach { getForeignKeys =>
+        Using.resource(getForeignKeys()) { rs =>
+          val rsmd = rs.getMetaData
+          assert((1 to rsmd.getColumnCount).map(rsmd.getColumnName) === foreignKeySchema)
+          assert(!rs.next())
+        }
+      }
+    }
+  }
+
+  test("SparkConnectDatabaseMetaData getTypeInfo") {
+    withConnection { conn =>
+      val metadata = conn.getMetaData
+      Using.resource(metadata.getTypeInfo) { rs =>
+        val rsmd = rs.getMetaData
+        assert((1 to rsmd.getColumnCount).map(rsmd.getColumnName) === Seq(
+          "TYPE_NAME", "DATA_TYPE", "PRECISION", "LITERAL_PREFIX", "LITERAL_SUFFIX",
+          "CREATE_PARAMS", "NULLABLE", "CASE_SENSITIVE", "SEARCHABLE", "UNSIGNED_ATTRIBUTE",
+          "FIXED_PREC_SCALE", "AUTO_INCREMENT", "LOCAL_TYPE_NAME", "MINIMUM_SCALE",
+          "MAXIMUM_SCALE", "SQL_DATA_TYPE", "SQL_DATETIME_SUB", "NUM_PREC_RADIX"))
+
+        case class TypeInfo(
+            name: String,
+            dataType: Int,
+            precision: Int,
+            literalPrefix: String,
+            literalSuffix: String,
+            createParams: String,
+            caseSensitive: Boolean,
+            nullable: Short,
+            searchable: Short,
+            minScale: Short,
+            maxScale: Short,
+            numPrecRadix: Option[Int])
+        val types = new Iterator[TypeInfo] {
+          def hasNext: Boolean = rs.next()
+          def next(): TypeInfo = TypeInfo(
+            name = rs.getString("TYPE_NAME"),
+            dataType = rs.getInt("DATA_TYPE"),
+            precision = rs.getInt("PRECISION"),
+            literalPrefix = rs.getString("LITERAL_PREFIX"),
+            literalSuffix = rs.getString("LITERAL_SUFFIX"),
+            createParams = rs.getString("CREATE_PARAMS"),
+            caseSensitive = rs.getBoolean("CASE_SENSITIVE"),
+            nullable = rs.getShort("NULLABLE"),
+            searchable = rs.getShort("SEARCHABLE"),
+            minScale = rs.getShort("MINIMUM_SCALE"),
+            maxScale = rs.getShort("MAXIMUM_SCALE"),
+            numPrecRadix =
+              Option(rs.getObject("NUM_PREC_RADIX")).map(_.asInstanceOf[Integer].toInt))
+        }.toSeq
+
+        // results are ordered by DATA_TYPE
+        assert(types.map(t => (t.name, t.dataType)) === Seq(
+          ("TINYINT", Types.TINYINT),
+          ("BIGINT", Types.BIGINT),
+          ("BINARY", Types.VARBINARY),
+          ("DECIMAL", Types.DECIMAL),
+          ("INT", Types.INTEGER),
+          ("SMALLINT", Types.SMALLINT),
+          ("FLOAT", Types.FLOAT),
+          ("DOUBLE", Types.DOUBLE),
+          ("STRING", Types.VARCHAR),
+          ("BOOLEAN", Types.BOOLEAN),
+          ("DATE", Types.DATE),
+          ("TIMESTAMP", Types.TIMESTAMP)))
+
+        // every type is nullable and searchable
+        assert(types.forall(_.nullable == DatabaseMetaData.typeNullable))
+        assert(types.forall(_.searchable == DatabaseMetaData.typeSearchable))
+        // only STRING is case-sensitive
+        assert(types.filter(_.caseSensitive).map(_.name) === Seq("STRING"))
+
+        // string-like types are quoted with a single quote on both sides, except BINARY,
+        // whose literals use the hex syntax X'...'. Numeric types carry no literal quote.
+        val quoted = Map(
+          "STRING" -> ("'", "'"),
+          "DATE" -> ("'", "'"),
+          "TIMESTAMP" -> ("'", "'"),
+          "BINARY" -> ("X'", "'"))
+        types.foreach { t =>
+          val (prefix, suffix) = quoted.getOrElse(t.name, (null, null))
+          assert(t.literalPrefix === prefix, s"unexpected LITERAL_PREFIX for ${t.name}")
+          assert(t.literalSuffix === suffix, s"unexpected LITERAL_SUFFIX for ${t.name}")
+        }
+
+        // PRECISION mirrors JdbcTypeUtils.getPrecision for every type
+        val precisions = Map(
+          "BOOLEAN" -> 1, "TINYINT" -> 3, "SMALLINT" -> 5, "INT" -> 10, "BIGINT" -> 19,
+          "FLOAT" -> 7, "DOUBLE" -> 15, "DECIMAL" -> 38, "STRING" -> Int.MaxValue,
+          "BINARY" -> Int.MaxValue, "DATE" -> 10, "TIMESTAMP" -> 29)
+        types.foreach { t =>
+          assert(t.precision === precisions(t.name), s"unexpected PRECISION for ${t.name}")
+        }
+
+        // CREATE_PARAMS is set only for the parameterized types
+        val createParams = Map("DECIMAL" -> "precision,scale")
+        types.foreach { t =>
+          assert(t.createParams === createParams.getOrElse(t.name, null),
+            s"unexpected CREATE_PARAMS for ${t.name}")
+        }
+
+        // (MINIMUM_SCALE, MAXIMUM_SCALE); types not listed carry no scale (0, 0)
+        val scales = Map("DECIMAL" -> (0, 38), "TIMESTAMP" -> (0, 6))
+        types.foreach { t =>
+          val (minScale, maxScale) = scales.getOrElse(t.name, (0, 0))
+          assert(t.minScale === minScale.toShort, s"unexpected MINIMUM_SCALE for ${t.name}")
+          assert(t.maxScale === maxScale.toShort, s"unexpected MAXIMUM_SCALE for ${t.name}")
+        }
+
+        // NUM_PREC_RADIX is 10 for numeric types and NULL otherwise, mirroring JdbcTypeUtils
+        val numericTypes =
+          Set("TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL")
+        types.foreach { t =>
+          val expected = if (numericTypes.contains(t.name)) Some(10) else None
+          assert(t.numPrecRadix === expected, s"unexpected NUM_PREC_RADIX for ${t.name}")
+        }
+      }
+    }
+  }
 }
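The `LITERAL_PREFIX`/`LITERAL_SUFFIX` columns checked in the `getTypeInfo` test above are what a client would use to turn a value into SQL literal text for a given type. A minimal sketch, assuming a client that builds SQL text from those two columns; `LiteralFormat`, `render`, and `hex` are hypothetical helpers, and the sketch deliberately does not escape quote characters inside the literal body.

```scala
// Hypothetical client-side helpers, not part of the Spark Connect JDBC driver.
object LiteralFormat {
  // Wraps a literal body with the LITERAL_PREFIX / LITERAL_SUFFIX reported for
  // its type. A null prefix or suffix (numeric and BOOLEAN rows) adds nothing.
  // Quote characters inside `body` are NOT escaped by this sketch.
  def render(body: String, prefix: String, suffix: String): String =
    Option(prefix).getOrElse("") + body + Option(suffix).getOrElse("")

  // Upper-case hex encoding of bytes, for use inside the BINARY X'...' syntax.
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02X").mkString
}
```

With the values from `TYPE_INFO`, `render("2024-01-31", "'", "'")` yields `'2024-01-31'` for `DATE`, numeric and `BOOLEAN` rows (null prefix and suffix) leave the body unchanged, and `BINARY`'s `X'` prefix with a `'` suffix yields hex literals such as `X'CAFE'`.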