
Commit 0bdb10c

[SPARK-57556][SQL] Raise a clear error for the TIME data type in Hive SerDe interop
### What changes were proposed in this pull request?

Apache Hive has no TIME type, so `TimeType` has no faithful representation in Hive SerDe interop. This PR (the Option B / "clear, documented error" path from [SPARK-57556](https://issues.apache.org/jira/browse/SPARK-57556)) makes `TimeType` produce a clear `AnalysisException` instead of a `scala.MatchError`/internal error when it reaches the `HiveInspectors` mapping functions, and rejects it in the Hive SerDe write path:

- `HiveInspectors.toInspector(dataType)`, `toInspector(expr)` (TIME literal) and `toTypeInfo` now throw `UNSUPPORTED_DATATYPE` via a shared `unsupportedHiveType` helper. Previously `toInspector(dataType)` had no `TimeType` case and no default branch, so a TIME column hit a raw `scala.MatchError`.
- `HiveFileFormat.supportDataType` rejects `TimeType` (recursing into nested struct/array/map/UDT types, preserving the prior default for all other types) so Hive SerDe writes raise `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE` (format `Hive`) via `FileFormatWriter.verifySchema`.
- Documented the limitation on the TIME entry in `docs/sql-ref-datatypes.md`.

### Why are the changes needed?

`HiveInspectors` had no `TimeType` case, so object-inspector creation and TypeInfo mapping fell through to a `MatchError`/internal error when a TIME column or literal reached Hive SerDe paths (for example, a TIME argument to a Hive UDF/UDAF/UDTF). This makes the behavior explicit and documented, consistent with the existing TIME rejection for Hive ORC (SPARK-51590).

### Does this PR introduce _any_ user-facing change?

Yes. Using TIME with Hive UDFs or in a Hive SerDe write now fails with a clear error that names the unsupported TIME type, instead of a `MatchError`/internal error.

For example, `SELECT myHiveUDF(TIME'12:01:02')` now reports `[UNSUPPORTED_DATATYPE] Unsupported data type "TIME(6)"` (wrapped by the Hive UDF resolver), and writing a TIME column through the Hive SerDe write path reports `[UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE] The Hive datasource doesn't support the column ... of the type "TIME(6)"`.

### How was this patch tested?

Added tests and ran them locally (`build/sbt 'hive/testOnly *HiveInspectorSuite *HiveUDFSuite *InsertSuite'`):

- `HiveInspectorSuite`: `toInspector(TimeType())`, a TIME literal, and `TimeType().toTypeInfo` raise `UNSUPPORTED_DATATYPE`.
- `HiveUDFSuite`: passing `TIME'12:01:02'` to a Hive `GenericUDFHash` fails with a message naming the unsupported TIME type.
- `InsertSuite`: `INSERT OVERWRITE LOCAL DIRECTORY ... STORED AS PARQUET SELECT TIME'...'` (with `spark.sql.hive.convertMetastoreInsertDir=false`) raises `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

Closes #56850 from MaxGekk/time-hive-serde.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit e80f420)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
1 parent da09de9 commit 0bdb10c

6 files changed

Lines changed: 92 additions & 2 deletions


docs/sql-ref-datatypes.md

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@ Spark SQL and DataFrames support the following data types:
 time-zone.
 - `TimeType(precision)`: Represents values comprising values of fields hour, minute and second with the number of decimal digits `precision` following the decimal point in the seconds field, without a time-zone.
 The range of values is from `00:00:00` to `23:59:59` for min precision `0`, and to `23:59:59.999999999` for max precision `9`. The default precision is `6`.
+- Note: Apache Hive has no TIME type, so `TimeType` is not supported in Hive SerDe interop. Storing it in a Hive SerDe table (including `INSERT OVERWRITE DIRECTORY ... STORED AS`) or passing it to a Hive UDF/UDAF/UDTF raises an error rather than silently converting the value.
 - `TimestampType`: Timestamp with local time zone(TIMESTAMP_LTZ). It represents values comprising values of fields year, month, day,
 hour, minute, and second, with the session local time-zone. The timestamp value represents an
 absolute point in time.

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala

Lines changed: 13 additions & 0 deletions
@@ -961,6 +961,14 @@ private[hive] trait HiveInspectors {
     case _: UserDefinedType[_] =>
       val sqlType = dataType.asInstanceOf[UserDefinedType[_]].sqlType
       toInspector(sqlType)
+    // Hive has no TIME type, so it cannot be represented by any Hive object inspector.
+    case _: TimeType => throw unsupportedHiveType(dataType)
+  }
+
+  private def unsupportedHiveType(dataType: DataType): AnalysisException = {
+    new AnalysisException(
+      errorClass = "UNSUPPORTED_DATATYPE",
+      messageParameters = Map("typeName" -> toSQLType(dataType)))
   }

   /**
@@ -1029,6 +1037,9 @@ private[hive] trait HiveInspectors {
       toInspector(dt)
     case Literal(_, dt: UserDefinedType[_]) =>
       toInspector(dt.sqlType)
+    // Hive has no TIME type, so a TIME constant cannot be mapped to a Hive object inspector.
+    case Literal(_, dt: TimeType) =>
+      throw unsupportedHiveType(dt)
     // We will enumerate all of the possible constant expressions, throw exception if we missed
     case Literal(_, dt) =>
       throw SparkException.internalError(s"Hive doesn't support the constant type [$dt].")
@@ -1281,6 +1292,8 @@ private[hive] trait HiveInspectors {
       case NullType => voidTypeInfo
       case _: DayTimeIntervalType => intervalDayTimeTypeInfo
       case _: YearMonthIntervalType => intervalYearMonthTypeInfo
+      // Hive has no TIME type, so there is no Hive TypeInfo to map it to.
+      case _: TimeType => throw unsupportedHiveType(dt)
       case dt =>
         throw new AnalysisException(
           errorClass = "_LEGACY_ERROR_TEMP_3095", messageParameters = Map("dt" -> toSQLType(dt)))
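
The behavioral difference in `toInspector(dataType)` can be illustrated without Spark on the classpath. The following is only a minimal sketch: `ToyType`, `ToyTime`, `UnsupportedTypeError`, `inspectorBefore` and `inspectorAfter` are hypothetical stand-ins invented for illustration, not Spark or Hive APIs. It shows how a pattern match with no matching case and no default branch fails with a raw `scala.MatchError`, while an explicit case can raise an error that names the unsupported type.

```scala
object InspectorSketch {
  // Hypothetical stand-ins for Spark's DataType hierarchy (not real Spark classes).
  sealed trait ToyType
  case object ToyInt extends ToyType
  case object ToyString extends ToyType
  final case class ToyTime(precision: Int) extends ToyType

  // Hypothetical stand-in for an AnalysisException that carries an error condition.
  final class UnsupportedTypeError(val condition: String, val typeName: String)
    extends Exception("[" + condition + "] Unsupported data type \"" + typeName + "\"")

  // "Before": no ToyTime case and no default branch, so ToyTime throws scala.MatchError.
  // @unchecked only silences the compiler's exhaustivity warning for this sketch.
  def inspectorBefore(t: ToyType): String = (t: @unchecked) match {
    case ToyInt => "int"
    case ToyString => "string"
  }

  // "After": an explicit case raises a descriptive error that names the type.
  def inspectorAfter(t: ToyType): String = t match {
    case ToyInt => "int"
    case ToyString => "string"
    case ToyTime(p) =>
      throw new UnsupportedTypeError("UNSUPPORTED_DATATYPE", "TIME(" + p + ")")
  }
}
```

Calling `InspectorSketch.inspectorBefore(InspectorSketch.ToyTime(6))` throws a `MatchError`, whereas `inspectorAfter` throws the sketch's `UnsupportedTypeError` with the type name in its message.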

sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala

Lines changed: 18 additions & 1 deletion
@@ -41,7 +41,7 @@ import org.apache.spark.sql.execution.datasources.{FileFormat, OutputWriter, Out
 import org.apache.spark.sql.hive.{HiveInspectors, HiveTableUtil}
 import org.apache.spark.sql.internal.SessionStateHelper
 import org.apache.spark.sql.sources.DataSourceRegister
-import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType, TimeType, UserDefinedType}
 import org.apache.spark.util.SerializableJobConf

 /**
@@ -115,6 +115,23 @@ case class HiveFileFormat(fileSinkConf: FileSinkDesc)
     }
   }

+  override def supportDataType(dataType: DataType): Boolean = dataType match {
+    // Hive has no TIME type, so it cannot be stored in a Hive serde table. Reject it explicitly
+    // (recursing into nested types) while preserving the default behavior for all other types.
+    case _: TimeType => false
+
+    case st: StructType => st.forall { f => supportDataType(f.dataType) }
+
+    case ArrayType(elementType, _) => supportDataType(elementType)
+
+    case MapType(keyType, valueType, _) =>
+      supportDataType(keyType) && supportDataType(valueType)
+
+    case udt: UserDefinedType[_] => supportDataType(udt.sqlType)
+
+    case _ => true
+  }
+
   override def supportFieldName(name: String): Boolean = {
     fileSinkConf.getTableInfo.getOutputFileFormatClassName match {
       case "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat" =>
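
The recursion in `supportDataType` matters because a TIME value can hide inside a struct field, array element, or map key/value. The sketch below mirrors that shape on a toy type hierarchy. All names (`ToyType`, `ToyTime`, `ToyStruct`, `supported`, and so on) are hypothetical and exist only for this illustration. It assumes nothing about Spark beyond what the diff above shows.

```scala
object SupportSketch {
  // Hypothetical stand-ins for Spark's DataType hierarchy (not real Spark classes).
  sealed trait ToyType
  case object ToyInt extends ToyType
  final case class ToyTime(precision: Int) extends ToyType
  final case class ToyArray(element: ToyType) extends ToyType
  final case class ToyMap(key: ToyType, value: ToyType) extends ToyType
  final case class ToyStruct(fields: Seq[(String, ToyType)]) extends ToyType

  // Rejects ToyTime anywhere in the type tree; every other leaf type is accepted.
  def supported(t: ToyType): Boolean = t match {
    case _: ToyTime => false
    case ToyStruct(fields) => fields.forall { case (_, fieldType) => supported(fieldType) }
    case ToyArray(element) => supported(element)
    case ToyMap(key, value) => supported(key) && supported(value)
    case _ => true
  }
}
```

With this shape, a struct containing an array of TIME values is rejected even though the top-level type is not TIME, which is the case a non-recursive check would miss.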

sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveInspectorSuite.scala

Lines changed: 18 additions & 1 deletion
@@ -28,7 +28,7 @@ import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo
 import org.apache.hadoop.io.LongWritable

 import org.apache.spark.SparkFunSuite
-import org.apache.spark.sql.{Row, TestUserClassUDT}
+import org.apache.spark.sql.{AnalysisException, Row, TestUserClassUDT}
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.Literal
 import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData, MapData}
@@ -291,4 +291,21 @@ class HiveInspectorSuite extends SparkFunSuite with HiveInspectors {
     assert(typeInfo2.precision() === 18)
     assert(typeInfo2.scale() === 10)
   }
+
+  test("SPARK-57556: TIME type is unsupported in Hive object inspectors") {
+    val timeType = TimeType()
+    val expectedParams = Map("typeName" -> s"\"${timeType.sql}\"")
+    checkError(
+      exception = intercept[AnalysisException](toInspector(timeType)),
+      condition = "UNSUPPORTED_DATATYPE",
+      parameters = expectedParams)
+    checkError(
+      exception = intercept[AnalysisException](toInspector(Literal.create(null, timeType))),
+      condition = "UNSUPPORTED_DATATYPE",
+      parameters = expectedParams)
+    checkError(
+      exception = intercept[AnalysisException](timeType.toTypeInfo),
+      condition = "UNSUPPORTED_DATATYPE",
+      parameters = expectedParams)
+  }
 }

sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala

Lines changed: 25 additions & 0 deletions
@@ -683,6 +683,31 @@ class InsertSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter {
     }
   }

+  test("SPARK-57556: TIME type is unsupported when writing to a Hive serde directory") {
+    // Disable native data source conversion so that the write goes through the Hive serde
+    // path (HiveFileFormat) instead of a native data source that may support TIME.
+    withSQLConf(HiveUtils.CONVERT_METASTORE_INSERT_DIR.key -> "false") {
+      withTempDir { dir =>
+        // InsertIntoHiveDirCommand wraps the failure in a SparkException, so assert on the cause.
+        val e = intercept[SparkException] {
+          sql(
+            s"""
+               |INSERT OVERWRITE LOCAL DIRECTORY '${dir.toURI.getPath}'
+               |STORED AS PARQUET
+               |SELECT TIME'12:01:02' AS c
+             """.stripMargin)
+        }
+        checkError(
+          exception = e.getCause.asInstanceOf[AnalysisException],
+          condition = "UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE",
+          parameters = Map(
+            "columnName" -> "`c`",
+            "columnType" -> s"\"${TimeType().sql}\"",
+            "format" -> "Hive"))
+      }
+    }
+  }
+
   test("insert overwrite to dir from temp table") {
     withTempView("test_insert_table") {
       spark.range(10).selectExpr("id", "id AS str").createOrReplaceTempView("test_insert_table")
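
The test above asserts on `e.getCause` because the command wraps the underlying failure. The pattern can be sketched in isolation, under the assumption of two made-up exception classes: `InnerError`, `WrapperError`, `write` and `conditionOfCause` are hypothetical names used only here and do not correspond to Spark classes or helpers.

```scala
object CauseSketch {
  // Hypothetical inner failure that carries an error condition name.
  final class InnerError(val condition: String) extends Exception(condition)

  // Hypothetical outer exception that wraps the real failure as its cause.
  final class WrapperError(cause: Throwable) extends Exception("write failed", cause)

  // Simulates a command that wraps an inner failure before rethrowing it.
  def write(): Unit =
    try throw new InnerError("UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE")
    catch { case e: InnerError => throw new WrapperError(e) }

  // Runs `body`; if it fails with a WrapperError, returns the condition of its cause.
  def conditionOfCause(body: => Unit): Option[String] =
    try { body; None }
    catch {
      case w: WrapperError =>
        Option(w.getCause).collect { case inner: InnerError => inner.condition }
    }
}
```

Here `CauseSketch.conditionOfCause(CauseSketch.write())` yields the inner condition name, which is the same idea as checking the cause of the `SparkException` rather than the wrapper itself.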

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala

Lines changed: 17 additions & 0 deletions
@@ -41,6 +41,7 @@ import org.apache.spark.sql.execution.WholeStageCodegenExec
 import org.apache.spark.sql.functions.{call_function, max}
 import org.apache.spark.sql.hive.test.{TestHiveSingleton, TestUDTFJar}
 import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.TimeType
 import org.apache.spark.tags.SlowHiveTest
 import org.apache.spark.util.Utils

@@ -412,6 +413,22 @@ class HiveUDFSuite extends QueryTest with TestHiveSingleton {
     }
   }

+  test("SPARK-57556: TIME type is unsupported as a Hive UDF argument") {
+    withUserDefinedFunction("testGenericUDFHash" -> true) {
+      sql(s"CREATE TEMPORARY FUNCTION testGenericUDFHash AS '${classOf[GenericUDFHash].getName}'")
+      // The Hive UDF resolver wraps the failure in CANNOT_INSTANTIATE_HIVE_FUNCTION and attaches
+      // the underlying failure as the cause, which clearly identifies the unsupported TIME type
+      // rather than surfacing a MatchError/internal error.
+      val e = intercept[AnalysisException] {
+        sql("SELECT testGenericUDFHash(TIME'12:01:02')").collect()
+      }
+      checkError(
+        exception = e.getCause.asInstanceOf[AnalysisException],
+        condition = "UNSUPPORTED_DATATYPE",
+        parameters = Map("typeName" -> s"\"${TimeType().sql}\""))
+    }
+  }
+
   test("Hive UDFs with insufficient number of input arguments should trigger an analysis error") {
     withTempView("testUDF") {
       Seq((1, 2)).toDF("a", "b").createOrReplaceTempView("testUDF")
