java.io.NotSerializableException: scala.xml.NodeSeq$$anon$1 #201

@dyf102

I am writing a map function in Spark to parse XML embedded in log lines, but I get a NotSerializableException and cannot figure out the reason. The stack trace follows. How can I work around it? Does anyone have a suggestion?

org.apache.spark.SparkException: Task not serializable
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:345)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:335)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2292)
  at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:371)
  at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  at org.apache.spark.rdd.RDD.map(RDD.scala:370)
  at parse(<console>:33)
  ... 49 elided
Caused by: java.io.NotSerializableException: scala.xml.NodeSeq$$anon$1
Serialization stack:
	- object not serializable (class: scala.xml.NodeSeq$$anon$1, value: <ns18:userID>4536000170315902</ns18:userID>)

This is how I am using it:

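rows.mapPartitions(rows => {
  // Reference the XML loader once per partition
  val XMLParser = scala.xml.XML
  rows.map(row => {
    // sliceLogHeader strips the log prefix, leaving only the XML payload
    val xmlContent = sliceLogHeader(row)
    val xmlDom = XMLParser.loadString(xmlContent)
    val headerDOM = xmlDom \ "header"
    // Extract plain strings so that only String values reach Account
    val userID = (headerDOM \ "userID").text
    val clientSessionID = (headerDOM \ "clientSessionID").text
    Account(userID, clientSessionID)
  })
})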
