StorageListener is a BlockStatusListener that uses SparkListener callbacks to track changes in the persistence status of RDD blocks in a Spark application.

| Callback | Description |
|---|---|
| onBlockUpdated | Updates _rddInfoMap with the update to a single block. |
| onStageCompleted | Removes RDDInfo instances from _rddInfoMap that participated in the completed stage as well as the ones that are no longer cached. |
| onStageSubmitted | Updates _rddInfoMap registry with the names of every RDDInfo in the submitted stage, possibly adding new RDDInfo instances if they were not registered yet. |
| onUnpersistRDD | Removes an RDDInfo from _rddInfoMap registry for the unpersisted RDD. |

| Name | Description |
|---|---|
| _rddInfoMap | RDDInfo instances keyed by RDD ID. Used when…FIXME |

StorageListener takes a StorageStatusListener when created.
StorageListener initializes the internal registries and counters.
Note: StorageListener is created when SparkUI is created.
activeStorageStatusList: Seq[StorageStatus]
activeStorageStatusList requests StorageStatusListener for active BlockManagers (on executors).
onBlockUpdated(blockUpdated: SparkListenerBlockUpdated): Unit
onBlockUpdated creates a BlockStatus (from the input SparkListenerBlockUpdated) and updates registered RDDInfos with block updates from BlockManagers, passing in the BlockId and BlockStatus as a single-element collection of updated blocks.
Note: onBlockUpdated is part of the SparkListener contract to announce that there was a change in a block status (on a BlockManager on an executor).
onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit
onStageCompleted finds the identifiers of the RDDs that have participated in the completed stage and removes them from _rddInfoMap registry as well as the RDDs that are no longer cached.
Note: onStageCompleted is part of SparkListener contract to announce that a stage has finished.
onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted): Unit
onStageSubmitted updates _rddInfoMap registry with the names of every RDDInfo in stageSubmitted, possibly adding new RDDInfo instances if they were not registered yet.
Note: onStageSubmitted is part of SparkListener contract to announce that the missing tasks of a stage were submitted for execution.
onUnpersistRDD(unpersistRDD: SparkListenerUnpersistRDD): Unit
onUnpersistRDD removes the RDDInfo from _rddInfoMap registry for the unpersisted RDD (from unpersistRDD).
Note: onUnpersistRDD is part of SparkListener contract to announce that an RDD has been unpersisted.
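
The registry bookkeeping done by the stage and unpersist callbacks can be sketched with a plain mutable map. This is a minimal model, not Spark's StorageListener: RddEntry, RddRegistrySketch and setCachedPartitions are hypothetical names introduced for illustration. The stage-completed rule is also an assumption: the sketch drops an entry only when its RDD took part in the completed stage and has no cached partitions left.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's RDDInfo with only the fields used here.
final case class RddEntry(id: Int, var name: String, var numCachedPartitions: Int = 0)

// Simplified model of the registry bookkeeping; not Spark's StorageListener.
final class RddRegistrySketch {
  private val rddInfoMap = mutable.HashMap.empty[Int, RddEntry]

  // Stage submitted: register RDDs not seen before, refresh names of known ones.
  def stageSubmitted(stageRdds: Seq[RddEntry]): Unit = synchronized {
    stageRdds.foreach { info =>
      rddInfoMap.getOrElseUpdate(info.id, info).name = info.name
    }
  }

  // Stage completed (assumed rule): drop the stage's RDDs that have nothing cached.
  def stageCompleted(stageRdds: Seq[RddEntry]): Unit = synchronized {
    val completedIds = stageRdds.map(_.id).toSet
    val staleIds = rddInfoMap.collect {
      case (id, info) if completedIds.contains(id) && info.numCachedPartitions == 0 => id
    }.toList
    rddInfoMap --= staleIds
  }

  // RDD unpersisted: forget the entry regardless of its cache state.
  def unpersisted(rddId: Int): Unit = synchronized {
    rddInfoMap -= rddId
  }

  // Hypothetical helper: pretend block updates changed the cached partition count.
  def setCachedPartitions(rddId: Int, n: Int): Unit = synchronized {
    rddInfoMap.get(rddId).foreach(entry => entry.numCachedPartitions = n)
  }

  // Immutable snapshot of the registered RDD names, keyed by RDD ID.
  def names: Map[Int, String] = synchronized {
    rddInfoMap.map { case (id, info) => id -> info.name }.toMap
  }
}
```

Using getOrElseUpdate keeps the first registered entry (and whatever cache state it has accumulated) while still letting a later stage submission overwrite its name.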
Updating Registered RDDInfos (with Block Updates from BlockManagers) — updateRDDInfo Internal Method
updateRDDInfo(updatedBlocks: Seq[(BlockId, BlockStatus)]): Unit
updateRDDInfo finds the RDDs for the input updatedBlocks (for BlockIds).
Note: updateRDDInfo finds BlockIds that are RDDBlockIds.
updateRDDInfo takes RDDInfo entries (in _rddInfoMap registry) for which there are blocks in the input updatedBlocks and updates RDDInfos (using StorageStatus) (from activeStorageStatusList).
Note: updateRDDInfo is used exclusively when StorageListener gets notified about a change in a block status (on a BlockManager on an executor).
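
The selection step of updateRDDInfo (keep only RDD blocks, then pick the matching registry entries) can be sketched as follows. The types are hypothetical stand-ins rather than Spark's BlockId, BlockStatus or RDDInfo, the registry is reduced to a map from RDD ID to name, and the sketch stops before the step that actually refreshes each entry from the storage statuses.

```scala
// Hypothetical stand-ins for Spark's BlockId hierarchy; only the shape matters here.
sealed trait BlockRef
final case class RddBlockRef(rddId: Int, splitIndex: Int) extends BlockRef
final case class OtherBlockRef(name: String) extends BlockRef

object UpdateRddInfoSketch {
  // A block status is modelled as a plain String label in this sketch.
  type Status = String

  // RDD IDs touched by a batch of block updates; non-RDD blocks are ignored.
  def rddIdsToUpdate(updatedBlocks: Seq[(BlockRef, Status)]): Set[Int] =
    updatedBlocks.collect { case (RddBlockRef(rddId, _), _) => rddId }.toSet

  // Registry entries (here just names) whose RDD has at least one updated block.
  def entriesToUpdate(
      registry: Map[Int, String],
      updatedBlocks: Seq[(BlockRef, Status)]): Seq[String] = {
    val ids = rddIdsToUpdate(updatedBlocks)
    registry.collect { case (id, name) if ids.contains(id) => name }.toSeq
  }

  // A single block update is handled as a one-element batch.
  def onSingleBlockUpdate(
      registry: Map[Int, String],
      block: BlockRef,
      status: Status): Seq[String] =
    entriesToUpdate(registry, Seq(block -> status))
}
```

Collecting the IDs into a Set first means several updated partitions of the same RDD select its registry entry only once.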
updateRddInfo(rddInfos: Seq[RDDInfo], statuses: Seq[StorageStatus]): Unit
Caution: FIXME