# source{d} mining software for the DockerHub dataset
This repository contains the scripts and utilities used to produce a dataset of the libraries found in Docker images, by analyzing each image's filesystem in depth.
Requirements:
- Docker installed
- IPython installed
Usage:

```shell
ipython main.py ./images.txt ./packages
```

`./images.txt` contains the list of images to analyze, one per line. If no tag is specified, `latest` is used.
Example `images.txt`:

```
amancevice/superset
ubuntu:18.04
express-gateway
alpine/node
archmageinc/node-web-dev
```
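The tag-defaulting rule can be sketched as follows. This is only an illustration of the behavior, not the actual parsing code in `main.py`; the function name `parse_images` is hypothetical, and the sketch ignores registry hosts with ports (e.g. `localhost:5000/img`), where a `:` can appear outside the tag.

```python
def parse_images(lines):
    """Parse images.txt lines, defaulting to the :latest tag."""
    images = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # A ':' separates the image name from its tag;
        # without one, the :latest tag is assumed.
        if ":" not in line:
            line += ":latest"
        images.append(line)
    return images
```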
`./packages` is the directory where the results will be written.
The output directory structure matches the DockerhubMetadata dataset: the top-level directory is the first two letters of the image name, and the inner directories follow the full image name, including any `/`. The `:latest` tag is stripped from file names. Examples: the configuration for `tensorflow/tensorflow:2.0.0b0` is written to `te/tensorflow/tensorflow:2.0.0b0.json`, and the one for `mongo:latest` to `mo/mongo.json`.
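The path scheme just described can be expressed as a short sketch (illustrative only; `output_path` is not a function defined in this repo):

```python
import os

def output_path(image: str) -> str:
    """Map an image reference to its JSON path in the output directory."""
    # ":latest" is stripped so the default tag never appears in file names.
    if image.endswith(":latest"):
        image = image[: -len(":latest")]
    # Top-level directory: the first two letters of the image name;
    # any "/" inside the name becomes a nested directory.
    return os.path.join(image[:2], image + ".json")
```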
`show_count` is a bash script that reports the number of already-fetched images on source{d}'s typos{1-4} nodes. It is of no use outside the organization and should be removed before the repo is made public; it is kept here as documentation of the ongoing tasks.