Docker image tag scraper

serhattsnmz · March 7, 2024, 11:02am

Hi guys,

Anyone who works with Docker and doesn’t use the latest version of every image knows that finding a specific tag on Docker Hub is a pain in the ass. The tag list is confusing, and it’s hard to tell which version is the latest.

So I wrote a basic scraper that lets you:

Order versions by name or date,
Exclude versions that contain specific keywords,
Filter versions by OS or architecture.
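Sorting by name is trickier than it looks: a plain string sort ranks "9.13" above "10.13". Here is a minimal sketch of version-aware ("natural") sorting with a date option. It is not necessarily how the tool sorts, and the tag records and the names `TAGS`, `version_key` and `sort_tags` are made up for illustration.

```python
import re
from datetime import date

# Hypothetical tag records: (tag name, last-updated date). Made-up values.
TAGS = [
    ("9.13", date(2020, 11, 1)),
    ("10", date(2022, 12, 21)),
    ("10.13", date(2022, 12, 21)),
    ("10.2", date(2019, 12, 1)),
]


def version_key(name):
    """Split a tag into digit and text chunks so '10.13' sorts after '10.2'."""
    parts = re.split(r"(\d+)", name)
    # Digit runs become ints, other chunks stay strings; the leading 0/1 marker
    # keeps ints and strings from ever being compared with each other.
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts if p]


def sort_tags(tags, by="name", newest_first=True):
    """Sort tag records by version-aware name or by update date."""
    if by == "date":
        return sorted(tags, key=lambda t: t[1], reverse=newest_first)
    return sorted(tags, key=lambda t: version_key(t[0]), reverse=newest_first)


print([name for name, _ in sort_tags(TAGS)])
```

With the natural key, "10.13" comes first and "9.13" comes last. A plain `sorted(..., reverse=True)` on the names would put "9.13" first instead.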

I hope it helps others who work with Docker and find Docker Hub confusing.

The repo: Docker Version Parser on GitHub
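If you’re curious where this kind of data can come from: Docker Hub serves paginated tag listings as JSON at `https://hub.docker.com/v2/repositories/<namespace>/<repo>/tags`. Below is a minimal sketch that assumes this endpoint and its `count`, `results`, `name`, `last_updated`, `images`, `os` and `architecture` fields. I haven’t checked that run.py uses this endpoint. The helper names `tags_url`, `parse_page` and `fetch_page` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Docker Hub's public v2 tags endpoint (not confirmed to be what run.py uses).
API = "https://hub.docker.com/v2/repositories/{repo}/tags"


def normalize_repo(name):
    # Official images such as "debian" live under the "library" namespace.
    return name if "/" in name else f"library/{name}"


def tags_url(name, page=1, page_size=10):
    query = urllib.parse.urlencode({"page": page, "page_size": page_size})
    return API.format(repo=normalize_repo(name)) + "?" + query


def parse_page(payload):
    """Return (total tag count, rows of (date, name, os list, arch list)) for one page."""
    rows = []
    for tag in payload.get("results", []):
        images = tag.get("images") or []
        oses = sorted({img["os"] for img in images if img.get("os")})
        archs = sorted({img["architecture"] for img in images if img.get("architecture")})
        updated = (tag.get("last_updated") or "")[:10]  # keep only YYYY-MM-DD
        rows.append((updated, tag["name"], oses, archs))
    return payload.get("count", 0), rows


def fetch_page(name, page=1):
    # Network call; one page of JSON per request.
    with urllib.request.urlopen(tags_url(name, page)) as resp:
        return json.load(resp)
```

Usage would be along the lines of `total, rows = parse_page(fetch_page("debian"))`. The fetching and parsing are kept separate so the parsing can be tested without network access.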

Example usage:

$ python3 run.py debian

UPDATE DATE    NAME                     OS     ARCH
-------------  -----------------------  -----  -------------------
2022-12-21     10.13-slim               linux  386,arm64,amd64,arm
2022-12-21     10.13                    linux  386,arm64,amd64,arm
2022-12-21     10-slim                  linux  386,arm64,amd64,arm
2022-12-21     10                       linux  386,arm64,amd64,arm
...

REPO NAME    : library/debian
TOTAL PARSED : 50 images found. (Total : 1580)
PARSED PAGES : 5/159 page parsed
NAME FILTERS : -
OS FILTER    : -
ARCH FILTER  : -

$ python3 run.py portainer/portainer-ce -fi 2.16 -fe linux,windows -o linux -a amd64

UPDATE DATE    NAME           OS             ARCH
-------------  -------------  -------------  -----------------------------
2022-10-30     2.16.0-alpine  linux          amd64,arm,arm64
2022-10-30     2.16.0         windows,linux  s390x,arm,ppc64le,amd64,arm64
2022-11-09     2.16.1-alpine  linux          amd64,arm,arm64
2022-11-09     2.16.1         windows,linux  s390x,arm,ppc64le,amd64,arm64
2022-11-21     2.16.2-alpine  linux          amd64,arm,arm64
2022-11-21     2.16.2         windows,linux  s390x,arm,ppc64le,amd64,arm64

REPO NAME    : portainer/portainer-ce
TOTAL PARSED : 6 images found. (Total : 391)
PARSED PAGES : 5/40 page parsed
NAME FILTERS : linux,windows
OS FILTER    : linux
ARCH FILTER  : amd64
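The second example combines four filters. Here is one hypothetical way those flags could map onto a single filter pass. The meanings of -fi (keep names containing the text), -fe (exclude comma-separated keywords), -o (OS) and -a (architecture) are inferred from the output above, not taken from run.py. The functions `build_parser` and `apply_filters` are made up for illustration.

```python
import argparse


def build_parser():
    # Hypothetical reading of the flags used in the example; check run.py for the real ones.
    p = argparse.ArgumentParser(description="Filter Docker Hub tags (sketch)")
    p.add_argument("repo")
    p.add_argument("-fi", default="", help="keep tags whose name contains this text (assumed)")
    p.add_argument("-fe", default="", help="comma-separated keywords to exclude (assumed)")
    p.add_argument("-o", default="", help="required OS (assumed)")
    p.add_argument("-a", default="", help="required architecture (assumed)")
    return p


def apply_filters(rows, include="", exclude="", os_name="", arch=""):
    """rows are (date, name, os list, arch list); returns the matches sorted by date."""
    keywords = [k for k in exclude.split(",") if k]
    kept = []
    for updated, name, oses, archs in rows:
        if include and include not in name:
            continue
        if any(k in name for k in keywords):
            continue
        if os_name and os_name not in oses:
            continue
        if arch and arch not in archs:
            continue
        kept.append((updated, name, oses, archs))
    # Stable sort by date only, so tags from the same day keep their input order.
    return sorted(kept, key=lambda r: r[0])
```

Applying every filter in one pass keeps the cost linear in the number of tags. The exclude list is needed because some repos publish per-platform tags whose names contain words like "linux" or "windows".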

gauravearn · March 7, 2024, 2:42pm

@serhattsnmz Great post! I’ve been coding heavily since this morning, both on clusters and in Python, R, and other languages, and just saw it. From the DevOps side, I just wrote this: two simple chained loops (you could do the same in C++ or C#) that get you the same information. Save the page’s source from your browser, or download the page with:

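read -r -p "Enter the image name: " image
# Save the Docker Hub page as <image>.html so the *.html globs below match it
wget -O "$image.html" "https://hub.docker.com/_/$image"

# Loop 1: print five-letter codenames (e.g. jammy) found in dist-<codename>-amd64 strings
for i in $(grep https://git.launchpad.net/cloud-images *.html | \
  grep -o 'dist-[a-z][a-z][a-z][a-z][a-z]-amd64' | cut -f 2 -d "-"); do
  echo "$i:latest"
done &&
# Loop 2: slice lines mentioning cloud-images-release-manager on '"', '+' and '/'
for i in $(grep cloud-images-release-manager *.html | \
  cut -f 1,2 -d "\"" | cut -f 2 -d "+" | cut -f 3 -d "/"); do
  echo "$i:latest"
done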
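Because the loops can be ported to other languages, here is a rough Python equivalent of the first loop. It uses the same pattern: "dist-", five lowercase letters, then "-amd64". It assumes the page has already been saved as a .html file. The names `codenames` and `codenames_from_files` are hypothetical. Unlike the shell version, it drops duplicate codenames.

```python
import re
from pathlib import Path

# Same pattern as the first loop; the group captures the codename that `cut -f 2 -d "-"` extracts.
CODENAME = re.compile(r"dist-([a-z]{5})-amd64")


def codenames(html):
    """Return unique codenames in first-seen order, formatted as '<codename>:latest'."""
    seen = []
    for line in html.splitlines():
        # Mirror the first grep: only look at lines that mention the launchpad repo.
        if "https://git.launchpad.net/cloud-images" not in line:
            continue
        for name in CODENAME.findall(line):
            if name not in seen:
                seen.append(name)
    return [f"{n}:latest" for n in seen]


def codenames_from_files(pattern="*.html"):
    """Run codenames() over every saved page in the current directory."""
    out = []
    for path in sorted(Path(".").glob(pattern)):
        out.extend(codenames(path.read_text(encoding="utf-8", errors="replace")))
    return out
```

Calling `print("\n".join(codenames_from_files()))` prints one `<codename>:latest` line per match, the same format as the loop's `echo`.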

It’s faster, since it works from a stored (cached) copy of the page instead of iterating over returned results, a bit like a feed cloner.

All the best,
Gaurav