Docker image tag scraper

Hi guys,

Those who work with Docker and don’t use the latest tag for every image know that finding a specific tag version on Docker Hub is a pain. The tag list is confusing, and it is hard to tell which version is the newest.

So, I developed a basic scraper that can:

  • Order versions by name or date,
  • Exclude versions that contain specific keywords,
  • Filter versions by OS or architecture.
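
The three operations above can be sketched roughly as follows. This is only an illustration of the idea, not the tool's actual code: the `Tag` record, the `select_tags` function, and its parameter names are all hypothetical, and the sample rows are copied from the portainer/portainer-ce example output further down.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Tag:
    """Hypothetical record for one image tag (not the tool's real data model)."""
    name: str
    updated: date
    os: set[str]
    arch: set[str]


def select_tags(tags, include=(), exclude=(), os_name=None, arch=None, order_by="date"):
    """Filter tag records, then sort them by update date or by name."""
    kept = []
    for tag in tags:
        # Keep only names containing at least one "include" keyword, if any were given.
        if include and not any(word in tag.name for word in include):
            continue
        # Drop names containing any "exclude" keyword.
        if any(word in tag.name for word in exclude):
            continue
        # Keep only tags built for the requested OS and architecture.
        if os_name and os_name not in tag.os:
            continue
        if arch and arch not in tag.arch:
            continue
        kept.append(tag)
    if order_by == "date":
        return sorted(kept, key=lambda t: (t.updated, t.name))
    return sorted(kept, key=lambda t: t.name)


full_arch = {"s390x", "arm", "ppc64le", "amd64", "arm64"}
tags = [
    Tag("2.16.2-alpine", date(2022, 11, 21), {"linux"}, {"amd64", "arm", "arm64"}),
    Tag("2.16.0", date(2022, 10, 30), {"windows", "linux"}, full_arch),
    Tag("2.16.1", date(2022, 11, 9), {"windows", "linux"}, full_arch),
    Tag("2.16.0-alpine", date(2022, 10, 30), {"linux"}, {"amd64", "arm", "arm64"}),
]

for tag in select_tags(tags, include=["2.16"], exclude=["alpine"], os_name="linux", arch="amd64"):
    print(tag.updated, tag.name)
```

Filtering before sorting keeps the sort small, and sorting on `(updated, name)` gives a stable order for tags pushed on the same day.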

I hope it helps others who work with Docker and find Docker Hub confusing.

The repo: Docker Version Parser on GitHub

Example usage:

$ python3 nginx

UPDATE DATE    NAME                     OS     ARCH
-------------  -----------------------  -----  -------------------
2022-12-21     10.13-slim               linux  386,arm64,amd64,arm
2022-12-21     10.13                    linux  386,arm64,amd64,arm
2022-12-21     10-slim                  linux  386,arm64,amd64,arm
2022-12-21     10                       linux  386,arm64,amd64,arm

REPO NAME    : library/debian
TOTAL PARSED : 50 images found. (Total : 1580)
PARSED PAGES : 5/159 page parsed
OS FILTER    : -

$ python3 portainer/portainer-ce -fi 2.16 -fe linux,windows -o linux -a amd64

UPDATE DATE    NAME           OS             ARCH
-------------  -------------  -------------  -----------------------------
2022-10-30     2.16.0-alpine  linux          amd64,arm,arm64
2022-10-30     2.16.0         windows,linux  s390x,arm,ppc64le,amd64,arm64
2022-11-09     2.16.1-alpine  linux          amd64,arm,arm64
2022-11-09     2.16.1         windows,linux  s390x,arm,ppc64le,amd64,arm64
2022-11-21     2.16.2-alpine  linux          amd64,arm,arm64
2022-11-21     2.16.2         windows,linux  s390x,arm,ppc64le,amd64,arm64

REPO NAME    : portainer/portainer-ce
TOTAL PARSED : 6 images found. (Total : 391)
PARSED PAGES : 5/40 page parsed
NAME FILTERS : linux,windows
OS FILTER    : linux
ARCH FILTER  : amd64

@serhattsnmz great post. I have been coding heavily since this morning, on clusters and in Python, R, and other classes, and just saw it. From the DevOps side, I just coded this: two simple chained loops, which you could also write in C++ or C#, that get the same information. Save the source code of the page, or download the page as follows:

read -r -p "enter the image name:" image
wget -f$image

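# First loop: print the five-letter name found in each dist-xxxxx-amd64 string
for i in $(grep -oh 'dist-[a-z][a-z][a-z][a-z][a-z]-amd64' *.html | cut -f 2 -d "-"); do
  echo "$i:latest"
done &&
# Second loop: cut a name out of every line mentioning cloud-images-release-manager
for i in $(grep cloud-images-release-manager *.html | \
  cut -f 1,2 -d "\"" | cut -f 2 -d "+" | cut -f 3 -d "/"); do
  echo "$i:latest"
done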
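
For comparison, the first of the two shell loops above could look roughly like this in Python. It is a minimal sketch: the `codenames` helper is a hypothetical name, and the sample HTML string is made up purely to exercise the pattern, since the real page content is not shown here.

```python
import re


def codenames(html: str) -> list[str]:
    """Return the five-letter names inside every 'dist-xxxxx-amd64' string, in page order."""
    # Same pattern as the grep above; the capture group replaces the `cut -f 2 -d "-"` step.
    return re.findall(r"dist-([a-z]{5})-amd64", html)


# Made-up sample input standing in for a saved .html page.
sample = '<a href="dist-jammy-amd64">a</a> <a href="dist-focal-amd64">b</a>'

for name in codenames(sample):
    print(f"{name}:latest")
```

Like the shell version, this does not remove duplicates; wrapping the result in `dict.fromkeys(...)` would do that while keeping the order.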

a faster cache storing and not iter returns, a feed coloner.

All great,