Why is it so important for a Linux admin to master the sos command?

I’ve been a senior developer and third-level support engineer for a Linux-based appliance system for a few years now. When I started in this position, I couldn’t believe how customer technical support was handled: an eternal “send me the output of this command” email cycle, with every ticket staying open for weeks. So I built a script to collect diagnostic data and a web interface to make use of the data and share it with the team. It improved the support service dramatically, cutting resolution times from weeks to hours.

At that time I didn’t know about the sos command (it was called sosreport back then), but it was very much the same concept. Based on that experience, I soon realized that a tool capable of managing, sharing, and analyzing sosreports was missing, so I built one, without realizing how little known the sos command is in the Linux community. So I wrote this article hoping to pique your curiosity and encourage people to take advantage of the sos command.

If you make a living troubleshooting or diagnosing Linux systems, whether in large production environments or in small businesses with desktop computers, I think you will find this article extremely useful.

In this article, I provide a comprehensive overview of the sos command and its many features, and at the end I explain why it is so important to have it in your list of tools.

To keep the article concise and easy to read, I limited the depth of each topic and included only brief examples where appropriate, but I provide links to other articles if you’d like to dive deeper into a specific feature.

Please let me know if you find this article useful or interesting.

5 Likes

I completely agree with your point about sos being an underrated tool in the Linux ecosystem.

Many administrators are highly familiar with commands such as top, journalctl, ss, dmesg, or lsof, but those tools typically provide only a partial view of the system at a specific moment. What makes sos valuable is its ability to consolidate a large amount of diagnostic information into a single archive that can be reviewed, shared, and analyzed later.

In enterprise support environments, reducing the back-and-forth communication between support teams and customers can significantly shorten resolution times. Having a complete system snapshot available from the beginning often makes root cause analysis far more efficient.

I suspect one reason sos is not discussed more frequently is that many administrators only encounter it when working with enterprise Linux support organizations. Nevertheless, it is definitely a tool worth knowing for anyone responsible for troubleshooting Linux systems at scale.

Thanks for bringing attention to a utility that deserves much more visibility within the Linux community.

5 Likes

Thanks for sharing!

I was in the same boat as you: when you mentioned it, I had not heard of sos and could not remember ever using it before.

I did keep my word about looking into it. What I found is that the sos command and sosreport are both still actively used, so the sosreport name is still around.

On Fedora, RHEL, CentOS, Rocky, Alma:

sudo dnf install sos

On Debian and Ubuntu:

sudo apt update
sudo apt install sosreport

One side adopted the modern subcommand-style name (sos), while the other kept the legacy name (sosreport) for its package.
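For a script that has to run on both families, a minimal sketch of picking whichever command is present might look like the following. It assumes that newer releases expose the `sos report` subcommand and that older ones ship only the standalone `sosreport` command. The function name `pick_sos_cmd` is made up for this example and is not part of sos.

```shell
#!/bin/sh
# Hypothetical helper (not part of sos): print the report command
# that is available on this host, preferring the modern interface.
pick_sos_cmd() {
    if command -v sos >/dev/null 2>&1; then
        # Modern subcommand-style interface
        echo "sos report"
    elif command -v sosreport >/dev/null 2>&1; then
        # Legacy standalone command
        echo "sosreport"
    else
        echo "sos is not installed" >&2
        return 1
    fi
}
```

It could then be used along the lines of `cmd=$(pick_sos_cmd) && sudo $cmd --batch`, where `--batch` runs the collection without interactive prompts.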

Glad to be able to add this command into active use. :handshake:

4 Likes

Sadly, I never heard of sos/sosreport during my entire IT career, which began in 1984 and ended when I retired in 2019.

I too had built tools like yours, which not only probed systems but also made the results visible to the end users (via a known web page that published throughput and status) so that, where possible, they could intervene without my being called in, simply because of the volume of events. For defined classes of issues, the tools triggered “attention-getters” for my intervention. In an HP-UX multi-server, multi-continent setup serving R&D, Engineering, and Manufacturing, you had to build your own ad-hoc safety net to survive!

I wish I had known about sos/sosreport back then. It would certainly have made my life much easier!

3 Likes