How do you fix an issue you don’t understand?

Once you start working in IT, you’ll inevitably be faced with trying to troubleshoot systems you’ve never worked with and don’t understand. Yet, you’ll need to be able to push through and resolve the problem anyhow. Here’s how.

This fella is trying to push through the problem in the wrong way. Photo by Daniel Mingook Kim on Unsplash

There’s a big problem in IT that folks don’t talk about enough: there’s just too much knowledge out there to absorb. There are too many different types of systems, each with their own way of doing things and each with their own terminology.

There are so many different systems out there that even if you decided to do a deep dive on a different one each day (good luck with that!) you’d spend the rest of your life and still not be able to cover them all. To make things worse, knowledge that you gain will be useful for a time and then quickly become obsolete, meaning that you’ll constantly need to keep learning just to keep afloat.

This can lead people to develop impostor syndrome, despair, leave the field, or, worse yet, never enter it. Luckily, there is something you can do to work around this. Instead of trying to learn every system (which you can’t do), focus on learning good troubleshooting technique.

No matter how esoteric or unknowable a system might seem, understand that most systems resemble each other when looked at from a 30,000-foot view. Most systems will take input, do something with that input, and then give you back some kind of output. The input might come from the user, the user’s laptop, or a server somewhere. And the output might be something you see on the screen, or it could be a change made to a file on a server halfway around the world. But at its core, any system is taking something and doing something with it.

With that in mind, try to get an understanding of where the input is coming from, what is being done to the data, and what you should expect to see with a successful operation. Now, this part is important: you’re trying to stay at a very high level here. Just try to understand how everything fits together so you can get an idea of where in the chain the problem lies.

Let’s use an example to see this in practice. Let’s say that when you press a particular button in a particular piece of software on a certain user’s machine, you get an error. You’re completely unfamiliar with what the software does as you’ve never worked with it, and you’re not able to understand the error because it uses terminology you’re unfamiliar with.

Welcome to IT! This kind of scenario happens more than you’d think!

First, your best option is to go to your company’s documentation (if they have documentation – sadly, that’s not always a given). If your company doesn’t have documentation, or if the documentation is not useful, Google is your friend. Whichever route you take, try to understand what the software is trying to do.

After skimming the documentation, it seems that the software allows a user to apply changes they made to a local file and sync those to a server. Other users are then able to work on the file and sync their own changes to it. Those changes are then pushed out to all other users’ laptops, ensuring everyone can see everyone else’s changes.

Great! You now have a high-level understanding of what the system does. You can now think about the different places such a system might break. You don’t need to try to understand the unique terminology of the system or deep dive into learning it. Just focus on the high-level components!

Let’s look at the different components at play here. The main things to look at are: the user’s computer, the file on the user’s computer, the connection between the user’s computer and the server, the server, and the file on the server.
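If it helps to see that chain written down, here is a minimal Python sketch of the idea. It walks the components in order and reports the first one that fails its check. The checks are hypothetical placeholders (simple lambdas that I made up for illustration); in real life each one would be an actual test you run by hand or with a script.

```python
from typing import Callable, List, Optional, Tuple

# A check returns True when the component looks healthy.
# These checks are hypothetical stand-ins, not real diagnostics.
Check = Callable[[], bool]


def first_failing_component(chain: List[Tuple[str, Check]]) -> Optional[str]:
    """Return the name of the first component whose check fails, or None."""
    for name, check in chain:
        if not check():
            return name
    return None


# The components from the example, in the order the data travels.
chain = [
    ("user's computer", lambda: True),
    ("file on the user's computer", lambda: True),
    ("connection to the server", lambda: False),  # pretend this one is broken
    ("server", lambda: True),
    ("file on the server", lambda: True),
]

print(first_failing_component(chain))  # prints: connection to the server
```

The point is not the code itself but the habit: list the pieces, test them one at a time, and stop at the first one that misbehaves.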

Let’s quickly determine where you should spend your efforts. It doesn’t make sense for you to work on the user’s computer if the server crashed, right? So first, see if the system is working for anyone else. If it’s not working for anyone, then you’ll want to focus your troubleshooting on the server that hosts the system. Maybe it’s crashed, or maybe there’s a network issue preventing access to the server.
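One quick way to test whether a server is reachable at all is to try opening a TCP connection to it. The sketch below assumes the system listens on a TCP port you can look up in the documentation; the host name and port in the usage comment are made up for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Hypothetical usage; replace the host and port with your system's values:
# can_connect("sync.example.internal", 443)
```

A successful connection only tells you that something is listening on that port. It does not prove the application behind it is healthy, but a failure from several different machines is a strong hint that the problem is on the server or network side.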

For this example, let’s say things are working well for everyone else. Now, let’s see if there’s a problem with the file that is being synced. For example, the file may have become corrupted. You can check for this by trying to open the file, which will tell you if the file on the local machine (the user’s laptop) is the problem. If that opens fine, you can have someone else try to sync the file on the remote machine (the server). If everything seems fine there, the issue is not with the file. Maybe it’s an issue with the connection instead?
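Opening the file in its own application is the simplest test. If you happen to have a known-good copy to compare against (that is my assumption here, not a given), a checksum comparison is another option. This sketch reads a file and returns its SHA-256 digest, or None if the file can’t be read at all.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> Optional[str]:
    """Return the SHA-256 hex digest of a file, or None if it cannot be read."""
    digest = hashlib.sha256()
    try:
        with path.open("rb") as f:
            # Read in chunks so large files do not have to fit in memory.
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
    except OSError:
        return None
    return digest.hexdigest()


def same_contents(a: Path, b: Path) -> bool:
    """True only when both files are readable and byte-for-byte identical."""
    first, second = sha256_of(a), sha256_of(b)
    return first is not None and first == second
```

Keep in mind that a readable file is not necessarily a valid one. The bytes can all be there and the application may still refuse to parse them, so treat this as a supplement to opening the file, not a replacement.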

To troubleshoot the connection, try to connect in a different way. For example, if the user is on Wi-Fi, try to physically plug the computer into an Ethernet port. Hold on a second, it’s syncing now!

You now know that there is an issue with the type of connection. Go back on Wi-Fi and try to access a website. If you’re unable to access any website, there may be a problem with the user’s Wi-Fi driver or a physical problem with their wireless card. For the sake of this example, let’s say that their Wi-Fi is working just fine. You now know that the server is probably not accessible via Wi-Fi.

If it should be accessible, you may just need to have someone modify the firewall rules in place on the Wi-Fi network. Or perhaps the firewall rules on the laptop need to be modified. (Don’t do this without permission!) Either way, at this point you know what the problem is and have unblocked your user.
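The reasoning in the last few paragraphs boils down to a tiny decision table. Here it is as a Python sketch. The function name and the wording of the results are mine; the two inputs are simply what you observed while on the wireless network.

```python
def diagnose_wireless(website_ok: bool, server_ok: bool) -> str:
    """Map two observations made on the wireless network to a likely cause."""
    if not website_ok:
        # Nothing loads at all, so the wireless link itself is suspect.
        return "wireless broken: check the driver or the wireless card"
    if not server_ok:
        # Websites load but the server does not, so something blocks that path.
        return "server unreachable on wireless: check firewall rules"
    return "wireless and server both fine: look elsewhere"


# The situation from the example: websites load, but the sync server does not.
print(diagnose_wireless(website_ok=True, server_ok=False))
```

Two cheap observations were enough to rule out the laptop’s hardware and point at the firewall, without touching the software you don’t understand.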

The last possibility would have been a problem with the software. Perhaps that needed to be tweaked or re-installed? But notice that we left this for last. We did this for two reasons: 1) we don’t want to get into the weeds with software we don’t know too well if we can avoid it and 2) if we had reinstalled the software before checking the other things, we would have been needlessly disruptive to the user and risked losing some of their work for nothing.

Always, always, always take the most disruptive troubleshooting steps last!
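One way to make that rule concrete is to write down your candidate steps, give each a rough disruption score, and sort. The 1-to-5 scores below are hypothetical numbers I picked for illustration; use your own judgment for your environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    description: str
    disruption: int  # hypothetical score: 1 = harmless, 5 = may cost the user work


def least_disruptive_first(steps: List[Step]) -> List[Step]:
    """Return the steps ordered from least to most disruptive."""
    # sorted() is stable, so steps with equal scores keep their original order.
    return sorted(steps, key=lambda step: step.disruption)


steps = [
    Step("Reinstall the software", 5),
    Step("Check whether it works for other users", 1),
    Step("Switch from Wi-Fi to Ethernet", 2),
    Step("Open the local file", 1),
]

for step in least_disruptive_first(steps):
    print(step.disruption, step.description)
```

With these scores, the reinstall lands at the bottom of the list, which is exactly where it sat in the example.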

Now you know enough to fix just about any issue you’ll ever encounter. This type of troubleshooting technique will work for you from your first days on the help desk to ten years down the road when you’re a super savvy systems administrator! When faced with a problem with a system you’ve never been exposed to, remember, don’t panic. Try to understand what is happening at a high level so you can isolate where the problem is occurring. Once you do that, all systems start to resemble each other and you can start working on a fix!
