5.0 Network Troubleshooting

5.1 Explain the network troubleshooting methodology

Last edited 348 days ago by Makiel [Muh-Keel]

Always try to narrow down the problem.

Did you check the super simple stuff (SSS)?
Is hardware or software causing the problem?
Is it a workstation or server problem?
Which segments of the network are affected?
Are there any cabling issues?

Did you check the super simple stuff (SSS)?

“All things being equal, the simplest explanation is probably the correct one.”
The simpler an error is, the more likely it is to happen, because fewer steps have to transpire for it to occur. When someone can’t connect to the internet, is it more likely that there’s been an organized attack on your entire organization’s network, or that their network cable is loose? Here are a few SSS things to look for:
Check to verify login procedures and rights.
Look for link lights and collision lights.
Check all power switches, cords, adapters, and the status of all hardware.
Look for user errors.

Check Network Link Lights and Collision Lights
If the link lights are lit up on both the workstation's NIC and the switch port to which the workstation is connected, it's usually safe to assume that the workstation and switch are communicating just fine.
The link lights on some NICs don't activate until the driver is loaded. So, if the link light isn't on when the system is first turned on, you'll just have to wait until the operating system loads the NIC driver.

User Error is common and you should always get the user to reproduce the error in front of you. Always check out the problem thoroughly. If the problem and its solution aren't immediately clear to you, try the procedure yourself, or ask someone else at another workstation to do so. Don't just leave the issue unsettled or make the assumption that it is user error or a chance abnormality because that's exactly what the bad guys out there are hoping you'll do.
Is Hardware or Software Causing the Problem?
A hardware problem often rears its ugly head when some device in your computer skips a beat and/or dies. This one's pretty easy to discern: when you try to do something that requires that particular piece of hardware, the operation fails and you get an error instead.
Be sure to back up all data before you replace any faulty hardware components.
Software problems will often cause whatever networking program you’re using to freeze and become unresponsive.
Is It a Workstation or a Server Problem?
The first thing you've got to determine when troubleshooting this kind of problem is whether it's only one person or a whole group that's been affected.
A single user can’t log in to a workstation or access a server service?
Try to log in from another workstation within the same group of users. If you can do that, the problem is most likely the user's workstation, so look for things like cabling faults, a bad NIC, power issues, and OS problems.
Whole department can't access a specific server service?
Take a thorough look at the particular server and verify all user connections to it. If everyone is logged in correctly, the problem may have something to do with individual rights or permissions.
If no one can log in to that server, including you, the server probably has a communication problem with the rest of the network.
If the server has totally crashed, either you'll see messages telling you all about it on the server's monitor or you'll find its screen completely blank—screaming indicators that the server is no longer running
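The workstation-versus-server questions above can be sketched as a small triage function. This is only an illustration of the reasoning, not a diagnostic tool: the function name, its parameters, and the returned labels are hypothetical, and the branch for a login that also fails elsewhere is an assumption the notes do not spell out.

```python
from typing import Optional


def first_place_to_look(users_affected: int,
                        total_users: int,
                        login_works_elsewhere: Optional[bool] = None) -> str:
    """Suggest where to start looking, based on how many users are affected.

    login_works_elsewhere only matters for a single affected user: it records
    whether a login from another workstation in the same group succeeded.
    """
    if users_affected <= 0:
        return "nothing reported"
    if users_affected == 1:
        if login_works_elsewhere is None:
            return "try logging in from another workstation first"
        if login_works_elsewhere:
            # Works elsewhere, so suspect cabling, NIC, power, or OS locally.
            return "workstation"
        # Assumption: if it fails elsewhere too, look past the workstation.
        return "user account or server"
    if users_affected >= total_users:
        # Nobody can get in: the server may be cut off from the network.
        return "server connectivity"
    # A whole group is affected: check the service, then rights/permissions.
    return "server service or group permissions"


print(first_place_to_look(1, 40, login_works_elsewhere=True))  # workstation
print(first_place_to_look(12, 40))  # server service or group permissions
```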

Which Segments of the Network Are Affected?
Figuring this out can be kind of hard.
If multiple segments are affected, you may be dealing with a network-address conflict. If you're running Transmission Control Protocol/Internet Protocol (TCP/IP), remember that IP addresses must be unique across an entire network. So, if two of your segments have the same static IP subnet addresses assigned, you'll end up with duplicate IP errors.
If all of your network's users are experiencing the problem, the cause could be a server that everyone accesses.
WAN Issue? WAN devices have built-in diagnostics that tell you whether a WAN link is working okay, which really helps you determine if the failure has something to do with the WAN link itself or with the hardware involved instead.
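For the network-address conflict case described above, a quick check is to compare the subnets assigned to each segment. The sketch below uses Python's standard `ipaddress` module; the segment names and address ranges are made-up examples, and `find_overlapping_subnets` is a hypothetical helper, not part of any tool.

```python
import ipaddress
from itertools import combinations


def find_overlapping_subnets(segments):
    """Return pairs of segment names whose subnets overlap.

    segments maps a segment name to its subnet in CIDR notation.
    """
    networks = {name: ipaddress.ip_network(cidr)
                for name, cidr in segments.items()}
    conflicts = []
    for (name_a, net_a), (name_b, net_b) in combinations(sorted(networks.items()), 2):
        if net_a.overlaps(net_b):
            conflicts.append((name_a, name_b))
    return conflicts


# Hypothetical segments: "lab" was mistakenly given the same subnet as "floor1".
segments = {
    "floor1": "192.168.10.0/24",
    "floor2": "192.168.20.0/24",
    "lab": "192.168.10.0/24",
}
print(find_overlapping_subnets(segments))  # [('floor1', 'lab')]
```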

Is It Bad Cabling?
Are the cables properly connected to the correct ports? Once you've figured out whether your plight is related to one workstation, a network segment, or the whole tamale (network), you must then examine the relevant cabling. There are a number of potential cabling problems to look out for.
Incorrect Pinout/TX/RX Reverse/Damaged Cable?
Make sure neither end of the cable is damaged, and check the physical pinouts to confirm that the wiring inside the cable is correct.
Bad Port
In some cases, the issue is not the cable but the port into which the cable is connected.
Ports have LEDs that can alert you to a bad port.
No light whatsoever indicates an issue with the port.
Loopback plugs can be used to test the functionality of a port.
Transceiver Mismatch
Duplex and speed must be the same on both ends when connecting two transceivers.
Mismatched speeds mean no communication.
Mismatched duplex settings mean poor performance.
Crosstalk
Crosstalk is what happens when there's signal bleed between two adjacent wires that are carrying a current.
Minimize crosstalk inside network cables by twisting the wire pairs together.
The tighter the twist, the less crosstalk you have.
Crosstalk can also be caused by using the wrong category of cable.
Near-End/Far-End Crosstalk
Far-End Crosstalk is often caused by improperly terminating a cable.
It's important to maintain the twist right up to the punch-down or crimp connector
Attenuation/dB Loss/Distance Limitation
Attenuation is the weakening of a signal as it travels: the farther a signal moves through any medium, the more the medium itself degrades it.
All copper twisted-pair cables have a maximum segment distance of 100 meters before they'll need to be amplified, or repeated, by a hub or a switch.
Fiber Cables can carry signals for miles before degradation occurs
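Attenuation is usually expressed as a loss in decibels, computed from the power sent and the power received. The sketch below shows that standard formula alongside the 100-meter copper limit mentioned above; the function names and the sample power values are illustrative assumptions.

```python
import math

# Maximum twisted-pair copper segment length noted above, in meters.
MAX_COPPER_SEGMENT_M = 100


def db_loss(power_in_mw: float, power_out_mw: float) -> float:
    """Signal loss in decibels: 10 * log10(power in / power out)."""
    return 10 * math.log10(power_in_mw / power_out_mw)


def copper_run_within_limit(length_m: float) -> bool:
    """True if a twisted-pair run fits within the 100 m segment limit."""
    return length_m <= MAX_COPPER_SEGMENT_M


# Losing half the signal power works out to roughly 3 dB of loss.
print(round(db_loss(1.0, 0.5), 2))      # 3.01
print(copper_run_within_limit(85))      # True
print(copper_run_within_limit(130))     # False
```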
Latency
A low-latency network connection is one that generally experiences short delay times, while a high-latency connection generally suffers from long delays
Routers take a certain amount of time to process and forward any communication.
Configuring additional rules on a router generally increases latency, thereby resulting in longer delays. An organization may decide not to deploy certain security solutions because of the negative effects they will have on network latency.
Jitter
Jitter results from network congestion, timing drift, and route changes.
Jitter is especially problematic in real-time communications like IP telephony and videoconferencing.
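One simple way to put a number on jitter is to average how much the delay changes from one packet to the next. The sketch below does that over a list of latency samples. It is a rough illustration, not the smoothed estimator that real-time protocols use, and the sample values are invented.

```python
def mean_jitter_ms(latencies_ms):
    """Average absolute change between consecutive latency samples (ms)."""
    if len(latencies_ms) < 2:
        return 0.0
    changes = [abs(later - earlier)
               for earlier, later in zip(latencies_ms, latencies_ms[1:])]
    return sum(changes) / len(changes)


# Hypothetical ping round-trip times in milliseconds.
samples = [20.0, 22.0, 19.0, 30.0, 21.0]
print(mean_jitter_ms(samples))  # 6.25
```

A steady 30 ms link scores 0 here even though its latency is higher than the samples above, which is why latency and jitter are tracked as separate problems.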
Collision
A network collision happens when two devices try to communicate on the same physical segment at the same time.
Switches separate the network into different collision domains
Shorts
A short happens when current flows through a different path within a circuit than it's supposed to. Shorts are typically caused by some kind of physical cable fault.
Replace the cable.
Open Impedance Mismatch
An impedance mismatch in a circuit or along a network cable will produce a reflection back to the source of the signal.
The reflection can cause the link to fail.
Interference/Cable Placement
EMI and radio frequency interference (RFI) occur when signals from other devices (TVs, radio transmitters, two-way walkie-talkies, and some cell phones) interfere with the normal operation of electronic circuits.
Use STP, coaxial, or fiber-optic cable, which is immune to this interference, throughout your network.
Split Pairs
A split pair is a wiring error where two connections that are supposed to be connected using the two wires of a twisted pair are instead connected using two wires from different pairs.
Buying pre-made cables instead of terminating your own largely avoids this problem.
Bent Pins
Check the pins on the cable ends and make sure they aren’t bent. If they are, the cable will either not work at all or not work correctly.
If these pins get bent, either they won't go into the correct hole or they won't go into a hole at all.
Bottlenecks
Bottlenecks are areas of the network where the physical infrastructure is not capable of handling the traffic. The obvious symptom of a bottleneck is poor performance.
This can be temporary due to an unusual burst of high traffic or it could be a wakeup call to upgrade outdated infrastructure.

Fiber Cable Issues
Wavelength Mismatch
Wavelength mismatch occurs when the fiber transmitters at the two ends of the cable use different wavelengths, one longer or shorter than the other.
Make sure your transmitters match on both ends of the cable.
Fiber Type Mismatch
Fiber type mismatches, at each of the transceivers, can cause wavelength issues, massive attenuation, and dB loss.
Dirty Connectors
Inspect your connectors to make sure no dirt or dust has contaminated the cable end.
If it has, polish your cable ends with a soft cloth.
Connector Mismatch
Just because it fits doesn't mean it works.
Be sure to have the right connector for each cable end or transceiver
Bend Radius Limitations
Fiber is made of glass or plastic and can break if it is bent too sharply. Make sure you understand the bend radius limitations of each type of fiber you purchase.
This is crucial information to have so you don’t damage expensive fiber by over-bending it during installation.

Unbounded Media Issues (Wireless)
Interference
Wi-Fi is very vulnerable to radio interference from Bluetooth keyboards, mice, or cell phones that are all close in frequency ranges. Microwave Ovens can even cause interference.
These devices can cause signal bleed that can slow down or prevent wireless communications.
The distance between the computer and the AP, plus any solid objects between the two, can also weaken or interfere with the signal.
Device Saturation/Bandwidth Saturation
Too many devices connected to an AP can cause the bandwidth to be exhausted.
Simultaneous Wired/Wireless Connections
Remind users to turn off wireless on their laptop when they take it into their office and connect it to their dock.
When a wired and a wireless connection operate simultaneously and each provides a DNS server with a different address, it can cause name resolution issues, or even default gateway issues.
Incorrect Configurations
Mistakes in the configuration of the wireless access point or wireless router or inconsistencies between the settings on the AP and the stations can also be the source of problems
Incorrect Encryption/Security Type Mismatch
Make sure the AP and its clients are configured with the same type of encryption.
Disable security before troubleshooting client problems, because if the client can connect once you've done that, you know you're dealing with a security configuration error.
Incorrect, Overlapping, or Mismatched Channels
Overlapping channels cause your signal-to-noise ratio to drop because you'll get a ton of interference and signal loss!
Verify the correct channel settings
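To verify channel settings in the 2.4 GHz band, a common rule of thumb is that two channels overlap unless they are at least five channel numbers apart, which is why 1, 6, and 11 are the usual choices. The sketch below applies that rule to a set of APs; the AP names and channel assignments are hypothetical, and the check only covers 2.4 GHz channels 1 through 13.

```python
from itertools import combinations


def channels_overlap(channel_a: int, channel_b: int) -> bool:
    """Rule of thumb for 2.4 GHz: channels fewer than 5 apart overlap."""
    for channel in (channel_a, channel_b):
        if not 1 <= channel <= 13:
            raise ValueError(f"not a 2.4 GHz channel in 1-13: {channel}")
    return abs(channel_a - channel_b) < 5


def overlapping_ap_pairs(ap_channels):
    """Return pairs of AP names whose configured channels overlap."""
    return [(ap_a, ap_b)
            for (ap_a, ch_a), (ap_b, ch_b) in combinations(sorted(ap_channels.items()), 2)
            if channels_overlap(ch_a, ch_b)]


# Hypothetical APs that are close enough to hear each other.
ap_channels = {"ap-lobby": 1, "ap-office": 3, "ap-lab": 11}
print(overlapping_ap_pairs(ap_channels))  # [('ap-lobby', 'ap-office')]
```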
Incorrect Frequency/Incompatibilities
If you have multiple APs and they're in close proximity, you need to make sure they're on different channels/frequencies to avoid potential interference problems.
But keep in mind you've got to configure the same Frequency settings on all the devices that you want to communicate.
SSID Mismatch
If a user reports that they're connected to an AP but still can't access the resources they need or authenticate to the network, you should verify that they are, in fact, connected to the correct SSID and not a neighboring one.
Wireless Standard Mismatch
Some of the Wi-Fi standards are backward compatible and others aren't.
Make sure the standards on the AP match the standards on the client, or that they're at least backward compatible.
Be sure to understand the throughput, frequency, distance capabilities, and available channels for each standard you use.
Untested Updates
It's really important to push updates to the APs in your wireless network, but not before you test them.
Thoroughly test updates on your bench before pushing them to your live network.
Distance/Signal Strength/Power Levels
If your AP doesn't seem to have enough power to provide a connectivity point for your clients, you can move it closer to them, increase the distance over which the AP can transmit by changing the type of antenna it uses, or use multiple APs connected to the same switch or set of switches to solve the problem.
Latency and Overcapacity
When wireless users complain that the network is slow (latency) or that they are losing their connection to applications during a session, it is usually a capacity or distance issue.
802.11 is a shared medium, and as more users connect, all user throughput goes down.
Place another AP close by and place the second AP on a different non-overlapping channel from the first and make sure the second AP uses the same SSID as the first.
The traffic can be better divided and users will get better performance.
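A back-of-the-envelope estimate shows why adding a second AP helps. The sketch below simply divides an AP's usable throughput evenly among its active users. That is a deliberate simplification (it ignores contention overhead and uneven usage), and the 200 Mbps figure and user counts are made-up numbers.

```python
def per_user_throughput_mbps(usable_mbps: float, active_users: int) -> float:
    """Rough even share of one AP's usable throughput per active user."""
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    return usable_mbps / active_users


# One AP carrying all 40 users versus two APs carrying 20 users each.
print(per_user_throughput_mbps(200, 40))  # 5.0
print(per_user_throughput_mbps(200, 20))  # 10.0
```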
Bounce
Repeaters and reflectors can be used to bounce a signal and boost it to cover about a mile, but if you don't tightly control signal bounce, you could end up with a much bigger network than you wanted.
To determine exactly how far and wide the signal will bounce, make sure you conduct a thorough wireless site survey.
Reflection
Reflection can be the cause of serious performance problems in a WLAN. As a wave radiates from an antenna, it broadens and disperses, and parts of it reflect off surfaces and reach the receiver along several different paths (multipath).
Multipath can degrade the strength and quality of the received signal or even cause data corruption or canceled signals. APs mitigate this behavior by using multiple antennas and constantly sampling the signal to avoid a degraded signal.

Can the Problem Be Reproduced?
The first question to ask anyone who reports a network or computer problem is, “Can you show me what ‘not working’ looks like?” This is because if you can reproduce the problem, you can identify when it happens, which may give you all the information you need to determine the source of the problem and maybe even solve it in a snap. The hardest problems to solve are those of the random variety that occur intermittently and can't be easily reproduced.


There are 7 steps in the network troubleshooting process:

1. Identify the problem.
2. Establish a theory of probable cause.
3. Test the theory to determine the cause.
4. Establish a plan of action to resolve the problem and identify potential effects.
5. Implement the solution or escalate as necessary.
6. Verify full system functionality and, if applicable, implement preventative measures.
7. Document findings, actions, outcomes, and lessons learned.

Step 1: Identify the Problem

Before you can solve the problem, you've got to figure out what it is. Asking the right questions can get you far. Information Gathering is very important.
Determine If Anything Has Changed
Were you ever able to do this?
If not, then maybe it just isn't something the hardware or software is designed to do, or maybe the user doesn’t have the required permissions to do so.
If so, when did you become unable to do it?
If the computer was able to do the job and then suddenly could not, whatever conditions surrounded and were involved in this turn of events become extremely important.
There's a high level of probability that the cause of the problem is directly related to the conditions surrounding any change when it occurred.
Has anything changed since the last time you could do this?
The thing that changed right before the problem began happening is almost always what caused it.
“Did anyone add anything to your computer?” or “Are you doing anything differently from the way you usually do it?”
Were any error messages displayed?
Error messages are designed by programmers for the purpose of pointing them to exactly what it is that isn't working properly in computer systems.
Go to the software or hardware vendor’s website for an explanation of the error code.
Are other people experiencing this problem?
You've got to narrow the issue down; this is a great question for finding out whether it’s just one user or an entire workgroup that’s being affected.
Being inundated with calls from a bunch of people in the same workgroup is a solid hint that the issue is affecting an entire group.
Is the problem always the same?
A good question to ask is, “If you do x, does the problem get better or worse?” For example, ask a user, “If you use a different file, does the problem get better or worse?” If the symptoms lessen, it's an indication that the problem is related to the original file that's being used.
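Because the change made right before a problem began is so often the cause, it helps to line up what you learn from these questions against a record of recent changes. The sketch below assumes you keep such a record as (timestamp, description) pairs. The log entries, dates, and the `changes_before` helper are all hypothetical.

```python
from datetime import datetime


def changes_before(change_log, problem_start):
    """Return changes made at or before problem_start, most recent first."""
    earlier = [entry for entry in change_log if entry[0] <= problem_start]
    return sorted(earlier, key=lambda entry: entry[0], reverse=True)


# Hypothetical change record for one workstation.
change_log = [
    (datetime(2024, 3, 1, 9, 0), "installed new printer driver"),
    (datetime(2024, 3, 4, 14, 30), "moved workstation to a different switch port"),
    (datetime(2024, 3, 6, 8, 15), "applied OS update"),
]
problem_start = datetime(2024, 3, 5, 10, 0)

suspects = changes_before(change_log, problem_start)
print(suspects[0][1])  # moved workstation to a different switch port
```

The first entry in the result is the prime suspect, and the OS update is excluded because it happened after the problem was first reported.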
Always Approach Problems Individually!
You should never mix possible solutions when troubleshooting.
Changing multiple things at one time makes it harder to see which specific change resolved the issue.
If a change does not have a beneficial effect, reverse the change before making another change.
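The one-change-at-a-time rule can be sketched as a loop: apply a single change, test, and undo it if it did not help before trying the next. Everything in the example is a stand-in: the `state` dictionary simulates a workstation, and the fix names and the `problem_resolved` check are hypothetical.

```python
def try_fixes_one_at_a_time(fixes, problem_resolved):
    """Apply each fix alone; revert it if the problem persists.

    fixes is a list of (name, apply, revert) tuples. Returns the name of the
    fix that resolved the problem, or None if none of them did.
    """
    for name, apply, revert in fixes:
        apply()
        if problem_resolved():
            return name
        revert()  # Undo the change before making another one.
    return None


# Simulated workstation: the real fault is the duplex setting.
state = {"cable": "old", "duplex": "half"}

fixes = [
    ("swap cable",
     lambda: state.update(cable="new"),
     lambda: state.update(cable="old")),
    ("set duplex to full",
     lambda: state.update(duplex="full"),
     lambda: state.update(duplex="half")),
]

winner = try_fixes_one_at_a_time(fixes, lambda: state["duplex"] == "full")
print(winner)          # set duplex to full
print(state["cable"])  # old
```

Because the cable swap was reverted before the duplex change was tried, it is clear afterward which single change resolved the issue.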

Step 2: Establish a Theory of Probable Cause

After you observe the problem and identify the symptoms, next on the list is to establish its most probable cause.
Below is a list of logical issues that could be a Probable Cause:
Port Speed
If you decide to set the port speed manually, make positively sure to set the same speed on both sides of a link. As long as the switches are allowed to autosense the port speed, it's rare to have a problem develop that results in a complete lack of communication.
Always make sure the NICs on both sides of the connection (Host Device ↔ Switch) are configured with the same Port Speed to avoid any communication issues.

Port Duplex Mismatch
There are 3 duplex settings on each port of a network switch: full, half, and auto. In order for two devices to connect effectively, the duplex setting has to match on both sides of the connection.
Mismatches occur when both sides have different Duplex settings.
If both sides are set to auto but the devices are different, you can also end up with a mismatch because the device on one side defaults to full and the other one defaults to half.
Duplex mismatches can cause lots of network and interface errors, and even the lack of a network connection.
The settings you choose are based on the type of devices you have populating your network.
If your network is all switches, make sure the port duplex settings match on both sides of each connection.
If you have any Hubs, you need to make sure the appropriate ports are set to Half-Duplex.
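The port speed and duplex checks above boil down to comparing the settings in effect on both ends of a link. The sketch below does that for two ports. `PortSettings` and `link_diagnosis` are hypothetical names, and the inputs are the speed and duplex each side actually ended up with, not the configured "auto" value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortSettings:
    speed_mbps: int
    duplex: str  # "full" or "half", as actually in effect on the port


def link_diagnosis(side_a: PortSettings, side_b: PortSettings) -> str:
    """Compare both ends of a link and describe the likely symptom."""
    if side_a.speed_mbps != side_b.speed_mbps:
        return "speed mismatch: expect no communication"
    if side_a.duplex != side_b.duplex:
        return "duplex mismatch: expect poor performance and interface errors"
    return "settings match"


host_nic = PortSettings(speed_mbps=1000, duplex="half")
switch_port = PortSettings(speed_mbps=1000, duplex="full")
print(link_diagnosis(host_nic, switch_port))
# duplex mismatch: expect poor performance and interface errors
```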

Jabbering NIC
An error in which a faulty device (usually a NIC) continuously transmits corrupted or meaningless data onto a network. This may halt the entire network from transmitting data because other devices will perceive the network as busy due to all the “jabbering”.