5.0 Network Troubleshooting

5.1 Explain the network troubleshooting methodology

Last edited 544 days ago by Makiel [Muh-Keel]

Always try to narrow down the problem.

Did you check the super simple stuff (SSS)?
Is hardware or software causing the problem?
Is it a workstation or server problem?
Which segments of the network are affected?
Are there any cabling issues?

Did you check the super simple stuff (SSS)?

“All things being equal, the simplest explanation is probably the correct one.”
The simpler an error is, the more likely it is to happen, because fewer steps have to transpire for it to occur. When someone can't connect to the internet, is it more likely that there's been an organized attack on your entire organization's network, or that their network cable is loose? Here are a few SSS things to look for:
Check to verify login procedures and rights.
Look for link lights and collision lights.
Check all power switches, cords, adapters, and the status of all hardware.
Look for user errors.

Check Network Link Lights and Collision Lights
If the link lights are lit on both the workstation's NIC and the switch port to which the workstation is connected, it's usually safe to assume that the workstation and switch are communicating just fine.
The link lights on some NICs don't activate until the driver is loaded. So, if the link light isn't on when the system is first turned on, you may just have to wait until the operating system loads the NIC driver.

User Error is common and you should always get the user to reproduce the error in front of you. Always check out the problem thoroughly. If the problem and its solution aren't immediately clear to you, try the procedure yourself, or ask someone else at another workstation to do so. Don't just leave the issue unsettled or make the assumption that it is user error or a chance abnormality because that's exactly what the bad guys out there are hoping you'll do.
Is Hardware or Software Causing the Problem? A hardware problem often rears its ugly head when some device in your computer skips a beat and/or dies. This one's pretty easy to discern because when you try to do something requiring that particular piece of hardware, you can't do it and instead get an error telling you that you can't do it.
Be sure to back up all data before you replace any faulty hardware components.
Software problems often will cause whatever networking program you’re using to freeze and be non-responsive.
Is It a Workstation or a Server Problem? The first thing you've got to determine when troubleshooting this kind of problem is whether it's only one person or a whole group that's been affected.
A single user can't log in to a workstation or access a server service?
Try to log in from another workstation within the same group of users. If you can do that, the problem is most likely the user's workstation, so look for things like cabling faults, a bad NIC, power issues, and OS problems.
Whole department can't access a specific server service?
Take a thorough look at the particular server and verify all user connections to it. If everyone is logged in correctly, the problem may have something to do with individual rights or permissions.
If no one can log in to that server, including you, the server probably has a communication problem with the rest of the network.
If the server has totally crashed, either you'll see messages telling you all about it on the server's monitor or you'll find its screen completely blank. Both are screaming indicators that the server is no longer running.

Which Segments of the Network Are Affected? Figuring this out can be kind of hard.
If multiple segments are affected, you may be dealing with a network-address conflict. If you're running Transmission Control Protocol/Internet Protocol (TCP/IP), remember that IP addresses must be unique across an entire network. So, if two of your segments have the same static IP subnet addresses assigned, you'll end up with duplicate IP errors.
If all of your network's users are experiencing the problem, it could be a server everyone accesses.
WAN Issue? WAN devices have built-in diagnostics that tell you whether a WAN link is working okay, which really helps you determine if the failure has something to do with the WAN link itself or with the hardware involved instead.
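The duplicate-subnet check described above can be sketched with Python's standard ipaddress module. This is only an illustration: the segment names and address ranges are hypothetical, and a real check would read them from your own addressing plan.

```python
import ipaddress
from itertools import combinations


def find_overlapping_subnets(segments):
    """Return pairs of segment names whose subnets overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in segments.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical addressing plan: "lab" was given the same subnet as "floor1".
segments = {
    "floor1": "192.168.10.0/24",
    "floor2": "192.168.20.0/24",
    "lab": "192.168.10.0/24",
}
conflicts = find_overlapping_subnets(segments)
print(conflicts)
```

Any pair it reports is a candidate for the duplicate IP errors mentioned above.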

Is It Bad Cabling? Are the cables properly connected to the correct port? Once you've figured out whether your plight is related to one workstation, a network segment, or the whole tamale (network), you must then examine the relevant cabling. There are a number of potential cabling problems to look out for.
Incorrect Pinout/TX/RX Reverse/Damaged Cable?
Make sure neither end of the cable is damaged, and make sure the wiring inside the cable itself is correct by checking the physical pinouts.
Bad Port
In some cases, the issue is not the cable but the port into which the cable is connected.
Ports have LEDs that can alert you to a bad port.
No light whatsoever indicates an issue with the port.
Loopback plugs can be used to test the functionality of a port.
Transceiver Mismatch
Duplex and speed must be the same when connecting two transceivers.
Mismatched speeds mean no communication.
Mismatched duplex settings mean poor performance.
Crosstalk
Crosstalk is what happens when there's signal bleed between two adjacent wires that are carrying a current.
Minimize crosstalk inside network cables by twisting the wire pairs together.
The tighter the twist, the less crosstalk you have.
Crosstalk can also be caused by using the wrong category of cable.
Near-End/Far-End Crosstalk
Far-End Crosstalk is often caused by improperly terminating a cable.
It's important to maintain the twist right up to the punch-down or crimp connector
Attenuation/dB Loss/Distance Limitation
Attenuation is the loss of signal strength as a signal travels farther through a medium; the medium itself degrades the signal.
Copper twisted-pair Ethernet cables have a maximum segment distance of 100 meters before the signal needs to be amplified, or repeated, by a hub or a switch.
Fiber cables can carry signals for miles before degradation occurs.
Latency
A low-latency network connection is one that generally experiences short delay times, while a high-latency connection generally suffers from long delays
Routers take a certain amount of time to process and forward any communication.
Configuring additional rules on a router generally increases latency, thereby resulting in longer delays. An organization may decide not to deploy certain security solutions because of the negative effects they will have on network latency.
Jitter
Jitter results from network congestion, timing drift, and route changes.
Jitter is especially problematic in real-time communications like IP telephony and videoconferencing.
Collision
A network collision happens when two devices try to communicate on the same physical segment at the same time.
Switches separate the network into different collision domains
Shorts
A short happens when the current flows through a different path within a circuit than it's supposed to. Shorts are typically caused by some kind of physical cable fault.
Replace the cable.
Open Impedance Mismatch
An impedance mismatch in a circuit or along a network cable will produce a reflection back to the source of the signal.
This causes the link to fail
Interference/Cable Placement
Electromagnetic interference (EMI) and radio frequency interference (RFI) occur when outside signals interfere with the normal operation of electronic circuits. Common sources include TVs, radio transmitters, two-way walkie-talkies, and some cell phones.
To avoid it, use STP, coaxial, or fiber-optic cable (which is immune to this interference) throughout your network.
Split Pairs
A split pair is a wiring error where two connections that are supposed to be connected using the two wires of a twisted pair are instead connected using two wires from different pairs.
Buying premade cables largely avoids this problem.
Bent Pins
Check the pins at each end of the cable and make sure they aren't bent. If they are, the cable will either not work at all or not work correctly.
If these pins get bent, either they won't go into the correct hole or they won't go into a hole at all.
Bottlenecks
Bottlenecks are areas of the network where the physical infrastructure is not capable of handling the traffic. The obvious symptom of a bottleneck is poor performance.
This can be temporary due to an unusual burst of high traffic or it could be a wakeup call to upgrade outdated infrastructure.

Fiber Cable Issues
Wavelength Mismatch
Wavelength mismatch occurs when the fiber transmitters at the two ends of the cable use different wavelengths, one longer or shorter than the other.
Make sure your transmitters match on both ends of the cable.
Fiber Type Mismatch
Fiber type mismatches, at each of the transceivers, can cause wavelength issues, massive attenuation, and dB loss.
Dirty Connectors
Inspect your connectors to make sure no dirt or dust has contaminated the cable end.
If it has, polish the cable ends with a soft cloth.
Connector Mismatch
Just because it fits doesn't mean it works.
Be sure to have the right connector for each cable end or transceiver
Bend Radius Limitations
Fiber is made of glass or plastic and can break. You need to make sure you understand the bend radius limitations of each type of fiber you purchase.
This is crucial information to have so you don't overbend the expensive fiber during installation.

Unbounded Media Issues (Wireless)
Interference
Wi-Fi is very vulnerable to radio interference from Bluetooth keyboards, mice, or cell phones that are all close in frequency ranges. Microwave Ovens can even cause interference.
These devices can cause signal bleed that can slow down or prevent wireless communications.
Distance between the computer and the AP, plus any solid objects in between the two, weakens the signal and can cause similar symptoms.
Device Saturation/Bandwidth Saturation
Too many devices connected to an AP can cause the bandwidth to be exhausted.
Simultaneous Wired/Wireless Connections
Remind users to turn off wireless on their laptop when they take it into their office and connect it to their dock.
When a wired and a wireless connection operate simultaneously and each provides a DNS server with a different address, it can cause name resolution issues, or even default gateway issues.
Incorrect Configurations
Mistakes in the configuration of the wireless access point or wireless router or inconsistencies between the settings on the AP and the stations can also be the source of problems
Incorrect Encryption/Security Type Mismatch
Make sure the AP and its clients are configured with the same type of encryption.
When troubleshooting client problems, temporarily disabling security is a quick test: if the client can connect once you've done that, you know you're dealing with a security configuration error.
Incorrect, Overlapping, or Mismatched Channels
Overlapping channels cause your signal-to-noise ratio to drop because you'll get a ton of interference and signal loss!
Verify the correct channel settings
Incorrect Frequency/Incompatibilities
If you have multiple APs and they're in close proximity, you need to make sure they're on different channels/frequencies to avoid potential interference problems.
But keep in mind you've got to configure the same Frequency settings on all the devices that you want to communicate.
SSID Mismatch
If a user reports that they're connected to an AP but still can't access the resources they need or authenticate to the network, you should verify that they are, in fact, connected to the correct SSID and not a neighboring one.
Wireless Standard Mismatch
Some of the wi-fi standards are backward compatible and others aren't.
Make sure the standards on the AP match the standards on the client, or that they're at least backward compatible.
Be sure to understand the throughput, frequency, distance capabilities, and available channels for each standard you use.
Untested Updates
It's really important to push updates to the APs in your wireless network, but not before you test them.
Thoroughly test updates on your bench before pushing them to your live network.
Distance/Signal Strength/Power Levels
If your AP doesn't seem to have enough power to provide a connectivity point for your clients, you can move it closer to them, increase the distance that the AP can transmit by changing the type of antenna it uses, or use multiple APs connected to the same switch or set of switches to solve the problem.
Latency and Overcapacity
When wireless users complain that the network is slow (latency) or that they are losing their connection to applications during a session, it is usually a capacity or distance issue.
802.11 is a shared medium, and as more users connect, all user throughput goes down.
Place another AP close by and place the second AP on a different non-overlapping channel from the first and make sure the second AP uses the same SSID as the first.
The traffic can be better divided and users will get better performance.
Bounce
Repeaters and reflectors can be used to bounce a signal and boost it to cover about a mile, but if you don't tightly control signal bounce, you could end up with a much bigger network than you wanted.
To determine exactly how far and wide the signal will bounce, make sure you conduct a thorough wireless site survey.
Reflection
Reflection can be the cause of serious performance problems in a WLAN. As a wave radiates from an antenna, it broadens and disperses, and parts of it reflect off surfaces so that copies of the signal reach the receiver by multiple paths (multipath).
Multipath can degrade the strength and quality of the received signal or even cause data corruption or canceled signals. APs mitigate this behavior by using multiple antennas and constantly sampling the signal to avoid a degraded signal.
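The channel-overlap check from the wireless list above can be sketched as follows. It assumes 2.4 GHz channels 1 through 13 and the common rule of thumb that channels need to be at least five numbers apart (for example 1, 6, and 11) to avoid overlapping. The AP names are hypothetical, and 5 GHz channels follow different rules that this sketch does not cover.

```python
def channels_overlap(channel_a, channel_b):
    """Rule of thumb for 2.4 GHz: channels fewer than 5 apart overlap."""
    for channel in (channel_a, channel_b):
        if not 1 <= channel <= 13:
            raise ValueError(f"unsupported 2.4 GHz channel: {channel}")
    return abs(channel_a - channel_b) < 5


def overlapping_aps(ap_channels):
    """Return pairs of AP names whose channels are the same or overlap."""
    names = sorted(ap_channels)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if channels_overlap(ap_channels[a], ap_channels[b])
    ]


# Hypothetical APs within range of each other.
ap_channels = {"ap-lobby": 1, "ap-office": 3, "ap-lab": 11}
problem_pairs = overlapping_aps(ap_channels)
print(problem_pairs)
```

In this made-up layout, moving ap-office to channel 6 would clear the reported pair.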

Can the Problem Be Reproduced?
The first question to ask anyone who reports a network or computer problem is, “Can you show me what ‘not working’ looks like?” This is because if you can reproduce the problem, you can identify when it happens, which may give you all the information you need to determine the source of the problem and maybe even solve it in a snap. The hardest problems to solve are those of the random variety that occur intermittently and can't be easily reproduced.


There are 7 steps in the network troubleshooting process:

Identify the Problem
Establish a theory of probable cause
Test the theory to determine the cause
Establish a plan of action to resolve the problem and identify potential effects
Implement the solution or escalate as necessary
Verify full system functionality and, if applicable, implement preventative measures.
Document findings, actions, outcomes, and lessons learned.

Step 1: Identify the Problem

Before you can solve the problem, you've got to figure out what it is. Asking the right questions can get you far. Information Gathering is very important.
Determine If Anything Has Changed
Were you ever able to do this?
If not, then maybe it just isn't something the hardware or software is designed to do, or maybe the user doesn’t have the required permissions to do so.
If so, when did you become unable to do it?
If the computer was able to do the job and then suddenly could not, whatever conditions surrounded and were involved in this turn of events become extremely important.
There's a high level of probability that the cause of the problem is directly related to the conditions surrounding any change when it occurred.
Has anything changed since the last time you could do this?
The thing that changed right before the problem began happening is almost always what caused it.
“Did anyone add anything to your computer?” or “Are you doing anything differently from the way you usually do it?”
Were any error messages displayed?
Error messages are designed by programmers for the purpose of pointing them to exactly what it is that isn't working properly in computer systems.
Go to the software or hardware vendor's website for an explanation of the error code.
Are other people experiencing this problem?
You have to narrow the issue down. This is a great question for finding out whether just one user or an entire workgroup is being affected.
Being inundated with calls from a bunch of people in the same workgroup is a solid hint that the issue is affecting an entire group.
Is the problem always the same?
A good question to ask is, “If you do x, does the problem get better or worse?” For example, ask a user, “If you use a different file, does the problem get better or worse?” If the symptoms lessen, it's an indication that the problem is related to the original file that's being used.
Always Approach Problems Individually!
You should never mix possible solutions when troubleshooting.
Changing multiple things at one time makes it harder to see which specific change resolved the issue.
If a change does not have a beneficial effect, reverse the change before making another change.
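The one-change-at-a-time rule above can be sketched as a small loop. Everything here is hypothetical: the settings dictionary, the candidate changes, and the problem_fixed check stand in for whatever you would actually change and test by hand.

```python
def try_changes_one_at_a_time(changes, problem_fixed):
    """Apply each candidate change alone; revert it if it does not help."""
    for name, apply_change, revert_change in changes:
        apply_change()
        if problem_fixed():
            return name  # this single change resolved the issue
        revert_change()  # undo before trying the next candidate
    return None


# Hypothetical workstation settings; the real fault is the DNS server address.
settings = {"gateway": "10.0.0.1", "dns": "10.0.0.99"}


def problem_fixed():
    return settings["dns"] == "10.0.0.53"


changes = [
    ("change gateway",
     lambda: settings.update(gateway="10.0.0.254"),
     lambda: settings.update(gateway="10.0.0.1")),
    ("correct DNS server",
     lambda: settings.update(dns="10.0.0.53"),
     lambda: settings.update(dns="10.0.0.99")),
]
fix = try_changes_one_at_a_time(changes, problem_fixed)
print(fix, settings)
```

Because the gateway change is reverted before the DNS change is tried, the final state differs from the starting state only by the change that actually fixed the problem.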

Step 2: Establish a Theory of Probable Cause

After you observe the problem and identify the symptoms, next on the list is to establish its most probable cause.
Below is a list of logical issues that could be a Probable Cause:
Port Speed If you decide to set the port speed manually, make positively sure to set the same speed on both sides of a link. As long as the switches are allowed to autosense the port speed, it's rare to have a problem develop that results in a complete lack of communication
Always make sure the NICs on both sides of the connection (Host Device ↔ Switch) are configured with the same Port Speed to avoid any communication issues.

Port Duplex Mismatch There are 3 Duplex settings on each port of a network switch: full, half, and auto. In order for two devices to connect effectively, the duplex setting has to match on both sides of the connection.
Mismatches occur when both sides have different Duplex settings.
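Even with both sides set to auto, different devices can end up mismatched if negotiation fails and each side falls back to a different default. The more common case is one side set to auto and the other hard-coded to full, where the auto side cannot negotiate duplex and typically falls back to half.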
Duplex mismatches can cause lots of network and interface errors, and even the lack of a network connection.
The settings you choose are based on the type of devices you have populating your network.
If your network is all switches, make sure the port duplex settings match on both sides of each connection.
If you have any Hubs, you need to make sure the appropriate ports are set to Half-Duplex.
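The port speed and duplex rules above can be sketched as a simple comparison of the settings each side of a link ends up using. The PortSettings type and the sample values are hypothetical; in practice you would read the operational speed and duplex from the switch port and the host NIC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortSettings:
    speed_mbps: int
    duplex: str  # "full" or "half"


def diagnose_link(side_a, side_b):
    """Compare the operational settings on both ends of one link."""
    if side_a.speed_mbps != side_b.speed_mbps:
        return "speed mismatch: expect no communication"
    if side_a.duplex != side_b.duplex:
        return "duplex mismatch: expect interface errors and poor performance"
    return "settings match"


# Hypothetical link: the switch port is hard-coded, the host fell back to half.
switch_port = PortSettings(speed_mbps=100, duplex="full")
host_nic = PortSettings(speed_mbps=100, duplex="half")
result = diagnose_link(switch_port, host_nic)
print(result)
```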

Jabbering NIC An error in which a faulty device (usually a NIC) continuously transmits corrupted or meaningless data onto a network. This may halt the entire network from transmitting data because other devices will perceive the network as busy due to all the “jabbering”.

Mismatched MTU Make sure the maximum transmission unit (MTU) matches on both sides of a router link as well.
Failure to do so results in communication problems and the link failing to pass traffic.

Wrong VLAN If a port is accidentally assigned to the wrong VLAN in a switch, it can cause the wrong security policy and permission levels to be assigned to a host device, giving it either too much authority or not enough. Either way, it's going to have the wrong permissions.
Incorrect IP Address/Duplicate IP Address An incorrect or duplicate IP address on a client will keep that client from being able to communicate and may even cause a conflict with another client on the network, and a bad address on a server or router interface can be disastrous and affect a multitude of users.
This is exactly why you need to be super careful to set up DHCP servers correctly and also when configuring the static IP addresses assigned to servers and router interfaces.
Wrong Gateway Because every device needs a valid gateway to communicate outside its own network, accidentally configuring the incorrect gateway means the packets won't be routed to any outside networks at all.
Always verify the router's address before statically configuring it.
Wrong DNS DNS addresses are usually configured automatically by a DHCP server, but sometimes these addresses are statically configured instead. Because lots of applications rely on hostname resolution, a botched static DNS configuration usually causes a computer's network applications to fail.
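For the Wrong Gateway item above, one quick sanity check on a static configuration is whether the gateway address actually falls inside the host's own subnet. The sketch below uses Python's standard ipaddress module; the function name and the sample addresses are assumptions for illustration. It cannot tell you that the address belongs to the right router, only that the host could reach it directly.

```python
import ipaddress


def gateway_on_local_subnet(host_ip, subnet_mask, gateway_ip):
    """True if the gateway lies in the host's subnet and is not the host itself."""
    interface = ipaddress.ip_interface(f"{host_ip}/{subnet_mask}")
    gateway = ipaddress.ip_address(gateway_ip)
    return gateway in interface.network and gateway != interface.ip


# Hypothetical static settings entered on a workstation.
good = gateway_on_local_subnet("192.168.1.50", "255.255.255.0", "192.168.1.1")
bad_gateway = gateway_on_local_subnet("192.168.1.50", "255.255.255.0", "192.168.2.1")
bad_mask = gateway_on_local_subnet("192.168.1.50", "255.255.255.128", "192.168.1.129")
print(good, bad_gateway, bad_mask)
```

The third case shows how a wrong subnet mask can make an otherwise correct gateway unreachable.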

Wrong Subnet Mask A subnet mask is generally configured by the DHCP server; if you're going to statically enter it, make sure the subnet mask is right or you'll end up dealing with the fallout caused by the entire address's misconfiguration.
Incorrect Interface/Interface Misconfiguration If a host is plugged into a misconfigured switch port, or if it's plugged into the wrong switch port that's configured for the wrong VLAN, the host won't function correctly.
Make sure the speed and duplex are set correctly and that the correct Ethernet cable is used.
Get these wrong and you'll get interface errors on the host and switch port or, worse, things just won't work at all!

Duplicate MAC Addresses This should rarely happen, because MAC addresses are meant to be unique.
When it does, it can indicate a MAC spoofing attack.
Expired IP Address When DHCP is used to allocate IP configurations to devices, the configuration is supplied to the DHCP client on a temporary basis. The lease period will eventually expire unless the IP address is reserved.

Rogue DHCP Server Unsuspecting hosts may accept DHCP Offer packets from the illegitimate DHCP server rather than the legitimate DHCP server.
The rogue DHCP server can also hand out the address of a rogue DNS server to send the victim to a phony website.
Untrusted SSL Certificate An untrusted SSL certificate error message can appear for several reasons:
1st Reason- “The security certificate presented by this website was not issued by a trusted certificate authority” means the CA that issued the certificate is not trusted by the local machine. This will occur if the certificate of the CA that issued the certificate is not found in the Trusted Root Certification Authorities folder on the local machine.
2nd Reason- The certificate may have been presented before its validity period begins, or it may have expired, meaning the validity period is over.
3rd Reason-

Incorrect Time In a Windows environment using Active Directory, a clock skew of more than 5 minutes between a client and server will prevent communication between the two.
When certificates are in use, proper time synchronization is critical for successful operation.
When system logs are sent to a central server such as a syslog server, proper time synchronization is critical to understand the order of events.
Exhausted DHCP Scope When the IP addresses in a DHCP scope are exhausted, any new DHCP clients will be unable to obtain an IP address and will be unable to function on the network.
A backup DHCP Server can be set up to swap out with an exhausted DHCP server, but make sure no duplicate IP addresses are assigned.
Blocked TCP/UDP Ports It will be impossible to make use of a service or application if the network or personal firewall is blocking the ports it uses.
One easy way to verify the open ports on a device is to execute the netstat command.
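Alongside netstat, which shows ports from the device's own point of view, a connection attempt from another machine shows whether a TCP port is actually reachable. The sketch below uses Python's standard socket module; the function name is hypothetical, and the demo listens on a throwaway loopback port only so the example is self-contained. A False result does not say whether a firewall blocked the port or nothing is listening, and UDP ports cannot be checked this way.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Try a TCP connection; True if it succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


# Demo against a temporary listener on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
open_while_listening = tcp_port_open("127.0.0.1", demo_port)
listener.close()
print(open_while_listening)
```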

Incorrect Host-Based Firewall Settings Incorrect host-based firewall settings can either prevent transmissions or allow unwanted communications. Neither of these outcomes is desirable.
The best way to ensure that firewall settings are consistent and correct all the time is to control these settings with a group policy.

Incorrect ACL Settings Access control lists are used to control which traffic types can enter and exit ports on the router.
Many devices can be affected by incorrectly configured ACL settings.
ACLs should only be configured by those who have been trained in their syntax and in the logic ACLs use in their operation.

Unresponsive Service Many services depend on other services for their operation. The failure of one service often causes a domino effect on other services.
You can use the Services applet in Control Panel to identify these dependencies as well as start and stop services. To identify the services upon which a particular service depends, use the Dependencies tab on the Services applet.


Multicast Flooding Multicasting is used for network devices to communicate with each other and to save network capacity by having only one sender but many listeners.
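Flooding happens when a switch forwards multicast traffic out every port instead of only to the listeners, which can cause network congestion because every port is flooded with network packets.
Modern switches and routers are designed to lessen the impact of multicast flooding.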

Asymmetrical Routing Asymmetrical routing is when a session takes different paths through a network. Done correctly, a routed network will have only one path for both send and receive traffic from a client to a server and vice versa.
Conditions can exist where a router sends traffic out and it comes back using another path. This can be due to running multiple routing protocols inside your network or to your ISP returning traffic via a different path than the one you sent it out on.
The tracert command can help narrow this down.

Low Optical Link Budget When you're troubleshooting fiber-optic links, a test set should be used to make sure the received light level is not too low to be detected.
Too much loss over a fiber link can keep the link from being established. Common causes are too many interconnects (each one adds loss), a distance greater than the standard allows, and dirty connections.
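The link budget idea above comes down to simple arithmetic: the budget is the transmit power minus the receiver sensitivity, and the link only works while the total loss stays under that budget. The sketch below shows the calculation. All of the numbers are illustrative assumptions, not values from any particular transceiver or fiber datasheet.

```python
def link_margin_db(tx_power_dbm, rx_sensitivity_dbm, length_km,
                   fiber_loss_db_per_km, connectors, loss_per_connector_db,
                   splices=0, loss_per_splice_db=0.1):
    """Return the remaining margin in dB; a negative value means too much loss."""
    budget = tx_power_dbm - rx_sensitivity_dbm
    total_loss = (length_km * fiber_loss_db_per_km
                  + connectors * loss_per_connector_db
                  + splices * loss_per_splice_db)
    return budget - total_loss


# Illustrative values only: 17 dB budget, 10 km run, 4 connectors, 2 splices.
margin = link_margin_db(
    tx_power_dbm=-3.0,
    rx_sensitivity_dbm=-20.0,
    length_km=10,
    fiber_loss_db_per_km=0.4,
    connectors=4,
    loss_per_connector_db=0.75,
    splices=2,
)
print(round(margin, 2))
```

With these assumed figures the losses add up to 7.2 dB against a 17 dB budget, leaving a margin of about 9.8 dB. Adding interconnects or distance eats into that margin, which is how the link ends up below the level the receiver can detect.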

Network Time Protocol Issues If communications are lost or NTP was never configured to begin with, time stamps for logging, application synchronizations, and licenses based on dates can all cause major headaches.
Make sure all of your devices get their date and time from the NTP servers.

Licensed Features An application that is missing areas in its configuration may simply never have had the feature enabled in the first place.
Always take the time in your troubleshooting to investigate if the feature requires a license and if the license is active.
Network Performance Issues It is important to isolate the problem to see if it is local to a computer or more widespread.
This can be a single switch or a whole building.
By isolating it to network segments, you can focus your troubleshooting to determine the cause of the problem.

Consider Multiple Approaches
Top-to-Bottom Approach
Top-Down approach means you start with the top layer (user application) and work your way down the layers of the OSI model.
If a layer is not working, you inspect the layer below it.
When you know that the current layer is not in working condition and you discover that a lower layer works, you can conclude that the problem is within the non-working current layer.
The key is to find the lowest layer causing issues, because the lowest non-working layer is the root cause of any malfunction in the layers above it. If the foundational lower layers are shaky, nothing built on top of them will work properly.
Bottom-Up Approach
Troubleshooting starts with the physical components of the network and works up the layers of the OSI model.
Start with Layer 1 (physical layer) and keep moving up through the layers until you find the layer that's problematic.
This can be time consuming because you're checking each layer from the bottom up.
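The bottom-up walk described above can be sketched as an ordered list of checks that stops at the first layer that fails. The check functions here are stand-in stubs; in practice each one would be a real test such as looking at link lights, pinging the gateway, or opening the application.

```python
def first_failing_layer(checks):
    """checks: (layer_name, check_function) pairs ordered from Layer 1 upward."""
    for layer_name, check in checks:
        if not check():
            return layer_name  # lowest layer that fails is the place to start
    return None  # every layer passed


# Hypothetical results: cabling and switching are fine, IP addressing is not.
checks = [
    ("Layer 1 (Physical)", lambda: True),
    ("Layer 2 (Data Link)", lambda: True),
    ("Layer 3 (Network)", lambda: False),
    ("Layer 4 (Transport)", lambda: True),
]
culprit = first_failing_layer(checks)
print(culprit)
```

Reversing the list and looking for the first layer that passes gives the top-down variant.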
Divide and Conquer