Mellanox MSN2100 Switch fan tolerance


Posted in categories: Computer Tips, Work related

My Mellanox MSN2100 switch has had its “system status” LED lit red ever since I bought it.

Checking the system health in detail shows why:

show system-health detail

System status summary

  System status LED  red
  Services:
    Status: OK
  Hardware:
    Status: Not OK
    Reasons: Failed to get speed tolerance for fan4
             Failed to get speed tolerance for fan3
             Failed to get speed tolerance for fan2
             Failed to get speed tolerance for fan1

System services and devices monitor list

Name                   Status  Type
---------------------  ------  ----------
sonic                  OK      System
rsyslog                OK      Process
root-overlay           OK      Filesystem
var-log                OK      Filesystem
routeCheck             OK      Program
dualtorNeighborCheck   OK      Program
diskCheck              OK      Program
container_checker      OK      Program
vnetRouteCheck         OK      Program
memory_check           OK      Program
container_memory_snmp  OK      Program
container_memory_gnmi  OK      Program
container_eventd       OK      Program
database:redis         OK      Process
syncd:syncd            OK      Process
bgp:zebra              OK      Process
bgp:staticd            OK      Process
bgp:bgpd               OK      Process
bgp:fpmsyncd           OK      Process
bgp:bgpcfgd            OK      Process
teamd:teammgrd         OK      Process
teamd:teamsyncd        OK      Process
teamd:tlm_teamd        OK      Process
swss:orchagent         OK      Process
swss:portsyncd         OK      Process
swss:neighsyncd        OK      Process
swss:fdbsyncd          OK      Process
swss:vlanmgrd          OK      Process
swss:intfmgrd          OK      Process
swss:portmgrd          OK      Process
swss:buffermgrd        OK      Process
swss:vrfmgrd           OK      Process
swss:nbrmgrd           OK      Process
swss:vxlanmgrd         OK      Process
swss:coppmgrd          OK      Process
swss:tunnelmgrd        OK      Process
eventd:eventd          OK      Process
snmp:snmpd             OK      Process
snmp:snmp-subagent     OK      Process
lldp:lldpd             OK      Process
lldp:lldp-syncd        OK      Process
lldp:lldpmgrd          OK      Process
gnmi:gnmi-native       OK      Process
fan1                   Not OK  Fan
fan2                   Not OK  Fan
fan3                   Not OK  Fan
fan4                   Not OK  Fan

System services and devices ignore list

Name             Status   Type
---------------  -------  ------
psu.voltage      Ignored  Device
psu.temperature  Ignored  Device

The output shows that every service is fine, but the health checker cannot obtain the speed tolerance for any of the four fans from the database.

Digging through the system files, I found that the platform code under /usr/local/lib/python3.9/dist-packages/sonic_platform/ hard-codes the fan speed tolerance to 50%. That value is never passed on to the health checker under /usr/local/lib/python3.9/dist-packages/health_checker/, which looks it up with data_dict.get('speed_tolerance', None) and therefore comes back empty-handed.
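To illustrate why the missing value turns into a "Not OK" status, here is a minimal sketch of that kind of lookup. It is not the actual health_checker code: the function name check_fan_tolerance and the sample record are made up for illustration, and only the data_dict.get('speed_tolerance', None) call and the error text come from the switch itself.

```python
from typing import Optional


def check_fan_tolerance(name: str, data_dict: dict) -> Optional[str]:
    """Return an error message if the tolerance is missing, else None.

    Hypothetical helper; it only mimics the lookup quoted in this post.
    """
    # A key that is absent from the dictionary yields the default, None.
    speed_tolerance = data_dict.get('speed_tolerance', None)
    if speed_tolerance is None:
        return 'Failed to get speed tolerance for {}'.format(name)
    return None


# Hypothetical fan record that lacks the 'speed_tolerance' field
fan_record = {'speed': '60', 'speed_target': '60'}
print(check_fan_tolerance('fan1', fan_record))
```

With the field absent, the helper reports the same message that appears under "Reasons" in the health summary.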

There is a simple fix for this: comment out line 105 of the health_checker file, the one that does the data_dict.get lookup, and replace it with a hard-coded value:

speed_tolerance = 50

And the system status LED turns green.
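A slightly gentler variant of the same workaround would fall back to 50 only when the database has nothing usable, so a real value would still win if the platform ever starts publishing one. This is only a sketch under assumptions: get_speed_tolerance and DEFAULT_SPEED_TOLERANCE are names I made up, and treating a non-numeric string such as 'N/A' as "missing" is my own guess about what might be stored, not something confirmed on the switch.

```python
# Percent; the same value the post hard-sets. The name is hypothetical.
DEFAULT_SPEED_TOLERANCE = 50


def get_speed_tolerance(data_dict, default=DEFAULT_SPEED_TOLERANCE):
    """Return the tolerance from data_dict as a float, or the default.

    Hypothetical helper, not part of the health_checker package.
    """
    raw = data_dict.get('speed_tolerance', None)
    try:
        # float(None) raises TypeError, float('N/A') raises ValueError
        return float(raw)
    except (TypeError, ValueError):
        return float(default)


print(get_speed_tolerance({}))                         # falls back to the default
print(get_speed_tolerance({'speed_tolerance': '30'}))  # uses the stored value
```

The trade-off is a few more lines in a file that is already being patched by hand; the one-line hard-coded assignment is simpler if the value is never going to show up in the database anyway.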
