Includes below commits
9a88cb6 2021-05-06 | [sonic_installer] dont fail package migration (#1591) [Stepan Blyshchak]
615e531 2021-05-05 | [show][config] Add new snmp commands (#1347) [Travis Van Duyn]
fff4051 2021-05-05 | Fixing serial number read to get from DB if it is populated (#1580) [Sudharsan Dhamal Gopalarathnam]
be974bf 2021-05-05 | [neighbor_advertiser] Use existing tunnel if present for creating tunnel mappings (#1589) [Sumukha Tumkur Vani]
9492eab 2021-05-04 | Use swsscommon instead of swsssdk (#1510) [Andriy Yurkiv]
0f4988b 2021-05-04 | Add pg-drop script to sonic filesystem (#1583) [Andriy Yurkiv]
cbe2159 2021-05-04 | [vnet] Add "vnet_route_check" script (#1300) [Volodymyr Samotiy]
9120766 2021-05-03 | Relax the install_requires, no need to exact version as long as there are no broken changes with future versions (#1530) [Qi Luo]
2e09b22 2021-05-03 | Handle the new db version which mellanox_buffer_migrator isn't interested (#1566) [Stephen Sun]
fuser support is required since new cisco hardware watchdog plugin uses them to check anyone else use's /dev/watchdogX resource. The actual validation happens in the platform code, but the package is required for pmon container. Currently the /dev/watchdogX is being used by cisco platform-monitor service. Cisco chassis level watchdog plugin uses "fuser" to claim the watchdog release from platform-monitor service.
Fix following crash in `show version`:
```
Traceback (most recent call last):
File "/usr/local/bin/decode-syseeprom", line 32, in instantiate_eeprom_object
eeprom = sonic_platform.platform.Platform().get_chassis().get_eeprom()
AttributeError: module 'sonic_platform' has no attribute 'platform'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/decode-syseeprom", line 262, in <module>
sys.exit(main())
File "/usr/local/bin/decode-syseeprom", line 244, in main
print_serial(use_db)
File "/usr/local/bin/decode-syseeprom", line 169, in print_serial
eeprom = instantiate_eeprom_object()
File "/usr/local/bin/decode-syseeprom", line 34, in instantiate_eeprom_object
log.log_error('Failed to obtain EEPROM object due to {}'.format(repr(e)))
NameError: name 'log' is not defined
```
Signed-off-by: pettershao-ragilenetworks <pettershao@ragilenetworks.com>
9a88cb6 [sonic_installer] dont fail package migration (#1591)
615e531 [show][config] Add new snmp commands (#1347)
fff4051 Fixing serial number read to get from DB if it is populated (#1580)
be974bf [neighbor_advertiser] Use existing tunnel if present for creating tunnel mappings (#1589)
9492eab Use swsscommon instead of swsssdk (#1510)
0f4988b Add pg-drop script to sonic filesystem (#1583)
cbe2159 [vnet] Add "vnet_route_check" script (#1300)
9120766 Relax the install_requires, no need to exact version as long as there are no broken changes with future versions (#1530)
2e09b22 Handle the new db version which mellanox_buffer_migrator isn't interested (#1566)
Platform library changes
- Fix the use of /proc/modules during testing, fixes#7463
- Add `libsfp-eeprom.so` build to read/write xcvr eeproms in C
- Add some more reboot-cause information
- Write down temperature hw thresholds to the sensors
- Report software thresholds through platform api
- Writ `port_name sysfs` file of optoe`
- Tests enhancements
- Fix dependency issues for chassis provisioning
Platform configuration changes
- Add `pcie.yaml` configuration for a few platforms
- Mount `libsfp-eeprom.so` inside `pmon`
- Fix `Arista-7050SX3-48C8` and `Arista-7050SX3-48YC8' platform and hwsku
- Miscellaneous fixes
Co-authored-by: Boyang Yu <byu@arista.com>
Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
https://github.com/mbj4668/pyang/blob/master/pyang/repository.py#L93 throws an exception with pip 21.1
add ietf yang model explicitly to the build process fix the test failure.
tests/test_sonic_yang_models.py .F [ 66%]
tests/yang_model_tests/test_yang_model.py . [100%]
Failed: pyang -f tree ./yang-models/*.yang > ./yang-models/sonic_yang_tree
----------------------------- Captured stderr call -----------------------------
./yang-models/sonic-acl.yang:8: error: module "ietf-inet-types" not found in search path
./yang-models/sonic-device_metadata.yang:8: error: module "ietf-yang-types" not found in search path
Signed-off-by: Guohan Lu <lguohan@gmail.com>
Originally, SFP modules were always accessed from platform daemons, and arbitrary SFP modules can be accessed in the daemon. So all SFP modules were initialized in one shot once one of the following chassis APIs called
- get_all_sfps
- get_sfp_numbers
- get_sfp
Recently, we noticed that SFP modules can also be accessed from CLI, eg. the latest refactor of `sfputil`.
In this case, only one SFP module is accessed in the chassis object's life cycle.
To initialize all SFP modules in one shot is waste of time and causes the CLI to take much more time to finish.
So we would like to optimize the initialization flow by introducing a two-phase initialization approach:
- Partial initialization, which means the `chassis._sfp_list` has been initialized with proper length and all elements being `None`
- Full initialization, which means all elements in `chassis._sfp_list` are created
If the relevant function is called,
- `get_sfp`, only partial initialization will be done, and then the specific SFP module is initialized.
- `get_all_sfps` or `get_num_sfps`, full initialization will be done, which means all SFP modules are initialized.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
#### Why I did it
MSN4700 A1/A0 used different sensor chip but keep the existing platform name *x86_64-mlnx_msn4700-r0*, this is a workaround to replace the sensor conf on MSN4700 A1/A0
#### How I did it
Use a shell script to get the sensor conf path and copy that files to /etc/sensors.d/sensors.conf
#### Why I did it
400G media EEPROM and DOM information are not populated properly in DellEMC Z9332f platform.
#### How I did it
Handled QSFP_DD, QSFP28/QSFP+, SFP+ accordingly based on media type detected.
- Why I did it
Updated FW to xx.2008.2526 version.
Fixed issues:
1. Spectrum-2, Spectrum-3 | sFlow | High CPU load and high on fully loaded switch.
2. Spectrum-2, Spectrum-3 | Fine grain LAG | in rare cases doesn’t update the right entry
- How I did it
Updated submodule pointer and version in a Makefile.
- How to verify it
Full regression and bugs validation
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
#### Why I did it
We want to add the ability for the command `show platform psustatus` to show the serial number and part number of the PSU devices on Mellanox platforms. This will be useful for data-center management of field replaceable units (FRUs) on switches.
#### How I did it
I implemented the platform 2.0 functions `get_model()` and `get_serial()` for the PSU in the mellanox platform API by referencing the sysfs nodes provided by the [hw-management](https://github.com/Azure/sonic-buildimage/tree/master/platform/mellanox/hw-management) module.
- Why I did it
Enable VXLAN src port range configuration via SAI profile for Mellanox-SN3800-D28C49S1 SKU
- How I did it
Added SAI_VXLAN_SRCPORT_RANGE_ENABLE=1 configuration to appropriate sai.profile
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
#### Why I did it
Adopt a single way to get fan direction for all ASIC types.
It depends on hw-mgmt V.7.0010.2000.2303. Depends on https://github.com/Azure/sonic-buildimage/pull/7419
#### How I did it
Originally, the get_direction was implemented by fetching and parsing `/var/run/hw-management/system/fan_dir` on the Spectrum-2 and the Spectrum-3 systems. It isn't supported on the Spectrum system.
Now, it is implemented by fetching `/var/run/hw-management/thermal/fanX_dir` for all the platforms.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
#### Why I did it
Current platform.json lacks some peripheral device related facts, like chassis/fan/pasu/drawer/thermal/components names, numbers, etc.
#### How I did it
Add platform device facts to the platform.json file
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Removed the old function for detecting a faulty fan.
- Removed the old function for detecting excess temperature.
- Implement thermal_manager APIs based on ThermalManagerBase
- Implement thermal_conditions APIs based on ThermalPolicyConditionBase
- Implement thermal_actions APIs based on ThermalPolicyActionBase
- Implement thermal_info APIs based on ThermalPolicyInfoBase
- Add thermal_policy.json
Add 3 pipeline files:
- pipeline for build docker image sonic-slave-[buster|jessie|stretch] for amd64/armhf/arm64, and push to ACR(sonicdev-microsoft.com)
- pipeline for build docker image sonic-mgmt, and push to ACR
- pipeline for cleaning dpkg cache which are created more than 30 days.
Co-authored-by: lguohan <lguohan@gmail.com>
Why I did it
ndppd by default reads /proc/net/ipv6_route ever 30 seconds. Since T1s advertise so many routes to ToRs, this file is extremely large, and reading it causes ndppd's CPU usage to spike every 30 seconds
How I did it
Increase the delay for reading this file to the maximum possible value (max integer value), which will result in CPU spikes every ~24 days instead of every 30 seconds
How to verify it
Start ndppd with the new config file, confirm that no CPU spikes are seen except at startup
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
08337aa [sonic-package-manager] first phase implementation of sonic-package-manager (#1527)
c166f66 [multi-asic] support show ip bgp neigh/network for multi asic (#1574)
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
After PR #7344, 'make init' and/or 'make reset' will also build sonic slave dockers.
'-include rules/config.user' is supposed to be fine when the file is missing. However, when the file is missing, it generates a delayed error which later causes make init and make reset trying to build the sonic slave dockers.
How I did it
Define a do-nothing target for config.user to catch config.user build therefore preventing other builds to be triggered unexpectedly.
How to verify it
did make init and it is now only doing submodule init.
Previously, a brief sleep was necessary in order to get Python threads to progress. The root cause of this has since been found and fixed in sonic-swss-common: Azure/sonic-swss-common#477. The submodule was updated here, so we can now safely remove this sleep.
This PR should also be cherry-picked to the 202012 branch once the submodule is updated there to also include the fix.
[flex-counters] Delay flex counters stats init for faster boot time (Azure/sonic-swss#1646)
[routeorch] Add support for blackhole routes (Azure/sonic-swss#1723)
Update pool sizes during initialization from timer only (Azure/sonic-swss#1708)
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
The following error message is observed during chassis object being destroyed
"Exception ignored in: <function Chassis.__del__ at 0x7fd22165cd08>
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/sonic_platform/chassis.py", line 83, in __del__
ImportError: sys.meta_path is None, Python is likely shutting down
The chassis tries to import deinitialize_sdk_handle during being destroyed for the purpose of releasing the sdk_handle.
However, importing another module during shutting down can cause the error because some of the fundamental infrastructures are no longer available."
This error occurs when a chassis object is created and then destroyed in the Python shell.
- How I did it
To fix it, record the deinitialize_sdk_handle in the chassis object when sdk_handle is being initialized and call the deinitialize handler when the chassis object is being destroyed
- How to verify it
Manually test.
- Why I did it
Upgrade hw-mgmt to 7.0100.2303
Bug fixes
1. Fan direction feature fix for fixed FAN system (using shell instead of binutils/strings)
2. Remove cpld 4th link on systems with only 3 CPLD's
3. hw-mgmt: thermal: Add hardcoded critical trip point. Follow-up after patch "Removing critical thermal zones to prevent unexpected software system shutdown".
4. Fix sensor attribute mapping to be label based instead of index based to allow common handling of voltage regulator names independently of hardware changes.
5. Update 'lm-sensors' custom configuration file. Relevant only for users utilizing sensors.conf files coming along with hw-management package.
6. For full feature list please follow https://github.com/Mellanox/hw-mgmt/blob/V.7.0010.2300_BR/debian/Release.txt
- How I did it
Update hw-mgmt pointer
Remove unused patches
Fix existing patch to make sure it apply successfully
- How to verify it
Full platform regression on all mellanox platforms
99ad210 [Mellanox] backport kernel patches for hw-management 7.0100.2303 (#211)
- Why I did it
Update submodule pointer for sonic-linux-kernel to include kernel patches for hw-mgmt 7.0100.2303
- How I did it
Update submodule pointer for sonic-linux-kernel
Changes in the new release:
Fix 10G and 50G speeds in SAI XML to support all interface types
Enable SMAC=DMAC and SMAC MC in tunnel debug counter
Add tunnel statistics
Add isolation group API implementation
Fix ACL ANY debug counter to correctly track ACL drops
Add VXLAN source port hard coded range, controlled by K/V
FW dump me now feature
Add mlxtrace to saidump
Speed lane setting and AN control
Implement query stats API
VNI miss part of tunnel decal drop reason
Align with SAI API v1.8.1
Signed-off-by: Dror Prital <drorp@nvidia.com>
* 9dba93f disk_check: Check & mount RO as RW using tmpfs (#1569)
* c3963c5 Fix remove ip rif (#1535)
* 41d8ddc [config][generic-update] Adding apply-patch, rollback, checkpoints commands (#1536)
* a3d37f1 [console] Display success message after line cleared (#1579)
* b10c157 RADIUS Management User Authentication Feature (#1521)
* 59ed6f3 platform pre-check for reboot in master branch (#1556)
* f5efe89 [acl] Use a list instead of a comma-separated string for ACL port list (#1519)
* e296a69 No more IP validation as it is more likely a URL (#1555)
* d5f5382 [CLI][queue counters] add JSON output option for queue counters (#1505)
* 176cc4a 1) Loopback interfaces with valid nexthop IP are not ignored/treated as loopback. (#1565)
* 149ccbd [techsupport] Update show ip interface command (#1562)
* 0e84418 Stop PMON docker before cold and soft reboots (#1514)
* eba5c04 Fix Multi-ASIC show specific resursive route by using common parsing function (#1560)
* e57e7f7 cache the bvid to vlan translations (#1523)
* 38f9f60 sonic-installer: fix py3 issues in bootloader.aboot (#1553)
* 02b263a [voq/inbandif] Voq inbandif port (#1363)
* 0539789 [load_minigraph]: Avoid starting PFCWD for EPMS devicetype (#1552)
* 030293c Use 'importlib' module in lieu of deprecated 'imp' module (#1450)
* 50e5c61 Fixed the possibility of using uninitialized variable in route_check.py (#1551)
This commit contains the following changes to support for configuring a VoQ switch using a minigraph.xml file.:
- Add support for system ports configuration to minigraph
- Add support for SwitchId, SwitchType and MaxCores to minigraph
- Add support for inband vlan configuration in minigraph
- `asic_name` is now a mandatory attribute in CONFIG_DB on VoQ switches
Co-authored-by: Maxime Lorrillere <mlorrillere@arista.com>