Why I did it
Need to be able to run smartctl when pmon docker is not running.
How I did it
Removed the pmon dependency for pmon as well as the command wrapper and added it to the debian-extension.
How to verify it
Stop pmon
Run smartctl from the host and verify it runs without error
Updates include the following changes in order to support new Mellanox platforms and drivers (Azure/sonic-linux-kernel#259)
10ef390 Update kconfig to support / enable newly backported mellanox patches.
6a949e1 Add backported patches for Mellanox hw-mgmt V.7.0020.1300
e1913f7 Rename and reformat patch headers
#### Why I did it
Include sonic-bgp-monitor to setup.py so it gets included in /usr/local/yang-models when installing the package
#### How I did it
#### How to verify it
install the package
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->
#### A picture of a cute animal (not mandatory but encouraged)
- Why I did it
Fix issue: 'sx_port_mapping_t' object has no attribute 'slot_id'. sx_port_mapping_t only has attribute slot.
- How I did it
Change slot_id to slot.
- How to verify it
Manual test
- Why I did it
Python select.select accept a optional timeout value in seconds, however, the value passes to it is a value in millisecond.
- How I did it
Transfer the value to millisecond.
- How to verify it
Manual test
Why I did it
To enable test support for BFD-related features, the PTF docker needs to have the proper support for BFD. This PR aims to add BFD support in ptf docker.
How I did it
Clone and build OpenBFDD for PTF docker.
How to verify it
Build locally and verify BFD is supported.
- Why I did it
To include latest SDK fixes:
1. On CMIS modules, after low power configuration, the firmware waited for the module state to be ModuleReady instead of ModuleLowPower causing delays.
2. When connecting SN4600C, 100GbE port with CWDM4 module (Gen 3.0), link up time is 30 seconds.
and to include SAI fixes \ changes:
1. Reduce verbosity for resource check vendor data not found
2. Fix metadata validation, check default value on conditions check
3. Add 100MB, 10MB to 2201 system
4. L3 VXLAN overlay ECMP
5. VXLAN srcport API implementation
6. Fix scheduler profile null (default values) when set on sub group scheduler group
7. Fix ACL binding restoration when port leaves a LAG
8. Fix route logic for set next hop/action and reference counter for ECMP overlay
- How I did it
1. Updated SDK/FW submodule and relevant makefiles with the required versions.
2. Update SAI submodule and relevant makefile with the required version.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Why I did it
Requirements from Microsoft for fwutil update all state that all firmwares which support this upgrade flow must support upgrade within a single boot cycle. This conflicted with a number of Mellanox upgrade flows which have been revised to safely meet this requirement.
How I did it
Added --no-power-cycle flags to SSD and ONIE firmware scripts
Modified Platform API to call firmware upgrade flows with this new flag during fwutil update all
Added a script to our reboot plugin to handle installing firmwares in the correct order with prior to reboot
How to verify it
Populate platform_components.json with firmware for CPLD / BIOS / ONIE / SSD
Execute fwutil update all fw --boot cold
CPLD will burn / ONIE and BIOS images will stage / SSD will schedule for reboot
Reboot the switch
SSD will install / CPLD will refresh / switch will power cycle into ONIE
ONIE installer will upgrade ONIE and BIOS / switch will reboot back into SONiC
In SONiC run fwutil show status to check that all firmware upgrades were successful
- Why I did it
The feature state can be a jinja template, like in this file - https://github.com/Azure/sonic-buildimage/blob/master/files/build_templates/init_cfg.json.j2#L39.
Without this change it is not possible to validate a configuration file.
- How I did it
Relaxes the constraint on feature state. Feature state leaf can be any string.
- How to verify it
Run UT.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
Update Broadcom SAI to version 6.0.0.13, SDK 6.5.24, saibcm-modules to 6.5.24.gpl
How I did it
Brcm SAI 6.0 EA with fixes for CS00012203367, CS00012219613, CS00012213974, CS00012218290, CS00012217169, CS00012211718, CS00012213944, CS00012215529, CS00012218100, CS00012214196, CS00012212681, CS00012205138, CS00012208537, CS00012185316, CS00012208524, CS00012203367, CS00012197364.
51a9fbf [debug dump] Missing Dict Key handled in the MatchOptimizer (#2014)
ac8fdd3 [Auto Techsupport] Added Event Driven TS to Command Reference (#1985)
458a0c2 [fdbshow] Adding more options for fdbshow and show mac (#1982)
#### Why I did it
src\tacacs\bash_tacplus\debian\rules file mode is 644, and debian build will change it to 755, which will cause image version contains 'dirty'
#### How I did it
Change src\tacacs\bash_tacplus\debian\rules file mode to 755
#### How to verify it
Check the image version not contains dirty
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [*] 202111
#### Description for the changelog
Change src\tacacs\bash_tacplus\debian\rules file mode to 755
#### A picture of a cute animal (not mandatory but encouraged)
Why I did it
sonic-broadcom-dnx.bin should be able to installed on DNX supported platform, whereas it doesn't.
How I did it
Changed CONFIGUTED_PLATFORM to TARGET_MACHINE to distinguish broadcom and broadcom-dnx
How to verify it
tar sonic-broadcom-dnx.bin and verify its platforms_asic contians dnx platforms
Also verify on image with other asic, no regression.
Why I did it
Eliminate benign firsttime boot error reported when running on platforms that do not support kdump.
How I did it
Change rc.local to check for presence of the file /etc/default/kdump-tools before referencing it.
How to verify it
Install a new image on an armhf or arm64 platform and check for a failed reference to /etc/default/kdump-tools on firsttime boot.
- External PHY is managed via gearbox (gbsybcd docker container) in SONiC
- Enhanced 'External PHY management' from SONiC's single-ASIC environment to multi-ASIC
- Enhanced gbsyncd docker container from single Namespace to multi-Namspace mode
- Added gbsyncd.service.j2 on per_namespace basis.
- Each namepace/ASIC now to have its unique gbsyncd<ASIC#> docker container with its
own Gearbox table, redis-DB
Signed-off-by: Shyam Kumar <shyakuma@cisco.com>
Why I did it
ConfigDB schema generated by minigraph parser can't pass yang validation.
How I did it
Modify minigraph.py, and use 'state' to replace 'status'.
How to verify it
Run UT for sonic-config-engine.
Use minigraph parser to generate ConfigDB schema, and run yang validation.
Signed-off-by: Gang Lv ganglv@microsoft.com
Why I did it
end2end test is blocked by Yang model for BGP monitor.
How I did it
Create new yang files for BGP monitor, and add UT.
How to verify it
Follow the steps in #9711.
Run UT for sonic-yang-models.
Signed-off-by: Gang Lv ganglv@microsoft.com
#### Why I did it
AAA yang model is not up to date.
#### How I did it
Add fallback and trace field, and replace boolean_type
#### How to verify it
Run UT for sonic_yang_models.
Follow the steps from #9710
Why I did it
Config db schema generated by minigraph can’t pass yang validation, bgp_asn must not be None.
How I did it
Update sampe-voq-graph.xml to add bgp_asn.
How to verify it
Build sonic-config-engine.
Run command 'sonic-cfggen -m tests/sample-voq-graph.xml -p tests/voq-sample-port-config.ini --print-data', and check bgp_asn.
Signed-off-by: Gang Lv ganglv@microsoft.com
Tested on a Celestica Seastone2 DX030 switch
Testing scenarios:
- Various QSFP ports in both normal and breakout config.
- 100G and 40G link speed show different colors.
- SFP1 port works.
Signed-off-by: Christian Svensson <blue@cmd.nu>
- Why I did it
For MSN4410/MSN4600/MSN4700 now they can support fetching PSU voltage threshold, no need to skip the psu voltage check in system health monitoring, so update the system health monitoring configuration file for these platforms.
- How I did it
remove skip PSU change config from the system_health_monitoring_config.json file
- How to verify it
Build image run on these platforms, system health monitoring will not report error against PSU voltage
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Optimize thermal control policies to simplify the logic and add more protection code in policies to make sure it works even if kernel algorithm does not work.
- How I did it
Reduce unused thermal policies
Add timely ASIC temperature check in thermal policy to make sure ASIC temperature and fan speed is coordinated
Minimum allowed fan speed now is calculated by max of the expected fan speed among all policies
Move some logic from fan.py to thermal.py to make it more readable
- How to verify it
1. Manual test
2. Regression
- Why I did it
MSN4700 platform has 8 lanes per port and thus can support 2x40G with each lane running at 10G
- How I did it
Added 40G to 2x200G breakout mode in platform.json
- How to verify it
Run config int break Ethernet0 2x40G[200G,100G,50G,25G,10G,1G]
And verify the command runs successfully and the port speed was set to 40G with a 2x breakout.
* Description: Currently IPv4 routes with IPv6 link local next hops are
not properly installed in FPM.
Reason is the netlink decoding truncates the ipv6 LL address to 4 byte
ipv4 address.
Ex : fe80:: is directly converted to ipv4 and it results in 254.128.0.0
as next hop for below routes
show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
B>* 2.1.0.0/16 [200/0] via fe80::268a:7ff:fed0:d40, Ethernet0, weight 1,
02:22:26
B>* 5.1.0.0/16 [200/0] via fe80::268a:7ff:fed0:d40, Ethernet0, weight 1,
02:22:26
B>* 10.1.0.2/32 [200/0] via fe80::268a:7ff:fed0:d40, Ethernet0, weight
1, 02:22:26
Hence this fix converts the ipv6-LL address to ipv4-LL (169.254.0.1)
address before sending it to FPM. This is inline with how these types of
routes are currently programmed into kernel.
Signed-off-by: Nikhil Kelapure <nikhil.kelapure@broadcom.com>
- Why I did it
The feature state can be a jinja template, like in this file - https://github.com/Azure/sonic-buildimage/blob/master/files/build_templates/init_cfg.json.j2#L39.
Without this change it is not possible to validate a configuration file.
- How I did it
Relaxes the constraint on feature state. Feature state leaf can be any string.
- How to verify it
Run UT.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Fixes#9561Fixes#9570Fixes#9563
Partial fix for #9556
#### Why I did it
- Attributes for dual ToR configs lack YANG model support
#### How I did it
- Extend YANG tests to cover dual ToR use cases
- Extend YANG model to cover dual ToR use cases
- Reduce the default log level to warning so only test failures are printed
#### How to verify it
- Run the YANG model unit tests
* fix workdir for seastone2
Signed-off-by: Viktor Ekmark <viktor@ekmark.se>
* seastone2: Add I2C SFP definition for SFP1
Signed-off-by: Christian Svensson <blue@cmd.nu>
* [device/cel_seastone_2] sfputil logic for SFP1
Earlier logic resulted in the name of SFP1 being SFP33 which is not
correct. The cannonical source is seastone2_fpga module and it calls it
SFP1, so ensure the logic does as well.
Signed-off-by: Christian Svensson <blue@cmd.nu>
* [device/cel_seastone_2] sysfs paths for SFP1
Various changes that plumbs the correct port presence and DOM decoding
for the SFP1 port.
Signed-off-by: Christian Svensson <blue@cmd.nu>
Co-authored-by: Christian Svensson <blue@cmd.nu>
#### Why I did it
resolves https://github.com/Azure/sonic-buildimage/issues/8779
snmpd writes the below error message in syslog :
snmp#snmpd[27]: truncating integer value > 32 bits
This message is written in syslog when the hrSystemUptime(1.3.6.1.2.1.25.1.1.0 / system uptime) or sysUpTime(1.3.6.1.2.1.1.3 network management portion or snmpd uptime) is queried when either of these counters overflow beyond 32 bit value. This happens the device uptime or snmpd uptime is more than 497 days.
#### How I did it
Reference: https://access.redhat.com/solutions/367093 and https://linux.die.net/man/1/snmpcmd
To avoid seeing this message if the counter grows, the snmpd error log level is changed to display LOG_EMERG, LOG_ALERT, LOG_CRIT, and LOG_DEBUG.
Without this change, LOG_ERR and LOG_WARNING would also be logged in syslog.
#### How to verify it
On a device which is up for more than 497 days, modify supervisord.conf with the change and restart snmp.
Query 1.3.6.1.2.1.1.3 and verify that log message is not seen.
Why I did it
The existing log file size in sonic is 1 Mb. Over a period of time this leads to huge number of log files which becomes difficult for monitoring applications to handle.
Instead of large number of small files, the size of the log file is not set to 16 Mb which reduces the number of files over a period of time.
How I did it
Changed the size parameter and related macros in logrotate config for rsyslog
How to verify it
Execute logrotate manually and verify the limit when the file gets rotated.
Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>
- Why I did it
Add sensor conf for MSN4600C A1 platform
- How I did it
Add a new sensor conf file and relevant scripts to support two different versions of the platform
- How to verify it
Run "sensors" cmd to check the output on the A1 platform to see whether it's as expected.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
#### Why I did it
It should be handled by `ConfigDBConnector.typed_to_raw()`.
This is a bug for `sonic-cfggen -m --print-data` only
```
"PORTCHANNEL_MEMBER": {
"PortChannel0001|Ethernet112": {
"NULL": "NULL"
},
"PortChannel0002|Ethernet116": {
"NULL": "NULL"
},
"PortChannel0003|Ethernet120": {
"NULL": "NULL"
},
"PortChannel0004|Ethernet124": {
"NULL": "NULL"
}
},
```
But not appears in `sonic-cfgen -d --print-data`.
```
"PORTCHANNEL_MEMBER": {
"PortChannel0001|Ethernet112": {},
"PortChannel0002|Ethernet116": {},
"PortChannel0003|Ethernet120": {},
"PortChannel0004|Ethernet124": {}
},
```
Tested in a T0 KVM.
What I did:-
Enhanced minigraph parser to parse interface name associated with static route nexthop
Why I did:-
One of the use case to support interface name is Chassis Packet. For Chassis Packet we have Static Routes configured to route traffic across line-card. If the FRR programs static route without the interface name then in case if the ip interface that is associated with the nexthop goes down FRR resolves static route nexthop over the default route as we have FRR config ip nht-resolve-via-default which causes undesired behavior. Having interface name with Static Route prevents recursive lookup on default route.
How I verify:
Updated unit-test cases
Manual verification
dd71848 [GCU] Show default option for '--format' (#2003)
f296e76 [GCU] Disallowing DeleteInsteadOfReplaceMoveExtender from generating delete whole config move (#2006)
731d643 [flow counter] Fix issue: should not compare str with int (#2001)
e628f01 Support CLI for buffer queue configuration (#1965)
585fd40 Fix show ip bgp nei command rw required issue (#2011)
Update ztp sub module to include the below fixes:
f7dd3c5 [sonic-ztp]Fixing build failure after bullseye integration (#30)
9218e16 Replace swsssdk.ConfigDBConnector and SonicV2Connector with swsscommon(#28)
Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>