- Why I did it
To add new SKU Mellanox-SN4700-O8V48 with following requirements:
- How I did it
Create new SKU files based on the below definition:
* Port Mapping: 1-12 2x200G, 13-20 1x400G, 21-32 2x200G
T0 topology: 48x200G Downlinks 8x400G uplinks.
Length of downlink: 5m
Length of uplink: 40m
* Auto-negotiation enable/disable: Yes
* FEC mode: RS
* Shared headroom: Enabled
* Shared headroom pool factor: 2
* Warmboot enabled: yes
- How to verify it
SONiC build with new SKU finish init, all ports up, qos tests suite from sonic-mgmt
A W/A to overcome delay of about 20 sec on login due to MFT bash autocompletion bug.
Should be reverted once a formal solution will be available in future MFT release.
- Why I did it
To overcome SN2700 20 sec delay on login
- How I did it
Removed MFT bash autocompletion part
- How to verify it
1. Build a mellanox image
2. Verify no such links after system boot.
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
FCS/CRC Errors will only be reported as RX_ERR.
Fix to avoid the mac port related errors.
Fix for sharedResSize testcase failure in QoS-SAI
Fix the issue related to voltage in 'show platform psustatus'.
Support WRED drop for lossy queues.
Fixed an issue where lossy traffic was getting dropped.
Enhancement of SAI logging for errors and interrupts
To modify EEPROM API serial_number_str to return service tag instead of serial number in Dell S6100.
Ref PR: #1239
How I did it
Update EEPROM API serial_number_str to return service tag instead of serial number.
How to verify it
Verify decode-syseeprom -s returns service tag in Dell S6100.
Why I did it
1. Upgrade centec-arm64 platform to Bookworm.
2. Solve the problem of compiling the docker-syncd-centec-rpc.gz error on the centec platform.
How I did it
1. Modified platform driver to comply with bookworm kernel.
2. Upgrade SONiC package versions of the centec platform.
How to verify it
1. Compile the centec-arm64 platform to generate sonic-centec-arm64.bin.
2. Compile the centec platform to generate docker-syncd-centec-rpc.gz.
Signed-off-by: centecqianj <qianj@centec.com>
- Why I did it
Add support of a new 't1-smartswitch' topology to the sample config generator. The topology passed to sonic-cfggen utility as a parameter to generate sample configuration for Smart Switch:
sonic-cfggen -k <SKU> --preset t1-smartswitch ...
- How I did it
Extend sample config generator to support new topology and read Smart Switch specific data from hwsku.json.
- How to verify it
Run unit tests. The changes are covered with the new unit tests.
- Why I did it
Improve boot performance mostly needed for fast and warmboot
- How I did it
Use cached variable.
- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- Why I did it
Improve boot performance mostly needed for fast and warmboot
- How I did it
Use cached variable.
- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- Why I did it
Enable CMIS host management for Mellanox devices which are expected to support the feature
- How I did it
new thread in a new file and changing logic in platform code in chassis.py which is calling this thread from get_change_event()
this thread in the new file handles the state machine per port.
first the static detection takes place once the thread is up (during switch bootup sequence), until final decision if it's FW control or SW control module.
After it ends, the dynamic detection takes place, listening to changes in the sysfs fds, per port,
so it will be able to detect plug in or out events of a cable.
- How to verify it
Enhanced unit tests
run sonic mgmt on Nvidia SN4700 with CMIS host management enabled
#### Why I did it
src/sonic-swss
```
* 6026b6d6 - (HEAD -> master, origin/master, origin/HEAD) [dash] add ACL group bind check for rule create/update (#2974) (88 minutes ago) [Yakiv Huryk]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/linkmgrd
```
* e420df4 - (HEAD -> master, origin/master, origin/HEAD) Exclude DbInterface in PR coverage check (#224) (5 hours ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* 427217b - (HEAD -> master, origin/master, origin/HEAD) Adding supported vendor PNs for remote CDB FW upgrade (#418) (2 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* e7ad356 - (HEAD -> master, origin/master, origin/HEAD) [Azp]: Update dash api source from buildimage to submodule (#1330) (17 hours ago) [Ze Gan]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
SONiC Mgmt test syslog/test_syslog_rate_limit.py syslog.test_syslog_rate_limit test_syslog_rate_limit was failing on SKUs with gbsyncd. This includes Arista 720DT when testing on the 202305 branch.
How I did it
The issue was no value for gbsyncd in "show syslog rate-limit-container",
because gbsyncd is not having a SYSLOG_CONFIG_FEAGTURE|gbsyncd entry in
config_db, which is further because gbsyncd feature is for not enabled
through init_cfg.json.j2.
How to verify it
Test is now passing on 720DT in 202305 branch.
Co-authored-by: Boyang Yu <byu@arista.com>
#### Why I did it
SNMP query over IPv6 does not work due to issue in net-snmp where IPv6 query does not work on multi-nic environment.
To get around this, if snmpd listens on specific ipv4 or ipv6 address, then the issue is not seen.
We plan to configure Management IP and Loopback IP configured in minigraph.xml as SNMP_AGENT_ADDRESS in config_db., based on changes discussed in https://github.com/sonic-net/SONiC/pull/1457.
##### Work item tracking
- Microsoft ADO **(number only)**:26091228
#### How I did it
Modify minigraph parser to update SNMP_AGENT_ADDRESS_CONFIG with management and Loopback0 IP addresses.
Modify snmpd.conf.j2 to use SNMP_AGENT_ADDRESS_CONFIG table if it is present in config_db, if not listen on any IP.
Main change:
1. if minigraph.xml is used to configure the device, then snmpd will listen on mgmt and loopback IP addresses,
2. if config_db is used to configure the device, snmpd will listen IP present in SNMP_AGENT_ADDRESS_CONFIG if that table is present, if table is not present snmpd will listen on any IP.
#### How to verify it
config_db.json created from minigraph.xml for single asic VS image with mgmt and Loopback IP addresses.
```
"SNMP_AGENT_ADDRESS_CONFIG": {
"10.1.0.32|161|": {},
"10.250.0.101|161|": {},
"FC00:1::32|161|": {},
"fec0::ffff:afa:1|161|": {}
},
.....
snmpd listening on the above IP addresses:
admin@vlab-01:~$ sudo netstat -tulnp | grep 161
tcp 0 0 127.0.0.1:3161 0.0.0.0:* LISTEN 71522/snmpd
udp 0 0 10.250.0.101:161 0.0.0.0:* 71522/snmpd
udp 0 0 10.1.0.32:161 0.0.0.0:* 71522/snmpd
udp6 0 0 fec0::ffff:afa:1:161 :::* 71522/snmpd
udp6 0 0 fc00:1::32:161 :::* 71522/snmpd
```
#### Why I did it
src/sonic-sairedis
```
* cd41369 - (HEAD -> master, origin/master, origin/HEAD) [Link Event Damping] Serialization/deserialization logic for link event (#1322) (2 days ago) [Ashish Singh]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-restapi
```
* 24d440f - (HEAD -> master, origin/master, origin/HEAD) [build] Fix Makefile didn't set go build target file. #151 (39 minutes ago) [Liu Shilong]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Signed-off-by: Nazarii Hnydyn nazariig@nvidia.comCloses#17345
This W/A was proposed by Nvidia FRR team before the long term solution is ready.
Why I did it
A W/A to fix default route installation during LAG member flap
Work item tracking
N/A
How I did it
Disabled FRR next hop group support
How to verify it
Do LAG member flap
### Why I did it
1. Protobuf 3.21 has been released in the Debian bookworm
2. Update submodule sonic-swss and sonic-dash-api because they include related updates.
##### Work item tracking
- Microsoft ADO **(number only)**:
#### How I did it
1. In the protobuf.mk, If it isn't bullseye, ignore to compile the protobuf package
2. Move sonic-swss commits:
```
fd852084 (HEAD, origin/master, origin/HEAD) [dashrouteorch]: Rename dash route namespace (#2966)
```
3. Move sonic-dash-api and move build chain to its submodule
```
d4448c7 (HEAD, origin/master, origin/HEAD, master) [azp]: Add multi-platform artifacts (#11)
8a5e5cc [debian]: Add debian package (#10)
d96163a [misc]: Add dash utils and its tests (#9)
```
#### How to verify it
Check Azp
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.
#### Why I did it
When device set with IPV6 TACACS server address, and shutdown all BGP, device can't connect to TACACS server via management interface.
After investigation, I found the IPV6 'default' route table does not add to route lookup:
admin@vlab-01:~$ ip -6 rule list
1001: from all lookup local
32765: from fec0::ffff:afa:1 lookup default
32766: from all lookup main
admin@vlab-01:~$
As compare:
admin@vlab-01:~$ ip -4 rule list
1001: from all lookup local
32764: from all to 172.17.0.1/24 lookup default
32765: from 10.250.0.101 lookup default
32766: from all lookup main
32767: from all lookup default <== 'default' route table exist in IPV4 route lookup
Issue fix by add 'default' route table to route lookup with following command:
admin@vlab-01:~$ sudo ip -6 rule add pref 32767 lookup default
admin@vlab-01:~$ ip -6 rule list
1001: from all lookup local
32765: from fec0::ffff:afa:1 lookup default
32766: from all lookup main
32767: from all lookup default <== 'default' route table been added to IPV6 route lookup
admin@vlab-01:~$
##### Work item tracking
- Microsoft ADO: 25798732
#### How I did it
When management interface using 'default' route table, add 'default' route table to IPV6 route lookup.
#### How to verify it
Pass all UT.
Add new UT to cover this change.
Manually verify issue fixed:
### Tested branch (Please provide the tested image version)
- [x] master-17281.417570-2133d58fa
#### Description for the changelog
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.
- Why I did it
Provide a dummy implementation for SFP error description when CMIS host management is enabled. A future feature shall be raised to implement SFP error description for such mode.
- How I did it
if SFP is under software control, provide "Not supported" as error description
if SFP is under initialization, provide "Initializing" as error description
- How to verify it
unit test
#### Why I did it
src/sonic-platform-common
```
* d09e009 - (HEAD -> master, origin/master, origin/HEAD) APIs to help in finding NPU SI settings (#410) (18 minutes ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
This commit adds support for pensando asic called ELBA. ELBA is used in pci based cards and in smartswitches.
#### Why I did it
This commit introduces pensando platform which is based on ELBA ASIC.
##### Work item tracking
- Microsoft ADO **(number only)**:
#### How I did it
Created platform/pensando folder and created makefiles specific to pensando.
This mainly creates pensando docker (which OEM's need to download before building an image) which has all the userspace to initialize and use the DPU (ELBA ASIC).
Output of the build process creates two images which can be used from ONIE and goldfw.
Recommendation is use to use ONIE.
#### How to verify it
Load the SONiC image via ONIE or goldfw and make sure the interfaces are UP.
##### Description for the changelog
Add pensando platform support.
- Why I did it
Mellanox MSN2410 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is because of a firmware issue with some PSU, we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely
- How I did it
Add a new rsyslog rule to the rsyslog-container.conf.j2, if the docker name is pmon and the platform name matches, the new rule will be inserted into the docker rsyslogd.conf
- How to verify it
run regression on the MSN2410 platform to make the error log will not be printed to the syslog.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
How I did it
Modified platform driver to comply with bookworm kernel.
Modified python build commands for building whl packages.
How to verify it
Verify whether all the platform bookworm debs are built.
make target/debs/bookworm/platform-modules-v682-48y8c-d_1.0_amd64.deb
Load the platform debian into the device and install it in bookworm image.
Verify the platform related CLI and the functionality
Signed-off-by: centecqianj <qianj@centec.com>
#### Why I did it
src/sonic-host-services
```
* e8ae2af - (HEAD -> master, origin/master, origin/HEAD) [featured]: Add database services for DPU (#84) (24 hours ago) [Ze Gan]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
To fix the EVPN type5 failure seen in FRR when there are multipaths for nexthop. The type5 routes were queued
show ip route vrf Vrf1
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
VRF Vrf1:
B>q 5.5.5.0/24 [200/0] via 30.0.0.2, Vlan100 onlink, weight 1, 00:00:40
q via 40.0.0.3, Vlan100 onlink, weight 1, 00:00:40
C>* 10.0.0.0/24 is directly connected, Vlan10, 00:00:43
B>q 100.0.0.0/24 [200/0] via 30.0.0.2, Vlan100 onlink, weight 1, 00:00:40
q via 40.0.0.3, Vlan100 onlink, weight 1, 00:00:40
Work item tracking
Microsoft ADO (number only):
How I did it
Porting the FRR fixFRRouting/frr#14835
How to verify it
Validated EVPN multipath with the scenario and confirmed its working.
The format of the media_settings.json file was updated to support the Port SI Per Speed Enhancements. Since media_checker is the validator for the media_settings.json file, it needs to be updated to align with the new format.
How I did it
I added six new SI parameter names introduced as part of the Port SI Per Speed Enhancements. Additionally, I implemented handling for the new hierarchy level (lane_speed_key) in the updated media_settings.json format while maintaining backward compatibility with vendors whose JSON does not support port SI per speed.
How to verify it
I locally built the Debian package using 'make target/debs/bullseye/sonic-device-data_1.0-1_all.deb,' and it completed successfully. Jenkins also built the entire image, which includes the media_checker as part of its process.
[arp_update]: Flush MAC mismatch neighbors
- Check for MAC mismatch between neighbor entries in the kernel and APPL_DB
- Flush any entries with a mismatch
Currently in this repo would not build dhcp_server container image by default, which would cause that building issue for dhcp_server introduced by other modules cannot be noticed in time.
This PR is to set build dhcp_server container in vs image.
* Update sonic-utilities to master branch version
sonic-utilities was (intentionally) pointing to a commit on a fork,
since merging sonic-utilities's changes for Bookworm first onto the
master branch would result in PR checker failures. Now that
sonic-buildimage is on master branch and the Bookworm changes in
sonic-utilities have been merged into master, sonic-utilties can now
point to master.
17e77fe2 Revert "Run yang validation in unit test (#3025)" (#3055)
96dd5559 [dhcp_relay] Fix dhcp_relay counter display issue (#3054)
6dfeee69 [sflow][db_migrator] Egress Sflow support (#3020)
02a588b7 Don't collect /proc/sched_debug
d7ec3251 Fix error about having a mutable default for field headers in dataclass
0ab3ab91 Fix test execution on Bookworm (#3041)
ef8f6f83 Specify test dependencies under extra_requires
61c44e80 Update python packages
1e813105 [wol] Implement wol command line utility (#3048)
8ebc56a0 [sonic_installer]: Improve exception handling: introduce notes. (#3029)
3610ce93 [sonic-package-manager] Fix YANG validation failure on upgrade when feature has constraints in YANG model on FEATURE table (#2933)
cfd2dd39 Add container rsyslog.conf to the sys dump (#3039)
c4b07828 Support new platform in generic configuration update (#3038)
a8d236c8 [fast-reboot-filter-routes.py] Remove click and improve error reporting (#3030)
75199c0f [sonic-package-manager] insert newline in /etc/sonic/generated_services.conf (#3040)
cd855698 [VOQ][saidump] Modify generate_dump: replace save_saidump with save_saidump_by_route_size (#2972)
f1e24ae5 GCU support for Cisco-8000 features (#3010)
67e1c3dc Update GCU rsyslog validator (#3012)
253b7975 [sonic-package-manager] do not modify config_db.json (#3032)
177dd8e8 [sonic-package-manager] add generated service to /etc/sonic/generated_services.conf (#3037)
62fcd77a Configure NTP according to extended configuration (#2835)
ced09404 [dualtor_neighbor_check] Adjust zero-mac check condition (#3034)
a4eeb698 [config] config reload should generate sysinfo if missing (#3031)
e01fc891 Run yang validation in unit test (#3025)
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
#### Why I did it
src/sonic-host-services
```
* 445ec8b - (HEAD -> master, origin/master, origin/HEAD) Revert "Add support to make determine/process reboot-cause services restartable (#86)" (#89) (31 hours ago) [anamehra]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-daemons
```
* e2d9f87 - (HEAD -> master, origin/master, origin/HEAD) Add dynamic sensor logic for fixed and psu presence/state checking in thermalctld (#401) (27 hours ago) [Gregory Boudreau]
```
#### How I did it
#### How to verify it
#### Description for the changelog