Why I did it
To pick up the below DNX fixes:
CS00012275689: DSCP->TC and TC->QUEUE mappings are not happening for packets received on LAG ports (SONIC-69367)
CS00012277618: Crash in _brcm_sai_dnx_irpp_port_core_get (SONIC-70001)
How I did it
Updated SAI branch with the above fixes
How to verify it
Ran basic sonic-mgmt tests with the SAI debian on XGS and DNX platforms
utilities:
* c63a62b 2023-01-23 | [muxcable][config] Add support to enable/disable ceasing to be an advertisement interface when `radv` service is stopped (#2622) (HEAD -> 202205) [Jing Zhang]
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Why I did it
Fix issue caused by dualtor support PR [dhcpmon] Open different socket for dual tor to enable interface filtering #11201
Improve code
How I did it
On single ToR, packets received count was duplicated due to socket filter set to "inbound"
Tx count not increasing due to filter set to "inbound". Added an outbound socket to count tx packets
Added vlan member interface mapping for Ethernet interface to vlan interface lookup in reference to PR Fix multiple vlan issue sonic-dhcp-relay#27
Exit when socket fails to initialize to allow dhcp_relay docker to restart
How to verify it
Tested on vstestbed single ToR and dual ToR: sent packets and verified that the dhcpmon rx and tx counters printed out are correct
Correct number of tx increases
Tx does not increase when ToR is on standby
Why I did it
If make fails, we can't rerun the make process, because existing patches can't apply again.
How I did it
Check whether the patches are already applied. If yes, don't apply them again.
How to verify it
Co-authored-by: Liu Shilong <shilongliu@microsoft.com>
Why I did it
[Seastone] Enhancement fix for PR12200 syseeprom issue.
How I did it
Enhance the fix by replacing the hardcoded devnum with a bash variable
How to verify it
show platform syseeprom or decode-syseeprom
Co-authored-by: Ikki Zhu <79439153+qnos@users.noreply.github.com>
Why I did it
This PR is to update minigraph.py to support both port alias and port name as input of AttachTo attribute of ACL table.
Before this change, only port alias is supported.
How I did it
Add a global variable to store port names
Search both port names and port aliases when parsing the value of AttachTo.
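A minimal sketch of the lookup described above, not the actual minigraph.py code. The function name `resolve_attach_to`, its parameters, and the semicolon-separated AttachTo format are assumptions made for illustration.

```python
def resolve_attach_to(attach_to, port_names, alias_to_name):
    """Map each AttachTo member to a port name.

    attach_to     -- raw AttachTo text (assumed to be ';'-separated)
    port_names    -- set of known port names (hypothetical stand-in for the new global)
    alias_to_name -- dict mapping port alias -> port name (hypothetical)
    """
    resolved = []
    for member in attach_to.split(";"):
        member = member.strip()
        if member in port_names:
            # The member is already a port name.
            resolved.append(member)
        elif member in alias_to_name:
            # The member is an alias: translate it to the port name.
            resolved.append(alias_to_name[member])
        # Unknown members are skipped in this sketch.
    return resolved


ports = {"Ethernet0", "Ethernet4"}
aliases = {"fortyGigE0/0": "Ethernet0", "fortyGigE0/4": "Ethernet4"}
print(resolve_attach_to("Ethernet0;fortyGigE0/4", ports, aliases))
```

Checking names first and aliases second keeps the old alias-only inputs working while accepting port names as well.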
How to verify it
Verified by a new unit test case test_minigraph_acl_attach_to_ports
Verified by copying the new minigraph.py to a testbed and running config load_minigraph.
utilities:
* 3ebe948 2023-01-14 | [show] Add bgpraw to show run all (#2537) (HEAD -> 202205) [jingwenxie]
* 7979b9b 2022-12-05 | Transceiver eeprom dom CLI modification to show output from TRANSCEIVER_DOM_THRESHOLD table (#2535) [mihirpat1]
swss:
* 4ad82c5 2023-01-13 | Changed the BFD default detect multiplier to 10x (#2614) (HEAD -> 202205) [siqbal1986]
* 4fe7138 2023-01-12 | [MuxOrch] Enabling neighbor when adding in active state (#2601) [Nikola Dancejic]
sairedis:
* 2f6cbd3 2023-01-19 | Fix for [EVPN] When MAC moves from remote end point to local, ASIC DB fields are not updated properly for the mac #11503 (#1173) (github/202205) [anilkpan]
platform-daemon:
* 2851d86 2023-01-17 | Chassisd do an explicit stop of the config_manager (#328) (HEAD -> 202205) [judyjoseph]
platform-common:
* 2995989 2022-12-06 | Add get_transceiver_status and get_transceiver_pm to API interface (#315) (HEAD -> 202205) [longhuan-cisco]
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
#### Why I did it
This fixes a memory leak in ETHERLIKE-MIB. The fix is not part of net-snmp 5.7.3, so this PR includes a patch for the memory leak.
```
ke->name is strdup-ed at line 297: n->name = strdup(RTA_DATA(tb[IFLA_IFNAME]));
```
#### How I did it
Patched in the upstream fix.
[net-snmp] upstream fix link -> [snmpd|upstream link](ed4e48b5fa)
#### How to verify it
**Before the fix**
Used valgrind to find the memory leak.
```
root@lnos-x1-a-csw06:/# grep "definitely lost" valgrind-out.txt
==493== 4 bytes in 1 blocks are definitely lost in loss record 1 of 333
==493== 16 bytes in 1 blocks are definitely lost in loss record 25 of 333
==493== 757 bytes in 71 blocks are definitely lost in loss record 214 of 333
==493== 1,168 (32 direct, 1,136 indirect) bytes in 1 blocks are definitely lost in loss record 293 of 333
==493== 1,168 (32 direct, 1,136 indirect) bytes in 1 blocks are definitely lost in loss record 294 of 333
==493== 1,168 (32 direct, 1,136 indirect) bytes in 1 blocks are definitely lost in loss record 295 of 333
==493== 1,168 (32 direct, 1,136 indirect) bytes in 1 blocks are definitely lost in loss record 296 of 333
==493== definitely lost: 905 bytes in 77 blocks
```
_We can see the memory leak in the stack trace._
-> dot3stats_linux -> get_nlmsg -> strdup
https://github.com/net-snmp/net-snmp/blob/v5.7.3/agent/mibgroup/etherlike-mib/data_access/dot3stats_linux.c#L277
```
n = malloc(sizeof(*n));
memset(n, 0, sizeof(*n));
n->ifindex = ifi->ifi_index;
n->name = strdup(RTA_DATA(tb[IFLA_IFNAME]));
memcpy(&n->stats, RTA_DATA(tb[IFLA_STATS]), sizeof(n->stats));
n->next = kern_db;
kern_db = n;
return 0;
```
We were not freeing this strdup-ed space for EtherLike-MIB. As the number of interface MIB queries increased, memory usage kept growing.
```
kern_db = ke->next;
free(ke);
```
https://github.com/net-snmp/net-snmp/blob/v5.7.3/agent/mibgroup/etherlike-mib/data_access/dot3stats_linux.c#L467
```
==55== 757 bytes in 71 blocks are definitely lost in loss record 186 of 299
==55== at 0x483577F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==55== by 0x4EB6E49: strdup (strdup.c:42)
==55== by 0x493F278: get_nlmsg (dot3stats_linux.c:299)
==55== by 0x493F529: rtnl_dump_filter_l.constprop.3 (dot3stats_linux.c:370)
==55== by 0x493FD7A: rtnl_dump_filter (dot3stats_linux.c:401)
==55== by 0x493FD7A: _dot3Stats_netlink_get_errorcntrs (dot3stats_linux.c:424)
==55== by 0x494009F: interface_dot3stats_get_errorcounters (dot3stats_linux.c:530)
==55== by 0x48F6FDA: dot3StatsTable_container_load (dot3StatsTable_data_access.c:330)
==55== by 0x485E76B: _cache_load (cache_handler.c:700)
==55== by 0x485FA37: netsnmp_cache_helper_handler (cache_handler.c:638)
==55== by 0x48720BC: netsnmp_call_handler (agent_handler.c:526)
==55== by 0x48720BC: netsnmp_call_next_handler (agent_handler.c:640)
==55== by 0x4865F75: table_helper_handler (table.c:717)
==55== by 0x4871B66: netsnmp_call_handler (agent_handler.c:526)
==55== by 0x4871B66: netsnmp_call_handlers (agent_handler.c:611)
757 bytes in 71 blocks are definitely lost in loss record 214 of 333
==493== at 0x483577F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==493== by 0x4EB6E49: strdup (strdup.c:42)
==493== by 0x493F278: ??? (in /usr/lib/x86_64-linux-gnu/libnetsnmpmibs.so.30.0.3)
==493== by 0x493F529: ??? (in /usr/lib/x86_64-linux-gnu/libnetsnmpmibs.so.30.0.3)
==493== by 0x493FD7A: _dot3Stats_netlink_get_errorcntrs (in /usr/lib/x86_64-linux-gnu/libnetsnmpmibs.so.30.0.3)
==493== by 0x494009F: interface_dot3stats_get_errorcounters (in /usr/lib/x86_64-linux-gnu/libnetsnmpmibs.so.30.0.3)
==493== by 0x48F6FDA: dot3StatsTable_container_load (in /usr/lib/x86_64-linux-gnu/libnetsnmpmibs.so.30.0.3)
==493== by 0x485E76B: _cache_load (cache_handler.c:700)
==493== by 0x485FA37: netsnmp_cache_helper_handler (cache_handler.c:638)
==493== by 0x48720BC: netsnmp_call_handler (agent_handler.c:526)
==493== by 0x48720BC: netsnmp_call_next_handler (agent_handler.c:640)
==493== by 0x4865F75: table_helper_handler (table.c:717)
==493== by 0x4871B66: netsnmp_call_handler (agent_handler.c:526)
==493== by 0x4871B66: netsnmp_call_handlers (agent_handler.c:611)
```
**After the fix**
No memory leak related to the EtherLike MIB appears in the valgrind stack trace.
Why I did it
When getting the system MAC on the centec platform, the code increases the last byte of the MAC by 1, but it did not handle the carry case (a last byte of ff).
How I did it
First, remove the ":" separators from the MAC to get a plain hex string.
Then convert that string to an integer, increase it by 1, and convert it back to a string with the ":" separators re-inserted.
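The steps above can be sketched as follows. This is an illustration only, not the platform code: the function name `increment_mac` is hypothetical, and wrapping around after ff:ff:ff:ff:ff:ff is an assumption the source does not cover.

```python
def increment_mac(mac: str) -> str:
    """Return the MAC address that follows `mac`, propagating any carry."""
    # Step 1: drop the ':' separators and parse the 12 hex digits as one integer.
    value = int(mac.replace(":", ""), 16)
    # Step 2: add 1. The modulo keeps the result within 48 bits (assumed behavior).
    value = (value + 1) % (1 << 48)
    # Step 3: format back to 12 lowercase hex digits and re-insert ':' every two digits.
    digits = f"{value:012x}"
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


print(increment_mac("00:11:22:33:44:ff"))  # carry moves into the fifth byte
```

Treating the whole address as one integer means the carry propagates through as many bytes as needed, which incrementing only the last byte cannot do.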
add module reboot APIs for chassis
add supervisor module on linecard (fixes show chassis module midplane-status)
improve RTC update mechanism and sync every 10 mins
fix sbtsi temp sensor presence/thresholds
fix Mineral status leds
remove thermal object on xcvrs
misc fixes
- Why I did it
To include latest fixes and new functionality
SAI
1. Temporary WA for query enum capabilities for tunnel peer mode, to not return P2P
2. sai debug dump returns while last extra dump is running
3. open inner SRC and DST IP for ECMP / LAG general hash objects
4. tunnel peer mode returns hard coded
5. tunnel decap dscp mode
6. support default tunnel src ip
7. failure to add a port to a LAG in VLANs configured with flood_ctrl
8. Add P2P peer mode for IP in IP tunnels
9. Add per port IP counters
10. Clean up VXLAN srcport static (XML) functionality, as only dynamic (API) is in use
11. Fix enum capabilities of native hash fields
12. sai_acl_db_group_ptr usage
13. Clean QoS config of the LAG when all members were removed (bug
SDK/FW
1. Fixed bug in recovery mechanism in case of I2C error when trying to access the XSFP module.
2. On the NVIDIA Spectrum-2 switch, when receiving a packet with Symbol Errors on ports that are configured to cut-through mode, a pipeline might get stuck.
3. On the Spectrum-2 and Spectrum-3 switch, if you enable ECN marking and the port is in split mode, traffic sent to the port under congestion (for example, when connecting two ports with a total speed of 50GbE to a single 25GbE port) is not marked.
4. Modifying existing entry/Adding new one when switch is at its maximum capacity (full by maximum allowed entries from any type such as routes, FDB, and so forth), will fail with an error.
5. When many ports are active (e.g., 70 ports up), and the configuration of shared buffer is applied on the fly, occasionally, the firmware might get stuck.
6. When a system has more than 256 ACL rules, on rare occasion, removing/adding rules may cause some ACL rules not to work.
7. On the SN2201 system, on an RJ45 port, the link might appear in 'down' state even if it operates properly.
8. Layer 4 port information is not initialized for BFD packet event. To address the issue, remote peer UDP port information was added in BFD packet event.
9. When setting LAG as a SPAN analyzer, the distributor mode of the LAG members was not taken into account. It may happen that the LAG member with distributor mode disabled will be set as a SPAN analyzer port.
- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Why I did it
There is a queue in sysmonitor.py that is created based on an object of multiprocessing.Manager.
After a fast-reboot, the system health monitor is shut down, which causes this Manager to be shut down as well, since it is a child process of healthd.
That's why I moved the creation of this Manager from the top of the file to the function Sysmonitor.system_service() (the only place it is used), making the Manager a child process of Sysmonitor instead of healthd. This way both the queue (the Manager) and the processes that use this queue are child processes of the same process, so sysmonitor can no longer send messages to a dead queue.
How I did it
Removed the definition of manager as global and moved it to system_service() function
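A simplified sketch of the pattern, not the real sysmonitor.py. The `worker` function and the message text are hypothetical, and the "fork" start method is pinned only so the sketch behaves the same when pasted into a script on Linux.

```python
import multiprocessing


def worker(queue):
    # Stand-in for a monitoring task that reports events through the shared queue.
    queue.put("service-event")


def system_service():
    # The Manager is created here, inside the function that uses it, rather than
    # at module import time. Its server process is therefore a child of whichever
    # process calls system_service(), just like the worker started below.
    ctx = multiprocessing.get_context("fork")  # assumption: Linux host
    with ctx.Manager() as manager:
        queue = manager.Queue()
        proc = ctx.Process(target=worker, args=(queue,))
        proc.start()
        message = queue.get(timeout=5)
        proc.join()
        return message


if __name__ == "__main__":
    print(system_service())
```

Because the Manager and the worker share a parent, they are torn down together and the worker is not left writing to a queue whose server process has already exited.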
How to verify it
Perform a fast reboot and verify the traceback issue is fixed
Why I did it
To bring in the following fixes:
Revert temporary fix added to disable SA equal DA drops
CS00012273013 - [7.1][J2, J2c+] Disable SA Equals DA trap on DNX
CS00012274222 - How to block the voq for given destination port for a flow from a remote mod-id
CS00012275381 - SAI_INGRESS_PRIORITY_GROUP_STAT_PACKETS is incremented for port's PG's even if there are no traffic sent to that PG
CS00012274433 - Local Fault and Remote Fault are not polled by linkscan thread
How I did it
Merged above fixes to SAI code
How to verify it
Validated by running the basic sanity tests on XGS and DNX chassis platforms including
fib/test_fib.py
decap/test_decap.py
drop_counters/test_drop_counters.py
arp/test_arpall.py
Why I did it
sonic_host_services depends on deepdiff.
But the latest deepdiff version has an error.
How I did it
Pin deepdiff to the previous version.
How to verify it
Why I did it
In some cases, dpkg calls dpkg again to validate a version.
The dpkg hook then gets stuck in a loop waiting for a lock that the outer call already holds.
How I did it
Use an env variable to skip duplicated lock.
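The idea can be illustrated with the sketch below. It is not the actual hook: the variable name `EXAMPLE_HOOK_LOCK_HELD`, the lock path, and the use of `fcntl.flock` are all assumptions chosen for the example.

```python
import fcntl
import os
import tempfile

LOCK_ENV = "EXAMPLE_HOOK_LOCK_HELD"  # hypothetical environment variable name


def run_locked(action, lock_path):
    """Run `action` under an exclusive file lock, unless a caller already holds it."""
    if os.environ.get(LOCK_ENV) == "1":
        # Nested invocation: the outer call owns the lock, so do not lock again.
        return action()
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        os.environ[LOCK_ENV] = "1"  # inherited by child processes as well
        try:
            return action()
        finally:
            del os.environ[LOCK_ENV]
            fcntl.flock(lock_file, fcntl.LOCK_UN)


lock_path = os.path.join(tempfile.mkdtemp(), "hook.lock")
# The inner call stands in for dpkg invoking dpkg again to validate a version.
print(run_locked(lambda: run_locked(lambda: "validated", lock_path), lock_path))
```

Without the environment check, the inner call would open the lock file again and block forever on a lock held by its own caller.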
The deepdiff python package was recently updated to 6.2.3. As part of
this, a dependency was introduced on orjson. There's no armv8l python
wheel available for orjson, which means it needs to be built from
source. However, building it requires rust (which Buster and Bullseye
don't have a new enough version of) and maturin.
As a quick fix, pin this to version 6.2.2, before the orjson dependency
is introduced.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
What I did:
Fix : #13117
How I did:
At build time, mask only those features/services that are explicitly disabled. For some features (e.g. teamd, bgp, dhcp-relay, mux), the state is determined at run time, so their services are up and running by default, and hostcfgd masks them later if needed.
So the default behavior will be:
If init_cfg.json.j2 sets the state to disabled at build time, mask the service.
If init_cfg.json.j2 sets the state to another Jinja2 template string to be rendered later, do not mask the service.
If init_cfg.json.j2 sets the state to enabled at build time, do not mask the service.
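The three cases above reduce to a single check, sketched here under assumptions: the function name `should_mask_at_build` and the sample template string are hypothetical, and the real build scripts may implement the decision differently.

```python
def should_mask_at_build(state: str) -> bool:
    """Mask a service at build time only when its state is literally 'disabled'."""
    # A Jinja2 expression that is rendered later is not equal to "disabled",
    # so the service stays unmasked and hostcfgd decides at run time.
    return state.strip() == "disabled"


samples = [
    "disabled",
    "enabled",
    "{% if some_condition %}enabled{% else %}disabled{% endif %}",
]
for state in samples:
    print(f"{state!r}: mask={should_mask_at_build(state)}")
```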
How I verify:
Manual Verification.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
Why I did it
Update the Nokia-7215 device data which is required by the new version of the Marvell SAI 1.10.2-2 to fix the syncd crash issue.
How I did it
Update the device data in folder device/nokia/armhf-nokia_ixs7215_52x-r0/Nokia-7215.
How to verify it
With this PR, syncd should be running fine. Syncd should not crash.
Signed-off-by: mlok <marty.lok@nokia.com>
Update sonic-utilities submodule pointer to include the following:
* dddd6c5 [202205] Revert the show-techsupport optimization PR's ([#2581](https://github.com/sonic-net/sonic-utilities/pull/2581))
Signed-off-by: dprital <drorp@nvidia.com>
Why I did it
To keep the 'Request for xxx branch' label after auto-cherry-pick finishes.
How I did it
Change logic in post cherry pick action.
How to verify it