- Why I did it
Add more log while doing sysfs reading to increase the debug capability
- How I did it
Log the relevant file path and error number while sysfs reading return None
- How to verify it
Manual test
ping command is not working inside PMON docker (bullseye)
Use case: chassisd checks for module reachability inside PMON for "show chassis modules midplane-status" CLI, and on Cisco chassis, this uses ping command to check network reachability
Why I did it
Upgrade sonic fips packages to version 0.2
Upgrade openssl version from 1.1.1k-1+deb11u1+fips to 1.1.1n-0+deb11u3+fips
Upgrade openssh version from 8.4p1-5+fips to 8.4p1-5+deb11u1+fips
How I did it
Change the makefile.
A change in sonic-utilities makes all cache files be saved into a
/tmp/cache. On swss restart this cache has to be removed in case swss
starts in cold or fast mode. A related cache restoration in the warmboot
finalizer script is also updated to use new location.
- Why I did it
To fix#9817. Clear the cache directory on swss.sh except for warm start.
Also, adopted finalize-warmboot script to take the cache directory.
- How I did it
A change in sonic-utilities makes all cache files be saved into a /tmp/cache. On swss restart this cache has to be removed in case swss starts in cold or fast mode. A related cache restoration in the warmboot finalizer script is also updated to use new location.
- How to verify it
Run togather with Azure/sonic-utilities#2232. Verify counters cache is removed on config reload, cold/fast reboots, swss restart.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
The bgpcfgd doesn't support deletion of 'zebra set src', if an interface is deleted, the bgpcfgd will drop a warning message. In current implementation, we only care about the loopback0 interface but not others.
To improve the log print to have the key info, which will give the name of the deleted interface. We can ignore it if it is not the loopback0 interface. The application layer should be aware of that update and deletion is not supported, delete or update with a new address of loopback0 could cause issue, this log can give enough info to root cause the issue.
How I did it
How to verify it
For 7800 LCs, set LAG mode to support 1024 number of 16-member system
LAGs.
Why I did it
The SOC property changes are necessary to match #10519 which increases the number of system LAG IDs to 1024.
Description for the changelog
For 7800 LCs, set LAG mode to support 1024 number of 16-member system
LAGs.
What I did:
Added bgp as a dependent of swss
Why I did it:
bgp container was not restarting on swss crash. When swss crashes, linkmgrd
doesn't initate a switchover because it cannot access the default route from
orchagent. Bringing down bgp with swss will isolate the ToR, causing linkmgrd
to initiate a switchover to the peer ToR avoiding significant packet loss.
How I did it:
Added bgp to DEPENDENT
Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>
Updating sonic-utilities sub module with the following commits
ca785a2 Remove sonic-db-cli
#### Why I did it
To fix sonic-db-cli high CPU usage on SONiC startup issue: https://github.com/Azure/sonic-buildimage/issues/10218
sonic-db-cli re-write with c++ and move to sonic-swss-common repo.
#### How I did it
#### How to verify it
#### Which release branch to backport (provide reason below if selected)
#### Description for the changelog
ca785a2 Remove sonic-db-cli
#### A picture of a cute animal (not mandatory but encouraged)
Co-authored-by: liuh-80 <azureuser@liuh-dev-vm-02.5fg3zjdzj2xezlx1yazx5oxkzd.hx.internal.cloudapp.net>
Why I did it
This linecard runs in multi-asic mode and therefore needs the use_pcie_id_chassis file to work properly.
The default_sku file was also missing which would break the boot when no minigraph is provided.
Description for the changelog
Add missing default_sku and use_pci_id_chassis configs for 7800R3A-36D2
To reduce rc.local script execution time. Porting changes from [DellEMC] S6100 Platform Service optimization #10989
Changes:
Moving platform-modules-s6100.service and s6100-lpc-monitor.service asynchronous to rc.local script.
- Why I did it
Fix bug: pmon report error on start up because some SKUs do not have hwsku.json
- How I did it
If hwsku.json, do not extract RJ45 port information
- How to verify it
Manual test.
Unit test.
Spanning from sonic-net/sonic-linkmgrd#76, this PR is to update warm restart finalizer to wait for linkmgrd to be reconciled.
sign-off: Jing Zhang zhangjing@microsoft.com
Why I did it
To make sure finalizer save config after linkmgrd's reconciliation.
How I did it
Add linkmgrd to the reconciliation wait list of warmboot finalizer.
How to verify it
Verified on lab device, linkmgrd reconciled as expected.
Why I did it:
To fix hlx platform sfp+ module tx disable issue
How I did it:
Fix sfp+ tx disable function according SFF-8472 specification
Co-authored-by: Eric Zhu <erzhu@celestica.com>
Why I did it
This PR is to backport PR #11056 and PR #11045 into master branch.
This PR is to enable tunnel_qos_remap on T1 and T0 in DualToR deployment.
On T1, we check the property DownstreamRedundancyTypes. On T0, we check the property RedundancyType.
tunnel_qos_remap is set to enabled if gemini is in DownstreamRedundancyTypes (on T1) or RedundancyType (on T0).
How I did it
The change is implemented in minigraph.py.
How to verify it
Verified by test_minigraph_case.py and 'test_j2files.py`.
Fix in Monit memory_checker plugin. Skip fetching running containers if docker engine is down (can happen in deinit).
This PR fixes issue #11472.
Signed-off-by: liora liora@nvidia.com
Why I did it
In the case where Monit runs during deinit flow, memory_checker plugin is fetching the running containers without checking if Docker service is still running. I added this check.
How I did it
Use systemctl is-active to check if Docker engine is still running.
How to verify it
Use systemctl to stop docker engine and reload Monit, no errors in log and relevant print appears in log.
Which release branch to backport (provide reason below if selected)
The fix is required in 202205 and 202012 since the PR that introduced the issue was cherry picked to those branches (#11129).
Why I did it
ip command cannot update packet number if the cipher is XPN.
How I did it
Specify SSCI when update packet number and ignore SSCI value if update action.
Signed-off-by: Ze Gan <ganze718@gmail.com>
What I did:
Following changes done for packet based chassis:-
1> Run arp_update on LC's to resolve static route nexthops over backend
port-channel interfaces.
2> On Supervisor make sure arp_update exit gracefully
What I did:
Added Support for deployment_id parsing for Device Asic metadata.
Why I did:-
Deployment Id is used in BGP docker for FRR template generation. For multi-asic platforms running in namespace without deployment id as key in DEVICE_METADATA FRR template generation fails. This change is needed after this #10154 where if deployment_id is none we don't update DEVICE_METADA dictionary.
How I verify:-
Added unit-test.
Why I did it
Not all build environments have passwordless sudo enabled for all users
How I did it
Instead of using sudo to delete fsroot directories, mount them in a small, temporary docker container and delete them from there
How to verify it
Build in an environment where the build user does not have passwordless sudo enabled and confirm that no sudo password prompts are seen
- Why I did it
Support get_port_or_cage_type for RJ45 ports
- How I did it
Implement the new platform API get_port_or_cage_type
Fix the issue: unable to import SFP when chassis object is destructed
- How to verify it
Manually test and regression test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
Support Mellanox-SN4600C-C64 as T1 switch in dual-ToR scenario
This is to port #11032 and #11299 from 202012 to master.
Support additional queue and PG in buffer templates, including both traditional and dynamic model
Support mapping DSCP 2/6 to lossless traffic in the QoS template.
Add macros to generate additional lossless PG in the dynamic model
Adjust the order in which the generic/dedicated (with additional lossless queues) macros are checked and called to generate buffer tables in common template buffers_config.j2
Buffer tables are rendered via using macros.
Both generic and dedicated macros are defined on our platform. Currently, the generic one is called as long as it is defined, which causes the generic one always being called on our platform. To avoid it, the dedicated macrio is checked and called first and then the generic ones.
Support MAP_PFC_PRIORITY_TO_PRIORITY_GROUP on ports with additional lossless queues.
On Mellanox-SN4600C-C64, buffer configuration for t1 is calculated as:
40 * 100G downlink ports with 4 lossless PGs/queues, 1 lossy PG, and 3 lossy queues
16 * 100G uplink ports with 2 lossless PGs/queues, 1 lossy PG, and 5 lossy queues
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Support new platform SN2201 and RJ45 port
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* remove unused import and redundant function
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* fix error introduced by rebase
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* Revert the special handling of RJ45 ports (#56)
* Revert the special handling of RJ45 ports
sfp.py
sfp_event.py
chassis.py
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Remove deadcode
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Support CPLD update for SN2201
A new class is introduced, deriving from ComponentCPLD and overloading _install_firmware
Change _install_firmware from private (starting with __) to protected, making it overloadable
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Initialize component BIOS/CPLD
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Remove swb_amb which doesn't on DVT board any more
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Remove the unexisted sensor - switch board ambient - from platform.json
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Do not report error on receiving unknown status on RJ45 ports
Translate it to disconnect for RJ45 ports
Report error for xSFP ports
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Add reinit for RJ45 to avoid exception
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
Co-authored-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Improve throughput and latency for 7260 deployments
How I did it
Update the dynamic threshold to 0 and ECN settings as 2mb/10mb/5%
How to verify it
Updated unit tests to use the modified values for 7260 ecn settings.
This PR is a required for changing the L3 IP forwarding Behavior to SoC in active-active toplogy. Basically, for getting a packet to be forwarded to the SoC IP in active-active topology, the requirement is to use the the LoopBack 3 IP inside SONiC device as the SRC IP. This is required because in active-active topology by default if the ToR wants to send packet to the SoC, it would pick the Vlan IP since that's the IP in the subnet, but since there are firewalls inside the SoC , the IP packets with Vlan IP as src IP in the IP header will be dropped. Hence to overcome this limitation, there is an iptable nat rule that is installed inside the kernel, with which all the packets which have SoC IP as destination IP, use Loopnack 3 IP as src in IP header
How I did it
check the config DB if the ToR is a DualToR and has an SoC IP assigned.
put an iptable rule
iptables -t nat -A POSTROUTING --destination -j SNAT --to-source "
Signed-off-by: vaibhav-dahiya vdahiya@microsoft.com
Why I did it
This PR is to add a flag to control whether to generate PORT_QOS_MAP|global entry or not.
It's because for some HWSKU, such as BackEndToRRouter and BackEndLeafRouter, there is no DSCP_TO_TC_MAP defined.
Hence, if the PORT_QOS_MAP|global entry is generated, OA will report some error because the DSCP_TO_TC_MAP map AZURE can not be found.
Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- saiObjectTypeQuery: invalid object id oid:0x7fddb43605d0
Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- meta_generic_validation_objlist: SAI_SWITCH_ATTR_QOS_DSCP_TO_TC_MAP:SAI_ATTR_VALUE_TYPE_OBJECT_ID object on list [0] oid 0x7fddb43605d0 is not valid, returned null object id
Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- applyDscpToTcMapToSwitch: Failed to apply DSCP_TO_TC QoS map to switch rv:-5
Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- doTask: Failed to process QOS task, drop it
This PR is to address the issue.
How I did it
Add a flag require_global_dscp_to_tc_map to control whether to generate the PORT_QOS_MAP|global entry. The default value for require_global_dscp_to_tc_map is true. If the device type is storage backend, the value is changed to false. Then the PORT_QOS_MAP|global entry is not generated.
How to verify it
Update the current test_qos_dscp_remapping_render_template to cover storage backend.