9ce4d19d5a199cffe2933d80e343a80ded398b4a (HEAD -> 201911, origin/201911) With the changes in PR https://github.com/Azure/sonic-buildimage/pull/5289, access to the redis unix socket is given to redis group members. Many sonic-util commands (especially in the multi-asic case) use the redis unix socket to connect to the DB, so those commands fail unless run with sudo. This PR is a continuation of PR https://github.com/Azure/sonic-buildimage/pull/7002, where we default to TCP for Redis if the user is not root.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
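The non-root TCP default described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `redis_connection_kwargs`, the socket path, the host, and the port are all assumed placeholders.

```python
import os

# Hypothetical defaults, for illustration only (not taken from the PR).
REDIS_UNIX_SOCKET = "/var/run/redis/redis.sock"
REDIS_TCP_HOST = "127.0.0.1"
REDIS_TCP_PORT = 6379


def redis_connection_kwargs(euid=None):
    """Pick unix-socket settings for root and TCP settings for everyone else."""
    if euid is None:
        euid = os.geteuid()  # effective user id of the calling process
    if euid == 0:
        # root can always open the unix socket
        return {"unix_socket_path": REDIS_UNIX_SOCKET}
    # non-root users may not be in the redis group, so fall back to TCP
    return {"host": REDIS_TCP_HOST, "port": REDIS_TCP_PORT}
```

Passing `euid` explicitly keeps the rule easy to exercise without actually switching users.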
1c12a4050fecabd88245c7aa64a61259bc00db3b (HEAD -> 201911, origin/201911)Allowing the first time FEC and AN configuration to be pushed to SAI (#1705) (#2196)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
What I did:
Updated the Jinja template to enable BGP Graceful Restart based on device role. By default it is enabled only if the device role type is TorRouter.
Why I did:-
By default FRR is configured in Graceful Restart Helper mode. Graceful Restart is needed only on T0/TorRouter, since that device can go through warm-reboot. T1/LeafRouter needs to stay in Helper mode only.
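The role-based decision above amounts to a single conditional. The sketch below shows that rule in Python rather than in the actual Jinja template; the function name `bgp_restart_mode` and the returned strings are assumptions made for illustration, and the role comparison is case-insensitive only to avoid depending on exact capitalization.

```python
def bgp_restart_mode(device_type):
    """Return 'restart' for ToR devices and 'helper' for every other role.

    Illustrative only: the real change is a Jinja template condition.
    """
    if device_type is not None and device_type.lower() == "torrouter":
        return "restart"  # T0 can warm-reboot, so it needs Graceful Restart
    return "helper"       # e.g. T1/LeafRouter stays in Helper mode
```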
* [warm boot finalizer] only wait for enabled components to reconcile
Define each component together with its associated service. During warm reboot, only wait for reconciliation of components whose associated service is enabled.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
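A rough sketch of the filtering described in the warm boot finalizer change above. The component-to-service mapping and the function name are hypothetical examples, not the list used by the finalizer script.

```python
# Hypothetical component -> service mapping, for illustration only.
COMPONENT_SERVICE = {
    "orchagent": "swss",
    "neighsyncd": "swss",
    "bgp": "bgp",
    "natsyncd": "nat",
}


def components_to_wait_for(component_service, enabled_services):
    """Keep only the components whose associated service is enabled."""
    return [
        component
        for component, service in component_service.items()
        if service in enabled_services
    ]
```

With `enabled_services = {"swss", "bgp"}`, the `natsyncd` entry is skipped, so the finalizer would not block on a component that never starts.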
When FECDisabled is set to true in minigraph.py, push 'fec' 'none' explicitly to config_db. When 'fec' is defined in port_config.ini do not override it with 'rs' for 100G
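The FEC rules above can be read as a small precedence chain. The sketch below is one possible reading, under stated assumptions: the function name and arguments are hypothetical, speed is given in Mb/s, and FECDisabled is assumed to take priority over a port_config.ini value (the commit message does not state which wins).

```python
def resolve_fec(speed, configured_fec=None, fec_disabled=False):
    """Return the 'fec' value to push to config_db, or None to push nothing."""
    if fec_disabled:
        return "none"          # pushed explicitly when FECDisabled is true
    if configured_fec is not None:
        return configured_fec  # value from port_config.ini is not overridden
    if speed == 100000:
        return "rs"            # default for 100G ports
    return None
```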
Backport of #7667 to 202012 branch.
What I did:
Backport FRR patch FRRouting/frr#8220 on FRR 7.2. Fixes the Issue FRRouting/frr#8213
Why I did:-
Because of this race condition, we saw GR being triggered even though BGP shut was issued on the peer device.
How I verify:
After applying this fix, GR is not triggered when BGP shut is issued on the peer.
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
The Arista 7060 platform has a rare and unreproducible PCIe timeout that could possibly be solved by increasing the switch PCIe timeout value. To do this, we call a script for this platform that increases the PCIe timeout on boot-up.
No issues would be expected from the setpci command. From the PCIe spec:
"Software is permitted to change the value in this field at any
time. For Requests already pending when the Completion
Timeout Value is changed, hardware is permitted to use either
the new or the old value for the outstanding Requests, and is
permitted to base the start time for each Request either on when
this value was changed or on when each request was issued. "
How I did it
Add "platform-init" support in swss docker similar to how "hwsku-init" is called, only this would be for any device belonging to a platform. Then the script would reside in device data folder.
Additionally, add pciutils dependency to docker-orchagent so it can run the setpci commands.
How to verify it
After boot-up of an Arista 7060, execute:
lspci -vv -s 01:00.0 | grep -i "devctl2"
to check that the timeout has changed.
Identify the bad password set by sshd and fail auth before sending to
AAA server, and hence avoid possible user lock out by AAA.
For more details, please refer to the parent/original PR #9123
* [radv] Support multiple ipv6 prefixes per vlan interface (#9934)
* Radvd.conf.j2 template creates two copies of the vlan interface block when more than one ipv6 address is assigned to a single vlan interface. Changed the format to add all prefixes under the same vlan interface block.
resolves #8979 and #9055
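The grouping fix above can be illustrated outside the template. This is a minimal Python sketch, not the Jinja template itself: the function name is hypothetical, the addresses are made-up examples, and all per-interface and per-prefix options are omitted.

```python
def render_radvd_blocks(assignments):
    """Render one interface block per VLAN, holding all of its prefixes.

    assignments: iterable of (interface_name, ipv6_prefix) pairs.
    """
    grouped = {}
    for ifname, prefix in assignments:
        # collect every prefix under its interface instead of
        # emitting a new interface block per address
        grouped.setdefault(ifname, []).append(prefix)

    lines = []
    for ifname, prefixes in grouped.items():
        lines.append("interface " + ifname)
        lines.append("{")
        for prefix in prefixes:
            lines.append("    prefix " + prefix)
            lines.append("    {")
            lines.append("    };")
        lines.append("};")
    return "\n".join(lines)
```

Calling it with two prefixes on `Vlan1000` yields a single `interface Vlan1000` block containing two `prefix` entries.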
How I did it
Remove the file static.conf.j2, which adds the default route on eth0, from the frr docker
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
a1830c1761087bdc1f7433ebbb8d0bdc419da0d3 (HEAD -> 201911, origin/master, origin/HEAD, origin/201911) Fix OpenAPI spec to be readable by autorest (#101)
94805a39ac0712219f7dc08faa2cfdbf371dd177 Identify and report Vnet GUID for conflicting VNI (#99)
4832dfd677de72edc44d4eb8c1b60cfad79a3355 Static route expiry if not specified as persistent (#98)
5cc4358fb67b9e2a0da9a6691064e41f97ebebc2 (master) Add support for overlay ECMP (#96)
6822a46197daef060b4d00dba5153b04b163c43f [CI] Set diff cover threshold to 50% (#97)
dcc826a1503060b9a07e4510b4f48331c49e87dd Add PR diff coverage (#95)
e842c5ff317c67919dcbcab3358143cb9a16c9dd Generate code coverage for Unit Tests (#94)
f9bbed3cb86a3bab9a07745096835dbdbe5a4db6 Convert Unit Tests from unittest framework to pytest framework (#93)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
[201911] Prevent other notification event storms from enqueuing
unchecked and draining all memory, which leads to crashing the switch
router (#981)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Why I did it
Fix some unreliability seen on emmc device with some AMD CPUs
How I did it
Added a kernel parameter to add quirks to
It depends on a sonic-linux-kernel change to work properly but will be a no-op without it.
Description for the changelog
Add emmc quirks for Upperlake
- Why I did it
Optimize thermal control policies to simplify the logic and add more protection code in policies to make sure it works even if kernel algorithm does not work.
- How I did it
Reduce unused thermal policies
Add timely ASIC temperature check in thermal policy to make sure ASIC temperature and fan speed are coordinated
Minimum allowed fan speed is now calculated as the max of the expected fan speeds among all policies
Move some logic from fan.py to thermal.py to make it more readable
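The minimum-fan-speed rule in the list above reduces to taking a maximum. A minimal sketch, assuming each policy reports an expected speed as a percentage; the function name and the policy names in the usage note are hypothetical.

```python
def minimum_allowed_fan_speed(expected_speeds):
    """Return the highest expected fan speed requested by any policy.

    expected_speeds: mapping of policy name -> expected speed in percent.
    Raises ValueError if no policy reports a speed.
    """
    return max(expected_speeds.values())
```

For example, `{"asic_temp": 60, "psu_absent": 100, "default": 40}` gives 100, so the most demanding policy sets the floor.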
- How to verify it
1. Manual test
2. Regression
Why I did it
To incorporate the below changes in DellEMC S6100, S6000 platforms.
Enable thermalctld
Backport Platform API changes from master branch.
How I did it
Remove 'skip_thermalctld:true' in pmon_daemon_control.json
Implement the platform API methods in the respective device files
How to verify it
Verified that platform data is displayed by show platform fan and show platform temperature commands.
Why I did it
Cannot retrieve and display the reboot-cause.
How I did it
Correct the platform initialization definition.
How to verify it
Manual reboot and then 'show reboot-cause'
Backport #9258 to 201911
Why I did it
When PSU is powered off, the PSU is still on the switch and the air flow is still the same. In this case, it is not necessary to set FAN speed to 100%.
How I did it
When PSU is powered off, don't treat it as absent.
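The distinction between an absent PSU and a powered-off PSU can be sketched as below. This is an illustration under assumptions, not the Mellanox platform API code: the function name and the `present`/`powered` dictionary keys are hypothetical.

```python
def any_psu_forces_full_speed(psus):
    """Return True only if some PSU is physically absent.

    psus: list of dicts with boolean 'present' and 'powered' keys.
    A PSU that is present but powered off does not change the air flow,
    so 'powered' is deliberately ignored here.
    """
    return any(not psu["present"] for psu in psus)
```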
How to verify it
Adjust existing unit test case
Add new case in sonic-mgmt
Conflicts:
platform/mellanox/mlnx-platform-api/sonic_platform/thermal_infos.py
- Why I did it
To include latest fixes.
1. On CMIS modules, after low power configuration, the firmware waited for the module state to be ModuleReady instead of ModuleLowPower causing delays.
2. When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior.
3. On rare occasions, when working with port rates of 1GbE or 10GbE and congestion occurs, packets may get stuck in the chip and may cause switch to hang.
4. When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped.
5. Using SN4600C with copper or optics loopback cables in NRZ speeds, link up may take a long time (up to 70 seconds).
6. When connecting SN4600C to SN4600C after Fastboot in 50GbE No_FEC mode with a copper cable, the link up time may take ~20 seconds.
- How I did it
Updated SDK submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
3ce811960f19c514a6ca0b1c611b2c453eb3a0a3 (HEAD -> 201911, origin/201911) [201911][port2alias]: Fix to get right number of return values (#1907)
e648290b51fa4ec4d465efe55aa4d27d16edb249 disk_Check: Scan & mount as RW when disk turns into Read-only (#1872)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Commits on Oct 26, 2021
Remove exec from platform_reboot call to prevent reboot hang (#1881) 066b5adf6d737a5bd174123d4d00dab4b6110cf6
Commits on Nov 17, 2021
[fdbshow]: Handle FDB cleanup gracefully. (#1918) c80321c98d0741f340d2900108bad7fed76c80cd
a0417f6f [Buffer Manager][201911] Reclaim unused buffer for admin-down ports (1837)
f77d393b [bufferorch][201911] Handle DEL_COMMAND for BUFFER_PG and BUFFER_QUEUE table (1787)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
#### Why I did it
Upgrade Mellanox-SAI to 1.19.3 to support reclaiming reserved buffer on admin down ports
#### How I did it
To support reclaiming reserved buffer on admin down ports.
#### How to verify it
Regression test and manual test.
Why I did it
This PR aims to fix the bug in Monit template file of dhcp_relay container.
If multiple VLANs are configured on the device, multiple dhcrelay processes will be spawned in the dhcp_relay container, and there will be an entry for each dhcrelay process in the Monit configuration file of the dhcp_relay container.
Currently the Monit template file of the dhcp_relay container cannot be rendered correctly to generate the configuration file, which prevents Monit from starting up.
#### Why I did it
The reserved buffer of admin-down ports is going to be reclaimed.
However, the way to do this differs among vendors.
We need to find a way to pass vendor information to swss docker.
#### How I did it
Fetch the ASIC vendor information when the docker is created and pass it to the docker as environment variable `ASIC_VENDOR`.
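A minimal sketch of both ends of this hand-off, assuming the vendor string is already known when the container is created. Only the variable name `ASIC_VENDOR` comes from the description above; the helper names are hypothetical, and how the vendor is actually fetched is not shown.

```python
import os


def docker_env_args(asic_vendor):
    """Build the docker command-line arguments that export ASIC_VENDOR."""
    return ["--env", "ASIC_VENDOR=" + asic_vendor]


def read_asic_vendor(environ=None):
    """Read ASIC_VENDOR inside the container; None if it was not passed."""
    env = os.environ if environ is None else environ
    return env.get("ASIC_VENDOR")
```

Code inside swss can then branch on the returned value to pick the vendor-specific way of reclaiming buffers.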
Why I did it
Fix error during building docker-sonic-mgmt-framework on 201911
Signed-off-by: Stephen Sun <stephens@nvidia.com>
How I did it
Cause:
While building sonic-mgmt-framework docker, it needs to install grpcio-tools version 1.20.0 which has a dependency on grpcio version >=1.20.0.
As >=1.20.0 is specified, it will install the latest version of grpcio.
It had worked well until the grpcio package version 1.40.0 was released 3 days ago.
Looks like some new dependencies are introduced by the latest version.
Fix:
Designate grpcio version 1.39.0 explicitly, which is the latest version of grpcio that worked well.
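To make the cause concrete, the sketch below models why a `>=1.20.0` requirement admits 1.40.0 while an exact pin does not. It is a simplified stand-in for version specifier matching, not pip's resolver: it handles only `>=` and `==` with purely numeric dotted versions, and the function names are hypothetical.

```python
def parse_version(text):
    """Turn '1.39.0' into (1, 39, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a version against a '>=X.Y.Z' or '==X.Y.Z' specifier."""
    for op in (">=", "=="):
        if spec.startswith(op):
            wanted = parse_version(spec[len(op):])
            have = parse_version(version)
            return have >= wanted if op == ">=" else have == wanted
    raise ValueError("unsupported specifier: " + spec)
```

Here `satisfies("1.40.0", ">=1.20.0")` is true, which is how the new release was picked up, while `satisfies("1.40.0", "==1.39.0")` is false once the version is pinned.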