Commit Graph

5177 Commits

Author SHA1 Message Date
Stephen Sun
5c91f233ef
[Reclaim buffer][202106] Reclaim unused buffers by applying zero buffer profiles (#9062)
This is to backport community PR #8768 to 202106 branch

Why I did it
Support zero buffer profiles
Add buffer profiles and pool definition for zero buffer profiles
Support applying zero profiles on INACTIVE PORTS
Enable dynamic buffer manager to load zero pools and profiles from a JSON file

Signed-off-by: Stephen Sun <stephens@nvidia.com>

How I did it
Add buffer profiles and pool definition for zero buffer profiles

If the buffer model is static:
 - Apply normal buffer profiles to admin-up ports
 - Apply zero buffer profiles to admin-down ports
If the buffer model is dynamic:
 - Apply normal buffer profiles to all ports
 - buffer manager will take care when a port is shut down
Update buffers_config.j2 to support INACTIVE PORTS by extending the existing macros to generate the various buffer objects, including PGs, queues, ingress/egress profile lists
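
The static and dynamic cases listed above can be sketched in a few lines. This is only an illustration of the rule, not the template logic itself: the function name `select_profiles`, the port dictionary layout, and the labels "normal" and "zero" are assumptions made for the example.

```python
def select_profiles(ports, buffer_model):
    """Return {port_name: "normal" | "zero"} for the given buffer model.

    ports: {port_name: admin_status}, admin_status being "up" or "down"
           (hypothetical layout chosen for this sketch).
    buffer_model: "static" or "dynamic".
    """
    if buffer_model == "dynamic":
        # Normal profiles on all ports; the buffer manager reacts later
        # when a port is shut down.
        return {name: "normal" for name in ports}
    if buffer_model == "static":
        # Admin-up ports get normal profiles, admin-down ports get zero profiles.
        return {
            name: "normal" if status == "up" else "zero"
            for name, status in ports.items()
        }
    raise ValueError("unknown buffer model: %r" % (buffer_model,))
```

For instance, with one admin-up and one admin-down port, the static model assigns a zero profile only to the admin-down port, while the dynamic model assigns normal profiles to both.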

Originally, all the macros to generate the above buffer objects took active ports only as an argument
Now that buffer items need to be generated on inactive ports as well, an extra argument representing the inactive ports needs to be added
To be backward compatible, a new series of macros is introduced to take both active and inactive ports as arguments
The original version (with active ports only) will be checked first. If it is not defined, then the extended version will be called
Only vendors who support zero profiles need to change their buffer templates
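
The lookup order described above (original macro first, extended macro as fallback) can be sketched as a small dispatcher. This is a Python stand-in for the template logic, under stated assumptions: the `macros` dictionary, the `_with_inactive_ports` suffix, and the function name `render_buffer_objects` are hypothetical and only illustrate the ordering.

```python
def render_buffer_objects(macros, name, active_ports, inactive_ports):
    """Call the original macro if a vendor defines it, else the extended one.

    macros: {macro_name: callable}, standing in for the macros a vendor
            template defines (hypothetical representation).
    """
    original = macros.get(name)
    if original is not None:
        # Vendors without zero-profile support keep their templates unchanged:
        # the original macro only ever sees the active ports.
        return original(active_ports)
    extended = macros.get(name + "_with_inactive_ports")
    if extended is None:
        raise KeyError("no macro defined for %r" % (name,))
    return extended(active_ports, inactive_ports)
```

A vendor that defines only the original macro is unaffected, while a vendor that defines only the extended one also receives the inactive ports.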
Enable buffer manager to load zero pools and profiles from a JSON file:

The JSON file is provided on a per-platform basis
It is copied from the platform/<vendor> folder to the /usr/share/sonic/templates folder at compile time and rendered when the swss container is being created.
To make code clean and reduce redundant code, extract common macros from buffer_defaults_t{0,1}.j2 of all SKUs to two common files:

One in Mellanox-SN2700-D48C8 for single ingress pool mode
The other in ACS-MSN2700 for double ingress pool mode
The corresponding files of all other SKUs will be symbolic links to the above files

Update sonic-cfggen test accordingly:
 - Adjust example output file of JSON template for unit test
 - Add unit tests for Mellanox's new buffer templates.

How to verify it
Regression test.
Unit test in sonic-cfggen
Run regression test and manually test.
2021-12-13 10:51:50 -08:00
Samuel Angebault
08c2c07fc0
[202106][Arista] Update arista platform library (#9483) 2021-12-09 18:29:59 -08:00
Arvindsrinivasan Lakshmi Narasimhan
e2b8e2d1da submodule update swss
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2021-12-08 00:25:09 +00:00
Vivek Reddy
4856f98716 [Mellanox] [SKU] Fix the shared headroom for 4600C-C64 SKU (#8242)
Removed ingress_lossy_pool from the BUFFER_POOL list
Fix the egress_lossless_pool_size value

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2021-12-08 00:23:07 +00:00
Volodymyr Samotiy
b22db5a52b
[Mellanox] [202106] Update SAI to v1.20.0.1 and SDK/FW to v4.5.1156/v2010.1152 (#9431)
- Why I did it
To include latest fixes.
SAI
* Reduce verbosity of warning message on shared memory already existing
* accuflow allocation support by key value

SDK
* Under various circumstances, Ethernet ports falsely showed that InfiniBand cables were connected.
* In SN4600C, at times, the link up time in both DAC and optics cables may, in the worst case, take up to 15 seconds.
* When using SN4600C with copper or optical loopback cables at NRZ speeds, the link may take a long time to come up.
* When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped.
* When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior.
* Aggregation event is missing for WJH L2 drop reason 'Unicast egress port list is empty'.
* Tying the SCL and SDA of the optical modules to 3.3V causes errors.
* On SN4600, there was a delay of more than 10 seconds from the time a data packet is sent from CPU until it is transmitted through one of the switch ports.
* While using SN4600C system with Finisar FTLC1157RGPL 100GbE CWDM4 modules, intermittent link flaps across multiple ports may be observed.
* In Spectrum-2 and Spectrum-3 systems, link did not work in auto-negotiation when connected to Marvell PHY. KR mechanism has been enhanced to integrate with Marvell PHY. 
* The tunnel counter now counts dropped packets on Spectrum-2 and Spectrum-3, consistent with Spectrum behavior, and counts ECN dropped packets as well.
* When connecting SN3800 to Cisco-9000, the fast-linkup flow will fail and the link will come up through the normal flow.
* Race condition in WJH library: when multiple threads load the LAG shared memory concurrently, the program may crash.
* Add WJH L2 drop reason 'Unicast egress port list is empty' as a new drop reason. 
* Fixed a memory leak in sx_api_port_sflow_statistics_get API. 
* During initialization flow, the command interface that is used by the minimal driver and SDK caused the collision in the firmware since the same buffer is used in the firmware for the two interfaces.

- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.

- How to verify it
Build an image and run tests from "sonic-mgmt".

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-12-06 21:54:14 +02:00
Mahesh Maddikayala
3f3dceb96a
[broadcom]: update bcm dnx gpl module pointer (#9442)
saibcm_modules_dnx submodule update contains a fix for kernel crash
2021-12-03 21:21:07 -08:00
Judy Joseph
70b24ad9c5 Update sonic-utilities submodule
9514857 [config reload][202106] Update command reference (#1944)
2021-12-01 19:21:45 -08:00
Judy Joseph
41a2d3e290 Update sonic-swss submodule
[8522f4f] Don't handle buffer pool watermark during warm reboot reconciling (#1987)
2021-12-01 11:13:38 -08:00
Junchao-Mellanox
f5c847bdbc
[system-health] [202106] No longer check critical process/service status via monit (#9366) 2021-12-01 10:26:06 -08:00
Junchao-Mellanox
e4ff4d2e3a [Mellanox] Fan speed should not be 100% when PSU is powered off (#9258)
- Why I did it
When PSU is powered off, the PSU is still on the switch and the air flow is still the same. In this case, it is not necessary to set FAN speed to 100%.

- How I did it
When PSU is powered off, don't treat it as absent.
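
The distinction between a powered-off PSU and an absent one can be sketched as below. This is a simplified illustration, not the platform's thermal policy: the function name, the `present`/`powered` keys, and the idea of passing in a `normal_speed` are assumptions for the example.

```python
def target_fan_speed(psus, normal_speed):
    """Return 100 only when a PSU is physically absent.

    psus: list of dicts with boolean keys "present" and "powered"
          (hypothetical layout for this sketch).
    normal_speed: the speed the thermal policy would otherwise choose.
    """
    if any(not psu["present"] for psu in psus):
        return 100
    # A present but powered-off PSU leaves the air flow unchanged,
    # so it does not force the fans to full speed.
    return normal_speed
```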

- How to verify it
Adjust existing unit test case
Add new case in sonic-mgmt
2021-12-01 09:47:26 -08:00
Stephen Sun
fa0ae42e69 [Reclaim buffer] Common infrastructure update for reclaiming buffer (#9133)
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.

- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there.
Rendering is done here for passing azure pipeline.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-12-01 09:47:18 -08:00
shlomibitton
1ebe52847a
[DHCPv6 relay] [202106] Fix DHCPv6 design to support multiple VLANS (#9163)
- Why I did it
If multiple Vlans are configured to have DHCPv6 relay, only one relay instance is able to capture DHCP packets received from upstream. This is a result of how the kernel is designed to operate (SO_REUSEPORT).
DHCPv6 transmits unicast packets to clients, and only multicast packets can be captured by multiple applications listening on the same UDP port.
As a result, only one Vlan interface gets packets from servers.
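
The kernel behavior described above can be observed with two plain UDP sockets sharing one port. The sketch below is a generic socket demonstration and not the relay code: it assumes a Linux host, uses IPv4 loopback instead of DHCPv6 to stay self-contained, and all function names are made up for the example.

```python
import select
import socket
import time

LOOPBACK = "127.0.0.1"  # assumption: IPv4 loopback keeps the demo self-contained


def _shared_port_socket(port):
    """Create a UDP socket that may share its port with other sockets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((LOOPBACK, port))
    return sock


def count_unicast_receivers(payload=b"reply-from-server", timeout=1.0):
    """Send one unicast datagram to a shared port; return how many listeners got it."""
    first = _shared_port_socket(0)        # let the kernel pick a free port
    port = first.getsockname()[1]
    second = _shared_port_socket(port)    # second "relay instance" on the same port
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, (LOOPBACK, port))
        # Wait until at least one listener has the datagram queued.
        select.select([first, second], [], [], timeout)
        time.sleep(0.1)                   # grace period for any second copy
        readable, _, _ = select.select([first, second], [], [], 0)
        return len(readable)
    finally:
        for sock in (first, second, sender):
            sock.close()
```

On Linux, `count_unicast_receivers()` should return 1: the datagram is handed to a single socket, which mirrors why only one per-Vlan relay instance sees the server's unicast reply.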

- How I did it
Change the design to neglect Vlan isolation and run only one relay instance serving all Vlans with all configured DHCP servers.

- How to verify it
Run DHCPv6 relay test with 2 Vlans configured to have a DHCP relay.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2021-11-18 19:40:48 +02:00
gechiang
e0209f745a
[202106]Disable ALPM distributed hitbit thread that is used for debug purpose only but interfered with Other functional operations (#9293)
This is to address an issue where it was observed that SAI operations may sometimes take a very long time to complete (over 45ms). It was determined that the ALPM distributed thread was causing this issue.
The fix is to disable this debug thread that has no functional purpose.

Preliminary tests look fine. BGP neighbors were all up with proper routes programmed, and
interfaces are all up.
Manually ran the fib test cases on 7050CX3 (TD3), TD2, TH, TH2, and TH3 based platforms and
they all passed.
Note: the testing was done on a 20201230 image, and this change is being ported to the master branch.
No need to port this to 20201230 branch as a separate PR was already done for that branch. (#9190)

This PR is created to port the changes made by (#9199), which could not be cherry-picked directly to the 202106 branch.
2021-11-17 20:58:25 -08:00
Arvindsrinivasan Lakshmi Narasimhan
84226bdc57 sonic-utilities submodule update
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2021-11-18 01:27:29 +00:00
Arvindsrinivasan Lakshmi Narasimhan
d4ed9e7e62 Swss submodule update
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2021-11-18 00:39:15 +00:00
Judy Joseph
ff4613035e Update sub-modules
sonic-snmpagent
7e46eb1 [201911][RFC1213]: Initialize lag oid map in reinit_data (#234)
aa98ded CPU Spike because of redundant and flooded keyspace notifis handled (#230)

sonic-swss
bc4e334 [Mux orch] Handle setting unknown mux state (#1984)
bd3630b [tunnel decap] Change tunnel orch order (#1977)
87a673a Fix the option missing in kernel config issue (#1973)
57967a1 [orchagent] Fix group name of port-buffer-drop in flexcounterorch.cpp (#1967)

sonic-utilities
181e8b0 Fix the option missing in kernel config issue (#1888)
21c0cc0 [watermarkstat] Fix for error in processing empty array from couters db (#1810)
7f15755 [chassis][supervisor][show][interfaces]show interfaces command warning on Supervisor card (#1771)
2021-11-14 15:57:36 -08:00
zzhiyuan
2b3cca6e86 [Arista] Fix 7060 flex HWSKU SFP ports and Ethernet8/1 (#9173)
* [Arista] Fix 7060 flex HWSKU SFP ports and Ethernet8/1

* [Arista] Fix polarity flips for Arista 7060 on non-leading intfs

Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
2021-11-14 15:25:39 -08:00
Shilong Liu
606d64378f Add artifacts for failure build to debug. (#9213) 2021-11-14 15:25:03 -08:00
tjchadaga
7050792f63 Fix for additional intf flap during fast-reboot (#9166) 2021-11-14 15:19:43 -08:00
Neetha John
f6511086e5 [minigraph] Add tagged vlan member support for storage backend (#9045)
Signed-off-by: Neetha John <nejo@microsoft.com>

Why I did it
Storage T0's have all vlan members as tagged

How I did it
Since currently minigraph does not have a unique way to identify if a vlan member is tagged/untagged and to ensure other scenarios are not broken, the logic used is to just update the vlan member type as 'tagged' when we determine that it is a storage backend device. This change will apply only to storage backend T0's since storage backend T1's will not have vlan member information
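
The rule described above amounts to a single override applied after the vlan members have been parsed. The sketch below is a minimal illustration, not the minigraph parser: the function name, the `"Vlan|Port"` key format, and the input dictionary are assumptions for the example.

```python
def apply_storage_backend_tagging(vlan_members, is_storage_backend):
    """Mark every vlan member as tagged on a storage backend device.

    vlan_members: {"<vlan>|<port>": {"tagging_mode": ...}} as parsed so far
                  (hypothetical layout for this sketch).
    """
    if not is_storage_backend:
        # Regular devices keep whatever the parser produced.
        return {key: dict(attrs) for key, attrs in vlan_members.items()}
    return {
        key: dict(attrs, tagging_mode="tagged")
        for key, attrs in vlan_members.items()
    }
```

An empty input stays empty, which matches the expectation that a storage backend T1 generates an empty vlan member table.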

How to verify it
Updated the storage backend T0 testcases to check for tagged vlan members
Added testcase to check if a T1 and backend T1 device generates an empty vlan member table
Existing vlan member testcases are good enough for checking if any regression has been caused for regular T0's
Build sonic_config_engine-1.0-py3-none-any.whl successfully
2021-11-14 15:17:02 -08:00
dflynn-Nokia
33fce6afd1 [Nokia ixs7215] Platform API fixes (#9025)
* [Nokia ixs7215] Platform API fixes

This commit delivers the following fixes
    - Fix bug preventing access to second PSU eeprom
    - Fix bug preventing updates to front panel PSU status led
    - Fix SFP reset test case failure

* Fix LGTM alert
2021-11-14 15:16:14 -08:00
dflynn-Nokia
030551ba27 [Nokia ixs7215] Add new platform capabilities to platform.json (#9032)
This commit more fully declares the HW capabilities of the Nokia-7215
platform. For example, support for the threshold values associated with each
thermal sensor is described. The intent here is to inform the sonic-mgmt
platform test cases of which HW features are supported.

This commit must align with PR# 4521 within the sonic-mgmt git repo which is
currently under review. Any changes to that PR will need to be reflected in
this commit.
2021-11-14 15:15:56 -08:00
Saikrishna Arcot
52e9909373 docker-dhcp-relay: Fix waiting for interfaces to get set up (#9034)
Fix the check used to wait for interfaces to come up. The group name in
the supervisor config files has changed from isc-dhcp-relay to
dhcp-relay.

Also, in the wait script, wait 10 additional seconds after the vlans,
port channels, and any interfaces are up. This is because dhcrelay
listens on all interfaces (in addition to port channels and vlans), and
to ensure that it stays in a clean state during runtime, wait some extra
time to make sure that those interfaces are created as well.
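
The wait logic described above can be sketched as a polling loop followed by a fixed extra delay. This is an illustration in Python rather than the actual wait script: `is_up`, the parameter names, and the one-second poll interval are assumptions; only the 10 additional seconds come from the commit message.

```python
import time


def wait_for_interfaces(is_up, interfaces, extra_wait=10, poll_interval=1,
                        sleep=time.sleep):
    """Block until every interface is up, then wait extra_wait more seconds.

    is_up: callable taking an interface name and returning True when it is up
           (hypothetical helper supplied by the caller).
    sleep: injectable so the logic can be exercised without real delays.
    """
    pending = set(interfaces)
    while pending:
        pending = {name for name in pending if not is_up(name)}
        if pending:
            sleep(poll_interval)
    # Give the remaining interfaces that dhcrelay listens on time to be created.
    sleep(extra_wait)
```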

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2021-11-14 15:15:34 -08:00
shlomibitton
2f95e36c6e [Mellanox] Fix split configuration for Mellanox SN3800-D112C8 SKU SAI profile for fast-reboot performance (#8897)
- Why I did it
Wrong SKU configuration will lead to longer init flow.
This will affect fast-reboot feature by increasing the traffic downtime.
Since MLNX still met the required downtime period with this SKU, this bug was found late.

- How I did it
Add the required split labels for ports.

- How to verify it
Run fast-reboot with this platform using SN3800-D112C8 SKU.
2021-11-09 06:41:21 -08:00
Judy Joseph
99724508fd Update sonic-swss sonic-utilities
swss
73caba3 Allow interface type value none (#1991)

utilities
32e530f Allow interface type value none (#1902)
53f066c Fix log_ssd_health hang issue (#1904)
2021-11-05 19:24:53 -07:00
Junchao-Mellanox
3a8807e72f Allow interface type value none (#9098)
This PR allows the user to set the interface type to none, so there is a way to achieve the goal via CLI:

config interface type XXX none
config interface speed XXX 10000
config interface type XXX CR
2021-11-05 19:13:33 -07:00
Praveen Chaudhary
d627587377 [sonic-breakout_cfg.yang]: Remove pattern from sonic-breakout_cfg.yang. (#6801)
Changes:
-- Remove pattern from sonic-breakout_cfg.yang, it is redundant.
-- test changes.

Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com>
2021-11-05 18:52:30 -07:00
Nazarii Hnydyn
5b74f5dccf [teamd]: Send USR1/USR2 only to subscribers. (#8856)
To fix teamd signal handling, without which Process 'tlm_teamd' exited unexpectedly
2021-11-05 18:52:25 -07:00
Volodymyr Samotiy
badce1cbf6
[202106] [Mellanox] Update hw-mgmt to v7.0010.3330 (#9164)
* Changed Debian package dependency in order to support both python or python3 packages
* Fix Python scripts to be compatible with python2.7/python3 versions
* hw-mgmt: attributes: Fix PSU power sensor attributes capability

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-11-05 18:42:09 -07:00
Judy Joseph
5f2d926ca9 Update sonic-utilities submodule
57de13b [config] fix interface IPv6 address removal. (#1819) (#1909)
2021-11-03 15:06:30 -07:00
Judy Joseph
9276189c78 Update sonic-swss submodule
67278be [teammgrd]: Handle LAGs cleanup gracefully on Warm/Fast reboot. (#1934)
2021-11-02 22:57:24 -07:00
Stepan Blyshchak
234b5b64e4 [dockers] change RPC, DBG dockers version: put RPG, DBG sign in build metadata part of the version (#8920)
- Why I did it
If an app.ext requires the dependency syncd^1.0.0, the RPC version of syncd does not satisfy this constraint, since 1.0.0-rpc < 1.0.0. It is therefore incorrect to use 'rpc' as a prerelease identifier. Instead, put 'rpc' in the build metadata part of the version: 1.0.0+rpc, which satisfies the constraint ^1.0.0.
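
The difference between the two spellings follows from semantic versioning precedence and can be checked with a small sketch. This is a simplified illustration, not the package manager's version logic: it assumes a base version with major >= 1, does not order pre-release tags among themselves, and the function names are made up for the example.

```python
import re

_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$"
)


def parse(version):
    """Split a version into its (major, minor, patch) core and pre-release tag."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError("not a semantic version: %r" % (version,))
    core = tuple(int(match.group(i)) for i in (1, 2, 3))
    return core, match.group(4)  # build metadata (group 5) is ignored on purpose


def precedence_key(version):
    core, prerelease = parse(version)
    # A version with a pre-release tag sorts below the same core without one.
    return (core, prerelease is None)


def satisfies_caret(version, base):
    """Check `version` against ^base, i.e. >= base and < next major."""
    base_core, _ = parse(base)
    key = precedence_key(version)
    return precedence_key(base) <= key and key[0] < (base_core[0] + 1, 0, 0)
```

With these definitions, `satisfies_caret("1.0.0-rpc", "1.0.0")` is false, while `satisfies_caret("1.0.0+rpc", "1.0.0")` is true because build metadata does not take part in precedence.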

- How I did it
Changed how the versions of RPC and DBG images are constructed.

- How to verify it
Install app.ext with syncd^1.0.0 dependency on a switch with RPC syncd docker.
Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
2021-11-02 22:54:30 -07:00
Stepan Blyshchak
37282b13b9 [slave.mk] record the package versions by expanding the list of dependencies (#8730)
- Why I did it
docker-orchagent was missing libsairedis version label.

E.g. Currently only swsscommon is recorded in the labels:
admin@arc-switch1038:~$ docker inspect docker-orchagent | grep versions
                "com.azure.sonic.versions.libswsscommon": "1.0.0"
With this change libsairedis is also recorded:
admin@arc-switch1038:~$ docker inspect docker-orchagent | grep versions
                "com.azure.sonic.versions.libswsscommon": "1.0.0"
                "com.azure.sonic.versions.libsairedis": "1.0.0"
- How I did it
By expanding the list of dependencies.
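
Expanding the dependency list amounts to walking the dependency graph before the labels are written. The sketch below illustrates the idea in Python rather than in the makefile: the dependency graph, the intermediate package name `swss`, and the function names are hypothetical; only the label prefix comes from the output shown above.

```python
def expand_dependencies(direct_deps, package):
    """Return all direct and indirect dependencies of `package`, in discovery order."""
    seen = []
    queue = list(direct_deps.get(package, []))
    while queue:
        dep = queue.pop(0)
        if dep in seen:
            continue
        seen.append(dep)
        queue.extend(direct_deps.get(dep, []))
    return seen


def version_labels(direct_deps, versions, package):
    """Build one version label for each dependency that has a recorded version."""
    return {
        "com.azure.sonic.versions." + dep: versions[dep]
        for dep in expand_dependencies(direct_deps, package)
        if dep in versions
    }
```

If only direct dependencies were considered, a library reached through an intermediate package would be missing from the labels, which is the symptom described above.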

- How to verify it
Build and verify the label for libsairedis exists in docker-orchagent.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2021-11-02 22:54:24 -07:00
Shilong Liu
206f7b66b5 Fix azp pipeline file which is involved by former PR (#8616) 2021-11-01 10:31:23 -07:00
Sudharsan Dhamal Gopalarathnam
bddc18c3a6
[202106][sonic_release]Add release file for 202106 (#9126)
Adding a release file for 202106. Without it 'release' in sonic_version.yml appears to be none. It should be 202106. This is required for QoS scripts in sonic-mgmt to pick a schema based on release branch

Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>
2021-11-01 08:22:41 -07:00
Junchao-Mellanox
48d412d49f
[Mellanox] Fix issue: PSU model/serial/revision info should be updated after replacing PSU (#9040) 2021-11-01 09:57:09 +05:30
Judy Joseph
ada705050d Update sonic-swss submodule
[vlanmgr]Fix for STATE_DB port check logic (#1980)

Update sonic-utilities submodule

434c2fb [sonic-package-manager] update FEATURE entries on upgrade (#1803)
d1ca400 [sonic-package-manager] code style fixes and enhancements (#1802)
2021-10-31 19:50:58 -07:00
Stepan Blyshchak
758518400f [Makefile.cache] fix an issue that non-direct dependencies are not accounted in component hash calculation (#8965)
#### Why I did it

Fixed an issue where changing the SDK version led to the cache framework taking the cached syncd RPC image rather than rebuilding syncd RPC based on the new syncd with the new SDK.

Investigation showed that the cache framework calculates a component hash based on direct dependencies only. The syncd RPC image hash consists of two parts: the flags of syncd RPC (platform, ENABLE_SYNCD_RPC) and the makefiles of syncd RPC's direct dependencies. None of the syncd RPC direct dependencies are modified when the SDK version changes, so the hash is unchanged.

#### How I did it

To fix this issue, include the hash of dependencies into current component hash calculation, e.g.:

In the calculation of the hash ```docker-syncd-mlnx-rpc.gz-274dfed3f52f2effa9989fc-39344350436f9b06d28b470.tgz```, the hash of syncd is included: ```docker-syncd-mlnx.gz-48ee88ac54b201e0e107b15-7bbea320025177a2121e440.tgz```, in which the hash of SDK is included.
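
Folding each dependency's hash into the component's own hash can be sketched as a recursive computation. This is an illustration of the idea in Python, not the Makefile.cache implementation: the `components` layout, the use of SHA-1, and the function name are assumptions, and dependency cycles are not handled.

```python
import hashlib


def component_hash(name, components, _cache=None):
    """Hash a component from its flags, its own content, and its dependencies' hashes.

    components: {name: {"flags": str, "content": str, "deps": [names]}}
                (hypothetical layout for this sketch).
    """
    cache = {} if _cache is None else _cache
    if name in cache:
        return cache[name]
    component = components[name]
    digest = hashlib.sha1()
    digest.update(component["flags"].encode())
    digest.update(component["content"].encode())
    for dep in sorted(component["deps"]):
        # Including the dependency's hash makes indirect changes propagate upward.
        digest.update(component_hash(dep, components, cache).encode())
    cache[name] = digest.hexdigest()
    return cache[name]
```

Because the hash of syncd already contains the hash of the SDK, a change in the SDK alters the hash of syncd and, through it, the hash of syncd RPC, even though none of syncd RPC's direct inputs changed.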

#### How to verify it

Build with cache enabled and check that changing SDK version leads to a different hash of syncd rpc image:

SDK version 4.5.1002:
```
docker-syncd-mlnx.gz-48ee88ac54b201e0e107b15-7bbea320025177a2121e440.tgz
docker-syncd-mlnx-rpc.gz-274dfed3f52f2effa9989fc-39344350436f9b06d28b470.tgz
```

SDK version 4.5.1002-005:
```
docker-syncd-mlnx.gz-18baf952e3e0eda7cda7c3c-e5668f4784390d5dffd55af.tgz
docker-syncd-mlnx-rpc.gz-4a6e59580eda110b5709449-552f76be135deaf750aeab2.tgz
```
2021-10-31 19:25:52 -07:00
Vivek Reddy
91628135e9
[hostcfgd] [202106] Fixed the brief blackout in hostcfgd using SubscriberStateTable (#9031)
#### Why I did it
Ported https://github.com/Azure/sonic-buildimage/pull/8861 to 202106 as it couldn't be cherry-picked directly
2021-10-29 08:55:59 -07:00
judyjoseph
b125f5d564
[202106] Advance the broadcom SAI to 5.0.0.11 (#9095)
* Advance the broadcom SAI to 5.0.0.11
* Update saibcm-modules-dnx to take in Knet MTU fix
2021-10-28 11:11:58 -07:00
Junchao-Mellanox
2a3738ead5
[Mellanox] Add a trigger to set LED to blink (#8995)
Depends on Mellanox hw-mgmt 7.0010.3300

Why I did it
Adjust LED logic according to the hw-mgmt change.

How I did it
Add a trigger to set LED to blink.

How to verify it
Manual test
2021-10-28 10:01:40 -07:00
Volodymyr Samotiy
fa63c056d1
[submodule] Update sonic-utilities pointer (#9072)
- Why I did it
To include the following changes:
* b684149 [techsupport] [202106] Removed -i option for docker commands and Improved Error Reporting (#1843)

- How I did it
Updated sonic-utilities submodule pointer.

- How to verify it
Build an image and run sonic-mgmt tests.
2021-10-28 13:27:02 +03:00
Qi Luo
887aba37b6 [build] Use pip to install setup.py dependency instead of python setup.py install (#8997)
Fix a recent build error introduced by a pre-release redis-py. This is a general issue because `python setup.py install` (i.e. `easy_install`) does not ignore pre-release versions. The fix is suggested by https://github.com/pypa/setuptools/issues/855#issuecomment-583803959
2021-10-27 22:18:58 -07:00
Stepan Blyshchak
a8235728f3 [swss.sh] fix an issue that dependent services are not read from a file (#8943)
This is due to the SERVICE variable declared after reading a file

#### Why I did it

To fix an issue that dhcp_relay does not restart with swss.

#### How I did it

Fixed in the swss.sh script

#### How to verify it

sudo systemctl restart swss
verify dhcp_relay restarts as well.
2021-10-26 22:44:56 -07:00
DavidZagury
a68a3a176e [Mellanox] Upgrade Mellanox firmware tools to 4.17.2-12 (#8978)
- Why I did it
Bug fix:
bad_param request due to missing parser rest command while running mlxlink

- How I did it
Advance the MFT tool version to 4.17.2-12.

- How to verify it
Manually tested on all mellanox platforms.
2021-10-26 08:58:50 -07:00
Alexander Allen
0d70e8f7e4
Update kernel pointer (#9054)
563666e Backport required mellanox kernel patches for hw-mgmt 3300 to kernel 4.19.152 (#240)
2021-10-26 13:33:07 +03:00
Alexander Allen
978584b3d1
[202106] [Mellanox] Upgrade hw-mgmt to V.7.0010.3300 (#9020)
- Why I did it
Upgrade to the latest version of hardware management in order to incorporate the latest bugfixes and drivers in the kernel.

- How I did it
Updated the version number and submodule for hw-mgmt.

- How to verify it
This has been verified on all Mellanox platforms through a combination of sonic-mgmt tests and other internal verification.
2021-10-26 13:32:28 +03:00
Judy Joseph
867daa8be6 Update sonic-utilities submodule
25f7c79 [sonic-package-manager] remove make_python_identifier (#1801)
84a7602 [sonic-package-manager] stop service explicitelly before uninstalling package (#1805)
2021-10-20 18:26:06 -07:00
Judy Joseph
599ff965ba Update sonic-swss submodule
88cfbc3 [Buffermgr]Graceful handling of buffer model change (#1956)
7f87a12 Orchagent validates mirror session queue parameter against maximum value from SAI (#1957)
2021-10-20 18:25:08 -07:00
Dmytro
0bd26909b2 [frrcfgd][bgpcfgd] Add portchannel support (#8911)
* To add portchannel support in frrcfgd and bgpcfgd
* Update is_zero_ip() to handle portchannel name
Signed-off-by: d-dashkov <Dmytro_Dashkov@Jabil.com>
2021-10-20 18:15:12 -07:00