Commit Graph

5176 Commits

Author SHA1 Message Date
vganesan-nokia
b313d4d092
[systemlag] Lag id boundary set for system lag (#6488)
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>

Changes for setting the platform-specific lag id boundaries in the chassis
app db. The platform-specific lag id boundaries are supplied via
chassisdb.conf. The lag_id_start and lag_id_end boundary values sourced
from this file are set in the chassis app db, where the lag id
allocator uses them to allocate unique lag ids atomically.
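
As a rough illustration of bounded, atomic id allocation, here is a minimal in-process sketch. It is not the allocator used in the chassis app db: the class name `LagIdAllocator`, the use of a `threading.Lock` for atomicity, and the assumption that both boundaries are inclusive are all hypothetical.

```python
import threading


class LagIdAllocator:
    """Hypothetical stand-in for a lag id allocator bounded by
    lag_id_start and lag_id_end (both assumed inclusive)."""

    def __init__(self, lag_id_start, lag_id_end):
        if lag_id_start > lag_id_end:
            raise ValueError("lag_id_start must not exceed lag_id_end")
        self._start = lag_id_start
        self._end = lag_id_end
        self._by_name = {}             # lag name -> allocated id
        self._used = set()             # ids currently in use
        self._lock = threading.Lock()  # makes each call atomic in-process

    def allocate(self, lag_name):
        """Return the id for lag_name, allocating the lowest free id if needed."""
        with self._lock:
            if lag_name in self._by_name:
                return self._by_name[lag_name]
            for lag_id in range(self._start, self._end + 1):
                if lag_id not in self._used:
                    self._used.add(lag_id)
                    self._by_name[lag_name] = lag_id
                    return lag_id
            raise RuntimeError("no free lag id within the configured boundaries")

    def release(self, lag_name):
        """Free the id held by lag_name, if any."""
        with self._lock:
            lag_id = self._by_name.pop(lag_name, None)
            if lag_id is not None:
                self._used.discard(lag_id)


allocator = LagIdAllocator(lag_id_start=1, lag_id_end=2)
print(allocator.allocate("PortChannel1"))
print(allocator.allocate("PortChannel2"))
```

Asking again for a name that already holds an id returns the same id, so retries from different callers do not consume extra ids.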
2021-03-30 23:21:53 -07:00
Stephen Sun
ecaf97d8a3
[mellanox]: Integrate hw-mgmt package V.7.0010.2002 (#7148)
Integrate hw-management package V.7.0010.2002

Bug fixes:
Removing critical thermal zones to prevent unexpected software system shutdown:
*Kernel 4.9 -0071-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
*Kernel 4.19 -076-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
Removing redundant link for cpld3 for fixed systems (SN2100, SN2010).
Fix an issue with missed attribute for cpld3 (port CPLD) for SN2700, SN2410.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-03-30 18:30:15 -07:00
kakkotetsu
e11397df1d
[restapi] fix python version during restapi startup (#7056)
changed from python3 to python in supervisord.conf.
2021-03-30 13:54:37 -07:00
vdahiya12
4b2e83ec16
[sonic-platform-daemons] submodule update (#7143)
this PR updates the following commits in sonic-platform-daemons
260cf2d [xcvrd] change firmware information fields name inside MUX_CABLE_INFO table for Y cable (#165)
cfa600f [thermalctld] Initialize fan led in thermalctld for the first run (#167)
8509f43 [thermalctld] Refactor to allow for greater unit test coverage; Add more unit tests (#157)
70f4e7b [syseepromd] Update warning message to be more informative (#160)

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2021-03-30 12:26:18 -07:00
Stephen Sun
423f6c7f30
[submodule]: update submodule head for sonic-swss (#7187)
[SFlowMgr] Sflow Crash on 200G ports handled (#1683)
Remove PGs from an administratively down port. (#1677)
Stablize the test case (#1679)
Revert "Revert "[buffermgr] Support maximum port headroom checking (#1607)" (#1675)" (#1682)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-03-30 10:17:51 -07:00
Yilan
d3fae0080e
[build]: Update versions_manager.py to make versions map key unique (#7146)
py2/py3/deb packages names are case insensitive, and the versions map
key should be the same for packages whose name can have different cases.

For example, in files/build/versions/default/versions-py3, package
"click==7.1.2" is pinned; and in
files/build/versions/dockers/docker-sonic-vs/versions-py3, package
"Click==7.0" is pinned.
Without this fix, the aggregated versions-py3 file used for building
docker-sonic-vs looks like below:
...
click==7.1.2
Click==7.0
...
However, we actually want "click==7.0" to overwrite "click==7.1.2" for
docker-sonic-vs build.
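
The effect of a case-insensitive key can be sketched as follows. This is not the code in versions_manager.py; the function name `merge_pins` and the list-of-strings input format are assumptions made for illustration.

```python
def merge_pins(*pin_lists):
    """Merge 'name==version' pins; later lists override earlier ones.

    The map key is the lower-cased package name, so 'click' and 'Click'
    are treated as the same package.
    """
    merged = {}
    for pins in pin_lists:
        for line in pins:
            name, sep, version = line.partition("==")
            if not sep:
                continue  # skip lines that are not 'name==version' pins
            merged[name.strip().lower()] = version.strip()
    return [f"{name}=={version}" for name, version in sorted(merged.items())]


default_pins = ["click==7.1.2"]    # as in versions/default/versions-py3
docker_vs_pins = ["Click==7.0"]    # as in versions/dockers/docker-sonic-vs/versions-py3
print(merge_pins(default_pins, docker_vs_pins))  # ['click==7.0']
```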
2021-03-30 08:34:25 -07:00
Joe LeVeque
9dd45da854
[sonic-psud] Depend on sonic-platform-common (#7182)
Unit tests for psud depend on sonic-platform-common as of Azure/sonic-platform-daemons#154
2021-03-30 08:32:07 -07:00
guxianghong
f1135206f8
[Centec] syncd containers based on buster should use python3 (#7185)
Upgrade python2 to python3 for supervisord.conf in docker-syncd-centec

Co-authored-by: shi lei <shil@centecnetworks.com>
2021-03-30 08:31:21 -07:00
Dmytro Shevchuk
d8627e6414
[yang] update yang model, add autoneg to sonic-port (#5963)
Dynamic Port Breakout fails when the "autoneg" field exists in config_db.

- How I did it
Added "autoneg" field in sonic-port yang model.

- How to verify it
Add "autoneg" field into config_db like this:

"Ethernet8": {
    "index": "2",
    "lanes": "8,9,10,11",
    "fec": "rs",
    "pfc_asym": "off",
    "mtu": "9100",
    "alias": "Ethernet8",
    "admin_status": "up",
    "autoneg": "on",
    "speed": "100000"
},
2021-03-30 08:27:58 -07:00
Praveen Chaudhary
a1992c054f
[sonic-portchannel.yang]: YANG models for PORTCHANNEL_MEMBER table. (#7020)
Changes:
-- YANG models for PORTCHANNEL_MEMBER table.
-- Yang Model Test.
-- Yang Mgmt Test with PORTCHANNEL_MEMBER table in config_db.json

Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com>
2021-03-30 08:26:37 -07:00
Joe LeVeque
67c57990f6
[sonic-thermalctld] Depend on sonic-platform-common (#7181)
Unit tests for thermalctld depend on sonic-platform-common as of https://github.com/Azure/sonic-platform-daemons/pull/157
2021-03-29 23:39:47 -07:00
Myron Sosyak
08520941b0
[barefoot]: Updated SDK packages to 20210324 (#7142)
Update unsupported SAI attr ('SAI_ACL_TABLE_ATTR_FIELD_OUTER_VLAN_ID') to fix issues on acl table create
2021-03-29 15:28:49 -07:00
arheneus@marvell.com
e38e374077
[marvell]: Marvell prestera kernel driver (#7066)
Build Marvell kernel driver for prestera sai sdk
Builds interrupt and dma kernel driver
Removed the older method (a pre-compiled kernel module Debian package) and its makefile
2021-03-29 15:27:01 -07:00
Mykola Gerasymenko
eb12244c3f
[submodule]: update swss-common (#7121)
- fix getting hash from redis db (#465)
- [dbconnector] Initialize redisContext (#464)
2021-03-29 15:23:50 -07:00
yozhao101
0f0faeeadd
[sonic-utilites] Update submodule. (#7172)
- 4d89510 - 2021-3-28 | [reboot] User-friendly reboot cause message for kernel panic (#1486) [Yong Zhao]
- 1f1696a - 2021-3-26 | Add all parameter to show/clear queue watermark command (#1149) [Petro Bratash]

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2021-03-28 22:41:40 -07:00
ArthiSivanantham
cc36a145cb
[yang]: SONiC Yang model for PORTCHANNEL_INTERFACE table (#7034)
* SONiC Yang model for PORTCHANNEL_INTERFACE table

Signed-off-by: Arthi Sivanantham <arthi_sivanantham@dell.com>
2021-03-28 00:51:11 -07:00
Praveen Chaudhary
d3d0c2623d
[yang-models]: Remove PLY Extensions and change translation code. (#6915)
* [yang-models]: Remove PLY Extensions and change translation code.

   Assuming that TABLE_SEPARATOR and ENTRY_SEPARATOR for configDB are always "|",
   translation from configDB.json to sonicYang.json can be done based on the keys specified
   in YANG Lists inside the YANG models, so the extensions can be removed.

Changes:
-- Remove use of regex in Translation code.
-- Remove regex Extensions from YANG models.
-- Improved debugging, i.e. log to stdout on any Exception from sonic-yang-mgmt,
   so that failed tests can be debugged faster. This also helps when debugging Dynamic
   Port Breakout issues.
-- Minor Test changes.
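
The idea of deriving key fields from the "|" separator alone, without a regex, can be sketched like this. It is only an illustration and not the sonic-yang-mgmt translation code; the function name `split_config_db_key` and the key names in the example are hypothetical.

```python
def split_config_db_key(key, key_names, separator="|"):
    """Map a configDB key such as 'A|B' onto the key names of a YANG list.

    key_names is the ordered list of keys declared by the YANG list
    (hypothetical names are used in the example below).
    """
    parts = key.split(separator)
    if len(parts) != len(key_names):
        raise ValueError(
            f"key {key!r} has {len(parts)} fields, expected {len(key_names)}"
        )
    return dict(zip(key_names, parts))


print(split_config_db_key("PortChannel0001|Ethernet0", ["portchannel", "port"]))
# {'portchannel': 'PortChannel0001', 'port': 'Ethernet0'}
```

A mismatch between the number of fields and the number of declared keys raises an error instead of guessing, which keeps malformed entries visible.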

Co-authored-by: lguohan <lguohan@gmail.com>
2021-03-28 00:49:10 -07:00
Dong Zhang
6d23a78ffb
[sonic-yang-model] fix ip_type value in test cases (#6968)
IPV4ANY is not a valid value; changed it to IPv4ANY.

Without this change, the test case sometimes failed when the validation of IP_TYPE happened before that of PACKET_ACTION.
2021-03-27 21:16:06 -07:00
Joe LeVeque
c651a9ade4
[dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7083)
To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
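
A quick way to audit such settings is to read the `buffer_size` option of every `[eventlistener:...]` section, falling back to the default of 10 mentioned above when the option is absent. The sketch below assumes the configuration is available as text; the sample content, including the command lines, is made up for illustration and is not copied from any container's supervisord.conf.

```python
import configparser

# Hypothetical sample; real files contain more sections and options.
SAMPLE_CONF = """
[eventlistener:dependent-startup]
command=python3 -m supervisord_dependent_startup
events=PROCESS_STATE
buffer_size=1024

[eventlistener:supervisor-proc-exit-listener]
command=/usr/bin/supervisor-proc-exit-listener --container-name swss
events=PROCESS_STATE_EXITED
"""


def listener_buffer_sizes(conf_text, default=10):
    """Return {listener name: buffer_size} for every eventlistener section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    sizes = {}
    for section in parser.sections():
        if section.startswith("eventlistener:"):
            name = section.split(":", 1)[1]
            sizes[name] = parser.getint(section, "buffer_size", fallback=default)
    return sizes


print(listener_buffer_sizes(SAMPLE_CONF))
# {'dependent-startup': 1024, 'supervisor-proc-exit-listener': 10}
```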
2021-03-27 21:14:24 -07:00
vmittal-msft
bcff251a71
[broadcom]: Updated bcmsai to 4.3.3.3 (#7090)
To add latest SAI drop REL_4.3.3.3 to SONIC which addresses the following CSP cases:

CS00012058054: [4.3][IPinIP][TTL-PIPE] IPinIP TTL Pipe Mode is NOT working it is behaving UNIFORM mode even programed as PIPE mode
CS00011227466: [4.3] Warmboot support with tunnel encap
2021-03-27 21:13:28 -07:00
maksymbelei95
aefe1455af
[sflow] Update version of hsflowd (#7137)
* Update the version of the hsflow daemon to pick up
  the fix that resolves the problem of switching
  between IPv4 and IPv6 when the IPv4 address
  has been deleted from the interface.


The new release of hsflowd contains the fix for the issue: sflow/host-sflow@2703ecb

How I did it
The HSFLOWD_VERSION variable in the rules was changed to point to the latest release of hsflowd.

How to verify it

sudo config sflow enable
sudo config loopback add Loopback1
sudo config int ip add Loopback1 a84f:97ff:fea7:33a5::fe80/64
sudo config int ip add Loopback1 192.168.101.1/24
sudo config sflow agent-id add Loopback1
sudo config sflow collector add Collector1 192.168.101.1
sudo config sflow collector add Collector2 a84f:97ff:fea7:33a5::fe80
Use sudo sflowtool -p 6343 -l to check the sflow data.
Remove and re-add the IPv4 entry of Loopback1.

hsflowd should change the agent IP from IPv4 to IPv6 and vice versa, depending on whether the IPv4 entry is present.
hsflowd switches between IPs based on IP address priority ranking.

Signed-off-by: Maksym Belei <Maksym_Belei@jabil.com>
2021-03-27 21:09:57 -07:00
Junchao-Mellanox
48042b7256
[Mellanox] Use softlink for sfputils on MSN4410 platform (#7092)
The file device/mellanox/x86_64-mlnx_msn4410-r0/plugins/sfputil.py is not a soft link to device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py, and it still uses python2 syntax, which causes some SFP CLI errors. This PR changes it to a soft link and adds 4410 support in device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py.
2021-03-27 11:56:48 -07:00
dependabot[bot]
247ac7ba17
Bump lxml from 4.6.2 to 4.6.3 in /src/sonic-config-engine (#7122)
Bumps [lxml](https://github.com/lxml/lxml) from 4.6.2 to 4.6.3.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-4.6.2...lxml-4.6.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-03-27 11:54:38 -07:00
Volodymyr Samotiy
b30595ac49
[Mellanox] Update SDK to 4.4.2508 and FW to xx.2008.2508 (#7141)
Fix the following issues:

Spectrum-2, Spectrum-3 | Port | Fix link issue when using 25 GbE rate between two ports while one is on Spectrum-2-based system and the other is on Spectrum-3-based system
All | warmboot | fail to upgrade from earlier SONiC versions with official SDK/FW 4.4.2306 (was on SONiC 201911)
All | What-Just-Happened | When enabling or disabling WJH under high traffic load to the host CPU, in very specific and low probability conditions, an error could occur, that may result in loss of data, channel failure or in extreme cases SW failure

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-03-27 11:51:49 -07:00
maksymbelei95
2ef0eb8c08
[Submodule update] - sonic-utilities (#7144)
* 1ee04fb (HEAD -> master, origin/master, origin/HEAD) Modified the tests to use mock functionality of get_child_port function under portconfig utility (#1464)
* 99d251f Enable PFCWD only on ports where PFC is enabled (#1508)
* eb7945f Warmboot script improvements - timeout exec, disable swss autorestart, remove trap (#1495)
* c7d4947 [show] Fix int status of LAGs, configured as Vlan members (#1478)

Signed-off-by: Maksym Belei <Maksym_Belei@jabil.com>
2021-03-27 11:50:20 -07:00
Joe LeVeque
b512394398
[docker-gbsyncd-vs] Run new gbsyncdmgrd in lieu of deprecated gbsyncd_startup.py (#7154)
To improve management of docker-gbsyncd-vs. gbsyncd_startup.py simply spawned syncd processes and then exited. In that case, supervisord would no longer manage any processes in the container, and thus there was no way to know if a critical process had exited.

I recently created gbsyncdmgrd to be a more complete, robust replacement for gbsyncd_startup.py.

NOTE: This PR is dependent on the inclusion of gbsyncdmgrd in the sonic-sairedis repo. A submodule update is pending at #7089
2021-03-27 11:42:23 -07:00
Shi Su
de64c4e34c
[bgp]: Reduce bgp connect retry timer to 10 seconds (#7169)
The default bgp connect retry timer is 120 seconds, so a reconnection happens 120 seconds after the initial connection fails. This PR aims to allow a more frequent retry.
2021-03-27 11:36:56 -07:00
Ying Xie
832e63554a
[Arista] add MMU configuration for Arista 7260 C64 (#7027)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2021-03-26 11:10:19 -07:00
Mykola Gerasymenko
e8f4a8b148
[barefoot]: Add psample module to load at boot time on BFN platform (#7164)
The psample module was not loaded on barefoot platform. The loading of this module is a prerequisite for testing SFlow.

* add `.gitignore` to the `barefoot` subdirectory to overwrite ignore "platform/**/debian/*" in the root directory
2021-03-26 11:08:28 -07:00
Volodymyr Boiko
e1d8d1895b
[platform][barefoot] Lazy initialize fans and thermals list (#7103)
Initialize fans and thermals lists on demand; make them properties in order to reduce Chassis object initialization time
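
The on-demand pattern described here can be sketched with Python properties. This is a simplified illustration, not the barefoot platform code: the class name `LazyChassis` and the loader callables passed to it are hypothetical.

```python
class LazyChassis:
    """Builds the fan and thermal lists on first access, not in __init__."""

    def __init__(self, fan_loader, thermal_loader):
        # Only store the (hypothetical) loader callables; nothing is
        # queried here, so constructing the object stays cheap.
        self._fan_loader = fan_loader
        self._thermal_loader = thermal_loader
        self._fan_list = None
        self._thermal_list = None

    @property
    def fan_list(self):
        if self._fan_list is None:
            self._fan_list = self._fan_loader()
        return self._fan_list

    @property
    def thermal_list(self):
        if self._thermal_list is None:
            self._thermal_list = self._thermal_loader()
        return self._thermal_list


load_count = {"fans": 0}


def load_fans():
    load_count["fans"] += 1
    return ["fan1", "fan2"]


chassis = LazyChassis(load_fans, lambda: ["thermal1"])
print(load_count["fans"])   # 0: nothing loaded at construction time
print(chassis.fan_list)     # ['fan1', 'fan2']
print(chassis.fan_list)     # cached, the loader is not called again
print(load_count["fans"])   # 1
```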

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-03-26 10:18:54 -07:00
Joe LeVeque
bf43dd375a
[sonic-sairedis][sonic-swss] Update submodules (#7089)
Update sonic-sairedis submodule and also update sonic-swss submodule as there are interdependent changes.

* src/sonic-sairedis 13474d1...bc58b0f (12):
  > Add gbsyncdmgrd; deprecate gbsyncd_startup.py (#809)
  > Remove gbsyncd_start.sh (#808)
  > [gbsyncd] Fix shebang in gbsyncd_startup.py; Make script executable (#807)
  > [saiasiccmp] Add saiasiccmp tool to compare 2 asic views (#791)
  > [configure] Add -Wno-psabi to remove "passing argument changed in GCC 7.1" (#799)
  > Update FlexCounter.cpp, use m_pollInterval in MUTEX lock (#797)
  > [vs] Add special warm boot logic to populate default attributes (#796)
  > [ci]: add vstest (#795)
  > [tests] Add macsec unittest (#782)
  > [debian/control] libsairedis-dev depends on libzmq5-dev (#794)
  > [ci]: use build template (#793)
  > Rename duplicate file name (#773)

* src/sonic-swss 0b0d24c...5adb73e (47):
  > Initialize system port type variable (#1681)
  > [Dynamic Buffer Calc] Enhance the field checking in table handling (#1680)
  > Handle the clear request for 'Q_SHARED_ALL' (#1653)
  > [MuxOrch] FDB ageout safety check (#1674)
  > Deactivate mirror session only when session status is true in updateLagMember (#1666)
  > Revert "[buffermgr] Support maximum port headroom checking (#1607)" (#1675)
  > reduce severity of log to info in case of flush on non-existing member (#1669)
  > Revert "[Dynamic buffer calc] Bug fix: Remove PGs from an administratively down port. (#1652)" (#1676)
  > [Dynamic buffer calc] Bug fix: Remove PGs from an administratively down port. (#1652)
  > [acl] Move ACL table constants to acltable.h (#1671)
  > [nbrmgrd] added function to parse IP address from APP_DB (#1672)
  > [MUX/PFCWD] Use in_ports for acls instead of seperate ACL table (#1670)
  > [vog/systemlag] Voq lagid allocator (#1603)
  > Add table descriptions for dynamic buffer calculation to the documents (#1664)
  > [vstest/subintf] Add vs test case to validate processing sequence of APPL DB keys (#1663)
  > Remove vxlanmgrd dependency on orchagent (#1647)
  > Keep attribute order in bulk mode (#1659)
  > [mux] VS test for neigh, route and fdb (#1656)
  > [linksync] Netdev oper status determination using IFF_RUNNING (#1568)
  > [portorch] parse on/off value from autoneg (#1658)
  > [intfsorch] Create subport with the entry contains necessary attributes (#1650)
  > [ci]: Purge swss before install (#1654)
  > Update StateDB with error if state change failed, Update APP_DB in all state chg req (#1662)
  > Added changes to handle dependency check in FdbSyncd and FpmSyncd for warm-boot (#1556)
  > [synchronous mode] Add failure notification for SAI failures in synchronous mode (#1596)
  > [acl] Enable VLAN ID qualifier for ACL rules (#1648)
  > Updated PFCWD to use single ACL table for PFCWD and MUX (#1620)
  > [orchagent] Increase SAI REDIS response timeout to support FW upgrade during init (Mellanox only). (#1637)
  > [vstest/nhg]: use dvs_route fixture to make test_nhg more robust
  > [vstest]: add dvs_route fixture
  > [vstest/subintf] Update vs tests to validate physical port host interface vlan tag attribute (#1634)
  > Remove useless header  in macsecorch (#1628)
  > Add SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS counter, create new FlexCounter group (#1600)
  > fixed unsupported resource issue (#1641)
  > [test_virtual_chassis]: use wait_for to make test more robust (#1640)
  > spell check fixes (#1630)
  > [bufferorch] Handle NOT IMPLEMENTED status returned during set attr operation (#1639)
  > [ci]: run vstest
  > [test_virtual_chassis]: use wait_for function to improve test robustness
  > [Mux] Neighbor handling based on FDB entry (#1631)
  > [ci]: use build template (#1633)
  > Log level change from ERR to INFO for fetch systemports issue (#1632)
  > Migrate serdes programming to port serdes object (#1611)
  > [tests] Remove legacy saiattributelist.h dependency (#1608)
  > [buffermgr] Support maximum port headroom checking (#1607)
  > Support shared headroom pool on top of dynamic buffer calculation (#1581)
  > Fix the compiling errors in gcc9 (#1621)
2021-03-26 10:01:08 -07:00
lguohan
dbc7a45c73
[ci]: gzip the vm image disk and memdmp (#7131)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-03-25 10:15:21 -07:00
Junchao-Mellanox
93a54450d3
Fix issue: should not initialize led color in __init__ file as platform API will be called by multiple daemons (#7114)
- Why I did it
The existing Fan led and Psu led objects initialize themselves to green in the init method. However, multiple daemons call the sonic platform API, so the following can happen:

A PSU is removed from the system
The switch is rebooted
psud detects that 1 PSU is missing and sets the PSU led to red
Another daemon starts up and calls the sonic platform API, which sets the PSU led to green by calling PsuLed.init

This PR is a partial fix for the issue, as we also need to guarantee that the led is initialized with a correct value. I checked the existing psud and thermalctld code: psud always initializes the PSU led color on boot up, while thermalctld needs some changes to initialize the led color on the first run

- How I did it
Remove the led color initialization code from FanLed.init and PsuLed.init

- How to verify it
Manual test
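
The race can be reproduced in a few lines. The sketch below is a simplified model, not the Mellanox platform API: the `PsuLed` class here, its `reset_on_init` flag and the shared `hardware` dict standing in for the real LED state are all hypothetical.

```python
class PsuLed:
    """Toy model of an LED object shared by several daemons."""

    def __init__(self, hardware, reset_on_init=False):
        self._hardware = hardware  # stands in for the real LED state
        if reset_on_init:
            # Old behaviour: every process that builds the object
            # forces the LED back to green.
            self._hardware["psu_led"] = "green"

    def set_status(self, color):
        self._hardware["psu_led"] = color

    def get_status(self):
        return self._hardware.get("psu_led")


# Old behaviour: a daemon that starts later overwrites psud's decision.
hardware = {}
psud_led = PsuLed(hardware, reset_on_init=True)
psud_led.set_status("red")               # psud sees a missing PSU
PsuLed(hardware, reset_on_init=True)     # another daemon starts up
print(psud_led.get_status())             # green

# New behaviour: construction has no side effect on the LED.
hardware = {}
psud_led = PsuLed(hardware)
psud_led.set_status("red")
PsuLed(hardware)
print(psud_led.get_status())             # red
```

With the side effect removed from construction, the initial color has to be set explicitly by the daemon that owns the LED, which is why the commit calls itself a partial fix.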
2021-03-25 14:28:33 +02:00
abdosi
4be9844728
[Submodule update] sonic-snmpagent (#7107)
c20bf60 Qi Luo  Mon Mar 15 14:28:31 2021 -0700  Implement rfc4363 FdbUpdater for lag inside vlan (#203)
292024a abdosi  Mon Mar 15 12:15:21 2021 -0700  Updated lldpRemManAddrTable to use all the management ip address associated with interface. (#201)
9b83459 liushilongbuaa  Fri Mar 12 14:35:23 2021 +0800  [CI] Setup dummy azure pipeline (#198)

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2021-03-24 18:46:18 -07:00
Renuka Manavalan
1bc1a2413a
[submodule]: SONiC-utilities submodule update: (#7129)
* 553936b (HEAD, origin/master, origin/HEAD, master) route_check: Fix hanging & logging level (#1520)
* ed45412 [show][config] add support for setting and displaying switching modes on Y cable (#1501)
* bf46638 Handling error scenario of adding port to Vlan which is part of LAG (#1516)
* ae39883 Fix bug: show vlan config for vlan with no members (#1503)
* 3a482ac [test] Update unit test coverage for command 'show mac' (#1504)
* 4a0c010 [config] Disable/enable container monitoring when stopping/starting services (#1499)
2021-03-24 12:55:31 -07:00
Volodymyr Samotiy
c7cc4b465b
[Mellanox] Update FW to xx.2008.2424 (#7118)
Fixed issues:
* Mellanox SN-2700 breakout port not linking up with QSA

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-03-22 18:27:36 -07:00
trzhang-msft
8f83b33e02
DHCP Relay: add option -si to support using src intf ip in relay (#7052)
* add option si to support using src intf ip in relay
2021-03-19 13:27:14 -07:00
Joe LeVeque
a3cafee02c
[docker-gbsyncd-vs] Run gbsyncd_startup.py directly (#7084)
Eliminate the need for `gbsyncd_start.sh`, which simply calls `exec "/usr/bin/gbsyncd_startup.py"`. The shell script is unnecessary.

Once this PR merges, we can remove `gbsyncd_start.sh` from the sonic-sairedis repo.
2021-03-19 10:52:28 -07:00
Prince Sunny
28cb43cb42
[Submodule update] - sonic-utilities (#7091)
Updated with following commits:

19d4042 - 2021-03-16 : Add self timeout and crash if exceeded. (#1502) [Renuka Manavalan]
aa71231 - 2021-03-16 : [reboot]: Stop mux before reboot on dual ToR (#1500) [Lawrence Lee]
fbad274 - 2021-03-16 : Add 'show' and 'clear' command for PG drop (#1461) [Andriy Yurkiv]
0de99c3 - 2021-03-12 : [decode-syseeprom] When reading from DB, display CRC-32 and all Vendor Extensions (#1497) [Joe LeVeque]
569a079 - 2021-03-12 : [decode-syseeprom] When reading from DB, display CRC-32 and all Vendor Extensions (#1497) [Joe LeVeque]
47d1a14 - 2021-03-12 : [generate-dump] Remove Arista specific logic (#1482) [Samuel Angebault]
1260f90 - 2021-03-12 : [warm-reboot]: Check empty key before issuing redis hget (#1496) [Vaibhav Hemant Dixit]
2021-03-18 13:08:47 -07:00
judyjoseph
9d9503e1fe
To decrease the Connect Retry Timer from default value which is 120sec to 10 sec. (#7087)
Why I did it
It was observed that on a multi-asic DUT bootup, the internal BGP sessions between ASICs were taking more time to get ESTABLISHED than the external BGP sessions. The internal sessions were coming up almost exactly 120 secs later.

On a multi-asic platform the bgp dockers (which are per ASIC) are brought up at around the same time on switch start, and they try to establish bgp sessions with neighbors (in peer ASICs) which may not be completely up. This results in a BGP connect failure, and the retry happens after 120 sec, which is the default Connect Retry Timer

How I did it
Add the command to set the bgp neighboring session retry timer to 10sec for internal bgp neighbors.
2021-03-17 23:14:38 -07:00
shlomibitton
43d4d45645
Backport ethtool to support QSFP-DD (#5725)
Backport ethtool debian package version 5.9 to support QSFP-DD cable parsing.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2021-03-16 09:56:53 -07:00
Stepan Blyshchak
1f7d9e2698
[docker_img_ctl.j2] make tmpfs mounts optional and add ability to run container by image id (#6439)
- Why I did it
I made the docker_img_ctl.j2 applicable to more dockers (including application extensions dockers) by adding an option not to mount tmpfs on /tmp/ and /var/tmp/. In some applications /tmp/ is a different docker volume which can't be tmpfs.
Also, I added the ability to pass REPO[:TAG]|[@digest]/IMAGE_ID instead of just the REPO name.

- How I did it
Modified docker_img_ctl.j2 and docker makefiles.

- How to verify it
Run it on the switch.
2021-03-16 17:03:12 +02:00
rathnasabapathyv
6beba298b0
[yang]: To follow consistent naming-conventions for key-attributes of all different types of interfaces (#7049)
As discussed in the yang subgroup community meeting, this change brings consistent naming conventions to all the different types of interfaces in sonic-yang-model, particularly the key-attribute name. Since the relevant interface container already has context about that interface, a simple and clear key-attribute name is sufficient. For example, PORT/PORT_LIST/port_name has been renamed to PORT/PORT_LIST/name. Similar changes are made for portchannel, VLAN and loopback interfaces as well.
2021-03-16 05:24:04 -07:00
Lior Avramov
d19bb02ce4
[Mellanox]: Fix PCIEd configuration files for SN3700 system (#7058)
Update with correct PCI addresses

Signed-off-by: liora <liora@nvidia.com>
2021-03-15 21:06:12 -07:00
Junchao-Mellanox
8504c72f14
[Mellanox] Initialize PSU API on both host and docker side (#7016)
There was a change to replace platform utils with sonic platform API in psuutil. However, psu API is not initialized on host side. The PR is to fix it.
2021-03-15 12:43:18 -07:00
trzhang-msft
97b371ee08
[docker-dhcp-relay]: add -si support in dhcp docker template (#7053)
2021-03-15 09:21:03 -07:00
Stepan Blyshchak
2b8941e716
[sonic_debian_extension] add docker script to SONiC filesystem (#5935)
- Why I did it
To allow SONiC Package Migration during SONiC-2-SONiC upgrade we need to start docker daemon in chroot-ed environment in new SONiC filesystem.
Later this script will be used to start dockerd in chroot environment on SONiC

- How I did it
Install a docker service script into /usr/lib/docker/ in SONiC filesystem.

- How to verify it
Install SONiC image on the switch, mount squashfs to some directory, mount overlay rw layer over squashfs, mount procfs and sysfs, mount docker library. Start the docker using:
root@sonic:~$ /usr/lib/docker/docker.sh start

Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>
2021-03-14 14:15:42 +02:00
sandycelestica
f938e7fc79
[celestica]: Fix E1031 udev rules not work for sonic os first boot after be installed (#7043)
Use udevadm to trigger the udev rules on the first boot

How to verify:

- Connect C0 with E1031;
- Install or upgrade the sonic os to the 202012 branch;
- After logging in to sonic, check that /dev/C0-1 to /dev/C0-48 exist.
2021-03-13 15:35:59 -08:00
Kebo Liu
c82aaaeb41
[Mellanox] Update SDK to 4.4.2418, FW to 2008.2416, SAI to new commit (#7041)
- Why I did it
To pick up new features and fixes from SDK/FW and SAI

SDK/FW new Feature:

All | Added support for multiple modules and cable types. For full list contact Nvidia networking support
Spectrum-3 | SN46000C | Added support for up to 5W on ports 49 to 64.
SDK/FW bug fixes:

All | fast reboot | fast boot failure from latest 201811 to 201911 and above
Spectrum | 10GbE/1GbE Transceiver (FTLX8574D3BCV) stopped working after firmware upgrade
Spectrum-2 | When device is rebooted with locked Optical Transceivers in split mode, the firmware may get stuck
Spectrum-2 | SN3700 | When connecting at 200GbE to Ixia K400, Ixia receives CRC errors
Spectrum-2 | SN3800 | On rare occasions packets loss may be experienced due to signal integrity issues
Spectrum-2 | When the port is a member of a LAG, after a warmboot and port toggle on the peer-side, the port remains down
Spectrum-3 | SN4700 | While using Optic cable in Split 4x1 mode in PAM4, when two first ports are toggled, the other 2 ports go down
Spectrum-3 | SN4700 | When working in 400GbE, deleting the headroom configuration (changing buffer size to zero) on the fly may cause continual packet drops
SAI

All | sFlow | Use hardcoded value 1 as netlink group number as expected by hsflowd
- How I did it
Update the related version number in the make files and update the submodule pointer accordingly.

- How to verify it
Run the regression tests and verify that everything works.
2021-03-13 21:19:40 +02:00
Tamer Ahmed
51ab39fcb2
[hostcfgd]: Add Ability To Configure Feature During Run-time (#6700)
Features may be enabled/disabled for the same topology based on run-time
configuration. This PR adds the ability to enable/disable feature based
on config db data.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-03-13 05:56:27 -08:00