Commit Graph

5187 Commits

Author SHA1 Message Date
Lawrence Lee
84cd0e9471 [mux]: Initialize all mux ports as standby
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-10 18:54:33 -08:00
Tamer Ahmed
18d1f65339 Merged PR 4813977: [mux] Update Service Install With SONiC Target
[mux] Update Service Install With SONiC Target

Recent PR grouped all SONiC service into sonic.taget. The install section
of mux.service was not update and this causes delays when using config
reload as the service failed state is not being reset.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-11-10 18:54:33 -08:00
Lawrence Lee
70fbd6826c Merged PR 4366316: [mux.service]: Bind to sonic.target
[mux.service]: Bind to sonic.target

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-10 18:54:33 -08:00
Tamer Ahmed
b42aef68f3 Merged PR 4234524: [mux] Start Mux on Only Dual-ToR Platform
[mux] Start Mux on Only Dual-ToR Platform

mux docker depends on the presence of mux cable hardware and is
supposed to run only Gemini ToRs. This PR change the mux feature
config in order to enable mux docker based on device configuration.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-11-10 18:54:33 -08:00
Tamer Ahmed
ed40b53ee1 [linkmgrd] Relocate Linkmgrd to Github
This PR deletes local-to-buildimage linkmgrd and creates new submodule
pointing to github repo of sonic-linkmgrd.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-11-10 18:54:33 -08:00
Tamer Ahmed
b826630262 [mux] Add New Package Vars
Ading new packaging variable to mux docker

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-11-10 18:54:33 -08:00
Tamer Ahmed
a8fdeb3907 [linkmgrd] Enhance Init And Switch State When Config Is Active
During warm reboot, linkmgrd would go away and so heartbeats will
be lost. This would result in standby link son peer ToR to pull the
link active. This is undesirable since we would not create tunnel
from the ToR that is being rebooted to the peer ToR. This PR
implicitly lock the state of the mux if config is not set to auto.

Also, orchagent does not initialize MUX to it hardware state, rather
it initilizes MUX to Unknown state. linkmgrd will detect this situation
and probe MUX state to correct orchagent state.

There a fix for the case when state os switched MUX is delayed. The
PR will poll the MUX for the new state. This is required to update
the state ds and hence create/tear tunnel.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-11-10 18:54:33 -08:00
Tamer Ahmed
b8f70f8986 Merged PR 3845699: [linkmgrd]: Introduce MUX cable linkmgrd
Linkmgrd monitors link status, mux status, and link state. Has
the link becomes unhealthy, linkmgrd will trigger mux switchover
on a standby ToR ensuring uninterrupted service to servers/blades.
This PR is initial implementation of linkmgrd.

Also, docker-mux container hold packages related to maintaining and managing
mux cable. It currently runs linkmgrd binary that monitor and switches
the mux if needed.
This PR also introduces mux-container and starts linkmgrd as startup when
build is configured with INCLUDE_MUX=y

Edit: linkmgrd PR will follow.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

Related work items: #2315, #3146150
2021-11-10 18:54:33 -08:00
tjchadaga
8138a34a0b
[202012] sonic-platform-daemons submodule update (#9222) 2021-11-10 17:28:51 -08:00
tjchadaga
1bc012ab1e
[202012] sonic-utilities submodule update (#9214) 2021-11-10 09:00:32 -08:00
Rajkumar-Marvell
34e5243f64
[202012][Marvell] Update armhf SAI to ver 1.7.1-6 (#9205)
Fixed SAI error reported in issue #9172

Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
2021-11-10 08:34:46 -08:00
tjchadaga
9a1b1bc44e Fix for additional intf flap during fast-reboot (#9166) 2021-11-09 23:20:06 +00:00
Saikrishna Arcot
bea36d963e dhcp6relay: remove line overwriting docker-dhcp-relay variable (#9179)
The dhcp6relay rules file had a line overwriting a variable for
docker-dhcp-relay. Remove that line.

This line caused a limited impact where if some (many?) of the docker
containers were already built, except for dhcp-relay, and the build
failed or was interrupted, then dhcp-relay container would fail to build
because this variable was overwritten and the python3-swsscommon
wouldn't get installed into the slave container. Most builds would be
fine, though.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2021-11-09 23:19:23 +00:00
mssonicbld
c15bae7c84
[ci/build]: Upgrade SONiC package versions (#9128) 2021-11-09 22:52:26 +00:00
Vivek Reddy
1cd67bb27c
[202012] update sonic-utilities submodule (#9195)
Submodule update for sonic-utilties

```
48035d75 [202012] [techsupport] Techsupport Error Reporting pending fixes (#1854)
8b2ec09a Fix log_ssd_health hang issue (#1904)
ac9c4254 Fix the option missing in kernel config issue (#1888)
5cc9417a disk_check: Script updated to run good in 201811 & 201911 (#1747)
```
2021-11-09 14:09:52 -08:00
trzhang-msft
7e8ebaabee
caclmgrd: support packet mark in DHCP chain (#9191)
* caclmgrd:support packet mark in DCHP chain
2021-11-08 14:54:57 -08:00
gechiang
baa00e6969
[202012] Disable ALPM distributed hitbit thread that is used for debug purpose only but interfered with Other functional operations (#9190)
This is to address an issue where it was observed that SAI operations sometime make take a very long to time complete (over 45ms). It was determined that the ALPM distributed thread was causing this issue.
The fix is to disable this debug thread that has no functional purpose.

Preliminary tests looks fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the fib test cases on 7050CX3 (TD3), TD2, TH, TH2, and TH3 based platforms and
thy all passed.
2021-11-08 11:50:44 -08:00
Lawrence Lee
8ada006302 [swss]: Start ndppd after vlanmgrd (#9155)
Why I did it
During swss container startup, if ndppd starts up before/with vlanmgrd, ndppd will be pinned at nearly 100% CPU usage.

How I did it
Only start ndppd after vlanmgrd is running. Also, call ndppd directly instead of through bash for improved logging and to prevent orphaned processes.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-05 00:39:10 +00:00
kellyyeh
d8dd68d2f4 Fix invalid destination address error (#9143) 2021-11-05 00:38:36 +00:00
shlomibitton
77910e41ce [Mellanox] Fix split configuration for Mellanox SN3800-D112C8 SKU SAI profile for fast-reboot performance (#8897)
- Why I did it
Wrong SKU configuration will lead to longer init flow.
This will affect fast-reboot feature by increasing the traffic downtime.
Since MLNX met the required downtime period with this SKU this bug found with a delay.

- How I did it
Add the required split labels for ports.

- How to verify it
Run fast-reboot with this platform using SN3800-D112C8 SKU.
2021-11-05 00:38:31 +00:00
Rajkumar-Marvell
fb844c754a
[202012][Marvell] Update armhf SAI to ver 1.7.1-5 (#9118)
Fixed test_null_route_helper fix

Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
2021-11-03 07:29:02 -07:00
tjchadaga
cc3e595c8c
[202012] sairedis submodule update (#9156)
These changes are included in this PR:

07e1f79 [syncd] Add workaround for warm boot new objects (#959)
50fd353 Fix the option missing in kernel config issue (#956)
e77503c [syncd] Comparison logic workaround for empty buffer profile (#906) (#941)
2021-11-03 07:25:25 -07:00
gechiang
400e40f255
[202012] BRCM SAI 4.3.5.1-6 Picked up fixes for CS00012213351, CS00012182162, and CS00012210826 (#9158)
This is to pick up BRCM SAI 4.3.5.1-6 fixes which contains the following fixes:

1.  CS00012213351 SONIC-50679: [TH, TH2] Warm-reboot from 3.5 to 4.3 fails due to null objects discovered
2.  CS00012182162: SONIC-49805 TD3 MMU config profile optimization changes 
3.  CS00012210826:SONIC-50205/760c60fc: Should read MMU_INTFI_MMU_PORT_TO_MMU_QUEUES_FC_BKP for TH3

Preliminary tests looks fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on 7050CX3 (TD3) T0 DUT and all passed:
```
     fib/test_fib.py
     vxlan/test_vxlan_decap.py
     fdb/test_fdb.py
     decap/test_decap.py
     ipfwd/test_dip_sip.py 
     ipfwd/test_dir_bcast.py
     acl/test_acl.py
     vlan/test_vlan.py
     platform_tests/test_reboot.py
```
2021-11-03 07:24:33 -07:00
Qi Luo
de38cd758e
[build] Use pip to install setup.py dependency instead of python setup.py install (#9111)
#### Why I did it
Backport https://github.com/Azure/sonic-buildimage/pull/8997 to 202012 branch.
2021-10-29 21:14:41 -07:00
Prince Sunny
1fc4968cd0
[swss] Update swss submodule (#9110)
Commits:
f147d9e - 2021-10-27 : [Mux orch] Handle setting unknown mux state (#1984) [Prince Sunny]
4618b2b - 2021-10-28 : Change tunnel orch order (#1990) [Prince Sunny]
505d52d - 2021-10-20 : Fix the option missing in kernel config issue (#1973) [xumia]
7bf4dfb - 2021-10-18 : SAI_REDIS_SWITCH_ATTR_CONTEXT shouldbe the last attribute. This is what sairedis library expects (#1935) [judyjoseph]
c58919e - 2021-10-08 : [logfile][202012]: Add option to specify swss rec file name (#1946) [judyjoseph]
2021-10-29 10:11:49 -07:00
Guohan Lu
bf1ed42588 [ci]: build centec arm64 to sonicbld-arm64 pool
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-10-29 09:10:13 -07:00
Neetha John
a933b20d6f [minigraph] Add tagged vlan member support for storage backend (#9045)
Signed-off-by: Neetha John <nejo@microsoft.com>

Why I did it
Storage T0's have all vlan members as tagged

How I did it
Since currently minigraph does not have a unique way to identify if a vlan member is tagged/untagged and to ensure other scenarios are not broken, the logic used is to just update the vlan member type as 'tagged' when we determine that it is a storage backend device. This change will apply only to storage backend T0's since storage backend T1's will not have vlan member information

How to verify it
Updated the storage backend T0 testcases to check for tagged vlan members
Added testcase to check if a T1 and backend T1 device generates an empty vlan member table
Existing vlan member testcases are good enough for checking if any regression has been caused for regular T0's
Build sonic_config_engine-1.0-py3-none-any.whl successfully
2021-10-29 01:05:47 +00:00
Stepan Blyshchak
eb744552be [Makefile.cache] fix an issue that non-direct dependencies are not accounted in component hash calculation (#8965)
#### Why I did it

Fixed an issue that changing SDK version leads to cache framework taking cached syncd RPC image rather then rebuilding syncd RPC based on new syncd with new SDK.

Investigation showed that cache framework calculates a component hash based on direct dependencies. Syncd RPC image hash consists of two parts: one is the flags of syncd RPC (platform, ENABLE_SYNCD_RPC) and syncd RPC direct dependencies makefiles. None of the syncd RPC direct dependencies are modified when SDK version changes, so hash is unchanged.

#### How I did it

To fix this issue, include the hash of dependencies into current component hash calculation, e.g.:

In calcultation of the hash ```docker-syncd-mlnx-rpc.gz-274dfed3f52f2effa9989fc-39344350436f9b06d28b470.tgz```, the hash of syncd is included: ```docker-syncd-mlnx.gz-48ee88ac54b201e0e107b15-7bbea320025177a2121e440.tgz``` in which the hash of SDK is included.

#### How to verify it

Build with cache enabled and check that changing SDK version leads to a different hash of syncd rpc image:

SDK version 4.5.1002:
```
docker-syncd-mlnx.gz-48ee88ac54b201e0e107b15-7bbea320025177a2121e440.tgz
docker-syncd-mlnx-rpc.gz-274dfed3f52f2effa9989fc-39344350436f9b06d28b470.tgz
```

SDK version 4.5.1002-005:
```
docker-syncd-mlnx.gz-18baf952e3e0eda7cda7c3c-e5668f4784390d5dffd55af.tgz
docker-syncd-mlnx-rpc.gz-4a6e59580eda110b5709449-552f76be135deaf750aeab2.tgz
```
2021-10-29 01:05:39 +00:00
Sumukha Tumkur Vani
65626c8925
Flush RESTAPI DB upon config reload (#9093) 2021-10-28 09:31:38 -07:00
vdahiya12
933f45e912
[202012][sonic-platform-common][sonic-platform-daemons] submodule update (#9081)
bf7214c (HEAD -> 202012, origin/202012) [y_cable] add support for manual/standby mode in xcvrd for muxcable; fix download firmware version retrieval logic while download firmware in progress (#220)

fc6a41e (HEAD -> 202012, origin/202012) Fix typo in the simulated y_cable driver (#226)

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2021-10-27 10:18:00 -07:00
dflynn-Nokia
5b6fdf244c [Nokia ixs7215] Platform API fixes (#9025)
* [Nokia ixs7215] Platform API fixes

This commit delivers the following fixes
    - Fix bug preventing access to second PSU eeprom
    - Fix bug preventing updates to front panel PSU status led
    - Fix SFP reset test case failure

* Fix LGTM alert
2021-10-27 03:55:51 +00:00
dflynn-Nokia
f5fbb0bb31 [Nokia ixs7215] Add new platform capabilities to platform.json (#9032)
This commit more fully declares the HW capabilities of the Nokia-7215
platform. For example, support for the threshold values associated with each
thermal sensor is described. The intent here is to inform the sonic-mgmt
platform test cases of which HW features are supported.

This commit must align with PR# 4521 within the sonic-mgmt git repo which is
currently under review. Any changes to that PR will need to be reflected in
this commit.
2021-10-27 03:55:43 +00:00
Nazarii Hnydyn
0cbda8d362 [teamd]: Send USR1/USR2 only to subscribers. (#8856)
To fix teamd signal handling, without which Process 'tlm_teamd' exited unexpectedly
2021-10-27 03:54:58 +00:00
Santhosh Kumar T
7137e3f949 [Dell] S6000 I2C not responding to certain optics (#8736)
* [Dell] S6000 I2C not responding to certain optics

* Revising return states

* Moved lock file from /var/run/platform_cache to /etc/sonic
2021-10-27 03:54:18 +00:00
mssonicbld
1c86196411
[ci/build]: Upgrade SONiC package versions (#9050) 2021-10-25 17:09:12 +00:00
xumia
027a35d7e5
[ci]: Change to build marvell-armhf on armhf pool (#8467) 2021-10-25 07:38:18 -07:00
vdahiya12
f63e3cd1e9
[202012][sonic-utilities] submodule update (#9023)
8768089 (HEAD -> 202012, origin/202012) Remove exec from platform_reboot_plugin call to handle any hang issue. (#1879)
ae5d90c Validate input of ```config mirror_session add``` (#1825)
44d3a3b [show][config] fix the muxcable commands for interface naming mode (#1862)
0a4933e [TH3] Skipp Control Plane Assist on WARM Reboot for TH3 HWSKUs (#1861)

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2021-10-24 09:06:17 -07:00
Arun Saravanan Balachandran
4139e06260 DellEMC: Z9332f - Component firmware upgrade platform API implementation (#8973) 2021-10-22 17:16:49 +00:00
kellyyeh
517d81a57a [dhcp_relay] fix data type in dhcp6relay, add protection in packet data parsing (#9036) 2021-10-22 17:14:35 +00:00
Saikrishna Arcot
bb1bc59a22 docker-dhcp-relay: Fix waiting for interfaces to get set up (#9034)
Fix the check used to wait for interfaces to come up. The group name in
the supervisor config files has changed from isc-dhcp-relay to
dhcp-relay.

Also, in the wait script, wait 10 additional seconds after the vlans,
port channels, and any interfaces are up. This is because dhcrelay
listens on all interfaces (in addition to port channels and vlans), and
to ensure that it stays in a clean state during runtime, wait some extra
time to make sure that those interfaces are created as well.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2021-10-22 17:14:22 +00:00
Sujin Kang
2c41441edd
Revert "[202012] [Arista] Enable syseepromd update database (with 202012 fix) (#8963)" (#9041)
This reverts commit 94456b1680.
2021-10-22 09:54:49 -07:00
kellyyeh
82d0539c4e Fix dhcpmon bugs (#9008)
* Exclude incrementing on aggregate device if packet received on mgmt interface

* Fix DHCPv6 header offset calculation
2021-10-20 01:37:04 +00:00
gechiang
c95178157d
[202012]BRCM SAI 4.5.3.1-5 picked up SAI fixes for several CSP cases (#9003) 2021-10-19 14:08:31 -07:00
judyjoseph
676793b8ee
Port PR:https://github.com/Azure/sonic-buildimage/pull/8002 (#8851)
to 202012 branch
2021-10-19 13:47:42 -07:00
vdahiya12
63083b2512
[202012][sonic-platform-common] submodule update (#9006)
This PR updates the following commits

0770dec Add retry reading/setting mux status to simulated y-cable driver (#221)

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2021-10-19 10:02:02 -07:00
Saikrishna Arcot
8555e41d5f redis-dump-load: Pin the redis package to use 3.5.3 (#9001)
Redis 4.0.0b1 has been uploaded to pip as a prerelease version. This
version drops support for Python 2 and only supports Python 3. Because
setup.py is being run, it will use the latest version of a package and
not the latest stable version (which is still 3.5.3).

Therefore, pin the redis package to version 3.5.3, so that it will work
for both Python 2 and 3.

#### How to verify it

Make sure that redis-dump-load for Python 2 builds today.
2021-10-19 05:31:36 -07:00
xumia
967d8abe07 Fix failed to download cisco artifacts issue (#8942)
Why I did it
Fix the failure to download cisco artifacts issue
2021-10-15 00:40:10 +00:00
Ying Xie
f1d5aaced0 [copp] bind copp-config.service to sonic.target (#8969)
copp-config service needs to be started after sonic.target so that it could
render the copp-config with the latest information.

It also needs to be restarted when config reload or load_minigraph is invoked.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2021-10-15 00:40:05 +00:00
Xichen96
dfc4bc540e [determine-reboot-cause] delay execution (#8935)
Since database.service has been moved to execute after rc-local.service,
and determine-reboot-cause.service rely on database.service, we have to
specify that in "After=".

Signed-off-by: Xichen Lin <xichenlin@microsoft.com>

Co-authored-by: Xichen Lin <xichenlin@microsoft.com>
2021-10-15 00:39:54 +00:00
zzhiyuan
94456b1680
[202012] [Arista] Enable syseepromd update database (with 202012 fix) (#8963)
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
The previous PR #8914 was reverted due to crashing the 202012 syseepromd.

How I did it
Tested the 202012 image with change and fixed the disparity between master and 202012.

How to verify it
Run the built image on the dut and syseepromd will not crash, and in redis-cli can fetch the eeprom information.
2021-10-14 16:43:57 -07:00