Commit Graph

6894 Commits

Author SHA1 Message Date
Stepan Blyshchak
70e2ea1e87
[202205][Mellanox] Place FW binaries under platform directory instead of squashfs (#13838)
Fixes #13568
Backport of #13837

Upgrade from old image always requires squashfs mount to get the next image FW binary. This can be avoided if we put FW binary under platform directory which is easily accessible after installation:

admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
/host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa
lrwxrwxrwx 1 root root 66 Feb  8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa

- Why I did it
202211 and above uses different squashfs compression type that 201911 kernel can not handle. Therefore, we avoid mounting squashfs altogether with this change.

- How I did it
Place FW binary under /host/image-/platform/mlnx/, soft links in /etc/mlnx are created to avoid breaking existing scripts/automation.
/etc/mlnx/fw-SPCX.mfa is a soft link always pointing to the FW that should be used in current image
mlnx-fw-upgrade.sh is updated to prefer /host/image-/platform/mlnx location and fallback to /etc/mlnx in squashfs in case new location does not exist. This is necessary to do image downgrade.

- How to verify it
Upgrade from 201911 to 202205
202205 to 201911 downgrade
202205 -> 202205 reboot
ONIE -> 202205 boot (First FW burn)

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-02-22 17:40:50 +02:00
Judy Joseph
3c81038bc2 To fix the test failure when porting PR to 202205 2023-02-22 16:37:36 +08:00
judyjoseph
bc8b34c49a Voq Chassis: Add the Recirc ports to the INTERFACES table to make it routed intf (#13779)
* VOQ: Add the Recirc ports to the INTERFACES table to make it routed intf

* Add a test to cover Recir port generation in INTERFACE table
2023-02-22 16:37:36 +08:00
mssonicbld
1e15f00a4f
[Mellanox] Check system eeprom existence in a retry manner (#13884) (#13915) 2023-02-22 13:43:03 +08:00
mssonicbld
b3bcd3d08b
[Mellanox] Fix issue: cannot find label port for logical port when logical port number is larger than 64 (#13710) (#13908)
- Why I did it
sfp_event.py gets a PMPE message when a cable event is available. In PMPE message, there is no label port available. Current sfp_event.py is using sx_api_port_device_get to get 64 logical ports attributes, and find the label port from those 64 attributes. However, if there are more than 64 ports, sfp_event.py might not be able to find the label port and drop the PMPE message.

- How I did it
Don't use hardcoded 64, get logical port number instead.

- How to verify it
Manual test

Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
2023-02-21 18:15:06 -08:00
Stepan Blyshchak
b51de79ffc [systemd-sonic-generator] Fix overlapping strings being passed to strcpy/strcat (#13647)
#### Why I did it

Fix an issue that services do not start automatically on first boot and start only after hostcfgd enables them.
This is due to a bug in systemd-sonic-generator:

```
admin@arc-switch1004:~$ /usr/lib/systemd/system-generators/systemd-sonic-generator dir
Failed to open file /usr/lib/systemd/system/database.servcee
Error parsing targets for database.servcee
Error parsing database.servcee
Failed to open file /usr/lib/systemd/system/bgp.servcee
Error parsing targets for bgp.servcee
Error parsing bgp.servcee
Failed to open file /usr/lib/systemd/system/lldp.servcee
Error parsing targets for lldp.servcee
Error parsing lldp.servcee
Failed to open file /usr/lib/systemd/system/swss.servcee
Error parsing targets for swss.servcee
Error parsing swss.servcee
Failed to open file /usr/lib/systemd/system/teamd.servcee
Error parsing targets for teamd.servcee
Error parsing teamd.servcee
Failed to open file /usr/lib/systemd/system/syncd.servcee
Error parsing targets for syncd.servcee
Error parsing syncd.servcee
```

A wrong file name is generated (e.g database.**servcee**).

#### How I did it

Fixed overlapping strings being passed to strcpy/strcat that receive restirct* pointers (strings should not overlap).

#### How to verify it

Perform first boot and observe services start immidiatelly after boot.
2023-02-22 10:00:56 +08:00
Ying Xie
602ffb135a
[202205][linkmgrd][utilities][swss][swss-common][platform-daemons][linux-kernel] advance submodule head (#13906)
linkmgrd:
* 3e7a9df 2023-02-19 | [active-active] Toggle to standby if default route is missing (#171) (HEAD -> 202205) [Longxiang Lyu]
* 8ab1b2b 2023-02-15 | [active-active] fix issue that interfaces get stuck in `active` if service starts up with link state down (#169) [Jing Zhang]
* df862ad 2023-02-11 | Fix mux config when gRPC connection is lost (#166) [Longxiang Lyu]

utilities:
* 8aa7930c 2023-02-13 | [portstat CLI] don't print reminder if use json format (#2670) (HEAD -> 202205, github/202205) [wenyiz2021]
* 4e3bb6fa 2023-02-21 | Add "show fabric reachability" command. (#2672) [jfeng-arista]
* 3587a94b 2023-02-18 | [202205][dhcp_relay] Remove add field of vlanid to DHCP_RELAY table while adding vlan (#2680) [Yaqiang Zhu]
* 4f07f7f0 2023-02-10 | Skip saidump for Spine Router as this can take more than 5 sec (#2637) (#2671) [kenneth-arista]
* e61c5ec4 2023-02-10 | [vlan] Refresh dhcpv6_relay config while adding/deleting a vlan (#2660) (#2669) [Yaqiang Zhu]

swss:
* 1bbf725 2023-02-14 | [Workaround] EvpnRemoteVnip2pOrch warmboot check failure (#2626) (HEAD -> 202205) [jcaiMR]
* 380f72b 2023-02-20 | Support for tc-dot1p and tc-dscp qosmap (#2559) [Divya Mukundan]
* dbf6fcc 2022-11-01 | Added LAG member check on addLagMember() (#2464) [Andriy Kokhan]

swss-common:
* b31391b 2023-02-21 | Prevent sonic-db-cli generate core dump (#749) (HEAD -> 202205) [Hua Liu]
* 16ff689 2022-12-13 | Support for TC-DOT1p qos map (#721) [Divya Mukundan]

platform-daemons:
* fb92af4 2023-02-09 | [ycabled] add more coverage to ycabled; add minor name change for vendor API CLI return key-values pairs (#338) (HEAD -> 202205) [vdahiya12]

linux-kernel:
* 4e62401 2023-02-09 | Update linux kernel for hw-mgmt V.7.0020.4104 (#305) (HEAD -> 202205) [Stephen Sun]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2023-02-22 00:45:03 +00:00
mssonicbld
cfb9877372
[Nokia][sonic-platform] Update Nokia sonic-platform submodule (#13522) (#13909) 2023-02-22 07:18:09 +08:00
mssonicbld
6d75f80856
[Build] Change the default mirror version config file (#13786) (#13903)
Why I did it
Change the mirror config file
Use the files/build/versions/default/versions-mirror only when reproducible build enabled.
The config in files/build/versions is only for reproducible build, while snapshot mirror feature does not have the dependency on the reproducible build.

How I did it
Skip the mirror config in files/build/versions/default/versions-mirror if reproducible build not enabled.

How to verify it

Co-authored-by: xumia <59720581+xumia@users.noreply.github.com>
2023-02-21 15:07:21 -08:00
Ashwin Srinivasan
57139110ca Added libpci and pciutils to the pmon docker (#12684)
This enables the pcied daemon to call the corresponding system commands needed for pci transactions
2023-02-22 06:34:40 +08:00
zhixzhu
8b5a42794d
set cable length of backplane ports to 1m (#13279)
* set cable length of backplane ports to 1m

Signed-off-by: Zhixin Zhu <zhixzhu@cisco.com>

* add UT for cable length

Signed-off-by: Zhixin Zhu <zhixzhu@cisco.com>

* correct argument format

---------

Signed-off-by: Zhixin Zhu <zhixzhu@cisco.com>
2023-02-21 22:14:53 +00:00
mssonicbld
9bf90a5f2e
Use tmpfs for /var/log on Arista 7050CX3-32S (#13805) (#13843)
This is to reduce writes to the SSD on the device.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Co-authored-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-02-21 13:25:35 -08:00
mssonicbld
9cbd4ddd05
[Build] Remove the additional space character in the mirrors.list file (#13812) (#13904) 2023-02-22 04:34:21 +08:00
Samuel Angebault
638fdd0e93 [Arista] Disable ATA NCQ for a few products (#13739)
Why I did it
Some products might experience an occasional IO failure in the communication between CPU and SSD.
Based on some research it could be attributable to some device not handling ATA NCQ (Native Command Queue).

This issue currently affect 4 products:

DCS-7170-32C*
DCS-7170-64C
DCS-7060DX4-32
DCS-7260CX3-64

How I did it
This change disable NCQ on the affected drive for a small set of products.

How to verify it
When the fix is applied, these 2 patterns can be found in the dmesg.
ata1.00: FORCE: horkage modified (noncq)
NCQ (not used)

Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4

with NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA)

   READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec
  WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec
without NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used))

   READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec
  WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec
Which release branch to backport (provide reason below if selected)
2023-02-22 04:34:01 +08:00
Samuel Angebault
3c3a4ac517 [Arista] update sensors.conf to ignore sensors (#12529)
Why I did it
The sensors and sensord processes were reporting data on unused sensors.
This lead to ALARM messages or erroneous values that could be misinterpreted.

How I did it
Ignore the affected sensors in the sensors.conf

How to verify it
Check that there are no longer ALARM messages from sensord in the syslog or in the output of sensors
2023-02-22 04:33:53 +08:00
xumia
ccfb13aa2a [Build] Support j2 template for debian sources for docker ptf (#13198)
Change to use the sources.list from the file generated from the j2 template
2023-02-22 04:33:49 +08:00
Stepan Blyshchak
b5be0da272 [dockerd] Force usage of cgo DNS resolver (#13649)
Go's runtime (and dockerd inherits this) uses own DNS resolver implementation by default on Linux.
It has been observed that there are some DNS resolution issues when executing ```docker pull``` after first boot.

Consider the following script:

```
admin@r-boxer-sw01:~$ while :; do date; cat /etc/resolv.conf; ping -c 1 harbor.mellanox.com; docker pull harbor.mellanox.com/sonic/cpu-report:1.0.0 ; sleep 1; done
Fri 03 Feb 2023 10:06:22 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.99 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.989/5.989/5.989/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:57245->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:23 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.56 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.561/5.561/5.561/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:53299->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:24 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.78 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.783/5.783/5.783/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:55765->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:25 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=7.17 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.171/7.171/7.171/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:44877->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:26 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.66 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.656/5.656/5.656/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:54604->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:27 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=8.22 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.223/8.223/8.223/0.000 ms
1.0.0: Pulling from sonic/cpu-report
004f1eed87df: Downloading [===================>                               ]   19.3MB/50.43MB
5d6f1e8117db: Download complete
48c2faf66abe: Download complete
234b70d0479d: Downloading [=========>                                         ]  9.363MB/51.84MB
6fa07a00e2f0: Downloading [==>                                                ]   9.51MB/192.4MB
04a31b4508b8: Waiting
e11ae5168189: Waiting
8861a99744cb: Waiting
d59580d95305: Waiting
12b1523494c1: Waiting
d1a4b09e9dbc: Waiting
99f41c3f014f: Waiting
```

While /etc/resolv.conf has the correct content and ping (and any other utility that uses libc's DNS resolution implementation) works correctly
docker is unable to resolve the hostname and falls back to default [::1]:53. This started to happen after PR https://github.com/sonic-net/sonic-buildimage/pull/13516 has been merged.
As you can see from the log, dockerd is able to pick up the correct /etc/resolv.conf only after 5 sec since first try. This seems to be somehow related to the logic in Go's DNS resolver
https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go#L385.

There have been issues like that reported in docker like:
  - https://github.com/docker/cli/issues/2299
  - https://github.com/docker/cli/issues/2618
  - https://github.com/moby/moby/issues/22398

Since this starts to happen after inclusion of resolvconf package by
above mentioned PR and the fact I can't see any problem with that (ping,
nslookup, etc. works) the choice is made to force dockerd to use cgo
(libc) resolver.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-02-22 04:33:44 +08:00
mssonicbld
e86f8fa31e
Add PYTHON3_SWSSCOMMON as build time dependency to Mellanox platform API (#13847) (#13905) 2023-02-22 02:34:55 +08:00
suresh-rupanagudi
d1d5bce4f6
cherry-picked snmp sonic-yang file from 2bb8306d8e (#13896) 2023-02-21 08:31:50 -08:00
mssonicbld
25ead73d10
[ci/build]: Upgrade SONiC package versions (#13893) 2023-02-21 19:42:54 +08:00
xumia
c7c59ee8c7
[Build] Clean up the debian preference config file (#13886) 2023-02-21 05:52:33 +00:00
mssonicbld
0469a2a02f
[ci/build]: Upgrade SONiC package versions (#13881) 2023-02-19 18:47:51 +08:00
mssonicbld
7521705bb8
[ci/build]: Upgrade SONiC package versions (#13877) 2023-02-18 19:09:33 +08:00
Samuel Angebault
aa912ec925
[202205][Arista] Update platform library submodules (#13871)
add SEU reporting on chassis
fix fallback logic for Clearlake eeprom identification
fix fan speed reporting for a specific model
move pcie timeout configuration for Upperlake in platform code (deprecates hwsku-init)
2023-02-17 13:52:14 -08:00
Yaqiang Zhu
928aad1944 [dhcp_relay] Remove exist check while adding dhcpv6 relay (#13822)
Why I did it
DHCPv6 relay config entry is not useful while del dhcpv6 relay config.

How I did it
Remove dhcpv6_relay entry if it is empty and not check entry exist while adding dhcpv6 relay
2023-02-17 20:53:42 +08:00
Richard.Yu
cf5ca9d27c
[SAI-PTF][mlnx]Enable saiserver test container on mlnx platforms (#13311)
* Why I did it
Enable Test sai api on bfn container with a lightweight container(saiserver).
[SAI-PTF][mlnx]Enable saiserver test container on mlnx container

How I did it
enable saiserver container on mlnx platform.

add docker-saiserver-mlnx.mk for building saiserver container
in platform/barefoot/docker-saiserver-mlnx, add necessary files that needs in saiserver container
How to verify it

Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
2023-02-16 15:42:12 -08:00
mssonicbld
e170a4b8a1
[ci/build]: Upgrade SONiC package versions (#13840) 2023-02-17 07:40:37 +08:00
mssonicbld
e44b255555
[DX010 platform] fix dx010 platform testcase issues (#13595) (#13778)
Why I did it
1. fix chassis test_set_fans_led case
2. fix chassis get_name case mismatch issue
3. fix fan_drawer test_set_fans_speed
4. fix component test_components test case

How I did it
Add corresponding configuration into chassis json file

How to verify it
Run platform tests cases to verify these failure cases

Co-authored-by: Ikki Zhu <79439153+qnos@users.noreply.github.com>
2023-02-10 18:18:00 -08:00
mssonicbld
49aa8776d4
Add lsof and sysstat packages to the base system for debugging purposes (#13741) (#13777)
The lsof and sysstat packages make determining what files/sockets a
program has open a bit easier. This helps if, for example, some
application has a file open that's been deleted from disk.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Co-authored-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-02-10 15:17:39 -08:00
mssonicbld
cdbdf95e70
fix platform.json on Wolverine for thermal sensors (#13524) (#13748)
Why I did it
The current platform.json contains entries for thermals and SFPs that do not exist on Wolverine.

How I did it
I removed the incorrect entries.

How to verify it
Verify using applicable sonic-mgmt platform API tests.

Co-authored-by: Patrick MacArthur <patrick@patrickmacarthur.net>
2023-02-10 14:39:00 -08:00
mssonicbld
8dc3a40685
Fix issue: ERR healthd: Get unit status determine-reboot-cause-'LoadState' (#13697) (#13751)
- Why I did it
Fix issue: ERR healthd: Get unit status determine-reboot-cause-'LoadState'. The error log is only seen on shutdown flow such as fast-reboot/warm-reboot.

In shutdown flow, 'LoadState' might not be available in systemctl status output, using [] might cause a KeyError.

- How I did it
Use dict.get instead of []

- How to verify it
Manual test

Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
2023-02-10 14:38:48 -08:00
mssonicbld
ff232d67bd
add sfp get error description (#13275) (#13746)
Why I did it
Command "sudo sfputil show error-status -hw" shows "OK (Not implemented)" in the output.

How I did it
Add a new SFP API get_error_description support in Nokia sonic-platform sfp.py module.

How to verify it
Run the new image and execute command "sudo sfputil show error-status -hw"

Co-authored-by: Pavan-Nokia <120486223+Pavan-Nokia@users.noreply.github.com>
2023-02-10 14:37:32 -08:00
shdasari
0af9feb2ec
SONiC YANG model support for RADIUS. (#13760)
Why I did it
Added SONiC YANG model for RADIUS.

How I did it
Added the RADIUS and RADIUS_SERVER tables for global and per RADIUS server configuration. RADIUS statistics reside in COUNTERS_DB and are not part of the configuration. These are not a part of this PR.

How to verify it
Compiled sonic_yang_mgmt-1.0-py3-none-any.whl.
2023-02-10 12:43:27 -08:00
xumia
c9806ec3c3
[Build][202211] Support Debian snapshot mirror to improve build stability (#13371) (#13382)
Why I did it
Cherry pick from #13097
[Build] Support Debian snapshot mirror to improve build stability

It is to enhance the reproducible build, supports the Debian snapshot mirror. It guarantees all the docker images using the same Debian mirror snapshot and fixes the temporary build failure which is caused by remote Debain mirror indexes changed during the build. It is also to fix the version conflict issue caused by no fixed versions of some of the Debian packages.

How I did it
Add a new feature to support the Debian snapshot mirror.

How to verify it
2023-02-10 09:33:54 -08:00
mssonicbld
df0685fb19
[Arista] Add emmc quirks in boot0 to improve reliability (#10013) (#13743)
Why I did it
Fix some unreliability seen on emmc device with some AMD CPUs

How I did it
Added a kernel parameter to add quirks to
It depends on a sonic-linux-kernel change to work properly but will be a no-op without it.
The quirk added is SDHCI_QUIRK2_BROKEN_HS200 used to downgrade the link speed for the eMMC.

Co-authored-by: Samuel Angebault <staphylo@arista.com>
2023-02-10 09:28:24 -08:00
Liu Shilong
4f362dcaa4
[build] Check if patches are applied before applying patches. (#13566) (#13761)
Why I did it
If make fails, we can't rerun the make process, because existing patches can't apply again.
2023-02-10 09:26:36 -08:00
mssonicbld
7dfe240ef4
Set 'origin' and 'AS Path' for T1 SLB routes (#13613) (#13753)
* set origin and as-path prepend for routes from SLB

Co-authored-by: jcaiMR <111116206+jcaiMR@users.noreply.github.com>
2023-02-10 09:24:12 -08:00
mssonicbld
ae67185253
[armhf][Nokia-7215]High CPU caused by entropy.py (#13694) (#13752)
Why I did it
High CPU utilization by entropy.py

How I did it
Remove entropy script as it does not work anymore and is no longer needed for bullseye(202205).
In Buster(202012) the max available poolsize (entropy_avail) for entropy is 4096 and our entropy.py script was based on this value. With the change in kernel to bullseye on 202205 this entropy poolsize was changed to 256 which also causes our script to fail.

This script was initially added to provide SW assistance to improve the system entropy value available early on in the Sonic boot sequence on buster.
On bullseye (Linux kernel 5.10) this is no longer needed as this feature has been improved.

How to verify it
run "top" command to check CPU usage.

Co-authored-by: Pavan-Nokia <120486223+Pavan-Nokia@users.noreply.github.com>
2023-02-10 09:23:39 -08:00
mssonicbld
f5656d1aad
[Mellanox][sai_failure_dump]Added platform specific script to be invoked during SAI failure dump (#13533) (#13749)
- Why I did it
Added platform specific script to be invoked during SAI failure dump. Added some generic changes to mount /var/log/sai_failure_dump as read write in the syncd docker

- How I did it
Added script in docker-syncd of mellanox and copied it to /usr/bin

- How to verify it
Manual UT and new sonic-mgmt tests

Co-authored-by: Sudharsan Dhamal Gopalarathnam <dgsudharsan@users.noreply.github.com>
2023-02-10 09:23:10 -08:00
mssonicbld
d623dd2fca
Increase PikeZ varlog size (#13550) (#13750)
Why I did it
To address error sometimes seen when running sonic-mgmt test_stress_routes.py::test_announce_withdraw_route on 720DT-48S

How I did it
Update boot0 logic to set platform specific varlog size for 720DT-48S

How to verify it
Verified that /var/log size increased and error is no longer observed when running test

Co-authored-by: andywongarista <78833093+andywongarista@users.noreply.github.com>
2023-02-10 09:20:36 -08:00
mssonicbld
268e866c02
[Celestica DX010] fix fan drawer and watchdog platform testcase issues (#13426) (#13747)
Why I did it
fix DX010 fan drawer and watchdog platform test case issues

How I did it
1. Add fan_drawer get_maximum_consumed_power support
2. Adjust maximum watchdog timeout value check

How to verify it
Run test_fan_drawer and test_watchdog test cases.

Co-authored-by: Ikki Zhu <79439153+qnos@users.noreply.github.com>
2023-02-10 09:19:38 -08:00
Ying Xie
5e52b92d0a
[202205][linkmgrd][utilities][swss][swss-common][sairedis][platform-daemons] advance submodule head (#13755)
linkmgrd:
* e191338 2023-02-10 | Fix the warning of unused variables (#167) (HEAD -> 202205) [Longxiang Lyu]

utilities:
* 2c933b0a 2023-02-07 | [sai_failure_dump]Invoking dump during SAI failure (#2633) (HEAD -> 202205) [Sudharsan Dhamal Gopalarathnam]
* e949f318 2023-02-07 | [show] add support for gRPC show commands for `active-active` (#2629) [vdahiya12]
* 77723927 2023-01-30 | Fixed admin state config CLI for Backport interfaces (#2557) [anamehra]
* 32b1d4d6 2023-02-01 | [masic support] 'show run bgp' support for multi-asic (#2427) [wenyiz2021]
* a2252d8a 2022-10-11 | Filter port invalid MTU configuration (#2378) [pettershao-ragilenetworks]
* 0ffb4b6a 2023-02-09 | Add Transceiver PM basic CLI support to show output from TRANSCEIVER_PM table for ZR (#2655) (github/202205) [longhuan-cisco]
* 496a0774 2023-02-09 | Add asic id for linecards so "show fabric counters queue/port" can work for single chip systems (#2656) [jfeng-arista]
* 2591e8b5 2023-02-03 | multi asic support for show queue counter (#2647) [zhixzhu]

swss:
* e0373a4 2023-02-07 | [autoneg]Fixing adv interface types to be set when AN is disabled (#2638) (HEAD -> 202205, github/202205) [Sudharsan Dhamal Gopalarathnam]
* 62a09a0 2023-02-09 | [sai_failure_dump]Invoking dump during SAI failure (#2644) (#2661) [Sudharsan Dhamal Gopalarathnam]
* 076f63e 2023-02-08 | [202205] Revert "Revert "[voq][chassis]Add show fabric counters port/queue commands (#2522)" (#2612)" (#2655) [kenneth-arista]
* a35e074 2023-02-06 | [202205][voq][chassis] Remove created ports from the default vlan. (#2651) [arista-nwolfe]

swss-common:
* b9d4284 2023-02-08 | [202205] Fix epoll and socket resource leak issue. (#651) (#741) (github/202205) [Kevin Petremann]

sairedis:
* 9d8e731 2023-02-08 | [Mellanox] Enable DSCP remapping by using SAI attribute (#1188) (HEAD -> 202205, github/202205) [Stephen Sun]
* 272a8bd 2023-02-10 | Fixing race condition for rif counters #1136 (#1202) [Suman Kumar]
* 211365a 2023-02-08 | [202205][submodule][SAI]Advance SAI header (#1207) [Richard.Yu]
* 939c14b 2023-02-08 | [Submodule][upgrade]Upgrade SAI submodule (#1203) [Richard.Yu]

platform-daemons:
* e5ccd40 2022-10-03 | [ycabled] fix naming error for error condition for CLI handling (#302) (HEAD -> 202205, github/202205) [vdahiya12]
* cdd354d 2022-09-29 | [ycabled] add some exception catching logic to some vendor specific API's (#301) [vdahiya12]
* cf58c08 2023-02-01 | Chassisd do an explicit stop of the config_manager (#328) (#336) [judyjoseph]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2023-02-10 09:17:59 -08:00
mssonicbld
06aa8aa11b
[Mellanox] Support DSCP remapping in dual ToR topo on T0 switch (#12605) (#13745)
- Why I did it
Support DSCP remapping in dual ToR topo on T0 switch for SKU Mellanox-SN4600c-C64, Mellanox-SN4600c-D48C40, Mellanox-SN2700, Mellanox-SN2700-D48C8.

- How I did it
Regarding buffer settings, originally, there are two lossless PGs and queues 3, 4. In dual ToR scenario, the lossless traffic from the leaf switch to the uplink of the ToR switch can be bounced back.
To avoid PFC deadlock, we need to map the bounce-back lossless traffic to different PGs and queues. Therefore, 2 additional lossless PGs and queues are allocated on uplink ports on ToR switches.

On uplink ports, map DSCP 2/6 to TC 2/6 respectively
On downlink ports, both DSCP 2/6 are still mapped to TC 1
Buffer adjusted according to the ports information:
Mellanox-SN4600c-C64:
56 downlinks 50G + 8 uplinks 100G
Mellanox-SN4600c-D48C40, Mellanox-SN2700, Mellanox-SN2700-D48C8:
24 downlinks 50G + 8 uplinks 100G

- How to verify it
Unit test.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
2023-02-10 09:16:56 -08:00
mssonicbld
c5998be1e2
[Arista] Add other chassis names to platform_components.json for 720DT-48S (#12378) (#13744)
Why I did it
The 720DT-48S platform has variants with different chassis names, and these need to all be included in platform_components.json to ensure that sonic-mgmt platform_tests/fwutil/test_fwutil.py::test_fwutil_show passes

How I did it
Updated platform_components.json with the variant names for 720DT-48S.

How to verify it
Ran aforementioned testcase and verified that it passes on the different variants.

Co-authored-by: andywongarista <78833093+andywongarista@users.noreply.github.com>
2023-02-10 09:15:57 -08:00
Liu Shilong
648ce4b12d
[ci] Kill hanged docker build process to avoid build timeout issue. (#13729)
Why I did it
Docker build has a low rate of hanging up.
It hangs on different steps. So, it looks like a bug in docker daemon.

How I did it
Start a daemon process to scan running time more than 1 hours, and kill the process.
2023-02-10 09:14:28 -08:00
Tejaswini Chadaga
868a1d8e39
Update BRCM SAI version 7.1.32.4-1 (#13715)
Why I did it
Update DNX SAI to include workaround for CS00012275389

How I did it
Updated SAI debian

How to verify it
Basic validation on DNX platform
2023-02-10 09:13:04 -08:00
zitingguo-ms
b3424ea2e6
[202205][Marvell-armhf] Build saiserverv2 docker on marvell-armhf (#13713)
* Enable marvell-armhf saiserver docker

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>

* fix libsaithriift build env

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>

* fix thrift 014 dependent issue in armhf

* fix build env

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>

* [sai_ptf]fix thrift armhf build
in armhf buidl failed as no python command

how
add a checker for different python command, python/python3 and base on result use the right command

verify
container build

* [Thrift_014[armhf]]Fix libboost_unit_test_framework.a not found during build

Why
error happen build thirft in armhf

How
fix this issue, add a soft link for the dependent file

Verify
Build pipeline

Signed-off-by: richardyu-ms <richard.yu@microsoft.com>

* add metadata dependence

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>

* change build pipeline

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>

---------

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
Co-authored-by: richardyu-ms <richard.yu@microsoft.com>
2023-02-10 09:12:12 -08:00
Oleksandr Ivantsiv
d1fa414f1b
Clear DNS configuration received from DHCP during networking reconfiguration in Linux. (#13516) (#13695)
- Why I did it
fixes #12907

When the management interface IP address configuration changes from dynamic to static the DNS configuration (retrieved from the DHCP server) in /etc/resolv.conf remains uncleared. This leads to a DNS configuration pointing to the wrong nameserver. To make the behavior clear DNS configuration received from DHCP should be cleared.

- How I did it
Use resolvconf package for managing DNS configuration. It is capable of tracking the source of DNS configuration and puts the configuration retrieved from the DHCP servers into a separate file. This allows the implementation of DNS configuration cleanup retrieved from DHCP during networking reconfiguration.

- How to verify it
Ensure that the management interface has no static configuration.
Check that /etc/resolv.conf has DNS configuration.
Configure a static IP address on the management interface.
Verify that /etc/resolv.conf has no DNS configuration.
Remove the static IP address from the management interface.
Verify that /etc/resolv.conf has DNS configuration retrieved form DHCP server.
2023-02-10 09:11:05 -08:00
xumia
7642f4c07f
[Security][202205] Upgrade the openssl version to 1.1.1n-0+deb11u4+fips #13737 (#13759)
* [Security] Upgrade the openssl version to 1.1.1n-0+deb11u4+fips (#13737)

Why I did it
[Security] Upgrade the openssl version to 1.1.1n-0+deb11u4+fips

f6df7303d8 Update expired certs.
84540b59c1 CVE-2022-2068
f763d8a93e Prepare 1.1.1n-0+deb11u2
576562cebe CVE-2022-1292
How I did it
Upgrade the OpenSSL version

* [Security] Upgrade OpenSSL version for armhf
2023-02-10 12:01:22 +00:00
Marty Y. Lok
e6fde1d9e5
[Nokia][devicedata] Modified the port autoneg default setting for Nokia-7215 platform (#13459)
Why I did it
autoneg is not supported in the previous release 202012 on Nokia-7215 platform. To migrate to the 202205 image with autoneg support, we need to disable the autoneg to allow the link to be up when issue load minigraph. This requires to change the autoneg setting to be off in the port_config.ini file.

How I did it
Modify the port_config.ini to set the autoneg off.

How to verify it
Running the new image, with load mingraph.xml, execute "show int autonet status" should show autoneg disabled
2023-02-09 13:13:42 -08:00