Commit Graph

7105 Commits

Author SHA1 Message Date
Stephen Sun
1cd090277f
[Mellanox] Non upstream patches for hw-mgmt V.4.0020.4104 (#13792)
- Why I did it
Add non-upstream kernel patches for the Nvidia platforms
These patches are not yet upstream but needed for new technology.
A flow to upstream them is in progress and once they will be approved they will be moved officially to sonic-linux-kernel.
Till then to include them in the build (not must) the build option INCLUDE_EXTERNAL_PATCH_TAR=y should be included

- How I did it
Zip all the patches in to a tar.gz tarball.

- How to verify it
Manually test

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-19 09:44:56 +02:00
Oleksandr Ivantsiv
a82e45a192
[DNS] yang model for static DNS (#13834)
- Why I did it
Add SONiC YANG model for DNS to provide the possibility to configure static DNS entries in Config DB.

- How I did it
Added sonic-dns.yang file that contains the YANG model for the static DNS configuration.

- How to verify it
This PR extends YANG model tests to cover DNS configuration.
To run the test sonic_yang_models-1.0-py3-none-any.whl should be compiled.
2023-02-19 09:43:17 +02:00
Stepan Blyshchak
a81ffeb5c2
[systemd-sonic-generator] Fix overlapping strings being passed to strcpy/strcat (#13647)
#### Why I did it

Fix an issue that services do not start automatically on first boot and start only after hostcfgd enables them.
This is due to a bug in systemd-sonic-generator:

```
admin@arc-switch1004:~$ /usr/lib/systemd/system-generators/systemd-sonic-generator dir
Failed to open file /usr/lib/systemd/system/database.servcee
Error parsing targets for database.servcee
Error parsing database.servcee
Failed to open file /usr/lib/systemd/system/bgp.servcee
Error parsing targets for bgp.servcee
Error parsing bgp.servcee
Failed to open file /usr/lib/systemd/system/lldp.servcee
Error parsing targets for lldp.servcee
Error parsing lldp.servcee
Failed to open file /usr/lib/systemd/system/swss.servcee
Error parsing targets for swss.servcee
Error parsing swss.servcee
Failed to open file /usr/lib/systemd/system/teamd.servcee
Error parsing targets for teamd.servcee
Error parsing teamd.servcee
Failed to open file /usr/lib/systemd/system/syncd.servcee
Error parsing targets for syncd.servcee
Error parsing syncd.servcee
```

A wrong file name is generated (e.g database.**servcee**).

#### How I did it

Fixed overlapping strings being passed to strcpy/strcat that receive restirct* pointers (strings should not overlap).

#### How to verify it

Perform first boot and observe services start immidiatelly after boot.
2023-02-17 17:31:14 -08:00
Samuel Angebault
8437e893b4
[Arista] Update platform library submodules (#13870)
add SEU reporting on chassis
fix fallback logic for Clearlake eeprom identification
fix fan speed reporting for a specific model
move pcie timeout configuration for Upperlake in platform code (deprecates hwsku-init)
2023-02-17 13:51:17 -08:00
Saikrishna Arcot
56d732a0a0
Use tmpfs for /var/log on Arista 7050CX3-32S (#13805)
This is to reduce writes to the SSD on the device.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-02-16 19:13:39 -08:00
rupesh-k
fd49bc1f27
[Yang-models] Modify mirror yang model src_port to match CONFIG_DB (#13578)
Fixes Mirror Yang model src_port to match CONFIG_DB

#### Why I did it
Fixes issue https://github.com/sonic-net/sonic-buildimage/issues/12397#issuecomment-1343109874
2023-02-16 11:22:37 -08:00
Liu Shilong
22e46207c8
[ci] Kill hanged docker build process to avoid build timeout issue. (#13726)
Why I did it
Docker build has a low rate of hanging up.
It hangs on different steps. So, it looks like a bug in docker daemon.

How I did it
Start a daemon process to scan running time more than 1 hours, and kill the process.

How to verify it
2023-02-16 21:58:14 +08:00
Kebo Liu
aee97a69c6
[Mellanox] Enhance MFT make file to download source code from any valid URL (#13801)
- Why I did it
Currently, when building MFT, it can only download the source code from the official download site: http://www.mellanox.com/downloads/MFT/, it's not possible to integrate an internal version that has not been officially released yet.

The intention of this PR is to make it possible to download the source code from any valid link.

- How I did it
Add a new parameter "MLNX_MFT_INTERNAL_SOURCE_BASE_URL", if an URL is given, it will download the source code from the given URL, otherwise, it downloads from the default official site.

- How to verify it
Specify a valid URL in the make file, the MFT debs should be built successfully.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-02-16 09:51:04 +02:00
ganglv
e673c1dcaf
Add yang models for GNMI #13716
Why I did it
Add missing yang models.

How I did it
Add sonic-gnmi.yang and unit test.

How to verify it
Run unit test for sonic-yang-models.
2023-02-16 11:07:27 +08:00
Qi Luo
373f0919e9
[sonic-snmpagent] Update submodule (#13806)
#### Why I did it
Include below commits:
```
4622b8d 2023-02-14 | Fix: zero route may have empty nexthop (#276) [Qi Luo]
```
2023-02-15 14:15:52 -08:00
Samuel Angebault
5ce1b8e4b7
[Arista] Disable ATA NCQ for a few products (#13739)
Why I did it
Some products might experience an occasional IO failure in the communication between CPU and SSD.
Based on some research it could be attributable to some device not handling ATA NCQ (Native Command Queue).

This issue currently affect 4 products:

DCS-7170-32C*
DCS-7170-64C
DCS-7060DX4-32
DCS-7260CX3-64

How I did it
This change disable NCQ on the affected drive for a small set of products.

How to verify it
When the fix is applied, these 2 patterns can be found in the dmesg.
ata1.00: FORCE: horkage modified (noncq)
NCQ (not used)

Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4

with NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA)

   READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec
  WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec
without NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used))

   READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec
  WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec
Which release branch to backport (provide reason below if selected)
2023-02-15 10:31:59 -08:00
byu343
b0e0b23d2a
[arista] Add tuning values for phys on 7280cr3 (#10084)
Why I did it
This change specifies the tuning values for each lane of the B52 phy chips. These values can be different for different ports. The values being set are under the assumption of optical transceivers. This change depends on the change to sonic-swss: sonic-net/sonic-swss#2158.

How to verify it
We verified the values are correctly set on the B52 chips of Arista 7280cr3, by reading them from the debug cli of the B52 driver.
2023-02-15 10:25:49 -08:00
Yaqiang Zhu
c5a7ce0cf4
[dhcp_relay] Remove exist check while adding dhcpv6 relay (#13822)
Why I did it
DHCPv6 relay config entry is not useful while del dhcpv6 relay config.

How I did it
Remove dhcpv6_relay entry if it is empty and not check entry exist while adding dhcpv6 relay
2023-02-15 10:23:39 -08:00
xumia
a3225e65e4
[Build] Remove the additional space character in the mirrors.list file (#13812)
Why I did it
Fix all mirror is commented out in sources.list in slave image issue. It will have an issue when installing more packages in the slave container.

It will add additional space character after running add-apt-repository command.

For example:
The original config in /etc/apt/sources.list

#deb [arch=amd64] http://deb.debian.org/debian/ bullseye main contrib non-free
Run the following command:

add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian bullseye stable"
Then the setting changed to: (added a new space character after #)

# deb [arch=amd64] http://deb.debian.org/debian/ bullseye main contrib non-free
How I did it
Fix the regex string to add the space pattern. After fixed, whether there is a space character or not, it will not be an issue.

How to verify it
2023-02-15 22:19:01 +08:00
mathieulaunay
8a4d1b5797
[build] add an environment variable to run make reset unattended (#12207)
previously "make reset" was expecting user input from the terminal to do its job
setting UNATTENDED to any non-zero string will allow "make reset" to run without interactive confirmation

- Why I did it
When doing automated builds of SONiC images, we need to reset the working repositories between each build.

- How I did it
Adding an environment variable that is read by Makefile.work

- How to verify it
running
UNATTENDED=1 make reset 
should make an automatic reset of all working directories
2023-02-15 10:59:03 +02:00
ganglv
23dbdf525b
Enable host services #13800
Why I did it
Need to enable host service to suport GNMI native write

How I did it
Update rules/config

How to verify it
Run GNMI end2end test
2023-02-15 14:40:09 +08:00
george-deng88
3097e4258d
Fix Celestica Silverstone addDecapTunnel failed issue (#13235)
*Why I did it
     The current sonic image bringup failed on Silverstone:
     4:56:15.957705 sonic NOTICE swss#orchagent: :- addDecapTunnel: Create overlay loopback router interface oid:6000000000520
 *How I did it
     Enable bcm switch Tunnel function on Silverstone.
 *How to verify it
     The new sonic image can bringup OK on Silverstone.
2023-02-14 12:18:19 -08:00
Nazarii Hnydyn
3757199404
[submodule]: Advance sonic-sairedis submodule. (#13802)
Update sonic-sairedis submodule pointer to include the following:
* 4d86af3 [hash]: Extend VS lib with ECMP/LAG hash (#1192)
2023-02-14 11:46:05 -08:00
xumia
8206925631
[Build] Change the default mirror version config file (#13786)
Why I did it
Change the mirror config file
Use the files/build/versions/default/versions-mirror only when reproducible build enabled.
The config in files/build/versions is only for reproducible build, while snapshot mirror feature does not have the dependency on the reproducible build.

How I did it
Skip the mirror config in files/build/versions/default/versions-mirror if reproducible build not enabled.

How to verify it
2023-02-14 14:59:38 +08:00
Dror Prital
cdfc5834a5
[submodule] Advance sonic-platform-daemons pointer (#13552)
Update sonic-platform-daemons submodule pointer to include the following:
* 906d198 add data for telemtery enhancement for 'active-active' cable type ([#332](https://github.com/sonic-net/sonic-platform-daemons/pull/332))

Signed-off-by: dprital <drorp@nvidia.com>
2023-02-14 08:58:56 +02:00
Stepan Blyshchak
e5a294644c
[dockerd] Force usage of cgo DNS resolver (#13649)
Go's runtime (and dockerd inherits this) uses own DNS resolver implementation by default on Linux.
It has been observed that there are some DNS resolution issues when executing ```docker pull``` after first boot.

Consider the following script:

```
admin@r-boxer-sw01:~$ while :; do date; cat /etc/resolv.conf; ping -c 1 harbor.mellanox.com; docker pull harbor.mellanox.com/sonic/cpu-report:1.0.0 ; sleep 1; done
Fri 03 Feb 2023 10:06:22 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.99 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.989/5.989/5.989/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:57245->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:23 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.56 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.561/5.561/5.561/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:53299->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:24 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.78 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.783/5.783/5.783/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:55765->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:25 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=7.17 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.171/7.171/7.171/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:44877->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:26 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.66 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.656/5.656/5.656/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:54604->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:27 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=8.22 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.223/8.223/8.223/0.000 ms
1.0.0: Pulling from sonic/cpu-report
004f1eed87df: Downloading [===================>                               ]   19.3MB/50.43MB
5d6f1e8117db: Download complete
48c2faf66abe: Download complete
234b70d0479d: Downloading [=========>                                         ]  9.363MB/51.84MB
6fa07a00e2f0: Downloading [==>                                                ]   9.51MB/192.4MB
04a31b4508b8: Waiting
e11ae5168189: Waiting
8861a99744cb: Waiting
d59580d95305: Waiting
12b1523494c1: Waiting
d1a4b09e9dbc: Waiting
99f41c3f014f: Waiting
```

While /etc/resolv.conf has the correct content and ping (and any other utility that uses libc's DNS resolution implementation) works correctly
docker is unable to resolve the hostname and falls back to default [::1]:53. This started to happen after PR https://github.com/sonic-net/sonic-buildimage/pull/13516 has been merged.
As you can see from the log, dockerd is able to pick up the correct /etc/resolv.conf only after 5 sec since first try. This seems to be somehow related to the logic in Go's DNS resolver
https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go#L385.

There have been issues like that reported in docker like:
  - https://github.com/docker/cli/issues/2299
  - https://github.com/docker/cli/issues/2618
  - https://github.com/moby/moby/issues/22398

Since this starts to happen after inclusion of resolvconf package by
above mentioned PR and the fact I can't see any problem with that (ping,
nslookup, etc. works) the choice is made to force dockerd to use cgo
(libc) resolver.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-02-14 08:57:19 +02:00
Stephen Sun
71b5bb6f37
[Mellanox] Support per PSU slope value for PSU power threshold (#13757)
- Why I did it
Support per PSU slope value for PSU power threshold according to hardware team requirement

- How I did it
Pass the PSU number as a parameter when fetching the slope value of PSU.

- How to verify it
Running regression and manual test

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-14 08:55:28 +02:00
Junchao-Mellanox
389b279ba9
[Mellanox] set select timeout to no more than 1 sec to make sure fast shutdown (#13611)
- Why I did it
Commit sonic-net/sonic-platform-daemons@153ea47 changed SfpStateUpdateTask from Process to Thread. In this commit, it raises an exception in SfpStateUpdateTask to make shutdown flow fast. But it does not work on Nvidia platform as Nvidia platform is passing timeout parameter of get_change_event to select. Linux select function can not be interrupted by a Python exception. There is no such issue on Nvidia platform before that commit. However, in order to comply with the commit and make shutdown flow fast, we decided to change Nvidia platform API implementation.

To fix issue #13591.

- How I did it
The select call in get_change_event should use no more than 1 second as timeout parameter.
Outside the select call, add a while loop to make sure timeout parameter of get_change_event work as expected

- How to verify it
Manual test
2023-02-14 08:26:25 +02:00
Stephen Sun
3112997b5a
[Mellanox] Advance hw-mgmt to v.7.0020.4104 (#13372)
- Why I did it
Advance hw-mgmt service to V.7.0020.4100
Add missing thermal sensors that are supported by hw-mgmt package
Delay system health service before hw-mgmt has started on Mellanox platform in order to avoid reading some sensors before ready.
Depends on sonic-net/sonic-linux-kernel#305

- How I did it
1. Update hw mgmt version
2. Add missing sensors
3. Delay service 

- How to verify it
Regression test.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-12 11:23:47 +02:00
Stephen Sun
ead7925f7d
[submodule] Advance sonic-linux-kernel pointer (#13732)
464d2cdb [Mellanox] Update linux kernel for hw-mgmt V.7.0020.4104 (#305)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-12 11:20:19 +02:00
Richard.Yu
43683d8cee
[submodule][sairedis]Advance submodule head (#13724)
include sairedis changes
3a960be | [submodule][SAI]ADvance SAI Header (#1206)
7026441 | [Mellanox] Enable DSCP remapping by using SAI attribute (#1188)
a2c37b8 | [syncd]: Enable port bulk API (#1197)

include SAI changes
7710e24 | [cherry-pick][202211]Enhance the check enum lock script (#1741) (#1742)
0031470 | improve enum values integration check (#1727) (#1737)
4f11c7e | Enable github code scanning to replace LGTM. (#1709)
0fd23d2 | [SAI-PTF] Skip test when hit expected error from sai api (#1699)
aba7612 | [SAI-PTF] API Logger - reformat arg values (#1696)
1390cee | [SAI-PTF] API Logger - reformat dict in return value (#1690)
3d96a1d | [SAI-PTF]Add return value in the SAI-PTF log (#1685)

Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
2023-02-11 19:32:09 +08:00
spilkey-cisco
ad679a0338
Add asic presence filtering for container checking in system-health (#13497)
Why I did it
On a supervisor card in a chassis, syncd/teamd/swss/lldp etc dockers are created for each Switch Fabric card. However, not all chassis would have all the switch fabric cards present. In this case, only dockers for Switch Fabrics present would be created.

system-health indicates errors in this scenario as it is expecting dockers for all Switch Fabrics (based on NUM_ASIC defined in asic.conf file).

system-health process error messages were also altered to indicate which container had the issue; multiple containers may run processes with the same name, which can result in identical system-health error messages, causing ambiguity.

How I did it
Port container_checker logic from #11442 into service_checker for system-health.

How to verify it
Bringup Supervisor card with one or more missing fabric cards. Execute 'show system-health summary'. The command should not report failure due to missing dockers for the asics on the fabric cards which are not present.
2023-02-10 21:34:10 -08:00
zhixzhu
f0f7639fa2
set cable length to 1m for backplane ports (#13572)
Signed-off-by: Zhixin Zhu zhixzhu@cisco.com

Why I did it
backplane ports cable length need to be specified.

How I did it
separated handling for the specific port name.
2023-02-10 19:01:49 -08:00
mihirpat1
23fde86ec9
[utilities] Advance submodule head (#13733)
9126e7f8 Stepan Blyshchak Thu Feb 9 05:20:11 2023 +0200 [config/show] Add command to control pending FIB suppression (sonic-net/sonic-utilities#2495)
9e32962c mihirpat1 Wed Feb 8 16:39:00 2023 -0800 Add transceiver info CLI support to show output from TRANSCEIVER_INFO for ZR (sonic-net/sonic-utilities#2630)

Signed-off-by: Mihir Patel <patelmi@microsoft.com>
2023-02-10 08:36:39 -08:00
Saikrishna Arcot
14012cfcf6
Add lsof and sysstat packages to the base system for debugging purposes (#13741)
The lsof and sysstat packages make determining what files/sockets a
program has open a bit easier. This helps if, for example, some
application has a file open that's been deleted from disk.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-02-10 04:22:44 +00:00
Ikki Zhu
f6701f5fd7
[DX010 platform] fix dx010 platform testcase issues (#13595)
Why I did it
1. fix chassis test_set_fans_led case
2. fix chassis get_name case mismatch issue
3. fix fan_drawer test_set_fans_speed
4. fix component test_components test case

How I did it
Add corresponding configuration into chassis json file

How to verify it
Run platform tests cases to verify these failure cases
2023-02-09 19:07:13 -08:00
xumia
71f778f62a
[Security] Upgrade the openssl version to 1.1.1n-0+deb11u4+fips (#13737)
Why I did it
[Security] Upgrade the openssl version to 1.1.1n-0+deb11u4+fips

f6df7303d8 Update expired certs.
84540b59c1 CVE-2022-2068
f763d8a93e Prepare 1.1.1n-0+deb11u2
576562cebe CVE-2022-1292
How I did it
Upgrade the OpenSSL version
2023-02-09 13:57:50 -08:00
Pavan-Nokia
5f26b33bec
[armhf][Nokia-7215]High CPU caused by entropy.py (#13694)
Why I did it
High CPU utilization by entropy.py

How I did it
Remove entropy script as it does not work anymore and is no longer needed for bullseye(202205).
In Buster(202012) the max available poolsize (entropy_avail) for entropy is 4096 and our entropy.py script was based on this value. With the change in kernel to bullseye on 202205 this entropy poolsize was changed to 256 which also causes our script to fail.

This script was initially added to provide SW assistance to improve the system entropy value available early on in the Sonic boot sequence on buster.
On bullseye (Linux kernel 5.10) this is no longer needed as this feature has been improved.

How to verify it
run "top" command to check CPU usage.
2023-02-09 13:29:06 -08:00
andywongarista
1894e0aafe
Increase PikeZ varlog size (#13550)
Why I did it
To address error sometimes seen when running sonic-mgmt test_stress_routes.py::test_announce_withdraw_route on 720DT-48S

How I did it
Update boot0 logic to set platform specific varlog size for 720DT-48S

How to verify it
Verified that /var/log size increased and error is no longer observed when running test
2023-02-09 13:24:09 -08:00
andywongarista
c1355625ca
[Arista] Add other chassis names to platform_components.json for 720DT-48S (#12378)
Why I did it
The 720DT-48S platform has variants with different chassis names, and these need to all be included in platform_components.json to ensure that sonic-mgmt platform_tests/fwutil/test_fwutil.py::test_fwutil_show passes

How I did it
Updated platform_components.json with the variant names for 720DT-48S.

How to verify it
Ran aforementioned testcase and verified that it passes on the different variants.
2023-02-09 13:23:30 -08:00
Pavan-Nokia
e98927f354
add sfp get error description (#13275)
Why I did it
Command "sudo sfputil show error-status -hw" shows "OK (Not implemented)" in the output.

How I did it
Add a new SFP API get_error_description support in Nokia sonic-platform sfp.py module.

How to verify it
Run the new image and execute command "sudo sfputil show error-status -hw"
2023-02-09 13:17:57 -08:00
Samuel Angebault
dd7948bf17
[Arista] Add emmc quirks in boot0 to improve reliability (#10013)
Why I did it
Fix some unreliability seen on emmc device with some AMD CPUs

How I did it
Added a kernel parameter to add quirks to
It depends on a sonic-linux-kernel change to work properly but will be a no-op without it.
The quirk added is SDHCI_QUIRK2_BROKEN_HS200 used to downgrade the link speed for the eMMC.
2023-02-09 10:46:09 -08:00
Sudharsan Dhamal Gopalarathnam
4fc991e84e
[submodule] Advance sonic-utilities pointer (#13721)
Update sonic-utilities submodule pointer to include the following:
* 5007f1f0 [show] add support for gRPC show commands for  ([#2629](https://github.com/Azure/sonic-utilities/pull/2629))
* 6e0e1daf [sai_failure_dump]Invoking dump during SAI failure ([#2633](https://github.com/Azure/sonic-utilities/pull/2633))

Signed-off-by: dgsudharsan <sudharsand@nvidia.com>
2023-02-09 17:47:19 +02:00
Sudharsan Dhamal Gopalarathnam
8f8303ab29
[submodule] Advance sonic-swss pointer (#13720)
Update sonic-swss submodule pointer to include the following:
* 44ea6a0 [sai_failure_dump]Invoking dump during SAI failure ([#2644](https://github.com/Azure/sonic-swss/pull/2644))
* 065a471 [hash]: Add UT infra. ([#2660](https://github.com/Azure/sonic-swss/pull/2660))
* 9597eb7 [autoneg]Fixing adv interface types to be set when AN is disabled ([#2638](https://github.com/Azure/sonic-swss/pull/2638))

Signed-off-by: dgsudharsan <sudharsand@nvidia.com>
2023-02-09 15:59:45 +02:00
Nazarii Hnydyn
5b9f3a0b21
[doc]: YANG UM: CRLF->LF. (#13600)
#### Why I did it
* To align YANG doc format with Linux line ending

#### How I did it
* Converted: `CRLF`->`LF`
2023-02-08 17:40:05 -08:00
jcaiMR
c8cf20cd8c
Set 'origin' and 'AS Path' for T1 SLB routes (#13613)
* set origin and as-path prepend for routes from SLB
2023-02-08 13:40:05 -08:00
Patrick MacArthur
39cbd28486
fix platform.json on Wolverine for thermal sensors (#13524)
Why I did it
The current platform.json contains entries for thermals and SFPs that do not exist on Wolverine.

How I did it
I removed the incorrect entries.

How to verify it
Verify using applicable sonic-mgmt platform API tests.
2023-02-08 10:38:36 -08:00
Dror Prital
614a267bf5
[submodule] Advance sonic-linux-kernel pointer (#13707)
Update sonic-linux-kernel submodule pointer to include the following:
* 6daddcf Add Secure Boot Kernel configuration ([#298](https://github.com/sonic-net/sonic-linux-kernel/pull/298))

Signed-off-by: dprital <drorp@nvidia.com>
2023-02-08 19:46:46 +02:00
Dror Prital
11926de5a4
[submodule] Advance sonic-utilities pointer (#13706)
Update sonic-utilities submodule pointer to include the following:
* f9130d1c [db_migrator] make LOG_LEVEL_DB migration more robust ([#2651](https://github.com/sonic-net/sonic-utilities/pull/2651))
* a2520e60 Fixed a bug in show vnet routes all causing screen overrun. ([#2644](https://github.com/sonic-net/sonic-utilities/pull/2644))
* c57c3fad show logging CLI support for logs stored in tmpfs ([#2641](https://github.com/sonic-net/sonic-utilities/pull/2641))
* 5d23934f [chassis][voq] Add asic id for linecards so show fabric counters queue/port can work. ([#2499](https://github.com/sonic-net/sonic-utilities/pull/2499))
* 79ffd9fd Add Transceiver PM basic CLI support to show output from TRANSCEIVER_PM table for ZR ([#2615](https://github.com/sonic-net/sonic-utilities/pull/2615))
* 1b71985e [masic support] 'show run bgp' support for multi-asic ([#2427](https://github.com/sonic-net/sonic-utilities/pull/2427))
* 8239e9ab Making 'show feature autorestart' more resilient to missing auto_restart config in CONFIG_DB ([#2592](https://github.com/sonic-net/sonic-utilities/pull/2592))
* 9ee6ac29 [doc] Update docs for dhcp_relay config cli ([#2598](https://github.com/sonic-net/sonic-utilities/pull/2598))
* c3c92a47 Skip saidump for Spine Router as this can take more than 5 sec ([#2637](https://github.com/sonic-net/sonic-utilities/pull/2637))
* 6fe85992 Secure upgrade ([#2337](https://github.com/sonic-net/sonic-utilities/pull/2337))

Signed-off-by: dprital <drorp@nvidia.com>
2023-02-08 19:46:19 +02:00
Dror Prital
c643bf31d6
[submodule] Advance sonic-swss pointer (#13705)
Update sonic-swss submodule pointer to include the following:
* 7d223d3 Remove TODO comment which is no longer relevant ([#2645](https://github.com/sonic-net/sonic-swss/pull/2645))
* 02c2267 [test_mux] add sleep in test_NH ([#2648](https://github.com/sonic-net/sonic-swss/pull/2648))
* 8de52bf [EVPN]Handling race condition when remote VNI arrives before tunnel map entry ([#2642](https://github.com/sonic-net/sonic-swss/pull/2642))
* e99e2e4 [voq][chassis] Remove created ports from the default vlan. ([#2607](https://github.com/sonic-net/sonic-swss/pull/2607))

Signed-off-by: dprital <drorp@nvidia.com>
2023-02-08 19:46:00 +02:00
Liu Shilong
8867deee13
[ci] Disable DOCKER_BUILDKIT in pipeline build for OOM issue. (#13702)
Why I did it
New docker release v23.0 uses BUILDKIT by default.
It leads to OOM issue in pipeline build.

##[error]Exit code 137 returned from process: file name '/agent/externals/node16/bin/node',
How I did it
Disable BUILDKIT when building sonic-slave-* image.
Keep checking if there are issues when building docker image inside sonic-slave-*.

How to verify it
Check docker build logs.
Disable BUILDKIT log:

Step 1/80 : FROM publicmirror.azurecr.io/debian:buster
 ---> ff5db168d4c5
2023-02-08 15:31:22 +08:00
Junchao-Mellanox
5e6e2c827d
Fix issue: ERR healthd: Get unit status determine-reboot-cause-'LoadState' (#13697)
- Why I did it
Fix issue: ERR healthd: Get unit status determine-reboot-cause-'LoadState'. The error log is only seen on shutdown flow such as fast-reboot/warm-reboot.

In shutdown flow, 'LoadState' might not be available in systemctl status output, using [] might cause a KeyError.

- How I did it
Use dict.get instead of []

- How to verify it
Manual test
2023-02-07 17:56:06 +02:00
Stephen Sun
e3ff08833e
[Mellanox] Support DSCP remapping in dual ToR topo on T0 switch (#12605)
- Why I did it
Support DSCP remapping in dual ToR topo on T0 switch for SKU Mellanox-SN4600c-C64, Mellanox-SN4600c-D48C40, Mellanox-SN2700, Mellanox-SN2700-D48C8.

- How I did it
Regarding buffer settings, originally, there are two lossless PGs and queues 3, 4. In dual ToR scenario, the lossless traffic from the leaf switch to the uplink of the ToR switch can be bounced back.
To avoid PFC deadlock, we need to map the bounce-back lossless traffic to different PGs and queues. Therefore, 2 additional lossless PGs and queues are allocated on uplink ports on ToR switches.

On uplink ports, map DSCP 2/6 to TC 2/6 respectively
On downlink ports, both DSCP 2/6 are still mapped to TC 1
Buffer adjusted according to the ports information:
Mellanox-SN4600c-C64:
56 downlinks 50G + 8 uplinks 100G
Mellanox-SN4600c-D48C40, Mellanox-SN2700, Mellanox-SN2700-D48C8:
24 downlinks 50G + 8 uplinks 100G

- How to verify it
Unit test.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-07 16:21:59 +02:00
FuzailBrcm
0704ff5e6c
[pddf]: Adding support for FPGAPCIe in PDDF (#13476)
Why I did it
Some of the platform vendors use FPGA in the HW design. This FPGA is connected to the CPU via PCIe interface. This FPGA also works as an I2C controller having other devices attached to the I2C channels emanating from it. Adding a common module, a driver and a platform specific algorithm module to be used for such FPGA in PDDF.

How I did it
Added 'pddf_fpgapci_module', 'pddf_fpgapci_driver' and a sample algorithm module for Xilinx device 7021. Kernel modules which takes the platform dependent data from PDDF JSON files and initialises the PCIe FPGA. The sample algorithm module can be used by the ODMs in case the communication algorithms are same for their device. Else, they need to come up with similar algo module.

How to verify it
Any platform having such an FPGA and brought up using PDDF would use these kernel modules. The detail representation of such a device in PDDF JSON file is covered in the HLD.
2023-02-06 13:48:31 -08:00
Dmytro Lytvynenko
346576bcf4
[BFN] Remove not common entries from pcie yaml configuration (#12816)
Why I did it
Default pcieutil uses one configuration for all models of platform

How I did it
Take the configuration file as base for all models of concrete platform where model-specific devices are not listed in this configuration

How to verify it
Run pmon#pcied and verify that there is no error/warning logs on initialization step
2023-02-06 09:54:43 -08:00