Commit Graph

1157 Commits

Author SHA1 Message Date
Ying Xie
f6216435c8
Revert "Clear DNS configuration received from DHCP during networking reconfiguration in Linux. (#13516)" (#14901)
This reverts commit 5ef488f808.
2023-05-01 16:49:08 -07:00
mssonicbld
776937c48f
[ci/build]: Upgrade SONiC package versions (#14894) 2023-04-30 18:37:57 +08:00
mssonicbld
4479245996
[ci/build]: Upgrade SONiC package versions (#14889) 2023-04-29 18:57:14 +08:00
mssonicbld
99d6003717
Changes to support TSA from supervisor (#14691) (#14878) 2023-04-28 21:11:55 +08:00
mssonicbld
5ac1051f8f
Temporary WA for the issue that asic_table.json can not be rendered (#13888) (#14857) 2023-04-27 02:57:10 +08:00
mssonicbld
e0ef5b9808
[write standby] force DB connections to use unix socket to connect (#14524) (#14773) 2023-04-24 01:49:42 +08:00
mssonicbld
f53c7b66cd
[Fast-boot] Clear teamd-timer when finalizing fast-reboot (#14583) (#14774) 2023-04-23 21:07:26 +08:00
mssonicbld
70cfef252f
Delay mux/sflow/snmp timer after interface-config service (#14506) (#14771) 2023-04-23 20:52:06 +08:00
mssonicbld
72776df8ba [ci/build]: Upgrade SONiC package versions 2023-04-23 20:46:40 +08:00
Stephen Sun
1d3fa0b03c Enhance the error message output mechanism (#14384)
#### Why I did it

Enhance the error message output mechanism during swss docker creating

#### How I did it

Capture the output to stderr of `sonic-cfggen` and output it using `echo` to make sure the error message will be logged in syslog.

#### How to verify it

Manually test
2023-04-23 18:32:40 +08:00
mssonicbld
e9daace147
[ci/build]: Upgrade SONiC package versions (#14800) 2023-04-22 18:35:55 +08:00
mssonicbld
8e1bbab07d
[image_config] add rasdaemon.timer (#14300) (#14762) 2023-04-22 00:18:05 +08:00
Hua Liu
bee30fdfb9 Improve sudo cat command for RO user. (#14428)
Improve sudo cat command for RO user.

#### Why I did it
RO user can use sudo command show none syslog files.

#### How I did it
Improve sudo cat command for RO user.

#### How to verify it
Pass all UT.
Manually check fixed code work correctly.

#### Description for the changelog
Improve sudo cat command for RO user.
2023-04-21 06:32:24 +08:00
mssonicbld
aea1980b14
[ci/build]: Upgrade SONiC package versions (#14720) 2023-04-19 19:30:56 +08:00
mssonicbld
cc22d69fd3
[ci/build]: Upgrade SONiC package versions (#14680) 2023-04-16 18:59:28 +08:00
mssonicbld
b4dafae65d
[ci/build]: Upgrade SONiC package versions (#14673) 2023-04-15 20:37:33 +08:00
xumia
5dbf512cda
Support to add SONiC OS Version in device info (#14601) (#14623)
Why I did it
Cherry-pick #14601, for code conflict.
Support to add SONiC OS Version in device info.
It will be used to display the version info in the SONiC command "show version". The version is used to do the FIPS certification. We do not do the FIPS certification on a specific release, but on the SONiC OS Version.

SONiC Software Version: SONiC.master-13812.218661-7d94c0c28
SONiC OS Version: 11
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
Work item tracking
Microsoft ADO (number only): 17894593
How I did it
How to verify it
2023-04-13 19:28:03 +08:00
mssonicbld
46af37f77d
[ci/build]: Upgrade SONiC package versions (#14629) 2023-04-12 19:19:12 +08:00
anamehra
e107549942 chassis-packet: resolve the missing static routes (#14593)
Why I did it
Fixes #14179
chassis-packet: missing arp entries for static routes causing high orchagent cpu usage

It is observed that some sonic-mgmt test case calls sonic-clear arp, which clears the static arp entries as well. Orchagent or arp_update process does not try to resolve the missing arp entries after clear.

How I did it
arp_update should resolve the missing arp/ndp static route
entries. Added code to check for missing entries and try ping if any
found to resolve it.

How to verify it
After boot or config reload, check ipv4 and ipv4 neigh entries to make sure all static route entries are present
manual validation:
Use sonic-clear arp and sonic-clear ndp to clear all neighbor entries
run arp_update
Check for neigh entries. All entries should be present.
Testing on T0 setup route/for test_static_route.py

The test set the STATIC_ROUTE entry in conifg db without ifname:
sonic-db-cli CONFIG_DB hmset 'STATIC_ROUTE|2.2.2.0/24' nexthop 192.168.0.18,192.168.0.25,192.168.0.23

"STATIC_ROUTE": {
    "2.2.2.0/24": {
        "nexthop": "192.168.0.18,192.168.0.25,192.168.0.23"
    }
},
Validate that the arp_update gets the proper ARP_UPDATE_VARDS using arp_update_vars.j2 template from config db and does not crash:

{ "switch_type": "", "interface": "", "pc_interface" : "PortChannel101 PortChannel102 PortChannel103 PortChannel104 ", "vlan_sub_interface": "", "vlan" : "Vlan1000", "static_route_nexthops": "192.168.0.18 192.168.0.25 192.168.0.23 ", "static_route_ifnames": "" }

validate route/test_static_route.py testcase pass.
2023-04-12 18:32:47 +08:00
mssonicbld
73766c2fa1
Finalize fast-reboot in warmboot finalizer (#14238) (#14608) 2023-04-11 22:54:56 +08:00
mssonicbld
4d0f1c1972
[ci/build]: Upgrade SONiC package versions (#14578) 2023-04-09 19:17:25 +08:00
mssonicbld
05a9ce9628
[ci/build]: Upgrade SONiC package versions (#14572) 2023-04-08 19:08:35 +08:00
mssonicbld
a3951c2041
Increase wait_for_tunnel() timeout to 90s (#14279) (#14563) 2023-04-07 16:02:01 +08:00
mssonicbld
483b9867e9
[ci/build]: Upgrade SONiC package versions (#14529) 2023-04-05 19:02:12 +08:00
mssonicbld
8863910bc8
[ci/build]: Upgrade SONiC package versions (#14492) 2023-04-02 19:28:22 +08:00
mssonicbld
f3b6860076
[ci/build]: Upgrade SONiC package versions (#14488) 2023-04-01 19:35:15 +08:00
mssonicbld
5b028dc60f
[ci/build]: Upgrade SONiC package versions (#14478) 2023-04-01 03:16:16 +08:00
mssonicbld
fe1e2b16f7
[ci/build]: Upgrade SONiC package versions (#14382) 2023-03-22 19:59:24 +08:00
xumia
0a7037641c
[Security] Fix some of vulnerability issue relative python packages (#14269) (#14352)
Why I did it
Fix some of vulnerability issue relative python packages #14269
Pillow: [CVE-2021-27921]
Wheel: [CVE-2022-40898]
lxml: [CVE-2022-2309]

How I did it
How to verify it
2023-03-22 15:42:29 +08:00
Dev Ojha
24c53a5d34 [Buffer] Added cable length config to buffer config template for EdgeZoneAggregator (#14280)
Why I did it
SONiC currently does not identify 'EdgeZoneAggregator' neighbor. As a result, the buffer profile attached to those interfaces uses the default cable length which could cause ingress packet drops due to insufficient headroom. Hence, there is a need to update the buffer templates to identify such neighbors and assign the same cable length as used by the T1.

How I did it
Modified the buffer template to identify EdgeZoneAggregator as a neighbor device type and assign it the same cable length as a T1/leaf router.

How to verify it
Unit tests pass, and manually checked on a 7260 to see the changes take effect.

Signed-off-by: dojha <devojha@microsoft.com>
2023-03-20 22:36:33 +08:00
mssonicbld
499f57a7f7
[swss/syncd] remove dependency on interfaces-config.service (#13084) (#14341) 2023-03-19 22:32:37 +08:00
Neetha John
0aacc4531a [storage_backend] Add backend acl service (#14229)
Why I did it
This PR addresses the issue mentioned above by loading the acl config as a service on a storage backend device

How I did it
The new acl service is a oneshot service which will start after swss and does some retries to ensure that the SWITCH_CAPABILITY info is present before attempting to load the acl rules. The service is also bound to sonic targets which ensures that it gets restarted during minigraph reload and config reload

How to verify it
Build an image with the following changes and did the following tests

Verified that acl is loaded successfully on a storage backend device after a switch boot up
Verified that acl is loaded successfully on a storage backend ToR after minigraph load and config reload
Verified that acl is not loaded if the device is not a storage backend ToR or the device does not have a DATAACL table

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-19 22:32:22 +08:00
mssonicbld
5c55eb8c40 [ci/build]: Upgrade SONiC package versions 2023-03-19 20:51:06 +08:00
mssonicbld
66447256a6
[ci/build]: Upgrade SONiC package versions (#14313) 2023-03-18 19:58:17 +08:00
mssonicbld
9eb5cb4104
[ci/build]: Upgrade SONiC package versions (#14301) 2023-03-18 05:28:33 +08:00
Andriy Yurkiv
c4e488c84f [Dual-ToR] add default value for ACL rule for mellanox platform (#13547)
- Why I did it
Need to add the possibility to choose between dropping packets (using ACL) on ingress or egress in Dual ToR scenario

- How I did it
Add new attribute "mux_tunnel_ingress_acl" to SYSTEM_DEFAULTS table

- How to verify it
check that new attribute exists in redis:
admin@sonic:~$ redis-cli -n 4
127.0.0.1:6379[4]> HGETALL SYSTEM_DEFAULTS|mux_tunnel_ingress_acl
1."state"
2."false"

Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
2023-03-10 14:39:38 +08:00
Samuel Angebault
6173b4dbe5 [Arista] Disable SSD NCQ on Lodoga (#13964)
Why I did it
Fix similar issue seen on #13739 but only for DCS-7050CX3-32S

How I did it
Add a kernel parameter to tell libata to disable NCQ

How to verify it
The message ata2.00: FORCE: horkage modified (noncq) should appear on the dmesg.

Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4

with NCQ

   READ: bw=26.1MiB/s (27.4MB/s), 26.1MiB/s-26.1MiB/s (27.4MB/s-27.4MB/s), io=3136MiB (3288MB), run=120053-120053msec
  WRITE: bw=26.3MiB/s (27.6MB/s), 26.3MiB/s-26.3MiB/s (27.6MB/s-27.6MB/s), io=3161MiB (3315MB), run=120053-120053msec
without NCQ

   READ: bw=22.0MiB/s (23.1MB/s), 22.0MiB/s-22.0MiB/s (23.1MB/s-23.1MB/s), io=2647MiB (2775MB), run=120069-120069msec
  WRITE: bw=22.2MiB/s (23.3MB/s), 22.2MiB/s-22.2MiB/s (23.3MB/s-23.3MB/s), io=2665MiB (2795MB), run=120069-120069msec
2023-03-08 13:50:25 +08:00
Stepan Blyshchak
969166d769 [Mellanox] Place FW binaries under platform directory instead of squashfs (#13837)
Fixes #13568

Upgrade from old image always requires squashfs mount to get the next image FW binary. This can be avoided if we put FW binary under platform directory which is easily accessible after installation:

admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
/host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa
lrwxrwxrwx 1 root root 66 Feb  8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa

- Why I did it
202211 and above uses different squashfs compression type that 201911 kernel can not handle. Therefore, we avoid mounting squashfs altogether with this change.

- How I did it
Place FW binary under /host/image-/platform/mlnx/, soft links in /etc/mlnx are created to avoid breaking existing scripts/automation.
/etc/mlnx/fw-SPCX.mfa is a soft link always pointing to the FW that should be used in current image
mlnx-fw-upgrade.sh is updated to prefer /host/image-/platform/mlnx location and fallback to /etc/mlnx in squashfs in case new location does not exist. This is necessary to do image downgrade.

- How to verify it
Upgrade from 201911 to master
master to 201911 downgrade
master -> master reboot
ONIE -> master boot (First FW burn)
Which release branch to backport (provide reason below if selected)
2023-03-08 13:50:18 +08:00
Sudharsan Dhamal Gopalarathnam
e1536c00a7 [netlink] Increse netlink buffer size from 3MB to 16MB (#13965)
#### Why I did it
Following the PR https://github.com/sonic-net/sonic-swss-common/pull/739 increasing netlink buffer size in linux kernel
As error is seen in fdbsyncd with netlink reports "out of memory on reading a netlink socket" It is seen when kernel is sending 10k remote mac to fdbsyncd.


#### How I did it
Increase the buffer size of the netlink buffer from 3MB to 16MB


#### How to verify it
Verified with 10k remote mac, and restarting the fdbsyncd process. So that kernel send the bridge fdb dump to the fdbsyncd.
Verified that the netlink buffer error is not reported in the sys log.
2023-03-08 06:35:20 +08:00
mssonicbld
523cd8dab5
[ci/build]: Upgrade SONiC package versions (#14077) 2023-03-04 20:49:07 +08:00
mssonicbld
f1f1af841f
[ci/build]: Upgrade SONiC package versions (#13994) 2023-02-26 19:41:42 +08:00
mssonicbld
f18f424d17
[ci/build]: Upgrade SONiC package versions (#13990) 2023-02-25 20:39:59 +08:00
mssonicbld
18bc044179
Remove support to Mellanox SPC4 ASIC (#13932) (#13957) 2023-02-23 22:22:35 +08:00
Stepan Blyshchak
708e83ea63 [dockerd] Force usage of cgo DNS resolver (#13649)
Go's runtime (and dockerd inherits this) uses own DNS resolver implementation by default on Linux.
It has been observed that there are some DNS resolution issues when executing ```docker pull``` after first boot.

Consider the following script:

```
admin@r-boxer-sw01:~$ while :; do date; cat /etc/resolv.conf; ping -c 1 harbor.mellanox.com; docker pull harbor.mellanox.com/sonic/cpu-report:1.0.0 ; sleep 1; done
Fri 03 Feb 2023 10:06:22 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.99 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.989/5.989/5.989/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:57245->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:23 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.56 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.561/5.561/5.561/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:53299->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:24 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.78 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.783/5.783/5.783/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:55765->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:25 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=7.17 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.171/7.171/7.171/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:44877->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:26 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.66 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.656/5.656/5.656/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:54604->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:27 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=8.22 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.223/8.223/8.223/0.000 ms
1.0.0: Pulling from sonic/cpu-report
004f1eed87df: Downloading [===================>                               ]   19.3MB/50.43MB
5d6f1e8117db: Download complete
48c2faf66abe: Download complete
234b70d0479d: Downloading [=========>                                         ]  9.363MB/51.84MB
6fa07a00e2f0: Downloading [==>                                                ]   9.51MB/192.4MB
04a31b4508b8: Waiting
e11ae5168189: Waiting
8861a99744cb: Waiting
d59580d95305: Waiting
12b1523494c1: Waiting
d1a4b09e9dbc: Waiting
99f41c3f014f: Waiting
```

While /etc/resolv.conf has the correct content and ping (and any other utility that uses libc's DNS resolution implementation) works correctly
docker is unable to resolve the hostname and falls back to default [::1]:53. This started to happen after PR https://github.com/sonic-net/sonic-buildimage/pull/13516 has been merged.
As you can see from the log, dockerd is able to pick up the correct /etc/resolv.conf only after 5 sec since first try. This seems to be somehow related to the logic in Go's DNS resolver
https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go#L385.

There have been issues like that reported in docker like:
  - https://github.com/docker/cli/issues/2299
  - https://github.com/docker/cli/issues/2618
  - https://github.com/moby/moby/issues/22398

Since this starts to happen after inclusion of resolvconf package by
above mentioned PR and the fact I can't see any problem with that (ping,
nslookup, etc. works) the choice is made to force dockerd to use cgo
(libc) resolver.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-02-22 20:55:46 +08:00
mssonicbld
6d66a320a6 [ci/build]: Upgrade SONiC package versions 2023-02-22 20:55:33 +08:00
mssonicbld
3a37c13021
[ci/build]: Upgrade SONiC package versions (#13880) 2023-02-19 18:48:21 +08:00
Sudharsan Dhamal Gopalarathnam
a993fc205f [Mellanox][sai_failure_dump]Added platform specific script to be invoked during SAI failure dump (#13533)
- Why I did it
Added platform specific script to be invoked during SAI failure dump. Added some generic changes to mount /var/log/sai_failure_dump as read write in the syncd docker

- How I did it
Added script in docker-syncd of mellanox and copied it to /usr/bin

- How to verify it
Manual UT and new sonic-mgmt tests
2023-02-18 06:34:29 +08:00
Samuel Angebault
da33eec909 [Arista] Add emmc quirks in boot0 to improve reliability (#10013)
Why I did it
Fix some unreliability seen on emmc device with some AMD CPUs

How I did it
Added a kernel parameter to add quirks to
It depends on a sonic-linux-kernel change to work properly but will be a no-op without it.
The quirk added is SDHCI_QUIRK2_BROKEN_HS200 used to downgrade the link speed for the eMMC.
2023-02-18 06:34:23 +08:00
andywongarista
c1fe36e093 Increase PikeZ varlog size (#13550)
Why I did it
To address error sometimes seen when running sonic-mgmt test_stress_routes.py::test_announce_withdraw_route on 720DT-48S

How I did it
Update boot0 logic to set platform specific varlog size for 720DT-48S

How to verify it
Verified that /var/log size increased and error is no longer observed when running test
2023-02-18 06:34:14 +08:00
Chun'ang Li
9004266ecd Fix rsyslogd start failed cause by rsyslog.conf is emtpy. (#13669)
- Why I did it
In to-sonic and multi-asic KVM-test, pretest sometimes failed. Reason is rsyslogd process can not start in teamd container. Because rsyslog.conf is empty caused by sonic-cfggen execute failed

- How I did it
If sonic-cfggen -d execute failed, execute without -d because the template file has the default value.

- How to verify it
Build image and test it over 40 times, all passed pretest.

Signed-off-by: Chun'ang Li <chunangli@microsoft.com>
2023-02-18 06:34:01 +08:00