Commit Graph

7041 Commits

Author SHA1 Message Date
mssonicbld
499f57a7f7
[swss/syncd] remove dependency on interfaces-config.service (#13084) (#14341) 2023-03-19 22:32:37 +08:00
lixiaoyuner
e33af15d2d Install kubernetes-cni for kubelet (#14163)
Why I did it
Found a new bug on the kubelet side. The kubernetes-cni plug-in install was removed in #12997 because the plug-in is installed automatically with kubeadm, and an error is reported if the explicit install code is kept. After the removal, however, the automatically installed version differs from the one we installed before, which affects kubelet behavior in some scenarios we had not found before. The plug-in therefore needs to be installed another way.

How I did it
Install kubernetes-cni==0.8.7-00 before installing kubeadm.

How to verify it
The flannel binary is installed under the /opt/cni/bin/ folder.
2023-03-19 22:32:35 +08:00
jhli-cisco
098678fd3f [sonic-slave]: update sonic-slave docker files to include cisco sdk dependencies (#14203)
The Cisco SDK dependencies are needed in the sonic-slave docker images.
2023-03-19 22:32:29 +08:00
Neetha John
17bf0c85cb Update dynamic threshold for TD2 (#14224)
Why I did it
Update dynamic threshold to -1 to get optimal performance for RDMA traffic

How I did it
Modified pg_profile_lookup.ini to reflect the correct value

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-19 22:32:26 +08:00
Neetha John
0aacc4531a [storage_backend] Add backend acl service (#14229)
Why I did it
This PR addresses the issue mentioned above by loading the acl config as a service on a storage backend device

How I did it
The new ACL service is a oneshot service that starts after swss and retries several times to ensure that the SWITCH_CAPABILITY info is present before attempting to load the ACL rules. The service is also bound to the sonic targets, which ensures that it is restarted during minigraph reload and config reload.
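The retry-before-load behavior described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the service's actual script: `has_switch_capability` and `load_acl_rules` are hypothetical callables standing in for the STATE_DB SWITCH_CAPABILITY lookup and the ACL load, and the retry count and interval are assumed values.

```python
import time
from typing import Callable


def load_acl_when_ready(
    has_switch_capability: Callable[[], bool],
    load_acl_rules: Callable[[], None],
    retries: int = 5,
    interval: float = 1.0,
) -> bool:
    """Load ACL rules only once SWITCH_CAPABILITY info is present.

    Returns True if the rules were loaded, False if the capability
    info never appeared within the allotted retries.
    """
    for attempt in range(retries):
        if has_switch_capability():
            load_acl_rules()
            return True
        # Capability info not populated yet (swss may still be starting);
        # wait before checking again, except after the final attempt.
        if attempt < retries - 1:
            time.sleep(interval)
    return False
```

Checking first and loading second keeps the oneshot service from pushing ACL rules before the switch capabilities are known; giving up after a bounded number of attempts lets the unit finish instead of hanging.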

How to verify it
Built an image with these changes and ran the following tests:

Verified that acl is loaded successfully on a storage backend device after a switch boot up
Verified that acl is loaded successfully on a storage backend ToR after minigraph load and config reload
Verified that acl is not loaded if the device is not a storage backend ToR or the device does not have a DATAACL table

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-19 22:32:22 +08:00
mssonicbld
5c55eb8c40 [ci/build]: Upgrade SONiC package versions 2023-03-19 20:51:06 +08:00
Sudharsan Dhamal Gopalarathnam
156189dbad [Mellanox]Fix lpmode set when logical port is larger than 64 (#14138)
- Why I did it
In the sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more ports than this, the SDK APIs fail with a syslog like the one below:

Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [MGMT_LIB.ERR] Slot [0] Module [0] has logport [0x00010069] in enabled state
Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [SDK_MGMT_LIB.ERR] Failed in __sdk_mgmt_phy_module_pwr_attr_set, error: Internal Error
Mar 7 03:53:58.106118 r-leopard-58 ERR pmon#-c: Error occurred when setting power mode for SFP module 0, slot 0, error code 1

- How I did it
Removed the hardcoded value of 64 and obtained the number of logical ports from the SDK.

- How to verify it
Manual testing
2023-03-19 20:50:58 +08:00
Junhua Zhai
29f3c4944a [gearbox] use credo sai v0.9.0 (#14149)
Update credo sai package to the latest v0.9.0.
2023-03-19 20:50:53 +08:00
Dror Prital
ba14f728de Update SDK/FW to version 4.5.4206/4.5.4204 (#14164)
- Why I did it
To include the latest fixes:

Fix traffic loss on all routed traffic when moving from 4.4.3372/XX_2008_3388 to 4.5.4118-012/XX_2010_4120-010. The issue occurred only on Spectrum 1, after the ISSU process when upgrading from an older version to a newer one: neighbor entries were overwritten.
Fix: when using a mirror session policer on SPC2/3, the actual CIR was 1.28 times the configured CIR value.
Fix: creation of a router interface of type bridge could occasionally fail if the create was performed immediately after a delete.
Fix: false errors could be seen in the syslog during SDK deinitialization.

- How I did it
Updated SDK submodule and relevant makefiles with the required versions.

- How to verify it
Build an image and run tests from "sonic-mgmt".
2023-03-19 20:50:49 +08:00
dbarashinvd
d7ba89a95b [Mellanox] fix for watchdog device not found, adding dependency on hw-management (#14182)
- Why I did it
Sometimes the Nvidia watchdog device isn't ready when the watchdog-control service comes up after the first installation from ONIE.
The watchdog-control service needs to be delayed until hw-mgmt, which brings the devices up and makes them ready, has started.

- How I did it
Delayed the Nvidia watchdog-control service until hw-mgmt has started on the Mellanox platform, to avoid a missing or not-ready watchdog device.

- How to verify it
Ran an ONIE installation of the image in a loop,
making sure the watchdog service is always up (not failed) after the first installation from ONIE.
2023-03-19 20:50:44 +08:00
Volodymyr Samotiy
cc5ed4b632 [Mellanox] Update MFT to 4.22.1-15 (#14133)
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2023-03-19 18:33:57 +08:00
mssonicbld
66447256a6
[ci/build]: Upgrade SONiC package versions (#14313) 2023-03-18 19:58:17 +08:00
mssonicbld
4e54c580cd
[submodule] Update submodule to the latest HEAD automatically (#14308) 2023-03-18 15:59:42 +08:00
mssonicbld
9eb5cb4104
[ci/build]: Upgrade SONiC package versions (#14301) 2023-03-18 05:28:33 +08:00
mssonicbld
16eca71f35 [submodule] Update submodule to the latest HEAD automatically 2023-03-17 16:36:38 +08:00
Vivek
efc79b2272
[202211] Advance sonic-dbsyncd submodule (#14226)
fa8b709 Handled the error case of negative age (#57)
990f5b0 Use github code scanning instead of LGTM (#55)
a7992c5 Install libyang for swss-common. (#50)
244fa86 Update README.md

Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
2023-03-16 20:57:40 +08:00
mssonicbld
5312a814b3 [submodule] Update submodule to the latest HEAD automatically 2023-03-15 12:36:48 +08:00
Sudharsan Dhamal Gopalarathnam
bc414bb82d
[202211][yang]Add missing fields in PortChannel yang model (#14045) (#14145)
Manual cherry-pick of #14045

Why I did it
Fixes issue #13983: added the missing fields to the sonic-portchannel YANG model. The "fallback" and "fast_rate" fields are present in the configuration schema but not in the YANG model, which leads to a traceback when the YANG is validated:

```
sonic_yang(3):All Keys are not parsed in PORTCHANNEL dict_keys(['PortChannel100'])
sonic_yang(3):exceptionList:["'fast_rate'"]
sonic_yang(3):Data Loading Failed:All Keys are not parsed in PORTCHANNEL dict_keys(['PortChannel100'])
exceptionList:["'fast_rate'"]
Data Loading Failed
All Keys are not parsed in PORTCHANNEL
dict_keys(['PortChannel100'])
exceptionList:["'fast_rate'"]
ConfigMgmt Class creation failed
Failed to break out Port. Error: Failed to load the config. Error: ConfigMgmtDPB Class creation failed
```
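The traceback arises because the configuration contains keys that the YANG model does not declare. A minimal sketch of that check, assuming a hypothetical subset of PORTCHANNEL fields; the real validation is done by sonic_yang with libyang, not by this function:

```python
def unparsed_keys(entry: dict, declared_fields: set) -> list:
    """Return, sorted, the config keys that the schema does not declare."""
    return sorted(key for key in entry if key not in declared_fields)


# Hypothetical subset of PORTCHANNEL fields before the fix ("fallback" and
# "fast_rate" missing) and after the fix (both added).
OLD_FIELDS = {"admin_status", "mtu", "min_links"}
NEW_FIELDS = OLD_FIELDS | {"fallback", "fast_rate"}

# Hypothetical PortChannel100 entry that sets fast_rate.
portchannel100 = {"admin_status": "up", "mtu": "9100", "fast_rate": "true"}
```

With the old field set, `unparsed_keys(portchannel100, OLD_FIELDS)` returns `['fast_rate']`, matching the exceptionList in the log; with the two fields added it returns an empty list.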

How I did it
Updated yang model

How to verify it
Added tests to verify the change.
2023-03-14 12:06:34 +08:00
xumia
05b89457c2 [Build] Fix the mirror gpg key expired issue (#14206)
Why I did it
[Build] Fix the mirror gpg key expired issue
See vs build: https://dev.azure.com/mssonic/build/_build/results?buildId=231680&view=logs&j=cef3d8a9-152e-5193-620b-567dc18af272&t=cf595088-5c84-5cf1-9d7e-03331f31d795

How I did it
Added the apt option that disables the Valid-Until check. The option is already set in the SONiC docker base image, but docker-ptf was missing it:

Acquire::Check-Valid-Until "false";
How to verify it
The docker-ptf build succeeds after the fix:

2023-03-11T17:26:35.1801999Z [ building ] [ target/docker-ptf.gz ] 
2023-03-11T17:38:10.1608536Z [ finished ] [ target/docker-ptf.gz ]
2023-03-13 16:37:49 +08:00
Andriy Yurkiv
c4e488c84f [Dual-ToR] add default value for ACL rule for mellanox platform (#13547)
- Why I did it
Needed the ability to choose between dropping packets (using ACL) on ingress or on egress in the Dual ToR scenario.

- How I did it
Added a new attribute "mux_tunnel_ingress_acl" to the SYSTEM_DEFAULTS table.

- How to verify it
Check that the new attribute exists in Redis:
```
admin@sonic:~$ redis-cli -n 4
127.0.0.1:6379[4]> HGETALL SYSTEM_DEFAULTS|mux_tunnel_ingress_acl
1) "state"
2) "false"
```

Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
2023-03-10 14:39:38 +08:00
zitingguo-ms
3c312dec1c
Upgrade SAI xgs version to 8.4.0.2 and migrate to DMZ (#14119)
Why I did it
Update SAI xgs version to 8.4.0.2 and migrate xgs to DMZ repo.

How I did it
Update SAI xgs version in sai.mk.

How to verify it
Run the SONiC and SAI tests with the 8.4 SAI release pipeline.
2023-03-09 14:52:08 +08:00
Samuel Angebault
6173b4dbe5 [Arista] Disable SSD NCQ on Lodoga (#13964)
Why I did it
Fixes an issue similar to the one seen in #13739, but for the DCS-7050CX3-32S only.

How I did it
Add a kernel parameter to tell libata to disable NCQ

How to verify it
The message "ata2.00: FORCE: horkage modified (noncq)" should appear in dmesg.

Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4

with NCQ

   READ: bw=26.1MiB/s (27.4MB/s), 26.1MiB/s-26.1MiB/s (27.4MB/s-27.4MB/s), io=3136MiB (3288MB), run=120053-120053msec
  WRITE: bw=26.3MiB/s (27.6MB/s), 26.3MiB/s-26.3MiB/s (27.6MB/s-27.6MB/s), io=3161MiB (3315MB), run=120053-120053msec
without NCQ

   READ: bw=22.0MiB/s (23.1MB/s), 22.0MiB/s-22.0MiB/s (23.1MB/s-23.1MB/s), io=2647MiB (2775MB), run=120069-120069msec
  WRITE: bw=22.2MiB/s (23.3MB/s), 22.2MiB/s-22.2MiB/s (23.3MB/s-23.3MB/s), io=2665MiB (2795MB), run=120069-120069msec
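The throughput cost of disabling NCQ can be quantified from the bw= figures above. A small Python sketch; the regular expression assumes the fio summary format shown, with bandwidth in MiB/s, and the input strings are abbreviated copies of the results above.

```python
import re

# Matches the aggregate bandwidth field, e.g. "bw=26.1MiB/s".
BW_RE = re.compile(r"bw=([\d.]+)MiB/s")


def bandwidth_mib(line: str) -> float:
    """Extract the first bw=<n>MiB/s value from a fio summary line."""
    match = BW_RE.search(line)
    if match is None:
        raise ValueError(f"no MiB/s bandwidth in: {line!r}")
    return float(match.group(1))


def percent_drop(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Abbreviated fio summary lines from the results above.
with_ncq = {
    "READ": "READ: bw=26.1MiB/s (27.4MB/s), io=3136MiB (3288MB)",
    "WRITE": "WRITE: bw=26.3MiB/s (27.6MB/s), io=3161MiB (3315MB)",
}
without_ncq = {
    "READ": "READ: bw=22.0MiB/s (23.1MB/s), io=2647MiB (2775MB)",
    "WRITE": "WRITE: bw=22.2MiB/s (23.3MB/s), io=2665MiB (2795MB)",
}

for op in ("READ", "WRITE"):
    drop = percent_drop(bandwidth_mib(with_ncq[op]), bandwidth_mib(without_ncq[op]))
    print(f"{op}: {drop:.1f}% lower without NCQ")
```

For these numbers it prints 15.7% for reads and 15.6% for writes, i.e. disabling NCQ costs roughly 16% throughput in this benchmark.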
2023-03-08 13:50:25 +08:00
Stepan Blyshchak
969166d769 [Mellanox] Place FW binaries under platform directory instead of squashfs (#13837)
Fixes #13568

Upgrading from an old image always requires mounting the squashfs to get the next image's FW binary. This can be avoided by putting the FW binary under the platform directory, which is easily accessible after installation:

admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
/host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa
lrwxrwxrwx 1 root root 66 Feb  8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa

- Why I did it
202211 and later use a squashfs compression type that the 201911 kernel cannot handle. This change therefore avoids mounting the squashfs altogether.

- How I did it
Placed the FW binaries under /host/image-/platform/mlnx/; soft links are created in /etc/mlnx to avoid breaking existing scripts/automation.
/etc/mlnx/fw-SPCX.mfa is a soft link that always points to the FW that should be used in the current image.
mlnx-fw-upgrade.sh is updated to prefer the /host/image-/platform/mlnx location and to fall back to /etc/mlnx in the squashfs if the new location does not exist. This is necessary for image downgrades.
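The lookup order described above (prefer the platform directory, fall back to /etc/mlnx) can be sketched in Python. This is not mlnx-fw-upgrade.sh itself: `select_fw_file` is a hypothetical helper, and because the image directory name is elided above, the new location is passed in as a parameter rather than hard-coded.

```python
import os
from typing import Optional


def select_fw_file(fw_name: str, platform_dir: str, legacy_dir: str = "/etc/mlnx") -> Optional[str]:
    """Return the FW binary path, preferring the new platform location.

    platform_dir: the image's platform FW directory (new location).
    legacy_dir:   the old location inside the squashfs, used when the new
                  location is absent (e.g. an image that predates this change).
    Returns None if the file exists in neither place.
    """
    for directory in (platform_dir, legacy_dir):
        candidate = os.path.join(directory, fw_name)
        # isfile() follows symlinks, so a valid soft link also counts.
        if os.path.isfile(candidate):
            return candidate
    return None
```

Keeping the legacy directory as a fallback rather than removing it is what keeps downgrades working: an older image only provides the FW inside its squashfs.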

- How to verify it
Upgrade from 201911 to master
master to 201911 downgrade
master -> master reboot
ONIE -> master boot (First FW burn)
Which release branch to backport (provide reason below if selected)
2023-03-08 13:50:18 +08:00
StormLiangMS
f06732632a
[submodule advance] Advance/sonic utilities 202211 #14124
Why I did it
8c7ddf56 - [warm/fast-reboot] Backup logs from tmpfs to disk during fast/warm shutdown (#2714) (3 hours ago) [Vaibhav Hemant Dixit]
f2a31b30 - [ci] Fix pipeline issue caused by sonic-slave-* change. (#2709) (3 hours ago) [Liu Shilong]
586ecf0e - [dhcp_relay] Fix dhcp_relay restart error while add/del vlan (#2688) (3 hours ago) [Yaqiang Zhu]
07b0ef4c - [portstat CLI] don't print reminder if use json format (#2670) (3 hours ago) [wenyiz2021]
48d3d3ef - [show][muxcable] add some new commands health, reset-cause, queue_info support for muxcable (#2414) (3 hours ago) [vdahiya12]
How I did it
How to verify it
2023-03-08 08:22:40 +08:00
StormLiangMS
b1445648ae
[submodule advance] advance sonic-swss #14116
Why I did it
submodule advance

b085b5f - [ci] Fix pipeline error about team5 not found. (#2684) (3 hours ago) [Liu Shilong]
4549b4c - Fix issue: there is no retry while creating a RIF which is in removing state (#2679) (3 hours ago) [Junchao-Mellanox]
980a45b - [FDB]Fixing FDB consolidated flush for Remote MACs (#2673) (3 hours ago) [Sudharsan Dhamal Gopalarathnam]
c646607 - Do not allow to add port to .1Q bridge while router port deletion is not completed (#2669) (3 hours ago) [Lior Avramov]
4a321f0 - [orchagent]: Get bridge port ID from orchagent cache instead of SAI API (#2657) (3 hours ago) [Lawrence Lee]
f4b88f3 - [Dual-ToR] handle 'mux_tunnel_egress_acl' attrib in order to change ACL configuration (drop on ingress/egress) on standby ToR (#2646) (3 hours ago) [Andriy Yurkiv]
a4f29c1 - [Workaround] EvpnRemoteVnip2pOrch warmboot check failure (#2626) (3 hours ago) [jcaiMR]
53ee0a8 - Support for tc-dot1p and tc-dscp qosmap (#2559) (3 hours ago) [Divya Mukundan]
b953866 - [dual-tor] add missing SAI attribute in order to create IPNIP tunnel (#2503) (3 hours ago) [Andriy Yurkiv]
How I did it
How to verify it
2023-03-08 08:21:53 +08:00
Sudharsan Dhamal Gopalarathnam
e1536c00a7 [netlink] Increase netlink buffer size from 3MB to 16MB (#13965)
#### Why I did it
Follows PR https://github.com/sonic-net/sonic-swss-common/pull/739, which increases the netlink buffer size in the Linux kernel.
An error is seen in fdbsyncd, where netlink reports "out of memory on reading a netlink socket" when the kernel sends 10k remote MACs to fdbsyncd.


#### How I did it
Increase the buffer size of the netlink buffer from 3MB to 16MB
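As background, a process requests a larger receive buffer through the SO_RCVBUF socket option, and Linux silently caps unprivileged requests at net.core.rmem_max (SO_RCVBUFFORCE, which needs CAP_NET_ADMIN, bypasses that cap). The Python sketch below is illustrative only: the exact setting this commit changes is not shown here, `request_rcvbuf` is a hypothetical helper, and 16 MB is the target size taken from the commit title.

```python
import socket

# 16 MB, the new buffer size from this commit.
NETLINK_RCVBUF_BYTES = 16 * 1024 * 1024


def request_rcvbuf(sock: socket.socket, size: int) -> int:
    """Ask the kernel for a receive buffer of `size` bytes and return what it reports.

    On Linux the request is first clamped to net.core.rmem_max, then doubled
    to account for bookkeeping overhead, and getsockopt() returns the doubled
    value. Reading it back is the only reliable way to know what was granted.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

On a switch the helper would be applied to a netlink socket such as `socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)`, the family that carries bridge FDB notifications. Because Linux reports twice the accepted size, a result of 2 × 16 MB means the full request was granted, while anything smaller means the kernel limit clamped it.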


#### How to verify it
Verified with 10k remote MACs by restarting the fdbsyncd process, so that the kernel sends the bridge FDB dump to fdbsyncd.
Verified that the netlink buffer error is not reported in the syslog.
2023-03-08 06:35:20 +08:00
StormLiangMS
e57197bc8c
[submodule advance] Advance/sonic sairedis 202211 #14121
Why I did it
cf9a66b - Fix issue: bulk counter feature is disabled (#1205) (4 hours ago) [Lior Avramov]
8b1583b - [Dual-ToR] update sai.profile with SAI_ADDITIONAL_MAC_ENABLED attribute if corresponding arg passed to syncd (#1201) (4 hours ago) [Andriy Yurkiv]
50d8e21 - [syncd]: Enable port bulk API (#1197) (4 hours ago) [Nazarii Hnydyn]
a72438a - Use new value of STATE_DB FAST_REBOOT entry (#1196) (4 hours ago) [Aryeh Feigin]
d78ce86 - validation support for SAI_ATTR_VALUE_TYPE_JSON (#1152) (4 hours ago) [svshah-intel]
How I did it
How to verify it
2023-03-08 00:32:39 +08:00
StormLiangMS
132ff067d3
[submodule advance] Advance/sonic platform common 202211 #14122
Why I did it
9ccaaa5 - Update host electrical interface for 2x100G AOC (#346) (4 hours ago) [mihirpat1]
d7016a4 - [ssd_generic] Get health status from Remaining_Life_Left field for virtium SSD (#344) (4 hours ago) [Junchao-Mellanox]
How I did it
How to verify it
2023-03-07 23:11:55 +08:00
StormLiangMS
fab25c9d4a
[submodule advance] advance src/sonic-platform-daemons 202211 #14123
Why I did it
6391de0 - [ycable] add changes for correcting telemetry values for 'active-active' (#341) (4 hours ago) [vdahiya12]
2cb31c4 - Update CMIS module types for 2x100G AOC support (#339) (4 hours ago) [mihirpat1]
2ea9cf2 - [ycabled] add more coverage to ycabled; add minor name change for vendor API CLI return key-values pairs (#338) (4 hours ago) [vdahiya12]
How I did it
How to verify it
2023-03-07 23:11:20 +08:00
StormLiangMS
d8765f780a
[submodule advance] advance src/sonic-swss-common 202211 #14126
Why I did it
e732ed0 - Prevent sonic-db-cli generate core dump (#749) (4 minutes ago) [Hua Liu]
28adcb4 - Support for TC-DOT1p qos map (#721) (5 minutes ago) [Divya Mukundan]
How I did it
How to verify it
2023-03-07 23:10:23 +08:00
Mai Bui
eeb3ae17a6 Revert "[system-health] Remove subprocess with shell=True (#12572)" (#13505)
This reverts commit b3a8167968.
Due to issue https://github.com/sonic-net/sonic-buildimage/issues/13432
2023-03-06 19:30:11 +08:00
mssonicbld
aea96da04d
[Mellanox] Fix issue: cannot find label port for logical port when logical port number is larger than 64 (#13710) (#13962) 2023-03-06 16:47:31 +08:00
mssonicbld
523cd8dab5
[ci/build]: Upgrade SONiC package versions (#14077) 2023-03-04 20:49:07 +08:00
xumia
b8fe3c2989 [Build] Support to use loosen version when failed to install python packages (#14013)
Why I did it
[Build] Support to use loosen version when failed to install python packages
This fixes issue #14012.

How I did it
If installation with the version constraints fails, retry the installation command without the constraints.
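The fallback can be sketched in Python as below, assuming the pinned versions are supplied to pip through a constraints file (`pip install -c <file>`). The helper names and the injected `run` callable are hypothetical, and the real build scripts may implement this differently.

```python
import subprocess
import sys
from typing import Callable, List

# A runner takes pip arguments and returns the process exit status.
Runner = Callable[[List[str]], int]


def pip_runner(args: List[str]) -> int:
    """Run pip with the current interpreter and return its exit status."""
    return subprocess.call([sys.executable, "-m", "pip", *args])


def install_with_fallback(packages: List[str], constraint_file: str, run: Runner = pip_runner) -> str:
    """Install packages pinned by a constraints file, loosening on failure.

    Returns "constrained" if the pinned install succeeded, or
    "unconstrained" if only the retry without the constraints file did.
    """
    if run(["install", "-c", constraint_file, *packages]) == 0:
        return "constrained"
    # The pinned versions could not be satisfied: retry with loosened
    # versions by dropping the constraints file.
    if run(["install", *packages]) == 0:
        return "unconstrained"
    raise RuntimeError(f"failed to install {packages} with or without constraints")
```

Returning which path succeeded lets a caller log when a build fell back to loosened versions, since that build is no longer fully reproducible.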

How to verify it
2023-03-03 19:30:57 +08:00
mssonicbld
1757f53290
[Mellanox] update sdk/fw build procedure (#14025) (#14059) 2023-03-03 02:43:19 +08:00
mssonicbld
72f9f51287
[Seastone] fix dx010 qsfp eeprom data write issue (#13930) (#14032) 2023-03-01 19:28:38 +08:00
Sudharsan Dhamal Gopalarathnam
76cc29b19d
[202211]Added vni field in VRF Yang for VxLAN L3 VNI Support (#13980)
Manual cherry-pick of #13735
Why I did it
Added vni field in VRF Yang for VxLAN L3 VNI Support.

The VRF table schema as per EVPN HLD is below
https://github.com/sonic-net/SONiC/blob/master/doc/vxlan/EVPN/EVPN_VXLAN_HLD.md

Addresses Issue #13456
2023-02-28 14:35:20 +08:00
Patrick MacArthur
ff5605ae00
fix platform.json on Wolverine for thermal sensors (#13984)
Why I did it
Manual rebase of PR #13524 to 202211 branch.

How I did it
See PR #13524
2023-02-28 08:54:01 +08:00
mssonicbld
f1f1af841f
[ci/build]: Upgrade SONiC package versions (#13994) 2023-02-26 19:41:42 +08:00
mssonicbld
f18f424d17
[ci/build]: Upgrade SONiC package versions (#13990) 2023-02-25 20:39:59 +08:00
judyjoseph
16e3a72925 Voq Chassis: Add the Recirc ports to the INTERFACES table to make it routed intf (#13779)
* VOQ: Add the Recirc ports to the INTERFACES table to make it routed intf

* Add a test to cover Recirc port generation in INTERFACE table
2023-02-25 06:35:01 +08:00
mssonicbld
18bc044179
Remove support to Mellanox SPC4 ASIC (#13932) (#13957) 2023-02-23 22:22:35 +08:00
mssonicbld
310827c26c
Add PYTHON3_SWSSCOMMON as build time dependency to Mellanox platform API (#13847) (#13959) 2023-02-23 20:32:15 +08:00
mssonicbld
50aaf92590
[Mellanox] Non upstream patches for hw-mgmt V.4.0020.4104 (#13792) (#13960) 2023-02-23 20:32:09 +08:00
Junchao-Mellanox
e8789a2e11 [Mellanox] Check system eeprom existence in a retry manner (#13884)
- Why I did it
On the Mellanox platform, the system EEPROM is a soft link provided by hw-management. The config-setup service may access the EEPROM before hw-management has created it, which causes errors. This PR aims to fix that.

- How I did it
The platform API now waits up to 10 seconds for the EEPROM to be created.
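The wait-with-timeout behavior can be sketched in Python as below. It is an assumption-based illustration rather than the platform API's code: `wait_for_path`, the poll interval, and any EEPROM path passed in are hypothetical; only the 10-second upper bound comes from this commit.

```python
import os
import time


def wait_for_path(path: str, timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Returns True as soon as the path exists, False on timeout.
    """
    # monotonic() is immune to wall-clock changes during early boot.
    deadline = time.monotonic() + timeout
    while True:
        # exists() follows symlinks, so a dangling soft link counts as missing.
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Because `os.path.exists()` follows symlinks, a soft link created by hw-management only counts once its target is present, which matches the failure mode described above.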

- How to verify it
Manual test
2023-02-23 20:31:29 +08:00
mssonicbld
6a12ca9332
[Mellanox] [ECMP calculator] Add support for 4600/4600C/2201 platforms with different interface naming method (#13814) (#13931) 2023-02-22 22:14:09 +08:00
andywongarista
be51191fd8 [Arista] Add other chassis names to platform_components.json for 720DT-48S (#12378)
Why I did it
The 720DT-48S platform has variants with different chassis names, and these need to all be included in platform_components.json to ensure that sonic-mgmt platform_tests/fwutil/test_fwutil.py::test_fwutil_show passes

How I did it
Updated platform_components.json with the variant names for 720DT-48S.

How to verify it
Ran aforementioned testcase and verified that it passes on the different variants.
2023-02-22 20:55:50 +08:00
Stepan Blyshchak
708e83ea63 [dockerd] Force usage of cgo DNS resolver (#13649)
Go's runtime (and dockerd, which inherits this) uses its own DNS resolver implementation by default on Linux.
It has been observed that there are some DNS resolution issues when executing ```docker pull``` after first boot.

Consider the following script:

```
admin@r-boxer-sw01:~$ while :; do date; cat /etc/resolv.conf; ping -c 1 harbor.mellanox.com; docker pull harbor.mellanox.com/sonic/cpu-report:1.0.0 ; sleep 1; done
Fri 03 Feb 2023 10:06:22 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.99 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.989/5.989/5.989/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:57245->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:23 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.56 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.561/5.561/5.561/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:53299->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:24 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.78 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.783/5.783/5.783/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:55765->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:25 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=7.17 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.171/7.171/7.171/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:44877->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:26 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.66 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.656/5.656/5.656/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:54604->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:27 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=8.22 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.223/8.223/8.223/0.000 ms
1.0.0: Pulling from sonic/cpu-report
004f1eed87df: Downloading [===================>                               ]   19.3MB/50.43MB
5d6f1e8117db: Download complete
48c2faf66abe: Download complete
234b70d0479d: Downloading [=========>                                         ]  9.363MB/51.84MB
6fa07a00e2f0: Downloading [==>                                                ]   9.51MB/192.4MB
04a31b4508b8: Waiting
e11ae5168189: Waiting
8861a99744cb: Waiting
d59580d95305: Waiting
12b1523494c1: Waiting
d1a4b09e9dbc: Waiting
99f41c3f014f: Waiting
```

While /etc/resolv.conf has the correct content and ping (and any other utility that uses libc's DNS resolution implementation) works correctly,
docker is unable to resolve the hostname and falls back to the default [::1]:53. This started to happen after PR https://github.com/sonic-net/sonic-buildimage/pull/13516 was merged.
As the log shows, dockerd picks up the correct /etc/resolv.conf only about 5 seconds after the first attempt. This seems to be related to the logic in Go's DNS resolver:
https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go#L385.

There have been issues like that reported in docker like:
  - https://github.com/docker/cli/issues/2299
  - https://github.com/docker/cli/issues/2618
  - https://github.com/moby/moby/issues/22398

Since this started to happen after the resolvconf package was included by
the above-mentioned PR, and I can't see any problem with the package itself
(ping, nslookup, etc. work), the choice is made to force dockerd to use the
cgo (libc) resolver.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-02-22 20:55:46 +08:00
Saikrishna Arcot
228763fac7 Add lsof and sysstat packages to the base system for debugging purposes (#13741)
The lsof and sysstat packages make determining what files/sockets a
program has open a bit easier. This helps if, for example, some
application has a file open that's been deleted from disk.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-02-22 20:55:41 +08:00
mssonicbld
6d66a320a6 [ci/build]: Upgrade SONiC package versions 2023-02-22 20:55:33 +08:00