Commit Graph

7279 Commits

Author SHA1 Message Date
Andriy Dobush
c1dd94f368
Add California-SB237 feature. Requires to change default user password (#12678)
#### Why I did it
Add support of California-SB237 conformance.
https://github.com/sonic-net/SONiC/tree/master/doc/California-SB237

#### How I did it
Expire user passwords during build

#### How to verify it
Enable build flag and check if default user is prompted for a new password
2023-02-23 15:36:37 -08:00
Saikrishna Arcot
3e316cbf24
Don't create the members@ array in config_db for PC when reading from minigraph (#13660)
Fixes #11873.

#### Why I did it

When loading from minigraph, for port channels, don't create the members@ array in config_db in the PORTCHANNEL table. This is no longer needed or used.

In addition, when adding a port channel member from the CLI, that member doesn't get added into the members@ array, resulting in a bit of inconsistency. This gets rid of that inconsistency.
2023-02-23 14:49:51 -08:00
Dror Prital
f4550e8b89
[submodule] Advance sonic-swss pointer (#13953)
Update sonic-swss submodule pointer to include the following:
* baa302e Do not allow to add port to .1Q bridge while router port deletion is not completed  ([#2669](https://github.com/sonic-net/sonic-swss/pull/2669))
* f66abed Support for tc-dot1p and tc-dscp qosmap ([#2559](https://github.com/sonic-net/sonic-swss/pull/2559))
* 35385ad [RouteOrch] Record ROUTE_TABLE entry programming status to APPL_STATE_DB ([#2512](https://github.com/sonic-net/sonic-swss/pull/2512))
* 0704f78 [Workaround] EvpnRemoteVnip2pOrch warmboot check failure ([#2626](https://github.com/sonic-net/sonic-swss/pull/2626))
* 4df5cab [ResponsePublisher] add pipeline support  ([#2511](https://github.com/sonic-net/sonic-swss/pull/2511))
2023-02-23 10:23:21 -08:00
RogerX87
33db298d70
[devices]: Update the Wistron platform support in master branch (#12110)
* Update the Wistron platform support in master branch

Signed-off-by: RogerX87 <RogerX87@gmail.com>
2023-02-23 09:08:13 -08:00
Dror Prital
690fa2e936
[submodule] Advance sonic-platform-daemons pointer (#13951)
Update sonic-platform-daemons submodule pointer to include the following:
* 05dd3bd Update CMIS module types for 2x100G AOC support ([#339](https://github.com/sonic-net/sonic-platform-daemons/pull/339))
* f132d12 [ycabled] add more coverage to ycabled; add minor name change for vendor API CLI return key-values pairs ([#338](https://github.com/sonic-net/sonic-platform-daemons/pull/338))

Signed-off-by: dprital <drorp@nvidia.com>
2023-02-23 15:02:53 +02:00
Tapash Das
95ce31971c
Added vni field in VRF Yang for VxLAN L3 VNI Support #13456 (#13735)
Why I did it
Added vni field in VRF Yang for VxLAN L3 VNI Support.

The VRF table schema as per EVPN HLD is below
https://github.com/sonic-net/SONiC/blob/master/doc/vxlan/EVPN/EVPN_VXLAN_HLD.md

Addresses Issue #13456
2023-02-23 08:49:16 +02:00
DavidZagury
ee1b6b3751
Remove support to Mellanox SPC4 ASIC (#13932)
- Why I did it
FW for Spectrum-4 ASIC not yet available

- How I did it
Remove in Mellanox fw make files to Spectrum-4 ASIC firmware binaries.
Remove from firmware upgrade scripts to be able Spectrum-4 ASIC.

- How to verify it
Run regression test
2023-02-23 08:25:34 +02:00
Liu Shilong
3e0df173ff
Remove deprecated LGTM badge in README.md. (#13895)
Why I did it
LGTM is deprecated. LGTM's badge doesn't work now.
Github code scanning shows alerts in Security tab. It doesn't have a badge.

How I did it
Remove LGTM badge.

How to verify it
2023-02-23 14:21:30 +08:00
Oren Reiss
729eb07d5f
Add script to upload packages to web server (#13545)
- Why I did it
The flow of SONiC reproducible build assumes that web packages (retrieved by curl and wget) are located in a single web file.
server. Currently there is no way to easily download the packages from the various file servers where they are currently deployed.
In this change we offer a utility to download all relevant packages and upload them to a specific file server.

- How I did it
We implemented python script that parse the various version files (generated by SONIC reproducible build compilation) and identify all relevant packages.
Later the script downloads the file and then upload them to the destination server pointed by the user.

- How to verify it
Script is running and was verified manually

Which release branch to backport (provide reason below if selected)
Feature will be added to master only

Signed-off-by: oreiss <oren.reiss@gmail.com>
2023-02-23 08:18:42 +02:00
Nazarii Hnydyn
d15f5201e9
[buildsystem]: Fix error 'chown: missing operand after'. (#13569)
Fixes: #13395

This fix resolves ownership configuration for vcache:

Step 24/40 : RUN pip3 install j2cli
 ---> Running in fcc39df62a98
chown: missing operand after '/sonic/target/vcache/docker-base-bullseye'
Try 'chown --help' for more information.
Originally the issue was introduced here: #13287

- Why I did it
To fix ownership configuration

- How I did it
Removed redundant stuff

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-02-23 08:18:01 +02:00
Ikki Zhu
f8a393c3a1
add psu fans status led available config (#13926)
Why I did it
Seastone does not have the psu fans' status led, need to reflect it in platform.json.

How I did it
Set the psu fans status led available to false.

How to verify it
Verify it with platform_tests/api/test_psu_fans.py::TestPsuFans::test_set_fans_led case.
2023-02-22 10:55:55 -08:00
Andriy Yurkiv
5ad78abea0
[Dual-ToR] add default value for ACL rule for mellanox platform (#13547)
- Why I did it
Need to add the possibility to choose between dropping packets (using ACL) on ingress or egress in Dual ToR scenario

- How I did it
Add new attribute "mux_tunnel_ingress_acl" to SYSTEM_DEFAULTS table

- How to verify it
check that new attribute exists in redis:
admin@sonic:~$ redis-cli -n 4
127.0.0.1:6379[4]> HGETALL SYSTEM_DEFAULTS|mux_tunnel_ingress_acl
1."state"
2."false"

Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
2023-02-22 20:25:54 +02:00
Liu Shilong
5a2a95998c
[ci] Fix docker hang issue and change template reference branch (#13894)
Why I did it
Azure pipeline change.
Use common template to make it easy to change common steps.
Fix docker hang issue.

How I did it
2023-02-22 13:00:06 +08:00
xumia
5e4826ebf7
[Ci] Support to use the same snapshot for all platform builds (#13913)
IBGP
2023-02-22 11:18:09 +08:00
judyjoseph
bc2cb46ec5
Voq Chassis: Add the Recirc ports to the INTERFACES table to make it routed intf (#13779)
* VOQ: Add the Recirc ports to the INTERFACES table to make it routed intf

* Add a test to cover Recir port generation in INTERFACE table
2023-02-21 11:35:38 -08:00
Marty Y. Lok
2c22d9affc
[Chassis][multiasic] Fix the sonic-db-cli core files issue on multiasic platform after the c++ implementation of sonic-db-cli (#13207)
Fixe #12047. After the c++ implementation of the sonic-db-cli, sonic-db-cli PING command tries to initialize the global database for all instances database starting. If all instance database-config.json are not ready yet. it will crash and generate core file. PR sonic-net/sonic-swss-common#701 only fix the crash and the process abortion. 

Signed-off-by: mlok <marty.lok@nokia.com>
2023-02-21 11:23:22 -08:00
Marty Y. Lok
cf4a172486
[Nokia][sonic-platform] Update Nokia sonic-platform submodule (#13522)
d768d19 Remove warning msg when a transceiver op takes > 200ms
7451689 Support the module.py in IMM to query the Supervisor card eeprom info

Signed-off-by: mlok <marty.lok@nokia.com>
2023-02-21 11:22:04 -08:00
Junchao-Mellanox
f6d3615bb9
[Mellanox] Check system eeprom existence in a retry manner (#13884)
- Why I did it
On Mellanox platform, system EEPROM is a soft link provided by hw-management. There is chance that config-setup service accessing the EEPROM before hw-management creating it. It causes errors. The PR is aim to fix it.

- How I did it
Waiting EEPROM creation in platform API up to 10 seconds.

- How to verify it
Manual test
2023-02-21 19:40:16 +02:00
Samuel Angebault
38c9d3a53d
[Arista] update sensors.conf to ignore sensors (#12529)
Why I did it
The sensors and sensord processes were reporting data on unused sensors.
This lead to ALARM messages or erroneous values that could be misinterpreted.

How I did it
Ignore the affected sensors in the sensors.conf

How to verify it
Check that there are no longer ALARM messages from sensord in the syslog or in the output of sensors
2023-02-20 23:45:16 -08:00
Lior Avramov
fd122efb40
[Mellanox] [ECMP calculator] Add support for 4600/4600C/2201 platforms with different interface naming method (#13814)
- Why I did it
Add support for systems 4600/4600C/2201 that are using sonic interface names aligned to 4 instead of 8 (which is the max number of lanes per port).
Improve DB access calls, now we use Python library functions.

- How I did it
Use addition information taken from Config DB in order to create map from SDK logical index to sonic interface name.

- How to verify it
Run ECMP calculator on 4600, 4600C and 2201 platforms.
2023-02-21 08:52:51 +02:00
Junchao-Mellanox
331b97e2aa
[Mellanox] Fix issue: cannot find label port for logical port when logical port number is larger than 64 (#13710)
- Why I did it
sfp_event.py gets a PMPE message when a cable event is available. In PMPE message, there is no label port available. Current sfp_event.py is using sx_api_port_device_get to get 64 logical ports attributes, and find the label port from those 64 attributes. However, if there are more than 64 ports, sfp_event.py might not be able to find the label port and drop the PMPE message.

- How I did it
Don't use hardcoded 64, get logical port number instead.

- How to verify it
Manual test
2023-02-21 08:14:29 +02:00
xumia
b349051e77
[Build] Clean up the debian preference config file (#13885)
Why I did it
Support to upgrade packages, do better cleanup after the build.

How I did it
Remove the no use preference version control file after the build.

How to verify it
2023-02-21 12:47:44 +08:00
Junchao-Mellanox
cac95c2b4a
Add PYTHON3_SWSSCOMMON as build time dependency to Mellanox platform API (#13847)
- Why I did it
Add PYTHON3_SWSSCOMMON as build time dependency to Mellanox platform API to avoid issue like:

19:34:11  ImportError while loading conftest '/sonic/platform/mellanox/mlnx-platform-api/tests/conftest.py'.
19:34:11  tests/conftest.py:28: in <module>
19:34:11      from sonic_platform import utils
19:34:11  sonic_platform/__init__.py:18: in <module>
19:34:11      from sonic_platform import *
19:34:11  sonic_platform/platform.py:28: in <module>
19:34:11      raise ImportError(str(e) + "- required module not found")
19:34:11  E   ImportError: No module named 'swsscommon'- required module not found- required module not found
19:34:11  [  FAIL LOG END  ] [ target/python-wheels/bullseye/mlnx_platform_api-1.0-py3-none-any.whl ]
The issue only happens when calling below command:

make target/python-wheels/bullseye/mlnx_platform_api-1.0-py3-none-any.whl 

- How I did it
Add PYTHON3_SWSSCOMMON as build time dependency to Mellanox platform API

- How to verify it
Run build
2023-02-20 15:24:26 +02:00
Dror Prital
bceb154811
[submodule] Advance sonic-utilities pointer (#13882)
Update sonic-utilities submodule pointer to include the following:
* 33e85d37 [dhcp_relay] Remove add field of vlanid to DHCP_RELAY table while add vlan ([#2678](https://github.com/sonic-net/sonic-utilities/pull/2678))
* 36824e40 Add support of secure warm-boot ([#2532](https://github.com/sonic-net/sonic-utilities/pull/2532))
* 556d0c68 [doc] Add docs for dhcp_relay show/clear cli ([#2649](https://github.com/sonic-net/sonic-utilities/pull/2649))
* 2a6a06cf [portstat CLI] don't print reminder if use json format ([#2670](https://github.com/sonic-net/sonic-utilities/pull/2670))
* ee6d213f [generate_dump] Revert Revert generate_dump optimization PR's 2599, add fixes for empty /dump forder and symbolic links ([#2645](https://github.com/sonic-net/sonic-utilities/pull/2645))
* 784a15cc [vlan] Refresh dhcpv6_relay config while adding/deleting a vlan ([#2660](https://github.com/sonic-net/sonic-utilities/pull/2660))
* 7e94c5fa [GCU] protect loopback0 from deletion ([#2638](https://github.com/sonic-net/sonic-utilities/pull/2638))

Signed-off-by: dprital <drorp@nvidia.com>
2023-02-19 15:37:29 +02:00
Stephen Sun
1cd090277f
[Mellanox] Non upstream patches for hw-mgmt V.4.0020.4104 (#13792)
- Why I did it
Add non-upstream kernel patches for the Nvidia platforms
These patches are not yet upstream but needed for new technology.
A flow to upstream them is in progress and once they will be approved they will be moved officially to sonic-linux-kernel.
Till then to include them in the build (not must) the build option INCLUDE_EXTERNAL_PATCH_TAR=y should be included

- How I did it
Zip all the patches in to a tar.gz tarball.

- How to verify it
Manually test

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-19 09:44:56 +02:00
Oleksandr Ivantsiv
a82e45a192
[DNS] yang model for static DNS (#13834)
- Why I did it
Add SONiC YANG model for DNS to provide the possibility to configure static DNS entries in Config DB.

- How I did it
Added sonic-dns.yang file that contains the YANG model for the static DNS configuration.

- How to verify it
This PR extends YANG model tests to cover DNS configuration.
To run the test sonic_yang_models-1.0-py3-none-any.whl should be compiled.
2023-02-19 09:43:17 +02:00
Stepan Blyshchak
a81ffeb5c2
[systemd-sonic-generator] Fix overlapping strings being passed to strcpy/strcat (#13647)
#### Why I did it

Fix an issue that services do not start automatically on first boot and start only after hostcfgd enables them.
This is due to a bug in systemd-sonic-generator:

```
admin@arc-switch1004:~$ /usr/lib/systemd/system-generators/systemd-sonic-generator dir
Failed to open file /usr/lib/systemd/system/database.servcee
Error parsing targets for database.servcee
Error parsing database.servcee
Failed to open file /usr/lib/systemd/system/bgp.servcee
Error parsing targets for bgp.servcee
Error parsing bgp.servcee
Failed to open file /usr/lib/systemd/system/lldp.servcee
Error parsing targets for lldp.servcee
Error parsing lldp.servcee
Failed to open file /usr/lib/systemd/system/swss.servcee
Error parsing targets for swss.servcee
Error parsing swss.servcee
Failed to open file /usr/lib/systemd/system/teamd.servcee
Error parsing targets for teamd.servcee
Error parsing teamd.servcee
Failed to open file /usr/lib/systemd/system/syncd.servcee
Error parsing targets for syncd.servcee
Error parsing syncd.servcee
```

A wrong file name is generated (e.g database.**servcee**).

#### How I did it

Fixed overlapping strings being passed to strcpy/strcat that receive restirct* pointers (strings should not overlap).

#### How to verify it

Perform first boot and observe services start immidiatelly after boot.
2023-02-17 17:31:14 -08:00
Samuel Angebault
8437e893b4
[Arista] Update platform library submodules (#13870)
add SEU reporting on chassis
fix fallback logic for Clearlake eeprom identification
fix fan speed reporting for a specific model
move pcie timeout configuration for Upperlake in platform code (deprecates hwsku-init)
2023-02-17 13:51:17 -08:00
Saikrishna Arcot
56d732a0a0
Use tmpfs for /var/log on Arista 7050CX3-32S (#13805)
This is to reduce writes to the SSD on the device.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-02-16 19:13:39 -08:00
rupesh-k
fd49bc1f27
[Yang-models] Modify mirror yang model src_port to match CONFIG_DB (#13578)
Fixes Mirror Yang model src_port to match CONFIG_DB

#### Why I did it
Fixes issue https://github.com/sonic-net/sonic-buildimage/issues/12397#issuecomment-1343109874
2023-02-16 11:22:37 -08:00
Liu Shilong
22e46207c8
[ci] Kill hanged docker build process to avoid build timeout issue. (#13726)
Why I did it
Docker build has a low rate of hanging up.
It hangs on different steps. So, it looks like a bug in docker daemon.

How I did it
Start a daemon process to scan running time more than 1 hours, and kill the process.

How to verify it
2023-02-16 21:58:14 +08:00
Kebo Liu
aee97a69c6
[Mellanox] Enhance MFT make file to download source code from any valid URL (#13801)
- Why I did it
Currently, when building MFT, it can only download the source code from the official download site: http://www.mellanox.com/downloads/MFT/, it's not possible to integrate an internal version that has not been officially released yet.

The intention of this PR is to make it possible to download the source code from any valid link.

- How I did it
Add a new parameter "MLNX_MFT_INTERNAL_SOURCE_BASE_URL", if an URL is given, it will download the source code from the given URL, otherwise, it downloads from the default official site.

- How to verify it
Specify a valid URL in the make file, the MFT debs should be built successfully.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-02-16 09:51:04 +02:00
ganglv
e673c1dcaf
Add yang models for GNMI #13716
Why I did it
Add missing yang models.

How I did it
Add sonic-gnmi.yang and unit test.

How to verify it
Run unit test for sonic-yang-models.
2023-02-16 11:07:27 +08:00
Qi Luo
373f0919e9
[sonic-snmpagent] Update submodule (#13806)
#### Why I did it
Include below commits:
```
4622b8d 2023-02-14 | Fix: zero route may have empty nexthop (#276) [Qi Luo]
```
2023-02-15 14:15:52 -08:00
Samuel Angebault
5ce1b8e4b7
[Arista] Disable ATA NCQ for a few products (#13739)
Why I did it
Some products might experience an occasional IO failure in the communication between CPU and SSD.
Based on some research it could be attributable to some device not handling ATA NCQ (Native Command Queue).

This issue currently affect 4 products:

DCS-7170-32C*
DCS-7170-64C
DCS-7060DX4-32
DCS-7260CX3-64

How I did it
This change disable NCQ on the affected drive for a small set of products.

How to verify it
When the fix is applied, these 2 patterns can be found in the dmesg.
ata1.00: FORCE: horkage modified (noncq)
NCQ (not used)

Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4

with NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA)

   READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec
  WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec
without NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used))

   READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec
  WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec
Which release branch to backport (provide reason below if selected)
2023-02-15 10:31:59 -08:00
byu343
b0e0b23d2a
[arista] Add tuning values for phys on 7280cr3 (#10084)
Why I did it
This change specifies the tuning values for each lane of the B52 phy chips. These values can be different for different ports. The values being set are under the assumption of optical transceivers. This change depends on the change to sonic-swss: sonic-net/sonic-swss#2158.

How to verify it
We verified the values are correctly set on the B52 chips of Arista 7280cr3, by reading them from the debug cli of the B52 driver.
2023-02-15 10:25:49 -08:00
Yaqiang Zhu
c5a7ce0cf4
[dhcp_relay] Remove exist check while adding dhcpv6 relay (#13822)
Why I did it
DHCPv6 relay config entry is not useful while del dhcpv6 relay config.

How I did it
Remove dhcpv6_relay entry if it is empty and not check entry exist while adding dhcpv6 relay
2023-02-15 10:23:39 -08:00
xumia
a3225e65e4
[Build] Remove the additional space character in the mirrors.list file (#13812)
Why I did it
Fix all mirror is commented out in sources.list in slave image issue. It will have an issue when installing more packages in the slave container.

It will add additional space character after running add-apt-repository command.

For example:
The original config in /etc/apt/sources.list

#deb [arch=amd64] http://deb.debian.org/debian/ bullseye main contrib non-free
Run the following command:

add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian bullseye stable"
Then the setting changed to: (added a new space character after #)

# deb [arch=amd64] http://deb.debian.org/debian/ bullseye main contrib non-free
How I did it
Fix the regex string to add the space pattern. After fixed, whether there is a space character or not, it will not be an issue.

How to verify it
2023-02-15 22:19:01 +08:00
mathieulaunay
8a4d1b5797
[build] add an environment variable to run make reset unattended (#12207)
previously "make reset" was expecting user input from the terminal to do its job
setting UNATTENDED to any non-zero string will allow "make reset" to run without interactive confirmation

- Why I did it
When doing automated builds of SONiC images, we need to reset the working repositories between each build.

- How I did it
Adding an environment variable that is read by Makefile.work

- How to verify it
running
UNATTENDED=1 make reset 
should make an automatic reset of all working directories
2023-02-15 10:59:03 +02:00
ganglv
23dbdf525b
Enable host services #13800
Why I did it
Need to enable host service to suport GNMI native write

How I did it
Update rules/config

How to verify it
Run GNMI end2end test
2023-02-15 14:40:09 +08:00
george-deng88
3097e4258d
Fix Celestica Silverstone addDecapTunnel failed issue (#13235)
*Why I did it
     The current sonic image bringup failed on Silverstone:
     4:56:15.957705 sonic NOTICE swss#orchagent: :- addDecapTunnel: Create overlay loopback router interface oid:6000000000520
 *How I did it
     Enable bcm switch Tunnel function on Silverstone.
 *How to verify it
     The new sonic image can bringup OK on Silverstone.
2023-02-14 12:18:19 -08:00
Nazarii Hnydyn
3757199404
[submodule]: Advance sonic-sairedis submodule. (#13802)
Update sonic-sairedis submodule pointer to include the following:
* 4d86af3 [hash]: Extend VS lib with ECMP/LAG hash (#1192)
2023-02-14 11:46:05 -08:00
xumia
8206925631
[Build] Change the default mirror version config file (#13786)
Why I did it
Change the mirror config file
Use the files/build/versions/default/versions-mirror only when reproducible build enabled.
The config in files/build/versions is only for reproducible build, while snapshot mirror feature does not have the dependency on the reproducible build.

How I did it
Skip the mirror config in files/build/versions/default/versions-mirror if reproducible build not enabled.

How to verify it
2023-02-14 14:59:38 +08:00
Dror Prital
cdfc5834a5
[submodule] Advance sonic-platform-daemons pointer (#13552)
Update sonic-platform-daemons submodule pointer to include the following:
* 906d198 add data for telemtery enhancement for 'active-active' cable type ([#332](https://github.com/sonic-net/sonic-platform-daemons/pull/332))

Signed-off-by: dprital <drorp@nvidia.com>
2023-02-14 08:58:56 +02:00
Stepan Blyshchak
e5a294644c
[dockerd] Force usage of cgo DNS resolver (#13649)
Go's runtime (and dockerd inherits this) uses own DNS resolver implementation by default on Linux.
It has been observed that there are some DNS resolution issues when executing ```docker pull``` after first boot.

Consider the following script:

```
admin@r-boxer-sw01:~$ while :; do date; cat /etc/resolv.conf; ping -c 1 harbor.mellanox.com; docker pull harbor.mellanox.com/sonic/cpu-report:1.0.0 ; sleep 1; done
Fri 03 Feb 2023 10:06:22 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.99 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.989/5.989/5.989/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:57245->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:23 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.56 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.561/5.561/5.561/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:53299->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:24 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.78 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.783/5.783/5.783/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:55765->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:25 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=7.17 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 7.171/7.171/7.171/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:44877->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:26 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=5.66 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.656/5.656/5.656/0.000 ms
Error response from daemon: Get "https://harbor.mellanox.com/v2/": dial tcp: lookup harbor.mellanox.com on [::1]:53: read udp [::1]:54604->[::1]:53: read: connection refused
Fri 03 Feb 2023 10:06:27 AM UTC
nameserver 10.211.0.124
nameserver 10.211.0.121
nameserver 10.7.77.135
search mtr.labs.mlnx labs.mlnx mlnx lab.mtl.com mtl.com
PING harbor.mellanox.com (10.7.1.117) 56(84) bytes of data.
64 bytes from harbor.mtl.labs.mlnx (10.7.1.117): icmp_seq=1 ttl=53 time=8.22 ms

--- harbor.mellanox.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.223/8.223/8.223/0.000 ms
1.0.0: Pulling from sonic/cpu-report
004f1eed87df: Downloading [===================>                               ]   19.3MB/50.43MB
5d6f1e8117db: Download complete
48c2faf66abe: Download complete
234b70d0479d: Downloading [=========>                                         ]  9.363MB/51.84MB
6fa07a00e2f0: Downloading [==>                                                ]   9.51MB/192.4MB
04a31b4508b8: Waiting
e11ae5168189: Waiting
8861a99744cb: Waiting
d59580d95305: Waiting
12b1523494c1: Waiting
d1a4b09e9dbc: Waiting
99f41c3f014f: Waiting
```

While /etc/resolv.conf has the correct content and ping (and any other utility that uses libc's DNS resolution implementation) works correctly
docker is unable to resolve the hostname and falls back to default [::1]:53. This started to happen after PR https://github.com/sonic-net/sonic-buildimage/pull/13516 has been merged.
As you can see from the log, dockerd is able to pick up the correct /etc/resolv.conf only after 5 sec since first try. This seems to be somehow related to the logic in Go's DNS resolver
https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go#L385.

There have been issues like that reported in docker like:
  - https://github.com/docker/cli/issues/2299
  - https://github.com/docker/cli/issues/2618
  - https://github.com/moby/moby/issues/22398

Since this starts to happen after inclusion of resolvconf package by
above mentioned PR and the fact I can't see any problem with that (ping,
nslookup, etc. works) the choice is made to force dockerd to use cgo
(libc) resolver.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-02-14 08:57:19 +02:00
Stephen Sun
71b5bb6f37
[Mellanox] Support per PSU slope value for PSU power threshold (#13757)
- Why I did it
Support per PSU slope value for PSU power threshold according to hardware team requirement

- How I did it
Pass the PSU number as a parameter when fetching the slope value of PSU.

- How to verify it
Running regression and manual test

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-14 08:55:28 +02:00
Junchao-Mellanox
389b279ba9
[Mellanox] set select timeout to no more than 1 sec to make sure fast shutdown (#13611)
- Why I did it
Commit sonic-net/sonic-platform-daemons@153ea47 changed SfpStateUpdateTask from Process to Thread. In this commit, it raises an exception in SfpStateUpdateTask to make shutdown flow fast. But it does not work on Nvidia platform as Nvidia platform is passing timeout parameter of get_change_event to select. Linux select function can not be interrupted by a Python exception. There is no such issue on Nvidia platform before that commit. However, in order to comply with the commit and make shutdown flow fast, we decided to change Nvidia platform API implementation.

To fix issue #13591.

- How I did it
The select call in get_change_event should use no more than 1 second as timeout parameter.
Outside the select call, add a while loop to make sure timeout parameter of get_change_event work as expected

- How to verify it
Manual test
2023-02-14 08:26:25 +02:00
Stephen Sun
3112997b5a
[Mellanox] Advance hw-mgmt to v.7.0020.4104 (#13372)
- Why I did it
Advance hw-mgmt service to V.7.0020.4100
Add missing thermal sensors that are supported by hw-mgmt package
Delay system health service before hw-mgmt has started on Mellanox platform in order to avoid reading some sensors before ready.
Depends on sonic-net/sonic-linux-kernel#305

- How I did it
1. Update hw mgmt version
2. Add missing sensors
3. Delay service 

- How to verify it
Regression test.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-12 11:23:47 +02:00
Stephen Sun
ead7925f7d
[submodule] Advance sonic-linux-kernel pointer (#13732)
464d2cdb [Mellanox] Update linux kernel for hw-mgmt V.7.0020.4104 (#305)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-02-12 11:20:19 +02:00
Richard.Yu
43683d8cee
[submodule][sairedis]Advance submodule head (#13724)
include sairedis changes
3a960be | [submodule][SAI]ADvance SAI Header (#1206)
7026441 | [Mellanox] Enable DSCP remapping by using SAI attribute (#1188)
a2c37b8 | [syncd]: Enable port bulk API (#1197)

include SAI changes
7710e24 | [cherry-pick][202211]Enhance the check enum lock script (#1741) (#1742)
0031470 | improve enum values integration check (#1727) (#1737)
4f11c7e | Enable github code scanning to replace LGTM. (#1709)
0fd23d2 | [SAI-PTF] Skip test when hit expected error from sai api (#1699)
aba7612 | [SAI-PTF] API Logger - reformat arg values (#1696)
1390cee | [SAI-PTF] API Logger - reformat dict in return value (#1690)
3d96a1d | [SAI-PTF]Add return value in the SAI-PTF log (#1685)

Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
2023-02-11 19:32:09 +08:00